How to whitelist domains or URL paths in CrowdSec to avoid false positives?
What is a WAF and what is CrowdSec?
A WAF (Web Application Firewall) is a security tool that monitors, filters, and blocks malicious HTTP traffic targeting web applications. It protects web applications from attacks like SQL injection, cross-site scripting (XSS), and DDoS. It operates at OSI layer 7 (application), in a similar manner to a regular firewall, which operates at layer 4 (transport) or 3 (network). Both use rules to evaluate whether a packet / a request should be allowed through.
CrowdSec is an open-source, collaborative WAF that analyzes web application logs to detect and block malicious behaviour. In addition to allowing or blocking requests as they come, CrowdSec takes a more precautionary approach: it blocks the origin IP of the request via remediation components (e.g. Caddy). This effectively prevents the malicious IP from trying other probing or exploit activities. To take it to the next level, all malicious IPs are shared between CrowdSec users based on a reputation system. If more than n CrowdSec users see a malicious IP actively scanning and making malicious requests, there is a chance that IP will land in the global blocklist. In that case, the IP will be blocked by all CrowdSec users by default, protecting everyone at once from a potential threat.
More on CrowdSec architecture:

CrowdSec extends protection by crowdsourcing threat data and automating responses across multiple systems.
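You normally never have to do this by hand, but to make the architecture concrete, here is a minimal sketch of what a remediation component does behind the scenes: it asks the CrowdSec local API whether an IP has an active decision. The API key and IP below are placeholders, and it assumes a bouncer was registered beforehand (for instance with cscli bouncers add my-bouncer):
# hypothetical bouncer key and IP, for illustration only
curl -s -H "X-Api-Key: $BOUNCER_API_KEY" \
  "http://localhost:8080/v1/decisions?ip=203.0.113.10"
# a null/empty response means no active decision; otherwise the ban decision(s) come back as JSON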
You're saying false positives?
While this is great, once I installed it on my homelab I started seeing my IP getting banned while browsing my own services... A WAF always requires tuning and whitelisting because its generic security rules often trigger false positives (or false negatives). In my case it was more of a false positive problem, as my IP was getting banned for no reason. Most collections and detection scenarios in CrowdSec can be installed without tuning. Still, since every web application has unique behaviours, I was getting banned on a specific service's URL. For the record, here are the collections I installed: crowdsecurity/caddy, crowdsecurity/appsec-virtual-patching, crowdsecurity/appsec-generic-rules, nothing too crazy.
Before resolving the false positive, let me show you how I organized the CrowdSec container data. Here is my compose file (the ports are not open because I use docker networks to hide the traffic from other hosts... security!):
crowdsec:
  image: crowdsecurity/crowdsec:latest
  restart: unless-stopped
  environment:
    - TZ=${TZ}
    - COLLECTIONS=crowdsecurity/caddy crowdsecurity/appsec-virtual-patching crowdsecurity/appsec-generic-rules
    - PUID=99
    - PGID=100
  #ports:
    # Local API of CrowdSec, which:
    # - Accepts decisions from bouncers (e.g., nginx, firewall bouncers).
    # - Is queried by the CrowdSec agent or by third-party tools.
    # - Can be used to interact with the API manually
    #- 8083:8080/tcp
    # Prometheus metrics
    #- 6060:6060/tcp
    # AppSec component of CrowdSec (log analysis)
    # See /etc/crowdsec/acquis.d/ for configuration
    #- 7422:7422/tcp
  volumes:
    - $BASE_PATH/crowdsec/data/:/var/lib/crowdsec/data:rw # Data directory
    - $BASE_PATH/crowdsec/:/etc/crowdsec:rw # Configuration directory
    - $BASE_PATH/caddy/logs:/var/log/caddy:rw # Caddy logs to analyze
  networks:
    - public-net
    - monitoring-net
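Side note: mounting the Caddy logs is only half of the job, CrowdSec reads them through an acquisition file. Here is a minimal sketch of what such a file could look like under /etc/crowdsec/acquis.d/ (the file name and glob are assumptions, adjust them to your Caddy logging setup):
# e.g. $BASE_PATH/crowdsec/acquis.d/caddy.yaml
filenames:
  - /var/log/caddy/*.log
labels:
  type: caddy   # tells CrowdSec which parsers to run on these lines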
Resolving false positives!
Let's take the example of a false positive I experienced with Karakeep: my IP was getting banned just from browsing all my bookmarks. Browsing was spamming the service with HTTP GET requests, which CrowdSec wrongly identified as probing behaviour.
The scenario messing with Karakeep is the http-crawl-non_statics one. You can identify the one messing with your instance by looking at the decisions (basically the bans). From inside the CrowdSec container, run cscli decisions list: you will see your IP in there along with the reason (the scenario responsible) for the ban. You will see something like:
 ID       │ Source   │ Scope:Value      │ Reason                     │ Action │ Country │ AS                      │ Events │ expiration │ Alert ID
──────────┼──────────┼──────────────────┼────────────────────────────┼────────┼─────────┼─────────────────────────┼────────┼────────────┼──────────
 14157191 │ crowdsec │ Ip:91.239.157.XX │ crowdsecurity/http-probing │ ban    │ DE      │ 62240 Clouvider Limited │ 11     │ 3h30m15s   │ 5657
The temporary fix...
While you can remove the decision using the command cscli decisions delete --id <your_decision_id_here>, it only resolves the issue temporarily. Browsing the same URL will again be seen as malicious behaviour by CrowdSec. To permanently resolve the issue, we need to change CrowdSec's interpretation of what a malicious behaviour is.
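For reference, decisions can be lifted by ID or by IP (the ID below is the one from the example output above, the IP is a placeholder):
cscli decisions delete --id 14157191       # lift one specific decision
cscli decisions delete --ip 203.0.113.10   # lift every decision targeting a given IP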
The big guns fix...
All decisions in CrowdSec are the result of a scenario having identified a behaviour as malicious. In the example output above, the behaviour was identified as malicious by the crowdsecurity/http-probing scenario; in my Karakeep case, the culprit was crowdsecurity/http-crawl-non_statics.
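Before changing anything, it can also help to look at the alert behind the decision to see exactly which requests tripped the scenario. The ID comes from the Alert ID column of the decision list (5657 in the example above); the -d flag, assuming a reasonably recent cscli, includes the individual events:
cscli alerts list               # recent alerts with their scenario and source IP
cscli alerts inspect 5657 -d    # one alert in detail, including the events that triggered it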
Let's tweak the scenario! The CrowdSec container will replace the scenario installed from the hub (via the collections above) with one you provide directly inside /etc/crowdsec/scenarios. My CrowdSec docker config has the following binding: $BASE_PATH/crowdsec/:/etc/crowdsec.
You can add a YAML file overriding the current http-crawl-non_statics scenario inside $BASE_PATH/crowdsec/scenarios to exclude the Karakeep domain. You can check that the scenario is overridden using cscli scenarios list: you will see something like 🏠 enabled,local.
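A quick way to double-check from inside the container (the grep is only there to narrow the output):
cscli scenarios list | grep http-crawl-non_statics
cscli scenarios inspect crowdsecurity/http-crawl-non_statics   # prints the item's status and metadata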
I knew evt.Meta.target_fqdn was the field holding the requested domain by running tail -n 5 /var/log/caddy/<karakeep-log-file.log> | cscli explain --type caddy -f - -v and looking at the parsed fields. Check the line evt.Parsed.static_ressource.
I then modified the filter line so that GET and HEAD requests to the domain where I had issues, karakeep.yourdomain.com, are excluded from the scenario. Check the line starting with filter...
type: leaky
name: crowdsecurity/http-crawl-non_statics
description: "Detect aggressive crawl on non static resources"
#cscli explain --log '....' -t caddy -v
filter: "evt.Meta.log_type in ['http_access-log', 'http_error-log'] && evt.Parsed.static_ressource == 'false' && evt.Parsed.verb in ['GET', 'HEAD'] && evt.Meta.target_fqdn != 'karakeep.yourdomain.com'"
distinct: "evt.Parsed.file_name"
leakspeed: 0.5s
capacity: 40
...
# Rest of the values are left as default
While this works, it disables the scenario for the entire domain. Doing so whitelists more than what is needed: for example, the login page will also be whitelisted against this behaviour, which is not necessary.
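One thing that is easy to forget: changes under /etc/crowdsec (scenarios, parsers, whitelists) are only picked up after CrowdSec restarts or reloads. With the compose file above, that is simply:
docker compose restart crowdsec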
The better fix...
Let's see another way to accomplish the same result: the way CrowdSec really does whitelists.
One benefit of this whitelist approach is that we can selectively bypass problematic paths rather than turning off the http-crawl-non_statics scenario for the whole domain. You just have to add the following file in /etc/crowdsec/parsers/s02-enrich/ (look at the docker mapping above):
name: my/karakeep-whitelists
description: "Whitelist false positives from Karakeep"
filter: "evt.Meta.service == 'http' && evt.Meta.log_type in ['http_access-log', 'http_error-log']"
whitelist:
  reason: "Whitelist false positives from Karakeep"
  expression:
    - evt.Meta.target_fqdn == 'karakeep.yourdomain.com' && evt.Meta.http_verb == 'GET' && evt.Meta.http_status == '200' && evt.Parsed.request contains '/dashboard/preview'
    - evt.Meta.target_fqdn == 'karakeep.yourdomain.com' && evt.Meta.http_verb == 'GET' && evt.Meta.http_status == '200' && evt.Parsed.request contains '/dashboard/tags'
Now the successful HTTP requests (GET 200) on my Karakeep service will be discarded at parsing time.
Running the command tail -n 5 /var/log/caddy/<karakeep-log-file.log> | cscli explain --type caddy -f - -v will give you something like this, confirming the whitelist works as intended:
| ├ 🟢 my/karakeep-whitelists (~2 [whitelisted])
| ├ update evt.Whitelisted : %!s(bool=false) -> true
| ├ update evt.WhitelistReason : -> Whitelist false positives from Karakeep
| ├ 🟢 crowdsecurity/public-dns-allowlist (unchanged)
| ├ 🟢 my/requests-whitelists (unchanged)
| └ 🟢 crowdsecurity/whitelists (unchanged)
└-------- parser success, ignored by whitelist (Whitelist false positives from Karakeep) 🟢
While this approach is the best from a security perspective (we only whitelist what is needed), it also has its drawbacks. The whitelist is now very specific to the application: if an endpoint changes in a newer Karakeep version, the whitelist might no longer be effective and the false positives will come back.
