
Tuning Wazuh to stop crying wolf.

A 4-week project to cut SIEM alert noise by 70%. The difference between rule volume and signal — and how to write decoders that don't match everything.

The problem: 1,200 alerts a day

When I inherited the Wazuh deployment, the dashboard showed over 1,200 alerts per day. The team had learned to ignore the SIEM. When something real came in, it looked identical to the 40 false positives around it. The system was technically running — just not actually detecting anything useful.

The goal: reduce noise so that the alerts that do fire are worth reading. Four weeks, no new hardware, no Wazuh upgrade. Just rule and decoder work.

1,200 alerts/day before · 360 alerts/day after · 70% noise reduction

How Wazuh rules actually work

Wazuh processes log events through a pipeline: decoders parse raw log lines into structured fields, then rules match against those fields and fire alerts. Rules can reference parent rules, group related events, and override each other via overwrite.

Most out-of-the-box noise comes from three places:

  • Overly broad decoders that match too many log sources at once
  • Rules with no frequency threshold that fire on every single occurrence
  • Duplicate rule paths where the same event triggers 3–4 rules in sequence
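The third point is easiest to see in rule syntax. Here is a minimal sketch of chaining (the rule IDs are placeholders from the 100000+ custom range, not real ruleset IDs): an event that matches the parent is then tested against every child that references it via if_sid, so one log line walks several rules in sequence.

```xml
<!-- Parent: level 0, groups all sshd events, fires no alert itself -->
<rule id="100100" level="0">
  <decoded_as>sshd</decoded_as>
  <description>sshd messages grouped.</description>
</rule>

<!-- Child: evaluated only after the parent matched, and alerts -->
<rule id="100101" level="5">
  <if_sid>100100</if_sid>
  <match>Failed password</match>
  <description>sshd: authentication failed.</description>
</rule>
```

Overlapping child rules hanging off the same parent, sometimes spread across different rule files, are where the duplicate-path noise comes from.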

Phase 1 — audit what's actually firing

Before changing anything, I needed to know which rules were generating the most volume. Wazuh's built-in dashboards in Kibana/OpenSearch show this, but pulling a raw count from the alerts log is faster:

// alert volume by rule ID
$ grep '"rule"' /var/ossec/logs/alerts/alerts.json \
    | python3 -c "import sys, json, collections; \
c = collections.Counter(json.loads(l)['rule']['id'] for l in sys.stdin); \
[print(v, k) for k, v in c.most_common(20)]"   # top 20 noisy rules by fire count

The top 5 rule IDs accounted for 60% of all alerts. Three were syslog authentication rules firing on routine sudo usage. One was a web server access log rule with no frequency gate. One was a Windows event rule matching every user logon.

Phase 2 — local rule overrides

Wazuh's default rules live in /var/ossec/ruleset/rules/. You never edit these — upgrades overwrite them. The right place is /var/ossec/etc/rules/local_rules.xml, where you can use overwrite="yes" to replace a default rule's behaviour.

// local_rules.xml — frequency gate
<!-- Fire only after 10 occurrences in 120 seconds -->
<rule id="5503" level="5" frequency="10" timeframe="120" overwrite="yes">
  <if_matched_sid>5501</if_matched_sid>
  <description>sshd: brute force attempt</description>
</rule>

Adding frequency="10" timeframe="120" to the SSH auth failure rule collapsed hundreds of daily alerts into a handful of genuine brute-force detections.
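The gate's semantics can be sketched outside Wazuh. This is a toy sliding-window counter in awk, not Wazuh's actual implementation; input is a list of event timestamps in seconds:

```shell
# Toy model of a frequency="10" timeframe="120" gate (an assumption about
# the semantics, not Wazuh internals): alert once 10 events land inside
# a 120-second window, then reset so the same burst cannot re-fire.
printf '%s\n' 0 10 20 30 40 50 60 70 80 90 300 | awk '
{
  t[n++] = $1                         # record event timestamp
  while (t[head] < $1 - 120) head++   # expire events outside the window
  if (n - head >= 10) {               # 10 events inside the window: fire
    print "ALERT at t=" $1
    head = n                          # reset the window
  }
}'
```

The ten events between t=0 and t=90 collapse into a single alert; the stray event at t=300 stays silent. That collapse is what turned hundreds of daily SSH alerts into a handful.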

Phase 3 — decoder tightening

Two decoders were matching log sources they shouldn't. A generic syslog decoder was picking up application logs that had their own specific decoders. The fix is an explicit program_name match on the specific decoder, so it claims its log lines before the generic decoder gets a chance.

// local_decoders.xml — targeted match
<decoder name="nginx-access">
  <program_name>nginx</program_name>
  <prematch>^\S+ \S+ \S+ \[</prematch>
  <regex>(\S+) \S+ \S+ \[(\S+)\] "(\w+) (\S+).*" (\d+)</regex>
  <order>srcip, timestamp, method, url, status</order>
</decoder>
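To sanity-check what those capture groups pull out, here is a rough translation of the decoder regex into Python's re, run against a sample access-log line. This is an approximation only: Wazuh uses its own OS_Regex dialect, not PCRE, and an optional token is added here to absorb the timezone inside the brackets, which the original pattern handles differently.

```shell
# Approximate PCRE translation of the decoder regex (assumption: the
# "(?: \S+)?" group is added to swallow the "+0000" timezone token).
python3 - <<'EOF'
import re
line = '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /login HTTP/1.1" 404 162'
pattern = r'(\S+) \S+ \S+ \[(\S+)(?: \S+)?\] "(\w+) (\S+).*" (\d+)'
fields = dict(zip(["srcip", "timestamp", "method", "url", "status"],
                  re.match(pattern, line).groups()))
print(fields)
EOF
```

On this line it yields srcip 203.0.113.7, method GET, url /login, status 404, which is what the order element maps the groups to.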

Results and what changed

After four weeks of iterative tuning, daily alert volume dropped from 1,200 to around 360. More importantly, the signal-to-noise ratio flipped — the team started reading alerts again because firing an alert actually meant something.

The rule: if an alert fires more than 50 times in a day and zero of those firings led to a triage action, it's noise. Write a frequency gate or raise the level threshold until it's useful or silent.
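That cut-off is easy to script against the Phase 1 audit output (a sketch; it assumes the "count id" column order that the audit one-liner prints, and the counts here are made-up sample data):

```shell
# Flag rule IDs firing more than 50 times/day from "count id" pairs.
printf '%s\n' '412 5503' '61 31101' '38 18107' \
  | awk '$1 > 50 { print $2 " is noise (" $1 "/day)" }'
```

Anything it prints is a candidate for a frequency gate or a level adjustment; anything below the line is left alone.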
  • Frequency gates on authentication rules reduced sudo/logon noise by 85%
  • Decoder scoping eliminated double-firing on nginx access logs
  • Rule level adjustment pushed informational events below the dashboard threshold
  • Group-based suppression collapsed related alerts (failed login → brute force) into single parent events