Parsing and Triaging CSP Violation Reports

This guide is part of the CSP violation reporting and monitoring reference and covers what to do after the reports arrive: parsing the two payload shapes, normalizing their fields, deduplicating the flood, and deciding which violations are attacks, which are policy gaps, and which are pure noise. Raw Content-Security-Policy reports are not directly actionable — they arrive in two incompatible JSON layouts, repeat thousands of times for one root cause, and bury genuine injection attempts under browser-extension chatter. Parsing means turning that stream into a deduplicated, ranked list keyed on (document-uri, violated-directive, blocked-uri).

Configuration Syntax & Exact Values

The two delivery mechanisms produce two different payload shapes. Your parser must handle both.

A legacy report-uri delivery — Content-Type: application/csp-report, a single object:

{
  "csp-report": {
    "document-uri": "https://example.com/checkout",
    "referrer": "https://example.com/cart",
    "violated-directive": "script-src-elem",
    "effective-directive": "script-src-elem",
    "original-policy": "default-src 'self'; script-src 'self'; report-uri /csp",
    "blocked-uri": "https://cdn.thirdparty.example/widget.js",
    "disposition": "report",
    "status-code": 200,
    "line-number": 42,
    "column-number": 8,
    "source-file": "https://example.com/checkout"
  }
}

A modern report-to delivery — Content-Type: application/reports+json, a JSON array:

[
  {
    "age": 240,
    "type": "csp-violation",
    "url": "https://example.com/checkout",
    "user_agent": "Mozilla/5.0 …",
    "body": {
      "documentURL": "https://example.com/checkout",
      "referrer": "https://example.com/cart",
      "effectiveDirective": "script-src-elem",
      "originalPolicy": "default-src 'self'; script-src 'self'; report-to csp-endpoint",
      "blockedURL": "https://cdn.thirdparty.example/widget.js",
      "disposition": "report",
      "statusCode": 200,
      "lineNumber": 42,
      "columnNumber": 8,
      "sourceFile": "https://example.com/checkout"
    }
  }
]
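
The two shapes above differ mostly in key naming and nesting, so one way to handle both is to translate the legacy keys into the report-to names before any further processing. The following is a minimal sketch under that assumption; `LEGACY_TO_MODERN` and `to_modern_bodies` are hypothetical names, and the mapping is read off the two sample payloads rather than taken from a specification.

```python
# Assumed key correspondence, read off the two sample payloads above.
# "violated-directive" has no counterpart in the report-to body and is dropped.
LEGACY_TO_MODERN = {
    "document-uri": "documentURL",
    "referrer": "referrer",
    "effective-directive": "effectiveDirective",
    "original-policy": "originalPolicy",
    "blocked-uri": "blockedURL",
    "disposition": "disposition",
    "status-code": "statusCode",
    "line-number": "lineNumber",
    "column-number": "columnNumber",
    "source-file": "sourceFile",
}


def to_modern_bodies(payload):
    """Return report-to style bodies for either payload shape."""
    if isinstance(payload, list):
        # report-to: a JSON array that may mix in non-CSP report types
        return [r.get("body", {}) for r in payload if r.get("type") == "csp-violation"]
    # report-uri: a single object wrapped under "csp-report"
    legacy = payload.get("csp-report", {})
    return [{LEGACY_TO_MODERN[k]: v for k, v in legacy.items() if k in LEGACY_TO_MODERN}]
```

With this in place, the same violation delivered through either mechanism yields an identical body, which keeps the later dedupe step independent of the delivery path.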

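Field by field, four fields drive every triage decision: document-uri (documentURL), violated-directive or effective-directive (effectiveDirective), blocked-uri (blockedURL), and disposition.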

For where these directives are emitted and the full report-uri vs report-to mechanics, see the CSP violation reporting reference.

Figure: CSP violation triage decision tree. Route each report by blocked-uri: extension schemes are noise (drop or count only), inline/eval/data: are your own policy gaps (add a nonce or hash), and unknown hosts warrant an attack investigation (alert and dedupe by key).
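
The routing in the decision tree can be sketched as a small classifier. This is an illustration only: the function name `triage` and the bucket labels `noise`, `policy_gap`, and `investigate` are hypothetical, and the extension-scheme pattern is an assumption about which schemes count as noise.

```python
import re

# Assumption: any scheme ending in "-extension:" is browser-extension noise.
EXTENSION_SCHEME = re.compile(r"^[a-z-]+-extension:")
# Keyword values browsers report for inline code, eval(), and data: URLs.
OWN_CODE_KEYWORDS = {"inline", "eval", "data"}


def triage(blocked_uri: str) -> str:
    """Route one report into a bucket based on its blocked-uri value."""
    if EXTENSION_SCHEME.match(blocked_uri):
        return "noise"        # drop, or count only
    if blocked_uri in OWN_CODE_KEYWORDS or blocked_uri.startswith("data:"):
        return "policy_gap"   # your own code: fix with a nonce or hash
    return "investigate"      # unknown host: alert and dedupe by key
```

For example, `triage("moz-extension://abc/content.js")` falls into the noise bucket, while `triage("https://cdn.thirdparty.example/widget.js")` is routed to investigation.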

Server-Side Configuration

Node receiver that parses and dedupes

const crypto = require('crypto');

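// Extension schemes (including Safari's newer safari-web-extension:) plus empty/placeholder values
const NOISE = /^(chrome|moz|safari|safari-web)-extension:|^about$|^null$|^$/;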
const seen = new Map(); // key -> { count, first, last }

function normalize(rec) {
  const b = rec.body || rec['csp-report'] || rec; // report-to nests under body
  return {
    documentUri: (b.documentURL || b['document-uri'] || '').split('?')[0],
    directive: (b.effectiveDirective || b['violated-directive'] || '').split(' ')[0],
    blockedUri: b.blockedURL || b['blocked-uri'] || '',
    disposition: b.disposition || 'enforce',
  };
}

function ingest(payload) {
  const recs = Array.isArray(payload) ? payload : [payload];
  for (const rec of recs) {
    const v = normalize(rec);
    if (NOISE.test(v.blockedUri)) continue;            // discard extension/proxy noise
    const key = crypto
      .createHash('sha1')
      .update(`${v.documentUri}|${v.directive}|${v.blockedUri}`)
      .digest('hex');
    const hit = seen.get(key);
    if (hit) { hit.count++; hit.last = Date.now(); }   // dedupe: bump count, do not re-store
    else seen.set(key, { ...v, count: 1, first: Date.now(), last: Date.now() });
  }
}

The dedupe key is sha1(document-uri | directive | blocked-uri), with the query string stripped from document-uri and the directive reduced to its bare name. The result is one row per root cause, with a count, instead of one row per browser event.

Python receiver

import hashlib
import re

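# Extension schemes (including Safari's newer safari-web-extension:) plus empty/placeholder values
NOISE = re.compile(r"^(chrome|moz|safari|safari-web)-extension:|^about$|^null$|^$")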
seen: dict[str, dict] = {}

def normalize(rec: dict) -> dict:
    b = rec.get("body") or rec.get("csp-report") or rec
    return {
        "document_uri": (b.get("documentURL") or b.get("document-uri") or "").split("?")[0],
        "directive": (b.get("effectiveDirective") or b.get("violated-directive") or "").split(" ")[0],
        "blocked_uri": b.get("blockedURL") or b.get("blocked-uri") or "",
        "disposition": b.get("disposition", "enforce"),
    }

def ingest(payload):
    records = payload if isinstance(payload, list) else [payload]
    for rec in records:
        v = normalize(rec)
        if NOISE.match(v["blocked_uri"]):
            continue
        key = hashlib.sha1(
            f"{v['document_uri']}|{v['directive']}|{v['blocked_uri']}".encode()
        ).hexdigest()
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {**v, "count": 1}
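
Both receivers build the same key, so it can help to check the key derivation in isolation. The sketch below assumes a hypothetical helper `dedupe_key` that mirrors the normalization used above (query string stripped from the document URI, directive cut at the first space) and shows two events from different sessions collapsing to one key.

```python
import hashlib


def dedupe_key(document_uri: str, directive: str, blocked_uri: str) -> str:
    """Hash the three normalized fields into one root-cause key."""
    doc = document_uri.split("?")[0]   # drop the query string
    bare = directive.split(" ")[0]     # keep only the directive name
    return hashlib.sha1(f"{doc}|{bare}|{blocked_uri}".encode()).hexdigest()


a = dedupe_key(
    "https://example.com/checkout?sid=abc",
    "script-src-elem",
    "https://cdn.thirdparty.example/widget.js",
)
b = dedupe_key(
    "https://example.com/checkout?sid=xyz",
    "script-src-elem 'self'",
    "https://cdn.thirdparty.example/widget.js",
)
print(a == b)  # True: same page, directive, and blocked resource
```

A different blocked-uri or a different page path produces a different key, so unrelated violations stay in separate rows.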

Sample aggregation query

Once normalized rows land in a table (csp_violations(document_uri, directive, blocked_uri, disposition, ts)), rank root causes:

SELECT document_uri, directive, blocked_uri, disposition,
       COUNT(*) AS hits,
       MIN(ts)  AS first_seen,
       MAX(ts)  AS last_seen
FROM csp_violations
WHERE ts > NOW() - INTERVAL '7 days'
  AND blocked_uri NOT LIKE '%-extension:%'   -- exclude residual noise
  AND blocked_uri NOT IN ('inline', 'eval', 'about', '')
GROUP BY document_uri, directive, blocked_uri, disposition
ORDER BY hits DESC
LIMIT 50;

This collapses the flood into the top distinct violations, sorted by frequency — the worklist for tightening the policy.

Diagnostic & Verification Steps

Feed a known payload through the parser and confirm the normalized output.

curl -s -X POST -H 'Content-Type: application/reports+json' \
  -d '[{"type":"csp-violation","age":0,"url":"https://example.com/checkout","body":{"documentURL":"https://example.com/checkout?sid=abc","effectiveDirective":"script-src-elem","blockedURL":"https://cdn.thirdparty.example/widget.js","disposition":"report"}}]' \
  https://reports.example.com/csp -o /dev/null -w '%{http_code}\n'

Expected output: 204.

Expected normalized record after parsing (query string stripped, directive reduced to bare name; the Node receiver also stores first and last timestamps, omitted here):

{
  "documentUri": "https://example.com/checkout",
  "directive": "script-src-elem",
  "blockedUri": "https://cdn.thirdparty.example/widget.js",
  "disposition": "report",
  "count": 1
}

Posting the same payload again must leave one row with "count": 2, confirming dedupe works rather than producing a second row.

Edge Cases, Security Implications & Safe Rollback

Safe rollback (non-destructive): parsing and storage sit downstream of report delivery, so disabling the parser or the storage write only stops telemetry. It cannot affect page rendering or the enforced policy, whether the reports come from a Report-Only header or an enforcing one. To halt ingestion under a flood, return 204 and drop the body before ingest(); re-enable once rate-limiting is in place.
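
The flood kill switch can be sketched as a thin wrapper in front of ingest(). This is a minimal illustration, not tied to any web framework: `handle_report` and its `enabled` flag are hypothetical, and answering malformed JSON with 400 is an assumption, since the guide only specifies 204 for accepted reports.

```python
import json


def handle_report(raw_body: bytes, ingest, enabled: bool = True) -> int:
    """Return the HTTP status for one report POST."""
    if not enabled:
        # Kill switch: acknowledge and drop the body without parsing it.
        return 204
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Assumption: reject malformed JSON with 400; the guide does not say.
        return 400
    ingest(payload)
    return 204
```

Because the disabled path returns before any parsing, a flood costs only the request read, and browsers still receive a success status.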

Frequently Asked Questions

How do I dedupe reports that describe the same root cause? Hash document-uri (query string stripped) + the bare violated-directive + blocked-uri, and keep a count per hash instead of one row per event. One broken third-party script then becomes a single high-count row rather than thousands of duplicates.

Why does blocked-uri only show a hostname with no path? Browsers truncate cross-origin blocked URLs to scheme + host (or just the scheme) to avoid leaking the target path. This is expected; group on the truncated value and do not treat the missing path as anomalous.

Are inline and eval violations attacks? Usually not — they are your own inline scripts/styles or eval() usage that the policy correctly refuses. They are policy gaps to fix with nonces or hashes, not with 'unsafe-inline'. A surge of them from a single page after a deploy is far more likely a regression than an attack.

Can violation reports contain personal data? Yes. URLs in document-uri, referrer, and blocked-uri may carry session or reset tokens and other identifiers in query strings. Strip queries on ingest, redact known token params, and apply log-grade retention and access controls to the report store.
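
One way to apply the stripping and redaction described above is with the standard library's URL helpers. The sketch below is an assumption-laden example: `strip_query`, `redact_tokens`, and the `TOKEN_PARAMS` list are hypothetical, and the parameter names should be replaced with the ones your application actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of sensitive parameter names; adjust to your application.
TOKEN_PARAMS = {"token", "sid", "session", "reset", "code"}


def strip_query(url: str) -> str:
    """Drop the query string and fragment entirely."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def redact_tokens(url: str) -> str:
    """Keep the query string but mask values of known token parameters."""
    parts = urlsplit(url)
    pairs = [
        (k, "REDACTED" if k.lower() in TOKEN_PARAMS else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))
```

Stripping suits document-uri and referrer when the query is never needed for triage; masking is the gentler option when non-sensitive parameters help identify the page.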

Conclusion

Roll the parser out the same way you roll out the policy: receive into a staging store first, confirm both payload shapes normalize to one record and that re-posts increment a count rather than duplicate, then enable noise filtering and dedupe, and only then point production reporting at it. Promote to enforcement-disposition alerting last, once the Report-Only stream is quiet and every recurring blocked-uri is explained.