Parsing and Triaging CSP Violation Reports

This guide is part of the CSP violation reporting and monitoring reference and covers what to do after the reports arrive: parsing the two payload shapes, normalizing their fields, deduplicating the flood, and deciding which violations are attacks, which are policy gaps, and which are pure noise. Raw Content-Security-Policy reports are not directly actionable — they arrive in two incompatible JSON layouts, repeat thousands of times for one root cause, and bury genuine injection attempts under browser-extension chatter. Parsing means turning that stream into a deduplicated, ranked list keyed on (document-uri, violated-directive, blocked-uri).

Configuration Syntax & Exact Values

The two delivery mechanisms produce two different payload shapes. Your parser must handle both.

A legacy report-uri delivery — Content-Type: application/csp-report, a single object:

{
  "csp-report": {
    "document-uri": "https://example.com/checkout",
    "referrer": "https://example.com/cart",
    "violated-directive": "script-src-elem",
    "effective-directive": "script-src-elem",
    "original-policy": "default-src 'self'; script-src 'self'; report-uri /csp",
    "blocked-uri": "https://cdn.thirdparty.example/widget.js",
    "disposition": "report",
    "status-code": 200,
    "line-number": 42,
    "column-number": 8,
    "source-file": "https://example.com/checkout"
  }
}

A modern report-to delivery — Content-Type: application/reports+json, a JSON array:

[
  {
    "age": 240,
    "type": "csp-violation",
    "url": "https://example.com/checkout",
    "user_agent": "Mozilla/5.0 …",
    "body": {
      "documentURL": "https://example.com/checkout",
      "referrer": "https://example.com/cart",
      "effectiveDirective": "script-src-elem",
      "originalPolicy": "default-src 'self'; script-src 'self'; report-to csp-endpoint",
      "blockedURL": "https://cdn.thirdparty.example/widget.js",
      "disposition": "report",
      "statusCode": 200,
      "lineNumber": 42,
      "columnNumber": 8,
      "sourceFile": "https://example.com/checkout"
    }
  }
]
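The two shapes can be told apart by Content-Type before any field mapping happens. The sketch below assumes a strict receiver that accepts only these two media types. It unwraps the legacy "csp-report" object and, for report-to batches, keeps only envelopes whose type is "csp-violation", because a shared reporting endpoint may also receive other report types. The function name parse_reports is hypothetical.

```python
import json

LEGACY_TYPE = "application/csp-report"
MODERN_TYPE = "application/reports+json"


def parse_reports(content_type: str, raw: bytes) -> list[dict]:
    """Return a list of report bodies from either delivery shape (sketch)."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    data = json.loads(raw)
    if media_type == LEGACY_TYPE:
        # report-uri: a single object wrapped in "csp-report".
        report = data.get("csp-report") if isinstance(data, dict) else None
        return [report] if isinstance(report, dict) else []
    if media_type == MODERN_TYPE:
        # report-to: an array of envelopes; the violation lives under "body".
        if not isinstance(data, list):
            return []
        return [
            env["body"]
            for env in data
            if isinstance(env, dict)
            and env.get("type") == "csp-violation"
            and isinstance(env.get("body"), dict)
        ]
    raise ValueError(f"unsupported Content-Type: {content_type!r}")
```

A real receiver may need to be more lenient, for example by also accepting application/json from older clients.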

Field-by-field, the four fields that drive every triage decision:

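blocked-uri / blockedURL — what the browser refused to load or execute, and the most important triage field. A real URL points at a resource (third-party script, image, font). The literal token inline means an inline script, style, or event handler was blocked. eval means eval() or a similar string-to-code call was blocked, and wasm-eval means WebAssembly compilation was blocked. data means a data: URI. chrome-extension or moz-extension, reported either as the bare scheme or as a full extension URL, means a browser extension. Browsers also reduce the value for privacy: non-HTTP(S) URLs are typically reported as the bare scheme, and some browsers report cross-origin URLs as scheme + host only. Do not expect a full path on cross-origin blocks.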
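violated-directive / effectiveDirective — which directive failed. Legacy reports send violated-directive, sometimes with the source list appended (e.g. script-src 'self'), and often also effective-directive, as in the sample above. Modern reports send effectiveDirective as the resolved directive only (e.g. script-src-elem). Prefer the effective directive when present and normalize to the bare directive name for grouping. That way the same violation lands in one group whichever payload shape it arrives in.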
document-uri / documentURL — where it happened: the page URL that held the violating content. This is your grouping key for “which page is broken.” Strip query strings before grouping to avoid fragmenting one issue across many URLs.
disposition — enforce (the resource was actually blocked) or report (Report-Only; nothing was blocked). Route the two to separate queues: report is policy authoring; enforce is production breakage or a live attack.
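The directive and disposition rules above fit in two small helpers. In this sketch, bare_directive prefers the effective directive and falls back to the legacy violated-directive, dropping any appended source list. route_queue splits report from enforce. The queue names are hypothetical placeholders. A missing disposition defaults to "enforce", matching the receivers later in this guide, so unknown reports land in the more urgent queue.

```python
def bare_directive(body: dict) -> str:
    """Return the bare directive name used as a grouping key (sketch)."""
    raw = (
        body.get("effectiveDirective")      # report-to
        or body.get("effective-directive")  # legacy, when present
        or body.get("violated-directive")   # legacy fallback, may carry sources
        or ""
    )
    parts = raw.split()
    return parts[0] if parts else ""


def route_queue(body: dict) -> str:
    """Pick a triage queue by disposition; queue names are illustrative."""
    disposition = body.get("disposition") or "enforce"
    if disposition == "report":
        return "policy-authoring"   # Report-Only: nothing was blocked
    return "breakage-or-attack"     # enforced: real blocking in production
```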

For where these directives are emitted and the full report-uri vs report-to mechanics, see the CSP violation reporting reference.

Route each report by blocked-uri: extension schemes are noise, inline/eval/data: are your own policy gaps, and unknown hosts warrant an attack investigation.
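That routing rule can be sketched as a classifier over blocked-uri. The sketch rests on a few assumptions. known_hosts is a hypothetical set of hosts you already rely on, and a block of one of those hosts is treated as a missing source in the policy. Anything unrecognized goes to investigation. The classifier only ranks reports: a known host can still serve something malicious.

```python
from urllib.parse import urlsplit

EXTENSION_SCHEMES = {
    "chrome-extension", "moz-extension", "safari-extension", "safari-web-extension",
}
NOISE_VALUES = {"", "about", "null"}
OWN_CODE_TOKENS = {"inline", "eval", "wasm-eval", "data"}


def classify(blocked_uri: str, known_hosts: set[str]) -> str:
    """Bucket a blocked-uri as 'noise', 'policy-gap', or 'investigate' (sketch)."""
    value = blocked_uri.strip().lower()
    # Handles both bare schemes ("chrome-extension") and full URLs.
    scheme = value.split(":", 1)[0]
    if value in NOISE_VALUES or scheme == "about" or scheme in EXTENSION_SCHEMES:
        return "noise"
    if value in OWN_CODE_TOKENS or scheme == "data":
        return "policy-gap"        # your own inline/eval/data: usage
    host = urlsplit(value).hostname
    if host in known_hosts:
        return "policy-gap"        # a host you use but the policy does not allow
    return "investigate"           # unknown host: possible injection
```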

Server-Side Configuration

Node receiver that parses and dedupes

const crypto = require('crypto');

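// Extension sources may arrive as a bare scheme ("chrome-extension") or a full URL.
const NOISE = /^(chrome|moz|safari|safari-web)-extension(:|$)|^about(:|$)|^null$|^$/;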
const seen = new Map(); // key -> { count, first, last }

function normalize(rec) {
  const b = rec.body || rec['csp-report'] || rec; // report-to nests under body
  return {
    documentUri: (b.documentURL || b['document-uri'] || '').split('?')[0],
    directive: (b.effectiveDirective || b['effective-directive'] || b['violated-directive'] || '').split(' ')[0],
    blockedUri: b.blockedURL || b['blocked-uri'] || '',
    disposition: b.disposition || 'enforce',
  };
}

function ingest(payload) {
  const recs = Array.isArray(payload) ? payload : [payload];
  for (const rec of recs) {
    const v = normalize(rec);
    if (NOISE.test(v.blockedUri)) continue;            // discard extension/proxy noise
    const key = crypto
      .createHash('sha1')
      .update(`${v.documentUri}|${v.directive}|${v.blockedUri}`)
      .digest('hex');
    const hit = seen.get(key);
    if (hit) { hit.count++; hit.last = Date.now(); }   // dedupe: bump count, do not re-store
    else seen.set(key, { ...v, count: 1, first: Date.now(), last: Date.now() });
  }
}

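The dedupe key is the SHA-1 of the query-stripped document-uri, the bare directive, and blocked-uri, joined with |. The result is one row per root cause, with a count, instead of one row per browser event. SHA-1 serves only as a compact bucket key here, not as a security control.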

Python receiver

import hashlib
import re

NOISE = re.compile(r"^(chrome|moz|safari|safari-web)-extension(:|$)|^about(:|$)|^null$|^$")  # bare scheme or full URL
seen: dict[str, dict] = {}

def normalize(rec: dict) -> dict:
    b = rec.get("body") or rec.get("csp-report") or rec
    return {
        "document_uri": (b.get("documentURL") or b.get("document-uri") or "").split("?")[0],
        "directive": (b.get("effectiveDirective") or b.get("effective-directive") or b.get("violated-directive") or "").split(" ")[0],
        "blocked_uri": b.get("blockedURL") or b.get("blocked-uri") or "",
        "disposition": b.get("disposition") or "enforce",  # same fallback as the Node receiver
    }

def ingest(payload):
    records = payload if isinstance(payload, list) else [payload]
    for rec in records:
        v = normalize(rec)
        if NOISE.match(v["blocked_uri"]):
            continue
        key = hashlib.sha1(
            f"{v['document_uri']}|{v['directive']}|{v['blocked_uri']}".encode()
        ).hexdigest()
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {**v, "count": 1}
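Neither receiver above shows the HTTP layer that the curl check in the next section talks to. Below is a minimal sketch using the standard library's http.server. It assumes a 64 KiB body cap and answers 204 whether or not the JSON parses. The ingest here is a hypothetical stand-in that only records payloads; swap in the ingest() defined above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_BODY = 64 * 1024  # assumption: cap chosen for illustration only
received: list = []   # hypothetical stand-in store for this sketch


def ingest(payload) -> None:
    """Stand-in for the ingest() defined above; here it only records payloads."""
    received.append(payload)


class CSPReportHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.end_headers()

    def do_POST(self) -> None:
        try:
            length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            length = -1
        if length < 0:
            return self._reply(400)   # unusable Content-Length
        if length > MAX_BODY:
            return self._reply(413)   # refuse oversized bodies without reading them
        raw = self.rfile.read(length)
        try:
            ingest(json.loads(raw))
        except (ValueError, TypeError, AttributeError):
            pass  # malformed JSON or unexpected shape: drop it, still acknowledge
        # Browsers ignore the response body; 204 No Content is enough.
        self._reply(204)

    def log_message(self, format: str, *args) -> None:
        pass  # silence default per-request logging to stderr
```

To serve it, run something like ThreadingHTTPServer(("0.0.0.0", 8080), CSPReportHandler).serve_forever(). The port is an arbitrary example.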

Sample aggregation query

Once normalized rows land in a table with one row per event (csp_violations(document_uri, directive, blocked_uri, disposition, ts)), rank root causes (PostgreSQL syntax):

SELECT document_uri, directive, blocked_uri, disposition,
       COUNT(*) AS hits,
       MIN(ts)  AS first_seen,
       MAX(ts)  AS last_seen
FROM csp_violations
WHERE ts > NOW() - INTERVAL '7 days'
  AND blocked_uri NOT LIKE '%-extension%'    -- exclude residual extension noise (bare scheme or full URL)
  AND blocked_uri NOT IN ('about', 'null', '')   -- keep inline/eval: they are policy gaps, not noise
GROUP BY document_uri, directive, blocked_uri, disposition
ORDER BY hits DESC
LIMIT 50;

This collapses the flood into the top distinct violations, sorted by frequency — the worklist for tightening the policy.

Diagnostic & Verification Steps

Feed a known payload through the parser and confirm the normalized output.

curl -s -X POST -H 'Content-Type: application/reports+json' \
  -d '[{"type":"csp-violation","age":0,"url":"https://example.com/checkout","body":{"documentURL":"https://example.com/checkout?sid=abc","effectiveDirective":"script-src-elem","blockedURL":"https://cdn.thirdparty.example/widget.js","disposition":"report"}}]' \
  https://reports.example.com/csp -o /dev/null -w '%{http_code}\n'

Expected output: 204.

Expected normalized record from the Node receiver after parsing (query string stripped, directive reduced to bare name; the first/last timestamps are omitted here):

{
  "documentUri": "https://example.com/checkout",
  "directive": "script-src-elem",
  "blockedUri": "https://cdn.thirdparty.example/widget.js",
  "disposition": "report",
  "count": 1
}

Posting the same payload again must leave one row with "count": 2, confirming dedupe works rather than producing a second row.
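The same check can run offline, and it can also cover the mixed-schema case. The sketch below replays the modern sample twice and then the same violation in the legacy shape, and expects a single row with a count of 3. Its normalize() and ingest() are condensed copies of the Python receiver that prefer the effective directive. They key on a plain tuple rather than a SHA-1 digest; the grouping is the same. The sample payloads are trimmed to the fields the key needs.

```python
LEGACY = {"csp-report": {
    "document-uri": "https://example.com/checkout?sid=abc",  # query must be stripped
    "violated-directive": "script-src-elem",
    "effective-directive": "script-src-elem",
    "blocked-uri": "https://cdn.thirdparty.example/widget.js",
    "disposition": "report",
}}
MODERN = [{"type": "csp-violation", "body": {
    "documentURL": "https://example.com/checkout",
    "effectiveDirective": "script-src-elem",
    "blockedURL": "https://cdn.thirdparty.example/widget.js",
    "disposition": "report",
}}]


def normalize(rec: dict) -> dict:
    b = rec.get("body") or rec.get("csp-report") or rec
    directive = (b.get("effectiveDirective") or b.get("effective-directive")
                 or b.get("violated-directive") or "")
    return {
        "document_uri": (b.get("documentURL") or b.get("document-uri") or "").split("?")[0],
        "directive": directive.split(" ")[0],
        "blocked_uri": b.get("blockedURL") or b.get("blocked-uri") or "",
        "disposition": b.get("disposition") or "enforce",
    }


def ingest(payload, seen: dict) -> None:
    for rec in payload if isinstance(payload, list) else [payload]:
        v = normalize(rec)
        key = (v["document_uri"], v["directive"], v["blocked_uri"])
        if key in seen:
            seen[key]["count"] += 1
        else:
            seen[key] = {**v, "count": 1}


def self_check() -> dict:
    seen: dict = {}
    ingest(MODERN, seen)
    ingest(MODERN, seen)   # re-post: must bump the count, not add a row
    ingest(LEGACY, seen)   # same violation, legacy shape: must hit the same row
    assert len(seen) == 1, seen
    (row,) = seen.values()
    assert row["count"] == 3, row
    assert row["document_uri"] == "https://example.com/checkout"
    return row
```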

Edge Cases, Security Implications & Safe Rollback

Browser-extension noise — chrome-extension:, moz-extension:, safari-extension:, and bare inline/eval from injected extension code dominate public-site volume. Drop or count-only these. Never widen the policy to silence them: allowing extension schemes or 'unsafe-inline' would also admit the genuine injections that produce identical-looking reports.
data: and inline blocked-uri — these are your own code, not third parties: an inline <script>/<style> with no nonce, or a data: image/font. They are policy gaps, not attacks. Fix by adopting per-request nonces or hashes, not by adding 'unsafe-inline'.
False positives from truncated values — some browsers report cross-origin blocks as scheme + host only (https://cdn.thirdparty.example with no path), and non-HTTP(S) sources as the bare scheme. Group on what you have; do not treat the missing path as suspicious.
PII in reports — document-uri, referrer, and occasionally blocked-uri can carry session tokens, password-reset tokens, or email addresses in query strings. Strip query strings on ingest. The parsers above strip document-uri and never store referrer, but they store blocked-uri as received, so strip or redact it too if you keep full URLs. Redact known token parameters before storage, and apply the same retention limits you apply to access logs. Treat the report store as containing user data.
Mixed schemas in one stream — during a report-uri/report-to transition you receive both shapes simultaneously; the normalize() step that checks both field names is mandatory, not optional.
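For fields where you keep the query string, known token parameters can be redacted before storage. The sketch below assumes a hypothetical SENSITIVE_PARAMS list that you would replace with your application's parameter names. It leaves non-URL tokens such as inline untouched and drops any fragment defensively.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; extend with the ones your application uses.
SENSITIVE_PARAMS = {"sid", "session", "token", "reset_token", "code", "email"}


def redact_url(url: str, strip_query: bool = False) -> str:
    """Strip or redact sensitive query parameters in a report URL (sketch)."""
    parts = urlsplit(url)
    if strip_query:
        query = ""
    else:
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        query = urlencode(
            [(k, "REDACTED" if k.lower() in SENSITIVE_PARAMS else v) for k, v in pairs]
        )
    # Fragments should not appear in reports, but drop one defensively.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Use strip_query=True for grouping keys such as document-uri, and redaction for fields where the non-sensitive parameters are still useful for debugging.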

Safe rollback (non-destructive): parsing and storage sit entirely downstream of the browser's policy decision. Disabling the parser or the storage write therefore only stops telemetry; it cannot affect page rendering or the enforced policy. To halt ingestion under a flood, keep returning 204 but drop the body before ingest(), and re-enable ingestion once rate-limiting is in place.
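One way to add that rate limit is a single token bucket checked before ingest(). The sketch below assumes one global limit. Per-client limits would need one bucket per key. The rate and capacity values in any deployment are yours to choose. It takes a lock because a threaded HTTP server may call it concurrently.

```python
import threading
import time


class TokenBucket:
    """Admit bursts of up to `capacity` reports, refilling at `rate` per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()  # handlers may run on several threads

    def allow(self) -> bool:
        with self.lock:
            now = self.clock()
            # Refill for the time elapsed since the last call, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```

In the receiver, call allow() before ingest(). When it returns False, still answer 204 and discard the body, in line with the rollback advice above.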

Frequently Asked Questions

How do I dedupe reports that describe the same root cause? Hash document-uri (query string stripped) + the bare violated-directive + blocked-uri, and keep a count per hash instead of one row per event. One broken third-party script then becomes a single high-count row rather than thousands of duplicates.

Why does blocked-uri only show a hostname with no path? Some browsers truncate cross-origin blocked URLs to scheme + host (or just the scheme) to avoid leaking the target path. This is expected; group on the truncated value and do not treat the missing path as anomalous.

Are inline and eval violations attacks? Usually not — they are your own inline scripts/styles or eval() usage that the policy correctly refuses. They are policy gaps to fix with nonces or hashes, not with 'unsafe-inline'. A surge of them from a single page after a deploy is far more likely a regression than an attack.

Can violation reports contain personal data? Yes. URLs in document-uri, referrer, and blocked-uri may carry session or reset tokens and other identifiers in query strings. Strip queries on ingest, redact known token params, and apply log-grade retention and access controls to the report store.

Conclusion

Roll the parser out the same way you roll out the policy: receive into a staging store first, confirm both payload shapes normalize to one record and that re-posts increment a count rather than duplicate, then enable noise filtering and dedupe, and only then point production reporting at it. Promote to enforcement-disposition alerting last, once the Report-Only stream is quiet and every recurring blocked-uri is explained.

CSP violation reporting and monitoring — the parent reference for the reporting pipeline and receivers.
Migrating CSP from report-uri to report-to — the directive migration that produces the two payload shapes.
Generating CSP nonces per request — the fix for inline-script policy gaps surfaced by triage.