Untrusted XML parsing hardening (Node.js + general)

When you parse XML you didn't generate, treat it as an active input format rather than inert data: it can be used for denial of service (DoS), data exfiltration, and, in some ecosystems, even remote code execution (RCE).

This page is a quick checklist of durable defenses against the problems that repeatedly show up in real-world advisories.

1) Prefer “safe mode”: disable entity expansion and DTDs

If your parser supports it, default to:

  • Disable DTD parsing (blocks most XXE-style issues)
  • Disable external entity resolution (prevents SSRF / local file reads)
  • Disable entity expansion (reduces “billion laughs” / amplification DoS risk)

If you must support entities:

  • Set strict limits (depth, total expansions, total output size)
  • Reject documents with a DTD unless explicitly required
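
As a coarse first gate in front of whatever parser options you set, the "reject documents with a DTD" rule could be sketched as follows. This is a minimal sketch, assuming the input has already been decoded to a JavaScript string; assertNoDtd and the error code are hypothetical names, and a string scan like this complements parser-level settings rather than replacing them.

```javascript
'use strict';

// Hypothetical helper: reject any document that declares a DTD or entities.
// Assumes `xml` is an already-decoded string, not raw bytes in an unknown encoding.
function assertNoDtd(xml) {
  // Case-insensitive on purpose: stricter than XML requires, so lenient
  // parsers downstream never see a declaration this check let through.
  if (/<!(DOCTYPE|ENTITY)/i.test(xml)) {
    const err = new Error('XML with DTD or entity declarations is not accepted');
    err.code = 'XML_DTD_REJECTED'; // illustrative code for logs and metrics
    throw err;
  }
  return xml;
}
```

The check errs on the side of rejection: a document that merely mentions <!DOCTYPE inside a comment or CDATA section is refused too, which is usually acceptable for machine-to-machine XML.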

2) Always treat parsing as a failure-prone operation

A parse error should be a normal outcome for untrusted inputs.

  • Wrap parsing in try/catch.
  • Return a generic error (e.g., 400) without leaking internals.
  • Ensure you don’t crash the whole process (especially important in Node.js).

Why this matters (example)

A class of vulnerabilities is simply: “uncaught exception = remote DoS.”

For example, some XML parsers convert numeric entities (such as &#x41;) into code points. If the regex that matches the entity accepts a value above 0x10FFFF and that value is passed to String.fromCodePoint(), the call throws a RangeError. If nothing catches it, the exception can terminate the Node process and take your service down.

Mitigation: catch exceptions at the boundary (request handler / ingestion pipeline) even if you trust the library.
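
The failure mode and the boundary described above can be sketched as follows. This is a minimal illustration, not any particular library's code: decodeNumericEntity is a hypothetical stand-in for the vulnerable pattern, and safeParse is a hypothetical wrapper around whatever parse function you use.

```javascript
'use strict';

// Hypothetical stand-in for the vulnerable pattern: the regex accepts any
// run of hex digits, so values above 0x10FFFF reach String.fromCodePoint().
function decodeNumericEntity(entity) {
  const match = /^&#x([0-9a-fA-F]+);$/.exec(entity);
  if (!match) return entity;
  return String.fromCodePoint(parseInt(match[1], 16)); // RangeError if out of range
}

// Hypothetical boundary wrapper: turns any throw into a plain result object
// so the caller can answer with a generic 400 instead of crashing.
function safeParse(parseFn, xml) {
  try {
    return { ok: true, value: parseFn(xml) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

const good = safeParse(decodeNumericEntity, '&#x41;');
const bad = safeParse(decodeNumericEntity, '&#x110000;');
console.log(good.ok, good.value); // true A
console.log(bad.ok, bad.error instanceof RangeError); // false true
```

A synchronous try/catch only covers synchronous parsers. For promise-based or streaming parsers, the equivalent boundary is a .catch() handler or an 'error' event listener.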

3) Enforce input limits before parsing

Before the parser runs:

  • Cap request body size (check Content-Length, and also enforce a byte limit while streaming, since the header can be missing or wrong)
  • Cap decompressed size (zip/gzip bombs)
  • Apply timeouts and per-request CPU budgets (where possible)
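
The size caps above might look like this in Node. It is a sketch under stated assumptions: the limit values and the helper names createLimitedCollector and gunzipWithLimit are hypothetical, and chunks are assumed to be Buffers. The maxOutputLength option is part of Node's zlib convenience methods.

```javascript
'use strict';
const zlib = require('node:zlib');

// Hypothetical limits; tune them to your workload.
const MAX_BODY_BYTES = 1024 * 1024; // 1 MiB on the wire
const MAX_DECOMPRESSED_BYTES = 8 * 1024 * 1024; // 8 MiB after gunzip

// Hypothetical helper: accumulates streamed Buffer chunks and throws as soon
// as the running total would exceed the cap.
function createLimitedCollector(maxBytes = MAX_BODY_BYTES) {
  const chunks = [];
  let total = 0;
  return {
    push(chunk) {
      if (total + chunk.length > maxBytes) {
        throw new RangeError(`body exceeds ${maxBytes} bytes`);
      }
      total += chunk.length;
      chunks.push(chunk);
    },
    finish() {
      return Buffer.concat(chunks, total);
    },
  };
}

// Decompress with a hard output cap; zlib throws if the output would exceed it.
function gunzipWithLimit(compressed, maxBytes = MAX_DECOMPRESSED_BYTES) {
  return zlib.gunzipSync(compressed, { maxOutputLength: maxBytes });
}
```

In an HTTP handler, push would typically be called from the request's 'data' event inside a try/catch, with the request destroyed on the first failure so the rest of the body is never read.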

If you accept XML via file upload, also:

  • Verify file type and encoding
  • Normalize line endings and reject invalid encodings early
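
For uploads, the encoding check and line-ending normalization could be sketched like this. It assumes the service accepts only UTF-8 and rejects everything else rather than guessing; decodeXmlUpload is a hypothetical helper name.

```javascript
'use strict';

// Hypothetical helper: decode uploaded bytes as strict UTF-8 and normalize
// line endings. Other encodings are rejected, not auto-detected.
function decodeXmlUpload(bytes) {
  // fatal: true makes decode() throw a TypeError on malformed sequences
  // instead of silently substituting U+FFFD.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  const text = decoder.decode(bytes);
  // Convert CRLF and lone CR to LF.
  return text.replace(/\r\n?/g, '\n');
}
```

Decoding once at the boundary also means later string-based checks run on the same text the parser will see.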

4) Do not parse XML in privileged contexts

  • Run parsers in a least-privileged runtime/container.
  • Keep file system access minimal.
  • Do not expose ambient cloud credentials to the parsing process.

If possible, parse in an isolated worker (separate process) so a crash doesn’t take down the API.
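
One way to sketch the separate-process idea with only Node built-ins is shown below. The child script's parse step is a placeholder that simulates a crash on an out-of-range entity; parseInChild, the timeout, and the buffer cap are hypothetical choices. spawnSync blocks the event loop, so a real service would more likely use the asynchronous spawn or a worker pool; the synchronous form just keeps the sketch short.

```javascript
'use strict';
const { spawnSync } = require('node:child_process');

// Script run in the child. The parse step is a placeholder: swap in your real
// parser call. Here an out-of-range entity simulates an uncaught crash.
const CHILD_SCRIPT = `
  let xml = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { xml += chunk; });
  process.stdin.on('end', () => {
    if (xml.includes('&#x110000;')) String.fromCodePoint(0x110000);
    process.stdout.write(JSON.stringify({ chars: xml.length }));
  });
`;

// Hypothetical helper: parse in a short-lived child with time and output caps.
function parseInChild(xml) {
  const result = spawnSync(process.execPath, ['-e', CHILD_SCRIPT], {
    input: xml,
    encoding: 'utf8',
    timeout: 2000, // kill the child if parsing takes too long
    maxBuffer: 1024 * 1024, // cap on child stdout/stderr
  });
  if (result.error || result.status !== 0) {
    // Crash, timeout, or oversized output: the parent keeps running.
    return { ok: false };
  }
  return { ok: true, value: JSON.parse(result.stdout) };
}
```

spawnSync also accepts an env option, which could be used to keep credentials in the parent's environment out of the child.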

5) Log for triage, but don’t leak

  • Log: parser errors, size limits hit, DTD/entity usage detected.
  • Don’t return: stack traces, internal parser error strings, file paths.
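
The log/return split above could be sketched as a single helper. toClientError, the event name, and the response shape are hypothetical; the point is only that detail goes to the log while the client sees a generic body plus an identifier for correlation.

```javascript
'use strict';
const { randomUUID } = require('node:crypto');

// Hypothetical helper: log full detail internally, return only a generic body.
function toClientError(err, log = console.error) {
  const incidentId = randomUUID(); // correlates the response with the log line
  log(JSON.stringify({
    incidentId,
    event: 'xml_parse_rejected',
    name: err.name,
    message: err.message,
    stack: err.stack,
  }));
  return {
    status: 400,
    body: { error: 'Invalid XML', incidentId },
  };
}
```

The caller sends status and body as-is; nothing from err reaches the response.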

6) Dependency hygiene for parsers

XML parsers tend to be “deep dependency” components.

  • Track them explicitly in SBOM / dependency inventory.
  • Patch fast on parser advisories.
  • Consider pinning versions and using Renovate/Dependabot with auto-merge for security PRs, gated on passing tests.

Example advisory

  • fast-xml-parser RangeError DoS when parsing out-of-range numeric entities (CVE-2026-25128 / GHSA-37qj-frw5-hhjh)
  • Fix: upgrade to fast-xml-parser v5.3.4 (or later)
  • Defense-in-depth: even with patched versions, keep a try/catch boundary so malformed XML can’t crash the whole Node process.
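
A startup or CI guard for a minimum patched version could be sketched as below. isAtLeast is a hypothetical helper that compares plain major.minor.patch strings only; how the installed version string is obtained (lockfile, package metadata) is left out here.

```javascript
'use strict';

// Hypothetical helper: compare plain "major.minor.patch" versions.
// Pre-release and build suffixes (e.g. "-beta.1") are ignored in this sketch.
function isAtLeast(version, minimum) {
  const parse = (v) => v.split('-')[0].split('.').map((n) => parseInt(n, 10));
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i += 1) {
    const left = a[i] || 0;
    const right = b[i] || 0;
    if (left !== right) return left > right;
  }
  return true; // equal versions satisfy the minimum
}

console.log(isAtLeast('5.3.4', '5.3.4')); // true
console.log(isAtLeast('5.3.3', '5.3.4')); // false
console.log(isAtLeast('5.10.0', '5.3.4')); // true
```

Comparing components numerically matters: a plain string comparison would rank 5.10.0 below 5.3.4.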

Quick checklist

  • DTD disabled (or strictly gated)
  • External entities disabled
  • Entity expansion disabled or strictly limited
  • Body size limits
  • Decompression limits
  • Parse wrapped in try/catch (no process crash)
  • Parsing isolated / least privilege
  • Observability: log blocked requests and anomalies
  • Rapid patch process for parser advisories