
Python pickle: never deserialize untrusted data

Summary

pickle is not a safe data format. Deserializing attacker-controlled pickle payloads can lead to:

remote code execution (RCE), the most common outcome
unexpected side effects, even when “RCE keywords” are filtered
arbitrary file creation or filesystem tampering via some gadget chains

Blocklists (“deny os.system”, “deny subprocess”) are routinely bypassed.
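
To make the risk concrete, here is a minimal and deliberately harmless sketch of why loading a pickle amounts to running code. The class name `Innocent` is hypothetical; `__reduce__` is the standard hook that tells pickle which callable to invoke on load. An attacker would substitute a callable of their own choosing for `print`.

```python
import pickle


class Innocent:
    """Hypothetical class: looks like data, but controls its own reconstruction."""

    def __reduce__(self):
        # pickle stores a (callable, args) pair; the loader calls callable(*args).
        # A harmless callable is used here for illustration only.
        return (print, ("side effect: this ran inside pickle.loads()",))


payload = pickle.dumps(Innocent())

# Loading the bytes invokes print(...); no Innocent instance is ever created.
result = pickle.loads(payload)

# The loader returns whatever the callable returned (None for print).
assert result is None
```

Note that the process doing the loading does not need the `Innocent` class at all: the payload only names the callable and its arguments.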

Durable guidance

1) Policy: treat pickle as code

Do not accept pickle payloads from:
users
network clients
queues or topics you do not fully control
plugins / extensions
CI artifacts from untrusted repos

If you cannot demonstrate that the source is trusted, treat it as untrusted.

2) Prefer safe serialization formats

Use formats designed for untrusted data:

JSON (with schema validation)
MessagePack (with explicit type handling)
Protocol Buffers / Avro / Thrift (strong typing)

Store only primitive data types; avoid “object graphs”.
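
As one possible illustration of "JSON with schema validation", the sketch below uses only the standard library and a hand-written check. The record shape (`name`, `age`) and the function name `load_user_record` are assumptions made up for the example; a real service would more likely use a dedicated schema library.

```python
import json

# Hypothetical schema for the example: field name -> required Python type.
EXPECTED_FIELDS = {"name": str, "age": int}


def load_user_record(raw: bytes) -> dict:
    """Parse untrusted JSON and accept it only if it matches EXPECTED_FIELDS."""
    data = json.loads(raw)  # yields only dict/list/str/int/float/bool/None
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError("unexpected or missing fields")
    for field, expected_type in EXPECTED_FIELDS.items():
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} has the wrong type")
    return data


record = load_user_record(b'{"name": "Ada", "age": 36}')
```

Unlike `pickle.loads`, the parser here can only ever produce primitive values and containers of them; nothing in the input can name a callable.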

3) If you must load pickle (rare), sandbox hard

If you are forced to load pickle for legacy reasons:

require integrity and authenticity:
  signatures (e.g., Ed25519) or an HMAC, with key rotation
  an explicit key management policy
load in a highly constrained environment:
  a separate service or container
  a read-only filesystem
  no network egress
  seccomp/AppArmor and low privileges
implement defense-in-depth:
  an allowlist of types (custom restricted unpickler)
  audit logs and anomaly detection

Even then, assume bypasses exist.
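
Two of the controls above, authenticity checks and a type allowlist, can be sketched together as follows. This is a minimal illustration under stated assumptions, not a hardened implementation: the names `ALLOWED_GLOBALS`, `AllowlistUnpickler`, `sign`, and `verify_and_load` are hypothetical, the tag-then-payload framing is invented for the example, and key storage and rotation are left out entirely. Overriding `Unpickler.find_class` is the mechanism the standard library documents for restricting globals.

```python
import hashlib
import hmac
import io
import pickle

# Hypothetical allowlist: the only (module, name) pairs a payload may reference.
ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


class AllowlistUnpickler(pickle.Unpickler):
    """Refuse every global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def sign(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag (assumed framing)."""
    return hmac.new(key, payload, hashlib.sha256).digest() + payload


def verify_and_load(key: bytes, blob: bytes):
    """Check the tag first; only then hand the payload to the restricted loader."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("signature check failed; payload not loaded")
    return AllowlistUnpickler(io.BytesIO(payload)).load()


key = b"example-key-do-not-use-in-production"
blob = sign(key, pickle.dumps({"ids": [1, 2, 3]}))
restored = verify_and_load(key, blob)
```

The order matters: the signature is verified before any byte reaches the unpickler, and the allowlist still applies to signed payloads in case a signing key leaks. Neither control replaces the isolation measures listed above.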

4) Don’t “filter” pickle payloads

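String scanning is not a security control: the same effect can be reached through many different importable callables, so a keyword list is never complete.
Gadget chains can trigger side effects without containing any obvious keywords.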
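
The weakness of keyword filtering can be shown with a small, harmless sketch. The blocklist contents, the helper `naive_filter_allows`, and the class `Sneaky` are all hypothetical; the payload evaluates a trivial arithmetic expression rather than doing anything damaging.

```python
import pickle

# Hypothetical blocklist of the kind the text warns against.
BLOCKLIST = (b"os.system", b"subprocess", b"popen")


def naive_filter_allows(payload: bytes) -> bool:
    """Return True when none of the blocked byte strings occur in the payload."""
    return not any(word in payload for word in BLOCKLIST)


class Sneaky:
    """Hypothetical payload that avoids every blocked keyword."""

    def __reduce__(self):
        # eval is not on the blocklist; the expression is harmless on purpose.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

passed_filter = naive_filter_allows(payload)  # True: nothing matched
result = pickle.loads(payload)                # the expression is evaluated: 42
```

Adding `eval` to the list only moves the problem to the next callable, which is why an allowlist enforced inside the unpickler, combined with isolation, is the more defensible direction.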