PQC Threat Models: 'Harvest Now, Decrypt Later' in Real Systems

Monthly research note. Theme: Post-Quantum Cryptography & Migration.

TL;DR

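"Harvest now, decrypt later" (HNDL) means an adversary records encrypted traffic today and decrypts it once a cryptographically relevant quantum computer exists. This memo defines that threat model, states the properties a migration must preserve, and sketches a design that keeps those properties true under failure and active attack.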

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Migration is mixed-version for years: compatibility and rollback are security features.
Hybrid composition must be explicit and transcript-bound to resist downgrade.
PQC changes handshake costs; plan DoS defenses and budgets.
Prefer protocols and APIs that make invalid states hard to express.
Define safety properties before performance goals.

Why this matters

PQC changes bandwidth and CPU costs; DoS surfaces move.
Hybrid designs fail if binding is ambiguous (mix-and-match, downgrade).
Interop is the real risk: multiple stacks, vendors, and versions.
Operationalization (monitoring, rollback) determines success more than crypto choice.

Key questions

Which parts must be constant-time, and how will you validate that?
How do you bind hybrid secrets to prevent downgrade and mix-and-match attacks?
What does interoperability testing look like across vendors and stacks?
What telemetry proves PQC is working (not just enabled)?
How do you handle failures: decryption failures, invalid ciphertexts, malformed keys?
Which secrets require long-term confidentiality (HNDL) and where are they today?

Assumptions

Vendors vary: implementations and defaults differ.
Side channels exist: timing and cache behavior leak information.
Bandwidth is limited in some environments; larger handshakes matter.
Deployments are mixed; old clients must interoperate or fail safely.

Non-goals

Treating migration as a single flag flip.
Assuming PQC is “drop-in” without changing operational processes.

Attack surface

Any unbounded work per request becomes a DoS primitive under adversaries.
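One way to bound per-request work is to reject wrong-sized inputs before any cryptographic operation runs. The sketch below assumes a hybrid of X25519 and ML-KEM-768 (this memo does not name specific algorithms); `check_client_share` and `MalformedShare` are hypothetical names used only for illustration.

```python
# Sizes in bytes. Assumption: X25519 + ML-KEM-768; the memo names no algorithms.
X25519_SHARE_LEN = 32
MLKEM768_PK_LEN = 1184


class MalformedShare(ValueError):
    """Raised before any expensive work is attempted."""


def check_client_share(classical_share: bytes, pqc_pk: bytes) -> None:
    # O(1) length checks: cheap enough to run on every request, even under attack.
    if len(classical_share) != X25519_SHARE_LEN:
        raise MalformedShare("classical key share has wrong length")
    if len(pqc_pk) != MLKEM768_PK_LEN:
        raise MalformedShare("PQC public key has wrong length")
```

Length checks do not replace the validation a KEM implementation performs internally; they only ensure that obviously malformed requests are dropped at constant cost.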

Model & invariants

A key encapsulation mechanism (KEM) establishes a shared secret in three steps; post-quantum KEMs do so without relying on discrete-log or factoring assumptions:

(\mathrm{pk},\mathrm{sk})\leftarrow \mathrm{KeyGen}();\ (\mathrm{ct},\mathrm{ss})\leftarrow \mathrm{Enc}(\mathrm{pk});\ \mathrm{ss}\leftarrow \mathrm{Dec}(\mathrm{sk},\mathrm{ct}).

Treat algorithm negotiation as adversarial input: an active attacker can edit the offered algorithm lists, so downgrade resistance must be explicit.
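A minimal sketch of what an explicit negotiation policy could look like. The suite names, `PREFERENCE`, `ALLOWED`, and `select_suite` are all hypothetical, and the sketch assumes a policy that fails closed when only a classical suite is available.

```python
# Hypothetical suite names, strongest first.
PREFERENCE = ["x25519+mlkem768", "x25519"]
# Assumption: local policy requires a hybrid suite and never accepts classical-only.
ALLOWED = {"x25519+mlkem768"}


class NegotiationError(Exception):
    pass


def select_suite(client_offer: list[str], server_supported: list[str]) -> str:
    common = set(client_offer) & set(server_supported)
    for suite in PREFERENCE:
        if suite in common:
            if suite not in ALLOWED:
                # Fail closed instead of silently accepting a weaker suite.
                raise NegotiationError(f"only disallowed suite available: {suite}")
            return suite
    raise NegotiationError("no common suite")
```

Selection policy alone does not stop an attacker from editing the offer in flight; the offered lists also need to be covered by the handshake transcript so that tampering changes the derived keys.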

Make costs explicit: measure CPU and bandwidth, then add protections.

Invariant

Monotonicity beats timestamps: counters and epochs survive clock skew.
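As an illustration of this invariant, a guard that accepts a key or configuration epoch only when it is strictly newer than the last accepted one might look like the sketch below. `EpochGuard` and `StaleEpoch` are hypothetical names; a real deployment would also need to persist `last_epoch` across restarts.

```python
class StaleEpoch(Exception):
    pass


class EpochGuard:
    """Accepts an epoch only if it is strictly greater than the last accepted one."""

    def __init__(self, last_epoch: int = 0) -> None:
        self.last_epoch = last_epoch

    def accept(self, epoch: int) -> None:
        if epoch <= self.last_epoch:
            # Replayed or rolled-back state: reject regardless of wall-clock time.
            raise StaleEpoch(f"epoch {epoch} <= last accepted {self.last_epoch}")
        self.last_epoch = epoch
```

Because the comparison uses integers the system controls, it gives the same answer on hosts whose clocks disagree.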

Security properties

Downgrade resistance: negotiation can’t silently weaken security posture.
Evidence: critical actions emit verifiable audit events.
Least authority: privileges are scoped by purpose and time.
Integrity: invalid transitions are rejected (and detectable).

Failure modes

Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Timeout ambiguity causing double-apply or partial state transitions.
Observability gaps during incidents (missing evidence).
Recovery paths that only work when nothing is broken.

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

sequenceDiagram
  participant A as Initiator
  participant B as Responder
  A->>B: classical_keyshare + pqc_pk
  B-->>A: classical_keyshare + pqc_ct + sig
  A-->>B: sig
  Note over A,B: ss = HKDF(ss_classical || ss_pqc, transcript)

Implementation notes

Explicit binding prevents downgrade and mix-and-match. Don’t leave it implicit.
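The derivation in the design sketch, ss = HKDF(ss_classical || ss_pqc, transcript), can be illustrated with HKDF-SHA256 written directly from RFC 5869. This is a sketch under assumptions, not a vetted combiner: the length prefixes, the label string, and the name `hybrid_secret` are additions for illustration, and `transcript` is assumed to contain every handshake message, including the offered algorithm lists.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 section 2.2; an absent salt defaults to HashLen zero bytes.
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 section 2.3.
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def _lp(field: bytes) -> bytes:
    # Length prefix (an assumption of this sketch) so the two secrets
    # cannot be re-split at a different boundary.
    return len(field).to_bytes(4, "big") + field


def hybrid_secret(ss_classical: bytes, ss_pqc: bytes,
                  transcript: bytes, length: int = 32) -> bytes:
    ikm = _lp(ss_classical) + _lp(ss_pqc)
    # Hypothetical label; the transcript hash binds the key to what was negotiated.
    info = b"hybrid-kex-sketch v1" + HASH(transcript).digest()
    return hkdf_expand(hkdf_extract(b"", ikm), info, length)
```

If an attacker alters any negotiated field, the two sides hash different transcripts and derive different keys, so the handshake fails instead of completing in a weaker mode.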

Rule of thumb

If you can’t explain a timeout outcome, you can’t make retries safe.

Hybrid handshake checklist:
- Explicit negotiation (no silent downgrade)
- Transcript-bound KDF
- DoS protections (rate limits, cookies, puzzles)
- Constant-time operations
- Telemetry: which mode, which failures, which clients
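For the telemetry item, a minimal in-memory sketch of counting handshakes by client cohort, negotiated mode, and outcome. `HandshakeTelemetry` and the label values ("hybrid", "classical", "ok") are hypothetical; a real system would export these counters to its metrics pipeline.

```python
from collections import Counter


class HandshakeTelemetry:
    def __init__(self) -> None:
        # Key: (cohort, mode, outcome) -> number of handshakes observed.
        self.counts: Counter = Counter()

    def record(self, cohort: str, mode: str, outcome: str) -> None:
        self.counts[(cohort, mode, outcome)] += 1

    def hybrid_success_ratio(self, cohort: str, hybrid_mode: str = "hybrid") -> float:
        # Share of a cohort's handshakes that completed in hybrid mode.
        total = sum(n for (c, _, _), n in self.counts.items() if c == cohort)
        ok = self.counts[(cohort, hybrid_mode, "ok")]
        return ok / total if total else 0.0
```

A ratio like this separates "PQC is enabled" from "PQC is actually being negotiated": a cohort with the feature switched on but a ratio near zero is falling back somewhere.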

Verification strategy

Downgrade tests: active attacker manipulates negotiation.
Interop matrices across vendors/versions and failure modes.
Side-channel tests where tooling exists; constant-time audits.
Chaos deploys: mixed versions + rollback during partial outages.
DoS tests: measure CPU/bandwidth amplification and mitigation impact.

Operational notes

Roll out with canaries and explicit rollback triggers.
Add telemetry for negotiation outcomes, failures, and client cohorts.
Inventory long-lived secrets and migrate the highest-risk first.
Cap handshake cost per peer/IP; use stateless cookies when needed.
Document supported algorithm sets and deprecation timelines.
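The stateless-cookie idea mentioned above can be sketched with an HMAC over the peer address and a coarse time bucket, so the responder stores nothing per peer until the cookie is echoed back. The 30-second window and the function names are hypothetical choices for this sketch.

```python
import hashlib
import hmac

COOKIE_WINDOW_S = 30  # hypothetical validity window, in seconds


def make_cookie(server_key: bytes, peer_addr: str, now: float) -> bytes:
    # Coarse time bucket; clamped at zero so the encoding below cannot fail.
    bucket = max(0, int(now // COOKIE_WINDOW_S))
    msg = peer_addr.encode() + b"|" + bucket.to_bytes(8, "big")
    return hmac.new(server_key, msg, hashlib.sha256).digest()


def check_cookie(server_key: bytes, peer_addr: str, cookie: bytes, now: float) -> bool:
    # Accept the current and previous bucket so a cookie survives a bucket boundary.
    for t in (now, now - COOKIE_WINDOW_S):
        if hmac.compare_digest(cookie, make_cookie(server_key, peer_addr, t)):
            return True
    return False
```

The responder would run the expensive KEM and signature operations only after `check_cookie` succeeds, which forces an attacker to receive traffic at the claimed address before consuming CPU.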

Operational note

Make degraded modes explicit: fail closed vs fail open is a policy choice.

What to monitor

Admission-control / rate-limit rejections (by reason).
Retry/timeout rates by endpoint and client cohort.
Invariant violation rate (should be ~0).
Error budget burn + tail latency under load.
Authz failures and policy denials (unexpected spikes).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Use canaries and staged rollout; stop early when signals degrade.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Keep dual-write / dual-verify windows where appropriate.
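An explicit rollback trigger can be as simple as a pure function over a few canary metrics. The sketch below is one possible shape; the threshold values, field names, and the choice to ignore windows with too few samples are all assumptions to be replaced by local policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; tune against a pre-migration baseline.
    max_handshake_failure_rate: float = 0.02
    max_p99_handshake_ms: float = 250.0
    min_sample_size: int = 500


def should_roll_back(handshakes: int, failures: int, p99_ms: float,
                     t: Thresholds = Thresholds()) -> bool:
    if handshakes < t.min_sample_size:
        # Assumption: too little data to decide, so do not trigger.
        return False
    failure_rate = failures / handshakes
    return (failure_rate > t.max_handshake_failure_rate
            or p99_ms > t.max_p99_handshake_ms)
```

Keeping the decision in a side-effect-free function makes it easy to replay against recorded incident data when rehearsing the rollback plan.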

Evidence

RFC 5869: HKDF (1) — Useful when discussing hybrid binding and context separation.
- Evidence: HKDF is the workhorse for domain separation; bind purpose/context to avoid cross-protocol key reuse.
NIST Post-Quantum Cryptography Project (2) — Standardization process and algorithm selections.
- Evidence: Treat PQ migration as a program (inventory, interop, rollback). Use NIST status to drive prioritization and timelines.

Open questions

How do you rotate algorithms without introducing configuration chaos?
Which clients will fail first, and what is the safe fallback behavior?
What is the worst-case handshake cost under attack?
Where would a downgrade be visible today, and how would you detect it?

Checklist

Safety properties stated as invariants.
Failure modes enumerated with mitigations.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Assumptions listed and reviewed.
Rollback plan rehearsed and automated.
Telemetry captures correctness signals.
