Quantum Threat Modeling for Infrastructure: What Changes, What Doesn’t

Monthly research note. Theme: Quantum-Resilient Systems Engineering.

TL;DR

This memo covers quantum threat modeling for infrastructure in three steps: define the model, state the properties, then design the system so those properties remain true under failure and adversaries.

Key insight

Treat “timeouts” as a third outcome: not success, not failure—ambiguity you must model.
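One way to make the third outcome concrete is to give it its own value in the type that callers branch on. The sketch below is illustrative only: the names `Outcome`, `classify`, `next_step`, and the returned policy labels are hypothetical and not taken from any particular library.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "unknown"  # timed out: the remote side may or may not have applied it


def classify(responded: bool, ok: bool) -> Outcome:
    """Map a call result to three outcomes instead of two."""
    if not responded:
        return Outcome.UNKNOWN
    return Outcome.SUCCESS if ok else Outcome.FAILURE


def next_step(outcome: Outcome) -> str:
    """Decide what the caller may do next (hypothetical policy labels)."""
    if outcome is Outcome.SUCCESS:
        return "done"
    if outcome is Outcome.FAILURE:
        return "report_error"
    # UNKNOWN: never assume failure; re-query state or retry with the same request ID.
    return "reconcile_then_retry_same_key"
```

The point of the separate `UNKNOWN` branch is that a timeout never silently collapses into "failed, so apply again".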

Key takeaways

Inventory long-lived secrets first; you can’t migrate what you can’t locate.
Hybrid is an operational mode (deploy, monitor, roll back), not a paper design.
Downgrade resistance must be explicit and tested under active attackers.
Define safety properties before performance goals.
Treat retries, reordering, and partial failure as default conditions.

Why this matters

Long-lived devices and PKI lifecycles are the hard constraint.
Hybrid protocols fail if binding is unclear or downgrade is possible.
Quantum risk is uneven: some secrets must last decades, others do not.
Cost changes drive new DoS surfaces; defenses must evolve.
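The uneven-risk point is often expressed with Mosca's inequality, which this note does not state explicitly: a secret is exposed if its required confidentiality lifetime x plus the migration time y exceeds the time z until a cryptographically relevant quantum computer exists. A minimal triage sketch under that assumption follows; the `Secret` type, the inventory entries, and every number are placeholders, and z is an estimate you must supply yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Secret:
    name: str
    shelf_life_years: float   # x: how long the data must stay confidential
    migration_years: float    # y: how long it takes to move this system to PQ


def at_risk(secret: Secret, years_to_crqc: float) -> bool:
    """Mosca's inequality: exposed if x + y > z."""
    return secret.shelf_life_years + secret.migration_years > years_to_crqc


def triage(secrets, years_to_crqc):
    """Return at-risk secret names, largest x + y first."""
    risky = [s for s in secrets if at_risk(s, years_to_crqc)]
    risky.sort(key=lambda s: s.shelf_life_years + s.migration_years, reverse=True)
    return [s.name for s in risky]


# Illustrative inventory; every number here is a placeholder.
INVENTORY = [
    Secret("device-root-key", 25, 5),
    Secret("session-ticket-key", 0.01, 1),
    Secret("archive-encryption-key", 15, 3),
]
```

With a placeholder estimate of z = 12 years, `triage(INVENTORY, 12)` returns `["device-root-key", "archive-encryption-key"]`; the short-lived session ticket key drops out.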

Key questions

How do you validate resilience (DoS, side channels, rollback, compromise)?
What does rotation look like at fleet scale (devices, certs, tunnels, identities)?
How do you define success metrics for PQ readiness beyond “enabled”?
What secrets must remain confidential for 10–30 years (and where are they today)?
Which protocols need hybrid now, and which can wait without regret?
How do you stop downgrade under active adversaries?

Assumptions

Key and certificate lifecycles outlive application versions.
Some environments require constrained implementations (no_std, embedded).
Operational teams need safe playbooks; crypto changes are not one-off.
Rollouts happen under partial adoption; compatibility matters.

Non-goals

Relying on ‘automatic’ negotiation without downgrade resistance.
Assuming performance impacts will be negligible.

Attack surface

Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.

Model & invariants

Hybrid composition should be explicit and transcript-bound:

\mathrm{ss} = \mathrm{HKDF}(\mathrm{ss}_\text{classical}\ \Vert\ \mathrm{ss}_\text{pqc},\ \text{info}=\mathrm{transcript}).
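The formula above can be sketched with the Python standard library alone. This is a minimal illustration, not a vetted implementation: it assumes SHA-256, an empty HKDF salt, fixed-length component secrets (so plain concatenation is unambiguous), and a 32-byte output. The function name `hybrid_secret` is hypothetical; in production, prefer a reviewed cryptographic library.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-256."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-256."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_secret(ss_classical: bytes, ss_pqc: bytes, transcript: bytes) -> bytes:
    """ss = HKDF(ss_classical || ss_pqc, info = transcript)."""
    # Assumption: both inputs have fixed, known lengths, so no length prefix is needed.
    prk = hkdf_extract(b"", ss_classical + ss_pqc)
    return hkdf_expand(prk, transcript, 32)
```

Because the transcript is fed in as `info`, two sessions with identical component secrets but different negotiated parameters derive different keys.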

Inventory first. You can’t migrate what you can’t locate.

Make downgrade resistance explicit and test it like a security feature.
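A downgrade test can start as a toy model: an active attacker strips the hybrid group from the client's offer, and the handshake must fail because each side authenticates the offer as it saw it. The sketch below is loosely modeled on transcript-covering MACs such as the TLS 1.3 Finished message; the group labels, the `handshake` function, and the fixed `key` (a stand-in for a handshake-derived secret) are all assumptions for illustration.

```python
import hashlib
import hmac

PREFERENCE = ["hybrid-x25519-mlkem768", "x25519"]  # hypothetical labels, strongest first


def select(offered):
    """Server picks its most-preferred group among those offered."""
    for group in PREFERENCE:
        if group in offered:
            return group
    raise ValueError("no common group")


def transcript(offered, selected) -> bytes:
    """Canonical encoding of what one side believes was negotiated."""
    return ("|".join(offered) + "->" + selected).encode()


def finished_mac(key: bytes, offered, selected) -> bytes:
    """MAC over the negotiation transcript as seen by one side."""
    return hmac.new(key, transcript(offered, selected), hashlib.sha256).digest()


def handshake(client_offer, tamper=None, key=b"k" * 32) -> bool:
    """Return True if the handshake completes, False if a downgrade is detected."""
    seen_by_server = tamper(client_offer) if tamper else list(client_offer)
    selected = select(seen_by_server)
    server_mac = finished_mac(key, seen_by_server, selected)
    # The client verifies against the offer it actually sent.
    client_mac = finished_mac(key, list(client_offer), selected)
    return hmac.compare_digest(server_mac, client_mac)


def strip_pq(offer):
    """Active attacker: remove every hybrid group from the offer."""
    return [g for g in offer if "mlkem" not in g]
```

Here `handshake(PREFERENCE)` succeeds, while `handshake(PREFERENCE, tamper=strip_pq)` returns `False`. A real protocol must also ensure the MAC key itself cannot be recovered through the weaker algorithm, which this toy does not model.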

Invariant

If the system can enter an invalid state, it eventually will—usually during an incident.

Security properties

Authenticity: actions are bound to identity and purpose.
Replay resistance: duplicated inputs do not change outcomes.
Evidence: critical actions emit verifiable audit events.
Least authority: privileges are scoped by purpose and time.
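Of these properties, replay resistance is the easiest to show in a few lines. The sketch below assumes each request carries a caller-supplied unique ID; `ReplayGuard`, `balance`, and `debit_10` are hypothetical names, and the in-memory dictionary grows without bound, so a real system would need persistence and expiry.

```python
class ReplayGuard:
    """Apply each request at most once, keyed by a caller-supplied request ID."""

    def __init__(self):
        self._results = {}  # request_id -> result of the first application

    def apply(self, request_id: str, action):
        if request_id in self._results:
            # Replay: return the prior result without repeating the side effect.
            return self._results[request_id]
        result = action()
        self._results[request_id] = result
        return result


balance = {"value": 100}


def debit_10():
    """Example side effect: subtract 10 and return the new balance."""
    balance["value"] -= 10
    return balance["value"]
```

Calling `guard.apply("r1", debit_10)` twice on the same `ReplayGuard` debits once and returns the same result both times.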

Failure modes

Observability gaps during incidents (missing evidence).
Recovery paths that only work when nothing is broken.
Timeout ambiguity causing double-apply or partial state transitions.
Mixed-version behavior that violates assumptions silently.

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart LR
  threat["Threat Model (quantum + classical)"] --> design["Protocol Design"]
  design --> impl["Implementation (no_std where needed)"]
  impl --> verify["Verification (tests + formal)"]
  verify --> ops["Operationalization (rotation + monitoring)"]
  ops --> threat

Implementation notes

Operationalize early: rollback and monitoring are part of the design.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

PQ migration note: "enabled" is not "safe" unless binding and downgrade resistance are explicit.

Verification strategy

Interop tests across stacks and versions.
Downgrade simulations with active attackers.
Side-channel audits for constrained implementations.
Performance profiling under load to quantify DoS risk.
Rotation drills: certificates, tunnels, device identities.
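Before profiling under load, a back-of-envelope estimate of key-exchange payload growth helps size the test. The sketch below uses the published sizes for X25519 (32-byte public key) and ML-KEM-768 (1184-byte encapsulation key, 1088-byte ciphertext). It assumes a hybrid that simply concatenates both shares, and it ignores framing, certificates, and CPU cost; the function names are illustrative.

```python
# Public sizes in bytes: X25519 (RFC 7748) and ML-KEM-768 (FIPS 203).
X25519_PUBLIC_KEY = 32
MLKEM768_ENCAPS_KEY = 1184
MLKEM768_CIPHERTEXT = 1088


def key_share_bytes(hybrid: bool) -> dict:
    """Key-exchange payload per direction, ignoring framing and certificates."""
    client = X25519_PUBLIC_KEY + (MLKEM768_ENCAPS_KEY if hybrid else 0)
    server = X25519_PUBLIC_KEY + (MLKEM768_CIPHERTEXT if hybrid else 0)
    return {"client": client, "server": server}


def extra_bytes_per_second(handshakes_per_second: int) -> int:
    """Additional key-share traffic when every handshake is hybrid."""
    classical = key_share_bytes(False)
    hybrid = key_share_bytes(True)
    per_handshake = sum(hybrid.values()) - sum(classical.values())
    return per_handshake * handshakes_per_second
```

Under these assumptions a hybrid handshake adds 2272 bytes of key-share data, so 10,000 handshakes per second add roughly 22.7 MB/s. An attacker who can trigger handshakes cheaply gets that amplification too, which is one input to the DoS budget.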

Operational notes

Maintain an inventory of long-lived secrets and their lifetimes.
Practice emergency deprecation (turn off broken algorithms quickly).
Define compatibility windows and communicate them to stakeholders.
Roll out hybrid with canaries and explicit rollback triggers.
Add telemetry for algorithm negotiation and failure modes.
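Negotiation telemetry can begin as a counter keyed by the selected algorithm and an outcome label. The sketch below is an in-process illustration only: the label strings, the `"mlkem"` substring check, and the function names are assumptions, and a real deployment would export these counts to its metrics system.

```python
from collections import Counter
from typing import Optional

events = Counter()


def record_negotiation(selected: str, hybrid_offered: bool, failure: Optional[str] = None) -> None:
    """Count each handshake by selected group and outcome label."""
    if failure is not None:
        outcome = "failed:" + failure
    elif hybrid_offered and "mlkem" not in selected:
        outcome = "fallback_to_classical"  # hybrid was on offer but not chosen
    else:
        outcome = "ok"
    events[(selected, outcome)] += 1


def fallback_rate() -> float:
    """Fraction of recorded handshakes that fell back to classical-only."""
    total = sum(events.values())
    fallbacks = sum(n for (_, outcome), n in events.items() if outcome == "fallback_to_classical")
    return fallbacks / total if total else 0.0
```

A rising `fallback_rate()` when peers are offering hybrid is the kind of signal that separates "enabled" from "actually negotiated".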

Operational note

Attach explicit rollout/rollback triggers to changes that touch security or correctness.

What to monitor

Authz failures and policy denials (unexpected spikes).
Retry/timeout rates by endpoint and client cohort.
Error budget burn + tail latency under load.
Admission-control / rate-limit rejections (by reason).
Rollback events and the conditions that triggered them.

Rollback plan

Prefer backward-compatible changes; avoid “flag day” upgrades.
Use canaries and staged rollout; stop early when signals degrade.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Keep dual-write / dual-verify windows where appropriate.
Define an explicit rollback trigger (metrics + thresholds).
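An explicit rollback trigger can be written down as data plus one pure function, which makes it reviewable and testable. In the sketch below the metric names and limits are hypothetical placeholders. Treating a missing metric as a breach is a design choice made here, on the reasoning that an observability gap during a rollout should not read as a healthy signal.

```python
ROLLBACK_THRESHOLDS = {  # hypothetical metric names and limits
    "handshake_failure_rate": 0.02,
    "p99_handshake_ms": 250.0,
    "fallback_rate": 0.10,
}


def breached(metrics: dict, thresholds: dict = ROLLBACK_THRESHOLDS) -> list:
    """Return the names of metrics that exceed their threshold, sorted.

    A metric missing from the sample counts as breached.
    """
    out = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None or value > limit:
            out.append(name)
    return sorted(out)


def should_rollback(metrics: dict) -> bool:
    """True if any threshold is breached or any metric is missing."""
    return bool(breached(metrics))
```

Recording the output of `breached` alongside the rollback event also preserves the conditions that triggered it.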

Evidence

Let's Encrypt Incident Reports (1) — Operational lessons relevant to rotation and recovery at scale.
- Evidence: Rotation and revocation are operational protocols; extract failure patterns into drills and automated rollbacks.
RFC 8446: TLS 1.3 (2) — A useful reference for handshake structure and downgrade resistance patterns.
- Evidence: Handshake transcript binding and downgrade resistance patterns; monitor negotiation paths and failure reasons.

Open questions

Which protocol surfaces are most exposed to harvest-now-decrypt-later (HNDL) risk in your environment?
What is your plan for third-party dependencies that can’t migrate quickly?
What is your minimal ‘safe mode’ when PQ paths fail?
How do you prevent configuration drift from re-enabling weak modes?

Checklist

Assumptions listed and reviewed.
Rollback plan rehearsed and automated.
Failure modes enumerated with mitigations.
Safety properties stated as invariants.
Telemetry captures correctness signals.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
