Post-Quantum DoS Surfaces: Handshakes, Amplification, and Mitigations

Monthly research note. Theme: Quantum-Resilient Systems Engineering.

TL;DR

A focused memo on Post-Quantum DoS Surfaces: Handshakes, Amplification, and Mitigations: define the model, state the properties, then design the system so those properties remain true under failure and adversaries.

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Inventory long-lived secrets first; you can’t migrate what you can’t locate.
Hybrid is an operational mode: deploy, monitor, rollback—not a paper design.
Downgrade resistance must be explicit and tested under active attackers.
Define safety properties before performance goals.
Treat retries, reordering, and partial failure as default conditions.

Why this matters

Quantum risk is uneven: some secrets must last decades, others do not.
Migration risk is operational: inventory, rollout, rollback, and monitoring.
Cost changes drive new DoS surfaces; defenses must evolve.
Long-lived devices and PKI lifecycles are the hard constraint.

Key questions

How do you define success metrics for PQ readiness beyond “enabled”?
How do you validate resilience (DoS, side channels, rollback, compromise)?
Which protocols need hybrid now, and which can wait without regret?
What secrets must remain confidential for 10–30 years (and where are they today)?
How do you stop downgrade under active adversaries?
What does rotation look like at fleet scale (devices, certs, tunnels, identities)?

Assumptions

Key and certificate lifecycles outlive application versions.
Operational teams need safe playbooks; crypto changes are not one-off.
Some environments require constrained implementations (no_std, embedded).
Rollouts happen under partial adoption; compatibility matters.

Non-goals

Assuming performance impacts will be negligible.
Switching algorithms without inventorying where secrets are used.

Attack surface

Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.

Model & invariants

Risk is a function of exposure and lifetime:

\mathrm{risk} \approx \mathrm{exposure} \times \mathrm{lifetime} \times \mathrm{adversary\_capability}.

Treat ops as part of the protocol: monitoring, rollback, and incident response.

Make downgrade resistance explicit and test it like a security feature.

Invariant

Make the “impossible state” observable: a metric or alert that fires when invariants drift.

Security properties

Evidence: critical actions emit verifiable audit events.
Least authority: privileges are scoped by purpose and time.
Integrity: invalid transitions are rejected (and detectable).
Downgrade resistance: negotiation can’t silently weaken security posture.

Failure modes

Timeout ambiguity causing double-apply or partial state transitions.
Mixed-version behavior that violates assumptions silently.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Config drift that weakens security posture over time.

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart TD
  inventory["Inventory"] --> prioritize["Prioritize"]
  prioritize --> hybrid["Hybrid Deploy"]
  hybrid --> monitor["Monitor"]
  monitor --> cutover["Cutover"]
  cutover --> deprecate["Deprecate Old"]

Implementation notes

Design hybrid modes with explicit binding and observable outcomes.

Rule of thumb

Bound work per request: parse, validate, and cap cost before you allocate heavy resources.

// PQ migration note: "enabled" is not "safe" unless binding and downgrade resistance are explicit.

Verification strategy

Interop tests across stacks and versions.
Rotation drills: certificates, tunnels, device identities.
Performance profiling under load to quantify DoS risk.
Downgrade simulations with active attackers.
Side-channel audits for constrained implementations.

Operational notes

Add telemetry for algorithm negotiation and failure modes.
Define compatibility windows and communicate them to stakeholders.
Practice emergency deprecation (turn off broken algorithms quickly).
Maintain an inventory of long-lived secrets and their lifetimes.
Roll out hybrid with canaries and explicit rollback triggers.

Operational note

Keep audit and config history queryable during incidents—evidence beats intuition.

What to monitor

Admission-control / rate-limit rejections (by reason).
Rollback events and the conditions that triggered them.
Retry/timeout rates by endpoint and client cohort.
Authz failures and policy denials (unexpected spikes).
Invariant violation rate (should be ~0).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Use canaries and staged rollout; stop early when signals degrade.
Keep dual-write / dual-verify windows where appropriate.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.

Evidence

Jepsen (1) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
Site Reliability Engineering (Google) (2) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.

Open questions

Which protocol surfaces are most exposed to HNDL risk in your environment?
How do you prevent configuration drift from re-enabling weak modes?
What is your plan for third-party dependencies that can’t migrate quickly?
What is your minimal ‘safe mode’ when PQ paths fail?

Checklist

Failure modes enumerated with mitigations.
Safety properties stated as invariants.
Assumptions listed and reviewed.
Telemetry captures correctness signals.
Rollback plan rehearsed and automated.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.

TL;DR

Key takeaways

Why this matters

Key questions

Assumptions

Non-goals

Model & invariants

Security properties

Failure modes

Design sketch

Implementation notes

Verification strategy

Operational notes

What to monitor

Rollback plan

Evidence

Open questions

Checklist

Further reading