Signatures in Practice: Dilithium/Falcon and Deployment Constraints

Monthly research note. Theme: Post-Quantum Cryptography & Migration.

TL;DR

This memo covers post-quantum signatures (Dilithium and Falcon) under real deployment constraints: define the model, state the security properties, then design the system so those properties hold under failure and active adversaries.

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Interop is the migration plan—test matrices are more important than whitepapers.
PQC changes handshake costs; plan DoS defenses and budgets.
Hybrid composition must be explicit and transcript-bound to resist downgrade.
Treat retries, reordering, and partial failure as default conditions.
Define safety properties before performance goals.

Why this matters

Interop is the real risk: multiple stacks, vendors, and versions.
PQC changes bandwidth and CPU costs; DoS surfaces move.
Migration will be mixed-version for years; plan for it explicitly.
Operationalization (monitoring, rollback) determines success more than crypto choice.
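
The bandwidth point can be made concrete with a rough size budget. The sketch below uses the commonly cited public-key and signature sizes for the round-3 Dilithium and Falcon parameter sets (standardized ML-DSA sizes differ slightly at higher levels, so check the specification you deploy). It counts only keys and signatures, assuming a hypothetical chain of two certificates plus one handshake signature, and ignores every other certificate field. The names SigScheme, handshake_auth_bytes, and over_budget are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigScheme:
    name: str
    public_key_bytes: int
    signature_bytes: int


# Sizes in bytes. Falcon signatures are variable length; the values here are
# the specified maximum compressed sizes for each parameter set.
SCHEMES = [
    SigScheme("Ed25519", 32, 64),
    SigScheme("Dilithium2", 1312, 2420),
    SigScheme("Dilithium3", 1952, 3293),
    SigScheme("Falcon-512", 897, 666),
    SigScheme("Falcon-1024", 1793, 1280),
]


def handshake_auth_bytes(scheme: SigScheme, chain_len: int = 2) -> int:
    """Key + signature bytes for chain_len certificates plus one handshake signature."""
    cert_bytes = chain_len * (scheme.public_key_bytes + scheme.signature_bytes)
    return cert_bytes + scheme.signature_bytes


def over_budget(budget_bytes: int, chain_len: int = 2) -> list[str]:
    """Names of schemes whose authentication material exceeds the budget."""
    return [
        s.name
        for s in SCHEMES
        if handshake_auth_bytes(s, chain_len) > budget_bytes
    ]


if __name__ == "__main__":
    for s in SCHEMES:
        print(f"{s.name:12s} {handshake_auth_bytes(s):6d} bytes")
```

Under these assumptions, a 4000-byte budget admits Ed25519 and Falcon-512 but not the Dilithium sets or Falcon-1024, which is the kind of constraint that should be known before the interop matrix is drawn up.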

Key questions

Which parts must be constant-time, and how will you validate that?
What are the new DoS surfaces (bigger keys, more CPU, more bandwidth)?
How do you rotate algorithms safely (crypto agility without chaos)?
What does interoperability testing look like across vendors and stacks?
Which secrets require long-term confidentiality (HNDL) and where are they today?
What telemetry proves PQC is working (not just enabled)?

Assumptions

Side channels exist: timing and cache behavior leak information.
Bandwidth is limited in some environments; larger handshakes matter.
Deployments are mixed; old clients must interoperate or fail safely.
Vendors vary: implementations and defaults differ.

Non-goals

Supporting silent fallback to weaker modes when interop fails.
Treating PQC as a “drop-in” replacement that leaves operational processes unchanged.

Attack surface

Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.

Model & invariants

A post-quantum KEM establishes a shared secret without relying on discrete-log or factoring assumptions:

$$(\mathrm{pk},\mathrm{sk})\leftarrow \mathrm{KeyGen}();\quad (\mathrm{ct},\mathrm{ss})\leftarrow \mathrm{Enc}(\mathrm{pk});\quad \mathrm{ss}\leftarrow \mathrm{Dec}(\mathrm{sk},\mathrm{ct}).$$

Treat algorithm negotiation as adversarial: explicit downgrade resistance.
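
One way to read "explicit downgrade resistance" is as two checks: the selected algorithm must satisfy policy or the handshake fails closed, and both sides hash the offer they actually saw so that a stripped offer produces a transcript mismatch. The sketch below is a minimal illustration; the algorithm names, PREFERENCE, ALLOWED, and the function names are all hypothetical and not taken from any particular protocol.

```python
import hashlib

# Hypothetical algorithm identifiers, strongest first.
PREFERENCE = ["hybrid-x25519-mlkem768", "mlkem768", "x25519"]
# Policy: anything outside this set is rejected rather than silently accepted.
ALLOWED = frozenset({"hybrid-x25519-mlkem768", "mlkem768"})


class NegotiationError(Exception):
    """Raised when no policy-compliant algorithm can be agreed."""


def select(client_offer, server_supported, allowed=ALLOWED):
    """Pick the most preferred common algorithm; fail closed if it is below policy."""
    for alg in PREFERENCE:
        if alg in client_offer and alg in server_supported:
            if alg not in allowed:
                raise NegotiationError(f"best common algorithm {alg!r} is below policy")
            return alg
    raise NegotiationError("no common algorithm")


def transcript_hash(offer_seen, chosen):
    """Hash the offer as this side saw it plus the chosen algorithm.

    Each field is length-prefixed so different field splits cannot collide.
    """
    h = hashlib.sha256()
    for field in (",".join(offer_seen), chosen):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()
```

If an attacker strips the offer in transit, the client hashes what it sent and the server hashes what it received, so the two transcript hashes differ and any key or signature bound to them fails to match.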

Binding is the whole game: make the transcript an input to the KDF.

Invariant

If the system can enter an invalid state, it eventually will—usually during an incident.

Security properties

Downgrade resistance: negotiation can’t silently weaken security posture.
Least authority: privileges are scoped by purpose and time.
Integrity: invalid transitions are rejected (and detectable).
Authenticity: actions are bound to identity and purpose.

Failure modes

Recovery paths that only work when nothing is broken.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Config drift that weakens security posture over time.
Mixed-version behavior that violates assumptions silently.

Pitfall

Mixed-version deployments create states you never tested—plan for them explicitly.

Design sketch

flowchart TD
  negotiate["Negotiate Algorithms"] --> bind["Bind Transcript"]
  bind --> kdf["KDF (hybrid)"]
  kdf --> keys["Traffic Keys"]
  keys --> monitor["Monitor + Rollback"]

Implementation notes

Interop tests are the migration plan; everything else is a hypothesis.

Rule of thumb

If you can’t explain a timeout outcome, you can’t make retries safe.

// Hybrid binding sketch (pseudocode):
// ss = HKDF(ss_classical || ss_pqc, info=transcript_hash)
// Then derive traffic keys from ss.
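
The binding pseudocode above can be made concrete with the Python standard library. This is a minimal sketch, not a vetted implementation: it implements HKDF with SHA-256 following RFC 5869, assumes both shared secrets have fixed lengths so plain concatenation is unambiguous, and uses an empty salt. The function names hkdf_sha256 and hybrid_secret are illustrative.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF")
    # Extract: an empty salt is replaced by HASH_LEN zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_secret(ss_classical: bytes, ss_pqc: bytes,
                  transcript_hash: bytes, length: int = 32) -> bytes:
    """Combine both shared secrets and bind the result to the transcript."""
    # Assumption: both inputs are fixed-length, so concatenation is unambiguous.
    ikm = ss_classical + ss_pqc
    return hkdf_sha256(ikm, salt=b"", info=transcript_hash, length=length)
```

Because the transcript hash is the HKDF info input, two peers that saw different negotiation messages derive different secrets even when both KEM outputs agree, so a tampered negotiation surfaces as a failed handshake rather than a silently weaker one.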

Verification strategy

Interop matrices across vendors/versions and failure modes.
Side-channel tests where tooling exists; constant-time audits.
Chaos deploys: mixed versions + rollback during partial outages.
Downgrade tests: active attacker manipulates negotiation.
DoS tests: measure CPU/bandwidth amplification and mitigation impact.
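
An interop matrix can start as a small table of expected outcomes, generated before any real handshake is run. The sketch below is illustrative only: the stack names, their supported signature algorithms, and the policy set are hypothetical, and a real harness would compare these expectations against observed handshakes between actual builds.

```python
from itertools import product

# Hypothetical stacks and the signature algorithms each one supports.
STACKS = {
    "legacy-1.0": {"ecdsa-p256"},
    "vendorA-2.0": {"ecdsa-p256", "mldsa44"},
    "vendorB-3.1": {"mldsa44", "falcon512"},
}
# Hypothetical preference order (most preferred first) and policy.
PREFERENCE = ["mldsa44", "falcon512", "ecdsa-p256"]
POLICY_ALLOWED = {"mldsa44", "falcon512"}


def expected_outcome(client: str, server: str) -> str:
    """Expected result for one client/server pairing under the policy."""
    common = STACKS[client] & STACKS[server]
    for alg in PREFERENCE:
        if alg in common:
            # The best common algorithm decides: accept it or reject explicitly.
            return f"ok:{alg}" if alg in POLICY_ALLOWED else "reject:below-policy"
    return "reject:no-common"


def interop_matrix() -> dict:
    """Every ordered (client, server) pair mapped to its expected outcome."""
    return {(c, s): expected_outcome(c, s) for c, s in product(STACKS, repeat=2)}


if __name__ == "__main__":
    for (client, server), outcome in interop_matrix().items():
        print(f"{client:12s} -> {server:12s} {outcome}")
```

Every cell is either an explicit success or an explicit rejection with a reason, which makes a silent fallback show up as a mismatch between the expected and observed matrix.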

Operational notes

Inventory long-lived secrets and migrate the highest-risk first.
Document supported algorithm sets and deprecation timelines.
Add telemetry for negotiation outcomes, failures, and client cohorts.
Cap handshake cost per peer/IP; use stateless cookies when needed.
Roll out with canaries and explicit rollback triggers.
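
The stateless-cookie idea can be sketched as follows: before spending CPU on an expensive PQC operation, the server hands the peer a MAC over its address and a timestamp and only proceeds when the peer echoes a valid, fresh cookie. This is a simplified illustration; the cookie layout, COOKIE_LIFETIME_S, and the function names are assumptions, and a production design would also rotate the secret.

```python
import hashlib
import hmac
import struct
import time

COOKIE_LIFETIME_S = 30  # assumption: how long a cookie stays valid
TAG_LEN = 32            # HMAC-SHA-256 output size
TS_LEN = 8              # big-endian 64-bit timestamp


def make_cookie(secret: bytes, peer_addr: str, now=None) -> bytes:
    """Return timestamp || HMAC(secret, timestamp || peer_addr); no server state kept."""
    ts = int(time.time() if now is None else now)
    ts_bytes = struct.pack(">Q", ts)
    tag = hmac.new(secret, ts_bytes + peer_addr.encode("utf-8"), hashlib.sha256).digest()
    return ts_bytes + tag


def check_cookie(secret: bytes, peer_addr: str, cookie: bytes, now=None) -> bool:
    """Accept only a well-formed, authentic, unexpired cookie for this peer."""
    if len(cookie) != TS_LEN + TAG_LEN:
        return False
    ts_bytes, tag = cookie[:TS_LEN], cookie[TS_LEN:]
    expected = hmac.new(secret, ts_bytes + peer_addr.encode("utf-8"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        return False
    (ts,) = struct.unpack(">Q", ts_bytes)
    current = int(time.time() if now is None else now)
    return 0 <= current - ts <= COOKIE_LIFETIME_S
```

Checking the cookie costs one HMAC, so spoofed-source floods are rejected before any signature verification or decapsulation is attempted.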

Operational note

Make degraded modes explicit: fail closed vs fail open is a policy choice.

What to monitor

Invariant violation rate (should be ~0).
Error budget burn + tail latency under load.
Admission-control / rate-limit rejections (by reason).
Authz failures and policy denials (unexpected spikes).
Retry/timeout rates by endpoint and client cohort.
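
Per-cohort rates like those above need counters keyed by cohort and outcome. The sketch below is a minimal in-memory illustration; the class name, cohort labels, and outcome labels are hypothetical, and a real deployment would export these counts to its existing metrics system.

```python
from collections import Counter


class NegotiationTelemetry:
    """Counts handshake outcomes per client cohort."""

    def __init__(self):
        self.counts = Counter()

    def record(self, cohort: str, outcome: str) -> None:
        """Record one handshake outcome, e.g. 'pqc', 'classical', 'timeout'."""
        self.counts[(cohort, outcome)] += 1

    def rate(self, cohort: str, outcome: str) -> float:
        """Fraction of this cohort's handshakes that ended in `outcome`."""
        total = sum(n for (c, _), n in self.counts.items() if c == cohort)
        return self.counts[(cohort, outcome)] / total if total else 0.0
```

A cohort whose "pqc" rate stays near zero while PQC is nominally enabled is evidence that the feature is configured but not actually being negotiated.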

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Keep dual-write / dual-verify windows where appropriate.
Use canaries and staged rollout; stop early when signals degrade.
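
An explicit rollback trigger can be written down as data plus one small function. In the sketch below the metric names and thresholds are hypothetical placeholders, and a missing metric is treated as a breach on the assumption that lost telemetry during a rollout should stop the rollout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    metric: str
    threshold: float  # rollback fires when the observed value exceeds this


# Hypothetical metrics and thresholds; tune them to the deployment's baseline.
TRIGGERS = [
    Trigger("handshake_failure_rate", 0.02),
    Trigger("p99_handshake_ms", 400.0),
    Trigger("downgrade_rate", 0.0),
]


def breached(metrics: dict) -> list[str]:
    """Names of triggers that fired; a missing metric counts as a breach."""
    fired = []
    for trigger in TRIGGERS:
        value = metrics.get(trigger.metric)
        if value is None or value > trigger.threshold:
            fired.append(trigger.metric)
    return fired


def should_rollback(metrics: dict) -> bool:
    """True when at least one trigger fired."""
    return bool(breached(metrics))
```

Keeping the triggers in a reviewed list means the rollback decision during an incident is a lookup rather than a debate.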

Evidence

NIST Post-Quantum Cryptography Project (1) — Standardization process and algorithm selections.
- Evidence: Treat PQ migration as a program (inventory, interop, rollback). Use NIST status to drive prioritization and timelines.
Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.

Open questions

Which clients will fail first, and what is the safe fallback behavior?
What is the worst-case handshake cost under attack?
Where would a downgrade be visible today, and how would you detect it?
How do you rotate algorithms without introducing configuration chaos?

Checklist

Safety properties stated as invariants.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Rollback plan rehearsed and automated.
Telemetry captures correctness signals.
Failure modes enumerated with mitigations.
Assumptions listed and reviewed.
