Compliance & Standards: Translating NIST to Engineering Action

Monthly research note. Theme: Post-Quantum Cryptography & Migration.

TL;DR

This memo translates NIST post-quantum guidance into engineering action: define the model, state the properties, then design the system so those properties remain true under failure and adversarial pressure.

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Hybrid composition must be explicit and transcript-bound to resist downgrade.
Constant-time requirements don’t disappear; they become harder to meet as primitives grow.
PQC changes handshake costs; plan DoS defenses and budgets.
Treat retries, reordering, and partial failure as default conditions.
Measure correctness signals, not only latency/throughput.

Why this matters

Hybrid designs fail if binding is ambiguous (mix-and-match, downgrade).
PQC changes bandwidth and CPU costs; DoS surfaces move.
Operationalization (monitoring, rollback) determines success more than crypto choice.
Constant-time constraints are harder to satisfy with large primitives.

Key questions

How do you rotate algorithms safely (crypto agility without chaos)?
How do you handle failures: decryption failures, invalid ciphertexts, malformed keys?
How do you bind hybrid secrets to prevent downgrade and mix-and-match attacks?
Which secrets require long-term confidentiality (harvest now, decrypt later, or HNDL) and where are they today?
What does interoperability testing look like across vendors and stacks?
What are the new DoS surfaces (bigger keys, more CPU, more bandwidth)?

Assumptions

Vendors vary: implementations and defaults differ.
An active attacker can force retries, downgrades, and expensive handshakes.
Bandwidth is limited in some environments; larger handshakes matter.
Side channels exist: timing and cache behavior leak information.

Non-goals

Treating migration as a single flag flip.
Assuming PQC is “drop-in” without changing operational processes.

Attack surface

Any unbounded work per request becomes a DoS primitive under adversaries.

Model & invariants

Hybrid composition should be transcript-bound:

\mathrm{ss} = \mathrm{HKDF}(\mathrm{ss}_\text{classical}\ \Vert\ \mathrm{ss}_\text{pqc},\ \text{info}=\mathrm{transcript}).

Binding is the whole game: make the transcript an input to the KDF.

Treat algorithm negotiation as adversarial and build in explicit downgrade resistance.
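
One way to see why transcript binding resists downgrade is the small simulation below. It is a sketch under stated assumptions, not a protocol definition: the algorithm labels, the length-prefixed encoding, and the helper names (encode, select, transcript_hash, run) are all hypothetical. Each side hashes the offer as it saw it, so an attacker who strips the hybrid option in flight leaves the two sides with different transcript hashes, and any key or signature derived from the transcript will not verify.

```python
import hashlib


def encode(items: list[str]) -> bytes:
    """Length-prefix a list of strings so field boundaries are unambiguous."""
    out = len(items).to_bytes(2, "big")
    for item in items:
        raw = item.encode("utf-8")
        out += len(raw).to_bytes(2, "big") + raw
    return out


def select(offer: list[str], supported: set[str]) -> str:
    """Pick the first (most preferred) offered algorithm the responder supports."""
    for alg in offer:
        if alg in supported:
            return alg
    raise ValueError("no common algorithm")


def transcript_hash(offer: list[str], selected: str) -> bytes:
    """Hash the full offer plus the selection, not just the selection."""
    return hashlib.sha256(encode(offer) + encode([selected])).digest()


def run(offer_sent: list[str], offer_received: list[str], supported: set[str]):
    """Return the selected algorithm and whether both transcript views agree."""
    selected = select(offer_received, supported)
    initiator_view = transcript_hash(offer_sent, selected)
    responder_view = transcript_hash(offer_received, selected)
    return selected, initiator_view == responder_view


# Illustrative labels only; strongest first.
OFFER = ["x25519+mlkem768", "x25519"]
SUPPORTED = {"x25519+mlkem768", "x25519"}

honest = run(OFFER, OFFER, SUPPORTED)
tampered = run(OFFER, ["x25519"], SUPPORTED)  # attacker stripped the hybrid option
```

In the tampered run the responder still selects an algorithm, but the views disagree, which is the signal a transcript-bound handshake turns into a hard failure.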

Invariant

Make the “impossible state” observable: a metric or alert that fires when invariants drift.
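
As a minimal illustration of making an impossible state observable, the sketch below counts negotiations where both peers advertise hybrid support yet a classical-only result is recorded. The counter name, the function name, and the "hybrid"/"classical" labels are assumptions made for the example; a real deployment would export the counter through its existing metrics pipeline and alert on any non-zero value.

```python
from collections import Counter

# In-process stand-in for a metrics client (hypothetical).
invariant_violations: Counter = Counter()


def record_negotiation(initiator_hybrid: bool, responder_hybrid: bool, negotiated: str) -> None:
    """Count the state that should never occur: both sides capable, classical chosen."""
    if initiator_hybrid and responder_hybrid and negotiated != "hybrid":
        invariant_violations["hybrid_capable_but_classical"] += 1


record_negotiation(True, True, "hybrid")      # expected outcome, no count
record_negotiation(True, False, "classical")  # legitimate fallback, no count
record_negotiation(True, True, "classical")   # invariant drift, counted
```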

Security properties

Downgrade resistance: negotiation can’t silently weaken security posture.
Evidence: critical actions emit verifiable audit events.
Integrity: invalid transitions are rejected (and detectable).
Authenticity: actions are bound to identity and purpose.

Failure modes

Timeout ambiguity causing double-apply or partial state transitions.
Recovery paths that only work when nothing is broken.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Mixed-version behavior that violates assumptions silently.
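
The double-apply case in the first item above can be sketched with an idempotency key. This is an illustrative assumption rather than a prescribed design: the KeyRotator class and its in-memory dictionary are hypothetical, and a production version would persist the applied-request map atomically with the state change.

```python
class KeyRotator:
    """Applies each rotation request at most once, keyed by a caller-chosen request ID."""

    def __init__(self) -> None:
        self.version = 0
        self.applied: dict[str, int] = {}

    def rotate(self, request_id: str) -> int:
        # A retry after an ambiguous timeout replays the recorded result
        # instead of rotating a second time.
        if request_id in self.applied:
            return self.applied[request_id]
        self.version += 1
        self.applied[request_id] = self.version
        return self.version


rotator = KeyRotator()
first = rotator.rotate("req-1")
retry = rotator.rotate("req-1")   # same request retried after a timeout
second = rotator.rotate("req-2")
```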

Pitfall

Sampling hides the rare schedule that breaks your invariants.

Design sketch

sequenceDiagram
  participant A as Initiator
  participant B as Responder
  A->>B: classical_keyshare + pqc_pk
  B-->>A: classical_keyshare + pqc_ct + sig
  A-->>B: sig
  Note over A,B: ss = HKDF(ss_classical || ss_pqc, transcript)

Implementation notes

Interop tests are the migration plan; everything else is a hypothesis.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

// Hybrid binding sketch (pseudocode):
// ss = HKDF(ss_classical || ss_pqc, info=transcript_hash)
// Then derive traffic keys from ss.
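
The binding sketch above can be made concrete with the standard library alone. The following is a hedged illustration, not a vetted implementation: it follows the RFC 5869 extract-and-expand structure with SHA-256, assumes both shared secrets have fixed lengths so plain concatenation is unambiguous, and uses a hypothetical label ("hybrid-demo v1") for domain separation. The function names are chosen for this example.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step; an empty salt defaults to HASH_LEN zero bytes."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_secret(ss_classical: bytes, ss_pqc: bytes, transcript: bytes, length: int = 32) -> bytes:
    """ss = HKDF(ss_classical || ss_pqc, info = label || H(transcript))."""
    transcript_hash = hashlib.sha256(transcript).digest()
    # Assumes fixed-length shares; variable-length inputs would need length prefixes.
    prk = hkdf_extract(b"", ss_classical + ss_pqc)
    return hkdf_expand(prk, b"hybrid-demo v1" + transcript_hash, length)


ss_a = hybrid_secret(b"\x01" * 32, b"\x02" * 32, b"offer|selection|keyshares")
ss_b = hybrid_secret(b"\x01" * 32, b"\x02" * 32, b"offer|selection|keyshares (tampered)")
```

Because the transcript hash feeds the expand step, the same pair of shared secrets yields unrelated output keys when the transcript differs, which is what prevents mix-and-match across handshakes.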

Verification strategy

Side-channel tests where tooling exists; constant-time audits.
Chaos deploys: mixed versions + rollback during partial outages.
DoS tests: measure CPU/bandwidth amplification and mitigation impact.
Interop matrices across vendors/versions and failure modes.
Downgrade tests: active attacker manipulates negotiation.

Operational notes

Document supported algorithm sets and deprecation timelines.
Add telemetry for negotiation outcomes, failures, and client cohorts.
Roll out with canaries and explicit rollback triggers.
Inventory long-lived secrets and migrate the highest-risk ones first.
Cap handshake cost per peer/IP; use stateless cookies when needed.
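
A stateless cookie, as mentioned in the last item, lets the responder defer expensive PQC work until the peer proves it can receive traffic at its claimed address. The sketch below is one possible shape under stated assumptions: the 30-second window, the secret handling, and the function names are hypothetical, and real deployments would rotate the secret and combine this with per-peer rate limits.

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # hypothetical; rotate periodically in practice
WINDOW_SECONDS = 30


def make_cookie(peer_addr: str, now: float, secret: bytes = SERVER_SECRET) -> bytes:
    """Cookie = HMAC(secret, time_window || peer_addr); no per-peer state is stored."""
    window = int(now // WINDOW_SECONDS)
    msg = window.to_bytes(8, "big") + peer_addr.encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).digest()


def check_cookie(cookie: bytes, peer_addr: str, now: float, secret: bytes = SERVER_SECRET) -> bool:
    """Accept cookies from the current or previous window, compared in constant time."""
    for t in (now, now - WINDOW_SECONDS):
        if t >= 0 and hmac.compare_digest(cookie, make_cookie(peer_addr, t, secret)):
            return True
    return False


cookie = make_cookie("203.0.113.7", 1000.0)
```

Only after check_cookie succeeds would the responder spend CPU on decapsulation or signature verification, which keeps the cost of an unauthenticated first message to a single HMAC.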

Operational note

Make degraded modes explicit: fail closed vs fail open is a policy choice.

What to monitor

Rollback events and the conditions that triggered them.
Error budget burn + tail latency under load.
Admission-control / rate-limit rejections (by reason).
Retry/timeout rates by endpoint and client cohort.
Authz failures and policy denials (unexpected spikes).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Prefer backward-compatible changes; avoid “flag day” upgrades.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Keep dual-write / dual-verify windows where appropriate.
Use canaries and staged rollout; stop early when signals degrade.
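
An explicit rollback trigger can be as simple as a table of metrics and thresholds evaluated against canary telemetry. The sketch below is illustrative only: the metric names and limits are invented for the example, and treating missing telemetry as a breach is a policy assumption (fail closed) that a team may choose differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Hypothetical metric names and limits.
TRIGGERS = [
    Threshold("handshake_failure_rate", 0.02),
    Threshold("p99_handshake_ms", 250.0),
    Threshold("downgrade_detected_count", 0.0),
]


def rollback_reasons(observed: dict[str, float], triggers: list[Threshold] = TRIGGERS) -> list[str]:
    """Return the reasons to roll back; an empty list means keep rolling out."""
    reasons = []
    for t in triggers:
        value = observed.get(t.metric)
        if value is None:
            reasons.append(f"{t.metric}: missing telemetry")
        elif value > t.max_value:
            reasons.append(f"{t.metric}: {value} > {t.max_value}")
    return reasons


healthy = rollback_reasons(
    {"handshake_failure_rate": 0.001, "p99_handshake_ms": 120.0, "downgrade_detected_count": 0.0}
)
degraded = rollback_reasons(
    {"handshake_failure_rate": 0.05, "p99_handshake_ms": 120.0}
)
```

Recording the returned reasons alongside the rollback event also covers the monitoring item about capturing the conditions that triggered it.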

Evidence

RFC 5869: HKDF (1) — Useful when discussing hybrid binding and context separation.
- Evidence: HKDF is the workhorse for domain separation; bind purpose/context to avoid cross-protocol key reuse.
NIST Post-Quantum Cryptography Project (2) — Standardization process and algorithm selections.
- Evidence: Treat PQ migration as a program (inventory, interop, rollback). Use NIST status to drive prioritization and timelines.

Open questions

Which clients will fail first, and what is the safe fallback behavior?
How do you rotate algorithms without introducing configuration chaos?
Where would a downgrade be visible today, and how would you detect it?
What is the worst-case handshake cost under attack?

Checklist

Telemetry captures correctness signals.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Assumptions listed and reviewed.
Failure modes enumerated with mitigations.
Safety properties stated as invariants.
Rollback plan rehearsed and automated.
