Paper-driven research note. Theme: PQC migration that fails at the certificate boundary.
TL;DR
If you read post-quantum TLS plans as “replace ECDSA with some PQ signature”, you will build the wrong system.
In TLS 1.3, the end-entity (leaf) certificate key is not just an identity anchor — it is the key that signs the live handshake transcript (CertificateVerify). That single design fact turns “which signature algorithm lives in the leaf” into a hot-path engineering decision with direct DoS consequences.
Delgado Jiménez (arXiv:2604.06100) runs a clean local experiment matrix on OpenSSL 3 + oqsprovider, varying where ML-DSA and SLH-DSA appear in the certificate hierarchy. The result is a discontinuity you cannot hand-wave away:
- a fully-ML baseline is ~0.809 ms mean handshake latency, ~0.562 ms server task-clock per handshake
- moving SLH-DSA into the server leaf produces ~1402 ms mean latency (≈ 1733× the baseline) and ~1401 ms server task-clock per handshake
- the bytes transferred only grow ~1.69×, so the collapse is not “just bigger certificates” — it is online signing cost. (1)
In PQ TLS, “placement” is a performance and security boundary. The leaf algorithm determines online server signing cost; upper-layer algorithms mostly shift validation work to clients. This is cost concentration, not algorithm substitution.
Key takeaways
- Leaf SLH-DSA is an online CPU collapse. In the paper’s matrix, leaf-SLH jumps from ~0.8 ms to ~1400 ms mean latency (≈ 1733×). (1)
- Upper-layer SLH-DSA is penalized but plausible. Root-SLH / leaf-ML increases latency to ~2.133 ms (≈ 2.64×) while server task-clock rises only ~1.19×. (1)
- Transport size is a second-order effect in the heavy regime. Leaf-SLH reads ~27,015 bytes vs ~16,008 baseline (≈ 1.69×) while server CPU rises ≈ 2494×. (1)
- Client/server work distribution changes by placement. Upper-layer SLH shifts active work toward client validation; leaf-SLH becomes overwhelmingly server-bound. (1)
- PQC rollout must be evaluated as PKI+TLS design. Chain exposure, depth, caching, compression, and resumption interact with cryptographic cost in ways primitive benchmarks cannot predict. (2) (3) (4)
Introduction (pragmatic abstract: the infrastructure problem)
The real question your pager asks is not “is ML-DSA post-quantum secure?”.
It is: “Can my TLS front-end authenticate at peak load without becoming a self-inflicted CPU DoS?”
In classical TLS deployments, RSA/ECDSA/Ed25519 signing costs are low enough that we tend to blame slow handshakes on network RTT, certificate chain size, or cache misses. Post-quantum signatures break that mental model because they are not a single family with smooth tradeoffs:
- ML-DSA (FIPS 204) is lattice-based and engineered to be deployable in interactive authentication. (5)
- SLH-DSA (FIPS 205) is stateless hash-based and conservative, but its performance profile is fundamentally different. (6)
The paper’s claim is not theoretical: it is deployment-shaped.
“Post-quantum migration in TLS 1.3 should not be understood as a flat substitution problem … [it] depends on where it appears in the certification hierarchy … and how cryptographic burden is distributed across client and server roles.” (1)
That is the right framing. TLS is not “a signature benchmark”; it is an authenticated key-establishment protocol with roles, state, and adversaries.
Assumptions
- TLS 1.3 full handshakes with certificate-based server authentication. (2)
- X.509 certification hierarchies (root → intermediate → leaf). (3)
- Threat model includes adversarial handshakes (flooding, forced full handshakes, cache bypass). Availability is a security property.
- I treat the paper’s lab measurements as a signal, not as a universal constant: implementation quality and hardware matter, but order-of-magnitude discontinuities are not noise.
- Focus is on server-authentication (no mutual TLS), because that is where “internet scale” lives.
Non-goals
- Re-proving TLS 1.3 security. This is about operational correctness under PQ parameter sets.
- Modeling global internet pathologies (loss, reordering, congestion collapse). The paper’s lab is local; I’ll critique that explicitly.
- Claiming “SLH-DSA is unusable”. The claim is narrower: SLH-DSA in the interactive leaf is operationally toxic for front-ends in the measured regimes.
Security properties
TLS security is not only confidentiality/authenticity. Under active adversaries, availability is a cryptographic boundary because authentication work is attacker-triggerable.
S1 — Authentication correctness
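The server must prove possession of the private key corresponding to the presented leaf certificate during the handshake:

$$\mathrm{Verify}_{A_{\text{leaf}}}\big(pk_{\text{leaf}},\ \text{transcript},\ \sigma_{CV}\big) = \text{accept}$$

where $A_{\text{leaf}}$ is the leaf signature algorithm, $pk_{\text{leaf}}$ is the public key in the leaf certificate, and $\sigma_{CV}$ is the CertificateVerify signature. (2)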
S2 — Bounded attacker-triggerable work (DoS-resilience invariant)
Let $T_{\text{srv}}$ be the server CPU time spent in cryptographic operations per full handshake. A front-end that must survive adversarial connection rates needs a fixed budget $T_{\max}$ on that time:

$$T_{\text{srv}} \le T_{\max}$$

and the system-level stability constraint (multi-core queueing approximation):

$$\lambda \cdot T_{\text{srv}} < c$$

where $\lambda$ is handshake arrival rate and $c$ is effective parallelism (cores dedicated to handshake crypto).
This is the invariant leaf-SLH breaks: it moves $T_{\text{srv}}$ from sub-millisecond to ~1.4 seconds in the paper’s measurements. (1)
Hot-path crypto budget: the signature algorithm used for CertificateVerify must keep server per-handshake CPU under a fixed bound; otherwise availability collapses under adversarial handshakes.
S3 — Cryptographic agility without silent downgrade
PQC migration in TLS is long-lived and mixed-mode. The negotiation must prevent “compatibility” from becoming a downgrade vector:
- explicit policy for acceptable signature algorithms,
- telemetry that reveals negotiated algorithms,
- rollback that preserves safety properties (no “enable PQ in prod” without escape hatch).
Failure modes
- CPU collapse at the leaf: the server spends ~1400 ms of CPU per handshake, dominated by `CertificateVerify` signing; handshake rate collapses; the queue grows; timeouts cascade. (1)
- Client validation overload: upper-layer SLH increases client task-clock materially (validation-skewed regime); low-end clients and IIoT gateways regress first. (1)
- Size-induced latency amplification: as certificate chains grow, slow start, fragmentation, retransmits, and extra handshake flights add RTTs (the paper’s local setup underestimates this). (4)
- Cache illusions: resumption hides cost only for honest traffic. An attacker can force full handshakes by rotating SNI, disabling tickets, or exploiting client diversity.
- Mixed deployment drift: partial rollouts and heterogeneous client capabilities force policy forks; “support both” becomes “accept the weakest under pressure”.
Handshake authentication is attacker-triggerable compute. If you put an expensive signer in the leaf, you have built a CPU amplification primitive into your perimeter.
What to monitor
- Per-handshake crypto time split by phase: chain validation vs `CertificateVerify` signing/verification.
- Negotiated signature algorithm distribution (by SNI, region, client cohort).
- Handshake latency (p50/p95/p99) and timeout/retry rates.
- CPU saturation signatures: run-queue length, softirq pressure, context switch rate.
- Handshake queue depth at the load balancer / accept queue.
- Bytes per handshake and certificate chain lengths (especially with PQ chains).
- Resumption ratio vs full handshake ratio; alert on drops.
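Two of the signals above, the negotiated-algorithm distribution and the resumption ratio, can be derived from per-handshake telemetry. The following is a minimal sketch under stated assumptions: the `HandshakeEvent` record, its field names, and the `max_drop` threshold of 0.2 are hypothetical and not taken from any particular TLS stack.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HandshakeEvent:
    """Hypothetical telemetry record; field names are illustrative."""
    sig_alg: str   # negotiated CertificateVerify algorithm ("" if resumed)
    resumed: bool  # True if the handshake used session resumption


def summarize(events: list[HandshakeEvent]) -> tuple[dict[str, float], float]:
    """Return (share of full handshakes per signature algorithm, resumption ratio)."""
    if not events:
        raise ValueError("no events in window")
    # Only full handshakes carry a CertificateVerify signature.
    full = [e for e in events if not e.resumed]
    counts = Counter(e.sig_alg for e in full)
    shares = {alg: n / len(full) for alg, n in counts.items()} if full else {}
    resumption_ratio = 1 - len(full) / len(events)
    return shares, resumption_ratio


def resumption_drop_alert(current: float, baseline: float, max_drop: float = 0.2) -> bool:
    """Alert when the resumption ratio falls more than max_drop below baseline.

    max_drop is an assumed threshold; tune it per deployment.
    """
    return baseline - current > max_drop


window = [
    HandshakeEvent("", True),
    HandshakeEvent("", True),
    HandshakeEvent("", True),
    HandshakeEvent("ML-DSA", False),
]
shares, ratio = summarize(window)
print(shares, ratio)
```

A sudden drop in the resumption ratio is worth alerting on because it means more traffic is reaching the full-handshake signing path, whether through an attack or a client change.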
Rollback plan
- Feature-flag placement policy: ability to move SLH-DSA out of the interactive leaf without redeploying the entire fleet.
- Dual chain strategy (operationally plausible): keep ML-DSA in the leaf and place SLH-DSA in upper trust layers (root/intermediate), matching the “bounded penalty” regime observed in the paper. (1)
- Client capability gating: enforce per-cohort policies; do not let “one legacy client” dictate global acceptance rules.
- Emergency mode: prefer classical leaf fallback only as a last resort (explicitly logged and time-boxed), because “availability now” often becomes “downgrade forever”.
Rollback has to be faster than the incident. If changing the leaf algorithm requires a CA ceremony and multi-day issuance, you do not have an operational rollback plan.
The Mathematical Anatomy of the Problem
The paper’s core point can be expressed as a simple decomposition: not all certificate signatures are equal in the TLS protocol.
Let a chain be root → intermediate → leaf.
During a TLS 1.3 full handshake, the client does:
- verify the certificate chain signatures (issuer algorithms),
- verify the live handshake signature `CertificateVerify` (leaf algorithm).
The server does:
- generate the live `CertificateVerify` signature (leaf algorithm).
Abstract the per-handshake costs:
- `Sign(A)` = cost to sign using algorithm `A`
- `Verify(A)` = cost to verify using algorithm `A`
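Then, ignoring key exchange and symmetric crypto:

$$C_{\text{server}} \approx \mathrm{Sign}(A_{\text{leaf}})$$

$$C_{\text{client}} \approx \mathrm{Verify}(A_{\text{root}}) + \mathrm{Verify}(A_{\text{int}}) + \mathrm{Verify}(A_{\text{leaf}})$$

where $A_{\text{root}}$, $A_{\text{int}}$, and $A_{\text{leaf}}$ are the signature algorithms of the root, intermediate, and leaf keys: the root key signs the intermediate certificate, the intermediate key signs the leaf certificate, and the leaf key signs `CertificateVerify`.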
That is the placement lever. Putting SLH-DSA at the root or intermediate raises Verify(SLH) costs on the client side. Putting SLH-DSA at the leaf raises Sign(SLH) on the server side — and that hits your perimeter at scale.
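The placement lever can be sketched in a few lines. The example only enumerates which signature operations land on which role for a given placement; it assigns no costs. The `Chain` type and the `handshake_ops` and `online_signers` helpers are hypothetical names introduced for illustration, and the model assumes server-only authentication with a pre-trusted root whose self-signature is not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """Signature algorithm of the key held at each level (hypothetical model)."""
    root: str
    intermediate: str
    leaf: str


def handshake_ops(chain: Chain) -> dict[str, list[tuple[str, str]]]:
    """List per-role signature operations for one full TLS 1.3 handshake.

    Assumes server-only authentication and a pre-trusted root.
    Key exchange and symmetric crypto are ignored.
    """
    server = [("Sign", chain.leaf)]      # live CertificateVerify signature
    client = [
        ("Verify", chain.root),          # root key signed the intermediate cert
        ("Verify", chain.intermediate),  # intermediate key signed the leaf cert
        ("Verify", chain.leaf),          # CertificateVerify
    ]
    return {"server": server, "client": client}


def online_signers(chain: Chain) -> set[str]:
    """Algorithms the server must sign with on every full handshake."""
    return {alg for op, alg in handshake_ops(chain)["server"] if op == "Sign"}


root_slh = Chain(root="SLH-DSA", intermediate="ML-DSA", leaf="ML-DSA")
leaf_slh = Chain(root="ML-DSA", intermediate="ML-DSA", leaf="SLH-DSA")
print(online_signers(root_slh))  # only the leaf algorithm signs online
print(online_signers(leaf_slh))
```

In this model the root and intermediate algorithms only ever appear as client-side verifications, which is why moving SLH-DSA upward changes who pays rather than how much the server signs.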
Evidence from the paper’s strategy matrix
Under a common hybrid key-establishment baseline (x25519 + ML-KEM-768), the paper reports (Campaign B): (1)
| Scenario | Placement (root/int/leaf) | Mean latency | Mean server task-clock | Bytes read |
|---|---|---|---|---|
| `x25519mlkem768__ml_root__ml_int__ml_leaf` | ML/ML/ML | 0.809 ms | 0.562 ms | 16,008 |
| `x25519mlkem768__slh_root__ml_int__ml_leaf` | SLH/ML/ML | 2.133 ms | 0.667 ms | 28,947 |
| `x25519mlkem768__ml_root__ml_int__slh_leaf` | ML/ML/SLH | 1402.486 ms | 1401.169 ms | 27,015 |
Two points matter operationally:
- Upper-layer SLH increases latency without collapsing server CPU. That is a validation-skewed regime.
- Leaf SLH is a server-dominated regime. The system is not “a bit slower”; it is in a different stability class.
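The multipliers quoted throughout this note can be recomputed from the Campaign B rows. This is a small sketch that uses only the rounded values quoted in this note; the `ROWS` dictionary and the `ratios` helper are names introduced here for illustration.

```python
# Campaign B values as quoted in this note:
# (mean latency ms, mean server task-clock ms, bytes read)
ROWS = {
    "ML/ML/ML": (0.809, 0.562, 16_008),
    "SLH/ML/ML": (2.133, 0.667, 28_947),
    "ML/ML/SLH": (1402.486, 1401.169, 27_015),
}


def ratios(scenario: str, baseline: str = "ML/ML/ML") -> tuple[float, ...]:
    """Return (latency, server CPU, bytes) multipliers relative to the baseline."""
    s, b = ROWS[scenario], ROWS[baseline]
    return tuple(x / y for x, y in zip(s, b))


for name in ROWS:
    lat, cpu, size = ratios(name)
    print(f"{name:10s} latency x{lat:9.2f}  server CPU x{cpu:9.2f}  bytes x{size:5.2f}")
```

Computed from the rounded table entries, the leaf-SLH server CPU multiplier comes out near 2493×, marginally below the ≈ 2494× quoted in this note; the difference is presumably rounding in the published figures. Either way, bytes grow by less than 2× while server CPU grows by more than three orders of magnitude.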
Service capacity as an invariant
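If the mean server crypto time per full handshake is $T_{\text{srv}}$ seconds, a single core can sustain at most:

$$\mu_{\max} = \frac{1}{T_{\text{srv}}} \ \text{handshakes/sec}$$

With $T_{\text{srv}} \approx 1.401$ seconds (leaf-SLH server task-clock), that is $\approx 0.71$ handshakes/sec. Even with $c$ effective cores the ceiling is only about $0.71\,c$ handshakes/sec: you are in the tens of handshakes per second regime, below the baseline assumptions of modern TLS termination.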
This is why the paper’s conclusion is correct: the collapse is not explained by chain size, but by where the expensive signer lives. (1)
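The capacity bound is simple enough to check numerically. The sketch below takes the two server task-clock values reported above as inputs; the core counts and the arrival rate of 100 handshakes/sec are illustrative assumptions, and `max_handshake_rate` and `is_stable` are hypothetical helper names.

```python
def max_handshake_rate(t_srv_s: float, cores: int = 1) -> float:
    """Upper bound on full handshakes/sec: cores / per-handshake CPU seconds."""
    if t_srv_s <= 0 or cores < 1:
        raise ValueError("t_srv_s must be positive and cores >= 1")
    return cores / t_srv_s


def is_stable(arrival_rate: float, t_srv_s: float, cores: int) -> bool:
    """Stability check: offered load arrival_rate * t_srv_s must stay below cores."""
    return arrival_rate * t_srv_s < cores


BASELINE_S = 0.562e-3     # all-ML server task-clock, in seconds
LEAF_SLH_S = 1401.169e-3  # leaf-SLH server task-clock, in seconds

for cores in (1, 16, 64):  # core counts are illustrative assumptions
    print(
        f"{cores:3d} cores: "
        f"all-ML <= {max_handshake_rate(BASELINE_S, cores):10.1f}/s, "
        f"leaf-SLH <= {max_handshake_rate(LEAF_SLH_S, cores):6.1f}/s"
    )

# An assumed arrival rate of 100 full handshakes/sec:
print(is_stable(100, BASELINE_S, cores=1))   # a single core copes with all-ML
print(is_stable(100, LEAF_SLH_S, cores=64))  # 64 cores do not cope with leaf-SLH
```

This ignores queueing variance and any non-crypto work, so it is an optimistic ceiling; real capacity would be lower.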
From Measurements to Deployment: the engineering gap
The paper is experimental, but the deployment implication is structural:
- `CertificateVerify` is a live signature over a transcript; you cannot precompute it.
- resumption reduces exposure but does not eliminate attacker-triggerable full handshakes.
- certificate compression reduces bytes, not signing cost. (4)
So the migration strategy must treat the certificate hierarchy as a design surface:
- keep a conservative (hash-based) algorithm in long-lived trust anchors,
- keep a performant algorithm in the interactive leaf,
- and plan for key agility with short-lived leaf certificates.
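Treating the hierarchy as a design surface suggests checking placement mechanically before issuance. The following is a minimal sketch, assuming a per-role allowlist; the `Role` enum, the `ALLOWED` policy table, and `placement_violations` are hypothetical names, and the policy contents simply encode the three rules above rather than any standard.

```python
from enum import Enum


class Role(Enum):
    ROOT = "root"
    INTERMEDIATE = "intermediate"
    LEAF = "leaf"


# Hypothetical policy: which algorithm families may hold the key at each level.
# Only the leaf key signs online (CertificateVerify), so it gets the strict list.
ALLOWED = {
    Role.ROOT: {"ML-DSA", "SLH-DSA"},
    Role.INTERMEDIATE: {"ML-DSA", "SLH-DSA"},
    Role.LEAF: {"ML-DSA"},
}


def placement_violations(chain: dict[Role, str]) -> list[str]:
    """Return human-readable policy violations for a proposed chain."""
    problems = []
    for role in Role:
        alg = chain.get(role)
        if alg is None:
            problems.append(f"{role.value}: no algorithm declared")
        elif alg not in ALLOWED[role]:
            problems.append(f"{role.value}: {alg} not allowed here")
    return problems


good = {Role.ROOT: "SLH-DSA", Role.INTERMEDIATE: "ML-DSA", Role.LEAF: "ML-DSA"}
bad = {Role.ROOT: "ML-DSA", Role.INTERMEDIATE: "ML-DSA", Role.LEAF: "SLH-DSA"}
print(placement_violations(good))
print(placement_violations(bad))
```

A check like this could run in the issuance pipeline so that a leaf-SLH chain is rejected before it ever reaches a front-end.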
Critique (what the paper does not prove)
- Local lab ≠ internet. The paper’s results isolate compute effects, but the real internet will make PQ chain size penalties worse via RTT amplification. This strengthens (not weakens) the “don’t do leaf-SLH” conclusion for front-ends.
- Implementation quality matters. `oqsprovider` and OpenSSL integration are moving targets. But the observed 10^3× gap is too large to dismiss as mere optimization debt. (7)
- Client heterogeneity is under-modeled. Many clients are constrained (mobile, embedded, IIoT gateways). Validation-skewed regimes can still be unacceptable in those populations.
- Mutual TLS will magnify costs. If both sides sign, the placement problem becomes bilateral; you must reason about who signs online and under what rate limits.
Evidence
- Signature Placement in Post-Quantum TLS Certificate Hierarchies (arXiv:2604.06100) (1)
  - Evidence: leaf-SLH produces ≈ 1733× latency and ≈ 2494× server CPU relative to an all-ML baseline, while bytes grow only ≈ 1.69×.
- TLS 1.3 (RFC 8446) (2)
  - Evidence: `CertificateVerify` is a live signature over the handshake transcript, binding the leaf algorithm to the hot path.
- X.509 PKI Profile (RFC 5280) (3)
  - Evidence: certification hierarchies separate offline issuance from online authentication; that separation is the placement lever.
- FIPS 204 (ML-DSA) (5)
  - Evidence: ML-DSA is explicitly standardized for interactive deployments.
- FIPS 205 (SLH-DSA) (6)
  - Evidence: SLH-DSA is conservative but its performance profile requires careful placement.
- `oqs-provider` (7)
  - Evidence: real PQ TLS experiments depend on provider quality and integration details, which are still evolving.
Open questions
- What is the cleanest “PQ root + fast leaf” strategy that preserves long-term trust while keeping hot-path CPU bounded?
- Can we formalize a deployment constraint language: “these algorithms are allowed in offline issuance vs online authentication”?
- How do we make downgrade resistance auditable at scale (per-cohort policy + telemetry + enforcement)?
Checklist
- Leaf algorithm chosen with an explicit per-handshake CPU budget.
- Chain design separates offline issuance from online authentication costs.
- Certificate compression evaluated (bytes) but not used as a proxy for CPU. (4)
- Rate limiting and handshake queuing modeled under adversarial load.
- Negotiated algorithms logged and monitored (per cohort / SNI).
- Rollback plan does not require a multi-day CA ceremony.
Further reading
- RFC 8446: TLS 1.3 (2)
- RFC 5280: X.509 PKI Profile (3)
- RFC 8879: TLS Certificate Compression (4)
- NIST PQC Project (8)
- FIPS 204: ML-DSA (5)
- FIPS 205: SLH-DSA (6)
- Open Quantum Safe: oqs-provider (7)