The Leaf Is the Hot Path: Signature Placement in Post-Quantum TLS (ML-DSA vs SLH-DSA)

Paper-driven research note. Theme: PQC migration that fails at the certificate boundary.

TL;DR

If you read post-quantum TLS plans as “replace ECDSA with some PQ signature”, you will build the wrong system.

In TLS 1.3, the end-entity (leaf) certificate key is not just an identity anchor — it is the key that signs the live handshake transcript (CertificateVerify). That single design fact turns “which signature algorithm lives in the leaf” into a hot-path engineering decision with direct DoS consequences.

Delgado Jiménez (arXiv:2604.06100) runs a clean local experiment matrix on OpenSSL 3 + oqsprovider, varying where ML-DSA and SLH-DSA appear in the certificate hierarchy. The result is a discontinuity you cannot hand-wave away:

a fully-ML baseline is ~0.809 ms mean handshake latency, ~0.562 ms server task-clock per handshake
moving SLH-DSA into the server leaf produces ~1402 ms mean latency and ~1401 ms server task-clock per handshake (≈ 1733× the baseline latency, ≈ 2494× the baseline server CPU)
the bytes transferred only grow ~1.69×, so the collapse is not “just bigger certificates” — it is online signing cost. (1)

Key insight

In PQ TLS, “placement” is a performance and security boundary. The leaf algorithm determines online server signing cost; upper-layer algorithms mostly shift validation work to clients. This is cost concentration, not algorithm substitution.

Key takeaways

Leaf SLH-DSA is an online CPU collapse. In the paper’s matrix, leaf-SLH jumps from ~0.8 ms to ~1400 ms mean latency (≈ 1733×). (1)
Upper-layer SLH-DSA is penalized but plausible. Root-SLH / leaf-ML increases latency to ~2.133 ms (≈ 2.64×) while server task-clock rises only ~1.19×. (1)
Transport size is a second-order effect in the heavy regime. Leaf-SLH reads ~27,015 bytes vs ~16,008 baseline (≈ 1.69×) while server CPU rises ≈ 2494×. (1)
Client/server work distribution changes by placement. Upper-layer SLH shifts active work toward client validation; leaf-SLH becomes overwhelmingly server-bound. (1)
PQC rollout must be evaluated as PKI+TLS design. Chain exposure, depth, caching, compression, and resumption interact with cryptographic cost in ways primitive benchmarks cannot predict. (2) (3) (4)

Introduction (pragmatic abstract: the infrastructure problem)

The real question your pager asks is not “is ML-DSA post-quantum secure?”

It is: “Can my TLS front-end authenticate at peak load without becoming a self-inflicted CPU DoS?”

In classical TLS deployments, RSA/ECDSA/Ed25519 signing costs are low enough that we tend to blame handshakes on network RTT, certificate chain size, or cache misses. Post-quantum signatures break that mental model because they are not a single family with smooth tradeoffs:

ML-DSA (FIPS 204) is lattice-based and engineered to be deployable in interactive authentication. (5)
SLH-DSA (FIPS 205) is stateless hash-based and conservative, but its performance profile is fundamentally different. (6)

The paper’s claim is not theoretical: it is deployment-shaped.

“Post-quantum migration in TLS 1.3 should not be understood as a flat substitution problem … [it] depends on where it appears in the certification hierarchy … and how cryptographic burden is distributed across client and server roles.” (1)

That is the right framing. TLS is not “a signature benchmark”; it is an authenticated key-establishment protocol with roles, state, and adversaries.

Assumptions

TLS 1.3 full handshakes with certificate-based server authentication. (2)
X.509 certification hierarchies (root → intermediate → leaf). (3)
Threat model includes adversarial handshakes (flooding, forced full handshakes, cache bypass). Availability is a security property.
I treat the paper’s lab measurements as a signal, not as a universal constant: implementation quality and hardware matter, but order-of-magnitude discontinuities are not noise.
Focus is on server-authentication (no mutual TLS), because that is where “internet scale” lives.

Non-goals

Re-proving TLS 1.3 security. This is about operational correctness under PQ parameter sets.
Modeling global internet pathologies (loss, reordering, congestion collapse). The paper’s lab is local; I’ll critique that explicitly.
Claiming “SLH-DSA is unusable”. The claim is narrower: SLH-DSA in the interactive leaf is operationally toxic for front-ends in the measured regimes.

Security properties

TLS security is not only confidentiality/authenticity. Under active adversaries, availability is a cryptographic boundary because authentication work is attacker-triggerable.

S1 — Authentication correctness

The server must prove possession of the private key corresponding to the presented leaf certificate during the handshake:

$$\mathrm{Verify}(\mathrm{cert\_chain}) \wedge \mathrm{VerifySig}_{A_{\mathrm{leaf}}}(\mathrm{transcript}, \sigma_{\mathrm{CV}})$$

where $A_{\mathrm{leaf}}$ is the leaf signature algorithm and $\sigma_{\mathrm{CV}}$ is the CertificateVerify signature. (2)

S2 — Bounded attacker-triggerable work (DoS-resilience invariant)

Let $C_{\mathrm{srv}}$ be the server CPU time spent in cryptographic operations per full handshake. A front-end that must survive adversarial connection rates needs:

$$\forall \text{ handshakes } h:\;\; C_{\mathrm{srv}}(h) \le C_{\max}$$

and the system-level stability constraint (multi-core queueing approximation):

$$\rho \equiv \frac{\lambda \cdot \mathbb{E}[C_{\mathrm{srv}}]}{k} < 1,$$

where $\lambda$ is handshake arrival rate and $k$ is effective parallelism (cores dedicated to handshake crypto).

This is the invariant leaf-SLH breaks: it moves $\mathbb{E}[C_{\mathrm{srv}}]$ from sub-millisecond to ~1.4 seconds in the paper’s measurements. (1)

Invariant

Hot-path crypto budget: the signature algorithm used for CertificateVerify must keep server per-handshake CPU under a fixed bound; otherwise availability collapses under adversarial handshakes.
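The stability condition can be checked numerically. The following is a minimal sketch that uses the paper's mean server task-clock values as a stand-in for $\mathbb{E}[C_{\mathrm{srv}}]$; the arrival rate and core count are illustrative assumptions, not figures from the paper.

```python
def utilization(arrival_rate_hz: float, mean_cpu_s: float, cores: int) -> float:
    """rho = lambda * E[C_srv] / k; the queue is stable only if rho < 1."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return arrival_rate_hz * mean_cpu_s / cores


# Mean server task-clock per full handshake from the paper's matrix, in seconds.
ML_LEAF_S = 0.562e-3
SLH_LEAF_S = 1401.169e-3

# Illustrative load (assumption): 500 full handshakes/s on 32 cores.
RATE_HZ, CORES = 500.0, 32

rho_ml = utilization(RATE_HZ, ML_LEAF_S, CORES)
rho_slh = utilization(RATE_HZ, SLH_LEAF_S, CORES)
print(f"ML leaf:  rho = {rho_ml:.4f} (stable: {rho_ml < 1})")
print(f"SLH leaf: rho = {rho_slh:.1f} (stable: {rho_slh < 1})")
```

Under these assumed numbers the ML leaf leaves the cores almost idle, while the SLH leaf would need more than twenty times the available CPU just to keep up.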

S3 — Cryptographic agility without silent downgrade

PQC migration in TLS is long-lived and mixed-mode. The negotiation must prevent “compatibility” from becoming a downgrade vector:

explicit policy for acceptable signature algorithms,
telemetry that reveals negotiated algorithms,
rollback that preserves safety properties (no “enable PQ in prod” without escape hatch).
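One way to make the first two points concrete is an explicit allow-list that rejects rather than falls back, with a counter recording every outcome. This is a sketch only: the algorithm labels, the `ALLOWED_LEAF_SIGALGS` tuple, and the `select_sigalg` helper are hypothetical names, not the API of any TLS library.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical policy: leaf signature algorithms this deployment accepts,
# in server preference order. The labels are illustrative.
ALLOWED_LEAF_SIGALGS = ("mldsa65", "mldsa44")

# Telemetry: how often each algorithm (or a rejection) was the outcome.
negotiated = Counter()


def select_sigalg(client_offered: Sequence[str]) -> Optional[str]:
    """Pick the first policy-allowed algorithm the client offers.

    Returns None (reject the handshake) instead of falling back to an
    algorithm outside the allow-list, so compatibility pressure cannot
    silently widen the accepted set.
    """
    for alg in ALLOWED_LEAF_SIGALGS:
        if alg in client_offered:
            negotiated[alg] += 1
            return alg
    negotiated["rejected"] += 1
    return None
```

A client offering only a classical algorithm is counted under `"rejected"`, which gives the telemetry needed to decide on a per-cohort exception deliberately rather than by default.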

Failure modes

CPU collapse at the leaf: server spends ~1400 ms signing per handshake; handshake rate collapses; queue grows; timeouts cascade. (1)
Client validation overload: upper-layer SLH increases client task-clock materially (validation-skewed regime); low-end clients and IIoT gateways regress first. (1)
Size-induced latency amplification: certificate chains grow; slow-start, fragmentation, retransmits, and extra flights add RTTs (the paper’s local setup underestimates this). (4)
Cache illusions: resumption hides cost only for honest traffic. An attacker forces full handshakes simply by never presenting a session ticket; rotating SNI or mimicking diverse clients defeats per-name and per-cohort caching.
Mixed deployment drift: partial rollouts and heterogeneous client capabilities force policy forks; “support both” becomes “accept the weakest under pressure”.

Attack surface

Handshake authentication is attacker-triggerable compute. If you put an expensive signer in the leaf, you have built a CPU amplification primitive into your perimeter.

What to monitor

Per-handshake crypto time split by phase: chain validation vs CertificateVerify signing/verification.
Negotiated signature algorithm distribution (by SNI, region, client cohort).
Handshake latency (p50/p95/p99) and timeout/retry rates.
CPU saturation signals: run-queue length, softirq pressure, context switch rate.
Handshake queue depth at the load balancer / accept queue.
Bytes per handshake and certificate chain lengths (especially with PQ chains).
Resumption ratio vs full handshake ratio; alert on drops.
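Two of these signals are simple enough to sketch directly: latency percentiles and a resumption-ratio alert. The helper names and the 0.5 alert threshold below are assumptions chosen for illustration; a real threshold should come from the service's own baseline.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50/p95/p99 of handshake latency samples (milliseconds)."""
    # quantiles(n=100) yields 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def resumption_alert(resumed: int, full: int, min_ratio: float = 0.5) -> bool:
    """True when the resumed share of handshakes drops below min_ratio."""
    total = resumed + full
    if total == 0:
        return False  # no traffic in the window, nothing to alert on
    return resumed / total < min_ratio
```

A sudden drop in the resumed share is worth alerting on by itself, because it means more traffic is reaching the expensive full-handshake path regardless of the cause.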

Rollback plan

Feature-flag placement policy: ability to move SLH-DSA out of the interactive leaf without redeploying the entire fleet.
Dual chain strategy (operationally plausible): keep ML-DSA in the leaf and place SLH-DSA in upper trust layers (root/intermediate), matching the “bounded penalty” regime observed in the paper. (1)
Client capability gating: enforce per-cohort policies; do not let “one legacy client” dictate global acceptance rules.
Emergency mode: prefer classical leaf fallback only as a last resort (explicitly logged and time-boxed), because “availability now” often becomes “downgrade forever”.

Operational note

Rollback has to be faster than the incident. If changing the leaf algorithm requires a CA ceremony and multi-day issuance, you do not have an operational rollback plan.

The Mathematical Anatomy of the Problem

The paper’s core point can be expressed as a simple decomposition: not all certificate signatures are equal in the TLS protocol.

Let a chain be root → intermediate → leaf.

During a TLS 1.3 full handshake, the client does:

verify the certificate chain signatures (issuer algorithms),
verify the live handshake signature CertificateVerify (leaf algorithm).

The server does:

generate the live CertificateVerify signature (leaf algorithm).

Abstract the per-handshake costs:

Sign(A) = cost to sign using algorithm A
Verify(A) = cost to verify using algorithm A

Then, ignoring key exchange and symmetric crypto:

$$C_{\mathrm{srv}} \approx \mathrm{Sign}(A_{\mathrm{leaf}})$$

$$C_{\mathrm{cli}} \approx \mathrm{Verify}(A_{\mathrm{leaf}}) + \mathrm{Verify}(A_{\mathrm{int}}) + \mathrm{Verify}(A_{\mathrm{root}})$$

That is the placement lever. Putting SLH-DSA at the root or intermediate raises Verify(SLH) costs on the client side. Putting SLH-DSA at the leaf raises Sign(SLH) on the server side — and that hits your perimeter at scale.
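The decomposition can be turned into a toy cost model. The sketch below uses made-up relative costs in arbitrary units (an assumption, not measured values); only the ordering matters, namely that SLH-DSA signing is far more expensive than everything else.

```python
# Illustrative relative costs (assumption, arbitrary units, not measurements).
SIGN = {"ML": 1.0, "SLH": 2000.0}
VERIFY = {"ML": 0.5, "SLH": 5.0}


def handshake_cost(root: str, intermediate: str, leaf: str) -> tuple[float, float]:
    """Return (C_srv, C_cli) for one full handshake under the model above."""
    c_srv = SIGN[leaf]  # the server only signs CertificateVerify
    c_cli = VERIFY[leaf] + VERIFY[intermediate] + VERIFY[root]
    return c_srv, c_cli


for chain in [("ML", "ML", "ML"), ("SLH", "ML", "ML"), ("ML", "ML", "SLH")]:
    c_srv, c_cli = handshake_cost(*chain)
    print("/".join(chain), f"server={c_srv} client={c_cli}")
```

In this model a root-level SLH-DSA key leaves the server term untouched and raises only the client term, while a leaf-level SLH-DSA key is the only placement that changes what the server pays.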

Evidence from the paper’s strategy matrix

Under a common hybrid key-establishment baseline (x25519 + ML-KEM-768), the paper reports (Campaign B): (1)

| Scenario | Placement (root/int/leaf) | Mean latency (ms) | Mean server task-clock (ms) | Bytes read |
| --- | --- | --- | --- | --- |
| `x25519mlkem768__ml_root__ml_int__ml_leaf` | ML/ML/ML | 0.809 | 0.562 | 16,008 |
| `x25519mlkem768__slh_root__ml_int__ml_leaf` | SLH/ML/ML | 2.133 | 0.667 | 28,947 |
| `x25519mlkem768__ml_root__ml_int__slh_leaf` | ML/ML/SLH | 1402.486 | 1401.169 | 27,015 |

Two points matter operationally:

Upper-layer SLH increases latency without collapsing server CPU. That is a validation-skewed regime.
Leaf SLH is a server-dominated regime. The system is not “a bit slower”; it is in a different stability class.
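The ratios quoted throughout this note can be recomputed from the three rows of the matrix. A short sketch, using only the rounded values as reported above:

```python
# (mean latency ms, mean server task-clock ms, bytes read) from Campaign B.
ROWS = {
    "ML/ML/ML": (0.809, 0.562, 16_008),
    "SLH/ML/ML": (2.133, 0.667, 28_947),
    "ML/ML/SLH": (1402.486, 1401.169, 27_015),
}

base = ROWS["ML/ML/ML"]
ratios = {
    name: tuple(value / ref for value, ref in zip(row, base))
    for name, row in ROWS.items()
}
for name, (lat, cpu, size) in ratios.items():
    print(f"{name}: latency x{lat:.2f}, server CPU x{cpu:.2f}, bytes x{size:.2f}")
```

Because the inputs are rounded to three decimals, the computed ratios agree with the quoted figures only to within that rounding.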

Service capacity as an invariant

If the mean server crypto time per full handshake is $S$ seconds, a single core can sustain at most:

$$\mu_{\text{core}} \le \frac{1}{S}\;\text{handshakes/sec}.$$

With $S \approx 1.401$ seconds (leaf-SLH server task-clock), that is $\mu_{\text{core}} \approx 0.71$ handshakes/sec. Even with $k=32$ effective cores, you are in the tens of handshakes per second regime — below the baseline assumptions of modern TLS termination.
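Worked out explicitly, and assuming perfect scaling across cores with no other per-handshake work (an optimistic simplification; $\mu_{\text{fleet}}$ is a label introduced here, not the paper's notation):

$$\mu_{\text{fleet}} \le \frac{k}{S} = \frac{32}{1.401\ \text{s}} \approx 22.8\ \text{handshakes/sec (leaf-SLH)}, \qquad \frac{32}{0.000562\ \text{s}} \approx 56{,}900\ \text{handshakes/sec (all-ML baseline)}.$$

The same 32 cores therefore lose roughly three orders of magnitude of full-handshake capacity when the leaf algorithm changes.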

This is why the paper’s conclusion is correct: the collapse is not explained by chain size, but by where the expensive signer lives. (1)

From Measurements to Deployment: the engineering gap

The paper is experimental, but the deployment implication is structural:

CertificateVerify is a live signature over a transcript; you cannot precompute it.
resumption reduces exposure but does not eliminate attacker-triggerable full handshakes.
certificate compression reduces bytes, not signing cost. (4)

So the migration strategy must treat the certificate hierarchy as a design surface:

keep a conservative (hash-based) algorithm in long-lived trust anchors,
keep a performant algorithm in the interactive leaf,
and plan for key agility with short-lived leaf certificates.
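The first two rules above can be expressed as a machine-checkable placement policy that runs before a chain is deployed. A minimal sketch; the `PLACEMENT_POLICY` table, the level names, and the family labels are hypothetical and would need to map onto whatever the issuance pipeline actually records.

```python
# Hypothetical placement policy: which algorithm families may appear at each
# level of the hierarchy. SLH is kept out of the online (leaf) position.
PLACEMENT_POLICY = {
    "root": {"ML", "SLH"},
    "intermediate": {"ML", "SLH"},
    "leaf": {"ML"},
}


def placement_violations(chain: dict[str, str]) -> list[str]:
    """Return one message per level whose algorithm the policy forbids."""
    return [
        f"{level}: {alg} not allowed"
        for level, alg in chain.items()
        if alg not in PLACEMENT_POLICY[level]
    ]


# Usage: an SLH root passes, an SLH leaf is flagged.
print(placement_violations({"root": "SLH", "intermediate": "ML", "leaf": "ML"}))
print(placement_violations({"root": "ML", "intermediate": "ML", "leaf": "SLH"}))
```

Keeping the policy as data rather than code means the leaf rule can be tightened or relaxed per cohort without touching the checker.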

Critique (what the paper does not prove)

Local lab ≠ internet. The paper’s results isolate compute effects, but the real internet will make PQ chain size penalties worse via RTT amplification. This strengthens (not weakens) the “don’t do leaf-SLH” conclusion for front-ends.
Implementation quality matters. oqsprovider and OpenSSL integration are moving targets. But the observed 10^3× gap is too large to dismiss as mere optimization debt. (7)
Client heterogeneity is under-modeled. Many clients are constrained (mobile, embedded, IIoT gateways). Validation-skewed regimes can still be unacceptable in those populations.
Mutual TLS will magnify costs. If both sides sign, the placement problem becomes bilateral; you must reason about who signs online and under what rate limits.

Evidence

Signature Placement in Post-Quantum TLS Certificate Hierarchies (arXiv:2604.06100) (1)
- Evidence: leaf-SLH produces ≈ 1733× latency and ≈ 2494× server CPU relative to an all-ML baseline, while bytes grow only ≈ 1.69×.
TLS 1.3 (RFC 8446) (2)
- Evidence: CertificateVerify is a live signature over the handshake transcript, binding the leaf algorithm to the hot path.
X.509 PKI Profile (RFC 5280) (3)
- Evidence: certification hierarchies separate offline issuance from online authentication; that separation is the placement lever.
FIPS 204 (ML-DSA) (5)
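- Evidence: FIPS 204 standardizes ML-DSA as a general-purpose signature scheme; its sub-millisecond server cost in the all-ML baseline is what makes it viable in the interactive leaf.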
FIPS 205 (SLH-DSA) (6)
- Evidence: SLH-DSA is conservative but its performance profile requires careful placement.
oqs-provider (7)
- Evidence: real PQ TLS experiments depend on provider quality and integration details, which are still evolving.

Open questions

What is the cleanest “PQ root + fast leaf” strategy that preserves long-term trust while keeping hot-path CPU bounded?
Can we formalize a deployment constraint language: “these algorithms are allowed in offline issuance vs online authentication”?
How do we make downgrade resistance auditable at scale (per-cohort policy + telemetry + enforcement)?

Checklist

Leaf algorithm chosen with an explicit per-handshake CPU budget.
Chain design separates offline issuance from online authentication costs.
Certificate compression evaluated (bytes) but not used as a proxy for CPU. (4)
Rate limiting and handshake queuing modeled under adversarial load.
Negotiated algorithms logged and monitored (per cohort / SNI).
Rollback plan does not require a multi-day CA ceremony.
