Monthly research note. Theme: Formal Methods & Verification.
TL;DR
A focused memo on "Verified Crypto Interfaces: Constant-Time Boundaries and Misuse Resistance". Define the model, state the properties, then design the system so those properties remain true under failure and adversaries.
Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.
Key takeaways
- Refinement boundaries prevent spec drift between paper and code.
- Keep models small enough to run in seconds or they will rot.
- Counterexamples are engineering artifacts—minimize them and turn them into tests.
- Make failure modes explicit and observable.
- Write assumptions down; treat them as interfaces.
Why this matters
- The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
- Formal models force you to name assumptions (time, ordering, failure).
- Verification complements testing by exploring adversarial schedules systematically.
- Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
Key questions
- How do you handle state explosion (symmetry, abstraction, bounds)?
- What is the refinement boundary between spec and implementation?
- What is the smallest model that still captures the bug class you fear?
- Which invariants must hold under every interleaving and crash point?
- Which properties belong in the model vs in tests vs in monitoring?
- What is the environment model (adversary actions, scheduling, failures)?
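To make the questions about interleavings and smallest models concrete, here is a minimal sketch of an explicit-state check, assuming a hypothetical model of two processes that each perform a non-atomic increment (read, then write). The state layout, the labels, and the invariant are illustrative assumptions, not taken from a real spec.

```python
from collections import deque

# Hypothetical model: two processes each read a shared counter into a local,
# then write local + 1 back. pc 0 = about to read, 1 = about to write, 2 = done.
# State layout (an assumption for this sketch): (counter, pcs, locals).
INIT = (0, (0, 0), (0, 0))

def next_states(state):
    """Yield (label, successor) for every process step enabled in `state`."""
    counter, pcs, tmp = state
    for p in range(2):
        if pcs[p] == 0:  # read the shared counter into the local
            new_tmp = tmp[:p] + (counter,) + tmp[p + 1:]
            new_pcs = pcs[:p] + (1,) + pcs[p + 1:]
            yield f"p{p}:read", (counter, new_pcs, new_tmp)
        elif pcs[p] == 1:  # write local + 1 back to the shared counter
            new_pcs = pcs[:p] + (2,) + pcs[p + 1:]
            yield f"p{p}:write", (tmp[p] + 1, new_pcs, tmp)

def invariant(state):
    """Once both processes are done, the counter must equal 2."""
    counter, pcs, _ = state
    return pcs != (2, 2) or counter == 2

def check(init):
    """Breadth-first search over all interleavings.

    Returns a shortest trace (list of labels) to a violating state, or None.
    """
    seen = {init}
    queue = deque([(init, [])])
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        for label, succ in next_states(state):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, trace + [label]))
    return None

counterexample = check(INIT)
print(counterexample)
```

The returned trace is the kind of counterexample worth keeping: it is short, it names the schedule, and it can be replayed as a regression input.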
Assumptions
- Most systems have implicit assumptions about timeouts and ordering.
- Concurrency introduces interleavings humans don’t reason about reliably.
- Adversaries choose the worst schedule, not the average one.
- Specifications omit details; implementations invent them. That gap is risk.
Non-goals
- Proving the whole system end-to-end with all implementation details.
- Assuming the spec and the code share the same definitions implicitly.
Parsing is an attacker-controlled interface—validate early and fail fast.
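As an illustration of validating early and failing fast, the sketch below parses a hypothetical frame format (1-byte version, 2-byte big-endian length, payload). The format, `MAX_PAYLOAD`, `SUPPORTED_VERSIONS`, and `ParseError` are assumptions made for this example only.

```python
import struct

class ParseError(ValueError):
    """Raised for any malformed input; callers never see partial results."""

MAX_PAYLOAD = 1024          # hypothetical bound; choose one that fits the protocol
SUPPORTED_VERSIONS = {1}    # hypothetical

def parse_frame(data: bytes) -> tuple[int, bytes]:
    """Parse a hypothetical frame: version (1 byte), length (2 bytes, big-endian), payload."""
    if len(data) < 3:
        raise ParseError("truncated header")
    version, length = struct.unpack(">BH", data[:3])
    if version not in SUPPORTED_VERSIONS:
        raise ParseError("unsupported version")
    if length > MAX_PAYLOAD:
        raise ParseError("payload too large")
    if len(data) != 3 + length:
        # Rejects both short payloads and trailing bytes.
        raise ParseError("length mismatch")
    return version, data[3:]
```

Every check runs before any payload byte is interpreted, and trailing bytes are rejected rather than ignored, so two parsers cannot disagree about where a frame ends.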
Model & invariants
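In temporal logic terms, the common shape is a safety invariant that holds in every reachable state, usually paired with a liveness property stating that progress eventually happens.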
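One way to write that shape in TLA+-style notation is sketched below. The names Init, Next, vars, Inv, P, and Q are placeholders for this illustration, not symbols taken from a specific spec.

```latex
\begin{align*}
\mathit{Spec} &\triangleq \mathit{Init} \land \Box[\mathit{Next}]_{\mathit{vars}} \\
\text{Safety:}\quad \mathit{Spec} &\Rightarrow \Box\,\mathit{Inv} \\
\text{Liveness:}\quad \mathit{Spec} &\Rightarrow \Box(P \Rightarrow \Diamond Q)
\end{align*}
```

The safety line says the invariant holds in every reachable state. The liveness line says every state satisfying P is eventually followed by one satisfying Q, which in practice usually requires adding fairness conditions to the spec.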
Keep the model small enough to run in seconds; large models rot.
Model the scheduler explicitly when concurrency is part of the threat model.
Make the “impossible state” observable: a metric or alert that fires when invariants drift.
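A minimal sketch of making an "impossible state" observable, assuming an in-process counter stands in for a real metrics client. The names `invariant_violations`, `check_invariant`, and `session_ok` are hypothetical.

```python
from collections import Counter

# Hypothetical in-process metric sink; in production this would be a metrics client.
invariant_violations = Counter()

def check_invariant(name: str, holds: bool) -> bool:
    """Record a violation under `name` instead of silently continuing; returns `holds`."""
    if not holds:
        invariant_violations[name] += 1
    return holds

def session_ok(closed: bool, keys_loaded: bool) -> bool:
    """Example invariant: a closed session must never still hold key material."""
    return check_invariant("closed_session_has_no_keys", not (closed and keys_loaded))
```

An alert on any nonzero value of such a counter turns a state the model forbids into something an operator can see.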
Security properties
- Least authority: privileges are scoped by purpose and time.
- Replay resistance: duplicated inputs do not change outcomes.
- Downgrade resistance: negotiation can’t silently weaken security posture.
- Integrity: invalid transitions are rejected (and detectable).
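The replay-resistance and integrity properties above can be sketched as a single entry point that either returns an authenticated payload or raises. The `Verifier` class and its `seal`/`open` methods are hypothetical; the only library calls used are `hmac.new` and `hmac.compare_digest` from the Python standard library. Constant-time behavior in Python is best effort, so treat this as an interface sketch rather than a hardened implementation.

```python
import hashlib
import hmac

class Verifier:
    """Hypothetical misuse-resistant interface: the caller cannot obtain the
    payload without the tag check and the replay check both passing."""

    def __init__(self, key: bytes):
        self._key = key
        self._seen_nonces = set()

    def _tag(self, nonce: bytes, payload: bytes) -> bytes:
        # Length-prefix the nonce so distinct (nonce, payload) pairs
        # never produce the same MAC input.
        msg = len(nonce).to_bytes(2, "big") + nonce + payload
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def seal(self, nonce: bytes, payload: bytes) -> bytes:
        """Return the authentication tag for (nonce, payload)."""
        return self._tag(nonce, payload)

    def open(self, nonce: bytes, payload: bytes, tag: bytes) -> bytes:
        """Return `payload` only if the tag is valid and the nonce is fresh."""
        # compare_digest is designed to avoid leaking where the inputs differ.
        if not hmac.compare_digest(self._tag(nonce, payload), tag):
            raise ValueError("invalid tag")
        # The replay check runs after authentication, so forged messages
        # cannot fill the nonce set.
        if nonce in self._seen_nonces:
            raise ValueError("replayed nonce")
        self._seen_nonces.add(nonce)
        return payload
```

The design choice worth noting is that there is no separate `verify()` returning a boolean that a caller could forget to check.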
Failure modes
- Mixed-version behavior that violates assumptions silently.
- Observability gaps during incidents (missing evidence).
- Recovery paths that only work when nothing is broken.
- Config drift that weakens security posture over time.
Mixed-version deployments create states you never tested—plan for them explicitly.
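One way to plan for mixed versions explicitly is to make the acceptable version range a checked input rather than an implicit assumption. The sketch below assumes a hypothetical `negotiate` function and a hypothetical floor `MIN_VERSION`; it refuses to proceed rather than silently downgrading.

```python
MIN_VERSION = 3  # hypothetical security floor for this sketch

def negotiate(local: set, remote: set, floor: int = MIN_VERSION) -> int:
    """Pick the highest mutually supported version at or above `floor`.

    Raises instead of falling back to a version below the floor.
    """
    common = {v for v in local & remote if v >= floor}
    if not common:
        raise RuntimeError("no acceptable common version; refusing to downgrade")
    return max(common)
```

For example, `negotiate({2, 3, 4}, {3, 4, 5})` selects version 4, while a peer that only offers versions below the floor causes an explicit, observable failure.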
Design sketch
flowchart LR
spec["Spec (TLA+/PlusCal)"] --> mc["Model Check"]
mc --> refine["Refinement / Invariants"]
refine --> impl["Implementation (Rust/Go)"]
impl --> tests["Fuzz / PBT / Differential"]
tests --> spec
Implementation notes
Treat invariants as code: version, review, and test them.
If you can’t explain a timeout outcome, you can’t make retries safe.
// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
Verification strategy
- Refinement tests: compare model traces to implementation traces.
- Proof maintenance: keep models in CI with a time budget.
- Property-based tests derived from invariants.
- Differential tests against other implementations/specs.
- Runtime assertions for invariants that are cheap to check.
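A refinement test of the kind listed above can be sketched as replaying a model trace against the implementation and comparing states through an abstraction function. The `Store` class, the `abstract` function, and `MODEL_TRACE` are hypothetical stand-ins; a real trace would come from the model checker or an executable spec.

```python
class Store:
    """Hypothetical implementation under test: deletes leave tombstones."""

    def __init__(self):
        self._cells = {}  # key -> (value, deleted_flag)

    def put(self, key, value):
        self._cells[key] = (value, False)

    def delete(self, key):
        if key in self._cells:
            self._cells[key] = (self._cells[key][0], True)

def abstract(store):
    """Abstraction function: project implementation state onto model state."""
    return {k: v for k, (v, dead) in store._cells.items() if not dead}

# Hypothetical model trace: (action, arguments, expected model state after the step).
MODEL_TRACE = [
    ("put", ("a", 1), {"a": 1}),
    ("put", ("b", 2), {"a": 1, "b": 2}),
    ("delete", ("a",), {"b": 2}),
    ("put", ("a", 3), {"a": 3, "b": 2}),
]

def replay(trace):
    """Apply each model action to a fresh Store.

    Returns the index of the first step where the abstracted implementation
    state differs from the model state, or None if the whole trace matches.
    """
    store = Store()
    for i, (action, args, expected) in enumerate(trace):
        getattr(store, action)(*args)
        if abstract(store) != expected:
            return i
    return None
```

The implementation is free to keep details the model omits (here, tombstones) as long as the abstraction function maps every reachable implementation state to the state the model expects.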
Operational notes
- Version properties and invariants like code; review changes carefully.
- Run the model checker in CI with explicit timeouts and bounds.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Use models to evaluate protocol upgrades before shipping.
- Keep a library of “known hard schedules” from past failures.
Make degraded modes explicit: fail closed vs fail open is a policy choice.
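A small sketch of making that policy choice explicit at the call site, assuming a hypothetical `authorize` wrapper and `OnFailure` enum. The point is that the behavior when the check itself fails is declared, not accidental.

```python
from enum import Enum

class OnFailure(Enum):
    FAIL_CLOSED = "closed"  # deny when the check cannot be completed
    FAIL_OPEN = "open"      # allow when the check cannot be completed

def authorize(check, policy: OnFailure) -> bool:
    """Run `check()`; if the check itself raises, apply the declared policy.

    A check that completes and returns a falsy value always denies,
    regardless of policy.
    """
    try:
        return bool(check())
    except Exception:
        return policy is OnFailure.FAIL_OPEN
```

For security-relevant checks, `FAIL_CLOSED` is the usual choice; the sketch only ensures that whichever choice is made is visible in code review.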
What to monitor
- Rollback events and the conditions that triggered them.
- Invariant violation rate (should be ~0).
- Retry/timeout rates by endpoint and client cohort.
- Admission-control / rate-limit rejections (by reason).
- Error budget burn + tail latency under load.
Rollback plan
- Define an explicit rollback trigger (metrics + thresholds).
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Use canaries and staged rollout; stop early when signals degrade.
- Keep dual-write / dual-verify windows where appropriate.
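An explicit rollback trigger can be as small as a pure function over a few metrics. The threshold values and the names `Thresholds` and `rollback_reasons` below are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; tune them to the service's error budget.
    max_error_rate: float = 0.01
    max_p99_latency_ms: float = 500.0
    max_invariant_violations: int = 0

def rollback_reasons(error_rate, p99_latency_ms, invariant_violations,
                     thresholds=Thresholds()):
    """Return the list of breached thresholds; an empty list means continue the rollout."""
    reasons = []
    if error_rate > thresholds.max_error_rate:
        reasons.append("error_rate")
    if p99_latency_ms > thresholds.max_p99_latency_ms:
        reasons.append("p99_latency_ms")
    if invariant_violations > thresholds.max_invariant_violations:
        reasons.append("invariant_violations")
    return reasons
```

Returning the reasons rather than a bare boolean preserves evidence about why a rollback fired, which supports the audit-trail item in the list above.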
Evidence
- Designing Data-Intensive Applications (Kleppmann) (1): The systems-engineering baseline for correctness, replication, and failure.
  - Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
- Learn TLA+ (2): Practical workflow and examples.
  - Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
Open questions
- Which properties are you currently assuming but not testing or proving?
- Which invariants are cheap enough to monitor in production?
- How will you keep models aligned during rapid iteration?
- What is the smallest model that reproduces your worst incident class?
Checklist
- Safety properties stated as invariants.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Failure modes enumerated with mitigations.
- Assumptions listed and reviewed.
- Telemetry captures correctness signals.
- Rollback plan rehearsed and automated.
Further reading
- Learn TLA+ — Practical workflow and examples.
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.