Monthly research note. Theme: Formal Methods & Verification.
TL;DR
Treat designing APIs for correctness (types, lifetimes, and capabilities) as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
Key takeaways
- Keep models small enough to run in seconds or they will rot.
- Refinement boundaries prevent spec drift between paper and code.
- Write properties in plain language next to the formal statement.
- Write assumptions down; treat them as interfaces.
- Measure correctness signals, not only latency/throughput.
Why this matters
- Formal models force you to name assumptions (time, ordering, failure).
- Refinement boundaries prevent “spec drift” between paper and code.
- Counterexamples are better than intuition—they are executable bug reports.
- The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
Key questions
- How do you handle state explosion (symmetry, abstraction, bounds)?
- How do you ensure proofs stay valid through refactors and upgrades?
- What is the environment model (adversary actions, scheduling, failures)?
- Which invariants must hold under every interleaving and crash point?
- Which properties belong in the model vs in tests vs in monitoring?
- What is the refinement boundary between spec and implementation?
Assumptions
- Adversaries choose the worst schedule, not the average one.
- Most systems have implicit assumptions about timeouts and ordering.
- Concurrency introduces interleavings humans don’t reason about reliably.
- Specifications omit details; implementations invent them. That gap is risk.
Non-goals
- Assuming the spec and the code share the same definitions implicitly.
- Writing models that can’t produce counterexamples quickly.
Parsing is an attacker-controlled interface—validate early and fail fast.
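A minimal sketch of "validate early and fail fast", assuming a made-up wire format `account:amount,amount,...`. The names `parse_transfer`, `Transfer`, `RejectedInput` and both caps are hypothetical and chosen only for illustration. The point is the ordering: the cheap size check runs before decoding, and the split is bounded so an oversized list is refused without building it.

```python
from dataclasses import dataclass

MAX_BYTES = 4096  # hypothetical cap on raw request size
MAX_ITEMS = 64    # hypothetical cap on the number of amounts


class RejectedInput(ValueError):
    """Raised when input is refused before any expensive work is done."""


@dataclass(frozen=True)
class Transfer:
    """A parsed request: holding one means validation already happened."""
    account: str
    amounts: tuple


def parse_transfer(raw: bytes) -> Transfer:
    # Cap cost first: the length check is O(1) and precedes decoding.
    if len(raw) > MAX_BYTES:
        raise RejectedInput("request too large")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise RejectedInput("non-ASCII input") from None
    account, sep, rest = text.partition(":")
    if not sep or not account.isalnum():
        raise RejectedInput("malformed account")
    # Bounded split: produces at most MAX_ITEMS + 1 fields, however long the input.
    fields = rest.split(",", MAX_ITEMS)
    if len(fields) > MAX_ITEMS:
        raise RejectedInput("too many items")
    amounts = []
    for field in fields:
        if not field.isdigit() or len(field) > 9:
            raise RejectedInput("bad amount")
        amounts.append(int(field))
    return Transfer(account, tuple(amounts))
```

Downstream code that accepts only `Transfer` values never sees unvalidated bytes, which moves the check from convention into the type.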
Model & invariants
Refinement is a simulation relation between spec and impl: every implementation step must map to an allowed spec step or to a stutter (no visible change).
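A small sketch of checking that relation on a recorded trace, under stated assumptions: the spec is a counter that may only stutter or grow by one, the implementation state is a hypothetical `(committed, staged)` pair, and `abstraction` is the mapping from implementation state to spec state. None of these names come from a real tool; a full refinement proof covers all behaviours, while this only checks the traces you feed it.

```python
def spec_step_allowed(before: int, after: int) -> bool:
    """Spec: the counter either stutters or increases by exactly one."""
    return after == before or after == before + 1


def abstraction(impl_state: tuple) -> int:
    """Map an implementation state (committed, staged) to the spec state."""
    committed, _staged = impl_state
    return committed


def check_refinement(impl_trace: list):
    """Return the index of the first impl step with no matching spec step, else None."""
    for i in range(len(impl_trace) - 1):
        before = abstraction(impl_trace[i])
        after = abstraction(impl_trace[i + 1])
        if not spec_step_allowed(before, after):
            return i
    return None


# Staging is invisible to the spec (a stutter); committing is one spec step.
good_trace = [(0, False), (0, True), (1, False), (1, True), (2, False)]
# A double-apply: the committed value jumps by two in a single impl step.
bad_trace = [(0, False), (0, True), (2, False)]
```

Here `check_refinement(good_trace)` returns `None` and `check_refinement(bad_trace)` returns `1`, the index of the offending step.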
Keep the model small enough to run in seconds; large models rot.
Treat counterexamples as regression tests: reduce, encode, and replay.
Monotonicity beats timestamps: counters and epochs survive clock skew.
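To make the epoch point concrete, here is a minimal fencing sketch. `FencedStore` is a hypothetical name, and the sketch assumes some other component hands out strictly increasing epochs to each new leader. The store compares integers only, so a stale writer is rejected no matter what its wall clock says.

```python
class FencedStore:
    """Accepts a write only if its epoch is not older than the newest epoch seen."""

    def __init__(self):
        self.highest_epoch = 0
        self.value = None

    def write(self, epoch: int, value) -> bool:
        if epoch < self.highest_epoch:
            # Stale writer: rejected by comparing counters, never clocks.
            return False
        self.highest_epoch = epoch
        self.value = value
        return True


store = FencedStore()
accepted_new = store.write(epoch=2, value="from new leader")
accepted_old = store.write(epoch=1, value="from old leader")  # arrives late
```

After this runs, `accepted_new` is `True`, `accepted_old` is `False`, and the stored value is the new leader's. A last-writer-wins rule keyed on timestamps could have picked the old leader if its clock ran ahead.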
Security properties
- Authenticity: actions are bound to identity and purpose.
- Integrity: invalid transitions are rejected (and detectable).
- Replay resistance: duplicated inputs do not change outcomes.
- Least authority: privileges are scoped by purpose and time.
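The four properties above can be sketched in one small check. Everything here is an assumption for illustration: the `Capability` layout, the `issue` helper, the `Verifier` class, and the use of a logical clock passed in as `now`. The MAC uses the standard library's `hmac` with SHA-256; key management and nonce expiry are out of scope.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    subject: str
    purpose: str
    expires_at: int  # logical time, an assumption of this sketch
    nonce: str
    tag: bytes


def _mac(key: bytes, subject: str, purpose: str, expires_at: int, nonce: str) -> bytes:
    # repr() of a tuple gives an unambiguous encoding of the fields.
    msg = repr((subject, purpose, expires_at, nonce)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def issue(key: bytes, subject: str, purpose: str, expires_at: int, nonce: str) -> Capability:
    tag = _mac(key, subject, purpose, expires_at, nonce)
    return Capability(subject, purpose, expires_at, nonce, tag)


class Verifier:
    def __init__(self, key: bytes):
        self._key = key
        self._seen = set()  # unbounded here; a real store would expire old nonces

    def authorize(self, cap: Capability, purpose: str, now: int) -> bool:
        expected = _mac(self._key, cap.subject, cap.purpose, cap.expires_at, cap.nonce)
        if not hmac.compare_digest(expected, cap.tag):
            return False  # authenticity and integrity: fields were forged or altered
        if cap.purpose != purpose or now >= cap.expires_at:
            return False  # least authority: wrong purpose or past its time scope
        if cap.nonce in self._seen:
            return False  # replay resistance: a duplicate changes nothing
        self._seen.add(cap.nonce)
        return True
```

Binding the purpose and expiry into the MAC means a holder cannot widen its own authority by editing those fields.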
Failure modes
- Config drift that weakens security posture over time.
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
- Timeout ambiguity causing double-apply or partial state transitions.
- Mixed-version behavior that violates assumptions silently.
Sampling hides the rare schedule that breaks your invariants.
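One alternative to sampling is to enumerate every schedule of a small model. The sketch below assumes two hypothetical threads that each perform a non-atomic increment (a read, then a write) on a shared counter, and checks the invariant "the final value is 2" under all interleavings. The helper names are illustrative.

```python
def interleavings(a: list, b: list):
    """Yield every merge of a and b that preserves the order within each list."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule: list) -> int:
    """Execute a schedule of (thread, op) steps and return the final counter."""
    shared = 0
    local = {}
    for thread, op in schedule:
        if op == "read":
            local[thread] = shared
        else:  # "write": store the value read earlier, plus one
            shared = local[thread] + 1
    return shared


T1 = [("t1", "read"), ("t1", "write")]
T2 = [("t2", "read"), ("t2", "write")]

schedules = list(interleavings(T1, T2))
violations = [s for s in schedules if run(s) != 2]
```

There are 6 schedules and 4 of them lose an update. Each entry in `violations` is a concrete schedule that can be stored and replayed as a regression input.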
Design sketch
flowchart LR
spec["Spec (TLA+/PlusCal)"] --> mc["Model Check"]
mc --> refine["Refinement / Invariants"]
refine --> impl["Implementation (Rust/Go)"]
impl --> tests["Fuzz / PBT / Differential"]
tests --> spec
Implementation notes
Make the model executable enough to generate counterexamples quickly.
Bound work per request: parse, validate, and cap cost before you allocate heavy resources.
// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
Verification strategy
- Refinement tests: compare model traces to implementation traces.
- Property-based tests derived from invariants.
- Model checking bounded versions of the core protocol.
- Differential tests against other implementations/specs.
- Runtime assertions for invariants that are cheap to check.
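As one way to combine the property-based and differential items above, the sketch below drives an implementation and a reference model with the same seeded random operations and returns the first operation sequence on which they disagree. `Ledger`, `model_balance`, and `find_counterexample` are hypothetical names, and only the standard library's `random` is used; a dedicated property-testing library would add shrinking.

```python
import random


class Ledger:
    """Implementation under test: applies each request ID at most once."""

    def __init__(self):
        self.balance = 0
        self._applied = set()

    def apply(self, request_id: int, amount: int) -> None:
        if request_id in self._applied:
            return  # duplicate delivery: no effect
        self._applied.add(request_id)
        self.balance += amount


def model_balance(ops: list) -> int:
    """Reference model: only the first occurrence of each request ID counts."""
    first = {}
    for request_id, amount in ops:
        first.setdefault(request_id, amount)
    return sum(first.values())


def find_counterexample(make_impl, seed=0, runs=200, max_len=20):
    """Return an op sequence where impl and model disagree, or None."""
    rng = random.Random(seed)  # seeded, so any failure is reproducible
    for _ in range(runs):
        length = rng.randrange(1, max_len)
        # Few distinct IDs on purpose, so duplicates are common.
        ops = [(rng.randrange(5), rng.randrange(1, 100)) for _ in range(length)]
        impl = make_impl()
        for request_id, amount in ops:
            impl.apply(request_id, amount)
        if impl.balance != model_balance(ops):
            return ops
    return None
```

Calling `find_counterexample(Ledger)` returns `None` for the implementation shown. A returned sequence is a trace that can be checked in as a regression test.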
Operational notes
- Version properties and invariants like code; review changes carefully.
- Use models to evaluate protocol upgrades before shipping.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Run the model checker in CI with explicit timeouts and bounds.
- Keep a library of “known hard schedules” from past failures.
Attach explicit rollout/rollback triggers to changes that touch security or correctness.
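For the CI item in the list above, a bounded explicit-state check can be sketched in a few lines. This is not TLC or any real model checker: `check` is a hypothetical breadth-first search with a state cap, and the toy model is a deliberately broken lock whose "check" and "set" are separate steps. Wall-clock timeouts are assumed to be enforced by the CI runner.

```python
from collections import deque


def check(initial, next_states, invariant, max_states=10_000):
    """Breadth-first search over reachable states.

    Returns ("ok", None), ("violation", trace) or ("bound_hit", None).
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return "violation", trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                if len(parent) >= max_states:
                    return "bound_hit", None  # report the bound, never pass silently
                parent[nxt] = state
                queue.append(nxt)
    return "ok", None


def next_states(state):
    """Toy model: state is (pc of p0, pc of p1, lock held)."""
    pcs, lock = state[:2], state[2]
    for i in (0, 1):
        if pcs[i] == "idle" and not lock:
            new_pc, new_lock = "checked", lock   # saw the lock free
        elif pcs[i] == "checked":
            new_pc, new_lock = "critical", True  # takes it without re-checking
        elif pcs[i] == "critical":
            new_pc, new_lock = "idle", False
        else:
            continue  # idle while the lock is held: blocked
        new_pcs = list(pcs)
        new_pcs[i] = new_pc
        yield (new_pcs[0], new_pcs[1], new_lock)


def mutual_exclusion(state) -> bool:
    return not (state[0] == "critical" and state[1] == "critical")


INITIAL = ("idle", "idle", False)
result, trace = check(INITIAL, next_states, mutual_exclusion)
```

The search reports `"violation"` with a five-state trace ending in both processes being critical. Because the search is breadth-first, that trace is a shortest one, which makes it a reasonable entry for a library of known hard schedules.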
What to monitor
- Retry/timeout rates by endpoint and client cohort.
- Error budget burn + tail latency under load.
- Invariant violation rate (should be ~0).
- Authz failures and policy denials (unexpected spikes).
- Admission-control / rate-limit rejections (by reason).
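The invariant violation rate in the list above needs something to count it. A minimal in-process sketch follows, assuming a hypothetical `InvariantMonitor` that records checks instead of crashing; exporting the counters to a metrics system is left out.

```python
from collections import Counter


class InvariantMonitor:
    """Counts invariant checks and violations per invariant name."""

    def __init__(self):
        self.checks = Counter()
        self.violations = Counter()

    def check(self, name: str, holds: bool) -> bool:
        self.checks[name] += 1
        if not holds:
            self.violations[name] += 1
        return holds

    def violation_rate(self, name: str) -> float:
        total = self.checks[name]
        return self.violations[name] / total if total else 0.0


monitor = InvariantMonitor()
for balance in (10, 0, 25, -5):
    monitor.check("balance_non_negative", balance >= 0)
```

Here one of four checks fails, so `monitor.violation_rate("balance_non_negative")` is `0.25`. In production the expected value is 0 and any non-zero reading deserves an alert.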
Rollback plan
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Keep dual-write / dual-verify windows where appropriate.
- Define an explicit rollback trigger (metrics + thresholds).
- Use canaries and staged rollout; stop early when signals degrade.
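An explicit rollback trigger can be written down as data plus one function. The metric names and threshold values below are placeholders, not recommendations, and `breached` is a hypothetical helper. One design choice is made explicit: a missing metric counts as a breach, on the assumption that no signal is not a healthy signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Placeholder thresholds; real values depend on the service's error budget.
TRIGGERS = (
    Threshold("invariant_violation_rate", 0.0),
    Threshold("error_rate", 0.01),
    Threshold("p99_latency_ms", 500.0),
)


def breached(metrics: dict, triggers=TRIGGERS) -> list:
    """Return the names of metrics that are missing or above their threshold."""
    names = []
    for trigger in triggers:
        value = metrics.get(trigger.metric)
        if value is None or value > trigger.max_value:
            names.append(trigger.metric)
    return names


def should_roll_back(metrics: dict) -> bool:
    return bool(breached(metrics))


canary = {"invariant_violation_rate": 0.0, "error_rate": 0.03, "p99_latency_ms": 220.0}
```

For the `canary` reading above, `breached(canary)` returns `["error_rate"]`, so the staged rollout should stop. Keeping the thresholds in version control lets them be reviewed alongside the change they guard.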
Evidence
- Learn TLA+ (1) — Practical workflow and examples.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
- Jepsen (2) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
Open questions
- Which properties are you currently assuming but not testing or proving?
- Which invariants are cheap enough to monitor in production?
- How will you keep models aligned during rapid iteration?
- What is the smallest model that reproduces your worst incident class?
Checklist
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Rollback plan rehearsed and automated.
- Telemetry captures correctness signals.
- Safety properties stated as invariants.
- Assumptions listed and reviewed.
- Failure modes enumerated with mitigations.
Further reading
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Learn TLA+ — Practical workflow and examples.
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.