Monthly research note. Theme: Formal Methods & Verification.
TL;DR
A focused memo on spec-driven development, with the spec as the center of gravity: define the model, state the properties, then design the system so those properties remain true under failure and adversaries.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
Key takeaways
- Keep models small enough to run in seconds or they will rot.
- Write properties in plain language next to the formal statement.
- Counterexamples are engineering artifacts—minimize them and turn them into tests.
- Treat retries, reordering, and partial failure as default conditions.
- Design rollbacks as part of the happy path.
Why this matters
- Formal models force you to name assumptions (time, ordering, failure).
- Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
- Verification complements testing by exploring adversarial schedules systematically.
- Refinement boundaries prevent “spec drift” between paper and code.
Key questions
- What is the environment model (adversary actions, scheduling, failures)?
- What is the smallest model that still captures the bug class you fear?
- How do you handle state explosion (symmetry, abstraction, bounds)?
- What is the refinement boundary between spec and implementation?
- How do you ensure proofs stay valid through refactors and upgrades?
- How do you convert counterexamples into test harnesses?
Assumptions
- Most systems have implicit assumptions about timeouts and ordering.
- Specifications omit details; implementations invent them. That gap is risk.
- Teams need workflows that keep models and code aligned over time.
- Adversaries choose the worst schedule, not the average one.
Non-goals
- Treating verification as a one-time event rather than a process.
- Proving the whole system end-to-end with all implementation details.
Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.
Model & invariants
In temporal logic terms, the common shape is:
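A hedged sketch of that shape, assuming the usual split into a safety property and a liveness property. The names Init, Next, vars, Inv, Request, and Response are placeholders for illustration, not definitions from this memo:

```latex
\begin{align*}
\text{Safety:}   &\quad \mathit{Init} \land \Box[\mathit{Next}]_{\mathit{vars}} \implies \Box\,\mathit{Inv} \\
\text{Liveness:} &\quad \Box(\mathit{Request} \implies \Diamond\,\mathit{Response})
\end{align*}
```

In plain language: the invariant holds in every reachable state, and every request is eventually answered.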
Write properties in plain language next to the formal version.
Keep the model small enough to run in seconds; large models rot.
Invariants must be checkable from evidence you actually have (state + logs + counters).
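As a minimal sketch of an invariant that is checkable from evidence, assume a hypothetical ledger that keeps a balance (state), an append-only log, and an applied-operations counter. The names `LogEntry`, `op_id`, and `check_ledger_invariants` are invented for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    op_id: str   # hypothetical idempotency key
    amount: int  # signed delta applied to the balance


def check_ledger_invariants(
    balance: int, applied_count: int, log: list[LogEntry]
) -> list[str]:
    """Return the plain-language invariants that are violated (empty = all hold)."""
    violations = []
    op_ids = [entry.op_id for entry in log]
    if len(set(op_ids)) != len(op_ids):
        violations.append("each op_id appears at most once in the log")
    if applied_count != len(log):
        violations.append("applied counter equals number of log entries")
    if balance != sum(entry.amount for entry in log):
        violations.append("balance equals sum of logged amounts")
    return violations
```

Each check uses only data the system already records, and the returned strings double as the plain-language statement of the property.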
Security properties
- Evidence: critical actions emit verifiable audit events.
- Replay resistance: duplicated inputs do not change outcomes.
- Integrity: invalid transitions are rejected (and detectable).
- Least authority: privileges are scoped by purpose and time.
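Three of the properties above (evidence, replay resistance, integrity) can be sketched on a toy state machine. This is an illustration under assumptions: the `Order` class, its states, and the request-ID scheme are hypothetical, and least authority is not modeled here.

```python
# Hypothetical lifecycle: only these (from_state, to_state) pairs are legal.
ALLOWED = {
    ("created", "authorized"),
    ("authorized", "captured"),
    ("authorized", "voided"),
}


class InvalidTransition(Exception):
    pass


class Order:
    def __init__(self):
        self.state = "created"
        self.seen_requests = {}  # request_id -> outcome recorded on first apply
        self.audit = []          # append-only (event, request_id, from, to)

    def apply(self, request_id, target):
        # Replay resistance: a duplicated request returns the recorded outcome
        # and leaves the state untouched.
        if request_id in self.seen_requests:
            self.audit.append(("replayed", request_id, self.state, self.state))
            return self.seen_requests[request_id]
        # Integrity: transitions outside ALLOWED are rejected and recorded.
        if (self.state, target) not in ALLOWED:
            self.audit.append(("rejected", request_id, self.state, target))
            raise InvalidTransition(f"{self.state} -> {target}")
        # Evidence: every applied transition emits an audit event.
        self.audit.append(("applied", request_id, self.state, target))
        self.state = target
        self.seen_requests[request_id] = target
        return target
```

Rejected requests are audited but not memoized, so a retry of an invalid request is rejected again rather than silently accepted.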
Failure modes
- Observability gaps during incidents (missing evidence).
- Recovery paths that only work when nothing is broken.
- Config drift that weakens security posture over time.
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
A recovery plan that isn’t exercised will fail when you need it.
Design sketch
flowchart TD
props["Properties"] --> inv["Invariants"]
inv --> model["Model"]
model --> cex["Counterexamples"]
cex --> tests["Regression Tests"]
tests --> model
Implementation notes
Keep refinement boundaries explicit: what the spec promises vs what code enforces.
Make rollbacks boring: if rollback is a hero move, it will fail.
Workflow:
1) Write a model with a few state variables.
2) State invariants (safety) and progress conditions (liveness).
3) Run the model checker with tight bounds.
4) Minimize counterexamples into test cases.
5) Iterate until failures are boring.
Verification strategy
- Refinement tests: compare model traces to implementation traces.
- Model checking bounded versions of the core protocol.
- Runtime assertions for invariants that are cheap to check.
- Property-based tests derived from invariants.
- Proof maintenance: keep models in CI with a time budget.
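To make bounded model checking and counterexample replay concrete, here is a hand-rolled explicit-state search over a toy model: two processes that each read a shared counter and then write back the value plus one. It is a sketch, not a substitute for a real model checker; the model, the `check` and `replay` functions, and the action names are all assumptions made for illustration.

```python
from collections import deque

N_PROCS = 2
# State: (shared counter, per-process program counter, per-process local copy).
# Program counter: 0 = about to read, 1 = about to write, 2 = done.
INIT = (0, (0,) * N_PROCS, (0,) * N_PROCS)


def next_states(state):
    """Yield (action, successor) for every enabled step of every process."""
    counter, pcs, local = state
    for p in range(N_PROCS):
        if pcs[p] == 0:  # read the shared counter into the local copy
            yield (f"p{p}:read",
                   (counter, pcs[:p] + (1,) + pcs[p + 1:],
                    local[:p] + (counter,) + local[p + 1:]))
        elif pcs[p] == 1:  # write back local + 1 (may clobber another write)
            yield (f"p{p}:write",
                   (local[p] + 1, pcs[:p] + (2,) + pcs[p + 1:], local))


def invariant(state):
    """Once every process is done, no increment may have been lost."""
    counter, pcs, _ = state
    return not all(pc == 2 for pc in pcs) or counter == N_PROCS


def check(init, max_states=10_000):
    """Breadth-first search; return a shortest violating trace, or None.

    States beyond max_states are not explored, so None only means
    "no violation within the bound".
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while parent[state] is not None:
                state, action = parent[state]
                trace.append(action)
            return trace[::-1]
        for action, nxt in next_states(state):
            if nxt not in parent and len(parent) < max_states:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


def replay(trace, init=INIT):
    """Re-run a recorded trace step by step; raises KeyError if a step is not enabled."""
    state = init
    for action in trace:
        state = dict(next_states(state))[action]
    return state
```

Because the search is breadth-first, the returned trace is already minimal in length: both reads happen before either write, and one increment is lost. Storing that trace and feeding it to `replay` in a unit test turns the counterexample into a regression test for that schedule.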
Operational notes
- Version properties and invariants like code; review changes carefully.
- Keep a library of “known hard schedules” from past failures.
- Run the model checker in CI with explicit timeouts and bounds.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Use models to evaluate protocol upgrades before shipping.
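One way to make the timeouts and bounds explicit in CI is to have the search report when it ran out of budget, instead of returning a silent pass. The sketch below assumes a generic explicit-state search; `explore_with_budget` and its status strings are hypothetical names, not the interface of any particular model checker.

```python
import time
from collections import deque


def explore_with_budget(init, next_states, invariant, max_states, max_seconds):
    """Breadth-first exploration with a state bound and a wall-clock bound.

    Returns one of:
      ("violation", state)        a reachable state breaks the invariant
      ("ok", n_states)            the full state space was explored
      ("budget_exhausted", n)     a bound was hit before exploration finished
    """
    deadline = time.monotonic() + max_seconds
    seen = {init}
    queue = deque([init])
    while queue:
        if time.monotonic() > deadline:
            return ("budget_exhausted", len(seen))
        state = queue.popleft()
        if not invariant(state):
            return ("violation", state)
        for nxt in next_states(state):
            if nxt not in seen:
                if len(seen) >= max_states:
                    return ("budget_exhausted", len(seen))
                seen.add(nxt)
                queue.append(nxt)
    return ("ok", len(seen))
```

A CI job can then fail on "violation" and flag "budget_exhausted" for review, so a model that has outgrown its bounds shows up before it rots.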
Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.
What to monitor
- Rollback events and the conditions that triggered them.
- Error budget burn + tail latency under load.
- Authz failures and policy denials (unexpected spikes).
- Invariant violation rate (should be ~0).
- Retry/timeout rates by endpoint and client cohort.
Rollback plan
- Keep dual-write / dual-verify windows where appropriate.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Use canaries and staged rollout; stop early when signals degrade.
- Define an explicit rollback trigger (metrics + thresholds).
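An explicit rollback trigger can be as small as a table of thresholds and a pure function over the current metrics. The following is a sketch under assumptions: the metric names and limits are hypothetical and would need tuning against real SLOs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Hypothetical metric names and limits, for illustration only.
ROLLBACK_THRESHOLDS = (
    Threshold("invariant_violations", 0.0),
    Threshold("error_budget_burn_rate", 2.0),
    Threshold("p99_latency_ms", 800.0),
)


def rollback_decision(metrics, thresholds=ROLLBACK_THRESHOLDS):
    """Return (should_rollback, reasons).

    A missing metric counts as a breach: if the evidence is absent,
    the rollout is not known to be healthy.
    """
    reasons = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            reasons.append(f"{t.metric}: missing")
        elif value > t.max_value:
            reasons.append(f"{t.metric}: {value} > {t.max_value}")
    return (bool(reasons), reasons)
```

Treating missing metrics as breaches is a design choice made here, not a requirement of the plan above; it keeps an observability gap from reading as a healthy canary. The returned reasons can be written to the audit log so the trigger itself leaves evidence.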
Evidence
- Site Reliability Engineering (Google) (1) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.
- Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Open questions
- What is the smallest model that reproduces your worst incident class?
- Which invariants are cheap enough to monitor in production?
- Which properties are you currently assuming but not testing or proving?
- How will you keep models aligned during rapid iteration?
Checklist
- Assumptions listed and reviewed.
- Safety properties stated as invariants.
- Telemetry captures correctness signals.
- Rollback plan rehearsed and automated.
- Failure modes enumerated with mitigations.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Further reading
- Learn TLA+ — Practical workflow and examples.
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.
- Jepsen — Fault injection and correctness testing for distributed systems.