Spec-Driven Development: Making the Spec the Center of Gravity

Monthly research note. Theme: Formal Methods & Verification.

TL;DR

Make the spec the center of gravity: define the model, state the properties, then design the system so those properties remain true under failure and against adversaries.

Key insight

If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.

Key takeaways

Keep models small enough to run in seconds or they will rot.
Write properties in plain language next to the formal statement.
Counterexamples are engineering artifacts—minimize them and turn them into tests.
Treat retries, reordering, and partial failure as default conditions.
Design rollbacks as part of the happy path.

Why this matters

Formal models force you to name assumptions (time, ordering, failure).
Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
Verification complements testing by exploring adversarial schedules systematically.
Refinement boundaries prevent “spec drift” between paper and code.

Key questions

What is the environment model (adversary actions, scheduling, failures)?
What is the smallest model that still captures the bug class you fear?
How do you handle state explosion (symmetry, abstraction, bounds)?
What is the refinement boundary between spec and implementation?
How do you ensure proofs stay valid through refactors and upgrades?
How do you convert counterexamples into test harnesses?

Assumptions

Most systems have implicit assumptions about timeouts and ordering.
Specifications omit details; implementations invent them. That gap is risk.
Teams need workflows that keep models and code aligned over time.
Adversaries choose the worst schedule, not the average one.

Non-goals

Treating verification as a one-time event rather than a process.
Proving the whole system end-to-end with all implementation details.

Attack surface

Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.

Model & invariants

In temporal logic terms, the common shape is:

\mathrm{Safety} \equiv \Box\,\mathrm{Inv}\qquad\qquad \mathrm{Liveness} \equiv \Box\Diamond\,\mathrm{Progress}.

Write properties in plain language next to the formal version.

Keep the model small enough to run in seconds; large models rot.

Invariant

Invariants must be checkable from evidence you actually have (state + logs + counters).
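
As a minimal sketch of "checkable from evidence", the example below recomputes expected balances from an audit log and compares them with stored state. The ledger domain, the AuditEvent type, and the function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record: one balance change for one account."""
    account: str
    delta: int

def check_balance_invariant(balances, events):
    """Return the accounts whose stored balance disagrees with the audit log.

    Invariant (plain language): every stored balance equals the sum of the
    logged deltas for that account.
    """
    expected = {}
    for event in events:
        expected[event.account] = expected.get(event.account, 0) + event.delta
    accounts = set(balances) | set(expected)
    return sorted(a for a in accounts if balances.get(a, 0) != expected.get(a, 0))

# Example evidence: account "b" has a stored balance the log cannot explain.
violations = check_balance_invariant(
    {"a": 5, "b": 3},
    [AuditEvent("a", 5), AuditEvent("b", 2)],
)
```

An invariant phrased this way depends only on state and logs, so it can run as an offline audit job or as a periodic production check.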

Security properties

Evidence: critical actions emit verifiable audit events.
Replay resistance: duplicated inputs do not change outcomes.
Integrity: invalid transitions are rejected (and detectable).
Least authority: privileges are scoped by purpose and time.
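
Replay resistance and integrity can be illustrated with a small sketch, assuming each request carries a unique identifier. The IdempotentLedger class and its non-negative balance rule are hypothetical stand-ins for whatever state machine the spec actually describes.

```python
class IdempotentLedger:
    """Toy state machine: duplicated requests do not change outcomes."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # request_id -> balance returned the first time

    def apply(self, request_id, amount):
        # Replay resistance: a request already applied returns the stored result.
        if request_id in self._results:
            return self._results[request_id]
        # Integrity: reject transitions that would break the invariant balance >= 0.
        if self.balance + amount < 0:
            raise ValueError("invalid transition: balance would go negative")
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance

ledger = IdempotentLedger()
first = ledger.apply("req-1", 10)
replayed = ledger.apply("req-1", 10)  # duplicate delivery of the same request
```

In a real system the table of seen requests would need to be durable and bounded; this sketch keeps it in memory to stay short.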

Failure modes

Observability gaps during incidents (missing evidence).
Recovery paths that only work when nothing is broken.
Config drift that weakens security posture over time.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart TD
  props["Properties"] --> inv["Invariants"]
  inv --> model["Model"]
  model --> cex["Counterexamples"]
  cex --> tests["Regression Tests"]
  tests --> model

Implementation notes

Keep refinement boundaries explicit: what the spec promises vs what code enforces.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

Workflow:
1) Write a model with a few state variables.
2) State invariants (safety) and progress conditions (liveness).
3) Run the model checker with tight bounds.
4) Minimize counterexamples into test cases.
5) Iterate until failures are boring.
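
The workflow above can be sketched end to end with a tiny explicit-state checker. This is an illustrative toy, not a substitute for a real model checker: the two-worker counter model, the step labels, and the max_states bound are all assumptions made for the example.

```python
from collections import deque

# Hypothetical toy model: two workers each perform a non-atomic increment
# (read the shared counter into a local, then write local + 1 back).
# State: (counter, (pc0, pc1), (local0, local1)); pc: 0=read, 1=write, 2=done.
INIT = (0, (0, 0), (0, 0))

def next_states(state):
    """Yield (label, successor) for every enabled step of every worker."""
    counter, pcs, local_vals = state
    for i in (0, 1):
        if pcs[i] == 0:  # read the shared counter
            new_locals = tuple(counter if j == i else local_vals[j] for j in (0, 1))
            new_pcs = tuple(1 if j == i else pcs[j] for j in (0, 1))
            yield f"w{i}.read", (counter, new_pcs, new_locals)
        elif pcs[i] == 1:  # write back local + 1
            new_pcs = tuple(2 if j == i else pcs[j] for j in (0, 1))
            yield f"w{i}.write", (local_vals[i] + 1, new_pcs, local_vals)

def invariant(state):
    """Safety: once both workers are done, no increment was lost."""
    counter, pcs, _ = state
    return pcs != (2, 2) or counter == 2

def check(init, step, inv, max_states=10_000):
    """Breadth-first search; return a shortest violating trace, or None."""
    parent = {init: None}  # state -> (previous state, step label)
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not inv(state):
            trace = []
            while parent[state] is not None:
                state, label = parent[state]
                trace.append(label)
            return list(reversed(trace))
        for label, succ in step(state):
            if succ not in parent:
                if len(parent) >= max_states:
                    raise RuntimeError("state bound exceeded")
                parent[succ] = (state, label)
                queue.append(succ)
    return None

counterexample = check(INIT, next_states, invariant)
```

Breadth-first order means the first violation found is also a shortest one, which gives step 4 a head start: the trace is already minimal in length and can be copied into a regression test as a fixed schedule.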

Verification strategy

Refinement tests: compare model traces to implementation traces.
Model checking: check bounded versions of the core protocol.
Runtime assertions for invariants that are cheap to check.
Property-based tests derived from invariants.
Proof maintenance: keep models in CI with a time budget.
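
A refinement test can be sketched as driving a plain model and the implementation with the same random operations and comparing what each one returns. The bounded queue, both class names, and the 60/40 push/pop mix are hypothetical; only the standard library is used, though a property-based testing library would serve the same purpose.

```python
import random

class ModelQueue:
    """Spec: a bounded FIFO written as plainly as possible."""
    def __init__(self, capacity):
        self.capacity, self.items = capacity, []
    def push(self, x):
        if len(self.items) >= self.capacity:
            return False
        self.items.append(x)
        return True
    def pop(self):
        return self.items.pop(0) if self.items else None

class RingQueue:
    """Implementation under test: fixed-size ring buffer."""
    def __init__(self, capacity):
        self.buf, self.head, self.size = [None] * capacity, 0, 0
    def push(self, x):
        if self.size == len(self.buf):
            return False
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1
        return True
    def pop(self):
        if self.size == 0:
            return None
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x

def find_divergence(seed, steps=200, capacity=3):
    """Return the operation trace up to the first mismatch, or None."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    model, impl = ModelQueue(capacity), RingQueue(capacity)
    trace = []
    for _ in range(steps):
        if rng.random() < 0.6:
            op = ("push", rng.randint(0, 9))
            observed = (model.push(op[1]), impl.push(op[1]))
        else:
            op = ("pop",)
            observed = (model.pop(), impl.pop())
        trace.append(op)
        if observed[0] != observed[1]:
            return trace
    return None

divergences = [find_divergence(seed) for seed in range(20)]
```

Returning the trace rather than a boolean keeps the failure usable: a divergent trace is a counterexample that can be shortened and stored as a regression test.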

Operational notes

Version properties and invariants like code; review changes carefully.
Keep a library of “known hard schedules” from past failures.
Run the model checker in CI with explicit timeouts and bounds.
Treat counterexamples as incidents: track, root-cause, regression-test.
Use models to evaluate protocol upgrades before shipping.
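
A library of known hard schedules can be as simple as named lists of worker ids replayed in a fixed order. The sketch below assumes workers are written as Python generators that yield between steps; the schedule name, the replay helper, and both increment variants are hypothetical.

```python
# Hypothetical library of schedules recorded from past counterexamples.
HARD_SCHEDULES = {
    "lost-update": [0, 1, 0, 1],  # both workers read before either writes
}

def racy_increment(shared):
    """Non-atomic increment: the read and the write are separate steps."""
    local = shared["counter"]        # step 1: read
    yield
    shared["counter"] = local + 1    # step 2: write
    yield

def atomic_increment(shared):
    """Fixed variant: read and write happen within a single step."""
    shared["counter"] += 1
    yield
    yield                            # second step is a no-op

def replay(schedule, worker_fn, n_workers=2):
    """Run worker steps in exactly the recorded order; return final state."""
    shared = {"counter": 0}
    workers = [worker_fn(shared) for _ in range(n_workers)]
    for worker_id in schedule:
        next(workers[worker_id], None)  # advance that worker by one step
    return shared

racy_result = replay(HARD_SCHEDULES["lost-update"], racy_increment)
fixed_result = replay(HARD_SCHEDULES["lost-update"], atomic_increment)
```

Replaying the same schedule against the old and the fixed code shows that the regression test fails for the right reason before it passes.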

Operational note

Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.

What to monitor

Rollback events and the conditions that triggered them.
Error budget burn + tail latency under load.
Authz failures and policy denials (unexpected spikes).
Invariant violation rate (should be ~0).
Retry/timeout rates by endpoint and client cohort.

Rollback plan

Keep dual-write / dual-verify windows where appropriate.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Use canaries and staged rollout; stop early when signals degrade.
Define an explicit rollback trigger (metrics + thresholds).
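
An explicit rollback trigger might look like the sketch below: a pure function from a metrics snapshot to the list of tripped thresholds. The metric names and the default limits are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RollbackThresholds:
    # Illustrative limits only; derive real ones from your own SLOs.
    max_error_rate: float = 0.02
    max_p99_latency_ms: float = 800.0
    max_invariant_violations: int = 0

def rollback_reasons(metrics, limits=RollbackThresholds()):
    """Return the names of tripped thresholds; an empty list means continue."""
    reasons = []
    if metrics["error_rate"] > limits.max_error_rate:
        reasons.append("error_rate")
    if metrics["p99_latency_ms"] > limits.max_p99_latency_ms:
        reasons.append("p99_latency_ms")
    if metrics["invariant_violations"] > limits.max_invariant_violations:
        reasons.append("invariant_violations")
    return reasons

healthy = rollback_reasons(
    {"error_rate": 0.001, "p99_latency_ms": 240.0, "invariant_violations": 0}
)
degraded = rollback_reasons(
    {"error_rate": 0.05, "p99_latency_ms": 300.0, "invariant_violations": 2}
)
```

Returning reasons instead of a boolean preserves evidence: the rollback event can record exactly which signals tripped.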

Evidence

Site Reliability Engineering (Google) (1) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.
Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.

Open questions

What is the smallest model that reproduces your worst incident class?
Which invariants are cheap enough to monitor in production?
Which properties are you currently assuming but not testing or proving?
How will you keep models aligned during rapid iteration?

Checklist

Assumptions listed and reviewed.
Safety properties stated as invariants.
Telemetry captures correctness signals.
Rollback plan rehearsed and automated.
Failure modes enumerated with mitigations.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
