Refinement: Proving Your Implementation Matches the Spec

Monthly research note. Theme: Formal Methods & Verification.

TL;DR

Treat refinement (proving your implementation matches the spec) as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.

Key insight

If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.

Key takeaways

Counterexamples are engineering artifacts—minimize them and turn them into tests.
Keep models small enough to run in seconds or they will rot.
Write properties in plain language next to the formal statement.
Measure correctness signals, not only latency/throughput.
Make boundaries boring: validate inputs, cap costs, and be deterministic where needed.

Why this matters

The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
Verification complements testing by exploring adversarial schedules systematically.
Formal models force you to name assumptions (time, ordering, failure).

Key questions

How do you handle state explosion (symmetry, abstraction, bounds)?
What is the environment model (adversary actions, scheduling, failures)?
Which properties belong in the model vs in tests vs in monitoring?
How do you convert counterexamples into test harnesses?
What is the smallest model that still captures the bug class you fear?
Which invariants must hold under every interleaving and crash point?

Assumptions

Concurrency introduces interleavings humans don’t reason about reliably.
Most systems have implicit assumptions about timeouts and ordering.
Adversaries choose the worst schedule, not the average one.
Specifications omit details; implementations invent them. That gap is risk.

Non-goals

Proving the whole system end-to-end with all implementation details.
Writing models that can’t produce counterexamples quickly.

Attack surface

Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.

Model & invariants

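A common way to state linearizability is the existence of a sequential history that explains the observed concurrent one:

\exists H_s:\ H_s \text{ is sequential } \wedge H_s \sim H_c.

Here H_c is the observed concurrent history. The relation H_s \sim H_c is read here as: H_s contains the same operations with the same results, is legal under the sequential specification, and preserves the real-time order of operations that do not overlap in H_c.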
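For small histories, the existence claim above can be checked by brute force. The sketch below assumes a single read/write register and uses hypothetical names (`Op`, `is_linearizable`). It tries every ordering of the operations and accepts if one is both legal for a sequential register and consistent with real-time order. The search is factorial in the history length, so it only suits tiny histories such as minimized counterexamples.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Op:
    """One completed operation in a concurrent history (hypothetical shape)."""
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # invocation time
    end: float    # response time


def respects_real_time(order):
    """If an op finished before another started, it must come first."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.end < earlier.start:
                return False
    return True


def legal_register(order, initial=0):
    """Sequential spec: every read returns the most recent write."""
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=0):
    """True if some sequential ordering explains the concurrent history."""
    return any(
        respects_real_time(order) and legal_register(order, initial)
        for order in permutations(history)
    )


# A read that starts after write(1) has completed must not return 0.
stale_read = [Op("write", 1, 0, 1), Op("read", 0, 2, 3)]
# The same read is acceptable when it overlaps the write.
overlapping = [Op("write", 1, 0, 3), Op("read", 0, 1, 2)]
```

Calling `is_linearizable(stale_read)` rejects the first history, while `is_linearizable(overlapping)` accepts the second, because the read can be ordered before the write without contradicting real time.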

Keep the model small enough to run in seconds; large models rot.

Write properties in plain language next to the formal version.

Invariant

If the system can enter an invalid state, it eventually will—usually during an incident.

Security properties

Replay resistance: duplicated inputs do not change outcomes.
Integrity: invalid transitions are rejected (and detectable).
Downgrade resistance: negotiation can’t silently weaken security posture.
Least authority: privileges are scoped by purpose and time.
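Replay resistance and integrity from the list above can be made concrete with a small state machine. This is a minimal sketch under assumptions: the `Ledger` class, the `request_id` parameter, and the non-negative balance rule are all hypothetical and stand in for whatever transitions your system defines.

```python
class Ledger:
    """Hypothetical state machine: a balance that must never go negative."""

    def __init__(self):
        self.balance = 0
        self._seen = {}  # request_id -> balance returned for that request

    def apply(self, request_id, amount):
        # Replay resistance: a duplicated request returns the recorded
        # result and does not change state again.
        if request_id in self._seen:
            return self._seen[request_id]
        # Integrity: an invalid transition is rejected loudly, so the
        # rejection is detectable by callers and in logs.
        if self.balance + amount < 0:
            raise ValueError("invalid transition: balance would go negative")
        self.balance += amount
        self._seen[request_id] = self.balance
        return self.balance
```

A retried request with the same `request_id` is then safe after a timeout, which also addresses the double-apply risk that timeout ambiguity creates.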

Failure modes

Config drift that weakens security posture over time.
Observability gaps during incidents (missing evidence).
Timeout ambiguity causing double-apply or partial state transitions.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart TD
  props["Properties"] --> inv["Invariants"]
  inv --> model["Model"]
  model --> cex["Counterexamples"]
  cex --> tests["Regression Tests"]
  tests --> model

Implementation notes

Treat invariants as code: version, review, and test them.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
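One way to act on this tip is a replay harness that steps the model and the implementation through the same trace and compares them through an abstraction function. The sketch below is illustrative only: the bounded-counter spec, `ShardedCounter`, `UnboundedCounter`, and `check_refinement` are hypothetical names invented for the example.

```python
def model_step(state, action):
    """Abstract spec (assumed for illustration): a counter bounded to 0..3."""
    if action == "inc" and state < 3:
        return state + 1
    if action == "reset":
        return 0
    return state  # "inc" at the bound leaves the state unchanged


class ShardedCounter:
    """Hypothetical implementation: the count is split across two shards."""

    def __init__(self):
        self.shards = [0, 0]

    def handle(self, action):
        if action == "inc":
            if sum(self.shards) < 3:
                self.shards[sum(self.shards) % 2] += 1
        elif action == "reset":
            self.shards = [0, 0]


class UnboundedCounter(ShardedCounter):
    """Buggy variant: forgets the upper bound required by the spec."""

    def handle(self, action):
        if action == "inc":
            self.shards[sum(self.shards) % 2] += 1
        else:
            super().handle(action)


def abstraction(impl):
    """Map concrete implementation state to the model's state."""
    return sum(impl.shards)


def check_refinement(trace, impl_factory=ShardedCounter):
    """Replay a trace on both sides; return the first diverging step or None."""
    impl = impl_factory()
    state = 0
    for step, action in enumerate(trace):
        state = model_step(state, action)
        impl.handle(action)
        if abstraction(impl) != state:
            return step
    return None
```

Here `check_refinement(["inc"] * 4, UnboundedCounter)` reports a divergence at the fourth action, while the default implementation replays the same trace cleanly. Traces emitted by a model checker can be fed in the same way as regression inputs.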

Verification strategy

Property-based tests derived from invariants.
Differential tests against other implementations/specs.
Model checking bounded versions of the core protocol.
Refinement tests: compare model traces to implementation traces.
Runtime assertions for invariants that are cheap to check.
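To make bounded model checking less abstract, here is a toy explicit-state checker. It is a sketch, not a substitute for a real tool: the two-process non-atomic increment model, the state encoding, and the names `successors`, `invariant`, and `find_counterexample` are assumptions made for illustration. It explores interleavings breadth-first up to a depth bound and returns a shortest trace that violates the invariant.

```python
from collections import deque

# State: (shared counter, (pc, tmp) for p0, (pc, tmp) for p1).
# pc 0 = about to read, 1 = about to write, 2 = done.
INIT = (0, (0, 0), (0, 0))


def successors(state):
    """Yield (label, next_state) for every process that can take a step."""
    shared, *procs = state
    for i, (pc, tmp) in enumerate(procs):
        if pc == 0:    # read the shared counter into a local
            label, new_proc, new_shared = "read", (1, shared), shared
        elif pc == 1:  # write local + 1 back (not atomic with the read)
            label, new_proc, new_shared = "write", (2, tmp), tmp + 1
        else:
            continue
        nxt = list(procs)
        nxt[i] = new_proc
        yield f"p{i}:{label}", (new_shared, *nxt)


def invariant(state):
    """Once every process is done, no increment may have been lost."""
    shared, *procs = state
    if all(pc == 2 for pc, _ in procs):
        return shared == len(procs)
    return True


def find_counterexample(init, successors, invariant, max_depth=20):
    """Breadth-first search; returns a shortest violating trace or None."""
    queue = deque([(init, [])])
    seen = {init}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        if len(trace) >= max_depth:
            continue  # explicit bound keeps the run time predictable
        for label, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [label]))
    return None
```

Running `find_counterexample(INIT, successors, invariant)` yields a four-step lost-update schedule in which both processes read before either writes. The returned list of labels is the artifact to keep: it can be replayed against the implementation as a regression test.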

Operational notes

Use models to evaluate protocol upgrades before shipping.
Run the model checker in CI with explicit timeouts and bounds.
Version properties and invariants like code; review changes carefully.
Treat counterexamples as incidents: track, root-cause, regression-test.
Keep a library of “known hard schedules” from past failures.
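Before a counterexample goes into the schedule library, it helps to shrink it. The sketch below is a simple greedy reduction, a simplified relative of delta debugging rather than the full algorithm. The `minimize` function and the `write_then_crash` predicate are hypothetical; in practice the predicate would replay the trace against the model or the implementation.

```python
def minimize(trace, fails):
    """Drop steps one at a time while the failure still reproduces.

    The result is 1-minimal: removing any single remaining step makes
    the failure disappear. It is not guaranteed to be globally shortest.
    """
    if not fails(trace):
        raise ValueError("the starting trace must reproduce the failure")
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if fails(candidate):
                current = candidate
                changed = True
                break
    return current


def write_then_crash(trace):
    """Hypothetical failure predicate: a crash at some point after a write."""
    return "write" in trace and "crash" in trace[trace.index("write"):]


noisy = ["read", "write", "read", "crash", "read"]
```

For the `noisy` trace above, `minimize(noisy, write_then_crash)` keeps only the write and the crash, which is the schedule worth storing and naming.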

Operational note

Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.

What to monitor

Retry/timeout rates by endpoint and client cohort.
Authz failures and policy denials (unexpected spikes).
Error budget burn + tail latency under load.
Rollback events and the conditions that triggered them.
Admission-control / rate-limit rejections (by reason).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Use canaries and staged rollout; stop early when signals degrade.
Keep dual-write / dual-verify windows where appropriate.
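An explicit rollback trigger can be as small as a table of thresholds evaluated against a metrics snapshot. The following is a minimal sketch; the metric names, limits, and the `Threshold` and `rollback_reasons` names are hypothetical. It treats a missing metric as a reason to roll back, which is one possible design choice for handling observability gaps, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # hypothetical metric name
    max_value: float  # roll back when the observed value exceeds this


def rollback_reasons(metrics, thresholds):
    """Return human-readable reasons to roll back; empty means keep going."""
    reasons = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            # Fail closed: no evidence is treated like bad evidence.
            reasons.append(f"{t.metric}: missing")
        elif value > t.max_value:
            reasons.append(f"{t.metric}: {value} > {t.max_value}")
    return reasons


# Example limits for a canary stage (values are placeholders).
LIMITS = [Threshold("error_rate", 0.01), Threshold("p99_latency_ms", 500.0)]
```

Returning the reasons rather than a bare boolean means the trigger also records why it fired, which supports preserving evidence for later reconstruction.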

Evidence

Designing Data-Intensive Applications (Kleppmann) (1) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Learn TLA+ (2) — Practical workflow and examples.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.

Open questions

Which invariants are cheap enough to monitor in production?
How will you keep models aligned during rapid iteration?
Which properties are you currently assuming but not testing or proving?
What is the smallest model that reproduces your worst incident class?

Checklist

Rollback plan rehearsed and automated.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Telemetry captures correctness signals.
Assumptions listed and reviewed.
Safety properties stated as invariants.
Failure modes enumerated with mitigations.
