Monthly research note. Theme: Formal Methods & Verification.
TL;DR
Property-based testing (finding bugs you didn’t imagine) works best when treated as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
Key takeaways
- Write properties in plain language next to the formal statement.
- Keep models small enough to run in seconds or they will rot.
- Refinement boundaries prevent spec drift between paper and code.
- Measure correctness signals, not only latency/throughput.
- Write assumptions down; treat them as interfaces.
Why this matters
- Formal models force you to name assumptions (time, ordering, failure).
- Refinement boundaries prevent “spec drift” between paper and code.
- Verification complements testing by exploring adversarial schedules systematically.
- The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
Key questions
- How do you handle state explosion (symmetry, abstraction, bounds)?
- How do you convert counterexamples into test harnesses?
- What is the environment model (adversary actions, scheduling, failures)?
- What is the refinement boundary between spec and implementation?
- Which invariants must hold under every interleaving and crash point?
- How do you ensure proofs stay valid through refactors and upgrades?
Assumptions
- Teams need workflows that keep models and code aligned over time.
- Adversaries choose the worst schedule, not the average one.
- Concurrency introduces interleavings humans don’t reason about reliably.
- Specifications omit details; implementations invent them. That gap is risk.
Non-goals
- Assuming the spec and the code share the same definitions implicitly.
- Writing models that can’t produce counterexamples quickly.
Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.
Model & invariants
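In temporal logic terms, the common shape is a safety property (an invariant holds in every reachable state) paired with a liveness property (a required outcome eventually happens).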
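As a hedged sketch in linear temporal logic notation, with Inv, Request, and Response as placeholder predicates introduced here for illustration (they do not come from a specific system):

```latex
\begin{align*}
\text{Safety:}   &\quad \Box\, \mathit{Inv} \\
\text{Liveness:} &\quad \Box\,(\mathit{Request} \Rightarrow \Diamond\, \mathit{Response})
\end{align*}
```

Read in plain language: the invariant holds in every reachable state, and every request is eventually followed by a response.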
Write properties in plain language next to the formal version.
Model the scheduler explicitly when concurrency is part of the threat model.
Invariants must be checkable from evidence you actually have (state + logs + counters).
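A minimal sketch of invariants that are checkable from evidence, assuming a hypothetical service whose observable evidence is a balance (state), an ordered list of applied deltas (log), and an exported counter. The `Evidence` type, its fields, and the invariant names are illustrative assumptions, not part of any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical snapshot of what operators can actually observe."""
    balance: int                               # current state
    log: list = field(default_factory=list)    # applied deltas, in order
    applied_count: int = 0                     # counter exported by the service


def check_invariants(ev: Evidence) -> list:
    """Return the names of violated invariants (empty list means all hold)."""
    violations = []
    # Plain language: the balance never goes negative.
    if ev.balance < 0:
        violations.append("balance_non_negative")
    # Plain language: the state equals the replay of the log.
    if ev.balance != sum(ev.log):
        violations.append("state_matches_log")
    # Plain language: the counter agrees with the number of log entries.
    if ev.applied_count != len(ev.log):
        violations.append("counter_matches_log")
    return violations
```

Each check carries its plain-language statement as a comment, so the formal and informal versions are reviewed together.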
Security properties
- Replay resistance: duplicated inputs do not change outcomes.
- Integrity: invalid transitions are rejected (and detectable).
- Least authority: privileges are scoped by purpose and time.
- Authenticity: actions are bound to identity and purpose.
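Replay resistance from the list above can be phrased as an executable property. The sketch below uses only the standard library's `random` module rather than a dedicated property-testing library, and the `Ledger` class with its request-ID deduplication is a hypothetical design chosen for illustration.

```python
import random


class Ledger:
    """Hypothetical store that deduplicates requests by ID."""

    def __init__(self):
        self.total = 0
        self.seen = set()

    def apply(self, request_id, amount):
        # A request ID that was already applied is ignored, so replays are no-ops.
        if request_id in self.seen:
            return
        self.seen.add(request_id)
        self.total += amount


def run(requests):
    """Apply (request_id, amount) pairs in order and return the final total."""
    ledger = Ledger()
    for request_id, amount in requests:
        ledger.apply(request_id, amount)
    return ledger.total


def check_replay_resistance(trials=200, seed=0):
    """Property: injecting duplicates of requests never changes the outcome.

    Returns None if the property held on every trial, else a counterexample.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        originals = [(i, rng.randint(-100, 100)) for i in range(rng.randint(0, 20))]
        replayed = list(originals)
        for _ in range(rng.randint(0, 10)):
            if originals:
                # Insert a duplicate of a random request at a random position.
                dup = rng.choice(originals)
                replayed.insert(rng.randint(0, len(replayed)), dup)
        if run(replayed) != run(originals):
            return replayed
    return None
```

A fixed seed keeps failures reproducible, which matters when a counterexample has to become a regression test.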
Failure modes
- Mixed-version behavior that violates assumptions silently.
- Config drift that weakens security posture over time.
- Observability gaps during incidents (missing evidence).
- Recovery paths that only work when nothing is broken.
Sampling hides the rare schedule that breaks your invariants.
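One way to avoid relying on sampled schedules is to enumerate every interleaving of a small model. The sketch below assumes a toy model of two threads, each performing a non-atomic increment (one read step, one write step); the function names are illustrative. In this toy the bad schedules are common, whereas in realistic systems they are a small fraction of a much larger space, which is where sampling tends to miss them.

```python
from itertools import permutations


def interleavings(n_threads=2, steps_per_thread=2):
    """Return every schedule as a tuple of thread IDs.

    Per-thread step order is preserved because a thread's k-th appearance
    in the schedule always runs that thread's k-th step.
    """
    base = [t for t in range(n_threads) for _ in range(steps_per_thread)]
    return sorted(set(permutations(base)))


def run_schedule(schedule):
    """Each thread performs a non-atomic increment: step 0 reads, step 1 writes."""
    counter = 0
    local = {}   # value each thread read from the shared counter
    pc = {}      # next step index per thread
    for t in schedule:
        step = pc.get(t, 0)
        if step == 0:
            local[t] = counter        # read the shared counter
        else:
            counter = local[t] + 1    # write back, possibly from a stale read
        pc[t] = step + 1
    return counter


def violating_schedules():
    """Schedules where the invariant 'final counter == 2' is broken."""
    return [s for s in interleavings() if run_schedule(s) != 2]
```

Any schedule returned by `violating_schedules()` is a candidate for a "known hard schedule" that can be replayed deterministically.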
Design sketch
flowchart TD
props["Properties"] --> inv["Invariants"]
inv --> model["Model"]
model --> cex["Counterexamples"]
cex --> tests["Regression Tests"]
tests --> model
Implementation notes
Treat invariants as code: version, review, and test them.
Make rollbacks boring: if rollback is a hero move, it will fail.
Workflow:
1) Write a model with a few state variables.
2) State invariants (safety) and progress conditions (liveness).
3) Run model checker with tight bounds.
4) Minimize counterexamples into test cases.
5) Iterate until failures are boring.
Verification strategy
- Property-based tests derived from invariants.
- Runtime assertions for invariants that are cheap to check.
- Refinement tests: compare model traces to implementation traces.
- Model checking bounded versions of the core protocol.
- Differential tests against other implementations/specs.
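The model-check-then-regress loop (small model, invariant, bounded check, counterexample turned into a test) can be sketched in a few lines. This is a toy breadth-first explorer, not a substitute for a real model checker; the two-account model, the action names, and the deliberately unguarded refund are all hypothetical.

```python
from collections import deque


def check(init, actions, invariant, max_depth):
    """Breadth-first search over reachable states up to max_depth.

    Returns None if the invariant holds on every explored state, otherwise
    the shortest trace (list of action names) reaching a violating state.
    """
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, trace = frontier.popleft()
        if not invariant(state):
            return trace
        if len(trace) >= max_depth:
            continue
        for name, step in actions:
            nxt = step(state)  # None means the action is not enabled
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [name]))
    return None


def replay(init, actions, trace):
    """Re-run a counterexample trace step by step; usable as a regression test."""
    by_name = dict(actions)
    state = init
    for name in trace:
        state = by_name[name](state)
    return state


# State is (balance_a, balance_b). All names below are illustrative.
def a_pays_b(s):
    return (s[0] - 1, s[1] + 1) if s[0] >= 1 else None


def refund_unguarded(s):
    return (s[0] + 1, s[1] - 1)  # bug: no check that b can afford the refund


def refund_guarded(s):
    return (s[0] + 1, s[1] - 1) if s[1] >= 1 else None


def no_negative(s):
    return min(s) >= 0


BUGGY = [("a_pays_b", a_pays_b), ("b_refunds_a", refund_unguarded)]
FIXED = [("a_pays_b", a_pays_b), ("b_refunds_a", refund_guarded)]
```

Calling `check((1, 0), BUGGY, no_negative, max_depth=4)` yields a trace that `replay` can re-run against the same actions, so the counterexample survives as a test after the fix.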
Operational notes
- Use models to evaluate protocol upgrades before shipping.
- Version properties and invariants like code; review changes carefully.
- Run the model checker in CI with explicit timeouts and bounds.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Keep a library of “known hard schedules” from past failures.
Attach explicit rollout/rollback triggers to changes that touch security or correctness.
What to monitor
- Error budget burn + tail latency under load.
- Retry/timeout rates by endpoint and client cohort.
- Invariant violation rate (should be ~0).
- Authz failures and policy denials (unexpected spikes).
- Rollback events and the conditions that triggered them.
Rollback plan
- Define an explicit rollback trigger (metrics + thresholds).
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Use canaries and staged rollout; stop early when signals degrade.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Keep dual-write / dual-verify windows where appropriate.
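An explicit rollback trigger can be written as data plus a small evaluation function. The metric names and thresholds below are hypothetical placeholders; real values would come from your own service objectives. Treating a missing metric as a fired trigger is a design assumption of this sketch, chosen so that observability gaps fail closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    metric: str
    threshold: float


# Hypothetical metric names and thresholds, for illustration only.
TRIGGERS = [
    Trigger("invariant_violations_per_min", 0.0),
    Trigger("error_rate", 0.02),
    Trigger("p99_latency_ms", 800.0),
]


def should_roll_back(canary_metrics, triggers=TRIGGERS):
    """Return the names of triggers that fired; a missing metric counts as fired."""
    fired = []
    for trig in triggers:
        value = canary_metrics.get(trig.metric)
        if value is None or value > trig.threshold:
            fired.append(trig.metric)
    return fired
```

A non-empty result both stops the staged rollout and records which condition caused the rollback, which feeds the evidence trail described above.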
Evidence
- Learn TLA+ (1) — Practical workflow and examples.
  - Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
- Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
  - Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Open questions
- Which properties are you currently assuming but not testing or proving?
- Which invariants are cheap enough to monitor in production?
- What is the smallest model that reproduces your worst incident class?
- How will you keep models aligned during rapid iteration?
Checklist
- Safety properties stated as invariants.
- Telemetry captures correctness signals.
- Rollback plan rehearsed and automated.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Assumptions listed and reviewed.
- Failure modes enumerated with mitigations.
Further reading
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Learn TLA+ — Practical workflow and examples.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.