Monthly research note. Theme: Formal Methods & Verification.
TL;DR
Treat "Symbolic Execution: When Brute Force Becomes Logic" as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
Key takeaways
- Refinement boundaries prevent spec drift between paper and code.
- Counterexamples are engineering artifacts—minimize them and turn them into tests.
- Write properties in plain language next to the formal statement.
- Automate guardrails; humans are for judgment, not for consistent enforcement.
- Make boundaries boring: validate inputs, cap costs, and be deterministic where needed.
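The "minimize them" step for counterexamples can be sketched as a greedy reducer. This is a minimal illustration, not a full delta-debugging implementation; the `fails` predicate and the operation names are hypothetical stand-ins for replaying a trace against a real system.

```python
def minimize(trace, fails):
    """Greedily drop single steps while the shortened trace still fails."""
    trace = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(trace)):
            candidate = trace[:i] + trace[i + 1:]
            if fails(candidate):
                trace, changed = candidate, True
                break  # restart the scan on the shorter trace
    return trace


def fails(trace):
    """Hypothetical bug: any write that happens after a close."""
    closed = False
    for op in trace:
        if op == "close":
            closed = True
        elif op == "write" and closed:
            return True
    return False


noisy = ["open", "write", "read", "close", "read", "write", "open"]
print(minimize(noisy, fails))
```

The result is 1-minimal: removing any single remaining step makes the failure disappear, which is usually small enough to commit as a regression test.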
Why this matters
- Counterexamples are better than intuition—they are executable bug reports.
- Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
- The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
- Verification complements testing by exploring adversarial schedules systematically.
Key questions
- How do you ensure proofs stay valid through refactors and upgrades?
- How do you convert counterexamples into test harnesses?
- What is the environment model (adversary actions, scheduling, failures)?
- What is the smallest model that still captures the bug class you fear?
- Which properties belong in the model vs in tests vs in monitoring?
- How do you handle state explosion (symmetry, abstraction, bounds)?
Assumptions
- Most systems have implicit assumptions about timeouts and ordering.
- Concurrency introduces interleavings humans don’t reason about reliably.
- Specifications omit details; implementations invent them. That gap is risk.
- Adversaries choose the worst schedule, not the average one.
Non-goals
- Treating verification as a one-time event rather than a process.
- Writing models that can’t produce counterexamples quickly.
Parsing is an attacker-controlled interface—validate early and fail fast.
Model & invariants
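A common way to state linearizability is through the existence of a sequential history: every concurrent history must be equivalent to some legal sequential history that preserves the real-time order of non-overlapping operations.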
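In one standard formulation (in the style of Herlihy and Wing; the symbols are notation assumed for this note), a complete history $H$ is linearizable with respect to a sequential specification $\mathit{Spec}$ when:

```latex
\[
\exists S.\;\; S \text{ is sequential}
\;\wedge\; S \in \mathit{Spec}
\;\wedge\; \bigl(\forall p.\; H|p = S|p\bigr)
\;\wedge\; {<_H} \subseteq {<_S}
\]
```

Here $H|p$ is the subhistory of process $p$, and $o_1 <_H o_2$ holds when the response of $o_1$ precedes the invocation of $o_2$ in $H$. A history with pending invocations is first completed by appending responses for some pending invocations and dropping the rest.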
Treat counterexamples as regression tests: reduce, encode, and replay.
Model the scheduler explicitly when concurrency is part of the threat model.
Invariants must be checkable from evidence you actually have (state + logs + counters).
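A minimal sketch of these three points together, assuming a toy model that is not from any real system: two threads each perform a non-atomic increment, the scheduler is an explicit choice of which thread steps next, and the invariant is checked on plain state. The search returns a schedule that can be replayed deterministically.

```python
from collections import deque

# Hypothetical toy model: each thread reads the shared counter into a local,
# then writes local + 1 back. State = (shared, per-thread pc, per-thread local).
THREADS = (0, 1)
INIT = (0, (0, 0), (None, None))


def step(state, tid):
    """Run one atomic step of thread `tid`; return None if it has finished."""
    shared, pcs, local_vals = state
    pcs, local_vals = list(pcs), list(local_vals)
    if pcs[tid] == 0:            # read
        local_vals[tid] = shared
    elif pcs[tid] == 1:          # write
        shared = local_vals[tid] + 1
    else:
        return None
    pcs[tid] += 1
    return (shared, tuple(pcs), tuple(local_vals))


def invariant(state):
    """Once both threads are done, no increment may have been lost."""
    shared, pcs, _ = state
    return not all(pc == 2 for pc in pcs) or shared == len(THREADS)


def find_counterexample():
    """Breadth-first search over schedules; returns a shortest violating schedule."""
    queue, seen = deque([(INIT, ())]), {INIT}
    while queue:
        state, schedule = queue.popleft()
        if not invariant(state):
            return schedule
        for tid in THREADS:      # the scheduler: every enabled thread is tried
            nxt = step(state, tid)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, schedule + (tid,)))
    return None


def replay(schedule):
    """Re-execute a recorded schedule deterministically and return the final state."""
    state = INIT
    for tid in schedule:
        state = step(state, tid)
    return state


cex = find_counterexample()
print(cex, replay(cex))
```

The returned schedule is just a tuple of thread ids, so it can be stored next to the tests and replayed against the implementation with the same scheduling decisions.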
Security properties
- Evidence: critical actions emit verifiable audit events.
- Authenticity: actions are bound to identity and purpose.
- Replay resistance: duplicated inputs do not change outcomes.
- Integrity: invalid transitions are rejected (and detectable).
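Replay resistance, integrity, and evidence can be sketched with a small state machine. The order lifecycle, the request-id scheme, and the audit tuple layout are hypothetical; authenticity is not covered here, since the `actor` field is only recorded, not verified.

```python
# Hypothetical order lifecycle; the states and events are illustrative only.
ALLOWED = {
    ("created", "pay"): "paid",
    ("paid", "ship"): "shipped",
}


class InvalidTransition(Exception):
    pass


class Order:
    def __init__(self):
        self.state = "created"
        self.applied = {}   # request_id -> resulting state (replay cache)
        self.audit = []     # append-only evidence of every decision

    def apply(self, request_id, actor, event):
        # Replay resistance: a duplicated request returns the recorded outcome.
        if request_id in self.applied:
            self.audit.append((request_id, actor, event, "duplicate"))
            return self.applied[request_id]
        nxt = ALLOWED.get((self.state, event))
        if nxt is None:
            # Integrity: reject the invalid transition and leave evidence.
            self.audit.append((request_id, actor, event, "rejected"))
            raise InvalidTransition(f"{event!r} not allowed in state {self.state!r}")
        self.state = nxt
        self.applied[request_id] = nxt
        self.audit.append((request_id, actor, event, "applied"))
        return nxt
```

Keeping the transition table as data makes the integrity property easy to compare against the model: the same table can drive both the checker and the implementation.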
Failure modes
- Config drift that weakens security posture over time.
- Observability gaps during incidents (missing evidence).
- Mixed-version behavior that violates assumptions silently.
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Caches tend to become sources of truth unless you can recompute and validate them.
Design sketch
flowchart LR
spec["Spec (TLA+/PlusCal)"] --> mc["Model Check"]
mc --> refine["Refinement / Invariants"]
refine --> impl["Implementation (Rust/Go)"]
impl --> tests["Fuzz / PBT / Differential"]
tests --> spec
Implementation notes
Make the model executable enough to generate counterexamples quickly.
Bound work per request: parse, validate, and cap cost before you allocate heavy resources.
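One way to bound work per request, sketched under assumed limits: the size caps, the `items` field, and the `Rejected` exception are hypothetical names chosen for illustration. The cheapest checks run first, and nothing expensive happens until the input has passed all of them.

```python
import json

MAX_BYTES = 4096      # assumed cap on raw request size
MAX_ITEMS = 100       # assumed cap on work items per request


class Rejected(ValueError):
    pass


def parse_request(raw: bytes) -> list:
    """Parse and validate a request, rejecting it before any heavy work starts."""
    if len(raw) > MAX_BYTES:                 # cheapest check first
        raise Rejected("payload too large")
    try:
        doc = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        raise Rejected("malformed JSON") from exc
    items = doc.get("items") if isinstance(doc, dict) else None
    if not isinstance(items, list) or len(items) > MAX_ITEMS:
        raise Rejected("items must be a list within the size cap")
    for i in items:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(i, int) or isinstance(i, bool) or not 0 <= i < 2**31:
            raise Rejected("items must be non-negative 32-bit integers")
    return items
```

A single rejection type keeps the failure path uniform, so callers can count rejections by reason without parsing error strings from several libraries.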
// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
Verification strategy
- Model checking bounded versions of the core protocol.
- Property-based tests derived from invariants.
- Differential tests against other implementations/specs.
- Proof maintenance: keep models in CI with a time budget.
- Refinement tests: compare model traces to implementation traces.
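The property-based and differential items above can be combined in one small harness. This is a sketch using only the standard library: `RingQueue` is a hypothetical implementation under test, a `deque` plays the role of the model, and a seeded random generator stands in for a property-based testing library.

```python
import random
from collections import deque


class RingQueue:
    """Implementation under test (hypothetical): fixed-capacity FIFO ring buffer."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0
        self.size = 0

    def push(self, x):
        if self.size == len(self.buf):
            return False                      # full: reject instead of overwriting
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1
        return True

    def pop(self):
        if self.size == 0:
            return None
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x


def check_against_model(seed, steps=200, capacity=4):
    """Drive the implementation and a trivial model with the same random operations."""
    rng = random.Random(seed)                 # seeded so any failure can be replayed
    impl, model, trace = RingQueue(capacity), deque(), []
    for _ in range(steps):
        if rng.random() < 0.6:
            x = rng.randrange(100)
            trace.append(("push", x))
            expected = len(model) < capacity
            if expected:
                model.append(x)
            got = impl.push(x)
        else:
            trace.append(("pop",))
            expected = model.popleft() if model else None
            got = impl.pop()
        # Invariants: same observable result, same size, never above capacity.
        if got != expected or impl.size != len(model) or impl.size > capacity:
            return trace                      # failing trace, ready to be minimized
    return None
```

Returning the operation trace rather than a boolean means a failure arrives in a form that can be reduced and replayed, instead of needing to be reconstructed from a seed.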
Operational notes
- Keep a library of “known hard schedules” from past failures.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Run the model checker in CI with explicit timeouts and bounds.
- Version properties and invariants like code; review changes carefully.
- Use models to evaluate protocol upgrades before shipping.
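Running a checker in CI under an explicit time budget can be sketched as a thin wrapper. The command below is a placeholder, and the convention that a nonzero exit code means a violation is an assumption; check it against the checker you actually run.

```python
import subprocess
import sys


def run_check(cmd, budget_seconds):
    """Run a checker command under a hard time budget.

    Returns "pass", "fail", or "timeout" so CI can treat each outcome differently.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    # Assumption: the checker exits nonzero when a property is violated.
    return "pass" if result.returncode == 0 else "fail"


# Placeholder command so the sketch runs anywhere; substitute the real invocation.
status = run_check([sys.executable, "-c", "print('model checked')"], budget_seconds=60)
print(status)
```

Separating "timeout" from "fail" matters: a timeout usually means the bounds need tightening or the model has grown, while a failure means there is a counterexample to triage.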
Make degraded modes explicit: fail closed vs fail open is a policy choice.
What to monitor
- Retry/timeout rates by endpoint and client cohort.
- Invariant violation rate (should be ~0).
- Error budget burn + tail latency under load.
- Admission-control / rate-limit rejections (by reason).
- Authz failures and policy denials (unexpected spikes).
Rollback plan
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Define an explicit rollback trigger (metrics + thresholds).
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Use canaries and staged rollout; stop early when signals degrade.
- Keep dual-write / dual-verify windows where appropriate.
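An explicit rollback trigger can be as small as a table of thresholds. The metric names and limits below are assumptions for illustration; the one deliberate design choice is that a missing metric counts as a breach, so an observability gap fails closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Assumed metric names and limits, for illustration only.
TRIGGERS = (
    Threshold("invariant_violations_per_min", 0.0),
    Threshold("error_rate", 0.02),
    Threshold("p99_latency_ms", 800.0),
)


def should_roll_back(canary_metrics: dict) -> list:
    """Return the names of breached triggers; a missing metric counts as a breach."""
    breached = []
    for t in TRIGGERS:
        value = canary_metrics.get(t.metric)
        if value is None or value > t.max_value:
            breached.append(t.metric)
    return breached
```

A staged rollout would call `should_roll_back` on each canary sample and stop at the first non-empty result, keeping the returned names as evidence of why the rollback fired.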
Evidence
- Designing Data-Intensive Applications (Kleppmann) (1) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
- Jepsen (2) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
Open questions
- How will you keep models aligned during rapid iteration?
- Which invariants are cheap enough to monitor in production?
- Which properties are you currently assuming but not testing or proving?
- What is the smallest model that reproduces your worst incident class?
Checklist
- Failure modes enumerated with mitigations.
- Safety properties stated as invariants.
- Assumptions listed and reviewed.
- Rollback plan rehearsed and automated.
- Telemetry captures correctness signals.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Further reading
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Learn TLA+ — Practical workflow and examples.
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.
- Jepsen — Fault injection and correctness testing for distributed systems.