Monthly research note. Theme: Formal Methods & Verification.
TL;DR
Fuzzing Protocol Parsers: When Inputs Are Adversarial. Treat adversarial input as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.
Correctness is cheaper to enforce at interfaces than to repair in production data.
Key takeaways
- Refinement boundaries prevent spec drift between paper and code.
- Counterexamples are engineering artifacts—minimize them and turn them into tests.
- Keep models small enough to run in seconds or they will rot.
- Make boundaries boring: validate inputs, cap costs, and be deterministic where needed.
- Write assumptions down; treat them as interfaces.
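A minimal sketch of a "boring" boundary, assuming a hypothetical frame layout (1-byte kind, 2-byte big-endian length, payload) that is not taken from any real protocol. The caps `MAX_FRAME_LEN` and `MAX_FRAMES` and the names `Frame`, `ParseError`, and `parse_frames` are illustrative. The point is that every rejection goes through one exception type and the work done is bounded by explicit limits.

```python
from dataclasses import dataclass

MAX_FRAME_LEN = 1024  # hypothetical cap on a single payload
MAX_FRAMES = 64       # hypothetical cap on frames per message


class ParseError(ValueError):
    """The only exception the parser is allowed to raise on bad input."""


@dataclass(frozen=True)
class Frame:
    kind: int
    payload: bytes


def parse_frames(data: bytes) -> list[Frame]:
    """Parse frames laid out as: 1-byte kind, 2-byte big-endian length, payload."""
    frames: list[Frame] = []
    pos = 0
    while pos < len(data):
        if len(frames) >= MAX_FRAMES:
            raise ParseError("too many frames")
        if len(data) - pos < 3:
            raise ParseError("truncated header")
        kind = data[pos]
        length = int.from_bytes(data[pos + 1:pos + 3], "big")
        # Check the declared length against the cap before trusting it.
        if length > MAX_FRAME_LEN:
            raise ParseError("declared length exceeds cap")
        end = pos + 3 + length
        if end > len(data):
            raise ParseError("declared length exceeds input")
        frames.append(Frame(kind, data[pos + 3:end]))
        pos = end
    return frames
```

Because the parser is a pure function of its input, any failing input found by a fuzzer can be replayed byte for byte as a regression test.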
Why this matters
- Verification complements testing by exploring adversarial schedules systematically.
- Counterexamples are better than intuition—they are executable bug reports.
- Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
- Formal models force you to name assumptions (time, ordering, failure).
Key questions
- What is the smallest model that still captures the bug class you fear?
- What is the environment model (adversary actions, scheduling, failures)?
- How do you handle state explosion (symmetry, abstraction, bounds)?
- Which invariants must hold under every interleaving and crash point?
- How do you ensure proofs stay valid through refactors and upgrades?
- How do you convert counterexamples into test harnesses?
Assumptions
- Adversaries choose the worst schedule, not the average one.
- Most systems have implicit assumptions about timeouts and ordering.
- Concurrency introduces interleavings humans don’t reason about reliably.
- Specifications omit details; implementations invent them. That gap is risk.
Non-goals
- Treating verification as a one-time event rather than a process.
- Writing models that can’t produce counterexamples quickly.
Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.
Model & invariants
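In temporal logic terms, the common shape is typically a safety property, □Inv (the invariant holds in every reachable state), paired with a liveness property, □(P ⇒ ◇Q) (whenever P holds, Q eventually follows).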
Treat counterexamples as regression tests: reduce, encode, and replay.
Model the scheduler explicitly when concurrency is part of the threat model.
Invariants must be checkable from evidence you actually have (state + logs + counters).
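One way to make the scheduler explicit is to let the model checker pick which process runs next. The sketch below is a toy, not a substitute for TLA+ or any real tool: it assumes a hypothetical two-process counter in which each process reads and then writes back without atomicity, searches every schedule breadth-first, and returns the shortest one that breaks the invariant. The names `step`, `invariant`, `check`, and `replay` are illustrative.

```python
from collections import deque

# State: (counter, pcs, locals). pc 0 = about to read, 1 = about to write, 2 = done.
INIT = (0, (0, 0), (0, 0))


def step(state, pid):
    """Successor state when the scheduler picks `pid`, or None if `pid` is done."""
    counter, pcs, tmp = state
    pcs, tmp = list(pcs), list(tmp)
    if pcs[pid] == 0:      # read the shared counter into a local
        tmp[pid] = counter
    elif pcs[pid] == 1:    # write back local + 1 (not atomic with the read)
        counter = tmp[pid] + 1
    else:
        return None
    pcs[pid] += 1
    return (counter, tuple(pcs), tuple(tmp))


def invariant(state):
    """Once both processes finish, both increments must be visible."""
    counter, pcs, _ = state
    return pcs != (2, 2) or counter == 2


def check(init, max_depth=10):
    """Breadth-first search over schedules; return the shortest violating one, or None."""
    seen = {init}
    queue = deque([(init, [])])
    while queue:
        state, schedule = queue.popleft()
        if not invariant(state):
            return schedule
        if len(schedule) >= max_depth:
            continue
        for pid in (0, 1):
            nxt = step(state, pid)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, schedule + [pid]))
    return None


def replay(schedule, init=INIT):
    """Re-run a recorded schedule: the regression-test form of a counterexample."""
    state = init
    for pid in schedule:
        state = step(state, pid)
    return state


counterexample = check(INIT)
```

The returned schedule is a list of process ids. Storing it next to the code and asserting on `replay(counterexample)` turns the counterexample into a deterministic test that keeps failing until the read and write are made atomic.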
Security properties
- Integrity: invalid transitions are rejected (and detectable).
- Least authority: privileges are scoped by purpose and time.
- Replay resistance: duplicated inputs do not change outcomes.
- Downgrade resistance: negotiation can’t silently weaken security posture.
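Downgrade resistance can be sketched with two ingredients: a policy floor below which negotiation fails instead of falling back, and a tag over the negotiation transcript so that stripped offers are detectable. The code assumes a pre-shared `key` and integer protocol versions. `MIN_VERSION`, `negotiate`, and `transcript_tag` are hypothetical names, and real protocols bind the transcript in their own specified way.

```python
import hashlib
import hmac

MIN_VERSION = 3  # hypothetical policy floor


class NegotiationError(Exception):
    """Raised when no acceptable version exists; there is no silent fallback."""


def negotiate(local: set[int], offered: set[int], floor: int = MIN_VERSION) -> int:
    """Pick the highest mutually supported version at or above the floor."""
    common = {v for v in local & offered if v >= floor}
    if not common:
        raise NegotiationError("no mutually supported version at or above the floor")
    return max(common)


def transcript_tag(key: bytes, client_offer: list[int],
                   server_offer: list[int], chosen: int) -> bytes:
    """HMAC over what each side believes was offered and chosen."""
    msg = repr((sorted(client_offer), sorted(server_offer), chosen)).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def tags_match(a: bytes, b: bytes) -> bool:
    """Constant-time comparison of two transcript tags."""
    return hmac.compare_digest(a, b)
```

If an attacker removes versions from the client's offer in transit, the two sides compute tags over different transcripts and `tags_match` returns False, so the weakened session can be rejected rather than silently accepted.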
Failure modes
- Timeout ambiguity causing double-apply or partial state transitions.
- Observability gaps during incidents (missing evidence).
- Recovery paths that only work when nothing is broken.
- Mixed-version behavior that violates assumptions silently.
Mixed-version deployments create states you never tested—plan for them explicitly.
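Timeout ambiguity is usually handled by making retries safe instead of guessing whether the first attempt landed. A minimal sketch, assuming the client attaches a unique `request_id` to each logical operation. `Ledger` is a hypothetical in-memory stand-in, and a real system would need to commit the state change and the dedupe record atomically.

```python
class Ledger:
    """Applies each request at most once, keyed by a client-chosen request id."""

    def __init__(self) -> None:
        self.balance = 0
        self._results: dict[str, int] = {}  # request_id -> balance after apply

    def apply(self, request_id: str, amount: int) -> int:
        # A retry after a timeout returns the recorded result instead of re-applying.
        if request_id in self._results:
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance
```

With this shape, a client that times out can resend the same `request_id` and the outcome is the same whether or not the first attempt reached the server.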
Design sketch
flowchart LR
spec["Spec (TLA+/PlusCal)"] --> mc["Model Check"]
mc --> refine["Refinement / Invariants"]
refine --> impl["Implementation (Rust/Go)"]
impl --> tests["Fuzz / PBT / Differential"]
tests --> spec
Implementation notes
Treat invariants as code: version, review, and test them.
Make rollbacks boring: if rollback is a hero move, it will fail.
Workflow:
1) Write a model with a few state variables.
2) State invariants (safety) and progress conditions (liveness).
3) Run model checker with tight bounds.
4) Minimize counterexamples into test cases.
5) Iterate until failures are boring.
Verification strategy
- Proof maintenance: keep models in CI with a time budget.
- Differential tests against other implementations/specs.
- Property-based tests derived from invariants.
- Model checking bounded versions of the core protocol.
- Refinement tests: compare model traces to implementation traces.
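As an illustration of property-based tests derived from invariants, the sketch below checks two properties of a small varint codec using only the standard library: encoding then decoding returns the original value, and arbitrary bytes either decode within bounds or raise `DecodeError` and nothing else. The codec, the cap `MAX_VARINT_BYTES`, and the function names are assumptions made for the example. A dedicated property-testing or fuzzing tool would add input shrinking and coverage guidance that this loop lacks.

```python
import random

MAX_VARINT_BYTES = 10  # hypothetical cap; enough for 64-bit values


class DecodeError(ValueError):
    """The only exception the decoder may raise on bad input."""


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if n < 0:
        raise ValueError("negative values are not encodable")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, bytes_consumed); reads at most MAX_VARINT_BYTES bytes."""
    value = 0
    for i, byte in enumerate(data[:MAX_VARINT_BYTES]):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise DecodeError("empty, unterminated, or overlong varint")


def run_properties(seed: int = 0, trials: int = 2000) -> int:
    """Run both properties with a fixed seed; return the number of trials run."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        # Property 1: round trip.
        n = rng.getrandbits(rng.randint(1, 64))
        encoded = encode_varint(n)
        assert decode_varint(encoded) == (n, len(encoded))
        # Property 2: arbitrary bytes never escape the error contract.
        blob = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 16)))
        try:
            _, used = decode_varint(blob)
            assert 1 <= used <= min(len(blob), MAX_VARINT_BYTES)
        except DecodeError:
            pass
    return trials
```

This decoder accepts non-canonical encodings such as `b"\x80\x00"` for zero. Whether that is acceptable is a specification decision, and it is the kind of divergence a differential test against another implementation would surface.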
Operational notes
- Use models to evaluate protocol upgrades before shipping.
- Treat counterexamples as incidents: track, root-cause, regression-test.
- Run the model checker in CI with explicit timeouts and bounds.
- Version properties and invariants like code; review changes carefully.
- Keep a library of “known hard schedules” from past failures.
Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.
What to monitor
- Retry/timeout rates by endpoint and client cohort.
- Authz failures and policy denials (unexpected spikes).
- Admission-control / rate-limit rejections (by reason).
- Rollback events and the conditions that triggered them.
- Error budget burn + tail latency under load.
Rollback plan
- Keep dual-write / dual-verify windows where appropriate.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Use canaries and staged rollout; stop early when signals degrade.
- Define an explicit rollback trigger (metrics + thresholds).
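An explicit rollback trigger can be as small as a pure function over a metrics window. The sketch below is illustrative only: the threshold values, the `Thresholds` fields, and `should_roll_back` are assumptions and are not tuned for any real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float = 0.02       # hypothetical: 2% of requests
    max_p99_latency_ms: float = 500.0  # hypothetical tail-latency limit
    min_requests: int = 200            # do not decide on too little traffic


def should_roll_back(requests: int, errors: int, p99_latency_ms: float,
                     t: Thresholds = Thresholds()) -> tuple[bool, str]:
    """Return (decision, reason) for one canary observation window."""
    if requests < t.min_requests:
        return False, "insufficient traffic"
    error_rate = errors / requests
    if error_rate > t.max_error_rate:
        return True, f"error rate {error_rate:.3f} above {t.max_error_rate}"
    if p99_latency_ms > t.max_p99_latency_ms:
        return True, f"p99 latency {p99_latency_ms} ms above {t.max_p99_latency_ms} ms"
    return False, "within thresholds"
```

Returning the reason alongside the decision means the rollback event itself records which condition fired, which helps when reconstructing what changed.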
Evidence
- Jepsen (1) — Fault injection and correctness testing for distributed systems.
  - Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
- Learn TLA+ (2) — Practical workflow and examples.
  - Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
Open questions
- How will you keep models aligned during rapid iteration?
- What is the smallest model that reproduces your worst incident class?
- Which properties are you currently assuming but not testing or proving?
- Which invariants are cheap enough to monitor in production?
Checklist
- Assumptions listed and reviewed.
- Telemetry captures correctness signals.
- Failure modes enumerated with mitigations.
- Safety properties stated as invariants.
- Rollback plan rehearsed and automated.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Further reading
- Specifying Systems (Lamport) — The TLA+ reference for safety/liveness and system specs.
- Paxos Made Simple (Lamport) — A small protocol that demonstrates why specs matter.
- Learn TLA+ — Practical workflow and examples.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.