Symbolic Execution: When Brute Force Becomes Logic

Monthly research note. Theme: Formal Methods & Verification.

TL;DR

Treat symbolic execution, and formal verification more broadly, as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.

Key insight

If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.

Key takeaways

Refinement boundaries prevent spec drift between paper and code.
Counterexamples are engineering artifacts—minimize them and turn them into tests.
Write properties in plain language next to the formal statement.
Automate guardrails; humans are for judgment, not for consistent enforcement.
Make boundaries boring: validate inputs, cap costs, and be deterministic where needed.

Why this matters

Counterexamples are better than intuition—they are executable bug reports.
Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
The goal is not a perfect proof—it’s reducing the space of unknown failure modes.
Verification complements testing by exploring adversarial schedules systematically.

Key questions

How do you ensure proofs stay valid through refactors and upgrades?
How do you convert counterexamples into test harnesses?
What is the environment model (adversary actions, scheduling, failures)?
What is the smallest model that still captures the bug class you fear?
Which properties belong in the model vs in tests vs in monitoring?
How do you handle state explosion (symmetry, abstraction, bounds)?

Assumptions

Most systems have implicit assumptions about timeouts and ordering.
Concurrency introduces interleavings humans don’t reason about reliably.
Specifications omit details; implementations invent them. That gap is risk.
Adversaries choose the worst schedule, not the average one.

Non-goals

Treating verification as a one-time event rather than a process.
Writing models that can’t produce counterexamples quickly.

Attack surface

Parsing is an attacker-controlled interface—validate early and fail fast.

Model & invariants

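A common way to state linearizability is the existence of a legal sequential history H_s that is equivalent to the observed concurrent history H_c and preserves its real-time order (if one operation completes before another begins in H_c, it also comes first in H_s):

\exists H_s:\ H_s \text{ is sequential } \wedge H_s \sim H_c \wedge {<_{H_c}} \subseteq {<_{H_s}}.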
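
The existence claim above can be checked by brute force on small histories. The following is a minimal sketch, assuming a single read/write register and operations tagged with invocation and response times; the names `Op`, `find_linearization`, and the two sample histories are hypothetical. It tries every ordering and keeps one only if it is a legal sequential history and never reorders two operations that did not overlap in real time.

```python
from itertools import permutations
from typing import NamedTuple

class Op(NamedTuple):
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # invocation time
    end: float    # response time

def is_legal_sequential(order, initial=0):
    """A sequential register history is legal if each read returns the latest write."""
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True

def respects_real_time(order):
    """Reject orders that place an op after one that started only after it finished."""
    for i, later in enumerate(order):
        for earlier in order[:i]:
            if later.end < earlier.start:
                return False
    return True

def find_linearization(history, initial=0):
    """Brute force over all orderings; returns a witness H_s or None."""
    for order in permutations(history):
        if respects_real_time(order) and is_legal_sequential(order, initial):
            return list(order)
    return None

# The read overlaps the write, so it may observe the new value.
ok_history = [Op("write", 1, 0, 3), Op("read", 1, 1, 2)]
# The read starts after the write has completed but still returns the old value.
stale_history = [Op("write", 1, 0, 1), Op("read", 0, 2, 3)]
```

The search is factorial in the number of operations, so this only suits the tiny histories a minimized counterexample produces; practical checkers prune the search far more aggressively.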

Treat counterexamples as regression tests: reduce, encode, and replay.
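
The "reduce" step can be sketched as a greedy shrinker, a simplified relative of delta debugging rather than the full algorithm. The predicate `overdraws` and the sample trace are hypothetical stand-ins for whatever replays a trace against your implementation and reports whether the failure still occurs.

```python
def shrink_trace(trace, still_fails):
    """Greedy reduction: drop any single step whose removal keeps the failure."""
    current = list(trace)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter trace
    return current

def overdraws(trace):
    """Hypothetical failure predicate: the balance ever goes negative."""
    balance = 0
    for action, amount in trace:
        balance += amount if action == "deposit" else -amount
        if balance < 0:
            return True
    return False

failing_trace = [
    ("deposit", 5), ("withdraw", 3), ("deposit", 1),
    ("withdraw", 4), ("withdraw", 2),
]
minimal = shrink_trace(failing_trace, overdraws)
```

The reduced trace is what gets encoded as a regression test: it is short enough to read in review and still reproduces the failure when replayed.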

Model the scheduler explicitly when concurrency is part of the threat model.
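
As an illustration of an explicit scheduler, the sketch below enumerates every interleaving of two threads that each perform a non-atomic increment (a load followed by a store) and collects the schedules that lose an update. The thread programs `T1` and `T2` and the helper names are assumptions made for the example.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute (thread, action) steps against one shared counter."""
    shared = 0
    local = {}
    for thread, action in schedule:
        if action == "load":
            local[thread] = shared       # read the shared value into a register
        else:  # "store"
            shared = local[thread] + 1   # write back the incremented register
    return shared

T1 = [("t1", "load"), ("t1", "store")]
T2 = [("t2", "load"), ("t2", "store")]

# Two increments should leave the counter at 2; anything else is a lost update.
lost_updates = [s for s in interleavings(T1, T2) if run(s) != 2]
```

Only the two fully serial schedules preserve both increments here, which is the kind of result that is easy to get wrong by intuition and trivial to get right by enumeration.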

Invariant

Invariants must be checkable from evidence you actually have (state + logs + counters).
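
A minimal sketch of an invariant written against evidence, assuming a hypothetical ledger whose state holds a `balance`, whose log holds `seq` and `delta` fields, and whose counters record how many entries were applied. None of these names come from a real system.

```python
def check_ledger_invariants(state, log, counters):
    """Return human-readable violations; an empty list means every check passed."""
    violations = []

    # State must be reconstructible from the log.
    replayed = sum(entry["delta"] for entry in log)
    if replayed != state["balance"]:
        violations.append(
            f"balance {state['balance']} != replayed log total {replayed}"
        )

    # Counters must agree with the log they summarize.
    if counters["applied"] != len(log):
        violations.append(
            f"applied counter {counters['applied']} != log length {len(log)}"
        )

    # Sequence numbers must be strictly increasing.
    seqs = [entry["seq"] for entry in log]
    if any(b <= a for a, b in zip(seqs, seqs[1:])):
        violations.append("log sequence numbers are not strictly increasing")

    return violations

log = [{"seq": 1, "delta": 50}, {"seq": 2, "delta": -20}]
healthy = check_ledger_invariants({"balance": 30}, log, {"applied": 2})
drifted = check_ledger_invariants({"balance": 35}, log, {"applied": 3})
```

Returning a list of violations rather than raising on the first one keeps the check usable both as a test assertion and as a production monitor.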

Security properties

Evidence: critical actions emit verifiable audit events.
Authenticity: actions are bound to identity and purpose.
Replay resistance: duplicated inputs do not change outcomes.
Integrity: invalid transitions are rejected (and detectable).
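
These four properties can be made concrete on a small state machine. The sketch below is illustrative only: the `Order` class, its states, and the request-id scheme are assumptions, and a real system would need durable storage and authenticated actors rather than in-memory dictionaries and plain strings.

```python
ALLOWED = {("created", "paid"), ("paid", "shipped"), ("created", "cancelled")}

class Order:
    def __init__(self):
        self.state = "created"
        self.seen = {}    # request_id -> outcome, used for replay resistance
        self.audit = []   # evidence: one event per decision

    def apply(self, request_id, actor, target):
        # Replay resistance: a duplicated request returns the recorded outcome
        # and changes nothing.
        if request_id in self.seen:
            return self.seen[request_id]

        # Integrity: only whitelisted transitions are applied.
        ok = (self.state, target) in ALLOWED
        outcome = "applied" if ok else "rejected"

        # Evidence and authenticity: record who asked for what, and the result.
        self.audit.append({
            "request_id": request_id, "actor": actor,
            "from": self.state, "to": target, "outcome": outcome,
        })
        if ok:
            self.state = target
        self.seen[request_id] = outcome
        return outcome
```

Rejected transitions are logged as well as applied ones, so an invalid attempt is detectable after the fact rather than silently dropped.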

Failure modes

Config drift that weakens security posture over time.
Observability gaps during incidents (missing evidence).
Mixed-version behavior that violates assumptions silently.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.

Pitfall

Caches tend to become sources of truth unless you can recompute and validate them.

Design sketch

flowchart LR
  spec["Spec (TLA+/PlusCal)"] --> mc["Model Check"]
  mc --> refine["Refinement / Invariants"]
  refine --> impl["Implementation (Rust/Go)"]
  impl --> tests["Fuzz / PBT / Differential"]
  tests --> spec

Implementation notes

Make the model executable enough to generate counterexamples quickly.

Rule of thumb

Bound work per request: parse, validate, and cap cost before you allocate heavy resources.
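
A sketch of this rule for a JSON request, using only the standard library. The limits `MAX_BODY_BYTES` and `MAX_ITEMS`, the `items` field, and the `RequestRejected` exception are hypothetical choices for the example; the point is the ordering of checks, cheapest first.

```python
import json

MAX_BODY_BYTES = 64 * 1024  # assumed limit; tune per endpoint
MAX_ITEMS = 100             # assumed limit on the amount of work per request

class RequestRejected(ValueError):
    """Raised before any expensive work has been done."""

def parse_request(raw: bytes) -> list:
    # 1. Cap size before parsing so a huge body costs almost nothing.
    if len(raw) > MAX_BODY_BYTES:
        raise RequestRejected("body too large")

    # 2. Parse; malformed or pathologically nested input is rejected, not crashed on.
    try:
        doc = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        raise RequestRejected("malformed JSON") from exc

    # 3. Validate shape and cap the work the request can trigger.
    items = doc.get("items") if isinstance(doc, dict) else None
    if not isinstance(items, list):
        raise RequestRejected("'items' must be a list")
    if len(items) > MAX_ITEMS:
        raise RequestRejected("too many items")
    if not all(isinstance(x, int) and not isinstance(x, bool) for x in items):
        raise RequestRejected("'items' must contain integers only")
    return items
```

Every rejection path runs in time proportional to the capped input size, so an adversarial request cannot force allocation of heavy resources before it has been validated.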

// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
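
One way to make a model executable in this sense is a small explicit-state checker that returns the action labels leading to a violation. The sketch below assumes a deliberately broken two-process lock in which checking the lock and taking it are separate steps; the model, labels, and function names are all illustrative.

```python
from collections import deque

def with_pc(pcs, i, pc):
    """Return a copy of the program-counter tuple with process i set to pc."""
    return pcs[:i] + (pc,) + pcs[i + 1:]

def next_states(state):
    """Yield (label, successor) pairs; state is (program counters, lock)."""
    pcs, lock = state
    for i in (0, 1):
        if pcs[i] == "idle" and lock == 0:
            yield f"p{i}.check", (with_pc(pcs, i, "ready"), lock)
        elif pcs[i] == "ready":
            yield f"p{i}.enter", (with_pc(pcs, i, "critical"), 1)
        elif pcs[i] == "critical":
            yield f"p{i}.exit", (with_pc(pcs, i, "idle"), 0)

def mutual_exclusion(state):
    return state[0] != ("critical", "critical")

def find_counterexample(init, invariant):
    """Breadth-first search; returns a shortest violating trace or None."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while parent[state] is not None:
                state, label = parent[state]
                trace.append(label)
            return trace[::-1]
        for label, nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None

trace = find_counterexample((("idle", "idle"), 0), mutual_exclusion)
# trace == ["p0.check", "p1.check", "p0.enter", "p1.enter"]
```

Because the search is breadth-first, the returned trace is already a shortest one, and its labels map directly onto steps a test harness can replay against the implementation.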

Verification strategy

Model checking bounded versions of the core protocol.
Property-based tests derived from invariants.
Differential tests against other implementations/specs.
Proof maintenance: keep models in CI with a time budget.
Refinement tests: compare model traces to implementation traces.
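
The differential and refinement items can be sketched together: drive a simple reference model and an optimized implementation with the same random operations and return the operation trace at the first divergence. `ModelQueue` and `RingQueue` are hypothetical stand-ins for your model and your implementation, and the standard-library `random` module stands in for a property-based testing framework.

```python
import random

class ModelQueue:
    """Reference model: obviously correct, not efficient."""
    def __init__(self, capacity):
        self.items, self.capacity = [], capacity
    def push(self, x):
        if len(self.items) == self.capacity:
            return False
        self.items.append(x)
        return True
    def pop(self):
        return self.items.pop(0) if self.items else None

class RingQueue:
    """Implementation under test: fixed-size ring buffer."""
    def __init__(self, capacity):
        self.buf, self.head, self.size = [None] * capacity, 0, 0
    def push(self, x):
        if self.size == len(self.buf):
            return False
        self.buf[(self.head + self.size) % len(self.buf)] = x
        self.size += 1
        return True
    def pop(self):
        if self.size == 0:
            return None
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.size -= 1
        return x

def differential_test(seed, steps=200, capacity=4):
    """Return the operation trace up to the first divergence, or None."""
    rng = random.Random(seed)  # seeded so any failure is replayable
    model, impl = ModelQueue(capacity), RingQueue(capacity)
    trace = []
    for _ in range(steps):
        if rng.random() < 0.5:
            value = rng.randrange(100)
            trace.append(("push", value))
            expected, actual = model.push(value), impl.push(value)
        else:
            trace.append(("pop",))
            expected, actual = model.pop(), impl.pop()
        if expected != actual:
            return trace
    return None
```

A small capacity is deliberate: it forces the full and empty boundary cases to occur often, which is where ring-buffer index arithmetic tends to go wrong.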

Operational notes

Keep a library of “known hard schedules” from past failures.
Treat counterexamples as incidents: track, root-cause, regression-test.
Run the model checker in CI with explicit timeouts and bounds.
Version properties and invariants like code; review changes carefully.
Use models to evaluate protocol upgrades before shipping.

Operational note

Make degraded modes explicit: fail closed vs fail open is a policy choice.

What to monitor

Retry/timeout rates by endpoint and client cohort.
Invariant violation rate (should be ~0).
Error budget burn + tail latency under load.
Admission-control / rate-limit rejections (by reason).
Authz failures and policy denials (unexpected spikes).

Rollback plan

Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Define an explicit rollback trigger (metrics + thresholds).
Prefer backward-compatible changes; avoid “flag day” upgrades.
Use canaries and staged rollout; stop early when signals degrade.
Keep dual-write / dual-verify windows where appropriate.
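
An explicit rollback trigger can be as small as a table of metrics and thresholds evaluated on every canary interval. The metric names and limits below are placeholders, and treating a missing metric as a breach is an assumption of this sketch (on the grounds that an observability gap during a rollout is itself a degraded signal), not a universal rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float  # rollback if the observed value exceeds this

# Placeholder triggers; real names and limits come from your own SLOs.
TRIGGERS = (
    Threshold("invariant_violations_per_min", 0.0),
    Threshold("error_rate", 0.02),
    Threshold("p99_latency_ms", 800.0),
)

def rollback_reasons(metrics, triggers=TRIGGERS):
    """Return the reasons to roll back; an empty list means keep going."""
    reasons = []
    for t in triggers:
        value = metrics.get(t.metric)
        if value is None:
            # Assumption: no evidence is treated the same as bad evidence.
            reasons.append(f"{t.metric}: missing (treated as a breach)")
        elif value > t.limit:
            reasons.append(f"{t.metric}: {value} > {t.limit}")
    return reasons

healthy = rollback_reasons(
    {"invariant_violations_per_min": 0, "error_rate": 0.001, "p99_latency_ms": 240}
)
degraded = rollback_reasons({"invariant_violations_per_min": 1, "error_rate": 0.001})
```

Returning the reasons, not just a boolean, gives the rollout log the evidence needed to reconstruct why a rollback fired.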

Evidence

Designing Data-Intensive Applications (Kleppmann) (1) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Jepsen (2) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.

Open questions

How will you keep models aligned during rapid iteration?
Which invariants are cheap enough to monitor in production?
Which properties are you currently assuming but not testing or proving?
What is the smallest model that reproduces your worst incident class?

Checklist

Failure modes enumerated with mitigations.
Safety properties stated as invariants.
Assumptions listed and reviewed.
Rollback plan rehearsed and automated.
Telemetry captures correctness signals.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
