Concurrency Testing in Rust: Loom, Schedules, and Determinism

Monthly research note. Theme: Formal Methods & Verification.

TL;DR

A focused memo on concurrency testing in Rust with Loom, explicit schedules, and determinism: define the model, state the properties, then design the system so those properties remain true under failure and adversaries.

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Refinement boundaries prevent spec drift between paper and code.
Write properties in plain language next to the formal statement.
Keep models small enough to run in seconds or they will rot.
Make boundaries boring: validate inputs, cap costs, and be deterministic where needed.
Bind security decisions to evidence (audit, invariants, telemetry).

Why this matters

Most catastrophic bugs are small: a missing condition, a stale variable, a rare interleaving.
Verification complements testing by exploring adversarial schedules systematically.
Counterexamples are better than intuition—they are executable bug reports.
Formal models force you to name assumptions (time, ordering, failure).

Key questions

How do you convert counterexamples into test harnesses?
What is the smallest model that still captures the bug class you fear?
What is the refinement boundary between spec and implementation?
How do you handle state explosion (symmetry, abstraction, bounds)?
Which invariants must hold under every interleaving and crash point?
How do you ensure proofs stay valid through refactors and upgrades?

Assumptions

Teams need workflows that keep models and code aligned over time.
Specifications omit details; implementations invent them. That gap is risk.
Adversaries choose the worst schedule, not the average one.
Most systems have implicit assumptions about timeouts and ordering.

Non-goals

Proving the whole system end-to-end with all implementation details.
Writing models that can’t produce counterexamples quickly.

Attack surface

Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.

Model & invariants

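A common way to state linearizability is the existence of a legal sequential history H_s that is equivalent to the concurrent history H_c (written H_s \sim H_c) and preserves its real-time order, meaning an operation that returned before another was invoked stays before it:

\exists H_s:\ H_s \text{ is sequential } \wedge\ H_s \sim H_c\ \wedge\ {<_{H_c}} \subseteq {<_{H_s}}.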
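
The definition above can be checked by brute force on very small histories. The following is a minimal sketch, assuming a single integer register and only completed operations; the names `Op`, `Event`, and `is_linearizable` are hypothetical and chosen for illustration. It searches for a sequential order that is legal for a register and respects real-time order. The search is exponential, so it only suits histories of a handful of operations.

```rust
/// One completed operation on a single integer register (hypothetical model).
#[derive(Clone, Copy, Debug)]
enum Op {
    Write(i64),
    Read(i64), // the value the read returned
}

#[derive(Clone, Copy, Debug)]
struct Event {
    op: Op,
    invoked: u64,  // logical time the call started
    returned: u64, // logical time the call finished
}

/// Returns true if some sequential order of `history` is legal for a register
/// and never places an operation ahead of one that returned before it was invoked.
fn is_linearizable(history: &[Event], initial: i64) -> bool {
    let mut used = vec![false; history.len()];
    search(history, &mut used, initial, 0)
}

fn search(history: &[Event], used: &mut [bool], value: i64, placed: usize) -> bool {
    if placed == history.len() {
        return true; // every operation found a slot in the sequential history
    }
    for i in 0..history.len() {
        if used[i] {
            continue;
        }
        // Real-time order: `i` may go next only if no other unplaced operation
        // returned strictly before `i` was invoked.
        let blocked = (0..history.len())
            .any(|j| j != i && !used[j] && history[j].returned < history[i].invoked);
        if blocked {
            continue;
        }
        // Sequential register semantics: a read must return the current value.
        let next = match history[i].op {
            Op::Write(v) => Some(v),
            Op::Read(v) if v == value => Some(value),
            Op::Read(_) => None,
        };
        if let Some(v) = next {
            used[i] = true;
            let ok = search(history, used, v, placed + 1);
            used[i] = false;
            if ok {
                return true;
            }
        }
    }
    false
}
```

With an initial value of 0, a read of 1 that overlaps a write of 1 is accepted, while a read of 0 that starts after the write of 1 has returned is rejected. The second case is the stale read that the real-time condition exists to rule out.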

Model the scheduler explicitly when concurrency is part of the threat model.

Keep the model small enough to run in seconds; large models rot.
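
An explicit scheduler can be very small. The sketch below is a toy in the spirit of what a tool such as Loom automates, not Loom's API: it uses only the standard library, and `State`, `step`, `explore`, and `check_all_schedules` are hypothetical names. It models two threads that each increment a shared counter with a separate load and store, enumerates every interleaving, and records the schedules that break the invariant "final counter equals 2".

```rust
#[derive(Clone, Debug)]
struct State {
    shared: u32,
    pc: [u8; 2],     // per thread: 0 = about to load, 1 = about to store, 2 = done
    local: [u32; 2], // per thread: the value it loaded
}

/// Runs one atomic step of thread `t`; returns None if that thread has finished.
fn step(s: &State, t: usize) -> Option<State> {
    let mut n = s.clone();
    match s.pc[t] {
        0 => n.local[t] = s.shared,     // load
        1 => n.shared = s.local[t] + 1, // store (possibly stale) value + 1
        _ => return None,
    }
    n.pc[t] += 1;
    Some(n)
}

/// Depth-first enumeration of every schedule; `bad` collects counterexamples.
fn explore(s: &State, schedule: &mut Vec<usize>, bad: &mut Vec<Vec<usize>>, total: &mut usize) {
    let mut progressed = false;
    for t in 0..2 {
        if let Some(n) = step(s, t) {
            progressed = true;
            schedule.push(t);
            explore(&n, schedule, bad, total);
            schedule.pop();
        }
    }
    if !progressed {
        // Both threads are done: this is one complete schedule.
        *total += 1;
        if s.shared != 2 {
            bad.push(schedule.clone());
        }
    }
}

/// Returns the number of schedules explored and the ones that violate the invariant.
fn check_all_schedules() -> (usize, Vec<Vec<usize>>) {
    let init = State { shared: 0, pc: [0, 0], local: [0, 0] };
    let (mut schedule, mut bad, mut total) = (Vec::new(), Vec::new(), 0);
    explore(&init, &mut schedule, &mut bad, &mut total);
    (total, bad)
}
```

Calling `check_all_schedules()` explores 6 schedules and reports 4 violations, the first being `[0, 1, 0, 1]`: both threads load 0 before either stores. A model of this size runs in well under a second, which is the point of keeping it small.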

Invariant

If the system can enter an invalid state, it eventually will—usually during an incident.

Security properties

Downgrade resistance: negotiation can’t silently weaken security posture.
Evidence: critical actions emit verifiable audit events.
Replay resistance: duplicated inputs do not change outcomes.
Authenticity: actions are bound to identity and purpose.
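
Replay resistance is easy to state as an executable property. The following is a minimal sketch, assuming each request carries a unique identifier; `Ledger`, `apply`, and `request_id` are hypothetical names, not part of any library. A duplicated request returns the outcome recorded for the first delivery and does not change state again.

```rust
use std::collections::HashMap;

/// Hypothetical state machine that deduplicates requests by id.
struct Ledger {
    balance: i64,
    seen: HashMap<u64, i64>, // request id -> balance returned for that request
}

impl Ledger {
    fn new() -> Self {
        Ledger { balance: 0, seen: HashMap::new() }
    }

    /// Applies `delta` once per `request_id`; a replay returns the original outcome.
    fn apply(&mut self, request_id: u64, delta: i64) -> i64 {
        if let Some(&outcome) = self.seen.get(&request_id) {
            return outcome; // duplicate delivery: no state change
        }
        self.balance += delta;
        self.seen.insert(request_id, self.balance);
        self.balance
    }
}
```

In this sketch, applying request 1 with a delta of 10 twice leaves the balance at 10. The `seen` map grows without bound, so a real design would also need an expiry or compaction policy, which is outside the scope of this example.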

Failure modes

Mixed-version behavior that violates assumptions silently.
Observability gaps during incidents (missing evidence).
Config drift that weakens security posture over time.
Recovery paths that only work when nothing is broken.

Pitfall

Caches tend to become sources of truth unless you can recompute and validate them.

Design sketch

flowchart TD
  props["Properties"] --> inv["Invariants"]
  inv --> model["Model"]
  model --> cex["Counterexamples"]
  cex --> tests["Regression Tests"]
  tests --> model

Implementation notes

Keep refinement boundaries explicit: what the spec promises vs what code enforces.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

// Practical tip: make the model "executable" enough to emit traces you can replay.
// Then treat traces as regression inputs for your implementation.
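
One way to act on this tip is to express implementation threads as step machines and drive them with a recorded schedule. The sketch below assumes a trace is a list of thread ids in execution order; `Task`, `RacyIncrement`, `AtomicIncrement`, and `replay` are hypothetical names used only for illustration. The same trace is replayed against a racy and a fixed implementation, so the counterexample becomes a deterministic regression input.

```rust
/// A thread of the implementation, expressed as a resumable step machine.
trait Task {
    /// Runs one atomic step against the shared value; returns false once finished.
    fn step(&mut self, shared: &mut u32) -> bool;
}

/// Increment split into a load and a store, so another thread can run in between.
struct RacyIncrement {
    pc: u8,
    local: u32,
}

impl Task for RacyIncrement {
    fn step(&mut self, shared: &mut u32) -> bool {
        match self.pc {
            0 => { self.local = *shared; self.pc = 1; true }
            1 => { *shared = self.local + 1; self.pc = 2; true }
            _ => false,
        }
    }
}

/// Increment performed as a single indivisible step.
struct AtomicIncrement {
    done: bool,
}

impl Task for AtomicIncrement {
    fn step(&mut self, shared: &mut u32) -> bool {
        if self.done {
            return false;
        }
        *shared += 1;
        self.done = true;
        true
    }
}

/// Replays a recorded schedule, then lets any unfinished task run to completion
/// so the final state is well defined even if the trace is shorter than the run.
fn replay(schedule: &[usize], tasks: &mut [Box<dyn Task>]) -> u32 {
    let mut shared = 0;
    for &t in schedule {
        tasks[t].step(&mut shared);
    }
    for task in tasks.iter_mut() {
        while task.step(&mut shared) {}
    }
    shared
}
```

Replaying the trace `[0, 1, 0, 1]` against two `RacyIncrement` tasks yields 1 (the lost update), while the same trace against two `AtomicIncrement` tasks yields 2. Steps scheduled for an already finished task are simply no-ops.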

Verification strategy

Proof maintenance: keep models in CI with a time budget.
Runtime assertions for invariants that are cheap to check.
Refinement tests: compare model traces to implementation traces.
Differential tests against other implementations/specs.
Property-based tests derived from invariants.
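
Differential and property-based testing combine naturally. The following sketch, which uses only the standard library, compares a hypothetical fixed-capacity `Ring` against `VecDeque` as the reference while also checking the invariant that the length never exceeds the capacity. The small xorshift generator is an assumption made to keep the example dependency-free; a property-testing crate would normally supply generation and shrinking.

```rust
use std::collections::VecDeque;

/// Fixed-capacity FIFO under test (hypothetical implementation).
struct Ring {
    buf: Vec<u32>,
    head: usize,
    len: usize,
}

impl Ring {
    fn new(cap: usize) -> Self {
        Ring { buf: vec![0; cap], head: 0, len: 0 }
    }

    /// Returns false when the buffer is full.
    fn push(&mut self, v: u32) -> bool {
        if self.len == self.buf.len() {
            return false;
        }
        let tail = (self.head + self.len) % self.buf.len();
        self.buf[tail] = v;
        self.len += 1;
        true
    }

    fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        let v = self.buf[self.head];
        self.head = (self.head + 1) % self.buf.len();
        self.len -= 1;
        Some(v)
    }
}

/// Small xorshift generator so every failure is reproducible from its seed.
fn next(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Runs `steps` random operations; returns the index of the first divergence, if any.
fn differential(seed: u64, cap: usize, steps: usize) -> Option<usize> {
    let mut rng = seed.max(1); // xorshift must not start at zero
    let mut ring = Ring::new(cap);
    let mut reference: VecDeque<u32> = VecDeque::new();
    for i in 0..steps {
        let r = next(&mut rng);
        let agree = if r % 2 == 0 {
            let v = (r >> 8) as u32;
            let expected = reference.len() < cap; // reference accepts only below capacity
            if expected {
                reference.push_back(v);
            }
            ring.push(v) == expected
        } else {
            ring.pop() == reference.pop_front()
        };
        // Invariant: length never exceeds capacity and matches the reference.
        if !agree || ring.len > cap || ring.len != reference.len() {
            return Some(i);
        }
    }
    None
}
```

A call such as `differential(7, 4, 1000)` returns `None` when the two implementations agree on every step. A returned index, together with the seed, is enough to reproduce the failing sequence.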

Operational notes

Keep a library of “known hard schedules” from past failures.
Run the model checker in CI with explicit timeouts and bounds.
Use models to evaluate protocol upgrades before shipping.
Treat counterexamples as incidents: track, root-cause, regression-test.
Version properties and invariants like code; review changes carefully.
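
Explicit timeouts and bounds can be made part of the checker's result type, so CI distinguishes "all schedules passed" from "the budget ran out". The sketch below is a standard-library toy rather than Loom's interface; `Outcome`, `explore_bounded`, and their parameters are hypothetical names. Loom's model builder exposes comparable bounds, and its documentation is the place to confirm the exact option names.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Outcome {
    Exhausted { schedules: usize },      // every interleaving was checked
    Violation { schedule: Vec<usize> },  // first counterexample found
    BudgetExceeded { schedules: usize }, // stopped early: result is inconclusive
}

fn dfs(
    remaining: &mut Vec<usize>,
    schedule: &mut Vec<usize>,
    seen: &mut usize,
    max_schedules: usize,
    deadline: Instant,
    holds: &dyn Fn(&[usize]) -> bool,
) -> Option<Outcome> {
    if remaining.iter().all(|&r| r == 0) {
        // A complete schedule: enforce the budget before checking it.
        if *seen >= max_schedules || Instant::now() >= deadline {
            return Some(Outcome::BudgetExceeded { schedules: *seen });
        }
        *seen += 1;
        if !holds(schedule) {
            return Some(Outcome::Violation { schedule: schedule.clone() });
        }
        return None;
    }
    for t in 0..remaining.len() {
        if remaining[t] == 0 {
            continue;
        }
        remaining[t] -= 1;
        schedule.push(t);
        let stop = dfs(remaining, schedule, seen, max_schedules, deadline, holds);
        schedule.pop();
        remaining[t] += 1;
        if stop.is_some() {
            return stop;
        }
    }
    None
}

/// Enumerates interleavings of threads with the given step counts, calling
/// `holds` on each complete schedule until done, violated, or out of budget.
fn explore_bounded(
    steps_per_thread: &[usize],
    max_schedules: usize,
    time_limit: Duration,
    holds: &dyn Fn(&[usize]) -> bool,
) -> Outcome {
    let mut remaining = steps_per_thread.to_vec();
    let mut schedule = Vec::new();
    let mut seen = 0;
    let deadline = Instant::now() + time_limit;
    match dfs(&mut remaining, &mut schedule, &mut seen, max_schedules, deadline, holds) {
        Some(outcome) => outcome,
        None => Outcome::Exhausted { schedules: seen },
    }
}
```

For two threads of two steps each, a generous budget yields `Exhausted { schedules: 6 }`, while `max_schedules = 3` yields `BudgetExceeded { schedules: 3 }`. Treating `BudgetExceeded` as a CI failure, or at least a warning, keeps a silently truncated search from being read as a pass.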

Operational note

Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.

What to monitor

Admission-control / rate-limit rejections (by reason).
Invariant violation rate (should be ~0).
Authz failures and policy denials (unexpected spikes).
Error budget burn + tail latency under load.
Rollback events and the conditions that triggered them.

Rollback plan

Prefer backward-compatible changes; avoid “flag day” upgrades.
Keep dual-write / dual-verify windows where appropriate.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Use canaries and staged rollout; stop early when signals degrade.
Define an explicit rollback trigger (metrics + thresholds).
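
An explicit rollback trigger can be written as a pure function of a metrics window and a set of thresholds, which makes it testable like any other invariant. This is a minimal sketch; `Window`, `Thresholds`, `Trigger`, and the specific metrics are hypothetical choices for illustration, not recommended values.

```rust
#[derive(Debug, PartialEq)]
enum Trigger {
    InvariantViolated,
    ErrorRate,
    TailLatency,
}

/// Hypothetical limits agreed on before the rollout starts.
struct Thresholds {
    max_errors_per_million: u64,
    max_p99_ms: u64,
}

/// Hypothetical metrics observed over one evaluation window.
struct Window {
    requests: u64,
    errors: u64,
    p99_ms: u64,
    invariant_violations: u64,
}

/// Returns the first reason to roll back, or None if the window is healthy.
fn rollback_trigger(w: &Window, t: &Thresholds) -> Option<Trigger> {
    // Any invariant violation is a correctness signal and takes priority.
    if w.invariant_violations > 0 {
        return Some(Trigger::InvariantViolated);
    }
    // Compare errors/requests against the limit without division;
    // u128 avoids overflow in the cross-multiplication.
    let observed = (w.errors as u128) * 1_000_000;
    let allowed = (t.max_errors_per_million as u128) * (w.requests as u128);
    if observed > allowed {
        return Some(Trigger::ErrorRate);
    }
    if w.p99_ms > t.max_p99_ms {
        return Some(Trigger::TailLatency);
    }
    None
}
```

Because the function has no side effects, the thresholds and the decision logic can be reviewed and versioned alongside the properties they protect.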

Evidence

Learn TLA+ (1) — Practical workflow and examples.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
Jepsen (2) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.

Open questions

Which properties are you currently assuming but not testing or proving?
Which invariants are cheap enough to monitor in production?
What is the smallest model that reproduces your worst incident class?
How will you keep models aligned during rapid iteration?

Checklist

Rollback plan rehearsed and automated.
Safety properties stated as invariants.
Failure modes enumerated with mitigations.
Telemetry captures correctness signals.
Assumptions listed and reviewed.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
