Monthly research note. Theme: Blockchain Protocols.
TL;DR
A focused memo on "The Ledger as a State Machine: Execution, Determinism, and Reproducibility". Define the model, state the properties, then design the system so those properties still hold under failure and adversaries.
Treat “timeouts” as a third outcome: not success, not failure—ambiguity you must model.
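A minimal sketch of the three-outcome idea, assuming a client that submits transactions to a remote node. The names `SubmitOutcome`, `NextStep`, and `next_step` are hypothetical and only illustrate that a timeout gets its own variant and its own handling path.

```rust
use std::time::Duration;

/// Result of submitting a transaction to a remote node (hypothetical type).
/// `Unknown` is the timeout case: the request may or may not have been applied.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SubmitOutcome {
    Applied { tx_id: u64 },
    Rejected { reason: String },
    Unknown { waited: Duration },
}

/// What the caller should do next (an illustrative policy, not a standard one).
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    Done,
    Abort,
    /// Look up the transaction by its identifier before resending,
    /// so an already-applied request is not applied twice.
    QueryThenRetry,
}

/// Maps each outcome to a distinct action; the timeout is never folded
/// into either success or failure.
pub fn next_step(outcome: &SubmitOutcome) -> NextStep {
    match outcome {
        SubmitOutcome::Applied { .. } => NextStep::Done,
        SubmitOutcome::Rejected { .. } => NextStep::Abort,
        SubmitOutcome::Unknown { .. } => NextStep::QueryThenRetry,
    }
}
```

Because the match is exhaustive, adding a fourth outcome later forces every caller to decide how to handle it.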
Key takeaways
- Finality guarantees are user security guarantees—document and enforce them.
- Topology attacks (eclipse/partition) change security outcomes; harden peer selection.
- Mempools are adversarial schedulers: admission and fairness are protocol concerns.
- Bind security decisions to evidence (audit, invariants, telemetry).
- Make failure modes explicit and observable.
Why this matters
- Bridges reintroduce trust; you must model it explicitly.
- Topology attacks (eclipse, partition) change who sees which transactions.
- Light clients shift assumptions; they must be written down.
- Finality guarantees are user security guarantees; ambiguity is a UX vulnerability.
Key questions
- Which invariants need proofs (supply, balances, ordering, slashing)?
- Where is the economic/DoS pressure applied (mempool, gossip, execution, storage)?
- What is the determinism story (byte-for-byte re-execution across platforms)?
- What is the finality guarantee users can rely on (and when does it break)?
- How do you defend against topology attacks (eclipse, partition, sybil)?
- What is the reorg budget for applications and how do you communicate it?
Assumptions
- Users and apps rely on probabilistic finality until proven otherwise.
- Attackers can buy bandwidth and compute; they can also bribe and censor.
- Nodes are heterogeneous; determinism must survive platform differences.
- Peers are untrusted; gossip can be manipulated for delay or isolation.
Non-goals
- Allowing execution nondeterminism for performance convenience.
- Treating mempool policy as “local preference” when it affects security.
Any unbounded work per request becomes a DoS primitive under adversaries.
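One common way to bound work per request is to charge every operation against a fixed budget. The sketch below assumes a hypothetical `Meter` type and a toy cost model (one unit per byte). It is not taken from any particular chain's gas schedule.

```rust
/// Hypothetical per-transaction work meter: every operation charges units
/// against a fixed budget, so no input can force unbounded work.
#[derive(Debug)]
pub struct Meter {
    remaining: u64,
}

/// Error returned when a charge would exceed the remaining budget.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBudget;

impl Meter {
    pub fn new(budget: u64) -> Self {
        Meter { remaining: budget }
    }

    /// Charges `cost` units; fails and leaves the budget unchanged if it is too small.
    pub fn charge(&mut self, cost: u64) -> Result<(), OutOfBudget> {
        match self.remaining.checked_sub(cost) {
            Some(left) => {
                self.remaining = left;
                Ok(())
            }
            None => Err(OutOfBudget),
        }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}

/// Example metered operation: summing a payload costs one unit per byte (toy cost model).
pub fn metered_sum(payload: &[u8], meter: &mut Meter) -> Result<u64, OutOfBudget> {
    let mut total = 0u64;
    for byte in payload {
        meter.charge(1)?; // pay before doing the work
        total += u64::from(*byte);
    }
    Ok(total)
}
```

Charging before each unit of work means the worst case is bounded by the budget, not by the size of the input.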
Model & invariants
A ledger is a replicated state machine. Safety is uniqueness of finalized history: honest nodes never finalize conflicting blocks at the same height.
Explicitly model upgrade boundaries: old rules vs new rules during transition.
Treat reorgs as a user-visible security event; encode reorg-aware semantics.
If the system can enter an invalid state, it eventually will—usually during an incident.
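The safety invariant can be enforced locally as a guard, not only stated. The sketch below assumes a hypothetical `FinalizedHistory` record keyed by height. It checks only "one block per height" and deliberately ignores parent links and signatures.

```rust
use std::collections::BTreeMap;

/// Block identifier; 32 bytes to match a typical hash width.
pub type BlockHash = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum FinalityError {
    /// A different block was already finalized at this height: a safety violation.
    Conflict { height: u64 },
}

/// Records finalized blocks by height and refuses conflicting entries (hypothetical type).
#[derive(Default)]
pub struct FinalizedHistory {
    by_height: BTreeMap<u64, BlockHash>,
}

impl FinalizedHistory {
    /// Finalizing the same block twice is a no-op; a different block at an
    /// already-finalized height is rejected and the record is left unchanged.
    pub fn finalize(&mut self, height: u64, hash: BlockHash) -> Result<(), FinalityError> {
        if let Some(existing) = self.by_height.get(&height) {
            return if *existing == hash {
                Ok(())
            } else {
                Err(FinalityError::Conflict { height })
            };
        }
        self.by_height.insert(height, hash);
        Ok(())
    }
}
```

A `Conflict` here is the kind of event that should halt the node and raise an alert; it should never be logged and skipped.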
Security properties
- Replay resistance: duplicated inputs do not change outcomes.
- Downgrade resistance: negotiation can’t silently weaken security posture.
- Evidence: critical actions emit verifiable audit events.
- Least authority: privileges are scoped by purpose and time.
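Replay resistance from the list above is often implemented with per-sender sequence numbers. The following is a sketch under that assumption. The `Accounts` type and its strict "nonce must equal the next expected value" rule are illustrative, and signature checks are omitted.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    /// The nonce is not the next expected one, so the input is a replay or out of order.
    BadNonce { expected: u64, got: u64 },
}

/// Tracks the next expected nonce per sender (hypothetical type).
#[derive(Default)]
pub struct Accounts {
    next_nonce: BTreeMap<String, u64>,
    applied: u64,
}

impl Accounts {
    /// Applies a transaction only if its nonce matches; a duplicate leaves state unchanged.
    pub fn apply(&mut self, sender: &str, nonce: u64) -> Result<(), ApplyError> {
        let expected = self.next_nonce.get(sender).copied().unwrap_or(0);
        if nonce != expected {
            return Err(ApplyError::BadNonce { expected, got: nonce });
        }
        self.next_nonce.insert(sender.to_string(), expected + 1);
        self.applied += 1;
        Ok(())
    }

    /// Number of transactions that changed state.
    pub fn applied(&self) -> u64 {
        self.applied
    }
}
```

Submitting the same `(sender, nonce)` pair twice changes state once; the second attempt is rejected with the expected value, which also makes the rejection reason measurable.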
Failure modes
- Timeout ambiguity causing double-apply or partial state transitions.
- Mixed-version behavior that violates assumptions silently.
- Recovery paths that only work when nothing is broken.
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
A recovery plan that isn’t exercised will fail when you need it.
Design sketch
flowchart TD
tx["Transaction"] --> mp["Mempool (admission + prioritization)"]
mp --> prop["Block Proposal"]
prop --> cons["Consensus / Finality"]
cons --> exec["Deterministic Execution"]
exec --> root["State Root Commitment"]Implementation notes
Determinism is a boundary: every nondeterministic input is an attack surface.
Make rollbacks boring: if rollback is a hero move, it will fail.
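// Deterministic execution is a security boundary.
pub trait Executor {
    /// Applies one encoded block; the same bytes must yield the same result on every node.
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
    /// Returns the 32-byte commitment to the current state.
    fn state_root(&self) -> [u8; 32];
}
// Avoid nondeterminism: time, RNG, unordered maps, floating-point.
Verification strategy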
- Fuzzing transaction decoding and state transition edge cases.
- Adversarial mempool tests: spam, pinning, worst-case signature patterns.
- Determinism tests across architectures (x86/ARM) and OSes.
- Formal invariants for supply/balance conservation where appropriate.
- Fork/reorg simulations: application-facing invariants under reorgs.
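A determinism and conservation check from the list above can be sketched as follows. Everything here is an assumption for illustration: the toy account model, the `replay` and `state_digest` helpers, and the use of FNV-1a as a stand-in for a cryptographic hash. The point is that ordered iteration and fixed-width little-endian encoding keep the digest independent of insertion order and platform.

```rust
use std::collections::BTreeMap;

/// FNV-1a 64-bit offset basis.
const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;

/// FNV-1a 64-bit: a small, platform-independent hash used only as a stand-in.
/// A real ledger would use a cryptographic hash here.
fn fnv1a(bytes: &[u8], mut hash: u64) -> u64 {
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01B3);
    }
    hash
}

/// Digest of a toy state (account name -> balance). `BTreeMap` iterates in
/// key order, so the result does not depend on insertion order.
pub fn state_digest(state: &BTreeMap<String, u64>) -> u64 {
    let mut h = FNV_OFFSET;
    for (account, balance) in state {
        // Length-prefix the key so adjacent fields cannot run together.
        h = fnv1a(&(account.len() as u64).to_le_bytes(), h);
        h = fnv1a(account.as_bytes(), h);
        // Fixed little-endian encoding avoids byte-order differences.
        h = fnv1a(&balance.to_le_bytes(), h);
    }
    h
}

/// Replays transfers `(from, to, amount)` in order on top of `genesis`.
/// Transfers that exceed the sender's balance are skipped.
pub fn replay(genesis: &[(&str, u64)], transfers: &[(&str, &str, u64)]) -> BTreeMap<String, u64> {
    let mut state: BTreeMap<String, u64> =
        genesis.iter().map(|(k, v)| (k.to_string(), *v)).collect();
    for (from, to, amount) in transfers {
        let from_balance = state.get(*from).copied().unwrap_or(0);
        if from_balance < *amount {
            continue;
        }
        state.insert(from.to_string(), from_balance - *amount);
        *state.entry(to.to_string()).or_insert(0) += *amount;
    }
    state
}
```

A test harness would replay the same transfer log from differently ordered genesis inputs (and on different architectures), then assert that the digests match and that the sum of balances equals the genesis supply.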
Operational notes
- Keep execution resource limits explicit and enforced.
- Protect peer tables against eclipse attempts (diversity, scoring, rotation).
- Measure invalid-transaction rejection reasons and rates; a shift in the mix is a spam signature.
- Rehearse upgrades with mixed versions and rollback paths.
- Monitor reorg depth and frequency; treat increases as incidents.
Keep audit and config history queryable during incidents—evidence beats intuition.
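The reorg-depth signal from the list above can be computed by comparing the previous canonical chain with the new one. This is a sketch that assumes chains are available as slices of block identifiers ordered from genesis to tip; `reorg_depth` and `exceeds_budget` are hypothetical helper names.

```rust
/// Depth of a reorg: how many blocks of the old canonical chain were replaced.
/// Both chains are ordered from genesis to tip; identifiers are simplified to `u64`.
pub fn reorg_depth(old_chain: &[u64], new_chain: &[u64]) -> usize {
    // Length of the shared prefix of the two chains.
    let common = old_chain
        .iter()
        .zip(new_chain.iter())
        .take_while(|(a, b)| a == b)
        .count();
    old_chain.len() - common
}

/// True when a reorg is deeper than the budget the application was told to expect.
pub fn exceeds_budget(depth: usize, budget: usize) -> bool {
    depth > budget
}
```

A pure extension of the chain yields depth 0, so only replaced blocks count toward the alert.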
What to monitor
- Rollback events and the conditions that triggered them.
- Admission-control / rate-limit rejections (by reason).
- Authz failures and policy denials (unexpected spikes).
- Error budget burn + tail latency under load.
- Invariant violation rate (should be ~0).
Rollback plan
- Define an explicit rollback trigger (metrics + thresholds).
- Keep dual-write / dual-verify windows where appropriate.
- Use canaries and staged rollout; stop early when signals degrade.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
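An explicit rollback trigger can be written as a pure function over observed signals, which makes it easy to test and rehearse. The field names and thresholds below are assumptions for illustration, not recommended values.

```rust
/// Snapshot of health signals during a staged rollout (field names are hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct Signals {
    pub invariant_violations: u64,
    pub error_rate_ppm: u32,
    pub p99_latency_ms: u32,
}

/// Thresholds agreed before the rollout starts (values are up to the operator).
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub max_error_rate_ppm: u32,
    pub max_p99_latency_ms: u32,
}

/// Returns the reason to roll back, or `None` if the rollout may continue.
/// Any invariant violation triggers a rollback regardless of the other signals.
pub fn rollback_reason(s: Signals, t: Thresholds) -> Option<&'static str> {
    if s.invariant_violations > 0 {
        return Some("invariant violation");
    }
    if s.error_rate_ppm > t.max_error_rate_ppm {
        return Some("error rate above threshold");
    }
    if s.p99_latency_ms > t.max_p99_latency_ms {
        return Some("tail latency above threshold");
    }
    None
}
```

Returning the reason alongside the decision gives the audit log a record of why a rollback fired.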
Evidence
- Learn TLA+ (1) — Practical entry point for specification and model checking.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
- Site Reliability Engineering (Google) (2) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.
Open questions
- How do you communicate finality uncertainty to users without lying?
- Where does your implementation accidentally depend on local wall-clock time?
- What is the worst-case work a single transaction can force?
- Which invariants should be proven vs tested vs monitored?
Checklist
- Failure modes enumerated with mitigations.
- Telemetry captures correctness signals.
- Rollback plan rehearsed and automated.
- Safety properties stated as invariants.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Assumptions listed and reviewed.
Further reading
- Bitcoin: A Peer-to-Peer Electronic Cash System — The original replicated-ledger model and threat assumptions.
- Ethereum Yellow Paper — A formal-ish specification for execution and state transitions.
- EIP-1559 — Fee market mechanics and incentive surfaces.
- Learn TLA+ — Practical entry point for specification and model checking.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.