Monthly research note. Theme: Blockchain Protocols.
TL;DR
Treat state commitments (Merkle, Verkle) and their proof sizes as an engineering constraint: write down assumptions, make invariants executable, and design operational recovery as part of correctness.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
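Here is a small sketch that makes proof size concrete. It assumes a complete k-ary Merkle tree, 32-byte hashes, and single-leaf proofs without multiproof compression. `tree_depth` and `merkle_proof_bytes` are illustrative names. Each level of an inclusion proof carries k - 1 sibling hashes, so a wider tree has a shorter path but heavier steps. Verkle trees target that tradeoff by replacing sibling hashes with vector-commitment openings, so proof size stops growing with width.

```rust
/// Depth of a complete tree with the given arity holding `leaves` leaves:
/// the smallest d such that arity^d >= leaves.
fn tree_depth(leaves: u64, arity: u64) -> u32 {
    assert!(arity >= 2, "arity must be at least 2");
    let mut depth = 0;
    let mut capacity: u64 = 1;
    while capacity < leaves {
        capacity = capacity.saturating_mul(arity);
        depth += 1;
    }
    depth
}

/// Bytes in a single-leaf inclusion proof: (arity - 1) sibling hashes per level.
fn merkle_proof_bytes(leaves: u64, arity: u64, hash_len: u64) -> u64 {
    u64::from(tree_depth(leaves, arity)) * (arity - 1) * hash_len
}

fn main() {
    let leaves = 1u64 << 20; // about one million keys
    for arity in [2u64, 16] {
        println!(
            "arity {:>2}: depth {:>2}, proof {:>4} bytes",
            arity,
            tree_depth(leaves, arity),
            merkle_proof_bytes(leaves, arity, 32)
        );
    }
}
```

For 2^20 leaves this gives depth 20 and 640 bytes at arity 2, versus depth 5 and 2400 bytes at arity 16. Arity 16 is the branching factor of Ethereum's hexary Patricia trie. Real proofs in that trie are larger still because they carry whole encoded nodes.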
Key takeaways
- Topology attacks (eclipse/partition) change security outcomes; harden peer selection.
- Mempools are adversarial schedulers: admission and fairness are protocol concerns.
- Upgrades must be compatibility-aware: mixed rulesets are a threat model.
- Measure correctness signals, not only latency/throughput.
- Automate guardrails; humans are for judgment, not for consistent enforcement.
Why this matters
- Mempools are an attack surface: spam, pinning, and incentive manipulation.
- Finality guarantees are user security guarantees; ambiguity is a UX vulnerability.
- Bridges reintroduce trust; you must model it explicitly.
- State growth is a security problem: it impacts decentralization and verification.
Key questions
- How do upgrades change security assumptions (fork choice, state transition rules)?
- What is the determinism story (byte-for-byte re-execution across platforms)?
- What is the reorg budget for applications and how do you communicate it?
- Which invariants need proofs (supply, balances, ordering, slashing)?
- Where do you enforce resource limits (gas, bandwidth, storage, signature checks)?
- How do you defend against topology attacks (eclipse, partition, sybil)?
Assumptions
- Upgrades happen under partial adoption; mixed-version is inevitable.
- Users and apps rely on probabilistic finality until proven otherwise.
- Peers are untrusted; gossip can be manipulated for delay or isolation.
- Nodes are heterogeneous; determinism must survive platform differences.
Non-goals
- Treating mempool policy as “local preference” when it affects security.
- Relying on client-side heuristics to paper over protocol ambiguity.
Negotiation and fallbacks are where security silently becomes optional—treat them as hostile.
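Here is a minimal sketch of fail-closed version negotiation. `negotiate` and its version numbers are hypothetical stand-ins. It picks the highest version both peers support and returns an error rather than falling back below a policy minimum. This covers only local policy. An on-path attacker who strips options from the offer is caught only if the handshake transcript is authenticated, as TLS 1.3 does with its Finished messages.

```rust
/// Picks the highest protocol version both sides support, but refuses to go
/// below `min_acceptable` instead of silently falling back.
fn negotiate(local: &[u32], remote: &[u32], min_acceptable: u32) -> Result<u32, String> {
    let best = local.iter().filter(|v| remote.contains(v)).max().copied();
    match best {
        Some(v) if v >= min_acceptable => Ok(v),
        Some(v) => Err(format!(
            "refusing downgrade to v{v}; policy minimum is v{min_acceptable}"
        )),
        None => Err("no common protocol version".to_string()),
    }
}

fn main() {
    // Hypothetical versions: v3 is current, v2 still allowed, v1 is deprecated.
    println!("{:?}", negotiate(&[1, 2, 3], &[2, 3], 2)); // Ok(3)
    println!("{:?}", negotiate(&[1, 2, 3], &[1], 2)); // Err: refusing downgrade to v1
}
```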
Model & invariants
A simple resource-admission constraint:
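As a sketch, assume gas-metered execution. Let $g(tx)$ be the gas a transaction consumes, $L(tx)$ the gas limit it declares, and $G_{\max}$ the per-block limit (notation chosen here for illustration). A block $B$ is then admissible only if every transaction stays within its own limit and the block stays within its budget:

$$
g(tx) \le L(tx) \quad \forall\, tx \in B, \qquad \sum_{tx \in B} g(tx) \le G_{\max}
$$

A proposer that must decide before executing can check the stronger condition $\sum_{tx \in B} L(tx) \le G_{\max}$. That condition implies the block bound above because $g(tx) \le L(tx)$.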
Separate consensus safety from execution safety; both must hold.
Treat reorgs as a user-visible security event; encode reorg-aware semantics.
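One way to encode reorg-aware semantics is to make the status an application sees explicit about its assumptions. In this sketch `TxStatus`, `classify`, and the budget value are hypothetical. `Settled` means deeper than the stated reorg budget, not absolute finality. The sketch tracks heights only. A real implementation would also check that the block hash at the inclusion height is still canonical, since a reorg can replace a block without changing the height.

```rust
/// Application-facing status of a transaction, relative to a stated reorg budget.
#[derive(Debug, PartialEq)]
enum TxStatus {
    /// Not in the canonical chain (never included, or orphaned by a reorg).
    Unconfirmed,
    /// Included, but shallow enough that a reorg within budget could remove it.
    AtRisk { confirmations: u64 },
    /// Deeper than the reorg budget; losing it would itself be an incident.
    /// This is a policy statement, not absolute finality.
    Settled,
}

/// Classifies a transaction from its inclusion height (if any) and the current tip.
/// `reorg_budget` is the deepest reorg the application is designed to tolerate.
fn classify(included_at: Option<u64>, tip: u64, reorg_budget: u64) -> TxStatus {
    match included_at {
        None => TxStatus::Unconfirmed,
        // Tip is below the inclusion height: the chain reorganized under us.
        Some(h) if h > tip => TxStatus::Unconfirmed,
        Some(h) => {
            let confirmations = tip - h + 1;
            if confirmations > reorg_budget {
                TxStatus::Settled
            } else {
                TxStatus::AtRisk { confirmations }
            }
        }
    }
}

fn main() {
    let budget = 6; // hypothetical: tolerate reorgs up to 6 blocks deep
    for tip in [99, 100, 105, 106] {
        println!("tip {tip}: {:?}", classify(Some(100), tip, budget));
    }
}
```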
Invariants must be checkable from evidence you actually have (state + logs + counters).
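This sketch is one example of an invariant checkable from available evidence. It checks supply conservation using current balances (state) and cumulative mint/burn counters (logs or metrics). `check_supply` and the counter names are hypothetical. Checked arithmetic makes an overflow surface as a violation rather than wrapping. The `BTreeMap` keeps iteration order deterministic.

```rust
use std::collections::BTreeMap;

/// Checks supply conservation: sum(balances) == genesis + minted - burned.
/// Returns the observed total on success.
fn check_supply(
    balances: &BTreeMap<String, u128>,
    genesis_supply: u128,
    minted: u128,
    burned: u128,
) -> Result<u128, String> {
    // Checked arithmetic: an overflow is itself an invariant violation.
    let observed = balances
        .values()
        .try_fold(0u128, |acc, &b| acc.checked_add(b))
        .ok_or("balance sum overflowed u128")?;
    let expected = genesis_supply
        .checked_add(minted)
        .and_then(|s| s.checked_sub(burned))
        .ok_or("mint/burn counters are inconsistent")?;
    if observed == expected {
        Ok(observed)
    } else {
        Err(format!("supply mismatch: observed {observed}, expected {expected}"))
    }
}

fn main() {
    let mut balances = BTreeMap::new();
    balances.insert("alice".to_string(), 60u128);
    balances.insert("bob".to_string(), 40u128);
    // Hypothetical counters: 90 at genesis, 15 minted, 5 burned.
    println!("{:?}", check_supply(&balances, 90, 15, 5)); // Ok(100)
}
```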
Security properties
- Evidence: critical actions emit verifiable audit events.
- Replay resistance: duplicated inputs do not change outcomes.
- Downgrade resistance: negotiation can’t silently weaken security posture.
- Authenticity: actions are bound to identity and purpose.
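Account-model chains often enforce replay resistance with per-sender sequential nonces. Here is a minimal sketch, with `NonceTracker` as a hypothetical name. A nonce is accepted only if it equals the sender's next expected value. A rejected input leaves state untouched, so delivering the same transaction twice cannot change the outcome. UTXO chains get the same property from spent-output tracking instead. Cross-chain replay additionally requires binding signatures to a chain identifier (as EIP-155 does), which is not shown.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum NonceError {
    /// Nonce already consumed: a replayed or duplicated transaction.
    Replay { expected: u64, got: u64 },
    /// Nonce skips ahead; this sketch rejects it rather than queueing it.
    Gap { expected: u64, got: u64 },
}

/// Per-sender next-expected nonce. BTreeMap keeps iteration order deterministic.
#[derive(Default)]
struct NonceTracker {
    next: BTreeMap<String, u64>,
}

impl NonceTracker {
    /// Accepts `nonce` only if it is exactly the sender's next nonce.
    /// Rejections leave state unchanged.
    fn accept(&mut self, sender: &str, nonce: u64) -> Result<(), NonceError> {
        let expected = self.next.get(sender).copied().unwrap_or(0);
        if nonce < expected {
            return Err(NonceError::Replay { expected, got: nonce });
        }
        if nonce > expected {
            return Err(NonceError::Gap { expected, got: nonce });
        }
        self.next.insert(sender.to_string(), expected + 1);
        Ok(())
    }
}

fn main() {
    let mut t = NonceTracker::default();
    assert_eq!(t.accept("alice", 0), Ok(()));
    // Same transaction delivered twice (e.g. re-gossiped): rejected, state unchanged.
    assert_eq!(t.accept("alice", 0), Err(NonceError::Replay { expected: 1, got: 0 }));
    assert_eq!(t.accept("alice", 1), Ok(()));
    println!("alice next nonce: {}", t.next["alice"]);
}
```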
Failure modes
- Observability gaps during incidents (missing evidence).
- Mixed-version behavior that violates assumptions silently.
- Config drift that weakens security posture over time.
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Mixed-version deployments create states you never tested—plan for them explicitly.
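Planning for mixed versions starts with making the active ruleset a pure function of height, so every node agrees on which rules apply to which block. This minimal sketch assumes every node carries the same activation schedule even before it implements the newest rules. `required_rules`, `check_block`, and the heights are hypothetical. A node that lacks the required ruleset halts instead of validating under older rules, which turns a silent assumption violation into a visible one.

```rust
/// Hypothetical activation schedule: (activation_height, ruleset_version),
/// sorted by height. Returns the ruleset in force at `height`.
fn required_rules(schedule: &[(u64, u32)], height: u64) -> u32 {
    schedule
        .iter()
        .take_while(|&&(activation, _)| activation <= height)
        .last()
        .map(|&(_, version)| version)
        .unwrap_or(0)
}

/// Fails closed: if the node does not implement the required ruleset,
/// it refuses to validate rather than applying older rules.
fn check_block(schedule: &[(u64, u32)], height: u64, supported: u32) -> Result<u32, String> {
    let required = required_rules(schedule, height);
    if required > supported {
        Err(format!(
            "height {height} requires ruleset v{required}, node implements v{supported}; halting"
        ))
    } else {
        Ok(required)
    }
}

fn main() {
    let schedule: [(u64, u32); 2] = [(0, 1), (1_000, 2)]; // v2 activates at height 1000
    // An un-upgraded node (implements v1) during the mixed-version window:
    for height in [999, 1_000] {
        println!("{height}: {:?}", check_block(&schedule, height, 1));
    }
}
```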
Design sketch
```mermaid
flowchart TD
    tx["Transaction"] --> mp["Mempool (admission + prioritization)"]
    mp --> prop["Block Proposal"]
    prop --> cons["Consensus / Finality"]
    cons --> exec["Deterministic Execution"]
    exec --> root["State Root Commitment"]
```
Implementation notes
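The flowchart's final step, execution producing a state root commitment, is where the commitment scheme lives. Below is a minimal binary Merkle sketch with two loud assumptions. First, the hash is 64-bit FNV-1a purely to avoid dependencies. It is not collision resistant; a real commitment would use SHA-256, Keccak-256, or similar. Second, an unpaired node is promoted to the next level unchanged rather than duplicated. Leaf and internal hashes are domain-separated with 0x00/0x01 prefixes, the convention RFC 6962 uses, so a leaf cannot be passed off as an internal node. All function names are illustrative.

```rust
/// Placeholder hash: 64-bit FNV-1a. NOT collision resistant; stand-in only.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    h
}

/// Leaf hash: 0x00 prefix (domain separation).
fn hash_leaf(data: &[u8]) -> u64 {
    let mut buf = vec![0x00u8];
    buf.extend_from_slice(data);
    fnv1a(&buf)
}

/// Internal-node hash: 0x01 prefix over the two child hashes.
fn hash_node(left: u64, right: u64) -> u64 {
    let mut buf = vec![0x01u8];
    buf.extend_from_slice(&left.to_be_bytes());
    buf.extend_from_slice(&right.to_be_bytes());
    fnv1a(&buf)
}

/// Returns the root and an inclusion proof for leaf `index`.
/// Each proof step is (sibling_hash, sibling_is_left).
fn root_and_proof(leaves: &[&[u8]], index: usize) -> (u64, Vec<(u64, bool)>) {
    assert!(index < leaves.len(), "leaf index out of range");
    let mut level: Vec<u64> = leaves.iter().map(|leaf| hash_leaf(leaf)).collect();
    let mut proof = Vec::new();
    let mut i = index;
    while level.len() > 1 {
        let sibling = i ^ 1;
        if sibling < level.len() {
            proof.push((level[sibling], sibling < i));
        }
        // Pair up nodes; an unpaired last node is promoted unchanged.
        level = level
            .chunks(2)
            .map(|pair| if pair.len() == 2 { hash_node(pair[0], pair[1]) } else { pair[0] })
            .collect();
        i /= 2;
    }
    (level[0], proof)
}

/// Recomputes the root from a leaf and its proof.
fn verify(root: u64, leaf: &[u8], proof: &[(u64, bool)]) -> bool {
    let mut acc = hash_leaf(leaf);
    for &(sibling, sibling_is_left) in proof {
        acc = if sibling_is_left { hash_node(sibling, acc) } else { hash_node(acc, sibling) };
    }
    acc == root
}

fn main() {
    let leaves: [&[u8]; 3] = [b"alice:10", b"bob:5", b"carol:7"];
    let (root, proof) = root_and_proof(&leaves, 2);
    println!("root {root:016x}, proof has {} step(s)", proof.len());
    assert!(verify(root, b"carol:7", &proof));
    assert!(!verify(root, b"carol:9", &proof)); // tampered value is rejected
}
```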
Encode resource accounting and limits early; retrofits are painful.
Acknowledge only after durability (or make “ack” explicitly best-effort).
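Here is a minimal resource-accounting sketch. `GasMeter` and the unit costs are hypothetical names and values. Every operation is charged before it runs, and a failed charge leaves the meter unchanged. Exhaustion is an explicit error rather than a silent truncation.

```rust
/// Exhaustion is an explicit, inspectable error.
#[derive(Debug, PartialEq)]
enum MeterError {
    OutOfGas { requested: u64, remaining: u64 },
}

/// Hypothetical per-block resource meter. Invariant: used <= limit.
struct GasMeter {
    limit: u64,
    used: u64,
}

impl GasMeter {
    fn new(limit: u64) -> Self {
        Self { limit, used: 0 }
    }

    /// Charges before the work runs; on failure `used` is unchanged.
    fn charge(&mut self, amount: u64) -> Result<(), MeterError> {
        let remaining = self.limit - self.used; // cannot underflow given the invariant
        if amount > remaining {
            return Err(MeterError::OutOfGas { requested: amount, remaining });
        }
        self.used += amount;
        Ok(())
    }
}

fn main() {
    let mut meter = GasMeter::new(100);
    // Hypothetical unit costs.
    for (op, cost) in [("storage_read", 10), ("sig_verify", 60), ("sig_verify", 60)] {
        match meter.charge(cost) {
            Ok(()) => println!("{op}: ok (used {}/{})", meter.used, meter.limit),
            Err(e) => println!("{op}: rejected: {e:?}"),
        }
    }
}
```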
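```rust
/// Deterministic execution is a security boundary: every honest node must
/// derive the same state root from the same sequence of blocks.
pub trait Executor {
    /// Applies one serialized block; the result must depend only on prior state and `block`.
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
    /// Returns the 32-byte commitment to the current state.
    fn state_root(&self) -> [u8; 32];
}
// Avoid nondeterminism: wall-clock time, RNG, unordered maps, floating-point.
```
Verification strategy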
- Formal invariants for supply/balance conservation where appropriate.
- Adversarial mempool tests: spam, pinning, worst-case signature patterns.
- Determinism tests across architectures (x86/ARM) and OSes.
- Fuzzing transaction decoding and state transition edge cases.
- Fork/reorg simulations: application-facing invariants under reorgs.
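For fuzzing transaction decoding, even a dependency-free harness catches panics and ambiguous encodings. In this sketch the wire format, `MAX_PAYLOAD`, and `decode` are hypothetical. Inputs come from a seeded xorshift64 generator so any failure is reproducible. The property checked is that decoding never panics and every accepted input re-encodes to exactly the same bytes. A coverage-guided fuzzer such as cargo-fuzz would be the production choice.

```rust
/// Hypothetical wire format: [kind: u8][payload_len: u32 big-endian][payload].
const MAX_PAYLOAD: usize = 1024; // resource limit enforced at decode time

#[derive(Debug, PartialEq)]
enum DecodeError {
    TooShort,
    TooLarge(usize),
    LengthMismatch,
}

/// Every access is bounds-checked; trailing bytes are rejected so each
/// accepted message has exactly one encoding.
fn decode(buf: &[u8]) -> Result<(u8, &[u8]), DecodeError> {
    let (&kind, rest) = buf.split_first().ok_or(DecodeError::TooShort)?;
    let l = rest.get(..4).ok_or(DecodeError::TooShort)?;
    let len = u32::from_be_bytes([l[0], l[1], l[2], l[3]]) as usize;
    if len > MAX_PAYLOAD {
        return Err(DecodeError::TooLarge(len));
    }
    let payload = &rest[4..];
    if payload.len() != len {
        return Err(DecodeError::LengthMismatch);
    }
    Ok((kind, payload))
}

/// Deterministic xorshift64 PRNG; the state must be non-zero.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Runs `iterations` random inputs through `decode`; returns how many decoded.
fn fuzz(seed: u64, iterations: usize) -> usize {
    let mut rng = seed;
    let mut accepted = 0;
    for _ in 0..iterations {
        let n = (xorshift(&mut rng) % 16) as usize;
        let mut buf: Vec<u8> = (0..n).map(|_| xorshift(&mut rng) as u8).collect();
        // Half the time, write a consistent length header so the accept path is exercised.
        if n >= 5 && xorshift(&mut rng) % 2 == 0 {
            buf[1..5].copy_from_slice(&((n - 5) as u32).to_be_bytes());
        }
        if let Ok((kind, payload)) = decode(&buf) {
            // Round-trip property: re-encoding reproduces the input exactly.
            let mut re = vec![kind];
            re.extend_from_slice(&(payload.len() as u32).to_be_bytes());
            re.extend_from_slice(payload);
            assert_eq!(re, buf);
            accepted += 1;
        }
    }
    accepted
}

fn main() {
    let accepted = fuzz(0x9E37_79B9_7F4A_7C15, 100_000);
    println!("fuzzed 100000 inputs, {accepted} decoded cleanly");
}
```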
Operational notes
- Keep execution resource limits explicit and enforced.
- Monitor reorg depth and frequency; treat increases as incidents.
- Measure invalid tx rejection reasons and rates (spam signature).
- Rehearse upgrades with mixed versions and rollback paths.
- Protect peer tables against eclipse attempts (diversity, scoring, rotation).
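Diversity in peer selection can be enforced with a simple per-group cap. In this sketch, `select_diverse` is a hypothetical helper, IPv4 addresses are grouped by /16 prefix, and candidates are assumed to arrive pre-sorted by peer score. A /16 is only a coarse proxy for "same operator". An attacker spread across many prefixes or autonomous systems defeats it, which is why some clients can group by AS instead. Rotation (periodic eviction and reselection) is not shown.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Picks up to `want` peers in candidate order, allowing at most
/// `per_prefix` peers from any single /16.
fn select_diverse(candidates: &[Ipv4Addr], want: usize, per_prefix: usize) -> Vec<Ipv4Addr> {
    let mut per_group: BTreeMap<[u8; 2], usize> = BTreeMap::new();
    let mut chosen = Vec::new();
    for &addr in candidates {
        if chosen.len() == want {
            break;
        }
        let o = addr.octets();
        let count = per_group.entry([o[0], o[1]]).or_insert(0);
        if *count < per_prefix {
            *count += 1;
            chosen.push(addr);
        }
    }
    chosen
}

fn main() {
    // Documentation-range addresses; three share the 203.0.0.0/16 group.
    let candidates = [
        Ipv4Addr::new(203, 0, 113, 5),
        Ipv4Addr::new(203, 0, 113, 9),
        Ipv4Addr::new(203, 0, 7, 1),
        Ipv4Addr::new(198, 51, 100, 20),
    ];
    println!("{:?}", select_diverse(&candidates, 3, 1));
}
```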
Make degraded modes explicit: fail closed vs fail open is a policy choice.
What to monitor
- Admission-control / rate-limit rejections (by reason).
- Authz failures and policy denials (unexpected spikes).
- Error budget burn + tail latency under load.
- Invariant violation rate (should be ~0).
- Rollback events and the conditions that triggered them.
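Error budget burn is commonly tracked as a burn rate: the observed error ratio divided by the error ratio the SLO allows. A rate of 1.0 exhausts the budget exactly at the end of the SLO window. Below is a minimal sketch with `burn_rate` as a hypothetical helper and an illustrative 99.9% target. Floating point is acceptable here because this is monitoring, not consensus.

```rust
/// Burn rate = (bad / total) / (1 - slo_target).
/// Returns None when the inputs make the ratio meaningless.
fn burn_rate(bad_events: u64, total_events: u64, slo_target: f64) -> Option<f64> {
    if total_events == 0 || !(0.0..1.0).contains(&slo_target) {
        return None;
    }
    let error_ratio = bad_events as f64 / total_events as f64;
    Some(error_ratio / (1.0 - slo_target))
}

fn main() {
    // Hypothetical window: 50 failures out of 10,000 requests against a 99.9% SLO.
    if let Some(rate) = burn_rate(50, 10_000, 0.999) {
        println!("burn rate = {rate:.1}"); // about 5.0: budget gone in a fifth of the window
    }
}
```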
Rollback plan
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Use canaries and staged rollout; stop early when signals degrade.
- Keep dual-write / dual-verify windows where appropriate.
- Define an explicit rollback trigger (metrics + thresholds).
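An explicit rollback trigger can be as small as a set of thresholds written down before the rollout, evaluated against canary signals. This sketch's signal set, `RollbackTrigger`, and the threshold values are hypothetical. It reports every breached threshold rather than the first one, so the rollback record also preserves evidence.

```rust
/// Signals observed on the canary cohort (hypothetical set).
struct CanarySignals {
    invariant_violations: u64,
    reorg_depth: u64,
    reject_rate: f64,
}

/// Thresholds fixed before the rollout starts.
struct RollbackTrigger {
    max_invariant_violations: u64,
    max_reorg_depth: u64,
    max_reject_rate: f64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Continue,
    Rollback(Vec<String>),
}

impl RollbackTrigger {
    /// Returns Rollback with every breached threshold, or Continue.
    fn evaluate(&self, s: &CanarySignals) -> Decision {
        let mut reasons = Vec::new();
        if s.invariant_violations > self.max_invariant_violations {
            reasons.push(format!("invariant violations: {}", s.invariant_violations));
        }
        if s.reorg_depth > self.max_reorg_depth {
            reasons.push(format!("reorg depth: {}", s.reorg_depth));
        }
        if s.reject_rate > self.max_reject_rate {
            reasons.push(format!("reject rate: {:.3}", s.reject_rate));
        }
        if reasons.is_empty() {
            Decision::Continue
        } else {
            Decision::Rollback(reasons)
        }
    }
}

fn main() {
    let trigger = RollbackTrigger { max_invariant_violations: 0, max_reorg_depth: 2, max_reject_rate: 0.05 };
    let canary = CanarySignals { invariant_violations: 1, reorg_depth: 1, reject_rate: 0.01 };
    println!("{:?}", trigger.evaluate(&canary));
}
```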
Evidence
- Jepsen (1) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
- Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Open questions
- Which invariants should be proven vs tested vs monitored?
- What is the worst-case work a single transaction can force?
- How do you communicate finality uncertainty to users without lying?
- Where does your implementation accidentally depend on local wall-clock time?
Checklist
- Rollback plan rehearsed and automated.
- Failure modes enumerated with mitigations.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Safety properties stated as invariants.
- Assumptions listed and reviewed.
- Telemetry captures correctness signals.
Further reading
- EIP-1559 — Fee market mechanics and incentive surfaces.
- Bitcoin: A Peer-to-Peer Electronic Cash System — The original replicated-ledger model and threat assumptions.
- Ethereum Yellow Paper — A formal-ish specification for execution and state transitions.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.