Monthly research note. Theme: Blockchain Protocols.
TL;DR
A focused memo on ZK in Protocols (Proof Systems as Network Primitives): define the model, state the properties, then design the system so those properties remain true under failure and adversaries.
Correctness is cheaper to enforce at interfaces than to repair in production data.
Key takeaways
- Topology attacks (eclipse/partition) change security outcomes; harden peer selection.
- Upgrades must be compatibility-aware: mixed rulesets are a threat model.
- Mempools are adversarial schedulers: admission and fairness are protocol concerns.
- Measure correctness signals, not only latency/throughput.
- Prefer protocols and APIs that make invalid states hard to express.
Why this matters
- Mempools are an attack surface: spam, pinning, and incentive manipulation.
- Bridges reintroduce trust; you must model it explicitly.
- Light clients shift assumptions; they must be written down.
- Consensus safety is meaningless if execution is nondeterministic across nodes.
Key questions
- Where do you enforce resource limits (gas, bandwidth, storage, signature checks)?
- How do upgrades change security assumptions (fork choice, state transition rules)?
- Where is the economic/DoS pressure applied (mempool, gossip, execution, storage)?
- What is the reorg budget for applications and how do you communicate it?
- What is the determinism story (byte-for-byte re-execution across platforms)?
- How do you defend against topology attacks (eclipse, partition, sybil)?
Assumptions
- Nodes are heterogeneous; determinism must survive platform differences.
- Attackers can buy bandwidth and compute; they can also bribe and censor.
- Users and apps rely on probabilistic finality until proven otherwise.
- Peers are untrusted; gossip can be manipulated for delay or isolation.
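The determinism assumption above can be made concrete with a small sketch. It assumes state is a flat mapping of account names to integer balances and uses a single SHA-256 over a canonical JSON encoding; the function name `state_root` and the encoding are illustrative choices, not a real client's commitment scheme (which would typically be a Merkle structure).

```python
import hashlib
import json


def state_root(state: dict[str, int]) -> str:
    """Hash a canonical encoding: sorted keys, fixed separators, integers only."""
    for key, value in state.items():
        # Floats (and bools) are rejected: their formatting and arithmetic
        # are a common source of cross-platform divergence.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"non-integer value for {key!r}")
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


a = state_root({"alice": 5, "bob": 7})
b = state_root({"bob": 7, "alice": 5})  # different insertion order, same state
```

Two nodes that build the same state in a different order should still agree on the commitment, which is the property a cross-platform determinism test would compare.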
Non-goals
- Assuming honest majority without defining the adversary’s budget.
- Relying on client-side heuristics to paper over protocol ambiguity.
Any unbounded work per request becomes a DoS primitive under adversaries.
Model & invariants
A ledger is a replicated state machine. Safety is uniqueness of finalized history: no two honest nodes finalize conflicting blocks at the same height.
Explicitly model upgrade boundaries: old rules vs new rules during transition.
Treat reorgs as a user-visible security event; encode reorg-aware semantics.
Make the “impossible state” observable: a metric or alert that fires when invariants drift.
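As a minimal sketch of making the "impossible state" observable, the monitor below records the finalized block hash per height and counts any conflict. The class name `FinalityMonitor` and the plain counter standing in for a metric are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FinalityMonitor:
    """Tracks finalized block hashes per height and counts conflicts."""

    finalized: dict[int, str] = field(default_factory=dict)
    violations: int = 0  # export as a metric; expected to stay at 0

    def observe(self, height: int, block_hash: str) -> bool:
        """Record a finalized block. Returns False if it conflicts with history."""
        known = self.finalized.setdefault(height, block_hash)
        if known != block_hash:
            self.violations += 1
            return False
        return True


monitor = FinalityMonitor()
monitor.observe(10, "0xaa")       # first report for height 10
monitor.observe(10, "0xaa")       # same block reported again: fine
ok = monitor.observe(10, "0xbb")  # conflicting finalized block: safety violation
```

An alert on `violations > 0` is the kind of signal that should page someone, since the invariant says it can never fire.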
Security properties
- Replay resistance: duplicated inputs do not change outcomes.
- Least authority: privileges are scoped by purpose and time.
- Authenticity: actions are bound to identity and purpose.
- Integrity: invalid transitions are rejected (and detectable).
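Replay resistance is often approximated with a per-sender sequence number. The sketch below assumes account-style nonces that start at 0 and must arrive in order; `NonceTracker` is a hypothetical name, and real protocols differ in how they treat gaps and queued transactions.

```python
class NonceTracker:
    """Accepts each (sender, nonce) pair exactly once, in sequence."""

    def __init__(self) -> None:
        self.next_nonce: dict[str, int] = {}

    def accept(self, sender: str, nonce: int) -> bool:
        expected = self.next_nonce.get(sender, 0)
        if nonce != expected:
            return False  # replayed (too low) or out of order (too high)
        self.next_nonce[sender] = expected + 1
        return True


tracker = NonceTracker()
first = tracker.accept("alice", 0)   # fresh transaction
replay = tracker.accept("alice", 0)  # same input submitted again
```

The duplicated input is rejected without changing state, which is the outcome the replay-resistance property asks for.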
Failure modes
- Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
- Mixed-version behavior that violates assumptions silently.
- Timeout ambiguity causing double-apply or partial state transitions.
- Observability gaps during incidents (missing evidence).
Mixed-version deployments create states you never tested—plan for them explicitly.
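One way to plan for mixed versions is to select validation rules from the block height rather than from the node's software version, so every node applies the same rules to the same block. The sketch below is illustrative: `ACTIVATION_HEIGHT`, the payload limits, and the `version` field are all hypothetical.

```python
ACTIVATION_HEIGHT = 1_000  # hypothetical upgrade boundary


def validate_v1(tx: dict) -> bool:
    # Old rules: payload up to 1024 bytes.
    return len(tx["payload"]) <= 1024


def validate_v2(tx: dict) -> bool:
    # Hypothetical new rules: smaller payloads plus a mandatory version field.
    return len(tx["payload"]) <= 512 and tx.get("version") == 2


def validate(tx: dict, height: int) -> bool:
    """Choose the ruleset from the block height, not from local configuration."""
    rules = validate_v2 if height >= ACTIVATION_HEIGHT else validate_v1
    return rules(tx)


tx = {"payload": b"x" * 600}
before = validate(tx, ACTIVATION_HEIGHT - 1)  # judged under old rules
after = validate(tx, ACTIVATION_HEIGHT)       # judged under new rules
```

Testing both sides of the boundary with the same transaction is a cheap way to surface mixed-ruleset disagreements before a rollout.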
Design sketch
flowchart TD
tx["Transaction"] --> mp["Mempool (admission + prioritization)"]
mp --> prop["Block Proposal"]
prop --> cons["Consensus / Finality"]
cons --> exec["Deterministic Execution"]
exec --> root["State Root Commitment"]
Implementation notes
Encode resource accounting and limits early; retrofits are painful.
Acknowledge only after durability (or make “ack” explicitly best-effort).
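A minimal sketch of "acknowledge only after durability", assuming a simple append-only log file: the function returns (the ack) only after `os.fsync` has completed. The name `append_durably` and the newline-delimited record format are illustrative assumptions.

```python
import os
import tempfile


def append_durably(path: str, record: bytes) -> bool:
    """Append a record and return True (the ack) only after fsync succeeds."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to persist the file contents
    return True


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
acked = append_durably(log_path, b"tx-1")
```

If `fsync` raises, no ack is returned and the caller has to treat the write as not applied. A newly created file may also need its parent directory synced on some filesystems, which this sketch leaves out.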
Mempool hardening checklist:
- Per-peer rate limits + global admission budget
- Duplicate detection and eviction policy
- Signature verification batching with caps
- Anti-DoS: bounded decode/parse cost
- Fairness: per-sender quotas (avoid hot-account starvation)
Verification strategy
- Adversarial mempool tests: spam, pinning, worst-case signature patterns.
- Fuzzing transaction decoding and state transition edge cases.
- Fork/reorg simulations: application-facing invariants under reorgs.
- Formal invariants for supply/balance conservation where appropriate.
- Determinism tests across architectures (x86/ARM) and OSes.
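The mempool hardening checklist and the adversarial mempool tests above can be exercised against a small model. This is a sketch under stated assumptions: simple count caps stand in for time-windowed rate limits, and the `Mempool` class, its limits, and the rejection reason strings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Mempool:
    max_total: int = 4        # global admission budget (transaction count)
    max_per_peer: int = 3     # cap per submitting peer
    max_per_sender: int = 2   # per-account quota for fairness
    max_tx_bytes: int = 1024  # bound checked before any decoding work
    txs: dict[str, tuple[str, str]] = field(default_factory=dict)
    per_peer: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    per_sender: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def admit(self, tx_id: str, peer: str, sender: str, raw: bytes) -> str:
        """Return "ok" or a rejection reason; cheapest checks run first."""
        if len(raw) > self.max_tx_bytes:
            return "too_large"
        if tx_id in self.txs:
            return "duplicate"
        if len(self.txs) >= self.max_total:
            return "pool_full"
        if self.per_peer[peer] >= self.max_per_peer:
            return "peer_limit"
        if self.per_sender[sender] >= self.max_per_sender:
            return "sender_quota"
        self.txs[tx_id] = (peer, sender)
        self.per_peer[peer] += 1
        self.per_sender[sender] += 1
        return "ok"


# Adversarial case: one peer floods transactions from a single account.
pool = Mempool()
results = [pool.admit(f"t{i}", "peer-A", "spammer", b"x") for i in range(5)]
```

Returning a reason string per rejection also gives operations something to count, so the rejection mix can be tracked as a metric.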
Operational notes
- Measure invalid tx rejection reasons and rates; a sudden shift in the mix can be a spam signature.
- Monitor reorg depth and frequency; treat increases as incidents.
- Rehearse upgrades with mixed versions and rollback paths.
- Keep execution resource limits explicit and enforced.
- Protect peer tables against eclipse attempts (diversity, scoring, rotation).
Make degraded modes explicit: fail closed vs fail open is a policy choice.
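The peer-table diversity point above can be illustrated with a small selection routine. It is a sketch, assuming IPv4 addresses and a rule of at most one outbound peer per /16 prefix; the function name `select_outbound` is hypothetical, and scoring and rotation are left out.

```python
import ipaddress
import random


def select_outbound(candidates: list[str], k: int, rng: random.Random) -> list[str]:
    """Pick up to k peers, at most one per /16 IPv4 prefix."""
    buckets: dict[str, list[str]] = {}
    for addr in candidates:
        # strict=False masks the host bits, so 10.1.9.9/16 becomes 10.1.0.0/16.
        net = ipaddress.ip_network(f"{addr}/16", strict=False)
        buckets.setdefault(str(net), []).append(addr)
    groups = list(buckets.values())
    rng.shuffle(groups)  # do not favor prefixes by insertion order
    return [rng.choice(group) for group in groups[:k]]


candidates = ["10.1.0.1", "10.1.0.2", "10.1.9.9", "192.168.1.5", "172.16.0.7"]
chosen = select_outbound(candidates, 3, random.Random(7))
```

An attacker who controls many addresses inside one prefix still gets at most one outbound slot under this rule, which raises the cost of an eclipse attempt.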
What to monitor
- Admission-control / rate-limit rejections (by reason).
- Invariant violation rate (should be ~0).
- Retry/timeout rates by endpoint and client cohort.
- Error budget burn + tail latency under load.
- Rollback events and the conditions that triggered them.
Rollback plan
- Define an explicit rollback trigger (metrics + thresholds).
- Prefer backward-compatible changes; avoid “flag day” upgrades.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Keep dual-write / dual-verify windows where appropriate.
- Use canaries and staged rollout; stop early when signals degrade.
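An explicit rollback trigger can be as small as a table of metric limits and a function that reports which ones tripped. The sketch below is an assumption-laden example: the metric names and thresholds in `LIMITS` are hypothetical, and a missing metric is treated as a trigger (fail closed), which is itself a policy choice.

```python
LIMITS = {
    "invariant_violations": 0,  # any violation trips the trigger
    "reorg_depth": 3,           # hypothetical threshold
    "error_budget_burn": 2.0,   # hypothetical burn-rate multiple
}


def rollback_reasons(metrics: dict[str, float], limits: dict[str, float] = LIMITS) -> list[str]:
    """Return the tripped triggers; an empty list means the rollout may continue."""
    reasons = []
    for name, limit in limits.items():
        if name not in metrics:
            reasons.append(f"{name}:missing")  # fail closed on missing evidence
        elif metrics[name] > limit:
            reasons.append(f"{name}:exceeded")
    return reasons


healthy = rollback_reasons(
    {"invariant_violations": 0, "reorg_depth": 1, "error_budget_burn": 0.5}
)
degraded = rollback_reasons({"invariant_violations": 1, "reorg_depth": 1})
```

Logging the returned reasons alongside the rollback event preserves the evidence for why the rollout stopped.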
Evidence
- Learn TLA+ (1) — Practical entry point for specification and model checking.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
- Site Reliability Engineering (Google) (2) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.
Open questions
- Which invariants should be proven vs tested vs monitored?
- Where does your implementation accidentally depend on local wall-clock time?
- How do you communicate finality uncertainty to users without lying?
- What is the worst-case work a single transaction can force?
Checklist
- Assumptions listed and reviewed.
- Failure modes enumerated with mitigations.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Safety properties stated as invariants.
- Telemetry captures correctness signals.
- Rollback plan rehearsed and automated.
Further reading
- Ethereum Yellow Paper — A formal-ish specification for execution and state transitions.
- EIP-1559 — Fee market mechanics and incentive surfaces.
- Bitcoin: A Peer-to-Peer Electronic Cash System — The original replicated-ledger model and threat assumptions.
- Jepsen — Fault injection and correctness testing for distributed systems.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Learn TLA+ — Practical entry point for specification and model checking.