Monthly research note. Theme: Blockchain Protocols.
TL;DR
A focused memo on gossip networks (propagation, eclipse attacks, and topology): define the model, state the properties, then design the system so those properties remain true under failures and adversaries.
If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.
Key takeaways
- Topology attacks (eclipse/partition) change security outcomes; harden peer selection.
- Upgrades must be compatibility-aware: mixed rulesets are a threat model.
- Finality guarantees are user security guarantees—document and enforce them.
- Bind security decisions to evidence (audit, invariants, telemetry).
- Automate guardrails; humans are for judgment, not for consistent enforcement.
Why this matters
- Topology attacks (eclipse, partition) change who sees which transactions.
- State growth is a security problem: it impacts decentralization and verification.
- Light clients shift assumptions; they must be written down.
- MEV turns protocol details into adversarial strategy.
Key questions
- Where do you enforce resource limits (gas, bandwidth, storage, signature checks)?
- What is the determinism story (byte-for-byte re-execution across platforms)?
- Where is the economic/DoS pressure applied (mempool, gossip, execution, storage)?
- What is the finality guarantee users can rely on (and when does it break)?
- What is the reorg budget for applications and how do you communicate it?
- How do you defend against topology attacks (eclipse, partition, sybil)?
Assumptions
- Upgrades happen under partial adoption; mixed-version is inevitable.
- Users and apps rely on probabilistic finality until proven otherwise.
- Attackers can buy bandwidth and compute; they can also bribe and censor.
- Peers are untrusted; gossip can be manipulated for delay or isolation.
Non-goals
- Relying on client-side heuristics to paper over protocol ambiguity.
- Allowing execution nondeterminism for performance convenience.
Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.
Model & invariants
A simple resource-admission constraint:
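As a sketch, with symbols that are assumptions of this example rather than a definition from any specific protocol: let $c_r(tx)$ be the cost a transaction $tx$ imposes on resource $r$, and let $L_r$ be the configured limit for that resource. A block (or admitted batch) $B$ is acceptable only if every resource stays within its limit:

```latex
\sum_{tx \in B} c_r(tx) \;\le\; L_r
\quad \text{for every resource } r \in \{\text{gas},\ \text{bandwidth},\ \text{storage},\ \text{signature checks}\}
```

The point of writing it per resource is that a transaction that is cheap in one dimension (for example gas) can still be rejected for being expensive in another (for example signature checks).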
Treat reorgs as a user-visible security event; encode reorg-aware semantics.
Separate consensus safety from execution safety; both must hold.
If the system can enter an invalid state, it eventually will—usually during an incident.
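Reorg-aware semantics can be made concrete by giving applications a status type that never claims more than the chain supports. The following is a minimal sketch; the `TxStatus` enum, the `classify` function, and the caller-chosen `required` confirmation depth are hypothetical names for illustration, not part of any particular client.

```rust
/// Confirmation status as seen by an application (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TxStatus {
    /// On the best chain, but with fewer than `required` confirmations.
    Pending { confirmations: u64 },
    /// At least `required` confirmations under the application's reorg budget.
    Settled,
    /// No longer on the best chain: the application must re-evaluate.
    Reverted,
}

/// Classify a transaction from two observations:
/// `included_at` is the height of its block on the current best chain
/// (None if it is not on that chain), `tip_height` is the best-chain tip.
pub fn classify(included_at: Option<u64>, tip_height: u64, required: u64) -> TxStatus {
    match included_at {
        None => TxStatus::Reverted,
        Some(height) => {
            // The including block itself counts as one confirmation.
            let confirmations = tip_height.saturating_sub(height).saturating_add(1);
            if confirmations >= required {
                TxStatus::Settled
            } else {
                TxStatus::Pending { confirmations }
            }
        }
    }
}
```

The variant is deliberately called `Settled` rather than "final": under probabilistic finality it only means the configured depth was reached, and `Reverted` remains a reachable state that callers have to handle.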
Security properties
- Integrity: invalid transitions are rejected (and detectable).
- Least authority: privileges are scoped by purpose and time.
- Downgrade resistance: negotiation can’t silently weaken security posture.
- Replay resistance: duplicated inputs do not change outcomes.
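Replay resistance from the list above can be illustrated with per-account nonces: a transaction is applied only if its nonce is exactly the next expected one, so a duplicate delivery is rejected instead of changing state twice. This is a sketch under assumptions; the `Ledger` type, the `u32` account identifiers, and the credit-only operation are all hypothetical simplifications.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    /// Nonce already used: a duplicate or replayed input.
    Replay,
    /// Nonce is ahead of the expected value.
    NonceGap,
    /// Balance arithmetic would overflow.
    Overflow,
}

/// Toy ledger. BTreeMap is used so any iteration order is deterministic.
#[derive(Default)]
pub struct Ledger {
    next_nonce: BTreeMap<u32, u64>,
    balances: BTreeMap<u32, u64>,
}

impl Ledger {
    pub fn balance(&self, account: u32) -> u64 {
        self.balances.get(&account).copied().unwrap_or(0)
    }

    /// Credit `amount` to `account` if and only if `nonce` is the next expected one.
    pub fn apply_credit(&mut self, account: u32, nonce: u64, amount: u64) -> Result<(), ApplyError> {
        let expected = self.next_nonce.get(&account).copied().unwrap_or(0);
        if nonce < expected {
            return Err(ApplyError::Replay);
        }
        if nonce > expected {
            return Err(ApplyError::NonceGap);
        }
        // Compute the new balance before mutating anything, so an error leaves state untouched.
        let updated = self.balance(account).checked_add(amount).ok_or(ApplyError::Overflow)?;
        self.balances.insert(account, updated);
        self.next_nonce.insert(account, expected + 1);
        Ok(())
    }
}
```

Applying the same `(account, nonce)` pair twice returns `ApplyError::Replay` the second time and leaves the balance unchanged, which is the property the bullet asks for.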
Failure modes
- Observability gaps during incidents (missing evidence).
- Config drift that weakens security posture over time.
- Mixed-version behavior that violates assumptions silently.
- Timeout ambiguity causing double-apply or partial state transitions.
Sampling hides the rare schedule that breaks your invariants.
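One way to keep mixed-version behavior from failing silently is to make version mismatch an explicit handshake outcome. The sketch below assumes each side advertises an inclusive range of supported protocol versions; the `negotiate` function and `Handshake` type are hypothetical and not taken from any real wire protocol.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Handshake {
    /// Both sides support `version`; it is the highest version in common.
    Accept { version: u32 },
    /// No overlap: refuse loudly instead of falling back to something weaker.
    Reject { reason: &'static str },
}

/// Pick the highest version inside both inclusive ranges, or reject.
/// `local_min` acts as the downgrade floor: a peer cannot talk us below it.
pub fn negotiate(local_min: u32, local_max: u32, peer_min: u32, peer_max: u32) -> Handshake {
    let lowest_common = local_min.max(peer_min);
    let highest_common = local_max.min(peer_max);
    if lowest_common > highest_common {
        Handshake::Reject { reason: "no common protocol version" }
    } else {
        Handshake::Accept { version: highest_common }
    }
}
```

Counting `Reject` outcomes by peer version also gives operators direct evidence of how far an upgrade has actually propagated.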
Design sketch
sequenceDiagram
    participant U as User
    participant N as Node
    participant P as Peers
    U->>N: submit(tx)
    N->>P: gossip(tx)
    P-->>N: gossip(more tx)
    Note over N: admission + ordering
    N-->>U: inclusion/finality signal
Implementation notes
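The "admission + ordering" step in the sequence above starts with a cheap question: has this node already seen the gossiped transaction? A minimal sketch of a bounded seen-cache follows. The `SeenCache` type, the 32-byte identifier, and the FIFO eviction policy are assumptions of this example; a production node would likely pick an eviction policy that an attacker cannot flush as easily.

```rust
use std::collections::{HashSet, VecDeque};

/// Bounded record of recently seen transaction ids.
/// The HashSet is used only for membership checks, never iterated,
/// so its unordered nature does not leak into behavior.
pub struct SeenCache {
    capacity: usize,
    order: VecDeque<[u8; 32]>,
    ids: HashSet<[u8; 32]>,
}

impl SeenCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, order: VecDeque::new(), ids: HashSet::new() }
    }

    /// Returns true if `id` was not in the cache (validate and relay it),
    /// false if it is a duplicate (drop it without further work).
    pub fn insert(&mut self, id: [u8; 32]) -> bool {
        if self.ids.contains(&id) {
            return false;
        }
        // Evict the oldest entry so memory stays bounded under spam.
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.ids.remove(&oldest);
            }
        }
        self.order.push_back(id);
        self.ids.insert(id);
        true
    }
}
```

The fixed capacity is the relevant design choice: deduplication state is itself a resource, and an unbounded set would turn gossip into a memory-exhaustion vector.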
Determinism is a boundary: every nondeterministic input is an attack surface.
Make rollbacks boring: if rollback is a hero move, it will fail.
// Deterministic execution is a security boundary.
pub trait Executor {
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
    fn state_root(&self) -> [u8; 32];
}
// Avoid nondeterminism: time, RNG, unordered maps, floating-point.
Verification strategy
- Cross-implementation tests when multiple clients exist.
- Fork/reorg simulations: application-facing invariants under reorgs.
- Adversarial mempool tests: spam, pinning, worst-case signature patterns.
- Fuzzing transaction decoding and state transition edge cases.
- Formal invariants for supply/balance conservation where appropriate.
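Cross-implementation testing can be sketched as a differential harness over the `Executor` trait from the implementation notes: feed the same blocks to two executors and report the first block at which they disagree. The `ToyExecutor` below is a stand-in invented for this example (its byte-folding is not a real hash), and its `truncate` flag simulates a client bug.

```rust
pub trait Executor {
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
    fn state_root(&self) -> [u8; 32];
}

/// Toy executor: folds block bytes into a 32-byte accumulator.
/// With `truncate` set it ignores bytes past index 3, simulating a divergent client.
pub struct ToyExecutor {
    acc: [u8; 32],
    truncate: bool,
}

impl ToyExecutor {
    pub fn new(truncate: bool) -> Self {
        Self { acc: [0u8; 32], truncate }
    }
}

impl Executor for ToyExecutor {
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String> {
        for (i, byte) in block.iter().enumerate() {
            if self.truncate && i >= 4 {
                break;
            }
            let slot = i % 32;
            self.acc[slot] = self.acc[slot].wrapping_mul(31).wrapping_add(*byte);
        }
        Ok(())
    }

    fn state_root(&self) -> [u8; 32] {
        self.acc
    }
}

/// Apply each block to both executors; return the index of the first block
/// after which they disagree on success/failure or on the state root.
pub fn first_divergence(
    a: &mut dyn Executor,
    b: &mut dyn Executor,
    blocks: &[Vec<u8>],
) -> Option<usize> {
    for (index, block) in blocks.iter().enumerate() {
        let result_a = a.apply_block(block);
        let result_b = b.apply_block(block);
        if result_a.is_ok() != result_b.is_ok() || a.state_root() != b.state_root() {
            return Some(index);
        }
    }
    None
}
```

Returning the index of the first divergent block, rather than a boolean, gives a fuzzer or fork simulation a concrete input to minimize and keep as a regression case.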
Operational notes
- Measure invalid tx rejection reasons and rates; a spike in a single reason is a useful spam signature.
- Protect peer tables against eclipse attempts (diversity, scoring, rotation).
- Keep execution resource limits explicit and enforced.
- Rehearse upgrades with mixed versions and rollback paths.
- Monitor reorg depth and frequency; treat increases as incidents.
Keep audit and config history queryable during incidents—evidence beats intuition.
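The peer-table diversity point above can be sketched as a cap on how many outbound peers may share a network prefix, so that an attacker who controls one address range cannot fill every slot. Everything here is an assumption for illustration: the `select_outbound` function, the use of the IPv4 /16 prefix as the bucket, and the expectation that the caller has already shuffled or scored `candidates`.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Bucket key: the /16 prefix, a coarse proxy for "same network operator".
fn bucket(addr: Ipv4Addr) -> [u8; 2] {
    let octets = addr.octets();
    [octets[0], octets[1]]
}

/// Choose up to `max_peers` outbound peers from `candidates` (in the given order),
/// taking at most `max_per_bucket` from any single /16 prefix.
pub fn select_outbound(
    candidates: &[Ipv4Addr],
    max_peers: usize,
    max_per_bucket: usize,
) -> Vec<Ipv4Addr> {
    let mut per_bucket: BTreeMap<[u8; 2], usize> = BTreeMap::new();
    let mut chosen = Vec::new();
    for &addr in candidates {
        if chosen.len() == max_peers {
            break;
        }
        let count = per_bucket.entry(bucket(addr)).or_insert(0);
        if *count < max_per_bucket {
            *count += 1;
            chosen.push(addr);
        }
    }
    chosen
}
```

With a cap of one peer per bucket, three candidates from `10.0.x.x` yield a single connection, and the remaining slots go to other prefixes. Rotation and scoring would sit on top of this by changing the order of `candidates` between rounds.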
What to monitor
- Admission-control / rate-limit rejections (by reason).
- Rollback events and the conditions that triggered them.
- Retry/timeout rates by endpoint and client cohort.
- Error budget burn + tail latency under load.
- Authz failures and policy denials (unexpected spikes).
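Rejections "by reason" are easiest to monitor when the reason is a closed set. The sketch below counts rejections per reason using a fixed enum, which also keeps metric cardinality bounded even when inputs are adversarial. The `RejectReason` variants and the `RejectionStats` type are hypothetical examples, not a list from any specific node.

```rust
use std::collections::BTreeMap;

/// A closed set of reasons: attackers cannot mint new metric labels.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RejectReason {
    BadSignature,
    FeeTooLow,
    Oversized,
    RateLimited,
}

#[derive(Default)]
pub struct RejectionStats {
    counts: BTreeMap<RejectReason, u64>,
    total_seen: u64,
}

impl RejectionStats {
    pub fn record_accept(&mut self) {
        self.total_seen += 1;
    }

    pub fn record_reject(&mut self, reason: RejectReason) {
        self.total_seen += 1;
        *self.counts.entry(reason).or_insert(0) += 1;
    }

    pub fn count(&self, reason: RejectReason) -> u64 {
        self.counts.get(&reason).copied().unwrap_or(0)
    }

    /// Share of all seen transactions rejected for `reason`,
    /// in parts per thousand (integer math, no floating point).
    pub fn rate_per_mille(&self, reason: RejectReason) -> u64 {
        if self.total_seen == 0 {
            0
        } else {
            self.count(reason) * 1000 / self.total_seen
        }
    }
}
```

An alert would then compare `rate_per_mille` for each reason against a baseline, so that a jump in a single reason stands out from general load.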
Rollback plan
- Keep dual-write / dual-verify windows where appropriate.
- Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
- Define an explicit rollback trigger (metrics + thresholds).
- Use canaries and staged rollout; stop early when signals degrade.
- Prefer backward-compatible changes; avoid “flag day” upgrades.
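An explicit rollback trigger can be written as a pure function from canary metrics and thresholds to a decision, which makes it testable and removes judgment calls during an incident. This is a sketch under assumptions: the metric names, the two chosen signals (reorg depth and error rate), and the `evaluate` function are illustrative, and real thresholds would come from the team's own baselines.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub max_reorg_depth: u64,
    pub max_error_rate_per_mille: u64,
}

#[derive(Debug, Clone, Copy)]
pub struct CanaryMetrics {
    pub max_reorg_depth: u64,
    pub requests: u64,
    pub errors: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Continue,
    Rollback(&'static str),
}

/// Decide whether a staged rollout should continue or roll back.
pub fn evaluate(metrics: &CanaryMetrics, limits: &Thresholds) -> Decision {
    if metrics.max_reorg_depth > limits.max_reorg_depth {
        return Decision::Rollback("reorg depth above threshold");
    }
    // errors / requests > limit / 1000, rearranged to avoid division and floats.
    // u128 keeps the products from overflowing.
    let observed = (metrics.errors as u128) * 1000;
    let allowed = (limits.max_error_rate_per_mille as u128) * (metrics.requests as u128);
    if metrics.requests > 0 && observed > allowed {
        return Decision::Rollback("error rate above threshold");
    }
    Decision::Continue
}
```

Because the function has no side effects, the same thresholds can be replayed against recorded metrics from past incidents to check whether the trigger would have fired in time.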
Evidence
- Jepsen (1) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
- Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.
Open questions
- What is the worst-case work a single transaction can force?
- Where does your implementation accidentally depend on local wall-clock time?
- How do you communicate finality uncertainty to users without lying?
- Which invariants should be proven vs tested vs monitored?
Checklist
- Telemetry captures correctness signals.
- Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
- Safety properties stated as invariants.
- Failure modes enumerated with mitigations.
- Rollback plan rehearsed and automated.
- Assumptions listed and reviewed.
Further reading
- Bitcoin: A Peer-to-Peer Electronic Cash System — The original replicated-ledger model and threat assumptions.
- Ethereum Yellow Paper — A formal-ish specification for execution and state transitions.
- EIP-1559 — Fee market mechanics and incentive surfaces.
- Site Reliability Engineering (Google) — Error budgets, incident response, and reliability as an engineering discipline.
- Designing Data-Intensive Applications (Kleppmann) — The systems-engineering baseline for correctness, replication, and failure.
- Jepsen — Fault injection and correctness testing for distributed systems.