Monthly research note. Theme: Blockchain Protocols.

TL;DR

A focused memo on gossip networks (propagation, eclipse attacks, and topology): define the model, state the properties, then design the system so those properties hold under failures and adversaries.

Key insight

If the spec is implicit, the implementation becomes the spec—and you’ll learn it during incidents.

Key takeaways

  • Topology attacks (eclipse/partition) change security outcomes; harden peer selection.
  • Upgrades must be compatibility-aware: mixed rulesets are a threat model.
  • Finality guarantees are user security guarantees—document and enforce them.
  • Bind security decisions to evidence (audit, invariants, telemetry).
  • Automate guardrails; humans are for judgment, not for consistent enforcement.

Why this matters

  • Topology attacks (eclipse, partition) change who sees which transactions.
  • State growth is a security problem: it impacts decentralization and verification.
  • Light clients shift assumptions; they must be written down.
  • MEV turns protocol details into adversarial strategy.

Key questions

  • Where do you enforce resource limits (gas, bandwidth, storage, signature checks)?
  • What is the determinism story (byte-for-byte re-execution across platforms)?
  • Where is the economic/DoS pressure applied (mempool, gossip, execution, storage)?
  • What is the finality guarantee users can rely on (and when does it break)?
  • What is the reorg budget for applications and how do you communicate it?
  • How do you defend against topology attacks (eclipse, partition, sybil)?

Assumptions

  • Upgrades happen under partial adoption; mixed-version is inevitable.
  • Users and apps rely on probabilistic finality until proven otherwise.
  • Attackers can buy bandwidth and compute; they can also bribe and censor.
  • Peers are untrusted; gossip can be manipulated for delay or isolation.

Non-goals

  • Relying on client-side heuristics to paper over protocol ambiguity.
  • Allowing execution nondeterminism for performance convenience.

Attack surface

Observability pipelines can be attacked (cardinality explosions, log injection). Protect them.
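
One way to blunt both attacks named above is to cap the number of distinct metric labels and to escape untrusted values before they reach a log line. The sketch below is illustrative only: `BoundedCounter` and `sanitize_log_field` are hypothetical names, and the cap and length limit are assumptions to tune per deployment.

```rust
use std::collections::BTreeMap;

/// Counter keyed by label, with a hard cap on distinct labels.
/// Labels that arrive after the cap is reached are folded into one overflow count.
pub struct BoundedCounter {
    max_labels: usize,
    counts: BTreeMap<String, u64>,
    overflow: u64,
}

impl BoundedCounter {
    pub fn new(max_labels: usize) -> Self {
        Self { max_labels, counts: BTreeMap::new(), overflow: 0 }
    }

    /// Increment the count for `label`, or the overflow count if the cap is hit.
    pub fn incr(&mut self, label: &str) {
        if let Some(count) = self.counts.get_mut(label) {
            *count += 1;
        } else if self.counts.len() < self.max_labels {
            self.counts.insert(label.to_string(), 1);
        } else {
            self.overflow += 1;
        }
    }

    pub fn get(&self, label: &str) -> u64 {
        self.counts.get(label).copied().unwrap_or(0)
    }

    pub fn overflow(&self) -> u64 {
        self.overflow
    }

    pub fn distinct_labels(&self) -> usize {
        self.counts.len()
    }
}

/// Truncate an untrusted value and escape it so it cannot forge extra log lines.
/// Note: `char::escape_default` also escapes quotes, backslashes, and non-ASCII characters.
pub fn sanitize_log_field(raw: &str, max_chars: usize) -> String {
    raw.chars()
        .take(max_chars)
        .flat_map(|c| c.escape_default())
        .collect()
}
```

Memory stays bounded by `max_labels` regardless of what peers send, and a growing overflow count is itself a useful signal.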

Model & invariants

A simple resource-admission constraint:

\[ \sum_{tx \in B} \mathrm{cost}(tx) \le \mathrm{budget}(B) \qquad \text{(gas/bytes/sigchecks)} \]
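
The constraint above can be enforced incrementally while a block is being built. This is a minimal sketch, assuming the budget is checked independently in each of the three dimensions; `Cost` and `BlockBuilder` are hypothetical names, not part of any specific client.

```rust
/// Cost of a transaction (or budget of a block) in each metered dimension.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cost {
    pub gas: u64,
    pub bytes: u64,
    pub sigchecks: u64,
}

impl Cost {
    /// Component-wise sum; `None` if any dimension would overflow.
    fn checked_add(self, other: Cost) -> Option<Cost> {
        Some(Cost {
            gas: self.gas.checked_add(other.gas)?,
            bytes: self.bytes.checked_add(other.bytes)?,
            sigchecks: self.sigchecks.checked_add(other.sigchecks)?,
        })
    }

    /// True if every dimension of `self` is within `budget`.
    fn fits_within(self, budget: Cost) -> bool {
        self.gas <= budget.gas
            && self.bytes <= budget.bytes
            && self.sigchecks <= budget.sigchecks
    }
}

pub struct BlockBuilder {
    budget: Cost,
    used: Cost,
    admitted: Vec<usize>,
}

impl BlockBuilder {
    pub fn new(budget: Cost) -> Self {
        let zero = Cost { gas: 0, bytes: 0, sigchecks: 0 };
        Self { budget, used: zero, admitted: Vec::new() }
    }

    /// Admit the transaction only if the running total stays within budget.
    /// A rejected transaction leaves the builder unchanged.
    pub fn try_admit(&mut self, tx_id: usize, cost: Cost) -> bool {
        match self.used.checked_add(cost) {
            Some(next) if next.fits_within(self.budget) => {
                self.used = next;
                self.admitted.push(tx_id);
                true
            }
            _ => false,
        }
    }

    pub fn used(&self) -> Cost {
        self.used
    }

    pub fn admitted(&self) -> &[usize] {
        &self.admitted
    }
}
```

Checked arithmetic matters here: an attacker-chosen cost that wraps the running total would otherwise turn the budget check into a no-op.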

Treat reorgs as a user-visible security event; encode reorg-aware semantics.
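
A sketch of what reorg-aware semantics might look like at the application boundary: a transaction status that carries confirmation depth instead of a boolean, plus a helper that measures how many blocks a chain switch discarded. The names, the use of `u64` block identifiers in place of hashes, and the `reorg_budget` parameter are assumptions for illustration.

```rust
/// Application-facing status of a transaction.
#[derive(Debug, PartialEq, Eq)]
pub enum TxStatus {
    /// Not on the canonical chain (never included, or dropped by a reorg).
    Pending,
    /// On the canonical chain but still within the reorg budget.
    Included { confirmations: u64 },
    /// Deeper than the reorg budget. Under probabilistic finality this is a
    /// policy decision, not a proof.
    Settled,
}

/// Classify a transaction given its inclusion height on the current canonical chain.
pub fn tx_status(inclusion_height: Option<u64>, tip_height: u64, reorg_budget: u64) -> TxStatus {
    match inclusion_height {
        None => TxStatus::Pending,
        // Inclusion height above the tip means the including block is no longer canonical.
        Some(h) if h > tip_height => TxStatus::Pending,
        Some(h) => {
            let confirmations = tip_height - h + 1;
            if confirmations > reorg_budget {
                TxStatus::Settled
            } else {
                TxStatus::Included { confirmations }
            }
        }
    }
}

/// Number of blocks removed from `old_chain` when switching to `new_chain`.
/// Both slices list block identifiers from the same starting height.
pub fn reorg_depth(old_chain: &[u64], new_chain: &[u64]) -> usize {
    let common_prefix = old_chain
        .iter()
        .zip(new_chain)
        .take_while(|(a, b)| a == b)
        .count();
    old_chain.len() - common_prefix
}
```

A pure extension of the chain yields depth 0, so the same function can feed both application logic and reorg monitoring.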

Separate consensus safety from execution safety; both must hold.

Invariant

If the system can enter an invalid state, it eventually will—usually during an incident.

Security properties

  • Integrity: invalid transitions are rejected (and detectable).
  • Least authority: privileges are scoped by purpose and time.
  • Downgrade resistance: negotiation can’t silently weaken security posture.
  • Replay resistance: duplicated inputs do not change outcomes.
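
Replay resistance is commonly implemented with a per-account sequence number. The sketch below assumes that approach; `Ledger`, `credit`, and the error variants are hypothetical names, and a real system would also bind the nonce to a signature and a chain identifier.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    /// The nonce was already consumed: a duplicated input.
    Replay,
    /// The nonce is ahead of the expected value.
    NonceGap,
    /// Arithmetic would overflow.
    Overflow,
}

/// Toy ledger where each account accepts operations in strict nonce order.
#[derive(Default)]
pub struct Ledger {
    next_nonce: BTreeMap<String, u64>,
    balance: BTreeMap<String, u64>,
}

impl Ledger {
    /// Apply a credit exactly once. All checks run before any mutation, so a
    /// rejected operation leaves the ledger unchanged.
    pub fn credit(&mut self, account: &str, nonce: u64, amount: u64) -> Result<(), ApplyError> {
        let expected = self.next_nonce.get(account).copied().unwrap_or(0);
        if nonce < expected {
            return Err(ApplyError::Replay);
        }
        if nonce > expected {
            return Err(ApplyError::NonceGap);
        }
        let current = self.balance.get(account).copied().unwrap_or(0);
        let new_balance = current.checked_add(amount).ok_or(ApplyError::Overflow)?;
        let new_nonce = expected.checked_add(1).ok_or(ApplyError::Overflow)?;

        self.balance.insert(account.to_string(), new_balance);
        self.next_nonce.insert(account.to_string(), new_nonce);
        Ok(())
    }

    pub fn balance(&self, account: &str) -> u64 {
        self.balance.get(account).copied().unwrap_or(0)
    }
}
```

Submitting the same `(account, nonce)` pair twice changes the balance once; the second attempt is rejected and detectable by its error reason.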

Failure modes

  • Observability gaps during incidents (missing evidence).
  • Config drift that weakens security posture over time.
  • Mixed-version behavior that violates assumptions silently.
  • Timeout ambiguity causing double-apply or partial state transitions.

Pitfall

Sampling hides the rare schedule that breaks your invariants.

Design sketch

sequenceDiagram
  participant U as User
  participant N as Node
  participant P as Peers
  U->>N: submit(tx)
  N->>P: gossip(tx)
  P-->>N: gossip(more tx)
  Note over N: admission + ordering
  N-->>U: inclusion/finality signal
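
In the flow above, a node usually needs to remember which transactions it has already seen so that each one is validated and forwarded at most once, without letting peers grow that memory unboundedly. A minimal sketch, assuming 32-byte transaction identifiers and FIFO eviction; `SeenCache` is a hypothetical name and real clients may use different eviction policies.

```rust
use std::collections::{HashSet, VecDeque};

/// Fixed-capacity record of recently seen transaction identifiers.
/// This is networking state only; it must never influence consensus results.
pub struct SeenCache {
    capacity: usize,
    set: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
}

impl SeenCache {
    /// `capacity` must be non-zero.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            set: HashSet::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
        }
    }

    /// Returns true if `id` was not in the cache (the caller should validate
    /// and forward it), false if it is a duplicate.
    pub fn insert(&mut self, id: [u8; 32]) -> bool {
        if self.set.contains(&id) {
            return false;
        }
        // Evict the oldest entry to keep memory bounded.
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.set.remove(&oldest);
            }
        }
        self.set.insert(id);
        self.order.push_back(id);
        true
    }

    pub fn len(&self) -> usize {
        self.order.len()
    }
}
```

The trade-off is explicit: a small cache re-forwards old transactions after eviction, while a large one costs memory that an adversary can fill with cheap identifiers.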

Implementation notes

Determinism is a boundary: every nondeterministic input is an attack surface.

Rule of thumb

Make rollbacks boring: if rollback is a hero move, it will fail.

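// Deterministic execution is a security boundary.
pub trait Executor {
  /// Applies an encoded block to the current state; returns an error description on failure.
  fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
  /// Returns the 32-byte commitment to the current state.
  fn state_root(&self) -> [u8; 32];
}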

// Avoid nondeterminism: time, RNG, unordered maps, floating-point.
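
To make the unordered-map point concrete: a digest computed by iterating a `HashMap` can differ between nodes because iteration order is unspecified, while a `BTreeMap` iterates in key order everywhere. The sketch below uses FNV-1a only as a small, fully specified stand-in hash; it is not cryptographic, and `state_digest` is a hypothetical helper, not a real state-root scheme.

```rust
use std::collections::BTreeMap;

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into a running FNV-1a 64-bit hash. Stand-in only: NOT collision resistant.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for byte in bytes {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Digest of a key/value state, independent of insertion order.
pub fn state_digest(state: &BTreeMap<Vec<u8>, u64>) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    // BTreeMap iterates in ascending key order on every platform.
    for (key, value) in state {
        // Length prefix keeps the encoding unambiguous across key boundaries.
        hash = fnv1a(hash, &(key.len() as u64).to_le_bytes());
        hash = fnv1a(hash, key);
        // Fixed little-endian encoding avoids platform-dependent byte order.
        hash = fnv1a(hash, &value.to_le_bytes());
    }
    hash
}
```

Two nodes that apply the same writes in a different order arrive at the same digest, which is the property a cross-implementation test would check.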

Verification strategy

  • Cross-implementation tests when multiple clients exist.
  • Fork/reorg simulations: application-facing invariants under reorgs.
  • Adversarial mempool tests: spam, pinning, worst-case signature patterns.
  • Fuzzing transaction decoding and state transition edge cases.
  • Formal invariants for supply/balance conservation where appropriate.

Operational notes

  • Measure invalid-transaction rejection reasons and rates; shifts in the mix are a signature of spam.
  • Protect peer tables against eclipse attempts (diversity, scoring, rotation).
  • Keep execution resource limits explicit and enforced.
  • Rehearse upgrades with mixed versions and rollback paths.
  • Monitor reorg depth and frequency; treat increases as incidents.
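
The diversity part of eclipse protection can be sketched as a cap on how many outbound peers may share a network group. The example assumes IPv4, groups addresses by their first two octets (/16), and takes candidates in the order given; `select_outbound` and both limits are hypothetical. The standard library has no random number generator, so a real client would shuffle candidates with a local secret seed first, and would add scoring and rotation on top.

```rust
use std::collections::BTreeMap;
use std::net::Ipv4Addr;

/// Network group of an address: its first two octets (an assumed /16 grouping).
fn netgroup(addr: Ipv4Addr) -> [u8; 2] {
    let octets = addr.octets();
    [octets[0], octets[1]]
}

/// Pick up to `max_peers` outbound peers, allowing at most `max_per_group`
/// from any one network group. Candidates are considered in the order given.
pub fn select_outbound(
    candidates: &[Ipv4Addr],
    max_peers: usize,
    max_per_group: usize,
) -> Vec<Ipv4Addr> {
    let mut per_group: BTreeMap<[u8; 2], usize> = BTreeMap::new();
    let mut chosen: Vec<Ipv4Addr> = Vec::new();

    for &addr in candidates {
        if chosen.len() == max_peers {
            break;
        }
        // Skip duplicates so one address cannot occupy several slots.
        if chosen.contains(&addr) {
            continue;
        }
        let count = per_group.entry(netgroup(addr)).or_insert(0);
        if *count < max_per_group {
            *count += 1;
            chosen.push(addr);
        }
    }
    chosen
}
```

With `max_per_group` set to 1, an attacker who controls many addresses inside a single /16 can occupy at most one outbound slot.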

Operational note

Keep audit and config history queryable during incidents—evidence beats intuition.

What to monitor

  • Admission-control / rate-limit rejections (by reason).
  • Rollback events and the conditions that triggered them.
  • Retry/timeout rates by endpoint and client cohort.
  • Error budget burn + tail latency under load.
  • Authz failures and policy denials (unexpected spikes).

Rollback plan

  • Keep dual-write / dual-verify windows where appropriate.
  • Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
  • Define an explicit rollback trigger (metrics + thresholds).
  • Use canaries and staged rollout; stop early when signals degrade.
  • Prefer backward-compatible changes; avoid “flag day” upgrades.
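
An explicit rollback trigger can be as small as a pure function over recent metric samples. This sketch assumes three signals drawn from this memo (error rate, tail latency, reorg depth) and requires several consecutive breaches to avoid reacting to a single noisy sample; every name and threshold is hypothetical. Integer basis points are used instead of floating-point values.

```rust
/// One observation window of rollout health metrics.
#[derive(Clone, Copy, Debug)]
pub struct Sample {
    /// Error rate in basis points (100 = 1%).
    pub error_rate_bps: u32,
    pub p99_latency_ms: u64,
    pub max_reorg_depth: u64,
}

/// Limits that define a breach, plus how many breaches in a row trigger rollback.
#[derive(Clone, Copy, Debug)]
pub struct Thresholds {
    pub max_error_rate_bps: u32,
    pub max_p99_latency_ms: u64,
    pub max_reorg_depth: u64,
    pub consecutive_breaches: usize,
}

/// True if the sample exceeds any single limit.
fn breaches(sample: &Sample, limits: &Thresholds) -> bool {
    sample.error_rate_bps > limits.max_error_rate_bps
        || sample.p99_latency_ms > limits.max_p99_latency_ms
        || sample.max_reorg_depth > limits.max_reorg_depth
}

/// Roll back when the most recent `consecutive_breaches` samples all breach.
/// `window` is ordered oldest to newest.
pub fn should_roll_back(window: &[Sample], limits: &Thresholds) -> bool {
    let needed = limits.consecutive_breaches;
    if needed == 0 || window.len() < needed {
        return false;
    }
    window[window.len() - needed..]
        .iter()
        .all(|sample| breaches(sample, limits))
}
```

Because the function is pure, the same thresholds can be replayed against historical samples during a rehearsal to check that the trigger would have fired when expected.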

Evidence

  • Jepsen (1) — Fault injection and correctness testing for distributed systems.
    • Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
  • Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
    • Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.

Open questions

  • What is the worst-case work a single transaction can force?
  • Where does your implementation accidentally depend on local wall-clock time?
  • How do you communicate finality uncertainty to users without lying?
  • Which invariants should be proven vs tested vs monitored?

Checklist

  • Telemetry captures correctness signals.
  • Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
  • Safety properties stated as invariants.
  • Failure modes enumerated with mitigations.
  • Rollback plan rehearsed and automated.
  • Assumptions listed and reviewed.

Further reading

1. Jepsen. Jepsen: Distributed Systems Safety Analysis [Internet]. Available from: https://jepsen.io/
2. Kleppmann M. Designing Data-Intensive Applications [Internet]. O’Reilly Media; 2017. Available from: https://dataintensive.net/