Bridges: Where Trust Comes Back to Collect

Monthly research note. Theme: Blockchain Protocols.

TL;DR

Define the model, state the properties, then design the system so those properties remain true under failures and adversaries.

Key insight

Correctness is cheaper to enforce at interfaces than to repair in production data.

Key takeaways

Upgrades must be compatibility-aware: mixed rulesets are a threat model.
Consensus safety is meaningless if execution is nondeterministic across nodes.
Mempools are adversarial schedulers: admission and fairness are protocol concerns.
Make failure modes explicit and observable.
Prefer protocols and APIs that make invalid states hard to express.

Why this matters

Consensus safety is meaningless if execution is nondeterministic across nodes.
State growth is a security problem: it impacts decentralization and verification.
Finality guarantees are user security guarantees; ambiguity is a UX vulnerability.
Mempools are an attack surface: spam, pinning, and incentive manipulation.

Key questions

Which invariants need proofs (supply, balances, ordering, slashing)?
How do upgrades change security assumptions (fork choice, state transition rules)?
What is the determinism story (byte-for-byte re-execution across platforms)?
What is the finality guarantee users can rely on (and when does it break)?
Where do you enforce resource limits (gas, bandwidth, storage, signature checks)?
Where is the economic/DoS pressure applied (mempool, gossip, execution, storage)?

Assumptions

Peers are untrusted; gossip can be manipulated for delay or isolation.
Attackers can buy bandwidth and compute; they can also bribe and censor.
Users and apps rely on probabilistic finality until proven otherwise.
Nodes are heterogeneous; determinism must survive platform differences.

Non-goals

Assuming honest majority without defining the adversary’s budget.
Allowing execution nondeterminism for performance convenience.

Attack surface

Negotiation and fallbacks are where security silently becomes optional; treat them as hostile.
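One way to treat negotiation as hostile is to pin a local version floor and fail closed when no shared version meets it. The following is a minimal sketch under that assumption; `negotiate`, `MIN_VERSION`, and the version numbers are hypothetical names for illustration, not part of any specific protocol.

```rust
/// Hypothetical local policy floor: versions below this are never accepted.
const MIN_VERSION: u16 = 3;

#[derive(Debug, PartialEq)]
pub enum NegotiationError {
    /// No version is supported by both sides at or above the floor.
    NoAcceptableVersion,
}

/// Picks the highest version both sides support, refusing anything below
/// `MIN_VERSION` instead of falling back to a weaker ruleset.
pub fn negotiate(ours: &[u16], theirs: &[u16]) -> Result<u16, NegotiationError> {
    ours.iter()
        .copied()
        .filter(|v| *v >= MIN_VERSION && theirs.contains(v))
        .max()
        .ok_or(NegotiationError::NoAcceptableVersion)
}
```

The design choice here is that the failure is an explicit error the caller must handle, so a peer that only offers old versions cannot pull the connection below the floor.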

Model & invariants

State commitments bind execution to succinct proofs:

\mathrm{root}_{t+1} = H(\mathrm{root}_t,\ \mathrm{block}_t,\ \mathrm{witness}_t).
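The recurrence above can be sketched in a few lines. This is an illustration only: `fnv1a` is a 64-bit, non-cryptographic stand-in for H, chosen to keep the example dependency-free, and `next_root` is a hypothetical name. A real commitment would use a collision-resistant hash and a wider root.

```rust
/// Stand-in for H: 64-bit FNV-1a over length-prefixed fields.
/// NOT cryptographic; used here only to avoid external dependencies.
fn fnv1a(fields: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for field in fields {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        let len = (field.len() as u64).to_le_bytes();
        for b in len.iter().chain(field.iter()) {
            h ^= u64::from(*b);
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    h
}

/// root_{t+1} = H(root_t, block_t, witness_t)
pub fn next_root(root: u64, block: &[u8], witness: &[u8]) -> u64 {
    let root_bytes = root.to_le_bytes();
    fnv1a(&[&root_bytes[..], block, witness])
}
```

Because each root folds in the previous one, two nodes that disagree on any earlier block or witness will disagree on every later root.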

Separate consensus safety from execution safety; both must hold.

Model the mempool as an adversarial scheduler: it chooses which work gets executed.

Invariant

Monotonicity beats timestamps: counters and epochs survive clock skew.
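A minimal sketch of this invariant, assuming messages carry an `(epoch, seq)` pair; `MonotonicGuard` and its fields are hypothetical names. The guard never reads a clock, so skew between nodes cannot reorder or resurrect messages.

```rust
/// Tracks the highest (epoch, seq) pair accepted so far.
#[derive(Debug, Default)]
pub struct MonotonicGuard {
    last: Option<(u64, u64)>,
}

impl MonotonicGuard {
    /// Accepts a message only if its (epoch, seq) is strictly greater than
    /// the last accepted pair; tuples compare lexicographically.
    pub fn accept(&mut self, epoch: u64, seq: u64) -> bool {
        let candidate = (epoch, seq);
        match self.last {
            Some(prev) if candidate <= prev => false,
            _ => {
                self.last = Some(candidate);
                true
            }
        }
    }
}
```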

Security properties

Downgrade resistance: negotiation can’t silently weaken security posture.
Replay resistance: duplicated inputs do not change outcomes.
Integrity: invalid transitions are rejected (and detectable).
Evidence: critical actions emit verifiable audit events.
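Replay resistance in particular is easy to sketch: record each message identifier and apply its effect at most once. The example below assumes messages carry a unique `msg_id`; `Inbox` and `Applied` are hypothetical names, and the ever-growing `seen` set is itself a state-growth cost a real design would need to bound.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub enum Applied {
    Fresh,
    Duplicate,
}

/// Hypothetical inbox that credits each message id at most once.
#[derive(Default)]
pub struct Inbox {
    seen: BTreeSet<u64>,
    pub balance: u64,
}

impl Inbox {
    pub fn apply(&mut self, msg_id: u64, amount: u64) -> Applied {
        // `insert` returns false when the id was already present.
        if !self.seen.insert(msg_id) {
            return Applied::Duplicate;
        }
        // Saturating add keeps the sketch panic-free; a real ledger would
        // more likely reject on overflow.
        self.balance = self.balance.saturating_add(amount);
        Applied::Fresh
    }
}
```

Returning `Duplicate` rather than silently ignoring the input also gives the caller something to count, which supports the evidence property.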

Failure modes

Timeout ambiguity causing double-apply or partial state transitions.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Config drift that weakens security posture over time.
Observability gaps during incidents (missing evidence).

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart TD
  tx["Transaction"] --> mp["Mempool (admission + prioritization)"]
  mp --> prop["Block Proposal"]
  prop --> cons["Consensus / Finality"]
  cons --> exec["Deterministic Execution"]
  exec --> root["State Root Commitment"]

Implementation notes

Treat mempool policy as part of the protocol if it changes security outcomes.
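As an illustration of mempool policy written down as explicit rules, the sketch below applies a fee floor, a pool capacity, and a per-sender cap, and returns a named reason for every rejection. All names and limits (`Mempool`, `Reject`, `min_fee`, `max_per_sender`, `capacity`) are assumptions made for the example, not taken from any particular client.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
pub enum Reject {
    FeeTooLow,
    SenderLimit,
    PoolFull,
}

pub struct Mempool {
    min_fee: u64,
    max_per_sender: usize,
    capacity: usize,
    per_sender: BTreeMap<u64, usize>,
    len: usize,
}

impl Mempool {
    pub fn new(min_fee: u64, max_per_sender: usize, capacity: usize) -> Self {
        Mempool { min_fee, max_per_sender, capacity, per_sender: BTreeMap::new(), len: 0 }
    }

    /// Admits a transaction or returns the explicit reason it was rejected.
    pub fn admit(&mut self, sender: u64, fee: u64) -> Result<(), Reject> {
        if fee < self.min_fee {
            return Err(Reject::FeeTooLow);
        }
        if self.len >= self.capacity {
            return Err(Reject::PoolFull);
        }
        let count = self.per_sender.entry(sender).or_insert(0);
        if *count >= self.max_per_sender {
            return Err(Reject::SenderLimit);
        }
        *count += 1;
        self.len += 1;
        Ok(())
    }
}
```

The per-sender cap is what keeps one account from filling the pool; without it, the capacity check alone would let a single spammer crowd out everyone else.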

Rule of thumb

Acknowledge only after durability (or make “ack” explicitly best-effort).
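A minimal sketch of this rule for a file-backed log, using only the standard library. `append_durably` and `DurableAck` are hypothetical names. On some filesystems a newly created file also needs its parent directory synced before it is durable, which this sketch does not do.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Returned only after the record has been synced to the device.
#[derive(Debug)]
pub struct DurableAck {
    pub bytes_written: usize,
}

/// Appends `record` to the log at `path` and returns an ack only after
/// `sync_all` reports that data and metadata were flushed.
pub fn append_durably(path: &Path, record: &[u8]) -> io::Result<DurableAck> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(record)?;
    // Any error here means the caller must not acknowledge the write.
    file.sync_all()?;
    Ok(DurableAck { bytes_written: record.len() })
}
```

Because the ack value can only be constructed on the success path, a caller cannot acknowledge a write that was never synced without going out of its way.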

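// Deterministic execution is a security boundary.
pub trait Executor {
  /// Applies one encoded block to the state; returns an error if the block is rejected.
  fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
  /// Returns the 32-byte commitment to the current state.
  fn state_root(&self) -> [u8; 32];
}

// Avoid nondeterminism: time, RNG, unordered maps, floating-point.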
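To make the "unordered maps" point concrete, here is a toy implementation of the trait that keeps state in a `BTreeMap`, so iteration order (and therefore the root) does not depend on insertion order or hasher seeds. The block format, `CounterState`, and the FNV-1a fold are all assumptions for illustration; the hash is a non-cryptographic stand-in.

```rust
use std::collections::BTreeMap;

pub trait Executor {
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String>;
    fn state_root(&self) -> [u8; 32];
}

/// Toy state: one counter per key byte, stored in key order.
#[derive(Default)]
pub struct CounterState {
    counters: BTreeMap<u8, u64>,
}

impl Executor for CounterState {
    /// Hypothetical block format: every byte is a key whose counter is incremented.
    fn apply_block(&mut self, block: &[u8]) -> Result<(), String> {
        if block.is_empty() {
            return Err("empty block".to_string());
        }
        for key in block {
            *self.counters.entry(*key).or_insert(0) += 1;
        }
        Ok(())
    }

    /// Folds the ordered entries with 64-bit FNV-1a (non-cryptographic
    /// stand-in) and stores the result in the first 8 bytes of the root.
    fn state_root(&self) -> [u8; 32] {
        let mut h: u64 = 0xcbf29ce484222325;
        for (key, count) in &self.counters {
            for b in std::iter::once(*key).chain(count.to_le_bytes()) {
                h ^= u64::from(b);
                h = h.wrapping_mul(0x100000001b3);
            }
        }
        let mut root = [0u8; 32];
        root[..8].copy_from_slice(&h.to_le_bytes());
        root
    }
}
```

Swapping the `BTreeMap` for a `HashMap` would still compile, but the root would then depend on iteration order, which the standard library does not guarantee to be the same across processes.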

Verification strategy

Determinism tests across architectures (x86/ARM) and OSes.
Adversarial mempool tests: spam, pinning, worst-case signature patterns.
Fuzzing transaction decoding and state transition edge cases.
Formal invariants for supply/balance conservation where appropriate.
Cross-implementation tests when multiple clients exist.
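The supply-conservation invariant from the list above lends itself to a small executable check. The sketch assumes a simple account-balance ledger; `Ledger`, `transfer`, and `total_supply` are hypothetical names, and a property-based or fuzz harness would call them with generated inputs rather than the fixed values a unit test uses.

```rust
use std::collections::BTreeMap;

pub type Ledger = BTreeMap<u32, u64>;

/// Sum of all balances, widened to u128 so the sum itself cannot overflow.
pub fn total_supply(ledger: &Ledger) -> u128 {
    ledger.values().map(|v| u128::from(*v)).sum()
}

/// Moves `amount` between accounts, or leaves the ledger untouched on error.
pub fn transfer(ledger: &mut Ledger, from: u32, to: u32, amount: u64) -> Result<(), &'static str> {
    let from_bal = *ledger.get(&from).ok_or("unknown sender")?;
    let new_from = from_bal.checked_sub(amount).ok_or("insufficient funds")?;
    if from == to {
        return Ok(()); // self-transfer with sufficient funds is a no-op
    }
    let to_bal = ledger.get(&to).copied().unwrap_or(0);
    let new_to = to_bal.checked_add(amount).ok_or("recipient overflow")?;
    // All checks passed; only now mutate, so failures cannot leave partial state.
    ledger.insert(from, new_from);
    ledger.insert(to, new_to);
    Ok(())
}
```

The invariant to assert after every call, successful or not, is that `total_supply` is unchanged.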

Operational notes

Protect peer tables against eclipse attempts (diversity, scoring, rotation).
Keep execution resource limits explicit and enforced.
Rehearse upgrades with mixed versions and rollback paths.
Measure invalid-transaction rejection reasons and rates; a shift in the mix of reasons is a useful spam signal.
Monitor reorg depth and frequency; treat increases as incidents.
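Reorg depth, as used in the last item, can be computed by comparing the chain a node held before a fork-choice update with the chain it holds after. A minimal sketch, assuming chains are represented as slices of block identifiers ordered oldest to newest (`reorg_depth` and this representation are hypothetical):

```rust
/// Number of blocks dropped from `old_chain` when switching to `new_chain`.
/// Both slices are ordered from oldest to newest block id.
pub fn reorg_depth(old_chain: &[u64], new_chain: &[u64]) -> usize {
    // Length of the shared prefix of the two chains.
    let common = old_chain
        .iter()
        .zip(new_chain.iter())
        .take_while(|(a, b)| a == b)
        .count();
    old_chain.len() - common
}
```

A depth of zero means the new chain simply extends the old one; any positive value means previously seen blocks were abandoned and is worth recording.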

Operational note

Keep audit and config history queryable during incidents; evidence beats intuition.

What to monitor

Admission-control / rate-limit rejections (by reason).
Authz failures and policy denials (unexpected spikes).
Rollback events and the conditions that triggered them.
Error budget burn + tail latency under load.
Invariant violation rate (should be ~0).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Keep dual-write / dual-verify windows where appropriate.
Use canaries and staged rollout; stop early when signals degrade.
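An explicit rollback trigger can be written as a pure function from metrics and thresholds to a decision, which makes it testable before an incident. The sketch below is illustrative: the metric names, units, and threshold values are assumptions, and integers are used throughout to stay clear of floating-point comparisons.

```rust
/// Hypothetical metrics snapshot from a canary cohort.
#[derive(Debug, Clone, Copy)]
pub struct Metrics {
    pub invariant_violations: u64,
    pub error_rate_ppm: u32, // errors per million requests
    pub p99_latency_ms: u32,
}

/// Thresholds agreed before the rollout starts.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub max_error_rate_ppm: u32,
    pub max_p99_latency_ms: u32,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Continue,
    Rollback(&'static str),
}

/// Any invariant violation triggers rollback; other metrics use thresholds.
pub fn evaluate(m: Metrics, t: Thresholds) -> Decision {
    if m.invariant_violations > 0 {
        return Decision::Rollback("invariant violation");
    }
    if m.error_rate_ppm > t.max_error_rate_ppm {
        return Decision::Rollback("error rate above threshold");
    }
    if m.p99_latency_ms > t.max_p99_latency_ms {
        return Decision::Rollback("p99 latency above threshold");
    }
    Decision::Continue
}
```

Returning the reason alongside the decision means the audit log records why a rollback fired, not just that it did.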

Evidence

Jepsen (1) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
Designing Data-Intensive Applications (Kleppmann) (2) — The systems-engineering baseline for correctness, replication, and failure.
- Evidence: Replication and consistency tradeoffs as engineering constraints; use as reference when naming guarantees.

Open questions

Which invariants should be proven vs tested vs monitored?
Where does your implementation accidentally depend on local wall-clock time?
How do you communicate finality uncertainty to users without lying?
What is the worst-case work a single transaction can force?

Checklist

Safety properties stated as invariants.
Telemetry captures correctness signals.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Assumptions listed and reviewed.
Failure modes enumerated with mitigations.
Rollback plan rehearsed and automated.
