Supply Chain Security: SLSA, SBOM, and Build Provenance

Monthly research note. Theme: DevSecOps & Resilience Engineering.

TL;DR

A focused memo on supply chain security (SLSA, SBOM, and build provenance): define the model, state the properties, then design the system so those properties hold under failures and adversaries.

Key insight

Treat “timeouts” as a third outcome: not success, not failure, but ambiguity you must model.

Key takeaways

Make rollback a first-class operation with explicit triggers and rehearsal.
Policy-as-code needs tests, rollout, and rollback like any other production system.
Provenance is a cryptographic statement; ship evidence with artifacts.
Automate guardrails; humans are for judgment, not for consistent enforcement.
Write assumptions down; treat them as interfaces.

Why this matters

Reproducibility is how you know what you shipped is what you built.
Runtime security needs evidence pipelines, not just dashboards.
Rollouts are where incidents happen; safe rollback is a security feature.
Secrets in CI turn “one compromised job” into “full compromise.”

Key questions

How do you prevent “break glass” from becoming the standard path?
How do you rehearse incident response as code (runbooks, chaos, drills)?
Where do you enforce policy (pre-merge, build, deploy, runtime)?
How do you manage secrets without long-lived credentials in CI?
Which signals prove correctness (not just availability) in production?
How do you do safe rollouts (canary, blast-radius, rapid rollback)?

Assumptions

CI runners are exposed to untrusted code (PRs, dependencies).
Dependencies can be compromised upstream (typosquatting, maintainer takeover).
Policy enforcement must be consistent across environments.
Observability pipelines can be attacked (log injection, PII leaks).

Non-goals

Manual policy enforcement or manual security review as the only control.
Assuming deploy equals success without runtime evidence.

Attack surface

Any unbounded work per request becomes a DoS primitive under adversaries.

Model & invariants

Build provenance is a cryptographic statement:

$$
\mathrm{attest} \leftarrow \mathrm{Sign}_{k_\text{build}}\left(\mathrm{hash}(\text{artifact}) \,\Vert\, \text{metadata}\right).
$$

Treat CI as attacker-controlled until proven otherwise; minimize secrets and privileges.

Make provenance verifiable: the answer to “what built this?” must be cryptographically bound to the artifact's digest.

Invariant

Monotonicity beats timestamps: counters and epochs survive clock skew.

Security properties

Least authority: privileges are scoped by purpose and time.
Evidence: critical actions emit verifiable audit events.
Replay resistance: duplicated inputs do not change outcomes.
Integrity: invalid transitions are rejected (and detectable).
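One way to make audit events verifiable is a hash chain: each event commits to the hash of the previous one, so editing or deleting an event in the middle breaks every later link. This is a sketch with a hypothetical `auditEvent` record and SHA-256 chosen here as an assumption. A chain alone does not stop an attacker who can rewrite the entire log, so the latest hash still needs to be anchored somewhere they cannot reach (signed, or stored in a separate system).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// auditEvent is a hypothetical audit record linked to its predecessor.
type auditEvent struct {
	Action   string
	PrevHash string // hex SHA-256 of the previous event ("" for the first event)
	Hash     string // hex SHA-256 over PrevHash and Action
}

// eventHash separates the fields with a NUL byte so (prev, action) pairs cannot collide
// by shifting characters between them.
func eventHash(prev, action string) string {
	sum := sha256.Sum256([]byte(prev + "\x00" + action))
	return hex.EncodeToString(sum[:])
}

// appendEvent links a new event to the tail of the log.
func appendEvent(log []auditEvent, action string) []auditEvent {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, auditEvent{Action: action, PrevHash: prev, Hash: eventHash(prev, action)})
}

// verifyChain returns the index of the first broken link, or -1 if the chain is intact.
func verifyChain(log []auditEvent) int {
	prev := ""
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != eventHash(prev, e.Action) {
			return i
		}
		prev = e.Hash
	}
	return -1
}
```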

Failure modes

Config drift that weakens security posture over time.
Mixed-version behavior that violates assumptions silently.
Timeout ambiguity causing double-apply or partial state transitions.
Recovery paths that only work when nothing is broken.

Pitfall

Mixed-version deployments create states you never tested—plan for them explicitly.

Design sketch

```mermaid
flowchart LR
  src["Source"] --> build["Build (reproducible)"]
  build --> attest["Attestation"]
  attest --> scan["SAST/DAST/SCA"]
  scan --> deploy["Deploy (policy gates)"]
  deploy --> runtime["Runtime Policy + Observability"]
```
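The deploy stage's policy gate can be sketched as a fail-closed check over evidence produced earlier in the pipeline. This assumes a hypothetical `provenance` summary whose signature has already been verified; its field names and the builder allowlist are illustrative and are not the SLSA provenance schema.

```go
package main

import "fmt"

// provenance is a hypothetical, already signature-verified attestation summary.
type provenance struct {
	SubjectDigest string // digest the attestation is about, e.g. "sha256:..."
	BuilderID     string // identity of the build platform that produced it
	ScanPassed    bool   // result of the SAST/DAST/SCA stage
}

// admit fails closed: missing or mismatched evidence is a denial with a reason,
// and that reason is what you would record as an audit event.
func admit(artifactDigest string, p *provenance, trustedBuilders map[string]bool) error {
	switch {
	case p == nil:
		return fmt.Errorf("deny %s: no provenance", artifactDigest)
	case p.SubjectDigest != artifactDigest:
		return fmt.Errorf("deny %s: provenance is for %s", artifactDigest, p.SubjectDigest)
	case !trustedBuilders[p.BuilderID]:
		return fmt.Errorf("deny %s: untrusted builder %q", artifactDigest, p.BuilderID)
	case !p.ScanPassed:
		return fmt.Errorf("deny %s: scan did not pass", artifactDigest)
	}
	return nil
}
```

Checking the subject digest against the artifact actually being deployed is what prevents a valid attestation for one build from being reused to admit another.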

Implementation notes

Prefer short-lived credentials (OIDC) and explicit policy gates.

Rule of thumb

If you can’t explain a timeout outcome, you can’t make retries safe.

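```go
// Treat CI as untrusted: keep tokens short-lived and scoped.
type Token struct {
	Value         string // opaque credential; never log it
	ExpiresAtUnix int64  // absolute expiry in Unix seconds; keep the TTL short
	Scope         string // the single purpose this token is valid for
}
```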

Verification strategy

Rollback tests as part of release (not “if needed”).
Runtime conformance: detect drift between desired and actual state.
Policy tests: unit tests for policy-as-code rules.
Dependency tampering drills: lockfile changes, integrity failures.
Pipeline attack simulations: compromise a runner and measure blast radius.
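Policy tests can be ordinary table-driven tests. The rule below (deployable image references must be pinned by a SHA-256 digest, not a tag) is an illustrative example chosen here, and the regular expression is a simplification of real image-reference grammar. In a Go repository the table would normally live in a `_test.go` file and run under `testing` with `t.Run`; a plain function is used so the sketch compiles on its own.

```go
package main

import "regexp"

// pinnedDigest matches references ending in "@sha256:" plus 64 lowercase hex characters.
var pinnedDigest = regexp.MustCompile(`@sha256:[0-9a-f]{64}$`)

// requirePinned is the policy rule under test.
func requirePinned(ref string) bool {
	return pinnedDigest.MatchString(ref)
}

// policyCase is one row of a table-driven policy test.
type policyCase struct {
	name string
	ref  string
	want bool
}

// runPolicyCases returns the names of cases where the rule disagrees with the expectation,
// so the same table can gate a policy change in CI before it rolls out.
func runPolicyCases(cases []policyCase) []string {
	var failures []string
	for _, c := range cases {
		if got := requirePinned(c.ref); got != c.want {
			failures = append(failures, c.name)
		}
	}
	return failures
}
```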

Operational notes

Rehearse incident response for the pipeline itself.
Continuously scan and inventory dependencies; prioritize by exposure.
Keep a provenance trail for every artifact deployed to production.
Audit who can ship and how; remove implicit paths.
Treat policy changes as security-sensitive deploys (review + rollout).

Operational note

Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.

What to monitor

Retry/timeout rates by endpoint and client cohort.
Authz failures and policy denials (unexpected spikes).
Rollback events and the conditions that triggered them.
Admission-control / rate-limit rejections (by reason).
Invariant violation rate (should be ~0).

Rollback plan

Define an explicit rollback trigger (metrics + thresholds).
Keep dual-write / dual-verify windows where appropriate.
Use canaries and staged rollout; stop early when signals degrade.
Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Prefer backward-compatible changes; avoid “flag day” upgrades.
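An explicit rollback trigger can be written as a small, reviewable function. This sketch assumes the canary and baseline each report request and error counts; `decide`, `minRequests`, and `maxDelta` are hypothetical names, and the thresholds in any real rollout are yours to set. Note the third state: with too few samples the trigger holds rather than promotes, because too little evidence is not evidence of health.

```go
package main

// stageStats are hypothetical per-stage counters for the canary or the baseline.
type stageStats struct {
	Requests int
	Errors   int
}

// errorRate returns Errors/Requests, treating an empty stage as 0.
func errorRate(s stageStats) float64 {
	if s.Requests == 0 {
		return 0
	}
	return float64(s.Errors) / float64(s.Requests)
}

// decision is the output of the rollback trigger.
type decision string

const (
	promote  decision = "promote"
	hold     decision = "hold"
	rollback decision = "rollback"
)

// decide rolls back when the canary has at least minRequests samples and its error
// rate exceeds the baseline's by more than maxDelta; with fewer samples it holds.
func decide(canary, baseline stageStats, minRequests int, maxDelta float64) decision {
	if canary.Requests < minRequests {
		return hold
	}
	if errorRate(canary)-errorRate(baseline) > maxDelta {
		return rollback
	}
	return promote
}
```

For example, with a baseline of 10 errors in 10,000 requests (0.1%) and `maxDelta` of 0.005, a canary at 20 errors in 1,000 requests (2%) triggers a rollback, while one at 2 errors in 1,000 (0.2%) is promoted.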

Evidence

Learn TLA+ (1) — Practical entry point for specification and model checking.
- Evidence: Model the smallest thing that can break; use model checking to validate invariants before optimizing.
Jepsen (2) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.

Open questions

How quickly can you revoke all pipeline credentials in an incident?
Can you answer “what code is running” with cryptographic evidence?
What is the smallest CI compromise that becomes a prod compromise today?
Which deploy actions are irreversible and how do you mitigate that?

Checklist

Telemetry captures correctness signals.
Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Safety properties stated as invariants.
Assumptions listed and reviewed.
Rollback plan rehearsed and automated.
Failure modes enumerated with mitigations.

Further reading