Red Teaming Infrastructure: Turning Attacks into Regression Tests

Monthly research note. Theme: DevSecOps & Resilience Engineering.

TL;DR

Treat red teaming of infrastructure as an engineering constraint: turn each attack into a regression test, write down assumptions, make invariants executable, and design operational recovery as part of correctness.

Key insight

Most failures are boundary failures: parsing, persistence, concurrency, retries, and upgrades.

Key takeaways

Policy-as-code needs tests, rollout, and rollback like any other production system.
Treat CI/CD as attacker-controlled until proven otherwise; minimize secrets and privileges.
Provenance is a cryptographic statement; ship evidence with artifacts.
Automate guardrails; humans are for judgment, not for consistent enforcement.
Prefer protocols and APIs that make invalid states hard to express.

Why this matters

Policy drift is the default; guardrails must be automated and enforced.
Reproducibility is how you know what you shipped is what you built.
Rollouts are where incidents happen; safe rollback is a security feature.
Secrets in CI turn “one compromised job” into “full compromise.”

Key questions

What is your supply-chain threat model (dependency poisoning, CI compromise)?
Where do you enforce policy (pre-merge, build, deploy, runtime)?
Which signals prove correctness (not just availability) in production?
How do you prevent “break glass” from becoming the standard path?
How do you manage secrets without long-lived credentials in CI?
What is the minimum set of humans who can ship to production?

Assumptions

Observability pipelines can be attacked (log injection, PII leaks).
Dependencies can be compromised upstream (typosquatting, maintainer takeover).
CI runners are exposed to untrusted code (PRs, dependencies).
Rollbacks must be executed under time pressure.

Non-goals

Trusting CI environments by default.
Assuming deploy equals success without runtime evidence.

Attack surface

Any unbounded work per request becomes a DoS primitive in an adversary's hands.
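One way to bound per-request work is to check cheap limits before doing anything expensive. The sketch below assumes a hypothetical handler that accepts a JSON array; `MAX_BODY_BYTES`, `MAX_ITEMS`, and `RequestTooExpensive` are illustrative names and values, not recommendations.

```python
import json

MAX_BODY_BYTES = 64 * 1024  # hypothetical limit on raw input size
MAX_ITEMS = 1000            # hypothetical limit on units of work per request


class RequestTooExpensive(Exception):
    """Raised when a request would exceed its work budget."""


def parse_batch(body: bytes) -> list:
    # Reject oversized input before spending any parsing effort on it.
    if len(body) > MAX_BODY_BYTES:
        raise RequestTooExpensive("body too large")
    items = json.loads(body)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    # Bound the amount of downstream work one request can trigger.
    if len(items) > MAX_ITEMS:
        raise RequestTooExpensive("too many items")
    return items
```

Rejections of this kind are worth counting by reason, so that a spike in "too many items" is visible as a signal rather than silently absorbed.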

Model & invariants

Build provenance is a cryptographic statement:

\mathrm{attest} \leftarrow \mathrm{Sign}_{k_{\text{build}}}\bigl(\mathrm{hash}(\text{artifact})\ \Vert\ \text{metadata}\bigr).

Policy should be code with diffs and reviews—guardrails, not guidelines.

Make provenance verifiable: “what built this” must be cryptographically bound.
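The attestation formula above can be sketched with the Python standard library. Two assumptions are made loudly here: HMAC-SHA256 stands in for the signature (a real pipeline would more likely use an asymmetric signature so verifiers never hold the build key), and a canonical JSON object stands in for the raw concatenation of hash and metadata. The function names `attest` and `verify` are illustrative.

```python
import hashlib
import hmac
import json


def attest(artifact: bytes, metadata: dict, key: bytes) -> dict:
    digest = hashlib.sha256(artifact).hexdigest()
    # Canonical encoding so signer and verifier see identical bytes.
    payload = json.dumps(
        {"digest": digest, "metadata": metadata},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()
    # Assumption: HMAC is a symmetric stand-in for Sign under the build key.
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}


def verify(artifact: bytes, attestation: dict, key: bytes) -> bool:
    payload = attestation["payload"].encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # First check that the statement itself is authentic.
    if not hmac.compare_digest(expected, attestation["tag"]):
        return False
    # Then check that the statement is about this exact artifact.
    claimed = json.loads(payload)["digest"]
    actual = hashlib.sha256(artifact).hexdigest()
    return hmac.compare_digest(claimed, actual)
```

Verification fails both when the artifact bytes change and when the attestation was produced under a different key, which is the binding the formula asks for.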

Invariant

Monotonicity beats timestamps: counters and epochs survive clock skew.
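A minimal sketch of this invariant, assuming a hypothetical `ConfigStore` whose writers attach an integer epoch to every update. No wall-clock time is consulted, so clock skew between writers cannot reorder updates.

```python
from dataclasses import dataclass


@dataclass
class ConfigStore:
    epoch: int = 0
    value: str = ""

    def apply(self, epoch: int, value: str) -> bool:
        # Accept only strictly newer epochs; stale or replayed writes are no-ops.
        if epoch <= self.epoch:
            return False
        self.epoch, self.value = epoch, value
        return True
```

For example, after applying epochs 1 and 3, a delayed write carrying epoch 2 is rejected and the stored value stays at the epoch 3 version.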

Security properties

Authenticity: actions are bound to identity and purpose.
Replay resistance: duplicated inputs do not change outcomes.
Downgrade resistance: negotiation can’t silently weaken security posture.
Integrity: invalid transitions are rejected (and detectable).
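The integrity property can be made executable with an explicit transition table. The sketch below assumes hypothetical artifact states loosely modeled on a build pipeline; the state names, `ALLOWED`, and `InvalidTransition` are illustrative. Rejections are recorded in an audit log so that they are detectable, not just refused.

```python
# Hypothetical artifact lifecycle; each state lists its permitted successors.
ALLOWED = {
    "built": {"attested"},
    "attested": {"scanned"},
    "scanned": {"deployed"},
    "deployed": {"rolled_back"},
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not in the transition table."""


def transition(state: str, new_state: str, audit_log: list) -> str:
    if new_state not in ALLOWED.get(state, set()):
        # Record the rejection so the attempt is visible after the fact.
        audit_log.append(("rejected", state, new_state))
        raise InvalidTransition(f"{state} -> {new_state}")
    audit_log.append(("accepted", state, new_state))
    return new_state
```

An attempt to jump from "built" straight to "deployed" raises `InvalidTransition` and leaves a "rejected" entry behind, which is the kind of event a policy-denial monitor can alert on.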

Failure modes

Config drift that weakens security posture over time.
Timeout ambiguity causing double-apply or partial state transitions.
Resource exhaustion (CPU/bandwidth/storage) turning into correctness failures.
Recovery paths that only work when nothing is broken.
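Config drift, the first failure mode above, is detectable by comparing desired and actual state key by key. This is a minimal sketch assuming both states are flat dictionaries; `detect_drift` and the setting names in the usage note are hypothetical.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Map each drifted key to a (desired, actual) pair.

    A key missing on one side is reported as None on that side, so this
    sketch cannot distinguish "missing" from an explicit None value.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift
```

Comparing a desired state of `{"tls_min": "1.3"}` against an actual state of `{"tls_min": "1.2", "debug": True}` reports both the weakened setting and the unexpected extra key.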

Pitfall

A recovery plan that isn’t exercised will fail when you need it.

Design sketch

flowchart LR
  src["Source"] --> build["Build (reproducible)"]
  build --> attest["Attestation"]
  attest --> scan["SAST/DAST/SCA"]
  scan --> deploy["Deploy (policy gates)"]
  deploy --> runtime["Runtime Policy + Observability"]

Implementation notes

The pipeline is production: it has credentials, network reach, and authority.

Rule of thumb

If you can’t explain a timeout outcome, you can’t make retries safe.
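One common way to make a timeout outcome explainable is an idempotency key: the client reuses the same request ID on retry, and the server returns the recorded outcome instead of applying the change twice. The sketch below assumes an in-memory store and a hypothetical `IdempotentApplier`; a real service would persist the results table.

```python
class IdempotentApplier:
    def __init__(self) -> None:
        self._results = {}  # request_id -> recorded outcome
        self.total = 0

    def apply(self, request_id: str, amount: int) -> int:
        # A retry with a known request_id replays the outcome; nothing is re-applied.
        if request_id in self._results:
            return self._results[request_id]
        self.total += amount
        self._results[request_id] = self.total
        return self.total
```

If the first call to `apply("r1", 10)` times out on the client side, retrying with "r1" is safe whether or not the first attempt landed: the total ends at 10 either way.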

CI hardening checklist:
- No long-lived secrets in CI
- Use OIDC to obtain short-lived credentials
- Pin dependencies and verify integrity
- Reproducible builds + provenance attestation
- Policy-as-code gates (deploy blocked on evidence)
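The "pin dependencies and verify integrity" item can be sketched as a digest comparison. This assumes a hypothetical lockfile already parsed into a name-to-SHA-256 mapping and fetched packages available as bytes; `verify_pins` is an illustrative name, not a real package-manager API.

```python
import hashlib


def verify_pins(pins: dict, fetched: dict) -> list:
    """Return names of fetched packages that are unpinned or fail their pinned SHA-256."""
    failures = []
    for name, content in fetched.items():
        expected = pins.get(name)
        actual = hashlib.sha256(content).hexdigest()
        # Unpinned packages fail closed, the same as mismatched ones.
        if expected is None or expected != actual:
            failures.append(name)
    return sorted(failures)
```

A non-empty result is the evidence a deploy gate would block on; treating unpinned packages as failures keeps a new transitive dependency from slipping in unverified.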

Verification strategy

Rollback tests as part of release (not “if needed”).
Pipeline attack simulations: compromise a runner and measure blast radius.
Dependency tampering drills: lockfile changes, integrity failures.
Runtime conformance: detect drift between desired and actual state.
Policy tests: unit tests for policy-as-code rules.
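Policy tests are where an attack becomes a regression test: each finding is replayed as a case the policy must deny. The sketch below assumes a hypothetical `deploy_allowed` rule and hypothetical evidence fields; the two "finding" scenarios are invented for illustration, not taken from a real exercise.

```python
import unittest


def deploy_allowed(evidence: dict) -> bool:
    """Hypothetical rule: verified attestation, passing scan, trusted builder."""
    return (
        evidence.get("attestation_verified") is True
        and evidence.get("scan_passed") is True
        and evidence.get("builder") in {"ci-main"}
    )


class PolicyRegressionTests(unittest.TestCase):
    GOOD = {"attestation_verified": True, "scan_passed": True, "builder": "ci-main"}

    def test_complete_evidence_is_allowed(self):
        self.assertTrue(deploy_allowed(self.GOOD))

    def test_missing_attestation_is_denied(self):
        # Illustrative finding: artifact pushed without provenance.
        evidence = {k: v for k, v in self.GOOD.items() if k != "attestation_verified"}
        self.assertFalse(deploy_allowed(evidence))

    def test_untrusted_builder_is_denied(self):
        # Illustrative finding: artifact built on a runner for fork pull requests.
        self.assertFalse(deploy_allowed({**self.GOOD, "builder": "fork-pr-runner"}))

    def test_truthy_string_is_not_evidence(self):
        # Strict identity checks keep "yes" from passing as True.
        self.assertFalse(deploy_allowed({**self.GOOD, "scan_passed": "yes"}))
```

Saved as a module, these run with `python -m unittest`; each new red-team finding adds one more denied case, so a later policy change that reopens the hole fails in CI.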

Operational notes

Rehearse incident response for the pipeline itself.
Treat policy changes as security-sensitive deploys (review + rollout).
Audit who can ship and how; remove implicit paths.
Continuously scan and inventory dependencies; prioritize by exposure.
Keep a provenance trail for every artifact deployed to production.

Operational note

Design playbooks as protocols: predictable steps, bounded risk, and clear ownership.

What to monitor

Authz failures and policy denials (unexpected spikes).
Error budget burn + tail latency under load.
Rollback events and the conditions that triggered them.
Admission-control / rate-limit rejections (by reason).
Retry/timeout rates by endpoint and client cohort.

Rollback plan

Preserve evidence (configs, artifacts, audit logs) to reconstruct what changed.
Use canaries and staged rollout; stop early when signals degrade.
Keep dual-write / dual-verify windows where appropriate.
Prefer backward-compatible changes; avoid “flag day” upgrades.
Define an explicit rollback trigger (metrics + thresholds).
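An explicit rollback trigger can be written as a pure function of metrics and thresholds. The sketch below uses the standard error-budget burn rate (observed error rate divided by the error rate the SLO allows); the `RollbackTrigger` class and all threshold values are hypothetical and would need tuning per service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackTrigger:
    slo_target: float = 0.999    # hypothetical availability SLO
    max_burn_rate: float = 10.0  # hypothetical burn-rate threshold
    max_p99_ms: float = 500.0    # hypothetical tail-latency threshold

    def burn_rate(self, errors: int, requests: int) -> float:
        # Burn rate 1.0 means the error budget is consumed exactly on schedule.
        if requests == 0:
            return 0.0
        return (errors / requests) / (1.0 - self.slo_target)

    def should_roll_back(self, errors: int, requests: int, p99_ms: float) -> bool:
        return (
            self.burn_rate(errors, requests) > self.max_burn_rate
            or p99_ms > self.max_p99_ms
        )
```

Because the trigger is a plain function, it can be unit-tested and reviewed like any other policy, and the inputs that fired it can be logged alongside the rollback event.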

Evidence

Jepsen (1) — Fault injection and correctness testing for distributed systems.
- Evidence: Turn faults into test cases; prioritize partition and clock-skew scenarios that violate user-visible guarantees.
Site Reliability Engineering (Google) (2) — Error budgets, incident response, and reliability as an engineering discipline.
- Evidence: Error budgets and incident response are correctness controls; tie monitoring and rollback triggers to SLO burn.

Open questions

How quickly can you revoke all pipeline credentials in an incident?
Can you answer “what code is running” with cryptographic evidence?
What is the smallest CI compromise that becomes a prod compromise today?
Which deploy actions are irreversible and how do you mitigate that?

Checklist

Costs bounded (CPU/memory/bandwidth) under adversarial inputs.
Failure modes enumerated with mitigations.
Safety properties stated as invariants.
Assumptions listed and reviewed.
Rollback plan rehearsed and automated.
Telemetry captures correctness signals.
