Secrets-Leak Post-Mortem Playbook

When a secret leaks — git history, log, screenshot, public artefact — what to do, in what order, within what timeline.

TL;DR (human)

A leaked secret is compromised the moment it touches a public channel. Assume worst case. Rotate immediately; revoke aggressively; investigate scope; communicate honestly; close the gap that allowed the leak. Speed beats completeness — a half-rotation in 10 minutes is better than full rotation in 4 hours.

For agents

Trigger — what counts as a leak

| Channel | Treat as leaked? |
| --- | --- |
| Pushed to a public repo | Yes |
| Pushed to a private repo viewable by > 1 person | Yes (treat as leaked) |
| In a CI log accessible to broader audience | Yes |
| In a screenshot shared on chat / ticket / forum | Yes |
| In a stack trace returned to end users | Yes |
| In an error log readable by support team | Yes (depends on policy) |
| Committed locally but never pushed | No (still rotate; cheap insurance) |
| Found by a scanner, surface unknown | Treat as leaked |

When in doubt: treat as leaked. The cost of rotation is small; the cost of dismissing a real leak is huge.

Severity ladder

| Sev | Trigger | First-response time |
| --- | --- | --- |
| SEV-1 | Production secret (prod DB password, payment processor key, signing key, sealer key) leaked publicly | ≤ 15 min |
| SEV-2 | Production secret leaked to a small audience (internal Slack, private repo with > 5 viewers) | ≤ 1 h |
| SEV-3 | Non-prod secret (staging, dev) leaked | ≤ 24 h |
| SEV-4 | Suspected-leak with low confidence | Investigate first; rotate if confirmed |

Use the on-call paging path (per on-call-rotation-pattern.md) for SEV-1/2.
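
The ladder above can be sketched as a small classifier. This is an illustrative sketch only: the `Leak` fields and the function names are hypothetical, and a real triage would weigh more inputs than three booleans.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Leak:
    """Hypothetical triage record; field names are assumptions for this sketch."""
    production: bool   # does the secret guard a production system?
    public: bool       # was it exposed on a public channel?
    confirmed: bool    # False means "suspected, low confidence"


def classify(leak: Leak) -> tuple[str, Optional[timedelta]]:
    """Map a leak onto the severity ladder; None means 'investigate first'."""
    if not leak.confirmed:
        return "SEV-4", None
    if not leak.production:
        return "SEV-3", timedelta(hours=24)
    if leak.public:
        return "SEV-1", timedelta(minutes=15)
    return "SEV-2", timedelta(hours=1)


def should_page(severity: str) -> bool:
    """SEV-1 and SEV-2 go through the on-call paging path."""
    return severity in {"SEV-1", "SEV-2"}
```

For example, `classify(Leak(production=True, public=True, confirmed=True))` yields SEV-1 with a 15-minute first-response target, and `should_page("SEV-1")` is true.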

The minute-by-minute response (SEV-1)

Minute 0–5: Confirm + rotate

  1. Confirm the leak: which secret, where exposed, when first exposed.
  2. Rotate the secret immediately. New value in the vault; old value flagged revoked.
  3. Push the new value to all consumers (deploys / restarts as needed).
  4. Once the issuer rejects the old value, the old secret is inert; even if attackers have it, it cannot authenticate. Until then, assume it still works.

This is the highest-leverage 5 minutes. Everything else can wait; this cannot.

Minute 5–20: Contain blast radius

  1. Revoke at the issuer (not just the vault):
    • API key: call the provider's revoke endpoint.
    • OAuth token: revoke via provider OAuth admin.
    • Signing key: publish revocation; rotate signed artefacts.
    • DB password: alter user; flush sessions.
  2. Block the channel: delete the public commit / log / chat message. Note — deletion does not retroactively undo exposure; treat as leaked.
  3. Audit recent use of the secret: who used it, when, from where. Anomalies = possible attacker use.

Minute 20–60: Forensic scope

  1. Determine how long the secret was exposed (commit history, log retention, etc.).
  2. Determine who could have seen it (repo visibility, log access, channel membership).
  3. Determine what the secret could access (the actual blast radius).
  4. Pull access logs for the secret's scope (API audit logs, DB query logs, file access).
  5. Look for anomalies during the exposure window.
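
A minimal sketch of the first and last steps above: derive the exposure window, then keep only the log events that fall inside it. The `AccessEvent` shape is an assumption; real audit logs will need their own parsing.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical, already-parsed audit-log entry."""
    at: datetime
    source_ip: str
    action: str


def exposure_window(first_exposed: datetime,
                    revoked_at_issuer: datetime) -> tuple[datetime, datetime]:
    """The window runs from first exposure until the issuer stopped accepting the secret."""
    if revoked_at_issuer < first_exposed:
        raise ValueError("revocation precedes exposure; re-check the timeline")
    return first_exposed, revoked_at_issuer


def events_in_window(events: list[AccessEvent],
                     window: tuple[datetime, datetime]) -> list[AccessEvent]:
    """Return the events inside the window, oldest first."""
    start, end = window
    return sorted((e for e in events if start <= e.at <= end), key=lambda e: e.at)
```

The window ends at issuer revocation rather than vault rotation, because the old value keeps authenticating until the issuer rejects it.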

Hour 1–24: Communication + cleanup

  1. Internal comms: post in #security; brief leadership.
  2. Customer comms (if customer data is potentially at risk): draft with legal review; disclose within the regulatory window (GDPR: notify the supervisory authority within 72h of becoming aware of a personal data breach).
  3. Git history scrub (if applicable): rewrite history to remove the secret, then force-push. Note: this does not retract the leak; clones, forks, and caches keep the old history, and the secret remains compromised. Scrub for hygiene, not safety.
  4. Document timeline for the post-mortem.

Day 1–7: Post-mortem + close the gap

  1. Write the post-mortem (per on-call-rotation-pattern.md discipline).
  2. Identify root cause: how did the secret end up where it leaked?
  3. Action items to prevent recurrence (see "Closing the gap" below).
  4. Verify rotation is complete (no consumers still using old secret).
  5. Audit-log the full incident with timeline + decisions.

Secret-specific rotation playbook

Database password:

  1. Generate new password in vault.
  2. Set new password on DB user (some DBs support multi-password during transition).
  3. Push new password to all app consumers (deploy / restart).
  4. Revoke the old password.
  5. Flush long-lived DB sessions if applicable.

API key (provider-managed):

  1. Create new key via provider dashboard / API.
  2. Update vault.
  3. Push to consumers.
  4. Revoke old key in provider dashboard.
  5. Audit-log; check provider audit logs for old-key usage post-leak.
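
The API-key steps above can be sketched as one ordered function. Everything here is hypothetical scaffolding: `Provider` stands in for whatever client your provider offers, the vault is a plain dict, and each consumer is a callable that returns `True` once it has picked up the new secret.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Provider(Protocol):
    """Hypothetical provider client; real SDKs will differ."""
    def create_key(self) -> tuple[str, str]: ...   # returns (key_id, secret)
    def revoke_key(self, key_id: str) -> None: ...


@dataclass
class RotationReport:
    new_key_id: str
    revoked_key_id: str
    failed_consumers: list[str] = field(default_factory=list)


def rotate_api_key(provider: Provider,
                   vault: dict[str, str],
                   name: str,
                   old_key_id: str,
                   consumers: dict[str, Callable[[str], bool]]) -> RotationReport:
    new_key_id, new_secret = provider.create_key()       # 1. create new key
    vault[name] = new_secret                             # 2. update vault
    failed = [cname for cname, push in consumers.items() # 3. push to every consumer
              if not push(new_secret)]
    provider.revoke_key(old_key_id)                      # 4. revoke old key at the issuer
    return RotationReport(new_key_id, old_key_id, failed)  # 5. input for the audit log
```

This sketch revokes the old key even when a consumer fails to switch, on the assumption that a leaked key must die quickly; the report lists the consumers that then need a follow-up push. A planned, non-incident rotation would more likely stop and fix the laggards first.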

OAuth token:

  1. Revoke at provider (call revoke endpoint).
  2. Re-authenticate the connector (user flow).
  3. Update vault with new token.
  4. The refresh token typically rotates as well; store the new one and confirm the old one no longer works.

Cryptographic signing key:

This is the hardest case. Steps:

  1. Generate new keypair.
  2. Publish revocation of old key (PGP keyserver, sigstore, etc.).
  3. Re-sign current artefacts with new key.
  4. Update consumers' trust roots.
  5. Old key is now untrusted; any signature created with it is suspect.
  6. For audit-ledger signing key: re-anchor; document the rotation in the ledger itself.

Vault sealer key (KEK):

  1. Generate new KEK in KMS / HSM.
  2. Re-wrap every DEK with the new KEK (per vault-pattern.md).
  3. Switch operational KEK to new.
  4. After dual-read window, retire old KEK.
  5. Audit-log.

Session signing key:

  1. Generate new session key.
  2. Servers sign new sessions with new key.
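  3. Servers continue to verify with both old + new for the session-lifetime window. For a confirmed leak, keep this window as short as you can tolerate (or skip it): anyone holding the old key can forge sessions until it is retired.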
  4. After window, retire old key.
  5. Existing sessions degrade gracefully (logout + re-login).
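
A minimal sketch of the sign-with-new, verify-with-both pattern, assuming sessions are signed with HMAC-SHA256. The function names are illustrative; a real session framework will have its own key-ring mechanism.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Sign a session payload with a single key (always the newest one)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, signature: str, accepted_keys: list[bytes]) -> bool:
    """Accept the signature if any currently trusted key produced it."""
    return any(
        hmac.compare_digest(sign(payload, key), signature)
        for key in accepted_keys
    )
```

During the overlap, servers call `verify(payload, sig, [new_key, old_key])`; retiring the old key means dropping it from the list, after which old sessions fail verification and users log in again. While the old key stays in the list, a forged session signed with the leaked key also verifies, which is why the overlap should be kept short after a leak.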

"Was anything stolen?" investigation

Pull access logs for the exposure window:

  • Unusual IPs: requests from unexpected geos.
  • Unusual times: spikes at 3 AM in your customer's geo.
  • Unusual patterns: scraping; pagination through full data; bulk export.
  • Privilege ladder: did access pattern climb (read → write → admin)?

For each anomaly: investigate, document, decide if it justifies customer disclosure.
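
The four signals above can be turned into a rough first-pass filter. This is a sketch under stated assumptions: the `Request` fields, the quiet-hours range, and the bulk-export threshold are all hypothetical and need tuning against your own traffic. It surfaces candidates for a human to investigate; it does not decide anything.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical parsed access-log entry, in chronological order."""
    source_ip: str
    hour_utc: int        # 0-23
    action: str          # "read", "write", or "admin"
    rows_returned: int


PRIVILEGE_RANK = {"read": 0, "write": 1, "admin": 2}


def flag_anomalies(requests: list[Request],
                   known_ips: set[str],
                   quiet_hours: range = range(1, 5),
                   bulk_rows: int = 10_000) -> dict[str, int]:
    """Count how often each anomaly signal fires across the exposure window."""
    flags: Counter = Counter()
    highest = -1  # highest privilege rank seen so far
    for r in requests:
        if r.source_ip not in known_ips:
            flags["unusual_ip"] += 1
        if r.hour_utc in quiet_hours:
            flags["unusual_time"] += 1
        if r.rows_returned >= bulk_rows:
            flags["bulk_export"] += 1
        rank = PRIVILEGE_RANK.get(r.action, 0)
        if highest >= 0 and rank > highest:
            flags["privilege_climb"] += 1  # access stepped up the ladder
        highest = max(highest, rank)
    return dict(flags)
```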

An honest investigation matters more than a convenient conclusion. If you can't tell whether data was stolen, say so, both internally and (when applicable) externally.

Customer disclosure

Triggers requiring disclosure:

  • PII potentially accessed (GDPR / LGPD / state-law thresholds).
  • Payment data potentially accessed (PCI-DSS).
  • Credentials belonging to named customers leaked (compromised-credentials notice).

Disclosure shape:

  • What happened, in plain language.
  • What data was potentially exposed (be specific).
  • What you've done to contain (rotation, revocation, additional monitoring).
  • What customers should do (rotate their side, watch for phishing, change passwords).
  • Contact for questions (security@<domain>).

Avoid:

  • Minimising language ("a small incident").
  • Blaming a single person.
  • Withholding information that emerges later.

Legal usually reviews the disclosure. Pre-draft a template; iterate during the incident.

Regulatory timers

  • GDPR: 72h to notify supervisory authority if personal data breach. Notify affected individuals "without undue delay" if high risk to rights/freedoms.
  • LGPD: similar to GDPR; specific Brazilian DPA thresholds.
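  • HIPAA: notify affected individuals without unreasonable delay and no later than 60 days after discovery; breaches affecting 500 or more individuals must also be reported to HHS within that same 60 days.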
  • State / local laws: California (CCPA), New York SHIELD, others — each with timelines.

Track applicable regulations per the markets you serve.
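
For the two timers above that have fixed durations, the deadlines can be computed the moment the breach is recognised. This is a sketch, not legal advice: the dictionary keys are illustrative labels, and both clocks are assumed to start when you become aware of (GDPR) or discover (HIPAA) the breach.

```python
from datetime import datetime, timedelta, timezone

# Fixed-duration timers only; jurisdictions without a fixed timer are omitted.
DEADLINES = {
    "GDPR supervisory authority": timedelta(hours=72),
    "HIPAA": timedelta(days=60),
}


def notification_deadlines(aware_at: datetime) -> dict[str, datetime]:
    """Return the latest allowed notification time for each regulation."""
    if aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp to avoid off-by-hours errors")
    return {name: aware_at + delta for name, delta in DEADLINES.items()}
```

Calling `notification_deadlines(datetime.now(timezone.utc))` at the start of the incident gives the timeline document concrete dates to track against.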

Closing the gap

After rotation, address the leak vector:

| Vector | Mitigation |
| --- | --- |
| Secret in source code | Gate: check-secrets (see ../../scripts/check-secrets.example.mjs) |
| Secret in .env committed | Pre-commit hook; .gitignore enforcement; secret-scan in CI |
| Secret in log line | Logger redaction (see vault-pattern.md) |
| Secret in screenshot | Awareness training; mask sensitive UI in dev tools |
| Secret in CI log | Mask in CI runner; verify before adding to env |
| Secret in chat / ticket | Team training; auto-DLP scanning of chat |
| Secret in error response | Error serializer never includes secrets / internals |

Pick the specific vector; ship the specific gate.
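
Two of these gates, a source scan and logger redaction, can share one set of patterns. The sketch below is illustrative only: it is not the referenced check-secrets script or the redaction described in vault-pattern.md, and the three regexes are a small assumed sample that a real gate would extend considerably.

```python
import logging
import re

# Illustrative patterns only; a production gate needs a much broader set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "assigned_secret": re.compile(
        r"(?i)\b(?:password|secret|api_key|token)\s*[=:]\s*['\"]?([^\s'\"]{8,})"
    ),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspected secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


class RedactSecrets(logging.Filter):
    """Logging filter that rewrites the formatted message before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in SECRET_PATTERNS.values():
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, None  # message is already formatted
        return True  # keep the record, now redacted
```

A pre-commit hook would fail when `scan` returns any hits for the staged content; the filter attaches with `handler.addFilter(RedactSecrets())`. Regex gates miss secrets that match no pattern, so they complement rotation discipline rather than replace it.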

Practice drills

Run a leak-rotation drill quarterly:

  1. Pick a non-critical secret.
  2. Simulate exposure.
  3. Run the playbook end-to-end.
  4. Time it; identify slow steps.
  5. Refine.

A drilled rotation is fast. An undrilled rotation panics.
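
Timing the drill needs very little tooling. A minimal sketch, with a hypothetical `DrillTimer` helper, that records how long each playbook step takes and names the slowest:

```python
import time
from contextlib import contextmanager


class DrillTimer:
    """Records wall-clock duration per named drill step."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timer itself can be tested
        self.durations: dict[str, float] = {}

    @contextmanager
    def step(self, name: str):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the step raised.
            self.durations[name] = self._clock() - start

    def slowest(self, n: int = 1) -> list[str]:
        """Names of the n slowest steps, slowest first."""
        return sorted(self.durations, key=self.durations.get, reverse=True)[:n]
```

Wrap each phase of the drill in `with timer.step("push to consumers"):` and review `timer.slowest(3)` afterwards to pick what to refine.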

Common failure modes

  • Delete-then-claim-resolved. Pushing a follow-up commit that removes the secret. The secret is still in git history; still in any clone. → Rotate first. Cleanup is hygiene.
  • Rotate only at the vault. Forget to revoke at the issuer. Attackers still authenticate. → Revoke at issuer; that's the kill.
  • Rotate one consumer; forget another. Service A on new key; service B still using old. → Inventory all consumers before rotation; verify all switched.
  • Slow comms. Internal team finds out via Twitter. → Internal first; then customer if applicable.
  • Blame the engineer. They committed the secret. → Blameless; the gate that should have caught it didn't.
  • No post-mortem. Same leak class recurs. → Post-mortem mandatory; action items tracked.
  • Force-push to "fix" history. Removes from main but not from forks / mirrors / caches. → Rotate; do not rely on history scrub for safety.

See also