docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard

Closes acquisition-diligence Bundle 6 findings on secret custody, config
encryption, and local artifact hygiene. Source IDs: S6, R4, SEC-M2,
RT-M1, RT-M2, RT-L1.

Surgical closures (artifact-only audit-framed memos stay out of the
public repo per the Bundle 5 lesson):

R4 / RT-L1 — local EC private key artifact
  rm cmd/agent/mc-001.key (gitignored, never in git history, leftover
  from a 2025-era agent dev run on the operator's workstation).
  Added scripts/ci-guards/B6-no-private-keys-in-tree.sh that fails the
  build if any TRACKED non-test file contains a PEM private-key block,
  so the next attempt to commit similar material gets caught at CI.
  Allowlist: *_test.go (hermetic-test PEMs), examples/*.md (sample
  walkthroughs), internal/scep/intune/testdata/ (certificates, not
  keys).

RT-M1 — landing-page HSM implication
  certctl.io/index.html: 'their hardware' / 'your hardware' colloquial
  comparisons rephrased to 'their custody' / 'your servers'. The phrase
  'Your keys. Your hardware. Your data. Your terms.' becomes 'Your
  keys. Your servers. Your data. Your terms.' to remove any inferred
  HSM-backed key-storage claim. The technical disclosure now lives in
  docs/operator/secret-custody.md (linked below); the landing page no
  longer makes a claim it cannot back.

S6 + SEC-M2 + RT-M2 (composite documentation closure)
  Added docs/operator/secret-custody.md — public operator reference
  enumerating every secret material on the control plane and on
  agents:
    - Local CA private key (FileDriver, file-on-disk, heap-resident
      with the L-014 carve-out documented in
      internal/connector/issuer/local/local.go).
    - Agent ECDSA P-256 keys (file on agent host, never transmitted).
    - OIDC client secret (AES-256-GCM v3, PBKDF2 600k).
    - Session signing key (same encryption regime).
    - Break-glass credential (Argon2id, never encrypted).
    - API-key bearer tokens (SHA-256 hash only; plaintext shown once).
    - CSR private keys mid-issuance (agent memory only).
    - Issuer-connector backend secrets (encrypted_config column,
      fail-closed for source='database', plaintext-by-design for
      source='env' with rationale).
  The Env-seeded-vs-DB-seeded plaintext policy is explained in plain
  text so a buyer review can independently verify the startup guard at
  cmd/server/main.go:222-262 makes sense.

  Added docs/operator/runbooks/config-encryption-upgrade.md — the
  procedural arm: how to force v1/v2 -> v3 re-seal across the
  database, plus the passphrase-rotation order. Documents the
  AEAD-driven read fallback (v3 -> v2 -> v1) and the fact that
  re-sealing happens passively on UPDATE. Open roadmap item: a
  certctl admin reseal --all command (tracked in
  WORKSPACE-ROADMAP.md).

  Both docs wired into docs/README.md Operator + Runbooks tables.

Verification:
  rg -n 'CONFIG_ENCRYPTION|encrypt|v1|private key|HSM|PKCS11|mc-001.key|\.key|Local CA' \
     internal cmd docs .gitignore README.md   # ambient (no NEW leaks)
  find . -name '*.key' \
     -not -path './.git/*' -not -path './web/node_modules/*'   # empty
  git ls-files | xargs grep -lE 'BEGIN .* PRIVATE KEY' \
     | grep -vE '_test\.go$|^examples/|^internal/scep/intune/testdata/'   # empty
  bash scripts/ci-guards/B6-no-private-keys-in-tree.sh   # PASS
  bash scripts/ci-guards/G-3-env-docs-drift.sh           # PASS
  bash scripts/ci-guards/doc-rot-detector.sh             # PASS

Residual roadmap (deliberately deferred):
  - signer.PKCS11Driver (HSM-token-backed CA-key custody).
  - signer.CloudKMSDriver (AWS/GCP/Azure KMS-backed CA-key custody).
  - FIPS 140-3 mode for the whole control plane.
  - HSM-backed session signing key.
  - Built-in 'certctl admin reseal --all' command.
  All five tracked in WORKSPACE-ROADMAP.md, not retracted.
This commit is contained in:
shankar0123
2026-05-13 01:48:40 +00:00
parent 5b151e74da
commit 476022ca59
4 changed files with 394 additions and 0 deletions
@@ -0,0 +1,165 @@
# Runbook: forcing config-encryption blob upgrades (v1/v2 → v3)
> Last reviewed: 2026-05-12
Use this when:
- You've rotated `CERTCTL_CONFIG_ENCRYPTION_KEY` and want every row in
the database to be re-sealed under the new passphrase, not just the
next ones to be touched.
- A v1- or v2-era encrypted blob existed in your database before you
upgraded to a post-M-8 release and you want to retire the legacy
read path's PBKDF2 work factor (100,000 rounds) in favor of the v3
factor (600,000 rounds, OWASP 2024).
- You're preparing for an audit and want every at-rest encrypted blob
to be on the same wire format.
Audience: a platform sysadmin who can run SQL against certctl's
PostgreSQL instance and exercise the GUI/REST API write paths.
For background on the v3 / v2 / v1 wire formats and the FileDriver vs
HSM threat model, read
[`docs/operator/secret-custody.md`](../secret-custody.md) first.
---
## Background: how the read fallback works
`internal/crypto/encryption.go::DecryptIfKeySet` reads three on-disk
formats in this order:
```
v3 (magic 0x03, per-ciphertext 16-byte salt, PBKDF2 600k) →
v2 (magic 0x02, per-ciphertext 16-byte salt, PBKDF2 100k) →
v1 (no magic, fixed 28-byte salt, PBKDF2 100k)
```
The fallback is AEAD-driven: if v3 decryption fails authentication, the
function tries v2; if v2 fails, v1. This is what keeps pre-M-8 v1 blobs
readable without an explicit migration.
`EncryptIfKeySet` always writes v3. As a result, any row that is
**re-written** through the normal application code path is silently
upgraded to v3 the moment it's persisted.
The implication: you do not need to "migrate" v1/v2 blobs for them to
keep working — only if you want the v1/v2 wire format physically gone
from your database.
## Procedure
### Step 1 — confirm the encryption key is set
Re-encryption obviously cannot run without a passphrase. Verify:
```bash
echo "${CERTCTL_CONFIG_ENCRYPTION_KEY:-NOT SET}" | sed -E 's/./*/g'
```
If the variable prints `NOT SET`, do not proceed — set the key in your
deployment manifest and restart the control plane first.
### Step 2 — identify which tables hold encrypted blobs
Encrypted columns in the v2.1.0 schema:
| Table | Column | Notes |
|---|---|---|
| `issuers` | `encrypted_config` | Only populated for `source='database'` rows (env-seeded rows are not encrypted) |
| `targets` | `encrypted_config` | Same source-based gating as issuers |
| `oidc_providers` | `client_secret_enc` | OIDC client_secret |
| `auth_session_signing_keys` | `key_material_enc` | HMAC-SHA256 session-cookie signing key |
If your schema differs, derive the column list from the migration
folder:
```bash
grep -hE '_enc[ ,]|encrypted_config' migrations/*.up.sql | sort -u
```
### Step 3 — identify rows still on v1/v2
The magic byte of the blob distinguishes versions; v1 blobs start with
the random AES-GCM nonce (anything but `0x02` or `0x03` is definitely
v1), and v2 vs v3 is determined by the first byte:
```sql
-- Per-table version distribution (run against your live database)
SELECT
SUBSTRING(encrypted_config FROM 1 FOR 1)::bytea AS magic,
COUNT(*) AS rows
FROM issuers
WHERE encrypted_config IS NOT NULL
GROUP BY magic;
```
Expected steady-state output is a single row with `magic = \x03`.
Any rows with `\x02` are v2; any rows with anything else are v1.
### Step 4 — force re-sealing
`UPDATE` the rows back to themselves through the normal application
write path. The cleanest way to do this is via the REST API or GUI,
not raw SQL — re-issuing the same `PUT /api/v1/issuers/:id` reads the
row, decrypts, then re-encrypts under v3 on the write back.
For an issuer named `iss-letsencrypt-prod`:
```bash
# Fetch then re-PUT the same body (CSRF + bearer token elided).
curl -sS https://certctl.example.com/api/v1/issuers/iss-letsencrypt-prod \
-H "Authorization: Bearer $CERTCTL_API_KEY" \
| jq '.' \
| curl -sS -X PUT https://certctl.example.com/api/v1/issuers/iss-letsencrypt-prod \
-H "Authorization: Bearer $CERTCTL_API_KEY" \
-H "Content-Type: application/json" \
--data-binary @-
```
Repeat for each row that the Step 3 query flagged as non-v3.
### Step 5 — verify
Re-run the Step 3 query. The output should now show only `magic =
\x03` rows.
## Special case: rotating the encryption-key passphrase
If your goal is to retire a possibly-compromised passphrase rather
than retire a legacy wire format, the order is:
1. Generate a new passphrase. Document it via your secret-management
tool (HashiCorp Vault, AWS Secrets Manager, etc.).
2. Stop the control plane briefly so no rows are written under the
stale passphrase during the transition window.
3. Run a one-shot decrypt-with-old / re-encrypt-with-new pass.
certctl ships no built-in tool for this — see the open
roadmap item below. The cleanest current approach is:
- Start certctl with the OLD passphrase.
- Read every encrypted column out to a JSON dump via the REST API.
- Stop certctl. Update its env to the NEW passphrase. Restart.
- PUT every row back from the JSON dump (the writes re-seal under
the new passphrase).
4. Document the old passphrase as retired in your secret-management
tool. Anyone with read access to a pre-rotation backup still needs
it to decrypt that backup; the live database no longer needs it.
For most operators, simply rotating the passphrase and letting the
re-seal happen organically as rows are touched is acceptable — the
v3 wire format with PBKDF2 600k rounds makes offline brute-force
against the old passphrase computationally expensive.
## Open roadmap items
- Ship a built-in `certctl admin reseal --all` command that does Steps
3 and 4 in one shot, with structured progress + audit logging.
Tracked in [WORKSPACE-ROADMAP.md](../../WORKSPACE-ROADMAP.md).
- Surface per-table v1/v2/v3 distribution as a Prometheus gauge so
alerting can fire on "rows on legacy format" drift.
## Related reading
- [`docs/operator/secret-custody.md`](../secret-custody.md) — the
broader where-do-private-keys-live reference; this runbook is the
procedural arm of that document.
- [`internal/crypto/encryption.go`](../../../internal/crypto/encryption.go)
package comment — wire format authoritative reference.