docs(b6): secret-custody reference + config-encryption upgrade runbook + private-key CI guard

Closes acquisition-diligence Bundle 6 findings on secret custody, config
encryption, and local artifact hygiene. Source IDs: S6, R4, SEC-M2,
RT-M1, RT-M2, RT-L1.

Surgical closures (artifact-only audit-framed memos stay out of the
public repo per the Bundle 5 lesson):

R4 / RT-L1 — local EC private key artifact
  rm cmd/agent/mc-001.key (gitignored, never in git history, leftover
  from a 2025-era agent dev run on the operator's workstation).
  Added scripts/ci-guards/B6-no-private-keys-in-tree.sh that fails the
  build if any TRACKED non-test file contains a PEM private-key block,
  so the next attempt to commit similar material gets caught at CI.
  Allowlist: *_test.go (hermetic-test PEMs), examples/*.md (sample
  walkthroughs), internal/scep/intune/testdata/ (certificates, not
  keys).

RT-M1 — landing-page HSM implication
  certctl.io/index.html: 'their hardware' / 'your hardware' colloquial
  comparisons rephrased to 'their custody' / 'your servers'. The phrase
  'Your keys. Your hardware. Your data. Your terms.' becomes 'Your
  keys. Your servers. Your data. Your terms.' to remove any inferred
  HSM-backed key-storage claim. The technical disclosure now lives in
  docs/operator/secret-custody.md (linked below); the landing page no
  longer makes a claim it cannot back.

S6 + SEC-M2 + RT-M2 (composite documentation closure)
  Added docs/operator/secret-custody.md — public operator reference
  enumerating every secret material on the control plane and on
  agents:
    - Local CA private key (FileDriver, file-on-disk, heap-resident
      with the L-014 carve-out documented in
      internal/connector/issuer/local/local.go).
    - Agent ECDSA P-256 keys (file on agent host, never transmitted).
    - OIDC client secret (AES-256-GCM v3, PBKDF2 600k).
    - Session signing key (same encryption regime).
    - Break-glass credential (Argon2id, never encrypted).
    - API-key bearer tokens (SHA-256 hash only; plaintext shown once).
    - CSR private keys mid-issuance (agent memory only).
    - Issuer-connector backend secrets (encrypted_config column,
      fail-closed for source='database', plaintext-by-design for
      source='env' with rationale).
  The Env-seeded-vs-DB-seeded plaintext policy is explained in plain
  text so a buyer review can independently verify the startup guard at
  cmd/server/main.go:222-262 makes sense.

  Added docs/operator/runbooks/config-encryption-upgrade.md — the
  procedural arm: how to force v1/v2 -> v3 re-seal across the
  database, plus the passphrase-rotation order. Documents the
  AEAD-driven read fallback (v3 -> v2 -> v1) and the fact that
  re-sealing happens passively on UPDATE. Open roadmap item: a
  certctl admin reseal --all command (tracked in
  WORKSPACE-ROADMAP.md).

  Both docs wired into docs/README.md Operator + Runbooks tables.

Verification:
  rg -n 'CONFIG_ENCRYPTION|encrypt|v1|private key|HSM|PKCS11|mc-001.key|\.key|Local CA' \
     internal cmd docs .gitignore README.md   # ambient (no NEW leaks)
  find . -name '*.key' \
     -not -path './.git/*' -not -path './web/node_modules/*'   # empty
  git ls-files | xargs grep -lE 'BEGIN .* PRIVATE KEY' \
     | grep -vE '_test\.go$|^examples/|^internal/scep/intune/testdata/'   # empty
  bash scripts/ci-guards/B6-no-private-keys-in-tree.sh   # PASS
  bash scripts/ci-guards/G-3-env-docs-drift.sh           # PASS
  bash scripts/ci-guards/doc-rot-detector.sh             # PASS

Residual roadmap (deliberately deferred):
  - signer.PKCS11Driver (HSM-token-backed CA-key custody).
  - signer.CloudKMSDriver (AWS/GCP/Azure KMS-backed CA-key custody).
  - FIPS 140-3 mode for the whole control plane.
  - HSM-backed session signing key.
  - Built-in 'certctl admin reseal --all' command.
  All five tracked in WORKSPACE-ROADMAP.md, not retracted.
Author: shankar0123
Date: 2026-05-13 01:48:40 +00:00
Parent: 5b151e74da
Commit: 476022ca59
4 changed files with 394 additions and 0 deletions
@@ -0,0 +1,165 @@
# Runbook: forcing config-encryption blob upgrades (v1/v2 → v3)
> Last reviewed: 2026-05-12
Use this when:
- You've rotated `CERTCTL_CONFIG_ENCRYPTION_KEY` and want every row in
the database to be re-sealed under the new passphrase, not just the
next ones to be touched.
- A v1- or v2-era encrypted blob existed in your database before you
upgraded to a post-M-8 release and you want to retire the legacy
read path's PBKDF2 work factor (100,000 rounds) in favor of the v3
factor (600,000 rounds, OWASP 2024).
- You're preparing for an audit and want every at-rest encrypted blob
to be on the same wire format.
Audience: a platform sysadmin who can run SQL against certctl's
PostgreSQL instance and exercise the GUI/REST API write paths.
For background on the v3 / v2 / v1 wire formats and the FileDriver vs
HSM threat model, read
[`docs/operator/secret-custody.md`](../secret-custody.md) first.
---
## Background: how the read fallback works
`internal/crypto/encryption.go::DecryptIfKeySet` reads three on-disk
formats in this order:
```
v3 (magic 0x03, per-ciphertext 16-byte salt, PBKDF2 600k) →
v2 (magic 0x02, per-ciphertext 16-byte salt, PBKDF2 100k) →
v1 (no magic, fixed 28-byte salt, PBKDF2 100k)
```
The fallback is AEAD-driven: if v3 decryption fails authentication, the
function tries v2; if v2 fails, v1. This is what keeps pre-M-8 v1 blobs
readable without an explicit migration.
`EncryptIfKeySet` always writes v3. As a result, any row that is
**re-written** through the normal application code path is silently
upgraded to v3 the moment it's persisted.
The implication: existing v1/v2 blobs keep working without any
migration. Forcing an upgrade is necessary only if you want the v1/v2
wire format physically gone from your database.
## Procedure
### Step 1 — confirm the encryption key is set
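Re-encryption cannot run without a passphrase. Verify that one is set,
without echoing it, in the same environment the control plane starts
from:
```bash
# Prints only whether the key is set, never its value.
[ -n "${CERTCTL_CONFIG_ENCRYPTION_KEY:-}" ] && echo "set" || echo "NOT SET"
```
If the command prints `NOT SET`, do not proceed: set the key in your
deployment manifest and restart the control plane first.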
### Step 2 — identify which tables hold encrypted blobs
Encrypted columns in the v2.1.0 schema:
| Table | Column | Notes |
|---|---|---|
| `issuers` | `encrypted_config` | Only populated for `source='database'` rows (env-seeded rows are not encrypted) |
| `targets` | `encrypted_config` | Same source-based gating as issuers |
| `oidc_providers` | `client_secret_enc` | OIDC client_secret |
| `auth_session_signing_keys` | `key_material_enc` | HMAC-SHA256 session-cookie signing key |
If your schema differs, derive the column list from the migration
folder:
```bash
grep -hE '_enc[ ,]|encrypted_config' migrations/*.up.sql | sort -u
```
### Step 3 — identify rows still on v1/v2
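The first byte of the blob distinguishes versions. v2 and v3 blobs
begin with a magic byte (`0x02` and `0x03` respectively). v1 blobs have
no magic byte and begin with the random AES-GCM nonce, so any first
byte other than `0x02` or `0x03` is definitely v1. The converse is only
probabilistic: a random v1 nonce begins with `0x02` or `0x03` about 1
time in 128, so a few v1 rows can hide in the v2/v3 counts: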
```sql
-- Per-table version distribution (run against your live database)
SELECT
SUBSTRING(encrypted_config FROM 1 FOR 1)::bytea AS magic,
COUNT(*) AS rows
FROM issuers
WHERE encrypted_config IS NOT NULL
GROUP BY magic;
```
Expected steady-state output is a single row with `magic = \x03`.
Any rows with `\x02` are v2; any rows with anything else are v1.
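The query above covers only `issuers`. The following is a minimal sketch for repeating it across the four columns listed in Step 2, not a shipped tool. It assumes the encrypted columns are `bytea` and that `psql` can reach the database through the standard `PG*` environment variables. The helper names `magic_query`, `classify_magic`, and `report_versions` are hypothetical.

```bash
# Hypothetical helpers; the table/column pairs come from the Step 2 table.
# Assumption: the encrypted columns are of type bytea.

# Print a query that reports the first byte (as hex) and a row count per value.
magic_query() {
  local table="$1" column="$2"
  printf "SELECT encode(substring(%s FROM 1 FOR 1), 'hex') AS magic, count(*) FROM %s WHERE %s IS NOT NULL GROUP BY 1 ORDER BY 1;\n" \
    "$column" "$table" "$column"
}

# Map a hex first byte to a wire-format label.
# Anything other than 02/03 is v1 (no magic byte, the blob starts with the nonce).
classify_magic() {
  case "$1" in
    03) echo "v3" ;;
    02) echo "v2" ;;
    *)  echo "v1" ;;
  esac
}

# Print "<table> <version> <rows>" for every encrypted column.
# v1 rows may appear on several lines, one per distinct first byte.
report_versions() {
  local pair table column magic n
  for pair in issuers:encrypted_config targets:encrypted_config \
              oidc_providers:client_secret_enc \
              auth_session_signing_keys:key_material_enc; do
    table="${pair%%:*}"; column="${pair##*:}"
    magic_query "$table" "$column" | psql -At -F ' ' |
      while read -r magic n; do
        echo "${table} $(classify_magic "$magic") ${n}"
      done
  done
}
```

Running `report_versions` before and after Step 4 gives a compact before/after view; any line that is not `v3` points at rows that still need a rewrite.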
### Step 4 — force re-sealing
Write each row back to itself through the normal application write
path. Use the REST API or GUI, not raw SQL: a raw `UPDATE` copies the
stored bytes unchanged and never reaches `EncryptIfKeySet`, whereas
re-issuing the same `PUT /api/v1/issuers/:id` reads the row, decrypts
it, then re-encrypts under v3 on the write back.
For an issuer named `iss-letsencrypt-prod`:
```bash
# Fetch then re-PUT the same body (CSRF + bearer token elided).
curl -sS https://certctl.example.com/api/v1/issuers/iss-letsencrypt-prod \
-H "Authorization: Bearer $CERTCTL_API_KEY" \
| jq '.' \
| curl -sS -X PUT https://certctl.example.com/api/v1/issuers/iss-letsencrypt-prod \
-H "Authorization: Bearer $CERTCTL_API_KEY" \
-H "Content-Type: application/json" \
--data-binary @-
```
Repeat for each row that the Step 3 query flagged as non-v3.
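When Step 3 flags more than a handful of issuers, the fetch-then-PUT pair can be wrapped in a loop. This is a sketch under the same assumption as the single-issuer example above, namely that the `GET` response body is accepted verbatim by `PUT`. The function name `reseal_issuers` is hypothetical, and `CERTCTL_API_KEY` must already be exported.

```bash
# Hypothetical helper: re-PUT each issuer so the application re-seals it as v3.
# Usage: reseal_issuers <base-url> <issuer-id>...
reseal_issuers() {
  local base_url="$1" id url body
  shift
  for id in "$@"; do
    url="${base_url}/api/v1/issuers/${id}"
    # -f makes curl exit non-zero on HTTP errors so a failed row is not silent.
    if body="$(curl -fsS "$url" \
                 -H "Authorization: Bearer ${CERTCTL_API_KEY}")" &&
       printf '%s' "$body" | curl -fsS -X PUT "$url" \
         -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
         -H "Content-Type: application/json" \
         --data-binary @- >/dev/null; then
      echo "resealed ${id}"
    else
      echo "FAILED ${id}" >&2
      return 1
    fi
  done
}
```

For example, `reseal_issuers https://certctl.example.com iss-letsencrypt-prod` handles the issuer from the example above. The loop stops at the first failure so a partially processed list is easy to resume.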
### Step 5 — verify
Re-run the Step 3 query. The output should now show only `magic =
\x03` rows.
## Special case: rotating the encryption-key passphrase
If your goal is to retire a possibly-compromised passphrase rather
than retire a legacy wire format, the order is:
1. Generate a new passphrase. Document it via your secret-management
tool (HashiCorp Vault, AWS Secrets Manager, etc.).
2. Stop the control plane briefly so no rows are written under the
stale passphrase during the transition window.
3. Run a one-shot decrypt-with-old / re-encrypt-with-new pass.
certctl ships no built-in tool for this — see the open
roadmap item below. The cleanest current approach is:
- Start certctl with the OLD passphrase.
- Read every encrypted column out to a JSON dump via the REST API.
- Stop certctl. Update its env to the NEW passphrase. Restart.
- PUT every row back from the JSON dump (the writes re-seal under
the new passphrase).
4. Document the old passphrase as retired in your secret-management
tool. Anyone with read access to a pre-rotation backup still needs
it to decrypt that backup; the live database no longer needs it.
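The dump-and-restore pass in item 3 can be sketched as two shell functions, shown here for issuers only. This is an illustration under stated assumptions, not a supported tool: it assumes the REST API returns a body that can be PUT back unchanged, the function names `dump_issuers` and `restore_issuers` are hypothetical, and the other encrypted tables would need equivalent calls against their own endpoints.

```bash
# Hypothetical helpers for item 3 of the rotation order.
# The dump may contain decrypted configuration: keep it on a protected
# volume and delete it as soon as the restore succeeds.

# Phase 1: run while certctl is still started with the OLD passphrase.
# Usage: dump_issuers <base-url> <dump-dir> <issuer-id>...
dump_issuers() {
  local base_url="$1" dir="$2" id
  shift 2
  ( umask 077; mkdir -p "$dir" ) || return 1
  for id in "$@"; do
    ( umask 077   # dump files are created readable by the owner only
      curl -fsS "${base_url}/api/v1/issuers/${id}" \
        -H "Authorization: Bearer ${CERTCTL_API_KEY}" > "${dir}/${id}.json"
    ) || { echo "dump failed: ${id}" >&2; return 1; }
  done
}

# Phase 2: run after certctl has been restarted with the NEW passphrase.
# Usage: restore_issuers <base-url> <dump-dir>
restore_issuers() {
  local base_url="$1" dir="$2" file id
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue
    id="$(basename "$file" .json)"
    curl -fsS -X PUT "${base_url}/api/v1/issuers/${id}" \
      -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
      -H "Content-Type: application/json" \
      --data-binary "@${file}" >/dev/null \
      || { echo "restore failed: ${id}" >&2; return 1; }
    echo "restored ${id}"
  done
}
```

Splitting the pass into two phases matches the restart in the middle of the procedure: nothing in phase 2 depends on the old passphrase, only on the files written in phase 1.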
For most operators, simply rotating the passphrase and letting the
re-seal happen organically as rows are touched is acceptable — the
v3 wire format with PBKDF2 600k rounds makes offline brute-force
against the old passphrase computationally expensive.
## Open roadmap items
- Ship a built-in `certctl admin reseal --all` command that does Steps
3 and 4 in one shot, with structured progress + audit logging.
Tracked in [WORKSPACE-ROADMAP.md](../../../WORKSPACE-ROADMAP.md).
- Surface per-table v1/v2/v3 distribution as a Prometheus gauge so
alerting can fire on "rows on legacy format" drift.
## Related reading
- [`docs/operator/secret-custody.md`](../secret-custody.md) — the
broader where-do-private-keys-live reference; this runbook is the
procedural arm of that document.
- [`internal/crypto/encryption.go`](../../../internal/crypto/encryption.go)
package comment — wire format authoritative reference.
@@ -0,0 +1,166 @@
# Secret custody — where private keys live in certctl
> Last reviewed: 2026-05-12
Use this when:
- You're sizing certctl against an internal security review or third-party
diligence ("where do private keys live, and how are they protected at
rest?").
- You're evaluating the file-on-disk vs HSM-vs-cloud-KMS roadmap before
committing to a deployment topology.
- You need a single page that names every secret material on the control
plane and on agents, plus the at-rest protection for each.
This document covers WHAT secrets exist, HOW they are stored, and the
THREAT MODEL we accept for each — it is not a hardening checklist. The
hardening levers (env-vars, file modes, encryption-key configuration) are
cross-referenced as you read through.
## The secrets that exist
| Material | Where it lives | Protection at rest | Closes when… |
|---|---|---|---|
| Local CA private key | File on the control-plane host (`CERTCTL_CA_KEY_PATH`) | Filesystem ACLs (operator-supplied path; mode 0600 recommended) | A `signer.PKCS11Driver` or `signer.CloudKMSDriver` ships (post-v2.1.0) |
| Agent ECDSA P-256 private keys | File on each agent host (default `/var/lib/certctl-agent/keys/`) | Filesystem ACLs on the agent host. Never transmitted to the control plane. | TPM / Secure Enclave drivers ship (no current roadmap entry) |
| OIDC client secret | `oidc_providers.client_secret_enc` column (PostgreSQL) | AES-256-GCM v3 wire format, derived from `CERTCTL_CONFIG_ENCRYPTION_KEY` via PBKDF2-SHA256 600k rounds | The encryption key is rotated via `internal/crypto` re-seal (see runbook below) |
| Session signing key | `auth_session_signing_keys` table (PostgreSQL) | AES-256-GCM v3, same encryption-key passphrase as above | HSM/FIPS-validated signing-key driver lands (deferred to v3) |
| Break-glass credential | `breakglass_credentials.password_hash` column (PostgreSQL) | Argon2id (m=64MiB, t=1, p=4) hash; never encrypted because we need constant-time comparison | Out of scope — Argon2id resists offline attack already |
| API-key bearer tokens | `auth_api_keys.token_hash` column (PostgreSQL) | SHA-256(token) only — the plaintext is shown to the operator once at create time and never persisted | Out of scope |
| CSR private keys mid-issuance | Agent memory only, ephemeral | Never written to disk; never transmitted to the server (CSRs only) | Already closed |
| Issuer-connector backend secrets | `issuers.encrypted_config` column (PostgreSQL) for `source='database'` rows | AES-256-GCM v3; FAIL-CLOSED if `CERTCTL_CONFIG_ENCRYPTION_KEY` is unset (see "Env-seeded vs DB-seeded" below) | Already closed for `source='database'`; `source='env'` carries an explicit carve-out |
The breakdown by row source matters and is the subject of the next
section. Read it before concluding that a plaintext column is a bug.
## Env-seeded vs DB-seeded configs
certctl supports two sources for issuer and target configurations:
- **`source='env'`** — built from process environment variables on every
boot (`CERTCTL_CA_CERT_PATH`, `CERTCTL_CA_KEY_PATH`, `CERTCTL_ACME_DIRECTORY_URL`,
`CERTCTL_STEPCA_URL`, etc. — see `internal/service/issuer.go::buildEnvVarSeeds`
for the exact list). These rows are deterministically reconstructable from the environment and
exist primarily so the GUI has something to display and so audit logs
can reference an issuer ID. The `config` column is intentionally
plaintext for `source='env'` rows: the exact same bytes already live
in the operator's Compose file / Helm values / systemd unit, so
persisting them again to PostgreSQL adds no new disclosure surface.
- **`source='database'`** — created via the GUI or REST API write paths
(`POST /api/v1/issuers`, etc.). These rows fail closed when
`CERTCTL_CONFIG_ENCRYPTION_KEY` is not configured:
- The HTTP handlers refuse the write with
`crypto.ErrEncryptionKeyRequired`.
- The server **refuses to start** if any `source='database'` row
exists without the encryption key, to prevent retroactive
plaintext exposure.
The startup guard is in `cmd/server/main.go` around the
`encryptionKey != ""` branch — it lists `source='database'` rows on every
boot and aborts if any are present without the key.
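Before removing or changing the key in a deployment manifest, it can help to predict what the startup guard will decide. The sketch below only mirrors the rule as described in this section; it is not the Go implementation. The function name `startup_guard_preflight` is hypothetical, and the `source` column name on both tables is an assumption inferred from the `source='database'` notation used here.

```bash
# Hypothetical preflight that mirrors the documented rule, not the Go code.
# Arguments: the encryption key value (may be empty) and the number of
# source='database' rows across issuers and targets.
startup_guard_preflight() {
  local key="$1" db_rows="$2"
  if [ -z "$key" ] && [ "$db_rows" -gt 0 ]; then
    echo "would refuse to start: ${db_rows} source='database' row(s), no encryption key"
    return 1
  fi
  echo "ok"
}

# Example wiring (assumes psql connects via the standard PG* variables):
#   rows="$(psql -At -c "SELECT (SELECT count(*) FROM issuers WHERE source='database')
#                             + (SELECT count(*) FROM targets WHERE source='database')")"
#   startup_guard_preflight "${CERTCTL_CONFIG_ENCRYPTION_KEY:-}" "$rows"
```

A non-zero exit status means the next boot without the key would be expected to abort, so the check can gate a deployment pipeline step.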
If you want every issuer/target row to be encrypted at rest unconditionally,
set `CERTCTL_CONFIG_ENCRYPTION_KEY` and use database-sourced
configurations exclusively (re-create env-seeded rows through the GUI
once the key is present).
## The signer abstraction
All CA private-key signing flows through
`internal/crypto/signer.Signer`, which embeds the stdlib `crypto.Signer`
and adds `Algorithm()`. Two drivers ship today:
- `signer.FileDriver` — the production default. Wraps the historical
file-on-disk PEM flow without behavior change. **Heap-resident**:
while certctl is running, the key bytes sit in the process's address
space.
- `signer.MemoryDriver` — used in tests; never reaches production code
paths.
The disk-exposure leg of the threat model is documented inline at the
top of `internal/connector/issuer/local/local.go` (the L-014 carve-out).
The mitigations on the FileDriver leg include:
- mode 0600 enforced on the key file at startup,
- the key directory is not served by any handler,
- the bytes are never logged or echoed in audit events,
- the server fails closed if it cannot read the key.
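The file-mode mitigation can also be checked from the operator side, for example in a host-hardening job. This is a small sketch, assuming GNU coreutils `stat` (BSD/macOS `stat` uses different flags); the function name `check_key_mode` is hypothetical.

```bash
# Hypothetical check that a key file is mode 600.
# Assumption: GNU coreutils stat (`-c '%a'` prints the octal permission bits).
check_key_mode() {
  local path="$1" mode
  mode="$(stat -c '%a' "$path")" || return 2   # file missing or unreadable
  if [ "$mode" != "600" ]; then
    echo "unexpected mode ${mode} on ${path}; expected 600" >&2
    return 1
  fi
  echo "ok ${path}"
}

# Example: check_key_mode "$CERTCTL_CA_KEY_PATH"
```

The check covers only the permission bits; ownership and directory permissions on the path still need their own review.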
`FileDriver` does NOT mitigate "an attacker with read access to the
control-plane filesystem can recover the CA key." That mitigation lives
in a future `signer.PKCS11Driver` (hardware token) or
`signer.CloudKMSDriver` (AWS/GCP/Azure KMS). The interface exists; the
drivers do not ship yet. Both are post-v2.1.0 roadmap items — see
[`docs/reference/architecture.md`](../reference/architecture.md) for the
target topology.
If you need HSM-grade key custody today, you have two options:
1. Run certctl behind an enterprise issuer (Microsoft ADCS, EJBCA,
Smallstep, ACME-public) and configure certctl's local CA as
intermediate-only or disable it entirely. The issuer connector then
sends every signing request to your existing hardware-rooted PKI.
2. Wait for the PKCS#11 driver. Track its status in
[WORKSPACE-ROADMAP.md](../../WORKSPACE-ROADMAP.md).
## Config-encryption wire format
`internal/crypto/encryption.go` produces and reads three on-disk
formats. The read path accepts all three; the write path emits only
the newest:
| Version | Magic byte | Salt | PBKDF2-SHA256 work factor | Status |
|---|---|---|---|---|
| v3 | `0x03` | per-ciphertext 16B | 600,000 | **Default for all writes** (OWASP 2024) |
| v2 | `0x02` | per-ciphertext 16B | 100,000 | Legacy read-only; superseded by v3 |
| v1 | none | fixed 28B | 100,000 | Pre-M-8 legacy read-only; written before per-ciphertext-salt fix |
The wire-format documentation is also in the `internal/crypto/encryption.go`
package comment.
### Forcing legacy blob upgrades
Re-sealing happens passively: any write through the application code
path to a row that contains a v1 or v2 blob rewrites that blob as v3.
There is no in-place migration tool because re-sealing requires reading
the row through the same code path that performs the write, and any
operational path that touches the row (renaming an issuer in the GUI,
updating a target's endpoint, refreshing an OIDC provider's
client-secret) achieves this naturally.
If you want to FORCE re-sealing across the entire database, use the
runbook at
[`docs/operator/runbooks/config-encryption-upgrade.md`](runbooks/config-encryption-upgrade.md).
Recommended mainly if you suspect the encryption-key passphrase has
been exposed and are rotating it; the runbook's "rotating the
encryption-key passphrase" section gives the exact order of
operations.
## Roadmap (what is not yet closed)
Tracked in [`WORKSPACE-ROADMAP.md`](../../WORKSPACE-ROADMAP.md), not
maintained here to prevent drift:
- `signer.PKCS11Driver` for HSM-token-backed CA key custody.
- `signer.CloudKMSDriver` for AWS/GCP/Azure KMS-backed CA key custody.
- FIPS 140-3 mode for the entire control plane.
- HSM-backed session signing key (currently HMAC-SHA256 software keys).
If a buyer or auditor asks for "HSM support," the honest answer is:
the interface is there, the drivers are not, and an enterprise issuer
connector is the bridge until the drivers ship.
## Related reading
- [`docs/operator/security.md`](security.md) — the broader hardening
checklist; covers TLS, RBAC, audit logging, network policy.
- [`docs/operator/auth-threat-model.md`](auth-threat-model.md) — the
authentication-subsystem threat model. Item 5 ("HSM / FIPS-validated
signing key for sessions") is the session-signing-key analog of this
document's CA-key story.
- [`docs/reference/architecture.md`](../reference/architecture.md) §
"Signer abstraction" — the diagram form of the FileDriver / future
PKCS11Driver / CloudKMSDriver topology.
- [`internal/crypto/encryption.go`](../../internal/crypto/encryption.go)
package comment — wire format authoritative reference.
- [`internal/connector/issuer/local/local.go`](../../internal/connector/issuer/local/local.go)
L-014 carve-out — the load-bearing threat-model section for the
FileDriver case.