Acquisition-audit DEPL-005 (backup runbook exists but no CI restore
test) + DATA-012 closure (Sprint 4 ACQ, 2026-05-16).
A backup procedure that has never been restore-tested is not a backup
procedure. The Helm CronJob at deploy/helm/certctl/templates/backup-
cronjob.yaml and the operator runbook at
docs/operator/runbooks/postgres-backup.md both document a
`pg_dump -Fc --no-owner --no-acl`-based backup strategy, but the
dump shape has never been restored end-to-end under CI. This sprint
adds the missing assertion.
Each Monday at 07:00 UTC (1h offset from loadtest.yml's 06:00 slot so
the two jobs don't fight for runners), boot a real postgres:16-alpine
service container pinned to the SAME sha256 digest as
deploy/docker-compose.yml, exercise the audit_events hash chain
with 24 synthetic rows representing an issue/renew/revoke/auth-login
cycle, take a custom-format dump, DROP SCHEMA public CASCADE
(simulating an operator-side data-loss event), pg_restore, and
assert:
pre.row_count == post.row_count
pre.chain_head_hash == post.chain_head_hash (BYTE-EXACT)
post.first_break_id == "" (verify_chain clean)
post.verifier_walked == pre.row_count (every row walked)
The chain-head byte-exact assertion is the load-bearing one.
Migration 000047 hashes each row's canonical payload with
`to_char(timestamp AT TIME ZONE 'UTC',
'YYYY-MM-DD"T"HH24:MI:SS.US"Z"')` — any TIMESTAMPTZ-precision loss
in the dump/restore path (a real concern across major Postgres
upgrades or with --format=plain) would corrupt the hash. The point
of testing is to PROVE the property, not to defend against a known
quirk.
Files
=====
- .github/workflows/backup-restore.yml — Mondays 07:00 UTC +
workflow_dispatch. Postgres service container; Go 1.25.10;
contents:read; 15-min timeout. Action SHAs pinned to match
ci.yml's pinning convention.
- deploy/test/backup-restore-smoke.sh — bash orchestrator: preflight
(postgresql-client + Go + python3 on PATH); wait-for-ready loop;
DROP SCHEMA + workload + dump + DROP SCHEMA + restore + verify
+ python3 JSON diff. ::error:: prefix on any assertion failure.
Same script runs unchanged locally against any reachable Postgres.
- deploy/test/backupsmoke/main.go — Go program with --mode=workload
and --mode=verify. Imports the repo's
internal/repository/postgres.RunMigrations and emits a small JSON
snapshot to stdout. INSERT shape mirrors
internal/repository/postgres/audit_chain_test.go.
- docs/operator/runbooks/postgres-backup.md — adds a 'CI restore
verification' subsection after the existing quarterly-dry-run
section, points at the new workflow + harness + smoke program,
bumps the last-reviewed marker.
Verified locally: gofmt clean, go vet clean, staticcheck clean,
`go build ./deploy/test/backupsmoke` succeeds, bash -n on the shell
harness, python3 -c yaml.safe_load on the workflow, dry-run of the
JSON-diff python block on synthetic pre.json/post.json covers both
PASS and ::error:: paths.