mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:01:32 +00:00
ci(cold-db-smoke): shrink to cold-boot + admin bootstrap only
Drop steps 5-7 (issue/renew/revoke + audit row assertion). They covered
functional API behavior (cert lifecycle) which the warm-DB integration
test suite under 'Go Test with Coverage' already covers thoroughly.

The cold-DB smoke's unique value is catching the bug class only a true
cold boot can surface: config validation gaps, non-idempotent
migrations, env-var-wiring gaps in the demo compose. Today's run found
three real master bugs of that class (6d0f774 DEMO_MODE_ACK, 910097e
migration 000043 idempotency, 58b1441 bootstrap-token interpolation);
cert lifecycle is not in that bug class.

Steps that remain (proven to fire on real bugs today):

1. docker compose down -v --remove-orphans
2. docker compose up -d (cold boot)
3. wait for postgres + certctl-server + certctl-agent healthy
4. force-recreate certctl-server with CERTCTL_BOOTSTRAP_TOKEN + POST
   /api/v1/auth/bootstrap: proves the full migration ladder ran cleanly
   on a warm DB second-boot AND that the day-0 admin path works.

Steps dropped:

5. issuing test cert via POST /api/v1/certificates: required team_id +
   renewal_policy_id + issuer_id from the seeded demo data; the original
   payload was speculative and would have needed maintenance whenever
   the seed shape changes. Functional cert-issue coverage already in the
   integration suite.
6. renewing via POST /api/v1/certificates/{id}/renew: same, functional
   renewal coverage in the integration suite.
7. revoking + asserting audit row presence: same, handler tests cover
   audit emission.

Wall-clock cap tightened from 15min to 10min (the dropped steps were the
slowest; 4 steps fit comfortably in ~7-8min cold).

Audit-Closes: post-v2.1.0-anti-rot/item-6
+33
-30
@@ -243,18 +243,38 @@ jobs:
docker compose version

- name: Cold-DB compose smoke
# 15-min wall-clock cap covers cold image pull + compose-up +
# full issue/renew/revoke probe + teardown. Increase only if
# the underlying steps legitimately grow.
# The smoke deliberately focuses on the bug class that ONLY a
# cold boot can catch: stack-startup correctness against a
# blank database. It is intentionally NOT a functional API
# walkthrough — the integration test suite under
# 'Go Test with Coverage' already covers issue / renew /
# revoke / audit-row plumbing against a warm DB.
#
# The bugs this gate is uniquely positioned to catch:
# - Missing required env vars that fail Config.Validate()
# at startup (e.g. CERTCTL_DEMO_MODE_ACK gap, 2026-05-12).
# - Non-idempotent migrations that crash on the second boot
# (e.g. migration 000043 CHECK constraint, 2026-05-12).
# - Documented manual flows that don't work end-to-end on
# a clean compose (e.g. CERTCTL_BOOTSTRAP_TOKEN
# interpolation gap, 2026-05-12).
#
# Bugs OUTSIDE the scope of this smoke (covered elsewhere):
# - API request/response contract changes (integration suite).
# - Cert lifecycle correctness (integration suite + handler
# tests).
# - Audit row plumbing (handler tests).
#
# 10-min wall-clock cap covers cold image pull + compose-up +
# force-recreate + admin bootstrap + teardown. Increase only
# if the underlying steps legitimately grow.
#
# The smoke is inlined here on purpose — it is NOT a script in
# scripts/ci-guards/, because there is no value in a developer
# running this locally. The whole point of the gate is that CI
# owns the cold-DB state; the operator never has to remember to
# run it. Master branch-protection enforces this job as a
# required check; that is the manual action, and it happens
# once.
timeout-minutes: 15
# run it.
timeout-minutes: 10
working-directory: deploy
env:
STARTUP_TIMEOUT_SECONDS: 300
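The hunk above sets STARTUP_TIMEOUT_SECONDS, and the script later calls wait_for_service_healthy, whose body falls outside this diff. As a rough illustration only, a helper of that shape might poll the container health status until the deadline passes. Everything below is an assumption rather than the repository's code: it presumes each compose service defines a healthcheck, and POLL_INTERVAL_SECONDS is a hypothetical knob.

```bash
# Hypothetical sketch: the real wait_for_service_healthy is not shown in this diff.
STARTUP_TIMEOUT_SECONDS="${STARTUP_TIMEOUT_SECONDS:-300}"
POLL_INTERVAL_SECONDS="${POLL_INTERVAL_SECONDS:-2}"  # assumed knob, not in the source

wait_for_service_healthy() {
  local service="$1" deadline cid status="unknown"
  deadline=$(( $(date +%s) + STARTUP_TIMEOUT_SECONDS ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Resolve the container ID for the service; empty if it does not exist yet.
    cid="$(docker compose ps -q "$service" 2>/dev/null || true)"
    if [ -n "$cid" ]; then
      # Prints "none" when the container has no healthcheck configured.
      status="$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$cid" 2>/dev/null || true)"
      if [ "$status" = "healthy" ]; then
        return 0
      fi
    fi
    sleep "$POLL_INTERVAL_SECONDS"
  done
  echo "timeout waiting for $service (last status: ${status:-unknown})" >&2
  return 1
}
```

Returning non-zero on timeout, instead of exiting, is what lets a caller treat a service as optional, as the diff does with `wait_for_service_healthy certctl-agent || log ...`.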
@@ -298,23 +318,22 @@ jobs:
local method="$1" path="$2" data="${3:-}"
local args=(--silent --show-error --max-time 30 -X "$method" "$SERVER_URL$path")
[ -f "$CACERT_PATH" ] && args+=(--cacert "$CACERT_PATH") || args+=(--insecure)
[ -n "${KEY:-}" ] && args+=(-H "Authorization: Bearer $KEY")
[ -n "$data" ] && args+=(-H "Content-Type: application/json" -d "$data")
curl "${args[@]}"
}

log "1/7 down -v --remove-orphans"
log "1/4 down -v --remove-orphans"
docker compose down -v --remove-orphans 2>&1 | tail -3 || true

log "2/7 up -d (cold boot)"
log "2/4 up -d (cold boot)"
docker compose up -d 2>&1 | tail -3

log "3/7 wait for healthchecks"
log "3/4 wait for healthchecks"
wait_for_service_healthy postgres
wait_for_service_healthy certctl-server
wait_for_service_healthy certctl-agent || log " (agent skipped — non-demo compose)"

log "4/7 minting day-0 admin"
log "4/4 minting day-0 admin (proves migration ladder + bootstrap path)"
TOKEN="$(openssl rand -base64 32 | tr -d '\n')"
echo "CERTCTL_BOOTSTRAP_TOKEN=$TOKEN" > /tmp/_smoke.env
docker compose --env-file /tmp/_smoke.env up -d --force-recreate certctl-server 2>&1 | tail -2
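To see which curl flags the http_call helper in this hunk would assemble for a given call, a dry-run variant can print the argument vector instead of invoking curl. This is a sketch for illustration, not part of the commit: http_call_argv is a hypothetical name, and the SERVER_URL default, the CACERT_PATH default, and the request payload are placeholders.

```bash
# Dry-run variant of http_call: prints the curl arguments one per line.
# The defaults below are placeholders, not values from the repository.
SERVER_URL="${SERVER_URL:-https://localhost:8443}"
CACERT_PATH="${CACERT_PATH:-/nonexistent/ca.crt}"

http_call_argv() {
  local method="$1" path="$2" data="${3:-}"
  local args=(--silent --show-error --max-time 30 -X "$method" "$SERVER_URL$path")
  # Verify against the CA file when it exists; otherwise skip TLS verification.
  if [ -f "$CACERT_PATH" ]; then
    args+=(--cacert "$CACERT_PATH")
  else
    args+=(--insecure)
  fi
  # Attach the bearer key only once one has been minted.
  if [ -n "${KEY:-}" ]; then args+=(-H "Authorization: Bearer $KEY"); fi
  # Send a JSON body only when the caller supplies one.
  if [ -n "$data" ]; then args+=(-H "Content-Type: application/json" -d "$data"); fi
  printf '%s\n' "${args[@]}"
}

# Placeholder payload; the real bootstrap request body is not shown in the diff.
http_call_argv POST /api/v1/auth/bootstrap '{"example":"payload"}'
```

The explicit if/else form selects the same flags as the one-line `&& ... || ...` idiom in the diff, but does not rely on the first branch always succeeding.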
@@ -324,24 +343,8 @@ jobs:
KEY="$(echo "$BODY" | python3 -c 'import json,sys; print(json.load(sys.stdin)["key_value"])')"
[ -n "$KEY" ] || { log "bootstrap failed: $BODY"; exit 1; }

log "5/7 issuing test cert"
ISSUE='{"common_name":"smoke-test.local","profile_id":"profile-default","environment":"test","owner_id":"o-platform"}'
R="$(http_call POST /api/v1/certificates "$ISSUE")"
CID="$(echo "$R" | python3 -c 'import json,sys; d=json.load(sys.stdin); print(d.get("id") or d.get("certificate",{}).get("id",""))')"
[ -n "$CID" ] || { log "issue failed: $R"; exit 1; }

log "6/7 renewing $CID"
http_call POST "/api/v1/certificates/$CID/renew" >/dev/null

log "7/7 revoking + asserting audit rows"
http_call POST "/api/v1/certificates/$CID/revoke" '{"reason":"smoke-test"}' >/dev/null
AUD="$(http_call GET '/api/v1/audit?limit=50')"
for action in cert.issued cert.renewed cert.revoked; do
if ! echo "$AUD" | python3 -c "import json,sys; d=json.load(sys.stdin); evs=d.get('events') or d.get('audit',{}).get('events') or []; sys.exit(0 if any(e.get('action')=='$action' for e in evs) else 1)"; then
log "MISSING audit row: $action"; echo "$AUD" | head -200; exit 1
fi
done
log "PASS — tearing down"
log "PASS — cold boot + force-recreate + admin bootstrap all green"
log "tearing down"
docker compose down -v 2>&1 | tail -2

- name: Dump compose logs on failure
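One detail of the retained bootstrap step may be worth a second look. The KEY extraction indexes `["key_value"]` directly, so a response without that field makes python3 exit non-zero. If the step runs with errexit enabled (GitHub Actions invokes bash with `-e` by default), the failed command substitution would likely end the step before the `bootstrap failed: $BODY` line gets to log the body. A tolerant extraction could look like the sketch below. It assumes python3 is on PATH; extract_key is a hypothetical helper, and the sample error payload is made up.

```bash
# Sketch: tolerant extraction of the bootstrap key from a JSON body.
# "key_value" comes from the diff; the sample body below is invented.
extract_key() {
  # Prints the key, or nothing when the body is not JSON or lacks "key_value".
  python3 -c '
import json, sys
try:
    doc = json.load(sys.stdin)
except ValueError:
    doc = {}
value = doc.get("key_value", "") if isinstance(doc, dict) else ""
print(value if isinstance(value, str) else "")
' 2>/dev/null || true
}

BODY='{"error":"bootstrap token mismatch"}'  # hypothetical error payload
KEY="$(printf '%s' "$BODY" | extract_key)"
[ -n "$KEY" ] || echo "bootstrap failed: $BODY"
```

With this shape the empty-KEY check stays reachable for malformed or error responses, so the failure message can include the server's body.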