Sprint 6 closure of the audit's HIGH-severity COMP-001-HASH finding.
Pre-fix posture: migration 000018 installs a WORM trigger on
audit_events that blocks UPDATE / DELETE for the application role.
But the trigger header itself documents a compliance-superuser
bypass (backup restore, retention purges, breach recovery). Without
a hash chain, that role can rewrite any row's actor / action /
details / timestamp / event_category with no on-disk trace.
HIPAA §164.312(b), FedRAMP AU-9, NIST 800-53 AU-10 want tamper-
EVIDENCE, not just tamper-prevention. This commit ships the
evidence layer.
Wire shape:
migrations/000047_audit_events_hash_chain.up.sql
+ pgcrypto extension (digest function)
+ audit_chain_head: single-row sentinel table holding the most
recent row_hash; FOR UPDATE row-lock serialises chain writes
under concurrent INSERTs so two parallel writers can't read
the same prev_hash and produce a forked chain
+ audit_events: prev_hash + row_hash columns
+ audit_events_canonical_payload(): centralised hash input
builder. UTC + microsecond ISO-8601 keeps the hash session-
timezone-independent. All columns separated by '|' so a
concatenation-ambiguity exploit can't fabricate a collision
+ audit_events_compute_hash_chain(): BEFORE-INSERT trigger
function. Reads sentinel FOR UPDATE → computes
sha256(prev_hash || id || actor || actor_type || action ||
resource_type || resource_id || details::text ||
timestamp_utc_iso || event_category) → writes both columns +
advances the sentinel
+ backfill loop walks every existing row in (timestamp ASC, id
ASC) order; WORM trigger temporarily DISABLEd inside this
migration's transaction so backfill UPDATEs land cleanly,
ENABLEd before COMMIT
+ audit_events_verify_chain(): STABLE plpgsql verifier. Walks
the chain end-to-end and returns the first break:
(first_break_id TEXT, first_break_pos INT, row_count INT)
internal/repository/postgres/audit.go
+ AuditRepository.VerifyHashChain — calls the SQL function and
maps the OUT parameters to Go return values
internal/repository/interfaces.go
+ AuditRepository.VerifyHashChain in the contract; every
in-memory mock + stub picks up the no-op implementation
internal/scheduler/scheduler.go
+ AuditChainVerifier + AuditChainBreakRecorder interfaces
+ auditChainVerifyInterval (default 6h)
+ auditChainVerifyLoop: runs once on start + every tick;
atomic.Bool guard + 5-min per-tick context timeout match every
other GC loop's pattern
internal/service/audit_chain_metric.go
+ AuditChainCounter type with atomic counters. Sticky-first-
detection on (BrokenAtID, BrokenAtPos) so the actionable
alarm doesn't drift across walks. Snapshot() returns the
full state for the metrics handler
internal/api/handler/metrics.go
+ AuditChainCounterSnapshotter interface + Prometheus
exposition for four series:
certctl_audit_chain_break_detected_total counter (the alarm)
certctl_audit_chain_verify_total counter (walks done)
certctl_audit_chain_rows gauge (last walk size)
certctl_audit_chain_last_verified_at gauge (unix seconds)
internal/config/config.go
+ AuditChainConfig{ VerifyInterval } + CERTCTL_AUDIT_CHAIN_VERIFY_INTERVAL
cmd/server/main.go
+ wires AuditChainCounter into both the scheduler (recorder) +
metrics handler (snapshotter) — single instance shared so the
writer + reader are guaranteed to converge
internal/repository/postgres/audit_chain_test.go (NEW)
+ TestAuditEventsHashChain_FreshTable: empty walk → clean
+ TestAuditEventsHashChain_AppendLinksRows: three INSERTs
produce a strictly-linked chain; prev_hash on row 0 is NULL;
verifier walks clean over the 3 rows
+ TestAuditEventsHashChain_VerifierDetectsTampering: simulate
the compliance-superuser threat model (DISABLE WORM, UPDATE
a middle row, ENABLE WORM); verifier returns the tampered
row's id at position 1
docs/operator/audit-chain.md (NEW)
+ Layered-defenses explainer (WORM + hash chain). Verifier
function reference. Recommended Prometheus alert rule.
Performance scaling table (10k to 10M rows). Step-by-step
runbook for what to do when a break is detected. Operator
configuration table.
Test-stub additions for AuditRepository.VerifyHashChain:
internal/service/testutil_test.go — mockAuditRepo
internal/service/acme_test.go — fakeAuditRepo
internal/integration/lifecycle_test.go — mockAuditRepository
internal/api/handler/scep_intune_e2e_test.go — intuneE2EAuditRepo
Verified locally:
go vet ./... (clean)
gofmt -l internal/ cmd/ (clean)
go test -short -count=1 ./internal/scheduler/... ./internal/config/...
./internal/service/... ./internal/api/handler/... ./internal/repository/...
(all green)
Verified with testcontainers + postgres:16-alpine + the migration
runner (not gated under -short — requires docker):
go test -count=1 -run TestAuditEventsHashChain ./internal/repository/postgres/...
Closes COMP-001-HASH leg of Sprint 6. COMP-002-RETENTION lands in
the next commit (separate concern: federated-user PII retention).
certctl Documentation
Last reviewed: 2026-05-12
The full docs index, organized by audience. Pick the section that matches what you need to do; each link below opens a focused doc rather than a wall of text.
For the elevator pitch and quickstart commands, see the repo README.md at the root. For the marketing site, see certctl.io.
Getting Started
You're new to certctl, just cloned the repo, or want to understand what it does before installing.
| Doc | What it covers |
|---|---|
| Concepts | TLS certificates explained for beginners — CAs, ACME, EST, private keys, the full glossary |
| Quickstart | Five-minute setup with Docker Compose, dashboard tour, API tour |
| Examples | Five turnkey scenarios — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer |
| Advanced demo | End-to-end certificate lifecycle with technical depth at each step |
| Why certctl | Positioning vs ACME clients, agent-based SaaS, enterprise platforms; when to look elsewhere |
Reference
You're operating certctl in production or building integrations and need authoritative technical detail.
| Doc | What it covers |
|---|---|
| Architecture | System design, data flow, security model, deployment topologies |
| Profiles | CertificateProfile policy object — issuer wiring, EKUs, RequiresApproval gate (with profile-edit closure) |
| API | OpenAPI 3.1 spec, integration patterns, client SDK generation |
| CLI | certctl-cli command reference and CI/CD integration patterns |
| Configuration | CERTCTL_* environment variable reference (scheduler, rate limits, deploy verify, audit, agent) |
| MCP server | Model Context Protocol integration for AI assistants |
| Release verification | Cosign / SLSA / SBOM verification procedure |
| Intermediate CA hierarchy | Multi-level CA tree management — RFC 5280 §3.2/§4.2.1.9/§4.2.1.10 enforcement |
| Auth standards implemented | RFC + CWE evidence for the API-key + RBAC + OIDC + sessions + break-glass surface (NOT a compliance-mapping doc) |
| Deployment model | Atomic write, post-deploy verify, rollback semantics across all targets |
| Vendor matrix | Tested vendor versions per target connector |
Connectors
The connector index is the canonical catalog (interfaces, registry, scanners, plus an inline reference per built-in). Per-connector deep-dive siblings cover operator-grade material — vendor edges, troubleshooting, rotation playbooks, when-to-use vs alternatives.
Issuers (13 deep-dives): ACME · ADCS · AWS ACM Private CA · DigiCert · EJBCA / Keyfactor · Entrust · GlobalSign Atlas HVCA · Google CAS · Local CA · OpenSSL / Custom CA · Sectigo SCM · step-ca / Smallstep · Vault PKI
Targets (15 deep-dives): Apache · AWS Certificate Manager · Azure Key Vault · Caddy · Envoy · F5 BIG-IP · HAProxy · IIS · Java Keystore · Kubernetes Secrets · NGINX · Postfix / Dovecot · SSH (agentless) · Traefik · Windows Certificate Store
Protocols
| Doc | What it covers |
|---|---|
| ACME server | Run certctl as an RFC 8555 + RFC 9773 ARI ACME server |
| ACME server threat model | Security posture for the ACME server endpoint |
| SCEP server | RFC 8894 native SCEP server — RA cert config, multi-profile dispatch, must-staple, mTLS sibling route |
| SCEP for Microsoft Intune | Intune-specific deployment guide — NDES replacement playbook |
| EST server | RFC 7030 EST server — 802.1X / Wi-Fi enrollment, IoT bootstrap, channel binding |
| CRL & OCSP | RFC 5280 CRL + RFC 6960 OCSP responder for relying parties |
| Async CA polling | Bounded polling for async-CA issuer connectors |
Operator
You're running certctl in production and need operational guidance.
| Doc | What it covers |
|---|---|
| Security posture | Auth, rate limits, encryption at rest, key rotation, RBAC + OIDC + sessions + break-glass, bootstrap |
| Secret custody | Where private keys live; FileDriver vs HSM/KMS; encryption wire format; env-seeded vs DB-seeded plaintext policy |
| Observability | Metrics surface, Prometheus exposition vs client_golang, tracing scope, log structure, rate-limit semantics across restarts/replicas |
| RBAC operator reference | Roles, permissions, scopes, scope-down + day-0 bootstrap |
| Auth threat model | API-key + RBAC + OIDC + sessions + break-glass — token forgery, session hijacking, IdP compromise, role-grant abuse, bootstrap-token leak, audit-mutation |
| OIDC / SSO runbooks | Per-IdP setup guides — Keycloak, Authentik, Okta, Auth0, Entra ID, Google Workspace |
| Control plane TLS | Self-signed bootstrap, operator-supplied Secret, cert-manager Certificate CR |
| Database TLS | PostgreSQL transport encryption |
| Approval workflow | Two-person integrity gate for high-stakes issuance + profile-edit closure |
| Helm deployment | Kubernetes installation via the bundled chart |
| Performance baselines | Operator-runnable benchmarks for regression spot checks |
| Auth benchmarks | Session + OIDC validation p99 targets and measured baselines |
| Legacy clients (TLS 1.2) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 |
Runbooks
| Runbook | When |
|---|---|
| Cloud targets | AWS ACM + Azure Key Vault deployment, debugging, rollback |
| Expiry alerts | Per-policy multi-channel routing matrix, severity tiers |
| Disaster recovery | CRL cache, OCSP responder cert, CA private-key rotation, Postgres restore |
| Config-encryption upgrade | Force v1/v2 → v3 re-seal across the database; passphrase rotation procedure |
| PostgreSQL backup | Operator-run backup recipe (docker-compose + Kubernetes); recommended cadence; quarterly DR dry-run |
Migration
You're moving from another cert-management tool to certctl, or running both in parallel.
| From | Doc |
|---|---|
| Certbot | migration/from-certbot.md |
| acme.sh | migration/from-acmesh.md |
| cert-manager (coexistence, not replacement) | migration/cert-manager-coexistence.md |
| Caddy ACME (point Caddy at certctl) | migration/acme-from-caddy.md |
| cert-manager ACME (point cert-manager at certctl) | migration/acme-from-cert-manager.md |
| Traefik ACME (point Traefik at certctl) | migration/acme-from-traefik.md |
| API keys → RBAC (v2.0.x → v2.1.0) | migration/api-keys-to-rbac.md — AUDIT YOUR API KEYS post-upgrade |
| Enable OIDC SSO | migration/oidc-enable.md — step-by-step OIDC onboarding for an existing API-key + RBAC deployment |
Contributor
You're contributing to certctl, running tests locally, or trying to understand the CI pipeline.
| Doc | What it covers |
|---|---|
| Testing strategy | What we test and why; per-PR fast gates vs daily deep-scan |
| Test environment | Local environment with real CAs (Pebble, step-ca, etc.) |
| QA prerequisites | Before running QA: stack boot, demo data baseline, env vars |
| QA test suite | qa_test.go reference for release QA |
| GUI QA checklist | Manual GUI verification pass for release |
| Release sign-off | Release-day checklist — code state, automated gates, manual QA, artefact verification |
| CI pipeline | CI shape, regression guards, adding new checks |
| CI guards | Per-class CI guards (code-shape, contract-parity, build/dep, operational); how to add one |
Archive
Historical docs preserved for reference. Most operators don't need these.
| Doc | Why archived |
|---|---|
| Upgrade to TLS (v2.2) | Pre-v2.2 HTTPS-everywhere upgrade procedure |
| Upgrade past v2 JWT removal | G-1 milestone JWT auth removal procedure |
Reading order by role
First-time operator: Concepts → Quickstart → Examples. About 90 minutes end to end.
Production operator: Architecture → Security posture → Control plane TLS → Disaster recovery runbook. About 4 hours end to end.
PKI engineer: ACME server → SCEP server → EST server → Intermediate CA hierarchy. About 6 hours end to end.
Contributor: Architecture → Testing strategy → Test environment → CI pipeline. About 3 hours end to end.