mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:41:30 +00:00
docs(b12): observability reference + Postgres backup runbook

Closes acquisition-diligence Bundle 12 (Observability, DR,
Operations Receipts, And Performance Proof). Source IDs: D5, D6, D8,
T9, finding 7, OPS-H1, OPS-M1, OPS-M2, LOW-7.

Two new operator-facing references; both non-audit-framed per the
Bundle 5 doc-placement policy.

docs/operator/observability.md: single canonical statement of what
certctl emits, what it doesn't, and what survives a restart:

- Metrics surface: both /api/v1/metrics (JSON) and
  /api/v1/metrics/prometheus (text exposition v0.0.4); inventory of
  certctl_certificate_* gauges + certctl_issuance_duration_seconds
  per-issuer-type histogram + certctl_uptime_seconds.
- Prometheus library vs hand-rolled exposition: explicit scope
  statement. Hand-rolled fmt.Fprintf is intentional for v2.x given
  the shallow metric surface; client_golang migration tracked as
  v3 item (closes OPS-M1).
- Tracing: explicit deferral. No OTel SDK setup, OTel packages
  are indirect-only in go.mod, no spans, no OTLP exporter; tracked
  as v3 item; in the meantime structured logs carry request_id and
  certctl_issuance_duration_seconds carries the per-issuer latency
  signal (closes OPS-M2).
- Logging: structured JSON via log/slog; CERTCTL_LOG_LEVEL control;
  no key material / bearer tokens / session cookies in log lines.
- Rate-limit semantics under restarts + replicas: per-process,
  in-memory, reset-on-restart, NOT shared across replicas; full
  inventory of the 5 limiter call sites (break-glass login,
  SCEP/Intune per-device, EST per-principal CSR, EST HTTP-Basic
  source-IP, ACME per-account); multi-replica + sticky-session
  implications; database-backed sliding window deferred to v3
  (closes D8).
- Performance harness scope: cross-references the explicit
  'What it explicitly does NOT measure' list in
  deploy/test/loadtest/README.md (closes LOW-7 + finding 7).
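
The rate-limit semantics above (per-process, in-memory, reset-on-restart, not shared across replicas) can be illustrated with a minimal Go sketch. Everything in it is hypothetical: `WindowLimiter`, `simulate`, the key name, and the limit of 5 per minute are illustrative assumptions, not certctl's actual limiter code or configuration. It only models the documented behaviour so the multi-replica implication is concrete.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// WindowLimiter is a hypothetical in-memory sliding-window limiter.
// Its state lives only in this process: it is lost on restart and is
// invisible to other replicas.
type WindowLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	hits   map[string][]time.Time // key -> timestamps of accepted requests
}

func NewWindowLimiter(limit int, window time.Duration) *WindowLimiter {
	return &WindowLimiter{limit: limit, window: window, hits: make(map[string][]time.Time)}
}

// Allow reports whether key may proceed at time now.
func (l *WindowLimiter) Allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := now.Add(-l.window)
	kept := l.hits[key][:0]
	for _, t := range l.hits[key] {
		if t.After(cutoff) { // keep only hits still inside the window
			kept = append(kept, t)
		}
	}
	l.hits[key] = kept
	if len(kept) >= l.limit {
		return false
	}
	l.hits[key] = append(kept, now)
	return true
}

// Result counts accepted requests in three scenarios.
type Result struct{ OneReplica, TwoReplicas, AfterRestart int }

func simulate() Result {
	const limit = 5
	now := time.Unix(1700000000, 0) // fixed clock keeps the sketch deterministic
	key := "acme-account-1"         // hypothetical limiter key
	var r Result

	// One process: 10 attempts, only `limit` are accepted.
	single := NewWindowLimiter(limit, time.Minute)
	for i := 0; i < 10; i++ {
		if single.Allow(key, now) {
			r.OneReplica++
		}
	}

	// Two replicas behind a round-robin balancer: each has its own counter,
	// so the effective limit doubles.
	replicas := []*WindowLimiter{
		NewWindowLimiter(limit, time.Minute),
		NewWindowLimiter(limit, time.Minute),
	}
	for i := 0; i < 20; i++ {
		if replicas[i%2].Allow(key, now) {
			r.TwoReplicas++
		}
	}

	// Restart: a fresh limiter has no memory of the exhausted window above.
	restarted := NewWindowLimiter(limit, time.Minute)
	for i := 0; i < 10; i++ {
		if restarted.Allow(key, now) {
			r.AfterRestart++
		}
	}
	return r
}

func main() {
	r := simulate()
	fmt.Printf("one replica:   %d accepted\n", r.OneReplica)
	fmt.Printf("two replicas:  %d accepted\n", r.TwoReplicas)
	fmt.Printf("after restart: %d accepted\n", r.AfterRestart)
}
```

Under these assumptions the effective ceiling scales with the replica count unless sessions are sticky, and a restart clears any in-progress window. A shared store, such as the database-backed sliding window deferred to v3, would address both.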

docs/operator/runbooks/postgres-backup.md: operator-runnable
backup procedure:

- Inventory of what to back up (DB + operator-managed file
  material that lives outside the DB: CA keys, RA keys, OCSP
  responder keys, trust bundles).
- Logical backup recipe with docker-compose + Kubernetes variants,
  integrity verification step, off-host storage step.
- Physical / PITR recipe pointing at pgbackrest / wal-g
  (certctl ships nothing here; this is standard PostgreSQL DBA work).
- Three sample automation paths (in-cluster Postgres → S3 CronJob,
  managed Postgres PITR, self-hosted VM systemd timer + restic).
- Quarterly restore-dry-run procedure.
- Helm CronJob template deliberately not shipped: three
  documented reasons (deployment topology / secret-management
  integration / off-host storage all vary by operator) plus
  roadmap entry for shipping a starter template when a real
  operator asks for one (closes D6 + OPS-H1).

Both new docs are wired into the docs/README.md Operator + Runbooks tables.

D5 (ServiceMonitor) already shipped in Bundle 3 as
deploy/helm/certctl/templates/servicemonitor.yaml, and T9 (canonical
k6 load test) already shipped as deploy/test/loadtest/ plus
.github/workflows/loadtest.yml. This bundle doesn't touch either; it
only records their closure in the audit HTML.

Verified:
  bash scripts/ci-guards/G-3-env-docs-drift.sh   # PASS
  bash scripts/ci-guards/doc-rot-detector.sh     # PASS
All 35 scripts/ci-guards/*.sh green.

@@ -66,6 +66,7 @@ You're running certctl in production and need operational guidance.
 |---|---|
 | [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation, RBAC + OIDC + sessions + break-glass, bootstrap |
 | [Secret custody](operator/secret-custody.md) | Where private keys live; FileDriver vs HSM/KMS; encryption wire format; env-seeded vs DB-seeded plaintext policy |
+| [Observability](operator/observability.md) | Metrics surface, Prometheus exposition vs client_golang, tracing scope, log structure, rate-limit semantics across restarts/replicas |
 | [RBAC operator reference](operator/rbac.md) | Roles, permissions, scopes, scope-down + day-0 bootstrap |
 | [Auth threat model](operator/auth-threat-model.md) | API-key + RBAC + OIDC + sessions + break-glass — token forgery, session hijacking, IdP compromise, role-grant abuse, bootstrap-token leak, audit-mutation |
 | [OIDC / SSO runbooks](operator/oidc-runbooks/index.md) | Per-IdP setup guides — Keycloak, Authentik, Okta, Auth0, Entra ID, Google Workspace |
@@ -85,6 +86,7 @@ You're running certctl in production and need operational guidance.
 | [Expiry alerts](operator/runbooks/expiry-alerts.md) | Per-policy multi-channel routing matrix, severity tiers |
 | [Disaster recovery](operator/runbooks/disaster-recovery.md) | CRL cache, OCSP responder cert, CA private-key rotation, Postgres restore |
 | [Config-encryption upgrade](operator/runbooks/config-encryption-upgrade.md) | Force v1/v2 → v3 re-seal across the database; passphrase rotation procedure |
+| [PostgreSQL backup](operator/runbooks/postgres-backup.md) | Operator-run backup recipe (docker-compose + Kubernetes); recommended cadence; quarterly DR dry-run |
 
 ## Migration
 