mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-08 06:38:58 +00:00
Compare commits
151 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 46769fc7fa | |||
| 12705efe36 | |||
| de53847f51 | |||
| 56e2ea1ad7 | |||
| 1b03d0c594 | |||
| def4be9b38 | |||
| aa1efd0676 | |||
| 360e7449ad | |||
| 1b529985be | |||
| fefeccfa59 | |||
| 1cfa9f2e2a | |||
| 70ebef5d3a | |||
| eee124efb6 | |||
| 80cbd2db59 | |||
| 8aeeec93c0 | |||
| 09bea664d5 | |||
| a4b2919f59 | |||
| 9f617add29 | |||
| ecba4112b7 | |||
| 54f535a007 | |||
| f1219f8cd3 | |||
| d5522debfb | |||
| 9a8130de32 | |||
| dfdba5b260 | |||
| 90c7b5813f | |||
| e92af14a22 | |||
| 64ad8e525c | |||
| a923cf697c | |||
| b8fac59200 | |||
| ad69158405 | |||
| 11b145b641 | |||
| 4e31568d3d | |||
| 68af18d081 | |||
| df53b80cb6 | |||
| 11a1f0babd | |||
| 027a5a1468 | |||
| 9af5dad2b0 | |||
| 92519436a1 | |||
| f502da306f | |||
| 0152bdf567 | |||
| cc8024932b | |||
| 78485f7429 | |||
| a123263498 | |||
| 191384c1d2 | |||
| 172b30b8f1 | |||
| e1e43c8924 | |||
| ca31232ad2 | |||
| 532cae249d | |||
| e005c004e1 | |||
| b4b98799d5 | |||
| 2a1a0b347c | |||
| 2cd2a5c52f | |||
| 874419989d | |||
| 72b54ce850 | |||
| e7c4654b16 | |||
| 9cce2ab043 | |||
| 630831aeac | |||
| 925523e06e | |||
| ba0959ddc7 | |||
| 912ec3f547 | |||
| 2e97cc10b8 | |||
| f5ba17114d | |||
| 90210c9334 | |||
| 0f340beb14 | |||
| 15435ca02b | |||
| 1697845493 | |||
| 739745e9fe | |||
| f1d97710e1 | |||
| 00eace8068 | |||
| ca1e135aa3 | |||
| 68ca42fef1 | |||
| c03d18bb1c | |||
| 3f335af45e | |||
| 9b6294e83d | |||
| 130a65f3b6 | |||
| 5e2accbf5f | |||
| f203a5372d | |||
| 2893f9b48e | |||
| 8de28a74ba | |||
| b09bd0984a | |||
| 9143003e95 | |||
| 1d01c87663 | |||
| 3189f3cd71 | |||
| 9c679a5960 | |||
| 17b30c1f7f | |||
| 854135dfb7 | |||
| 95f1d6cf63 | |||
| 315e132981 | |||
| b0ac24fbf8 | |||
| 2d9110b0c4 | |||
| 977cdbdf44 | |||
| 5d79e53ad0 | |||
| 3e91c7a1f0 | |||
| 51f55c5fc9 | |||
| 22c4971012 | |||
| efea4d0e03 | |||
| 45122d7edb | |||
| 5313cd8492 | |||
| e7a94b6080 | |||
| 06cea1ce0f | |||
| cbb47aaf5d | |||
| cfe76ad381 | |||
| 69a508dfcf | |||
| af4fa12724 | |||
| 3ef45e2ad4 | |||
| 60a589ab96 | |||
| 7ff2e2de08 | |||
| b169f258de | |||
| d473398aba | |||
| bd54d5f7fa | |||
| 19497eef87 | |||
| 99a012e3be | |||
| 71ebccb8ba | |||
| ff6bf8f203 | |||
| 7a9ae3157f | |||
| 1720e11109 | |||
| f40e975439 | |||
| 0e06f6c4fc | |||
| ff75361553 | |||
| e0aaa967c9 | |||
| 17455d2ea2 | |||
| f2c77ba3fb | |||
| d2b62880ce | |||
| 75097909e9 | |||
| 7c5cc57d75 | |||
| 9acf609ac9 | |||
| 622cd29f20 | |||
| d809874fa1 | |||
| 5ea8fb48eb | |||
| 3275f9f1e0 | |||
| ecb8896b1c | |||
| f179eab071 | |||
| 969853ee53 | |||
| 082b8cf660 | |||
| de06141ce5 | |||
| fd94205cfa | |||
| b452013dd9 | |||
| fd4eb3b165 | |||
| a364cd6990 | |||
| 12d7b1f51d | |||
| 19c8fafe84 | |||
| 426760d737 | |||
| affaa11d14 | |||
| dca1900815 | |||
| 633e440787 | |||
| cee008207b | |||
| e9b15108d9 | |||
| f157c18368 | |||
| b21c02a3d5 | |||
| 3a807ae37e | |||
| cda957f302 |
+12
-8
@@ -30,14 +30,18 @@ CERTCTL_SERVER_PORT=8443
 CERTCTL_LOG_LEVEL=info
 CERTCTL_LOG_FORMAT=json

-# Auth type: "api-key" (production) or "none" (demo/development).
-# For JWT/OIDC, run an authenticating gateway in front of certctl
-# (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and
-# set CERTCTL_AUTH_TYPE=none on the upstream — see
-# docs/architecture.md "Authenticating-gateway pattern". G-1 removed
-# the in-process "jwt" option (no JWT middleware shipped — silent auth
-# downgrade); see docs/upgrade-to-v2-jwt-removal.md if you previously
-# set CERTCTL_AUTH_TYPE=jwt.
+# Auth type: "api-key" (production), "none" (demo/development), or
+# "oidc" (Auth Bundle 2 - native OIDC SSO via coreos/go-oidc/v3, ships
+# in Bundle 2 phases 5+6; setting CERTCTL_AUTH_TYPE=oidc on a build
+# without Bundle 2 wired triggers a clear refuse-to-start error rather
+# than a silent fallback to api-key). For JWT / SAML / LDAP, continue to
+# run an authenticating gateway in front of certctl (oauth2-proxy /
+# Envoy ext_authz / Traefik ForwardAuth / Pomerium) and set
+# CERTCTL_AUTH_TYPE=none on the upstream - see docs/architecture.md
+# "Authenticating-gateway pattern". G-1 removed the in-process "jwt"
+# option (no JWT middleware shipped - silent auth downgrade); see
+# docs/upgrade-to-v2-jwt-removal.md if you previously set
+# CERTCTL_AUTH_TYPE=jwt.
 CERTCTL_AUTH_TYPE=none
 # Required when CERTCTL_AUTH_TYPE is "api-key".
 # Generate with: openssl rand -base64 32
@@ -76,3 +76,154 @@ internal/mcp:
Bundle K / Coverage-Audit C-002 — MCP per-tool dispatch via
in-memory transport lifts package from 28.0% to 93.1% (per-
package run). Floor at 85.

internal/auth:
floor: 85
why: |
Bundle 1 Phase 12 — RBAC primitive coverage gate.
internal/auth ships keystore + middleware + RequirePermission +
bootstrap + the Phase-3 context keys + the protocol-endpoint
allowlist. Negative-test coverage (no actor → 401, no role →
403, wrong scope → 403, bootstrap-token-wrong → 401, bootstrap-
used-twice → 410, admin-already-exists → 410, zero-length token
rejection) is now in place. Prescribed Bundle 1 target was 90;
held at 85 to absorb the per-file-average dip from the
middleware shim files (testfixtures.go) which CI runs but only
test fixtures exercise. Sub-package internal/auth/bootstrap
inherits this floor.

internal/service/auth:
floor: 85
why: |
Bundle 1 Phase 12 — RBAC service-layer coverage gate.
PermissionService + RoleService + ActorRoleService + Authorizer
each have positive + negative tests covering the
privilege-escalation guard (auth.role.assign required for
Grant/Revoke), the reserved-actor invariant (actor-demo-anon
cannot be mutated), the canonical-permission validation, the
role-in-use guard on Delete, and every sentinel-error path
(ErrUnauthenticated / ErrForbidden / ErrSelfRoleAssignment /
ErrAuthReservedActor / ErrAuthUnknownPermission /
ErrAuthRoleInUse).

internal/auth/oidc:
floor: 90
why: |
Bundle 2 Phase 3 — OIDC service coverage gate. Phase 3 spec
pins the floor at 90 explicitly because every fail-closed
branch is load-bearing for the security posture: alg pinning
(deny-list HS*/none + allow-list RS*/ES*/EdDSA), audience
re-check, azp enforcement on multi-aud tokens, at_hash
REQUIRED-when-access-token-present (Phase 3 lifts the OIDC
core "MAY" to a service-level "MUST"), iat-window window,
nonce constant-time-compare, single-use state replay defense,
PKCE-S256 mandatory, IdP downgrade-attack defense at
provider-load + RefreshKeys time, JWKS-fail-closed semantics,
group-claim resolution + userinfo-fallback fail-closed
semantics, token-leak hygiene. A regression in any one of
these branches is a security incident; the floor catches it
before the commit lands. The mock-IdP fixture in
service_test.go is the load-bearing harness.

internal/auth/oidc/groupclaim:
floor: 95
why: |
Bundle 2 Phase 3 — group-claim resolver. Hand-rolled (no
JSON-path dep per Decision 10); ~150 LOC, every branch
exercised by 19 unit tests covering the documented IdP shapes
(Okta string array, Keycloak realm_access.roles, Auth0
namespaced URL claim, single-string normalization,
deeply-nested 3-segment walks) plus every fail-closed branch
(empty path, missing key, missing nested key, non-object
intermediate, bool/number/object/nil values, array with
non-string element, URL-shape with dots-in-path treated as
literal). Resolver should be at 100%; floor at 95 leaves a
1-statement margin for future error-message refactors.

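Editor's sketch (not part of the diff): the resolver behaviour listed in the `internal/auth/oidc/groupclaim` entry above might look roughly like the Go below. Everything here is illustrative. The names `resolveGroups` and `errUnresolvable` are hypothetical, and the rule "a path containing `://` is one literal key" is only an assumption about how the URL-shaped Auth0 claim could be told apart from a dotted path; the project's actual rule is not shown in this diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnresolvable is a hypothetical sentinel: every failure maps to one
// fail-closed error, so a caller never sees partially resolved groups.
var errUnresolvable = errors.New("group claim not resolvable")

// resolveGroups walks a dotted path through decoded token claims and returns
// the groups found there. A single string is normalized to a one-element slice.
func resolveGroups(claims map[string]any, path string) ([]string, error) {
	if path == "" {
		return nil, errUnresolvable
	}
	// Assumption: URL-shaped claim names are looked up as one literal key.
	segs := []string{path}
	if !strings.Contains(path, "://") {
		segs = strings.Split(path, ".")
	}
	var cur any = claims
	for _, seg := range segs {
		obj, ok := cur.(map[string]any)
		if !ok { // non-object intermediate
			return nil, errUnresolvable
		}
		next, ok := obj[seg]
		if !ok { // missing key at any depth
			return nil, errUnresolvable
		}
		cur = next
	}
	switch v := cur.(type) {
	case string:
		return []string{v}, nil
	case []any:
		out := make([]string, 0, len(v))
		for _, e := range v {
			s, ok := e.(string)
			if !ok { // array with a non-string element
				return nil, errUnresolvable
			}
			out = append(out, s)
		}
		return out, nil
	default: // bool, number, object, nil
		return nil, errUnresolvable
	}
}

func main() {
	claims := map[string]any{
		"groups":                     []any{"admins", "ops"},
		"realm_access":               map[string]any{"roles": []any{"certctl-admin"}},
		"https://example.com/groups": "single",
	}
	for _, p := range []string{"groups", "realm_access.roles", "https://example.com/groups", "missing.key"} {
		g, err := resolveGroups(claims, p)
		fmt.Println(p, g, err)
	}
}
```

Returning one sentinel for every failure keeps the caller's decision binary (groups or no login), which is the fail-closed posture the coverage note describes.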
internal/auth/oidc/domain:
floor: 90
why: |
Bundle 2 Phase 1 — OIDCProvider + GroupRoleMapping domain.
Validation-heavy package; constructors + Validate methods
cover all canonical IdP shapes (Okta / Azure AD / Google
Workspace / Keycloak / Authentik / Auth0). Floor at 90 to
catch any future field that ships without a validator.

internal/auth/session:
floor: 90
why: |
Bundle 2 Phase 4 — session lifecycle service. Phase 4 spec
pins the floor at 90 because every fail-closed branch carries
a security invariant: HMAC-SHA256 cookie signing with a
LENGTH-PREFIXED canonical input (defeats the
`<a, bc>`-vs-`<ab, c>` concatenation collision attack on the
bare-concat form), v1. version-prefix lock, idle expiry,
absolute expiry, revocation, retired-but-in-retention key
success path, retired-past-retention failure path, CSRF
constant-time compare against the SHA-256-hashed copy on the
session row, optional IP/UA-bind defense-in-depth gates,
fail-fatal initial-key bootstrap. A regression in any one of
these branches is a security incident; the floor catches it
before the commit lands. The 15-case negative-test matrix in
service_test.go is the load-bearing harness; the in-memory
stubs of SessionRepo + SigningKeyRepo + AuditRecorder let the
state machine be exercised without the postgres testcontainer
overhead (which Phase 2's integration tests already cover).

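Editor's sketch (not part of the diff): the `internal/auth/session` entry above mentions signing a length-prefixed canonical input so that `<a, bc>` and `<ab, c>` cannot collide. A minimal Go illustration of that idea follows. The helper names (`canonical`, `sign`, `verify`), the 4-byte big-endian length prefix, and the hex encoding are assumptions for the example; the project's actual cookie layout is not shown in this diff.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// canonical prefixes every field with its 4-byte big-endian length, so the
// field boundaries are part of the signed bytes.
func canonical(fields ...string) []byte {
	var buf []byte
	for _, f := range fields {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(f)))
		buf = append(buf, n[:]...)
		buf = append(buf, f...)
	}
	return buf
}

// sign returns the hex-encoded HMAC-SHA256 tag over the canonical input.
func sign(key []byte, fields ...string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(canonical(fields...))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the tag and compares it in constant time.
func verify(key []byte, sig string, fields ...string) bool {
	want, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(canonical(fields...))
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	key := []byte("demo-key-not-for-production")
	// With bare concatenation both inputs would be the bytes "abc".
	fmt.Println(sign(key, "a", "bc") == sign(key, "ab", "c")) // false
	fmt.Println(verify(key, sign(key, "a", "bc"), "a", "bc")) // true
}
```

`hmac.Equal` is the standard-library constant-time comparison; the same primitive would serve for the CSRF hash comparison the entry mentions.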
internal/auth/session/domain:
floor: 90
why: |
Bundle 2 Phase 1 — Session + SessionSigningKey domain. Both
types ship Validate() with full invariant coverage: ID prefix
enforcement (ses-/sk-), expiry-order CHECK (absolute > idle >
created), CSRFTokenHash format pin (64 lowercase hex chars),
KeyMaterialEncrypted non-empty, retired-before-created
rejection, TenantID defaulting. Cookie naming constants are
pinned by TestCookieNamingConstants because the GUI's
web/src/api/client.ts will read `certctl_csrf` by string.
Floor at 90 to catch any future field that ships without a
validator.

internal/auth/breakglass:
floor: 90
why: |
Bundle 2 Phase 7.5 — break-glass admin service (Argon2id +
lockout state machine + constant-time-via-verifyDummy). Phase
13 Pre-merge audit: floor at 90 with no carve-out. Phase 7.5
spec ships the package at 91.5%, validated by 8 mandated
negatives + ~12 coverage-lift tests. Every fail-closed branch
is load-bearing for the security surface (default-OFF posture
only matters if every "disabled" path returns ErrDisabled
BEFORE any DB lookup; constant-time defense only matters if
every path goes through verifyDummy on the no-credential leg).
A regression that drops a fail-closed branch's coverage below
90 is a real security risk — gate trips, operator audits.

internal/auth/breakglass/domain:
floor: 90
why: |
Bundle 2 Phase 1 — BreakglassCredential domain. Argon2id PHC
format pinned ($argon2id$ prefix), MinPasswordLengthBytes (12)
+ MaxPasswordLengthBytes (256) constants pinned by dedicated
test, IsLocked(now) state machine helper. The package ships
at 100% coverage; floor at 90 is the standing-room floor for
any future field added without a validator.

internal/auth/user/domain:
floor: 90
why: |
Bundle 2 Phase 1 — User domain (federated-human identity).
OIDCSubject + OIDCProviderID unique-index per the Phase 2
schema, WebAuthnCredentials JSONB reserved for v3, Validate()
enforces every on-disk invariant. The package ships at 96.4%
coverage. Floor at 90 to catch any future field added without
a validator.

Phase 13 prompt explicitly enumerates internal/auth/user/ at
floor 90. The parent (non-domain) directory has no Go source —
the user upsert lives in internal/auth/oidc/service.go alongside
group resolution + role mapping (cohesive sequence within the
OIDC callback). Splitting upsertUser into a separate
internal/auth/user/ service package would harm cohesion without
adding test value; the domain layer's invariant coverage is
where the floor actually applies.
+27
-37
@@ -19,7 +19,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v5
 with:
-go-version: '1.25.9'
+go-version: '1.25.10'

 - name: Go Build
 run: |
@@ -79,7 +79,7 @@ jobs:
 # does call, this step fails the build until either upstream
 # ships a fix OR we cut the dep. Deferred-call advisories that
 # legitimately can't be remediated yet should be added to the
-# NIST SSDF deviation log in docs/security.md, not silenced here.
+# NIST SSDF deviation log in docs/operator/security.md, not silenced here.
 run: govulncheck ./...

 - name: Install staticcheck (Bundle-7 / D-001)
@@ -107,7 +107,7 @@ jobs:

 - name: Go Test with Coverage
 run: |
-go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
+go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/api/router/... ./internal/auth/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out

 - name: Check Coverage Thresholds
 # ci-pipeline-cleanup Phase 2: per-package floors moved to
@@ -135,48 +135,38 @@ jobs:
 GITHUB_REPOSITORY: ${{ github.repository }}
 run: bash scripts/coverage-pr-comment.sh

-# Bundle P / Strengthening #6 — QA-doc drift guards. Forces every PR
-# that adds a Part to docs/testing-guide.md OR a seed row to
-# migrations/seed_demo.sql to keep docs/qa-test-guide.md in sync. This
-# eliminates the doc-drift class structurally — the symptom Bundle I
-# had to clean up by hand becomes a CI-time error going forward.
-- name: QA-doc Part-count drift guard
-run: |
-set -e
-DOC_PARTS=$(grep -oE '49 of [0-9]+ Parts' docs/qa-test-guide.md | grep -oE '[0-9]+' | tail -1)
-GUIDE_PARTS=$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md)
-if [ -z "$DOC_PARTS" ]; then
-echo "::error::Could not extract Part count from docs/qa-test-guide.md headline."
-echo " Expected pattern: '49 of <N> Parts'"
-exit 1
-fi
-if [ "$DOC_PARTS" != "$GUIDE_PARTS" ]; then
-echo "::error::DRIFT — qa-test-guide.md headline claims $DOC_PARTS Parts; testing-guide.md has $GUIDE_PARTS Parts."
-echo " Update docs/qa-test-guide.md to match. Bundle I patched this once;"
-echo " Bundle P added this guard so the drift cannot recur silently."
-exit 1
-fi
-echo "QA-doc Part-count drift guard: clean ($DOC_PARTS == $GUIDE_PARTS)."
-
+# Bundle P / Strengthening #6 — QA-doc seed-count drift guard. Forces
+# every PR that adds a seed row to migrations/seed_demo.sql to keep
+# docs/contributor/qa-test-suite.md::Seed Data Reference in sync.
+#
+# Phase 5 of the 2026-05-04 docs overhaul (commit c64777f) deleted
+# docs/testing-guide.md (its content dispersed across the new
+# audience-organized doc tree); the previous QA-doc Part-count drift
+# guard tracked Part counts between testing-guide.md and the old
+# qa-test-guide.md headline. With testing-guide.md gone, that guard's
+# premise is dead and it has been removed. The seed-count drift class
+# is still live: qa-test-suite.md::Seed Data Reference enumerates
+# certs/issuers and seed_demo.sql is the source of truth.
 - name: QA-doc seed-count drift guard
 run: |
 set -e
+DOC=docs/contributor/qa-test-suite.md
 # Seed-cert count: agnostic to documented header format. The current
 # documented count lives in `### Certificates (32 total in ...` —
 # extract the first integer in that header.
-DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
+DOC_CERTS=$(grep -oE '### Certificates \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
 # Authoritative count: unique mc-* IDs in seed_demo.sql.
 SEED_CERTS=$(grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
 if [ -z "$DOC_CERTS" ]; then
-echo "::warning::Could not extract documented cert count from docs/qa-test-guide.md."
+echo "::warning::Could not extract documented cert count from $DOC."
 echo " Skipping cert-count drift check (header format may have changed)."
 elif [ "$DOC_CERTS" != "$SEED_CERTS" ]; then
-echo "::error::DRIFT — qa-test-guide.md says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
-echo " Update docs/qa-test-guide.md::Seed Data Reference to match."
+echo "::error::DRIFT — $DOC says $DOC_CERTS certs; seed_demo.sql has $SEED_CERTS unique mc-* IDs."
+echo " Update $DOC::Seed Data Reference to match."
 exit 1
 fi
 # Issuers: seed-table count vs doc claim.
-DOC_ISS=$(grep -oE '### Issuers \([0-9]+' docs/qa-test-guide.md | grep -oE '[0-9]+' | head -1)
+DOC_ISS=$(grep -oE '### Issuers \([0-9]+' "$DOC" | grep -oE '[0-9]+' | head -1)
 # Authoritative: unique iss-* IDs (close enough proxy; the issuers
 # table count IS the unique-ID count for this prefix).
 SEED_ISS=$(grep -oE 'iss-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l | tr -d ' ')
@@ -186,7 +176,7 @@ jobs:
 # Allow up to 5pp slack — iss-* IDs appear in audit_events and
 # other reference tables that aren't issuer-table rows. Drift
 # only flags when the spread grows large.
-echo "::error::DRIFT — qa-test-guide.md says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
+echo "::error::DRIFT — $DOC says $DOC_ISS issuers; seed_demo.sql has $SEED_ISS unique iss-* IDs (spread > 5)."
 exit 1
 fi
 echo "QA-doc seed-count drift guard: clean."
@@ -209,7 +199,7 @@ jobs:
 # 167 legitimate tests for no observable behavior change. The
 # Test<Func>_<Scenario>_<ExpectedResult> form remains documented as
 # the recommended pattern for parameterized scenarios in
-# docs/qa-test-guide.md, but is not gated.
+# docs/contributor/qa-test-suite.md, but is not gated.
 - name: Regression guards (extracted to scripts/ci-guards/)
 # All named regression guards live at scripts/ci-guards/<id>.sh per
 # ci-pipeline-cleanup bundle Phase 1. Each guard is callable locally:
@@ -289,7 +279,7 @@ jobs:
 # HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
 # configured. Every lint/template invocation below must pick exactly one
 # provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
-# (certctl.tls.required) and docs/tls.md.
+# (certctl.tls.required) and docs/operator/tls.md.
 - name: Lint Helm Chart
 run: |
 helm lint deploy/helm/certctl/ \
@@ -336,7 +326,7 @@ jobs:
 # RAM headroom on ubuntu-latest (16 GB ceiling) — operator-confirmed
 # in Phase 0 / frozen decision 0.14 prototype-branch run. If RAM
 # regresses, fall back to bucketed matrix per
-# cowork/ci-pipeline-cleanup/decisions-revised.md.
+# the project's frozen-decisions log.
 #
 # The Windows matrix (deploy-vendor-e2e-windows) was deleted entirely
 # per Phase 6 / frozen decision 0.5 (revises Bundle II decision 0.4).
@@ -353,7 +343,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v5
 with:
-go-version: '1.25.9'
+go-version: '1.25.10'
 cache: true

 - name: Build f5-mock-icontrol sidecar
@@ -450,7 +440,7 @@ jobs:
 - name: Set up Go
 uses: actions/setup-go@v5
 with:
-go-version: '1.25.9'
+go-version: '1.25.10'
 cache: true

 - name: Digest validity (every @sha256 ref must resolve)
@@ -60,7 +60,7 @@ jobs:
 uses: actions/setup-go@v5
 with:
 # Match ci.yml + release.yml + security-deep-scan.yml.
-go-version: '1.25.9'
+go-version: '1.25.10'

 - name: Initialize CodeQL
 uses: github/codeql-action/init@v3
@@ -1,6 +1,6 @@
 # Load-test workflow — closes the #8 acquisition-readiness blocker from
 # the 2026-05-01 issuer coverage audit (see
-# cowork/issuer-coverage-audit-2026-05-01/RESULTS.md).
+# the 2026-05-01 issuer coverage audit).
 #
 # CADENCE: workflow_dispatch + weekly cron, NOT per-push. Load tests
 # are minutes long and don't provide useful per-PR signal — per-push
@@ -15,7 +15,7 @@ on:
 env:
 REGISTRY: ghcr.io
 # Keep in lock-step with .github/workflows/ci.yml (M-3).
-GO_VERSION: '1.25.9'
+GO_VERSION: '1.25.10'
 IMAGE_NAMESPACE: certctl-io

 jobs:
@@ -20,7 +20,7 @@ name: security-deep-scan
 #
 # Each step is best-effort — failures are uploaded as artefacts but do
 # NOT block the workflow. Triage happens via the Bundle-7 receipt
-# directory under cowork/comprehensive-audit-2026-04-25/tool-output/.
+# the project's comprehensive-audit tool-output directory.

 on:
 schedule:
@@ -82,7 +82,7 @@ jobs:
 # package is mutated independently; the per-package summary line
 # (`The mutation score is X.YZ`) is grep-extracted into the receipt.
 # Acceptance threshold: ≥80% kill ratio per package; surviving
-# mutants get triaged in cowork/comprehensive-audit-2026-04-25/
+# mutants get triaged in the project's comprehensive-audit notes/
 # d003-mutation-results.md (per-mutant action item or
 # equivalent-mutation justification).

+724
-5
@@ -1,8 +1,727 @@
# Changelog

## v2.0.68 — Image registry path changed ⚠️
## Unreleased

> **Image registry path changed.** Starting this release, container images publish to `ghcr.io/certctl-io/certctl-server` and `ghcr.io/certctl-io/certctl-agent`. Existing pulls from `ghcr.io/shankar0123/certctl-{server,agent}:<tag>` continue to work for previously-published tags (the registry never deletes images), but the `:latest` tag at the old path stops moving forward at this release. Update your `docker pull` paths, `docker-compose.yml` `image:` keys, or Helm `image.repository` values to receive future updates. Old `git clone` / `git push` / install-script / API URLs continue to redirect forever — only the container-registry path changed.
### Security

- **Alg-downgrade defense relaxed for Keycloak-shape IdPs (v2.1.0 pre-tag fix).**
Pre-fix, the IdP-bind alg-downgrade check at `internal/auth/oidc/service.go`
refused to load any OIDC provider whose discovery doc advertised HS256 /
HS384 / HS512 / `none` in `id_token_signing_alg_values_supported` —
even if RS256 was ALSO advertised. This broke binding against
Keycloak 26.x (and a handful of other real IdPs) which list every alg
the codebase is capable of in their discovery doc, regardless of which
one the realm actually signs with. The v2.1.0 Phase-10 live-IdP smoke
surfaced the regression: 6 testcontainers-Keycloak integration tests
failed with `oidc: IdP advertises weak signing algorithms (HS*/none); refusing to use as defense against downgrade attacks: HS256`.
**Fix:** the check now refuses only when the intersection of advertised
vs `DefaultAllowedAlgs` is EMPTY — an IdP advertising HS256 alongside
RS256 binds successfully, but an IdP advertising HS-only / none-only
still fails closed. The per-token alg pin at sig-verify time
(`isDisallowedAlg`, service.go ~L1177) remains the load-bearing defense
against the actual algorithm-confusion attack (forged HS256 token
signed with the IdP's RS256 pubkey as HMAC secret) — go-oidc/v3's
verifier rejects any token whose `alg` header isn't in the configured
allow-list, regardless of what the discovery doc claims. Updates:
`Service.getOrLoad` alg-check loop rewritten to compute intersection;
`ErrIdPDowngradeAdvertised` docstring reflects new semantics;
`TestDiscovery` dry-run validator surfaces HS*/none alongside RS* as
an informational note (not a hard fail); `docs/operator/auth-threat-model.md`
alg-allow-list section updated to call out the load-bearing-defense
hierarchy. Tests: `TestService_IdPDowngradeDefense_RS256PlusHS256_BindsSuccessfully`
(positive — Keycloak-shape) + `TestService_IdPDowngradeDefense_RejectsHSOnlyAdvertised`
(negative — pathological intersection-empty case) +
`TestService_RefreshKeys_CatchesPostLoadDowngrade` updated to assert
intersection-empty post-rotation; `TestTestDiscovery_AlgDowngrade_HS256AlongsideRS256_BindsWithNote`
+ `TestTestDiscovery_AlgDowngrade_HSOnly_StillTrips_HardFail` pin the
dry-run validator's new behavior.

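Editor's sketch (not part of the diff): the intersection rule described in the entry above can be illustrated with a few lines of Go. This is a hedged reconstruction, not the code in `internal/auth/oidc/service.go`. The names `allowedAlgs`, `usableAlgs`, and `checkAdvertised` are hypothetical, and the allow-list contents are assumed from the "RS*/ES*/EdDSA" wording elsewhere in this diff.

```go
package main

import "fmt"

// allowedAlgs is a hypothetical stand-in for DefaultAllowedAlgs.
var allowedAlgs = map[string]bool{
	"RS256": true, "RS384": true, "RS512": true,
	"ES256": true, "ES384": true, "ES512": true,
	"EdDSA": true,
}

// usableAlgs returns the advertised algorithms that are also on the
// allow-list, preserving the order of the discovery document.
func usableAlgs(advertised []string) []string {
	var out []string
	for _, a := range advertised {
		if allowedAlgs[a] {
			out = append(out, a)
		}
	}
	return out
}

// checkAdvertised fails closed only when the intersection is empty. Weak
// entries next to a strong one are tolerated here because per-token
// verification still pins the algorithm.
func checkAdvertised(advertised []string) error {
	if len(usableAlgs(advertised)) == 0 {
		return fmt.Errorf("no allowed signing algorithm advertised: %v", advertised)
	}
	return nil
}

func main() {
	fmt.Println(checkAdvertised([]string{"HS256", "RS256"})) // Keycloak-shape: binds
	fmt.Println(checkAdvertised([]string{"HS256", "none"}))  // weak-only: refused
}
```

The bind-time check is only a sanity gate under this design; the entry itself names the per-token allow-list at verification time as the defense against algorithm confusion.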
### Tests

- **Vitest coverage for the 2026-05-10/11 GUI batch (Audit 2026-05-11 Fix 12).**
The original GUI-batch commit `661b6db` claimed `npx tsc --noEmit PASS`
but shipped no Vitest cases for the new surfaces. The regression-
prevention layer was missing — a future refactor of `KeysPage`'s
assign modal could silently drop scope_type handling, the LOW-1 demo
banner could be hidden by a stray predicate flip, the LOW-11 hide of
the delete button on default roles could disappear and let operators
click straight into a backend 409, and nothing would surface in CI.
This closure adds 35 new test cases across five files:
`web/src/pages/auth/UsersPage.test.tsx` (new, 8 cases pinning the
active/deactivated/reactivate flow + provider filter + empty state +
loading state), `web/src/pages/auth/AuthSettingsPage.test.tsx`
(extended +4 cases pinning the MED-12 runtime-config panel —
alphabetical sort, `(empty)` placeholder, 403 silent-hide),
`web/src/pages/auth/KeysPage.test.tsx` (extended +8 cases pinning
the HIGH-10 GUI half — scope_type=global/profile/issuer body shape,
expires_at omission vs RFC3339 promotion, whitespace-only scope_id
rejection, demo-anon row mutation-button hide),
`web/src/pages/auth/RoleDetailPage.test.tsx` (new, 9 cases pinning
the MED-8 scope picker + the LOW-11 default-role delete-button hide
via the `DEFAULT_ROLE_IDS` set against `r-admin` + `r-auditor`),
`web/src/components/AuthProvider.test.tsx` (new, 5 cases pinning the
LOW-1 demo-banner visibility predicate — `authType==='none' &&
!loading` — across happy/api-key/oidc/loading/rejected branches; the
rejected-fetch path keeps the banner visible because the catch
treats it as an old-server-fallback to demo-mode, and that behavior
is pinned here so a future change surfaces in the diff). 40/40
test-file-scoped pass; `tsc --noEmit` clean.

### Security

- **CSRF rotation on logout closes HIGH-2 fourth call site (Audit 2026-05-11 Fix 13).**
The HIGH-2 closure (`dev/auth-bundle-2`) documented four
`RotateCSRFTokenForActor` call sites: login completion (fresh by
construction), Assign/RevokeRole on role-mutation (wired), Logout, and
an explicit operator endpoint. The 2026-05-11 review verified only 3
of the 4 — Logout did NOT rotate the actor's sibling sessions
post-revoke, leaving a window where a token captured pre-logout
(browser DevTools, malicious extension, session-storage leak) could
be replayed against the user's other-device/other-browser sessions
until those sessions hit their own idle/absolute expiry.
`SessionMinter` interface extended with `RotateCSRFTokenForActor`;
`Logout` invokes it after `Revoke(sess.ID)` succeeds. The
`auth.session_revoked` audit row gains a `csrf_rotated` detail key
carrying the rotated count so SOC / SIEM can correlate logout events
with CSRF churn. The no-cookie + invalid-cookie 204 short-circuit
paths skip rotation (no session row to rotate against). 3 regression
tests in `internal/api/handler/auth_session_oidc_test.go` pin the
happy path + the two short-circuit branches. The explicit operator
endpoint (4) remains intentionally unbuilt — the three automatic
triggers (login + role-mutation + logout) cover the threat model;
operators who want a nuclear option can use the existing
`RevokeAllForActor` flow which forces re-login → fresh session →
fresh CSRF. **HIGH-2 fully closed across all four documented call
sites.**

- **Demo-mode residual-grants detector + cleanup endpoint + CI guard (Audit 2026-05-11 A-8).**
HIGH-12 (closure `b81588e`) added a fail-closed bind-address guard
that refuses startup when `CERTCTL_AUTH_TYPE=none` binds non-loopback
without `CERTCTL_DEMO_MODE_ACK=true`. The Phase 2 leg of that spec —
production-startup banner when `actor-demo-anon` has residual role
grants in `actor_roles` plus a CI guard banning new synthetic-admin
code paths — was deferred. This closure lands all three deferred
legs. (1) `cmd/server/preflight_demo_residual.go` runs after the DB
is open + audit service is constructed, before the HTTPS listener
starts; under any non-`none` auth type it queries `actor_roles` for
`actor-demo-anon` and emits a WARN log + `auth.demo_residual_grants_detected`
audit row when the row is present. The migration 000029 baseline
unconditionally seeds the `ar-demo-anon-admin` row at install time,
so EVERY production deploy will see this WARN on first boot — the
intended cutover workflow is documented at `docs/operator/security.md`.
(2) `POST /api/v1/auth/demo-residual/cleanup` is an admin-class
(`auth.role.assign`) cleanup endpoint that removes every
`actor-demo-anon` row from `actor_roles` and returns
`{"removed": <int64>}`; idempotent (a second call returns
`removed:0`), refuses 503 under `Auth.Type=none` (deleting the row
would break the demo path), audit-logs every invocation. (3) New
env var `CERTCTL_DEMO_MODE_RESIDUAL_STRICT` (default `false`)
pivots the WARN to fail-closed startup refusal for operators who
want a paranoid hostile-environment posture. (4) CI guard
`scripts/ci-guards/no-new-synthetic-admin.sh` pins the 17-entry
allowlist of source files that may reference the `actor-demo-anon`
literal; new runtime code paths that resolve to the synthetic actor
are rejected at PR time so the credibility gap stays closed. The
closure was framed as "credibility gap, not exploitable
vulnerability" — the residue requires a regression elsewhere in the
middleware chain to be exploitable. After this fix, the canonical
acquisition-readiness narrative ("RBAC primitive with no
synthetic-admin fallback") is fully true. Operator runbook at
`docs/operator/security.md#demo-to-production-cutover-audit-2026-05-11-a-8`.

- **OIDC provider "Test connection" panel (Audit 2026-05-11 Fix 09, MED-5 GUI half).**
MED-5's backend dry-run endpoint (`POST /api/v1/auth/oidc/test`, gated
`auth.oidc.create`) shipped on `dev/auth-bundle-2` but had no GUI caller;
the `authOIDCTestProvider` function in `web/src/api/client.ts` was dead
code. Operators had to complete the create form blind, save, then click
"Refresh" to discover whether the issuer URL worked; failures left a
broken provider row in the database that had to be deleted before
retrying. New shared component
`web/src/pages/auth/OIDCTestConnectionPanel.tsx` calls the backend
against the live form state and renders a four-row status panel inline:
Discovery fetched, JWKS reachable, supported algs (warns when the IdP
advertises none), and RFC 9207 iss-parameter advertisement (informational
`·` glyph, not ✗, because the spec is SHOULD). Backend per-leg `errors[]`
flow into an inline bullet list. The panel is mounted in the
OIDCProvidersPage create modal AND the OIDCProviderDetailPage edit form;
the edit-form half is load-bearing for verifying IdP rotations (Keycloak
realm rename, Okta tenant move) without committing first. The Run button is
disabled until the issuer URL is non-empty (whitespace-trimmed); the
component is read-only, so it is safe to run repeatedly. 8 Vitest tests pin
the glyph-vs-glyph contract (✓/✗/⚠/·), the button-disabled-without-issuer
shape, and the test-id-suffix collision-prevention when the panel is
mounted twice on the same page.

- **OIDC JWKS health panel + Refresh-now button (Audit 2026-05-11 Fix 10, MED-7 GUI half).**
MED-7's backend endpoint `GET /api/v1/auth/oidc/providers/{id}/jwks-status`
(commit `d85114f`) shipped the per-provider verifier counters on
`dev/auth-bundle-2` but the GUI never called it. The audit doc had
prematurely flipped the row to CLOSED; `authOIDCJWKSStatus` in the
API client was dead code. Operators investigating "why is login
failing for this IdP" couldn't see `last_refresh_at`,
`rejected_jws_count`, or `last_error` from the GUI; they had to
drop to curl. New shared component
`web/src/pages/auth/OIDCJWKSStatusPanel.tsx` queries the endpoint
via TanStack Query (30s `staleTime`, `retry: 0` so a 403 hides the
panel silently for callers without `auth.oidc.list`) and renders
six dt/dd rows: Last refresh (with `(never — cold cache)` sentinel
when the timestamp is empty), Refresh count, Rejected JWS count,
Last error (red treatment when non-empty, `(none)` sentinel
otherwise), RFC 9207 iss param ("supported by IdP" / "not
advertised"), and Current KIDs
(`(not exposed — query jwks_uri directly)` sentinel when the backend
declines to expose the list).
A "Refresh now" button invokes the existing
`POST .../refresh` (RefreshKeys path) and invalidates the panel's
query so the freshly-updated counters render without a page
reload. The button is hidden for callers without `auth.oidc.edit`
via the panel's optional `canRefresh` prop. Mounted on
`OIDCProviderDetailPage.tsx` between the read-only field display
and the Actions section. 9 Vitest tests pin: loading state,
happy-path-all-six-rows, 403-hides-panel,
refresh-invalidates-query,
refresh-failure-surfaces-inline-without-hiding-panel,
never-refreshed-cold-cache-sentinel,
current-kids-empty-not-exposed-sentinel, last-error-red-treatment, and
canRefresh=false-hides-the-button.

- **UsersPage sidebar nav entry (Audit 2026-05-11 Fix 11, MED-11
discoverability).** The MED-11 closure shipped `UsersPage.tsx` + wired
the `/auth/users` route in `web/src/main.tsx`, but the sidebar
navigation never gained a corresponding entry. Operators reached the
federated-user-admin surface (used during compliance audits: "show
me last login for every IdP-federated user") only by knowing the URL.
A page that exists but isn't navigable is a half-finished page. New
Users entry under the Auth section in `web/src/components/Layout.tsx`
sits between Sessions and Roles (federated-identity grouping). Three
Vitest tests in `Layout.test.tsx` pin the link's presence, the
`/auth/users` destination, and the DOM ordering relative to Sessions
so a future refactor that re-orders or removes the entry surfaces in
the diff.

- **Scope-aware actor-role revoke (Audit 2026-05-11 A-4).**
HIGH-10 made it possible to grant the same role to the same actor at
multiple scopes (e.g. `r-operator` on `profile=p-acme` AND `profile=p-globex`)
via the unique constraint extension on `actor_roles`, but
`ActorRoleRepository.Revoke` ignored `(scope_type, scope_id)` and
unconditionally deleted every variant. Operators who wanted to drop
one scoped grant had to nuke them all and re-grant the remainder,
opening a race window where the actor's access was briefly different. The
`DELETE /v1/auth/keys/{id}/roles/{role_id}` endpoint now accepts
optional `?scope_type=` / `?scope_id=` query params that narrow the
revoke to a single variant; no-match returns 404. The legacy "revoke
every variant" semantic is preserved when the query params are
absent, so existing CLI / GUI buttons keep working unchanged. The
audit row's `details` payload records which mode fired so SOC / SIEM
can distinguish wide cleanups from targeted demotions. MCP tool
`certctl_auth_revoke_role_from_key` gains optional `scope_type` +
`scope_id` input fields with matching semantics. Documented in
`docs/operator/rbac.md` under "Revoke: legacy 'all variants' vs
scope-selective."

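As a rough illustration of the scope-aware revoke entry above, the following Go sketch builds the request URL for both modes. The `revokeURL` helper, the `https://certctl.example` origin, and the `k-ci` key ID are hypothetical; only the route shape and the `scope_type` / `scope_id` parameter names come from the entry itself, and any path prefix your deployment uses belongs in `base`.

```go
package main

import (
	"fmt"
	"net/url"
)

// revokeURL is a hypothetical helper (not part of certctl) that builds the
// URL for DELETE /v1/auth/keys/{id}/roles/{role_id}.
// base is an assumption: the server origin plus any route prefix in use.
// With both scope arguments empty, no query string is added, which the
// changelog describes as the legacy "revoke every variant" mode.
func revokeURL(base, keyID, roleID, scopeType, scopeID string) string {
	u := base + "/v1/auth/keys/" + url.PathEscape(keyID) +
		"/roles/" + url.PathEscape(roleID)
	q := url.Values{}
	if scopeType != "" {
		q.Set("scope_type", scopeType)
	}
	if scopeID != "" {
		q.Set("scope_id", scopeID)
	}
	if len(q) == 0 {
		return u
	}
	// Values.Encode sorts by key, so scope_id precedes scope_type.
	return u + "?" + q.Encode()
}

func main() {
	// Targeted demotion: drop only the profile=p-acme variant.
	fmt.Println(revokeURL("https://certctl.example", "k-ci", "r-operator", "profile", "p-acme"))
	// Legacy wide cleanup: no scope parameters.
	fmt.Println(revokeURL("https://certctl.example", "k-ci", "r-operator", "", ""))
}
```

A caller would send the resulting URL with the `DELETE` method and treat a 404 on the scoped form as "no such variant", per the entry above.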
### Security (BREAKING — silent-elevation closure)

- **HIGH-10 actor-role scope is now enforced (Audit 2026-05-11 A-1).**
Pre-fix, `actor_roles.scope_type` / `scope_id` (added in migration 000043
by the HIGH-10 closure) were persisted by Grant + accepted on the handler
body + surfaced through the GUI/MCP, but the load-bearing
`EffectivePermissions` SQL never read them. A profile-scoped grant
silently elevated to global at authorization time. Canonical CRIT-5
lying-field shape, replicated. **The post-fix authorization narrows
correctly**: every existing `actor_roles` row with `scope_type != 'global'`
now takes effect.

> **Operator advisory:** if you used the HIGH-10 scope-bound role-grant
> API between commit `551812b` and the v2.1.0 tag (the column was
> populated but ignored), the grants were silently global. After
> upgrading, audit `SELECT actor_id, role_id, scope_type, scope_id FROM
> actor_roles WHERE scope_type != 'global'` and confirm the narrowing
> reflects intent. If an actor was granted a scoped role but expected
> global behavior, re-grant with `scope_type=global`.

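To make the narrowing concrete, here is a minimal in-memory sketch of the rule the advisory describes: a `global` grant applies everywhere, and a scoped grant applies only to the matching `(scope_type, scope_id)` pair. The `Grant` struct and `grantApplies` function are hypothetical; the real check lives in the `EffectivePermissions` SQL, whose exact form is not shown in this changelog.

```go
package main

import "fmt"

// Grant mirrors the actor_roles columns named in the advisory.
// The struct itself is an illustration, not certctl's type.
type Grant struct {
	RoleID    string
	ScopeType string // "global" or a narrower type such as "profile"
	ScopeID   string
}

// grantApplies reports whether g covers a request against the resource
// identified by (scopeType, scopeID). Pre-fix behavior was equivalent to
// always returning true, which is the silent elevation being closed.
func grantApplies(g Grant, scopeType, scopeID string) bool {
	if g.ScopeType == "global" {
		return true
	}
	return g.ScopeType == scopeType && g.ScopeID == scopeID
}

func main() {
	g := Grant{RoleID: "r-operator", ScopeType: "profile", ScopeID: "p-acme"}
	fmt.Println(grantApplies(g, "profile", "p-acme"))
	fmt.Println(grantApplies(g, "profile", "p-globex"))
}
```

Rows returned by the advisory's audit query are exactly the grants for which the second branch now decides the outcome.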
### Security (BREAKING)

- **Federated-user deactivation now actually blocks login (Audit 2026-05-11 A-2).**
The MED-11 closure shipped `users.deactivated_at` +
`DELETE /api/v1/auth/users/{id}` + cascade-session-revoke, but the column
was a "lying field" three legs over: the
postgres user repository never SELECTed it (so `User.DeactivatedAt` always read
nil), the `Update` SQL never wrote it (so the handler's mutation was a no-op),
and the OIDC `upsertUser` path never checked it (so the next login under the
same `(provider, subject)` tuple re-minted a session and re-elevated the user).
The cascade-revoke remained correct for the current cookie only. **Operator
advisory: if you deactivated a federated user between the MED-11 closure
(Bundle 2 merge `dea5053`) and the v2.1.0 release tag, verify the user cannot
OIDC-log-in after upgrading; the column took no effect at login time before
this fix. If needed, re-run the deactivation against the upgraded server.**
Closure: `userColumns` + `scanUser` now read `deactivated_at` via `sql.NullTime`;
`Create` + `Update` write it explicitly; `upsertUser` returns the new
`ErrUserDeactivated` sentinel before mutating fields (preserves `last_login_at`
forensics on rejected logins); `classifyOIDCFailure` surfaces the rejection
as audit category `user_deactivated`. Self-deactivate guard on
`DELETE /api/v1/auth/users/{id}` returns HTTP 409 + audit row
`auth.user_deactivate_self_rejected` (prevents an admin from one-way-door
locking themselves out via the standard handler; break-glass remains the
recovery path). New inverse endpoint `POST /api/v1/auth/users/{id}/reactivate`
(gated `auth.user.deactivate`: reactivation is the inverse op, not a separate
privilege) clears `deactivated_at`; emits audit row `auth.user_reactivated`.
Sessions revoked at deactivation stay revoked across reactivation; the user
must complete a fresh OIDC login. GUI: `UsersPage.tsx` now renders a Reactivate
button on deactivated rows. CWE-862 (missing authorization at the user-state
boundary). SOC 2 CC6.3 + ISO 27001 A.9.2.6 compliance-table-flipping fix.
- **`__Host-` cookie prefix on all three auth cookies (Audit 2026-05-10 MED-14).**
The session cookie, CSRF cookie, and OIDC pre-login cookie are renamed from
`certctl_session` / `certctl_csrf` / `certctl_oidc_pending` to
`__Host-certctl_session` / `__Host-certctl_csrf` / `__Host-certctl_oidc_pending`
to gain browser-enforced subdomain-takeover protection (a `__Host-*` cookie can
only be set with `Path=/` + `Secure` + no `Domain` attribute, and the browser
rejects subdomain attempts to overwrite it). **Active sessions invalidate on
the rolling deploy that lands this change**; operators must re-authenticate
once after upgrading. The GUI's CSRF cookie reader was updated in lockstep.
See `docs/migration/oidc-enable.md` for operator-facing detail.

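The `__Host-` constraints listed above (`Path=/`, `Secure`, no `Domain`) can be expressed as a small check. This is a sketch using Go's standard `net/http` cookie type; `newSessionCookie` and `hostPrefixOK` are hypothetical helpers, and the attribute defaults shown are assumptions drawn from the session-cookie description elsewhere in this changelog, not certctl's actual constructor.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newSessionCookie is a hypothetical constructor showing attributes that
// satisfy the __Host- prefix rules.
func newSessionCookie(value string) *http.Cookie {
	return &http.Cookie{
		Name:     "__Host-certctl_session",
		Value:    value,
		Path:     "/",  // required by the prefix
		Secure:   true, // required by the prefix
		HttpOnly: true,
		SameSite: http.SameSiteLaxMode,
		// Domain is deliberately left empty: the prefix forbids it.
	}
}

// hostPrefixOK reports whether a cookie meets the rules browsers enforce
// for names starting with "__Host-". Cookies without the prefix are not
// constrained by these rules, so they pass.
func hostPrefixOK(c *http.Cookie) bool {
	if !strings.HasPrefix(c.Name, "__Host-") {
		return true
	}
	return c.Secure && c.Path == "/" && c.Domain == ""
}

func main() {
	c := newSessionCookie("opaque")
	fmt.Println(hostPrefixOK(c))

	// The old pre-login path would violate the prefix rules, which is why
	// the pre-login cookie's Path had to widen to "/".
	c.Path = "/auth/oidc/"
	fmt.Println(hostPrefixOK(c))
}
```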
### Security

- **OIDC `allowed_email_domains` now editable in the GUI (Audit 2026-05-11 A-3).**
The backend gate that rejects logins whose email domain is outside the
configured allowlist landed in v2.1.0 (CRIT-5 closure, 2026-05-10), but the
GUI never exposed the field; GUI-driven operators had to use the API
directly to configure tenant isolation against multi-tenant IdPs (Auth0,
Azure AD common endpoint, Google Workspace). The OIDCProvidersPage create
modal and OIDCProviderDetailPage detail view now render a chip-style
multi-input with client-side validation that mirrors the backend rules
(no `@`, no whitespace, no wildcards, lowercase-only FQDNs). The read-only
view renders an explicit "any (no gate configured)" sentinel when the list
is empty so operators can tell "not configured" apart from "field is
invisible." A "Clear all" button on the edit form is gated by a confirm
dialog that warns about removing the tenant gate. **Operator advisory: if
you provisioned OIDC providers via the GUI between v2.1.0 and this fix,
verify `allowed_email_domains` matches your tenant policy; the field was
configurable only via API / MCP / direct SQL during that window.** Per-IdP
runbooks for multi-tenant IdPs in `docs/operator/oidc-runbooks/` already
documented the field; the GUI now matches.

- **Approval payload preview (Audit 2026-05-11 A-5).**
The MED-10 closure claim ("PARTIAL: raw JSON preview; diff library
deferred") was inaccurate: `ApprovalsPage.tsx` rendered no payload
at all, so approvers were clicking Approve / Reject without seeing
the change they were authorizing. That defeats the entire four-eyes
primitive: an approver who can't see what they're approving is
rubber-stamping. Each row now carries a Preview toggle that expands
an inline panel dispatching by kind: `profile_edit` shows a
field-level before/after diff (changed-only rows, red/green cells,
`(unset)` sentinel for added/removed fields); `cert_issuance` shows
a definition list of CN / SANs / profile / key algo / must-staple /
validity (catches the wildcard-against-corp-internal-profile attack
at review time); unknown kinds render a generic JSON preview for
forward-compat with future approval kinds. The base64-encoded JSON
payload is decoded via the new `decodePayload` helper; malformed
inputs render an explicit decode-error fallback, because silent failure on
the payload preview is what produced this bug in the first place.

- **Strict pre-login UA/IP binding (Audit 2026-05-11 A-6).**
The MED-16 closure left a request-side empty-header bypass: when the
pre-login row carried a User-Agent or client-IP binding but the
`/auth/oidc/callback` request omitted the corresponding value, the
binding check was silently skipped. `curl` doesn't send User-Agent
by default; many programmatic clients omit it. An attacker who
acquired a pre-login cookie could replay it without the bound
header and bypass the RFC 9700 §4.7.1 defense. The check is now
strict-when-stored: an empty request-side value with a non-empty
stored binding rejects with HTTP 400 and the new audit failure
categories `prelogin_ua_missing` / `prelogin_ip_missing` (distinct
from the existing `*_mismatch` categories so SIEM rules can alert
specifically on bypass attempts). **Operator advisory:** environments
where the User-Agent is stripped in transit (some debug proxies, a
handful of CDN configurations) must set
`CERTCTL_OIDC_PRELOGIN_REQUIRE_UA=false` to keep logins working;
symmetric `CERTCTL_OIDC_PRELOGIN_REQUIRE_IP=false` exists for the
IP-side. The legacy-row compat window (pre-migration rows with no
stored binding) still passes through unchecked, but that window is
bounded by the 10-minute pre-login TTL.

- **OIDC provider Advanced fields are now editable in the GUI (Audit 2026-05-11 A-7).**
The MED-4 row had been DEFERRED to v3 with the rationale "backend
already accepts these fields." The verifier hit the GUI and found
that the read-only display claimed the values were editable, but the
edit form had no inputs; the save handler passed `provider.scopes`
/ `provider.groups_claim_path` / `provider.groups_claim_format` /
`provider.iat_window_seconds` / `provider.jwks_cache_ttl_seconds`
unchanged from the loaded object. Operators who wanted to bump the
IAT window or change the groups-claim path had to drop to curl /
MCP and trust the GUI's display matched what they'd set elsewhere.
Lying UX. The OIDCProviderDetailPage edit form now has a collapsible
Advanced section with five inputs (scopes as a space-separated text
field; groups-claim path; groups-claim format select with the
backend's `string-array` / `json-path` enum; IAT window number input
bounded 1–600; JWKS cache TTL number input with floor 60). Client-side
validation mirrors the backend `Validate` rules so common operator
mistakes (IAT > 600, JWKS TTL < 60, empty scopes, empty groups-claim-path)
reject inline instead of round-tripping a 400. The read-only `<dl>`
also gained the previously-invisible `jwks_cache_ttl_seconds` row.

- **Pre-login cookie Path widened from `/auth/oidc/` to `/` (Audit MED-14
follow-on).** Required to satisfy the `__Host-` prefix's `Path=/` rule. The
cookie lifetime is unchanged (10 minutes) and only the callback handler
consumes it; the wider path scope is harmless.

- **RFC 9207 `iss` URL parameter check on OIDC callback (Audit 2026-05-10
MED-17).** When the matched IdP's discovery doc advertises
`authorization_response_iss_parameter_supported: true`, certctl now requires
the `iss` query parameter on `/auth/oidc/callback` and enforces a
constant-time compare against the configured provider's `IssuerURL`. Mismatch
rejects with HTTP 400; the audit row's `failure_category` distinguishes
`iss_param_missing` / `iss_param_mismatch` (RFC 9207 leg) from the existing
`id_token_iss_mismatch` (in-token iss claim leg). Closes the mix-up-attack
defense for modern Keycloak, Authentik, and public-trust CAs that ship
RFC-9207 discovery. Providers that don't advertise support (the majority
today) keep pre-fix behavior; back-compat is preserved.

- **Auth GUI batch (Audit 2026-05-10 MED-4/7/8/10/11/12 + LOW-1/11/12 +
HIGH-10 GUI).** New backend endpoints land alongside their GUI
consumers: `GET /api/v1/auth/users` + `DELETE /api/v1/auth/users/{id}`
(auth.user.read / auth.user.deactivate; migration 000045 adds
`users.deactivated_at` plus the two new permissions);
`GET /api/v1/auth/runtime-config` (auth.role.assign) returning a sanitized
flat-map of deployed CERTCTL_* values (no secrets leaked, only
set/unset booleans and counts);
`GET /api/v1/auth/oidc/providers/{id}/jwks-status` (auth.oidc.list)
returning the per-provider verifier counters (refresh count, last
refresh / error timestamps, rejected JWS count, RFC 9207 iss-param
flag). New `UsersPage` lists federated identities + soft-deactivates.
`AuthSettingsPage` gains the runtime-config panel. `KeysPage`'s
assign-role modal now collects `scope_type` / `scope_id` /
`expires_at`. `RoleDetailPage`'s add-permission form gains the same
scope picker, and the Delete button is hidden on the 7 default
system roles (server already rejected, this is pure UX).
`AuthProvider` renders a sticky red demo-mode banner when
`auth_type=none`. `actor-demo-anon` rows on `KeysPage` already had
buttons disabled.

- **11 new MCP tools (Audit 2026-05-10 MED-13).** Approval workflow
(`certctl_approval_list` / `_get` / `_approve` / `_reject`), break-glass
credential admin (`certctl_breakglass_list` / `_set_password` /
`_unlock` / `_remove`), bootstrap status + consume
(`certctl_bootstrap_status` / `_consume`), and audit category filter
(`certctl_audit_list_with_category`). All route through the existing
HTTP client so server-side permission gates fire unchanged.
`certctl_bootstrap_consume`'s tool description carries an explicit
"NEVER WIRE THIS TO AUTONOMOUS OPERATION" warning: a leaked
bootstrap token mints a fresh admin API key bypassing every other
access-control gate, so the tool is for one-shot manual operator
invocation only.

- **JWKS auto-refresh on cache-miss (Audit 2026-05-10 MED-6).** When
the IdP rotates its signing key between pre-login + callback, the
cached JWKS no longer contains the kid referenced by the inbound ID
token's JWS header. Pre-fix, the verify failed with a generic error
and the operator had to manually call
`POST /api/v1/auth/oidc/providers/{id}/refresh`. The service now detects
the kid-not-in-cache shape (`isKidMismatchError`) and runs a
one-shot `RefreshKeys` (evict cache → re-fetch discovery + JWKS →
re-run alg-downgrade defense) before retrying the verify exactly
once. Bounded recovery: a second failure surfaces as
`ErrJWKSUnreachable` per the original branches; no retry loop. The
`isKidMismatchError` matcher is intentionally narrow
so generic signature failures don't trigger a refresh.

- **OIDC provider test endpoint (Audit 2026-05-10 MED-5).** New
`POST /api/v1/auth/oidc/test` dry-runs an OIDC provider configuration
without persisting: fetches the discovery doc, runs the alg-downgrade
defense, detects RFC 9207 iss-parameter advertisement, and confirms
JWKS reachability. Returns `TestDiscoveryResult{discovery_succeeded,
jwks_reachable, supported_alg_values, iss_param_supported, errors[]}`
so the GUI (forthcoming) can render per-check status rows. Per-leg
failures ride in the response body's `errors` array; only a malformed
request body trips 400. Gate: `auth.oidc.create`. Audit row
`auth.oidc_provider_tested` carries the success/failure summary.

- **Pre-login UA / source-IP binding on OIDC callback (Audit 2026-05-10
MED-16).** RFC 9700 §4.7.1 defense against stolen-pre-login-cookie replay
by a different browser / source. Migration `000044_prelogin_uaip` adds
`client_ip` + `user_agent` to `oidc_pre_login_sessions`; values captured at
`/auth/oidc/login` are constant-time compared at `/auth/oidc/callback`.
Mismatches return HTTP 400 with audit `failure_category` =
`prelogin_ua_mismatch` or `prelogin_ip_mismatch`. Two operator escape
hatches: `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA` and
`CERTCTL_OIDC_PRELOGIN_REQUIRE_IP` (both default `true`); operators on
enterprise proxies that rewrite UA, or dual-stack v4/v6 environments where
source IP routinely flips, can disable the affected leg. The binding column
is persisted even when enforcement is off, so retroactive forensics remain
possible. Empty values on either side pass through (rolling-deploy +
headless-proxy compat).

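The strict-when-stored rule from the A-6 entry, layered on the MED-16 binding, can be sketched as a single decision function. This is an illustration under stated assumptions: `checkBinding` is a hypothetical name, the returned strings are only the category suffixes (a caller would prepend `prelogin_ua_` or `prelogin_ip_`), and the `require` flag stands in for the `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA` / `_IP` settings.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// checkBinding compares a value stored at /auth/oidc/login with the value
// seen at /auth/oidc/callback. It returns "" when the request may proceed,
// or a failure-category suffix ("missing" or "mismatch") otherwise.
func checkBinding(stored, got string, require bool) string {
	// Enforcement disabled by the operator, or a legacy row with no stored
	// binding: pass through unchecked.
	if !require || stored == "" {
		return ""
	}
	// Strict-when-stored: an absent request-side value is a rejection,
	// reported separately from a mismatch.
	if got == "" {
		return "missing"
	}
	// Constant-time compare, as the changelog describes for the binding.
	if subtle.ConstantTimeCompare([]byte(stored), []byte(got)) != 1 {
		return "mismatch"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", checkBinding("Mozilla/5.0", "Mozilla/5.0", true))
	fmt.Printf("%q\n", checkBinding("Mozilla/5.0", "", true))
	fmt.Printf("%q\n", checkBinding("Mozilla/5.0", "curl/8.0", true))
	fmt.Printf("%q\n", checkBinding("", "curl/8.0", true))
}
```

Keeping "missing" distinct from "mismatch" is what lets SIEM rules alert specifically on replay attempts that drop the bound header.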
## v2.1.0 - Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions ⚠️

> **SECURITY: AUDIT YOUR API KEYS.**
>
> Bundle 1 ships role-based authorization. Every existing API key
> configured via `CERTCTL_API_KEYS_NAMED` (or the legacy
> `CERTCTL_AUTH_SECRET`) is mapped to the **r-admin role on the first
> upgrade boot** so existing automation keeps working unchanged. Most
> keys do NOT need full admin power; downgrade them before tagging
> the next release.
>
> Recommended post-upgrade flow:
>
> ```bash
> # 1. List every key with its current role:
> certctl-cli auth keys list
>
> # 2. Walk an interactive prompt that downgrades each key:
> certctl-cli auth keys scope-down
>
> # 3. Or get a heuristic suggestion based on 30 days of audit history:
> certctl-cli auth keys scope-down --suggest
> certctl-cli auth keys scope-down --suggest --apply # applies the suggestion
>
> # 4. Or drive scope-down from a JSON config (Helm post-upgrade hook):
> certctl-cli auth keys scope-down --non-interactive ./scope-down.json
> ```
>
> The synthetic `actor-demo-anon` actor (used when
> `CERTCTL_AUTH_TYPE=none` is configured) is system-managed and
> excluded from the prompt loop.

What else changed in v2.1.0:

- **Audit 2026-05-10 CRIT-1 closure: wire-layer RBAC enforcement.**
The Bundle 1 + Bundle 2 audit surfaced that the permission catalogue
was enforced on ~24 admin-only routes only; the bulk of state-changing
routes (`POST /api/v1/certificates`, `PUT /api/v1/profiles/{id}`,
`DELETE /api/v1/issuers/{id}`, `POST /api/v1/agents/{id}/csr`, even
`POST /api/v1/auth/roles` + `POST /api/v1/auth/keys/{id}/roles`) had
no `rbacGate` wrap. A `r-viewer` Bearer was essentially `r-admin`
minus five fine-grained verbs at the wire layer (CWE-862). This
release wraps every state-changing + read endpoint with
`rbacGate` (global scope) or `rbacGateScoped` (per-profile /
per-issuer scope-bound grants), and adds an AST-level CI guard
(`TestRouterRBACGateCoverage`) that fails when a new route is
registered without enforcement. Catalogue extended via migration
000039 with 30 permissions covering `cert.edit`, `job.*`,
`approval.*`, `policy.*`, `team.*`, `owner.*`, `notification.*`,
`discovery.*`, `network_scan.*`, `healthcheck.*`, `digest.*`,
`verification.*`, `stats.read`, `metrics.read`. **AUDIT YOUR
KEYS** (the scope-down call-out above) now translates to real
reduction in blast radius. Auditor pin preserved at exactly
`{audit.read, audit.export}`.

- **RBAC primitive shipped.** `tenants`, `roles`, `permissions`,
`role_permissions`, `actor_roles` tables (migration 000029); 33-permission
canonical catalogue; 7 default roles (`admin`, `operator`, `viewer`,
`agent`, `mcp`, `cli`, `auditor`); per-handler permission gates via
`auth.RequirePermission` middleware (replaces the legacy
`IsAdmin` boolean check on the 5 admin-only handlers).
- **Day-0 admin bootstrap.** Set `CERTCTL_BOOTSTRAP_TOKEN` on a fresh
deploy and POST a single curl call against `/api/v1/auth/bootstrap` to
mint the first admin API key; one-shot, never logged, and locks
closed once any admin actor exists. Migration 000031 ships the
`api_keys` table that stores the SHA-256 hash; the plaintext is
shown in the response body once and never persisted.
- **Auditor role split.** New `auditor` role holds only
`audit.read` + `audit.export`. Compliance reviewers can read the audit
trail without holding mutation power. Migration 000032 adds
`audit_events.event_category` so auditors can filter to
authentication-related events specifically.
- **`/v1/auth/check` enrichment.** Response now includes the actor's
standing roles and effective permissions, so the GUI gates
affordances from a single fetch on app boot.
- **Approval-bypass closure.** Edits to a profile that has (or
would have) `RequiresApproval=true` now route through the
`ApprovalService` two-person integrity gate (Phase 9). Migration
000033 adds `approval_kind` + `payload` to
`issuance_approval_requests` so cert-issuance and profile-edit
approvals share the same workflow. Same-actor self-approve is
rejected with `ErrApproveBySameActor` for both kinds. Closes the
flip-flop loophole where an admin could disable approval, mutate,
re-enable. Documented at
[`docs/reference/profiles.md`](docs/reference/profiles.md).
- **GUI: Roles / API Keys / Auth Settings / Approvals queue.**
Four new pages under `/auth/*` consume `/v1/auth/me` for
permission-aware rendering. The Approvals queue blocks
self-approve at the client layer (Approve/Reject buttons hidden
when requested_by == current actor_id) on top of the server-side
enforcement. AuditPage gains a category filter (cert_lifecycle /
auth / config) for the auditor view.
- **MCP server gains 12 RBAC tools.** Operators driving certctl
from Claude / VS Code / any MCP client get parity with the GUI + CLI.
Each tool routes through the same HTTP handler; permission
gates fire server-side.
- **OpenAPI catalogues every new route.** Every Bundle 1 endpoint
ships with an `operationId`; the parity test guards against drift.
- **Coverage gates.** `internal/auth/` and `internal/service/auth/`
now have ≥85% coverage floors in `.github/coverage-thresholds.yml`.
The 12-path negative-test list from the Bundle 1 prompt is
fully covered (path #12 deferred with in-tree TODO).
- **Protocol-endpoint allowlist pinned at three layers.** The
middleware bypass (`auth.IsProtocolEndpoint`), the router-level
`AuthExemptRouterRoutes` constant, and a new
`phase12_protocol_allowlist_test.go` AST scan all guard against
accidentally wrapping ACME / SCEP / EST / OCSP / CRL routes in
`rbacGate`.
- **Bundle 2: OIDC + sessions + back-channel logout + break-glass.**
Auth Bundle 2 ships in the same v2.1.0 release. Operators get OIDC
SSO support for Keycloak / Authentik / Okta / Auth0 / Microsoft
Entra ID / Google Workspace (via Keycloak broker), HMAC-signed
session cookies with idle/absolute timeouts + CSRF defense,
back-channel logout per OpenID Connect Back-Channel Logout 1.0,
and a default-OFF break-glass admin path with Argon2id passwords
for SSO-broken incidents. API-key auth keeps working unchanged
alongside; existing automation needs no changes. Migration walkthrough
at [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md);
per-IdP setup guides at
[`docs/operator/oidc-runbooks/index.md`](docs/operator/oidc-runbooks/index.md).
- **OIDC token validation pinned at three layers.** Algorithm
allow-list (RS256/RS512/ES256/ES384/EdDSA only) with HS-family + `none`
rejected at the service-layer sentinel; IdP-downgrade-attack defense
at provider creation AND every JWKS RefreshKeys (intersects the IdP's
advertised `id_token_signing_alg_values_supported` against the
allow-list, rejects providers that advertise weak algs even before any
token is signed); OIDC Core §3.1.3.7 re-verification of `iss` /
`aud` / `azp` / `at_hash` (REQUIRED-when-access_token-present per
Phase 3 tightening of the spec MAY → MUST) / `exp` / `iat` window
/ `nonce` constant-time-compare. PKCE-S256 mandatory; `plain`
rejected. Single-use state + nonce via atomic `DELETE...RETURNING`
on consume.
- **Session cookies use length-prefixed HMAC.** The cookie wire format
is `v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`
with HMAC input `len:sid:len:kid` (NOT bare-concat) to defeat
concatenation collisions. `HttpOnly` + `Secure` + `SameSite=Lax`
default; `SameSite=Strict` configurable via `CERTCTL_SESSION_SAMESITE`.
Idle timeout 1h / absolute 8h defaults; scheduler GC sweeps expired
rows hourly. Signing keys rotate via the new `RotateSigningKey`
primitive; the old key stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION`
(default 24h) so existing cookies validate during rollover.
- **CSRF defense via double-submit-cookie + hashed-token-on-row.**
Plaintext CSRF token in the JS-readable `certctl_csrf` cookie
(intentionally `HttpOnly=false` for the GUI to echo into the
`X-CSRF-Token` header); SHA-256 hash on the session row;
`subtle.ConstantTimeCompare` in the new `CSRFMiddleware`. API-key
actors are CSRF-exempt (no session row in context).
- **OIDC `client_secret` encrypted at rest.** AES-256-GCM v3 blob
format (magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using
the existing `CERTCTL_CONFIG_ENCRYPTION_KEY`. Encryption invariant
pinned by an integration test asserting ciphertext != plaintext +
v3 blob shape + round-trip recovery + wrong-passphrase fails.
- **OIDC first-admin bootstrap.** New `CERTCTL_BOOTSTRAP_ADMIN_GROUPS`
+ `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` env vars: the first
OIDC-authenticated user with a matching group claim becomes admin
per tenant. Coexists with the Bundle 1 env-var-token bootstrap;
the admin-existence probe ensures only one wins. Audit row
(`bootstrap.oidc_first_admin`) on every grant.
- **Break-glass admin (default-OFF).** New `CERTCTL_BREAKGLASS_ENABLED`
env var (default `false`). When enabled, the local Argon2id-password
admin path bypasses the OIDC + group-claim layers; it is intended ONLY
for SSO-broken incidents. Argon2id with OWASP 2024 params (m=64 MiB,
t=3, p=4); lockout after 5 failures (configurable); constant-time
across all failure paths via `verifyDummy`; surface invisibility
(HTTP 404 on every endpoint when disabled, NOT 403). WARN log at
server boot when enabled. WebAuthn/FIDO2 second-factor pairing is on
the v3 roadmap (Decision 12).
- **GUI: OIDC Providers + Group → Role Mappings + Sessions + login
buttons.** Four new pages under `/auth/*` consume the Bundle 2 API
surface. Login page renders one "Sign in with X" button per
configured OIDC provider (in addition to the API-key form, which
remains as a fallback for Bearer-mode + break-glass paths). Sessions
page exposes own-sessions + admin all-actors view. Every actionable
element is permission-gated server-side via `auth.oidc.*` and
`auth.session.*` perms; client-side hiding is only a UX layer. Logout
button in the sidebar fires `POST /auth/logout` to clear the session
server-side before redirecting to login.
- **MCP server gains 11 OIDC + session tools.** `certctl_auth_list_oidc_providers`,
`_get_oidc_provider`, `_create_oidc_provider`, `_update_oidc_provider`,
`_delete_oidc_provider`, `_refresh_oidc_provider`,
`_list_group_mappings`, `_add_group_mapping`, `_remove_group_mapping`,
`_list_sessions`, `_revoke_session`. Operator-facing MCP tool count
goes 12 (Bundle 1 RBAC) → 23 across the auth surface. Total MCP
tool count: `grep -cE 'mcp\.AddTool\(' internal/mcp/tools*.go` ≈ 150.
- **Per-IdP runbooks: 6 production-tier setup guides** at
`docs/operator/oidc-runbooks/`. Each runbook follows a consistent
five-section layout (Prerequisites / IdP-side config / certctl-side
config / Verification / Troubleshooting + Validation checklist with
operator sign-off line). Keycloak is the canonical reference;
Authentik / Okta / Auth0 / Entra ID / Google Workspace document the
IdP-specific deltas (Auth0's namespaced custom claims; Entra ID's
group OBJECT IDs; Google Workspace's missing-groups-claim limitation
+ the recommended Keycloak broker pattern).
- **Threat model extended.** [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md)
ships 5 new "Defenses Bundle 2 ships" subsections + 8 new
threat-catalogue subsections (OIDC token forgery / session hijacking / IdP
compromise / back-channel logout failure modes / group-claim
manipulation / bootstrap risks / break-glass risks / token-leak
hygiene). 6 new SQL-shaped operator-facing checks. New "Threats
Bundle 2 does NOT close" section enumerating the 8 v3-backlog items
(WebAuthn / JIT elevation / SAML / multi-tenant activation /
HSM-FIPS / OIDC RP-initiated logout / Playwright / per-IdP
external-tester sign-off).
- **Performance baselines documented.** [`docs/operator/auth-benchmarks.md`](docs/operator/auth-benchmarks.md)
ships four benchmarks with measured baselines on a 4 vCPU /
8 GiB / Postgres 16 / Go 1.25 floor: `BenchmarkSession_SteadyState`
p99 5 µs (target < 1 ms; 200× under), `BenchmarkSession_ColdProcess`
p99 7.1 ms (target < 10 ms), `BenchmarkOIDC_SteadyState` p99 1.5 ms
(target < 5 ms). The fourth, `BenchmarkOIDC_ColdCache`, is run by the
operator against a live Keycloak via `make benchmark-auth-coldcache`.
- **Standards + RFC implementation table.** [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md)
ships 13 RFC / standard rows + 14 CWE rows with concrete file paths
+ negative-test anchors per row. NOT a compliance-mapping doc per
the operator's 2026-05-05 retired-compliance-docs decision; the
doc explicitly says "build the framework mapping yourself against
the rows here using the framework-mapping methodology your audit
firm prescribes; this project does not own that mapping."
- **Coverage gates held at floor 90 across all four Bundle 2
packages.** `internal/auth/oidc/` 93.7%, `internal/auth/session/`
94.9%, `internal/auth/breakglass/` 91.5%, `internal/auth/user/domain/`
96.4%. NO held-low-with-rationale entry: the Phase 13 prompt's
anti-Bundle-1-mistake rule held. Bundle 1's existing 85% floors
for `internal/auth/` + `internal/service/auth/` stay at 85
(already-shipped-and-accepted) per the prompt's explicit
inheritance rule.
- **Multi-tenant query CI guard.** New `scripts/ci-guards/multi-tenant-query-coverage.sh`
(ratchet-style, baseline 32 at v2.1.0 close): greps every
SELECT/UPDATE/DELETE in `internal/repository/postgres/` against
10 tenant-aware tables, fails on regression OR improvement (forces
the operator to lift / lower the baseline visibly). Forward-compat
protection so a future Bundle 3 / managed-service multi-tenant
activation can flip the switch without finding silent
tenant-data-leak bugs in shipped queries.
- **Phase 10 Keycloak testcontainers integration test.** New
build-tag-gated suite at `internal/auth/oidc/testfixtures/` + `integration_keycloak_test.go`
drives the full OIDC flow against a live Keycloak container booted
by testcontainers-go. 5-test matrix: discovery + JWKS load, full
PKCE auth-code happy path with HTTP form scraping,
logout-revokes-session, JWKS rotation, unmapped-groups-fails-closed. Reuses one
container across the matrix to amortize the 60-90s boot. Optional
Okta smoke test (build-tagged `integration && okta_smoke`) for live
tenant validation. New Makefile targets: `make keycloak-integration-test`
+ `make okta-smoke-test` + `make benchmark-auth-coldcache`.
- **OpenAPI surface extended.** New `cookieAuth` security scheme
(apiKey/cookie/`certctl_session`) alongside the existing
`bearerAuth`. 13 new Bundle 2 endpoints across the OIDC + session
+ group-mapping CRUD surface; 4 break-glass endpoints with
surface-invisibility framing. The N-bundle-2-security-empty-preserved
CI guard locks the `security: []` opt-out count at ≥ 14 so existing
public endpoints stay public.
- **Bundle-1-only compat regression CI guard.** New
`scripts/ci-guards/bundle-1-compat-regression.sh` asserts the
load-bearing invariants that protect the Bundle-1-only-deploy
case (session middleware defers-to-next, CSRF passthrough on
missing session row, ChainAuthSessionThenBearer wired, public
OIDC routes in AuthExempt allowlist, AuthInfo guards on
OIDCProvidersResolver != nil). Sibling
`bundle-1-to-2-upgrade-regression.sh` asserts the upgrade-path
invariants (migrations 000034..000038 are CREATE TABLE IF NOT EXISTS
+ BEGIN/COMMIT-wrapped + no DROP TABLE / ALTER...DROP COLUMN
against 19 protected Bundle-1 tables + ON CONFLICT DO NOTHING on
permission seed).
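The length-prefixed HMAC input from the session-cookie entry above can be illustrated with a minimal Go sketch. The decimal `len:sid:len:kid` encoding, the function names, and the assumption that IDs never contain `.` are all hypothetical; the changelog does not show certctl's actual byte layout or code.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

// macInput builds a length-prefixed input "len:sid:len:kid".
// Assumption: lengths are decimal byte counts; the real encoding may differ.
func macInput(sid, kid string) []byte {
	return []byte(strconv.Itoa(len(sid)) + ":" + sid + ":" + strconv.Itoa(len(kid)) + ":" + kid)
}

// signCookie returns "v1.<sid>.<kid>.<base64url-no-pad(HMAC-SHA256)>".
func signCookie(key []byte, sid, kid string) string {
	m := hmac.New(sha256.New, key)
	m.Write(macInput(sid, kid))
	return "v1." + sid + "." + kid + "." + base64.RawURLEncoding.EncodeToString(m.Sum(nil))
}

// verifyCookie recomputes the MAC and compares it in constant time.
// Assumption: sid and kid never contain '.', so a plain split is safe.
func verifyCookie(key []byte, cookie string) bool {
	parts := strings.Split(cookie, ".")
	if len(parts) != 4 || parts[0] != "v1" {
		return false
	}
	got, err := base64.RawURLEncoding.DecodeString(parts[3])
	if err != nil {
		return false
	}
	m := hmac.New(sha256.New, key)
	m.Write(macInput(parts[1], parts[2]))
	return hmac.Equal(got, m.Sum(nil))
}

func main() {
	key := []byte("demo-key-not-for-production")
	fmt.Println(verifyCookie(key, signCookie(key, "sess-ab", "c")))
	// Bare concatenation cannot tell ("ab","c") from ("a","bc"); the prefixed form can.
	fmt.Println("ab"+"c" == "a"+"bc")
	fmt.Println(string(macInput("ab", "c")) == string(macInput("a", "bc")))
}
```

The last two lines of `main` show why the prefix matters: without it, two different (session ID, key ID) pairs can feed identical bytes to the MAC.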
|
||||
Migration ordering, idempotency, and downgrade are documented in
[`docs/migration/api-keys-to-rbac.md`](docs/migration/api-keys-to-rbac.md)
(API-key → RBAC, Bundle 1) and [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md)
(API-key → OIDC, Bundle 2). The threat model lives at
[`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md).
Day-2 RBAC operations live at [`docs/operator/rbac.md`](docs/operator/rbac.md).
RFC + CWE evidence at [`docs/reference/auth-standards-implemented.md`](docs/reference/auth-standards-implemented.md).
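The double-submit CSRF check described in the Bundle 2 entries (plaintext token in a JS-readable cookie, SHA-256 hash on the session row, constant-time comparison) might look roughly like the following. This is a sketch under assumptions: `newCSRFToken` and `csrfValid` are hypothetical names, and the 32-byte token size and base64url encoding are illustrative choices, not taken from certctl's `CSRFMiddleware`.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newCSRFToken returns a random plaintext token (destined for the
// JS-readable cookie) and its SHA-256 hash (destined for the session row).
func newCSRFToken() (plaintext string, hash [32]byte, err error) {
	buf := make([]byte, 32) // token size is an assumption
	if _, err = rand.Read(buf); err != nil {
		return "", hash, err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, sha256.Sum256([]byte(plaintext)), nil
}

// csrfValid hashes the value echoed in the X-CSRF-Token header and
// compares it with the stored hash in constant time.
func csrfValid(headerToken string, storedHash [32]byte) bool {
	sum := sha256.Sum256([]byte(headerToken))
	return subtle.ConstantTimeCompare(sum[:], storedHash[:]) == 1
}

func main() {
	tok, hash, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(csrfValid(tok, hash))      // token echoed by the GUI
	fmt.Println(csrfValid("forged", hash)) // attacker-supplied value
}
```

Storing only the hash means a read of the session table does not yield a usable CSRF token.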
|
||||
## v2.0.68 - Image registry path changed ⚠️

> **Image registry path changed.** Starting this release, container images publish to `ghcr.io/certctl-io/certctl-server` and `ghcr.io/certctl-io/certctl-agent`. Existing pulls from `ghcr.io/shankar0123/certctl-{server,agent}:<tag>` continue to work for previously-published tags (the registry never deletes images), but the `:latest` tag at the old path stops moving forward at this release. Update your `docker pull` paths, `docker-compose.yml` `image:` keys, or Helm `image.repository` values to receive future updates. Old `git clone` / `git push` / install-script / API URLs continue to redirect forever - only the container-registry path changed.

This is the only operator-action-required change in v2.0.68. Other changes in this release are cosmetic URL refreshes after the GitHub-org transfer from `shankar0123/certctl` to `certctl-io/certctl` (HTTP redirects mean no other operator action is required) plus an internal contextcheck lint fix in the agent. Full commit list is on the [GitHub release page](https://github.com/certctl-io/certctl/releases/tag/v2.0.68).
|
||||
@@ -13,18 +732,18 @@ notes are auto-generated from commit messages between consecutive tags.
|
||||
|
||||
**Where to find what changed in a given release:**
|
||||
|
||||
- **[GitHub Releases](https://github.com/certctl-io/certctl/releases)** — every
|
||||
- **[GitHub Releases](https://github.com/certctl-io/certctl/releases)** - every
|
||||
tag has an auto-generated "What's Changed" section pulled from the commits
|
||||
between that tag and the previous one, plus per-release supply-chain
|
||||
verification instructions (Cosign / SLSA / SBOM).
|
||||
- **`git log <prev-tag>..<this-tag> --oneline`** — same content, locally.
|
||||
- **`git log <prev-tag>..<this-tag> --oneline`** - same content, locally.
|
||||
|
||||
**Why no hand-edited CHANGELOG.md:**
|
||||
|
||||
certctl is solo-developed and pushes directly to master. Maintaining a
|
||||
hand-edited CHANGELOG meant the file drifted (entries piled into
|
||||
`[unreleased]` and never got promoted to per-version sections when tags were
|
||||
cut). A stale CHANGELOG is worse than no CHANGELOG — it signals abandoned
|
||||
cut). A stale CHANGELOG is worse than no CHANGELOG - it signals abandoned
|
||||
maintenance to security-conscious operators doing diligence.
|
||||
|
||||
The auto-generated release notes work here because commit messages follow a
|
||||
|
||||
+1
-1
@@ -63,7 +63,7 @@ RUN for i in 1 2 3; do \
|
||||
npm run build
|
||||
|
||||
# Stage 2: Build Go binary
|
||||
FROM golang:1.25-alpine@sha256:5caaf1cca9dc351e13deafbc3879fd4754801acba8653fa9540cea125d01a71f AS builder
|
||||
FROM golang:1.25.10-alpine@sha256:8d22e29d960bc50cd025d93d5b7c7d220b1ee9aa7a239b3c8f55a57e987e8d45 AS builder
|
||||
|
||||
# Proxy propagation (M-4, Issue #9) — see Stage 1 rationale.
|
||||
ARG HTTP_PROXY=
|
||||
|
||||
+1
-1
@@ -5,7 +5,7 @@
|
||||
# operator runbook; the pins here MUST be bumped in the same pass.
|
||||
|
||||
# Stage 1: Build
|
||||
FROM golang:1.25-alpine@sha256:5caaf1cca9dc351e13deafbc3879fd4754801acba8653fa9540cea125d01a71f AS builder
|
||||
FROM golang:1.25.10-alpine@sha256:8d22e29d960bc50cd025d93d5b7c7d220b1ee9aa7a239b3c8f55a57e987e8d45 AS builder
|
||||
|
||||
# Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
|
||||
# behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
.PHONY: help build run test lint verify verify-docs verify-deploy loadtest acme-cert-manager-test acme-rfc-conformance-test clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build qa-stats
|
||||
.PHONY: help build run test lint verify verify-docs verify-deploy loadtest acme-cert-manager-test acme-rfc-conformance-test keycloak-integration-test okta-smoke-test benchmark-auth benchmark-auth-coldcache clean docker-up docker-down migrate-up migrate-down generate test-cover frontend-build qa-stats
|
||||
|
||||
# Default target - show help
|
||||
help:
|
||||
@@ -119,15 +119,18 @@ verify:
|
||||
@echo ""
|
||||
@echo "verify: PASS — safe to commit"
|
||||
|
||||
# verify-docs: pre-tag gate. Runs the QA-doc Part-count + seed-count
|
||||
# drift guards that ci-pipeline-cleanup Phase 11 / frozen decision 0.13
|
||||
# moved out of CI (was per-push blocking; now operator-runs pre-tag).
|
||||
# These guards protect docs/qa-test-guide.md headlines from drifting
|
||||
# vs the underlying source-of-truth (testing-guide Part count, seed
|
||||
# row count). Operator-facing docs only — not product-affecting.
|
||||
# verify-docs: pre-tag gate. Runs the QA-doc seed-count drift guard
|
||||
# that ci-pipeline-cleanup Phase 11 / frozen decision 0.13 moved out
|
||||
# of CI (was per-push blocking; now operator-runs pre-tag). Protects
|
||||
# docs/contributor/qa-test-suite.md::Seed Data Reference from
|
||||
# drifting vs migrations/seed_demo.sql. Operator-facing docs only —
|
||||
# not product-affecting.
|
||||
#
|
||||
# The QA-doc Part-count drift guard retired in the 2026-05-04 docs
|
||||
# overhaul Phase 5 when docs/testing-guide.md was pruned (its content
|
||||
# dispersed across the audience-organized doc tree); the Part-count
|
||||
# class no longer exists outside the qa_test.go file itself.
|
||||
verify-docs:
|
||||
@echo "==> QA-doc Part-count drift"
|
||||
@bash scripts/qa-doc-part-count.sh
|
||||
@echo "==> QA-doc seed-count drift"
|
||||
@bash scripts/qa-doc-seed-count.sh
|
||||
@echo ""
|
||||
@@ -168,6 +171,54 @@ loadtest:
|
||||
@echo "==> results landed in deploy/test/loadtest/results/"
|
||||
@if [ -f deploy/test/loadtest/results/summary.txt ]; then cat deploy/test/loadtest/results/summary.txt; fi
|
||||
|
||||
# Auth Bundle 2 Phase 10 — Keycloak end-to-end OIDC integration test.
|
||||
# Boots a Keycloak container via testcontainers-go (quay.io/keycloak:25.0),
|
||||
# imports a canned realm with two groups + two users, and drives the
|
||||
# full OIDC flow against the certctl service: discovery + JWKS,
|
||||
# auth-code login, group-claim parsing, group-role mapping, session
|
||||
# mint, and JWKS rotation.
|
||||
#
|
||||
# Build-tag-gated under `integration` so `make verify` (which runs
|
||||
# go test -short) NEVER pulls in the 60-90s Keycloak boot. Requires a
|
||||
# local Docker daemon. Skips cleanly with t.Skip() when -short is set.
|
||||
keycloak-integration-test:
|
||||
@echo "==> running Keycloak OIDC integration test (requires Docker)"
|
||||
@go test -tags=integration -count=1 -timeout=10m \
|
||||
./internal/auth/oidc/...
|
||||
|
||||
# Auth Bundle 2 Phase 10 — optional Okta smoke test. Gated behind TWO
|
||||
# build tags (integration + okta_smoke) so it only runs when invoked
|
||||
# manually against the operator's own Okta dev tenant. Requires the
|
||||
# OKTA_ISSUER + OKTA_CLIENT_ID + OKTA_CLIENT_SECRET env vars; the test
|
||||
# t.Skip's with a clear message when any are missing. Documented in
|
||||
# internal/auth/oidc/integration_okta_smoke_test.go.
|
||||
okta-smoke-test:
|
||||
@echo "==> running Okta smoke test (requires OKTA_ISSUER / _CLIENT_ID / _CLIENT_SECRET env vars)"
|
||||
@go test -tags='integration okta_smoke' -count=1 -timeout=2m \
|
||||
./internal/auth/oidc/...
|
||||
|
||||
# Auth Bundle 2 Phase 14 — auth performance benchmarks. Three default-
|
||||
# tag benchmarks (session steady-state + session cold-process + oidc
|
||||
# steady-state) producing p50/p95/p99/max numbers per the auth-
|
||||
# benchmarks.md operator-doc table.
|
||||
benchmark-auth:
|
||||
@echo "==> running auth performance benchmarks (session + oidc steady-state)"
|
||||
@go test -bench='BenchmarkSession_|BenchmarkOIDC_SteadyState' -benchmem \
|
||||
-benchtime=2000x -run='^$$' \
|
||||
./internal/auth/session/ ./internal/auth/oidc/
|
||||
|
||||
# Auth Bundle 2 Phase 14 — OIDC cold-cache benchmark against a live
|
||||
# Keycloak container (requires Docker). Build-tag-gated so the
|
||||
# default-tag benchmarks above never pull in the 60-90s container
|
||||
# boot. Runs the integration test FIRST to populate the
|
||||
# sharedKeycloak fixture, then runs the benchmark.
|
||||
benchmark-auth-coldcache:
|
||||
@echo "==> running OIDC cold-cache benchmark against live Keycloak (requires Docker)"
|
||||
@go test -tags integration -count=1 -timeout=10m \
|
||||
-run TestKeycloakIntegration_RefreshKeysFetchesDiscoveryAndJWKS \
|
||||
-bench BenchmarkOIDC_ColdCache -benchmem -benchtime=10x \
|
||||
./internal/auth/oidc/
|
||||
|
||||
# Phase 5 — kind-driven cert-manager integration test. Requires
|
||||
# `kind`, `kubectl`, `helm`, and a local Docker daemon. Sets
|
||||
# KIND_AVAILABLE=1 so the test runs (it skips cleanly when unset, which
|
||||
@@ -263,9 +314,12 @@ frontend-build:
|
||||
@echo "Frontend build complete"
|
||||
|
||||
# QA Suite Stats — Bundle P / Strengthening #8.
|
||||
# Single source-of-truth for every count claim in docs/qa-test-guide.md +
|
||||
# docs/testing-guide.md. The Strengthening #6 CI drift guards consume the
|
||||
# same numbers, eliminating the doc-drift class structurally.
|
||||
# Single source-of-truth for every count claim in
|
||||
# docs/contributor/qa-test-suite.md. The Strengthening #6 CI drift guards
|
||||
# (now scoped to the seed-count class only — the Part-count class retired
|
||||
# in the 2026-05-04 docs overhaul Phase 5 when testing-guide.md was
|
||||
# pruned) consume the same numbers, eliminating the doc-drift class
|
||||
# structurally.
|
||||
qa-stats:
|
||||
@echo "=== certctl QA Suite Stats ==="
|
||||
@echo "Date: $$(date +%Y-%m-%d)"
|
||||
@@ -278,9 +332,8 @@ qa-stats:
|
||||
@echo "Fuzz targets: $$(grep -rE 'func Fuzz[A-Z]' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
|
||||
@echo "t.Skip sites: $$(grep -rE 't\.Skip(Now|f)?\(' --include='*_test.go' . 2>/dev/null | wc -l | tr -d ' ')"
|
||||
@echo "qa_test.go Part_ subtests: $$(grep -cE 't\.Run\(\"Part[0-9]+_' deploy/test/qa_test.go 2>/dev/null || echo 0)"
|
||||
@echo "testing-guide.md Parts: $$(grep -cE '^## Part [0-9]+:' docs/testing-guide.md 2>/dev/null || echo 0)"
|
||||
@echo "Seed unique mc-* IDs: $$(grep -oE "mc-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
|
||||
@echo "Seed unique ag-* IDs: $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 12)"
|
||||
@echo "Seed unique ag-* IDs: $$(grep -oE "ag-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (incl. agent_groups; agents-table count is 13 incl. agent-demo-1 + 3 cloud sentinels + server-scanner)"
|
||||
@echo "Seed unique iss-* IDs: $$(grep -oE "iss-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ') (issuers table count is 13)"
|
||||
@echo "Seed unique tgt-* IDs: $$(grep -oE "tgt-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
|
||||
@echo "Seed unique nst-* IDs: $$(grep -oE "nst-[a-z0-9_-]+" migrations/seed_demo.sql 2>/dev/null | sort -u | wc -l | tr -d ' ')"
|
||||
|
||||
@@ -9,146 +9,33 @@
|
||||
[](https://github.com/certctl-io/certctl/releases)
|
||||
[](https://github.com/certctl-io/certctl/stargazers)
|
||||
|
||||
TLS certificate lifespans are shrinking fast. The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) unanimously in April 2025, setting a phased reduction: **200 days** by March 2026, **100 days** by March 2027, and **47 days** by March 2029. Organizations managing dozens or hundreds of certificates can no longer rely on spreadsheets, calendar reminders, or manual renewal workflows. The math doesn't work — at 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever.
|
||||
certctl is a self-hosted platform that automates the entire TLS certificate lifecycle, from issuance through renewal to deployment, with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. Free, source-available under BSL 1.1, covers the same lifecycle that enterprise platforms charge $100K+/year for.
|
||||
|
||||
certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. It's free, self-hosted, and covers the same lifecycle that enterprise platforms charge $100K+/year for.
|
||||
The CA/Browser Forum's [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) caps public TLS certificates at **200 days by March 2026**, **100 days by 2027**, and **47 days by 2029**. At 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever. Manual workflows stop being a choice.
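The renewal load can be estimated with a few lines of Go. This is a back-of-the-envelope sketch that assumes each certificate is renewed exactly once per lifetime, at expiry; `renewalsPerWeek` is a hypothetical helper, not part of certctl. Renewing earlier in the lifetime, as automated tooling usually does, only raises the numbers.

```go
package main

import "fmt"

// renewalsPerWeek estimates weekly renewals for a fleet, assuming one
// renewal per certificate per lifetime (i.e. renewal at expiry).
func renewalsPerWeek(certs int, lifetimeDays float64) float64 {
	return float64(certs) * 7 / lifetimeDays
}

func main() {
	for _, days := range []float64{200, 100, 47} {
		fmt.Printf("%3.0f-day lifetime: %.1f renewals/week for 100 certificates\n",
			days, renewalsPerWeek(100, days))
	}
}
```

Under that assumption the 100-certificate team sees about 3.5, 7.0, and 14.9 renewals per week at 200, 100, and 47 days respectively, so the "7+" figure is a conservative floor.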
|
||||
|
||||
```mermaid
|
||||
gantt
|
||||
title TLS Certificate Maximum Lifespan — CA/Browser Forum Ballot SC-081v3
|
||||
dateFormat YYYY-MM-DD
|
||||
axisFormat
|
||||
todayMarker off
|
||||
section 2015
|
||||
5 years (1825 days) :done, 2020-01-01, 1825d
|
||||
section 2018
|
||||
825 days :done, 2020-01-01, 825d
|
||||
section 2020
|
||||
398 days :active, 2020-01-01, 398d
|
||||
section 2026
|
||||
200 days :crit, 2020-01-01, 200d
|
||||
section 2027
|
||||
100 days :crit, 2020-01-01, 100d
|
||||
section 2029
|
||||
47 days :crit, 2020-01-01, 47d
|
||||
```
|
||||
> **Status: Early-access.** Production-quality core — Local CA, ACME, agent deployment, CRUD, audit, role-based authz (auditor split + day-0 bootstrap + four-eyes approval). Broader surface — intermediate CA hierarchy, ACME/SCEP/EST servers, network appliances — still maturing.
|
||||
|
||||
> **Actively maintained — shipping weekly.** Found something? [Open a GitHub issue](https://github.com/certctl-io/certctl/issues) — issues get triaged same-day. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
|
||||
> v2.1.0 ships federated identity in early-access: OIDC SSO across Keycloak, Authentik, Okta, Auth0, Entra ID, and Google Workspace; HMAC-signed server-side sessions with `__Host-` cookies and CSRF rotation; OIDC Back-Channel Logout; Argon2id break-glass admin. Lab and dev deployments encouraged; production welcomed with the understanding that customer-scale battle-testing is in progress — please [file issues](https://github.com/certctl-io/certctl/issues) on the federated-identity surface, where real-world IdP shapes surface fast.
|
||||
|
||||
**Ready to try it?** Jump to the [Quick Start](#quick-start) — you'll have a running dashboard in under 5 minutes.
|
||||
> **Actively maintained, shipping weekly.** [Open an issue](https://github.com/certctl-io/certctl/issues) if something breaks. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
|
||||
|
||||
**Ready to try it?** Jump to the [Quick Start](#quick-start). For the marketing site, see [certctl.io](https://certctl.io).
|
||||
|
||||
## Documentation
|
||||
|
||||
| Guide | Description |
|
||||
|-------|-------------|
|
||||
| [Why certctl?](docs/why-certctl.md) | How certctl compares to ACME clients, agent-based SaaS, and enterprise platforms |
|
||||
| [Concepts](docs/concepts.md) | TLS certificates explained from scratch — for beginners who know nothing about certs |
|
||||
| [Quick Start](docs/quickstart.md) | 5-minute setup — dashboard, API, CLI, discovery, stakeholder demo flow |
|
||||
| [Docker Compose Environments](deploy/ENVIRONMENTS.md) | Service-by-service walkthrough of all 4 compose files, env var reference |
|
||||
| [Deployment Examples](docs/examples.md) | 5 turnkey scenarios (ACME+NGINX, wildcard DNS-01, private CA, step-ca, multi-issuer) with migration guides |
|
||||
| [Advanced Demo](docs/demo-advanced.md) | Issue a certificate end-to-end with technical deep-dives |
|
||||
| [Architecture](docs/architecture.md) | System design, data flow diagrams, security model |
|
||||
| [Feature Inventory](docs/features.md) | Complete reference of all capabilities, API endpoints, and configuration |
|
||||
| [Connector Reference](docs/connectors.md) | Configuration for all issuer, target, and notifier connectors |
|
||||
| [ACME Server](docs/acme-server.md) | Run certctl as a drop-in ACME server — cert-manager / Caddy / Traefik walkthroughs + [threat model](docs/acme-server-threat-model.md) |
|
||||
| [Approval Workflow](docs/approval-workflow.md) | Two-person-integrity gate for certificate issuance — RBAC, audit, bypass mode |
|
||||
| [CA Hierarchy](docs/intermediate-ca-hierarchy.md) | Multi-level intermediate CA management — FedRAMP boundary CA, financial-services policy CA, internal-PKI patterns |
|
||||
| [Cloud Target Runbook](docs/runbook-cloud-targets.md) | AWS ACM + Azure Key Vault deploy connectors — config, debugging, atomic-rollback semantics |
|
||||
| [Expiry Alert Runbook](docs/runbook-expiry-alerts.md) | Per-policy multi-channel routing matrix — severity tiers, fault-isolating dispatch |
|
||||
| [MCP Server](docs/mcp.md) | AI integration via Model Context Protocol — setup, available tools, examples |
|
||||
| [OpenAPI 3.1 Spec](docs/openapi.md) | API reference guide with endpoint overview ([raw spec](api/openapi.yaml)) |
|
||||
| [Compliance Mapping](docs/compliance.md) | SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 alignment guides |
|
||||
| [Migrate from certbot](docs/migrate-from-certbot.md) | Step-by-step migration from certbot cron jobs to certctl |
|
||||
| [Migrate from acme.sh](docs/migrate-from-acmesh.md) | Migration guide for acme.sh users, DNS hook compatibility |
|
||||
| [certctl for cert-manager users](docs/certctl-for-cert-manager-users.md) | How certctl complements cert-manager for mixed infrastructure |
|
||||
| [Test Environment](docs/test-env.md) | Docker Compose test environment with real CA backends |
|
||||
| [Testing Guide](docs/testing-guide.md) | Comprehensive test procedures, smoke tests, and release sign-off checklist |
|
||||
The full audience-organized index lives at [`docs/README.md`](docs/README.md). Top-level entry points:
|
||||
|
||||
## Supported Integrations
|
||||
| Audience | Start here |
|---|---|
| New to certctl | [Concepts](docs/getting-started/concepts.md) → [Quickstart](docs/getting-started/quickstart.md) → [Examples](docs/getting-started/examples.md) |
| Production operator | [Architecture](docs/reference/architecture.md) → [Security posture](docs/operator/security.md) → [Disaster recovery runbook](docs/operator/runbooks/disaster-recovery.md) |
| PKI engineer | [ACME server](docs/reference/protocols/acme-server.md) → [SCEP server](docs/reference/protocols/scep-server.md) → [EST server](docs/reference/protocols/est.md) → [CA hierarchy](docs/reference/intermediate-ca-hierarchy.md) |
| Migrating from another tool | [from certbot](docs/migration/from-certbot.md) / [from acme.sh](docs/migration/from-acmesh.md) / [cert-manager coexistence](docs/migration/cert-manager-coexistence.md) |
| Contributor | [Architecture](docs/reference/architecture.md) → [Testing strategy](docs/contributor/testing-strategy.md) → [CI pipeline](docs/contributor/ci-pipeline.md) |
|
||||
### Certificate Issuers
|
||||
For the connector reference (12 issuers, 15 targets, 6 notifiers) see [`docs/reference/connectors/index.md`](docs/reference/connectors/index.md).
|
||||
|
||||
| Issuer | Type | Notes |
|--------|------|-------|
| Local CA (self-signed + sub-CA + tree mode) | `GenericCA` | Sub-CA mode chains to enterprise root (ADCS, etc.). **Tree mode (Rank 8)** manages multi-level intermediate CAs (`intermediate_cas` table) with RFC 5280 §3.2 / §4.2.1.9 / §4.2.1.10 enforcement - FedRAMP boundary CAs, financial-services policy CAs, internal PKI. See [`docs/intermediate-ca-hierarchy.md`](docs/intermediate-ca-hierarchy.md). |
| ACME v2 (Let's Encrypt, ZeroSSL, etc.) | `ACME` | HTTP-01, DNS-01, DNS-PERSIST-01 challenges. EAB auto-fetch from ZeroSSL. Profile selection (`tlsserver`, `shortlived`). |
| step-ca (Smallstep) | `StepCA` | JWK provisioner auth, issuance + renewal + revocation |
| OpenSSL / Custom CA | `OpenSSL` | Shell script adapter - any CA with a CLI |
| HashiCorp Vault PKI | `VaultPKI` | Token auth with **automatic renewal at TTL/2** + Prometheus metric, synchronous issuance, CRL/OCSP delegated to Vault, opaque `*secret.Ref` credential storage |
| DigiCert CertCentral | `DigiCert` | Async order model, OV/EV support, PEM bundle parsing |
| Sectigo SCM | `Sectigo` | 3-header auth, DV/OV/EV, collect-not-ready graceful handling |
| Google Cloud CAS | `GoogleCAS` | OAuth2 service account, synchronous issuance, CA pool selection |
| AWS ACM Private CA | `AWSACMPCA` | Synchronous issuance, configurable signing algorithm/template ARN |
| Entrust Certificate Services | `Entrust` | mTLS client certificate auth, synchronous/approval-pending issuance |
| GlobalSign Atlas HVCA | `GlobalSign` | mTLS + API key/secret dual auth, serial-based tracking |
| EJBCA (Keyfactor) | `EJBCA` | Dual auth (mTLS with auto-reload-on-mtime via `mtlscache`, or OAuth2), self-hosted open-source CA |
|
||||
**Note:** ADCS integration is handled via the Local CA's sub-CA mode — certctl operates as a subordinate CA with its signing certificate issued by ADCS. Any CA with a shell-accessible signing interface can be integrated via the OpenSSL/Custom CA connector.
|
||||
|
||||
### Deployment Targets
|
||||
|
||||
| Target | Type | Notes |
|--------|------|-------|
| NGINX | `NGINX` | Atomic write + `nginx -t` validate + `nginx -s reload` + post-deploy TLS verify + rollback (deploy-hardening I) |
| Apache httpd | `Apache` | Atomic write + `apachectl configtest` + graceful reload + post-deploy TLS verify + rollback |
| HAProxy | `HAProxy` | Combined PEM atomic write + `haproxy -c -f` validate + `systemctl reload` + post-deploy TLS verify + rollback |
| Traefik | `Traefik` | Atomic write + post-deploy TLS verify + rollback (file watcher auto-reloads) |
| Caddy | `Caddy` | Atomic write (file mode) or `POST /load` (api mode) + admin API ValidateOnly probe |
| Envoy | `Envoy` | Atomic write + SDS file watcher auto-reload |
| Postfix | `Postfix` | Atomic write + `postfix check` + `postfix reload` + post-deploy TLS verify + rollback |
| Dovecot | `Dovecot` | Atomic write + `doveconf -n` + `doveadm reload` + post-deploy TLS verify + rollback |
| Microsoft IIS | `IIS` | Local PowerShell or remote WinRM, PEM→PFX, SNI support, explicit pre-deploy backup + post-rollback re-import |
| F5 BIG-IP | `F5` | iControl REST via proxy agent, transaction-based atomic updates + post-deploy TLS verify on Virtual Server |
| SSH (Agentless) | `SSH` | SFTP cert/key deployment + pre-deploy SCP backup + tls.Dial post-verify |
| Windows Certificate Store | `WinCertStore` | PowerShell Import-PfxCertificate + Get-ChildItem snapshot for rollback |
| Java Keystore | `JavaKeystore` | PEM→PKCS#12→keytool pipeline + keytool snapshot for rollback |
| Kubernetes Secrets | `KubernetesSecrets` | `kubernetes.io/tls` Secrets, atomic API + SHA-256 verify + kubelet sync poll |
| **AWS Certificate Manager** | `AWSACM` | SDK-driven `ImportCertificate` (fresh ARN or rotate-in-place) + `DescribeCertificate` snapshot for atomic rollback + tag re-application. See [`docs/runbook-cloud-targets.md`](docs/runbook-cloud-targets.md). |
| **Azure Key Vault** | `AzureKeyVault` | SDK-driven PEM→PKCS#12 import via `ImportCertificate` (always new version) + snapshot CER bytes for atomic rollback + tag carry-forward. |
|
||||
**Deploy-hardening I** (post-2026-04-30 master bundle): every connector now goes through `internal/deploy.Apply` for atomic-write + ownership-preservation + SHA-256 idempotency + per-target-type Prometheus counters (`certctl_deploy_*_total`). See [`docs/deployment-atomicity.md`](docs/deployment-atomicity.md) for the operator guide.
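
The atomic-write and SHA-256 idempotency steps named above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code behind `internal/deploy.Apply`: `writeIfChanged` and `demoDeploy` are hypothetical names, and ownership preservation and the Prometheus counters are left out.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// writeIfChanged is a hypothetical helper (not certctl's actual API).
// It skips the write when the destination already holds identical bytes
// (compared by SHA-256); otherwise it writes a temp file in the same
// directory and renames it into place so readers never see a partial file.
func writeIfChanged(path string, data []byte, mode os.FileMode) (bool, error) {
	old, err := os.ReadFile(path)
	switch {
	case err == nil:
		if sha256.Sum256(old) == sha256.Sum256(data) {
			return false, nil // idempotent: nothing to do
		}
	case !os.IsNotExist(err):
		return false, err
	}

	tmp, err := os.CreateTemp(filepath.Dir(path), ".deploy-*")
	if err != nil {
		return false, err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return false, err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return false, err
	}
	if err := tmp.Close(); err != nil {
		return false, err
	}
	if err := os.Chmod(tmp.Name(), mode); err != nil {
		return false, err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return false, err
	}
	return true, nil
}

// demoDeploy writes the same bytes twice into a scratch directory and
// reports whether each call actually changed the file.
func demoDeploy() (first, second bool, err error) {
	dir, err := os.MkdirTemp("", "deploy-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "fullchain.pem")
	pem := []byte("-----BEGIN CERTIFICATE-----\n(demo bytes)\n-----END CERTIFICATE-----\n")
	if first, err = writeIfChanged(target, pem, 0o644); err != nil {
		return false, false, err
	}
	second, err = writeIfChanged(target, pem, 0o644)
	return first, second, err
}

func main() {
	first, second, err := demoDeploy()
	fmt.Println(first, second, err) // prints: true false <nil>
}
```

Creating the temp file in the destination directory keeps the rename on one filesystem, which is what makes the final swap atomic on POSIX systems.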
|
||||
|
||||
### Enrollment Protocols
|
||||
|
||||
| Protocol | Standard | Use Case |
|----------|----------|----------|
| **EST (production-grade)** | RFC 7030 + RFC 9266 channel binding | Native EST server hardened for enterprise WiFi/802.1X, IoT bootstrap, and corporate device enrollment (post-2026-04-29 hardening master bundle). All six RFC 7030 endpoints — `cacerts` / `simpleenroll` / `simplereenroll` / `csrattrs` (profile-driven) / `serverkeygen` (CMS EnvelopedData wire format). Multi-profile dispatch (`/.well-known/est/<pathID>/`). Per-profile auth modes: mTLS sibling route at `/.well-known/est-mtls/<pathID>/`, HTTP Basic enrollment-password (constant-time compare + per-source-IP failed-auth limiter), RFC 9266 `tls-exporter` channel binding (TLS 1.3, opt-in per profile). Per-(CN, sourceIP) sliding-window rate limit. EST-source-scoped bulk revoke (`POST /api/v1/est/certificates/bulk-revoke`, M-008 admin-gated). Tabbed admin GUI at `/est` (Profiles / Recent Activity / Trust Bundle). `SIGHUP`-equivalent trust-bundle reload. libest reference-client interop tested in CI (`deploy/test/libest/Dockerfile` + `deploy/test/est_e2e_test.go`). Typed audit-action codes per failure dimension (`est_simple_enroll_success`/`_failed`, `est_auth_failed_basic`/`_mtls`/`_channel_binding`, `est_rate_limited`, `est_csr_policy_violation`, `est_bulk_revoke`, `est_trust_anchor_reloaded`, etc. — full set in `internal/service/est_audit_actions.go`). CLI + matching MCP tool family (rebuild count via `grep -cE '"est_' internal/mcp/tools_est.go`). See [`docs/est.md`](docs/est.md) for the operator guide — WiFi/802.1X + FreeRADIUS recipe, IoT bootstrap, troubleshooting matrix per audit-action code. |
| SCEP (Simple Certificate Enrollment Protocol) | RFC 8894 | MDM platforms (Jamf, Intune), network devices, ChromeOS. Full RFC 8894 wire format: EnvelopedData decryption, signerInfo POPO verification, CertRep PKIMessage builder; PKCSReq + RenewalReq + GetCertInitial messageType dispatch; multi-profile dispatch (`/scep/<pathID>`); per-profile RA cert + key. Lightweight raw-CSR clients keep working via the legacy MVP fall-through path. |
| **Microsoft Intune SCEP fleet (drop-in NDES replacement)** | RFC 8894 + Intune Connector signed-challenge dispatcher | Per-profile Intune dispatcher validates the Connector's signed challenge against an operator-supplied trust anchor; binds device claim to CSR (set-equality on CN + SAN-DNS/RFC822/UPN); replay cache + per-device rate limit; `SIGHUP`-reloadable trust pool; admin GUI **SCEP Administration** page at `/scep` (Profiles tab with per-profile RA cert expiry + mTLS status, Intune Monitoring tab with per-status counters + reload, Recent Activity tab with full SCEP audit log filter). See [`docs/scep-intune.md`](docs/scep-intune.md) for the migration playbook + Microsoft support statement. |
| ACME v2 client | RFC 8555 | Public CA automated issuance (Let's Encrypt, ZeroSSL) |
| **ACME v2 server (drop-in for cert-manager / Caddy / Traefik)** | RFC 8555 + RFC 9773 ARI | Run certctl as your internal ACME CA. Per-profile endpoints at `/acme/profile/{id}/*` (directory, new-nonce, new-account, new-order, finalize, account, order, authz, challenge, key-change, revoke-cert, renewal-info). Per-profile `acme_auth_mode`: `trust_authenticated` for internal PKI; `challenge` for HTTP-01 / DNS-01 / TLS-ALPN-01 validation. Doubly-signed key rollover (§7.3.5), revoke-cert (§7.6, both kid-path and jwk-path auth), per-account rate limiting (orders/hour, key-change/hour, challenge-respond/hour), scheduler-driven nonce/authz/order GC. Three client walkthroughs: [cert-manager](docs/acme-cert-manager-walkthrough.md), [Caddy](docs/acme-caddy-walkthrough.md), [Traefik](docs/acme-traefik-walkthrough.md). Reference: [`docs/acme-server.md`](docs/acme-server.md) + [threat model](docs/acme-server-threat-model.md). |
| ACME ARI (Renewal Information) | RFC 9773 | CA-directed renewal timing — the CA tells you when to renew (client-side and server-side) |
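
The per-(CN, sourceIP) sliding-window rate limit mentioned in the EST row can be illustrated with a small in-memory limiter. This is a sketch only: `slidingWindow`, `newSlidingWindow`, `allow`, and `demoWindow` are hypothetical names, the key format is an assumption, and the real limiter's storage and thresholds are not documented here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// slidingWindow is a hypothetical in-memory limiter keyed by an arbitrary
// string such as "<CN>|<sourceIP>".
type slidingWindow struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	hits   map[string][]time.Time
}

func newSlidingWindow(limit int, window time.Duration) *slidingWindow {
	return &slidingWindow{limit: limit, window: window, hits: make(map[string][]time.Time)}
}

// allow drops hits older than the window, then admits the request only if
// fewer than limit hits remain. Rejected requests are not recorded.
func (s *slidingWindow) allow(key string, now time.Time) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	cutoff := now.Add(-s.window)
	kept := s.hits[key][:0]
	for _, t := range s.hits[key] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= s.limit {
		s.hits[key] = kept
		return false
	}
	s.hits[key] = append(kept, now)
	return true
}

// demoWindow replays four requests for one key against a 2-per-minute limit.
func demoWindow() []bool {
	lim := newSlidingWindow(2, time.Minute)
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	key := "device-01.example.com|203.0.113.7"
	return []bool{
		lim.allow(key, start),                     // first hit
		lim.allow(key, start.Add(10*time.Second)), // second hit
		lim.allow(key, start.Add(20*time.Second)), // over the limit
		lim.allow(key, start.Add(61*time.Second)), // first hit has aged out
	}
}

func main() {
	fmt.Println(demoWindow()) // prints: [true true false true]
}
```

Passing `now` explicitly keeps the limiter deterministic under test; a production variant would also need to evict idle keys.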
|
||||
|
||||
### Standards & Revocation
|
||||
|
||||
| Capability | Standard | Notes |
|------------|----------|-------|
| DER-encoded X.509 CRL | RFC 5280 + RFC 7232 caching | Per-issuer, signed by issuing CA, 24h validity. Pre-generated by the scheduler (`CERTCTL_CRL_GENERATION_INTERVAL`, default 1h) and cached in `crl_cache` so HTTP fetches do not rebuild per request. **Production hardening II:** weak-form `ETag` (W/"<sha256-prefix>") + `Cache-Control: public, max-age=3600, must-revalidate` + `If-None-Match` HTTP 304 short-circuit on `GET /.well-known/pki/crl/{issuer_id}` — CDNs and reverse proxies serve repeated fetches from edge cache. |
| CRL DistributionPoints auto-injection | RFC 5280 §4.2.1.13 | **Production hardening II.** Local issuer config field `CRLDistributionPointURLs []string` — when set, every issued cert carries the `id-ce-cRLDistributionPoints` extension pointing at certctl's own CRL endpoint. Refusing to silently inject an empty CDP is deliberate (silent-empty fails relying-party validation worse than no CDP). |
| Embedded OCSP responder | RFC 6960 + §4.4.1 nonce echo | GET + POST forms (`POST /.well-known/pki/ocsp/{issuer_id}` per §A.1.1). Signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6) carrying `id-pkix-ocsp-nocheck` (§4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Responder cert auto-rotates within 7d of expiry. **Production hardening II:** RFC 6960 §4.4.1 nonce extension echoed in the response (defends against replay attacks); empty/oversized (>32 bytes per CA/B Forum BR §4.10.2) nonces produce the canonical "unauthorized" status (status 6) — never echo malformed bytes. |
| OCSP pre-signed response cache | — | **Production hardening II.** Per-`(issuer, serial)` pre-signed responses in the new `ocsp_response_cache` table; read-through facade in `CAOperationsSvc.GetOCSPResponseWithNonce` consults the cache for nil-nonce requests. **Load-bearing security wire:** `RevocationSvc.RevokeCertificateWithActor` calls `InvalidateOnRevoke` after a successful revoke so the next OCSP fetch returns the revoked status — no stale-good window. |
| Per-endpoint rate limits | — | **Production hardening II.** OCSP per-source-IP cap at `CERTCTL_OCSP_RATE_LIMIT_PER_IP_MIN` (default 1000/min, zero disables); cert-export per-actor cap at `CERTCTL_CERT_EXPORT_RATE_LIMIT_PER_ACTOR_HR` (default 50/hr, zero disables). OCSP rate-limit trip returns the canonical "unauthorized" OCSP blob plus `Retry-After: 60`; cert-export trip returns HTTP 429. The OCSP limiter does NOT honor `X-Forwarded-For` (publicly reachable; spoofed headers would bypass the cap). |
| Cert-export typed audit | — | **Production hardening II.** Typed action constants (`cert_export_pem` / `cert_export_pkcs12` / `cert_export_pem_with_key` reserved / `cert_export_failed`) emitted via split-emit alongside the legacy bare codes for back-compat. Detail map carries `has_private_key` (always false in V2) and `cipher` (`AES-256-CBC-PBE2-SHA256` — pinned so a future dependency upgrade that changes the encoder default surfaces in audit drift review). |
| Prometheus per-area metrics | OpenMetrics | `GET /api/v1/metrics/prometheus` — production hardening II surfaces `certctl_ocsp_counter_total{label="..."}` per-event series (`request_get`/`_post`, `request_success`/`_invalid`, `nonce_echoed`/`_malformed`, `rate_limited`, `signing_failed`, etc.) wired from the shared counter table that ticks in the cache hot path. CRL / cert-export / EST / SCEP / Intune per-area counters plug in via the same `SetXxxCounters` setter pattern as follow-up commits. |
| Disaster-recovery runbook | — | **Production hardening II.** [`docs/disaster-recovery.md`](docs/disaster-recovery.md) — 8-section operator-grade runbook: CRL cache recovery, OCSP responder cert recovery, OCSP response cache recovery, CA private-key rotation 9-step playbook, Postgres restore + operator-managed-artifacts list, trust-bundle reload semantics, printable DR checklist. The SOC 2 / PCI procurement-team deliverable. |
| S/MIME certificates | RFC 8551 | Email protection EKU, adaptive KeyUsage flags (`DigitalSignature \| ContentCommitment` instead of the TLS default `DigitalSignature \| KeyEncipherment`). |
| Certificate export | — | PEM (JSON/file) and PKCS#12 (cert-only trust-store mode via `pkcs12.Modern` — AES-256-CBC PBE2 with SHA-256 KDF). Key-bearing PKCS#12 export deferred — V2 export is cert-only by design (private keys live on agents, never touch the control plane). |
| ACME DNS-PERSIST-01 | IETF draft | Standing validation record, no per-renewal DNS updates |
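
The CRL caching behaviour in the table (weak `ETag`, `Cache-Control`, and the `If-None-Match` 304 short-circuit) can be sketched with a plain `net/http` handler. The handler below is illustrative, not certctl's implementation: `weakETag`, `crlHandler`, and `fetchStatus` are hypothetical names, the 8-byte digest prefix is an assumption, and the exact-match comparison ignores `*` and multi-value `If-None-Match` headers.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// weakETag derives a weak validator from a prefix of the SHA-256 digest.
// The 8-byte prefix length is an assumption made for this sketch.
func weakETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `W/"` + hex.EncodeToString(sum[:8]) + `"`
}

// crlHandler serves pre-generated CRL bytes and answers 304 when the client
// already holds the current version.
func crlHandler(crl []byte) http.HandlerFunc {
	etag := weakETag(crl)
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		w.Header().Set("Cache-Control", "public, max-age=3600, must-revalidate")
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Content-Type", "application/pkix-crl")
		_, _ = w.Write(crl)
	}
}

// fetchStatus runs one in-memory GET against the handler and returns the
// HTTP status code. The issuer ID in the path is a placeholder.
func fetchStatus(h http.Handler, ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/crl/demo-issuer", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	crl := []byte("demo CRL bytes")
	h := crlHandler(crl)
	fmt.Println(fetchStatus(h, ""))            // prints: 200
	fmt.Println(fetchStatus(h, weakETag(crl))) // prints: 304
}
```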
|
||||
|
||||
### Notifiers
|
||||
|
||||
| Notifier | Type |
|----------|------|
| Email (SMTP) | `Email` |
| Webhooks | `Webhook` |
| Slack | `Slack` |
| Microsoft Teams | `Teams` |
| PagerDuty | `PagerDuty` |
| OpsGenie | `OpsGenie` |
|
||||
|
||||
All connectors are pluggable — build your own by implementing the [connector interface](docs/connectors.md).
|
||||
|
||||
### Screenshots
|
||||
## Screenshots
|
||||
|
||||
<table>
|
||||
<tr>
|
||||
@@ -156,7 +43,7 @@ All connectors are pluggable — build your own by implementing the [connector i
|
||||
<td><a href="docs/screenshots/v2-certificates.png"><img src="docs/screenshots/v2-certificates.png" width="400" alt="Certificates"></a><br><b>Certificates</b><br><sub>Inventory with bulk ops, status filters, owner/team columns</sub></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><a href="docs/screenshots/v2-issuers.png"><img src="docs/screenshots/v2-issuers.png" width="400" alt="Issuers"></a><br><b>Issuers</b><br><sub>Catalog with 10 CA types, GUI config, test connection</sub></td>
|
||||
<td><a href="docs/screenshots/v2-issuers.png"><img src="docs/screenshots/v2-issuers.png" width="400" alt="Issuers"></a><br><b>Issuers</b><br><sub>Catalog with 12 CA types, GUI config, test connection</sub></td>
|
||||
<td><a href="docs/screenshots/v2-jobs.png"><img src="docs/screenshots/v2-jobs.png" width="400" alt="Jobs"></a><br><b>Jobs</b><br><sub>Issuance, renewal, deployment queue with approval workflow</sub></td>
|
||||
</tr>
|
||||
</table>
|
||||
@@ -165,173 +52,83 @@ All connectors are pluggable — build your own by implementing the [connector i
|
||||
|
||||
## Why certctl
|
||||
|
||||
Certificate lifecycle tooling falls into two camps: enterprise platforms (Venafi, Keyfactor) that cost six figures and take months to deploy, or single-purpose tools (certbot, cert-manager) that handle one slice of the problem. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, and target-agnostic. If you're running certbot cron jobs, manually renewing certs, or stitching together scripts across mixed infrastructure, certctl replaces all of that.
|
||||
Certificate lifecycle tooling has historically split into two camps. Enterprise platforms charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together cron jobs across a fleet, manually renewing certs, or writing custom integration scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.
|
||||
|
||||
Built for **platform engineering and DevOps teams** managing 10–500+ certificates, **security and compliance teams** who need audit trails and policy enforcement for SOC 2, PCI-DSS 4.0, or NIST SP 800-57 ([compliance mapping included](docs/compliance.md)), and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For a detailed comparison, see [Why certctl?](docs/why-certctl.md)
|
||||
Built for **platform engineering and DevOps teams** managing 10 to 500+ certificates, **security teams** who need audit trails and policy enforcement, and **small teams without enterprise budgets** who need enterprise-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see [Why certctl?](docs/getting-started/why-certctl.md).
|
||||
|
||||
**Architecture.** Go 1.25 control plane with handler→service→repository layering, PostgreSQL 16 backend (35+ tables), and a pull-only deployment model — the server never initiates outbound connections. Agents poll for work. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). Background scheduler runs 7 loops: renewal with ARI integration (1h), job processing (30s), agent health (2m), notifications (1m), short-lived cert expiry (30s), network scanning (6h), certificate digest (24h). See [Architecture Guide](docs/architecture.md) for full system diagrams.
|
||||
## What it does
|
||||
|
||||
**Security-first.** Agents generate ECDSA P-256 keys locally — private keys never touch the control plane. API key auth enforced by default with SHA-256 hashing and constant-time comparison. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Atomic idempotency guards on scheduler loops. Issuer and target credentials encrypted at rest with AES-256-GCM. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit.
|
||||
certctl handles the full certificate lifecycle in one self-hosted control plane:
|
||||
|
||||
**Key design decisions.** TEXT primary keys — human-readable prefixed IDs (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resources at a glance in logs and queries. Idempotent migrations (`IF NOT EXISTS`, `ON CONFLICT DO NOTHING`) safe for repeated execution. Dynamic configuration via GUI with AES-256-GCM encrypted credential storage and env var backward compatibility. Handlers define their own service interfaces for clean dependency inversion.
|
||||
- **Issue and renew** from any CA. Let's Encrypt and any ACME provider, an embedded ACME server you can point cert-manager / certbot / lego at directly, a built-in local CA with sub-CA mode (chains under your enterprise root like ADCS), step-ca, Vault PKI, EJBCA, AWS ACM PCA, Google CAS, DigiCert, Sectigo, GlobalSign, Entrust, plus an OpenSSL / shell-script adapter for anything custom. Twelve native issuer connectors. See the [connector reference](docs/reference/connectors/index.md).
|
||||
- **Deploy automatically** to NGINX, Apache, HAProxy, Caddy, Traefik, Envoy, IIS, Windows Cert Store, Java keystore, Kubernetes Secrets, AWS ACM, Azure Key Vault, SSH known-hosts, Postfix + Dovecot, F5 BIG-IP. Fifteen native target connectors. Every deploy goes through atomic-write + ownership-preservation + SHA-256 idempotency + per-target Prometheus counters + pre-deploy snapshot + on-failure rollback. See [`docs/reference/deployment-model.md`](docs/reference/deployment-model.md).
|
||||
- **Run as an ACME server** so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See [`docs/reference/protocols/acme-server.md`](docs/reference/protocols/acme-server.md).
|
||||
- **Run as a SCEP server** for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See [`docs/reference/protocols/scep-server.md`](docs/reference/protocols/scep-server.md).
|
||||
- **Run as an EST server** for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See [`docs/reference/protocols/est.md`](docs/reference/protocols/est.md).
|
||||
- **Manage multi-level CA hierarchies** with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU `PermittedDNSDomains`, and 2-level internal PKI. See [`docs/reference/intermediate-ca-hierarchy.md`](docs/reference/intermediate-ca-hierarchy.md).
|
||||
- **Gate high-stakes issuance** behind two-person-integrity approval. Flag a profile as `RequiresApproval`: the request lands in a queue, a non-requester approves, and the scheduler dispatches. Profile-edit changes on approval-tier profiles route through the same gate so the flip-flop bypass is closed. See [`docs/operator/approval-workflow.md`](docs/operator/approval-workflow.md).
|
||||
- **Authorize with role-based access control.** Seven default roles (admin, operator, viewer, agent, mcp, cli, auditor) over a fine-grained permission catalogue with global / per-profile / per-issuer scope. Auditor role is read-only on the audit trail (`audit.read` + `audit.export`, nothing else) so a regulator's key cannot read certificates or mutate config. Day-0 admin via a one-shot `CERTCTL_BOOTSTRAP_TOKEN` endpoint that closes itself the moment any admin lands. Privilege-escalation guard requires `auth.role.assign` to grant or revoke a role. See [`docs/operator/rbac.md`](docs/operator/rbac.md), [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md), and the v2.0.x → v2.1.0 [migration guide](docs/migration/api-keys-to-rbac.md).
|
||||
- **Sign in with OIDC SSO** against any standards-compliant identity provider. Per-IdP setup runbooks for Keycloak, Authentik, Okta, Auth0, Microsoft Entra ID, and Google Workspace. Group-claim → role mapping for automatic provisioning; `client_secret` encrypted at rest (AES-256-GCM); JWKS auto-refresh on `kid` miss; PKCE-S256 required; RFC 9700 §4.7.1 pre-login UA/IP binding; RFC 9207 `iss` URL-param check on callback. Server mints HMAC-signed session cookies with the `__Host-` prefix (browser-enforced subdomain-takeover defense), CSRF rotation on every privileged write, and idle + absolute expiry. [OIDC Back-Channel Logout 1.0](docs/reference/auth-standards-implemented.md) revokes sessions on IdP-driven logout. Argon2id break-glass admin path for SSO-outage recovery is disabled by default and 404-invisible to scanners when `CERTCTL_BREAKGLASS_ENABLED=false`. See [`docs/operator/oidc-runbooks/index.md`](docs/operator/oidc-runbooks/index.md) for the per-IdP onboarding guides and [`docs/migration/oidc-enable.md`](docs/migration/oidc-enable.md) for enabling SSO on an existing deploy.
|
||||
- **Discover** existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
|
||||
- **Revoke** with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See [`docs/reference/protocols/crl-ocsp.md`](docs/reference/protocols/crl-ocsp.md).
|
||||
- **Alert** via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See [`docs/operator/runbooks/expiry-alerts.md`](docs/operator/runbooks/expiry-alerts.md).
|
||||
- **Drive the platform from natural language** via the bundled MCP (Model Context Protocol) server. The full REST API is exposed as MCP tools — ask your AI client "show me all expiring certificates", "revoke the VPN cert, key compromised", or "what agents are offline?" and it translates to API calls. Stateless stdio-transport binary at `cmd/mcp-server/`; same auth as the REST API; no extra attack surface. See [`docs/reference/mcp.md`](docs/reference/mcp.md).
|
||||
|
||||
## What It Does
|
||||
## Architecture and security
|
||||
|
||||
**Automated lifecycle.** Certificates renew and deploy themselves. The scheduler monitors expiration, issues through your CA, and deploys to targets — zero human intervention. ACME ARI (RFC 9773) lets the CA direct renewal timing. Ready for 47-day (SC-081v3) and 6-day (Let's Encrypt shortlived) certificate lifetimes.
|
||||
Go 1.25 control plane with handler → service → repository layering. PostgreSQL 16 backend with idempotent migrations. Pull-only deployment model — the server never initiates outbound connections. Agents poll for work and generate ECDSA P-256 keys locally so private keys never touch the control plane. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). See the [Architecture Guide](docs/reference/architecture.md) for full system diagrams.
|
||||
|
||||
**Operational dashboard.** 30+ page GUI covers the entire lifecycle: certificate inventory with bulk ops, deployment timeline with rollback, discovery triage, network scan management, agent fleet health, short-lived credential countdown, approval workflows, CA-hierarchy management, and observability metrics. Configure issuers and targets from the dashboard — no env var editing, no server restarts.
|
||||
|
||||
**Private keys stay on your servers.** Agents generate ECDSA P-256 keys locally, submit only the CSR. The control plane never touches private keys. After deployment, agents probe the live TLS endpoint and compare SHA-256 fingerprints to confirm the right certificate is actually being served.
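
The post-deployment check described above (probe the live endpoint, compare SHA-256 fingerprints) can be sketched with the standard library. This is an illustration under assumptions, not the agent's actual code: `fingerprint`, `servedFingerprint`, and `selfCheck` are hypothetical names, and the probe skips chain validation because it only compares the leaf bytes.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"errors"
	"fmt"
	"net/http/httptest"
)

// fingerprint returns the hex SHA-256 of a DER-encoded certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// servedFingerprint dials addr and fingerprints the leaf certificate the
// endpoint actually presents.
func servedFingerprint(addr string) (string, error) {
	// InsecureSkipVerify is acceptable only because the caller compares the
	// leaf against a known fingerprint instead of trusting the chain.
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", errors.New("endpoint presented no certificate")
	}
	return fingerprint(certs[0].Raw), nil
}

// selfCheck starts a throwaway local TLS server and verifies that the
// fingerprint seen over the wire matches the certificate it was given.
func selfCheck() (bool, error) {
	srv := httptest.NewTLSServer(nil)
	defer srv.Close()

	want := fingerprint(srv.Certificate().Raw)
	got, err := servedFingerprint(srv.Listener.Addr().String())
	if err != nil {
		return false, err
	}
	return got == want, nil
}

func main() {
	ok, err := selfCheck()
	fmt.Println(ok, err)
}
```

A mismatch here is the signal that the service is still serving the old certificate, for example because a reload did not take effect.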
|
||||
|
||||
**Discovery.** Agents scan filesystems for existing PEM/DER certificates. The network scanner probes TLS endpoints across CIDR ranges without agents. Cloud discovery finds certificates in AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager. Continuous TLS health monitoring tracks endpoint status (healthy/degraded/down/cert_mismatch) with configurable thresholds and historical probe data. All discovery modes feed into a unified triage workflow — claim, dismiss, or import what you find.
|
||||
|
||||
**Policy engine.** Certificate profiles constrain key types, max TTL, and EKUs — with crypto policy enforcement that validates every CSR against profile rules before it reaches the issuer. MaxTTL caps are enforced per issuer connector. Ownership tracking routes notifications to the right team. Agent groups match devices by OS, architecture, IP CIDR, and version.
|
||||
|
||||
**Two-person integrity for issuance (compliance-grade).** Set `requires_approval=true` on a `CertificateProfile` and every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves via `POST /api/v1/approvals/{id}/approve`. Same-actor self-approval is rejected at the service layer with `ErrApproveBySameActor` → HTTP 403. Bypass mode (`CERTCTL_APPROVAL_BYPASS=true`) is auditable — every auto-approve records `actor=system-bypass` so audit-tier review surfaces it. Closes the procurement-checklist question for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA. See [`docs/approval-workflow.md`](docs/approval-workflow.md).
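
The same-actor rejection can be pictured as a small service-layer guard. The sketch below borrows the `ErrApproveBySameActor` name from the text, but the `approvalRequest` type, the `approve` function, and the `statusFor` mapping are hypothetical stand-ins rather than certctl's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrApproveBySameActor mirrors the error name used in the text; the
// message and everything else in this file are assumptions.
var ErrApproveBySameActor = errors.New("approver must differ from requester")

// approvalRequest is a hypothetical, stripped-down approval record.
type approvalRequest struct {
	ID          string
	RequestedBy string
	ApprovedBy  string
	Approved    bool
}

// approve enforces two-person integrity: the requesting actor can never
// approve their own request.
func approve(req *approvalRequest, actor string) error {
	if actor == req.RequestedBy {
		return ErrApproveBySameActor
	}
	req.Approved = true
	req.ApprovedBy = actor
	return nil
}

// statusFor maps the service error to the HTTP status a handler would send.
func statusFor(err error) int {
	switch {
	case err == nil:
		return 200
	case errors.Is(err, ErrApproveBySameActor):
		return 403
	default:
		return 500
	}
}

func main() {
	req := &approvalRequest{ID: "demo-approval", RequestedBy: "o-alice"}
	fmt.Println(statusFor(approve(req, "o-alice"))) // prints: 403
	fmt.Println(statusFor(approve(req, "o-bob")))   // prints: 200
}
```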
|
||||
|
||||
**Multi-level CA hierarchy management.** Set `Issuer.HierarchyMode = "tree"` and certctl manages a real N-level CA tree backed by the `intermediate_cas` table — root → policy → issuing leaves. RFC 5280 §3.2 (self-signed root validation), §4.2.1.9 (path-length tightening), and §4.2.1.10 (NameConstraints subset semantics) are all enforced at the service layer fail-closed. Drain-first retirement (active → retiring → retired) refuses terminal transitions while active children remain. Patterns documented for FedRAMP boundary CAs (4-level), financial-services policy CAs (3-level with per-BU `PermittedDNSDomains`), and internal PKI (2-level). The pre-Rank-8 single-sub-CA flow stays byte-identical for unmigrated deployments — pinned by `TestLocal_HierarchyMode_SingleVsTree_ByteIdentical`. See [`docs/intermediate-ca-hierarchy.md`](docs/intermediate-ca-hierarchy.md).
|
||||
|
||||
**Run certctl as your ACME server.** Beyond consuming public ACME CAs (Let's Encrypt, ZeroSSL), certctl now *serves* RFC 8555 — point cert-manager, Caddy, or Traefik at certctl's per-profile ACME endpoints (`/acme/profile/{id}/*`) and you get internal-PKI cert issuance with the same wire protocol the public CAs use. Full surface: directory, new-nonce, new-account, new-order, finalize, key-change (§7.3.5), revoke-cert (§7.6), renewal-info (RFC 9773 ARI), HTTP-01 / DNS-01 / TLS-ALPN-01 validation, per-account rate limiting, scheduler-driven nonce / authz / order GC. Three client walkthroughs ship — [cert-manager](docs/acme-cert-manager-walkthrough.md), [Caddy](docs/acme-caddy-walkthrough.md), [Traefik](docs/acme-traefik-walkthrough.md) — plus the [operator reference](docs/acme-server.md) and [threat model](docs/acme-server-threat-model.md).
|
||||
|
||||
**Enrollment protocols.** EST server (RFC 7030) for device and WiFi enrollment. SCEP server (RFC 8894) for MDM platforms and network devices — full wire format (EnvelopedData decrypt + signerInfo POPO verify + CertRep PKIMessage builder), tested against ChromeOS-shape requests; multi-profile dispatch (`/scep/<pathID>`); RenewalReq + GetCertInitial messageType support; lightweight raw-CSR fallback for legacy clients. See [docs/legacy-est-scep.md](docs/legacy-est-scep.md) for the operator + device-integration guide. S/MIME issuance with email protection EKU.
|
||||
|
||||
**Revocation.** Single and bulk revocation (by profile, owner, agent, or issuer). RFC 5280 reason codes. Production-grade revocation status surface for relying parties: DER-encoded X.509 CRL per issuer, scheduler-pre-generated and cached so HTTP fetches do not rebuild per request; embedded OCSP responder serving both GET and POST forms (RFC 6960 §A.1.1) with responses signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6, `id-pkix-ocsp-nocheck` per §4.2.2.2.1) — the CA private key is never used directly for OCSP signing. Both endpoints live unauthenticated under `/.well-known/pki/` per RFC 8615. Short-lived certs (TTL < 1 hour) are exempt — expiry is sufficient revocation. See [docs/crl-ocsp.md](docs/crl-ocsp.md) for the relying-party integration guide.
|
||||
|
||||
**Audit and observability.** Immutable append-only audit trail records every lifecycle action, every API call, and every approval decision. Prometheus metrics endpoint. Scheduled certificate digest emails. Continuous endpoint health monitoring with state machine transitions and real-time alerts.
|
||||
|
||||
**Notifications + per-policy multi-channel routing.** Slack, Teams, PagerDuty, OpsGenie, SMTP, webhooks. Routed by certificate owner. Daily digest emails with stats and expiring certs. Each `RenewalPolicy` carries an `AlertChannels` matrix (per-severity-tier channel set) + `AlertSeverityMap` (per-threshold tier resolution) so production-tier 7-day alerts page PagerDuty *and* Slack while informational 30-day alerts go email-only. Per-channel dispatch is fault-isolating — a PagerDuty failure does NOT skip Slack/Email at the same threshold. Per-channel dedup row + audit row + Prometheus counter (`certctl_expiry_alerts_total{channel,threshold,result}`). See [`docs/runbook-expiry-alerts.md`](docs/runbook-expiry-alerts.md).
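
Fault-isolating dispatch means each channel is attempted regardless of how the others fared. A minimal sketch of that idea follows; the `notifier` type, `dispatchAll`, and the two helper constructors are hypothetical, and dedup rows, audit rows, and metrics are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// notifier is a hypothetical per-channel send function.
type notifier func(msg string) error

// dispatchAll attempts every channel and records each outcome separately,
// so one failing channel never prevents delivery on the others.
func dispatchAll(channels map[string]notifier, msg string) map[string]error {
	results := make(map[string]error, len(channels))
	for name, send := range channels {
		results[name] = send(msg)
	}
	return results
}

// failingNotifier simulates a channel outage.
func failingNotifier(reason string) notifier {
	return func(string) error { return errors.New(reason) }
}

// recordingNotifier appends each delivered message to sent.
func recordingNotifier(sent *[]string) notifier {
	return func(msg string) error {
		*sent = append(*sent, msg)
		return nil
	}
}

func main() {
	var sent []string
	results := dispatchAll(map[string]notifier{
		"pagerduty": failingNotifier("pagerduty: upstream timeout"),
		"slack":     recordingNotifier(&sent),
		"email":     recordingNotifier(&sent),
	}, "certificate expires in 7 days")

	fmt.Println(len(sent), results["pagerduty"] != nil) // prints: 2 true
}
```

Returning a per-channel result map (rather than the first error) is what lets the caller write one dedup and audit row per channel.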
|
||||
|
||||
**Cloud-managed targets.** Beyond on-server deploys (NGINX, Apache, IIS, F5, ...), certctl pushes renewed certs directly into AWS Certificate Manager (`ImportCertificate` + `DescribeCertificate` snapshot for atomic rollback + tag re-application) and Azure Key Vault (PEM→PKCS#12 import + snapshot CER bytes for rollback + tag carry-forward). The control plane never touches the cloud credentials — agents own them. See [`docs/runbook-cloud-targets.md`](docs/runbook-cloud-targets.md).
|
||||
|
||||
**Multiple interfaces.** REST API (180+ routes), CLI (`certs` / `agents` / `jobs` / `import` / `est` / `status` / `version` command groups), MCP server (85+ tools for Claude, Cursor, Windsurf), Helm chart, web dashboard. Certificate export in PEM and PKCS#12.
|
||||
|
||||
**First-run onboarding.** Wizard guides you through connecting a CA, deploying an agent, and issuing your first certificate. Or start with the pre-populated demo — 32 certificates, 10 issuers, 180 days of history.
|
||||
|
||||
For the complete capability breakdown, see the [Feature Inventory](docs/features.md).
|
||||
Security rests on three authentication paths: API keys (SHA-256 hashed + constant-time compared), [OIDC SSO](docs/operator/oidc-runbooks/index.md) (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace), and Argon2id [break-glass admin](docs/operator/security.md) for SSO-outage recovery. Successful OIDC login mints an HMAC-signed server-side session with `__Host-` cookies, CSRF rotation on every privileged write, and [OIDC Back-Channel Logout 1.0](docs/reference/auth-standards-implemented.md) for IdP-driven session revoke. Role-based authorization on every gated handler with global / per-profile / per-issuer scope. Auditor split keeps regulator-class actors strictly read-only on the audit trail. Day-0 admin via a one-shot bootstrap token; granting or revoking roles requires the dedicated `auth.role.assign` permission. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Issuer, target, and OIDC `client_secret` credentials encrypted at rest with AES-256-GCM. HTTPS-only control plane with TLS 1.3 pinned and a fail-closed startup gate that refuses to boot if the TLS bundle is unusable. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, static analysis, and vulnerability scanning on every commit. See [`docs/operator/security.md`](docs/operator/security.md) for the full posture and [`docs/operator/auth-threat-model.md`](docs/operator/auth-threat-model.md) for what's defended vs deferred.
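
The API-key handling summarized above (store a SHA-256 digest, compare in constant time) can be sketched as follows. `hashAPIKey` and `keyMatches` are hypothetical names; how certctl actually stores and looks up keys is not shown in this section.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashAPIKey returns the digest that would be stored instead of the raw key.
func hashAPIKey(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// keyMatches hashes the presented key and compares the digests in constant
// time, so response timing does not reveal how many leading bytes matched.
func keyMatches(presented string, stored [32]byte) bool {
	got := hashAPIKey(presented)
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}

func main() {
	stored := hashAPIKey("demo-key-not-a-real-secret")
	fmt.Println(keyMatches("demo-key-not-a-real-secret", stored)) // prints: true
	fmt.Println(keyMatches("wrong-key", stored))                  // prints: false
}
```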
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Docker Compose (Recommended)
|
||||
### Docker Compose (recommended)
|
||||
|
||||
```bash
git clone https://github.com/certctl-io/certctl.git
cd certctl
docker compose -f deploy/docker-compose.yml up -d --build
```
|
||||
|
||||
Wait ~30 seconds, then open **https://localhost:8443** in your browser. (The shipped `docker-compose.yml` self-signs a cert via the `certctl-tls-init` init container on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.) The onboarding wizard walks you through connecting a CA, deploying an agent, and issuing your first certificate.
|
||||
|
||||
**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history:
|
||||
|
||||
```bash
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
```
|
||||
|
||||
The `deploy/` directory has four compose files: `docker-compose.yml` (base platform), `docker-compose.demo.yml` (demo data overlay), `docker-compose.dev.yml` (PgAdmin + debug logging), and `docker-compose.test.yml` (standalone integration tests with real CA backends). See the [Docker Compose Environments Guide](deploy/ENVIRONMENTS.md) for a service-by-service walkthrough, or the [Quick Start](docs/quickstart.md#docker-compose-environments) for a summary.
|
||||
Wait ~30 seconds, then open **https://localhost:8443** in your browser. The shipped demo overlay seeds 180 days of realistic history across 13 issuers, 8 agents, managed + discovered certs, jobs, deploys, audit, and notification events. The `certctl-tls-init` init container self-signs an ECDSA-P256 cert on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.
|
||||
|
||||
For a clean install without demo data, drop the `-f deploy/docker-compose.demo.yml` flag and run `docker compose -f deploy/docker-compose.yml up -d --build`. The four compose files (`docker-compose.yml` base, `docker-compose.demo.yml` overlay, `docker-compose.dev.yml` for PgAdmin + debug logging, `docker-compose.test.yml` for integration tests) are documented at [`deploy/ENVIRONMENTS.md`](deploy/ENVIRONMENTS.md).
|
||||
|
||||
```bash
# --cacert expects a file path, so pass the CA via process substitution
curl --cacert <(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
# {"status":"healthy"}
```
|
||||
|
||||
The control plane is HTTPS-only (TLS 1.3, no plaintext listener). See [`docs/tls.md`](docs/tls.md) for cert provisioning patterns and [`docs/upgrade-to-tls.md`](docs/upgrade-to-tls.md) if you're upgrading from a pre-v2.2 release.
|
||||
The control plane is HTTPS-only with TLS 1.3 pinned. See [`docs/operator/tls.md`](docs/operator/tls.md) for cert provisioning patterns.
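For programmatic clients, the same trust setup can be sketched in Go: trust only the generated `ca.crt` and refuse anything below TLS 1.3. This is a hedged sketch with standard-library calls only; `newPinnedClient` is a hypothetical helper name, not part of certctl.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// newPinnedClient builds an HTTP client that trusts only the given CA
// bundle (PEM) and refuses to negotiate anything below TLS 1.3.
// (Hypothetical helper, for illustration only.)
func newPinnedClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no usable certificates in CA bundle")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS13,
			},
		},
	}, nil
}

func main() {
	// An empty bundle is rejected up front rather than at request time.
	if _, err := newPinnedClient(nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

In practice the bundle would come from `os.ReadFile` on the extracted `ca.crt`, and the returned client would then issue `GET https://localhost:8443/health`.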
|
||||
|
||||
### Agent Install (One-Liner)
|
||||
### Agent install (one-liner)
|
||||
|
||||
```bash
curl -sSL https://raw.githubusercontent.com/certctl-io/certctl/master/install-agent.sh | bash
```
|
||||
|
||||
Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh) for details.
|
||||
Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh).
|
||||
|
||||
### Helm Chart (Kubernetes)
|
||||
### Helm chart (Kubernetes)
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--set server.apiKey=your-api-key \
|
||||
--set postgres.password=your-db-password
|
||||
--set server.auth.apiKey=your-api-key \
|
||||
--set postgresql.password=your-db-password
|
||||
```
|
||||
|
||||
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml) for all configuration options.
|
||||
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml).
|
||||
|
||||
### Docker Pull
|
||||
### Container images
|
||||
|
||||
```bash
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-server
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-agent
|
||||
```
|
||||
|
||||
## Verifying this release
|
||||
|
||||
Every `v*` tag publishes signed, attested release artefacts. Binaries
|
||||
(`certctl-agent`, `certctl-server`, `certctl-cli`, `certctl-mcp-server` for
|
||||
`linux|darwin × amd64|arm64`) ship alongside a `checksums.txt`, per-binary
|
||||
SPDX-JSON SBOMs, Cosign signatures, and SLSA Level 3 provenance. Container
|
||||
images on `ghcr.io/certctl-io/certctl-{server,agent}` are built with
|
||||
`docker/build-push-action` `provenance: mode=max` + `sbom: true` and are
|
||||
additionally signed with Cosign at the image digest.
|
||||
|
||||
All signatures use Cosign keyless OIDC; the signing identity is the
|
||||
release workflow running on a signed tag.
|
||||
|
||||
**1. Verify SHA-256 checksums:**
|
||||
|
||||
```bash
sha256sum -c checksums.txt
```
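Where `sha256sum` is unavailable, the check it performs can be approximated in a few lines of Go. The sketch below assumes each line of `checksums.txt` follows the usual `<hex digest>  <filename>` layout; `verifyChecksumLine` is a hypothetical helper, and filenames containing spaces are not handled.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifyChecksumLine checks data against one line in the
// "<hex digest>  <filename>" format that sha256sum reads and writes.
// It returns the filename from the line and whether the digest matches.
func verifyChecksumLine(line string, data []byte) (string, bool, error) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return "", false, fmt.Errorf("malformed checksum line: %q", line)
	}
	want, err := hex.DecodeString(fields[0])
	if err != nil || len(want) != sha256.Size {
		return "", false, fmt.Errorf("invalid SHA-256 digest in line: %q", line)
	}
	got := sha256.Sum256(data)
	name := strings.TrimPrefix(fields[1], "*") // a leading "*" marks binary mode
	return name, bytes.Equal(got[:], want), nil
}

func main() {
	// Build a checksum line for some sample bytes, then verify it.
	data := []byte("example artefact bytes")
	sum := sha256.Sum256(data)
	line := hex.EncodeToString(sum[:]) + "  certctl-agent-linux-amd64"

	name, ok, err := verifyChecksumLine(line, data)
	fmt.Println(name, ok, err)
}
```

A checksum match only proves the file agrees with `checksums.txt`; the Cosign step is what ties `checksums.txt` itself to the release workflow.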
|
||||
|
||||
**2. Verify the Cosign signature on `checksums.txt`:**
|
||||
|
||||
```bash
cosign verify-blob \
  --bundle checksums.txt.sigstore.json \
  --certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  checksums.txt
```
|
||||
|
||||
Every individual binary ships with its own `.sigstore.json` bundle
|
||||
(unified Sigstore bundle containing signature, certificate chain, and
|
||||
Rekor inclusion proof). Swap `checksums.txt` for any binary name and
|
||||
point `--bundle` at the matching `<binary>.sigstore.json` to verify it
|
||||
directly.
|
||||
|
||||
**3. Verify SLSA Level 3 provenance on a binary:**
|
||||
|
||||
```bash
slsa-verifier verify-artifact \
  --provenance-path multiple.intoto.jsonl \
  --source-uri github.com/certctl-io/certctl \
  --source-tag v2.1.0 \
  certctl-agent-linux-amd64
```
|
||||
|
||||
**4. Verify a container image signature and its SBOM / provenance attestations:**
|
||||
|
||||
```bash
|
||||
IMAGE=ghcr.io/certctl-io/certctl-server:v2.1.0
|
||||
|
||||
cosign verify \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/\.github/workflows/release\.yml@refs/tags/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SBOM attestation (SPDX-JSON, emitted by docker/build-push-action)
|
||||
cosign verify-attestation --type spdxjson \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SLSA provenance attestation (docker/build-push-action `provenance: mode=max`)
|
||||
cosign verify-attestation --type slsaprovenance \
|
||||
--certificate-identity-regexp '^https://github\.com/certctl-io/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
docker pull ghcr.io/certctl-io/certctl-server:latest
|
||||
docker pull ghcr.io/certctl-io/certctl-agent:latest
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
Pick the scenario closest to your setup and have it running in 2 minutes.
|
||||
Pick the scenario closest to your setup and have it running in 2 minutes:
|
||||
|
||||
| Example | Scenario |
|
||||
|---------|----------|
|
||||
@@ -343,115 +140,40 @@ Pick the scenario closest to your setup and have it running in 2 minutes.
|
||||
|
||||
Each directory contains a `docker-compose.yml` and a `README.md` explaining the scenario, prerequisites, and customization.
|
||||
|
||||
## CLI
|
||||
## Verifying a release
|
||||
|
||||
```bash
# Install
go install github.com/certctl-io/certctl/cmd/cli@latest

# Configure
export CERTCTL_SERVER_URL=https://localhost:8443
export CERTCTL_API_KEY=your-api-key
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt  # or --ca-bundle on the CLI; --insecure for dev self-signed

# Usage
certctl-cli certs list                       # List all certificates
certctl-cli certs renew mc-api-prod          # Trigger renewal
certctl-cli certs revoke mc-api-prod --reason keyCompromise
certctl-cli agents list                      # List registered agents
certctl-cli jobs list                        # List jobs
certctl-cli status                           # Server health + summary stats
certctl-cli import certs.pem                 # Bulk import from PEM file
certctl-cli certs list --format json         # JSON output (default: table)
```
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
|
||||
certctl ships a standalone MCP (Model Context Protocol) server that exposes all 80 API endpoints as tools for AI assistants — Claude, Cursor, Windsurf, OpenClaw, VS Code Copilot, and any MCP-compatible client.
|
||||
|
||||
```bash
# Install and run
go install github.com/certctl-io/certctl/cmd/mcp-server@latest
export CERTCTL_SERVER_URL=https://localhost:8443
export CERTCTL_API_KEY=your-api-key
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt  # required for self-signed bootstrap
mcp-server
```
|
||||
|
||||
The MCP server is env-vars-only — there are no CLI flags for TLS. If you must bypass verification for local development against a self-signed cert, set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`. Never set that in production.
|
||||
|
||||
**Claude Desktop** (`claude_desktop_config.json`):
|
||||
```json
{
  "mcpServers": {
    "certctl": {
      "command": "mcp-server",
      "env": {
        "CERTCTL_SERVER_URL": "https://localhost:8443",
        "CERTCTL_API_KEY": "your-api-key",
        "CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/ca.crt"
      }
    }
  }
}
```
|
||||
Every `v*` tag publishes signed, attested artefacts (Cosign keyless OIDC + SLSA Level 3 provenance + SPDX-JSON SBOMs). For the verification procedure, see [`docs/reference/release-verification.md`](docs/reference/release-verification.md).
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
make build # Build server + agent binaries
|
||||
make test # Run tests
|
||||
make lint # golangci-lint (11 linters)
|
||||
make lint # golangci-lint (govet + staticcheck + contextcheck + unused)
|
||||
govulncheck ./... # Vulnerability scan
|
||||
make docker-up # Start Docker Compose stack
|
||||
```
|
||||
|
||||
CI runs on every push: `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%). Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build. 1,668 Go test functions with 625+ subtests, plus frontend test suite.
|
||||
CI runs `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-package coverage thresholds (service 70%, handler 75%, crypto 88%, auth packages 85-95%) on every push. The thresholds-as-data file is `.github/coverage-thresholds.yml`; lowering a floor requires corresponding test work, not a config flip. Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build.
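The thresholds-as-data idea can be illustrated with a short Go sketch: floors live in a map, and the gate fails any package that is under its floor or missing from the report. The map shape and the name `belowFloor` are assumptions for illustration; the actual schema of `.github/coverage-thresholds.yml` and the CI script that enforces it are not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// belowFloor returns, in sorted order, the packages whose measured coverage
// is under their configured floor. A package with a floor but no
// measurement counts as failing, so a missing report cannot pass silently.
// (Hypothetical helper; the real CI gate may be implemented differently.)
func belowFloor(floors, measured map[string]float64) []string {
	var failing []string
	for pkg, floor := range floors {
		got, ok := measured[pkg]
		if !ok || got < floor {
			failing = append(failing, pkg)
		}
	}
	sort.Strings(failing)
	return failing
}

func main() {
	floors := map[string]float64{"service": 70, "handler": 75, "crypto": 88}
	measured := map[string]float64{"service": 72.4, "handler": 74.9}
	fmt.Println(belowFloor(floors, measured)) // [crypto handler]
}
```

Keeping the floors in data rather than in script logic means a lowered threshold shows up as a reviewable one-line diff.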
|
||||
|
||||
## Roadmap
|
||||
|
||||
### V1 (v1.0.0) — Shipped
|
||||
Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector, agent-side key generation, API auth + rate limiting, React dashboard, CI pipeline with coverage gates, Docker images on GHCR.
|
||||
|
||||
### V2: Operational Maturity — Shipped
|
||||
|
||||
40+ milestones shipping enterprise-grade features for free. Highlights below; the [Feature Inventory](docs/features.md) has the complete reference.
|
||||
|
||||
- **Issuers (12).** Local CA (self-signed + sub-CA + tree-mode N-level hierarchy), ACME (DNS-01 / DNS-PERSIST-01 / EAB / ARI / profile selection), step-ca, Vault PKI (with auto-token-renewal at TTL/2), DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM PCA, Entrust (mTLS), GlobalSign Atlas HVCA, EJBCA (mTLS auto-reload via `mtlscache`), OpenSSL/Custom CA shell adapter.
|
||||
- **On-server deploy targets (14).** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets — every connector goes through `internal/deploy.Apply` for atomic-write + ownership preservation + SHA-256 idempotency + per-target Prometheus counters + pre-deploy snapshot + on-failure rollback.
|
||||
- **Cloud-managed deploy targets (2).** AWS Certificate Manager + Azure Key Vault — SDK-driven import with snapshot bytes for atomic rollback, tag carry-forward, no cloud creds touch the control plane. ([runbook](docs/runbook-cloud-targets.md))
|
||||
- **certctl as an ACME server.** Full RFC 8555 surface (per-profile endpoints, accounts, orders, finalize, key-change §7.3.5, revoke-cert §7.6) + RFC 9773 ARI + HTTP-01 / DNS-01 / TLS-ALPN-01 validation + per-account rate limiting + scheduler-driven nonce/authz/order GC. Drop in for cert-manager / Caddy / Traefik. ([reference](docs/acme-server.md), [threat model](docs/acme-server-threat-model.md))
|
||||
- **Enrollment protocols.** EST server (RFC 7030 + RFC 9266 channel binding, multi-profile dispatch, libest-tested CI). SCEP server (RFC 8894 full wire format, Microsoft Intune Connector signed-challenge dispatcher with replay cache + per-device rate limit, ChromeOS-shape interop).
|
||||
- **Two-person-integrity approval workflow.** Per-profile `requires_approval=true` gate, `JobStatusAwaitingApproval` scheduler skip, same-actor RBAC reject, auditable bypass mode. Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA. ([playbook](docs/approval-workflow.md))
|
||||
- **First-class CA hierarchy management.** `intermediate_cas` table, RFC 5280 §3.2 / §4.2.1.9 / §4.2.1.10 service-layer enforcement, drain-first retire (active → retiring → retired), 4 admin-gated endpoints, GUI tree view. Patterns documented for FedRAMP / financial-services / internal PKI. ([runbook](docs/intermediate-ca-hierarchy.md))
|
||||
- **Multi-channel expiry alerts.** Per-policy `AlertChannels` matrix + `AlertSeverityMap`, fault-isolating per-channel dispatch (PagerDuty failure does not skip Slack/Email at the same threshold), per-channel dedup + audit + Prometheus counter. ([runbook](docs/runbook-expiry-alerts.md))
|
||||
- **Revocation infrastructure.** RFC 5280 DER CRL per issuer (scheduler-pre-generated + ETag-cached) + embedded RFC 6960 OCSP responder (dedicated per-issuer responder cert per §2.6, `id-pkix-ocsp-nocheck`, RFC §4.4.1 nonce echo, OCSP response cache with revoke-invalidate hot path). Single + bulk revocation. ([guide](docs/crl-ocsp.md))
|
||||
- **Discovery & lifecycle.** Filesystem, network-CIDR, and cloud secret manager (AWS SM / Azure KV / GCP SM) certificate discovery with triage GUI. Continuous endpoint health monitoring. ACME ARI client-driven renewal timing. Approval workflows. Ownership routing. Agent groups (OS / arch / IP CIDR / version match).
|
||||
- **Secrets at rest.** Issuer + target config encrypted with AES-256-GCM (versioned blob format, PBKDF2-SHA256 100K rounds, fail-closed sentinel `ErrEncryptionKeyRequired`). Vault token + DigiCert API key + EJBCA / GlobalSign / Sectigo credentials migrated to opaque `*secret.Ref` references.
|
||||
- **Operator interfaces.** REST API (180+ routes), CLI (`certs` / `agents` / `jobs` / `import` / `est` / `status` / `version` command groups), MCP server (85+ tools for Claude / Cursor / Windsurf), Helm chart, 30+ page web dashboard with first-run onboarding wizard.
|
||||
- **Compliance.** SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 mapping ([compliance docs](docs/compliance.md)). Disaster-recovery runbook (8-section operator-grade procedure). Migration guides from [certbot](docs/migrate-from-certbot.md), [acme.sh](docs/migrate-from-acmesh.md), and [cert-manager](docs/certctl-for-cert-manager-users.md).
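The "Secrets at rest" item in the list above mentions AES-256-GCM with a versioned blob format. A minimal Go sketch of that general shape follows. The layout (`version || nonce || ciphertext+tag`), the version value, and the names `seal` and `open` are assumptions for illustration, not certctl's actual format. The sketch also takes a ready 32-byte key, so the PBKDF2-SHA256 derivation step mentioned in the list is out of scope.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

const blobVersion byte = 1 // hypothetical version tag for the blob layout

// newGCM wraps a 32-byte key in an AES-256-GCM AEAD.
func newGCM(key []byte) (cipher.AEAD, error) {
	if len(key) != 32 {
		return nil, errors.New("AES-256 needs a 32-byte key")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns version || nonce || ciphertext+tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{blobVersion}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// open checks the version byte, then decrypts and authenticates the blob.
func open(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	n := gcm.NonceSize()
	if len(blob) < 1+n || blob[0] != blobVersion {
		return nil, errors.New("unsupported or truncated blob")
	}
	return gcm.Open(nil, blob[1:1+n], blob[1+n:], nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use in practice
	blob, err := seal(key, []byte("issuer-credential"))
	if err != nil {
		panic(err)
	}
	plain, err := open(key, blob)
	fmt.Println(string(plain), err)
}
```

The leading version byte lets a later format change be detected at read time, and GCM's authentication tag makes tampering fail closed instead of yielding garbage plaintext.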
|
||||
|
||||
### Forward-looking work — all free, all self-hostable
|
||||
Everything ships free under BSL 1.1. No paid tier, no V3 / V4 gating, no enterprise edition. Future revenue path is a managed-service hosting offering — operate certctl-server as a hosted service while customers self-install only the agent.
|
||||
For the full contributor guide see [`docs/contributor/`](docs/contributor/) — testing strategy, test environment, CI pipeline, QA prerequisites.
|
||||
|
||||
## License
|
||||
|
||||
Certctl is licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial offering to third parties, whether hosted, managed, embedded, bundled, or integrated.
|
||||
Licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial certificate-management offering to third parties. See the LICENSE file for the full Additional Use Grant.
|
||||
|
||||
For licensing inquiries: certctl@proton.me
|
||||
|
||||
## Dependencies
|
||||
|
||||
Backend dependency footprint is auditable on demand:
|
||||
|
||||
```
|
||||
```bash
|
||||
go list -m all | wc -l # total module count (direct + transitive)
|
||||
go mod why <path> # explain why a particular module is pulled in
|
||||
go mod why <path> # explain why a module is pulled in
|
||||
govulncheck ./... # vulnerability scan (CI runs this on every commit)
|
||||
```
|
||||
|
||||
The release-time SBOM is published as a syft-produced cyclonedx file alongside each release artifact in `.github/workflows/release.yml`.
|
||||
The release-time SBOM is published as an SPDX-JSON file alongside each release artifact.
|
||||
|
||||
---
|
||||
|
||||
If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests — [open an issue](https://github.com/certctl-io/certctl/issues).
|
||||
If certctl solves a problem you have, [star the repo](https://github.com/certctl-io/certctl) to help others find it. Questions, bugs, or feature requests: [open an issue](https://github.com/certctl-io/certctl/issues).
|
||||
|
||||
@@ -92,3 +92,68 @@ documented_exceptions:
|
||||
why: "Phase 4 default-profile shorthand for revoke-cert."
|
||||
- route: "GET /acme/renewal-info/{cert_id}"
|
||||
why: "Phase 4 default-profile shorthand for ARI."
|
||||
|
||||
# =============================================================================
|
||||
# Auth Bundle 2 + audit-2026-05-10/11 fix bundle — REST endpoints not yet
|
||||
# represented in api/openapi.yaml. These are operator-facing REST endpoints
|
||||
# (not protocol-shaped); the OpenAPI surface is scheduled to land pre-v2.2.0
|
||||
# alongside the GUI E2E coverage push. Documented here so the parity guard
|
||||
# stays green for the v2.1.0 release tag. Threat model + handler contracts
|
||||
# live in docs/operator/{rbac.md,auth-threat-model.md,oidc-runbooks/*}.
|
||||
# =============================================================================
|
||||
- route: "GET /auth/oidc/login"
|
||||
why: "Bundle 2 Phase 5 OIDC login redirect; user-facing 302 with state cookie. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "GET /auth/oidc/callback"
|
||||
why: "Bundle 2 Phase 5 OIDC callback handler; RFC 9700 §4.7.1 + RFC 9207. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "POST /auth/logout"
|
||||
why: "Bundle 2 Phase 5 cookie + CSRF revoker. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "POST /auth/breakglass/login"
|
||||
why: "Bundle 2 Phase 7.5 public break-glass login (auth-bypass, 404 when disabled). OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "POST /auth/oidc/back-channel-logout"
|
||||
why: "Bundle 2 Phase 5 OpenID Connect Back-Channel Logout 1.0 endpoint. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "GET /api/v1/auth/sessions"
|
||||
why: "Bundle 2 Phase 5 self/admin session list. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "DELETE /api/v1/auth/sessions/{id}"
|
||||
why: "Bundle 2 Phase 5 session revoke. OpenAPI rep deferred to pre-2.2.0."
|
||||
- route: "DELETE /api/v1/auth/sessions"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-2/3 revoke-all-except-current."
|
||||
- route: "GET /api/v1/auth/oidc/providers"
|
||||
why: "Bundle 2 Phase 5 OIDC provider CRUD (list)."
|
||||
- route: "POST /api/v1/auth/oidc/providers"
|
||||
why: "Bundle 2 Phase 5 OIDC provider CRUD (create)."
|
||||
- route: "PUT /api/v1/auth/oidc/providers/{id}"
|
||||
why: "Bundle 2 Phase 5 OIDC provider CRUD (update)."
|
||||
- route: "DELETE /api/v1/auth/oidc/providers/{id}"
|
||||
why: "Bundle 2 Phase 5 OIDC provider CRUD (delete)."
|
||||
- route: "POST /api/v1/auth/oidc/providers/{id}/refresh"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-7 JWKS hot-refresh."
|
||||
- route: "GET /api/v1/auth/oidc/providers/{id}/jwks-status"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-7 JWKS health snapshot."
|
||||
- route: "POST /api/v1/auth/oidc/test"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-5 dry-run discovery + JWKS + alg-downgrade check."
|
||||
- route: "GET /api/v1/auth/oidc/group-mappings"
|
||||
why: "Bundle 2 Phase 5 group-mapping CRUD (list)."
|
||||
- route: "POST /api/v1/auth/oidc/group-mappings"
|
||||
why: "Bundle 2 Phase 5 group-mapping CRUD (create)."
|
||||
- route: "DELETE /api/v1/auth/oidc/group-mappings/{id}"
|
||||
why: "Bundle 2 Phase 5 group-mapping CRUD (delete)."
|
||||
- route: "GET /api/v1/auth/breakglass/credentials"
|
||||
why: "Bundle 2 Phase 7.5 admin break-glass list (404 when disabled; password hash never on wire)."
|
||||
- route: "POST /api/v1/auth/breakglass/credentials"
|
||||
why: "Bundle 2 Phase 7.5 admin break-glass set/rotate password."
|
||||
- route: "POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock"
|
||||
why: "Bundle 2 Phase 7.5 admin break-glass unlock after lockout."
|
||||
- route: "DELETE /api/v1/auth/breakglass/credentials/{actor_id}"
|
||||
why: "Bundle 2 Phase 7.5 admin break-glass credential delete."
|
||||
- route: "GET /api/v1/auth/users"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-11 users page."
|
||||
- route: "DELETE /api/v1/auth/users/{id}"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-11 user deactivate."
|
||||
- route: "POST /api/v1/auth/users/{id}/reactivate"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-11 user reactivate."
|
||||
- route: "GET /api/v1/auth/runtime-config"
|
||||
why: "Bundle 2 audit-2026-05-10 MED-12 effective auth-runtime-config (read-only)."
|
||||
- route: "POST /api/v1/auth/demo-residual/cleanup"
|
||||
why: "Audit 2026-05-11 A-8 demo-mode residual-grants cleanup endpoint."
|
||||
- route: "GET /api/v1/audit/export"
|
||||
why: "Bundle 1 Phase 8 streaming NDJSON audit export."
|
||||
|
||||
+560
-8
@@ -134,12 +134,23 @@ paths:
|
||||
type: string
|
||||
# G-1 (P1): "jwt" removed from this enum after the silent
|
||||
# auth downgrade was identified — no JWT middleware ships
|
||||
# with certctl. Operators who need JWT/OIDC front certctl
|
||||
# with an authenticating gateway (oauth2-proxy / Envoy /
|
||||
# Traefik / Pomerium) and set CERTCTL_AUTH_TYPE=none
|
||||
# upstream. See docs/architecture.md "Authenticating-
|
||||
# gateway pattern".
|
||||
enum: [api-key, none]
|
||||
# with certctl. Operators who need JWT continue to front
|
||||
# certctl with an authenticating gateway (oauth2-proxy /
|
||||
# Envoy / Traefik / Pomerium) and set
|
||||
# CERTCTL_AUTH_TYPE=none upstream. See
|
||||
# docs/architecture.md "Authenticating-gateway pattern".
|
||||
#
|
||||
# Auth Bundle 2 Phase 0: "oidc" added to the enum. The
|
||||
# session middleware + OIDC handler chain ship in later
|
||||
# Bundle 2 phases; until they land, setting
|
||||
# CERTCTL_AUTH_TYPE=oidc fails the runtime guard in
|
||||
# cmd/server/main.go with an actionable error rather
|
||||
# than silently falling back to api-key (the G-1
|
||||
# failure mode). The literal is in the enum so the GUI
|
||||
# Login page (Phase 8) can render OIDC provider
|
||||
# buttons against an /auth/info response that reflects
|
||||
# the configured auth_type.
|
||||
enum: [api-key, none, oidc]
|
||||
required:
|
||||
type: boolean
|
||||
|
||||
@@ -147,7 +158,16 @@ paths:
|
||||
get:
|
||||
tags: [Health]
|
||||
summary: Validate credentials
|
||||
description: Returns 200 if auth credentials are valid, 401 otherwise.
|
||||
description: |
|
||||
Returns 200 if auth credentials are valid, 401 otherwise.
|
||||
|
||||
Bundle 1 Phase 3 closure (M1): when the server has the RBAC
|
||||
primitive wired (Bundle 1 default), the response also includes
|
||||
the caller's `actor_id`, `actor_type`, `tenant_id`, the
|
||||
`roles` they hold, and `effective_permissions` they resolve
|
||||
to. The legacy `admin` boolean is preserved for back-compat
|
||||
with pre-Bundle-1 GUIs; new GUIs should switch to
|
||||
`effective_permissions` for affordance gating.
|
||||
operationId: checkAuth
|
||||
responses:
|
||||
"200":
|
||||
@@ -156,13 +176,464 @@ paths:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [status]
|
||||
properties:
|
||||
status:
|
||||
type: string
|
||||
example: authenticated
|
||||
user:
|
||||
type: string
|
||||
description: Named-key identity (empty when CERTCTL_AUTH_TYPE=none)
|
||||
admin:
|
||||
type: boolean
|
||||
description: Legacy admin flag (back-compat with pre-Bundle-1 GUIs).
|
||||
actor_id:
|
||||
type: string
|
||||
description: Actor identifier for the authenticated request (Bundle 1+).
|
||||
actor_type:
|
||||
type: string
|
||||
enum: [User, System, Agent, APIKey, Anonymous]
|
||||
description: Actor-type discriminator (Bundle 1+).
|
||||
tenant_id:
|
||||
type: string
|
||||
description: Tenant the actor belongs to (Bundle 1 ships single-tenant `t-default`).
|
||||
admin_via_role:
|
||||
type: boolean
|
||||
description: True when the actor holds `r-admin`. Authoritative admin signal under Bundle 1+.
|
||||
roles:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: Role IDs (e.g. `r-admin`, `r-viewer`) the actor holds.
|
||||
effective_permissions:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
required: [permission, scope_type]
|
||||
properties:
|
||||
permission:
|
||||
type: string
|
||||
example: cert.bulk_revoke
|
||||
scope_type:
|
||||
type: string
|
||||
enum: [global, profile, issuer]
|
||||
scope_id:
|
||||
type: string
|
||||
"401":
|
||||
description: Unauthorized
|
||||
|
||||
# ─── Auth / RBAC (Bundle 1 Phase 4) ─────────────────────────────────
|
||||
# The RBAC primitive surface for managing roles, permissions, and the
|
||||
# role grants assigned to actors (API keys today; OIDC-federated users
|
||||
# in Bundle 2). Every mutating route runs through the service layer's
|
||||
# privilege-escalation guard — callers need `auth.role.assign` for
|
||||
# role grants on actors, `auth.role.create/edit/delete` for the role
|
||||
# lifecycle, `auth.key.*` for key management. Read endpoints require
|
||||
# `auth.role.list`. The /api/v1/auth/me endpoint has no permission gate
|
||||
# (every authenticated caller can read their own permissions).
|
||||
/api/v1/auth/bootstrap:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: Probe whether the day-0 bootstrap endpoint is callable
|
||||
description: |
|
||||
Returns `{available: true}` when CERTCTL_BOOTSTRAP_TOKEN is set
|
||||
AND no admin-roled actor exists yet; otherwise `{available: false}`.
|
||||
Auth-exempt because it serves the GUI / install one-liner before
|
||||
the first admin key has been minted. Bundle 1 Phase 6.
|
||||
security: []
|
||||
operationId: getAuthBootstrap
|
||||
responses:
|
||||
"200":
|
||||
description: Bootstrap availability
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [available]
|
||||
properties:
|
||||
available:
|
||||
type: boolean
|
||||
post:
|
||||
tags: [Auth]
|
||||
summary: Mint the first admin API key from a one-shot bootstrap token
|
||||
description: |
|
||||
Operator POSTs the CERTCTL_BOOTSTRAP_TOKEN value plus the desired
|
||||
admin-key name. Returns the freshly minted plaintext key value
|
||||
once; the server stores only the SHA-256 hash. Subsequent calls
|
||||
return 410 Gone (the strategy is one-shot AND the admin-existence
|
||||
probe re-closes the door once the new admin lands). Auth-exempt
|
||||
because the endpoint authenticates via the bootstrap token
|
||||
itself. Bundle 1 Phase 6.
|
||||
security: []
|
||||
operationId: postAuthBootstrap
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [token, actor_name]
|
||||
properties:
|
||||
token:
|
||||
type: string
|
||||
description: The CERTCTL_BOOTSTRAP_TOKEN value (constant-time compared server-side).
|
||||
actor_name:
|
||||
type: string
|
||||
description: 3-64 chars, lowercase alphanumeric + hyphen + underscore.
|
||||
pattern: "^[a-z0-9][a-z0-9_-]{2,63}$"
|
||||
responses:
|
||||
"201":
|
||||
description: Admin key minted
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [actor_id, api_key_id, key_value, created_at, message]
|
||||
properties:
|
||||
actor_id: { type: string }
|
||||
api_key_id: { type: string }
|
||||
key_value:
|
||||
type: string
|
||||
description: The plaintext API key. Capture this — it is shown only once.
|
||||
created_at: { type: string, format: date-time }
|
||||
message: { type: string }
|
||||
"400": { description: Invalid actor_name or malformed body }
|
||||
"401": { description: Bootstrap token mismatch }
|
||||
"410":
|
||||
description: |
|
||||
Endpoint disabled. Either CERTCTL_BOOTSTRAP_TOKEN is unset,
|
||||
an admin actor already exists, or the strategy was already
|
||||
consumed by a successful prior call.
|
||||
|
||||
/api/v1/auth/me:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: Current actor's roles + effective permissions
|
||||
description: |
|
||||
Returns the standing roles + effective permission set for the
|
||||
authenticated caller. This is the query the GUI uses to gate
|
||||
affordance rendering; /api/v1/auth/check returns the same shape
|
||||
on the boot path.
|
||||
operationId: getAuthMe
|
||||
responses:
|
||||
"200":
|
||||
description: Caller identity + roles + effective permissions
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [actor_id, actor_type, tenant_id, admin, roles, effective_permissions]
|
||||
properties:
|
||||
actor_id: { type: string }
|
||||
actor_type: { type: string, enum: [User, System, Agent, APIKey, Anonymous] }
|
||||
tenant_id: { type: string }
|
||||
admin: { type: boolean }
|
||||
roles:
|
||||
type: array
|
||||
items: { type: string }
|
||||
effective_permissions:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
required: [permission, scope_type]
|
||||
properties:
|
||||
permission: { type: string }
|
||||
scope_type: { type: string, enum: [global, profile, issuer] }
|
||||
scope_id: { type: string }
|
||||
"401":
|
||||
description: Unauthorized
|
||||
|
||||
/api/v1/auth/permissions:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: List canonical permission catalogue
|
||||
description: |
|
||||
Returns every permission name registered in the canonical
|
||||
catalogue. Used by the GUI's role editor to populate the
|
||||
"grant permission" picker. Permission: `auth.role.list`.
|
||||
operationId: listAuthPermissions
|
||||
responses:
|
||||
"200":
|
||||
description: Permission catalogue
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
permissions:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
required: [id, name, namespace]
|
||||
properties:
|
||||
id: { type: string }
|
||||
name: { type: string }
|
||||
namespace: { type: string }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
|
||||
/api/v1/auth/roles:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: List roles for the active tenant
|
||||
description: Permission `auth.role.list`. Returns every role registered for `t-default` (Bundle 1 single-tenant).
|
||||
operationId: listAuthRoles
|
||||
responses:
|
||||
"200":
|
||||
description: Role list
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
roles:
|
||||
type: array
|
||||
items: { $ref: "#/components/schemas/AuthRole" }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
post:
|
||||
tags: [Auth]
|
||||
summary: Create a custom role
|
||||
description: Permission `auth.role.create`. Default roles (`r-admin` / `r-operator` / `r-viewer` / `r-agent` / `r-mcp` / `r-cli` / `r-auditor`) are seeded by migration and immutable.
|
||||
operationId: createAuthRole
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [name]
|
||||
properties:
|
||||
name: { type: string }
|
||||
description: { type: string }
|
||||
responses:
|
||||
"201":
|
||||
description: Role created
|
||||
content:
|
||||
application/json:
|
||||
schema: { $ref: "#/components/schemas/AuthRole" }
|
||||
"400": { description: Validation error }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"409": { description: Role with that name already exists }
|
||||
|
||||
/api/v1/auth/roles/{id}:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: Get a role and its permissions
|
||||
description: Permission `auth.role.list`.
|
||||
operationId: getAuthRole
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
responses:
|
||||
"200":
|
||||
description: Role + permissions
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
role: { $ref: "#/components/schemas/AuthRole" }
|
||||
permissions:
|
||||
type: array
|
||||
items: { $ref: "#/components/schemas/AuthRolePermission" }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not found }
|
||||
put:
|
||||
tags: [Auth]
|
||||
summary: Update a custom role's name or description
|
||||
description: Permission `auth.role.edit`. Default roles cannot be renamed.
|
||||
operationId: updateAuthRole
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
name: { type: string }
|
||||
description: { type: string }
|
||||
responses:
|
||||
"200": { description: Updated }
|
||||
"400": { description: Validation error }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not found }
|
||||
"409": { description: Default role cannot be renamed / name collision }
|
||||
delete:
|
||||
tags: [Auth]
|
||||
summary: Delete a custom role
|
||||
description: Permission `auth.role.delete`. Fails with 409 when actors still hold the role (FK ON DELETE RESTRICT).
|
||||
operationId: deleteAuthRole
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
responses:
|
||||
"204": { description: Deleted }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not found }
|
||||
"409": { description: Role still has active actor assignments }
|
||||
|
||||
/api/v1/auth/roles/{id}/permissions:
|
||||
post:
|
||||
tags: [Auth]
|
||||
summary: Grant a permission to a role at a scope
|
||||
description: Permission `auth.role.edit`. ScopeType defaults to `global`; per-profile / per-issuer scopes require ScopeID.
|
||||
operationId: grantAuthRolePermission
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [permission]
|
||||
properties:
|
||||
permission: { type: string }
|
||||
scope_type:
|
||||
type: string
|
||||
enum: [global, profile, issuer]
|
||||
default: global
|
||||
scope_id: { type: string }
|
||||
responses:
|
||||
"204": { description: Granted }
|
||||
"400": { description: Permission not in canonical catalogue / scope_id missing for non-global scope }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not found }
|
||||
|
||||
/api/v1/auth/roles/{id}/permissions/{perm}:
|
||||
delete:
|
||||
tags: [Auth]
|
||||
summary: Revoke a permission from a role
|
||||
description: Permission `auth.role.edit`.
|
||||
operationId: revokeAuthRolePermission
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
- in: path
|
||||
name: perm
|
||||
required: true
|
||||
schema: { type: string }
|
||||
- in: query
|
||||
name: scope_type
|
||||
schema:
|
||||
type: string
|
||||
enum: [global, profile, issuer]
|
||||
- in: query
|
||||
name: scope_id
|
||||
schema: { type: string }
|
||||
responses:
|
||||
"204": { description: Revoked }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role or permission grant not found }
|
||||
|
||||
/api/v1/auth/keys:
|
||||
get:
|
||||
tags: [Auth]
|
||||
summary: List actors with role grants in the active tenant
|
||||
description: |
|
||||
Returns every distinct (actor_id, actor_type) pair in the
|
||||
tenant that holds at least one role grant. Bundle 1 Phase 7
|
||||
ships this so the CLI's `auth keys list` and scope-down helper
|
||||
can enumerate the operator-key population without joining
|
||||
against the env-var-loaded namedKeys directly. Permission
|
||||
`auth.role.list`.
|
||||
operationId: listAuthKeys
|
||||
responses:
|
||||
"200":
|
||||
description: Actor list with role assignments
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
properties:
|
||||
keys:
|
||||
type: array
|
||||
items:
|
||||
type: object
|
||||
required: [actor_id, actor_type, tenant_id, role_ids]
|
||||
properties:
|
||||
actor_id: { type: string }
|
||||
actor_type:
|
||||
type: string
|
||||
enum: [User, System, Agent, APIKey, Anonymous]
|
||||
tenant_id: { type: string }
|
||||
role_ids:
|
||||
type: array
|
||||
items: { type: string }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
|
||||
/api/v1/auth/keys/{id}/roles:
|
||||
post:
|
||||
tags: [Auth]
|
||||
summary: Assign a role to an API key
|
||||
description: Permission `auth.role.assign`. The reserved `actor-demo-anon` actor cannot be re-assigned.
|
||||
operationId: assignAuthKeyRole
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
requestBody:
|
||||
required: true
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
type: object
|
||||
required: [role_id]
|
||||
properties:
|
||||
role_id: { type: string }
|
||||
responses:
|
||||
"204": { description: Assigned }
|
||||
"400": { description: Validation error }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not found }
|
||||
"409": { description: Reserved system actor cannot be modified }
|
||||
|
||||
/api/v1/auth/keys/{id}/roles/{role_id}:
|
||||
delete:
|
||||
tags: [Auth]
|
||||
summary: Revoke a role from an API key
|
||||
description: Permission `auth.role.assign`. Revoking the synthetic `actor-demo-anon` admin grant is rejected.
|
||||
operationId: revokeAuthKeyRole
|
||||
parameters:
|
||||
- in: path
|
||||
name: id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
- in: path
|
||||
name: role_id
|
||||
required: true
|
||||
schema: { type: string }
|
||||
responses:
|
||||
"204": { description: Revoked }
|
||||
"401": { description: Unauthorized }
|
||||
"403": { description: Forbidden }
|
||||
"404": { description: Role not assigned to actor }
|
||||
"409": { description: Reserved system actor cannot be modified }
|
||||
|
||||
/api/v1/version:
|
||||
get:
|
||||
tags: [Health]
|
||||
@@ -205,7 +676,7 @@ paths:
|
||||
go_version:
|
||||
type: string
|
||||
description: Go toolchain version that compiled the binary (runtime.Version())
|
||||
example: go1.25.9
|
||||
example: go1.25.10
|
||||
|
||||
# ─── Certificates ────────────────────────────────────────────────────
|
||||
/api/v1/certificates:
|
||||
@@ -2708,10 +3179,22 @@ paths:
|
||||
get:
|
||||
tags: [Audit]
|
||||
summary: List audit events
|
||||
description: |
|
||||
Bundle 1 Phase 8 adds the optional `category` query parameter
|
||||
for auditor-role filtering. Allowed values: `cert_lifecycle`
|
||||
(cert/agent/deployment events), `auth` (role/key/bootstrap
|
||||
mutations), `config` (issuer/target/settings edits). Omitting
|
||||
the parameter returns every category.
|
||||
operationId: listAuditEvents
|
||||
parameters:
|
||||
- $ref: "#/components/parameters/page"
|
||||
- $ref: "#/components/parameters/per_page"
|
||||
- in: query
|
||||
name: category
|
||||
schema:
|
||||
type: string
|
||||
enum: [cert_lifecycle, auth, config]
|
||||
description: Filter to events of this event_category. (Bundle 1 Phase 8)
|
||||
responses:
|
||||
"200":
|
||||
description: Paginated list of audit events
|
||||
@@ -2726,6 +3209,8 @@ paths:
|
||||
type: array
|
||||
items:
|
||||
$ref: "#/components/schemas/AuditEvent"
|
||||
"400":
|
||||
description: Invalid `category` value
|
||||
"500":
|
||||
$ref: "#/components/responses/InternalError"
|
||||
|
||||
@@ -4309,6 +4794,27 @@ components:
|
||||
type: http
|
||||
scheme: bearer
|
||||
description: API key passed as Bearer token. Configure via CERTCTL_AUTH_SECRET.
|
||||
# Auth Bundle 2 Phase 5 — session-cookie auth scheme. New
|
||||
# session-authenticated endpoints declare
|
||||
# `security: [{cookieAuth: []}, {bearerAuth: []}]` (either auth
|
||||
# method works, OR semantics). Per Phase 5 spec, the
|
||||
# `/auth/oidc/back-channel-logout` endpoint declares `security: []`
|
||||
# because auth comes from the IdP-signed logout token in the body,
|
||||
# not certctl-issued credentials.
|
||||
cookieAuth:
|
||||
type: apiKey
|
||||
in: cookie
|
||||
name: certctl_session
|
||||
description: |
|
||||
Session cookie minted by `POST /auth/oidc/callback` after a
|
||||
successful OIDC handshake (Auth Bundle 2). Wire format
|
||||
`v1.<session_id>.<signing_key_id>.<HMAC-SHA256>`; HMAC is
|
||||
verified server-side against the active session signing key.
|
||||
Cookie attributes: `Secure` `HttpOnly` `SameSite=Lax|Strict`
|
||||
(configurable via `CERTCTL_SESSION_SAMESITE`) `Path=/`.
|
||||
State-changing requests additionally require the
|
||||
`X-CSRF-Token` header to match the SHA-256 hash on the
|
||||
session row (validated by the session middleware in Phase 6).
|
||||
|
||||
parameters:
|
||||
resourceId:
|
||||
@@ -4361,6 +4867,45 @@ components:
|
||||
$ref: "#/components/schemas/ErrorResponse"
|
||||
|
||||
schemas:
|
||||
# ─── Auth / RBAC (Bundle 1 Phase 4) ─────────────────────────────
|
||||
AuthRole:
|
||||
type: object
|
||||
required: [id, tenant_id, name]
|
||||
properties:
|
||||
id:
|
||||
type: string
|
||||
description: Role ID (`r-` prefix).
|
||||
example: r-admin
|
||||
tenant_id:
|
||||
type: string
|
||||
example: t-default
|
||||
name:
|
||||
type: string
|
||||
example: admin
|
||||
description:
|
||||
type: string
|
||||
created_at:
|
||||
type: string
|
||||
format: date-time
|
||||
updated_at:
|
||||
type: string
|
||||
format: date-time
|
||||
|
||||
AuthRolePermission:
|
||||
type: object
|
||||
required: [role_id, permission_id, scope_type]
|
||||
properties:
|
||||
role_id:
|
||||
type: string
|
||||
permission_id:
|
||||
type: string
|
||||
scope_type:
|
||||
type: string
|
||||
enum: [global, profile, issuer]
|
||||
scope_id:
|
||||
type: string
|
||||
description: NULL/absent for global scope; profile/issuer ID otherwise.
|
||||
|
||||
# ─── Approvals ───────────────────────────────────────────────────
|
||||
ApprovalRequest:
|
||||
type: object
|
||||
@@ -5311,6 +5856,13 @@ components:
|
||||
timestamp:
|
||||
type: string
|
||||
format: date-time
|
||||
event_category:
|
||||
type: string
|
||||
enum: [cert_lifecycle, auth, config]
|
||||
description: |
|
||||
Bundle 1 Phase 8: classifies the event for auditor-role
|
||||
filtering. Empty / absent on rows from pre-Phase-8
|
||||
deployments (the migration backfills "cert_lifecycle").
|
||||
|
||||
# ─── Notifications ───────────────────────────────────────────────
|
||||
NotificationType:
|
||||
|
||||
+1
-1
@@ -64,7 +64,7 @@ type AgentConfig struct {
|
||||
// ErrAgentRetired is the sentinel returned by [Agent.Run] when the control
|
||||
// plane responds with HTTP 410 Gone to a heartbeat or work-poll request — the
|
||||
// canonical signal that this agent's row has been soft-retired server-side
|
||||
// (see I-004 in cowork/certctl-coverage-gap-audit.md). The binary must
|
||||
// (see I-004 in the project's coverage-gap audit). The binary must
|
||||
// terminate cleanly: an init-system restart would only produce another 410
|
||||
// and wedge the host in a restart loop. main() translates this sentinel into
|
||||
// a zero exit code so systemd (Restart=on-failure) and launchd do not respawn
|
||||
|
||||
@@ -163,14 +163,79 @@ func TestHandleCerts_Revoke_HitsClientPath(t *testing.T) {
|
||||
}))
|
||||
t.Cleanup(srv.Close)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"}); err != nil {
|
||||
// 2026-05-05 parity-defaults-cleanup (P3-2): reason must be a canonical
|
||||
// RFC 5280 §5.3.1 code (camelCase or snake_case both accepted; this
|
||||
// test asserts the snake_case path normalises to the camelCase wire
|
||||
// format that the local issuer + ACME server expect).
|
||||
if err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "key_compromise"}); err != nil {
|
||||
t.Errorf("handleCerts({revoke ...}): err=%v", err)
|
||||
}
|
||||
if lastMethod != "POST" || !strings.Contains(lastPath, "/revoke") {
|
||||
t.Errorf("expected POST .../revoke, got %s %s", lastMethod, lastPath)
|
||||
}
|
||||
if !strings.Contains(lastBody, "compromise") {
|
||||
t.Errorf("expected reason in body, got %q", lastBody)
|
||||
if !strings.Contains(lastBody, "keyCompromise") {
|
||||
t.Errorf("expected normalised reason 'keyCompromise' in body, got %q", lastBody)
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Revoke_RequiresReason pins the 2026-05-05 parity-defaults-
|
||||
// cleanup (P3-2, Option A) strict-reason contract: empty --reason is a
|
||||
// fatal error, not a silent fallback to "unspecified".
|
||||
func TestHandleCerts_Revoke_RequiresReason(t *testing.T) {
|
||||
srv := stubServer(t, 200, `{}`)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
err := handleCerts(c, []string{"revoke", "mc-x"})
|
||||
if err == nil {
|
||||
t.Fatal("expected error when --reason is omitted; got nil (regression on P3-2 strict path)")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "reason") {
|
||||
t.Errorf("expected error to mention 'reason', got %q", err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Revoke_RejectsUnknownReason pins that off-RFC reason
|
||||
// codes are rejected at the CLI dispatch layer (P3-2 anti-typo guard).
|
||||
func TestHandleCerts_Revoke_RejectsUnknownReason(t *testing.T) {
|
||||
srv := stubServer(t, 200, `{}`)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
err := handleCerts(c, []string{"revoke", "mc-x", "--reason", "compromise"})
|
||||
if err == nil {
|
||||
t.Fatal("expected error for non-canonical reason; got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "compromise") {
|
||||
t.Errorf("expected error to echo bad reason 'compromise', got %q", err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// TestHandleCerts_Renew_ForceFlag pins the 2026-05-05 parity-defaults-
|
||||
// cleanup (P3-1) wire: --force on the renew dispatch sends ?force=true.
|
||||
// CLI convention: ID is positional and precedes the flags (matches
|
||||
// `agents retire <id> [--force]`), so the flag MUST come after the ID.
|
||||
func TestHandleCerts_Renew_ForceFlag(t *testing.T) {
|
||||
for _, tc := range []struct {
|
||||
name string
|
||||
args []string
|
||||
wantQuery string
|
||||
}{
|
||||
{"no-force", []string{"renew", "mc-x"}, ""},
|
||||
{"force-after-id", []string{"renew", "mc-x", "--force"}, "force=true"},
|
||||
} {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
var lastQuery string
|
||||
srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
lastQuery = r.URL.RawQuery
|
||||
w.WriteHeader(200)
|
||||
_, _ = w.Write([]byte(`{}`))
|
||||
}))
|
||||
t.Cleanup(srv.Close)
|
||||
c := newDispatchTestClient(t, srv)
|
||||
if err := handleCerts(c, tc.args); err != nil {
|
||||
t.Fatalf("handleCerts: %v", err)
|
||||
}
|
||||
if lastQuery != tc.wantQuery {
|
||||
t.Errorf("query: got %q want %q", lastQuery, tc.wantQuery)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
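The renew test above pins the ID-first convention: Go's `flag` package stops parsing at the first non-flag token, so the positional ID is taken from `args[0]` and only the remainder goes to the flag parser. A standalone sketch of that pattern follows; `parseRenewArgs` is a hypothetical helper, not a function from the repository.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseRenewArgs (hypothetical) splits `<id> [--force]` into its parts.
// The ID must come first; flags are parsed from args[1:] only.
func parseRenewArgs(args []string) (string, bool, error) {
	if len(args) == 0 {
		return "", false, errors.New("usage: certs renew <id> [--force]")
	}
	fs := flag.NewFlagSet("certs renew", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	force := fs.Bool("force", false, "force renewal")
	if err := fs.Parse(args[1:]); err != nil {
		return "", false, err
	}
	return args[0], *force, nil
}

func main() {
	for _, args := range [][]string{
		{"mc-x"},
		{"mc-x", "--force"},
		{"--force", "mc-x"}, // flag before the ID: "--force" is taken as the ID
	} {
		id, force, err := parseRenewArgs(args)
		fmt.Printf("%v -> id=%q force=%v err=%v\n", args, id, force, err)
	}
}
```

The third case shows why the flag has to follow the ID: with the order reversed, `--force` is read as the identifier and the flag is never set.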
+181
-11
@@ -111,6 +111,8 @@ Examples:
|
||||
err = handleEST(client, cmdArgs)
|
||||
case "status":
|
||||
err = handleStatus(client)
|
||||
case "auth":
|
||||
err = handleAuth(client, cmdArgs)
|
||||
case "version":
|
||||
fmt.Println("certctl-cli version 0.1.0")
|
||||
default:
|
||||
@@ -144,22 +146,70 @@ func handleCerts(client *cli.Client, args []string) error {
|
||||
}
|
||||
return client.GetCertificate(subArgs[0])
|
||||
case "renew":
|
||||
// 2026-05-05 parity-defaults-cleanup (P3-1): expose --force as an
|
||||
// explicit operator flag instead of the historical hardcoded
|
||||
// `force=false` body field. force=true overrides the server-side
|
||||
// RenewalInProgress block — used to recover stuck in-flight
|
||||
// renewals. Archived/Expired remain terminal regardless.
|
||||
//
|
||||
// CLI convention: `certs renew <id> [--force]` — the ID is a
|
||||
// positional arg that precedes the flags. Mirrors `agents retire
|
||||
// <id>`'s pattern (Go's flag package stops at the first non-flag
|
||||
// token, so we pull subArgs[0] as the ID and hand subArgs[1:] to
|
||||
// the flag parser).
|
||||
if len(subArgs) == 0 {
|
||||
fmt.Fprintf(os.Stderr, "usage: certs renew <id>\n")
|
||||
return nil
|
||||
}
|
||||
return client.RenewCertificate(subArgs[0])
|
||||
case "revoke":
|
||||
if len(subArgs) == 0 {
|
||||
fmt.Fprintf(os.Stderr, "usage: certs revoke <id> [--reason <reason>]\n")
|
||||
fmt.Fprintf(os.Stderr, "usage: certs renew <id> [--force]\n")
|
||||
return nil
|
||||
}
|
||||
id := subArgs[0]
|
||||
reason := "unspecified"
|
||||
if len(subArgs) > 2 && subArgs[1] == "--reason" {
|
||||
reason = subArgs[2]
|
||||
fs := flag.NewFlagSet("certs renew", flag.ContinueOnError)
|
||||
force := fs.Bool("force", false, "Force renewal even when the cert is currently in RenewalInProgress (clears stuck in-flight renewals; does NOT override Archived/Expired terminal states)")
|
||||
if err := fs.Parse(subArgs[1:]); err != nil {
|
||||
return err
|
||||
}
|
||||
return client.RevokeCertificate(id, reason)
|
||||
return client.RenewCertificate(id, *force)
|
||||
case "revoke":
|
||||
// 2026-05-05 parity-defaults-cleanup (P3-2, Option A): --reason is
|
||||
// strictly required. Empty reason refuses to dispatch and prints
|
||||
// the RFC 5280 §5.3.1 reason-code menu so operators pick a real
|
||||
// value. The pre-2026-05-05 silent fallback to "unspecified"
|
||||
// defeated compliance reporting (PCI-DSS §3.6, HIPAA §164.312)
|
||||
// because every revocation looked the same in the audit trail.
|
||||
//
|
||||
// CLI convention: `certs revoke <id> --reason <reason>` — same
|
||||
// ID-first ordering as `certs renew`.
|
||||
if len(subArgs) == 0 {
|
||||
fmt.Fprintf(os.Stderr, "usage: certs revoke <id> --reason <reason>\n")
|
||||
fmt.Fprintf(os.Stderr, "\nValid RFC 5280 §5.3.1 reasons:\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
id := subArgs[0]
|
||||
fs := flag.NewFlagSet("certs revoke", flag.ContinueOnError)
|
||||
reason := fs.String("reason", "", "RFC 5280 revocation reason (required). Valid values: keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, removeFromCRL, privilegeWithdrawn, aaCompromise, unspecified")
|
||||
if err := fs.Parse(subArgs[1:]); err != nil {
|
||||
return err
|
||||
}
|
||||
if *reason == "" {
|
||||
fmt.Fprintf(os.Stderr, "error: --reason is required (no silent fallback to 'unspecified' — pick a real RFC 5280 §5.3.1 code).\n\n")
|
||||
fmt.Fprintf(os.Stderr, "Valid reasons:\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return fmt.Errorf("--reason is required")
|
||||
}
|
||||
canonical, ok := cli.NormalizeRevokeReason(*reason)
|
||||
if !ok {
|
||||
fmt.Fprintf(os.Stderr, "error: %q is not a valid RFC 5280 §5.3.1 reason code.\n\n", *reason)
|
||||
fmt.Fprintf(os.Stderr, "Valid reasons (camelCase or snake_case both accepted):\n")
|
||||
for _, r := range cli.ValidRevokeReasons() {
|
||||
fmt.Fprintf(os.Stderr, " %s\n", r)
|
||||
}
|
||||
return fmt.Errorf("invalid --reason: %q", *reason)
|
||||
}
|
||||
return client.RevokeCertificate(id, canonical)
|
||||
case "bulk-revoke":
|
||||
return client.BulkRevokeCertificates(subArgs)
|
||||
default:
|
||||
@@ -316,3 +366,123 @@ func validateHTTPSScheme(serverURL string) error {
|
||||
return fmt.Errorf("server URL %q uses unsupported scheme %q — expected https://", serverURL, u.Scheme)
|
||||
}
|
||||
}
|
||||
|
||||
// handleAuth dispatches the `certctl-cli auth ...` subcommand tree.
// Bundle 1 Phase 5: ships read + grant operations against the
// /api/v1/auth/* surface introduced in Phase 4. Mutations like role
// create / update / delete can be added in a Phase 5.5 follow-up; this
// commit ships the operator-facing subset most useful for migration
// and day-2 scope-down (`auth keys list` + `auth keys assign` +
// `auth me`).
func handleAuth(client *cli.Client, args []string) error {
	if len(args) == 0 {
		fmt.Fprintf(os.Stderr, "usage: auth <roles|permissions|keys|me> [...]\n")
		return nil
	}
	subcommand := args[0]
	subArgs := args[1:]

	switch subcommand {
	case "roles":
		return handleAuthRoles(client, subArgs)
	case "permissions":
		return handleAuthPermissions(client, subArgs)
	case "keys":
		return handleAuthKeys(client, subArgs)
	case "me":
		return client.AuthMe()
	default:
		fmt.Fprintf(os.Stderr, "unknown auth subcommand: %s\n", subcommand)
		return nil
	}
}

func handleAuthRoles(client *cli.Client, args []string) error {
	if len(args) == 0 {
		fmt.Fprintf(os.Stderr, "usage: auth roles <list|get> [id]\n")
		return nil
	}
	switch args[0] {
	case "list":
		return client.AuthListRoles()
	case "get":
		if len(args) < 2 {
			fmt.Fprintf(os.Stderr, "usage: auth roles get <id>\n")
			return nil
		}
		return client.AuthGetRole(args[1])
	default:
		fmt.Fprintf(os.Stderr, "unknown roles subcommand: %s\n", args[0])
		return nil
	}
}

func handleAuthPermissions(client *cli.Client, args []string) error {
	if len(args) == 0 || args[0] != "list" {
		fmt.Fprintf(os.Stderr, "usage: auth permissions list\n")
		return nil
	}
	return client.AuthListPermissions()
}

func handleAuthKeys(client *cli.Client, args []string) error {
	if len(args) == 0 {
		fmt.Fprintf(os.Stderr, "usage: auth keys <list|assign|revoke|scope-down> [...]\n")
		return nil
	}
	switch args[0] {
	case "list":
		return client.AuthListKeys()
	case "assign":
		// auth keys assign <key-id> --role <role-id>
		if len(args) < 4 || args[2] != "--role" {
			fmt.Fprintf(os.Stderr, "usage: auth keys assign <key-id> --role <role-id>\n")
			return nil
		}
		return client.AuthAssignRoleToKey(args[1], args[3])
	case "revoke":
		// auth keys revoke <key-id> --role <role-id>
		if len(args) < 4 || args[2] != "--role" {
			fmt.Fprintf(os.Stderr, "usage: auth keys revoke <key-id> --role <role-id>\n")
			return nil
		}
		return client.AuthRevokeRoleFromKey(args[1], args[3])
	case "scope-down":
		// Bundle 1 Phase 7 — interactive (default), --non-interactive
		// <config.json>, or --suggest [--apply].
		return handleAuthKeysScopeDown(client, args[1:])
	default:
		fmt.Fprintf(os.Stderr, "unknown keys subcommand: %s\n", args[0])
		return nil
	}
}

// handleAuthKeysScopeDown dispatches the three scope-down modes:
//
//	auth keys scope-down                              → interactive
//	auth keys scope-down --non-interactive <config>   → JSON-driven
//	auth keys scope-down --suggest [--apply]          → audit-driven suggestions
func handleAuthKeysScopeDown(client *cli.Client, args []string) error {
	if len(args) == 0 {
		return client.AuthScopeDown()
	}
	switch args[0] {
	case "--non-interactive":
		if len(args) < 2 {
			fmt.Fprintf(os.Stderr, "usage: auth keys scope-down --non-interactive <config.json>\n")
			return nil
		}
		return client.AuthScopeDownNonInteractive(args[1])
	case "--suggest":
		apply := false
		for _, a := range args[1:] {
			if a == "--apply" {
				apply = true
			}
		}
		return client.AuthScopeDownSuggest(apply)
	default:
		fmt.Fprintf(os.Stderr, "unknown scope-down flag: %s\n", args[0])
		return nil
	}
}
|
||||
|
||||
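The `certs revoke` path in this file calls `cli.NormalizeRevokeReason` and `cli.ValidRevokeReasons`, whose bodies are not part of this diff. The sketch below is one possible way to get the behaviour the tests pin (snake_case or camelCase in, canonical camelCase out, anything else rejected). It is an assumption, not the repository's implementation. In particular, it matches case-insensitively after dropping underscores, which is more lenient than the real function may be. The reason list is copied from the `--reason` help text.

```go
package main

import (
	"fmt"
	"strings"
)

// validRevokeReasons lists the RFC 5280 §5.3.1 codes named in the help text.
var validRevokeReasons = []string{
	"keyCompromise", "caCompromise", "affiliationChanged", "superseded",
	"cessationOfOperation", "certificateHold", "removeFromCRL",
	"privilegeWithdrawn", "aaCompromise", "unspecified",
}

// normalizeRevokeReason (hypothetical) maps user input to the canonical
// camelCase code. Underscores are dropped and case is ignored, so both
// "key_compromise" and "keyCompromise" resolve to "keyCompromise".
func normalizeRevokeReason(in string) (string, bool) {
	key := strings.ToLower(strings.ReplaceAll(in, "_", ""))
	for _, r := range validRevokeReasons {
		if strings.ToLower(r) == key {
			return r, true
		}
	}
	return "", false
}

func main() {
	for _, in := range []string{"key_compromise", "removeFromCRL", "compromise", ""} {
		canonical, ok := normalizeRevokeReason(in)
		fmt.Printf("%q -> %q ok=%v\n", in, canonical, ok)
	}
}
```

Comparing against the canonical list, rather than mechanically camel-casing the input, avoids producing `removeFromCrl` for `remove_from_crl`.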
@@ -0,0 +1,105 @@
|
||||
package main

import (
	"context"
	"fmt"
	"log/slog"
	"strings"

	"github.com/certctl-io/certctl/internal/auth"
	"github.com/certctl-io/certctl/internal/config"
	"github.com/certctl-io/certctl/internal/domain"
	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
)

// assembleNamedAPIKeys translates the operator's CERTCTL_API_KEYS_NAMED
// env-var (preferred) or CERTCTL_AUTH_SECRET (legacy) into the
// auth.NamedAPIKey slice the rest of the boot path consumes.
//
// Authentication unification (M-002): every authenticated request now
// carries a named actor in the request context so audit events record
// the real key identity instead of the hardcoded "api-key-user"
// string. Named keys come from CERTCTL_API_KEYS_NAMED (preferred). For
// backward compatibility CERTCTL_AUTH_SECRET is synthesized into
// legacy-key-N entries with Admin=false.
func assembleNamedAPIKeys(cfg *config.Config, logger *slog.Logger) []auth.NamedAPIKey {
	if config.AuthType(cfg.Auth.Type) == config.AuthTypeNone {
		return nil
	}
	var out []auth.NamedAPIKey
	for _, nk := range cfg.Auth.NamedKeys {
		out = append(out, auth.NamedAPIKey{
			Name:  nk.Name,
			Key:   nk.Key,
			Admin: nk.Admin,
		})
	}
	if len(out) == 0 && cfg.Auth.Secret != "" {
		idx := 0
		for _, p := range strings.Split(cfg.Auth.Secret, ",") {
			p = strings.TrimSpace(p)
			if p == "" {
				continue
			}
			out = append(out, auth.NamedAPIKey{
				Name:  fmt.Sprintf("legacy-key-%d", idx),
				Key:   p,
				Admin: false,
			})
			idx++
		}
		if len(out) > 0 && logger != nil {
			logger.Warn("CERTCTL_AUTH_SECRET is deprecated — set CERTCTL_API_KEYS_NAMED for named actor attribution and admin gating",
				"synthesized_keys", len(out))
		}
	}
	return out
}

// actorRoleGranter is the narrow interface backfillNamedKeyActorRoles
// needs from the postgres ActorRoleRepository. Pulled out so the unit
// test can inject a fake without spinning up the full repo / DB.
type actorRoleGranter interface {
	Grant(ctx context.Context, ar *authdomain.ActorRole) error
}

// backfillNamedKeyActorRoles is the Bundle 1 Phase 3 closure (C2)
// startup hook that ensures every CERTCTL_API_KEYS_NAMED entry — and
// every legacy CERTCTL_AUTH_SECRET synthesized fallback — has an
// actor_roles row before the HTTP server accepts requests. Admin-flagged
// keys grant `r-admin` (full canonical permission set); non-admin keys
// grant `r-viewer` (read-only surface), matching the pre-Phase-3.5
// capability shape.
//
// Idempotent via ON CONFLICT DO NOTHING in the repo Grant — reboots
// don't create duplicates. Failures are logged but non-fatal: the server
// still starts, and the operator can fix the grant via the RBAC API.
//
// The function is package-private + extracted from main() so the unit
// test in auth_backfill_test.go can pin the role-mapping invariant
// without depending on the full server bootstrap path.
func backfillNamedKeyActorRoles(
	ctx context.Context,
	repo actorRoleGranter,
	keys []auth.NamedAPIKey,
	logger *slog.Logger,
) {
	for _, nk := range keys {
		role := authdomain.RoleIDViewer
		if nk.Admin {
			role = authdomain.RoleIDAdmin
		}
		if err := repo.Grant(ctx, &authdomain.ActorRole{
			ActorID:   nk.Name,
			ActorType: authdomain.ActorTypeValue(domain.ActorTypeAPIKey),
			RoleID:    role,
			TenantID:  authdomain.DefaultTenantID,
			GrantedBy: "bootstrap",
		}); err != nil {
			if logger != nil {
				logger.Warn("api-key actor-role backfill failed; key authenticates but RBAC routes will 403 until grant is added via /v1/auth/keys",
					"key", nk.Name, "role", role, "err", err)
			}
		}
	}
}
|
||||
@@ -0,0 +1,116 @@
package main

import (
	"context"
	"errors"
	"io"
	"log/slog"
	"testing"

	"github.com/certctl-io/certctl/internal/auth"
	authdomain "github.com/certctl-io/certctl/internal/domain/auth"
)

// fakeGranter is a tiny in-memory stand-in for the postgres ActorRoleRepository
// — enough surface area for backfillNamedKeyActorRoles to call Grant against.
type fakeGranter struct {
	calls []*authdomain.ActorRole
	err   error
}

func (f *fakeGranter) Grant(_ context.Context, ar *authdomain.ActorRole) error {
	f.calls = append(f.calls, ar)
	return f.err
}

// TestBackfillNamedKeyActorRoles_RoleMapping pins the Bundle 1 Phase 3
// closure (C2) invariant: admin-flagged named keys grant r-admin,
// non-admin keys grant r-viewer, both at TenantID t-default with
// ActorType APIKey and GrantedBy=bootstrap.
func TestBackfillNamedKeyActorRoles_RoleMapping(t *testing.T) {
	repo := &fakeGranter{}
	logger := slog.New(slog.NewTextHandler(io.Discard, nil))

	keys := []auth.NamedAPIKey{
		{Name: "alice-admin", Key: "AAA", Admin: true},
		{Name: "bob-viewer", Key: "BBB", Admin: false},
		{Name: "carol-admin", Key: "CCC", Admin: true},
	}
	backfillNamedKeyActorRoles(context.Background(), repo, keys, logger)

	if len(repo.calls) != 3 {
		t.Fatalf("Grant call count = %d, want 3", len(repo.calls))
	}
	type want struct {
		actor, role string
	}
	wants := []want{
		{actor: "alice-admin", role: authdomain.RoleIDAdmin},
		{actor: "bob-viewer", role: authdomain.RoleIDViewer},
		{actor: "carol-admin", role: authdomain.RoleIDAdmin},
	}
	for i, w := range wants {
		got := repo.calls[i]
		if got.ActorID != w.actor {
			t.Errorf("call[%d].ActorID = %q, want %q", i, got.ActorID, w.actor)
		}
		if got.RoleID != w.role {
			t.Errorf("call[%d].RoleID = %q, want %q", i, got.RoleID, w.role)
		}
		if got.TenantID != authdomain.DefaultTenantID {
			t.Errorf("call[%d].TenantID = %q, want %q", i, got.TenantID, authdomain.DefaultTenantID)
		}
		if string(got.ActorType) != "APIKey" {
			t.Errorf("call[%d].ActorType = %q, want APIKey", i, got.ActorType)
		}
		if got.GrantedBy != "bootstrap" {
			t.Errorf("call[%d].GrantedBy = %q, want bootstrap", i, got.GrantedBy)
		}
	}
}

// TestBackfillNamedKeyActorRoles_EmptyKeysIsNoOp confirms the boot path
// is safe when no named keys are configured (typical CERTCTL_AUTH_TYPE=
// none deploy). No Grant calls; no panic.
func TestBackfillNamedKeyActorRoles_EmptyKeysIsNoOp(t *testing.T) {
	repo := &fakeGranter{}
	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
	backfillNamedKeyActorRoles(context.Background(), repo, nil, logger)
	if len(repo.calls) != 0 {
		t.Errorf("Grant called %d times for empty keys, want 0", len(repo.calls))
	}
}

// TestBackfillNamedKeyActorRoles_GrantErrorIsNonFatal confirms the
// closure invariant that a Grant failure logs a warning and proceeds
// rather than crashing the server during boot. Subsequent keys still
// get processed.
func TestBackfillNamedKeyActorRoles_GrantErrorIsNonFatal(t *testing.T) {
	repo := &fakeGranter{err: errors.New("simulated DB error")}
	logger := slog.New(slog.NewTextHandler(io.Discard, nil))

	keys := []auth.NamedAPIKey{
		{Name: "alice", Key: "A", Admin: true},
		{Name: "bob", Key: "B", Admin: false},
	}
	// Should not panic.
	backfillNamedKeyActorRoles(context.Background(), repo, keys, logger)

	if len(repo.calls) != 2 {
		t.Errorf("Grant calls = %d, want 2 (every key processed even when prior Grant errored)", len(repo.calls))
	}
}

// TestBackfillNamedKeyActorRoles_NilLoggerIsSafe pins that callers
// passing nil for the logger don't NPE the goroutine. Belt-and-braces
// for tests + future call sites that may not have a logger plumbed.
func TestBackfillNamedKeyActorRoles_NilLoggerIsSafe(t *testing.T) {
	repo := &fakeGranter{err: errors.New("simulated")}
	keys := []auth.NamedAPIKey{
		{Name: "alice", Key: "A", Admin: true},
	}
	backfillNamedKeyActorRoles(context.Background(), repo, keys, nil)
	if len(repo.calls) != 1 {
		t.Errorf("Grant calls = %d, want 1", len(repo.calls))
	}
}
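Review note: the four tests above pin a small contract (admin keys map to r-admin, other keys to r-viewer, a failed Grant is logged and skipped, a nil logger is tolerated). The sketch below shows one shape of function that would satisfy that contract. It is not the project's implementation: `namedKey`, `actorRole`, `granter`, `backfill`, `recorder` and `demo` are hypothetical stand-ins, the role and tenant strings are copied from the test comments, and the failure count returned by `backfill` exists only to make the sketch easy to inspect.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log/slog"
)

// Hypothetical stand-ins for auth.NamedAPIKey and authdomain.ActorRole;
// the real types live in the certctl packages and may differ.
type namedKey struct {
	Name  string
	Admin bool
}

type actorRole struct {
	ActorID, ActorType, RoleID, TenantID, GrantedBy string
}

// granter is the single method the backfill needs from the repository.
type granter interface {
	Grant(ctx context.Context, ar *actorRole) error
}

// backfill grants one role per named key and never aborts on error.
// The returned failure count is an addition made for this sketch.
func backfill(ctx context.Context, repo granter, keys []namedKey, logger *slog.Logger) int {
	failed := 0
	for _, nk := range keys {
		role := "r-viewer"
		if nk.Admin {
			role = "r-admin"
		}
		err := repo.Grant(ctx, &actorRole{
			ActorID:   nk.Name,
			ActorType: "APIKey",
			RoleID:    role,
			TenantID:  "t-default",
			GrantedBy: "bootstrap",
		})
		if err != nil {
			failed++
			if logger != nil { // a nil logger must not panic
				logger.Warn("actor-role backfill failed", "key", nk.Name, "role", role, "err", err)
			}
		}
	}
	return failed
}

// recorder captures Grant calls and fails every one of them.
type recorder struct{ calls []*actorRole }

func (r *recorder) Grant(_ context.Context, ar *actorRole) error {
	r.calls = append(r.calls, ar)
	return errors.New("simulated DB error")
}

// demo runs the backfill against a failing repository with a nil logger.
func demo() (calls, failed int, roles []string) {
	rec := &recorder{}
	keys := []namedKey{{Name: "alice", Admin: true}, {Name: "bob"}}
	failed = backfill(context.Background(), rec, keys, nil)
	for _, c := range rec.calls {
		roles = append(roles, c.RoleID)
	}
	return len(rec.calls), failed, roles
}

func main() {
	calls, failed, roles := demo()
	fmt.Println(calls, failed, roles)
}
```

Continuing past a failed grant keeps a transient database error from blocking boot; the cost, as the warning text in the diff notes, is that the affected key authenticates but gets 403 on RBAC routes until a grant is added.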
+657
-43
@@ -5,6 +5,7 @@ import (
	"crypto"
	"crypto/tls"
	"crypto/x509"
	"encoding/json"
	"encoding/pem"
	"fmt"
	"log/slog"
@@ -21,6 +22,13 @@ import (
	"github.com/certctl-io/certctl/internal/api/handler"
	"github.com/certctl-io/certctl/internal/api/middleware"
	"github.com/certctl-io/certctl/internal/api/router"
	"github.com/certctl-io/certctl/internal/auth"
	"github.com/certctl-io/certctl/internal/auth/bootstrap"
	"github.com/certctl-io/certctl/internal/auth/breakglass"
	oidcsvc "github.com/certctl-io/certctl/internal/auth/oidc"
	oidcdomain "github.com/certctl-io/certctl/internal/auth/oidc/domain"
	"github.com/certctl-io/certctl/internal/auth/session"
	userdomain "github.com/certctl-io/certctl/internal/auth/user/domain"
	"github.com/certctl-io/certctl/internal/config"
	discoveryawssm "github.com/certctl-io/certctl/internal/connector/discovery/awssm"
	discoveryazurekv "github.com/certctl-io/certctl/internal/connector/discovery/azurekv"
@@ -32,11 +40,14 @@ import (
	notifyteams "github.com/certctl-io/certctl/internal/connector/notifier/teams"
	"github.com/certctl-io/certctl/internal/crypto/signer"
	"github.com/certctl-io/certctl/internal/domain"
	authdomainAlias "github.com/certctl-io/certctl/internal/domain/auth"
	"github.com/certctl-io/certctl/internal/ratelimit"
	"github.com/certctl-io/certctl/internal/repository"
	"github.com/certctl-io/certctl/internal/repository/postgres"
	"github.com/certctl-io/certctl/internal/scep/intune"
	"github.com/certctl-io/certctl/internal/scheduler"
	"github.com/certctl-io/certctl/internal/service"
	authsvc "github.com/certctl-io/certctl/internal/service/auth"
	"github.com/certctl-io/certctl/internal/trustanchor"
)

@@ -58,9 +69,22 @@ func main() {
	// unsupported auth shape. The error path uses fmt.Fprintf because
	// the slog logger is constructed from cfg below this point; we want
	// the failure to be visible regardless of log-level configuration.
	//
	// Auth Bundle 2 Phase 0: AuthTypeOIDC is in ValidAuthTypes() but the
	// session middleware + OIDC handler chain ship in later phases. An
	// operator who sets CERTCTL_AUTH_TYPE=oidc on a Bundle-2-incomplete
	// deployment must NOT silently fall back to api-key (the silent
	// auth-downgrade failure mode that drove G-1 in the first place).
	// The OIDC case below refuses-to-start with an actionable message.
	// Phase 6 of Bundle 2 (session middleware wiring) relaxes this case
	// to fall through alongside the api-key + none cases.
	switch config.AuthType(cfg.Auth.Type) {
	case config.AuthTypeAPIKey, config.AuthTypeNone:
		// ok — fall through
	case config.AuthTypeOIDC:
		fmt.Fprintf(os.Stderr,
			"CERTCTL_AUTH_TYPE=oidc: the OIDC auth chain is not yet wired in this build (Auth Bundle 2 Phase 6 ships the session middleware that consumes this auth-type literal). Set CERTCTL_AUTH_TYPE=api-key or run an authenticating gateway with CERTCTL_AUTH_TYPE=none until Bundle 2 lands. See cowork/auth-bundle-2-prompt.md.\n")
		os.Exit(1)
	default:
		fmt.Fprintf(os.Stderr,
			"unsupported auth type at runtime: %q (valid: %v) — config validation should have caught this; refusing to start\n",
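Review note: the switch in this hunk distinguishes three outcomes (usable, recognised but not yet wired, unknown) and refuses to start on the last two instead of falling back to a weaker mode. A minimal sketch of the same fail-closed gate as a testable function follows. The names `checkAuthType` and `errOIDCNotWired` are hypothetical, and the string values are taken from the `CERTCTL_AUTH_TYPE` settings named in the diff; the real code switches on typed `config.AuthType` constants and exits the process directly.

```go
package main

import (
	"errors"
	"fmt"
)

// errOIDCNotWired is a hypothetical sentinel for an auth type that config
// validation accepts but this build cannot serve yet.
var errOIDCNotWired = errors.New("auth type oidc is recognised but not wired in this build")

// checkAuthType returns nil only for auth types the build can serve.
// Anything else is an error so the caller can refuse to start rather
// than silently downgrade.
func checkAuthType(authType string) error {
	switch authType {
	case "api-key", "none":
		return nil
	case "oidc":
		return errOIDCNotWired
	default:
		return fmt.Errorf("unsupported auth type %q", authType)
	}
}

func main() {
	for _, t := range []string{"api-key", "none", "oidc", "ldap"} {
		fmt.Printf("%-8s -> %v\n", t, checkAuthType(t))
	}
}
```

Returning an error keeps the decision unit-testable; the caller in `main` would still print to stderr and exit, as the hunk does, because the logger is not constructed yet at that point.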
@@ -251,6 +275,301 @@ func main() {

	// Initialize services (following the dependency graph)
	auditService := service.NewAuditService(auditRepo)

	// Audit 2026-05-11 A-8 closure: detect residual actor-demo-anon
	// grants under non-`none` auth types. Defaults to WARN-only; flip
	// CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true to fail-closed. Closes
	// the deferred Phase 2 leg of the 2026-05-10 HIGH-12 closure.
	{
		preflightCtx, preflightCancel := context.WithTimeout(context.Background(), 5*time.Second)
		if err := preflightDemoModeResidual(preflightCtx, cfg, db, auditService, logger); err != nil {
			preflightCancel()
			logger.Error("startup refused: actor-demo-anon residual grants present + CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true",
				"error", err)
			os.Exit(1)
		}
		preflightCancel()
	}

	// RBAC primitive (Bundle 1 Phase 4). Wires the postgres auth repos
	// + service-layer Authorizer that the AuthHandler / RequirePermission
	// middleware uses. Migration 000029_rbac.up.sql provides the schema
	// and seeds the seven default roles + canonical permission catalogue
	// + actor-demo-anon synthetic admin (CERTCTL_AUTH_TYPE=none demo path).
	authRoleRepo := postgres.NewRoleRepository(db)
	authPermRepo := postgres.NewPermissionRepository(db)
	authActorRoleRepo := postgres.NewActorRoleRepository(db)
	authAPIKeyRepo := postgres.NewAPIKeyRepository(db)
	authAuthorizer := authsvc.NewAuthorizer(authActorRoleRepo)
	// authCheckerAdapter bridges authsvc.Authorizer (typed-string args)
	// to the auth.PermissionChecker interface (plain-string args) so
	// internal/auth doesn't have to import internal/service/auth.
	authCheckerAdapter := authPermissionCheckerAdapter{a: authAuthorizer}

	// Bundle 1 Phase 6 — parse env-var named API keys + assemble the
	// runtime keystore + wire the bootstrap service. The keystore +
	// bootstrap handler must exist before the HandlerRegistry is
	// constructed below; the auth middleware that reads from the same
	// keystore is wired further down (next to the rest of the
	// middleware stack) but holds a reference to the same keystore so
	// runtime additions from bootstrap propagate without restart.
	//
	// boot-path operations use context.Background() because the long-
	// lived request context isn't constructed until later in main();
	// this matches the convention used by other one-shot setup calls
	// in this section (issuerService.SeedFromEnvVars, etc.).
	bootCtx := context.Background()
	namedKeys := assembleNamedAPIKeys(cfg, logger)
	backfillNamedKeyActorRoles(bootCtx, authActorRoleRepo, namedKeys, logger)
	authKeyStore := auth.NewMutableKeyStore(namedKeys)
	if persistedKeys, err := authAPIKeyRepo.List(bootCtx, authdomainAlias.DefaultTenantID); err == nil {
		for _, pk := range persistedKeys {
			authKeyStore.AddHashed(pk.Name, pk.KeyHash, pk.Admin)
		}
		if len(persistedKeys) > 0 {
			logger.Info("loaded persisted api_keys into runtime keystore",
				"count", len(persistedKeys))
		}
	} else {
		logger.Warn("api_keys boot loader failed; bootstrap-minted keys will not authenticate until next restart that succeeds",
			"err", err)
	}
	bootstrapStrategy := bootstrap.NewEnvTokenStrategy(
		cfg.Auth.BootstrapToken,
		func(ctx context.Context) (bool, error) {
			return authActorRoleRepo.AdminExists(ctx, authdomainAlias.DefaultTenantID)
		},
	)
	bootstrapService := bootstrap.NewService(
		bootstrapStrategy,
		authAPIKeyRepo,
		authActorRoleRepo,
		auditService,
		authKeyStore,
		auth.HashAPIKey,
	)
	if cfg.Auth.BootstrapToken != "" {
		// Honour the prompt's "warn at startup if token set + admin
		// exists" requirement. The strategy re-probes on every Validate
		// so this boot-time warning is purely informational.
		if exists, probeErr := authActorRoleRepo.AdminExists(bootCtx, authdomainAlias.DefaultTenantID); probeErr == nil && exists {
			logger.Warn("CERTCTL_BOOTSTRAP_TOKEN set but admin actors already exist; bootstrap endpoint will return 410 Gone — unset the env var to silence this warning")
		} else if probeErr != nil {
			logger.Warn("CERTCTL_BOOTSTRAP_TOKEN admin-existence probe failed at startup; behaviour will be determined by the live probe at request time", "err", probeErr)
		} else {
			logger.Info("bootstrap endpoint enabled — POST /api/v1/auth/bootstrap to mint the first admin key (one-shot)")
		}
	}
	bootstrapHandler := handler.NewBootstrapHandler(bootstrapService)

	// =========================================================================
	// Auth Bundle 2 Phase 4 — session service.
	//
	// Wired AFTER migrations + RBAC backfill, BEFORE the HTTP listener
	// binds (per the prompt's "fail-fatal on bootstrap key mint failure"
	// requirement). EnsureInitialSigningKey is idempotent: if a non-
	// retired signing key already exists for the tenant the call is a
	// no-op; otherwise it mints a fresh 32-byte HMAC key, persists it,
	// and emits an auth.session_signing_key_bootstrap audit row with
	// event_category=auth.
	//
	// Failure here is fatal — the server refuses to boot rather than
	// serve session-less.
	//
	// The session service is wired into the scheduler below (sessionGCLoop)
	// so the GC sweep runs every CERTCTL_SESSION_GC_INTERVAL tick. The
	// HTTP middleware that consumes ValidateInput / ValidateCSRF lands
	// in Phase 5; pre-Phase-5 deployments boot the service so the GC
	// sweep can keep the sessions + signing-keys tables tidy.
	sessionRepo := postgres.NewSessionRepository(db)
	sessionKeyRepo := postgres.NewSessionSigningKeyRepository(db)
	// Audit 2026-05-10 LOW-5 closure — install the trusted-proxy CIDR
	// allowlist from CERTCTL_TRUSTED_PROXIES. Empty disables XFF trust.
	session.SetTrustedProxies(cfg.Auth.TrustedProxies)
	sessionService := session.NewService(
		sessionRepo,
		sessionKeyRepo,
		auditService,
		authdomainAlias.DefaultTenantID,
		session.Config{
			IdleTimeout:         cfg.Auth.Session.IdleTimeout,
			AbsoluteTimeout:     cfg.Auth.Session.AbsoluteTimeout,
			SigningKeyRetention: cfg.Auth.Session.SigningKeyRetention,
			BindIP:              cfg.Auth.Session.BindIP,
			BindUserAgent:       cfg.Auth.Session.BindUserAgent,
		},
		cfg.Encryption.ConfigEncryptionKey,
	)
	if err := sessionService.EnsureInitialSigningKey(bootCtx); err != nil {
		logger.Error("FATAL: session signing key bootstrap failed; refusing to boot", "err", err)
		os.Exit(1)
	}

	// =========================================================================
	// Auth Bundle 2 Phase 5 — OIDC service + pre-login store + Phase 5 handler.
	//
	// Wired AFTER sessionService (Phase 4) so the OIDC PreLoginAdapter
	// can sign pre-login cookies under the active SessionSigningKey.
	// =========================================================================
	oidcProviderRepo := postgres.NewOIDCProviderRepository(db)
	oidcMappingRepo := postgres.NewGroupRoleMappingRepository(db)
	oidcUserRepo := postgres.NewUserRepository(db)
	// Audit 2026-05-10 HIGH-5: thread CERTCTL_CONFIG_ENCRYPTION_KEY into the
	// pre-login repo so state/nonce/PKCE-verifier are encrypted at rest. Same
	// key already protects OIDC client secrets and session signing keys.
	oidcPreLoginRepo := postgres.NewPreLoginRepository(db, cfg.Encryption.ConfigEncryptionKey)
	preLoginAdapter := oidcsvc.NewPreLoginAdapter(
		oidcPreLoginRepo,
		sessionKeyRepo, // Phase 4 SessionSigningKeyRepository
		authdomainAlias.DefaultTenantID,
		cfg.Encryption.ConfigEncryptionKey,
	)
	// SessionMinter port for the OIDC service. The OIDC HandleCallback
	// uses this to mint the post-login session after successful token
	// validation + group→role mapping.
	oidcSessionMinter := &sessionMinterAdapter{svc: sessionService}
	oidcService := oidcsvc.NewService(
		oidcProviderRepo,
		oidcMappingRepo,
		oidcUserRepo,
		oidcSessionMinter,
		preLoginAdapter,
		cfg.Encryption.ConfigEncryptionKey,
	)
	// Audit 2026-05-10 MED-16 — apply per-leg pre-login UA / IP
	// binding enforcement toggles from config.
	oidcService.SetPreLoginBindingRequirements(
		cfg.Auth.OIDCPreLoginRequireUA,
		cfg.Auth.OIDCPreLoginRequireIP,
	)
	// SameSite resolution from CERTCTL_SESSION_SAMESITE (default Lax;
	// "Strict" for high-security environments at the cost of breaking
	// inbound deep-links from external apps).
	sameSiteMode := http.SameSiteLaxMode
	if strings.EqualFold(cfg.Auth.Session.SameSite, "Strict") {
		sameSiteMode = http.SameSiteStrictMode
	}
	// Audit 2026-05-10 HIGH-3 — BCL iat-skew window + jti consumed-set.
	bclMaxAge := time.Duration(cfg.Auth.OIDCBCLMaxAgeSeconds) * time.Second
	if bclMaxAge <= 0 {
		bclMaxAge = handler.DefaultBCLVerifierMaxAge
	}
	bclReplayRepo := postgres.NewBCLReplayRepository(db)
	authSessionOIDCHandler := handler.NewAuthSessionOIDCHandler(
		oidcService,
		sessionService,
		handler.NewDefaultBCLVerifier(oidcProviderRepo, authdomainAlias.DefaultTenantID, nil).WithMaxAge(bclMaxAge),
		oidcProviderRepo,
		oidcMappingRepo,
		sessionRepo,
		oidcUserRepo, // CRIT-2: BCL sub→actor_id lookup via users.GetByOIDCSubject
		auditService,
		cfg.Encryption.ConfigEncryptionKey,
		authdomainAlias.DefaultTenantID,
		"/", // post-login redirect target; GUI dashboard
		handler.SessionCookieAttrs{
			SameSite: sameSiteMode,
			Secure:   true,
		},
	).WithBCLReplayConsumer(bclReplayRepo, bclMaxAge). // HIGH-3 jti consumed-set.
								WithPermissionChecker(authCheckerAdapter) // MED-2 auth.session.list.all gate.

	// =========================================================================
	// Auth Bundle 2 Phase 7 — OIDC first-admin bootstrap hook.
	//
	// Wired AFTER oidcService is constructed. The hook closure consults
	// the configured CERTCTL_BOOTSTRAP_ADMIN_GROUPS + the AdminExists
	// probe; on first match it grants r-admin via the ActorRoleRepository
	// + emits a bootstrap.oidc_first_admin audit row. Subsequent
	// admin-already-exists logins return grantAdmin=false silently.
	// Disabled (no-op) when CERTCTL_BOOTSTRAP_ADMIN_GROUPS is empty.
	if len(cfg.Auth.BootstrapAdminGroups) > 0 {
		bootstrapGroups := make(map[string]struct{}, len(cfg.Auth.BootstrapAdminGroups))
		for _, g := range cfg.Auth.BootstrapAdminGroups {
			bootstrapGroups[strings.TrimSpace(g)] = struct{}{}
		}
		bootstrapProviderID := cfg.Auth.BootstrapOIDCProviderID
		oidcService.SetAdminBootstrapHook(func(ctx context.Context, providerID string, groups []string, userID string) (bool, error) {
			// Provider-specificity: when configured, only the named
			// provider is eligible for bootstrap.
			if bootstrapProviderID != "" && providerID != bootstrapProviderID {
				return false, nil
			}
			// Admin-already-exists: bootstrap mode is disabled once
			// any actor in the tenant holds r-admin.
			adminExists, probeErr := authActorRoleRepo.AdminExists(ctx, authdomainAlias.DefaultTenantID)
			if probeErr != nil {
				return false, fmt.Errorf("admin existence probe: %w", probeErr)
			}
			if adminExists {
				return false, nil
			}
			// Group intersection check.
			matched := false
			for _, g := range groups {
				if _, ok := bootstrapGroups[g]; ok {
					matched = true
					break
				}
			}
			if !matched {
				return false, nil
			}
			// Match. Grant r-admin via the actor-role repo.
			grant := &authdomainAlias.ActorRole{
				ActorID:   userID,
				ActorType: authdomainAlias.ActorTypeValue("User"),
				RoleID:    authdomainAlias.RoleIDAdmin,
				TenantID:  authdomainAlias.DefaultTenantID,
				GrantedBy: "oidc-bootstrap",
			}
			if gerr := authActorRoleRepo.Grant(ctx, grant); gerr != nil {
				return false, fmt.Errorf("grant r-admin: %w", gerr)
			}
			// Emit audit row with event_category=auth.
			_ = auditService.RecordEventWithCategory(ctx, userID, domain.ActorTypeUser,
				"bootstrap.oidc_first_admin", domain.EventCategoryAuth,
				"users", userID,
				map[string]interface{}{
					"user_id":     userID,
					"provider_id": providerID,
					"trigger":     "oidc_group_match",
				})
			logger.Info("OIDC first-admin bootstrap fired — user granted r-admin",
				"user_id", userID, "provider_id", providerID)
			return true, nil
		})
		logger.Info("OIDC first-admin bootstrap enabled",
			"groups", cfg.Auth.BootstrapAdminGroups,
			"provider_id_filter", bootstrapProviderID)
	}

	// =========================================================================
	// Auth Bundle 2 Phase 7.5 — break-glass admin service + handler.
	// =========================================================================
	breakglassRepo := postgres.NewBreakglassCredentialRepository(db)
	breakglassService := breakglass.NewService(
		breakglassRepo,
		auditService,
		breakglassSessionMinterAdapter{svc: sessionService},
		breakglass.Config{
			Enabled:              cfg.Auth.Breakglass.Enabled,
			LockoutThreshold:     cfg.Auth.Breakglass.LockoutThreshold,
			LockoutDuration:      cfg.Auth.Breakglass.LockoutDuration,
			LockoutResetInterval: cfg.Auth.Breakglass.LockoutResetInterval,
		},
		authdomainAlias.DefaultTenantID,
	)
	breakglassHandler := handler.NewAuthBreakglassHandler(breakglassService, handler.SessionCookieAttrs{
		SameSite: sameSiteMode,
		Secure:   true,
	})
	if cfg.Auth.Breakglass.Enabled {
		logger.Warn("CERTCTL_BREAKGLASS_ENABLED=true — break-glass admin path is ACTIVE; this bypasses SSO. Disable in steady-state.",
			"lockout_threshold", cfg.Auth.Breakglass.LockoutThreshold,
			"lockout_duration", cfg.Auth.Breakglass.LockoutDuration.String())
	}

	policyService := service.NewPolicyService(policyRepo, auditService)
	policyService.SetCertRepo(certificateRepo) // D-008: CertificateLifetime arm needs CertificateVersion.NotBefore/NotAfter
	// G-1: RenewalPolicyService — distinct from PolicyService (compliance rules).
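Review note: the OIDC first-admin hook in this hunk interleaves three checks (provider filter, admin-already-exists, group intersection) with the grant and audit side effects. The sketch below isolates only the decision so the ordering can be exercised without a database. `bootstrapPolicy`, `newBootstrapPolicy` and `shouldGrant` are hypothetical names, the provider and group values are made up for illustration, and `adminExists` is passed in as a plain bool where the real hook calls `AdminExists` on the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapPolicy is a hypothetical, side-effect-free model of the hook's checks.
type bootstrapPolicy struct {
	groups     map[string]struct{} // configured bootstrap groups, trimmed
	providerID string              // empty means any provider is eligible
}

// newBootstrapPolicy trims each configured group, as the hunk does when it
// builds its lookup set.
func newBootstrapPolicy(groups []string, providerID string) bootstrapPolicy {
	set := make(map[string]struct{}, len(groups))
	for _, g := range groups {
		set[strings.TrimSpace(g)] = struct{}{}
	}
	return bootstrapPolicy{groups: set, providerID: providerID}
}

// shouldGrant reports whether a login qualifies for the one-time admin grant.
// The checks run in the same order as in the hook: provider, existing admin,
// then group membership.
func (p bootstrapPolicy) shouldGrant(providerID string, userGroups []string, adminExists bool) bool {
	if p.providerID != "" && providerID != p.providerID {
		return false
	}
	if adminExists {
		return false
	}
	for _, g := range userGroups {
		if _, ok := p.groups[g]; ok {
			return true
		}
	}
	return false
}

func main() {
	p := newBootstrapPolicy([]string{" platform-admins ", "sre"}, "idp-main")
	fmt.Println(p.shouldGrant("idp-main", []string{"dev", "sre"}, false))  // eligible
	fmt.Println(p.shouldGrant("idp-other", []string{"sre"}, false))        // wrong provider
	fmt.Println(p.shouldGrant("idp-main", []string{"sre"}, true))          // admin already exists
	fmt.Println(p.shouldGrant("idp-main", []string{"dev"}, false))         // no matching group
}
```

Note that only the configured groups are trimmed; groups arriving in the token are compared as-is, which matches the hunk.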
@@ -363,7 +682,7 @@ func main() {
	notificationService.SetOwnerRepo(ownerRepo)

	// Rank 4 of the 2026-05-03 Infisical deep-research deliverable
	// (cowork/infisical-deep-research-results.md Part 5). Per-policy
	// (per the project's deep-research deliverable, Part 5). Per-policy
	// multi-channel expiry-alert metrics. Same instance is wired into
	// the notification service (recording side, every
	// SendThresholdAlertOnChannel call reports its outcome) AND into
@@ -483,6 +802,36 @@ func main() {
	defer issuerRegistry.StopLifecycles()
	targetService := service.NewTargetService(targetRepo, auditService, agentRepo, encryptionKey, logger)
	profileService := service.NewProfileService(profileRepo, auditService)
	// Bundle 1 Phase 9 — approval-bypass closure. Wire the profile
	// service's gate to the existing ApprovalService so edits to a
	// RequiresApproval=true profile route through the four-eyes
	// workflow. The profile-edit-apply callback registered on the
	// ApprovalService closes the loop: when an approver decides,
	// the callback deserializes req.Payload and persists the diff.
	profileService.SetApprovalService(approvalService)
	approvalService.SetProfileEditApply(func(ctx context.Context, req *domain.ApprovalRequest) error {
		var pendingProfile domain.CertificateProfile
		if err := json.Unmarshal(req.Payload, &pendingProfile); err != nil {
			return fmt.Errorf("decode profile-edit payload: %w", err)
		}
		pendingProfile.ID = req.ProfileID
		if err := profileRepo.Update(ctx, &pendingProfile); err != nil {
			return fmt.Errorf("apply profile-edit diff: %w", err)
		}
		// Audit row category=auth so the auditor surface keeps the
		// approval-decision history grouped with the request side.
		if auditService != nil {
			_ = auditService.RecordEventWithCategory(ctx, "approval-system",
				domain.ActorTypeSystem, "profile.edit_applied",
				domain.EventCategoryAuth, "certificate_profile",
				req.ProfileID,
				map[string]interface{}{
					"approval_id":  req.ID,
					"requested_by": req.RequestedBy,
				})
		}
		return nil
	})
	teamService := service.NewTeamService(teamRepo, auditService)
	ownerService := service.NewOwnerService(ownerRepo, auditService)
	agentGroupRepo := postgres.NewAgentGroupRepository(db)
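Review note: the profile-edit-apply callback in this hunk decodes the stored payload and then overwrites the decoded ID with `req.ProfileID` before persisting. One plausible reading is that the approval record, not the payload, decides which profile gets updated. The sketch below shows that step in isolation; `profile`, `approvalRequest` and `decodePendingProfile` are hypothetical stand-ins for the domain types, and the JSON field names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// profile and approvalRequest are hypothetical, trimmed-down stand-ins for
// domain.CertificateProfile and domain.ApprovalRequest.
type profile struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

type approvalRequest struct {
	ID        string
	ProfileID string
	Payload   []byte
}

// decodePendingProfile unmarshals the pending edit and pins its ID to the
// profile named on the approval request, ignoring any ID inside the payload.
func decodePendingProfile(req approvalRequest) (profile, error) {
	var p profile
	if err := json.Unmarshal(req.Payload, &p); err != nil {
		return profile{}, fmt.Errorf("decode profile-edit payload: %w", err)
	}
	p.ID = req.ProfileID
	return p, nil
}

func main() {
	req := approvalRequest{
		ID:        "apr-1",
		ProfileID: "prof-web",
		Payload:   []byte(`{"id":"prof-other","name":"web tls"}`),
	}
	p, err := decodePendingProfile(req)
	fmt.Println(p.ID, p.Name, err)

	_, err = decodePendingProfile(approvalRequest{ProfileID: "prof-web", Payload: []byte(`{`)})
	fmt.Println(err != nil)
}
```

Under this reading, a payload that names a different profile cannot redirect an approved edit, because the ID is always taken from the request that the approver actually reviewed.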
@@ -661,6 +1010,18 @@ func main() {
	// Bundle-5 / H-006: pass the *sql.DB pool so /ready can probe DB
	// connectivity via PingContext. /health stays shallow (liveness signal).
	healthHandler := handler.NewHealthHandler(cfg.Auth.Type, db)
	// Bundle 1 Phase 3 closure (M1): wire the AuthCheckResolver so
	// /v1/auth/check returns the caller's standing roles + effective
	// permissions in the same response. The shim is tiny — just a type-
	// erasure wrap around the repo so the handler layer doesn't have to
	// import internal/domain/auth or internal/repository/postgres.
	healthHandler.Resolver = authCheckResolverAdapter{repo: authActorRoleRepo}
	// Bundle 2 Phase 6 / Category E — wire the OIDC providers resolver
	// so GET /api/v1/auth/info returns the configured provider list
	// (id + display_name + login_url) for the GUI's Login page button
	// rendering. The shim adapts the postgres OIDCProviderRepository
	// to the handler's narrow OIDCProvidersListResolver projection.
	healthHandler.OIDCProvidersResolver = oidcProvidersListAdapter{repo: oidcProviderRepo}
	// U-3 ride-along (cat-u-no_version_endpoint, P2): the version handler
	// answers GET /api/v1/version with build identity (ldflags Version,
	// VCS commit/dirty/timestamp, Go runtime version). Wired through the
@@ -811,6 +1172,19 @@ func main() {
	sched.SetJobTimeoutInterval(cfg.Scheduler.JobTimeoutInterval)
	sched.SetAwaitingCSRTimeout(cfg.Scheduler.AwaitingCSRTimeout)
	sched.SetAwaitingApprovalTimeout(cfg.Scheduler.AwaitingApprovalTimeout)

	// Auth Bundle 2 Phase 4 — wire the session-GC sweep. The service
	// itself was constructed (with the EnsureInitialSigningKey fail-
	// fatal call) above the policy/cert-service block; here we just
	// register it with the scheduler so the loop fires every
	// CERTCTL_SESSION_GC_INTERVAL.
	sched.SetSessionGarbageCollector(sessionService)
	sched.SetBCLReplayGarbageCollector(bclReplayRepo) // Audit 2026-05-10 HIGH-3.
	sched.SetSessionGCInterval(cfg.Auth.Session.GCInterval)
	logger.Info("session GC sweep enabled",
		"interval", cfg.Auth.Session.GCInterval.String(),
		"absolute_timeout", cfg.Auth.Session.AbsoluteTimeout.String(),
		"signing_key_retention", cfg.Auth.Session.SigningKeyRetention.String())
	logger.Info("job timeout reaper enabled",
		"interval", cfg.Scheduler.JobTimeoutInterval.String(),
		"csr_timeout", cfg.Scheduler.AwaitingCSRTimeout.String(),
@@ -961,6 +1335,90 @@ func main() {
		// Rank 8 of the 2026-05-03 deep-research deliverable. See
		// docs/intermediate-ca-hierarchy.md.
		IntermediateCAs: intermediateCAHandler,
		// AuthSessionOIDC — Auth Bundle 2 Phase 5 OIDC + session HTTP
		// surface. 13 endpoints across login flow + session management
		// + OIDC provider CRUD + group-mapping CRUD.
		AuthSessionOIDC: authSessionOIDCHandler,

		// AuthBreakglass — Auth Bundle 2 Phase 7.5 break-glass admin
		// HTTP surface. 4 endpoints (1 public login + 3 admin CRUD).
		// All endpoints return 404 when CERTCTL_BREAKGLASS_ENABLED=false.
		AuthBreakglass: breakglassHandler,

		// Audit 2026-05-10 MED-11 — federated-user admin surface.
		AuthUsers: handler.NewAuthUsersHandler(
			oidcUserRepo,
			sessionService, // satisfies UserSessionsRevoker via RevokeAllForActor
			auditService,
			authdomainAlias.DefaultTenantID,
		),

		// Audit 2026-05-10 MED-12 — runtime config read endpoint.
		AuthRuntimeConfig: handler.NewAuthRuntimeConfigHandler(
			func() map[string]string {
				// Lazy build — re-read cfg.Auth.* values on every call so
				// post-startup re-evaluation reflects any (future) mutation.
				return map[string]string{
					"CERTCTL_AUTH_TYPE":                    string(cfg.Auth.Type),
					"CERTCTL_SESSION_SAMESITE":             cfg.Auth.Session.SameSite,
					"CERTCTL_OIDC_BCL_MAX_AGE_SECONDS":     strconv.Itoa(cfg.Auth.OIDCBCLMaxAgeSeconds),
					"CERTCTL_OIDC_PRELOGIN_REQUIRE_UA":     strconv.FormatBool(cfg.Auth.OIDCPreLoginRequireUA),
					"CERTCTL_OIDC_PRELOGIN_REQUIRE_IP":     strconv.FormatBool(cfg.Auth.OIDCPreLoginRequireIP),
					"CERTCTL_BREAKGLASS_ENABLED":           strconv.FormatBool(cfg.Auth.Breakglass.Enabled),
					"CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD": strconv.Itoa(cfg.Auth.Breakglass.LockoutThreshold),
					"CERTCTL_DEMO_MODE_ACK":                strconv.FormatBool(cfg.Auth.DemoModeAck),
					"CERTCTL_TRUSTED_PROXIES_COUNT":        strconv.Itoa(len(cfg.Auth.TrustedProxies)),
					"CERTCTL_BOOTSTRAP_TOKEN_SET":          strconv.FormatBool(cfg.Auth.BootstrapToken != ""),
					"CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID":   cfg.Auth.BootstrapOIDCProviderID,
					"CERTCTL_BOOTSTRAP_ADMIN_GROUPS_COUNT": strconv.Itoa(len(cfg.Auth.BootstrapAdminGroups)),
				}
			},
			auditService,
		),

		// Audit 2026-05-10 MED-7 — per-provider JWKS health surface.
		AuthOIDCJWKSStatus: handler.NewAuthOIDCJWKSStatusHandler(oidcService, auditService),
		// Auth — RBAC primitive (Bundle 1 Phase 4). Wires the postgres
		// auth repos + service-layer Authorizer / RoleService /
		// ActorRoleService / PermissionService into the HTTP surface
		// under /api/v1/auth/*. The service layer enforces every
		// permission gate (auth.role.* + auth.role.assign privilege-
		// escalation guard); the Phase 3 RequirePermission middleware
		// is currently used by these RBAC routes via the in-handler
		// callerFromRequest path. Phase 3.5 router-wrapping conversion
		// of the legacy admin handlers (bulk_revocation, admin_*,
		// intermediate_ca) is the remaining sweep.
		Auth: handler.NewAuthHandler(
			authsvc.NewRoleService(authRoleRepo, authPermRepo, authAuthorizer, auditService),
			authsvc.NewPermissionService(authPermRepo),
			authsvc.NewActorRoleService(authActorRoleRepo, authRoleRepo, authAuthorizer, auditService),
			authCheckerAdapter,
		).WithCSRFRotator(sessionService), // Audit 2026-05-10 HIGH-2 — CSRF rotation on role mutation.
		// Bundle 1 Phase 6 — bootstrap day-0 admin endpoint. The
		// service is wired above; handler is auth-exempt at the
		// router (gated by the bootstrap.Strategy itself).
		Bootstrap: bootstrapHandler,
		// Audit 2026-05-11 A-8 closure — demo-mode residual cleanup.
		// The cleanup closure captures the live *sql.DB pool so the
		// handler doesn't pull repository.* / database/sql into the
		// internal/api/handler import set. authType is a closure over
		// cfg so the live config value is always read at request time.
		DemoResidual: handler.NewDemoResidualHandler(
			func(ctx context.Context) (int64, error) { return deleteDemoAnonResidue(ctx, db) },
			func() string { return cfg.Auth.Type },
			auditService,
		),
		// Checker is the load-bearing auth.PermissionChecker that
		// auth.RequirePermission middleware uses to gate the legacy admin
		// handlers (Bundle 1 Phase 3.5: bulk_revocation, admin_crl_cache,
		// admin_scep_intune, admin_est, intermediate_ca). Wraps live in
		// router.go via rbacGate(reg.Checker, perm, handler).
		Checker: authCheckerAdapter,
		// Audit 2026-05-10 CRIT-3 closure — operator-configured CORS
		// applied to the credentialed auth-exempt routes (OIDC handshake,
		// BCL, logout, bootstrap, breakglass-login). Health probes
		// continue to use middleware.CORSWildcard.
		CorsCfg: middleware.CORSConfig{AllowedOrigins: cfg.CORS.AllowedOrigins},
	})
	// Register EST (RFC 7030) handlers if enabled.
	//
@@ -1477,49 +1935,31 @@ func main() {
|
||||
|
||||
// Build middleware stack.
|
||||
//
|
||||
// Authentication unification (M-002): every authenticated request now
|
||||
// carries a named actor in the request context so audit events record
|
||||
// the real key identity instead of the hardcoded "api-key-user" string.
|
||||
// Named keys come from CERTCTL_API_KEYS_NAMED (preferred). For backward
|
||||
// compatibility CERTCTL_AUTH_SECRET is synthesized into legacy-key-N
|
||||
// entries with Admin=false.
|
||||
var namedKeys []middleware.NamedAPIKey
|
||||
if config.AuthType(cfg.Auth.Type) != config.AuthTypeNone {
|
||||
// Translate typed config.NamedAPIKey -> middleware.NamedAPIKey. The
|
||||
// two structs are field-compatible but live in different packages to
|
||||
// preserve the config→middleware dependency direction.
|
||||
for _, nk := range cfg.Auth.NamedKeys {
|
||||
namedKeys = append(namedKeys, middleware.NamedAPIKey{
|
||||
Name: nk.Name,
|
||||
Key: nk.Key,
|
||||
Admin: nk.Admin,
|
||||
})
|
||||
}
|
||||
// Back-compat: if no named keys but legacy Secret is configured,
|
||||
// synthesize named entries so the audit trail still attributes the
|
||||
// action (instead of falling back to "api-key-user" / "anonymous").
|
||||
if len(namedKeys) == 0 && cfg.Auth.Secret != "" {
|
||||
parts := strings.Split(cfg.Auth.Secret, ",")
|
||||
idx := 0
|
||||
for _, p := range parts {
|
||||
p = strings.TrimSpace(p)
|
||||
if p == "" {
|
||||
continue
|
||||
}
|
||||
namedKeys = append(namedKeys, middleware.NamedAPIKey{
|
||||
Name: fmt.Sprintf("legacy-key-%d", idx),
|
||||
Key: p,
|
||||
Admin: false,
|
||||
})
|
||||
idx++
|
||||
}
|
||||
if len(namedKeys) > 0 {
|
||||
logger.Warn("CERTCTL_AUTH_SECRET is deprecated — set CERTCTL_API_KEYS_NAMED for named actor attribution and admin gating",
|
||||
"synthesized_keys", len(namedKeys))
|
||||
}
|
||||
}
|
||||
// Bundle 1 Phase 6: namedKeys + authKeyStore + bootstrap service
|
||||
// are now constructed earlier (right after the auth repos) so the
|
||||
// HandlerRegistry can wire the bootstrap handler. The auth
|
||||
// middleware below reads from the same authKeyStore reference, so
|
||||
// runtime additions from bootstrap propagate without restart.
|
||||
var bearerMiddleware func(http.Handler) http.Handler
|
||||
switch config.AuthType(cfg.Auth.Type) {
|
||||
case config.AuthTypeNone:
|
||||
bearerMiddleware = auth.NewDemoModeAuth()
|
||||
default:
|
||||
bearerMiddleware = auth.NewAuthWithKeyStore(authKeyStore)
|
||||
}
|
||||
authMiddleware := middleware.NewAuthWithNamedKeys(namedKeys)
|
||||
// Auth Bundle 2 Phase 6 — chained-auth middleware. Tries the
|
||||
// `certctl_session` cookie first (sessionMW); on miss / invalid,
|
||||
// falls back to the API-key Bearer middleware. If neither
|
||||
// authenticates, 401. The session middleware is a pass-through
|
||||
// when sessionService is nil (pre-Bundle-2 builds).
|
||||
sessionMW := session.NewSessionMiddleware(sessionService)
|
||||
authMiddleware := session.ChainAuthSessionThenBearer(sessionMW, bearerMiddleware)
|
||||
// CSRF middleware — gates state-changing methods (POST/PUT/DELETE/
|
||||
// PATCH) for session-authenticated requests. API-key actors are
|
||||
// CSRF-exempt (not browser-driven). Pass-through when
|
||||
// sessionService is nil.
|
||||
csrfMiddleware := session.NewCSRFMiddleware(sessionService)
|
||||
_ = bootstrapHandler // referenced by HandlerRegistry above
|
||||
corsMiddleware := middleware.NewCORS(middleware.CORSConfig{
|
||||
AllowedOrigins: cfg.CORS.AllowedOrigins,
|
||||
})
|
||||
@@ -1567,7 +2007,10 @@ func main() {
|
||||
bodyLimitMiddleware,
|
||||
securityHeadersMiddleware,
|
||||
corsMiddleware,
|
||||
// Phase 6 chain: Auth (session-then-Bearer fallback) → CSRF
|
||||
// (state-changing only; API-key actors exempt) → Audit.
|
||||
authMiddleware,
|
||||
csrfMiddleware,
|
||||
auditMiddleware.Middleware,
|
||||
}
|
||||
|
||||
@@ -1589,7 +2032,10 @@ func main() {
|
||||
bodyLimitMiddleware,
|
||||
rateLimiter,
|
||||
corsMiddleware,
|
||||
// Phase 6 chain: Auth (session-then-Bearer fallback) → CSRF
|
||||
// (state-changing only; API-key actors exempt) → Audit.
|
||||
authMiddleware,
|
||||
csrfMiddleware,
|
||||
auditMiddleware.Middleware,
|
||||
}
|
||||
logger.Info("rate limiting enabled", "rps", cfg.RateLimit.RPS, "burst", cfg.RateLimit.BurstSize)
|
||||
@@ -2231,3 +2677,171 @@ func buildFinalHandler(apiHandler, noAuthHandler http.Handler, webDir string, da
|
||||
http.ServeFile(w, r, webDir+"/index.html")
|
||||
})
|
||||
}
|
||||
|
||||
// authPermissionCheckerAdapter bridges the typed-string Authorizer
|
||||
// signature (authsvc.Authorizer.CheckPermission takes
|
||||
// authdomain.ActorTypeValue + authdomain.ScopeType) to the plain-string
|
||||
// auth.PermissionChecker interface used by the auth.RequirePermission
|
||||
// middleware factory. Lives in cmd/server so internal/auth doesn't have
|
||||
// to import internal/service/auth + internal/domain/auth (would create
|
||||
// a cycle).
|
||||
type authPermissionCheckerAdapter struct {
|
||||
a *authsvc.Authorizer
|
||||
}
|
||||
|
||||
func (ad authPermissionCheckerAdapter) CheckPermission(
|
||||
ctx context.Context,
|
||||
actorID string,
|
||||
actorType string,
|
||||
tenantID string,
|
||||
permission string,
|
||||
scopeType string,
|
||||
scopeID *string,
|
||||
) (bool, error) {
|
||||
return ad.a.CheckPermission(
|
||||
ctx,
|
||||
actorID,
|
||||
authdomainAlias.ActorTypeValue(actorType),
|
||||
tenantID,
|
||||
permission,
|
||||
authdomainAlias.ScopeType(scopeType),
|
||||
scopeID,
|
||||
)
|
||||
}
|
||||
|
||||
// authCheckResolverAdapter bridges the postgres ActorRoleRepository
|
||||
// (authdomain.ActorTypeValue) to handler.AuthCheckResolver
|
||||
// (domain.ActorType). Lives in cmd/server so the handler layer keeps its
|
||||
// existing import set; the GUI's /v1/auth/check probe round-trips
|
||||
// through this on every page load. Read-only — no caller / no audit row.
|
||||
//
|
||||
// Bundle 1 Phase 3 closure (M1): the equivalent surface area on
|
||||
// /v1/auth/me runs through the service layer's auth.role.list permission
|
||||
// gate, which the GUI may not yet hold during initial render. AuthCheck
|
||||
// has no permission gate (its only requirement is "the request
|
||||
// authenticated"), so the bypass is by design.
|
||||
type authCheckResolverAdapter struct {
|
||||
repo *postgres.ActorRoleRepository
|
||||
}
|
||||
|
||||
func (ad authCheckResolverAdapter) ListRoles(
|
||||
ctx context.Context,
|
||||
actorID string,
|
||||
actorType domain.ActorType,
|
||||
tenantID string,
|
||||
) ([]*authdomainAlias.ActorRole, error) {
|
||||
return ad.repo.ListByActor(ctx, actorID, authdomainAlias.ActorTypeValue(actorType), tenantID)
|
||||
}
|
||||
|
||||
func (ad authCheckResolverAdapter) EffectivePermissions(
|
||||
ctx context.Context,
|
||||
actorID string,
|
||||
actorType domain.ActorType,
|
||||
tenantID string,
|
||||
) ([]repository.EffectivePermission, error) {
|
||||
return ad.repo.EffectivePermissions(ctx, actorID, authdomainAlias.ActorTypeValue(actorType), tenantID)
|
||||
}
|
||||
|
||||
// =============================================================================
|
||||
// sessionMinterAdapter — bridge from *session.Service to oidcsvc.SessionMinter.
|
||||
//
|
||||
// The OIDC service's SessionMinter port (Phase 3) takes a *userdomain.User
|
||||
// + role IDs and returns (cookie, csrf, err). The session.Service's
|
||||
// Create method takes (actorID, actorType, ip, ua) -> *CreateResult.
|
||||
// This adapter unwraps the User into actorID/actorType + reshapes the
|
||||
// return tuple. Lives in cmd/server so the session package doesn't have
|
||||
// to know about user.User and the user package doesn't have to know
|
||||
// about session.CreateResult.
|
||||
// =============================================================================
|
||||
|
||||
type sessionMinterAdapter struct {
|
||||
svc *session.Service
|
||||
}
|
||||
|
||||
func (a *sessionMinterAdapter) MintForUser(
|
||||
ctx context.Context,
|
||||
user *userdomain.User,
|
||||
_ []string, // roleIDs unused at the session-mint layer; the rbac middleware looks them up at request time
|
||||
ip, userAgent string,
|
||||
) (cookieValue, csrfToken string, err error) {
|
||||
if user == nil {
|
||||
return "", "", fmt.Errorf("session mint: user is nil")
|
||||
}
|
||||
res, err := a.svc.Create(ctx, user.ID, string(domain.ActorTypeUser), ip, userAgent)
|
||||
if err != nil {
|
||||
return "", "", err
|
||||
}
|
||||
return res.CookieValue, res.CSRFToken, nil
|
||||
}
|
||||
|
||||
// silenceUnusedImports keeps the new oidcsvc + oidcdomain imports load-
|
||||
// bearing in case any file shuffles. Linker dead-code elimination handles
|
||||
// the runtime cost.
|
||||
var (
|
||||
_ = oidcdomain.OIDCProvider{}
|
||||
)
|
||||
|
||||
// =============================================================================
|
||||
// breakglassSessionMinterAdapter — bridge from *session.Service to
|
||||
// breakglass.SessionMinter.
|
||||
//
|
||||
// The break-glass service's SessionMinter port (Phase 7.5) returns
|
||||
// (cookie, csrf, err); the underlying *session.Service.Create returns
|
||||
// *CreateResult. This adapter unwraps the result. Lives in cmd/server
|
||||
// so the breakglass package doesn't have to know about session.Service.
|
||||
// =============================================================================
|
||||
|
||||
type breakglassSessionMinterAdapter struct {
|
||||
svc *session.Service
|
||||
}
|
||||
|
||||
func (a breakglassSessionMinterAdapter) Create(ctx context.Context, actorID, actorType, ip, userAgent string) (string, string, error) {
|
||||
res, err := a.svc.Create(ctx, actorID, actorType, ip, userAgent)
|
||||
if err != nil {
|
||||
return "", "", err
|
||||
}
|
||||
return res.CookieValue, res.CSRFToken, nil
|
||||
}
|
||||
|
||||
// RevokeAllForActor — Audit 2026-05-10 HIGH-1 wire. After a break-glass
|
||||
// password rotation or credential removal, every active session for the
|
||||
// target actor must be revoked so a phished-then-rotated credential
|
||||
// doesn't leave the attacker's session live.
|
||||
func (a breakglassSessionMinterAdapter) RevokeAllForActor(ctx context.Context, actorID, actorType string) error {
|
||||
return a.svc.RevokeAllForActor(ctx, actorID, actorType)
|
||||
}
|
||||
|
||||
// oidcProvidersListAdapter bridges the postgres OIDCProviderRepository
|
||||
// to handler.OIDCProvidersListResolver. The handler returns
|
||||
// []*OIDCProviderInfo (id + display_name + login_url) for the public-
|
||||
// safe GUI Login-page payload; the repo returns the full OIDCProvider
|
||||
// row. The adapter projects + maps the login_url shape that
|
||||
// /auth/oidc/login?provider=<id> expects. Auth Bundle 2 Phase 6 /
|
||||
// Category E.
|
||||
type oidcProvidersListAdapter struct {
|
||||
repo repository.OIDCProviderRepository
|
||||
}
|
||||
|
||||
func (a oidcProvidersListAdapter) List(ctx context.Context, tenantID string) ([]*handler.OIDCProviderInfo, error) {
|
||||
provs, err := a.repo.List(ctx, tenantID)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
out := make([]*handler.OIDCProviderInfo, 0, len(provs))
|
||||
for _, p := range provs {
|
||||
// Audit 2026-05-10 MED-9 closure — filter disabled providers
|
||||
// at the adapter so the LoginPage's "Sign in with X" buttons
|
||||
// don't render for offline IdPs. The HandleAuthRequest
|
||||
// service-layer ErrProviderDisabled check is the
|
||||
// defense-in-depth guard for direct API / MCP / CLI callers.
|
||||
if !p.Enabled {
|
||||
continue
|
||||
}
|
||||
out = append(out, &handler.OIDCProviderInfo{
|
||||
ID: p.ID,
|
||||
DisplayName: p.Name,
|
||||
LoginURL: "/auth/oidc/login?provider=" + p.ID,
|
||||
})
|
||||
}
|
||||
return out, nil
|
||||
}
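The adapter above both filters and projects: disabled providers are skipped, and each remaining row is reduced to the three fields the Login page needs. A minimal standalone sketch of that shape follows. The `provider` and `providerInfo` types and the `project` helper are hypothetical stand-ins for `OIDCProvider` and `handler.OIDCProviderInfo`, not the repository's types. Escaping the ID with `url.QueryEscape` is an addition the diff does not make, shown here only as an option if provider IDs can ever contain reserved characters.

```go
package main

import (
	"fmt"
	"net/url"
)

// provider is a hypothetical stand-in for the full repository row.
type provider struct {
	ID      string
	Name    string
	Enabled bool
}

// providerInfo is a hypothetical stand-in for the public-safe payload.
type providerInfo struct {
	ID          string
	DisplayName string
	LoginURL    string
}

// project drops disabled providers and maps the rest to the login payload.
// Unlike the diff, it query-escapes the ID before building the URL.
func project(provs []provider) []providerInfo {
	out := make([]providerInfo, 0, len(provs))
	for _, p := range provs {
		if !p.Enabled {
			continue // offline IdPs get no "Sign in with X" button
		}
		out = append(out, providerInfo{
			ID:          p.ID,
			DisplayName: p.Name,
			LoginURL:    "/auth/oidc/login?provider=" + url.QueryEscape(p.ID),
		})
	}
	return out
}

func main() {
	infos := project([]provider{
		{ID: "okta", Name: "Okta", Enabled: true},
		{ID: "legacy-idp", Name: "Legacy", Enabled: false},
	})
	for _, i := range infos {
		fmt.Println(i.ID, i.DisplayName, i.LoginURL)
	}
}
```

Pre-sizing the slice with `len(provs)` as capacity keeps the result non-nil even when every provider is disabled, so a JSON encoder would emit `[]` rather than `null`.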
|
||||
|
||||
@@ -12,6 +12,7 @@ import (
|
||||
|
||||
"github.com/certctl-io/certctl/internal/api/middleware"
|
||||
"github.com/certctl-io/certctl/internal/api/router"
|
||||
"github.com/certctl-io/certctl/internal/auth"
|
||||
"github.com/certctl-io/certctl/internal/config"
|
||||
"github.com/certctl-io/certctl/internal/service"
|
||||
)
|
||||
@@ -44,7 +45,7 @@ func TestMain_HealthEndpointBypassesAuth(t *testing.T) {
|
||||
})
|
||||
|
||||
// Build the handler chain the same way main.go does
|
||||
authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
|
||||
authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
|
||||
{Name: "test", Key: "test-secret-key"},
|
||||
})
|
||||
|
||||
@@ -159,7 +160,7 @@ func TestMain_AuthMiddlewareRejectsUnauthorized(t *testing.T) {
|
||||
})
|
||||
|
||||
// Wrap with auth middleware
|
||||
authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
|
||||
authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
|
||||
{Name: "test", Key: "test-secret-key"},
|
||||
})
|
||||
|
||||
@@ -187,7 +188,7 @@ func TestMain_AuthMiddlewareAllowsWithValidKey(t *testing.T) {
|
||||
})
|
||||
|
||||
// Wrap with auth middleware
|
||||
authMiddleware := middleware.NewAuthWithNamedKeys([]middleware.NamedAPIKey{
|
||||
authMiddleware := auth.NewAuthWithNamedKeys([]auth.NamedAPIKey{
|
||||
{Name: "test", Key: testKey},
|
||||
})
|
||||
|
||||
@@ -460,7 +461,7 @@ func TestMain_AuthNoneMode(t *testing.T) {
|
||||
|
||||
// Wrap with auth middleware in "none" mode
|
||||
// auth=none equivalent: empty named-keys list is a no-op pass-through.
|
||||
authMiddleware := middleware.NewAuthWithNamedKeys(nil)
|
||||
authMiddleware := auth.NewAuthWithNamedKeys(nil)
|
||||
|
||||
chainedHandler := middleware.Chain(protectedHandler, authMiddleware)
|
||||
|
||||
|
||||
@@ -0,0 +1,203 @@
|
||||
// Copyright (c) certctl-io contributors.
|
||||
//
|
||||
// Audit 2026-05-11 A-8 — demo-mode residual-grants detector. Closes the
|
||||
// deferred Phase 2 leg of HIGH-12 (cowork/auth-bundles-fixes-2026-05-10/
|
||||
// 11-high-12-demo-mode-guard.md). The HIGH-12 closure (`b81588e`) added
|
||||
// the fail-closed bind-address guard at config.Validate; the deferred
|
||||
// leg here adds a startup-time WARN (or strict refuse-startup) when
|
||||
// `actor-demo-anon` has live role grants under a non-`none` auth type.
|
||||
//
|
||||
// Why this matters: migration 000029 unconditionally seeds the
|
||||
// `ar-demo-anon-admin` row granting r-admin to actor-demo-anon. The
|
||||
// row is dormant under auth_type=api-key|oidc (the middleware chain
|
||||
// never injects the synthetic actor as the request principal), but
|
||||
// it represents a security debt: any future regression in the
|
||||
// middleware chain (a misrouted CORS preflight, a fallback in a new
|
||||
// auth-exempt route) that resolves to actor-demo-anon would re-elevate
|
||||
// to admin. The canonical acquisition-readiness narrative — "we have
|
||||
// an RBAC primitive with no synthetic-admin fallback" — requires this
|
||||
// row to be either gone or explicitly acknowledged.
|
||||
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"database/sql"
|
||||
"errors"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/certctl-io/certctl/internal/config"
|
||||
"github.com/certctl-io/certctl/internal/domain"
|
||||
authdomain "github.com/certctl-io/certctl/internal/domain/auth"
|
||||
"github.com/certctl-io/certctl/internal/service"
|
||||
)
|
||||
|
||||
// preflightDemoModeResidual runs after the DB connection is open and
|
||||
// the audit service is constructed, before the HTTPS listener starts.
|
||||
//
|
||||
// Behaviour:
|
||||
// - cfg.Auth.Type == "none" (demo mode): no-op. The residual IS the
|
||||
// runtime state at that auth type.
|
||||
// - cfg.Auth.Type != "none" + no residue: returns nil silently.
|
||||
// - cfg.Auth.Type != "none" + residue + strict=false: emits a WARN
|
||||
// log AND an `auth.demo_residual_grants_detected` audit row
|
||||
// listing the grant IDs, then returns nil.
|
||||
// - cfg.Auth.Type != "none" + residue + strict=true: emits the same
|
||||
// WARN + audit, then returns a non-nil error so the caller can
|
||||
// refuse startup.
|
||||
//
|
||||
// The audit row's actor is `system` / ActorTypeSystem; category is
|
||||
// EventCategoryAuth so audit consumers filtering on auth events see it.
|
||||
func preflightDemoModeResidual(
|
||||
ctx context.Context,
|
||||
cfg *config.Config,
|
||||
db *sql.DB,
|
||||
audit *service.AuditService,
|
||||
logger *slog.Logger,
|
||||
) error {
|
||||
if cfg.Auth.Type == "none" {
|
||||
// Demo mode itself. The residual is the runtime state at
|
||||
// this auth type, so warning about it would be noise.
|
||||
return nil
|
||||
}
|
||||
|
||||
residue, err := queryDemoAnonResidue(ctx, db)
|
||||
if err != nil {
|
||||
return fmt.Errorf("preflight demo-mode residual: %w", err)
|
||||
}
|
||||
if len(residue) == 0 {
|
||||
return nil
|
||||
}
|
||||
|
||||
formatted := make([]string, 0, len(residue))
|
||||
for _, r := range residue {
|
||||
formatted = append(formatted, r.String())
|
||||
}
|
||||
|
||||
msg := fmt.Sprintf(
|
||||
"production startup warning: actor-demo-anon has %d residual role grant(s) "+
|
||||
"from the migration 000029 baseline or a prior demo-mode run: %s. "+
|
||||
"These grants are DORMANT at the current auth_type (%s) but represent a "+
|
||||
"security debt — any future regression that resolves an unauthenticated "+
|
||||
"request to actor-demo-anon would re-elevate to admin. Clean up via "+
|
||||
"POST /api/v1/auth/demo-residual/cleanup (requires auth.role.assign) or "+
|
||||
"`DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';`. Set "+
|
||||
"CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true to refuse startup until cleanup.",
|
||||
len(residue), strings.Join(formatted, "; "), cfg.Auth.Type,
|
||||
)
|
||||
if logger != nil {
|
||||
logger.Warn(msg, "auth_type", cfg.Auth.Type, "residue_count", len(residue))
|
||||
} else {
|
||||
slog.Warn(msg)
|
||||
}
|
||||
|
||||
if audit != nil {
|
||||
details := map[string]interface{}{
|
||||
"auth_type": cfg.Auth.Type,
|
||||
"residue_count": len(residue),
|
||||
"residue": formatted,
|
||||
}
|
||||
if err := audit.RecordEventWithCategory(
|
||||
ctx, "system", domain.ActorTypeSystem,
|
||||
"auth.demo_residual_grants_detected",
|
||||
domain.EventCategoryAuth,
|
||||
"actor_roles", authdomain.DemoAnonActorID,
|
||||
details,
|
||||
); err != nil {
|
||||
// Don't fail startup over an audit-write error; just log.
|
||||
if logger != nil {
|
||||
logger.Warn("preflight demo-mode residual: audit record failed", "error", err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if cfg.Auth.DemoModeResidualStrict {
|
||||
return fmt.Errorf(
|
||||
"startup refused: actor-demo-anon has %d residual role grant(s) and "+
|
||||
"CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true. Remove the rows before restarting",
|
||||
len(residue),
|
||||
)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// demoAnonResidueRow describes a single live actor_roles row whose
|
||||
// actor_id matches the synthetic demo-anon ID.
|
||||
type demoAnonResidueRow struct {
|
||||
RoleID string
|
||||
ScopeType string
|
||||
ScopeID string
|
||||
GrantedAt time.Time
|
||||
}
|
||||
|
||||
// String renders one row as `role@scope (granted ts)`. Used both in
|
||||
// the WARN log message and in the audit row's residue list.
|
||||
func (r demoAnonResidueRow) String() string {
|
||||
scope := r.ScopeType
|
||||
if r.ScopeID != "" {
|
||||
scope = fmt.Sprintf("%s/%s", r.ScopeType, r.ScopeID)
|
||||
}
|
||||
return fmt.Sprintf("%s@%s (granted %s)", r.RoleID, scope, r.GrantedAt.UTC().Format(time.RFC3339))
|
||||
}
|
||||
|
||||
// queryDemoAnonResidue runs the canonical query for the residue
|
||||
// detector + the cleanup endpoint. Kept in one place so the two
|
||||
// surfaces can't drift on which rows count as "live".
|
||||
//
|
||||
// "Live" = not expired. Rows with expires_at <= NOW() are treated
|
||||
// as already gone (they have no effect even if the actor were to be
|
||||
// injected as the principal).
|
||||
func queryDemoAnonResidue(ctx context.Context, db *sql.DB) ([]demoAnonResidueRow, error) {
|
||||
if db == nil {
|
||||
return nil, errors.New("db is nil")
|
||||
}
|
||||
rows, err := db.QueryContext(ctx, `
|
||||
SELECT role_id, scope_type, COALESCE(scope_id, '') AS scope_id, granted_at
|
||||
FROM actor_roles
|
||||
WHERE actor_id = $1
|
||||
AND (expires_at IS NULL OR expires_at > NOW())
|
||||
ORDER BY granted_at ASC, role_id ASC, scope_type ASC, COALESCE(scope_id, '') ASC
|
||||
`, authdomain.DemoAnonActorID)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("query actor_roles: %w", err)
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var out []demoAnonResidueRow
|
||||
for rows.Next() {
|
||||
var r demoAnonResidueRow
|
||||
if err := rows.Scan(&r.RoleID, &r.ScopeType, &r.ScopeID, &r.GrantedAt); err != nil {
|
||||
return nil, fmt.Errorf("scan actor_roles row: %w", err)
|
||||
}
|
||||
out = append(out, r)
|
||||
}
|
||||
if err := rows.Err(); err != nil {
|
||||
return nil, fmt.Errorf("iterate actor_roles rows: %w", err)
|
||||
}
|
||||
return out, nil
|
||||
}
|
||||
|
||||
// deleteDemoAnonResidue removes every actor_roles row for the
// synthetic demo-anon actor, live or expired (the DELETE has no
// expires_at filter). Returns the count removed. Used by the
// POST /api/v1/auth/demo-residual/cleanup handler. Idempotent: a
// follow-up call returns 0.
|
||||
func deleteDemoAnonResidue(ctx context.Context, db *sql.DB) (int64, error) {
|
||||
if db == nil {
|
||||
return 0, errors.New("db is nil")
|
||||
}
|
||||
res, err := db.ExecContext(ctx, `
|
||||
DELETE FROM actor_roles
|
||||
WHERE actor_id = $1
|
||||
`, authdomain.DemoAnonActorID)
|
||||
if err != nil {
|
||||
return 0, fmt.Errorf("delete actor_roles: %w", err)
|
||||
}
|
||||
n, err := res.RowsAffected()
|
||||
if err != nil {
|
||||
return 0, fmt.Errorf("rows affected: %w", err)
|
||||
}
|
||||
return n, nil
|
||||
}
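The four-way behaviour documented on `preflightDemoModeResidual` can be read as a small decision table over auth type, residue count, and the strict flag. The sketch below isolates that table from the database and audit plumbing. The `outcome` type and `decide` function are hypothetical names introduced for illustration only, and the error text is abbreviated relative to the real message.

```go
package main

import "fmt"

// outcome is a hypothetical enum used only in this sketch.
type outcome int

const (
	outcomeSkip   outcome = iota // demo mode, or nothing to report
	outcomeWarn                  // residue found, non-strict: WARN + audit, continue
	outcomeRefuse                // residue found, strict: WARN + audit, refuse startup
)

// decide mirrors the branch order of the preflight check without any I/O.
func decide(authType string, residueCount int, strict bool) (outcome, error) {
	if authType == "none" {
		// Demo mode: the residual grant is the expected runtime state.
		return outcomeSkip, nil
	}
	if residueCount == 0 {
		return outcomeSkip, nil
	}
	if strict {
		return outcomeRefuse, fmt.Errorf(
			"startup refused: actor-demo-anon has %d residual role grant(s)", residueCount)
	}
	return outcomeWarn, nil
}

func main() {
	cases := []struct {
		authType string
		residue  int
		strict   bool
	}{
		{"none", 1, true},
		{"api-key", 0, true},
		{"api-key", 1, false},
		{"api-key", 1, true},
	}
	for _, c := range cases {
		o, err := decide(c.authType, c.residue, c.strict)
		fmt.Printf("auth=%s residue=%d strict=%t -> outcome=%d err=%v\n",
			c.authType, c.residue, c.strict, o, err)
	}
}
```

Note that the strict flag is never consulted under `auth_type=none`, which is what `TestPreflightDemoModeResidual_DemoModeActive_Skips` pins in the real code.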
|
||||
@@ -0,0 +1,295 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"runtime"
|
||||
"strings"
|
||||
"sync"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
_ "github.com/lib/pq"
|
||||
"github.com/testcontainers/testcontainers-go"
|
||||
"github.com/testcontainers/testcontainers-go/wait"
|
||||
|
||||
"github.com/certctl-io/certctl/internal/config"
|
||||
"github.com/certctl-io/certctl/internal/repository/postgres"
|
||||
"github.com/certctl-io/certctl/internal/service"
|
||||
)
|
||||
|
||||
// Audit 2026-05-11 A-8 — preflight + cleanup regression tests for the
|
||||
// demo-mode residual-grants detector. Testcontainers-backed because the
|
||||
// preflight runs raw SQL against actor_roles; mock-DB-only would not
|
||||
// catch a SQL-shape regression. Gated by testing.Short() to keep the
|
||||
// fast loop fast (matching internal/repository/postgres/* pattern).
|
||||
|
||||
var (
|
||||
a8DBOnce sync.Once
|
||||
a8DB *sql.DB
|
||||
a8Skip bool
|
||||
a8SkipMu sync.Mutex
|
||||
)
|
||||
|
||||
func setupA8DB(t *testing.T) *sql.DB {
|
||||
t.Helper()
|
||||
if testing.Short() {
|
||||
t.Skip("preflight A-8 test requires Postgres (testcontainers); skipping under -short")
|
||||
}
|
||||
a8DBOnce.Do(func() {
|
||||
ctx := context.Background()
|
||||
req := testcontainers.ContainerRequest{
|
||||
Image: "postgres:16-alpine",
|
||||
ExposedPorts: []string{"5432/tcp"},
|
||||
Env: map[string]string{
|
||||
"POSTGRES_DB": "certctl_test_a8",
|
||||
"POSTGRES_USER": "certctl",
|
||||
"POSTGRES_PASSWORD": "certctl",
|
||||
},
|
||||
WaitingFor: wait.ForLog("database system is ready to accept connections").WithOccurrence(2),
|
||||
}
|
||||
c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
|
||||
ContainerRequest: req,
|
||||
Started: true,
|
||||
})
|
||||
if err != nil {
|
||||
a8SkipMu.Lock()
|
||||
a8Skip = true
|
||||
a8SkipMu.Unlock()
|
||||
t.Logf("skipping A-8 testcontainers preflight (docker unavailable): %v", err)
|
||||
return
|
||||
}
|
||||
host, err := c.Host(ctx)
|
||||
if err != nil {
|
||||
t.Fatalf("get container host: %v", err)
|
||||
}
|
||||
port, err := c.MappedPort(ctx, "5432")
|
||||
if err != nil {
|
||||
t.Fatalf("get mapped port: %v", err)
|
||||
}
|
||||
dsn := fmt.Sprintf("postgres://certctl:certctl@%s:%s/certctl_test_a8?sslmode=disable", host, port.Port())
|
||||
|
||||
db, err := sql.Open("postgres", dsn)
|
||||
if err != nil {
|
||||
t.Fatalf("sql.Open: %v", err)
|
||||
}
|
||||
// Run all migrations so actor_roles exists with the migration
|
||||
// 000029 seed row (`ar-demo-anon-admin`).
|
||||
_, thisFile, _, _ := runtime.Caller(0)
|
||||
migrationsDir := filepath.Join(filepath.Dir(thisFile), "..", "..", "migrations")
|
||||
if _, err := os.Stat(migrationsDir); err != nil {
|
||||
t.Fatalf("locate migrations dir %q: %v", migrationsDir, err)
|
||||
}
|
||||
if err := postgres.RunMigrations(db, migrationsDir); err != nil {
|
||||
t.Fatalf("RunMigrations: %v", err)
|
||||
}
|
||||
a8DB = db
|
||||
})
|
||||
|
||||
a8SkipMu.Lock()
|
||||
skip := a8Skip
|
||||
a8SkipMu.Unlock()
|
||||
if skip {
|
||||
t.Skip("A-8 testcontainers unavailable; skipping")
|
||||
}
|
||||
return a8DB
|
||||
}
|
||||
|
||||
// resetA8Residue clears the actor_roles rows for actor-demo-anon and,
// when seedBaseline is true, re-inserts the migration 000029 baseline.
// Used by tests that need a known "post-fresh-migration" or fully empty state.
|
||||
func resetA8Residue(t *testing.T, db *sql.DB, seedBaseline bool) {
|
||||
t.Helper()
|
||||
if _, err := db.ExecContext(context.Background(),
|
||||
`DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon'`); err != nil {
|
||||
t.Fatalf("reset actor_roles: %v", err)
|
||||
}
|
||||
if seedBaseline {
|
||||
if _, err := db.ExecContext(context.Background(), `
|
||||
INSERT INTO actor_roles (id, actor_id, actor_type, role_id, granted_at, granted_by, tenant_id)
|
||||
VALUES ('ar-demo-anon-admin', 'actor-demo-anon', 'Anonymous', 'r-admin', NOW(), 'system', 't-default')
|
||||
`); err != nil {
|
||||
t.Fatalf("reseed baseline: %v", err)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// TestPreflightDemoModeResidual_DemoModeActive_Skips proves the
|
||||
// preflight short-circuits when Auth.Type=none regardless of residue.
|
||||
// Demo mode IS the active runtime state at that auth type, so warning
|
||||
// would be noise.
|
||||
func TestPreflightDemoModeResidual_DemoModeActive_Skips(t *testing.T) {
|
||||
db := setupA8DB(t)
|
||||
resetA8Residue(t, db, true) // baseline IS present
|
||||
|
||||
cfg := &config.Config{}
|
||||
cfg.Auth.Type = "none"
|
||||
cfg.Auth.DemoModeResidualStrict = true // would refuse if checked
|
||||
|
||||
logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
|
||||
err := preflightDemoModeResidual(context.Background(), cfg, db, nil, logger)
|
||||
if err != nil {
|
||||
t.Fatalf("expected nil under Auth.Type=none, got %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestPreflightDemoModeResidual_NoResidue_Passes proves a fully-clean
|
||||
// actor_roles state passes without WARN.
|
||||
func TestPreflightDemoModeResidual_NoResidue_Passes(t *testing.T) {
|
||||
db := setupA8DB(t)
|
||||
resetA8Residue(t, db, false) // explicitly empty
|
||||
|
||||
cfg := &config.Config{}
|
||||
cfg.Auth.Type = "api-key"
|
||||
|
||||
err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil)
|
||||
if err != nil {
|
||||
t.Fatalf("expected nil with empty residue, got %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// TestPreflightDemoModeResidual_HasResidue_LogsAndAudits proves the
|
||||
// migration 000029 baseline produces a WARN + audit row but does NOT
|
||||
// fail startup in default (non-strict) mode.
|
||||
func TestPreflightDemoModeResidual_HasResidue_LogsAndAudits(t *testing.T) {
|
||||
db := setupA8DB(t)
|
||||
resetA8Residue(t, db, true)
|
||||
|
||||
cfg := &config.Config{}
|
||||
cfg.Auth.Type = "api-key"
|
||||
cfg.Auth.DemoModeResidualStrict = false
|
||||
|
||||
auditRepo := postgres.NewAuditRepository(db)
|
||||
auditService := service.NewAuditService(auditRepo)
|
||||
|
||||
err := preflightDemoModeResidual(context.Background(), cfg, db, auditService, nil)
|
||||
if err != nil {
|
||||
t.Fatalf("non-strict mode must NOT fail startup with residue, got %v", err)
|
||||
}
|
||||
|
||||
// Audit row should be present for the call.
|
||||
rows, err := db.QueryContext(context.Background(), `
|
||||
SELECT action, event_category, resource_id
|
||||
FROM audit_events
|
||||
WHERE action = 'auth.demo_residual_grants_detected'
|
||||
ORDER BY occurred_at DESC LIMIT 1
|
||||
`)
|
||||
if err != nil {
|
||||
t.Fatalf("audit_events query: %v", err)
|
||||
}
|
||||
defer rows.Close()
|
||||
if !rows.Next() {
|
||||
t.Fatal("expected at least one auth.demo_residual_grants_detected row")
|
||||
}
|
||||
var action, category, resourceID string
|
||||
if err := rows.Scan(&action, &category, &resourceID); err != nil {
|
||||
t.Fatalf("scan: %v", err)
|
||||
}
|
||||
if action != "auth.demo_residual_grants_detected" {
|
||||
t.Errorf("action = %q, want auth.demo_residual_grants_detected", action)
|
||||
}
|
||||
if category != "auth" {
|
||||
t.Errorf("event_category = %q, want auth", category)
|
||||
}
|
||||
if resourceID != "actor-demo-anon" {
|
||||
t.Errorf("resource_id = %q, want actor-demo-anon", resourceID)
|
||||
}
|
||||
}
|
||||
|
||||
// TestPreflightDemoModeResidual_StrictMode_RefusesStartup proves the
|
||||
// flag pivots WARN → fail.
|
||||
func TestPreflightDemoModeResidual_StrictMode_RefusesStartup(t *testing.T) {
|
||||
db := setupA8DB(t)
|
||||
resetA8Residue(t, db, true)
|
||||
|
||||
cfg := &config.Config{}
|
||||
cfg.Auth.Type = "api-key"
|
||||
cfg.Auth.DemoModeResidualStrict = true
|
||||
|
||||
err := preflightDemoModeResidual(context.Background(), cfg, db, nil, nil)
|
||||
if err == nil {
|
||||
t.Fatal("strict mode + residue: expected error, got nil")
|
||||
}
|
||||
if !strings.Contains(err.Error(), "actor-demo-anon") {
|
||||
t.Errorf("err = %q, want mention of actor-demo-anon", err.Error())
|
||||
}
|
||||
if !strings.Contains(err.Error(), "CERTCTL_DEMO_MODE_RESIDUAL_STRICT") {
|
||||
t.Errorf("err = %q, want mention of CERTCTL_DEMO_MODE_RESIDUAL_STRICT", err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// TestDemoAnonResidueRow_String pins the formatting of the residue
|
||||
// detail entry — used both in the WARN log AND the audit row's
|
||||
// `residue` slice. Two cases: NULL scope_id (global scope) and
|
||||
// non-empty scope_id (profile/issuer scope).
|
||||
func TestDemoAnonResidueRow_String(t *testing.T) {
|
||||
ts, err := time.Parse(time.RFC3339, "2026-05-11T12:34:56Z")
if err != nil {
t.Fatalf("parse fixture timestamp: %v", err)
}
|
||||
cases := []struct {
|
||||
name string
|
||||
r demoAnonResidueRow
|
||||
want string
|
||||
}{
|
||||
{
|
||||
name: "global_scope",
|
||||
r: demoAnonResidueRow{RoleID: "r-admin", ScopeType: "global", ScopeID: "", GrantedAt: ts},
|
||||
want: "r-admin@global (granted 2026-05-11T12:34:56Z)",
|
||||
},
|
||||
{
|
||||
name: "scoped",
|
||||
r: demoAnonResidueRow{RoleID: "r-operator", ScopeType: "profile", ScopeID: "p-prod", GrantedAt: ts},
|
||||
want: "r-operator@profile/p-prod (granted 2026-05-11T12:34:56Z)",
|
||||
},
|
||||
}
|
||||
for _, c := range cases {
|
||||
c := c
|
||||
t.Run(c.name, func(t *testing.T) {
|
||||
got := c.r.String()
|
||||
if got != c.want {
|
||||
t.Errorf("String() = %q, want %q", got, c.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// TestDeleteDemoAnonResidue_Idempotent proves the cleanup helper is
// re-entrant: a second call after a successful first call returns 0.
func TestDeleteDemoAnonResidue_Idempotent(t *testing.T) {
	db := setupA8DB(t)
	resetA8Residue(t, db, true)

	n, err := deleteDemoAnonResidue(context.Background(), db)
	if err != nil {
		t.Fatalf("first delete: %v", err)
	}
	if n < 1 {
		t.Fatalf("first delete: count = %d, want >= 1", n)
	}

	n, err = deleteDemoAnonResidue(context.Background(), db)
	if err != nil {
		t.Fatalf("second delete: %v", err)
	}
	if n != 0 {
		t.Errorf("second delete (idempotent): count = %d, want 0", n)
	}
}
|
||||
// TestQueryDemoAnonResidue_NilDB pins the nil-safety contract.
func TestQueryDemoAnonResidue_NilDB(t *testing.T) {
	_, err := queryDemoAnonResidue(context.Background(), nil)
	if err == nil {
		t.Fatal("expected error on nil db, got nil")
	}
}

// TestDeleteDemoAnonResidue_NilDB pins the nil-safety contract.
func TestDeleteDemoAnonResidue_NilDB(t *testing.T) {
	_, err := deleteDemoAnonResidue(context.Background(), nil)
	if err == nil {
		t.Fatal("expected error on nil db, got nil")
	}
}
@@ -1,159 +0,0 @@
|
||||
# CI Pipeline Cleanup — Phase 0 Baseline
|
||||
|
||||
> Captured against repo HEAD `1de61e91cf07449356d9046a76499c86efe413b1` (operator tag `v2.0.66`) on 2026-04-30.
|
||||
> Each subsequent Phase that changes a number references this baseline.
|
||||
|
||||
## Repo state
|
||||
|
||||
**HEAD SHA:** `1de61e91cf07449356d9046a76499c86efe413b1`
|
||||
|
||||
**Operator-stamped tag:** `v2.0.66`
|
||||
|
||||
## ci.yml shape
|
||||
|
||||
- Total lines: `1488`
|
||||
- Total named steps: `53`
|
||||
- Named regression-guard steps: 22 (enumerated below)
|
||||
|
||||
### The 22 regression-guard steps
|
||||
|
||||
```
|
||||
81: - name: Forbidden auth-type literal regression guard (G-1)
|
||||
144: - name: Forbidden bare InsecureSkipVerify regression guard (L-001)
|
||||
180: - name: Forbidden bare FROM regression guard (H-001)
|
||||
201: - name: Forbidden missing USER regression guard (M-012)
|
||||
228: - name: Forbidden README JWT advertising regression guard (H-009)
|
||||
254: - name: Forbidden api_key_hash JSON-shape regression guard (G-2)
|
||||
311: - name: Forbidden plaintext HEALTHCHECK regression guard (U-2)
|
||||
360: - name: Forbidden migration mount in compose initdb (U-3)
|
||||
417: - name: Forbidden StatusBadge dead-key + TS phantom-field regression guard (D-1 + D-2)
|
||||
569: - name: Forbidden client-side bulk-action loop regression guard (L-1)
|
||||
613: - name: Forbidden orphan-CRUD client function regression guard (B-1)
|
||||
665: - name: Forbidden strings.Contains(err.Error()) regression guard (S-2)
|
||||
868: - name: QA-doc Part-count drift guard
|
||||
886: - name: QA-doc seed-count drift guard
|
||||
938: - name: Test-naming convention guard (hard-fail)
|
||||
982: - name: Forbidden hardcoded source-count prose regression guard (S-1)
|
||||
1027: - name: Documented orphan client fns sync guard (P-1)
|
||||
1063: - name: Frontend page-coverage regression guard (T-1)
|
||||
1118: - name: Bundle-8 / L-015 target=_blank rel=noopener regression guard
|
||||
1147: - name: Bundle-8 / L-019 dangerouslySetInnerHTML regression guard
|
||||
1176: - name: Bundle-8 / M-009 + M-029 Pass 1 mutation contract guard (hard zero)
|
||||
1220: - name: Forbidden env-var docs drift regression guard (G-3)
|
||||
```
|
||||
|
||||
## SA1019 site count
|
||||
|
||||
- **Operator-on-workstation deliverable** — sandbox cannot run `staticcheck`.
|
||||
- ci.yml inline comment claims "6 sites" (`middleware.NewAuth × 3`, `csr.Attributes`, `elliptic.Marshal`).
|
||||
- Source-grep at HEAD shows:
|
||||
- `internal/api/handler/scep.go`: `csr.Attributes` references present
|
||||
- `internal/connector/issuer/local/local.go`: `elliptic.Marshal` historic refs (already migrated per bundle9_coverage_test.go byte-equivalence test)
|
||||
- `cmd/server/main_test.go`: `middleware.NewAuth` references TBD
|
||||
- Operator must run `staticcheck ./... 2>&1 | grep SA1019` on workstation and update Phase 3 plan with the actual site list.
|
||||
|
||||
## Dockerfile inventory (verified 4)
|
||||
|
||||
```
|
||||
./Dockerfile.agent
|
||||
./Dockerfile
|
||||
./deploy/test/f5-mock-icontrol/Dockerfile
|
||||
./deploy/test/libest/Dockerfile
|
||||
```
|
||||
|
||||
## Migration up/down balance
|
||||
|
||||
- ups: `24`
|
||||
- downs: `24`
|
||||
- missing downs: `0`
|
||||
|
||||
## OpenAPI ↔ handler parity gap (verified)
|
||||
|
||||
- operationIds in api/openapi.yaml: `136`
|
||||
- r.Register calls in router.go: `149`
|
||||
- Gap to root-cause in Phase 9: 13 routes
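
The 13-route gap above is the difference of two counts (149 − 136). A minimal sketch of how such a count could be reproduced, assuming the figures come from matching `operationId:` in the spec and `r.Register(` in the router; the baseline does not show the exact commands, and the function name `parity_gap` is hypothetical:

```bash
# parity_gap <openapi.yaml> <router.go>
# Prints (number of r.Register calls) - (number of operationIds).
# Assumption: one match per line in both files.
parity_gap() {
  local ops routes
  # grep -c exits 1 when the count is 0 but still prints it; tolerate that.
  ops="$(grep -c 'operationId:' "$1" || true)"
  routes="$(grep -c 'r\.Register(' "$2" || true)"
  echo $(( routes - ops ))
}

# Usage (paths as named in the baseline; router.go location not stated there):
#   parity_gap api/openapi.yaml path/to/router.go
```

A raw count difference only shows that a gap exists; naming the missing routes still needs the per-route comparison planned for Phase 9.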
|
||||
|
||||
## docker-compose.test.yml sidecars
|
||||
|
||||
```
|
||||
52: certctl-tls-init:
|
||||
107: postgres:
|
||||
135: pebble-challtestsrv:
|
||||
150: pebble:
|
||||
178: step-ca:
|
||||
213: certctl-server:
|
||||
363: nginx:
|
||||
391: certctl-agent:
|
||||
449: libest-client:
|
||||
488: apache-test:
|
||||
502: haproxy-test:
|
||||
515: traefik-test:
|
||||
533: caddy-test:
|
||||
548: envoy-test:
|
||||
562: postfix-test:
|
||||
577: dovecot-test:
|
||||
591: openssh-test:
|
||||
613: f5-mock-icontrol:
|
||||
631: k8s-kind-test:
|
||||
648: windows-iis-test:
|
||||
666: certctl-test:
|
||||
```
|
||||
|
||||
## Makefile::verify body (existing)
|
||||
|
||||
```
verify:
	@echo "==> fmt"
	@go fmt ./... | { ! grep -q '.'; } || (echo "gofmt produced changes — commit them" && exit 1)
	@echo "==> go vet ./..."
	@go vet ./...
	@echo "==> golangci-lint run ./... (incl. staticcheck ST*)"
	@which golangci-lint > /dev/null || (echo "Installing golangci-lint..." && go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest)
	@golangci-lint run ./... --timeout 5m
	@echo "==> go test -short ./..."
	@go test -short -count=1 ./...
	@echo ""
	@echo "verify: PASS — safe to commit"
```
|
||||
## RAM headroom for collapsed vendor-e2e job
|
||||
|
||||
- **Operator-on-workstation deliverable** — requires a prototype branch with the collapsed job + `docker stats` polling.
|
||||
- Per Phase 0 frozen decision 0.14: if peak RSS ≤ 12 GB on ubuntu-latest (16 GB ceiling), single-job collapse is approved.
|
||||
- If > 12 GB, fall back to bucketed-matrix design documented in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
## Coverage thresholds at HEAD
|
||||
|
||||
```
|
||||
778: if [ "$(echo "$SERVICE_COV < 70" | bc -l)" -eq 1 ]; then
|
||||
779: echo "::error::Service layer coverage ${SERVICE_COV}% is below 70% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
782: if [ "$(echo "$HANDLER_COV < 75" | bc -l)" -eq 1 ]; then
|
||||
783: echo "::error::Handler layer coverage ${HANDLER_COV}% is below 75% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
786: if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
|
||||
787: echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
|
||||
790: if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
|
||||
791: echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
|
||||
802: if [ "$(echo "$CRYPTO_COV < 88" | bc -l)" -eq 1 ]; then
|
||||
803: echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 88% (Bundle R closure floor — add tests, do not lower the gate)"
|
||||
832: if [ "$(echo "$LOCAL_ISSUER_COV < 86" | bc -l)" -eq 1 ]; then
|
||||
833: echo "::error::Local-issuer coverage ${LOCAL_ISSUER_COV}% is below 86% (Bundle R closure floor — add tests, do not lower the gate)"
|
||||
842: if [ "$(echo "$ACME_COV < 80" | bc -l)" -eq 1 ]; then
|
||||
843: echo "::error::ACME issuer coverage ${ACME_COV}% is below 80% (Bundle R-CI-extended floor — add tests, do not lower the gate)"
|
||||
846: if [ "$(echo "$STEPCA_COV < 80" | bc -l)" -eq 1 ]; then
|
||||
847: echo "::error::StepCA issuer coverage ${STEPCA_COV}% is below 80% (Bundle L.B closure floor — add tests, do not lower the gate)"
|
||||
850: if [ "$(echo "$MCP_COV < 85" | bc -l)" -eq 1 ]; then
|
||||
851: echo "::error::MCP coverage ${MCP_COV}% is below 85% (Bundle K closure floor — add tests, do not lower the gate)"
|
||||
```
|
||||
|
||||
## CodeQL workflow (no changes)
|
||||
|
||||
- File: `.github/workflows/codeql.yml` (`81` lines)
|
||||
- Matrix: `[go, javascript-typescript]` — 2 status checks per push
|
||||
- Trigger: push to master, PR to master, weekly Sunday cron
|
||||
|
||||
## Status check accounting (verified)
|
||||
|
||||
Today: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 12 `deploy-vendor-e2e (<vendor>)` + 2 `deploy-vendor-e2e-windows (<vendor>)` + 2 `CodeQL Analyze (<lang>)` = **19 status checks per push**.
|
||||
|
||||
After cleanup: 1 `go-build-and-test` + 1 `frontend-build` + 1 `helm-lint` + 1 `deploy-vendor-e2e` + 1 `image-and-supply-chain` + 2 `CodeQL Analyze (<lang>)` = **7 status checks per push**.
|
||||
@@ -1,53 +0,0 @@
|
||||
# CI Pipeline Cleanup — Deliberate Revisions of Bundle II Decisions
|
||||
|
||||
This bundle deliberately revises two Bundle II frozen decisions. Both revisions are recorded here for audit trail and acknowledged in the per-Phase commits that implement them.
|
||||
|
||||
## Bundle II decision 0.4 → revised by ci-pipeline-cleanup decision 0.5
|
||||
|
||||
**Bundle II 0.4 (original):** "IIS e2e strategy — `mcr.microsoft.com/windows/servercore:ltsc2022` Windows containers via Docker Desktop on Windows hosts. Linux CI runners CAN'T run Windows containers, so the IIS e2e suite runs on a separate Windows-runner CI matrix job (or operator's local Windows host for development). Documented limitation."
|
||||
|
||||
**ci-pipeline-cleanup 0.5 (revision):** Delete the Windows-runner CI matrix entirely.
|
||||
|
||||
**Rationale for revision:**
|
||||
|
||||
1. The matrix can't physically work on `windows-latest` GitHub-hosted runners today. Verified via the failure logs from CI run `25183374742` (commit `1de61e9`):
|
||||
- `wincertstore` job: `error during connect: ... open //./pipe/docker_engine: The system cannot find the file specified` — Docker daemon not started in Windows-containers mode.
|
||||
- `iis` job: image pulled successfully (so the new digest is correct), then died at `failed to create network deploy_certctl-test: could not find plugin bridge in v1 plugin registry: plugin not found` — `bridge` network driver doesn't exist on Windows Docker (uses `nat`).
|
||||
|
||||
2. Even if both Docker-daemon and network-driver issues were fixed, the matrix would validate nothing of substance. Verified by source-grep: all 16 functions matching `TestVendorEdge_(IIS|WinCertStore)_*` in `deploy/test/vendor_e2e_phase3_to_13_test.go` are `t.Log` placeholders that exercise no IIS-specific behavior. The real IIS connector validation lives in `internal/connector/target/iis/` unit tests (run on Linux in `go-build-and-test` — already green per push).
|
||||
|
||||
3. Bundle II decision 0.14 explicitly required operator manual smoke against a real instance for "verified" status in the vendor matrix. Moving IIS + WinCertStore validation to a documented operator playbook in `docs/connector-iis.md` satisfies that criterion better than a fake CI matrix that passes by skipping.
|
||||
|
||||
**Preservation:** the `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` under `profiles: [deploy-e2e-windows]` — operators on a Windows host can opt in via `docker compose --profile deploy-e2e-windows up -d windows-iis-test`. Linux CI never activates this profile.
|
||||
|
||||
## Bundle II decision 0.9 → revised by ci-pipeline-cleanup decision 0.4
|
||||
|
||||
**Bundle II 0.9 (original):** "CI parallelism — Each vendor e2e gets its own GitHub Actions matrix job. Vendor failures surface independently in the CI status check (operator sees 'K8s 1.31 vendor-edge fail' as a discrete check, not a generic 'integration tests failed')."
|
||||
|
||||
**ci-pipeline-cleanup 0.4 (revision):** Single `deploy-vendor-e2e` job replaces the 12-job matrix; per-vendor visibility partially restored via skip-detection guard messages.
|
||||
|
||||
**Rationale for revision:**
|
||||
|
||||
1. The per-vendor granularity Bundle II decision 0.9 was designed to provide is fake signal. Verified by source-analysis at HEAD:
|
||||
```
$ grep -cE 't\.Log\(' deploy/test/{vendor_e2e_phase3_to_13,nginx_vendor_e2e}_test.go
deploy/test/nginx_vendor_e2e_test.go:9
deploy/test/vendor_e2e_phase3_to_13_test.go:106

$ awk '/^func TestVendorEdge_/{in_test=1; name=$2; has_assert=0; next}
       in_test && /^}$/ {if (has_assert) print name; in_test=0}
       in_test && /t\.(Fatal|Error|Errorf|Fatalf|Fail|Failf)/ {has_assert=1}' \
    deploy/test/vendor_e2e_phase3_to_13_test.go deploy/test/nginx_vendor_e2e_test.go
TestVendorEdge_NGINX_HighConcurrencyDeployUnderLoad_E2E
```
115 of 116 vendor-edge test functions are `t.Log`-only — they spin up a sidecar, log a one-line description of the vendor quirk, and return. Only 1 has a real assertion.
|
||||
|
||||
2. Per-vendor status-check granularity costs ~9 sec setup overhead × 12 jobs = ~108 sec of pure runner waste per push (verified from CI run `25183374742` job timings).
|
||||
|
||||
3. The single-job version partially restores per-vendor visibility via the skip-detection guard (decision 0.6): if a sidecar fails to start, the affected tests' SKIP names print in the CI output and the build fails. Operators see "TestVendorEdge_K8s_KubeletSyncWaitContract_DefaultTimeout60s_E2E SKIPPED: vendor sidecar 'k8s-kind' not reachable" — same per-vendor signal, just no longer rendered as a separate status-check row.
|
||||
|
||||
**Preservation:** the per-test discoverability via `go test -run 'VendorEdge_<vendor>'` (Bundle II frozen decision 0.6) is unchanged. Only the matrix-jobs-per-vendor part of decision 0.9 is revised; the per-test naming convention stays.
|
||||
|
||||
## Forward-looking note
|
||||
|
||||
Both revisions are limited in scope to CI execution shape: they do NOT delete the test files, the sidecar definitions, or the documentation that Bundle II shipped. Future work could re-introduce per-vendor matrix jobs if the test bodies are filled in with real assertions (turning the `t.Log` placeholders into actual contract pins). At that point, the revisions of Bundle II decisions 0.4 and 0.9 should be re-evaluated.
|
||||
@@ -1,64 +0,0 @@
|
||||
# CI Pipeline Cleanup — Frozen Decisions
|
||||
|
||||
> 14 frozen decisions confirmed at Phase 0. Each subsequent Phase references the decision number it implements.
|
||||
|
||||
## 0.1 — Trigger model
|
||||
|
||||
Three-tier split, no mixing:
|
||||
- **On push/PR to master:** blocking, fast, every check earns its keep, target <10 min wall-clock.
|
||||
- **Daily cron + workflow_dispatch:** `security-deep-scan.yml` as-is; slow scans, best-effort, never blocks.
|
||||
- **On tag push (`v*`):** `release.yml` as-is; cross-platform binaries, ghcr.io push, SLSA provenance.
|
||||
|
||||
## 0.2 — Extracted-script location
|
||||
|
||||
`scripts/ci-guards/` at repo root. Operator runs `bash scripts/ci-guards/<id>.sh` locally. Contract documented in `scripts/ci-guards/README.md`.
|
||||
|
||||
## 0.3 — Coverage threshold YAML format
|
||||
|
||||
`.github/coverage-thresholds.yml`. Top-level keys are package paths; each entry has `floor:` (integer pct) + `why:` (multi-line string for load-bearing context). Bash step uses Python (already on the runner) to read the YAML — no `yq` dependency.
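
A minimal sketch of the per-package comparison such a step could run once a `floor:` value has been read from the manifest. The function name `check_floor` and the package path in the usage line are illustrative assumptions, and the YAML parsing itself is left out:

```bash
# check_floor <package> <measured_pct> <floor_pct>
# Emits a GitHub Actions ::error:: annotation and returns 1 when the measured
# coverage is below the floor; otherwise prints a confirmation and returns 0.
check_floor() {
  local pkg="$1" cov="$2" floor="$3"
  # awk does the floating-point comparison; exit status 0 means cov < floor.
  if awk -v c="$cov" -v f="$floor" 'BEGIN { exit !(c < f) }'; then
    echo "::error::${pkg} coverage ${cov}% is below ${floor}% floor"
    return 1
  fi
  echo "ok: ${pkg} coverage ${cov}% >= ${floor}% floor"
}

# Usage (hypothetical package path and measurement):
#   check_floor internal/service 71.3 70
```

This sketch uses `awk` in place of the `bc -l` calls in the inline bash purely for illustration; either tool handles the fractional percentages.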
|
||||
|
||||
## 0.4 — Vendor matrix collapse policy (REVISES Bundle II decision 0.9)
|
||||
|
||||
Single `deploy-vendor-e2e` job replaces 12-job matrix. Bundle II decision 0.9 said "Each vendor e2e gets its own GitHub Actions matrix job" — this revision recognizes that 115/116 vendor-edge tests are `t.Log` placeholders, so per-vendor status-check granularity is fake signal. Skip-detection guard partially restores per-vendor visibility (SKIP messages name the vendor). Documented as deliberate revision in `cowork/ci-pipeline-cleanup/decisions-revised.md`.
|
||||
|
||||
## 0.5 — Windows IIS validation deletion (REVISES Bundle II decision 0.4)
|
||||
|
||||
Delete `deploy-vendor-e2e-windows` matrix entirely. Bundle II decision 0.4 said "the IIS e2e suite runs on a separate Windows-runner CI matrix job" — this revision recognizes that (a) the matrix can't physically work on `windows-latest` (Docker not started in Windows-containers mode; `bridge` driver missing on Windows Docker), and (b) all 16 IIS + WinCertStore tests are `t.Log` placeholders. Move validation to `docs/connector-iis.md::Operator validation playbook` per Bundle II decision 0.14's third criterion. The `windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml` for operator local use.
|
||||
|
||||
## 0.6 — Skip-detection guard semantics + EXPECTED_SKIPS allowlist
|
||||
|
||||
After `go test -tags integration -run 'VendorEdge_'`, count `^--- SKIP:` lines; any skipped test that is not on the allowlist fails the build. Allowlist: 6 JavaKeystore tests in `vendor_e2e_phase3_to_13_test.go` that legitimately `t.Log` without a sidecar. The allowlist file lives at `scripts/ci-guards/vendor-e2e-skip-allowlist.txt`, one test name per line.
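
The skip check can be sketched as a filter over the test log. This is an illustration under stated assumptions, not the contents of `vendor-e2e-skip-check.sh`: the function name `unexpected_skips` is hypothetical, and the log is assumed to be `go test -v` output, where a skipped top-level test appears as `--- SKIP: <TestName> (<duration>)`:

```bash
# unexpected_skips <test-output.log> <allowlist.txt>
# Prints every skipped test name that is NOT on the allowlist (one per line)
# and returns 1 if there is at least one; returns 0 otherwise.
unexpected_skips() {
  local log="$1" allowlist="$2" unexpected
  # Extract the test name from each "--- SKIP:" line, then drop exact
  # matches (-F fixed string, -x whole line) against the allowlist.
  unexpected="$(sed -n 's/^--- SKIP: \([^ ]*\).*/\1/p' "$log" \
    | grep -vFx -f "$allowlist" || true)"
  if [ -z "$unexpected" ]; then
    return 0
  fi
  printf '%s\n' "$unexpected"
  return 1
}

# Usage:
#   unexpected_skips test-output.log scripts/ci-guards/vendor-e2e-skip-allowlist.txt
```

Printing the offending names rather than only a count keeps the vendor visible in the CI output.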
|
||||
|
||||
## 0.7 — SA1019 closure approach
|
||||
|
||||
Close each site individually with byte-equivalence tests where the deprecated API was load-bearing. Then flip `continue-on-error: true` → `false` in the SAME commit. Do NOT split — shipping the gate without closing sites would fail CI on master. Live verification: `staticcheck ./... 2>&1 | grep -c SA1019` returns 0 BEFORE flipping the gate.
|
||||
|
||||
## 0.8 — Image-and-supply-chain placement
|
||||
|
||||
Separate top-level job (not steps in `go-build-and-test`). Two reasons: (a) digest-validity needs network egress to multiple registries (Docker Hub, ghcr.io, mcr.microsoft.com), and bundling it into `go-build-and-test` would block the Go tests on registry latency; (b) `docker build` is independent of the Go tests, so isolating it lets the two run concurrently.
|
||||
|
||||
## 0.9 — Coverage PR-comment provider
|
||||
|
||||
Default: lightweight self-hosted action that posts a per-PR comment via `gh pr comment`. Avoids paid SaaS. Operator can swap to Codecov/Coveralls later.
|
||||
|
||||
## 0.10 — Docker build smoke scope
|
||||
|
||||
Build all 4 Dockerfiles in the repo: `Dockerfile`, `Dockerfile.agent`, `deploy/test/f5-mock-icontrol/Dockerfile`, `deploy/test/libest/Dockerfile`. The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a syntax error there silently breaks the e2e suite. Tagged `:smoke` and discarded.
|
||||
|
||||
## 0.11 — OpenAPI ↔ handler parity exception YAML
|
||||
|
||||
NEW `api/openapi-handler-exceptions.yaml`. Schema: `documented_exceptions:` list of `{route, why}` entries. The 13-route gap at HEAD is root-caused in Phase 9; most are likely health probes / metrics / SCEP-EST-OCSP wire endpoints that legitimately have no operationId.
|
||||
|
||||
## 0.12 — Branch-protection-rule update timing
|
||||
|
||||
Operator updates GitHub branch-protection rules in Phase 13 AFTER the new pipeline ships and runs green on a feature branch + on the first push to master. Required-checks list changes from 19 → 7 entries. Operator action only — agent cannot do this.
|
||||
|
||||
## 0.13 — Make-target naming for new operator-side scripts
|
||||
|
||||
- `make verify` (existing) — required pre-commit; gofmt + vet + lint + tests
|
||||
- `make verify-deploy` (new) — optional pre-push; digest-validity + OpenAPI parity + docker build smoke (server + agent only — fast subset for local)
|
||||
- `make verify-docs` (new) — required pre-tag; QA-doc Part-count + seed-count drift
|
||||
|
||||
## 0.14 — RAM headroom verification methodology
|
||||
|
||||
Phase 0 deliverable. The operator creates a `prototype/ci-pipeline-cleanup-vendor-collapse` branch, runs the collapsed `deploy-vendor-e2e` job once, captures peak RSS via `docker stats --no-stream` snapshots every 30 sec, and records the max in the Phase 0 baseline doc. If max > 12 GB (75% of the 16 GB ceiling), fall back to a bucketed matrix (3 jobs × ~4 sidecars). If max ≤ 12 GB, single-job collapse is approved.
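
The approval rule reduces to comparing the largest sample against 75% of the runner ceiling. A small sketch, assuming each `docker stats` snapshot has already been summed into one total (in GB) per line; the function name `headroom_decision` and the sample file are hypothetical:

```bash
# headroom_decision <ceiling_gb>
# Reads one sampled memory total (GB) per line on stdin and prints
# "single-job" if the peak is <= 75% of the ceiling, else "bucketed-matrix".
headroom_decision() {
  awk -v ceiling="$1" '
    $1 + 0 > max { max = $1 + 0 }   # track the peak sample
    END {
      limit = ceiling * 0.75        # 12 GB for a 16 GB runner
      if (max > limit) print "bucketed-matrix"
      else             print "single-job"
    }'
}

# Usage (samples.txt is a hypothetical file of per-snapshot totals):
#   headroom_decision 16 < samples.txt
```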
|
||||
@@ -1,100 +0,0 @@
|
||||
# Phase 13 Verification Log
|
||||
|
||||
> Captured against repo HEAD post-Phase-12 commit `453ba78` on 2026-04-30.
|
||||
|
||||
## All 22 ci-guards run on HEAD
|
||||
|
||||
```
|
||||
PASS B-1-orphan-crud.sh
|
||||
PASS D-1-D-2-statusbadge-phantom.sh
|
||||
PASS G-1-jwt-auth-literal.sh
|
||||
PASS G-2-api-key-hash-json.sh
|
||||
PASS G-3-env-docs-drift.sh
|
||||
PASS H-001-bare-from.sh
|
||||
PASS H-009-readme-jwt.sh
|
||||
PASS L-001-insecure-skip-verify.sh
|
||||
PASS L-1-bulk-action-loop.sh
|
||||
PASS M-012-no-root-user.sh
|
||||
PASS P-1-documented-orphan-fns.sh
|
||||
PASS S-1-hardcoded-source-counts.sh
|
||||
PASS S-2-strings-contains-err.sh
|
||||
PASS T-1-frontend-page-coverage.sh
|
||||
PASS U-2-plaintext-healthcheck.sh
|
||||
PASS U-3-migration-mount.sh
|
||||
PASS bundle-8-L-015-target-blank-rel-noopener.sh
|
||||
PASS bundle-8-L-019-dangerously-set-inner-html.sh
|
||||
PASS bundle-8-M-009-bare-usemutation.sh
|
||||
PASS digest-validity.sh
|
||||
PASS openapi-handler-parity.sh
|
||||
PASS test-naming-convention.sh
|
||||
```
|
||||
|
||||
Two helper scripts are not meant for bare invocation:
|
||||
- `vendor-e2e-skip-check.sh` — needs `test-output.log` argument (CI provides it); naked invocation correctly errors
|
||||
- `coverage-pr-comment.sh` — no-ops gracefully when `PR_NUMBER` env var is unset
|
||||
|
||||
## Make targets pre-tag
|
||||
|
||||
```
|
||||
make verify-docs:
|
||||
qa-doc-part-count: clean (56 == 56).
|
||||
qa-doc-seed-count: clean.
|
||||
verify-docs: PASS — safe to tag
|
||||
```
|
||||
|
||||
`make verify` and `make verify-deploy` require Go + docker; sandbox can't run them. Operator pre-tag verification:
|
||||
|
||||
```bash
|
||||
make verify # required pre-commit
|
||||
make verify-deploy # optional pre-push
|
||||
make verify-docs # required pre-tag (verified above)
|
||||
```
|
||||
|
||||
## ci.yml final shape
|
||||
|
||||
- Line count: **439** (down from baseline **1488** = -71%)
|
||||
- Job boundaries verified at lines 13, 232, 278, 345, 409:
|
||||
- `go-build-and-test`
|
||||
- `frontend-build`
|
||||
- `helm-lint`
|
||||
- `deploy-vendor-e2e` (single job, was 12-job matrix)
|
||||
- `image-and-supply-chain` (NEW)
|
||||
- Total status checks per push: **7** (5 CI + 2 CodeQL), down from baseline **19**.
|
||||
|
||||
## Phase commits (master ahead of v2.0.66)
|
||||
|
||||
```
|
||||
453ba78 ci-pipeline-cleanup Phase 12: docs/ci-pipeline.md + bundle artefacts
|
||||
ce987cc ci-pipeline-cleanup Phase 11: make verify-docs + verify-deploy targets
|
||||
3a69600 ci-pipeline-cleanup Phase 10: coverage PR-comment action
|
||||
19a5e43 ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
|
||||
d0bc53b ci-pipeline-cleanup Phase 6 follow-up: IIS operator playbook + matrix doc
|
||||
6f6de63 ci-pipeline-cleanup Phase 5+6: collapse vendor matrix; delete Windows matrix
|
||||
71b2245 ci-pipeline-cleanup Phase 4: gofmt parity + go mod tidy drift
|
||||
af72630 ci-pipeline-cleanup Phase 3: staticcheck hard-fail (SA1019 sites verified closed)
|
||||
60f368e ci-pipeline-cleanup Phase 2: coverage thresholds → YAML manifest
|
||||
5b7a022 ci-pipeline-cleanup Phase 1: extract 20 regression guards to scripts/ci-guards/
|
||||
d57910c ci-pipeline-cleanup Phase 0: baseline + frozen decisions + Bundle II revisions
|
||||
```
|
||||
|
||||
## Operator action items post-merge
|
||||
|
||||
1. **GitHub branch protection rule update** — required-checks list changes 19 → 7:
|
||||
```
|
||||
Go Build & Test
|
||||
Frontend Build
|
||||
Helm Chart Validation
|
||||
deploy-vendor-e2e
|
||||
image-and-supply-chain
|
||||
Analyze (go)
|
||||
Analyze (javascript-typescript)
|
||||
```
|
||||
Old-name checks (`deploy-vendor-e2e (<vendor>)` × 12, `deploy-vendor-e2e-windows (<vendor>)` × 2) won't appear on new PRs after the workflow change. Operator removes them from the required list.
|
||||
|
||||
2. **RAM-headroom verification** (frozen decision 0.14) — operator runs the collapsed `deploy-vendor-e2e` job on a one-off branch with `docker stats --no-stream` polling. If peak RSS > 12 GB, fall back to bucketed matrix per `cowork/ci-pipeline-cleanup/decisions-revised.md`. If ≤ 12 GB, current single-job design is the final shape.
|
||||
|
||||
3. **Tag** — operator picks the exact `v2.X.0` value (recommended: increment from `v2.0.66`). 11 phase commits land on master after the prior bundle's closing commit.
|
||||
|
||||
## Acceptance gate verified
|
||||
|
||||
All 19 ☐ items from the prompt's "Final acceptance gate" pass except the operator-only items (3 above). Bundle is shippable pending the operator action.
|
||||
@@ -1,73 +0,0 @@
|
||||
# Reddit / HN announce — ci-pipeline-cleanup
|
||||
|
||||
> Don't auto-post. Operator times manually after the tag lands.
|
||||
|
||||
## r/devops / r/golang
|
||||
|
||||
> **certctl 2.X.0 — CI pipeline cleanup: 19 status checks → 7, ci.yml -71%**
|
||||
>
|
||||
> Open-source Go cert lifecycle tool. v2.X.0 ships a CI-only refactor
|
||||
> that drops status checks per push from 19 → 7, shrinks ci.yml from
|
||||
> 1488 lines to ~430 (-71%), closes three lying-field patterns, and
|
||||
> adds five new gates that catch bug classes the prior pipeline missed.
|
||||
>
|
||||
> The 20 named regression guards (G-1 JWT auth, L-001 InsecureSkipVerify,
|
||||
> H-001 bare FROM, G-3 env-docs drift, etc.) extracted from inline
|
||||
> ci.yml bash to sibling scripts/ci-guards/<id>.sh — each callable
|
||||
> locally as `bash scripts/ci-guards/<id>.sh`. Adding a new guard:
|
||||
> drop a new script; CI loop auto-picks it up.
|
||||
>
|
||||
> Coverage thresholds moved to a YAML manifest with per-package `floor:`
|
||||
> + `why:` (load-bearing context — Bundle reference, HEAD measurement,
|
||||
> gap rationale).
|
||||
>
|
||||
> Three lying fields closed:
|
||||
> - staticcheck `continue-on-error: true` (the M-028 work was
|
||||
> effectively done in earlier bundles, just nobody flipped the gate)
|
||||
> - H-001 bare-FROM guard verifies digest *presence* but not
|
||||
> *resolution* (Bundle II shipped 11 fabricated digests that passed
|
||||
> H-001 and failed `docker pull` in CI). New `digest-validity` step
|
||||
> in the new image-and-supply-chain job resolves every @sha256 ref
|
||||
> against its registry.
|
||||
> - Windows IIS matrix that couldn't physically run on windows-latest
|
||||
> (bridge network driver missing on Windows Docker) AND validated
|
||||
> nothing (16 t.Log placeholders). Deleted; moved to operator
|
||||
> playbook for manual Windows-host validation pre-release.
|
||||
>
|
||||
> Five new gates: digest validity, `go mod tidy` drift, gofmt parity
|
||||
> with Makefile::verify, OpenAPI ↔ handler operationId parity (with
|
||||
> documented exceptions YAML), Docker build smoke for all 4 Dockerfiles.
|
||||
>
|
||||
> Repo: <github>/certctl. Operator guide: docs/ci-pipeline.md.
|
||||
|
||||
## Hacker News
|
||||
|
||||
> **certctl: CI pipeline cleanup — 19 status checks → 7, ci.yml -71%**
|
||||
>
|
||||
> Open-source cert lifecycle tool. v2.X.0 ships a CI refactor that
|
||||
> tightens the on-push pipeline without changing any product behavior.
|
||||
>
|
||||
> The interesting bits: collapsed a 12-job per-vendor matrix to one
|
||||
> job + a skip-count enforcement guard (the per-vendor granularity
|
||||
> was fake signal because 115/116 vendor-edge tests are t.Log
|
||||
> placeholders); deleted a Windows IIS CI matrix that couldn't
|
||||
> physically run on windows-latest (Docker not in Windows-containers
|
||||
> mode by default; bridge network driver missing) AND validated
|
||||
> nothing; flipped staticcheck from soft-gate to hard-fail; added
|
||||
> a digest-validity check that closes the lying-field gap H-001's
|
||||
> regex-only check left open.
|
||||
>
|
||||
> Coverage thresholds in a YAML manifest with per-package `why:`
|
||||
> context. 20 regression guards as standalone scripts, each
|
||||
> callable locally. New 3-tier make convention: verify (pre-commit),
|
||||
> verify-deploy (optional pre-push), verify-docs (pre-tag).
|
||||
|
||||
## Discord (announcement channel template)
|
||||
|
||||
> 🚀 v2.X.0 ships ci-pipeline-cleanup — 19 status checks → 7,
|
||||
> ci.yml -71%, 3 lying fields closed, 5 new gates.
|
||||
>
|
||||
> docs/ci-pipeline.md is the new operator guide. scripts/ci-guards/
|
||||
> hosts the 20 named regression guards extracted from inline ci.yml
|
||||
> bash. .github/coverage-thresholds.yml is the per-package floor
|
||||
> manifest. cowork/ci-pipeline-cleanup/ has the bundle artefacts.
|
||||
@@ -1,191 +0,0 @@
|
||||
# certctl v2.X.0 — CI Pipeline Cleanup
|
||||
|
||||
> Operator-facing release notes for the ci-pipeline-cleanup master bundle.
|
||||
> Operator picks the exact `v2.X.0` from the increment-from-the-last-tag rule.
|
||||
|
||||
## TL;DR
|
||||
|
||||
Restructured the on-push CI pipeline. Status checks per push drop from
|
||||
**19 → 7**. `ci.yml` shrinks **1488 → ~430 lines** (-71%). Three lying
|
||||
fields closed (staticcheck soft-gate; H-001's regex-only digest check,
which Bundle II's fabricated digests passed; Windows matrix that validated nothing). Five new
|
||||
gates added (digest validity, `go mod tidy` drift, gofmt parity,
|
||||
OpenAPI ↔ handler parity, Docker build smoke).
|
||||
|
||||
**Zero product behavior changes.** No migrations, no API changes, no
|
||||
connector behavior changes. CI-only refactor.
|
||||
|
||||
## What's new
|
||||
|
||||
### `scripts/ci-guards/` — extracted regression guards (Phase 1)
|
||||
|
||||
20 named regression guards moved from inline `ci.yml` bash to sibling
|
||||
scripts:
|
||||
|
||||
- `G-1-jwt-auth-literal.sh`, `L-001-insecure-skip-verify.sh`,
|
||||
`H-001-bare-from.sh`, `M-012-no-root-user.sh`, `H-009-readme-jwt.sh`,
|
||||
`G-2-api-key-hash-json.sh`, `U-2-plaintext-healthcheck.sh`,
|
||||
`U-3-migration-mount.sh`, `D-1-D-2-statusbadge-phantom.sh`,
|
||||
`L-1-bulk-action-loop.sh`, `B-1-orphan-crud.sh`,
|
||||
`S-2-strings-contains-err.sh`, `G-3-env-docs-drift.sh`,
|
||||
`test-naming-convention.sh`, `S-1-hardcoded-source-counts.sh`,
|
||||
`P-1-documented-orphan-fns.sh`, `T-1-frontend-page-coverage.sh`,
|
||||
`bundle-8-L-015-target-blank-rel-noopener.sh`,
|
||||
`bundle-8-L-019-dangerously-set-inner-html.sh`,
|
||||
`bundle-8-M-009-bare-usemutation.sh`
|
||||
|
||||
Each script is callable locally:
|
||||
|
||||
```bash
|
||||
bash scripts/ci-guards/G-3-env-docs-drift.sh
|
||||
```
|
||||
|
||||
CI step is a single loop that auto-picks up new scripts. Adding a new
|
||||
guard: drop a new `<id>.sh`; no `ci.yml` change required.
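
A loop of this kind can be sketched as follows. This is an assumption about its shape, not the actual `ci.yml` step; the function name `run_guards` is hypothetical, and any filtering of helper scripts that expect arguments is omitted:

```bash
# run_guards <dir>
# Runs every *.sh file in <dir> with bash, prints PASS/FAIL per script,
# and returns non-zero if any guard failed. New scripts are picked up by
# the glob, so adding a guard needs no change to the caller.
run_guards() {
  local dir="$1" script failed=0
  for script in "$dir"/*.sh; do
    [ -e "$script" ] || continue   # glob did not match: no guards present
    if bash "$script"; then
      echo "PASS $(basename "$script")"
    else
      echo "FAIL $(basename "$script")"
      failed=1                     # keep going so every failure is reported
    fi
  done
  return "$failed"
}

# Usage:
#   run_guards scripts/ci-guards
```

Running every guard before failing reports all regressions in a single CI run instead of stopping at the first.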
|
||||
|
||||
The 2 QA-doc guards (Part-count + seed-count) moved to `make verify-docs`
|
||||
instead — they protect docs-the-operator-reads, not anything the
|
||||
product depends on.
|
||||
|
||||
### `.github/coverage-thresholds.yml` (Phase 2)
|
||||
|
||||
Per-package coverage floors moved out of inline bash into a YAML
|
||||
manifest. Each entry has `floor:` (integer percentage) + `why:`
|
||||
(load-bearing context — Bundle reference, HEAD measurement, gap
|
||||
rationale). Adding a new gated package: one YAML entry instead of
|
||||
~30 lines of bash. Floors unchanged from HEAD.
|
||||
|
||||
### `staticcheck` hard gate (Phase 3)

The old `continue-on-error: true` lying field with the "M-028 will
close 6 SA1019 sites" comment is gone. Verified at HEAD: all live
SA1019 sites are either migrated (`middleware.NewAuth` → `NewAuthWithNamedKeys`)
or suppressed inline with load-bearing rationale (`csr.Attributes` for
RFC 2985 challengePassword; `elliptic.Marshal` only in a byte-equivalence
test). The gate is now hard.

### `make verify` parity + `go mod tidy` drift (Phase 4)

Two new steps in `go-build-and-test`:
- **gofmt drift**: closes the parity gap with `Makefile::verify`
  (CI was running vet + lint + test but not gofmt)
- **go mod tidy drift**: `go mod tidy && git diff --exit-code go.mod go.sum`
|
||||
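Both drift checks follow the same pattern: run the tool, then fail if it reports or produces a difference. A rough sketch, assuming `gofmt -l` is the detection mechanism (the text above does not say how the real step is written); the function names and the `GOFMT` override variable are hypothetical.

```bash
# GOFMT is overridable so the check can be exercised without a Go toolchain.
GOFMT="${GOFMT:-gofmt}"

# gofmt_drift PATH...: list files gofmt would rewrite; status 1 if any exist,
# status 2 if the formatter itself could not run.
gofmt_drift() {
  local unformatted
  unformatted="$("$GOFMT" -l "$@")" || return 2
  if [ -n "$unformatted" ]; then
    echo "gofmt drift in:"
    echo "$unformatted"
    return 1
  fi
}

# mod_tidy_drift: the command quoted in the list above, wrapped as a function.
# It fails when `go mod tidy` changes go.mod or go.sum.
mod_tidy_drift() {
  go mod tidy && git diff --exit-code go.mod go.sum
}
```

From the repo root, `gofmt_drift . && mod_tidy_drift` would approximate the two new steps locally.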
### `deploy-vendor-e2e` collapsed: 12 jobs → 1 job (Phase 5)

Per-vendor matrix granularity was fake signal: 115 of 116
vendor-edge tests were verified to be `t.Log` placeholders. A single job
brings up all 11 sidecars at once, runs the full `VendorEdge_` suite, and
enforces the skip count (no sidecar may silently fail to come up).

NEW `scripts/ci-guards/vendor-e2e-skip-check.sh` + allowlist file at
`scripts/ci-guards/vendor-e2e-skip-allowlist.txt` (15
windows-iis-requiring tests legitimately skip on Linux per Phase 6).
|
||||
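One way such a skip check could work is to compare the `--- SKIP:` lines of `go test -v` output against the allowlist. The sketch below is an assumption about the approach, not the contents of `vendor-e2e-skip-check.sh`: the function name, the one-name-per-line allowlist format, and the `#` comment syntax are all hypothetical.

```bash
# unexpected_skips LOG ALLOWLIST: print the names of skipped tests found in
# LOG (go test -v output) that are not listed in ALLOWLIST (assumed format:
# one test name per line, "#" starts a comment). Non-zero status if any exist.
unexpected_skips() {
  local log="$1" allowlist="$2" extra
  extra="$(awk '
    # First file: collect allowed names, ignoring blank and comment lines.
    FILENAME == ARGV[1] { if (NF > 0 && $1 !~ /^#/) allowed[$1] = 1; next }
    # Second file: go test -v prints "--- SKIP: TestName (0.00s)".
    $1 == "---" && $2 == "SKIP:" && !($3 in allowed) { print $3 }
  ' "$allowlist" "$log" | sort -u)"
  if [ -n "$extra" ]; then
    echo "unexpected skips:"
    echo "$extra"
    return 1
  fi
}
```

An allowlist keeps the legitimate Linux-side skips visible in review while still failing the job when a sidecar that should have come up causes a new skip.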
**Revises Bundle II frozen decision 0.9.** Documented in
`cowork/ci-pipeline-cleanup/decisions-revised.md`.

### `deploy-vendor-e2e-windows` deleted entirely (Phase 6)

The Windows matrix can't physically work on `windows-latest` GitHub
runners (Docker is not started in Windows-containers mode by default, and
the `bridge` network driver is missing on Windows Docker, which uses `nat`).
Even if that were fixed, all 16 IIS + WinCertStore tests are `t.Log` placeholders.

NEW `docs/connector-iis.md::Operator validation playbook` documents
the manual-on-Windows-host procedure operators run pre-release. The
`windows-iis-test` sidecar stays in `deploy/docker-compose.test.yml`
under `profiles: [deploy-e2e-windows]` for operator local use.

`docs/deployment-vendor-matrix.md` IIS + WinCertStore rows status
updated `pending` → `operator-playbook`.

**Revises Bundle II frozen decision 0.4.** Documented in
`cowork/ci-pipeline-cleanup/decisions-revised.md`.

### NEW `image-and-supply-chain` job (Phases 7-9)

Top-level Ubuntu job (~3 min, parallel to `go-build-and-test`). Three
steps:

1. **Digest validity**: every `@sha256:<digest>` ref in
   `deploy/**/*.{yml,Dockerfile*}` must resolve on its registry.
   Closes the H-001 lying-field gap (H-001 verifies digest *presence*
   only; Bundle II shipped 11 fabricated digests that passed H-001
   and failed `docker pull` in CI).
2. **Docker build smoke**: all 4 Dockerfiles in the repo must build
   (`Dockerfile`, `Dockerfile.agent`,
   `deploy/test/f5-mock-icontrol/Dockerfile`,
   `deploy/test/libest/Dockerfile`).
3. **OpenAPI ↔ handler operationId parity**: every router route has
   a matching `operationId` in `api/openapi.yaml` or is documented in
   the new `api/openapi-handler-exceptions.yaml` (8 documented
   exceptions at HEAD: SCEP + SCEP-mTLS wire-protocol endpoints).
|
||||
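The digest-validity step (item 1 above) amounts to extracting every pinned reference and asking the registry whether it exists. A minimal sketch under stated assumptions: the function names are hypothetical, and `docker manifest inspect` is used here as one way to probe a registry; the actual job may use a different tool.

```bash
# list_digest_refs FILE...: print each unique image@sha256:<64 hex> reference
# found in the given files.
list_digest_refs() {
  grep -ohE '[A-Za-z0-9._/:-]+@sha256:[0-9a-f]{64}' "$@" | sort -u
}

# digest_resolves REF: succeed if the registry serves a manifest for REF.
# Assumes a docker CLI with `manifest inspect` and network access.
digest_resolves() {
  docker manifest inspect "$1" >/dev/null 2>&1
}

# check_digests FILE...: report every reference that does not resolve.
check_digests() {
  local ref bad=0
  # References contain no whitespace, so word splitting is safe here.
  for ref in $(list_digest_refs "$@"); do
    if ! digest_resolves "$ref"; then
      echo "unresolvable: $ref"
      bad=1
    fi
  done
  return "$bad"
}
```

The distinction from a presence-only check is the `digest_resolves` call: a well-formed but fabricated digest passes the regular expression and still fails the registry lookup.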
### Coverage PR-comment action (Phase 10)

Self-hosted alternative to Codecov / Coveralls. Posts a per-package
coverage table as a PR comment and updates it in place on subsequent
pushes. No paid SaaS dependency.

### `make verify-docs` + `make verify-deploy` (Phase 11)

Three-tier convention now:
- `make verify`: required pre-commit (gofmt + vet + lint + test)
- `make verify-deploy`: optional pre-push (digest validity + OpenAPI
  parity + Docker build smoke for server + agent)
- `make verify-docs`: required pre-tag (QA-doc Part-count + seed-count)

### NEW `docs/ci-pipeline.md` (Phase 12)

Operator-facing guide to the on-push pipeline: per-job deep-dive,
guard inventory, threshold management, troubleshooting matrix, and the
branch protection list to update.

## Operator action required

After merge:

1. **Update GitHub branch protection rule** for `master` branch.
   Required-checks list changes from 19 entries → 7:
   - `Go Build & Test`
   - `Frontend Build`
   - `Helm Chart Validation`
   - `deploy-vendor-e2e`
   - `image-and-supply-chain`
   - `Analyze (go)`
   - `Analyze (javascript-typescript)`

2. **(Optional)** RAM-headroom verification on a test branch with the
   collapsed `deploy-vendor-e2e` job. If peak RSS > 12 GB on
   ubuntu-latest, fall back to bucketed matrix per
   `cowork/ci-pipeline-cleanup/decisions-revised.md`.

## Rollback

If RAM headroom proves insufficient or a guard misbehaves:

- Vendor matrix collapse (Phase 5): revert that one commit; fall back
  to the bucketed-matrix design (3 jobs × ~4 sidecars).
- staticcheck hard gate (Phase 3): revert that one commit; flip
  `continue-on-error: true` back temporarily until the new SA1019
  site is closed.
- All other phases are pure-additive or pure-extraction; reverting
  any single Phase commit restores the prior behavior.

## Verification

```
make verify                                  # pre-commit gate (existing)
make verify-deploy                           # optional pre-push (new)
make verify-docs                             # pre-tag (new)
# all 20 guards locally; `bash a.sh b.sh` would run only the first script
for g in scripts/ci-guards/*.sh; do bash "$g" || break; done
bash scripts/check-coverage-thresholds.sh    # only after coverage.out exists
```

All passing on HEAD.

## Tag

Operator picks the exact `v2.X.0` value. Bundle ships ~13 commits
on master after the prior bundle's closing commit (HEAD `1de61e91`).
@@ -198,7 +198,9 @@ docker compose -f deploy/docker-compose.yml down -v
|
||||
|
||||
### What it adds
|
||||
|
||||
One line: mounts `seed_demo.sql` into PostgreSQL's init directory. This 667-line SQL file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
|
||||
One env var: `CERTCTL_DEMO_SEED=true` on the `certctl-server` service. The server applies `migrations/seed_demo.sql` at boot via `postgres.RunDemoSeed` AFTER the baseline migrations + `seed.sql` are in place. The demo seed file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
|
||||
|
||||
Pre-U-3 the overlay used to mount `seed_demo.sql` into PostgreSQL's `/docker-entrypoint-initdb.d/` and rely on initdb-time application. That worked only because the production stack also mounted the migrations there, so the schema existed when initdb ran. Once U-3 dropped the production initdb mounts (single source of truth: server runs `RunMigrations` + `RunSeed` at boot), the demo seed could no longer be applied at initdb time — the tables it references wouldn't exist yet. Post-U-3 the overlay is a 27-line override file with no `image:` / `build:` of its own; it MUST be passed alongside the base, or compose errors with `service "certctl-server" has neither an image nor a build context specified`.
|
||||
|
||||
### Starting it
|
||||
|
||||
|
||||
@@ -133,6 +133,15 @@ services:
|
||||
CERTCTL_KEYGEN_MODE: server # Demo uses server-side keygen; production should use "agent"
|
||||
CERTCTL_NETWORK_SCAN_ENABLED: "true" # Enable network scan GUI with seeded demo targets
|
||||
CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key} # AES-256-GCM for dynamic issuer/target config
|
||||
# Bundle 1 follow-on: this compose IS the bundled demo path
|
||||
# (CERTCTL_AUTH_TYPE=none + KEYGEN_MODE=server above), so the
|
||||
# demo seed runs by default. seed_demo.sql pre-seeds the
|
||||
# agent-demo-1 row that the bundled certctl-agent below needs
|
||||
# to authenticate. The docker-compose.demo.yml overlay still
|
||||
# works (it sets the same flag) and remains for backward
|
||||
# compat. Production deploys override CERTCTL_AUTH_TYPE +
|
||||
# KEYGEN_MODE + DEMO_SEED via their own compose.
|
||||
CERTCTL_DEMO_SEED: "true"
|
||||
ports:
|
||||
- "8443:8443"
|
||||
volumes:
|
||||
@@ -183,6 +192,17 @@ services:
|
||||
CERTCTL_SERVER_URL: https://certctl-server:8443
|
||||
CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
|
||||
CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
|
||||
# Bundle 1 follow-on: pre-Bundle-1 the bundled agent had no
|
||||
# CERTCTL_AGENT_ID set, hit cmd/agent/main.go's fail-fast guard
|
||||
# ("agent-id flag or CERTCTL_AGENT_ID env var is required"), and
|
||||
# restart-looped silently on every fresh `docker compose up`.
|
||||
# Latent since 2026-03-14 (commit d395776). seed_demo.sql now
|
||||
# pre-seeds the matching agents row; the demo runs with
|
||||
# CERTCTL_AUTH_TYPE=none on the server so the api_key Bearer
|
||||
# token is irrelevant here. Production deploys override
|
||||
# CERTCTL_AGENT_ID with the value returned from
|
||||
# POST /api/v1/agents during registration.
|
||||
CERTCTL_AGENT_ID: ${CERTCTL_AGENT_ID:-agent-demo-1}
|
||||
CERTCTL_AGENT_NAME: docker-agent
|
||||
CERTCTL_LOG_LEVEL: info
|
||||
CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys # Agent scans this directory for existing certificates
|
||||
|
||||
@@ -202,8 +202,8 @@ Any template that consumes .Values.server.auth.type should call
|
||||
runs once per affected resource. No-op when configured correctly.
|
||||
*/}}
|
||||
{{- define "certctl.validateAuthType" -}}
|
||||
{{- $valid := list "api-key" "none" -}}
|
||||
{{- $valid := list "api-key" "none" "oidc" -}}
|
||||
{{- if not (has .Values.server.auth.type $valid) -}}
|
||||
{{- fail (printf "\n\nserver.auth.type=%q is not supported (valid: %v).\n\nFor JWT/OIDC, run an authenticating gateway in front of certctl\n(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and\nset server.auth.type=none here so the gateway terminates federated\nidentity. See docs/architecture.md \"Authenticating-gateway pattern\"\nand docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.\n\nG-1 audit closure: pre-G-1 the chart accepted type=jwt and the binary\nsilently downgraded to api-key middleware. The chart now fails at\ntemplate time so misconfigured deployments cannot ship.\n" .Values.server.auth.type $valid) -}}
|
||||
{{- fail (printf "\n\nserver.auth.type=%q is not supported (valid: %v).\n\nFor JWT/SAML/LDAP, run an authenticating gateway in front of certctl\n(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and\nset server.auth.type=none here so the gateway terminates federated\nidentity. See docs/architecture.md \"Authenticating-gateway pattern\"\nand docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.\n\nG-1 audit closure: pre-G-1 the chart accepted type=jwt and the binary\nsilently downgraded to api-key middleware. The chart now fails at\ntemplate time so misconfigured deployments cannot ship.\n\nAuth Bundle 2 Phase 0: server.auth.type=oidc is in the valid set but\nthe OIDC handler chain ships in later Bundle 2 phases. Pre-Bundle-2\noperators who set type=oidc see the certctl-server container exit at\nstartup with an actionable error — chart-time validation no longer\nblocks deploy because the binary's runtime guard takes over. Once\nBundle 2 lands, the runtime guard relaxes and OIDC works end-to-end.\n" .Values.server.auth.type $valid) -}}
|
||||
{{- end -}}
|
||||
{{- end }}
|
||||
|
||||
@@ -6,8 +6,8 @@
|
||||
# Per H-001 guard: every FROM is digest-pinned. Operator re-pins
|
||||
# quarterly per docs/deployment-vendor-matrix.md.
|
||||
|
||||
# golang:1.25.9-bookworm digest pinned per H-001.
|
||||
FROM golang:1.25.9-bookworm@sha256:1a1408bf8d2d3077f9508880caf0e8bb0fde195fe3c890e7ea480dfb66dc7827 AS builder
|
||||
# golang:1.25.10-bookworm digest pinned per H-001.
|
||||
FROM golang:1.25.10-bookworm@sha256:e3a54b77385b4f8a31c1db4d12429ffb3718ea76865731a787c497755d409547 AS builder
|
||||
WORKDIR /src
|
||||
COPY deploy/test/f5-mock-icontrol/ ./
|
||||
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -trimpath -ldflags "-s -w" -o /out/f5-mock-icontrol .
|
||||
|
||||
@@ -1,3 +1,3 @@
|
||||
module github.com/certctl-io/certctl/deploy/test/f5-mock-icontrol
|
||||
|
||||
go 1.25.9
|
||||
go 1.25.10
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# certctl Load-Test Harness
|
||||
|
||||
Closes the **#8 acquisition-readiness blocker** from the 2026-05-01 issuer
|
||||
coverage audit (`cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`).
|
||||
coverage audit (the 2026-05-01 issuer coverage audit).
|
||||
Pre-fix, certctl had zero benchmarks or load tests for any API path; an
|
||||
acquirer evaluating "can certctl handle our 50k-cert fleet at 47-day
|
||||
rotation" had nothing to point at. This harness is the substantiation.
|
||||
@@ -354,6 +354,6 @@ verification.
|
||||
|
||||
## Audit references
|
||||
|
||||
- API tier: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` fix #8.
|
||||
- Connector tier: `cowork/deployment-target-audit-2026-05-02/RESULTS.md` Bundle 10.
|
||||
- ACME flows: Phase 5 master prompt (`cowork/acme-server-prompts/06-phase-5-certmanager-hardening-prompt.md`).
|
||||
- API tier: 2026-05-01 issuer coverage audit fix #8.
|
||||
- Connector tier: 2026-05-02 deployment-target audit Bundle 10.
|
||||
- ACME flows: Phase 5 master prompt (project notes).
|
||||
|
||||
@@ -53,8 +53,8 @@
|
||||
# Usage: make loadtest (from the repo root)
|
||||
# Manual: cd deploy/test/loadtest && docker compose up --abort-on-container-exit --exit-code-from k6
|
||||
#
|
||||
# Audit reference (API tier): cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
|
||||
# Audit reference (connector tier): cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
|
||||
# Audit reference (API tier): 2026-05-01 issuer coverage audit fix #8.
|
||||
# Audit reference (connector tier): 2026-05-02 deployment-target audit Bundle 10.
|
||||
# =============================================================================
|
||||
|
||||
services:
|
||||
|
||||
@@ -60,8 +60,8 @@
|
||||
// tests are too slow to gate per-PR signal).
|
||||
//
|
||||
// Audit references:
|
||||
// - API tier: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md fix #8.
|
||||
// - Connector tier: cowork/deployment-target-audit-2026-05-02/RESULTS.md Bundle 10.
|
||||
// - API tier: 2026-05-01 issuer coverage audit fix #8.
|
||||
// - Connector tier: 2026-05-02 deployment-target audit Bundle 10.
|
||||
|
||||
import http from 'k6/http';
|
||||
import { check } from 'k6';
|
||||
|
||||
+135
@@ -0,0 +1,135 @@
|
||||
# certctl Documentation
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
The full docs index, organized by audience. Pick the section that matches what you need to do; each link below opens a focused doc rather than a wall of text.
|
||||
|
||||
For the elevator pitch and quickstart commands, see the repo `README.md` at the root. For the marketing site, see [certctl.io](https://certctl.io).
|
||||
|
||||
---
|
||||
|
||||
## Getting Started
|
||||
|
||||
You're new to certctl, just cloned the repo, or want to understand what it does before installing.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Concepts](getting-started/concepts.md) | TLS certificates explained for beginners — CAs, ACME, EST, private keys, the full glossary |
| [Quickstart](getting-started/quickstart.md) | Five-minute setup with Docker Compose, dashboard tour, API tour |
| [Examples](getting-started/examples.md) | Five turnkey scenarios — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer |
| [Advanced demo](getting-started/advanced-demo.md) | End-to-end certificate lifecycle with technical depth at each step |
| [Why certctl](getting-started/why-certctl.md) | Positioning vs ACME clients, agent-based SaaS, enterprise platforms; when to look elsewhere |
|
||||
## Reference
|
||||
|
||||
You're operating certctl in production or building integrations and need authoritative technical detail.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Architecture](reference/architecture.md) | System design, data flow, security model, deployment topologies |
| [Profiles](reference/profiles.md) | CertificateProfile policy object — issuer wiring, EKUs, RequiresApproval gate (with profile-edit closure) |
| [API](reference/api.md) | OpenAPI 3.1 spec, integration patterns, client SDK generation |
| [CLI](reference/cli.md) | certctl-cli command reference and CI/CD integration patterns |
| [Configuration](reference/configuration.md) | `CERTCTL_*` environment variable reference (scheduler, rate limits, deploy verify, audit, agent) |
| [MCP server](reference/mcp.md) | Model Context Protocol integration for AI assistants |
| [Release verification](reference/release-verification.md) | Cosign / SLSA / SBOM verification procedure |
| [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md) | Multi-level CA tree management — RFC 5280 §3.2/§4.2.1.9/§4.2.1.10 enforcement |
| [Auth standards implemented](reference/auth-standards-implemented.md) | RFC + CWE evidence for the API-key + RBAC + OIDC + sessions + break-glass surface (NOT a compliance-mapping doc) |
| [Deployment model](reference/deployment-model.md) | Atomic write, post-deploy verify, rollback semantics across all targets |
| [Vendor matrix](reference/vendor-matrix.md) | Tested vendor versions per target connector |
|
||||
### Connectors
|
||||
|
||||
The [connector index](reference/connectors/index.md) is the canonical catalog (interfaces, registry, scanners, plus an inline reference per built-in). Per-connector deep-dive siblings cover operator-grade material — vendor edges, troubleshooting, rotation playbooks, when-to-use vs alternatives.
|
||||
|
||||
**Issuers** (13 deep-dives): [ACME](reference/connectors/acme.md) · [ADCS](reference/connectors/adcs.md) · [AWS ACM Private CA](reference/connectors/aws-acm-pca.md) · [DigiCert](reference/connectors/digicert.md) · [EJBCA / Keyfactor](reference/connectors/ejbca.md) · [Entrust](reference/connectors/entrust.md) · [GlobalSign Atlas HVCA](reference/connectors/globalsign.md) · [Google CAS](reference/connectors/google-cas.md) · [Local CA](reference/connectors/local-ca.md) · [OpenSSL / Custom CA](reference/connectors/openssl.md) · [Sectigo SCM](reference/connectors/sectigo.md) · [step-ca / Smallstep](reference/connectors/step-ca.md) · [Vault PKI](reference/connectors/vault.md)
|
||||
|
||||
**Targets** (15 deep-dives): [Apache](reference/connectors/apache.md) · [AWS Certificate Manager](reference/connectors/aws-acm.md) · [Azure Key Vault](reference/connectors/azure-kv.md) · [Caddy](reference/connectors/caddy.md) · [Envoy](reference/connectors/envoy.md) · [F5 BIG-IP](reference/connectors/f5.md) · [HAProxy](reference/connectors/haproxy.md) · [IIS](reference/connectors/iis.md) · [Java Keystore](reference/connectors/jks.md) · [Kubernetes Secrets](reference/connectors/k8s.md) · [NGINX](reference/connectors/nginx.md) · [Postfix / Dovecot](reference/connectors/postfix.md) · [SSH (agentless)](reference/connectors/ssh.md) · [Traefik](reference/connectors/traefik.md) · [Windows Certificate Store](reference/connectors/wincertstore.md)
|
||||
|
||||
### Protocols
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [ACME server](reference/protocols/acme-server.md) | Run certctl as an RFC 8555 + RFC 9773 ARI ACME server |
| [ACME server threat model](reference/protocols/acme-server-threat-model.md) | Security posture for the ACME server endpoint |
| [SCEP server](reference/protocols/scep-server.md) | RFC 8894 native SCEP server — RA cert config, multi-profile dispatch, must-staple, mTLS sibling route |
| [SCEP for Microsoft Intune](reference/protocols/scep-intune.md) | Intune-specific deployment guide — NDES replacement playbook |
| [EST server](reference/protocols/est.md) | RFC 7030 EST server — 802.1X / Wi-Fi enrollment, IoT bootstrap, channel binding |
| [CRL & OCSP](reference/protocols/crl-ocsp.md) | RFC 5280 CRL + RFC 6960 OCSP responder for relying parties |
| [Async CA polling](reference/protocols/async-ca-polling.md) | Bounded polling for async-CA issuer connectors |
|
||||
## Operator
|
||||
|
||||
You're running certctl in production and need operational guidance.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Security posture](operator/security.md) | Auth, rate limits, encryption at rest, key rotation, RBAC + OIDC + sessions + break-glass, bootstrap |
| [RBAC operator reference](operator/rbac.md) | Roles, permissions, scopes, scope-down + day-0 bootstrap |
| [Auth threat model](operator/auth-threat-model.md) | API-key + RBAC + OIDC + sessions + break-glass — token forgery, session hijacking, IdP compromise, role-grant abuse, bootstrap-token leak, audit-mutation |
| [OIDC / SSO runbooks](operator/oidc-runbooks/index.md) | Per-IdP setup guides — Keycloak, Authentik, Okta, Auth0, Entra ID, Google Workspace |
| [Control plane TLS](operator/tls.md) | Self-signed bootstrap, operator-supplied Secret, cert-manager Certificate CR |
| [Database TLS](operator/database-tls.md) | PostgreSQL transport encryption |
| [Approval workflow](operator/approval-workflow.md) | Two-person integrity gate for high-stakes issuance + profile-edit closure |
| [Helm deployment](operator/helm-deployment.md) | Kubernetes installation via the bundled chart |
| [Performance baselines](operator/performance-baselines.md) | Operator-runnable benchmarks for regression spot checks |
| [Auth benchmarks](operator/auth-benchmarks.md) | Session + OIDC validation p99 targets and measured baselines |
| [Legacy clients (TLS 1.2)](operator/legacy-clients-tls-1.2.md) | Reverse-proxy runbook for embedded EST/SCEP clients on TLS 1.2 |
|
||||
### Runbooks
|
||||
|
||||
| Runbook | When |
|---|---|
| [Cloud targets](operator/runbooks/cloud-targets.md) | AWS ACM + Azure Key Vault deployment, debugging, rollback |
| [Expiry alerts](operator/runbooks/expiry-alerts.md) | Per-policy multi-channel routing matrix, severity tiers |
| [Disaster recovery](operator/runbooks/disaster-recovery.md) | CRL cache, OCSP responder cert, CA private-key rotation, Postgres restore |
|
||||
## Migration
|
||||
|
||||
You're moving from another cert-management tool to certctl, or running both in parallel.
|
||||
|
||||
| From | Doc |
|---|---|
| Certbot | [migration/from-certbot.md](migration/from-certbot.md) |
| acme.sh | [migration/from-acmesh.md](migration/from-acmesh.md) |
| cert-manager (coexistence, not replacement) | [migration/cert-manager-coexistence.md](migration/cert-manager-coexistence.md) |
| Caddy ACME (point Caddy at certctl) | [migration/acme-from-caddy.md](migration/acme-from-caddy.md) |
| cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) |
| Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) |
| **API keys → RBAC (v2.0.x → v2.1.0)** | [migration/api-keys-to-rbac.md](migration/api-keys-to-rbac.md) — **AUDIT YOUR API KEYS** post-upgrade |
| **Enable OIDC SSO** | [migration/oidc-enable.md](migration/oidc-enable.md) — step-by-step OIDC onboarding for an existing API-key + RBAC deployment |
|
||||
## Contributor
|
||||
|
||||
You're contributing to certctl, running tests locally, or trying to understand the CI pipeline.
|
||||
|
||||
| Doc | What it covers |
|---|---|
| [Testing strategy](contributor/testing-strategy.md) | What we test and why; per-PR fast gates vs daily deep-scan |
| [Test environment](contributor/test-environment.md) | Local environment with real CAs (Pebble, step-ca, etc.) |
| [QA prerequisites](contributor/qa-prerequisites.md) | Before running QA: stack boot, demo data baseline, env vars |
| [QA test suite](contributor/qa-test-suite.md) | qa_test.go reference for release QA |
| [GUI QA checklist](contributor/gui-qa-checklist.md) | Manual GUI verification pass for release |
| [Release sign-off](contributor/release-sign-off.md) | Release-day checklist — code state, automated gates, manual QA, artefact verification |
| [CI pipeline](contributor/ci-pipeline.md) | CI shape, regression guards, adding new checks |
|
||||
## Archive
|
||||
|
||||
Historical docs preserved for reference. Most operators don't need these.
|
||||
|
||||
| Doc | Why archived |
|---|---|
| [Upgrade to TLS (v2.2)](archive/upgrades/to-tls-v2.2.md) | Pre-v2.2 HTTPS-everywhere upgrade procedure |
| [Upgrade past v2 JWT removal](archive/upgrades/to-v2-jwt-removal.md) | G-1 milestone JWT auth removal procedure |
|
||||
---
|
||||
|
||||
## Reading order by role
|
||||
|
||||
**First-time operator:** [Concepts](getting-started/concepts.md) → [Quickstart](getting-started/quickstart.md) → [Examples](getting-started/examples.md). About 90 minutes end to end.
|
||||
|
||||
**Production operator:** [Architecture](reference/architecture.md) → [Security posture](operator/security.md) → [Control plane TLS](operator/tls.md) → [Disaster recovery runbook](operator/runbooks/disaster-recovery.md). About 4 hours end to end.
|
||||
|
||||
**PKI engineer:** [ACME server](reference/protocols/acme-server.md) → [SCEP server](reference/protocols/scep-server.md) → [EST server](reference/protocols/est.md) → [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md). About 6 hours end to end.
|
||||
|
||||
**Contributor:** [Architecture](reference/architecture.md) → [Testing strategy](contributor/testing-strategy.md) → [Test environment](contributor/test-environment.md) → [CI pipeline](contributor/ci-pipeline.md). About 3 hours end to end.
|
||||
@@ -1,10 +1,18 @@
|
||||
# Upgrading to HTTPS-Everywhere (v2.2)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Archived 2026-05-05.** This upgrade guide applies to certctl < v2.2.
|
||||
> Current operators on v2.2+ already have HTTPS-only control planes and
|
||||
> don't need this procedure. For the steady-state TLS reference, see
|
||||
> [`docs/operator/tls.md`](../../operator/tls.md). Preserved here for
|
||||
> late upgraders coming off pre-v2.2 releases.
|
||||
|
||||
certctl's control plane is HTTPS-only as of v2.2. There is no `http` mode, no `auto` mode, no dual-listener bind, no N-release migration window. The cutover is a single step. Out-of-date agents that still point at `http://…` fail at the TCP/TLS handshake layer on first connect after the upgrade and stay `Offline` in the dashboard until their env block is updated and the fleet is rolled.
|
||||
|
||||
This doc walks operators through the cutover for the two shipped deployment topologies — docker-compose and Helm — and documents the failure modes and rollback posture explicitly.
|
||||
|
||||
For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](tls.md). This doc is the narrow "how do I upgrade" procedure.
|
||||
For the deep-dive on cert provisioning patterns, SIGHUP cert reload, and client-side CA-trust configuration, read [`tls.md`](../../operator/tls.md). This doc is the narrow "how do I upgrade" procedure.
|
||||
|
||||
## Preconditions
|
||||
|
||||
@@ -22,7 +30,7 @@ There is no schema migration tied to this release; the only at-rest state that c
|
||||
|
||||
## Procedure — docker-compose operators
|
||||
|
||||
The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)
|
||||
The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init container that self-signs an ECDSA-P256 (SHA-256 signature) cert on first boot and drops `server.crt`, `server.key`, and `ca.crt` into a named volume mounted read-only at `/etc/certctl/tls/` on the server and agent containers. No manual cert provisioning is required for the default stack. (Pre-v2.0.48 this was an ed25519 cert; see [`tls.md`](../../operator/tls.md) Pattern 1 for the rationale and the `down -v && up --build` migration note.)
|
||||
|
||||
1. **Pull the HTTPS-everywhere release.** From the repo root:
|
||||
|
||||
@@ -68,7 +76,7 @@ The shipped `deploy/docker-compose.yml` includes a `certctl-tls-init` init conta
|
||||
|
||||
## Procedure — Helm operators
|
||||
|
||||
The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](tls.md) for the full pattern catalog.
|
||||
The Helm chart does not self-sign. It refuses to render (`helm template` exits non-zero) unless you configure one of two cert sources: an operator-supplied Secret, or a cert-manager `Certificate` CR. See [`tls.md`](../../operator/tls.md) for the full pattern catalog.
|
||||
|
||||
1. **Provision cert material.** Pick one of:
|
||||
|
||||
@@ -182,13 +190,13 @@ Once every agent is `Online`, confirm a few invariants:
|
||||
- `curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:8443/health` returns `000` with `Connection refused` (no HTTP listener). Plaintext is gone.
|
||||
- `openssl s_client -connect localhost:8443 -tls1_2 </dev/null` fails the handshake. TLS 1.2 is rejected.
|
||||
- `openssl s_client -connect localhost:8443 -tls1_3 </dev/null` succeeds and prints the server's SAN list. TLS 1.3 is live.
|
||||
- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](tls.md).
|
||||
- A cert rotation test: overwrite the server cert on disk, `kill -HUP` the server PID, confirm the new cert serves on the next `openssl s_client -connect … -showcerts` without a process restart. See the SIGHUP section in [`tls.md`](../../operator/tls.md).
|
||||
|
||||
Update your runbooks. Every `http://certctl.example.com` URL in internal documentation, monitoring config, and on-call playbooks should become `https://certctl.example.com` plus a CA-trust note.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`tls.md`](tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
|
||||
- [`quickstart.md`](quickstart.md) — docker-compose walkthrough (post-HTTPS)
|
||||
- [`test-env.md`](test-env.md) — integration test environment (HTTPS-only)
|
||||
- [`tls.md`](../../operator/tls.md) — cert provisioning patterns, SIGHUP rotation, troubleshooting
|
||||
- [`quickstart.md`](../../getting-started/quickstart.md) — docker-compose walkthrough (post-HTTPS)
|
||||
- [`test-env.md`](../../contributor/test-environment.md) — integration test environment (HTTPS-only)
|
||||
- Milestone spec: `prompts/https-everywhere-milestone.md`
|
||||
@@ -1,8 +1,17 @@
|
||||
# Upgrading past G-1 — `CERTCTL_AUTH_TYPE=jwt` removal
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Archived 2026-05-05.** This upgrade guide applies to operators
|
||||
> upgrading past the G-1 milestone (the `CERTCTL_AUTH_TYPE=jwt` removal).
|
||||
> Current operators on post-G-1 releases don't need this. For the
|
||||
> steady-state security posture reference, see
|
||||
> [`docs/operator/security.md`](../../operator/security.md). Preserved
|
||||
> here for late upgraders.
|
||||
|
||||
If your certctl deployment currently sets `CERTCTL_AUTH_TYPE=jwt` (or `server.auth.type=jwt` in Helm), the next certctl upgrade will fail-fast at startup with a dedicated diagnostic. This guide explains why, what to switch to, and how to keep JWT/OIDC at your edge.
|
||||
|
||||
For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`upgrade-to-tls.md`](upgrade-to-tls.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.
|
||||
For everyone else — operators running `api-key` or `none` — this upgrade is a no-op. Skip to [`to-tls-v2.2.md`](to-tls-v2.2.md) for the v2.2 HTTPS-everywhere migration if you haven't done that one yet.
|
||||
|
||||
## Why we removed it
|
||||
|
||||
@@ -98,7 +107,7 @@ services:
|
||||
# ... rest of the certctl env block unchanged
|
||||
```
|
||||
|
||||
Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).
|
||||
Operators hit `https://<your-host>/`, get redirected through the OIDC provider, land back at oauth2-proxy with a session cookie, and oauth2-proxy proxies their request to certctl on the internal Docker network. certctl itself is HTTPS-only on `:8443` (TLS 1.3, see [`tls.md`](../../operator/tls.md)) but operator browsers never see that hop directly. Bind certctl-server's `:8443` to the internal Docker network only — do NOT publish it to the host. The audit trail will record the actor as the gateway-forwarded identity if you also configure a small bearer-token-mapping shim at the gateway (most production deployments do this with a per-user api-key issued by the gateway after OIDC validation).
|
||||
|
||||
### Traefik ForwardAuth pattern (Kubernetes)
|
||||
|
||||
@@ -147,8 +156,8 @@ There is no on-disk state that changes with this upgrade — no migrations to ro
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`architecture.md`](architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
|
||||
- [`tls.md`](tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
|
||||
- [`architecture.md`](../../reference/architecture.md) — "Authenticating-gateway pattern (JWT, OIDC, mTLS)" section.
|
||||
- [`tls.md`](../../operator/tls.md) — TLS provisioning patterns. The gateway proxying to certctl-server still needs to trust certctl's TLS cert; same patterns apply.
|
||||
- [`../deploy/helm/certctl/README.md`](../deploy/helm/certctl/README.md) — Helm-chart-flavored guidance.
|
||||
- `internal/config/config.go::ValidAuthTypes` — the single source of truth for what's accepted post-G-1.
|
||||
- `internal/repository/postgres/db.go::wrapPingError` — unrelated; pattern for runtime diagnostic of operator misconfiguration.
|
||||
@@ -1,341 +0,0 @@
|
||||
# NIST SP 800-57 Key Management Alignment
|
||||
|
||||
NIST SP 800-57 Part 1 Rev 5 (May 2020) is the authoritative US government guidance on cryptographic key management. This document maps certctl's implementation to its recommendations. certctl follows NIST guidance where applicable; this guide documents the alignment and identifies gaps for future roadmap planning.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Key Generation (Section 6.1)](#key-generation-section-61)
|
||||
2. [Key Storage and Protection (Sections 6.3, 6.4)](#key-storage-and-protection-sections-63-64)
|
||||
3. [Cryptoperiods (Section 5.3, Table 1)](#cryptoperiods-section-53-table-1)
|
||||
4. [Key States and Transitions (Section 5.2)](#key-states-and-transitions-section-52)
|
||||
5. [Algorithm Recommendations (Section 5.1, SP 800-131A)](#algorithm-recommendations-section-51-sp-800-131a)
|
||||
6. [Key Distribution and Transport (Section 6.2)](#key-distribution-and-transport-section-62)
|
||||
7. [Revocation and Compromise (NIST SP 800-57 Part 3)](#revocation-and-compromise-nist-sp-800-57-part-3)
|
||||
8. [Alignment Summary Table](#alignment-summary-table)
|
||||
9. [Gaps and Remediation Roadmap](#gaps-and-remediation-roadmap)
|
||||
- [V2 (Current)](#v2-current)
|
||||
- [V2.2 (Planned: 2026)](#v22-planned-2026)
|
||||
- [V3 (Planned: 2026)](#v3-planned-2026)
|
||||
- [V3 Pro (Planned)](#v3-pro-planned)
|
||||
- [Post-Quantum (2027+)](#post-quantum-2027)
|
||||
10. [References](#references)
|
||||
11. [Questions or Corrections?](#questions-or-corrections)
|
||||
|
||||
## Key Generation (Section 6.1)
|
||||
|
||||
certctl generates certificate keys on agent infrastructure using Go's `crypto/rand` for entropy, backed by the kernel CSPRNG on Linux (`getrandom()` / `/dev/urandom`) and the system CSPRNG on Windows. Key generation happens as follows:
|
||||
|
||||
**Agent-Side Key Generation (Production Default)**
|
||||
- Agents generate ECDSA P-256 key pairs per certificate using `crypto/ecdsa` + `crypto/elliptic` (Go stdlib)
|
||||
- Key generation triggered by `AwaitingCSR` job state in renewal/issuance workflows
|
||||
- Agent creates Certificate Signing Request (CSR) with `x509.CreateCertificateRequest`, signed with the agent's private key
|
||||
- Only the CSR crosses the network to the control plane; private key material never leaves the agent
|
||||
- Configuration: `CERTCTL_KEYGEN_MODE=agent` (default, production)
|
||||
|
||||
**Server-Side Key Generation (Demo Only)**
|
||||
- Available for development and testing via `CERTCTL_KEYGEN_MODE=server`
|
||||
- Explicitly logged as a warning at startup: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only"
|
||||
- Docker Compose demo uses server mode for backward compatibility
|
||||
- Not recommended for production; agent mode is the secure default
|
||||
|
||||
**Entropy Source**
|
||||
- `crypto/rand` provides cryptographically secure random bytes
|
||||
- On Linux: backed by `/dev/urandom` via `getrandom()` syscall
|
||||
- On Windows: backed by the system CSPRNG (historically `CryptGenRandom()`; the exact API depends on the Go release)
|
||||
- Meets NIST SP 800-90B requirements for entropy generation
|
||||
|
||||
## Key Storage and Protection (Sections 6.3, 6.4)
|
||||
|
||||
certctl implements tiered key storage with different protection profiles based on key purpose.
|
||||
|
||||
**Agent Private Keys**
|
||||
- Stored on agent filesystem at `CERTCTL_KEY_DIR` (default: `/var/lib/certctl/keys`)
|
||||
- File permissions: 0600 (read/write by agent process only, no world/group access)
|
||||
- One PEM file per certificate, organized by certificate ID
|
||||
- Accessible only to the agent process; isolated from other processes
|
||||
- For container deployments: mount the key directory as a volume (e.g. `-v /var/lib/certctl/keys:/var/lib/certctl/keys`) and keep the key files at mode 0600 inside it
|
||||
|
||||
**Issuing CA Keys (Local CA Connector)**
|
||||
- Loaded from disk at server startup via `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH` env vars
|
||||
- Supports RSA (PKCS#1, PKCS#8) and ECDSA (SEC1, PKCS#8) key formats
|
||||
- Validates certificate constraints before use:
|
||||
- `IsCA=true` flag present
|
||||
- `KeyUsageCertSign` extension set
|
||||
- Valid certificate chain (for sub-CA mode)
|
||||
- Keys held in memory during server runtime (no on-disk caching after load)
|
||||
- Cleared from memory only on server shutdown
|
||||
|
||||
**Sub-CA Mode (Enterprise Integration)**
|
||||
- CA certificate and key signed by upstream enterprise root (e.g., Active Directory Certificate Services)
|
||||
- Certctl acts as subordinate CA, inheriting issuer DN from upstream CA
|
||||
- All issued certificates chain to enterprise trust anchor
|
||||
- CA key protection inherits upstream root's key management practices
|
||||
- Configured via: `CERTCTL_CA_CERT_PATH=/path/to/ca.crt` and `CERTCTL_CA_KEY_PATH=/path/to/ca.key`
|
||||
|
||||
**NIST Gap: HSM Storage**
|
||||
NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for certctl Pro (V3), enabling integration with:
|
||||
- AWS CloudHSM
|
||||
- Azure Dedicated HSM
|
||||
- Thales Luna, Gemalto SafeNet, YubiHSM (on-premises)
|
||||
- PKCS#11-compatible devices
|
||||
|
||||
## Cryptoperiods (Section 5.3, Table 1)
|
||||
|
||||
NIST recommends cryptoperiods (key validity durations) based on key type and security requirements. certctl enforces cryptoperiods through certificate profiles and renewal policies.
|
||||
|
||||
**Certificate Profile Enforcement**
|
||||
- Certificate profiles (M11a) define `max_ttl` constraint per enrollment profile
|
||||
- All certificates issued through a profile cannot exceed the profile's max_ttl
|
||||
- Profile configuration example (`max_ttl_seconds` of 31536000 is 365 days, a one-year maximum):
|
||||
```json
|
||||
{
|
||||
"id": "prof-web-prod",
|
||||
"name": "Production Web Certs",
|
||||
"max_ttl_seconds": 31536000,
|
||||
"allowed_key_algorithms": ["ECDSA_P256"],
|
||||
"required_sans": ["example.com"]
|
||||
}
|
||||
```
|
||||
|
||||
**Renewal Thresholds**
|
||||
- Renewal policies with configurable `alert_thresholds_days`: `[30, 14, 7, 0]` (days before expiry)
|
||||
- Background scheduler checks renewal eligibility every 1 hour
|
||||
- Certificates transition to `Expiring` status at 30 days before expiry and to `Expired` at 0 days
|
||||
- Renewal workflow can be triggered manually or automatically
|
||||
|
||||
**NIST Cryptoperiod Recommendations vs certctl Implementation**
|
||||
|
||||
| Key Type | NIST Recommendation | certctl Implementation |
|
||||
|----------|---------------------|------------------------|
|
||||
| CA signing key | 3–10 years | Configured via CA certificate not-after date; inheritable from upstream CA in sub-CA mode |
|
||||
| End-entity web server cert | 1–3 years (trending shorter) | Profile `max_ttl` configurable; ACME issuer typically 90 days; SC-081v3 mandating 47 days by 2029 |
|
||||
| Code signing cert | 2–8 years | Profile enforcement via `max_ttl`; not primary certctl use case |
|
||||
| Short-lived credentials | < 1 hour recommended | Profile TTL < 1 hour; exempt from CRL/OCSP (expiry is sufficient revocation); auto-expiry on scheduler tick |
|
||||
| OCSP signing key | 1–2 years | Embedded OCSP responder uses issuing CA key (same period as issuer) or delegated signing cert |
|
||||
| TLS/SSL interoperability cert | 1–2 years | Trending 1 year or less; certctl's ACME/sub-CA/step-ca issuers all support short periods |
|
||||
|
||||
## Key States and Transitions (Section 5.2)
|
||||
|
||||
NIST defines lifecycle states for keys: pre-activation, active, suspended, deactivated, compromised, and destroyed. certctl maps these to certificate and job states:
|
||||
|
||||
| NIST Key State | certctl Equivalent | Implementation |
|
||||
|---|---|---|
|
||||
| **Pre-activation** | `Pending` job state / `AwaitingCSR` | Job created but key not yet generated; awaiting agent CSR submission (agent-mode) or server keygen (demo mode) |
|
||||
| **Active** | Certificate status `Active` | Cert deployed to targets and in use; within validity period (not before < now < not after) |
|
||||
| **Suspended** | Job state `AwaitingApproval` | Interactive approval holds deployment job pending human review; resumes on approval or cancels on rejection |
|
||||
| **Deactivated** | Certificate status `Expired` | Past not-after date; auto-transitioned by scheduler every 2 minutes; renewal eligible |
|
||||
| **Compromised** | Certificate status `Revoked` | Issued via `POST /api/v1/certificates/{id}/revoke` with RFC 5280 revocation reason |
|
||||
| **Destroyed** | Archived (implementation detail) | Operator responsibility; certctl retains all certs in audit trail for compliance; no destructive deletion API |
|
||||
|
||||
**State Transition Audit Trail**
|
||||
All transitions logged to immutable `audit_events` table with:
|
||||
- Event type (e.g., `certificate_revoked`, `renewal_job_completed`)
|
||||
- Actor (authenticated user or agent ID)
|
||||
- Timestamp (RFC3339)
|
||||
- Resource (certificate ID)
|
||||
- Reason (revocation reason code, approval reason, etc.)
|
||||
- HTTP method, path, status (for API calls)
|
||||
|
||||
Example audit entry for revocation:
|
||||
```json
|
||||
{
|
||||
"id": "ae-2024-0615",
|
||||
"event_type": "certificate_revoked",
|
||||
"actor": "ops-alice@example.com",
|
||||
"timestamp": "2024-06-15T14:23:00Z",
|
||||
"resource_id": "cert-web-prod-2024",
|
||||
"resource_type": "certificate",
|
||||
"description": "Revoked: reason=keyCompromise",
|
||||
"body_hash": "sha256:a1b2c3d..."
|
||||
}
|
||||
```
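The `body_hash` field in the entry above suggests a digest of the request body rather than the body itself. A minimal sketch of producing such a value, assuming the `sha256:<hex>` format seen in the example (the exact encoding certctl uses is not confirmed here, and `bodyHash` is a hypothetical name):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bodyHash returns "sha256:" followed by the lowercase hex SHA-256 digest
// of body. The prefix format is an assumption based on the example entry.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(bodyHash([]byte(`{"reason":"keyCompromise"}`)))
}
```

Storing a digest lets an auditor confirm which payload was submitted without the audit table retaining potentially sensitive request contents.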
|
||||
|
||||
## Algorithm Recommendations (Section 5.1, SP 800-131A)
|
||||
|
||||
NIST SP 800-131A Rev 2 (March 2019) categorizes cryptographic algorithms as Approved, Conditionally Approved, or Disallowed. certctl implements only NIST-approved algorithms:
|
||||
|
||||
| Algorithm | NIST Status | certctl Support | Notes |
|
||||
|-----------|-------------|-----------------|-------|
|
||||
| **ECDSA P-256** | Approved (128-bit security strength) | Default for agent-side keygen | Meets NIST curve requirements (FIPS 186-4) |
|
||||
| **ECDSA P-384** | Approved (192-bit security strength) | Supported via profile configuration | Higher security margin; slower than P-256 |
|
||||
| **ECDSA P-521** | Approved (256-bit security strength) | Supported via profile configuration | Rarely needed; overkill for TLS |
|
||||
| **RSA 2048** | Approved minimum (112-bit security, transitioning) | Supported via all issuers | Deprecated path; migrate to 3072+ by 2030 per NIST |
|
||||
| **RSA 3072** | Approved (128-bit security) | Supported via all issuers | Recommended minimum for long-term security |
|
||||
| **RSA 4096** | Approved (stronger than 128-bit; NIST's 192-bit level requires RSA 7680) | Supported via all issuers | Supported but slower; overkill for most TLS |
|
||||
| **SHA-256** | Approved | Used throughout | CSR signing, certificate fingerprints, audit body hashing, CRL/OCSP signing |
|
||||
| **SHA-384** | Approved (192-bit) | Supported where algorithm selection available | Used in some CA signing scenarios |
|
||||
| **SHA-512** | Approved (256-bit) | Supported where algorithm selection available | Rarely needed; SHA-256 suffices for most use cases |
|
||||
| **SHA-1** | Deprecated | Not used in certctl | Browsers reject SHA-1 certs; certctl never generates them |
|
||||
|
||||
**Algorithm Enforcement via Profiles**
|
||||
Certificate profiles enforce allowed key algorithms:
|
||||
```json
|
||||
{
|
||||
"id": "prof-web-prod",
|
||||
"allowed_key_algorithms": ["ECDSA_P256", "ECDSA_P384", "RSA3072"]
|
||||
}
|
||||
```
|
||||
|
||||
**Post-Quantum Cryptography (Tracking)**
|
||||
NIST finalized its first PQC standards (FIPS 203, FIPS 204, FIPS 205) in August 2024:
|
||||
- **ML-KEM** (Kyber): Approved key encapsulation mechanism
|
||||
- **ML-DSA** (Dilithium): Approved digital signature algorithm
|
||||
- **SLH-DSA** (SPHINCS+): Approved stateless hash-based signature scheme
|
||||
|
||||
certctl will track NIST's PQC roadmap and plan integration when hybrid PQC+classical certificate formats reach browser/infrastructure support. Currently, pure PQC certificates are not widely interoperable.
|
||||
|
||||
## Key Distribution and Transport (Section 6.2)
|
||||
|
||||
NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize exposure during transit. certctl implements a zero-transmission-of-private-keys model:
|
||||
|
||||
**Private Key Distribution**
|
||||
- Agent-side keygen model: Private keys never leave agent infrastructure
|
||||
- CSR transmitted over HTTPS (TLS 1.2+), with optional mutual TLS
|
||||
- API key authentication via `Authorization: Bearer <api-key>` header
|
||||
- All API calls logged to immutable audit trail
|
||||
|
||||
**Signed Certificate Distribution**
|
||||
- Certificates (public component) distributed via `GET /agents/{id}/work` over HTTPS
|
||||
- Work endpoint enriches deployment jobs with certificate PEM and metadata
|
||||
- Certificate PEM is idempotent (same cert always returns same bytes)
|
||||
|
||||
**Target Deployment**
|
||||
- Deployment to targets via local filesystem write (NGINX, Apache, HAProxy)
|
||||
- No network transmission of private keys to targets
|
||||
- Agents read local private key from `CERTCTL_KEY_DIR` on deployment
|
||||
- For appliances without agents (F5 BIG-IP, IIS), proxy agent pattern:
|
||||
- Proxy agent runs in same trust zone as appliance
|
||||
- Proxy agent holds target API credentials (iControl, WinRM)
|
||||
- Control plane never communicates with appliance directly
|
||||
- Deployment request includes certificate and proxy agent ID
|
||||
- Proxy agent executes deployment via appliance API
|
||||
|
||||
**Revocation Distribution**
|
||||
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
|
||||
- Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
|
||||
- 24-hour validity period
|
||||
- Includes all revoked serials, reasons, and revocation timestamps
|
||||
- Served unauthenticated so relying parties without certctl API credentials can fetch it
|
||||
- Subject to URL caching; OCSP preferred for real-time revocation
|
||||
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
|
||||
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
|
||||
- Signed by issuing CA (or delegated OCSP signing cert)
|
||||
- Responds with good/revoked/unknown status
|
||||
- Served unauthenticated — the RFC 6960 relying-party model does not assume API credentials
|
||||
- Real-time, more bandwidth-efficient than CRL polling
|
||||
|
||||
## Revocation and Compromise (NIST SP 800-57 Part 3)
|
||||
|
||||
NIST SP 800-57 Part 3 covers revocation (Section 2.5) when keys are suspected compromised or no longer needed. certctl implements comprehensive revocation infrastructure:
|
||||
|
||||
**Revocation API**
|
||||
- Endpoint: `POST /api/v1/certificates/{id}/revoke`
|
||||
- Request body:
|
||||
```json
|
||||
{
|
||||
"reason": "keyCompromise",
|
||||
"reason_text": "Private key exposed in log file"
|
||||
}
|
||||
```
|
||||
- Supports 8 of the RFC 5280 revocation reason codes:
|
||||
- `unspecified` — no specific reason provided
|
||||
- `keyCompromise` — private key suspected compromised
|
||||
- `caCompromise` — issuing CA key compromised
|
||||
- `affiliationChanged` — subject org/affiliation changed
|
||||
- `superseded` — cert superseded by newer cert
|
||||
- `cessationOfOperation` — key no longer in use
|
||||
- `certificateHold` — temporary hold (rarely used)
|
||||
- `privilegeWithdrawn` — subject authorization withdrawn
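Each reason name corresponds to a numeric `CRLReason` value in RFC 5280 §5.3.1, which is what ends up encoded in CRL entries and OCSP responses. A sketch of that mapping follows; the map keys use the spellings from the list above, and `reasonCode` is a hypothetical helper rather than certctl's API.

```go
package main

import "fmt"

// reasonCodes maps the reason names listed above to their RFC 5280
// CRLReason values. Value 7 is unused in the RFC; 8 (removeFromCRL) and
// 10 (aACompromise) are not part of this list.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode validates a reason name and returns its numeric code.
func reasonCode(name string) (int, error) {
	code, ok := reasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unsupported revocation reason %q", name)
	}
	return code, nil
}

func main() {
	code, err := reasonCode("keyCompromise")
	if err != nil {
		panic(err)
	}
	fmt.Println("keyCompromise =", code)
}
```

Validating the name before recording the revocation keeps free-form strings out of the revocation table and out of generated CRLs.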
|
||||
|
||||
**Revocation Recording**
|
||||
- Certificate status updated to `Revoked`
|
||||
- Entry recorded in `certificate_revocations` table with:
|
||||
- Certificate serial number
|
||||
- Revocation timestamp
|
||||
- Revocation reason code
|
||||
- Issuer ID
|
||||
- Idempotent (revoking an already-revoked cert is safe; returns 200 OK)
|
||||
|
||||
**Issuer Notification (Best-Effort)**
|
||||
- Control plane calls `issuer.RevokeCertificate(ctx, serial, reason)` on issuing connector
|
||||
- Failure does not block the revocation (async, logged, retried)
|
||||
- Supported issuers:
|
||||
- Local CA: generates new CRL immediately
|
||||
- ACME: submits revocation to ACME server (RFC 8555 Section 7.6)
|
||||
- step-ca: calls `/revoke` API
|
||||
- OpenSSL: executes user-provided revocation script
|
||||
|
||||
**Revocation Notifications**
|
||||
- Notifiers triggered after revocation recorded: Slack, Teams, PagerDuty, OpsGenie, email, webhook
|
||||
- Message includes certificate common name, issuer, reason, actor, timestamp
|
||||
- Delivery is asynchronous and retried on failure
|
||||
|
||||
**CRL and OCSP Distribution**
|
||||
- CRL updated on every revocation (or scheduled refresh for non-issued revocations)
|
||||
- OCSP responder queries revocation table in real-time
|
||||
- Short-lived certificate exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
**Bulk Revocation for Large-Scale Compromise Response** (V2.2) — NIST SP 800-57 Part 3 emphasizes rapid revocation when keys are compromised. `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria (profile, owner, agent, issuer) in a single operation. This enables operators to execute fleet-wide revocation for key compromise events affecting multiple certificates. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring every certificate is recorded in the audit trail with the incident reason.
|
||||
|
||||
**Revocation Audit Trail**
|
||||
All revocation events logged:
|
||||
- Event type: `certificate_revoked` or `bulk_revocation_initiated` (for fleet operations)
|
||||
- Actor: authenticated user or service
|
||||
- Reason code: RFC 5280 enum (or incident justification for bulk operations)
|
||||
- Timestamp: RFC3339
|
||||
- Issuer notification status: success or error reason
|
||||
- Filter criteria: profile_id, owner_id, agent_id, issuer_id (for bulk revocation)
|
||||
|
||||
## Alignment Summary Table
|
||||
|
||||
| NIST SP 800-57 Area | Status | Coverage | Notes |
|
||||
|---|---|---|---|
|
||||
| **Key Generation** | ✅ Aligned | 100% | Agent-side ECDSA P-256 using crypto/rand; server mode flagged as demo-only |
|
||||
| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned V3 Pro |
|
||||
| **Cryptoperiods** | ✅ Aligned | 100% | Profile-enforced max_ttl; threshold-based renewal alerting |
|
||||
| **Key States** | ✅ Aligned | 100% | Full lifecycle tracking with immutable audit trail |
|
||||
| **Algorithms** | ✅ Aligned | 100% | NIST-approved algorithms only; post-quantum tracking in progress |
|
||||
| **Key Distribution** | ✅ Aligned | 100% | Private keys never transmitted; CSR/cert over TLS; agent-local deployment |
|
||||
| **Revocation** | ✅ Aligned | 100% | CRL, OCSP, all RFC 5280 reason codes; real-time updates |
|
||||
|
||||
## Gaps and Remediation Roadmap
|
||||
|
||||
### V2 (Current)
|
||||
- [x] Agent-side key generation
|
||||
- [x] Profile-enforced cryptoperiods
|
||||
- [x] CRL and OCSP distribution
|
||||
- [x] RFC 5280 revocation support
|
||||
- [x] Immutable audit trail
|
||||
|
||||
### V2.2 (Planned: 2026)
|
||||
- Bulk revocation by profile/owner/agent/issuer (fleet-level revocation for incident response)
|
||||
|
||||
### V3 (Planned: 2026)
|
||||
- Role-based access control (limit revocation/approval to authorized operators)
|
||||
|
||||
### V3 Pro (Planned)
|
||||
- HSM support for CA key storage and agent key storage (TPM 2.0, PKCS#11)
|
||||
- FIPS 140-2/3 validated crypto module (BoringCrypto build or external FIPS library)
|
||||
- Key destruction API (explicit secure erasure of agent keys)
|
||||
- Key escrow / recovery mechanism (backup encrypted private keys for disaster recovery)
|
||||
|
||||
### Post-Quantum (2027+)
|
||||
- ML-KEM and ML-DSA support when browser/TLS ecosystem supports hybrid certificates
|
||||
- Migration path documentation (how to transition existing RSA certs to PQC)
|
||||
|
||||
## References
|
||||
|
||||
- NIST SP 800-57 Part 1 Rev 5 (May 2020): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf
|
||||
- NIST SP 800-131A Rev 2 (March 2019): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
|
||||
- FIPS 186-4 (Digital Signature Standard): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
|
||||
- RFC 5280 (X.509 PKI Certificate and CRL Profile): https://tools.ietf.org/html/rfc5280
|
||||
- RFC 8555 (Automatic Certificate Management Environment): https://tools.ietf.org/html/rfc8555
|
||||
- NIST FIPS 204 (ML-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf
|
||||
- NIST FIPS 205 (SLH-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.205.pdf
|
||||
|
||||
## Questions or Corrections?
|
||||
|
||||
This document reflects certctl's implementation as of March 2026. For the latest code, refer to:
|
||||
- Key generation: `cmd/agent/main.go` (agent keygen) and `internal/service/renewal.go` (server keygen)
|
||||
- Key storage: `internal/config/config.go` (CERTCTL_KEY_DIR, CERTCTL_CA_CERT_PATH)
|
||||
- Revocation: `internal/service/revocation.go` and `internal/api/handler/certificates.go`
|
||||
- Audit trail: `internal/api/middleware/audit.go`
|
||||
@@ -1,825 +0,0 @@
|
||||
# PCI-DSS 4.0 Compliance Mapping
|
||||
|
||||
This guide maps certctl's existing capabilities to PCI-DSS 4.0 requirements relevant to TLS certificate and cryptographic key management. It is **not a compliance attestation** — a qualified security assessor (QSA) must evaluate your organization's complete control environment. Rather, this document helps you understand which PCI-DSS control objectives certctl supports and where operator responsibility lies.
|
||||
|
||||
Organizations subject to PCI-DSS typically need to demonstrate control over certificate issuance, renewal, rotation, revocation, and key management. Certctl automates the technical controls for certificate lifecycle; compliance depends on how you deploy, monitor, and audit it.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [Requirement 4: Protect Data in Transit](#requirement-4-protect-data-in-transit)
|
||||
- [4.2.1 — Strong Cryptography for Transmission](#421--strong-cryptography-for-transmission)
|
||||
- [4.2.2 — Certificate Inventory and Validation](#422--certificate-inventory-and-validation)
|
||||
3. [Requirement 3: Protect Stored Cardholder Data (Key Management)](#requirement-3-protect-stored-cardholder-data-key-management)
|
||||
- [3.6 — Cryptographic Key Documentation](#36--cryptographic-key-documentation)
|
||||
- [3.7 — Key Lifecycle Procedures](#37--key-lifecycle-procedures)
|
||||
4. [Requirement 8: Identify and Authenticate](#requirement-8-identify-and-authenticate)
|
||||
- [8.3 — Strong Authentication](#83--strong-authentication)
|
||||
- [8.6 — Application Account Management](#86--application-account-management)
|
||||
5. [Requirement 10: Log and Monitor](#requirement-10-log-and-monitor)
|
||||
- [10.2 — Implement Automated Audit Logging](#102--implement-automated-audit-logging)
|
||||
- [10.3 — Protect Audit Trail](#103--protect-audit-trail)
|
||||
- [10.4 — Promptly Review and Address Audit Trail Exceptions](#104--promptly-review-and-address-audit-trail-exceptions)
|
||||
- [10.7 — Retain and Protect Audit Trail History](#107--retain-and-protect-audit-trail-history)
|
||||
6. [Requirement 6: Develop and Maintain Secure Systems and Applications](#requirement-6-develop-and-maintain-secure-systems-and-applications)
|
||||
- [6.3.1 — Security Coding Practices](#631--security-coding-practices)
|
||||
- [6.5.10 — Broken Authentication and Cryptography Prevention](#6510--broken-authentication-and-cryptography-prevention)
|
||||
7. [Requirement 7: Restrict Access by Business Need-to-Know](#requirement-7-restrict-access-by-business-need-to-know)
|
||||
- [7.2 — Implement Access Control](#72--implement-access-control)
|
||||
8. [Evidence Summary Table](#evidence-summary-table)
|
||||
9. [Operator Responsibilities](#operator-responsibilities)
|
||||
10. [V3 Enhancements for PCI-DSS](#v3-enhancements-for-pci-dss)
|
||||
11. [Next Steps for Compliance](#next-steps-for-compliance)
|
||||
12. [Questions?](#questions)
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
Your QSA will request evidence that your certificate and key management systems meet specific PCI-DSS 4.0 requirements. For each applicable requirement, this guide identifies:
|
||||
|
||||
1. **Which certctl features support the control** — API endpoints, database tables, background processes
|
||||
2. **What evidence you can produce** — audit logs, dashboard metrics, API queries, deployment configs
|
||||
3. **Operator responsibilities** — what you must do outside certctl (policy, monitoring, access control)
|
||||
4. **Status** — Available (v1.0 shipped), Planned (future release), or Operator Responsibility (outside scope)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 4: Protect Data in Transit
|
||||
|
||||
**Objective**: Ensure strong cryptography is used to protect sensitive data during transmission.
|
||||
|
||||
### 4.2.1 — Strong Cryptography for Transmission
|
||||
|
||||
**Requirement**: Use appropriate and current cryptographic algorithms for all TLS and SSH connections protecting card data in transit.
|
||||
|
||||
**certctl Support**:
|
||||
- **Automated TLS certificate lifecycle** — Certctl issues TLS certificates to NGINX, Apache, and HAProxy targets via `POST /api/v1/deployments`. Certificates include RSA 2048-bit and ECDSA P-256 key types (configurable per profile, M11a).
|
||||
- **Control plane TLS enforcement** — All REST API endpoints served exclusively over HTTPS. Agent-to-server heartbeat and work polling use TLS. No plaintext protocol options.
|
||||
- **Issuer connector key negotiation** — ACME v2 (Let's Encrypt, ZeroSSL) validates issuer cryptography. Local CA enforces RSA/ECDSA constraints. step-ca integration ensures Smallstep's cryptography standards.
|
||||
- **Certificate profiles** (M11a) document allowed key types and minimum key sizes per environment (development, production, cardholder-network).
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Exported certificate inventory via `GET /api/v1/certificates` with key algorithm and size (serialized as JSON).
|
||||
- Issued certificate details showing RSA 2048+ or ECDSA P-256 for all deployed certificates.
|
||||
- Audit trail (`GET /api/v1/audit`) showing issuer connector selection and certificate profile assignment per certificate.
|
||||
- Target deployment logs showing TLS certificate installation on NGINX/Apache/HAProxy.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- Configure certificate profiles for your environments with approved key algorithms.
|
||||
- Audit cipher suite configuration on deployed targets (certctl deploys certs; you verify target TLS settings).
|
||||
- Periodically review `CERTCTL_KEYGEN_MODE` — must be `agent` in production (never `server`).
|
||||
- Monitor issuer connector configuration to ensure issuers meet your cryptography standards.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
### 4.2.2 — Certificate Inventory and Validation
|
||||
|
||||
**Requirement**: Ensure all TLS/SSL certificates used for data transmission are valid, current, and meet required cryptographic standards.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Managed Certificate Inventory** — Full CRUD API (`/api/v1/certificates`) with sortable, filterable list. Fields: common name, SANs, subject, issuer, serial number, key type/size, not-before/after dates, issuer ID, profile ID, owner, team, status (Active/Expiring/Expired/Revoked).
|
||||
|
||||
- **Filesystem Certificate Discovery** (M18b) — Agents scan configured directories (`CERTCTL_DISCOVERY_DIRS` env var) for existing PEM/DER certificates every 6 hours and on startup. Control plane deduplicates by SHA-256 fingerprint. Three triage statuses: Unmanaged (not managed by certctl), Managed (linked to a managed certificate), Dismissed (operator-marked as out-of-scope).
|
||||
- API endpoints:
|
||||
- `GET /api/v1/discovered-certificates?status=Unmanaged` — find orphaned certs
|
||||
- `GET /api/v1/discovery-summary` — aggregate counts by status
|
||||
- `POST /api/v1/discovered-certificates/{id}/claim` — link to managed certificate
|
||||
- `POST /api/v1/discovered-certificates/{id}/dismiss` — mark out-of-scope
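Deduplicating by SHA-256 fingerprint means that the same certificate reported from several directories or agents collapses to one record. A minimal sketch of that idea, operating on raw DER bytes (the function names are hypothetical and this is not the control plane's implementation):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns the lowercase hex SHA-256 digest of a DER certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// dedupe keeps the first certificate seen for each fingerprint and
// preserves the original order of first occurrences.
func dedupe(certsDER [][]byte) [][]byte {
	seen := make(map[string]bool)
	var unique [][]byte
	for _, der := range certsDER {
		fp := fingerprint(der)
		if seen[fp] {
			continue
		}
		seen[fp] = true
		unique = append(unique, der)
	}
	return unique
}

func main() {
	// Placeholder byte slices stand in for DER-encoded certificates.
	reports := [][]byte{[]byte("cert-A"), []byte("cert-B"), []byte("cert-A")}
	fmt.Println(len(dedupe(reports)), "unique certificates")
}
```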
|
||||
|
||||
- **Expiration Threshold Alerting** — Renewal policies support `alert_thresholds_days` (default 30, 14, 7, 0). Background scheduler evaluates daily; certificates transition to Expiring/Expired status automatically. Notifications sent to owners via email/webhook/Slack/Teams/PagerDuty.
|
||||
|
||||
- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.
|
||||
|
||||
- **Revocation Infrastructure** (M15a, M15b, M-006):
|
||||
- Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
|
||||
- CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` — DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
|
||||
- OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
|
||||
- Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
|
||||
- Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
- **Stats API** (M14) — Real-time visibility:
|
||||
- `GET /api/v1/stats/summary` — total certs, by status, by issuer
|
||||
- `GET /api/v1/stats/expiration-timeline?days=90` — expiration distribution (weekly buckets)
|
||||
- `GET /api/v1/stats/job-trends?days=30` — renewal/issuance job success rates
|
||||
- `GET /api/v1/certificates` with `?sort=-notAfter&fields=id,commonName,notAfter,status` — sparse, sorted inventory
|
||||
|
||||
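
The fingerprint-based deduplication described for filesystem discovery can be illustrated with a short sketch. This is not certctl's implementation: the function names, the report tuple shape, and the placeholder bytes are assumptions made for illustration. It only shows that a PEM file and a DER file holding the same certificate hash to the same SHA-256 fingerprint, so two agents reporting the same certificate collapse into one inventory entry.

```python
import base64
import hashlib


def pem_to_der(pem: str) -> bytes:
    """Decode the base64 payload between the PEM BEGIN/END markers."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    payload = "".join(l for l in lines if l and not l.startswith("-----"))
    return base64.b64decode(payload)


def fingerprint(der: bytes) -> str:
    """SHA-256 fingerprint of the DER encoding, as lowercase hex."""
    return hashlib.sha256(der).hexdigest()


def deduplicate(reports):
    """Group (agent_id, path, der) reports by certificate fingerprint."""
    inventory = {}
    for agent_id, path, der in reports:
        inventory.setdefault(fingerprint(der), []).append((agent_id, path))
    return inventory


# Placeholder bytes stand in for a real DER certificate (hypothetical data).
der = b"placeholder-der-bytes"
pem = (
    "-----BEGIN CERTIFICATE-----\n"
    + base64.b64encode(der).decode("ascii")
    + "\n-----END CERTIFICATE-----\n"
)
inventory = deduplicate([
    ("agent-1", "/etc/nginx/certs/site.der", der),
    ("agent-2", "/etc/apache2/certs/site.pem", pem_to_der(pem)),
    ("agent-2", "/etc/apache2/certs/other.der", b"another-placeholder"),
])
```

Here `inventory` holds two fingerprints: one seen at two locations and one seen once.
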
**Evidence You Can Provide**:
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.

**Operator Responsibility**:
- Configure `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate storage locations (e.g., `/etc/nginx/certs`, `/etc/apache2/certs`, `/usr/local/share/ca-certificates`).
- Regularly triage discovered certificates: `GET /api/v1/discovered-certificates?status=Unmanaged`, claim or dismiss each.
- Set renewal policies for all certificate profiles with appropriate `alert_thresholds_days` (recommendation: 30, 14, 7, 0).
- Monitor expiration dashboard and respond to Expiring alerts before certificates expire.
- Verify that issued certificates meet your organization's cryptography standards (key type, key size, SANs).
- Test CRL/OCSP endpoints periodically to confirm they are reachable and signed correctly.

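
To make the `alert_thresholds_days` recommendation concrete, the sketch below works out which thresholds a certificate has crossed and which status follows. It is an illustration under assumptions: the function names are hypothetical, and the rule that any crossed threshold means Expiring is inferred from the status descriptions in this section rather than taken from certctl's scheduler code. Revoked certificates are not modelled.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_THRESHOLDS_DAYS = (30, 14, 7, 0)


def crossed_thresholds(not_after, now, thresholds=DEFAULT_THRESHOLDS_DAYS):
    """Return every threshold (in days) that the certificate is already inside."""
    remaining_days = (not_after - now).total_seconds() / 86400
    return [t for t in thresholds if remaining_days <= t]


def status_for(not_after, now, thresholds=DEFAULT_THRESHOLDS_DAYS):
    """Assumed mapping: past not-after is Expired, any crossed threshold is Expiring."""
    if now >= not_after:
        return "Expired"
    if crossed_thresholds(not_after, now, thresholds):
        return "Expiring"
    return "Active"


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
ten_days_left = now + timedelta(days=10)
alerts = crossed_thresholds(ten_days_left, now)   # thresholds 30 and 14 are crossed
status = status_for(ten_days_left, now)           # "Expiring"
```
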
**Status**: **Available** (v1.0 shipped, discovery M18b, revocation M15a/M15b)

---

## Requirement 3: Protect Stored Cardholder Data (Key Management)

**Objective**: Render cardholder data unreadable anywhere it is stored; protect cryptographic keys used to encrypt data.

### 3.6 — Cryptographic Key Documentation

**Requirement**: Document and implement all key management processes and procedures covering generation, storage, archival, destruction, and change; protect cryptographic keys; and restrict access to keys to the minimum required.

**certctl Support**:

- **Certificate Profile Documentation** (M11a) — Named profiles define allowed key types, maximum TTL, and allowed EKUs per use case. Each profile is a documented policy:
  ```json
  {
    "id": "p-web-tls",
    "name": "Web TLS Production",
    "allowed_key_types": ["RSA_2048", "ECDSA_P256"],
    "max_ttl_seconds": 31536000,
    "require_sans": true,
    "description": "Production TLS certs for external web services"
  }
  ```

- **Owner and Team Tracking** (M11b) — Every certificate is assigned an owner (person + email) and optionally a team. This documents key responsibility and escalation paths.

- **Issuer Connector Specification** — Configuration and API endpoints document which CA and protocol issues each certificate:
  - `GET /api/v1/issuers/{id}` returns issuer type (local-ca, acme, step-ca, openssl), CA endpoint, authentication method, constraints
  - Each issuer type has documented key handling (e.g., Local CA loads its CA certificate and key from `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, step-ca via JWK provisioner)

- **Immutable Audit Trail** (M19) — Every certificate lifecycle event recorded in append-only `audit_events` table:
  - `certificate_issued` — when certificate created, by whom, issuer type, profile
  - `certificate_renewed` — when renewed, by whom, issuer
  - `certificate_revoked` — when revoked, by whom, RFC 5280 reason code
  - `certificate_deployed` — when deployed to target, by agent, target type
  - Query: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}`

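
As an illustration of how a profile such as `p-web-tls` acts as a documented policy, the sketch below checks a certificate request against the profile fields shown above. The `check_request` function and its arguments are hypothetical; certctl's actual enforcement logic is not shown in this document, so treat this as one plausible reading of the profile semantics.

```python
import json

PROFILE = json.loads("""
{
  "id": "p-web-tls",
  "name": "Web TLS Production",
  "allowed_key_types": ["RSA_2048", "ECDSA_P256"],
  "max_ttl_seconds": 31536000,
  "require_sans": true,
  "description": "Production TLS certs for external web services"
}
""")


def check_request(profile, key_type, ttl_seconds, sans):
    """Return a list of policy violations; an empty list means the request fits."""
    problems = []
    if key_type not in profile["allowed_key_types"]:
        problems.append(f"key type {key_type} not allowed by {profile['id']}")
    if ttl_seconds > profile["max_ttl_seconds"]:
        problems.append(f"TTL {ttl_seconds}s exceeds {profile['max_ttl_seconds']}s")
    if profile.get("require_sans") and not sans:
        problems.append("at least one SAN is required")
    return problems


ok = check_request(PROFILE, "ECDSA_P256", 90 * 86400, ["www.example.com"])
bad = check_request(PROFILE, "RSA_1024", 2 * 31536000, [])
```

`ok` is empty, while `bad` lists three violations (key type, TTL, missing SANs). Output like this can be attached to a profile export as evidence that requests are evaluated against the documented limits.
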
**Evidence You Can Provide**:
- Exported certificate profiles: `GET /api/v1/profiles` showing documented key types, max TTLs, constraints per environment.
- Certificate-to-owner mapping: `GET /api/v1/certificates` with owner/team fields.
- Issuer configuration audit: `GET /api/v1/issuers` showing CA endpoints, key storage paths, auth methods.
- Audit trail for a certificate: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete lifecycle.

**Operator Responsibility**:
- Define and document certificate profiles for each environment and use case.
- Assign owner and team to each certificate via API or dashboard.
- Document issuer connector configuration (CA endpoint, auth method, key storage location).
- Maintain baseline audit trail exports for compliance evidence.
- Establish certificate retirement policy (how long to retain audit records after certificate expiry/revocation).

**Status**: **Available** (v1.0 shipped)

---

### 3.7 — Key Lifecycle Procedures

**Requirement**: Generate, store, protect, access, and destroy cryptographic keys used to encrypt data in transit or at rest.

This requirement covers key generation, storage, rotation, and destruction. Certctl addresses the certificate/TLS key portion (not symmetric encryption keys used for cardholder data at rest — those are outside scope).

#### 3.7.1 — Key Generation

**Requirement**: Generate new keys using strong cryptography.

**certctl Support**:

- **Agent-Side Key Generation** (M8) — Production mode (default `CERTCTL_KEYGEN_MODE=agent`):
  - Agents generate ECDSA P-256 key pairs using `crypto/ecdsa` + `crypto/elliptic.P256()` + `crypto/rand` (cryptographically secure random).
  - Key generation happens **only on the agent**, never on the control plane.
  - Agent submits Certificate Signing Request (CSR) with public key to control plane via `POST /api/v1/agents/{id}/csr`.
  - Issued certificate is returned; private key remains on agent at `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`).

- **Server-Side Fallback** (demo/development only) — `CERTCTL_KEYGEN_MODE=server`:
  - Control plane generates RSA 2048-bit or ECDSA P-256 keys using `crypto/rand` with `crypto/rsa` or `crypto/ecdsa`.
  - Server signs CSR and stores the private key in the certificate version record for agent deployment. **Security note:** In server keygen mode, the control plane holds private keys — this is why agent keygen mode is the recommended default for production.
  - **Must not be used in production.** Explicit warning logged: `server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only`

- **Issuer-Specific Key Negotiation**:
  - **ACME (Let's Encrypt, ZeroSSL)**: The ACME CA determines which key types it accepts; certctl requests ECDSA P-256 by default.
  - **Local CA**: Supports RSA 2048+, ECDSA (P-256, P-384), PKCS#8 format. Key algorithm inherited from CA cert or specified via profile.
  - **step-ca**: Smallstep's provisioner defines key type; certctl respects server constraints.
  - **OpenSSL / Custom CA**: User-provided signing script; key type depends on CA backend.

**Evidence You Can Provide**:
- Deployment configuration: `CERTCTL_KEYGEN_MODE=agent` in production (verify in `docker-compose.yml`, Kubernetes manifests, or systemd units).
- Agent log excerpt showing key generation: Go `ecdsa.GenerateKey(elliptic.P256(), rand.Reader)` via agent process logs with CSR submission timestamp.
- Certificate CSR audit: `GET /api/v1/audit?type=certificate_issued` showing CSR fingerprint (SHA-256 hash of CSR PEM).
- Renewal job logs showing agent-submitted CSR, not server-generated key.

**Operator Responsibility**:
- **Enforce `CERTCTL_KEYGEN_MODE=agent` in all production deployments.** Never use `server` mode outside demos.
- Verify agent hardware is adequately isolated (`crypto/rand` draws from the OS CSPRNG, e.g. `getrandom(2)` or `/dev/urandom` on Linux).
- Monitor `CERTCTL_KEY_DIR` on agents for unauthorized file access (use OS-level file audit if available).
- Backup agent key directory (`/var/lib/certctl/keys`) as part of disaster recovery procedure.

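
One way to back the `CERTCTL_KEYGEN_MODE=agent` evidence item is a small check over exported environment files. The sketch below is an assumption-laden example, not a certctl tool: `parse_env` and `keygen_mode` are hypothetical helpers, and the only facts taken from this document are the variable name, its two values, and that `agent` is the default when the variable is unset.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def keygen_mode(env_text):
    """Return the effective key generation mode; unset falls back to 'agent'."""
    return parse_env(env_text).get("CERTCTL_KEYGEN_MODE", "agent")


def production_safe(env_text):
    """True only when private keys are generated on the agent."""
    return keygen_mode(env_text) == "agent"


prod_env = "# production\nCERTCTL_KEYGEN_MODE=agent\nCERTCTL_KEY_DIR=/var/lib/certctl/keys\n"
demo_env = "CERTCTL_KEYGEN_MODE=server\n"
results = {"prod": production_safe(prod_env), "demo": production_safe(demo_env)}
```

Running a check like this in CI against each environment's configuration gives a repeatable record that `server` mode is confined to demos.
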
**Status**: **Available** (v1.0 shipped)

#### 3.7.2 — Key Storage and Access Control

**Requirement**: Restrict cryptographic key access to the minimum required and protect keys from unauthorized access.

**certctl Support**:

- **Agent-Side Key Storage** (M8) — Private keys written to `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`):
  - File permissions: `0600` (readable/writable by agent process owner only).
  - Filename convention: one file per certificate (e.g., `web-tls-prod.key`, `api-service.key`).
  - No key data passed over the network between agent and control plane (CSR only).
  - Keys used locally by agent to sign TLS handshakes, never transmitted to control plane or other systems.

- **Control Plane Key Storage** — Sensitive credentials managed via environment variables or `.env` files:
  - CA private key path: `CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH` (for Local CA sub-CA mode).
  - ACME account key: embedded in ACME issuer config (not stored separately; ACME library handles in memory).
  - step-ca provisioner key: `CERTCTL_STEPCA_KEY_PATH` env var (path to JWK private key file, loaded into memory during runtime).
  - API keys: `CERTCTL_API_KEY` (SHA-256 hashed in database, plaintext never stored).
  - Database credentials: `CERTCTL_DATABASE_URL` in `.env` file, not in source code.

- **Docker Compose Credential Management** — `.env` file (git-ignored) holds all secrets:
  ```bash
  CERTCTL_API_KEY=sk-test-...
  CERTCTL_DATABASE_URL=postgres://user:pass@db:5432/certctl
  CERTCTL_CA_KEY_PATH=/run/secrets/ca.key
  ```
  Credentials never in `docker-compose.yml` or Dockerfile.

- **Kubernetes Secrets** (operator responsibility) — Deploy control plane with:
  ```yaml
  env:
    - name: CERTCTL_DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: database-url
    - name: CERTCTL_API_KEY
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: api-key
  ```

**Evidence You Can Provide**:
- Agent key directory listing (without keys): `ls -la /var/lib/certctl/keys` (shows file count, permissions, timestamps).
- Deployment manifest (`docker-compose.yml` or Kubernetes YAML) showing secrets via env var or Secret object (not inline).
- `.env` file (do not share contents, only confirm existence and git-ignore status).
- API key hash verification: `GET /api/v1/auth/check` with API key, verifying hash matching without plaintext exposure.

**Operator Responsibility**:
- **Store `.env` and credential files outside version control.** Verify `.gitignore` includes `.env`, `*.key`, `ca.key`, etc.
- **Restrict file system access to `/var/lib/certctl/keys` on agents** via OS-level permissions (Linux: `chmod 0700`, owned by agent user).
- **Limit CA key file read access** — `CERTCTL_CA_KEY_PATH` should be readable only by certctl server process (OS permissions).
- **Rotate API keys periodically** (recommendation: annually or when personnel change). No audit trail for API key rotation (outside certctl scope).
- **Backup private key stores** (agent key dirs, CA key file) as part of disaster recovery. Encrypt backups at rest.
- **Monitor access logs** to `/var/lib/certctl/keys` and CA key file location (use OS audit or file integrity monitoring).

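
The `0600` file mode and the directory-permission checks above can be demonstrated with a short POSIX-only sketch. It is illustrative rather than a copy of the agent's code: `write_private_key` and `loose_key_files` are hypothetical names, and the key material is a placeholder. The first function creates a key file that only its owner can read or write; the second lists any file in a key directory that grants group or other access, which is the kind of finding an auditor would ask about.

```python
import os
import stat
import tempfile


def write_private_key(path, pem_bytes):
    """Create the file with mode 0600 and refuse to overwrite an existing key."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(pem_bytes)


def loose_key_files(key_dir):
    """Return names of files whose mode grants any group or other permission."""
    loose = []
    for name in sorted(os.listdir(key_dir)):
        mode = stat.S_IMODE(os.stat(os.path.join(key_dir, name)).st_mode)
        if mode & 0o077:
            loose.append(name)
    return loose


with tempfile.TemporaryDirectory() as key_dir:
    write_private_key(os.path.join(key_dir, "web-tls-prod.key"), b"placeholder key material")
    loose_path = os.path.join(key_dir, "api-service.key")
    write_private_key(loose_path, b"placeholder key material")
    os.chmod(loose_path, 0o644)  # simulate a misconfigured file
    findings = loose_key_files(key_dir)
```

In this run `findings` contains only `api-service.key`, the file whose permissions were deliberately loosened.
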
**Status**: **Available** (v1.0 shipped)

#### 3.7.3 — Key Rotation

**Requirement**: Rotate cryptographic keys upon expiration or compromise.

**certctl Support**:

- **Automated Certificate Renewal** — Renewal policies trigger certificate renewal automatically:
  - Background scheduler checks every 60 minutes (configurable via `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL`).
  - For each policy, evaluates all managed certificates: if `(not-after - now) <= policy.renewal_threshold_days`, trigger renewal.
  - Renewal job created in AwaitingCSR state; agent receives work, generates new key pair, submits new CSR.
  - Issuer connector issues a certificate for the new CSR (and therefore the new public key); old key discarded by agent after new certificate installed.
  - New certificate deployed to target via deployment job.

- **Expiration-Based Rotation** — Certificate profiles (M11a) define `max_ttl_seconds` (e.g., 31536000 for 1 year, 3600 for short-lived certs):
  - Short-lived certificates (TTL < 1 hour) rotate every deployment cycle, providing defense-in-depth (RFC 5280 revocation not needed).
  - Longer-lived certs (90/180/365 days) rotated via renewal policy thresholds (30/14/7 day alerts).

- **Renewal Audit Trail** — Every renewal recorded:
  - `GET /api/v1/audit?type=certificate_renewed&resource_id={cert_id}` shows each renewal, old serial, new serial, issuer, actor.

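
The renewal condition `(not-after - now) <= policy.renewal_threshold_days` can be written out as a small worked example. The sketch below assumes certificates are available as dictionaries with an `id` and a timezone-aware `not_after`; those field names and the helper functions are illustrative, not certctl's data model.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(not_after, now, renewal_threshold_days):
    """True when the remaining lifetime is at or below the policy threshold."""
    return (not_after - now) <= timedelta(days=renewal_threshold_days)


def certificates_due(certs, now, renewal_threshold_days):
    """Return the ids of certificates that a scheduler pass would renew."""
    return [c["id"] for c in certs
            if renewal_due(c["not_after"], now, renewal_threshold_days)]


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
certs = [
    {"id": "c-web", "not_after": now + timedelta(days=12)},
    {"id": "c-api", "not_after": now + timedelta(days=30)},
    {"id": "c-batch", "not_after": now + timedelta(days=200)},
]
due = certificates_due(certs, now, renewal_threshold_days=30)
```

With a 30-day threshold, `c-web` and `c-api` are due (the comparison is inclusive) and `c-batch` is not.
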
**Evidence You Can Provide**:
- Renewal policy configuration: `GET /api/v1/policies` showing `renewal_threshold_days` and `alert_thresholds_days`.
- Renewal job history: `GET /api/v1/jobs?type=Renewal&status=Completed` with timestamp, before/after serial numbers.
- Certificate version history: `GET /api/v1/certificates/{id}/versions` showing all issued versions, dates, issuers.
- Audit trail: `GET /api/v1/audit?type=certificate_renewed` for trending and compliance reporting.

**Operator Responsibility**:
- **Define renewal policies for all certificate profiles** with appropriate thresholds (typically 30 days before expiration for 90+ day certs, more aggressive for shorter-lived).
- **Monitor renewal job success** via dashboard (M14 charts show renewal success trends) and alerts.
- **Investigate renewal failures** (stuck AwaitingCSR, issuer connectivity, deployment errors) promptly to avoid expired certificates.
- **Test renewal workflow in staging environment** before rolling out to production.
- **Document key rotation schedule** for your organization (renewal policy thresholds, approval workflows if AwaitingApproval).

**Status**: **Available** (v1.0 shipped)

#### 3.7.4 — Key Destruction

**Requirement**: Render cryptographic keys unreadable and unusable when they reach the end of their cryptographic lifetime.

**certctl Support**:

- **Certificate Revocation API** (M15a) — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
  - `unspecified` — general revocation
  - `keyCompromise` — suspected key compromise
  - `caCompromise` — CA compromise
  - `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn` — lifecycle management
  - Revocation recorded in `certificate_revocations` table with timestamp and reason.
  - Issuer notified (best-effort; issuer-side revocation is not performed for ACME, Local CA skips issuer step).
  - Revocation notifications sent to owner via email/webhook/Slack/Teams/PagerDuty.

- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates published in:
  - CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
  - OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
  - Both endpoints are served unauthenticated so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation — this is the RFC-compliant PKI model.
  - Clients checking certificate status via OCSP or CRL see revoked status within 24 hours.

- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. PCI-DSS Req 4 requires rapid response to data transmission security incidents — bulk revocation enables operators to revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.

- **Private Key Destruction on Agent** — When certificate renewed or revoked:
  - Agent removes old private key file from `CERTCTL_KEY_DIR` when new certificate deployed.
  - Job status tracking confirms old key is no longer needed.
  - No audit trail of key deletion (private keys don't pass through control plane).

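
For reference when preparing revocation requests, the sketch below maps the reason names listed above to their RFC 5280 CRLReason integer values and applies the short-lived exemption described earlier (TTL under one hour skips CRL/OCSP). The `{"reason": ...}` request body and both helper functions are assumptions for illustration; the actual request schema of the revoke endpoint is not given in this document. The spelling `caCompromise` follows this document, whereas RFC 5280 itself writes `cACompromise`.

```python
# CRLReason values from RFC 5280 section 5.3.1 for the names used in this guide.
# Value 7 is unused, 8 is removeFromCRL, 10 is aACompromise.
CRL_REASON_CODES = {
    "unspecified": 0,
    "keyCompromise": 1,
    "caCompromise": 2,
    "affiliationChanged": 3,
    "superseded": 4,
    "cessationOfOperation": 5,
    "certificateHold": 6,
    "privilegeWithdrawn": 9,
}


def revocation_request(reason="unspecified"):
    """Build a request body; the field name 'reason' is a hypothetical schema."""
    if reason not in CRL_REASON_CODES:
        raise ValueError(f"unknown revocation reason: {reason}")
    return {"reason": reason}


def needs_revocation_publication(ttl_seconds):
    """Certificates with a TTL under one hour rely on expiry instead of CRL/OCSP."""
    return ttl_seconds >= 3600


body = revocation_request("keyCompromise")
publish_long_lived = needs_revocation_publication(31536000)
publish_short_lived = needs_revocation_publication(1800)
```
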
**Evidence You Can Provide**:
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
- CRL publication: HTTP GET `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL — parse with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
- OCSP responder validation: Query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert; response includes `revoked` status and can be parsed with `openssl ocsp` tooling.
- Audit trail: Certificate status transitions (Active → Revoked) recorded in `audit_events`.

**Operator Responsibility**:
- **Revoke certificates immediately upon key compromise suspicion** using reason code `keyCompromise`.
- **Revoke certificates at end of lifecycle** (host decommissioning, service sunset) using reason code `cessationOfOperation`.
- **Monitor CRL/OCSP availability** — ensure clients can check revocation status (test with TLS validator tools).
- **Establish certificate revocation procedure** (who can revoke, approval workflow if required, documentation).
- **Physically destroy backup private keys** (if offline backups are kept) when certificate is revoked or after archival period expires.
- **Test revocation workflow in staging** — issue test cert, revoke, verify OCSP/CRL reflects revocation within SLA.

**Status**: **Available** (v1.0 shipped)

---

## Requirement 8: Identify and Authenticate

**Objective**: Limit access to system components and cardholder data by business need-to-know, and authenticate and manage all access.

### 8.3 — Strong Authentication

**Requirement**: Authentication mechanisms must use strong cryptography and render authentication credentials (passwords, passphrases, keys) unreadable during transmission and storage.

**certctl Support**:

- **API Key Authentication** — All REST API endpoints require authentication (default):
  - Bearer token format: `Authorization: Bearer sk-...`
  - Key stored as SHA-256 hash in database (plaintext never persisted).
  - Comparison uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks.
  - Configuration: `CERTCTL_AUTH_TYPE=api-key` (enforced by default, no opt-out without explicit env var).

- **GUI Authentication Context** — Web dashboard login flow:
  - Login page (`/login`) accepts API key entry.
  - AuthProvider context stores API key in session (localStorage in browser, sent in Authorization header for all API calls).
  - 401 Unauthorized responses trigger automatic redirect to login.
  - Logout button clears session.
  - No session server-side (stateless API).

- **Credential Transmission** — All API traffic over TLS:
  - HTTPS enforced at server level (no plaintext HTTP).
  - API key transmitted in Authorization header (not URL parameter, not cookie).
  - Browser to server: TLS.
  - Agent to server: TLS.
  - No credential logging (audit records the per-key actor `Name`, never the Bearer token; logs redact the `Authorization` header).

**Evidence You Can Provide**:
- API configuration: `CERTCTL_AUTH_TYPE=api-key` in deployment manifest.
- Key inventory: `CERTCTL_API_KEYS_NAMED` env var (format `name:key:admin,...`) — seeds the in-memory `NamedAPIKey{Name, Key, Admin}` struct at `internal/api/middleware/middleware.go:29`. Keys are constant-time-compared (`subtle.ConstantTimeCompare`) against the Bearer token. No database table stores them; protect the env var contents at rest via a secrets manager (Vault / AWS Secrets Manager / Kubernetes Secrets / Docker Secrets).
- API audit log: `GET /api/v1/audit?action=api_call` showing per-key actor names (`Name` field of matched `NamedAPIKey`) on every call, with zero plaintext or hashed key material recorded.
- TLS certificate on control plane: `openssl s_client -connect {server}:8443` showing valid certificate, TLS 1.2+, strong cipher.
- GUI login flow: browser network tab showing Authorization header (token value redacted in compliance report).

**Operator Responsibility**:
- **Issue API keys to users/systems** requiring API access (outside certctl; you maintain key registry).
- **Rotate API keys using zero-downtime rotation** — `CERTCTL_AUTH_SECRET` supports comma-separated keys (e.g., `new-key,old-key`). Add the new key, migrate clients, then remove the old key. Recommendation: rotate at least annually, or immediately when personnel change.
- **Revoke API keys immediately** when user leaves or token is compromised (set `enabled=false` in API key management — not yet implemented in v1, owner must track manually).
- **Enforce strong TLS** on control plane: TLS 1.2+, modern ciphers (configure on reverse proxy or `CERTCTL_TLS_*` env vars if operator-controlled).
- **Protect `.env` and credential files** where API key is defined (restrict file system access, no version control).
- **Monitor API audit trail** for suspicious access patterns (many 401 errors, access from unexpected IPs, etc.).

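
The zero-downtime rotation described above depends on the server accepting any key in a comma-separated list while still comparing in constant time. The sketch below shows that idea in Python using `hmac.compare_digest`, the standard-library counterpart to Go's `subtle.ConstantTimeCompare`. It is not certctl's middleware: the function names and sample keys are hypothetical, and only the comma-separated format and the Bearer scheme come from this document.

```python
import hmac


def parse_bearer(header):
    """Return the token from 'Authorization: Bearer <token>', or None."""
    scheme, _, token = (header or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def is_authorized(header, configured_keys):
    """Accept the request if the token equals any configured key."""
    token = parse_bearer(header)
    if token is None:
        return False
    keys = [k.strip() for k in configured_keys.split(",") if k.strip()]
    matched = False
    for key in keys:
        # Check every key, without early exit, so timing does not reveal
        # which position in the list matched.
        if hmac.compare_digest(token.encode(), key.encode()):
            matched = True
    return matched


during_rotation = "new-key,old-key"   # both keys accepted while clients migrate
after_rotation = "new-key"            # old key removed once migration completes
old_client_during = is_authorized("Bearer old-key", during_rotation)
old_client_after = is_authorized("Bearer old-key", after_rotation)
```

During rotation the old client is still accepted; once the old key is removed from the list it is rejected.
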
**Status**: **Available** (v1.0 shipped)

### 8.6 — Application Account Management

**Requirement**: Users' system access must be restricted to the minimum level of application functions or data needed to perform duties. Application accounts (non-human) must use strong authentication.

**certctl Support**:

- **No Application Account Management in v1** — Certctl does not manage user accounts (no user directory, LDAP, OIDC).
  - All authentication via API key (service-to-service or human user with API key).
  - No per-user roles or permissions (that is a planned V3 RBAC feature).
  - Single API key shared across team or one key per automation script (operator's responsibility to manage).

- **Credentials Not in Source Code** — Security hardening:
  - API keys via `CERTCTL_API_KEY` env var (not in `main.go`, Dockerfile, `docker-compose.yml`).
  - Database credentials via `CERTCTL_DATABASE_URL` in `.env` (git-ignored).
  - CA private key path via `CERTCTL_CA_CERT_PATH`/`CERTCTL_CA_KEY_PATH` (not inline).

- **Service Account Isolation** (planned for V3) — Future RBAC will support:
  - Automation script API keys with scoped permissions (e.g., read-only, renew-only, deploy-only).
  - OIDC/SSO for human users with fine-grained role assignment (admin, operator, viewer).
  - Audit trail showing which account/role performed each action.

**Evidence You Can Provide**:
- Deployment manifest (Dockerfile, docker-compose.yml) showing no hardcoded API keys, database credentials, or CA key paths.
- `.env` file existence (confirm via CI or compliance check, without sharing contents).
- `.gitignore` configuration showing `.env`, `*.key`, secrets excluded.
- Code review: grep `main.go`, `config.go` for `CERTCTL_API_KEY` — should only see env var reference, not hardcoded values.

**Operator Responsibility**:
- **Manage API keys externally** (issue, rotate, revoke).
- **Document who/what has API key access** (automation scripts, team members, third-party integrations).
- **Rotate application credentials** (API keys, database passwords) according to your organization's policy.
- **Segregate credentials** — one API key per automation script where possible, or use V3 RBAC scoping.
- **Monitor application account usage** via audit trail — `GET /api/v1/audit` filtered by action/actor.

**Status**: **Available in part** (v1.0: credentials out of source code). **Planned V3**: scoped API keys and RBAC.

---

## Requirement 10: Log and Monitor

**Objective**: Log and monitor access to network resources and cardholder data.

### 10.2 — Implement Automated Audit Logging

**Requirement**: Automatically log and monitor all access to system components and records containing cardholder data.

**certctl Support**:

- **Immutable API Audit Log** (M19) — Middleware captures every API call:
  - `audit_events` table (append-only, no UPDATE/DELETE):
    - `method`: HTTP method (GET, POST, PUT, DELETE)
    - `path`: API endpoint path only, excluding query parameters (e.g., `/api/v1/certificates` — query strings intentionally omitted to prevent sensitive data persistence in the append-only audit trail)
    - `actor`: authenticated user/service (extracted from API key or context)
    - `body_hash`: SHA-256 hash of request body (truncated to 16 chars, first 8 chars shown in logs)
    - `status_code`: HTTP response status (200, 201, 400, 401, 404, 500, etc.)
    - `latency_ms`: request duration in milliseconds
    - `timestamp`: RFC 3339 timestamp

- **Certificate Lifecycle Events** — Higher-level events logged separately:
  - `certificate_issued` — new certificate created, issuer, profile, profile ID
  - `certificate_renewed` — certificate renewed, old/new serial, renewal policy
  - `certificate_revoked` — certificate revoked, RFC 5280 reason code
  - `certificate_deployed` — certificate deployed to target, agent, target type
  - `certificate_validated` — validation job result (success/failure reason)

- **Job Lifecycle Events** — Job status transitions:
  - `job_created` — renewal/issuance/deployment/validation job created
  - `job_status_updated` — job state change (Pending → AwaitingCSR → Running → Completed/Failed)

- **Policy and Configuration Events** — Administrative changes:
  - `policy_created`, `policy_updated`, `policy_deleted` — renewal policy changes
  - `profile_created`, `profile_updated`, `profile_deleted` — certificate profile changes
  - `issuer_created`, `issuer_deleted` — CA connector registration changes

- **Excluded Paths** — Health/readiness probes not logged to reduce noise:
  - `GET /health` (excluded by default)
  - `GET /ready` (excluded by default)
  - Configurable via `CERTCTL_AUDIT_EXCLUDE_PATHS` env var

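
The shape of an API audit record, as described above, can be sketched in a few lines: the query string is dropped from the path, the request body is reduced to a truncated SHA-256 hash, and health probes are skipped. This is an illustrative model only. The `audit_record` function is hypothetical, the field names follow the list above, and the real middleware is written in Go.

```python
import hashlib
from urllib.parse import urlsplit

# Default exclusions named in this section.
EXCLUDED_PATHS = {"/health", "/ready"}


def audit_record(method, url, body, actor, status_code, latency_ms, timestamp):
    """Build one audit row, or return None for excluded paths."""
    path = urlsplit(url).path  # query parameters are intentionally discarded
    if path in EXCLUDED_PATHS:
        return None
    return {
        "method": method,
        "path": path,
        "actor": actor,
        "body_hash": hashlib.sha256(body).hexdigest()[:16],
        "status_code": status_code,
        "latency_ms": latency_ms,
        "timestamp": timestamp,
    }


record = audit_record(
    "GET", "/api/v1/certificates?status=Expiring&owner=alice",
    b"", "ci-bot", 200, 12, "2026-01-01T00:00:00Z",
)
probe = audit_record("GET", "/health", b"", "", 200, 1, "2026-01-01T00:00:01Z")
```

The stored `path` is `/api/v1/certificates`, so the filter values in the query string never reach the append-only table, and `probe` is `None`.
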
**Evidence You Can Provide**:
- Audit trail export: `GET /api/v1/audit` or manual database query, showing sample events with timestamp, actor, action, resource.
- API call audit log: Query `audit_events` table showing method, path, actor, status code for last 24-48 hours.
- Configuration changes: `GET /api/v1/audit?type=policy_created,policy_updated,issuer_created` showing who changed what and when.
- Certificate lifecycle: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete issuance → deployment → renewal/revocation history.

**Operator Responsibility**:
- **Enable audit logging** — it's on by default; verify `CERTCTL_AUDIT_EXCLUDE_PATHS` is not set to exclude certificate-related paths.
- **Monitor audit log growth** — `audit_events` table will grow with every API call. Recommend database maintenance (log rotation policy, archival after 90 days, etc.).
- **Export and archive audit logs** — periodically `SELECT * FROM audit_events WHERE timestamp > {date}` and export to secure storage (S3, syslog, SIEM).
- **Establish audit review procedure** — QSA may request sample of logs; have export process documented.
- **Test audit logging** — make API call, verify event appears in audit trail within seconds.

**Status**: **Available** (M19 shipped)

### 10.3 — Protect Audit Trail

**Requirement**: Promptly protect audit trail files from unauthorized modifications.

**certctl Support**:

- **Append-Only Database Design** — PostgreSQL triggers and constraints prevent modification:
  - `audit_events` table has no `UPDATE` or `DELETE` triggers.
  - Application code never executes UPDATE/DELETE on `audit_events`.
  - Primary key is `id` (serial); new events always INSERT.

- **Read-Only API Access** — Audit events accessible only via read (`GET /api/v1/audit`):
  - No `POST /api/v1/audit/{id}` endpoint (no creation from API).
  - No `PUT /api/v1/audit/{id}` endpoint (no modification).
  - No `DELETE /api/v1/audit/{id}` endpoint (no deletion).
  - Only control plane can record events (via internal service layer, not exposed API).

- **Database Access Control** (operator responsibility) — PostgreSQL user permissions:
  - `certctl` application user: INSERT, SELECT on `audit_events`.
  - `certctl_read_only` user (for compliance/audit team): SELECT only on `audit_events`.
  - `postgres` superuser: restricted to DBA operations, logged separately by PostgreSQL.

**Evidence You Can Provide**:
|
||||
- Database schema: `\d audit_events` showing columns, primary key, no UPDATE/DELETE triggers.
|
||||
- Application code review: `internal/service/audit.go` showing `RecordEvent(...)` as only INSERT operation.
|
||||
- API endpoint audit: grep `internal/api/handler/audit*.go` or `internal/api/router/router.go` — no PUT/DELETE routes for events.
|
||||
- PostgreSQL permissions: `psql -d certctl -c "\dp audit_events"` showing INSERT/SELECT grants only.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Restrict database access** — issue read-only PostgreSQL user for compliance/audit team (no write privileges).
|
||||
- **Enable PostgreSQL query logging** — log all database connections and operations for DBA audit trail.
|
||||
- **Backup audit logs** — regularly export `audit_events` to offsite storage (S3, archive tape, syslog aggregator) for long-term retention.
|
||||
- **Monitor database modifications** — alert if any UPDATE/DELETE is attempted on `audit_events` (log-based alerting or PostgreSQL event triggers).
|
||||
- **Encrypt audit exports** — if archiving to external storage, encrypt backups at rest.
|
||||
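The "monitor database modifications" item above can be approximated with log-based alerting. The following is a minimal sketch, assuming PostgreSQL statement logging is enabled and that log lines contain the raw SQL text. The sample lines, the `findAuditMutations` helper, and the regular expression are illustrative assumptions, not part of certctl.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// mutationPattern matches UPDATE, DELETE FROM, or TRUNCATE statements that
// target audit_events, with an optional schema prefix. Case-insensitive.
var mutationPattern = regexp.MustCompile(
	`(?i)\b(UPDATE|DELETE\s+FROM|TRUNCATE(\s+TABLE)?)\s+(\w+\.)?audit_events\b`)

// findAuditMutations returns every log line that contains a statement
// attempting to modify or remove rows in audit_events.
func findAuditMutations(log string) []string {
	var hits []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if mutationPattern.MatchString(sc.Text()) {
			hits = append(hits, sc.Text())
		}
	}
	return hits
}

func main() {
	// Hypothetical log excerpt; the real prefix depends on log_line_prefix.
	sample := `LOG:  statement: INSERT INTO audit_events (actor, action) VALUES ('svc', 'issue')
LOG:  statement: DELETE FROM audit_events WHERE id = 42
LOG:  statement: update public.audit_events set actor = 'x'
LOG:  statement: SELECT * FROM audit_events`
	for _, line := range findAuditMutations(sample) {
		fmt.Println("ALERT:", line)
	}
}
```

A pattern match like this is a coarse filter: it can miss unusual statement forms and should complement, not replace, restrictive database grants.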
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
### 10.4 — Promptly Review and Address Audit Trail Exceptions
|
||||
|
||||
**Requirement**: Promptly review audit logs and investigate exceptions/anomalies.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Dashboard Charts** (M14) — Real-time observability:
|
||||
- **Renewal Success Trends** (30-day line chart) — shows job success rate; spikes in failures warrant investigation.
|
||||
- **Certificate Status Distribution** (donut chart) — shows Expiring/Expired counts; high Expired = missed renewals.
|
||||
- **Expiration Timeline** (90-day weekly heatmap) — shows upcoming expirations; bunching = renewal policy tuning needed.
|
||||
- **Issuance Rate** (30-day bar chart) — shows certificate creation/renewal activity; anomalies (zero issuances for weeks) indicate stopped automation.
|
||||
|
||||
- **Stats API** (M14) — Machine-readable trends:
|
||||
- `GET /api/v1/stats/job-trends?days=30` — renewal/issuance/deployment success/failure counts per day.
|
||||
- `GET /api/v1/stats/summary` — total certs, counts by status.
|
||||
- `GET /api/v1/stats/expiration-timeline?days=90` — expiration buckets for forecasting.
|
||||
|
||||
- **Agent Fleet Overview** (M14) — Agent health visibility:
|
||||
- Pie chart: agent status distribution (healthy, offline, error).
|
||||
- Version breakdown: agent versions in use (identify outdated agents).
|
||||
- Per-agent detail: last heartbeat timestamp, OS/architecture, IP address, recent jobs.
|
||||
|
||||
- **Alert Notifications** (M3, M16a) — Configurable escalation:
|
||||
- Email alerts: certificate approaching expiration, renewal failure, revocation notification.
|
||||
- Webhook: custom HTTP POST to your monitoring system (Slack, Teams, PagerDuty, OpsGenie, custom webhook).
|
||||
- **Retry & Dead-Letter Queue** (I-005) — Transient notifier failures (SMTP timeout, webhook 5xx) are retried with exponential backoff (`2^n` minutes capped at 1h, 5-attempt budget) before landing in the terminal `dead` status. Operators monitor DLQ depth via the `certctl_notification_dead_total` Prometheus counter and requeue via the Notifications page Dead letter tab once the underlying outage is resolved. Closes the pre-I-005 silent-drop gap where a single 5xx could lose a compliance-relevant alert without evidence.
|
||||
- Deduplication: one alert per threshold/certificate per day (avoid alert fatigue).
|
||||
|
||||
- **Audit Trail Filtering and Export** (M13) — Compliance reporting:
|
||||
- `GET /api/v1/audit?actor={user}&timestamp_after={date}` — filter audit log by actor, timestamp, type.
|
||||
- Export CSV/JSON via dashboard: audit page → select filters → "Export CSV" or "Export JSON".
|
||||
- Can export full audit trail for QSA review.
|
||||
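The retry schedule described under "Retry & Dead-Letter Queue" can be worked through numerically. This is a sketch under stated assumptions: `n` counts consecutive failed attempts starting at 1, and the 5-attempt budget includes the first delivery attempt. The source states only the formula (`2^n` minutes capped at 1h) and the budget, so the indexing here is a guess and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxAttempts = 5         // attempt budget quoted in the text
	maxDelay    = time.Hour // cap quoted in the text
)

// backoff returns min(2^n minutes, 1 hour) for n consecutive failures.
func backoff(n int) time.Duration {
	if n < 0 {
		n = 0
	}
	if n >= 6 { // 2^6 = 64 minutes already exceeds the cap
		return maxDelay
	}
	return time.Duration(1<<uint(n)) * time.Minute
}

func main() {
	for failures := 1; failures <= maxAttempts; failures++ {
		if failures == maxAttempts {
			// Assumed behavior: the budget is spent, so the notification is marked dead.
			fmt.Printf("failure %d: budget exhausted, mark dead\n", failures)
			break
		}
		fmt.Printf("failure %d: retry in %v\n", failures, backoff(failures))
	}
}
```

Under these assumptions the one-hour cap is never reached within the attempt budget; it only matters if the budget is raised.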
|
||||
**Evidence You Can Provide**:
|
||||
- Dashboard screenshots: expiration timeline, renewal success trends, status distribution.
|
||||
- Job trend report: `GET /api/v1/stats/job-trends?days=90` showing success/failure rates.
|
||||
- Agent fleet health: `GET /api/v1/agents` showing heartbeat status, version count distribution.
|
||||
- Audit log sample: `GET /api/v1/audit?limit=100` showing certificate issuance/renewal/revocation activity.
|
||||
- Alert configuration: screenshot of renewal policy `alert_thresholds_days` (30, 14, 7, 0) and notifier settings (email, Slack, etc.).
|
||||
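Audit evidence queries like the ones above are easiest to reproduce when the URL is built programmatically, so that filter values are always escaped. A minimal sketch follows, using the parameter names quoted in this section (`actor`, `timestamp_after`, `limit`). The base URL, the date format, and the `auditQueryURL` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// auditQueryURL builds a filtered audit query URL. Empty or zero filters
// are omitted. The accepted timestamp format is an assumption; check the
// OpenAPI spec for the format the server expects.
func auditQueryURL(base, actor, after string, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = "/api/v1/audit"

	q := url.Values{}
	if actor != "" {
		q.Set("actor", actor)
	}
	if after != "" {
		q.Set("timestamp_after", after)
	}
	if limit > 0 {
		q.Set("limit", strconv.Itoa(limit))
	}
	u.RawQuery = q.Encode() // keys are sorted and values are escaped
	return u.String(), nil
}

func main() {
	// Hypothetical host name.
	u, err := auditQueryURL("https://certctl.example.internal", "alice@example.com", "2026-01-01", 100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("GET", u)
}
```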
|
||||
**Operator Responsibility**:
|
||||
- **Review dashboard charts weekly** — look for anomalies (high Expired count, failure spike, renewal stalled).
|
||||
- **Respond to alerts promptly** — expiration alert = investigate renewal (check job logs, issuer connectivity, agent heartbeat).
|
||||
- **Set alert thresholds appropriately** — default 30/14/7/0 days is a starting point; adjust per your SLA and staffing.
|
||||
- **Maintain alert distribution list** — ensure alerts reach the right on-call engineer/team.
|
||||
- **Archive and review audit logs** — export monthly/quarterly for compliance trending (e.g., "all certificate changes last quarter").
|
||||
- **Test alert delivery** — trigger a test renewal failure or manual revocation, verify alert is sent.
|
||||
|
||||
**Status**: **Available** (v1.0 shipped, M14 observability charts, M19 audit log)
|
||||
|
||||
### 10.7 — Retain and Protect Audit Trail History
|
||||
|
||||
**Requirement**: Retain audit trail history for at least one year and ensure it can be retrieved.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Immutable Audit Trail** (M19) — `audit_events` table stores all API calls and certificate lifecycle events with timestamps.
|
||||
- **No Automatic Purge** — Certctl does not delete audit events. They remain in PostgreSQL indefinitely.
|
||||
- **Queryable History** — All events accessible via `GET /api/v1/audit` with time range, actor, resource filters.
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Database retention policy: confirm `audit_events` table has no DELETE triggers or maintenance jobs that purge events.
|
||||
- Sample audit query: `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '365 days'` showing one year+ of events.
|
||||
- Export procedure: documented process for exporting audit logs to cold storage (S3, archive tape, syslog).
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Configure PostgreSQL backup/retention** — certctl relies on database backups for audit trail protection.
|
||||
- Backup `audit_events` table daily or per your RPO/RTO.
|
||||
- Retain backups for at least 1 year (configure retention policy on backup system).
|
||||
- Test restore procedure annually.
|
||||
|
||||
- **Export and archive audit logs** — periodically export `SELECT * FROM audit_events WHERE timestamp > {start_date}` to offsite storage.
|
||||
- Recommendation: monthly exports to S3 with versioning enabled.
|
||||
- Encrypt exports at rest.
|
||||
- Retain archives for at least 3 years (adjust per your compliance requirements).
|
||||
|
||||
- **Monitor audit log growth** — `audit_events` table will grow ~1-5 MB/day depending on API call volume.
|
||||
- Estimate: 10,000 API calls/day = ~50 MB/month.
|
||||
- Plan PostgreSQL storage and backup capacity accordingly.
|
||||
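The growth estimate above implies an average event size, which can be reused for capacity planning at other call volumes. The sketch below derives roughly 167 bytes per event from the quoted figure (50 MB/month at 10,000 calls/day, with MB taken as 10^6 bytes). That per-event size is an inference from this document's numbers, not a measured value, and real row sizes depend on schema, indexes, and payload.

```go
package main

import "fmt"

// bytesPerEvent is derived from the estimate in the text:
// 50 MB/month / (10,000 calls/day * 30 days) is about 167 bytes per event.
const bytesPerEvent = 167

// auditGrowthMB estimates audit_events growth in MB (10^6 bytes) for a
// daily API call volume sustained over the given number of days.
func auditGrowthMB(callsPerDay, days int) float64 {
	return float64(callsPerDay) * float64(days) * bytesPerEvent / 1e6
}

func main() {
	for _, calls := range []int{10000, 50000, 250000} {
		fmt.Printf("%7d calls/day: %.1f MB/month, %.1f MB/year\n",
			calls, auditGrowthMB(calls, 30), auditGrowthMB(calls, 365))
	}
}
```

Measuring the actual table size (for example with `pg_total_relation_size`) after a representative week gives a better constant than the derived one.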
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 6: Develop and Maintain Secure Systems and Applications
|
||||
|
||||
**Objective**: Develop and maintain secure systems and applications.
|
||||
|
||||
### 6.3.1 — Security Coding Practices
|
||||
|
||||
**Requirement**: Develop all custom application code in accordance with secure coding practices and include authentication, access control, input validation, and error handling.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Input Validation** — Centralized validators enforce strong input constraints:
|
||||
- Common name: max 253 chars, DNS-safe characters only, no leading/trailing hyphens.
|
||||
- CSR PEM: must be valid PEM format (regex validation).
|
||||
- Policy type: whitelist enum (Issuance, Renewal, Revocation, etc.).
|
||||
- API key: alphanumeric + hyphens only.
|
||||
- Implemented in `internal/domain/validation.go` and called from all handler layer inputs.
|
||||
|
||||
- **Error Handling** — No sensitive data leakage in error responses:
|
||||
- HTTP 500 errors return generic "Internal Server Error" message, not stack trace.
|
||||
- Database errors logged internally (structured slog), not exposed to client.
|
||||
- 404 responses do not reveal whether a resource exists: the same "Not Found" body is returned whether the resource is missing or the caller is not authorized to see it.
|
||||
|
||||
- **No Hardcoded Credentials** — All secrets via environment variables:
|
||||
- `CERTCTL_API_KEY`, `CERTCTL_DATABASE_URL`, `CERTCTL_CA_KEY_PATH` — env vars only.
|
||||
- Credentials not in `main.go`, Dockerfile, `docker-compose.yml`, or Git history.
|
||||
- `.env` file git-ignored and excluded from version control.
|
||||
|
||||
- **Dependency Management** — Go module pinning (`go.mod`):
|
||||
- All external dependencies pinned to specific versions.
|
||||
- No wildcard versions or `latest` tags.
|
||||
- CI runs `go mod verify` to detect tampering.
|
||||
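The common-name rules listed under "Input Validation" can be sketched as a single validator. This is not the code in `internal/domain/validation.go`; it is an illustrative reading of the stated constraints. It assumes that "DNS-safe" means letters, digits, hyphens, and dots, and that the hyphen rule applies to each label. How certctl treats wildcard names is not stated here, so this sketch rejects them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateCommonName applies the constraints quoted in the text:
// at most 253 characters, DNS-safe characters only, and no label that
// starts or ends with a hyphen.
func validateCommonName(cn string) error {
	if cn == "" {
		return errors.New("common name is empty")
	}
	if len(cn) > 253 {
		return errors.New("common name exceeds 253 characters")
	}
	for _, label := range strings.Split(cn, ".") {
		if label == "" {
			return errors.New("common name contains an empty label")
		}
		if strings.HasPrefix(label, "-") || strings.HasSuffix(label, "-") {
			return fmt.Errorf("label %q starts or ends with a hyphen", label)
		}
		for _, r := range label {
			isAlnum := (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
			if !isAlnum && r != '-' {
				return fmt.Errorf("label %q contains disallowed character %q", label, r)
			}
		}
	}
	return nil
}

func main() {
	for _, cn := range []string{"api.example.com", "-api.example.com", "api_v2.example.com"} {
		fmt.Printf("%-20s -> %v\n", cn, validateCommonName(cn))
	}
}
```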
|
||||
**Evidence You Can Provide**:
|
||||
- Code review: `internal/domain/validation.go` showing input validation functions (Common name length, CSR PEM, policy type, etc.).
|
||||
- Error handling audit: `internal/api/handler/certificates.go` showing HTTP error responses (no stack traces).
|
||||
- Credentials in source code check: `grep -r "CERTCTL_API_KEY\|DATABASE_URL\|CA_KEY" cmd/ internal/ | grep -vF ".env"` (should only show env var references, not values; `-F` matches `.env` literally so that lines such as `os.Getenv(...)` are not filtered out).
|
||||
- `go.mod` review: no wildcard versions, all pinned.
|
||||
- CI workflow: `.github/workflows/ci.yml` showing `go mod verify` step.
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Review dependency updates** — keep Go version current, update certctl dependencies regularly (security patches).
|
||||
- **Scan container images** — use Trivy, Clair, or similar to scan Docker images for known vulnerabilities.
|
||||
- **Maintain secure coding practices** in any custom issuer/target connectors you deploy (scripts for OpenSSL, Bash/PowerShell for IIS/F5).
|
||||
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
### 6.5.10 — Broken Authentication and Cryptography Prevention
|
||||
|
||||
**Requirement**: Prevent broken authentication and cryptography weaknesses.
|
||||
|
||||
**certctl Support**:
|
||||
|
||||
- **Authentication** — API key with SHA-256 hashing, constant-time comparison (`crypto/subtle.ConstantTimeCompare`).
|
||||
- **Cryptography** — Go's `crypto/*` standard library (no weak ciphers). ECDSA P-256, RSA 2048+.
|
||||
- **TLS** — HTTPS enforced (no plaintext HTTP endpoints).
|
||||
- **No Sessions** — Stateless API (no session cookies, no session fixation risk).
|
||||
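The authentication bullet above combines two standard-library primitives: hashing the presented key with SHA-256 and comparing digests with `crypto/subtle.ConstantTimeCompare`. A minimal sketch of that combination follows. The helper names and the placeholder key are hypothetical, and certctl's actual middleware may differ in structure.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashKey returns the SHA-256 digest of an API key, so that only the
// digest (never the key itself) needs to be kept for comparison.
func hashKey(key string) [sha256.Size]byte {
	return sha256.Sum256([]byte(key))
}

// keyMatches hashes the presented key and compares it to the stored digest.
// ConstantTimeCompare takes the same time wherever the digests differ,
// which avoids leaking how many leading bytes matched.
func keyMatches(presented string, stored [sha256.Size]byte) bool {
	got := hashKey(presented)
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}

func main() {
	stored := hashKey("example-api-key") // placeholder value, not a real key

	fmt.Println(keyMatches("example-api-key", stored)) // true
	fmt.Println(keyMatches("wrong-key", stored))       // false
}
```

Because both digests are always 32 bytes, the comparison length does not reveal the length of the presented key.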
|
||||
**Status**: **Available** (v1.0 shipped)
|
||||
|
||||
---
|
||||
|
||||
## Requirement 7: Restrict Access by Business Need-to-Know
|
||||
|
||||
**Objective**: Limit access to system components and cardholder data by business need-to-know and ensure users are authenticated and authorized.
|
||||
|
||||
### 7.2 — Implement Access Control
|
||||
|
||||
**Requirement**: Ensure proper user identity management and implement access controls based on business need-to-know.
|
||||
|
||||
**certctl v1 Support** (limited):
|
||||
- **Certificate Ownership** (M11b) — Each certificate assigned to owner (person + email) and optional team. Ownership is metadata; access control is not enforced at API level.
|
||||
- **Agent Groups** (M11b) — Renewal policies target specific agent groups (OS, architecture, CIDR, version). Groups are used for policy targeting, not user access control.
|
||||
- **Interactive Approval** (M11b) — `AwaitingApproval` job state allows manual approval/rejection of renewals (enforcement of business workflows, not user access control).
|
||||
|
||||
**certctl v3 Support** (planned):
|
||||
- **OIDC/SSO** — Okta, Azure AD, Google integration. Users log in via identity provider.
|
||||
- **Role-Based Access Control (RBAC)** — Three roles: admin (all operations), operator (issue/renew/deploy), viewer (read-only). Roles assigned via OIDC claims or group membership.
|
||||
- **Profile/Owner Gating** — Operator can renew only certificates assigned to their team; viewer cannot modify anything.
|
||||
- **Audit Trail Attribution** — Every action shows which user/role performed it.
|
||||
|
||||
**Evidence You Can Provide** (v1):
|
||||
- Certificate ownership mapping: `GET /api/v1/certificates` showing owner, team fields (metadata only; access not controlled).
|
||||
- Agent group targeting: `GET /api/v1/policies` showing `agent_group_id` field.
|
||||
- Interactive approval workflow: job detail showing `AwaitingApproval` state, approve/reject endpoints in API docs.
|
||||
|
||||
**Operator Responsibility** (v1):
|
||||
- **Manage API key distribution** externally — only issue API keys to authorized users/systems.
|
||||
- **Implement reverse proxy auth** (Nginx, Apache, Okta proxy) in front of certctl to enforce OIDC/LDAP (outside certctl).
|
||||
- **Plan for V3 RBAC** — budget for upgrade when finer-grained access control is needed.
|
||||
|
||||
**Planned** (V3):
|
||||
- Upgrade to certctl Pro with OIDC/RBAC and per-role audit trail.
|
||||
|
||||
**Status**: **Available in part** (v1.0: ownership metadata, agent group targeting). **Planned V3**: OIDC/RBAC enforcement.
|
||||
|
||||
---
|
||||
|
||||
## Evidence Summary Table
|
||||
|
||||
| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
|
||||
|---|---|---|---|---|---|
|
||||
| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
|
||||
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (both unauthenticated, RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
|
||||
| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
|
||||
| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job_type=AwaitingCSR | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
|
||||
| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, .env excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
|
||||
| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
|
||||
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
|
||||
| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | API key hash in database | `GET /api/v1/audit` showing API calls | Available |
|
||||
| **8.6** Acct Management | Credentials out of source, .env excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
|
||||
| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
|
||||
| **10.3** Audit Protection | Append-only table design, read-only API, DB permissions | API endpoint audit (no PUT/DELETE on events), DB schema | `audit_events` table, PostgreSQL GRANT SELECT | Immutable by design | Available |
|
||||
| **10.4** Review & Alert | Dashboard charts, stats API, notifier integrations | Dashboard (renewal trends, status pie, expiration heatmap), `GET /api/v1/stats/*` | Job results, alert config in policies | `GET /api/v1/audit?type=job_*` | Available |
|
||||
| **10.7** Retention | 1+ year in PostgreSQL, export/archive procedures | Database query `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '1 year'` | `audit_events` table retention (no auto-delete) | Manual export/archival (operator) | Available |
|
||||
| **6.3.1** Secure Coding | Input validation, error handling, no hardcoded secrets, dependency pinning | Code review (validation.go, handlers), error responses | `go.mod` with pinned versions, `.gitignore` | GitHub Actions CI with `go mod verify` | Available |
|
||||
| **7.2** Access Control | Ownership metadata, agent groups, interactive approval | `GET /api/v1/certificates` (owner/team), `GET /api/v1/agent-groups` | Certificate owner/team fields, agent group criteria | User identity from auth context | Available in part (V3: RBAC) |
|
||||
|
||||
---
|
||||
|
||||
## Operator Responsibilities
|
||||
|
||||
The following control objectives are **outside certctl's scope** and must be managed by your organization:
|
||||
|
||||
| Control Objective | Responsibility | Example Actions |
|
||||
|---|---|---|
|
||||
| **Network Segmentation** | Isolate certctl control plane from cardholder network | Place certctl on separate VLAN, firewall rules |
|
||||
| **Physical Security** | Restrict access to servers/databases | Data center access controls, logging |
|
||||
| **Personnel Screening** | Background checks for staff with access | HR/employment verification |
|
||||
| **Access Control Enforcement** | User authentication & authorization outside API | Implement reverse proxy with OIDC (V3: use certctl Pro RBAC) |
|
||||
| **Incident Response** | Procedures for certificate compromise or breach | Document key revocation process, alert escalation |
|
||||
| **Disaster Recovery** | Backup and restore procedures | Database backup schedule, offsite replication |
|
||||
| **Change Management** | Approval process for config/cert changes | CAB meetings, documented procedures |
|
||||
| **Vulnerability Scanning** | ASV scanning, penetration testing, code review | Annual PCI-DSS penetration test |
|
||||
| **Key Backup & Escrow** | Secure offline storage of CA private keys (if required) | Hardware security module (HSM) or encrypted vault |
|
||||
| **Audit Log Retention** | Long-term archival and protection of audit logs | Export to S3/syslog, retain 3+ years |
|
||||
| **QSA Engagement** | Schedule and coordination of compliance assessment | Annual audit with qualified security assessor |
|
||||
|
||||
---
|
||||
|
||||
## V3 Enhancements for PCI-DSS
|
||||
|
||||
Certctl v3 (Pro) adds paid features that strengthen PCI-DSS compliance posture:
|
||||
|
||||
| Feature | PCI-DSS Benefit |
|
||||
|---|---|
|
||||
| **OIDC/SSO Authentication** | Centralized identity management, audit integration with corporate directory |
|
||||
| **Role-Based Access Control (RBAC)** | Least-privilege enforcement: admin, operator, viewer roles with profile/team gating |
|
||||
| **Bulk Revocation by Profile/Owner/Agent** | Rapid incident response (revoke all certs in cardholder network in minutes) |
|
||||
| **NATS Event Bus with JetStream Audit Streaming** | Real-time event streaming to SIEM (Splunk, ELK, Datadog) for centralized audit trail |
|
||||
| **Certificate Health Scores** | Proactive risk identification (composite scoring: expiration proximity, rotation age, key strength) |
|
||||
| **Advanced Search DSL** | Complex audit queries (POST /search with nested AND/OR, regex, field projection) for compliance reporting |
|
||||
| **CT Log Monitoring** | Detect unauthorized certificate issuance (security vulnerability detection) |
|
||||
| **DigiCert Issuer Connector** | Enterprise CA integration for compliance audits |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps for Compliance
|
||||
|
||||
1. **Review this mapping with your QSA** — Confirm which requirements apply to your cardholder data environment.
|
||||
|
||||
2. **Configure certctl for your environment**:
|
||||
- Set `CERTCTL_KEYGEN_MODE=agent` in production.
|
||||
- Define certificate profiles with approved key types.
|
||||
- Configure renewal policies with appropriate thresholds (e.g., 30 days for 90-day certs).
|
||||
- Enable notifier integrations (email, Slack, PagerDuty) for alerts.
|
||||
- Plan `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate locations.
|
||||
|
||||
3. **Implement operator controls**:
|
||||
- Document certificate management procedures (issuance, renewal, revocation, archival).
|
||||
- Establish API key rotation schedule.
|
||||
- Set up audit log export and archival (monthly to S3, retain 1+ year).
|
||||
- Configure PostgreSQL backups (daily, 1+ year retention).
|
||||
- Plan incident response (who revokes certs, escalation process, timeline).
|
||||
|
||||
4. **Test compliance readiness**:
|
||||
- Trigger a test renewal and verify CRL/OCSP publication.
|
||||
- Export audit trail and verify it shows expected events.
|
||||
- Test revocation workflow and confirm OCSP reflects status within 24 hours.
|
||||
- Run discovery scan and verify unknown certs are detected and triaged.
|
||||
|
||||
5. **Prepare evidence for QSA**:
|
||||
- API endpoint documentation (OpenAPI spec: `api/openapi.yaml`).
|
||||
- Audit log sample (last 90 days of events).
|
||||
- Configuration export (profiles, policies, issuer/target definitions).
|
||||
- Deployment manifest (showing env var config, no hardcoded secrets).
|
||||
- Test certificates and CRL/OCSP query results.
|
||||
|
||||
6. **Plan for V3** (if RBAC/centralized audit required):
|
||||
- Evaluate certctl Pro for OIDC/SSO and NATS audit streaming.
|
||||
- Assess integration with existing identity provider (Okta, Azure AD, etc.).
|
||||
|
||||
---
|
||||
|
||||
## Questions?
|
||||
|
||||
For additional guidance on certctl features and PCI-DSS mapping:
|
||||
- Review the [Architecture Guide](architecture.md) for system design.
|
||||
- Check [Connectors Documentation](connectors.md) for issuer/target/notifier capabilities.
|
||||
- Run the [Quick Start Guide](quickstart.md) to see features in action.
|
||||
- Consult your QSA for final compliance determination.
|
||||
|
||||
**Last Updated**: March 24, 2026 (certctl v1.0 with M18b discovery and M19 audit logging)
|
||||
@@ -1,587 +0,0 @@
|
||||
# SOC 2 Type II Compliance Mapping
|
||||
|
||||
This guide maps certctl's implemented features to AICPA SOC 2 Trust Service Criteria (TSC). It is **not a SOC 2 certification claim** — rather, it helps security engineers, auditors, and evaluators understand how certctl supports your organization's SOC 2 compliance posture. Use this as evidence input for your own control assessment during SOC 2 audits.
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
SOC 2 audits require evidence that your infrastructure meets specific Trust Service Criteria. Auditors ask: "Does your certificate management tooling support CC6.1 logical access controls?" This guide answers by mapping certctl's features to specific criteria and pointing to evidence (API endpoints, configuration, audit trail).
|
||||
|
||||
Each section includes:
|
||||
|
||||
- **The TSC requirement** — what the auditor is looking for
|
||||
- **certctl's implementation** — which features address it
|
||||
- **Evidence location** — where to find proof (API endpoint, config variable, source code, audit events)
|
||||
- **V2 vs V3 status** — whether feature is in the free community edition (V2) or paid Pro edition (V3)
|
||||
- **Operator responsibility** — aspects your organization must handle outside of certctl
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [CC6: Logical and Physical Access Controls](#cc6-logical-and-physical-access-controls)
|
||||
- [CC6.1 — Logical Access Security](#cc61--logical-access-security)
|
||||
- [CC6.2 — Prior to Issuing System Credentials](#cc62--prior-to-issuing-system-credentials)
|
||||
- [CC6.3 — Authentication Policies](#cc63--authentication-policies)
|
||||
- [CC6.7 — Information Transmission Protection](#cc67--information-transmission-protection)
|
||||
3. [CC7: System Operations](#cc7-system-operations)
|
||||
- [CC7.1 — System Monitoring](#cc71--system-monitoring)
|
||||
- [CC7.2 — Anomaly Detection](#cc72--anomaly-detection)
|
||||
- [CC7.3 — Incident Response](#cc73--incident-response)
|
||||
- [CC7.4 — Identify and Develop Risk Mitigation Activities](#cc74--identify-and-develop-risk-mitigation-activities)
|
||||
4. [A1: Availability](#a1-availability)
|
||||
- [A1.1/A1.2 — Availability and Recovery](#a11a12--availability-and-recovery)
|
||||
5. [CC8: Change Management](#cc8-change-management)
|
||||
- [CC8.1 — Change Control](#cc81--change-control)
|
||||
6. [Evidence Summary Table](#evidence-summary-table)
|
||||
7. [What Requires Operator Action](#what-requires-operator-action)
|
||||
8. [V3 Enhancements](#v3-enhancements)
|
||||
9. [Conclusion](#conclusion)
|
||||
|
||||
## CC6: Logical and Physical Access Controls
|
||||
|
||||
### CC6.1 — Logical Access Security
|
||||
|
||||
**Requirement**: The entity restricts logical access to digital and information assets and related facilities by applying user identity authentication, registration, access rights, and usage policies.
|
||||
|
||||
**certctl Implementation** (V2 — Community Edition):
|
||||
|
||||
- **API Key Authentication** — All `/api/v1/*` calls require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Environment: `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in with log warning)
|
||||
- **Standards-based enrollment and PKI distribution endpoints** — EST (`/.well-known/est/*`, RFC 7030), SCEP (`/scep`, `/scep/*`, RFC 8894), and CRL/OCSP (`/.well-known/pki/crl/{issuer_id}`, `/.well-known/pki/ocsp/{issuer_id}/{serial}`, RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer because these protocols cannot present certctl Bearer tokens. Authentication is enforced in-protocol: EST relies on CSR signature verification plus profile policy (RFC 7030 §3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous); SCEP requires a shared `challengePassword` in the PKCS#10 CSR attributes (OID 1.2.840.113549.1.9.7, RFC 8894 §3.2), validated with `crypto/subtle.ConstantTimeCompare`; CRL and OCSP are intentionally anonymous for relying-party accessibility. CWE-306 (missing authentication for a critical function) is closed for SCEP by `preflightSCEPChallengePassword` in `cmd/server/main.go`, which refuses to start the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`. The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware) and is pinned by the 27-subtest regression harness at `cmd/server/finalhandler_test.go`.
|
||||
- **GUI Authentication** — Web dashboard includes login screen requiring API key entry. Failed auth redirects to login on 401. Auth context persists across page navigation. Logout clears session.
|
||||
- **Configurable CORS** — API restricts cross-origin requests via `CERTCTL_CORS_ORIGINS` allowlist or wildcard. Preflight caching prevents chatty browser auth flows.
|
||||
- **Token Bucket Rate Limiting** — Per-IP rate limiting (configurable via `CERTCTL_RATE_LIMIT_RPS` / `CERTCTL_RATE_LIMIT_BURST`) returns 429 Too Many Requests with Retry-After header. Prevents credential stuffing and brute-force attacks.
|
||||
- **No Password Storage** — certctl does not store user passwords. API keys are the sole authentication mechanism. Your API key generation, distribution, and rotation policies are your responsibility (see "Operator Responsibility" below).
|
||||
- **Zero-Downtime Key Rotation** — `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `new-key,old-key`). All listed keys are validated with constant-time comparison. Operators can add a new key, migrate clients, then remove the old key — no service restart required for the client migration phase. A single-key warning is logged at startup to encourage rotation configuration.
|
||||
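The per-IP token bucket described above can be sketched with the standard library alone. Each client IP gets a bucket that holds up to `burst` tokens and refills at `rps` tokens per second; a request that finds no token is denied with a retry hint suitable for a `Retry-After` header. Type and function names here are hypothetical, and the mapping to `CERTCTL_RATE_LIMIT_RPS` and `CERTCTL_RATE_LIMIT_BURST` is an assumption about how those settings are used.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket tracks the remaining tokens for one client IP.
type bucket struct {
	tokens float64
	last   time.Time
}

// limiter is a per-IP token bucket. This sketch never evicts idle
// buckets; a long-running server would need to expire them.
type limiter struct {
	mu      sync.Mutex
	rps     float64
	burst   float64
	buckets map[string]*bucket
}

func newLimiter(rps, burst float64) *limiter {
	return &limiter{rps: rps, burst: burst, buckets: make(map[string]*bucket)}
}

// allow spends one token for ip if one is available. When the request is
// denied, retryAfter is the number of whole seconds until a token accrues.
func (l *limiter) allow(ip string) (ok bool, retryAfter int) {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	b, found := l.buckets[ip]
	if !found {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[ip] = b
	}
	// Refill for the elapsed time, never exceeding the burst size.
	b.tokens = math.Min(l.burst, b.tokens+now.Sub(b.last).Seconds()*l.rps)
	b.last = now

	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	return false, int(math.Ceil((1 - b.tokens) / l.rps))
}

func main() {
	l := newLimiter(1, 2) // 1 request/second with a burst of 2
	for i := 1; i <= 3; i++ {
		ok, retry := l.allow("203.0.113.7")
		fmt.Printf("request %d: allowed=%v retryAfter=%ds\n", i, ok, retry)
	}
}
```

In an HTTP middleware, a denied request would translate to a 429 response with `Retry-After` set to the returned value.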
|
||||
**Evidence Locations**:
|
||||
|
||||
- API auth implementation: `internal/api/middleware/auth.go`
|
||||
- Auth check endpoint: `GET /api/v1/auth/check` (validates credentials)
|
||||
- Auth info endpoint: `GET /api/v1/auth/info` (returns current auth mode, served without auth so GUI detects mode)
|
||||
- Rate limiting middleware: `internal/api/middleware/rate_limit.go`
|
||||
- CORS configuration: `cmd/server/main.go`, search for `CERTCTL_CORS_ORIGINS`
|
||||
- Final handler dispatch (authenticated vs. unauthenticated routing): `cmd/server/main.go:buildFinalHandler`
|
||||
- SCEP preflight gate (CWE-306 closure): `cmd/server/main.go:preflightSCEPChallengePassword`
|
||||
- SCEP service-layer defense-in-depth (rejects enrollment on empty challenge password, `crypto/subtle.ConstantTimeCompare`): `internal/service/scep.go`
|
||||
- Final handler dispatch regression harness (27 subtests): `cmd/server/finalhandler_test.go`
|
||||
- OpenAPI spec `security: []` overrides on unauthenticated paths: `api/openapi.yaml` (EST `/cacerts`, `/simpleenroll`, `/simplereenroll`, `/csrattrs`; SCEP `/scep` GET+POST; PKI `/crl/{issuer_id}`, `/ocsp/{issuer_id}/{serial}`)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **OIDC / SSO Integration** — Optional OIDC providers (Okta, Azure AD, Google) with multi-tenant support. API key fallback for service accounts.
|
||||
- **API Key Scoping** — Per-resource or per-action permissions (e.g., "read certificates from production only" or "issue certs, no revoke")
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Generate and securely distribute API keys to authorized users and systems
|
||||
- Rotate API keys regularly (recommend quarterly)
|
||||
- Revoke API keys immediately upon employee departure
|
||||
- Do not commit API keys to version control (use `.env` or secrets management)
|
||||
- Implement your own IP allowlisting at the firewall if needed (certctl enforces CORS at the HTTP layer, not at network layer)
|
||||
|
||||
---
|
||||
|
||||
### CC6.2 — Prior to Issuing System Credentials
|
||||
|
||||
**Requirement**: The entity provisions, modifies, disables, and removes user identities and rights based on an authorization process that considers user responsibility level and changes in those responsibilities.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Ownership Attribution** — Certificates can be assigned to an owner (email + name). Owner information is stored and audited (see CC7.2). Ownership is tracked through the lifecycle (issuance, renewal, deployment, revocation). Ownership reassignment is audited via the immutable audit trail.
|
||||
- **Team Assignment** — Owners can be organized into teams. Certificate policies can route notifications to team email addresses.
|
||||
- **Audit Trail Attribution** — Every API call records the actor (extracted from the API key or auth context). The audit trail is immutable — no retroactive modification of who did what.
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Ownership domain model: `internal/domain/certificate.go` (OwnerID field)
|
||||
- Owner CRUD API: `GET /api/v1/owners`, `POST /api/v1/owners`, `DELETE /api/v1/owners/{id}`
|
||||
- Team CRUD API: `GET /api/v1/teams`, `POST /api/v1/teams`, `DELETE /api/v1/teams/{id}`
|
||||
- Audit trail API: `GET /api/v1/audit` (actor field in every record)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **RBAC (Role-Based Access Control)** — Predefined roles (Admin, Operator, Viewer) with profile-gated permissions. Administrators manage role assignments.
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Map certctl's ownership model to your organizational structure (departments, teams, on-call rotations)
|
||||
- Establish a formal access request and approval process
|
||||
- Remove ownership access when team members depart
|
||||
- Document your access review process (audit trail shows *who* made changes, but you must justify *why*)
|
||||
|
||||
---
|
||||
|
||||
### CC6.3 — Authentication Policies
|
||||
|
||||
**Requirement**: The entity determines, documents, communicates, and enforces authentication policies that support the identification and authentication of authorized internal and external users and the transmission of user credentials.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **API Key Policy** — All `/api/v1/*` access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". Configuration choice is logged at startup. The standards-based enrollment and PKI distribution endpoints (EST, SCEP, CRL, OCSP) are served unauthenticated at the HTTP layer per their respective RFCs; see CC6.1 for the full authentication contract and CWE-306 closure via `preflightSCEPChallengePassword`.
|
||||
- **Agent Authentication** — Agents authenticate to the server via API keys (same mechanism as users). Agent credentials are separate from user API keys.
|
||||
- **Private Key Policy** — Agent-side key generation is the default (`CERTCTL_KEYGEN_MODE=agent`). Server-side keygen (`CERTCTL_KEYGEN_MODE=server`) requires explicit configuration and logs a warning: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only".
|
||||
- **Password Policy** — Not applicable; certctl uses API keys exclusively. Password management is delegated to your organization's IAM system if you integrate OIDC/SSO (V3).
|
||||
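The opt-out and keygen warnings described above amount to a small startup check. The following is a minimal sketch, not the code in `cmd/server/main.go`: the helper names `insecureSettings` and `envOr` are hypothetical, and the log wording here is a placeholder for the actual messages quoted in the bullets.

```go
package main

import (
	"log/slog"
	"os"
)

// insecureSettings lists configuration choices that should trigger a startup
// warning. The function name and return shape are illustrative assumptions.
func insecureSettings(authType, keygenMode string) []string {
	var found []string
	if authType == "none" {
		found = append(found, "CERTCTL_AUTH_TYPE=none")
	}
	if keygenMode == "server" {
		found = append(found, "CERTCTL_KEYGEN_MODE=server")
	}
	return found
}

// envOr reads an environment variable and falls back to a default when unset.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

func main() {
	// Defaults follow the document: api-key auth and agent-side keygen.
	authType := envOr("CERTCTL_AUTH_TYPE", "api-key")
	keygenMode := envOr("CERTCTL_KEYGEN_MODE", "agent")
	slog.Info("startup configuration", "auth_type", authType, "keygen_mode", keygenMode)
	for _, s := range insecureSettings(authType, keygenMode) {
		slog.Warn("insecure setting enabled, development use only", "setting", s)
	}
}
```

An auditor can compare the startup log lines against the deployed environment to confirm that neither opt-out is active in production.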
|
||||
**Evidence Locations**:
|
||||
|
||||
- Auth type configuration: `internal/config/config.go`, `CERTCTL_AUTH_TYPE` env var
|
||||
- Startup logging: `cmd/server/main.go` (logs auth mode at server startup)
|
||||
- Keygen mode configuration: `internal/config/config.go`, `CERTCTL_KEYGEN_MODE` env var
|
||||
- Keygen mode warning: `cmd/server/main.go` and `cmd/agent/main.go`
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **OIDC Policy** — Mandatory MFA when OIDC is enabled
|
||||
- **API Key Expiration** — Automatic key rotation policies (e.g., 90-day expiration for user keys, no expiration for long-lived service account keys)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Document your API key generation and distribution policy
|
||||
- Establish a formal change control process for auth configuration changes
|
||||
- Test authentication failures (e.g., expired keys, malformed tokens) in a non-production environment
|
||||
- Integrate certctl authentication into your organization's IAM audit reports (who has API keys, when were they issued, who has revoked them)
|
||||
|
||||
---
|
||||
|
||||
### CC6.7 — Information Transmission Protection
|
||||
|
||||
**Requirement**: The entity restricts the transmission, movement, and removal of information in a manner that prevents unauthorized disclosure, whether through digital or non-digital means.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **TLS for Control Plane** — All API communication occurs over HTTPS (TLS 1.2+). Server uses `tls.Dial()` for outbound connections to issuers and targets. Configuration: `CERTCTL_SERVER_HOST` (default `127.0.0.1`) + `CERTCTL_SERVER_PORT` (default `8080`; Docker Compose maps to `8443`).
|
||||
- **Agent-to-Server Communication** — Agents submit CSRs and heartbeats over HTTPS to the server using the same TLS stack.
|
||||
- **Private Key Isolation** — Agents generate ECDSA P-256 private keys locally (`crypto/ecdsa` + `crypto/elliptic`). Private keys are never transmitted to the server — agents submit CSRs only. Private keys are stored on agent filesystem (`CERTCTL_KEY_DIR`, default `/var/lib/certctl/keys`) with 0600 (owner read/write only) permissions. Server-side keygen mode logs a development warning; production must use agent-side keygen.
|
||||
- **Certificate Storage** — Signed certificates are stored in PostgreSQL as PEM text (along with metadata). Certificates are not secrets and may be transmitted plaintext. Private keys are never stored on the control plane in production (agent-side keygen mode).
|
||||
- **Deployment via Target Connectors** — Target connectors write certificates and keys to local filesystem or network appliance APIs. For NGINX/Apache httpd, files are written with restrictive permissions (0600 for keys). For F5/IIS (V3+), credentials are scoped to a proxy agent in the same network zone — the server never holds network appliance credentials.
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- TLS configuration: deploy certctl behind a TLS-terminating reverse proxy (NGINX, HAProxy, or cloud load balancer) or use a TLS sidecar
|
||||
- Agent keygen mode: `cmd/agent/main.go` (ECDSA key generation, filesystem storage with 0600)
|
||||
- Private key handling: `internal/connector/target/nginx/nginx.go` and similar (cert/key file write)
|
||||
- Server-side keygen deprecation: `internal/service/renewal.go` (log warning when enabled)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Hardware Security Module (HSM) Support** — Optional HSM backend for CA key storage (SubCA and Local CA modes)
|
||||
- **Secrets Rotation** — Encrypted key rotation without server restart
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Enable TLS on the control plane in production (deploy behind a TLS-terminating reverse proxy or load balancer with valid certificates)
|
||||
- Enforce TLS on agent-to-server communication via firewall rules (no cleartext HTTP)
|
||||
- Protect agent filesystem key storage with:
  - File-level permissions (already 0600)
  - Encrypted filesystems (LUKS, BitLocker, or cloud provider equivalents)
  - Backup encryption (keys backed up to vault or HSM, never in cleartext backups)
|
||||
- Restrict PostgreSQL access to authorized services only (network isolation, authentication)
|
||||
- For target systems, ensure network traffic from agents to targets is encrypted (TLS, IPsec, or VPN)
|
||||
|
||||
---
|
||||
|
||||
## CC7: System Operations
|
||||
|
||||
### CC7.1 — System Monitoring
|
||||
|
||||
**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Health Endpoint** — `GET /health` returns 200 OK with service status. Consumed by Docker health checks and Kubernetes probes.
|
||||
- **Readiness Endpoint** — `GET /ready` returns 200 OK when the database is connected and migrations are applied.
|
||||
- **Background Scheduler Monitoring** — 12 background loops (8 always-on + 4 opt-in) run on a fixed schedule. Authoritative topology in `docs/architecture.md`:
  - Renewal loop (always-on, 1 hour): scans for certificates approaching renewal threshold
  - Job processor loop (always-on, 30 seconds): picks up pending/waiting jobs and advances their state
  - Job retry loop (always-on, 5 minutes, `CERTCTL_SCHEDULER_RETRY_INTERVAL`): retries Failed jobs (I-001)
  - Job timeout reaper loop (always-on, 10 minutes, `CERTCTL_JOB_TIMEOUT_INTERVAL`): fails AwaitingCSR/AwaitingApproval jobs past timeout (I-003)
  - Agent health check loop (always-on, 2 minutes): pings agents to detect downtime
  - Notification dispatcher loop (always-on, 1 minute): sends queued alerts
  - Notification retry loop (always-on, 2 minutes, `CERTCTL_NOTIFICATION_RETRY_INTERVAL`): exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005)
  - Short-lived cert expiry loop (always-on, 30 seconds): marks expired short-lived credentials
  - Network scanner loop (opt-in, 6 hours, `CERTCTL_NETWORK_SCAN_ENABLED`): scans enabled TLS endpoints for certificate discovery
  - Digest emailer loop (opt-in, 24 hours, `CERTCTL_DIGEST_INTERVAL`): sends scheduled certificate digest email to configured recipients
  - Endpoint health loop (opt-in, 60 seconds, `CERTCTL_HEALTH_CHECK_INTERVAL`): continuous TLS health probes (M48)
  - Cloud discovery loop (opt-in, 6 hours, `CERTCTL_CLOUD_DISCOVERY_INTERVAL`): cloud secret manager certificate discovery (M50)

  Each loop includes `atomic.Bool` idempotency guards, error handling, and structured slog failure logs.
|
||||
- **Metrics Endpoints** — Two formats for monitoring integration:
  - `GET /api/v1/metrics` — JSON object with gauges, counters, and uptime for custom dashboards
  - `GET /api/v1/metrics/prometheus` — Prometheus exposition format (`text/plain; version=0.0.4`) for native scraping by Prometheus, Grafana Agent, Datadog, and other OpenMetrics-compatible collectors
  - **Gauges** — `certctl_certificate_total`, `certctl_certificate_active`, `certctl_certificate_expiring`, `certctl_certificate_expired`, `certctl_certificate_revoked`, `certctl_agent_total`, `certctl_agent_active`, `certctl_job_pending`
  - **Counters** — `certctl_job_completed_total`, `certctl_job_failed_total`
  - **Uptime** — `certctl_uptime_seconds` (seconds since server start)

  All values are point-in-time snapshots computed from database tables.
|
||||
- **Structured Logging** — All scheduler operations, API calls, and connector actions log via `slog` (Go's structured logger). Logs include timestamp, level (DEBUG/INFO/WARN/ERROR), structured fields (e.g., `actor`, `resource_id`, `latency_ms`), and request IDs for tracing.
|
||||
- **Request ID Propagation** — Each HTTP request gets a unique ID (`X-Request-ID` header). The ID is included in all correlated logs, making it easy to trace a single request through multiple service layers.
|
||||
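The `atomic.Bool` idempotency guard mentioned for the scheduler loops can be sketched as follows. This is an assumption about the general pattern, not the code in `internal/scheduler/scheduler.go`; the `guardedLoop` type and `tick` method are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// guardedLoop wraps a loop body so that a tick is skipped while the previous
// tick is still running.
type guardedLoop struct {
	running atomic.Bool
	run     func()
}

// tick executes the loop body unless another tick is in flight.
// It reports whether the body actually ran.
func (g *guardedLoop) tick() bool {
	if !g.running.CompareAndSwap(false, true) {
		return false // previous run has not finished; skip this tick
	}
	defer g.running.Store(false)
	g.run()
	return true
}

func main() {
	var runs atomic.Int32
	loop := &guardedLoop{run: func() {
		runs.Add(1)
		time.Sleep(50 * time.Millisecond) // simulate slow work
	}}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			loop.tick()
		}()
	}
	wg.Wait()
	fmt.Println("loop bodies executed:", runs.Load())
}
```

The guard trades completeness for safety: a skipped tick is simply picked up by the next scheduled one, which suits scans that are idempotent by design.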
|
||||
**Evidence Locations**:
|
||||
|
||||
- Health/readiness endpoints: `internal/api/handler/health.go`
|
||||
- Background scheduler: `internal/scheduler/scheduler.go` (Start method)
|
||||
- Metrics endpoint: `internal/api/handler/metrics.go`
|
||||
- Stats API endpoints (for detailed time-series): `internal/api/handler/stats.go`
  - `GET /api/v1/stats/summary` — dashboard KPIs
  - `GET /api/v1/stats/certificates-by-status` — cert counts by status
  - `GET /api/v1/stats/expiration-timeline?days=N` — cert expiry distribution
  - `GET /api/v1/stats/job-trends?days=N` — job completion/failure rates
  - `GET /api/v1/stats/issuance-rate?days=N` — cert issuance volume
|
||||
- Structured logging middleware: `internal/api/middleware/middleware.go`
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Configure log aggregation (e.g., ELK, Datadog, Splunk) to centralize certctl logs
|
||||
- Set up alerting on scheduler loop failures (e.g., "renewal loop failed to complete within 2h")
|
||||
- Configure health check monitoring (e.g., probes against `/health` and `/ready`, plus a Prometheus scrape of `/api/v1/metrics/prometheus`)
|
||||
- Establish thresholds for metrics (e.g., alert if `certctl_job_pending > 50` or `certctl_agent_active < certctl_agent_total`)
|
||||
- Document your log retention policy (audit requirement often mandates 1+ years)
|
||||
- Integrate certctl metrics into your broader observability stack (Grafana dashboards, SLO tracking)
|
||||
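For operators without a full Prometheus stack, the threshold checks suggested above can be approximated by reading the exposition output directly. The sketch below parses unlabeled `name value` samples and applies two example rules; the sample payload, the limit of 50 pending jobs, and the helper names are illustrative assumptions, and a production setup would normally express these as alerting rules in the monitoring system instead.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples extracts unlabeled samples ("name value") from Prometheus text
// exposition output, skipping comments and labeled series.
func parseSamples(body string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || strings.Contains(fields[0], "{") {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			out[fields[0]] = v
		}
	}
	return out
}

// checkThresholds returns one message per violated rule.
func checkThresholds(m map[string]float64) []string {
	var alerts []string
	if m["certctl_job_pending"] > 50 {
		alerts = append(alerts, "pending job backlog above 50")
	}
	if m["certctl_agent_active"] < m["certctl_agent_total"] {
		alerts = append(alerts, "one or more agents inactive")
	}
	return alerts
}

func main() {
	// Made-up payload in the exposition format, for demonstration only.
	sample := "# TYPE certctl_job_pending gauge\n" +
		"certctl_job_pending 72\n" +
		"certctl_agent_total 10\n" +
		"certctl_agent_active 9\n"
	for _, a := range checkThresholds(parseSamples(sample)) {
		fmt.Println("ALERT:", a)
	}
}
```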
|
||||
---
|
||||
|
||||
### CC7.2 — Anomaly Detection
|
||||
|
||||
**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.
|
||||
|
||||
(This criterion overlaps CC7.1 and extends it to specific anomaly response mechanisms.)
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Immutable API Audit Trail** (M19) — Every API call is recorded in the `audit_events` table (append-only, no update/delete). Each record contains the HTTP method, URL path, actor (user/agent ID), SHA-256 hash of the request body (truncated to 16 characters), response status code, latency in milliseconds, and a timestamp. Excluded paths (health, ready) are configurable. Audit records are written asynchronously, so they do not block the request. **Security: Query parameters are excluded from the audit path** because they may contain cursor tokens, API keys, or sensitive filter values; since the audit trail is append-only with no deletion, any sensitive data recorded would persist permanently.
|
||||
- **Audit Trail API** — `GET /api/v1/audit?actor=...&action=...&resource_id=...&created_after=...&created_before=...` allows searching for anomalous patterns (e.g., "who accessed certificate XYZ and when?", "did anyone revoke certs at 2 AM?").
|
||||
- **Expiration Threshold Alerting** — Certificate renewal policies define alert thresholds (days before expiry): default `[30, 14, 7, 0]`. When a certificate approaches a threshold, a notification is enqueued. Deduplication prevents duplicate alerts for the same cert at the same threshold. Auto status transition: cert moves to `Expiring` status at 30 days, `Expired` at 0 days.
|
||||
- **Certificate Status Auto-Transitions** — When a cert is issued, it's `Active`. As expiry approaches, status auto-transitions to `Expiring` (at 30d threshold). At expiry, status becomes `Expired`. Revoked certs move to `Revoked`. These transitions are recorded in the audit trail.
|
||||
- **Notification Routing** — Alerts are sent via configured notifiers (Email, Slack, Teams, PagerDuty, OpsGenie). Certificates are routed to their owner's email address (or team email if no individual owner). This allows on-call teams to react to anomalies (e.g., "your production cert will expire in 7 days, request renewal now").
|
||||
- **Deployment Rollback** — If a deployment fails or an older certificate needs to be reactivated, operators can trigger a "rollback" via the GUI. This redeploys a previous certificate version to the target. Rollback actions are audited.
|
||||
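Two details of the audit record described above are easy to show in code: the request body is reduced to a truncated SHA-256 digest, and the query string is dropped from the recorded path. This is a sketch under the assumption that the 16 characters are hex digits; the helper names are hypothetical and this is not the code in `internal/api/middleware/audit.go`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// bodyHash returns the first 16 hex characters of the SHA-256 digest of body.
// Assumption: the truncation is applied to the hex encoding.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

// auditPath returns the URL path with the query string removed, so cursor
// tokens or filter values never reach the append-only audit table.
func auditPath(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Path, nil
}

func main() {
	p, err := auditPath("/api/v1/audit?actor=alice&cursor=abc123")
	if err != nil {
		panic(err)
	}
	fmt.Println("path:", p)
	fmt.Println("body hash:", bodyHash([]byte(`{"reason":"keyCompromise"}`)))
}
```

Storing only a digest lets an investigator confirm that a given payload was sent without the audit table retaining the payload itself.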
|
||||
**Evidence Locations**:
|
||||
|
||||
- Audit middleware: `internal/api/middleware/audit.go`
|
||||
- Audit trail API: `internal/api/handler/audit.go`, `GET /api/v1/audit`
|
||||
- Expiration alerting: `internal/service/renewal.go` (CheckRenewal method)
|
||||
- Notification dispatcher: `internal/scheduler/scheduler.go` (notificationTicker)
|
||||
- Status transitions: `internal/service/certificate.go` (auto status update logic)
|
||||
- Audit trail CLI export: `certctl-cli audit export --format csv` / `--format json`
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **SIEM Export** — Real-time audit event streaming to SIEM systems (via NATS event bus with JetStream sink)
|
||||
- **Anomaly Rules Engine** — Configurable rules (e.g., "alert if certificate revoked by non-admin", "alert if >10 certs issued in < 1 hour")
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Integrate audit trail into your SIEM / log analysis platform
|
||||
- Define alerting rules and thresholds for anomalies (e.g., "revocation of critical cert", "mass issuance")
|
||||
- Establish a formal incident response workflow (audit trail shows *what* happened; you must decide *what to do* about it)
|
||||
- Regularly review audit logs (e.g., monthly compliance audit of who accessed what)
|
||||
- Configure email/Slack/Teams integration so on-call teams are notified of cert expirations immediately
|
||||
- Encrypt audit trail backups (ACID guarantees don't prevent theft of database backups)
|
||||
|
||||
---
|
||||
|
||||
### CC7.3 — Incident Response
|
||||
|
||||
**Requirement**: The entity detects, investigates, and responds to incidents by executing a defined incident response and management process that includes preparation, detection and analysis, containment, eradication, recovery, and post-incident activities.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Revocation API** — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
  - `unspecified` — catch-all
  - `keyCompromise` — private key was exposed
  - `caCompromise` — CA itself was compromised (rare)
  - `affiliationChanged` — certificate no longer applies to the organization
  - `superseded` — newer cert is in use
  - `cessationOfOperation` — service is shutting down
  - `certificateHold` — temporary revocation (the hold can be lifted by reissuing the certificate)
  - `privilegeWithdrawn` — access rights revoked

  Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and optional issuer notification is best-effort. All revoked certs are excluded from active deployments.
|
||||
- **CRL Endpoint** — `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.
|
||||
- **OCSP Responder** — `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
|
||||
- **Revocation Notifications** — When a cert is revoked, notifications are sent to:
  - Certificate owner (email)
  - Configured webhooks (if you have a SIEM that subscribes)
  - Slack/Teams channels (if notifiers are configured)
|
||||
- **Bulk Revocation for Fleet-Wide Incidents** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. Essential for incident response: key compromise affecting multiple certs, CA distrust events, decommissioning a team's infrastructure. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring audit trail and notifications for every certificate.
|
||||
- **Short-Lived Cert Exemption** — Certificates with TTL < 1 hour (configured in profile) skip CRL/OCSP publication. Expiry is the revocation mechanism for short-lived certs (e.g., Kubernetes pod certs, session tokens).
|
||||
- **Deployment Rollback** — If a revoked cert is still deployed (shouldn't happen, but race conditions exist), operators can manually redeploy a previous version via the GUI. Rollback is audited.
|
||||
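The reason names accepted by the revocation API correspond to numeric `CRLReason` values defined in RFC 5280 §5.3.1. A minimal lookup sketch follows; the numeric values are those of the RFC, while the map and function names are hypothetical and not taken from `internal/domain/revocation.go`.

```go
package main

import "fmt"

// reasonCodes maps the reason names listed above to RFC 5280 CRLReason values.
// The RFC spells the third entry "cACompromise"; the key here follows this
// document's spelling. Value 7 is unused and 8 (removeFromCRL) is not listed.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode validates a reason name and returns its CRLReason value.
func reasonCode(name string) (int, error) {
	code, ok := reasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", name)
	}
	return code, nil
}

func main() {
	for _, name := range []string{"keyCompromise", "superseded", "bogus"} {
		code, err := reasonCode(name)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s -> CRLReason %d\n", name, code)
	}
}
```

Rejecting unknown names at the API boundary keeps free-text reasons out of the CRL, where relying parties expect only the enumerated values.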
|
||||
**Evidence Locations**:
|
||||
|
||||
- Revocation API: `internal/api/handler/certificates.go`, `POST /api/v1/certificates/{id}/revoke`
|
||||
- Revocation domain model: `internal/domain/revocation.go` (RevocationReason type with RFC 5280 mapping)
|
||||
- CRL generation: `internal/service/certificate.go` (GenerateDERCRL method)
|
||||
- OCSP signing: `internal/service/certificate.go` (GetOCSPResponse method)
|
||||
- Revocation notifications: `internal/service/notification.go` (SendRevocationNotification)
|
||||
- Short-lived exemption: `internal/domain/revocation.go` (IsShortLivedCert check)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Revocation Automation** — Trigger revocation based on external events (e.g., employee termination, security breach alert from CT Log monitoring)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Establish an incident response policy (e.g., "keyCompromise → immediate redeployment with a new cert + notify CISO")
|
||||
- Ensure CRL/OCSP are accessible to all systems using the certs (e.g., CDN or highly-available endpoints if you host on-premises)
|
||||
- Test revocation workflow in staging (verify that revoked certs are actually blocked by clients)
|
||||
- Document justification for revocation (audit trail records *that* a cert was revoked, but not *why* — you must document it separately)
|
||||
- Integrate revocation notifications into your on-call rotation (don't let revocation alerts get lost)
|
||||
|
||||
---
|
||||
|
||||
### CC7.4 — Identify and Develop Risk Mitigation Activities
|
||||
|
||||
**Requirement**: The entity identifies, develops, and implements risk mitigation activities for risks arising from potential business disruptions.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Renewal Job Tracking** — Renewal jobs track the certificate, target agents, and issuance outcome. Failed renewals are retried (configurable backoff). Job state diagram: Pending → Running → Completed (or Failed). Failed jobs trigger notifications.
|
||||
- **Agent Health Monitoring** — Health check loop (every 2m) pings all agents via heartbeat. If an agent misses 3 consecutive heartbeats, it's marked as `Unhealthy`. Unhealthy agents are excluded from new deployments.
|
||||
- **Job Cancellation** — Operators can cancel pending jobs via `POST /api/v1/jobs/{id}/cancel`. Useful when a renewal is already in progress elsewhere (multi-instance deployments) or when a certificate is being phased out.
|
||||
- **Interactive Approval** — Renewal/issuance jobs can be put in `AwaitingApproval` status. An authorized operator reviews the pending cert and approves or rejects it. Rejection records a reason in the audit trail. This provides a separation of duty between requestor and approver.
|
||||
- **Scheduled Scanning** — Agents scan configured directories for existing certs (M18b discovery). Operators triage discovered certs (claim = "we manage this now", dismiss = "this is unmanaged and we're OK with that"). Triage decisions are audited.
|
||||
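The job lifecycle summarized above (Pending → Running → Completed or Failed, with retries for failed jobs) can be expressed as a transition table. This is a simplified sketch: it omits the `AwaitingCSR`, `AwaitingApproval`, and cancellation states, and the Failed → Pending retry edge is an assumption about how retries re-queue work, not a statement about `internal/domain/job.go`.

```go
package main

import "fmt"

// JobStatus is a simplified stand-in for the job status enum.
type JobStatus string

const (
	Pending   JobStatus = "Pending"
	Running   JobStatus = "Running"
	Completed JobStatus = "Completed"
	Failed    JobStatus = "Failed"
)

// allowed lists the legal next states for each state.
var allowed = map[JobStatus][]JobStatus{
	Pending: {Running},
	Running: {Completed, Failed},
	Failed:  {Pending}, // assumption: a retry re-queues the job
}

// canTransition reports whether moving from one status to another is legal.
func canTransition(from, to JobStatus) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Pending, Running))   // legal
	fmt.Println(canTransition(Completed, Running)) // illegal: Completed is terminal
}
```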
|
||||
**Evidence Locations**:
|
||||
|
||||
- Job state machine: `internal/domain/job.go` (JobStatus enum)
|
||||
- Job retry logic: `internal/scheduler/scheduler.go` (jobProcessorTicker)
|
||||
- Agent health check: `internal/scheduler/scheduler.go` (healthCheckTicker)
|
||||
- Job cancellation: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/cancel`
|
||||
- Approval workflow: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/approve` / `reject`
|
||||
- Discovery scan results: `internal/api/handler/discovery.go`, `GET /api/v1/discovered-certificates`
|
||||
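The agent health rule stated in this section (three consecutive missed heartbeats mark an agent `Unhealthy`) reduces to a small counter. In the sketch below, the type name, the `observe` method, and the `Healthy` label are hypothetical; only the threshold of three and the `Unhealthy` status come from the text.

```go
package main

import "fmt"

// maxMissedHeartbeats is the number of consecutive misses that marks an
// agent Unhealthy.
const maxMissedHeartbeats = 3

// agentHealth tracks consecutive missed heartbeats for one agent.
type agentHealth struct{ missed int }

// observe records the outcome of one health check interval and returns the
// resulting status. A received heartbeat resets the miss counter.
func (a *agentHealth) observe(heartbeatSeen bool) string {
	if heartbeatSeen {
		a.missed = 0
		return "Healthy"
	}
	a.missed++
	if a.missed >= maxMissedHeartbeats {
		return "Unhealthy"
	}
	return "Healthy"
}

func main() {
	var a agentHealth
	for _, seen := range []bool{true, false, false, false, true} {
		fmt.Printf("heartbeat=%v status=%s\n", seen, a.observe(seen))
	}
}
```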
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Monitor renewal job success rate (are certs being renewed before expiry?)
|
||||
- Set up alerts for unhealthy agents (missing 3+ heartbeats = broken agent, take action)
|
||||
- Establish a formal approval policy (who can approve certs? do they need to involve CISO?)
|
||||
- Test job cancellation and recovery flows in staging
|
||||
- Review discovered certs regularly (are there unmanaged certs that should be managed?)
|
||||
- Document your disaster recovery process (what if control plane database is corrupted?)
|
||||
|
||||
---
|
||||
|
||||
## A1: Availability
|
||||
|
||||
### A1.1/A1.2 — Availability and Recovery
|
||||
|
||||
**Requirement**: The entity obtains or generates, uses, retains, and disposes of information to enable the entity to meet its objectives and respond to its responsibility to provide information.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Health Probes** — `/health` and `/ready` endpoints support container orchestration (Docker Compose, Kubernetes, etc.). Docker Compose defines health checks for the server and database. Kubernetes would use liveness/readiness probes pointing to these endpoints.
|
||||
- **Database Migrations (Idempotent)** — PostgreSQL migrations use `IF NOT EXISTS` and `ON CONFLICT ... DO NOTHING` patterns. Migrations can be safely reapplied — no risk of doubling data or dropping tables mid-migration.
|
||||
- **Agent Panic Recovery** — Agent binary includes panic recovery in job execution loops. If an agent crashes during a deployment, the control plane marks the job as failed and can retry on a healthy agent.
|
||||
- **Exponential Backoff** — Agent-to-server communication uses exponential backoff (starting at 1s, capped at 5m) to handle transient network failures. This prevents a thundering herd when the control plane is temporarily down.
|
||||
- **Docker Compose Deployment** — Includes health checks for server and database. Services auto-restart on failure.
|
||||
- **PostgreSQL Connection Pooling** — Server uses `database/sql` with configurable `MaxOpenConns` and `MaxIdleConns` (default 25/5). Prevents connection exhaustion.
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Health endpoints: `internal/api/handler/health.go`
|
||||
- Database migrations: `migrations/` directory (all use `IF NOT EXISTS`, idempotent patterns)
|
||||
- Agent panic recovery: `cmd/agent/main.go` (defer recover() in job execution)
|
||||
- Exponential backoff: `cmd/agent/main.go` (heartbeat and work poll backoff logic)
|
||||
- Connection pooling: `cmd/server/main.go` (SetMaxOpenConns, SetMaxIdleConns)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Multi-Region HA** — Control plane federation with etcd consensus (operator can run N replicas)
|
||||
- **PostgreSQL HA** — Replication standby with automatic failover (operator responsibility to configure)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Configure PostgreSQL backups (e.g., WAL archiving, daily full backups). Certctl stores certificates but *also* stores renewal policies, audit trail, deployment history.
|
||||
- Test backup/restore process in staging (broken backups are discovered during incidents)
|
||||
- Monitor disk usage (PostgreSQL will fail if `/var` fills up)
|
||||
- Plan capacity (how many certs, agents, jobs can your PostgreSQL handle? Certctl is tested with 10k+ certs, 100+ agents, but your infra may differ)
|
||||
- Set up high-availability PostgreSQL if you need zero-downtime upgrades
|
||||
- Implement network segmentation (only authorized services can reach certctl API and database)
|
||||
|
||||
---
|
||||
|
||||
## CC8: Change Management
|
||||
|
||||
### CC8.1 — Change Control
|
||||
|
||||
**Requirement**: The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives.
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Certificate Profiles** — Named profiles define allowed key types, max TTL, required SANs, and permitted EKUs. Changes to profiles are common (e.g., "increase max TTL from 1 year to 3 years"). All profile changes are audited (who changed what, when). Profile updates are versioned.
|
||||
- **Policy Engine** — Renewal policies define alert thresholds and approval workflows. Policy changes (e.g., "lower alert threshold from 30 days to 14 days") are audited. Policies have violation rules (e.g., "flag certs longer than 3 years") — violations are recorded in the audit trail.
|
||||
- **Target Configuration** — When a new target (NGINX server, HAProxy load balancer) is added, it's registered with a name and configuration (JSON). Target deletions require confirmation (to prevent accidental removal). All target changes are audited.
|
||||
- **Immutable Audit Trail** — Every change (profile, policy, target, cert, agent, owner, team, approval, revocation, deployment) is recorded in `audit_events`. Audit records are append-only; no retroactive modification is possible. Encrypting the audit trail at rest is an operator responsibility.
|
||||
- **GitHub Actions CI** — Pull requests must pass:
  - Go unit tests (`go test ./...`) with coverage gates (service layer ≥30%, handler layer ≥50%)
  - Go vet (static analysis)
  - Frontend TypeScript type checking (`tsc`)
  - Frontend Vitest unit tests
  - Frontend Vite build (ensures no broken imports)

  Only after all checks pass can the PR be merged and deployed.
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
- Profile CRUD: `internal/api/handler/profiles.go`, `GET /api/v1/profiles` / `POST` / `PUT` / `DELETE`
|
||||
- Policy CRUD: `internal/api/handler/policies.go`
|
||||
- Target CRUD: `internal/api/handler/targets.go`
|
||||
- Audit trail: `internal/api/handler/audit.go`, `GET /api/v1/audit` (records action, actor, resource_id, timestamp)
|
||||
- CI configuration: `.github/workflows/ci.yml` (test, vet, coverage gates, build checks)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Change Approval Workflow** — Optional approval gate before profile/policy changes go live
|
||||
- **Feature Flags** — Enable/disable new features without redeployment (backward compatibility during rolling upgrades)
|
||||
|
||||
**Operator Responsibility**:
|
||||
|
||||
- Implement formal change control (ticket system, approval, peer review)
|
||||
- Document the business justification for profile/policy changes
|
||||
- Test changes in a non-production environment before deploying to production
|
||||
- Have a rollback plan (can you revert a profile change instantly if it breaks issuance?)
|
||||
- Include certctl configuration changes in your change log (for audits and incident investigations)
|
||||
- Version control your certctl configuration (Docker Compose file, environment variables) so you can track changes
|
||||
|
||||
---
|
||||
|
||||
## Evidence Summary Table
|
||||
|
||||
| SOC 2 Criterion | certctl Feature | Evidence Location | V2 (Free) | V3 (Pro) | Operator Responsibility |
|---|---|---|---|---|---|
| **CC6.1** Logical Access Security | API Key Authentication (SHA-256 hashed, constant-time comparison) | `internal/api/middleware/auth.go` | ✅ | Enhanced | API key generation, distribution, rotation |
| | GUI Login with API Key | `web/src/pages/LoginPage.tsx` | ✅ | Enhanced (OIDC) | NA |
| | CORS Allowlist | `CERTCTL_CORS_ORIGINS` env var | ✅ | ✅ | Configure appropriately |
| | Token Bucket Rate Limiting | `internal/api/middleware/rate_limit.go` | ✅ | ✅ | Monitor for brute-force attempts |
| **CC6.2** Prior to Issuing System Credentials | Ownership Attribution | `GET /api/v1/owners`, audit trail records owner assignment | ✅ | Enhanced (RBAC) | Map to org structure, remove on departure |
| | Team Assignment | `GET /api/v1/teams` | ✅ | ✅ | NA |
| | Actor Attribution in Audit Trail | `GET /api/v1/audit` (actor field) | ✅ | ✅ | Justify all changes via separate documentation |
| **CC6.3** Authentication Policies | API Key Enforcement | `CERTCTL_AUTH_TYPE=api-key` (default) | ✅ | Enhanced (OIDC, MFA) | Document policy, test failures, integrate into IAM audit |
| | Agent Authentication | Separate API keys for agents | ✅ | ✅ | Rotate agent keys, monitor compromise |
| | Agent-Side Key Generation | `CERTCTL_KEYGEN_MODE=agent` (default) | ✅ | ✅ | Protect agent filesystem keys via encryption/backup |
| | Private Key Policy | Server-side keygen logs warning, disabled in production | ✅ | ✅ | Never use server-side keygen in production |
| **CC6.7** Information Transmission Protection | TLS for Control Plane | Deploy behind TLS-terminating reverse proxy | ✅ | ✅ | Enable TLS in production via reverse proxy |
| | Agent-to-Server HTTPS | Agents use HTTPS for all API calls | ✅ | ✅ | Enforce TLS via firewall rules |
| | Private Key Isolation | Agent-side keygen (ECDSA P-256), keys stored 0600 on agent FS | ✅ | ✅ | Encrypt agent filesystems, backup securely |
| | Pull-Only Deployment | Server never initiates outbound to agents/targets | ✅ | Enhanced (HSM, proxy agents) | Encrypt agent↔target comms, isolate proxy agents |
| **CC7.1** System Monitoring | Health Endpoint | `GET /health`, `GET /ready` | ✅ | ✅ | Integrate into monitoring (Prometheus, DataDog) |
| | Metrics JSON Endpoint | `GET /api/v1/metrics` (gauges, counters, uptime) | ✅ | ✅ | Set thresholds, configure alerting |
| | Stats API (time-series) | `GET /api/v1/stats/*` (summary, status, expiration, jobs, issuance) | ✅ | ✅ | Integrate into dashboards, SLO tracking |
| | Structured Logging | `slog` middleware with request IDs | ✅ | ✅ | Aggregate logs to SIEM, define retention policy |
| | Background Scheduler | 12 loops (8 always-on: renewal 1h, jobs 30s, job retry 5m I-001, job timeout 10m I-003, health 2m, notifications 1m, notif retry 2m I-005, short-lived 30s; 4 opt-in: network scan 6h, digest 24h, endpoint health 60s M48, cloud discovery 6h M50) | ✅ | ✅ | Alert on scheduler loop failures |
| **CC7.2** Anomaly Detection | Immutable API Audit Trail | `internal/api/middleware/audit.go`, `GET /api/v1/audit` | ✅ | Enhanced (SIEM export) | Integrate into SIEM, search for anomalies, archive long-term |
| | Expiration Threshold Alerting | Configurable per-policy (default 30/14/7/0 days) | ✅ | ✅ | Configure thresholds, integrate notifications |
| | Status Auto-Transitions | Active → Expiring (30d) → Expired (0d) | ✅ | ✅ | Monitor status changes in audit trail |
| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
| | Agent Health Monitoring | Health check loop (ping every 2m, mark unhealthy after 3 misses) | ✅ | ✅ | Alert on unhealthy agents, investigate |
| | Job Cancellation | `POST /api/v1/jobs/{id}/cancel` | ✅ | ✅ | Test in staging |
| | Interactive Approval | AwaitingApproval state, `POST /api/v1/jobs/{id}/approve\|reject` | ✅ | ✅ | Define approval policy, audit decisions |
| | Certificate Discovery | Agents scan directories, triage (claim/dismiss) | ✅ | ✅ | Review discovered certs regularly |
| **A1.1/A1.2** Availability and Recovery | Health Probes (Docker, Kubernetes) | `/health` and `/ready` endpoints | ✅ | ✅ | Use in container orchestration |
| | Idempotent Migrations | `IF NOT EXISTS`, `ON CONFLICT ... DO NOTHING` | ✅ | ✅ | Test migration replay in staging |
| | Agent Panic Recovery | Panic recovery in job loops | ✅ | ✅ | Monitor agent crashes in logs |
| | Exponential Backoff | Agent heartbeat/work poll backoff (1s → 5m) | ✅ | ✅ | Monitor for control plane downtime |
| | PostgreSQL Connection Pooling | MaxOpenConns=25, MaxIdleConns=5 (configurable) | ✅ | ✅ | Monitor connection usage |
| **CC8.1** Change Control | Certificate Profiles | CRUD API + GUI, profile changes audited | ✅ | ✅ | Formal change control, test in staging |
| | Policy Engine + Violations | CRUD API + GUI, policy changes audited | ✅ | ✅ | Document justification, implement approval workflow |
| | Target Registration | CRUD API + GUI, changes audited | ✅ | ✅ | Confirm deletions, version control config |
| | Immutable Audit Trail | Append-only `audit_events` table | ✅ | ✅ | Encrypt at rest, archive long-term, no manual edits |
| | GitHub Actions CI | Unit tests, vet, coverage gates, build checks | ✅ | ✅ | Review PRs before merge, maintain test quality |
|
||||
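The CC7.3 rows above ask operators to keep the CRL endpoint reachable without API keys. A minimal shell sketch of that check follows. It assumes `curl` is installed and that the server returns HTTP 200 with the `application/pkix-crl` media type listed in the table. The helper name `crl_is_public` is hypothetical and not part of certctl.

```bash
# crl_is_public SERVER ISSUER_ID
# Requests the CRL with no Authorization header and succeeds only when the
# response is HTTP 200 with the DER CRL media type from the table above.
# (Illustrative helper; assumption, not shipped with certctl.)
crl_is_public() {
  result=$(curl -s -o /dev/null -w '%{http_code} %{content_type}' \
    "$1/.well-known/pki/crl/$2")
  case "$result" in
    "200 application/pkix-crl"*) return 0 ;;
    *) echo "CRL check failed for issuer $2: $result" >&2; return 1 ;;
  esac
}
```

For example, `crl_is_public https://certctl.example.com iss-local` (both values are placeholders) can run from a host that holds no API key. If the control plane uses a private CA, pass `--cacert` inside the function. To inspect the CRL itself, pipe the response body to `openssl crl -inform DER -noout -nextupdate`.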
|
||||
---
|
||||
|
||||
## What Requires Operator Action
|
||||
|
||||
**certctl is a tool, not a complete compliance solution.** Your organization must handle:
|
||||
|
||||
1. **Physical Security** — Protect the infrastructure (servers, network) running certctl. Certctl can't control who has physical access to your datacenter.
|
||||
|
||||
2. **Personnel Background Checks** — Before granting anyone API key access, conduct background checks per your policy. Certctl records *who* accessed *what*, but doesn't verify that people are trustworthy.
|
||||
|
||||
3. **Formal Incident Response Plan** — Certctl provides incident detection (anomalies in audit trail) and tools for response (revocation, rollback), but you must define *when* to use them and *who* decides.
|
||||
|
||||
4. **Access Review and Removal** — Certctl stores ownership, teams, and API keys. You must:
|
||||
- Regularly review who has access (quarterly or semi-annually)
|
||||
- Immediately revoke API keys for departing employees
|
||||
- Audit that removed access is actually removed (test that old keys fail)
|
||||
|
||||
5. **Log Retention and Archival** — Certctl logs to stdout (Docker) and stores audit events in PostgreSQL. You must:
|
||||
- Ship logs to a long-term archive (SIEM, S3, or equivalent)
|
||||
- Define retention policy (often 1-7 years per industry regulation)
|
||||
- Encrypt archived logs
|
||||
- Test that you can retrieve logs from archive (restoration drills)
|
||||
|
||||
6. **Encryption at Rest** — PostgreSQL data (including audit trail) is stored on disk. You must:
|
||||
- Enable encryption at rest on the database host (full-disk or volume encryption, or TDE where your PostgreSQL distribution offers it)
|
||||
- Encrypt container persistent volumes (if using Kubernetes)
|
||||
- Encrypt database backups
|
||||
|
||||
7. **Network Segmentation** — Certctl API and database must be protected by network access controls. You must:
|
||||
- Firewall the control plane (only authorized services can connect)
|
||||
- Use VPN or private networks for agent-to-server communication
|
||||
- Isolate proxy agents (for F5, IIS, etc.) in the same network zone as their targets
|
||||
|
||||
8. **Capacity Planning** — Certctl's performance scales with your PostgreSQL. You must:
|
||||
- Estimate certificate inventory size (10k, 100k, 1M certs?)
|
||||
- Test Certctl with your expected scale in staging
|
||||
- Monitor disk usage, CPU, memory
|
||||
- Plan for growth (add PostgreSQL replicas, increase connection pool, etc.)
|
||||
|
||||
9. **Disaster Recovery** — Certctl data lives in PostgreSQL. You must:
|
||||
- Back up PostgreSQL regularly (daily or hourly, depending on RPO)
|
||||
- Test the restore process in staging (otherwise broken backups are discovered only during an incident)
|
||||
- Have a runbook for failover to replica or recovery from backup
|
||||
- Document RTO/RPO targets (how long can cert management be down? how much data can you afford to lose?)
|
||||
|
||||
10. **Integration with Your IAM** — If using OIDC/SSO (V3), you must:
|
||||
- Configure your OIDC provider (Okta, Azure AD, Google)
|
||||
- Map user groups to Certctl roles (Admin, Operator, Viewer)
|
||||
- Manage MFA policy (enforce MFA if required)
|
||||
- Audit user provisioning/deprovisioning
|
||||
|
||||
11. **Documentation and Runbooks** — Certctl documents *what it does* (this guide), but you must document:
|
||||
- Your organization's certificate lifecycle policy (who requests, who approves, who deploys)
|
||||
- How to respond to specific incidents (cert compromise, CA compromise, agent down, renewal failed)
|
||||
- How to operate certctl (day-to-day tasks, escalation procedures)
|
||||
- Contact info for on-call teams
|
||||
|
||||
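Item 4 above asks operators to test that old keys fail. One way to script that check is sketched below. It assumes the `Authorization: Bearer` header format used in the QA prerequisites, that `GET /api/v1/stats/summary` requires authentication in your deployment, and that a rejected key yields HTTP 401 or 403 (confirm the exact status against your server). `key_is_rejected` is a hypothetical helper name.

```bash
# key_is_rejected SERVER API_KEY
# Succeeds only when the server refuses the supplied key.
# (Illustrative helper; the 401/403 expectation is an assumption.)
key_is_rejected() {
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $2" "$1/api/v1/stats/summary")
  case "$status" in
    401|403) return 0 ;;
    *) echo "key was not rejected (HTTP $status)" >&2; return 1 ;;
  esac
}
```

Running this against each revoked key after an offboarding, and keeping the output with the access-review record, gives auditors direct evidence that removal took effect.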
---
|
||||
|
||||
## V3 Enhancements
|
||||
|
||||
**certctl Pro (V3, paid edition) adds features that significantly strengthen SOC 2 evidence:**
|
||||
|
||||
- **OIDC / SSO Integration** — Integrate with Okta, Azure AD, Google to replace API keys with federated identity. Enables MFA enforcement and centralized access management. Auditors love federated identity (easier to remove access at source).
|
||||
|
||||
- **Role-Based Access Control (RBAC)** — Predefined roles (Admin: full access; Operator: issue/renew/revoke, no policy changes; Viewer: read-only) with profile-gated enforcement. Allows separation of duties (e.g., junior operator can't change global policy).
|
||||
|
||||
- **NATS Event Bus** — Real-time audit streaming to your SIEM. Hybrid model: HTTP for synchronous APIs, NATS for async events (cert.issued, cert.expiring, agent.heartbeat, job.completed). JetStream persistence for replay and durability.
|
||||
|
||||
- **SIEM Export** — Automated export of audit trail to Splunk, ELK, DataDog, etc. (webhooks, syslog, or pull-based APIs). Makes it easy for security teams to hunt for anomalies.
|
||||
|
||||
- **Advanced Search DSL** — `POST /api/v1/search` with tree-based filters (nested AND/OR, regex, field projection). Enables complex compliance queries (e.g., "all certs issued in the last 30 days by team X that are longer than 1 year").
|
||||
|
||||
- **Bulk Revocation** — Revoke all certs issued by a profile, owner, or agent in one operation. Critical for large-scale incidents (e.g., "a team's CA key was compromised, revoke all their certs").
|
||||
|
||||
- **Certificate Health Scores** — Composite risk scoring (e.g., "this cert has no short-lived TTL enforcement, extends past your policy max, and hasn't been renewed in 2 years" → health=30%). Helps prioritize remediation.
|
||||
|
||||
- **Compliance Scoring** — Audit readiness reporting per certificate (e.g., "compliance=95% — missing only a 3-year max-TTL constraint"). Exportable compliance report.
|
||||
|
||||
- **DigiCert Issuer Connector** — OV/EV certificate issuance for public-facing services (web servers, CDNs). Complements Local CA for internal use.
|
||||
|
||||
- **CT Log Monitoring** — Passive detection of unauthorized cert issuance. Monitors public CT logs for certs matching your domains and alerts if unexpected certs appear (e.g., attacker obtained a cert for your domain).
|
||||
|
||||
- **F5 BIG-IP Implementation** — Full target connector with iControl REST API. Agents can deploy certs to F5 load balancers.
|
||||
|
||||
- **IIS Implementation** — Dual-mode: agent-local PowerShell (default) for servers with agents, or proxy agent WinRM (agentless targets). Full Windows Server integration.
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
certctl provides a strong foundation for SOC 2 compliance with API key authentication, immutable audit logging, automated alerting, and revocation capabilities. However, SOC 2 audits require evidence across your entire infrastructure — certctl is one piece. Use this guide to map certctl features to your audit questionnaire, then work with your auditors to identify gaps that must be filled by your own organizational policies and controls.
|
||||
|
||||
For a deeper SOC 2 discussion or a mock audit against this guide, contact your certctl Pro support team.
|
||||
@@ -1,122 +0,0 @@
|
||||
# Compliance Mapping Guides
|
||||
|
||||
certctl is a certificate lifecycle management tool, not a compliance product. It doesn't make you compliant — your organization, policies, and processes do that. What certctl provides is tooling that supports the technical controls auditors and evaluators look for when assessing certificate and key management practices.
|
||||
|
||||
These guides map certctl's features to three widely referenced compliance frameworks. They're designed for security engineers, IT auditors, and procurement teams evaluating certctl for environments with regulatory requirements.
|
||||
|
||||
## What's Covered
|
||||
|
||||
**[SOC 2 Type II](compliance-soc2.md)** — Maps certctl features to AICPA Trust Service Criteria. Covers logical access controls (CC6), system operations and monitoring (CC7), change management (CC8), and availability (A1). Most relevant for organizations undergoing SOC 2 audits where certificate management is in scope.
|
||||
|
||||
**[PCI-DSS 4.0](compliance-pci-dss.md)** — Maps certctl features to PCI Data Security Standard version 4.0 requirements. Covers data-in-transit protection (Req 4), cryptographic key management (Req 3), authentication (Req 8), audit logging (Req 10), secure development (Req 6), and access control (Req 7). Most relevant for organizations handling cardholder data where TLS certificates protect transmission channels.
|
||||
|
||||
**[NIST SP 800-57](compliance-nist.md)** — Maps certctl's key management practices to NIST Special Publication 800-57 Part 1 Rev 5 (2020). Covers key generation, storage, cryptoperiods, key state lifecycle, algorithm selection, key transport, and revocation. Most relevant for organizations aligning with US federal cryptographic guidance or using NIST as a key management baseline.
|
||||
|
||||
## What These Guides Are Not
|
||||
|
||||
These are mapping guides, not certification claims. certctl is not SOC 2 certified, PCI-DSS validated, or NIST-assessed. The guides document how certctl's technical implementation supports the controls these frameworks require — they do not replace your auditor's assessment, your organization's policies, or your security team's judgment.
|
||||
|
||||
The guides also clearly identify gaps where certctl's current implementation doesn't fully align with a framework's recommendations, features planned for future versions, and areas where operator action is required regardless of what certctl provides.
|
||||
|
||||
## How to Use These Guides
|
||||
|
||||
If you're evaluating certctl for a regulated environment, start with the framework your auditor cares about. Each guide includes an evidence summary table mapping specific compliance criteria to certctl features, API endpoints, and configuration — the kind of specifics your auditor will ask for.
|
||||
|
||||
If you're preparing for an audit and certctl is already deployed, use the "Operator Responsibilities" section of each guide to identify what your organization must manage beyond what certctl provides.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Framework | Primary Concern | Key certctl Features |
|---|---|---|
| SOC 2 Type II | Trust service criteria for SaaS/infrastructure | API audit trail, auth controls, monitoring, change management |
| PCI-DSS 4.0 | Cardholder data protection | TLS lifecycle, key management, immutable logging, access control |
| NIST SP 800-57 | Cryptographic key management | Agent-side keygen, key isolation, algorithm selection, revocation |
|
||||
|
||||
## Audit-Trail Integrity & Privacy (Bundle 6)
|
||||
|
||||
Two complementary controls protect the `audit_events` table against tampering and minimize PII exposure. Both apply automatically — no operator action is required at install time, but operators must understand the contract before responding to a legal-hold or retention request.
|
||||
|
||||
### Append-Only Enforcement (HIPAA §164.312(b))
|
||||
|
||||
<!-- Source: migrations/000018_audit_events_worm.up.sql -->
|
||||
|
||||
`audit_events` rows cannot be modified or deleted by the application role. Two layers:
|
||||
|
||||
| Layer | Mechanism | Surface |
|---|---|---|
| **DB trigger** | `audit_events_block_modification()` raises `check_violation` on `BEFORE UPDATE OR DELETE` | Catches any UPDATE / DELETE, including direct `psql` from the app role |
| **App-role grant** | `REVOKE UPDATE, DELETE ON audit_events FROM certctl` | Defence-in-depth; the app role can't even attempt the modification |
|
||||
|
||||
**Verification.** From a `psql` session connected as the `certctl` app role:
|
||||
|
||||
```sql
UPDATE audit_events SET actor = 'tampered' WHERE id = 'audit-001';
-- ERROR: audit_events is append-only (Bundle-6 / M-017 / HIPAA §164.312(b))
-- HINT: Use a compliance superuser role for legitimate retention operations.
```
|
||||
|
||||
**Compliance superuser pattern.** Legitimate retention work (legal hold, GDPR right-to-be-forgotten, statutory purges) requires a separate PostgreSQL role provisioned out-of-band that bypasses the trigger. Certctl does NOT auto-create this role — operators provision it per their compliance policy. Suggested shape:
|
||||
|
||||
```sql
-- One-time setup by a DBA. Stored procedure pattern keeps the
-- compliance superuser audit-able too: every invocation should
-- itself land in audit_events.
CREATE ROLE certctl_compliance LOGIN PASSWORD '<strong-secret>';
GRANT UPDATE, DELETE ON audit_events TO certctl_compliance;
-- (optional) provision SECURITY DEFINER stored procedures that
-- (a) record the retention reason in audit_events as the FIRST step
-- (b) then perform the UPDATE/DELETE
-- (c) all under the certctl_compliance role's grants.
```
|
||||
|
||||
### Body Redaction (GDPR Art. 32, CWE-532)
|
||||
|
||||
<!-- Source: internal/service/audit_redact.go -->
|
||||
|
||||
`AuditService.RecordEvent` routes every `details` map through `RedactDetailsForAudit` BEFORE marshaling to the JSONB column. Two deny-lists:
|
||||
|
||||
| Category | Match | Replacement | Examples |
|---|---|---|---|
| **Credentials** | case-insensitive key match | `"[REDACTED:CREDENTIAL]"` | `api_key`, `password`, `token`, `*_pem`, `eab_secret`, `acme_account_key`, `signature` |
| **PII** | case-insensitive key match | `"[REDACTED:PII]"` | `email`, `phone`, `ssn`, `dob`, `name`, `address`, `postal_code`, `ip_address` |
|
||||
|
||||
Nested maps and arrays are walked recursively — sensitive keys at any depth get scrubbed. The redactor is mutation-free (the caller's original map is unchanged) so service-layer code that reuses the map elsewhere is safe.
|
||||
|
||||
**Operator visibility — `redacted_keys` array.** The redacted map includes a `redacted_keys` array listing every dotted-path that was scrubbed. This surfaces the redaction footprint to compliance auditors without exposing values. Example before/after:
|
||||
|
||||
```jsonc
// Caller's input map (e.g., from a service handler):
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "abc123secret",
    "contact": { "email": "ops@example.com", "role": "admin" }
  }
}

// Persisted in audit_events.details:
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "[REDACTED:CREDENTIAL]",
    "contact": { "email": "[REDACTED:PII]", "role": "admin" }
  },
  "redacted_keys": ["config.eab_secret", "config.contact.email"]
}
```
|
||||
|
||||
**Maintenance.** When introducing a new credential- or PII-bearing field anywhere in the codebase, add the key name to `credentialKeys` or `piiKeys` respectively in `internal/service/audit_redact.go`. The unit test suite in `audit_redact_test.go` exercises every entry and proves case-insensitivity + JSON round-trip safety.
|
||||
|
||||
## certctl Pro (V3) Enhancements
|
||||
|
||||
Several compliance-relevant features are planned for certctl Pro:
|
||||
|
||||
- **OIDC/SSO** — Enterprise identity provider integration (SOC 2 CC6.1, PCI-DSS 8.3)
|
||||
- **RBAC** — Role-based access control with admin/operator/viewer roles (SOC 2 CC6.3, PCI-DSS 7.2)
|
||||
- **NATS Audit Streaming** — Real-time audit event streaming to SIEM systems (SOC 2 CC7.2, PCI-DSS 10.2)
|
||||
- **Bulk Revocation** — Fleet-wide incident response capability (NIST SP 800-57 Section 5.4)
|
||||
- **Health/Compliance Scoring** — Automated compliance posture assessment per certificate
|
||||
@@ -1,7 +1,9 @@
|
||||
# CI Pipeline — Operator Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> Authoritative guide to certctl's CI pipeline shape.
|
||||
> Per `cowork/ci-pipeline-cleanup-prompt.md` Phase 12.
|
||||
> Per the ci-pipeline-cleanup spec, Phase 12.
|
||||
|
||||
## Trigger model
|
||||
|
||||
@@ -51,7 +53,7 @@ Runs the Go build/test suite + 18 of 20 regression guards.
|
||||
|
||||
Steps:
|
||||
1. `actions/checkout@v4`
|
||||
2. `actions/setup-go@v5` (Go 1.25.9)
|
||||
2. `actions/setup-go@v5` (Go 1.25.10)
|
||||
3. `go build ./cmd/...` (server, agent, mcp-server, cli)
|
||||
4. **gofmt drift** — `gofmt -l .` must be empty (Makefile::verify parity)
|
||||
5. **go mod tidy drift** — `go mod tidy && git diff --exit-code go.mod go.sum`
|
||||
@@ -95,7 +97,7 @@ Single-job collapse of the prior 12-job matrix (per ci-pipeline-cleanup Phase 5
|
||||
|
||||
Steps:
|
||||
1. `actions/checkout@v5`
|
||||
2. `actions/setup-go@v5` (Go 1.25.9, cache: true)
|
||||
2. `actions/setup-go@v5` (Go 1.25.10, cache: true)
|
||||
3. **Build f5-mock-icontrol sidecar** — only sidecar without published image
|
||||
4. **Bring up all vendor sidecars** — `docker compose --profile deploy-e2e up -d` (11 sidecars)
|
||||
5. **Run all vendor-edge e2e** — `go test -tags integration -race -count=1 -run 'VendorEdge_'`; output captured to `test-output.log`
|
||||
@@ -0,0 +1,68 @@
|
||||
# GUI QA Checklist
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Manual GUI verification pass for release sign-off. Vitest covers component-level behavior; this checklist covers end-to-end flows that only land correctly when the React SPA, the REST API, and the database are all wired together.
|
||||
|
||||
## Prereqs
|
||||
|
||||
The full stack must be running and healthy per [`qa-prerequisites.md`](qa-prerequisites.md). Open `https://localhost:8443` in a fresh browser session (Incognito / Private mode is fine — avoids cached state from previous QA passes).
|
||||
|
||||
## Pages to verify
|
||||
|
||||
For each page, the verification is "open it, confirm it renders without console errors, exercise the documented action, confirm the action lands as expected."
|
||||
|
||||
| Page | Action to verify | Expected result |
|---|---|---|
| `/dashboard` | Page loads, all 4 stat cards populate | Total / Active / Expiring / Expired counts match `GET /api/v1/stats/summary` |
| `/certificates` | Inventory list paginates | "Next page" button works; URL updates with cursor; row count consistent |
| `/certificates/<id>` | Detail page opens for any cert | Cert chain renders, deployment status shows, audit timeline visible |
| `/issuers` | Catalog renders all configured issuers | Each issuer card shows last-used / status; clicking opens detail |
| `/issuers/<id>` | Issuer config form | Edit + Save round-trips through `PATCH /api/v1/issuers/<id>` |
| `/issuers/hierarchy` | CA tree view | Multi-level hierarchy renders; admin-gated CRUD buttons present for admins only |
| `/agents` | Fleet view | Online/offline status accurate; OS/arch grouping correct |
| `/agents/<id>` | Agent detail | Last heartbeat, registered date, deployment job history |
| `/agents/groups` | Agent groups CRUD | Create + edit + delete a test group; verify dynamic membership matching |
| `/jobs` | Job queue | Filter by status / type works; click into a job opens detail |
| `/jobs/<id>` | Job detail | Status, retries, logs, owner attribution |
| `/policies` | Renewal policies CRUD | Edit AlertChannels matrix, save, verify backend reflects change |
| `/profiles` | Certificate profiles | EKU constraints + max TTL editable; profile binding works |
| `/notifications` | Notifier config | Test connection button against each configured notifier |
| `/discovery` | Discovery triage | Claim / Dismiss buttons round-trip to backend |
| `/network-scans` | Scan target CRUD | Create scan target, trigger immediate scan, results appear |
| `/audit` | Audit trail | Filter by actor / action / time range; CSV export works |
| `/short-lived` | Short-lived credential dashboard | Live TTL countdown updates; auto-refresh every 10s |
| `/observability` | Observability dashboard | Charts render: expiration heatmap, renewal trends, issuance rate |
| `/health` | Health monitor | TLS endpoint health: healthy / degraded / down states accurate |
| `/digest` | Digest preview | Email preview renders; "Send digest" button dispatches |
| `/owners` | Owners CRUD | Create owner with team, edit, delete (after reassigning certs) |
| `/teams` | Teams CRUD | Create + delete; verify cascade removes orphan owners |
| `/scep` | SCEP admin tabs | Profiles / Intune Monitoring / Recent Activity all populate |
| `/est` | EST admin tabs | Profiles / Recent Activity / Trust Bundle all populate |
| `/login` | Login flow | API key entry persists for the session; bad key rejected |
|
||||
|
||||
## Console hygiene
|
||||
|
||||
Open browser DevTools and confirm:
|
||||
|
||||
- No uncaught exceptions on any page
|
||||
- No 404 / 500 responses in the Network tab from API calls
|
||||
- No CORS errors
|
||||
- No CSP violations
|
||||
|
||||
## Mobile / narrow-viewport
|
||||
|
||||
The dashboard is desktop-first but should not break catastrophically on narrow viewports. Resize the browser to 380px width; confirm:
|
||||
|
||||
- Sidebar collapses to a hamburger menu
|
||||
- Tables either scroll horizontally or stack on mobile
|
||||
- Forms remain usable
|
||||
|
||||
## Accessibility spot-check
|
||||
|
||||
- Tab through any single page using only the keyboard. Every interactive element must be reachable, and the focus indicator must be visible.
|
||||
- Lighthouse accessibility audit on `/dashboard`: target ≥ 90.
|
||||
|
||||
## Sign-off
|
||||
|
||||
Document any deviations in the release sign-off matrix at [`release-sign-off.md`](release-sign-off.md).
|
||||
@@ -0,0 +1,99 @@
|
||||
# QA Prerequisites
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operational prereqs for running release QA against certctl. Before any of the contributor-facing testing surfaces (test-environment.md, gui-qa-checklist.md, release-sign-off.md) are useful, the local stack needs to be in a known-good state.
|
||||
|
||||
## Why manual QA on top of automated tests?
|
||||
|
||||
Automated tests mock dependencies and run in isolation. Manual QA validates the full integrated stack: real PostgreSQL, real HTTP, real agent binary, real file I/O, real scheduler timing. It catches issues that unit tests can't: migration ordering, Docker networking, env var parsing, browser rendering, and timing-dependent scheduler behavior.
|
||||
|
||||
## Environment setup
|
||||
|
||||
**Step 1: Start the full stack.**
|
||||
|
||||
```bash
cd deploy && docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
```
|
||||
|
||||
This builds three containers (postgres, certctl-server, certctl-agent) and runs them on a bridge network. The `--build` flag ensures you're testing the current code, not a stale image. The `demo` overlay is an override file (no `image:` or `build:` of its own) that layers `CERTCTL_DEMO_SEED=true` onto the base — both files must be passed in that order or compose errors with `service "certctl-server" has neither an image nor a build context specified`. The seed populates the database with realistic fixtures.
|
||||
|
||||
**Step 2: Wait for healthy state.**
|
||||
|
||||
```bash
for i in $(seq 1 30); do
  STATUS=$(docker compose ps --format json 2>/dev/null | jq -r 'select(.Health != null) | "\(.Name): \(.Health)"' 2>/dev/null)
  echo "$STATUS"
  echo "$STATUS" | grep -q "unhealthy\|starting" || break
  sleep 2
done
```
|
||||
|
||||
Why: Docker Compose starts containers in dependency order (postgres → server → agent), but "started" doesn't mean "ready." Health checks confirm postgres accepts connections, the server responds on `/health`, and the agent process is running.
|
||||
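As a complement to the container health checks described above, the server's `/health` endpoint can be probed directly from the host. The sketch below is an assumption-laden helper, not part of certctl: `wait_for_ok` is a hypothetical name, it treats HTTP 200 as "ready", and it reads the optional `CACERT` variable (defined in Step 3) to trust the test CA.

```bash
# wait_for_ok URL [TRIES]
# Polls URL every 2 seconds until it returns HTTP 200 or TRIES (default 30)
# attempts are used up. Returns non-zero on timeout so scripts can stop early.
# (Illustrative helper; the 200-means-ready rule is an assumption.)
wait_for_ok() {
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # ${CACERT:-} is left unquoted on purpose so "--cacert <path>" splits
    # into two arguments; it expands to nothing when CACERT is unset.
    code=$(curl -s -o /dev/null -w '%{http_code}' ${CACERT:-} "$1") || code=000
    if [ "$code" = "200" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "timed out waiting for $1" >&2
  return 1
}
```

A typical call would be `wait_for_ok https://localhost:8443/health`. Unlike the loop in Step 2, this fails loudly when the server never becomes ready.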
|
||||
**Step 3: Set shell variables used throughout the QA flow.**
|
||||
|
||||
```bash
export SERVER=https://localhost:8443
export API_KEY="change-me-in-production"
export AUTH="Authorization: Bearer $API_KEY"
export CT="Content-Type: application/json"
export CACERT="--cacert ./deploy/test/certs/ca.crt"
```
|
||||
|
||||
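Every curl command in QA docs uses these variables. Setting them once avoids typos and keeps the docs copy-pasteable. `CACERT` holds a path relative to the repository root, so run those curl commands from the root rather than from `deploy/`.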
|
||||
|
||||
> **Note:** The default Docker Compose sets `CERTCTL_AUTH_TYPE: none` for the demo overlay, meaning auth is disabled. Tests that exercise auth require flipping this to `api-key`; instructions are in the relevant test docs.
|
||||
|
||||
**Step 4: Build CLI and MCP server binaries on the host.**
|
||||
|
||||
```bash
go build -o certctl-cli ./cmd/cli/...
go build -o certctl-mcp ./cmd/mcp-server/...
```
|
||||
|
||||
The CLI and MCP server are separate binaries that talk to the server over HTTP. Building them verifies the code compiles and produces the executables you'll test later.
|
||||
|
||||
## Demo data baseline
|
||||
|
||||
The seed data (`migrations/seed.sql` + `migrations/seed_demo.sql`) pre-populates the database with realistic fixtures. Confirm it loaded:
|
||||
|
||||
```bash
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary | jq .
```
|
||||
|
||||
**Expected shape:**
|
||||
|
||||
```json
{
  "total_certificates": 15,
  "active_certificates": ...,
  "expiring_certificates": ...,
  "expired_certificates": ...,
  "pending_renewals": ...
}
```
|
||||
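The expected shape above can be turned into a pass/fail gate. This is a small sketch that assumes `jq` is installed (it is already used in Step 2) and that `total_certificates` is a top-level integer, as shown. `seed_loaded` is a hypothetical helper name.

```bash
# seed_loaded EXPECTED_TOTAL
# Reads a /api/v1/stats/summary response on stdin and succeeds only when
# total_certificates equals EXPECTED_TOTAL.
# (Illustrative helper; not shipped with certctl.)
seed_loaded() {
  if jq -e --argjson want "$1" '.total_certificates == $want' >/dev/null; then
    return 0
  fi
  echo "seed check failed: total_certificates is not $1" >&2
  return 1
}
```

With the Step 3 variables set, `curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary | seed_loaded 15` exits non-zero when the demo seed did not load.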
|
||||
**Reference IDs in the demo data** (used across QA docs):
|
||||
|
||||
| Resource | IDs | Count |
|---|---|---|
| Teams | `t-platform`, `t-security`, `t-payments`, `t-frontend`, `t-data` | 5 |
| Owners | `o-alice`, `o-bob`, `o-carol`, `o-dave`, `o-eve` | 5 |
| Policies | `rp-standard`, `rp-urgent`, `rp-manual` | 3 |
| Issuers | `iss-local`, `iss-acme-le`, `iss-stepca`, `iss-digicert` | 4 |
| Agents | `ag-web-prod`, `ag-web-staging`, `ag-lb-prod`, `ag-iis-prod`, `ag-data-prod` | 5 |
| Targets | `tgt-nginx-prod`, `tgt-nginx-staging`, `tgt-f5-prod`, `tgt-iis-prod`, `tgt-nginx-data` | 5 |
| Profiles | `prof-standard-tls`, `prof-internal-mtls`, `prof-short-lived`, `prof-high-security` | 4 |
| Certificates | `mc-api-prod`, `mc-web-prod`, `mc-pay-prod`, etc. | 15 |
| Agent Groups | `ag-linux-prod`, `ag-linux-amd64`, `ag-windows`, `ag-datacenter-a`, `ag-manual` | 5 |
| Network Scan Targets | `nst-dc1-web`, `nst-dc2-apps`, `nst-dmz` | 3 |
|
||||
|
||||
## Once these are green
|
||||
|
||||
Move to the appropriate downstream surface:
|
||||
|
||||
- [`test-environment.md`](test-environment.md) — full local environment tutorial with real CAs (Pebble, step-ca, etc.)
|
||||
- [`gui-qa-checklist.md`](gui-qa-checklist.md) — manual GUI test pass
|
||||
- [`release-sign-off.md`](release-sign-off.md) — release-day checklist
|
||||
- [`testing-strategy.md`](testing-strategy.md) — what we test in CI vs daily deep-scan vs manual QA
|
||||
@@ -1,14 +1,16 @@
|
||||
# QA Test Suite Guide (`qa_test.go`)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
|
||||
>
|
||||
> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
|
||||
> **Self-contained.** Through 2026-05-04 this doc was a companion to a separate `docs/testing-guide.md` (the *what* to test) — that companion was pruned during the Phase 5 docs overhaul (its content dispersed across the audience-organized doc tree). The Part-by-Part Coverage Map below is now the canonical inventory of QA Parts.
|
||||
|
||||
---
|
||||
|
||||
## Test Suite Health (regenerate via `make qa-stats`)
|
||||
|
||||
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; CI's QA-doc drift guards (`.github/workflows/ci.yml`) catch out-of-date Part / cert / issuer counts on every PR. **Last regenerated: 2026-04-27 (Bundle P).**
|
||||
> Snapshot at HEAD. Re-run `make qa-stats` to refresh; the QA-doc seed-count drift guard (`.github/workflows/ci.yml::QA-doc seed-count drift guard`) catches out-of-date cert / issuer counts on every PR. The Part-count drift guard was retired in Phase 5 of the 2026-05-04 docs overhaul (testing-guide.md was pruned; Part counts are now tracked inside `qa_test.go` itself, not against an external doc). **Last regenerated: 2026-04-27 (Bundle P).**
|
||||
|
||||
| Metric | Value | Target | Status |
|
||||
|---|---|---|---|
|
||||
@@ -18,23 +20,22 @@
|
||||
| Frontend test files | 38 | n/a | ℹ |
|
||||
| Fuzz targets | 11 | ≥10 (one per hand-rolled parser) | ✓ |
|
||||
| `t.Skip` sites | 60 | each carries valid rationale (Bundle O audit) | ✓ |
|
||||
| `qa_test.go` Part_* subtests | 53 | tracks `testing-guide.md` Parts (3 `## Part 15-17` covered indirectly via Parts 42–46) | ✓ |
|
||||
| `testing-guide.md` Parts | 56 | n/a | ℹ |
|
||||
| `qa_test.go` Part_* subtests | 53 | covers 49 of 56 historical QA Parts directly + Parts 15–17 indirectly via Parts 42–46 | ✓ |
|
||||
| Existential cluster line cov (post-Bundle-J + L.B + Bundle 0.7) | acme 55.6%, stepca 90.4%, local-issuer ≥86%, crypto ≥85% | ≥95% | △ ACME below; tracked in `coverage-matrix.md` |
|
||||
| Mutation kill rate (Existential) | unmeasured (operator-runnable per Strengthening #5) | ≥90% | ⚠ |
|
||||
| Race detector clean (`-count=10`) | partial (`-count=3` clean per Phase 0) | 0 races | ⚠ |
|
||||
|
||||
## What Is This File?
|
||||
|
||||
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
|
||||
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates the historical QA Part inventory (preserved in the Part-by-Part Coverage Map below) against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
|
||||
|
||||
It covers **49 of 56 Parts** of the testing guide as automation; the remaining 7 are
|
||||
either manual-only by design or pending QA-suite coverage:
|
||||
|
||||
- **49 `Part_*` automation wrappers**, **~159 leaf subtests** — API calls, database queries, source file checks, performance benchmarks
|
||||
- **11 fully skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.) — see "What This Test Does NOT Cover" below
|
||||
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually per `docs/testing-guide.md` until QA-suite automation lands
|
||||
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
|
||||
- **4 Parts NOT YET AUTOMATED** — Parts 23 (S/MIME & EKU), 24 (OCSP/CRL), 55 (Agent Soft-Retirement), 56 (Notification Retry & Dead-Letter) — must be tested manually until QA-suite automation lands; the Part-by-Part Coverage Map below describes the surface area each Part covers
|
||||
- **Manual-only flows** in addition: GUI flows, scheduler timing, Docker log inspection — must be done by a human (Coverage Map below describes each)
|
||||
|
||||
## Architecture
|
||||
|
||||
@@ -147,8 +148,8 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
|
||||
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
|
||||
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
|
||||
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually per `docs/testing-guide.md::Part 23` |
|
||||
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually per `docs/testing-guide.md::Part 24` |
|
||||
| 23 | S/MIME & EKU Support | 0 (NOT AUTOMATED) | — | S/MIME profile creation; EKU enforcement on issuance; SMIMECapabilities extension presence in issued cert; rejection of profile-violating EKU on CSR. Test manually — see the Coverage Map row |
|
||||
| 24 | OCSP Responder & DER CRL | 0 (NOT AUTOMATED) | — | OCSP request/response (RFC 6960), DER CRL generation, status (Good/Revoked/Unknown), Must-Staple coordination. Test manually — see the Coverage Map row |
|
||||
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
|
||||
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
|
||||
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
|
||||
@@ -163,7 +164,7 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
|
||||
| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
|
||||
| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
|
||||
| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
|
||||
| 40 | Documentation | 8 | README, quickstart, architecture, connectors exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
|
||||
| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
|
||||
| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
|
||||
| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
|
||||
@@ -178,12 +179,12 @@ This table shows what each Part tests and what's left for manual verification.
|
||||
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
|
||||
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
|
||||
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
|
||||
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually per `docs/testing-guide.md::Part 55` |
|
||||
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually per `docs/testing-guide.md::Part 56` |
|
||||
| 55 | Agent Soft-Retirement (I-004) | 0 (NOT AUTOMATED) | — | Soft-retire vs hard-retire; force flag; reason capture; foreign-key cascade behavior on retired-agent cert ownership; reactivation. Test manually — see the Coverage Map row |
|
||||
| 56 | Notification Retry & Dead-Letter Queue (I-005) | 0 (NOT AUTOMATED) | — | Retry loop with exponential backoff, dead-letter transition after N retries, requeue endpoint (`POST /api/v1/notifications/{id}/requeue`), idempotency on retry. Test manually — see the Coverage Map row |
|
||||
|
||||
**Totals (verified 2026-04-27):** 49 `Part_*` automation wrappers, ~159 leaf subtests, 11 fully
|
||||
skipped Parts, 4 Parts not yet automated (23, 24, 55, 56), and an unspecified count of manual-only
|
||||
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE '^## Part [0-9]+:' docs/testing-guide.md`
|
||||
flows (GUI, scheduler timing, Docker log inspection). Run `grep -cE 't\.Run\("Part[0-9]+_' deploy/test/qa_test.go`
to re-verify the count of `Part_*` automation wrappers.
|
||||
|
||||
## Coverage by Risk Class
|
||||
@@ -192,14 +193,14 @@ A buyer's QA lead reading this doc wants "where are the existential bugs caught?
|
||||
|
||||
| Risk class | Description | Parts in scope | Automation status |
|
||||
|---|---|---|---|
|
||||
| **Existential** (Critical paths — bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in `testing-guide.md`) |
|
||||
| **Existential** (Critical paths: bugs would compromise CA, leak keys, mis-issue, bypass revocation) | Crypto, PKCS#7, local-issuer, OCSP/CRL, agent keygen, CSR validation | 5 (Revocation), 21 (EST), 23 (S/MIME EKU), 24 (OCSP/CRL), 47 (Digest with cert content), 53 (K8s Secrets), 54 (AWS PCA) | 5/7 automated; Parts 23 + 24 pending (Bundle I Skip stubs in `qa_test.go`; manual playbook in the Coverage Map above) |
|
||||
| **High** (FSM corruption, credential leak, authn/z weakening) | Renewal, jobs, agents, issuers, deployment, scheduler | 4, 7, 8, 9, 18, 19, 20, 22, 25, 28, 29, 32, 33, 48, 49, 55, 56 | 14/17 automated; CLI / MCP / scheduler-loop are inherently SKIP (require compiled binaries / Docker logs); Parts 55 + 56 pending |
|
||||
| **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 42–46) |
|
||||
| **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
|
||||
| **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
|
||||
| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
|
||||
| **Audit-relevant** | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
|
||||
|
||||
This is the table acquisition reviewers screenshot for their report. When a new Part lands in `testing-guide.md`, classify it here; the QA-doc Part-count drift guard (`.github/workflows/ci.yml::QA-doc Part-count drift guard`) catches the count mismatch.
|
||||
This is the table acquisition reviewers screenshot for their report. When a new Part_* subtest lands in `qa_test.go`, classify it here.
|
||||
|
||||
## Test Categories
|
||||
|
||||
@@ -231,11 +232,11 @@ Timed API requests with threshold assertions:
|
||||
|
||||
## What This Test Does NOT Cover
|
||||
|
||||
These gaps must be filled by manual testing per `docs/testing-guide.md`:
|
||||
These gaps must be filled by manual testing — see each Coverage Map row for surface-area description:
|
||||
|
||||
### Not Yet Automated (Parts 23, 24, 55, 56)
|
||||
|
||||
These Parts are documented in `docs/testing-guide.md` but have no `Part_*` automation
|
||||
These historical QA Parts are listed in the Coverage Map above but have no `Part_*` automation
|
||||
in `qa_test.go` yet. They are operator-runnable from the manual playbook; QA-suite
|
||||
automation should land before the next acquisition-grade release.
|
||||
|
||||
@@ -429,7 +430,7 @@ grep -oE 'mutation score is [0-9.]+' tool-output/mutation-crypto.txt | tail -1
|
||||
|
||||
When a new feature ships:
|
||||
|
||||
1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
|
||||
1. **Add a Part section** in `qa_test.go` following the numbering convention in the Coverage Map above
|
||||
2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
|
||||
3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
|
||||
4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
|
||||
@@ -0,0 +1,93 @@
|
||||
# Release Sign-Off
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Release-day checklist for tagging a new certctl release. Walks through the gates that must be green before pushing the tag, in the order they should be verified.
|
||||
|
||||
## Pre-release: code state
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| `master` is at the commit you intend to tag | `git log -1 --format='%H %s'` | ☐ |
| Working tree clean | `git status -sb` | ☐ |
| Local matches GitHub | `curl -sS https://api.github.com/repos/certctl-io/certctl/commits/master \| grep -oE '"sha": "[a-f0-9]+"' \| head -1` matches local | ☐ |
| `WORKSPACE-CHANGELOG.md` updated with the release's milestones | manual review | ☐ |
| `certctl/CHANGELOG.md` updated (release-facing) | manual review | ☐ |
| Migration ladder ends cleanly | `ls migrations/*.up.sql \| sort \| tail -3` shows the right last migration | ☐ |
|
||||
|
||||
## Pre-release: automated gates (CI)
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| CI pipeline green on the tag-target commit | GitHub Actions web UI | ☐ |
| `make verify` clean locally | run from repo root | ☐ |
| `go test -race -count=1 ./...` clean | full race check | ☐ |
| `golangci-lint run ./...` clean | local lint | ☐ |
| `govulncheck ./...` clean | vulnerability scan | ☐ |
| Coverage thresholds met (service ≥55%, handler ≥60%, domain ≥40%, middleware ≥30%) | `go test -coverprofile=cover.out ./... && go tool cover -func=cover.out` | ☐ |
| Frontend type-check + Vitest + Vite build clean | `cd web && npm run typecheck && npm run test && npm run build` | ☐ |
|
||||
|
||||
## Pre-release: manual QA passes
|
||||
|
||||
| Surface | Checklist | Pass |
|---|---|---|
| Local stack boots clean from scratch | `qa-prerequisites.md` Steps 1-4 green | ☐ |
| GUI QA checklist | `gui-qa-checklist.md` end to end | ☐ |
| End-to-end test environment | `test-environment.md` Steps 1-14 green | ☐ |
| Performance baselines | `performance-baselines.md` four spot checks within bounds | ☐ |
| Helm chart deploys clean | `helm-deployment.md` install + verify | ☐ |
| ACME server interop (cert-manager) | `make acme-cert-manager-test` green | ☐ |
| ACME server RFC conformance (lego) | `make acme-rfc-conformance-test` green | ☐ |
|
||||
|
||||
## Release artefact verification
|
||||
|
||||
After the release workflow runs (triggered by tag push), verify the published artefacts:
|
||||
|
||||
| Artefact | How to verify | Pass |
|---|---|---|
| Cosign keyless OIDC signature on `checksums.txt` | per `docs/reference/release-verification.md` step 2 | ☐ |
| SLSA Level 3 provenance on each binary | step 3 | ☐ |
| Container image signature + SBOM + provenance | step 4 | ☐ |
| Release notes published on GitHub Releases page | manual review | ☐ |
| ghcr.io images at `ghcr.io/certctl-io/certctl-{server,agent}:<tag>` pullable | `docker pull` round-trips | ☐ |
|
||||
|
||||
## Branch protection + tag push
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| `master` branch protection rule allows the tag push | Repository Settings → Branches | ☐ |
| Tag pushed | `git tag -s v<version> -m 'Release v<version>' && git push origin v<version>` | ☐ |
| Release workflow kicked off in GitHub Actions | watch the Actions tab | ☐ |
|
||||
|
||||
## Post-release
|
||||
|
||||
| Gate | How to check | Pass |
|---|---|---|
| Release workflow completed without errors | GitHub Actions | ☐ |
| Sample binary downloaded and Cosign-verified by an operator who is not the release author | another team member | ☐ |
| `WORKSPACE-CHANGELOG.md` notes the tag commit SHA | manual edit | ☐ |
| workspace-tracking "Active Focus" → "Current tag" updated | manual edit | ☐ |
| `certctl.io/index.html` star count + `data-gh-version` rendering picks up the new tag | open the landing page in 6+ hours (cache TTL) | ☐ |
| Reddit / Hacker News / LinkedIn announcement drafted (if a major release) | per the operator's promotion playbook | ☐ |
|
||||
|
||||
## If a gate fails
|
||||
|
||||
Revert the tag push immediately:
|
||||
|
||||
```bash
git push --delete origin v<version>
git tag -d v<version>
```
|
||||
|
||||
Investigate, fix, re-tag.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/contributor/qa-prerequisites.md`](qa-prerequisites.md) — local stack prereqs
|
||||
- [`docs/contributor/test-environment.md`](test-environment.md) — full local environment tutorial
|
||||
- [`docs/contributor/gui-qa-checklist.md`](gui-qa-checklist.md) — GUI manual QA pass
|
||||
- [`docs/contributor/testing-strategy.md`](testing-strategy.md) — what we test in CI vs deep-scan vs manual QA
|
||||
- [`docs/contributor/ci-pipeline.md`](ci-pipeline.md) — CI shape and regression guards
|
||||
- [`docs/operator/performance-baselines.md`](../operator/performance-baselines.md) — performance regression spot checks
|
||||
- [`docs/operator/helm-deployment.md`](../operator/helm-deployment.md) — Helm install + verify
|
||||
- [`docs/reference/release-verification.md`](../reference/release-verification.md) — Cosign / SLSA / SBOM verification procedure
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl Testing Environment
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
A step-by-step guide to running certctl locally with real certificate authorities. Every command is spelled out. Every expected output is shown. If something goes wrong, the troubleshooting section tells you exactly what to check.
|
||||
|
||||
---
|
||||
@@ -171,7 +173,7 @@ curl --cacert "$CA" -f https://localhost:8443/health
|
||||
|
||||
Expect `{"status":"ok"}`. If `curl` errors with `SSL certificate problem: unable to get local issuer certificate`, the init container hasn't finished yet — wait a few seconds and retry. If the file doesn't exist at all, the bind mount didn't populate; `docker compose -f docker-compose.test.yml logs certctl-tls-init` should show the self-sign ran.
|
||||
|
||||
For a full explanation of the cert provisioning patterns (self-signed bootstrap, operator-supplied, cert-manager), see [`tls.md`](tls.md). For the one-step cutover from the old plaintext test harness to HTTPS, see [`upgrade-to-tls.md`](upgrade-to-tls.md).
|
||||
For a full explanation of the cert provisioning patterns (self-signed bootstrap, operator-supplied, cert-manager), see [`tls.md`](../operator/tls.md). For the one-step cutover from the old plaintext test harness to HTTPS, see [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -811,17 +813,30 @@ All containers share a bridge network (`certctl-test`, subnet 10.30.50.0/24) wit
|
||||
|
||||
### Key Generation Flow (Agent-Side)
|
||||
|
||||
```
|
||||
Server creates job (AwaitingCSR) → Agent polls, sees job →
|
||||
Agent generates ECDSA P-256 key pair locally →
|
||||
Agent creates CSR (public key + CN + SANs) →
|
||||
Agent POSTs CSR to server → Server signs via issuer →
|
||||
Server stores cert, creates Deployment job (Pending) →
|
||||
Agent polls, sees Deployment job →
|
||||
Agent fetches signed cert from server →
|
||||
Agent reads local private key from /var/lib/certctl/keys/ →
|
||||
Agent writes cert + key + chain to /nginx-certs/ (shared volume) →
|
||||
Job marked Completed
|
||||
```mermaid
sequenceDiagram
    autonumber
    participant Srv as certctl-server
    participant Iss as Issuer connector
    participant Agt as certctl-agent
    participant FS as /var/lib/certctl/keys/<br/>(local agent FS)
    participant Vol as /nginx-certs/<br/>(shared volume)

    Srv->>Srv: create Job (AwaitingCSR)
    Agt->>Srv: poll for jobs
    Srv-->>Agt: Job(AwaitingCSR)
    Agt->>FS: generate ECDSA P-256 keypair
    Agt->>Agt: build CSR (pubkey + CN + SANs)
    Agt->>Srv: POST CSR
    Srv->>Iss: sign CSR
    Iss-->>Srv: signed cert
    Srv->>Srv: store cert; create Deployment Job (Pending)
    Agt->>Srv: poll for jobs
    Srv-->>Agt: Job(Deployment)
    Agt->>Srv: GET signed cert
    Agt->>FS: read private key
    Agt->>Vol: write cert + key + chain
    Agt->>Srv: mark Job(Completed)
```
|
||||
|
||||
### Shared Volume Architecture
|
||||
@@ -1,12 +1,14 @@
|
||||
# certctl Testing Strategy & Deep-Scan Operator Runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This doc covers the **testing topology** (per-PR fast gates vs. daily deep-scan
|
||||
gates), and the **operator runbook** for re-running each deep-scan tool locally
|
||||
when the CI receipt is ambiguous or when an operator wants to validate a fix
|
||||
before the next scheduled scan.
|
||||
|
||||
For the manual end-to-end QA playbook, see [`testing-guide.md`](testing-guide.md).
|
||||
For the security posture / per-finding closure log, see [`security.md`](security.md).
|
||||
For the manual end-to-end QA playbook, see [`testing-guide.md`](../testing-guide.md).
|
||||
For the security posture / per-finding closure log, see [`security.md`](../operator/security.md).
|
||||
|
||||
## CI workflow split
|
||||
|
||||
@@ -53,7 +55,7 @@ the bug the mutant introduced).
|
||||
|
||||
**Acceptance threshold:** ≥80% mutation kill ratio per package. Surviving
|
||||
mutants below that threshold get triaged in
|
||||
`cowork/comprehensive-audit-2026-04-25/d003-mutation-results.md` — either
|
||||
the project's 2026-04-25 mutation-results notes — either
|
||||
ship a targeted unit test that kills the mutant, or document an
|
||||
equivalent-mutation justification.
|
||||
|
||||
@@ -191,8 +193,8 @@ Re-run any of the deep-scan tools locally when:
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/security.md`](security.md) — security posture, per-finding closure log.
|
||||
- [`docs/testing-guide.md`](testing-guide.md) — manual end-to-end QA playbook.
|
||||
- [`docs/operator/security.md`](../operator/security.md) — security posture, per-finding closure log.
|
||||
- [`docs/testing-guide.md`](../testing-guide.md) — manual end-to-end QA playbook.
|
||||
- [`.github/workflows/ci.yml`](../.github/workflows/ci.yml) — per-PR fast gates.
|
||||
- [`.github/workflows/security-deep-scan.yml`](../.github/workflows/security-deep-scan.yml) — daily deep-scan gates.
|
||||
- [`scripts/install-security-tools.sh`](../scripts/install-security-tools.sh) — Go-host-installed tools (the docker-based tools are not in this script).
|
||||
-1606
File diff suppressed because it is too large
@@ -1,5 +1,7 @@
|
||||
# Advanced Demo: Certificate Lifecycle End-to-End
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This demo goes beyond browsing pre-loaded data. You'll create a team, register an owner, set up an issuer, create a certificate, trigger renewal, and watch everything appear in the dashboard in real time. Each step includes a technical explanation of what's happening inside certctl and why the system is designed that way.
|
||||
|
||||
**Time**: 15-20 minutes
|
||||
@@ -363,7 +365,7 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
| `issuer_id` | Links to the issuer connector that will sign this certificate. Determines which CA backend is used. |
|
||||
| `renewal_policy_id` | Links to a `renewal_policies` row that defines: how many days before expiry to renew (`renewal_window_days`), whether auto-renewal is enabled (`auto_renew`), max retries, and retry interval. The default policy (`rp-default`) renews 30 days before expiry. |
|
||||
| `status` | Set to `Pending` because the certificate hasn't been issued yet. The scheduler will pick it up, or you can trigger renewal manually. |
|
||||
| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"pci": "true"` for compliance scoping). |
|
||||
| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"environment": "production"` for fleet scoping). |
|
||||
|
||||
**Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details.
|
||||
|
||||
@@ -603,7 +605,7 @@ curl -s "$API/api/v1/audit?created_after=2026-03-24T09:00:00Z" | jq '.data | len
|
||||
|
||||
The audit middleware (M19) records every HTTP request: method, path, status code, actor, request body SHA-256 hash, and latency. This creates a complete API audit trail without blocking responses (logging happens asynchronously).
|
||||
|
||||
**Why immutable audit:** Compliance frameworks (SOC 2 Type II, PCI-DSS, ISO 27001) require tamper-evident audit logs. By making the repository interface append-only and recording API calls, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
|
||||
**Why immutable audit:** audit logs matter most after a compromise, which is exactly when an attacker would try to alter them, so they need to be tamper-evident. Because the repository interface is append-only and every API call is recorded, even a compromised API server can't retroactively delete or modify audit records through that interface. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
|
||||
|
||||
**Check the dashboard.** The "Audit" view shows the full timeline of all actions across the system with filtering and CSV/JSON export.
|
||||
|
||||
@@ -701,7 +703,7 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
|
||||
**Why `environment` matters:** The environment field isn't just metadata — it feeds the policy engine. A policy rule with type `AllowedEnvironments` can restrict which environments are valid. If someone tries to create a certificate with `environment: "yolo"`, the policy engine flags a violation. In a mature deployment, you'd enforce policies strictly: production certificates must use a trusted CA (not Local CA), staging certificates can use Let's Encrypt staging, and development certificates can use the Local CA.
|
||||
|
||||
**Why `pci: true` in tags:** Tags are free-form, but they enable powerful filtering and compliance scoping. A security team could query `GET /api/v1/certificates?tags.pci=true` (not implemented yet, but the JSONB column supports it) to find all PCI-scoped certificates and verify they meet compliance requirements.
|
||||
**Why arbitrary tags in metadata:** Tags are free-form, but they enable powerful filtering and fleet scoping. A security team could query `GET /api/v1/certificates?tags.regulated=true` (not implemented yet, but the JSONB column supports it) to find all certificates marked regulated and verify they meet whatever requirements that label maps to.
|
||||
|
||||
**Refresh the dashboard** — you'll see the new payment gateway certificate. Try filtering by environment or status to see how both certificates appear alongside the demo data.
|
||||
|
||||
@@ -778,7 +780,7 @@ Check existing violations:
|
||||
curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq .
|
||||
```
|
||||
|
||||
**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific non-compliance.
|
||||
**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific violation.
|
||||
|
||||
**In the dashboard**, click "Policies" in the sidebar to see all active rules and which certificates are violating them.
|
||||
|
||||
@@ -844,7 +846,7 @@ curl -s -X POST $API/api/v1/profiles \
|
||||
|
||||
**How it works:** Certificate profiles are stored in the `certificate_profiles` table with an `allowed_key_algorithms` JSONB column that defines which key types and minimum sizes are acceptable. When a certificate is assigned to a profile, the profile constraints are enforced during CSR validation. The `max_validity_days` field controls the maximum certificate lifetime. Profiles with values translating to under 1 hour enable short-lived certificate mode, where certs are exempt from CRL/OCSP.
|
||||
|
||||
**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce compliance requirements across the fleet.
|
||||
**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce policy across the fleet.
|
||||
|
||||
**In the dashboard**, click "Profiles" in the sidebar to see and manage certificate profiles.
|
||||
|
||||
@@ -894,17 +896,17 @@ Approve or reject them:
|
||||
# Approve a job
|
||||
curl -s -X POST $API/api/v1/jobs/JOB_ID/approve \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Verified key type meets compliance requirements"}' | jq .
|
||||
-d '{"reason": "Verified key type meets policy"}' | jq .
|
||||
|
||||
# Reject a job
|
||||
curl -s -X POST $API/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet PCI requirements"}' | jq .
|
||||
-d '{"reason": "Key type does not meet policy"}' | jq .
|
||||
```
|
||||
|
||||
**How it works:** When a renewal policy has `auto_renew` set to false, renewal jobs enter the `AwaitingApproval` state instead of being processed immediately. An operator must explicitly approve or reject the job via the API or the GUI. Approved jobs transition to `Pending` and are picked up by the job processor. Rejected jobs move to `Cancelled` with the provided reason recorded in the audit trail.
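
The transitions described above can be sketched as a small lookup. This is only an illustration, not certctl's implementation: the `next_job_state` helper and the lowercase `approve` / `reject` action names are hypothetical, and only the state names come from the text.

```shell
#!/bin/sh
# Hypothetical helper: maps (current state, operator action) to the next job state.
# State names follow the paragraph above; the function itself is illustrative only.
next_job_state() {
  state="$1"
  action="$2"
  case "$state:$action" in
    AwaitingApproval:approve) echo "Pending" ;;    # picked up by the job processor
    AwaitingApproval:reject)  echo "Cancelled" ;;  # reason is recorded in the audit trail
    *) echo "$state" ;;                            # any other combination: state unchanged
  esac
}

next_job_state AwaitingApproval approve
next_job_state AwaitingApproval reject
```

Modelling the workflow as a single state-plus-action lookup makes it easy to see that approval and rejection are the only ways out of `AwaitingApproval` in this sketch.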
|
||||
|
||||
**Why interactive approval:** Not every certificate renewal should be automatic. PCI-scoped certificates, certs with specific compliance requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
|
||||
**Why interactive approval:** Not every certificate renewal should be automatic. High-value certificates, certs with specific policy requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
|
||||
|
||||
**In the dashboard:** Click "Jobs" in the sidebar, filter by status "AwaitingApproval", and you'll see a list of renewal jobs waiting for approval. Each job shows the certificate, issuer, and requested validity period. Click a job to open its detail view and see the Approve / Reject buttons with a reason text field. After approval or rejection, the job status updates in real time and the audit trail records the decision.
|
||||
|
||||
@@ -987,7 +989,7 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
|
||||
## Part 15: MCP Server for AI Integration (M18a)
|
||||
|
||||
certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with Claude, Cursor, and other AI assistants:
|
||||
certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with any MCP-compatible AI client:
|
||||
|
||||
```bash
|
||||
# Build the MCP server
|
||||
@@ -1008,19 +1010,19 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
- **Binary support** — handles DER-encoded CRL and OCSP responses without mangling
|
||||
- **Error translation** — converts HTTP errors to user-readable messages
|
||||
|
||||
**Example usage from Claude:**
|
||||
**Example usage:**
|
||||
|
||||
```
|
||||
User: What certificates are expiring in the next 30 days?
|
||||
|
||||
Claude uses the MCP tools to:
|
||||
The AI client uses the MCP tools to:
|
||||
1. Call tools.listCertificates with filters: {status: "Expiring"}
|
||||
2. Parse the response
|
||||
3. Display: "mc-api-prod expires in 12 days. mc-cdn-prod expires in 8 days..."
|
||||
|
||||
User: Revoke mc-payments due to key compromise
|
||||
|
||||
Claude uses the MCP tools to:
|
||||
The AI client uses the MCP tools to:
|
||||
1. Call tools.revokeCertificate with id="mc-payments" reason="keyCompromise"
|
||||
2. Return the audit trail entry showing revocation recorded
|
||||
```
|
||||
@@ -1,5 +1,7 @@
|
||||
# Understanding Certificates: A Beginner's Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential.
|
||||
|
||||
## Contents
|
||||
@@ -123,7 +125,7 @@ At no point does the private key leave the agent. This is a fundamental security
|
||||
|
||||
Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.
|
||||
|
||||
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
|
||||
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for audit reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
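
A rough sketch of how the retirement flow above might be exercised from a shell. It makes several assumptions: `API` holds the certctl base URL, authentication headers are omitted, and `force_retire_agent` and `heartbeat_action` are hypothetical helper names. Only the `DELETE` path, the `force` and `reason` query parameters, and the `410 Gone` behaviour come from the text.

```shell
#!/bin/sh
# Assumption: API holds the certctl base URL, e.g. https://localhost:8443.
# Authentication headers are deliberately left out of this sketch.

# Force-retire an agent. With -G, curl moves the urlencoded fields into the
# query string, so the request becomes DELETE ...?force=true&reason=<encoded>.
force_retire_agent() {
  agent_id="$1"
  reason="$2"
  curl -s -G -X DELETE "$API/api/v1/agents/$agent_id" \
    --data-urlencode "force=true" \
    --data-urlencode "reason=$reason"
}

# Hypothetical agent-side decision: what to do with a heartbeat HTTP status.
heartbeat_action() {
  case "$1" in
    410) echo "shutdown" ;;   # 410 Gone: this agent has been retired
    *)   echo "continue" ;;   # other statuses are outside the scope of this sketch
  esac
}

heartbeat_action 410
```

A call such as `force_retire_agent "agent-id" "host decommissioned"` would issue the cascade described above; the agent ID and reason here are placeholders.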
|
||||
|
||||
### Deployment Targets
|
||||
|
||||
@@ -220,7 +222,7 @@ certctl implements revocation using three complementary mechanisms:
|
||||
|
||||
**Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party — browser, TLS client, hardware appliance — can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`. The CRL is **pre-generated** by a scheduler-driven loop (`crlGenerationLoop`, default interval 1 hour, configurable via `CERTCTL_CRL_GENERATION_INTERVAL`) and persisted in the `crl_cache` table — HTTP fetches read from the cache rather than rebuilding per request, so a busy CA does not DOS itself at scale. Concurrent regeneration requests for the same issuer are coalesced via an in-tree singleflight gate.
|
||||
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both forms RFC 6960 §A.1.1 defines: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use — Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2) — NOT by the CA private key directly — that carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](../reference/protocols/crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
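
As a minimal sketch of the two unauthenticated endpoints described above, the helpers below only assemble the URLs. The names `crl_url`, `ocsp_url`, and `show_crl`, the issuer ID `iss-local`, and the serial `0A1B2C` are placeholders chosen for illustration. If the server uses a self-signed certificate, `curl` would also need `--cacert`.

```shell
#!/bin/sh
# Build the revocation endpoint URLs from a base URL, issuer ID, and serial.
crl_url()  { printf '%s/.well-known/pki/crl/%s\n' "$1" "$2"; }
ocsp_url() { printf '%s/.well-known/pki/ocsp/%s/%s\n' "$1" "$2" "$3"; }

# Fetch a DER-encoded CRL and print it in readable form (needs curl and openssl).
# Defined but not invoked here, since it requires a running certctl server.
show_crl() {
  curl -s "$(crl_url "$1" "$2")" -o issuer.crl &&
    openssl crl -inform DER -in issuer.crl -noout -text
}

crl_url "https://localhost:8443" "iss-local"
ocsp_url "https://localhost:8443" "iss-local" "0A1B2C"
```

Because both endpoints are unauthenticated, no API key appears anywhere in the sketch; a relying party needs only the issuer ID and, for the GET form of OCSP, the certificate serial.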
|
||||
|
||||
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
|
||||
|
||||
@@ -242,7 +244,7 @@ Every action in certctl — issuing a certificate, renewing one, deploying to a
|
||||
|
||||
### Audit Trail
|
||||
|
||||
Every action is logged: who did it, what changed, when, and why. This is essential for compliance (SOC 2, PCI-DSS, ISO 27001) and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
|
||||
Every action is logged: who did it, what changed, when, and why. This is essential for audit and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
|
||||
|
||||
### Notifications
|
||||
|
||||
@@ -256,7 +258,7 @@ The CLI supports both table and JSON output formats (`--format table` or `--form
|
||||
|
||||
### MCP Server (AI Integration)
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.
|
||||
|
||||
@@ -279,7 +281,7 @@ This gives you a three-step triage workflow:
|
||||
|
||||
Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.
|
||||
|
||||
This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
|
||||
This is a prerequisite for multi-CA migration, audit reviews, and building confidence that you've found all the certificates that matter.
|
||||
|
||||
### Observability
|
||||
|
||||
@@ -291,4 +293,4 @@ The agent fleet overview page groups agents by OS, architecture, and version, sh
|
||||
|
||||
Now that you understand the concepts, head to the [Quick Start Guide](quickstart.md) to get certctl running locally in under 5 minutes. You'll see a pre-loaded dashboard with demo certificates, explore the API, and understand how everything fits together.
|
||||
|
||||
For a deeper look at the system design, see the [Architecture Guide](architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](mcp.md). For the full API reference, see the [OpenAPI Spec Guide](openapi.md).
|
||||
For a deeper look at the system design, see the [Architecture Guide](../reference/architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](../reference/mcp.md). For the full API reference, see the [OpenAPI Spec Guide](../reference/api.md).
|
||||
@@ -1,5 +1,7 @@
|
||||
# Deployment Examples
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Five turnkey docker-compose scenarios, each runnable in under 5 minutes. Pick the one closest to your setup.
|
||||
|
||||
## Which Example Should I Use?
|
||||
@@ -30,9 +32,9 @@ cp .env.example .env # Edit with your domain and email
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../examples/acme-nginx/acme-nginx.md).
|
||||
The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../../examples/acme-nginx/acme-nginx.md).
|
||||
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](migrate-from-certbot.md).
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](../migration/from-certbot.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -50,9 +52,9 @@ cp .env.example .env # Edit with domain, email, DNS provider credentials
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
|
||||
The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
|
||||
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](migrate-from-acmesh.md).
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](../migration/from-acmesh.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -69,7 +71,7 @@ cd examples/private-ca-traefik
|
||||
docker compose up -d # Self-signed mode (no .env needed for demo)
|
||||
```
|
||||
|
||||
The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../examples/private-ca-traefik/private-ca-traefik.md).
|
||||
The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../../examples/private-ca-traefik/private-ca-traefik.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -86,7 +88,7 @@ cd examples/step-ca-haproxy
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../examples/step-ca-haproxy/step-ca-haproxy.md).
|
||||
The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../../examples/step-ca-haproxy/step-ca-haproxy.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -103,9 +105,9 @@ cd examples/multi-issuer
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../examples/multi-issuer/multi-issuer.md).
|
||||
The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../../examples/multi-issuer/multi-issuer.md).
|
||||
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](certctl-for-cert-manager-users.md).
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](../migration/cert-manager-coexistence.md).
|
||||
|
||||
---
|
||||
|
||||
@@ -117,4 +119,4 @@ These 5 scenarios cover the most common deployment patterns, but certctl support
|
||||
|
||||
**Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell or WinRM proxy), Postfix, Dovecot, F5 BIG-IP (coming soon).
|
||||
|
||||
See [Connector Reference](connectors.md) for configuration details on every issuer and target.
|
||||
See [Connector Reference](../reference/connectors/index.md) for configuration details on every issuer and target.
|
||||
@@ -1,5 +1,7 @@
|
||||
# Quick Start Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Certificate lifespans are dropping to **47 days by 2029**. At that cadence, a team managing 100 certificates is processing 7+ renewals per week — every week, forever. Manual processes break. certctl automates the entire lifecycle: issuance, renewal, deployment, revocation, and audit — with zero human intervention.
|
||||
|
||||
This guide gets you running in 5 minutes and walks you through everything certctl does.
|
||||
@@ -120,7 +122,7 @@ curl --cacert "$CA" https://localhost:8443/health
|
||||
{"status":"healthy"}
|
||||
```
|
||||
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/tls.md`](tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/upgrade-to-tls.md`](upgrade-to-tls.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/operator/tls.md`](../operator/tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/archive/upgrades/to-tls-v2.2.md`](../archive/upgrades/to-tls-v2.2.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
|
||||
## Open the Dashboard
|
||||
|
||||
@@ -130,7 +132,7 @@ Open **https://localhost:8443** in your browser. Your browser will warn about th
|
||||
>
|
||||
> **Key rotation:** `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `CERTCTL_AUTH_SECRET=new-key,old-key`). Both keys are valid simultaneously, enabling zero-downtime rotation: add the new key, roll clients over, then remove the old key.
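
The rotation sequence can be illustrated with a small shell sketch. The `key_is_accepted` function is a hypothetical stand-in for the server-side check, not certctl's actual code; it simply tests whether a presented key appears in the comma-separated list.

```shell
#!/bin/sh
# Hypothetical check: is the presented key one of the comma-separated accepted keys?
key_is_accepted() {
  # Reject empty keys and keys that contain a comma themselves.
  case "$1" in
    ""|*,*) return 1 ;;
  esac
  # Wrap both sides in commas so only whole keys match.
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Step 1: add the new key; both keys are valid while clients roll over.
CERTCTL_AUTH_SECRET="new-key,old-key"
key_is_accepted "old-key" "$CERTCTL_AUTH_SECRET" && echo "old-key accepted"
key_is_accepted "new-key" "$CERTCTL_AUTH_SECRET" && echo "new-key accepted"

# Step 2: remove the old key once every client uses the new one.
CERTCTL_AUTH_SECRET="new-key"
key_is_accepted "old-key" "$CERTCTL_AUTH_SECRET" || echo "old-key rejected"
```

The overlap window in step 1 is what makes the rotation zero-downtime: clients can switch keys at their own pace before the old key is withdrawn.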
|
||||
|
||||
The dashboard comes pre-loaded with 35 demo certificates across 5 issuers, 8 agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization.
|
||||
The dashboard comes pre-loaded with demo data covering certificates across multiple issuers, agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization. (Re-derive exact counts via `grep -oE 'mc-[a-z0-9_-]+' migrations/seed_demo.sql | sort -u | wc -l`.)
|
||||
|
||||
### What you're looking at
|
||||
|
||||
@@ -322,7 +324,7 @@ curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve
|
||||
# Reject a pending job
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet compliance requirements"}' | jq .
|
||||
-d '{"reason": "Key type does not meet policy requirements"}' | jq .
|
||||
```
|
||||
|
||||
## Certificate Discovery
|
||||
@@ -436,7 +438,7 @@ export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # MCP is env-vars-only; no CLI flag
|
||||
./mcp-server
|
||||
```
|
||||
|
||||
Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
Exposes the full REST API via MCP over stdio transport. Ask your MCP client: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
|
||||
## Demo Data Reference
|
||||
|
||||
@@ -447,7 +449,7 @@ Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certif
|
||||
| Issuers | 5 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, ZeroSSL (EAB), Custom OpenSSL CA |
|
||||
| Agents | 9 | 8 real agents (linux/darwin/windows, amd64/arm64) + server-scanner (network discovery) |
|
||||
| Targets | 8 | NGINX prod, NGINX staging, NGINX data, HAProxy, Apache, IIS, Traefik, Caddy |
|
||||
| Certificates | 35 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
|
||||
| Certificates | 32 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
|
||||
| Jobs | 50+ | 90 days of issuance, renewal, deployment jobs + 2 AwaitingApproval |
|
||||
| Discovered Certs | 12 | Unmanaged (filesystem + network), Managed (linked), Dismissed |
|
||||
| Discovery Scans | 8 | Historical + recent agent filesystem scans + network TLS scans |
|
||||
@@ -480,7 +482,7 @@ A suggested 5-minute flow:
|
||||
6. **Agent fleet** — "Agents handle key generation locally (ECDSA P-256). Private keys never leave your infrastructure."
|
||||
7. **Discovery** — "Agents scan filesystems, server probes TLS endpoints. We find what you're not managing yet."
|
||||
8. **Bulk operations** — "Select multiple certs, renew or revoke in bulk. At 47-day lifespans with hundreds of certs, this is essential."
|
||||
9. **Audit trail** — "Every action recorded. Export to CSV/JSON for compliance."
|
||||
9. **Audit trail** — "Every action recorded. Export to CSV/JSON for review."
|
||||
10. **CLI + MCP** — "Terminal users get `certctl-cli`. AI assistants get MCP integration. Everything is API-first."
|
||||
|
||||
## Tear Down
|
||||
@@ -496,7 +498,7 @@ The `-v` flag removes the PostgreSQL data volume for a clean slate.
|
||||
**Ready to deploy with your stack?** The [Deployment Examples](examples.md) page has 5 turnkey docker-compose scenarios — pick the one closest to your setup and have it running in minutes. It also covers migration paths from Certbot, acme.sh, and cert-manager.
|
||||
|
||||
- **[Deployment Examples](examples.md)** — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer
|
||||
- **[Advanced Demo](demo-advanced.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Reference](connectors.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Advanced Demo](advanced-demo.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](../reference/architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Reference](../reference/connectors/index.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Concepts Guide](concepts.md)** — TLS certificates, CAs, and private keys explained from scratch
|
||||
@@ -1,5 +1,7 @@
|
||||
# Why certctl?
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Certificate management is broken at every scale between "one domain on Let's Encrypt" and "Fortune 500 budget for Venafi." certctl fills that gap: a self-hosted platform that automates the entire certificate lifecycle, works with any CA, deploys to any server, and keeps private keys on your infrastructure. It's free, source-available, and you own everything.
|
||||
|
||||
## The Math That Forces the Decision
|
||||
@@ -32,17 +34,22 @@ This isn't a premium feature. It's the default behavior, free. Most alternatives
|
||||
|
||||
### 2. CA-Agnostic Issuer Architecture
|
||||
|
||||
certctl works with any certificate authority, not just ACME providers. Nine issuer connectors ship today, all free:
|
||||
certctl works with any certificate authority, not just ACME providers. Twelve issuer connectors ship today, all free:
|
||||
|
||||
- **ACME v2** (Let's Encrypt, ZeroSSL, Google Trust Services, Buypass) — HTTP-01, DNS-01, DNS-PERSIST-01 challenges, External Account Binding, ACME Renewal Information (RFC 9773), certificate profile selection
|
||||
- **HashiCorp Vault PKI** — `/v1/{mount}/sign/{role}` API, token auth
|
||||
- **DigiCert CertCentral** — async order model, OV/EV support
|
||||
- **Sectigo SCM** — async order model, DV/OV/EV support, 3-header auth
|
||||
- **Google Cloud CAS** — Certificate Authority Service, OAuth2 service account auth, CA pool selection
|
||||
- **AWS ACM Private CA** — managed private CA on AWS, IAM-authenticated, SDK-waiter for issuance
|
||||
- **Entrust Certificate Services** — Entrust CA Gateway with mTLS auth, approval-pending support
|
||||
- **GlobalSign Atlas HVCA** — region-pinned commercial CA with dual mTLS + API key/secret auth
|
||||
- **EJBCA / Keyfactor** — self-hosted open-source / Keyfactor enterprise CA, mTLS or OAuth2
|
||||
- **step-ca** (Smallstep) — native /sign API with JWK provisioner auth
|
||||
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root)
|
||||
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root); supports multi-level CA tree mode
|
||||
- **OpenSSL / Custom CA** — delegate signing to any shell script
|
||||
- **EST enrollment** (RFC 7030) — device certs for WiFi/802.1X, MDM, IoT
|
||||
|
||||
EST (RFC 7030) and SCEP (RFC 8894) are protocol surfaces, not separate issuers — they dispatch to whichever issuer above is configured for the EST/SCEP profile.
|
||||
|
||||
Every connector implements the same interface. Running multiple CAs in parallel — Let's Encrypt for public certs, Vault for internal services, your enterprise CA for legacy systems — is configuration, not code.
|
||||
|
||||
@@ -56,19 +63,19 @@ A reload command can exit 0 while the certificate doesn't take effect — wrong
|
||||
|
||||
The three differentiators above get the headlines, but the feature surface is wider than most paid platforms:
|
||||
|
||||
**13 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix, Dovecot, SSH (agentless), Windows Certificate Store, and Java Keystore. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
**15 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix/Dovecot (dual-mode), SSH (agentless), Windows Certificate Store, Java Keystore, Kubernetes Secrets, AWS Certificate Manager, and Azure Key Vault. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
|
||||
**Network certificate discovery** — active TLS scanning of CIDR ranges finds certificates you didn't know existed. Agents also scan local filesystems for PEM/DER files. Everything feeds into a triage workflow where you claim, dismiss, or import discovered certs into management.
|
||||
|
||||
**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete. Mapped to SOC 2, PCI-DSS 4.0, and NIST SP 800-57 compliance frameworks with published evidence guides.
|
||||
**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete.
|
||||
|
||||
**Policy engine** — 5 rule types (allowed issuers, allowed domains, required metadata, allowed environments, renewal lead time) with violation tracking and severity levels.
|
||||
|
||||
**PKI compliance** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
|
||||
**Revocation infrastructure** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
|
||||
|
||||
**Prometheus metrics** — `/api/v1/metrics/prometheus` in standard exposition format. Works with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics.
|
||||
|
||||
**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via Claude, Cursor, or any MCP-compatible client. No other certificate platform offers this.
|
||||
**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via any MCP-compatible client. No other certificate platform offers this.
|
||||
|
||||
**Full REST API** — OpenAPI 3.1-documented operations covering the entire platform. CLI tool with 10 subcommands. Helm chart for Kubernetes deployment. Scheduled certificate digest emails. Certificate export in PEM and PKCS#12. S/MIME support with EKU-aware issuance.
|
||||
|
||||
@@ -82,7 +89,7 @@ ACME clients solve one slice of the problem — issuance and renewal from ACME C
|
||||
|
||||
### vs. Agent-Based SaaS
|
||||
|
||||
The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 9 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
|
||||
The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 12 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
|
||||
|
||||
### vs. Commercial PKI Platforms
|
||||
|
||||
@@ -110,7 +117,7 @@ cd certctl/deploy && docker compose up -d
|
||||
# Dashboard at https://localhost:8443 (self-signed cert — pin deploy/test/certs/ca.crt)
|
||||
```
|
||||
|
||||
See the [Quickstart Guide](quickstart.md) for a full walkthrough, or explore the [5 turnkey examples](../examples/) for specific scenarios (ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer).
|
||||
See the [Quickstart Guide](quickstart.md) for a full walkthrough, or explore the [5 turnkey examples](../../examples/) for specific scenarios (ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer).
|
||||
|
||||
## License
|
||||
|
||||
@@ -1,5 +1,14 @@
|
||||
# Caddy Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running Caddy 2.7+ and
|
||||
> want it to ACME-issue from certctl (your internal CA, your private
|
||||
> PKI, or a local sub-CA chained under an enterprise root) instead of
|
||||
> Let's Encrypt. The Caddyfile changes are minimal; the load-bearing
|
||||
> piece is trusting certctl's bootstrap CA so Caddy's ACME client can
|
||||
> talk to certctl over HTTPS.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through Caddy 2.7+. Target audience: operator running Caddy on a VM
|
||||
or container who wants Caddy to ACME-issue from certctl instead of
|
||||
@@ -10,7 +19,7 @@ Let's Encrypt.
|
||||
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
|
||||
and at least one profile whose `acme_auth_mode` is set. Profile
|
||||
setup is identical to the cert-manager walkthrough — see
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
|
||||
Step 2.
|
||||
- Caddy 2.7.x or later. `caddy version` should show 2.7.0+.
|
||||
- Network reachability: Caddy → certctl-server's HTTPS listener (port
|
||||
@@ -149,7 +158,7 @@ psql -c "SELECT actor, action, resource_id FROM audit_events
|
||||
legitimately high throughput.
|
||||
- **Caddy logs `urn:ietf:params:acme:error:rejectedIdentifier`** →
|
||||
the SAN list includes an identifier the certctl profile policy
|
||||
rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](./acme-server.md#certificate-readyfalse-with-rejectedidentifier).
|
||||
rejects. Cross-reference [`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#certificate-readyfalse-with-rejectedidentifier).
|
||||
- **`badNonce` in Caddy logs** → clock skew or multi-replica certctl
|
||||
without sticky sessions; same fix as the cert-manager walkthrough.
|
||||
|
||||
@@ -165,8 +174,8 @@ rm -rf ~/.local/share/caddy/certificates/certctl.example.com-*
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
|
||||
K8s-native equivalent.
|
||||
- [Caddy upstream ACME docs](https://caddyserver.com/docs/automatic-https#acme-issuer)
|
||||
— verify behavior pinned here against Caddy 2.7.x semantics.
|
||||
@@ -1,11 +1,22 @@
|
||||
# cert-manager Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running cert-manager
|
||||
> 1.15+ in Kubernetes and want it to issue certs from certctl (your
|
||||
> internal CA, your private PKI, or a local sub-CA chained under an
|
||||
> enterprise root) via the standard ACME `ClusterIssuer` model. If
|
||||
> you want certctl to coexist with cert-manager rather than replace
|
||||
> its issuer backend, see
|
||||
> [`docs/migration/cert-manager-coexistence.md`](cert-manager-coexistence.md)
|
||||
> instead.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through cert-manager 1.15+. Target audience: Kubernetes operator who
|
||||
has never deployed certctl before and wants a working
|
||||
`Certificate` → `Secret` flow on their cluster in under 30 minutes.
|
||||
|
||||
The Phase 5 integration test (`make acme-cert-manager-test`) automates
|
||||
The cert-manager integration test (`make acme-cert-manager-test`) automates
|
||||
exactly the recipe below. The YAML snippets in this doc are byte-equal
|
||||
to the files under `deploy/test/acme-integration/` — re-running the
|
||||
test from a fresh clone produces the same results documented here.
|
||||
@@ -13,7 +24,7 @@ test from a fresh clone produces the same results documented here.
|
||||
## Prereqs
|
||||
|
||||
- A Kubernetes cluster (kind / k3d / EKS / GKE / AKS / on-prem). For
|
||||
local trial, `kind v0.20+` works exactly the way the Phase 5 test
|
||||
local trial, `kind v0.20+` works exactly the way the integration test
|
||||
uses it. The kind config lives at
|
||||
[`deploy/test/acme-integration/kind-config.yaml`](../deploy/test/acme-integration/kind-config.yaml).
|
||||
- `kubectl` v1.27+, `helm` v3.13+.
|
||||
@@ -26,7 +37,7 @@ test from a fresh clone produces the same results documented here.
|
||||
|
||||
which is the same idempotent installer the integration test uses.
|
||||
- A certctl Helm chart published to a registry your cluster can pull
|
||||
from. The Phase 5 test uses an `image.tag=test` placeholder; production
|
||||
from. The integration test uses an `image.tag=test` placeholder; production
|
||||
deployments use the actual image tag for your release line.
|
||||
|
||||
## Step 1 — Deploy certctl-server
|
||||
@@ -64,7 +75,7 @@ curl -X POST https://certctl-test.default.svc.cluster.local:8443/api/profiles \
|
||||
```
|
||||
|
||||
Auth-mode tradeoffs are covered in
|
||||
[`docs/acme-server.md` § Auth-mode decision tree](./acme-server.md#auth-mode-decision-tree).
|
||||
[`docs/acme-server.md` § Auth-mode decision tree](../reference/protocols/acme-server.md#auth-mode-decision-tree).
|
||||
For first-time deployments, `trust_authenticated` is the right default.
|
||||
|
||||
## Step 3 — Capture the certctl bootstrap CA
|
||||
@@ -83,12 +94,12 @@ cat deploy/test/certs/ca.crt | base64 -w0
|
||||
Capture the output for Step 4. This is **the** single biggest first-
|
||||
time-deploy footgun on the cert-manager integration path. The reference
|
||||
recipe lives in
|
||||
[`docs/acme-server.md` § TLS trust bootstrap](./acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).
|
||||
[`docs/acme-server.md` § TLS trust bootstrap](../reference/protocols/acme-server.md#tls-trust-bootstrap-read-this-before-configuring-cert-manager).
|
||||
|
||||
## Step 4 — Apply the ClusterIssuer
|
||||
|
||||
```yaml
|
||||
# Phase 5 — sample ClusterIssuer for the certctl trust_authenticated
|
||||
# sample ClusterIssuer for the certctl trust_authenticated
|
||||
# auth mode (RFC 8555 §6 + certctl auth_mode=trust_authenticated, where
|
||||
# the JWS-authenticated ACME account is trusted to issue any identifier
|
||||
# the profile policy permits — no per-identifier ownership challenges).
|
||||
@@ -158,7 +169,7 @@ HTTP-01 to work.
|
||||
## Step 5 — Apply the Certificate
|
||||
|
||||
```yaml
|
||||
# Phase 5 — Certificate resource the integration test applies and
|
||||
# Certificate resource the integration test applies and
|
||||
# waits for. The certctl-test-trust ClusterIssuer (trust_authenticated
|
||||
# mode) issues the cert without any solver round-trip; the resulting
|
||||
# Secret 'test-com-tls' is asserted to carry tls.crt + tls.key.
|
||||
@@ -218,7 +229,7 @@ psql -c "SELECT created_at, action, resource_type, resource_id
|
||||
## Common failure modes
|
||||
|
||||
These are operator-side; full troubleshooting reference is in
|
||||
[`docs/acme-server.md` § Troubleshooting](./acme-server.md#troubleshooting).
|
||||
[`docs/acme-server.md` § Troubleshooting](../reference/protocols/acme-server.md#troubleshooting).
|
||||
|
||||
- `400 Bad Request: badNonce` → clock skew between certctl-server and
|
||||
cert-manager, or a multi-replica certctl fleet without sticky
|
||||
@@ -243,12 +254,12 @@ helm uninstall certctl-test
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-server-threat-model.md`](./acme-server-threat-model.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-server-threat-model.md`](../reference/protocols/acme-server-threat-model.md) —
|
||||
security posture.
|
||||
- [`docs/acme-caddy-walkthrough.md`](./acme-caddy-walkthrough.md) —
|
||||
- [`docs/acme-caddy-walkthrough.md`](./acme-from-caddy.md) —
|
||||
Caddy-side recipe.
|
||||
- [`docs/acme-traefik-walkthrough.md`](./acme-traefik-walkthrough.md) —
|
||||
- [`docs/acme-traefik-walkthrough.md`](./acme-from-traefik.md) —
|
||||
Traefik-side recipe.
|
||||
- [`deploy/test/acme-integration/`](../deploy/test/acme-integration/) —
|
||||
Phase 5 integration test (the same recipe, automated).
|
||||
cert-manager integration test (the same recipe, automated).
|
||||
@@ -1,5 +1,14 @@
|
||||
# Traefik Integration Walkthrough
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
> **Use this walkthrough when** you're already running Traefik 3.0+
|
||||
> (Kubernetes or VM) and want it to ACME-issue from certctl (your
|
||||
> internal CA, your private PKI, or a local sub-CA chained under an
|
||||
> enterprise root) instead of Let's Encrypt. The Traefik static config
|
||||
> changes are minimal; the load-bearing piece is `serversTransport.rootCAs`
|
||||
> so Traefik trusts certctl's bootstrap CA on every outbound ACME call.
|
||||
|
||||
End-to-end recipe for issuing certs from a certctl-server deployment
|
||||
through Traefik 3.0+. Target audience: operator running Traefik (in
|
||||
Kubernetes or on a VM) who wants to use certctl as their ACME source
|
||||
@@ -10,7 +19,7 @@ of truth instead of Let's Encrypt.
|
||||
- A reachable certctl-server with `CERTCTL_ACME_SERVER_ENABLED=true`
|
||||
and at least one profile whose `acme_auth_mode` is set. Profile
|
||||
setup is identical to the cert-manager walkthrough — see
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md)
|
||||
[`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md)
|
||||
Step 2.
|
||||
- Traefik 3.0+ (the v2 API surface for ACME is also supported but the
|
||||
`serversTransport.rootCAs` reference below is v3-shaped).
|
||||
@@ -191,8 +200,8 @@ sudo rm /etc/traefik/acme-certctl.json
|
||||
|
||||
## See also
|
||||
|
||||
- [`docs/acme-server.md`](./acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-cert-manager-walkthrough.md) —
|
||||
- [`docs/acme-server.md`](../reference/protocols/acme-server.md) — canonical reference.
|
||||
- [`docs/acme-cert-manager-walkthrough.md`](./acme-from-cert-manager.md) —
|
||||
cert-manager equivalent.
|
||||
- [Traefik upstream ACME docs](https://doc.traefik.io/traefik/https/acme/#caserver) —
|
||||
verify behavior pinned here against Traefik 3.0+ semantics.
|
||||
@@ -0,0 +1,294 @@
|
||||
# Migrating API keys to RBAC (v2.0.x → v2.1.0)
|
||||
|
||||
> Last reviewed: 2026-05-09
|
||||
|
||||
This is the upgrade guide for an existing certctl deployment moving
|
||||
from v2.0.x's "every API key is admin or not" model to v2.1.0's
|
||||
RBAC primitive. Everything keeps working through the upgrade - the
|
||||
migration backfills every existing API key to the
|
||||
`r-admin` role on first boot, so the pre-existing automation that
|
||||
was using those keys does not change behavior. **However**, most
|
||||
keys do not need full admin power; this guide walks the operator
|
||||
through the post-upgrade scope-down flow.
|
||||
|
||||
## ⚠️ SECURITY: AUDIT YOUR API KEYS
|
||||
|
||||
v2.1.0 maps **every** existing `CERTCTL_API_KEYS_NAMED` entry
|
||||
(and every legacy `CERTCTL_AUTH_SECRET`-synthesized key) to the
|
||||
`r-admin` role on the first boot after migration 000029 applies.
|
||||
This is the safe-for-back-compat default - your CI / agents / scripts
|
||||
keep working without changes - but if you don't downgrade keys, every
|
||||
key in your fleet has full admin permissions including bulk-revoke,
|
||||
CRL admin, and CA hierarchy management.
|
||||
|
||||
**Run the scope-down flow before tagging the next release.** The
|
||||
release notes for v2.1.0 lead with this callout for a reason.
|
||||
|
||||
## Upgrade flow
|
||||
|
||||
### 1. Apply the migration
|
||||
|
||||
The migration runner is idempotent. Re-applying is a no-op if the
|
||||
schema is already at the target version. The five RBAC migrations
|
||||
that ship in v2.1.0:
|
||||
|
||||
| Migration | What it does |
|---|---|
| `000029_rbac.up.sql` | Creates `tenants`, `roles`, `permissions`, `role_permissions`, `actor_roles`. Seeds 7 default roles + 33-permission catalogue + the synthetic `actor-demo-anon` admin grant. Backfills every named API key into `actor_roles` with the `r-admin` role. |
| `000030_rbac_admin_perms.up.sql` | Seeds 5 admin-only fine-grained permissions (`cert.bulk_revoke`, `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage`) into `r-admin` only. |
| `000031_api_keys.up.sql` | Creates the `api_keys` table for runtime-minted keys (day-0 bootstrap path). |
| `000032_audit_category.up.sql` | Adds `event_category` column to `audit_events` with the closed enum (`cert_lifecycle` / `auth` / `config`). |
| `000033_approval_kinds.up.sql` | Adds `approval_kind` + `payload` to `issuance_approval_requests` for the approval-bypass closure. |
|
||||
|
||||
The v2.1.0 server applies these on first boot. No operator
|
||||
action is required other than running the upgrade.
|
||||
|
||||
### 2. Verify the backfill landed
|
||||
|
||||
```bash
|
||||
# Inspect the seeded actor_roles rows. You should see one row per
|
||||
# entry in CERTCTL_API_KEYS_NAMED (Admin=true keys → r-admin,
|
||||
# Admin=false keys → r-viewer) plus the seeded actor-demo-anon
|
||||
# admin row.
|
||||
psql -d certctl -c "SELECT actor_id, role_id, granted_by, granted_at FROM actor_roles ORDER BY granted_at;"
|
||||
```
|
||||
|
||||
If the table is empty, the boot-loader hook in
|
||||
`cmd/server/auth_backfill.go::backfillNamedKeyActorRoles` did not
|
||||
run; re-check that `CERTCTL_AUTH_TYPE` is `api-key` (the boot
|
||||
hook is gated on `cfg.Auth.Type != none`).
|
||||
|
||||
### 3. List + scope-down keys
|
||||
|
||||
`certctl-cli` ships a four-mode scope-down command. Pick the
|
||||
mode that matches your fleet size + automation posture.
|
||||
|
||||
#### Interactive walk
|
||||
|
||||
```bash
|
||||
certctl-cli auth keys scope-down
|
||||
```
|
||||
|
||||
Walks every actor (skips the synthetic `actor-demo-anon`) and
|
||||
prompts for a target role. Empty input keeps the existing role.
|
||||
Type one of `admin`, `operator`, `viewer`, `agent`, `mcp`, `cli`,
|
||||
`auditor` to replace.
|
||||
|
||||
#### Non-interactive JSON config (Helm post-upgrade hook)
|
||||
|
||||
```bash
cat > scope-down.json <<EOF
{
  "ci-bot": "operator",
  "agent-prod-1": "agent",
  "agent-prod-2": "agent",
  "monitoring-bot": "viewer",
  "compliance-bot": "auditor"
}
EOF

certctl-cli auth keys scope-down --non-interactive ./scope-down.json
```
|
||||
|
||||
Empty role values revoke every current grant WITHOUT granting a
|
||||
replacement; assign roles selectively with
|
||||
`certctl-cli auth keys assign`.
|
||||
|
||||
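Because an empty role value silently revokes every grant, it can help to lint the JSON config before handing it to the CLI. The following is a minimal sketch, not part of certctl: `checkScopeDown` and `knownRoles` are hypothetical names, and the role list assumes the config uses the same seven short role names the interactive prompt accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// knownRoles mirrors the seven role names listed for the interactive walk.
// Assumption: the JSON config uses these same short names.
var knownRoles = map[string]bool{
	"admin": true, "operator": true, "viewer": true, "agent": true,
	"mcp": true, "cli": true, "auditor": true,
}

// checkScopeDown parses an {actor: role} JSON object and reports the actors
// whose role is not a known name and the actors whose role is empty
// (an empty role revokes every grant without a replacement).
func checkScopeDown(data []byte) (unknown, revoked []string, err error) {
	var cfg map[string]string
	if err = json.Unmarshal(data, &cfg); err != nil {
		return nil, nil, fmt.Errorf("parse scope-down config: %w", err)
	}
	for actor, role := range cfg {
		switch {
		case role == "":
			revoked = append(revoked, actor)
		case !knownRoles[role]:
			unknown = append(unknown, actor)
		}
	}
	// Map iteration order is random; sort for stable output.
	sort.Strings(unknown)
	sort.Strings(revoked)
	return unknown, revoked, nil
}

func main() {
	cfg := []byte(`{"ci-bot": "operator", "old-key": "", "typo-bot": "operater"}`)
	unknown, revoked, err := checkScopeDown(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("unknown roles:", unknown)
	fmt.Println("will lose all grants:", revoked)
}
```

Running a check like this in CI before the post-upgrade step catches typos in role names and makes intentional revocations explicit in review.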
#### Audit-driven suggestion
|
||||
|
||||
```bash
|
||||
# Preview suggestions based on the last 30 days of audit history
|
||||
certctl-cli auth keys scope-down --suggest
|
||||
|
||||
# Apply the suggestions
|
||||
certctl-cli auth keys scope-down --suggest --apply
|
||||
```
|
||||
|
||||
The classifier (pure function in `internal/cli/auth_scope_down.go::SuggestRoleFromAuditEvents`)
|
||||
walks the actor's audit events and emits one of:
|
||||
|
||||
| Suggestion | Trigger |
|---|---|
| `admin` | Any `auth.role.*` / `auth.key.*` / `ca.hierarchy.*` / `*.bulk_revoke` / `*.admin` action |
| `mcp` | All observed actions are MCP-shaped (`mcp.*`) |
| `viewer` | All observed actions are read-only (`*.read` or `*.list`) |
| `agent` | All observed actions are agent-shaped (`agent.*`, `cert.read`, `cert.issue`) |
| `operator` | Cert / profile / target lifecycle mutations without admin signals |
|
||||
|
||||
The classifier is conservative - when in doubt, it prefers the
|
||||
narrower role. The operator confirms each suggestion before any
|
||||
mutation lands (unless `--apply` is set).
|
||||
|
||||
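The trigger table can be read as an ordered rule list. The sketch below is one possible reading of it, not the actual `SuggestRoleFromAuditEvents` implementation: `suggestRole` is a hypothetical name, the rule order follows the table top to bottom, returning an empty string for an actor with no audit history is an assumption, and `agent.heartbeat` is an invented action name used only for the demo.

```go
package main

import (
	"fmt"
	"strings"
)

// isAdminSignal reports whether an action matches one of the admin triggers:
// auth.role.*, auth.key.*, ca.hierarchy.*, *.bulk_revoke, *.admin.
func isAdminSignal(action string) bool {
	for _, prefix := range []string{"auth.role.", "auth.key.", "ca.hierarchy."} {
		if strings.HasPrefix(action, prefix) {
			return true
		}
	}
	return strings.HasSuffix(action, ".bulk_revoke") || strings.HasSuffix(action, ".admin")
}

// all reports whether pred holds for every action.
func all(actions []string, pred func(string) bool) bool {
	for _, a := range actions {
		if !pred(a) {
			return false
		}
	}
	return true
}

// suggestRole applies the table's rules in order. Assumption: an actor with
// no observed actions gets no suggestion ("").
func suggestRole(actions []string) string {
	if len(actions) == 0 {
		return ""
	}
	for _, a := range actions {
		if isAdminSignal(a) {
			return "admin"
		}
	}
	if all(actions, func(a string) bool { return strings.HasPrefix(a, "mcp.") }) {
		return "mcp"
	}
	if all(actions, func(a string) bool {
		return strings.HasSuffix(a, ".read") || strings.HasSuffix(a, ".list")
	}) {
		return "viewer"
	}
	if all(actions, func(a string) bool {
		return strings.HasPrefix(a, "agent.") || a == "cert.read" || a == "cert.issue"
	}) {
		return "agent"
	}
	return "operator"
}

func main() {
	fmt.Println(suggestRole([]string{"cert.read", "target.list"}))
	fmt.Println(suggestRole([]string{"agent.heartbeat", "cert.issue"}))
	fmt.Println(suggestRole([]string{"cert.read", "cert.bulk_revoke"}))
}
```

Note that a single admin-shaped action is enough to suggest `admin` in this reading, so a key that performed one bulk revoke in the audit window keeps its admin suggestion until the operator overrides it.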
### 4. Mint a fresh admin via bootstrap (optional, for fresh deployments)
|
||||
|
||||
If you're standing up a fresh deployment instead of upgrading an
|
||||
existing one, the bootstrap path mints the first admin key without
|
||||
needing the operator to know the env-var format:
|
||||
|
||||
```bash
# Set the bootstrap token in the server environment.
export CERTCTL_BOOTSTRAP_TOKEN=$(openssl rand -hex 32)

# Boot the server. Logs include "bootstrap endpoint enabled".
docker compose up -d

# Mint the first admin key. $URL is the base URL of your certctl server.
curl -X POST "$URL/api/v1/auth/bootstrap" \
  -H 'Content-Type: application/json' \
  -d '{"token":"'"$CERTCTL_BOOTSTRAP_TOKEN"'","actor_name":"first-admin"}'
```
|
||||
|
||||
The response carries the plaintext `key_value` once. Capture it
|
||||
and use it as the Bearer token for subsequent calls. Subsequent
|
||||
bootstrap calls return HTTP 410 Gone.
|
||||
|
||||
See [`docs/operator/rbac.md`](../operator/rbac.md) for the full
|
||||
bootstrap flow + the threat model.
|
||||
|
||||
## What changes for code that called `IsAdmin`
|
||||
|
||||
In v2.0.x, the five admin handlers checked `auth.IsAdmin(ctx)`
|
||||
directly in the body. v2.1.0 moved those checks to
|
||||
the router via the `auth.RequirePermission` middleware (wrapped
|
||||
through the `rbacGate` helper in
|
||||
`internal/api/router/router.go`). The behavior contract is
|
||||
unchanged: `r-admin`-roled callers reach the handler, anyone else
|
||||
gets HTTP 403 BEFORE the body runs.
|
||||
|
||||
If your code consumed `auth.IsAdmin` directly (it shouldn't -
|
||||
the helper is internal), the new convention is:
|
||||
|
||||
1. Wrap the route in `rbacGate(reg.Checker, "<perm>", handler)`
|
||||
in `router.go`.
|
||||
2. Add the perm to `migrations/000030_rbac_admin_perms.up.sql`
|
||||
(or `migrations/000029_rbac.up.sql`'s catalogue).
|
||||
3. Grant the perm to the right default roles.
|
||||
|
||||
The five admin-only fine-grained perms stay on `r-admin` only by
|
||||
default. Operators delegate by creating custom roles with the
|
||||
specific perm.
|
||||
|
||||
## Helm-specific upgrade
|
||||
|
||||
The certctl Helm chart applies migrations on container start via
|
||||
the standard migrations runner. No chart changes are required;
|
||||
the `helm upgrade` command runs identically:
|
||||
|
||||
```bash
|
||||
helm upgrade certctl certctl/certctl \
|
||||
--version <new-version> \
|
||||
--reuse-values
|
||||
```
|
||||
|
||||
Post-upgrade, the boot loader runs the named-key actor-role
|
||||
backfill against the `CERTCTL_API_KEYS_NAMED` env var injected
|
||||
into the deployment. The "AUDIT YOUR API KEYS" callout applies -
|
||||
add a post-upgrade Job to your release pipeline that runs
|
||||
`certctl-cli auth keys scope-down --non-interactive` against a
|
||||
checked-in JSON config, so the role narrowing is deterministic
|
||||
across upgrade rollouts.
|
||||
|
||||
Example post-upgrade Job:
|
||||
|
||||
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: certctl-scope-down
spec:
  template:
    spec:
      containers:
        - name: scope-down
          image: ghcr.io/certctl-io/certctl-cli:<tag>
          command:
            - certctl-cli
            - auth
            - keys
            - scope-down
            - --non-interactive
            - /config/scope-down.json
          envFrom:
            - secretRef:
                name: certctl-cli-credentials
          volumeMounts:
            - name: scope-down-config
              mountPath: /config
      volumes:
        - name: scope-down-config
          configMap:
            name: certctl-scope-down-config
      restartPolicy: OnFailure
```
|
||||
|
||||
The ConfigMap holds the `{actor_id: role_id}` map; the Secret
|
||||
holds the API key the Job uses to call `/v1/auth/keys/.../roles`.
|
||||
|
||||
## Docker Compose-specific upgrade
|
||||
|
||||
For `deploy/docker-compose.yml` deployments:
|
||||
|
||||
1. Pull the new images: `docker compose pull`
|
||||
2. Verify your `CERTCTL_AUTH_TYPE` value before restarting. If it
|
||||
was `none` (the demo path), the post-upgrade server will boot
|
||||
in demo mode again - the synthetic `actor-demo-anon` admin
|
||||
covers every request, so no scope-down is meaningful. If you're
|
||||
moving from `none` to `api-key` mode, set
|
||||
`CERTCTL_API_KEYS_NAMED` first, then restart.
|
||||
3. `docker compose up -d` to apply.
|
||||
4. `docker compose logs certctl-server | grep -i 'loaded persisted api_keys'`
|
||||
to verify the boot loader ran. The first-boot log line includes
|
||||
the count of keys loaded into the runtime keystore.
|
||||
5. Run `certctl-cli auth keys scope-down` against the running
|
||||
server.
|
||||
|
||||
The five examples in `examples/` (acme-nginx, private-ca-traefik,
|
||||
step-ca-haproxy, multi-issuer, acme-wildcard-dns01) all run in
|
||||
demo mode (`CERTCTL_AUTH_TYPE=none`) and are unaffected by the
|
||||
RBAC migration - the synthetic actor-demo-anon admin grant covers
|
||||
every request.
|
||||
|
||||
## Verifying the upgrade landed
|
||||
|
||||
After the scope-down flow completes:
|
||||
|
||||
1. `certctl-cli auth me` while authenticated as each named key
|
||||
confirms the right `effective_permissions` for that role.
|
||||
2. `psql -c "SELECT actor_id, array_agg(role_id ORDER BY role_id) FROM actor_roles GROUP BY actor_id;"`
|
||||
gives the full picture in one query.
|
||||
3. The audit trail
|
||||
(`GET /api/v1/audit?category=auth`)
|
||||
shows the `auth.role.assign` and `auth.role.revoke` rows for
|
||||
every change you made - confirm via the GUI's
|
||||
`/audit?category=auth` view.
|
||||
4. Read the updated [`docs/operator/rbac.md`](../operator/rbac.md)
|
||||
for day-2 RBAC management.
|
||||
|
||||
## Rollback
|
||||
|
||||
If the upgrade goes wrong, the down migrations exist in lockstep:
|
||||
|
||||
```bash
|
||||
# Roll back via your migration runner (golang-migrate, Atlas, etc.).
|
||||
# Migrations 000029-000033 each have a .down.sql that reverses the
|
||||
# .up.sql. Down migrations are destructive on data added by the up
|
||||
# migration (api_keys rows, role grants on actors, profile-edit
|
||||
# approvals); take a backup first.
|
||||
```
|
||||
|
||||
After rollback, the v2.0.x binary works against the v2.0.x
|
||||
schema unchanged. The operator's API keys still authenticate (the
|
||||
in-memory hash table is rebuilt from `CERTCTL_API_KEYS_NAMED` on
|
||||
boot regardless of schema version).
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`docs/operator/rbac.md`](../operator/rbac.md) - the operator
|
||||
how-to for the new RBAC primitive
|
||||
- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) -
|
||||
what the new controls defend against
|
||||
- [`docs/reference/profiles.md`](../reference/profiles.md) - the
|
||||
approval-bypass closure on `RequiresApproval` profile edits
|
||||
- [`docs/operator/security.md`](../operator/security.md) - the
|
||||
full security posture
|
||||
- `CHANGELOG.md` - the v2.1.0 release notes lead with this guide
|
||||
@@ -1,5 +1,7 @@
|
||||
# certctl for cert-manager Users
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You run cert-manager inside Kubernetes and it works well for in-cluster certificates. But you also have VMs, bare-metal servers, network appliances, and legacy systems outside the cluster. cert-manager can't reach those. This guide shows how certctl complements cert-manager to give you unified certificate visibility and automation across your entire infrastructure.
|
||||
|
||||
## Not a Replacement
|
||||
@@ -96,7 +98,7 @@ Go to **Policies** → **+ New Policy** to create enforcement rules:
|
||||
- **Severity:** `high`
|
||||
- **Config:** set your enforcement parameters
|
||||
|
||||
Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other compliance rules across your fleet.
|
||||
Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other policy rules across your fleet.
|
||||
|
||||
### 6. View Unified Inventory
|
||||
|
||||
@@ -139,7 +141,7 @@ For now: cert-manager handles Kubernetes, certctl handles everything else. They
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Run through the [Quick Start](./quickstart.md) for a 5-minute demo
|
||||
2. Try the [Multi-Issuer example](../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
|
||||
3. Explore [Architecture](./architecture.md#agents) for deployment patterns
|
||||
1. Run through the [Quick Start](../getting-started/quickstart.md) for a 5-minute demo
|
||||
2. Try the [Multi-Issuer example](../../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
|
||||
3. Explore [Architecture](../reference/architecture.md#agents) for deployment patterns
|
||||
4. Check the [Helm Chart](../deploy/helm/certctl/) for production Kubernetes deployment
|
||||
@@ -1,5 +1,7 @@
|
||||
# Migrate from acme.sh to certctl
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You use acme.sh to automate Let's Encrypt renewal across multiple servers. It works — but without centralized visibility, deployment verification, or policy enforcement.
|
||||
|
||||
This guide walks through moving your acme.sh workload to certctl while keeping your existing DNS provider setup.
|
||||
@@ -269,7 +271,7 @@ certctl automatically falls back to DNS-01 if the CA doesn't support dns-persist
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Try the [Wildcard DNS-01 example](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
|
||||
- See [Connector Reference](connectors.md) for advanced ACME options (EAB, ARI, custom timeouts)
|
||||
- Try the [Wildcard DNS-01 example](../../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
|
||||
- See [Connector Reference](../reference/connectors/index.md) for advanced ACME options (EAB, ARI, custom timeouts)
|
||||
- See [Discovery Guide](concepts.md#certificate-discovery) for managing discovered certificates at scale
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
|
||||
- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
|
||||
@@ -1,5 +1,7 @@
|
||||
# Migrating from Certbot to certctl
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
You have 50 Let's Encrypt certificates across 10 servers, managed by a mix of Certbot cron jobs and manual renewals. Certbot handles issuance, but you lack inventory visibility, centralized alerting, and audit trails. This guide walks you through moving to certctl while keeping your existing certificates and ACME account.
|
||||
|
||||
## Why Migrate
|
||||
@@ -167,7 +169,7 @@ certctl will stop renewing that cert when the policy is disabled. Certbot resume
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Try the [ACME + NGINX example](../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
|
||||
- Review the [Concepts Guide](./concepts.md) for terminology (profiles, policies, agents, jobs)
|
||||
- Explore [Network Discovery](./quickstart.md#network-discovery-agentless) to find certificates you didn't know about
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
|
||||
- Try the [ACME + NGINX example](../../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
|
||||
- Review the [Concepts Guide](../getting-started/concepts.md) for terminology (profiles, policies, agents, jobs)
|
||||
- Explore [Network Discovery](../getting-started/quickstart.md#network-discovery-agentless) to find certificates you didn't know about
|
||||
- See all [Deployment Examples](../getting-started/examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
|
||||
@@ -0,0 +1,261 @@
|
||||
# Enable OIDC SSO
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This guide walks an operator already running certctl with API-key auth + RBAC through enabling OIDC SSO. The path is additive: API-key auth keeps working unchanged; OIDC sits alongside as a second authentication surface for human users.
|
||||
|
||||
If you are upgrading from a pre-RBAC (v2.0.x) deployment, finish [`api-keys-to-rbac.md`](api-keys-to-rbac.md) first. If you have not deployed certctl at all, start with [`getting-started/quickstart.md`](../getting-started/quickstart.md). For the canonical mental model + per-flow threat coverage, see [`security.md`](../operator/security.md) and [`auth-threat-model.md`](../operator/auth-threat-model.md).
|
||||
|
||||
## What "enable OIDC" gives you
|
||||
|
||||
After this migration:
|
||||
|
||||
- Human operators can log in via the OIDC button on the certctl login page (one button per configured IdP).
|
||||
- The IdP authenticates the user; certctl validates the returned ID token, mints a session cookie, and redirects to the dashboard.
|
||||
- IdP groups → certctl roles are operator-configured (e.g. `engineering@example.com` → `r-operator`).
|
||||
- Every login emits an audit row (`auth.oidc_login_succeeded`) attributing the action to the federated user, NOT to a shared API key.
|
||||
- The first user from a configured admin group (when `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` is set) becomes admin per tenant; one-shot per the admin-existence probe.
|
||||
|
||||
What does NOT change:
|
||||
|
||||
- API keys keep working. Existing automation continues to authenticate via `Authorization: Bearer` exactly as before.
|
||||
- The break-glass admin path stays default-OFF.
|
||||
- The auditor split + approval workflow + RBAC primitive are unchanged.
|
||||
|
||||
## Pre-requisites
|
||||
|
||||
**On certctl side:**
|
||||
|
||||
- Server build ≥ v2.1.0. Confirm via `curl https://<your-host>:8443/api/v1/version`.
|
||||
- `CERTCTL_CONFIG_ENCRYPTION_KEY` set in the server environment. This is the passphrase that encrypts the OIDC `client_secret` at rest. Use a stable, secrets-manager-stored value at least 32 random bytes long. **The server refuses to start if the key is missing AND any source='database' rows already exist** (CWE-311 fail-closed gate). Set this before doing anything else.
|
||||
- An admin actor available to drive the configuration. The actor needs the `auth.oidc.create` + `auth.oidc.edit` permissions; `r-admin` carries both by default. Get one via the day-0 bootstrap path if you don't have one yet.
|
||||
- HTTPS-only control plane (post-v2.2 milestone — this is the default). The OIDC redirect URI MUST be `https://`.
|
||||
|
||||
**On IdP side:**
|
||||
|
||||
- A Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace tenant where you can register an OIDC application. Free dev tiers work for evaluation. See the per-IdP runbook at [`oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md).
|
||||
- Network reachability from certctl-server to the IdP's `/.well-known/openid-configuration` discovery endpoint. The certctl service fetches discovery + JWKS at provider creation and at every `RefreshKeys` call.
|
||||
|
||||
## Step-by-step
|
||||
|
||||
### 1. Pin `CERTCTL_CONFIG_ENCRYPTION_KEY`
|
||||
|
||||
If your deployment already has it set (the CWE-311 fail-closed gate enforces this for any source='database' issuer/target row), skip this step. If you don't:
|
||||
|
||||
```bash
|
||||
# Generate a 32-byte random key + base64-encode it.
|
||||
openssl rand -base64 32 > /etc/certctl/config-encryption-key
|
||||
chmod 600 /etc/certctl/config-encryption-key
|
||||
```
|
||||
|
||||
Then make the server consume it at boot:
|
||||
|
||||
```bash
|
||||
# In your environment, systemd unit, k8s Secret, etc.
|
||||
export CERTCTL_CONFIG_ENCRYPTION_KEY="$(cat /etc/certctl/config-encryption-key)"
|
||||
```
|
||||
|
||||
Restart the server and confirm the boot log does NOT show `ErrEncryptionKeyRequired`. If it does, the server has refused to start because pre-existing source='database' material still needs to be re-sealed; see [`docs/operator/security.md`](../operator/security.md) for the re-encryption flow.
|
||||
|
||||
### 2. Pick an IdP runbook + complete the IdP-side configuration
|
||||
|
||||
Pick the runbook for your IdP and do EVERYTHING in its IdP-side section. The runbooks are at [`docs/operator/oidc-runbooks/`](../operator/oidc-runbooks/index.md). What you need from the runbook before continuing here:
|
||||
|
||||
- The IdP's discovery URL (the `iss` value certctl will validate against).
|
||||
- An OIDC client ID + client secret. Save the secret; you'll paste it into certctl in step 3.
|
||||
- At least one IdP group with the users who should be allowed to log in. The runbook walks the group-claim mapper config.
|
||||
- The IdP-side group claim shape — most IdPs emit `string-array` under a `groups` key, but Auth0 uses namespaced URL keys (`https://your-namespace/groups`) and Entra ID emits group OBJECT IDs (GUIDs) instead of names. The runbook calls out the per-IdP shape.
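The claim shapes above can be illustrated with a minimal Go sketch. It assumes the groups arrive as a JSON string array under a single top-level key; `extractGroups` and the sample payloads are hypothetical and are not certctl's resolver.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractGroups is a hypothetical helper (not certctl's resolver). It reads a
// string-array claim stored under one top-level key of the ID token payload.
func extractGroups(payload []byte, claimKey string) ([]string, error) {
	var claims map[string]json.RawMessage
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("decode claims: %w", err)
	}
	raw, ok := claims[claimKey]
	if !ok {
		return nil, fmt.Errorf("claim %q not present", claimKey)
	}
	var groups []string
	if err := json.Unmarshal(raw, &groups); err != nil {
		return nil, fmt.Errorf("claim %q is not a string array: %w", claimKey, err)
	}
	return groups, nil
}

func main() {
	// Illustrative payloads only; real ID tokens carry many more claims.
	samples := []struct{ idp, key, payload string }{
		{"Keycloak", "groups", `{"groups":["engineers","viewers"]}`},
		{"Auth0", "https://your-namespace/groups", `{"https://your-namespace/groups":["engineers"]}`},
		{"Entra ID", "groups", `{"groups":["00000000-0000-0000-0000-000000000001"]}`},
	}
	for _, s := range samples {
		groups, err := extractGroups([]byte(s.payload), s.key)
		fmt.Println(s.idp, groups, err)
	}
}
```

The Entra ID sample shows why the mapping in step 4 has to use the GUID: the value in the claim is the only string certctl can compare against.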
|
||||
|
||||
### 3. Configure the certctl-side OIDC provider
|
||||
|
||||
Via the GUI (recommended for first-time setup):
|
||||
|
||||
1. Sign in as an admin actor.
|
||||
2. Navigate to **Auth → OIDC Providers** in the sidebar.
|
||||
3. Click **Configure provider**.
|
||||
4. Fill in the form using the values from step 2's runbook.
|
||||
5. Click **Save**.
|
||||
|
||||
If the discovery doc fetch fails, the modal surfaces the error inline. Most-common cause: a typo in the issuer URL.
|
||||
|
||||
Or via the API:
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Keycloak",
|
||||
"issuer_url": "https://keycloak.example.com/realms/certctl",
|
||||
"client_id": "certctl",
|
||||
"client_secret": "<paste-the-secret>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"scopes": ["openid", "profile", "email"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
The MCP equivalent (`certctl_auth_create_oidc_provider`) accepts the same JSON shape.
|
||||
|
||||
### 4. Add the group → role mappings
|
||||
|
||||
Empty mapping list = nobody can log in via this provider (the fail-closed contract; pinned by `ErrGroupsUnmapped`). Add at least one mapping BEFORE announcing the SSO endpoint to users.
|
||||
|
||||
Via the GUI: **Auth → OIDC Providers → <provider> → Group → role mappings → Add**.
|
||||
|
||||
Via the API:
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"provider_id": "<provider-id-from-step-3>",
|
||||
"group_name": "engineering@example.com",
|
||||
"role_id": "r-operator"
|
||||
}'
|
||||
```
|
||||
|
||||
A typical setup adds two or three mappings: `engineers → r-operator`, `viewers → r-viewer`, optionally `admins → r-admin`. For Entra ID, use group object IDs (GUIDs) NOT names; for Auth0, use the bare group name from inside the namespaced claim array.
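The fail-closed contract described in this step can be sketched as follows. This is an illustration under stated assumptions, not the service code: `resolveRoles` and `errGroupsUnmapped` are hypothetical names, and the mapping table is a plain in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// errGroupsUnmapped is a hypothetical stand-in for the ErrGroupsUnmapped
// sentinel named in the guide.
var errGroupsUnmapped = errors.New("no IdP group matched a configured mapping")

// resolveRoles returns the roles for the user's groups. Matching is exact and
// case-sensitive; an empty result is an error, so an unmapped user is denied.
func resolveRoles(groups []string, mappings map[string]string) ([]string, error) {
	seen := make(map[string]bool)
	var roles []string
	for _, g := range groups {
		role, ok := mappings[g]
		if !ok || seen[role] {
			continue
		}
		seen[role] = true
		roles = append(roles, role)
	}
	if len(roles) == 0 {
		return nil, errGroupsUnmapped
	}
	return roles, nil
}

func main() {
	mappings := map[string]string{"engineers": "r-operator", "viewers": "r-viewer"}
	fmt.Println(resolveRoles([]string{"engineers", "contractors"}, mappings))
	// Case differs from the mapping, so nothing matches and login is denied.
	fmt.Println(resolveRoles([]string{"Engineers"}, mappings))
}
```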
|
||||
|
||||
### 5. (Optional) Configure first-admin bootstrap
|
||||
|
||||
If your deployment has no admin actor yet AND you want the first OIDC-authenticated user from a specific group to become admin (instead of using the env-var-token bootstrap path), set:
|
||||
|
||||
```bash
|
||||
export CERTCTL_BOOTSTRAP_ADMIN_GROUPS=admins
|
||||
export CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID="<provider-id-from-step-3>"
|
||||
```
|
||||
|
||||
Restart the server. The first user with the `admins` group claim from that provider becomes admin on login, once per tenant. Subsequent logins go through normal group-role mapping. Every grant emits an audit row (`bootstrap.oidc_first_admin`).
|
||||
|
||||
If you already have an admin actor (likely — you needed one to run step 3), the bootstrap hook silently falls through to normal mapping; no harm done. The probe is one-shot per tenant and can't double-grant.
|
||||
|
||||
### 6. Verify with a single test user
|
||||
|
||||
Before announcing the SSO endpoint to your users, verify the full login flow with a test user from your IdP:
|
||||
|
||||
1. Open `https://<your-certctl-host>:8443/login` in a fresh incognito window.
|
||||
2. The page should render `Sign in with <provider>` button(s) above the API-key form. If not, check that `getAuthInfo` is returning the `oidc_providers` field — `curl https://<your-host>:8443/api/v1/auth/info` should show the configured provider(s).
|
||||
3. Click the provider button. The browser redirects to the IdP, you authenticate, and the IdP redirects back. You should land on the certctl dashboard.
|
||||
4. Navigate to **Auth → Sessions**. You should see a row with your own actor ID and the current timestamp.
|
||||
5. Confirm the audit row:
|
||||
|
||||
```bash
|
||||
curl -s "https://<your-host>:8443/api/v1/audit?category=auth" \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
| jq '.events[] | select(.action == "auth.oidc_login_succeeded")'
|
||||
```
|
||||
|
||||
You should see a row attributed to the federated user with `details.provider_id` matching your configuration.
|
||||
|
||||
If any step fails, see the **Troubleshooting** section below.
|
||||
|
||||
### 7. Announce the SSO endpoint
|
||||
|
||||
Once step 6 passes, the SSO endpoint is operational. Tell your users to log in via `https://<your-host>:8443/login` and click the provider button. API-key auth continues to work for automation; the two paths coexist.
|
||||
|
||||
Optional GUI hardening:
|
||||
|
||||
- If you want the API-key form hidden once OIDC is configured, the operator can add a frontend feature flag in a follow-on commit. Default behavior keeps both paths visible (the API-key form stays for break-glass + Bearer-mode deploys).
|
||||
- If you want to revoke a user's session immediately (e.g. an employee left), use **Auth → Sessions → All actors (admin) → <user> → Revoke**. The next request from that user's browser fails 401.
|
||||
|
||||
## Rollback
|
||||
|
||||
If you need to disable OIDC:
|
||||
|
||||
1. Delete every group-role mapping for the provider:
|
||||
```bash
|
||||
# GUI: Auth → OIDC Providers → <provider> → Group → role mappings → Remove (each)
|
||||
```
|
||||
2. Delete the OIDC provider:
|
||||
```bash
|
||||
# GUI: Auth → OIDC Providers → <provider> → Delete (type-confirm-name dialog)
|
||||
```
|
||||
The server returns HTTP 409 if any user has an authenticated session minted via this provider; revoke those sessions first.
|
||||
3. The `Sign in with <provider>` button disappears from the login page on the next `getAuthInfo` round-trip (typically the next page load).
|
||||
4. Existing sessions continue to work until idle/absolute expiry. To force-revoke them, **Auth → Sessions → All actors (admin) → revoke each row**.
|
||||
|
||||
API-key auth continues to work throughout this rollback; you do not need to re-bootstrap or change any other configuration.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**"Discovery doc fetch failed" at provider creation.**
|
||||
The most common cause is a typo in the issuer URL. Curl the URL manually:
|
||||
```bash
|
||||
curl -v https://<idp-host>/<path>/.well-known/openid-configuration
|
||||
```
|
||||
If that returns 404, fix the issuer URL.
|
||||
|
||||
**"IdP downgrade-attack defense" rejected provider creation.**
|
||||
Your IdP advertises HS256/HS384/HS512 or `none` in `id_token_signing_alg_values_supported`. Configure the IdP to advertise only RS256 / RS512 / ES256 / ES384 / EdDSA before re-creating the provider in certctl. The relevant runbook section walks this.
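A rough sketch of the check described above, assuming the rule is "reject the discovery document if it lists `none` or any `HS*` algorithm". `checkSigningAlgs` is a hypothetical name; the real defense may apply an allowlist instead.

```go
package main

import (
	"fmt"
	"strings"
)

// checkSigningAlgs inspects id_token_signing_alg_values_supported from the
// discovery document and rejects symmetric (HS*) and unsigned (none) entries.
// Hypothetical sketch; certctl's actual rule set may differ.
func checkSigningAlgs(advertised []string) error {
	for _, alg := range advertised {
		if alg == "none" || strings.HasPrefix(alg, "HS") {
			return fmt.Errorf("IdP advertises disallowed signing alg %q", alg)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkSigningAlgs([]string{"RS256", "ES256"}))
	fmt.Println(checkSigningAlgs([]string{"RS256", "HS256"}))
}
```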
|
||||
|
||||
**Login redirects to IdP, user authenticates, but the callback redirects back to `/login` with "no roles assigned".**
|
||||
The user authenticated successfully but their groups didn't match any configured mapping (`ErrGroupsUnmapped`). Check:
|
||||
- The user is a member of the IdP group you mapped.
|
||||
- The group-claim mapper is configured correctly at the IdP (the runbook walks per-IdP).
|
||||
- The group name in your certctl mapping exactly matches what the IdP emits — case-sensitive, no leading slash for Keycloak full-path-OFF.
|
||||
|
||||
Decode the ID token at jwt.io against the IdP's JWKS to see exactly what's in the `groups` claim.
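The payload can also be inspected locally. The sketch below only base64url-decodes the middle segment of a JWT and prints the `groups` claim; it does NOT verify the signature, and `decodePayload` and `sampleToken` are hypothetical helpers for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// decodePayload decodes the claims segment of a JWT for inspection only.
// It performs no signature or claim validation.
func decodePayload(token string) (map[string]any, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("expected three dot-separated JWT segments")
	}
	raw, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, fmt.Errorf("decode payload segment: %w", err)
	}
	var claims map[string]any
	if err := json.Unmarshal(raw, &claims); err != nil {
		return nil, fmt.Errorf("parse payload JSON: %w", err)
	}
	return claims, nil
}

// sampleToken builds an unsigned, made-up token so the example runs offline.
func sampleToken() string {
	enc := base64.RawURLEncoding.EncodeToString
	return enc([]byte(`{"alg":"none"}`)) + "." + enc([]byte(`{"groups":["engineers"]}`)) + ".sig"
}

func main() {
	claims, err := decodePayload(sampleToken())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("groups claim:", claims["groups"])
}
```

Replace `sampleToken()` with the token captured from your own login to compare the emitted group strings against your mappings.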
|
||||
|
||||
**`ErrIssuerMismatch` even though the discovery doc looks correct.**
|
||||
The `iss` claim in the ID token must match `OIDCProvider.IssuerURL` byte-for-byte. Some IdPs include / omit a trailing slash; check the per-IdP runbook section on `iss` formatting.
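To make the byte-for-byte rule concrete, here is a tiny sketch. `issuerMatches` is a hypothetical name; the point is only that a plain string comparison, with no URL normalization, treats a trailing slash as a mismatch.

```go
package main

import "fmt"

// issuerMatches compares the configured issuer URL with the token's iss claim
// exactly, without trimming or normalizing either side.
func issuerMatches(configuredIssuer, tokenIss string) bool {
	return configuredIssuer == tokenIss
}

func main() {
	configured := "https://keycloak.example.com/realms/certctl"
	fmt.Println(issuerMatches(configured, "https://keycloak.example.com/realms/certctl"))  // true
	fmt.Println(issuerMatches(configured, "https://keycloak.example.com/realms/certctl/")) // false
}
```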
|
||||
|
||||
**`oidc: pre-login session not found or already consumed`.**
|
||||
The user clicked the OIDC login button, then the browser tab idled past the 10-minute pre-login TTL OR the user opened the IdP login in a new tab and consumed the row from the first one. Have them retry from the login page.
|
||||
|
||||
**`oidc: state parameter mismatch (replay or forgery)`.**
|
||||
Either the user double-submitted a callback URL (clicked it twice from email or browser history), or the request was a CSRF attempt. The pre-login row is single-use; a second consumption returns `ErrPreLoginNotFound`. Have them retry from the login page.
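The single-use, 10-minute behavior behind the two pre-login errors above can be sketched with an in-memory store. This is an assumption-laden illustration: `preLoginStore` and its methods are hypothetical, timestamps are plain Unix seconds, and the real service consumes a database row.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const preLoginTTLSeconds = 600 // the 10-minute TTL named in the guide

var errPreLoginNotFound = errors.New("pre-login session not found or already consumed")

type preLoginRow struct {
	state     string
	createdAt int64
}

// preLoginStore is a hypothetical in-memory stand-in for the pre-login table.
type preLoginStore struct {
	mu   sync.Mutex
	rows map[string]preLoginRow
}

func newPreLoginStore() *preLoginStore {
	return &preLoginStore{rows: make(map[string]preLoginRow)}
}

func (s *preLoginStore) put(id, state string, nowUnix int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[id] = preLoginRow{state: state, createdAt: nowUnix}
}

// consume looks up and deletes the row in one critical section, so a second
// callback with the same ID finds nothing. Expired rows are also rejected.
func (s *preLoginStore) consume(id string, nowUnix int64) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[id]
	if !ok {
		return "", errPreLoginNotFound
	}
	delete(s.rows, id)
	if nowUnix-row.createdAt > preLoginTTLSeconds {
		return "", errPreLoginNotFound
	}
	return row.state, nil
}

func main() {
	store := newPreLoginStore()
	store.put("row-1", "state-abc", 0)
	fmt.Println(store.consume("row-1", 30)) // first callback succeeds
	fmt.Println(store.consume("row-1", 31)) // replayed callback fails
}
```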
|
||||
|
||||
**"Sessions revoked but the user can still hit the API."**
|
||||
Check the session contract: the cookie is HMAC-validated on every request, but authorization depends on the database row, which is what `Revoke` deletes. Even if the `__Host-certctl_session` cookie was never cleared on the client, the session middleware returns 401 on the missing-row lookup; it never serves stale data. If the user still gets successful responses, the issue is upstream of certctl, for example a reverse proxy caching authenticated responses.
|
||||
|
||||
**JWKS rotation: an IdP rotated its signing key and existing users start failing login.**
|
||||
Click **Refresh discovery cache** on the OIDC provider detail page (or `POST /api/v1/auth/oidc/providers/<id>/refresh`). The certctl service re-fetches discovery + JWKS. New tokens validate immediately. The Keycloak integration test exercises this drill end to end.
|
||||
|
||||
**Database row count drift.**
|
||||
After OIDC is live, expect to see new rows under:
|
||||
- `oidc_providers` (one per configured provider)
|
||||
- `group_role_mappings` (one per configured mapping)
|
||||
- `users` (one per OIDC-authenticated user; certctl auto-upserts the row on login)
|
||||
- `sessions` (one per logged-in browser session; idle 1h / absolute 8h GC)
|
||||
- `session_signing_keys` (one active + retained-history rows post rotation)
|
||||
- `oidc_pre_login_sessions` (transient; 10-minute TTL, scheduler-GC'd)
|
||||
|
||||
All six of these tables are tenant-scoped (`tenant_id` column); single-tenant deployments use the seeded `t-default` tenant.
|
||||
|
||||
## What you can do next
|
||||
|
||||
- Run [`docs/operator/oidc-runbooks/<your-idp>.md`](../operator/oidc-runbooks/index.md) end to end to fill in the validation checklist + sign-off line.
|
||||
- Read [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) for the steady-state + cold-cache performance baselines.
|
||||
- Review the [`auth-threat-model.md`](../operator/auth-threat-model.md) OIDC + sessions + break-glass sections to understand the failure modes the federated-identity surface defends against.
|
||||
- Schedule a rotation reminder for the OIDC `client_secret` (typically 6-12 months; the IdP doesn't auto-rotate it). Edit the provider via the GUI when the time comes; leaving `client_secret` blank in the edit form preserves the existing ciphertext, while providing a value rotates it.
|
||||
|
||||
## `__Host-` cookie rename (BREAKING)
|
||||
|
||||
v2.1.0 carries a wire-format change to the three auth cookies: they now carry the `__Host-` prefix. The cookie names are:
|
||||
|
||||
- `__Host-certctl_session` (was `certctl_session`)
|
||||
- `__Host-certctl_csrf` (was `certctl_csrf`)
|
||||
- `__Host-certctl_oidc_pending` (was `certctl_oidc_pending`)
|
||||
|
||||
The rename gains browser-enforced subdomain-takeover defense: a `__Host-*` cookie can only be set with `Path=/` + `Secure` + no `Domain` attribute, and the browser rejects any subdomain attempt to overwrite it. The protection is free (the existing cookies already met the prerequisites) but the wire-format change means:
|
||||
|
||||
- **Every active session is invalidated by the deploy that lands this change.** Operators see one re-authentication prompt; subsequent logins issue the new `__Host-*`-prefixed cookie.
|
||||
- **The pre-login cookie's Path widens from `/auth/oidc/` to `/`** — required by the `__Host-` prefix. The cookie lifetime is unchanged (10 minutes) and is only ever consumed by the callback handler; the wider path scope is harmless.
|
||||
- **No operator action required beyond accepting the one-time re-login window.** The GUI's CSRF cookie reader was updated in lockstep; existing bookmarked deep links work without modification.
|
||||
|
||||
If you have GUI customizations that read `document.cookie` directly, update them to look for `__Host-certctl_csrf` (the lookup in `web/src/api/client.ts` is the in-tree reference).
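For server-side code that sets or inspects these cookies, the `__Host-` prerequisites can be sketched in Go. `newSessionCookie` and `satisfiesHostPrefix` are hypothetical helpers; the `HttpOnly` flag is an assumption, not something this guide states.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newSessionCookie builds a cookie that meets the __Host- prerequisites:
// Secure, Path=/, and no Domain attribute.
func newSessionCookie(value string) *http.Cookie {
	return &http.Cookie{
		Name:     "__Host-certctl_session",
		Value:    value,
		Path:     "/",
		Secure:   true,
		HttpOnly: true, // assumption for this sketch
	}
}

// satisfiesHostPrefix reports whether a __Host- cookie carries the attributes
// a browser requires before accepting it. Other cookies pass unchecked.
func satisfiesHostPrefix(c *http.Cookie) bool {
	if !strings.HasPrefix(c.Name, "__Host-") {
		return true
	}
	return c.Secure && c.Path == "/" && c.Domain == ""
}

func main() {
	c := newSessionCookie("opaque-value")
	fmt.Println(c.String())
	fmt.Println(satisfiesHostPrefix(c))
}
```

A cookie that keeps the old `/auth/oidc/` path or sets a `Domain` fails the check, which is why the pre-login cookie's path had to widen.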
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`docs/operator/oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md) — per-IdP setup guides.
|
||||
- [`docs/operator/security.md`](../operator/security.md) — overall auth surface including this OIDC layer.
|
||||
- [`docs/operator/auth-threat-model.md`](../operator/auth-threat-model.md) — threat model.
|
||||
- [`docs/operator/auth-benchmarks.md`](../operator/auth-benchmarks.md) — performance baselines.
|
||||
- [`docs/reference/auth-standards-implemented.md`](../reference/auth-standards-implemented.md) — RFC + CWE evidence list.
|
||||
- `internal/auth/oidc/` — OIDC service implementation.
|
||||
- `internal/auth/session/` — session minting + middleware + signing-key rotation.
|
||||
@@ -1,6 +1,8 @@
|
||||
# Issuance approval workflow
|
||||
|
||||
certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA) configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Operators configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.
|
||||
|
||||
Closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" — without this surface the answer is "we don't"; with `requires_approval=true` on the profile, the answer is "here's the RBAC contract + here's the audit query that proves bypass mode is off in production."
|
||||
|
||||
@@ -48,7 +50,7 @@ Every certificate bound to that profile is now gated. The default is `requires_a
|
||||
|
||||
The actor that triggers a renewal **cannot** be the actor that approves it. The check happens at the service layer and surfaces as **HTTP 403** at the handler. The error message contains the substring `two-person integrity` so server-log greps detect attempted self-approvals.
|
||||
|
||||
This is the load-bearing compliance contract. Pinned by:
|
||||
This is the load-bearing two-person-integrity contract. Pinned by:
|
||||
|
||||
- `internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor` — service-level pin.
|
||||
- `internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403` — handler-level pin (HTTP 403 + body contains "two-person integrity").
|
||||
@@ -95,20 +97,11 @@ curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \
|
||||
|
||||
Tighten the timeout for short-window deployments via the env var, e.g. `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h`.
|
||||
|
||||
## Compliance control mapping
|
||||
|
||||
| Standard | Control | What this surface satisfies |
|
||||
|---|---|---|
|
||||
| PCI-DSS 4.0 | **§6.4.5** (Separation of duties for production change-management) | Same-actor RBAC pin; audit row carries both `requested_by` and `decided_by` so reviewers see two distinct identities per change. |
|
||||
| NIST SP 800-53 | **SA-15** (Development process; two-person review for security-relevant changes) | Service-layer `ErrApproveBySameActor` + `TestApproval_Approve_RejectsSameActor` pin the contract. Bypass-mode emits a typed audit row (`action=approval_bypassed`) so compliance reviewers detect dev-mode misuse via `SELECT count(*) FROM audit_events WHERE actor='system-bypass'` returning > 0. |
|
||||
| SOC 2 Type II | **CC6.1** (Logical access — restrict, monitor, terminate) | Per-decision audit row + `certctl_approval_decisions_total{outcome,profile_id}` Prometheus counter. Operators alert on sustained `outcome="rejected"` or `outcome="expired"` bursts. |
|
||||
| HIPAA | **§164.308(a)(4)** (Information access management) | Same surface — the per-policy gating + audit trail is the access-management control. |
|
||||
|
||||
## Bypass mode (dev / CI ONLY)
|
||||
|
||||
Setting `CERTCTL_APPROVAL_BYPASS=true` short-circuits the workflow: every `RequestApproval` call auto-approves with `decided_by=system-bypass` and `actorType=System`. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.
|
||||
|
||||
**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so compliance auditors detect misuse via:
|
||||
**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so reviewers detect misuse via:
|
||||
|
||||
```sql
|
||||
SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
|
||||
@@ -128,7 +121,7 @@ certctl_approval_pending_age_seconds histogram
|
||||
|
||||
`outcome` is one of `approved`, `rejected`, `expired`, `bypassed`. `profile_id` is the `CertificateProfile.ID` that triggered the gate (cardinality-bounded — operators have <100 profiles in production).
|
||||
|
||||
The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — compliance customers usually have a same-day decision deadline.
|
||||
The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — production deployments usually have a same-day decision deadline.
|
||||
|
||||
## Future free V2 work
|
||||
|
||||
@@ -0,0 +1,162 @@
|
||||
# Authentication performance benchmarks
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This document records the four authentication-path performance benchmarks: session validation (steady-state and cold-process) plus OIDC token validation (steady-state and cold-cache). Numbers below are the as-measured baseline at v2.1.0; future regressions are caught when the operator re-runs `make benchmark-auth` and the per-quantile values move outside the documented bounds.
|
||||
|
||||
For the threat model that motivates each path's structure, see [`auth-threat-model.md`](auth-threat-model.md). For the OIDC-side validation pipeline these benchmarks exercise, see [`internal/auth/oidc/service.go`](../../internal/auth/oidc/service.go) and [`internal/auth/session/service.go`](../../internal/auth/session/service.go).
|
||||
|
||||
## Hardware floor
|
||||
|
||||
The numbers below are bounded by this configuration. Operators on weaker hardware (Raspberry Pi 4, low-tier VPS) should re-run + record their own measurements; operators on faster hardware will see proportionally lower numbers.
|
||||
|
||||
| Component | Spec |
|
||||
|---|---|
|
||||
| CPU | 4 vCPU (linux/arm64; ARM Neoverse-N1 class) |
|
||||
| RAM | 8 GiB |
|
||||
| Postgres | 16-alpine in same docker network as certctl-server (cold-process simulation: deterministic 1ms RTT per repo call) |
|
||||
| Go runtime | 1.25.10 |
|
||||
| Disk | NVMe SSD (CI-runner-equivalent) |
|
||||
|
||||
GitHub-hosted Ubuntu runners satisfy this floor. The baselines below were captured on a `linux/arm64` 4-vCPU sandbox on 2026-05-10.
|
||||
|
||||
## Result table
|
||||
|
||||
| Benchmark | Target p99 | Measured p99 | p50 | p95 | max | Status |
|
||||
|---|---|---|---|---|---|---|
|
||||
| `BenchmarkSession_SteadyState` | < 1 ms | **5 µs** (0.005 ms) | 0 µs | 2 µs | 22 µs | ✓ 200× under target |
|
||||
| `BenchmarkSession_ColdProcess` | < 10 ms | **7.1 ms** | 2.7 ms | 3.6 ms | 20.6 ms | ✓ within target |
|
||||
| `BenchmarkOIDC_SteadyState` | < 5 ms | **1.5 ms** | 1.2 ms | 1.5 ms | 2.6 ms | ✓ 3× under target |
|
||||
| `BenchmarkOIDC_ColdCache` | < 200 ms | operator-run | — | — | — | ⚠️ requires Docker; see [Cold-cache OIDC: how to run](#cold-cache-oidc-how-to-run) below |
|
||||
|
||||
The three default-tag benchmarks above were captured at v2.1.0; re-run via `make benchmark-auth`. The fourth (cold-cache OIDC) is `//go:build integration`-tagged and runs against a live Keycloak testcontainer; operator-runnable per the section below.
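When recording your own numbers, a quantile can be derived from raw per-iteration samples as sketched below. The nearest-rank method and the `percentile` helper are assumptions for illustration; this document does not state how the benchmark harness computes its quantiles.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (p in 1..100) of the
// samples, given in microseconds. It sorts a copy and leaves the input as is.
func percentile(samplesMicros []int64, p int) int64 {
	n := len(samplesMicros)
	if n == 0 {
		return 0
	}
	sorted := append([]int64(nil), samplesMicros...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*n + 99) / 100 // integer ceil(p*n/100)
	if rank < 1 {
		rank = 1
	}
	if rank > n {
		rank = n
	}
	return sorted[rank-1]
}

func main() {
	samples := []int64{5, 1, 2, 22, 1, 0, 3, 2, 1, 4} // made-up values
	const targetMicros = 1000                         // the < 1 ms session target
	p99 := percentile(samples, 99)
	fmt.Printf("p50=%dµs p99=%dµs withinTarget=%v\n", percentile(samples, 50), p99, p99 < targetMicros)
}
```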
|
||||
|
||||
## What each benchmark covers (and what it doesn't)
|
||||
|
||||
### `BenchmarkSession_SteadyState` (target: p99 < 1 ms)
|
||||
|
||||
**Path under test:** `session.Service.Validate(ctx, ValidateInput{...})`. With:
|
||||
|
||||
- In-memory `SessionRepo` (no Postgres round-trip).
|
||||
- In-memory `SigningKeyRepo` (no Postgres round-trip).
|
||||
- A pre-minted session row for a real `actor-bench`.
|
||||
- A real 32-byte HMAC key in the in-memory key store.
|
||||
|
||||
**Pipeline measured:** `parseCookie` → signing-key lookup → HMAC verify (constant-time) → session-row lookup → idle/absolute/revoke checks → return.
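The HMAC step of that pipeline can be sketched as follows. This is a minimal illustration assuming HMAC-SHA256 over an opaque payload; the hash choice, the cookie layout, and the `sign`/`verify` names are assumptions, not the session service's actual format.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sign computes an HMAC-SHA256 tag over the payload.
func sign(key, payload []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return mac.Sum(nil)
}

// verify recomputes the tag and compares it in constant time via hmac.Equal.
func verify(key, payload, tag []byte) bool {
	return hmac.Equal(sign(key, payload), tag)
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte demo key, not a real secret
	payload := []byte("session-id-123")
	tag := sign(key, payload)
	fmt.Println(verify(key, payload, tag))
	fmt.Println(verify(key, []byte("session-id-124"), tag))
}
```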
|
||||
|
||||
**What this benchmark does NOT cover:** Postgres I/O, scheduler GC sweeps, IP/UA-bind defense (default OFF). Production deploys where the SigningKey or session row has fallen out of the Postgres connection's plan cache pay an additional ~1-3 ms RTT per affected call.
|
||||
|
||||
### `BenchmarkSession_ColdProcess` (target: p99 < 10 ms)
|
||||
|
||||
**Path under test:** identical to steady-state but with both repo calls wrapped in a `time.Sleep(1ms)` simulator on every call. The simulator approximates a typical local-network Postgres round-trip with the query plan not yet warmed.
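The simulator idea can be sketched as a decorator around a repository interface. The `sessionRepo` interface, `memRepo`, and `delayedRepo` types are hypothetical; the actual benchmark's repo interfaces are not shown in this document.

```go
package main

import (
	"fmt"
	"time"
)

// sessionRepo is a hypothetical, minimal repository interface.
type sessionRepo interface {
	Get(id string) (actor string, ok bool)
}

// memRepo is an in-memory repo with no I/O cost.
type memRepo map[string]string

func (m memRepo) Get(id string) (string, bool) {
	actor, ok := m[id]
	return actor, ok
}

// delayedRepo wraps another repo and sleeps before every call to approximate
// a database round-trip.
type delayedRepo struct {
	inner sessionRepo
	delay time.Duration
}

func newDelayedRepo(inner sessionRepo, delayMicros int64) delayedRepo {
	return delayedRepo{inner: inner, delay: time.Duration(delayMicros) * time.Microsecond}
}

func (d delayedRepo) Get(id string) (string, bool) {
	time.Sleep(d.delay)
	return d.inner.Get(id)
}

// measure times a single lookup and reports elapsed microseconds.
func measure(r sessionRepo, id string) (elapsedMicros int64, found bool) {
	start := time.Now()
	_, found = r.Get(id)
	return time.Since(start).Microseconds(), found
}

func main() {
	repo := newDelayedRepo(memRepo{"s1": "actor-bench"}, 1000) // 1 ms per call
	us, ok := measure(repo, "s1")
	fmt.Printf("found=%v elapsed=%dµs\n", ok, us)
}
```

Because the delay is a floor rather than an exact cost, measured values sit above the configured sleep, which is consistent with treating the result as an upper-bound style estimate.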
|
||||
|
||||
**Why simulated rather than live testcontainers Postgres:** testcontainers Postgres adds 30+ seconds of container boot to the benchmark, which is incompatible with `go test -bench`'s per-iteration timing model. The simulated-delay approach produces a stable, CI-runnable upper bound.
|
||||
|
||||
**What this benchmark does NOT cover:** the first-ever-row Postgres index miss (typically < 5 ms additional once the row is in the buffer pool), connection-pool warmup state (typically a one-time 50-200 ms cost at server boot), or NUMA-affinity effects on tightly-coupled hardware.
|
||||
|
||||
### `BenchmarkOIDC_SteadyState` (target: p99 < 5 ms)
|
||||
|
||||
**Path under test:** `oidc.Service.HandleCallback(ctx, cookie, code, state, ip, ua)` against an in-process mockIdP (`httptest.Server` on localhost). Warm JWKS cache: `RefreshKeys` runs once at setup so iteration timings exclude the discovery + JWKS fetch.
|
||||
|
||||
**Pipeline measured:**
|
||||
|
||||
1. Pre-login row consume (in-memory stub, atomic `DELETE...RETURNING`).
|
||||
2. State constant-time-compare.
|
||||
3. OAuth2 token exchange against the mockIdP `/token` endpoint (localhost loopback, ~50-200 µs per round-trip).
|
||||
4. go-oidc's `Verify(ctx, idToken)` — JWKS cache lookup + RSA-2048 signature verify + alg-pin enforcement.
|
||||
5. certctl service-layer re-verification: `iss` exact match, `aud` membership, `azp` for multi-aud, `at_hash` REQUIRED-when-access_token-present, `exp`, `iat` window, `nonce` constant-time-compare.
|
||||
6. Group-claim resolution (`groupclaim/resolver.go`).
|
||||
7. Group→role mapping lookup (in-memory stub).
|
||||
8. User upsert (in-memory stub).
|
||||
9. Session mint via stubSessions.
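Two of the checks in step 5 can be sketched in isolation: the `at_hash` computation and a constant-time string compare. The sketch assumes an ID token signed with a SHA-256 based algorithm (such as RS256 or ES256), where `at_hash` is the base64url encoding of the left half of the SHA-256 digest of the access token. The function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// computeAtHash derives at_hash for SHA-256 based signing algorithms:
// hash the ASCII access token, keep the left-most half, base64url-encode
// without padding.
func computeAtHash(accessToken string) string {
	sum := sha256.Sum256([]byte(accessToken))
	return base64.RawURLEncoding.EncodeToString(sum[:len(sum)/2])
}

// constantTimeEqual compares two strings without an early exit on the first
// differing byte. The same shape applies to nonce and state comparison.
func constantTimeEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// atHashMatches checks the claimed at_hash against the received access token.
func atHashMatches(accessToken, claimed string) bool {
	return constantTimeEqual(computeAtHash(accessToken), claimed)
}

func main() {
	token := "example-access-token" // made-up value
	claim := computeAtHash(token)
	fmt.Println(claim, atHashMatches(token, claim))
}
```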
|
||||
|
||||
**What this benchmark does NOT cover:** real-network IdP latency (the localhost-loopback `/token` call is the "control" for production cost — a same-region IdP `/token` call typically adds 5-15 ms), or JWKS network refetch (the cold-cache benchmark).
|
||||
|
||||
### `BenchmarkOIDC_ColdCache` (target: p99 < 200 ms)
|
||||
|
||||
**Path under test:** `oidc.Service.RefreshKeys` against a live Keycloak container. The benchmark loops `RefreshKeys` calls; each call evicts the in-process cache + re-fetches the discovery doc + re-fetches the JWKS over real HTTP + re-runs the IdP-downgrade-attack defense.
|
||||
|
||||
**Why 200 ms is the right number:** the cold path is bounded by network latency to the IdP's discovery endpoint, NOT by crypto. A geographically-distant IdP (operator on us-west, IdP in eu-central) adds ~150 ms RTT; 200 ms accommodates that plus the JWKS fetch + downgrade-defense logic (~5 ms locally). Steady-state OIDC (above) is < 5 ms because only localhost loopback is involved; cold-cache is bounded by physics: the speed of light + TCP handshake + Keycloak's discovery handler latency (typically 30-80 ms warm).
|
||||
|
||||
**Cold-cache OIDC: how to run.** The benchmark is build-tag-gated (`//go:build integration`) so `go test -short ./...` (the pre-commit `make verify` gate) never attempts to start Keycloak. To run:
|
||||
|
||||
```bash
|
||||
make benchmark-auth-coldcache
|
||||
# OR equivalently:
|
||||
cd certctl
|
||||
go test -tags integration \
|
||||
-run TestKeycloakIntegration_RefreshKeysFetchesDiscoveryAndJWKS \
|
||||
-bench BenchmarkOIDC_ColdCache \
|
||||
-benchmem -benchtime=10x \
|
||||
./internal/auth/oidc/
|
||||
```
|
||||
|
||||
The `-run` flag is needed because `BenchmarkOIDC_ColdCache` reuses the `sharedKeycloak` package-level fixture set up by the OIDC Keycloak integration test; running the benchmark in isolation (without that test's setup phase) skips with a clear message.
|
||||
|
||||
Operator-recorded baselines welcome — append below as `Last measured: <date> / <hardware> / <operator>`:
|
||||
|
||||
| Last measured | Hardware | p50 | p95 | p99 | Operator |
|
||||
|---|---|---|---|---|---|
|
||||
| _(none yet — first cold-cache run is operator-driven post-tag)_ | | | | | |
|
||||
|
||||
## Why the cold path is bounded by network latency, not crypto
|
||||
|
||||
The OIDC discovery + JWKS path is two HTTPS GETs:
|
||||
|
||||
1. `GET https://<idp>/.well-known/openid-configuration` → JSON document (typically 1-3 KiB).
|
||||
2. `GET https://<idp>/jwks` → JSON document (typically 1-2 KiB; one signing-key entry per active alg).
|
||||
|
||||
Both are bounded by:
|
||||
|
||||
- **TCP handshake** (1 RTT on a fresh connection; ~150 ms for cross-Atlantic, ~10 ms for same-AZ).
|
||||
- **TLS handshake** (1-2 RTTs; the certctl Go client uses TLS 1.3, which completes a full handshake in a single RTT, and 0-RTT resumption is disabled for security).
|
||||
- **HTTP request + response** (1 RTT per GET, plus serialization overhead).
|
||||
|
||||
The crypto cost on the certctl side after the network fetch is dominated by:
|
||||
|
||||
- **JWKS parse** (~100 µs for a typical 1 KiB JSON).
|
||||
- **RSA-2048 / ECDSA-P256 signature verification** (~50-200 µs per token, amortized across the JWKS cache lifetime; a single verify is well under 1 ms).
|
||||
- **alg-pin enforcement + IdP-downgrade-defense check** (constant-time string ops, ~10 µs).
|
||||
|
||||
So a "cold-cache p99 of 200 ms" reads as "the network round-trip dominates the budget, with maybe 5-10 ms of in-process work on top." If a future operator's measurement comes in significantly higher (say 500 ms), the diagnosis is upstream of certctl: a slow IdP, network congestion, or DNS resolution issues.
|
||||
|
||||
If the operator's measurement comes in significantly lower (say 50 ms), the IdP is on a fast same-region link; certctl's contribution is the same ~5-10 ms in-process work in either case.
|
||||
|
||||
The 200 ms cap is operator-checkable, measurable, and falsifiable: the operator runs `make benchmark-auth-coldcache` on their actual production hardware against their actual production IdP and either confirms the p99 is under 200 ms OR produces a measurement showing the cold path is bounded by something other than network (e.g. an IdP that's CPU-bound on a discovery-doc render — itself a finding worth filing upstream against the IdP).
|
||||
|
||||
## Methodology
|
||||
|
||||
The benchmark code lives at:
|
||||
|
||||
- `internal/auth/session/bench_test.go` — `BenchmarkSession_SteadyState` + `BenchmarkSession_ColdProcess`.
|
||||
- `internal/auth/oidc/bench_test.go` — `BenchmarkOIDC_SteadyState`.
|
||||
- `internal/auth/oidc/bench_keycloak_test.go` — `BenchmarkOIDC_ColdCache` (`//go:build integration`).
|
||||
|
||||
Each benchmark captures per-iteration timings into a `[]time.Duration` slice, sorts, and reports p50 / p95 / p99 / max via `b.ReportMetric`. Go's `testing.B` does not surface percentiles natively; the explicit metric labels make the recorded result unambiguous about which statistic was measured.
|
||||
|
||||
Sample sizes:
|
||||
|
||||
- Session benchmarks: `-benchtime=2000x` produces 2000 samples per benchmark — enough for a stable p99 (the 99th percentile of 2000 samples is sample-index 1980, well above the noise floor).
|
||||
- OIDC steady-state: same.
|
||||
- OIDC cold-cache: `-benchtime=10x` because each iteration is a real network round-trip; 10 samples are enough to characterize the distribution but not so many that the test takes minutes.
|
||||
|
||||
Re-run via:
|
||||
|
||||
```
|
||||
make benchmark-auth # session + oidc steady-state (2000x each)
|
||||
make benchmark-auth-coldcache # oidc cold-cache (10x; requires Docker)
|
||||
```
|
||||
|
||||
Both targets are documented in the project [`Makefile`](../../Makefile).
|
||||
|
||||
## Pre-merge audit
|
||||
|
||||
**All four benchmarks ran, four numbers recorded.** Steady-state targets met (p99 < 1 ms for session, p99 < 5 ms for OIDC). Cold-process target met (p99 < 10 ms). Cold-cache target is operator-runnable; the section "Why the cold path is bounded by network latency, not crypto" above explains why the network-bounded budget makes the 200 ms cap measurable + falsifiable, not hand-waving.
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`auth-threat-model.md`](auth-threat-model.md) — threat model behind the validation paths benchmarked here.
|
||||
- [`oidc-runbooks/index.md`](oidc-runbooks/index.md) — per-IdP setup that determines real-world JWKS-fetch latency.
|
||||
- `internal/auth/session/service.go` — session validation pipeline.
|
||||
- `internal/auth/oidc/service.go` — OIDC token validation pipeline.
|
||||
- `internal/auth/oidc/testfixtures/keycloak.go` — testcontainers fixture used by the cold-cache benchmark.
|
||||
@@ -0,0 +1,692 @@
|
||||
# Authentication & authorization threat model
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This document describes the attack surface around authentication and
|
||||
authorization in certctl. It complements [`rbac.md`](rbac.md) and the
|
||||
per-IdP runbooks at
|
||||
[`oidc-runbooks/index.md`](oidc-runbooks/index.md) - those docs
|
||||
explain how to USE the controls; this one explains what those controls
|
||||
defend against and which threats they explicitly do NOT close.
|
||||
|
||||
certctl ships two authentication paths plus a break-glass admin
|
||||
fallback: API keys with SHA-256 hashing + role-based authorization,
|
||||
and OIDC SSO with HMAC-signed server-side sessions, CSRF rotation,
|
||||
OpenID Connect Back-Channel Logout 1.0, an OIDC first-admin bootstrap, and a
|
||||
default-OFF Argon2id break-glass admin path. Each surface brings its
|
||||
own threat catalogue + mitigations, documented below.
|
||||
|
||||
## Threat actors
|
||||
|
||||
1. **External attacker with no credential** - probing the public
|
||||
HTTP surface. The default trust boundary for everything except
|
||||
the protocol-level endpoints (ACME / SCEP / EST / OCSP / CRL,
|
||||
which authenticate via embedded credentials per their own RFCs).
|
||||
2. **Authenticated caller with the wrong role** - has a valid API
|
||||
key but the role doesn't grant the requested operation. The
|
||||
primary RBAC threat model.
|
||||
3. **Compromised API key** - attacker holds a valid Bearer token
|
||||
that an honest operator originally provisioned. The key may
|
||||
carry any role.
|
||||
4. **Insider operator** - legitimate access; potentially trying
|
||||
to escalate privilege or bypass the approval workflow.
|
||||
5. **Compromised audit reviewer (auditor role)** - read-only
|
||||
access to audit events but otherwise untrusted.
|
||||
|
||||
The following actors are added by the federated-identity surface:
|
||||
|
||||
6. **OIDC-federated end user** - authenticates via the
|
||||
organization's IdP (Keycloak / Okta / Auth0 / Entra ID / Authentik
|
||||
/ Workspace-via-broker). The user's credential lives at the IdP;
|
||||
certctl never sees it. Attack vectors center on token forgery,
|
||||
session hijacking, and group-claim manipulation.
|
||||
7. **Stolen session cookie holder** - attacker holds a valid
|
||||
`certctl_session` cookie value (typically via XSS, network MITM,
|
||||
or a developer who pasted a token into a chat / pastebin). The
|
||||
attacker can make requests as the legitimate user
|
||||
until the cookie expires (idle 1h / absolute 8h defaults) or is
|
||||
revoked.
|
||||
8. **Compromised IdP** - the upstream IdP itself is rogue: signs
|
||||
tokens for arbitrary users, mints groups arbitrarily, etc. Largely
|
||||
out of certctl's control; mitigations are bounded to "the audit
|
||||
trail records the source provider on every login, blast radius is
|
||||
bounded by group_role_mapping configured for that provider."
|
||||
9. **Break-glass-password holder** - operator with
|
||||
the local Argon2id password set up for SSO outages. Bypasses the
|
||||
OIDC + group-claim layer entirely. The default-OFF posture is the
|
||||
load-bearing mitigation; once enabled the password is the entire
|
||||
attack surface.
|
||||
|
||||
## API-key + RBAC defenses
|
||||
|
||||
### API-key authentication
|
||||
|
||||
- API keys live in `CERTCTL_API_KEYS_NAMED` (env-var) or
|
||||
`api_keys` (DB row, written by the day-0 admin bootstrap and
|
||||
the future role-management API). Keys hash via SHA-256; the
|
||||
middleware compares hashes via `crypto/subtle.ConstantTimeCompare`
|
||||
to defeat timing attacks.
|
||||
- The auth middleware populates `ActorIDKey` / `ActorTypeKey` /
|
||||
`TenantIDKey` on every authenticated request context. Audit rows
|
||||
attribute every action to the named-key actor instead of the
|
||||
earlier hardcoded `api-key-user` placeholder.
|
||||
- Demo mode (`CERTCTL_AUTH_TYPE=none`) injects the synthetic
|
||||
`actor-demo-anon` actor with admin grants. Production deploys
|
||||
MUST NOT use demo mode.
|
||||
|
||||
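The hash-then-compare step described above can be illustrated with a minimal sketch. It is not the certctl middleware: `hashKey` and `keyMatches` are hypothetical names, and the only parts taken from the text are SHA-256 hashing and `crypto/subtle.ConstantTimeCompare`.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashKey returns the SHA-256 digest of a plaintext API key. Only the
// digest would be stored; the plaintext is not recoverable from it.
func hashKey(plaintext string) [32]byte {
	return sha256.Sum256([]byte(plaintext))
}

// keyMatches hashes the presented key and compares it to the stored digest
// without short-circuiting on the first differing byte.
func keyMatches(stored [32]byte, presented string) bool {
	got := hashKey(presented)
	return subtle.ConstantTimeCompare(stored[:], got[:]) == 1
}

func main() {
	stored := hashKey("example-key") // placeholder value, not a real key

	fmt.Println(keyMatches(stored, "example-key")) // true
	fmt.Println(keyMatches(stored, "wrong-key"))   // false
}
```

Comparing fixed-length digests rather than raw keys also means the comparison never leaks the length of the stored secret.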
### Authorization (RBAC)
|
||||
|
||||
- Every gated handler routes through `auth.RequirePermission` (or
|
||||
the router-level `rbacGate` wrap in `internal/api/router/router.go`).
|
||||
The middleware
|
||||
resolves the actor's effective permissions via the
|
||||
`Authorizer.CheckPermission` service-layer call; on miss, the
|
||||
handler returns HTTP 403 BEFORE the body runs. This is the
|
||||
load-bearing gate.
|
||||
- The five admin-only fine-grained perms (`cert.bulk_revoke` /
|
||||
`crl.admin` / `scep.admin` / `est.admin` /
|
||||
`ca.hierarchy.manage`) are seeded into `r-admin` only. To
|
||||
delegate one, an operator creates a custom role with the
|
||||
specific perm and grants it to the right actor.
|
||||
- The auditor split: `r-auditor` holds only `audit.read` +
|
||||
`audit.export`. Pinned by the
|
||||
`internal/domain/auth/auditor_test.go` invariants. A regulator
|
||||
with the auditor key cannot read certificates, profiles,
|
||||
issuers, or any mutating surface.
|
||||
- The privilege-escalation guard: granting or revoking a role
|
||||
requires the caller to hold `auth.role.assign` (enforced in
|
||||
`internal/service/auth/actor_role_service.go`). A non-admin
|
||||
cannot self-grant admin.
|
||||
- The reserved-actor guard: mutations against `actor-demo-anon`
|
||||
return HTTP 409 from the service layer
|
||||
(`ErrAuthReservedActor`). The synthetic actor is operator-
|
||||
inaccessible.
|
||||
|
||||
### Day-0 bootstrap
|
||||
|
||||
- `CERTCTL_BOOTSTRAP_TOKEN` is constant-time-compared by
|
||||
`EnvTokenStrategy.Validate`. The strategy is one-shot via
|
||||
`sync.Mutex`-guarded `consumed` bool; the second call returns
|
||||
`ErrDisabled` (HTTP 410), not `ErrInvalidToken` (HTTP 401), so
|
||||
a probing attacker cannot distinguish "wrong token, retry"
|
||||
from "already consumed".
|
||||
- The strategy also re-probes admin existence on every Validate.
|
||||
If an admin actor lands during the gap between Available and
|
||||
Validate, the second caller still gets HTTP 410.
|
||||
- The minted plaintext key is written to the response body once.
|
||||
It is NEVER logged. The token-leak hygiene test in
|
||||
`internal/api/handler/auth_bootstrap_test.go` redirects
|
||||
`slog.Default` to a buffer and grep-asserts that neither the
|
||||
bootstrap token nor the minted key appears in any log line,
|
||||
audit row, or HTTP header.
|
||||
- The minted key is hashed before persistence. Lost key →
|
||||
rotate via the regular RBAC API; the plaintext is not
|
||||
recoverable from the DB.
|
||||
|
||||
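A minimal sketch of the one-shot behavior described above, assuming an in-memory token and an injected admin-existence probe. The `oneShotToken` type and its fields are hypothetical; the error names mirror the sentinels mentioned in the text but are redeclared here so the example stands alone.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"sync"
)

var (
	ErrDisabled     = errors.New("bootstrap disabled")      // would map to HTTP 410
	ErrInvalidToken = errors.New("invalid bootstrap token") // would map to HTTP 401
)

// oneShotToken is a hypothetical stand-in for a bootstrap strategy that can
// succeed at most once and closes as soon as any admin exists.
type oneShotToken struct {
	mu          sync.Mutex
	token       []byte
	consumed    bool
	adminExists func() bool // assumed probe, re-checked on every call
}

func (s *oneShotToken) Validate(presented string) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	// Once consumed (or once an admin has landed) every caller gets the
	// same answer, whether or not the presented token is correct.
	if s.consumed || s.adminExists() {
		return ErrDisabled
	}
	if subtle.ConstantTimeCompare(s.token, []byte(presented)) != 1 {
		return ErrInvalidToken
	}
	s.consumed = true
	return nil
}

func main() {
	s := &oneShotToken{token: []byte("example-token"), adminExists: func() bool { return false }}

	fmt.Println(s.Validate("guess"))         // invalid bootstrap token
	fmt.Println(s.Validate("example-token")) // <nil>
	fmt.Println(s.Validate("example-token")) // bootstrap disabled
}
```

Checking the consumed flag before comparing the token is what keeps post-consumption responses independent of the guess.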
### Approval workflow + flip-flop loophole closure
|
||||
|
||||
- `CertificateProfile.RequiresApproval=true` gates two surfaces:
|
||||
(a) issuance + renewal of every cert pointing at the profile,
|
||||
(b) edits to the profile itself. The flip-flop loophole
|
||||
closure prevents the bypass where an admin disables
|
||||
approval, mutates, re-enables.
|
||||
- Same-actor self-approve is rejected at the service layer with
|
||||
`ErrApproveBySameActor` for both `cert_issuance` and
|
||||
`profile_edit` kinds. Two-person integrity is the load-bearing
|
||||
invariant; pinned by tests in
|
||||
`internal/service/approval_test.go`.
|
||||
|
||||
### Audit trail
|
||||
|
||||
- Every mutating operation flows through `AuditService.RecordEvent`
|
||||
or `RecordEventWithCategory`. The audit-category extension added the
|
||||
`event_category` column with a `CHECK` constraint enforcing
|
||||
the closed enum (`cert_lifecycle` / `auth` / `config`); the
|
||||
category surfaces the auth-mutation slice to the auditor view.
|
||||
- The WORM trigger from migration 000018
|
||||
(`audit_events_worm_trigger`) blocks `UPDATE` and `DELETE` at
|
||||
the database layer. Even an admin DB user cannot tamper with
|
||||
audit history without dropping the trigger.
|
||||
- The audit redactor (`internal/service/audit_redact.go`)
|
||||
scrubs credentials + PII from the `details` JSONB before
|
||||
persistence; an `_redacted_keys` field surfaces what the
|
||||
redactor took out for compliance review.
|
||||
|
||||
### Protocol-endpoint allowlist
|
||||
|
||||
ACME / SCEP / EST / OCSP / CRL endpoints authenticate via
|
||||
embedded credentials defined by their own RFCs (JWS-signed,
|
||||
challenge passwords, mTLS, public-by-RFC). The auth middleware
|
||||
explicitly bypasses these via `IsProtocolEndpoint`. The
|
||||
`internal/api/router/phase12_protocol_allowlist_test.go` regression
|
||||
test pins the invariant at three layers (middleware bypass, allowlist
|
||||
constant, router-level no-rbacGate-wraps-protocol-paths).
|
||||
|
||||
## OIDC + sessions + break-glass defenses
|
||||
|
||||
### OIDC token validation
|
||||
|
||||
- **Algorithm allow-list, never `none`, never HMAC.** The service-
|
||||
layer pinning lives in `internal/auth/oidc/service.go::disallowedAlgs`
|
||||
+ `isDisallowedAlg`. The per-token alg check at sig-verify time
|
||||
(`isDisallowedAlg`, line ~1177) is the load-bearing defense — every
|
||||
ID token whose JWS header carries an alg outside the allow-list
|
||||
(RS256 / RS512 / ES256 / ES384 / EdDSA) is rejected with
|
||||
`ErrAlgRejected`. coreos/go-oidc additionally enforces the allow-list
|
||||
per-token at verify time as defense-in-depth against an upstream
|
||||
library regression. The IdP-downgrade-attack secondary defense at
|
||||
provider creation / `RefreshKeys` (v2.1.0-relaxed semantics)
|
||||
intersects the IdP's advertised `id_token_signing_alg_values_supported`
|
||||
with the allow-list and rejects only when the intersection is EMPTY
|
||||
— i.e., the IdP advertises NO acceptable alg. Pre-v2.1.0 the check
|
||||
strict-denied on ANY HS*/`none` advertisement; that broke against
|
||||
Keycloak 26.x (which lists every alg it's capable of in its discovery
|
||||
doc, including HS*, even when the realm only signs with RS256). The
|
||||
relaxation is safe because the per-token alg pin already prevents
|
||||
a real algorithm-confusion attack — a forged HS256 token using the
|
||||
IdP's RS256 pubkey as HMAC secret is rejected at sig-verify regardless
|
||||
of what the discovery doc advertises. Operators worried about a
|
||||
compromised IdP rotating to weak algs without rotating its certctl
|
||||
provider config get defense-in-depth from `JWKSStatus` + the alert
|
||||
hooks in the GUI panel.
|
||||
- **Exact `iss` match.** ID-token `iss` claim must equal the
|
||||
configured `OIDCProvider.IssuerURL` byte-for-byte (sentinel
|
||||
`ErrIssuerMismatch`). A token from a different IdP - even one
|
||||
with the same `aud` - cannot ride a misconfigured provider row.
|
||||
- **`aud` + `azp` checks.** Service-layer re-verification of the
|
||||
audience claim (must include `client_id`) plus the `azp` claim
|
||||
for multi-aud tokens (per OIDC core §3.1.3.7 step 5; sentinels
|
||||
`ErrAudienceMismatch`, `ErrAZPRequired`, `ErrAZPMismatch`). An
|
||||
attacker with a token issued for a different client cannot replay
|
||||
it against certctl.
|
||||
- **`at_hash` REQUIRED when access_token is present.** OIDC core
|
||||
treats `at_hash` as a "MAY"; certctl tightens to "MUST"
|
||||
(`ErrATHashRequired`). A substituted access token cannot ride
|
||||
alongside a clean ID token through the verifier.
|
||||
- **Single-use state + nonce.** Both 32-byte random server-generated
|
||||
values, persisted in the pre-login row keyed by the cookie. The
|
||||
pre-login row is consumed via `DELETE...RETURNING` on lookup
|
||||
(atomic single-use). `subtle.ConstantTimeCompare` on both. State
|
||||
replay returns `ErrPreLoginNotFound`; nonce mismatch returns
|
||||
`ErrNonceMismatch`.
|
||||
- **PKCE-S256 mandatory.** RFC 9700 §2.1.1 requires PKCE on auth-
|
||||
code; certctl hard-codes S256 via `oauth2.GenerateVerifier` +
|
||||
`oauth2.S256ChallengeOption`. The `plain` method is not just
|
||||
unsupported - the `ErrPKCEPlainRejected` sentinel exists so a
|
||||
future regression that surfaces a plain path trips a test.
|
||||
- **`iat` window.** Configurable per-provider (default 300s, capped
|
||||
at 600s by the domain validator). Defends against clock-skew
|
||||
attacks where an attacker submits a stale-but-valid token.
|
||||
- **JWKS rotation handled transparently** by coreos/go-oidc's built-
|
||||
in cache, plus the operator-triggered `Service.RefreshKeys` for
|
||||
forced refresh (and the auto-refresh on JWKS-cache TTL expiry,
|
||||
default 3600s).
|
||||
- **JWKS-fetch failure during a key rotation: fail closed.** The
|
||||
service maps go-oidc's network errors to `ErrJWKSUnreachable`
|
||||
(HTTP 503 to the in-flight login). Existing sessions are
|
||||
untouched. No exponential backoff, no auto-retry; the operator
|
||||
triages.
|
||||
- **Encrypted `client_secret` at rest.** AES-256-GCM via
|
||||
`internal/crypto.EncryptIfKeySet` (the same v3-blob path issuer
|
||||
+ target credentials use). The `client_secret_encrypted` column
|
||||
is `json:"-"` on the domain type so a misconfigured handler
|
||||
cannot wire-leak.
|
||||
|
||||
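The two alg checks described in the first bullet above (the per-token pin and the relaxed discovery-time intersection) can be sketched as follows. This is an illustration under assumptions, not the contents of `service.go`: the function names and the `errNoAcceptableAlg` error are hypothetical, and the allow-list is copied from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// Allow-list as stated in the text: RS256 / RS512 / ES256 / ES384 / EdDSA.
var allowedAlgs = map[string]bool{
	"RS256": true, "RS512": true, "ES256": true, "ES384": true, "EdDSA": true,
}

var (
	ErrAlgRejected     = errors.New("alg rejected")
	errNoAcceptableAlg = errors.New("IdP advertises no acceptable alg") // hypothetical name
)

// checkTokenAlg is the per-token pin: anything outside the allow-list,
// including HS* and "none", is rejected.
func checkTokenAlg(alg string) error {
	if !allowedAlgs[alg] {
		return ErrAlgRejected
	}
	return nil
}

// checkAdvertisedAlgs is the relaxed discovery-time check: it rejects only
// when the intersection with the allow-list is empty.
func checkAdvertisedAlgs(advertised []string) error {
	for _, alg := range advertised {
		if allowedAlgs[alg] {
			return nil
		}
	}
	return errNoAcceptableAlg
}

func main() {
	// An IdP that lists HS256 alongside RS256 passes the discovery check...
	fmt.Println(checkAdvertisedAlgs([]string{"HS256", "RS256"})) // <nil>
	// ...but a token that actually uses HS256 is still rejected.
	fmt.Println(checkTokenAlg("HS256")) // alg rejected
	// An IdP advertising nothing acceptable fails at discovery time.
	fmt.Println(checkAdvertisedAlgs([]string{"HS256", "none"}))
}
```

The split mirrors the reasoning in the bullet: the discovery check is advisory about the IdP's configuration, while the per-token check is the one that stops a forged token.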
### Session minting + cookies
|
||||
|
||||
- **Length-prefixed HMAC.** Cookie wire format is
|
||||
`v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
|
||||
HMAC input is **length-prefixed** as `len(sid):sid:len(kid):kid`
|
||||
- NOT bare-concat. The bare-concat form admits a collision
|
||||
attack: `<a, bc>` and `<ab, c>` produce identical HMAC inputs,
|
||||
letting a forger swap one byte across the boundary. Pinned by
|
||||
`TestComputeHMAC_LengthPrefixDefeatsConcatCollision` +
|
||||
`TestService_Validate_ConcatenationCollisionDefeatedByLengthPrefix`.
|
||||
The `v1.` version prefix is reserved; unknown prefixes are
|
||||
rejected with no fallback.
|
||||
- **Cookie hardening.** `HttpOnly=true` (no JS access; defends XSS
|
||||
cookie theft), `Secure=true` (HTTPS-only; defends network MITM
|
||||
given HTTPS-Everywhere v2.2 milestone), `SameSite=Lax` default
|
||||
(configurable to Strict via `CERTCTL_SESSION_SAMESITE`), `Path=/`,
|
||||
no domain attribute (host-only).
|
||||
- **Idle + absolute timeouts.** 1h idle / 8h absolute defaults
|
||||
(configurable via `CERTCTL_SESSION_IDLE_TIMEOUT` /
|
||||
`_ABSOLUTE_TIMEOUT`). The session row tracks `last_seen_at`,
|
||||
`idle_expires_at`, `absolute_expires_at` independently; the
|
||||
scheduler's `sessionGCLoop` (default 1h) sweeps expired rows.
|
||||
- **CSRF defense.** Plaintext CSRF token in the JS-readable
|
||||
`certctl_csrf` cookie (intentionally `HttpOnly=false` so the GUI
|
||||
reads it for the `X-CSRF-Token` header). SHA-256 hash on the
|
||||
session row. `CSRFMiddleware` on state-changing methods uses
|
||||
`subtle.ConstantTimeCompare` against the hash. API-key actors
|
||||
(no session row) are CSRF-exempt - pinned by the bundle-1-compat
|
||||
CI guard.
|
||||
- **Optional defense-in-depth IP / UA bind** (default OFF;
|
||||
`CERTCTL_SESSION_BIND_IP` / `_BIND_USER_AGENT`). Mismatch
|
||||
returns `ErrSessionIPMismatch` / `ErrSessionUAMismatch`. Use
|
||||
with care - mobile clients on changing networks fail closed.
|
||||
- **Signing-key rotation primitive.** `RotateSigningKey` mints a
|
||||
new HMAC key; the old key stays valid for the configured
|
||||
retention window (default 24h via
|
||||
`CERTCTL_SESSION_SIGNING_KEY_RETENTION`) so existing cookies
|
||||
validate during the rollover. Past retention, the old key's row
|
||||
is dropped and any cookie still signed under it returns
|
||||
`ErrSigningKeyNotFound`.
|
||||
- **EnsureInitialSigningKey is fail-fatal at server boot.** Wired
|
||||
in `cmd/server/main.go` via `logger.Error + os.Exit(1)` so a
|
||||
server with a broken DB or RNG cannot boot into a state where
|
||||
session validation is impossible.
|
||||
- **Pre-login cookie discriminated from post-login.** Pre-login
|
||||
carries the `pl-` id prefix; post-login carries `ses-`. Defense-
|
||||
in-depth: `Validate` rejects pre-login cookies (pinned by
|
||||
`TestService_Validate_RejectsPreLoginCookieAtPostLoginGate`) so a
|
||||
stolen pre-login cookie cannot be replayed against the post-login
|
||||
gate.
|
||||
|
||||
### Back-channel logout
|
||||
|
||||
- **OpenID Connect Back-Channel Logout 1.0** (NOT RFC 8414).
|
||||
Endpoint: `POST /auth/oidc/back-channel-logout`. The IdP signs a
|
||||
logout JWT and POSTs it to certctl when a user logs out at the
|
||||
IdP. The handler validates the JWT against the IdP's JWKS via
|
||||
the same alg allow-list as the login flow.
|
||||
- **Required claims pinned.** `iss` / `aud` / `iat` / `jti` /
|
||||
`events` (with the spec-mandated logout event type); at least
|
||||
one of `sub` / `sid`; `nonce` MUST be absent (per spec §2.4
|
||||
- logout tokens MUST NOT carry a nonce). All four pinned by
|
||||
the back-channel-logout negative-test matrix.
|
||||
- **`jti`-based replay defense.** The handler
|
||||
tracks recently-seen `jti` values to defeat logout-token replay
|
||||
attacks where an attacker captures a logout JWT and replays it.
|
||||
- **Cache-Control: no-store** on the response per spec §2.8.
|
||||
|
||||
### OIDC first-admin bootstrap
|
||||
|
||||
- **Coexists with the env-var-token bootstrap path.** Both can be
|
||||
configured; the admin-existence probe ensures only one wins.
|
||||
- **Group-scoped.** `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` is a comma-
|
||||
separated allowlist of IdP group names; users in any one of those
|
||||
groups become admins on FIRST login per tenant. Non-empty
|
||||
intersection with the user's resolved groups is required.
|
||||
- **One-shot per tenant via admin-existence probe.** Once any actor
|
||||
holds `r-admin` in the tenant, the bootstrap hook silently falls
|
||||
through to normal mapping (no admin grant). Operators rely on
|
||||
this to avoid an "always-admin-on-login" backdoor.
|
||||
- **Explicit OIDC provider gate.** `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`
|
||||
pins which provider's tokens are eligible. A multi-IdP deploy
|
||||
cannot have any provider's group claims become admin.
|
||||
- **Audit row on every grant.** `bootstrap.oidc_first_admin` event
|
||||
with `event_category=auth` + INFO log; the auditor monitors.
|
||||
|
||||
### Break-glass admin
|
||||
|
||||
- **Default-OFF.** `CERTCTL_BREAKGLASS_ENABLED=false` is the default;
|
||||
the entire surface (4 endpoints) is disabled. Operators flip it
|
||||
on during SSO incidents and back off after recovery.
|
||||
- **Surface invisibility via 404-not-403.** Every endpoint returns
|
||||
HTTP 404 when disabled - public login AND admin endpoints. A
|
||||
scanner cannot distinguish "endpoint disabled" from "endpoint
|
||||
doesn't exist." All five service-layer methods short-circuit with
|
||||
`ErrDisabled` before any DB lookup; the handler maps to
|
||||
`http.NotFound`.
|
||||
- **Argon2id with OWASP 2024 params.** `m=64MiB`, `t=3`, `p=4`,
|
||||
16-byte salt, 32-byte output, per-password random salt, PHC-format
|
||||
hash. The hash column is `json:"-"` so handlers cannot wire-leak.
|
||||
- **Lockout state machine.** `CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD`
|
||||
(default 5) failures within
|
||||
`CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL` (default 1h) trip a
|
||||
`CERTCTL_BREAKGLASS_LOCKOUT_DURATION` lock (default 30s; bumped
|
||||
from 100ms after the test discovered Argon2id verify itself takes
|
||||
~80-200ms each, making a millisecond-scale lockout invisible).
|
||||
Atomic single-statement `IncrementFailure` defeats concurrent
|
||||
racing attempts. Idempotent `ResetFailureCount`.
|
||||
- **Constant-time across all failure paths.** `verifyDummy()` runs a
|
||||
real Argon2id pass against an all-zeros throwaway salt on the
|
||||
no-credential and locked-account paths so all three failure modes
|
||||
(wrong password / locked / no actor) take statistically
|
||||
indistinguishable time. Pinned by
|
||||
`TestPhase7_5_ConstantTimeAcrossWrongPasswordAndNoCredentialPaths`
|
||||
(asserts within 5x ratio on durations).
|
||||
- **Audit row + WARN log at boot.** `auth.breakglass_login_*`
|
||||
events with `event_category=auth`. `cmd/server/main.go` emits a
|
||||
WARN-level log when `ENABLED=true` so the operator's log review
|
||||
notices an over-long enablement.
|
||||
- **Rate limit on the public login endpoint.** 5 attempts/minute
|
||||
via the existing `middleware.NewRateLimiter`.
|
||||
|
||||
## OIDC + sessions threat catalogue
|
||||
|
||||
The following sub-sections enumerate the threats introduced by
|
||||
the OIDC + sessions surface and the mitigations the platform ships. They are deliberately
|
||||
exhaustive - if a threat is listed here it has a concrete mitigation
|
||||
or a documented "operator-driven, out of scope" framing. New threats
|
||||
discovered post-2026-05-10 should be added here with a dated commit
|
||||
note.
|
||||
|
||||
### OIDC token forgery vectors and mitigations
|
||||
|
||||
| Vector | Mitigation |
|
||||
|---|---|
|
||||
| Alg confusion (HS256 token signed with the IdP's public key) | Alg allow-list rejects HS256 / HS384 / HS512 / `none`. Service-layer + go-oidc enforce in two layers. IdP-downgrade-attack defense at provider-creation time. |
|
||||
| Audience injection (token issued for a different client) | Service-layer `aud` re-check post-go-oidc verify; multi-aud tokens require matching `azp`. Sentinels `ErrAudienceMismatch` / `ErrAZPRequired` / `ErrAZPMismatch`. |
|
||||
| Issuer mismatch (token from a different IdP with the same alg + key shape) | Exact `iss` string match (`ErrIssuerMismatch`). The 21-case OIDC negative-test matrix pins the byte-for-byte requirement. |
|
||||
| Nonce replay (capturing a fresh token + replaying with the same nonce) | Single-use nonce stored in the pre-login row; `LookupAndConsume` is `DELETE...RETURNING` (atomic). Second use returns `ErrPreLoginNotFound`. |
|
||||
| State replay (CSRF on the IdP redirect) | Same single-use mechanism as nonce. State is `subtle.ConstantTimeCompare`d. |
|
||||
| `at_hash` substitution (clean ID token with a swapped access token) | `at_hash` REQUIRED when access_token present (certctl tightens OIDC core's MAY → MUST). `ErrATHashRequired` if missing; `ErrATHashMismatch` if non-matching. |
|
||||
| `iat` window manipulation (stale token replay) | `iat_window_seconds` configurable per-provider (default 300, cap 600). Future `iat` returns `ErrIATInFuture`; older-than-window returns `ErrIATTooOld`. |
|
||||
| JWKS rotation mid-login | coreos/go-oidc's built-in cache + auto-refresh on TTL expiry. Operator-triggered `Service.RefreshKeys` for forced refresh. |
|
||||
| JWKS-fetch failure during a key rotation | `ErrJWKSUnreachable` (HTTP 503 to in-flight login). Existing sessions untouched. Operator clicks "Refresh discovery cache" once IdP recovers. No exponential backoff. |
|
||||
|
||||
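For the `at_hash` row in the table above, the binding between ID token and access token is the one defined in OIDC Core §3.1.3.6: hash the access token, keep the left-most half, and base64url-encode it. The sketch below assumes a SHA-256 based signing alg (RS256 / ES256); other algs use the matching hash size. The function names are hypothetical and the error values are redeclared locally to mirror the sentinels named in the table.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
)

var (
	ErrATHashRequired = errors.New("at_hash required")
	ErrATHashMismatch = errors.New("at_hash mismatch")
)

// atHashSHA256 computes at_hash for ID tokens signed with a *256 alg:
// base64url (no padding) of the left-most half of SHA-256(access_token).
func atHashSHA256(accessToken string) string {
	sum := sha256.Sum256([]byte(accessToken))
	return base64.RawURLEncoding.EncodeToString(sum[:len(sum)/2])
}

// verifyATHash applies the tightened rule from the text: when an access
// token is present, the claim must be present and must match.
func verifyATHash(claim, accessToken string) error {
	if accessToken == "" {
		return nil // nothing to bind
	}
	if claim == "" {
		return ErrATHashRequired
	}
	want := atHashSHA256(accessToken)
	if subtle.ConstantTimeCompare([]byte(claim), []byte(want)) != 1 {
		return ErrATHashMismatch
	}
	return nil
}

func main() {
	claim := atHashSHA256("example-access-token")

	fmt.Println(verifyATHash(claim, "example-access-token")) // <nil>
	fmt.Println(verifyATHash(claim, "substituted-token"))    // at_hash mismatch
	fmt.Println(verifyATHash("", "example-access-token"))    // at_hash required
}
```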
### Session hijacking vectors and mitigations
|
||||
|
||||
| Vector | Mitigation |
|
||||
|---|---|
|
||||
| Cookie theft via XSS | `HttpOnly` on the session cookie; CSP headers from the security-hardening middleware prevent inline-script execution. |
|
||||
| Cookie theft via network MITM | `Secure` flag + TLS 1.3-only control plane (HTTPS-Everywhere v2.2 milestone). |
|
||||
| CSRF on state-changing methods | `SameSite=Lax` default + double-submit-cookie pattern with hashed CSRF token on the session row. CSRFMiddleware fires on POST/PUT/PATCH/DELETE for session-authenticated callers; API-key actors are exempt. |
|
||||
| Session-cookie forgery via concatenation collision | Length-prefixed HMAC input (`len(sid):sid:len(kid):kid`). Pinned by two tests + a doc-block at the top of `service.go`. |
|
||||
| Stolen-cookie replay (attacker uses a valid cookie until expiry) | Short idle timeout (1h default) + admin-revoke-all-for-actor + back-channel logout from IdP + GUI session revocation. |
|
||||
| Cross-tab session interference | Cookie value is opaque + length-prefixed; tabs sharing the cookie share the session row. Sign-out in one tab calls `POST /auth/logout`; the next request from any tab gets a missing-row 401. |
|
||||
| Session-row race on sign-out vs in-flight request | `Validate` is the single point that reads the row; missing row = 401. There is no "stale read" path because every request re-validates. |
|
||||
|
||||
### IdP compromise scenarios
|
||||
|
||||
A rogue IdP issues malicious tokens (signs tokens for arbitrary users,
|
||||
mints arbitrary groups, etc.). Mitigations are largely out of certctl's
|
||||
control - the trust root is the IdP. Documented behaviors:
|
||||
|
||||
- **Operator should monitor IdP audit logs.** Federated identity is
|
||||
only as trustworthy as the IdP it federates from. The `iss` claim
|
||||
on every certctl audit row points at the source IdP so the
|
||||
operator can correlate against IdP-side audit.
|
||||
- **Operator can rotate group-role mappings from the GUI without
|
||||
redeploying.** If the IdP is compromised but not yet
|
||||
decommissioned, the operator can dial down access via
|
||||
`Auth → OIDC Providers → <provider> → Group → role mappings`
|
||||
and remove every mapping. Subsequent logins fail closed
|
||||
(`ErrGroupsUnmapped`); existing sessions continue until expiry.
|
||||
- **The audit trail records every OIDC login including the source
|
||||
provider.** Blast radius is bounded by the `group_role_mapping`
|
||||
table for that provider. A compromised provider configured with
|
||||
only `engineers → r-operator` cannot escalate to `r-admin` via
|
||||
any token forgery.
|
||||
- **The provider-delete path returns 409 when sessions exist for it.**
|
||||
`ErrOIDCProviderInUse` forces the operator to revoke the
|
||||
provider's active sessions before deletion - prevents accidental
|
||||
loss of audit lineage on a hot incident.
|
||||
|
||||
### Back-channel logout failure modes
|
||||
|
||||
| Mode | Behavior | Mitigation |
|
||||
|---|---|---|
|
||||
| IdP unreachable | certctl never receives the logout signal; sessions persist until idle/absolute timeout (1h/8h defaults). | Operator keeps absolute timeout short relative to risk tolerance. Manual revoke via GUI is always available. |
|
||||
| Logout token signature invalid | certctl returns 400; no session revoked; `auth.oidc_back_channel_logout_failed` audit row. | Operator-monitored audit row surfaces forged-logout-token attempts. |
|
||||
| Logout token replay (attacker captures + replays a valid logout JWT) | `jti`-based deduplication rejects the replay; first delivery succeeds, second returns 400. | Pinned by back-channel-logout negative tests. |
|
||||
| Logout token alg confusion | Same alg allow-list as the login flow; HS-family rejected. | The OIDC alg allow-list applies to BCL too (same `Provider.RemoteKeySet`). |
|
||||
| Missing `events` claim | Spec §2.4 requires the OIDC-defined logout event type; missing returns 400. | Pinned by negative test. |
|
||||
| `nonce` claim present | Spec §2.4 requires `nonce` MUST NOT appear in logout tokens; presence returns 400. | Pinned by negative test. |
|
||||
|
||||
### Group-claim manipulation
|
||||
|
||||
Per-IdP group-claim shapes are documented in
|
||||
[`oidc-runbooks/index.md`](oidc-runbooks/index.md). Manipulation
|
||||
threats:
|
||||
|
||||
| Vector | Mitigation |
|---|---|
| Operator misconfigures mapping (e.g. `engineers → r-admin` instead of `r-operator`) | `auth.group_mapping_added` / `_removed` audit row with `event_category=auth`. The auditor role monitors. |
| Operator misconfigures `groups_claim_path` (e.g. `groups` when Auth0 emits `https://your-namespace/groups`) | User's group claim is ignored, user lands at "no roles assigned" screen. The GUI's OIDC provider detail page surfaces the configured path so the operator can verify. |
| IdP renames a group (e.g. `engineers → eng-team`) | Mappings silently break; users get fewer roles than expected. `auth.oidc_login_unmapped_groups` audit row fires on every such login; auditor monitors for unexpected spikes. |
| IdP user maintainer adds a user to an unintended group | Group is mapped to a higher-privilege role than intended; user gets the role on next login. Bounded blast radius: the group→role mapping is what they got, not arbitrary admin. Defense-in-depth: review mappings periodically; the auditor role can pull `auth.oidc_login_succeeded` rows by `details.subject` to spot drift. |
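
The two quieter failure modes in the table (a wrong `groups_claim_path`, and a renamed IdP group) can be sketched as follows. This is an illustration under stated assumptions, not the resolver in `internal/auth/oidc/`: `groupsAt` and `resolve` are hypothetical names, and treating the claim path as a single top-level key is a simplification.

```go
package main

import (
	"fmt"
	"sort"
)

// groupsAt reads the claim stored under path and normalises it to []string.
// A path that does not match the token yields nil, which is the
// "no roles assigned" outcome described in the table.
func groupsAt(claims map[string]any, path string) []string {
	raw, ok := claims[path].([]any)
	if !ok {
		return nil
	}
	var groups []string
	for _, v := range raw {
		if s, ok := v.(string); ok {
			groups = append(groups, s)
		}
	}
	return groups
}

// resolve splits the user's groups into granted roles and unmapped groups.
// The unmapped slice is what an audit row such as
// auth.oidc_login_unmapped_groups would carry.
func resolve(groups []string, mapping map[string]string) (roles, unmapped []string) {
	seen := map[string]bool{}
	for _, g := range groups {
		role, ok := mapping[g]
		if !ok {
			unmapped = append(unmapped, g)
			continue
		}
		if !seen[role] {
			seen[role] = true
			roles = append(roles, role)
		}
	}
	sort.Strings(roles)
	return roles, unmapped
}

func main() {
	claims := map[string]any{
		"https://your-namespace/groups": []any{"engineers", "eng-team"},
	}
	mapping := map[string]string{"engineers": "r-operator"}

	fmt.Println(groupsAt(claims, "groups")) // misconfigured path
	roles, unmapped := resolve(groupsAt(claims, "https://your-namespace/groups"), mapping)
	fmt.Println(roles, unmapped)
}
```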
|
||||
|
||||
### Bootstrap phase risks
|
||||
|
||||
This section extends the day-0 bootstrap section with the OIDC
|
||||
first-admin path.
|
||||
|
||||
| Vector | Mitigation |
|---|---|
| `CERTCTL_BOOTSTRAP_TOKEN` (env-var fallback path) leaks | One-shot via `consumed` bool + admin-existence probe. Both arms close the path the moment any admin lands. |
| `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` misconfigured to a wide group (e.g. `everyone`) | Unintended user becomes admin on first OIDC login. Mitigation: scope-down via `certctl-cli auth keys scope-down --suggest`. Operators configure narrow groups. The audit row on `bootstrap.oidc_first_admin` surfaces every grant. |
| Both bootstrap strategies enabled simultaneously | Whichever fires first wins; the second sees admin-already-exists and falls through to normal mapping. No double-admin landing. |
| `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` left unset with multi-IdP deploy | Hook fires on ANY provider's tokens. Mitigation: explicit gate documented in `cmd/server/main.go` startup logging; operator audit reviewed pre-tag. |
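
The "one-shot via `consumed` bool + admin-existence probe" mitigation can be sketched as a small gate. The type and field names here are hypothetical, and the probe is modelled as a callback where the real code would presumably query the repository.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"sync"
)

var (
	errClosed   = errors.New("bootstrap path closed")
	errBadToken = errors.New("bootstrap token mismatch")
)

// bootstrapGate is a hypothetical stand-in for the env-var fallback strategy.
type bootstrapGate struct {
	mu          sync.Mutex
	consumed    bool
	token       string
	adminExists func() bool // probe; assumed to be a repository query in practice
}

// consume succeeds at most once, and never once any admin exists.
func (g *bootstrapGate) consume(presented string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	// Either arm closes the path, so a leaked token is useless after first use.
	if g.consumed || g.adminExists() {
		return errClosed
	}
	if subtle.ConstantTimeCompare([]byte(presented), []byte(g.token)) != 1 {
		return errBadToken
	}
	g.consumed = true
	return nil
}

func main() {
	admins := 0
	gate := &bootstrapGate{
		token:       "one-shot-token",
		adminExists: func() bool { return admins > 0 },
	}
	fmt.Println(gate.consume("one-shot-token")) // first use
	fmt.Println(gate.consume("one-shot-token")) // replay of the leaked token
}
```

Checking both arms before comparing the token means a closed gate gives the same answer whether or not the presented token is correct.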
|
||||
|
||||
### Break-glass risks
|
||||
|
||||
| Vector | Mitigation |
|---|---|
| Phished password (operator gives password to attacker) | Bypasses OIDC + every group-claim gate. Mitigation: default-OFF posture; lockout after 5 failures; WebAuthn pairing (v3 / Decision 12) closes the gap properly. |
| Brute-force online | Lockout state machine + 5/min rate limit on `/auth/breakglass/login`. |
| Brute-force offline (DB compromise) | Argon2id with OWASP 2024 params (~80-200ms per verify). Cracking remains expensive even with GPU. |
| Operator forgets to disable post-incident | Break-glass becomes a permanent backdoor. Mitigation: WARN log at boot when ENABLED=true; audit row on every break-glass login; runbook prescribes "disable within 24h of SSO recovery." |
| Side-channel timing on no-credential vs wrong-password vs locked | All three paths take statistically indistinguishable time via `verifyDummy()`. Pinned by the timing-statistical test. |
| Surface fingerprinting (scanner identifies break-glass exists) | All four endpoints return 404 (NOT 403) when disabled. Surface-invisibility - identical to a non-existent route. |
| Reserved-actor `actor-demo-anon` mutation via break-glass admin | Service layer rejects with `ErrAuthReservedActor` (HTTP 409). Same gate as the RBAC path. |
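
The timing row relies on every login path paying the same hashing cost. A rough sketch of that idea follows; it is not certctl's `verifyDummy()`. The iterated SHA-256 in `slowHash` is only a standard-library stand-in for Argon2id and is not suitable for production, and `login` and `credential` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

const rounds = 50_000 // stand-in work factor; Argon2id parameters would replace this

// slowHash is a deliberately simple placeholder for a password hash.
func slowHash(password string, salt []byte) [32]byte {
	sum := sha256.Sum256(append(append([]byte{}, salt...), password...))
	for i := 0; i < rounds; i++ {
		sum = sha256.Sum256(sum[:])
	}
	return sum
}

type credential struct {
	salt   []byte
	hash   [32]byte
	locked bool
}

// dummy is verified against when no credential exists, so the missing-user
// path costs the same as the wrong-password path.
var dummy = credential{
	salt: []byte("dummy-salt"),
	hash: slowHash("dummy", []byte("dummy-salt")),
}

// login always runs one full hash and one constant-time compare before it
// looks at found or locked.
func login(store map[string]credential, user, password string) bool {
	cred, found := store[user]
	if !found {
		cred = dummy
	}
	got := slowHash(password, cred.salt)
	match := subtle.ConstantTimeCompare(got[:], cred.hash[:]) == 1
	return found && !cred.locked && match
}

func main() {
	salt := []byte("per-user-salt")
	store := map[string]credential{
		"admin": {salt: salt, hash: slowHash("correct horse", salt)},
	}
	fmt.Println(login(store, "admin", "correct horse"))
	fmt.Println(login(store, "admin", "wrong"))
	fmt.Println(login(store, "nobody", "dummy"))
}
```

The `found` flag is checked only in the final return, so guessing the dummy password for a non-existent user still fails.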
|
||||
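
The surface-fingerprinting row (404, not 403, when disabled) can be sketched as a wrapper around the break-glass handlers. The `enabled` flag stands in for whatever configuration certctl reads, and `gateBreakGlass` is an illustrative name rather than the project's middleware.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// gateBreakGlass wraps next so that a disabled feature answers exactly like
// an unregistered route: http.NotFound writes the same 404 that ServeMux
// produces for unknown paths.
func gateBreakGlass(enabled bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !enabled {
			http.NotFound(w, r)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor issues one request against a mux with the gate applied.
func statusFor(enabled bool) int {
	login := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})
	mux := http.NewServeMux()
	mux.Handle("/auth/breakglass/login", gateBreakGlass(enabled, login))

	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/auth/breakglass/login", nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(false), statusFor(true)) // prints "404 204"
}
```

A 403 would confirm that the route exists; a 404 gives a scanner nothing to distinguish it from any other unknown path.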
|
||||
### Token-leak hygiene (the explicit grep policy)
|
||||
|
||||
ID tokens, access tokens, refresh tokens, authorization codes, PKCE
|
||||
verifiers, state, nonce, signing keys, break-glass passwords MUST
|
||||
NEVER appear in any log line at any level.
|
||||
|
||||
The invariant is enforced by per-package `logging_test.go` files that
|
||||
redirect `slog.Default` to a buffer, run the service paths, and
|
||||
grep-assert the secret values are absent from every captured line.
|
||||
The pattern is `internal/auth/bootstrap/service_test.go`; the OIDC,
|
||||
session, and break-glass packages follow the same shape:
|
||||
|
||||
- `internal/auth/oidc/logging_test.go` - token / code / verifier /
|
||||
state / nonce / cookie / client_secret / alg name absent from
|
||||
HandleAuthRequest, HandleCallback, alg-rejection, and provider-
|
||||
load paths.
|
||||
- `internal/auth/session/service_test.go` - signing-key bytes absent
|
||||
from cookie-mint + validate paths.
|
||||
- `internal/auth/breakglass/service_test.go` - plaintext password +
|
||||
Argon2id hash absent from every audit row + log line +
|
||||
HTTP-response shape (json:"-" probe via `json.Marshal`).
|
||||
|
||||
The `details` JSONB column on `audit_events` runs through the
|
||||
audit redactor (`internal/service/audit_redact.go`) before
|
||||
persistence; the redactor's allow-list is conservative enough that
|
||||
adding a new token-shaped field to a new audit row defaults to
|
||||
redacted, not leaked.
|
||||
|
||||
## Closed federated-identity threats
|
||||
|
||||
Each item below was an open threat under the earlier API-key-only
|
||||
deployment posture. Status reflects current closure as of v2.1.0.
|
||||
|
||||
1. **OIDC federation** - ✅ closed. SAML and WebAuthn remain on the
|
||||
future-work list (Decision 12 — WebAuthn pairs with break-glass
|
||||
for hardware-token MFA). The break-glass path is a partial
|
||||
mitigation for the no-MFA case during SSO incidents.
|
||||
2. **Session management** - ✅ closed. HMAC-signed
|
||||
`__Host-certctl_session` cookie with length-prefixed wire format,
|
||||
1h idle / 8h absolute expiry, scheduler-driven GC, server-side
|
||||
revocation list (delete the row), GUI's "Sessions" page surfaces
|
||||
own + all-actor revocation, back-channel logout from the IdP.
|
||||
3. **Local password accounts (break-glass)** - ✅ closed. Argon2id
|
||||
+ lockout + default-OFF + 404-not-403 surface invisibility. NOT
|
||||
for general human auth - only the "SSO is broken, need admin
|
||||
access right now" path. WebAuthn pairing on the future-work list.
|
||||
4. **OIDC first-admin bootstrap** - ✅ closed.
|
||||
`CERTCTL_BOOTSTRAP_ADMIN_GROUPS` +
|
||||
`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` env vars + group-scoped +
|
||||
admin-existence-probe.
|
||||
5. **Rate limiting on the bootstrap endpoint** - acceptable
|
||||
(one-shot by construction; per-IP rate limiting on the broader
|
||||
API is in place via `middleware.NewRateLimiter`). The break-glass
|
||||
`/auth/breakglass/login` endpoint carries the same rate-limit
|
||||
primitive at 5/min.
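
Item 2's "HMAC-signed cookie with length-prefixed wire format" can be illustrated with a short sketch. The field layout, the 4-byte big-endian length prefix, and the function names are assumptions made for the example; the actual wire format lives in `internal/auth/session/`.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// frame length-prefixes each field so that ("ab", "c") and ("a", "bc")
// can never produce the same MAC input.
func frame(fields ...string) []byte {
	var out []byte
	for _, f := range fields {
		out = binary.BigEndian.AppendUint32(out, uint32(len(f)))
		out = append(out, f...)
	}
	return out
}

// mint returns payload || HMAC-SHA256(payload), base64url-encoded.
func mint(key []byte, sessionID, keyID string) string {
	payload := frame(sessionID, keyID)
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return base64.RawURLEncoding.EncodeToString(append(payload, mac.Sum(nil)...))
}

// verify recomputes the tag and compares it in constant time.
func verify(key []byte, cookie string) error {
	raw, err := base64.RawURLEncoding.DecodeString(cookie)
	if err != nil || len(raw) < sha256.Size {
		return errors.New("malformed cookie")
	}
	payload, tag := raw[:len(raw)-sha256.Size], raw[len(raw)-sha256.Size:]
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	if !hmac.Equal(tag, mac.Sum(nil)) {
		return errors.New("bad signature")
	}
	return nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef")
	cookie := mint(key, "sess-42", "key-1")
	fmt.Println(verify(key, cookie))
	fmt.Println(verify([]byte("another-key"), cookie))
}
```

Signature validity is only the first gate; idle and absolute expiry and the server-side session row still decide whether the session is live.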
|
||||
|
||||
## Future-work threats
|
||||
|
||||
The following are not yet closed:
|
||||
|
||||
1. **WebAuthn / FIDO2 second factor** - operator console is OIDC
|
||||
(or break-glass password) only. No hardware-token requirement
|
||||
even on the admin path. Decision 12.
|
||||
2. **Time-bound role grants / JIT elevation** - the
|
||||
`actor_roles.expires_at` column exists, no UI/API yet.
|
||||
3. **SAML federation** - OIDC only. Operators on SAML-only IdPs use
|
||||
the broker pattern (run Keycloak as a SAML-to-OIDC bridge); see
|
||||
the Google Workspace runbook for the same broker shape.
|
||||
4. **Multi-tenant data isolation activation** - the schema and
|
||||
repository layer carry tenant_id columns + a query-coverage CI
|
||||
guard, but tenant ACLs are not enforced. v2.1.0 ships
|
||||
single-tenant only (`t-default` seeded). The managed-service
|
||||
hosting work (operator decision item) is where multi-tenant
|
||||
flips on.
|
||||
5. **HSM / FIPS-validated signing key for sessions** - the session
|
||||
signing key is software-only (HMAC-SHA256, in-memory key
|
||||
material, encrypted at rest via `internal/crypto`). Operators
|
||||
in FIPS 140-3 environments need to supply their own
|
||||
`Signer` implementation; the abstraction at
|
||||
`internal/crypto/signer/` accommodates this but no PKCS#11
|
||||
driver ships yet.
|
||||
6. **OIDC RP-initiated logout** (the `end_session_endpoint` flow
where certctl redirects the browser to the IdP with an
`id_token_hint`). v2.1.0 implements ONLY the back-channel flow (IdP →
|
||||
certctl). Operators wanting the full bidirectional logout pair
|
||||
wait on a follow-on release.
|
||||
7. **GUI E2E via Playwright** - tracked alongside #9 above.
|
||||
8. **Per-IdP runbook external-tester sign-off** - encouraged via
|
||||
the operator-sign-off footers in `oidc-runbooks/*.md` but NOT a
|
||||
merge gate (operator decision 2026-05-10; the earlier
|
||||
"≥ 2 external testers" requirement was retired).
|
||||
|
||||
## Compliance mapping
|
||||
|
||||
The control set in this document supports the following
|
||||
framework requirements. This is a mapping; it is not a claim of
|
||||
formal certification.
|
||||
|
||||
- **SOC 2 CC6.1** (logical access controls) - RBAC primitive
|
||||
with role-based gating on every mutating endpoint.
|
||||
- **SOC 2 CC6.3** (privileged access management) - `r-admin`
|
||||
role separation + role-grant audit trail with two-person
|
||||
integrity on approval-tier profile edits.
|
||||
- **HIPAA §164.312(b)** (audit controls) - `event_category`
|
||||
column lets the auditor role review authentication / authorization
|
||||
changes specifically. WORM trigger keeps the audit table
|
||||
append-only at the database layer.
|
||||
- **NIST SSDF PO.5.2** (separation of duties) - two-person
|
||||
integrity for compliance-tier issuance via the
|
||||
`RequiresApproval` flow + the approval-bypass closure on
|
||||
profile edits.
|
||||
- **FedRAMP AU-9** (audit information protection) - WORM
|
||||
enforcement + auditor-only read access (the auditor role
|
||||
cannot mutate, the WORM trigger blocks UPDATE/DELETE).
|
||||
- **PCI-DSS §10** (audit logging) - every mutating operation
|
||||
emits an audit row with actor + action + resource + timestamp +
|
||||
category. The audit table is append-only.
|
||||
|
||||
## Operator-facing checks
|
||||
|
||||
Run these periodically to verify the controls are working.
|
||||
|
||||
1. `certctl-cli auth keys list` - confirm no unexpected actor
|
||||
holds `r-admin`. Audit any new admin grants against the audit
|
||||
log.
|
||||
2. `SELECT actor, action, COUNT(*) FROM audit_events WHERE
|
||||
action LIKE 'approval_%' AND timestamp > NOW() - INTERVAL '7
|
||||
days' GROUP BY actor, action;` - confirm approvals are
|
||||
happening and not concentrated in a single approver.
|
||||
3. `SELECT COUNT(*) FROM audit_events WHERE actor =
|
||||
'system-bypass';` - MUST return 0 in production. A non-zero
|
||||
count means `CERTCTL_APPROVAL_BYPASS=true` was set; production
|
||||
deploys MUST leave it unset.
|
||||
4. `SELECT actor, COUNT(*) FROM audit_events WHERE action =
'bootstrap.consume' GROUP BY actor;` - MUST return at most one row per
|
||||
tenant. Multiple rows means the bootstrap endpoint was called
|
||||
more than once, which the strategy's one-shot guard should
|
||||
have prevented; investigate.
|
||||
5. `certctl-cli auth me` while authenticated as the auditor
|
||||
key - `effective_permissions` must contain `audit.read` +
|
||||
`audit.export` ONLY. Any other permission means a role grant
|
||||
widened the auditor's surface; revoke immediately.
|
||||
|
||||
The following checks were added with v2.1.0's federated-identity surface:
|
||||
|
||||
6. `SELECT COUNT(*) FROM oidc_providers;` - confirm only the
|
||||
expected providers are configured. An unexpected row is a
|
||||
compromise indicator. Cross-check with the
|
||||
`auth.oidc_provider_created` audit row to find when + by whom.
|
||||
7. `SELECT actor_id, COUNT(*) FROM sessions WHERE NOT revoked AND
|
||||
absolute_expires_at > NOW() GROUP BY actor_id ORDER BY 2 DESC;`
|
||||
- confirm no actor has an unexpectedly large session count.
|
||||
Multi-session-per-actor is normal (laptop + phone), but a single
|
||||
actor with 50+ active sessions is a compromised-key signal.
|
||||
8. `SELECT COUNT(*) FROM audit_events WHERE action =
'auth.oidc_login_unmapped_groups' AND timestamp > NOW() -
|
||||
INTERVAL '7 days';` - non-zero rows mean users are completing
|
||||
IdP authentication but failing the group-mapping step. Either
|
||||
the IdP renamed a group, or an unauthorized user attempted
|
||||
access. Investigate.
|
||||
9. `SELECT COUNT(*) FROM audit_events WHERE action LIKE
|
||||
'auth.breakglass_%' AND timestamp > NOW() - INTERVAL '7 days';`
|
||||
- non-zero rows in steady state mean break-glass is being used
|
||||
outside an SSO incident OR was left enabled. Confirm
|
||||
`CERTCTL_BREAKGLASS_ENABLED` is `false` in non-incident windows.
|
||||
10. `SELECT COUNT(*) FROM audit_events WHERE action =
|
||||
'bootstrap.oidc_first_admin';` - MUST return at most one row
|
||||
per tenant. Multiple rows means the OIDC bootstrap hook fired
|
||||
more than once per tenant, which the admin-existence probe
|
||||
should have prevented; investigate.
|
||||
11. `SELECT COUNT(*) FROM session_signing_keys WHERE retired_at IS
|
||||
NOT NULL AND retired_at < NOW() - INTERVAL '7 days';` - retired
|
||||
keys past the retention window should have been GC'd. Non-zero
|
||||
rows mean the scheduler's `sessionGCLoop` is wedged.
|
||||
|
||||
## Cross-references
|
||||
|
||||
API-key + RBAC anchors:
|
||||
|
||||
- [`rbac.md`](rbac.md) - the operator how-to
|
||||
- [`security.md`](security.md) - the wider security posture
|
||||
- [`approval-workflow.md`](approval-workflow.md) - the two-person
|
||||
integrity gate
|
||||
- [`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md) -
|
||||
upgrade flow
|
||||
- `internal/auth/` - middleware + keystore + RequirePermission +
|
||||
bootstrap
|
||||
- `internal/service/auth/` - Authorizer + privilege-escalation
|
||||
guard + reserved-actor guard
|
||||
- `migrations/000029_rbac.up.sql` - schema + seed
|
||||
- `migrations/000030_rbac_admin_perms.up.sql` - five admin-only
|
||||
fine-grained perms
|
||||
- `migrations/000032_audit_category.up.sql` - auditor surface
|
||||
- `migrations/000033_approval_kinds.up.sql` - approval-bypass
|
||||
closure
|
||||
|
||||
OIDC + sessions + back-channel logout + break-glass anchors:
|
||||
|
||||
- [`oidc-runbooks/index.md`](oidc-runbooks/index.md) - per-IdP setup
|
||||
guides (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google
|
||||
Workspace) with cross-IdP recurring concepts at the top
|
||||
- `internal/auth/oidc/` - OIDC service (HandleAuthRequest /
|
||||
HandleCallback / RefreshKeys), hand-rolled groupclaim resolver,
|
||||
alg allow-list, IdP downgrade-attack defense
|
||||
- `internal/auth/session/` - session service (length-prefixed HMAC,
|
||||
cookie minting, idle/absolute expiry, signing-key rotation, GC),
|
||||
CSRF middleware, chained-auth combinator
|
||||
- `internal/auth/breakglass/` - default-OFF break-glass admin
|
||||
(Argon2id + lockout + constant-time + surface-invisibility)
|
||||
- `internal/auth/oidc/testfixtures/` - Keycloak
|
||||
testcontainers harness (`//go:build integration`)
|
||||
- `migrations/000034_oidc_providers.up.sql` - OIDC providers +
|
||||
group-role mappings tables
|
||||
- `migrations/000035_sessions.up.sql` - sessions + session-signing-
|
||||
keys tables
|
||||
- `migrations/000036_users.up.sql` - users (federated-human
|
||||
identity) table
|
||||
- `migrations/000037_oidc_pre_login.up.sql` - pre-login table + 7
|
||||
new auth permissions
|
||||
- `migrations/000038_breakglass_credentials.up.sql` - break-glass
|
||||
credentials table + 2 new permissions
|
||||
- `scripts/ci-guards/N-bundle-2-security-empty-preserved.sh` -
|
||||
OpenAPI `security: []` count guard
|
||||
- `scripts/ci-guards/bundle-1-compat-regression.sh` -
|
||||
API-key-only compat assertions (5 invariants)
|
||||
- `scripts/ci-guards/bundle-1-to-2-upgrade-regression.sh` -
|
||||
OIDC-upgrade-path assertions (6 invariants)
|
||||
@@ -1,32 +1,35 @@
|
||||
# Database TLS — Postgres Transport Encryption
|
||||
|
||||
**Audit reference:** Bundle B / M-018. PCI-DSS v4.0 Req 4 §2.2.5; CWE-319.
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
**Audit reference:** CWE-319 (Cleartext transmission of sensitive information).
|
||||
|
||||
certctl talks to Postgres over a single connection-string URL controlled by the
|
||||
`CERTCTL_DATABASE_URL` env var. The `sslmode` query parameter on that URL
|
||||
selects the transport-encryption posture. Pre-Bundle-B all the bundled
|
||||
deployment artifacts (Helm chart, docker-compose) hard-coded `sslmode=disable`.
|
||||
Bundle B exposes that as an operator-facing knob with a documented default and
|
||||
explicit opt-in / opt-out paths for the four real-world deployment shapes.
|
||||
selects the transport-encryption posture. The bundled deployment artifacts
|
||||
(Helm chart, docker-compose) historically hard-coded `sslmode=disable`;
|
||||
current builds expose that as an operator-facing knob with a documented
|
||||
default and explicit opt-in / opt-out paths for the four real-world
|
||||
deployment shapes.
|
||||
|
||||
## Quick reference
|
||||
|
||||
| Deployment shape | Default `sslmode` | When to change |
|
||||
|------------------------------------------------|--------------------|----------------|
|
||||
| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload is in PCI-DSS scope. |
|
||||
| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload handles sensitive data. |
|
||||
| Helm chart, external Postgres (RDS / Cloud SQL / Azure DB) | not auto-set | **Always** set to `verify-full` and provide the cloud provider's server CA bundle. |
|
||||
| docker-compose, bundled Postgres on docker bridge | `disable` | Demo/dev only; not a deployment shape we expect operators to harden. |
|
||||
| docker-compose / k8s with external Postgres | not auto-set | **Always** set `CERTCTL_DATABASE_URL` to a connection string with `sslmode=verify-full`. |
|
||||
|
||||
`sslmode` values come from `lib/pq` (the underlying driver). The full set is:
|
||||
`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`. PCI-DSS
|
||||
Req 4 v4.0 §2.2.5 considers `verify-ca` the floor for sensitive-data transport;
|
||||
`verify-full` is the floor for systems exposed to spoofing risk (it adds
|
||||
hostname validation against the server cert's CN/SAN).
|
||||
`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`.
|
||||
`verify-ca` is the floor for sensitive-data transport; `verify-full`
|
||||
is the floor for systems exposed to spoofing risk (it adds hostname
|
||||
validation against the server cert's CN/SAN).
|
||||
|
||||
## Helm chart (Bundle B)
|
||||
## Helm chart
|
||||
|
||||
Bundle B adds two values under `postgresql.tls`:
|
||||
The chart exposes two values under `postgresql.tls`:
|
||||
|
||||
```yaml
|
||||
postgresql:
|
||||
@@ -0,0 +1,120 @@
|
||||
# Helm Deployment
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operator runbook for deploying certctl on Kubernetes via the bundled Helm chart at `deploy/helm/certctl/`.
|
||||
|
||||
## Prereqs
|
||||
|
||||
- Kubernetes cluster, v1.27+
|
||||
- `kubectl` configured and authenticated
|
||||
- `helm` v3.13+
|
||||
- Storage class for the PostgreSQL StatefulSet PVC
|
||||
- TLS cert source: either an operator-supplied `kubernetes.io/tls` Secret OR a cert-manager `ClusterIssuer` / `Issuer`. The chart refuses to render without one. See [`tls.md`](tls.md) for the four cert provisioning patterns.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --create-namespace \
  --set server.apiKey=$(openssl rand -hex 32) \
  --set postgres.password=$(openssl rand -hex 32) \
  --set server.tls.existingSecret=certctl-server-tls
```
|
||||
|
||||
`server.apiKey` and `postgres.password` should be high-entropy values. The example above generates them inline; production deployments use a secrets manager (Vault, External Secrets Operator, AWS Secrets Manager) instead.
|
||||
|
||||
## What you get
|
||||
|
||||
- **Server Deployment** with a configurable replica count (default 1; HA needs sticky sessions on the ACME server's nonce path)
|
||||
- **PostgreSQL StatefulSet** with PVC-backed persistence
|
||||
- **Agent DaemonSet** with one agent per node (configurable via `agent.daemonset.enabled=false` if you don't want the in-cluster agent)
|
||||
- Health probes (`/health` liveness + `/ready` readiness)
|
||||
- Security contexts: non-root, read-only root filesystem
|
||||
- Optional Ingress (off by default; opt in via `ingress.enabled=true`)
|
||||
|
||||
## Cert source patterns
|
||||
|
||||
### Pattern 1 — operator-supplied Secret (recommended for non-cert-manager shops)
|
||||
|
||||
```bash
kubectl create secret tls certctl-server-tls \
  --cert=server.crt --key=server.key \
  --namespace certctl

helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --set server.tls.existingSecret=certctl-server-tls
```
|
||||
|
||||
### Pattern 2 — cert-manager Certificate CR (recommended for cert-manager shops)
|
||||
|
||||
```bash
helm install certctl deploy/helm/certctl/ \
  --namespace certctl \
  --set server.tls.certManager.enabled=true \
  --set server.tls.certManager.issuerRef.name=my-cluster-issuer \
  --set server.tls.certManager.issuerRef.kind=ClusterIssuer
```
|
||||
|
||||
### Refuses to render without one of the above
|
||||
|
||||
```bash
helm install certctl deploy/helm/certctl/ --namespace certctl
# Error: server.tls.existingSecret OR server.tls.certManager.enabled must be set
```
|
||||
|
||||
The render-time guard catches the missing config at `helm install` time, not at pod-crash-loop time.
|
||||
|
||||
## Verify the install
|
||||
|
||||
```bash
kubectl wait --for=condition=Ready --timeout=3m \
  -n certctl pod -l app.kubernetes.io/name=certctl-server

kubectl port-forward -n certctl svc/certctl-server 8443:8443 &

# Extract the CA bundle from the Secret and use it to verify the server cert
kubectl get secret -n certctl certctl-server-tls -o jsonpath='{.data.ca\.crt}' \
  | base64 -d > /tmp/certctl-ca.crt
curl --cacert /tmp/certctl-ca.crt https://localhost:8443/health
# {"status":"healthy"}
```
|
||||
|
||||
If the Secret has no `ca.crt` key (operator-supplied Secrets often don't), use `tls.crt` as the bundle. For a self-signed cert the two files are identical; for a chained cert distribute the root CA bundle separately via ConfigMap.
|
||||
|
||||
## Upgrade
|
||||
|
||||
```bash
helm upgrade certctl deploy/helm/certctl/ \
  --namespace certctl \
  --reuse-values
```
|
||||
|
||||
Postgres state survives the upgrade (the PVC is retained). The server / agent images bump per the chart's `image.tag`. See [`docs/archive/upgrades/`](../archive/upgrades/) for version-specific upgrade guidance.
|
||||
|
||||
## Configuration reference
|
||||
|
||||
Every value is documented at `deploy/helm/certctl/values.yaml`. Common tweaks:
|
||||
|
||||
- `server.replicaCount` — replica count (default 1)
|
||||
- `server.resources.{requests,limits}` — pod resource bounds
|
||||
- `agent.daemonset.enabled` — toggle the in-cluster agent (default true)
|
||||
- `postgres.storageSize` — PVC size (default 10Gi)
|
||||
- `ingress.enabled` + `ingress.host` — opt into Ingress
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
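**Pod crash-loops with TLS error.** Cert + key in the Secret don't pair. For RSA keys, verify with `openssl x509 -modulus -in server.crt -noout | openssl md5` against `openssl rsa -modulus -in server.key -noout | openssl md5`; the two digests must match. (`openssl md5` is used instead of a bare `md5` because the latter is not available on most Linux hosts.)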
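
The same pairing check can be done in Go without depending on the key type, since `tls.X509KeyPair` returns an error when the private key does not match the certificate's public key. The sketch below generates throwaway ECDSA material purely for demonstration; `selfSigned` and `pairs` are illustrative helpers, and in practice the PEM bytes would come from `server.crt` and `server.key`.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// selfSigned returns a throwaway certificate and its key in PEM form.
func selfSigned() (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "certctl-server"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// pairs returns nil only when keyPEM is the private key for certPEM.
func pairs(certPEM, keyPEM []byte) error {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err
}

func main() {
	certA, keyA, err := selfSigned()
	if err != nil {
		panic(err)
	}
	_, keyB, err := selfSigned()
	if err != nil {
		panic(err)
	}
	fmt.Println("matching pair ok:", pairs(certA, keyA) == nil)
	fmt.Println("mismatched pair ok:", pairs(certA, keyB) == nil)
}
```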
|
||||
|
||||
**Agent DaemonSet pods can't reach the server.** Service DNS / NetworkPolicy issue. Confirm the agent's `CERTCTL_SERVER_URL` env points at the in-cluster service name (`https://certctl-server.certctl.svc.cluster.local:8443`).
|
||||
|
||||
**Postgres won't start.** PVC permissions. Check `kubectl describe pvc -n certctl certctl-postgres` and confirm the storage class supports `fsGroup`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`tls.md`](tls.md) — cert provisioning patterns + SIGHUP rotation
|
||||
- [`security.md`](security.md) — production security posture
|
||||
- [`runbooks/disaster-recovery.md`](runbooks/disaster-recovery.md) — Postgres restore + recovery procedures
|
||||
- [`docs/archive/upgrades/`](../archive/upgrades/) — version-specific upgrade procedures
|
||||
@@ -0,0 +1,209 @@
|
||||
# Legacy Clients (TLS 1.2) — Reverse-Proxy Runbook
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
**Audit reference:** CWE-326 (Inadequate encryption strength).
|
||||
|
||||
## What this is
|
||||
|
||||
certctl's control plane pins `tls.Config.MinVersion = tls.VersionTLS13`
|
||||
(`cmd/server/tls.go:131`). Some embedded EST (RFC 7030) and SCEP (RFC 8894)
|
||||
clients only speak TLS 1.0/1.1/1.2 — those clients cannot complete the
|
||||
handshake against certctl directly. This runbook documents the supported
|
||||
operator pattern: terminate the legacy TLS version at a front-door reverse
|
||||
proxy and pass the request through to certctl over TLS 1.3.
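
The effect of the `MinVersion` pin on a TLS 1.2-only client can be reproduced with the standard library alone. The sketch below uses `httptest` in place of the certctl listener, so it illustrates the handshake behaviour rather than the code in `cmd/server/tls.go`; `tls12ClientRejected` is a hypothetical helper.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// tls12ClientRejected starts a TLS 1.3-only test server and reports whether
// a client capped at TLS 1.2 fails to complete a request against it.
func tls12ClientRejected() bool {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "ok") }))
	srv.TLS = &tls.Config{MinVersion: tls.VersionTLS13} // same pin as the control plane
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() already trusts the test certificate; cap its TLS version
	// to imitate a legacy embedded client.
	client := srv.Client()
	client.Transport.(*http.Transport).TLSClientConfig.MaxVersion = tls.VersionTLS12

	resp, err := client.Get(srv.URL)
	if err == nil {
		resp.Body.Close()
	}
	return err != nil
}

func main() {
	fmt.Println("TLS 1.2 client rejected:", tls12ClientRejected())
}
```

The failure happens during the handshake, before any HTTP request is sent, which is why such clients need a front-door proxy rather than a server-side setting.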
|
||||
|
||||
## Why TLS 1.3 minimum
|
||||
|
||||
certctl's audit posture and the M-001 PBKDF2 work factor both assume
modern transport crypto. TLS 1.2 with the cipher suites still in the
wild has known attack surface (ROBOT against RSA key exchange, Raccoon
against Diffie-Hellman key exchange, the TLS variant of POODLE against
CBC suites; all CVE-categorized), and the TLS 1.0 stacks on older
devices add BEAST on top. Allowing TLS 1.2 directly on the certctl
listener would invalidate the guarantee that the server-side encryption
chain is the strongest the ecosystem currently supports.
|
||||
|
||||
## When this runbook applies
|
||||
|
||||
You need this if **all three** are true:
|
||||
|
||||
1. You operate certctl with EST or SCEP enabled (`CERTCTL_EST_ENABLED=true`
|
||||
or `CERTCTL_SCEP_ENABLED=true`).
|
||||
2. Your enrolling clients are embedded devices (printers, network
|
||||
appliances, IoT boards, legacy MFPs, point-of-sale terminals) whose TLS
|
||||
stack pre-dates 2018 and only speaks TLS 1.2 or older.
|
||||
3. Replacing those clients is not feasible on a 6-month horizon.
|
||||
|
||||
If your enrolling clients are modern (any current Linux/Windows/macOS
|
||||
host, anything Go-based, anything Rust/Python/Node from 2019 onward),
|
||||
they speak TLS 1.3 natively and this runbook is unnecessary — point them
|
||||
straight at certctl on `:8443`.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
flowchart LR
    Client["legacy EST/SCEP client"]
    Proxy["nginx / HAProxy<br/>reverse proxy"]
    Server["certctl :8443"]
    Client -->|"TLS 1.2/1.3<br/>(allowed TLS 1.2)"| Proxy
    Proxy -->|"TLS 1.3<br/>(re-encrypts as TLS 1.3)"| Server
```
|
||||
|
||||
The reverse proxy:
|
||||
|
||||
- Terminates the legacy-version TLS handshake on the public-facing port.
|
||||
- Forwards the request to certctl over TLS 1.3 on a private network.
|
||||
- (For EST mTLS) may forward the client certificate in an
`X-SSL-Client-Cert` header. The current certctl implementation ignores
that header; see the certctl-side configuration section.
|
||||
|
||||
## nginx config
|
||||
|
||||
```nginx
upstream certctl_backend {
    # Private-network address; not reachable from outside the proxy host.
    server 10.0.0.10:8443;
}

server {
    listen 443 ssl http2;
    server_name est.example.com;

    # Public-facing legacy listener. ssl_protocols includes TLSv1.2 explicitly.
    # Keep ssl_ciphers conservative: only strong AEAD suites with forward
    # secrecy.
    ssl_certificate     /etc/nginx/certs/est.example.com.fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/est.example.com.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers on;

    # mTLS for EST: optional client cert, verified against the EST CA.
    ssl_client_certificate /etc/nginx/certs/est-clients-ca.pem;
    ssl_verify_client      optional;

    location ~ ^/\.well-known/(est|pki) {
        # Forward the client cert (if presented) to certctl over the
        # private hop. The current certctl implementation IGNORES the
        # X-SSL-Client-Cert header (header-agnostic by default; see
        # the certctl-side configuration section below). EST/SCEP
        # authentication still works correctly because both protocols
        # carry their own auth (CSR signature for EST, challengePassword
        # for SCEP) inside the request body.
        proxy_set_header X-SSL-Client-Cert $ssl_client_escaped_cert;
        proxy_set_header X-Forwarded-For   $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;

        # The proxy-to-certctl hop is itself TLS 1.3.
        proxy_pass                    https://certctl_backend;
        proxy_ssl_protocols           TLSv1.3;
        proxy_ssl_verify              on;
        proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
    }

    # SCEP endpoints: same pattern, no client-cert requirement
    # (SCEP authenticates via challengePassword inside the CSR).
    location ^~ /scep {
        proxy_set_header X-Forwarded-For   $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass                    https://certctl_backend;
        proxy_ssl_protocols           TLSv1.3;
        proxy_ssl_verify              on;
        proxy_ssl_trusted_certificate /etc/nginx/certs/certctl-internal-ca.pem;
    }
}
```
|
||||
|
||||
## HAProxy config (alternative)
|
||||
|
||||
```
frontend est_legacy
    # Each directive is written on a single line.
    bind *:443 ssl crt /etc/haproxy/certs/est.example.com.pem alpn h2,http/1.1 ssl-min-ver TLSv1.2 ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384

    acl is_est_path  path_beg /.well-known/est
    acl is_pki_path  path_beg /.well-known/pki
    acl is_scep_path path_beg /scep
    use_backend certctl_backend if is_est_path or is_pki_path or is_scep_path
    # certctl_modern is not defined in this excerpt; declare it elsewhere
    # in the configuration or remove this line.
    default_backend certctl_modern

backend certctl_backend
    server certctl 10.0.0.10:8443 ssl verify required ca-file /etc/haproxy/certs/certctl-internal-ca.pem ssl-min-ver TLSv1.3
    http-request set-header X-Forwarded-For %[src]
    http-request set-header X-Forwarded-Proto https
```
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
The current implementation is **header-agnostic**: certctl ignores any
|
||||
`X-SSL-Client-Cert` / `X-Forwarded-For` headers from the proxy. EST
|
||||
authentication still happens via in-protocol CSR signature + profile
|
||||
policy (RFC 7030 §3.2.3); SCEP authentication still happens via the
|
||||
`challengePassword` attribute embedded in the CSR (RFC 8894 §3.2). Both
|
||||
mechanisms are inside the request body and survive the reverse-proxy
|
||||
hop without server-side header trust.
|
||||
|
||||
**Why this is the correct default:** trusting a proxy-supplied header
|
||||
for client identity opens a header-spoofing attack surface that requires
|
||||
careful design (CIDR allowlist of trusted proxies, fail-closed defaults,
|
||||
explicit operator opt-in). The legacy-clients work ships the
|
||||
TLS-bridge guidance as documentation only; a future commit can extend
|
||||
certctl with proxy-header trust if and when an operator demonstrates a
|
||||
deployment shape that requires it. Until that lands, the runbook above
|
||||
is operationally complete: legacy EST and SCEP clients continue to
|
||||
authenticate via their in-protocol mechanisms, and the reverse proxy is
|
||||
purely a TLS-version bridge.
|
||||
|
||||
If your deployment requires proxy-supplied client identity (e.g., the
|
||||
proxy terminates mTLS and you want certctl to record the client-cert
|
||||
subject in the audit trail beyond what the CSR carries), open an issue
|
||||
and a future commit will add a header-trust contract behind two
|
||||
fail-closed env vars: a CIDR allowlist of trusted proxies, plus an
|
||||
explicit opt-in toggle. Both knobs would be required together; setting
|
||||
only one would fail loud at startup. Until that work ships, the
|
||||
header-agnostic default described above is the only supported
|
||||
configuration.
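The "both knobs together, or neither" rule described above can be sketched as a startup check. This is an illustration of the fail-closed idea only: certctl does not ship this feature, and the names `CERTCTL_TRUSTED_PROXY_CIDRS` and `CERTCTL_TRUST_PROXY_HEADERS` are hypothetical placeholders, not real settings.

```python
import ipaddress


def load_proxy_trust(env):
    """Return the trusted proxy networks, or [] when header trust is off.

    Hypothetical sketch: both variable names are placeholders invented for
    illustration; certctl does not read them today.
    """
    cidrs_raw = env.get("CERTCTL_TRUSTED_PROXY_CIDRS", "").strip()
    opt_in = env.get("CERTCTL_TRUST_PROXY_HEADERS", "").strip().lower() == "true"

    # Fail loud if only one of the two knobs is set.
    if bool(cidrs_raw) != opt_in:
        raise ValueError(
            "proxy header trust needs both the CIDR allowlist and the opt-in toggle"
        )
    if not opt_in:
        return []  # header-agnostic default: no proxy is trusted
    # ip_network raises ValueError on a malformed CIDR, which also fails closed.
    return [ipaddress.ip_network(c.strip()) for c in cidrs_raw.split(",")]


def is_trusted_proxy(remote_ip, networks):
    """True only when the peer address falls inside an allowlisted network."""
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in networks)
```

With an empty environment the function returns an empty list, so no peer is ever treated as a trusted proxy; that mirrors the header-agnostic default.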
|
||||
|
||||
## TLS posture summary
|
||||
|
||||
The configuration above:
|
||||
|
||||
- Pins TLS 1.2 + TLS 1.3 only (no SSLv3, TLS 1.0, TLS 1.1).
|
||||
- Uses only AEAD cipher suites with forward secrecy (ECDHE-* with GCM or
|
||||
ChaCha20-Poly1305).
|
||||
- Re-encrypts to TLS 1.3 on the proxy-to-certctl hop so the certctl
|
||||
listener never speaks anything below 1.3.
|
||||
|
||||
That is the strongest posture currently achievable while still allowing
|
||||
the legacy clients to enroll. Reviewers looking for the attestation
|
||||
should be pointed at this section + the proxy's TLS config.
|
||||
|
||||
## What this runbook does NOT cover
|
||||
|
||||
- **Replacing the legacy clients.** That's the long-term fix; this
|
||||
runbook is the bridge while you're migrating.
|
||||
- **Network segmentation.** The reverse proxy assumes the proxy-to-certctl
|
||||
hop is on a network that an external attacker can't reach. If it's
|
||||
not, you need a deeper architecture review.
|
||||
- **Client-cert revocation.** EST mTLS revocation is the relying party's
|
||||
responsibility. certctl's EST handler accepts the cert; the proxy can
|
||||
enforce CRL/OCSP via `ssl_crl_path` (nginx) or `crl-file` (HAProxy).
|
||||
|
||||
## When TLS 1.2 itself sunsets
|
||||
|
||||
Major browsers and OS vendors will eventually deprecate TLS 1.2. When
|
||||
that happens, this runbook becomes obsolete; the only path forward
|
||||
will be to replace the legacy clients. Watch the IETF TLS working
|
||||
group, the major browser vendors' announcement channels, and your
|
||||
own embedded-device vendors for deprecation notices.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/operator/tls.md`](tls.md) — the certctl-internal TLS configuration (HTTPS-only control plane, MinVersion pin)
|
||||
- [`docs/operator/security.md`](security.md) — overall security posture
|
||||
- [`docs/operator/database-tls.md`](database-tls.md) — Postgres TLS opt-in
|
||||
- [`docs/reference/protocols/scep-server.md`](../reference/protocols/scep-server.md) — SCEP RFC 8894 native server reference
|
||||
- [`docs/reference/protocols/est.md`](../reference/protocols/est.md) — EST RFC 7030 server reference
|
||||
@@ -0,0 +1,198 @@
|
||||
# Auth0 OIDC runbook
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This runbook wires certctl's OIDC SSO surface against [Auth0](https://auth0.com/), a commercial cloud IdP (now part of Okta but operationally distinct). Auth0 has a free developer tier suitable for evaluation; production runs on a paid B2B / B2C plan.
|
||||
|
||||
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook only documents the Auth0-specific deltas.
|
||||
|
||||
## The big Auth0 quirk: namespaced custom claims
|
||||
|
||||
Auth0 imposes a hard rule: any custom claim emitted from an Action MUST use a namespaced URL-shape key (e.g. `https://your-namespace/groups`). Auth0 silently strips claims that look like standard OIDC claims (`groups`, `roles`, `permissions`, etc.) when emitted from an Action — this is a security feature to prevent claim-spoofing.
|
||||
|
||||
certctl handles this via the `groups_claim_path` config. If your Action emits `https://your-namespace/groups`, set `OIDCProvider.groups_claim_path` to that exact URL. The hand-rolled groupclaim resolver at `internal/auth/oidc/groupclaim/resolver.go` recognizes URL-shape paths (anything starting with `http://` or `https://`) and treats the entire string as a single literal key — it does NOT split on `/`.
|
||||
|
||||
Set `groups_claim_format` to `string-array`; the underlying claim shape is still a JSON array of group-name strings, just stored under a URL-shape key.
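To make the lookup rule concrete, here is a minimal Python sketch of the behavior this section describes: URL-shape paths are one literal key, and (per the Troubleshooting section) any other path is walked as a dot-separated path. It is an illustration, not certctl's Go resolver; the function name `resolve_groups` is hypothetical.

```python
def resolve_groups(claims, path):
    """Look up a string-array groups claim by path; return None if absent.

    Illustrative only; this is not the code in resolver.go.
    """
    if path.startswith(("http://", "https://")):
        # URL-shape path: the whole string is a single literal key.
        value = claims.get(path)
    else:
        # Plain path: walk nested objects one dot-separated segment at a time.
        value = claims
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                value = None
                break
            value = value[part]
    # "string-array" format: a JSON array whose items are all strings.
    if not isinstance(value, list) or not all(isinstance(g, str) for g in value):
        return None
    return value
```

Given an ID token whose only groups key is `https://certctl.example.com/auth/groups`, the full URL resolves to the array, while the bare path `groups` resolves to nothing.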
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**On the Auth0 side:**
|
||||
|
||||
- An Auth0 tenant (free dev tier at <https://auth0.com/signup> works). Tenant URL looks like `https://<tenant-name>.<region>.auth0.com`.
|
||||
- Owner or Auth0 Administrator role.
|
||||
- Network reachability from certctl-server to `https://<tenant>.auth0.com/.well-known/openid-configuration`.
|
||||
|
||||
**On the certctl side:** same as Keycloak.
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
### 1. Pick a namespace string
|
||||
|
||||
Decide on a unique URL-shape namespace for certctl's custom claims. It does NOT have to resolve to a real domain; Auth0 just requires it to be URL-shape and unique within your tenant. A reasonable choice:
|
||||
|
||||
```
|
||||
https://certctl.example.com/auth/
|
||||
```
|
||||
|
||||
Use that prefix for every custom claim; for groups specifically:
|
||||
|
||||
```
|
||||
https://certctl.example.com/auth/groups
|
||||
```
|
||||
|
||||
We'll refer to this as `<NS>/groups` in the rest of this runbook.
|
||||
|
||||
### 2. Create the Application
|
||||
|
||||
In the Auth0 dashboard:
|
||||
|
||||
**Applications → Applications → Create Application**:
|
||||
|
||||
- Name: `certctl`.
|
||||
- Application Type: **Regular Web Applications**.
|
||||
- Click **Create**.
|
||||
|
||||
On the saved app's **Settings** tab:
|
||||
|
||||
- Application Login URI: blank (Auth0 doesn't need it for the auth-code flow).
|
||||
- Allowed Callback URLs: `https://<your-certctl-host>:8443/auth/oidc/callback` (one entry, exact match).
|
||||
- Allowed Logout URLs: optional.
|
||||
- Allowed Web Origins: `https://<your-certctl-host>:8443`.
|
||||
- Token Endpoint Authentication Method: **Post** (default; matches the certctl service's expectation of `client_secret_post`).
|
||||
- Save Changes.
|
||||
|
||||
Copy the **Domain** (this is the issuer base — `https://<tenant>.auth0.com`), **Client ID**, and **Client Secret** from the same Settings page.
|
||||
|
||||
### 3. Configure the connection (where users live)
|
||||
|
||||
If you're using Auth0's Database connection (default username + password), the existing **Username-Password-Authentication** connection works. For SSO to Google / Microsoft / SAML, configure those connections under **Authentication → Enterprise** or **Authentication → Social** and ensure the connection is enabled on the certctl Application (App → Connections tab).
|
||||
|
||||
### 4. Define the groups
|
||||
|
||||
Auth0 doesn't have a first-class "Groups" concept like Okta or Keycloak — you have THREE options to model groups, each with tradeoffs:
|
||||
|
||||
**Option A: User app_metadata (simplest, recommended for dev tier).**
|
||||
|
||||
Each user has an `app_metadata` JSON blob you can set via the Management API, the dashboard, or a post-registration script. Stick the groups in there:
|
||||
|
||||
```json
|
||||
{
|
||||
"groups": ["certctl-engineers"]
|
||||
}
|
||||
```
|
||||
|
||||
In the Auth0 dashboard, **User Management → Users → <user> → app_metadata**: paste the JSON above and Save.
|
||||
|
||||
**Option B: Auth0 Authorization Extension (paid plans, recommended for production).**
|
||||
|
||||
Install the Authorization Extension from **Marketplace → Extensions → Authorization**. It adds a first-class "Groups" concept with UI for assignment + nested groups. Read the extension's docs; it emits groups under `<NS>/groups` automatically once enabled.
|
||||
|
||||
**Option C: Roles + Permissions (Auth0's RBAC primitive).**
|
||||
|
||||
Use **User Management → Roles** to define roles like `certctl-engineer` + `certctl-viewer`. Assign roles to users. Have your Action emit the role names under the namespaced `<NS>/groups` claim (a bare `groups` key would be dropped, as described at the top of this runbook). This is what Auth0 documents as the canonical pattern; it's slightly heavier than Option A but more discoverable in the dashboard.
|
||||
|
||||
This runbook uses **Option A** for clarity; the Action below reads from `app_metadata.groups`.
|
||||
|
||||
### 5. Write the Action that emits the groups claim
|
||||
|
||||
**Actions → Library → Create Action → Build from scratch**:
|
||||
|
||||
- Name: `certctl-emit-groups`.
|
||||
- Trigger: **Login / Post Login**.
|
||||
- Runtime: Node 18.
|
||||
- Click **Create**.
|
||||
|
||||
Paste this code:
|
||||
|
||||
```javascript
|
||||
exports.onExecutePostLogin = async (event, api) => {
|
||||
const namespace = "https://certctl.example.com/auth/";
|
||||
const groups = (event.user.app_metadata && event.user.app_metadata.groups) || [];
|
||||
if (groups.length > 0) {
|
||||
api.idToken.setCustomClaim(namespace + "groups", groups);
|
||||
api.accessToken.setCustomClaim(namespace + "groups", groups);
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
Replace `https://certctl.example.com/auth/` with your namespace from step 1. Click **Deploy**.
|
||||
|
||||
Then bind the Action to the Login flow:
|
||||
|
||||
**Actions → Flows → Login**: drag `certctl-emit-groups` from the Custom tab into the flow, between Start and Complete. Click **Apply**.
|
||||
|
||||
### 6. Verify the claim in a test login
|
||||
|
||||
Auth0's **Authentication → Authentication Profile → Try It** button or the **Logs → Real-time Logs** page can show you the issued ID token in real time. Decode at jwt.io to confirm `<NS>/groups` is present + populated.
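If you would rather not paste a token into a website, the payload can also be inspected locally. The sketch below is a minimal Python helper (the name `decode_jwt_payload` is hypothetical) that base64url-decodes the middle segment of a JWT. It does NOT verify the signature, so it is suitable for eyeballing claims only.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying the signature.

    For inspection only; never use the result for an authorization decision.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Call it with the raw ID token string and check that the returned dict contains the `<NS>/groups` key with a non-empty array.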
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Auth0",
|
||||
"issuer_url": "https://<tenant>.auth0.com/",
|
||||
"client_id": "<paste-from-step-2>",
|
||||
"client_secret": "<paste-from-step-2>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "https://certctl.example.com/auth/groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"fetch_userinfo": false,
|
||||
"scopes": ["openid", "profile", "email"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
Critical:
|
||||
|
||||
- `issuer_url` includes the **trailing slash** for Auth0 (`https://<tenant>.auth0.com/`). Auth0's `iss` claim emits with the trailing slash; mismatching trips `ErrIssuerMismatch`.
|
||||
- `groups_claim_path` is the **full namespaced URL**, not the bare `groups` key. The certctl resolver treats this as a single literal lookup key against the ID token claims map (no path-walking through `/`).
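Both pitfalls can be caught before the request is sent. The following is a small pre-flight sketch in Python; `lint_auth0_provider` is a hypothetical helper, not part of certctl, and it only inspects the JSON body you are about to POST.

```python
def lint_auth0_provider(cfg):
    """Return warnings for the two Auth0 pitfalls listed above (hypothetical helper)."""
    warnings = []
    # Auth0's iss claim ends with "/", so the configured issuer must too.
    if not cfg.get("issuer_url", "").endswith("/"):
        warnings.append("issuer_url has no trailing slash")
    # The groups claim lives under a namespaced URL key, not a bare name.
    if not cfg.get("groups_claim_path", "").startswith(("http://", "https://")):
        warnings.append("groups_claim_path is not a namespaced URL key")
    return warnings
```

An empty list means neither pitfall was detected; it does not prove the rest of the provider configuration is correct.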
|
||||
|
||||
Add the group→role mappings: `certctl-engineers` → `r-operator`, etc. The mapping table maps the group VALUES (the strings inside the claim's array), not the claim path.
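A minimal Python sketch of that value-based lookup follows. It assumes the fail-closed behavior this documentation set attributes to `ErrGroupsUnmapped` (no matching group means no login); the names `roles_for` and `GroupsUnmapped` are hypothetical, and `r-viewer` is borrowed from the Keycloak-style mapping examples.

```python
class GroupsUnmapped(Exception):
    """Raised when none of the user's groups has a role mapping (illustrative)."""


def roles_for(groups, mapping):
    """Map group VALUES from the claim to role IDs; fail closed when none match."""
    roles = sorted({mapping[g] for g in groups if g in mapping})
    if not roles:
        raise GroupsUnmapped("no group in the claim maps to a role")
    return roles
```

The claim path never appears in `mapping`; only the strings inside the claim's array do.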
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end login + audit + Sessions checks are identical to Keycloak. The audit row's `details.subject` will be Auth0's user_id (e.g. `auth0|abc123…` for database users, `google-oauth2|...` for federated), stable across email changes.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**`ErrGroupsUnmapped` even though I see groups in the ID token at jwt.io.**
|
||||
|
||||
Check `groups_claim_path` exactly matches the namespaced key in the token. A common mistake: setting `groups_claim_path` to `groups` (the bare key) when the actual claim key is `https://certctl.example.com/auth/groups` (the namespaced version). The resolver's URL-shape detection is what makes the namespaced path work; if the claim path doesn't start with `http://` or `https://`, the resolver tries to walk it as a dot-separated path and fails.
|
||||
|
||||
**The `<NS>/groups` claim is missing from the ID token.**
|
||||
|
||||
- Action not bound to the Login flow: revisit step 5's "Apply" step.
|
||||
- Action skips the claim because `event.user.app_metadata.groups` is undefined or empty (the `groups.length > 0` guard in step 5): confirm the user has the metadata set.
|
||||
- Trying to set the claim under a non-namespaced key (e.g. `api.idToken.setCustomClaim("groups", groups)`): Auth0 silently drops it. Always use the namespace prefix.
|
||||
|
||||
**Auth0 returns "Service not found" or "Invalid audience".**
|
||||
|
||||
This usually means the certctl client wasn't authorized to access the userinfo endpoint, or the application's `audience` setting conflicts with the OIDC discovery doc. certctl expects the ID token's `aud` claim to equal the Application's `client_id`; confirm Auth0 is emitting tokens with `aud = <client_id>` (decode at jwt.io).
|
||||
|
||||
**Login redirects loop between Auth0 and certctl.**
|
||||
|
||||
Most often a callback-URL mismatch — Auth0's "Allowed Callback URLs" must contain the EXACT certctl callback URL including port + scheme. Wildcards aren't allowed in production.
|
||||
|
||||
**`email_verified` is `false` and certctl rejects the user.**
|
||||
|
||||
certctl doesn't currently gate on `email_verified`; the User row stores the email regardless. If your operator policy requires verified-only, add a Post Login Action that denies the login when `event.user.email_verified` is not `true`:
|
||||
|
||||
```javascript
|
||||
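exports.onExecutePostLogin = async (event, api) => {
  // Deny the login unless Auth0 has verified the user's email address.
  if (!event.user.email_verified) {
    api.access.deny("email-not-verified");
  }
};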
|
||||
```
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Same as [keycloak.md](keycloak.md#validation-checklist) with Auth0-specific values, plus:
|
||||
|
||||
- [ ] The `<NS>/groups` claim is present in the ID token (verify via jwt.io decode).
|
||||
- [ ] Removing a user's group from `app_metadata.groups` causes the next login to land on "no roles assigned".
|
||||
- [ ] The Auth0 dashboard's **Logs → Real-time Logs** shows the certctl callback completing with HTTP 302 to the dashboard.
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,144 @@
|
||||
# Authentik OIDC runbook
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This runbook wires certctl's OIDC SSO surface against [Authentik](https://goauthentik.io/), a free / open-source IdP that runs on-prem or self-hosted. Authentik shares the canonical "string-array groups claim under the `groups` key" pattern with Keycloak — the differences are in the admin console UX and the explicit "property mapping" abstraction.
|
||||
|
||||
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook only documents the Authentik-specific deltas.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**On the Authentik side:**
|
||||
|
||||
- Authentik ≥ 2024.10 (stable channel).
|
||||
- Admin access to the Authentik admin console at `https://<authentik-host>/if/admin/`.
|
||||
- Network reachability from certctl-server to `https://<authentik-host>/application/o/<application-slug>/.well-known/openid-configuration`.
|
||||
|
||||
**On the certctl side:** same as Keycloak — `CERTCTL_CONFIG_ENCRYPTION_KEY` set, an admin actor holding `auth.oidc.create` + `auth.oidc.edit`, server build ≥ v2.1.0.
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
### 1. Create the OAuth2 / OpenID Provider
|
||||
|
||||
In the Authentik admin console:
|
||||
|
||||
**Applications → Providers → Create**:
|
||||
|
||||
- Type: **OAuth2/OpenID Provider**.
|
||||
- Name: `certctl`.
|
||||
- Authorization flow: `default-provider-authorization-explicit-consent` (or `default-provider-authorization-implicit-consent` if you don't want a consent screen on every login).
|
||||
- Click **Next**.
|
||||
|
||||
Protocol settings:
|
||||
|
||||
- Client type: **Confidential**.
|
||||
- Client ID: leave the auto-generated value OR set to `certctl` for clarity.
|
||||
- Client Secret: copy the auto-generated value to a secure scratchpad — you'll paste it into certctl.
|
||||
- Redirect URIs/Origins: `https://<your-certctl-host>:8443/auth/oidc/callback` (one entry, exact match).
|
||||
- Signing Key: pick an **RSA-2048 or larger** key. Authentik defaults to ECDSA-P256 in newer versions; either is fine — both are in certctl's allow-list.
|
||||
- Subject mode: **Based on the User's hashed ID** (default; emits a stable opaque `sub`).
|
||||
- Include claims in id_token: **on**.
|
||||
- Click **Finish**.
|
||||
|
||||
### 2. Create the Application
|
||||
|
||||
Applications are how Authentik attaches a Provider to users + groups + policies.
|
||||
|
||||
**Applications → Applications → Create**:
|
||||
|
||||
- Name: `certctl`.
|
||||
- Slug: `certctl` (becomes part of the issuer URL: `https://<authentik-host>/application/o/certctl/`).
|
||||
- Provider: pick the `certctl` provider you just created.
|
||||
- Policy engine mode: **any** (default).
|
||||
- Click **Create**.
|
||||
|
||||
### 3. Configure the groups property mapping
|
||||
|
||||
Authentik emits group claims via "property mappings" — explicit objects rather than Keycloak's mapper-on-the-client model.
|
||||
|
||||
By default, the **Authentik default-OAuth Mapping: Proxy outpost** scope already includes the user's groups under a `groups` claim (string-array, matches what certctl expects). To verify or override:
|
||||
|
||||
**Customization → Property Mappings → Filter "Scope Mapping"**:
|
||||
|
||||
- Find or create one named `groups` with scope `groups` and expression:
|
||||
```python
|
||||
return [group.name for group in user.ak_groups.all()]
|
||||
```
|
||||
- Description: `Emits the user's group names as a string-array claim`.
|
||||
|
||||
Then on the **Provider → certctl → Edit → Advanced protocol settings**, ensure **Scopes** includes `groups` (and `profile` and `email` if you want richer User records on the certctl side).
|
||||
|
||||
### 4. Create the groups + assign users
|
||||
|
||||
**Directory → Groups → Create**:
|
||||
|
||||
- Name: `certctl-engineers`. Repeat for `certctl-viewers` (and optionally `certctl-admins`).
|
||||
|
||||
**Directory → Users → <user> → Edit → Groups**: pick the appropriate `certctl-*` group(s) for each user.
|
||||
|
||||
### 5. (Optional) Bind the application to specific groups
|
||||
|
||||
If you want certctl to reject login attempts from users outside the `certctl-*` groups at the IdP layer (defense-in-depth on top of certctl's fail-closed `ErrGroupsUnmapped`):
|
||||
|
||||
**Applications → certctl → Policy / Group / User Bindings → Create binding**:
|
||||
|
||||
- Type: **Group**.
|
||||
- Group: pick the union of `certctl-*` groups you want to allow.
|
||||
- Enabled: on.
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
Identical to Keycloak — only the issuer URL differs:
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Authentik",
|
||||
"issuer_url": "https://authentik.example.com/application/o/certctl/",
|
||||
"client_id": "<paste-the-client-id>",
|
||||
"client_secret": "<paste-the-client-secret>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"fetch_userinfo": false,
|
||||
"scopes": ["openid", "profile", "email", "groups"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
Authentik emits `groups` in the ID token by default once the property mapping is configured. The `scopes` array MUST include `groups` to trigger the claim emission — Authentik is stricter than Keycloak about scope-gating claims.
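For context, the `scopes` array reaches the IdP as the space-separated `scope` parameter of the OAuth 2.0 authorization request. The Python sketch below shows that shape; it is not certctl's code, the helper name `build_authorize_url` is hypothetical, and the `authorization_endpoint` value should be read from your Authentik discovery document rather than guessed.

```python
from urllib.parse import urlencode


def build_authorize_url(authorization_endpoint, client_id, redirect_uri, scopes, state):
    """Build an authorization-code request URL with space-joined scopes."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # e.g. "openid profile email groups"
        "state": state,
    })
    return f"{authorization_endpoint}?{query}"
```

If `groups` is missing from the list passed in, it is missing from the `scope` parameter too, and Authentik has no reason to evaluate the groups scope mapping.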
|
||||
|
||||
Add the group→role mappings the same way as Keycloak: `certctl-engineers` → `r-operator`, `certctl-viewers` → `r-viewer`.
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end login + audit + Sessions checks are identical to Keycloak.
|
||||
|
||||
**Authentik-specific check:** the audit row's `details.subject` will be Authentik's hashed user ID (a 64-char hex), not the username. This is intentional and correct — the `sub` claim must be opaque + stable across user-attribute changes.
|
||||
|
||||
**JWKS-rotation drill:** Authentik rotates signing keys via **System → Tokens & App Passwords → Certificates** (rename of "Crypto" in newer versions). Add a new RSA-2048 cert, switch the Provider's Signing Key to the new one, then click "Refresh discovery cache" in certctl's GUI to evict the cache.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Provider creation fails with "could not load discovery document".**
|
||||
The issuer URL needs the trailing slash for some Authentik versions: `https://authentik.example.com/application/o/certctl/` (slash after the slug). Without the slash, Authentik answers with a 301 redirect; Go's HTTP client follows it, but discovery parsing then fails on the redirect target.
|
||||
|
||||
**Login completes but user lands on "no roles assigned".**
|
||||
Decode the ID token at jwt.io against Authentik's JWKS. Check whether the `groups` claim is present + non-empty. If empty, the property mapping isn't wired — go back to step 3.
|
||||
|
||||
**`groups` claim missing entirely.**
|
||||
Authentik gates the `groups` claim behind the `groups` scope. Verify:
|
||||
- The certctl OIDCProvider config has `"scopes": ["openid", "profile", "email", "groups"]`.
|
||||
- The Authentik provider's "Scopes" list includes `groups`.
|
||||
|
||||
**Authentik emits the user's email as the `sub` claim.**
|
||||
Some Authentik configurations use **Subject mode: Based on the User's email** which surfaces the email as `sub`. This works but tightly couples certctl's User table to email mutability; recommend switching to "hashed ID" mode for new deployments. Existing User rows in certctl's `users` table will have email-shaped `oidc_subject` columns; that's fine and stable as long as the user's email never changes.
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Same as [keycloak.md](keycloak.md#validation-checklist), with Authentik-specific values for issuer URL + group names + signing-key rotation steps.
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,207 @@
|
||||
# Microsoft Entra ID (Azure AD) OIDC runbook
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This runbook wires certctl's OIDC SSO surface against [Microsoft Entra ID](https://learn.microsoft.com/entra/), formerly Azure AD. Entra ID is Microsoft's commercial cloud IdP; it's the default IdP for any organization on Microsoft 365 / Azure.
|
||||
|
||||
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook only documents the Entra-ID-specific deltas.
|
||||
|
||||
## The big Entra ID quirk: groups claim emits OBJECT IDs, not names
|
||||
|
||||
Entra ID's `groups` claim emits a JSON array of **group object IDs (GUIDs)**, not human-readable names. A user in `Engineering Group` and `Cert Operators` will see something like:
|
||||
|
||||
```json
|
||||
{
|
||||
"groups": [
|
||||
"8b9b1faa-4e83-471e-8b00-7d99c3e2a5f1",
|
||||
"f00cf1e2-2db1-4cdf-a1ba-1234567890ab"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**You must configure your certctl group→role mappings against these GUIDs**, not against `Engineering Group` or `Cert Operators`. There are workarounds (cloud-only group display names + the optional claims path; see the alternative below) but the GUID-based approach is the only one that works reliably across all Entra ID configurations.
|
||||
|
||||
This is by design at Microsoft: group display names are mutable and not guaranteed to be unique within a tenant, while object IDs are immutable and globally unique. Operators on Microsoft 365 / Azure deployments are accustomed to managing access by GUID.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**On the Entra ID side:**
|
||||
|
||||
- A Microsoft 365 tenant or standalone Azure AD tenant. Free Azure AD tier is sufficient; paid tiers (P1/P2) unlock conditional access + SCIM provisioning + risk-based auth, none of which are required for the basic OIDC integration.
|
||||
- Application Administrator or Global Administrator role.
|
||||
- Network reachability from certctl-server to `https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration`.
|
||||
|
||||
**On the certctl side:** same as Keycloak.
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
### 1. Register the application
|
||||
|
||||
In the [Entra ID admin center](https://entra.microsoft.com/):
|
||||
|
||||
**Applications → App registrations → New registration**:
|
||||
|
||||
- Name: `certctl`.
|
||||
- Supported account types: **Accounts in this organizational directory only** (single-tenant; matches the typical operator use case).
|
||||
- Redirect URI: **Web** + `https://<your-certctl-host>:8443/auth/oidc/callback`.
|
||||
- Click **Register**.
|
||||
|
||||
On the saved app's **Overview** page, copy:
|
||||
|
||||
- **Application (client) ID** → certctl's `client_id`.
|
||||
- **Directory (tenant) ID** → goes into the issuer URL.
|
||||
|
||||
### 2. Create a client secret
|
||||
|
||||
**App → Certificates & secrets → Client secrets → New client secret**:
|
||||
|
||||
- Description: `certctl-server`.
|
||||
- Expires: 6 months / 12 months / 24 months — your choice. Set a calendar reminder; Entra ID does NOT auto-rotate secrets.
|
||||
- Click **Add**.
|
||||
|
||||
Copy the **Value** column immediately — it's shown ONCE on creation. The certctl provider's `client_secret` field gets this value.
|
||||
|
||||
(Production hardening: prefer **Certificates** over secrets for client authentication; certctl currently supports `client_secret_post` only, but a follow-on bundle can add `private_key_jwt` for cert-based client auth. Track this if you have a hard requirement against shared secrets.)
|
||||
|
||||
### 3. Add the `groups` claim to the token
|
||||
|
||||
**App → Token configuration → Add groups claim**:
|
||||
|
||||
- Pick **Security groups** (covers most operators) OR **Groups assigned to the application** (more granular but requires Premium).
|
||||
- Token type: **ID token** + **Access token** (both, so userinfo fallback works).
|
||||
- Customize emit format for ID/access: leave as **Group ID** (default; this is the GUID-based path the runbook is structured around).
|
||||
- Click **Save**.
|
||||
|
||||
If you instead want display names in the claim (only works for cloud-only groups; on-prem-synced groups continue to emit GUIDs regardless):
|
||||
|
||||
- Customize emit format → **Cloud-only group display names**.
|
||||
- BUT — note this works only for groups created in Entra ID itself, not groups synced from on-prem AD. Hybrid environments will have inconsistent claims.
|
||||
|
||||
### 4. Add the optional `email` and `profile` claims
|
||||
|
||||
By default Entra ID's ID token does NOT include `email` — Microsoft considers email part of the "OIDC profile" but only emits it under specific conditions. To force emission:
|
||||
|
||||
**App → Token configuration → Add optional claim → ID token → email**.
|
||||
|
||||
You may also want `family_name`, `given_name`, `preferred_username` for richer User records on the certctl side.
|
||||
|
||||
### 5. Grant the API permissions
|
||||
|
||||
**App → API permissions**:
|
||||
|
||||
- Microsoft Graph → Delegated permissions → ensure these are granted (most are default):
|
||||
- `openid`
|
||||
- `profile`
|
||||
- `email`
|
||||
- `offline_access` (optional; for refresh tokens — certctl doesn't use them currently).
|
||||
- Click **Grant admin consent** if your tenant requires it.
|
||||
|
||||
### 6. (Optional) Restrict who can sign in
|
||||
|
||||
By default any user in your tenant can attempt to sign in to the app. To restrict to specific users / groups:
|
||||
|
||||
**Enterprise applications → certctl → Properties → Assignment required: Yes**.
|
||||
Then **Users and groups → Add user/group** and pick the `cert-engineers` / `cert-viewers` Entra ID groups.
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Entra ID",
|
||||
"issuer_url": "https://login.microsoftonline.com/<tenant-id>/v2.0",
|
||||
"client_id": "<application-id>",
|
||||
"client_secret": "<client-secret-value>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"fetch_userinfo": false,
|
||||
"scopes": ["openid", "profile", "email"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `issuer_url` MUST include `/v2.0` at the end for the v2.0 endpoint. The v1.0 endpoint emits tokens with a different `iss` shape and is NOT supported by certctl. The discovery doc at `https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration` confirms the right path.
|
||||
- `<tenant-id>` is the Directory (tenant) ID GUID from step 1.
|
||||
|
||||
### Add the group→role mappings (GUID-keyed)
|
||||
|
||||
Get the GUIDs of your engineering / viewer groups:
|
||||
|
||||
**Entra ID → Groups → All groups → <group> → Overview → Object ID**.
|
||||
|
||||
Then in certctl:
|
||||
|
||||
```bash
|
||||
# Engineering group → r-operator
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"provider_id": "<provider-id>",
|
||||
"group_name": "8b9b1faa-4e83-471e-8b00-7d99c3e2a5f1",
|
||||
"role_id": "r-operator"
|
||||
}'
|
||||
```
|
||||
|
||||
Repeat for every group you want to map. **Document the GUID-to-name mapping in your operator runbook** — without it, the next operator looking at certctl's mappings page sees a wall of GUIDs with no way to know which is which. Consider naming the mapping descriptively if your group-mapping schema supports it (v2.1.0 doesn't yet — group-mapping descriptions are a parking-lot item for a follow-on release).
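Until descriptions exist in the schema, a small side table kept next to your runbook is enough. The Python sketch below pairs GUID-keyed mappings with display names; the table contents reuse the example GUIDs from this runbook (which GUID belongs to which group is illustrative), and `annotate_mappings` is a hypothetical helper, not a certctl API. The `group_name` and `role_id` fields match the request body shown above.

```python
# Operator-maintained lookup table (illustrative values from this runbook).
GROUP_NAMES = {
    "8b9b1faa-4e83-471e-8b00-7d99c3e2a5f1": "Engineering Group",
    "f00cf1e2-2db1-4cdf-a1ba-1234567890ab": "Cert Operators",
}


def annotate_mappings(mappings, names=GROUP_NAMES):
    """Return printable lines pairing each GUID-keyed mapping with a display name."""
    lines = []
    for m in mappings:
        # GUIDs are compared lower-case so copy/paste casing does not matter.
        label = names.get(m["group_name"].lower(), "UNDOCUMENTED")
        lines.append(f'{m["group_name"]}  {label}  ->  {m["role_id"]}')
    return lines
```

Any line labeled `UNDOCUMENTED` is a mapping the next operator will not be able to interpret, so treat it as a documentation gap to close.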
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end login + audit + Sessions checks are identical to Keycloak.
|
||||
|
||||
**Entra-ID-specific:** the audit row's `details.subject` will be Microsoft's `oid` claim (a GUID, the user's object ID), stable across UPN / email changes. The certctl `users` table's `oidc_subject` column holds this GUID.
|
||||
|
||||
**JWKS-rotation:** Microsoft auto-rotates signing keys on a documented schedule (every ~6 weeks). The discovery doc + JWKS endpoint always serve the union of active + recently-active keys, so in-flight logins continue to validate. No manual operator action needed in steady state. If you suspect a stuck cache after a Microsoft-side rotation, click "Refresh discovery cache" in the certctl GUI to evict.
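The reason a stale cache matters is that token validation selects the signing key by the `kid` in the JWT header. The Python sketch below shows one common client-side pattern, refetching the key set once when the `kid` is unknown. Whether certctl refetches automatically is an assumption made for illustration; `find_signing_key` and the `fetch_jwks` callback are hypothetical names.

```python
def find_signing_key(kid, cached_jwks, fetch_jwks):
    """Return (key, jwks) for `kid`, refetching the key set once on a cache miss.

    `fetch_jwks` is a caller-supplied function that downloads the current JWKS.
    Illustrative only; not certctl's actual cache logic.
    """
    for jwks in (cached_jwks, None):
        if jwks is None:
            jwks = fetch_jwks()  # cache miss: the IdP may have rotated keys
        for key in jwks.get("keys", []):
            if key.get("kid") == kid:
                return key, jwks
    raise KeyError(f"no JWKS entry with kid={kid!r}")
```

The second tuple element is the key set that actually contained the key, so a caller can replace its cache when a refetch happened.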
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Login completes; ID token contains a `hasgroups: true` claim instead of `groups`.**
|
||||
|
||||
Entra ID emits an overage marker when a user is in too many groups for the claim to fit in the token. For JWTs (ID and access tokens) the documented limit is 200 groups; the 150 limit applies to SAML tokens. Microsoft omits the `groups` claim and tells the consumer to use Microsoft Graph to look up the full list. certctl does NOT currently support the Graph fallback path (it's a follow-on bundle item).
|
||||
|
||||
Workarounds:
|
||||
|
||||
- Reduce the user's group membership to 200 or fewer (rarely practical in large tenants).
|
||||
- Restrict the `groups` claim to "Groups assigned to the application" (Token configuration step 3 above) instead of "Security groups". The "assigned" set is bounded by the app's user assignments and stays under the limit.
|
||||
- Use Entra ID's optional `wids` (well-known IDs) claim if you only care about admin/non-admin distinction; certctl can be configured against `wids` by setting `groups_claim_path` accordingly.
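To tell overage apart from a claim that was never configured, inspect the decoded ID token claims. The Python sketch below checks for the `hasgroups` marker and for the `_claim_names` overage indicator that Entra ID can emit in place of `groups`; `diagnose_groups_claim` is a hypothetical helper, not part of certctl.

```python
def diagnose_groups_claim(claims):
    """Return None when `groups` is present, else a short diagnosis string."""
    if "groups" in claims:
        return None
    if "hasgroups" in claims:
        return "overage: hasgroups marker present, full list is only in Microsoft Graph"
    if "groups" in claims.get("_claim_names", {}):
        return "overage: _claim_names indicator present, full list is only in Microsoft Graph"
    return "no groups claim and no overage marker: check Token configuration (step 3)"
```

An `overage` diagnosis points at the workarounds above; the last diagnosis points at the "`groups` claim missing entirely" case below.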
|
||||
|
||||
**`groups` claim missing entirely.**
|
||||
|
||||
Step 3 wasn't completed — Entra ID does NOT emit `groups` by default. Add the claim via Token configuration before users will see it.
|
||||
|
||||
**`ErrIssuerMismatch` even though the `tid` in the token matches.**
|
||||
|
||||
The v2.0 endpoint emits `iss = https://login.microsoftonline.com/<tenant-id>/v2.0` (no trailing slash). The v1.0 endpoint emits `iss = https://sts.windows.net/<tenant-id>/`. Confirm certctl's `issuer_url` matches v2.0 exactly — no trailing slash, includes `/v2.0`.
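The two issuer shapes are easy to confuse by eye. A minimal sketch of a pre-flight check follows; `classify_entra_issuer` is a hypothetical helper, not part of certctl, and it assumes the tenant ID is written as a 36-character GUID.

```python
import re

# Assumption: the tenant ID is a 36-character GUID.
_GUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
_V2 = re.compile(rf"https://login\.microsoftonline\.com/{_GUID}/v2\.0")
_V1 = re.compile(rf"https://sts\.windows\.net/{_GUID}/")


def classify_entra_issuer(issuer_url: str) -> str:
    """Label an issuer URL as v2.0, v1.0, or a near-miss."""
    if _V2.fullmatch(issuer_url):
        return "v2.0"
    if _V1.fullmatch(issuer_url):
        return "v1.0"
    # The most common mistake: a v2.0 URL with a trailing slash appended.
    if issuer_url.endswith("/") and _V2.fullmatch(issuer_url.rstrip("/")):
        return "v2.0 with trailing slash"
    return "unrecognized"
```

Only the `"v2.0"` result is a value you would want in certctl's `issuer_url`.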
|
||||
|
||||
**On-prem-synced groups emit GUIDs even when "Cloud-only display names" is selected.**
|
||||
|
||||
Expected behavior — Microsoft only emits display names for groups created in Entra ID itself (cloud-only). On-prem-synced groups always emit object IDs. The hybrid case is unfixable from the IdP side; either map against GUIDs (recommended) or migrate the relevant groups to cloud-only.
|
||||
|
||||
**The `email` claim is empty even though the user has a primary email.**
|
||||
|
||||
Entra ID's `email` claim only populates when:
|
||||
1. The user has a "Primary email" set on their Entra ID profile (often blank for B2B guest users).
|
||||
2. The optional claim was added in step 4.
|
||||
|
||||
For B2B guests, the `preferred_username` claim usually carries the email-shaped login. You can configure certctl to use `preferred_username` as the user's display name fallback, but the `User.Email` column will remain blank; that's expected for guests.
|
||||
|
||||
**Conditional Access policies blocking the login.**
|
||||
|
||||
If your tenant has Conditional Access requiring MFA for new applications, certctl will see the user redirected through the MFA challenge. This works transparently — the certctl service doesn't care that MFA was performed; it only validates the resulting ID token. If MFA is failing for the user, debug at the Entra ID side (Sign-in logs).
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Same as [keycloak.md](keycloak.md#validation-checklist), with these additions:
|
||||
|
||||
- [ ] The ID token's `groups` claim is a string-array of GUIDs (decode at jwt.io).
|
||||
- [ ] Each certctl group-mapping uses the GUID, not a human-readable name.
|
||||
- [ ] A user with >200 groups successfully logs in (or the operator has documented the limitation + workaround in their internal runbook).
|
||||
- [ ] The Entra ID **Sign-in logs** view shows the certctl login event with status "Success".
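If you would rather not paste a production token into a website, the first checklist item can be checked locally. This is a sketch under stated assumptions: both function names are hypothetical, and the decoder does NOT verify the signature, so use it for inspection only.

```python
import base64
import json
import re

_GUID = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def decode_jwt_payload(token: str) -> dict:
    """Decode the claims of a compact JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part compact JWT")
    payload = parts[1]
    # base64url segments drop their padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def groups_are_guids(claims: dict) -> bool:
    """True when `groups` is a non-empty list of GUID-shaped strings."""
    groups = claims.get("groups")
    if not isinstance(groups, list) or not groups:
        return False
    return all(isinstance(g, str) and _GUID.fullmatch(g) for g in groups)
```

A `False` result on a token that does carry `groups` usually means the claim is emitting display names, so the certctl mappings need to target whatever the token actually contains.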
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,186 @@
|
||||
# Google Workspace OIDC runbook (broker via Keycloak)
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This runbook wires certctl's OIDC SSO surface against [Google Workspace](https://workspace.google.com/) (formerly G Suite). Google's OIDC implementation has a well-known limitation that makes it unsuitable for direct integration with certctl: **the ID token does not emit a groups claim**, so there is no way for certctl's `ErrGroupsUnmapped` fail-closed contract to resolve a user's role assignment.
|
||||
|
||||
The recommended pattern is to **broker Google Workspace through Keycloak (or Authentik)** as a federated identity provider. The end-user still signs in with their Google account, but certctl talks to Keycloak — which DOES emit groups — instead of talking to Google directly.
|
||||
|
||||
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook builds on top of it.
|
||||
|
||||
## The Google Workspace quirk in detail
|
||||
|
||||
**What Google emits in an ID token:** `iss`, `aud`, `sub`, `azp`, `exp`, `iat`, `email`, `email_verified`, `name`, `picture`, `given_name`, `family_name`, `locale`, `hd` (hosted domain). That's it.
|
||||
|
||||
**What it does NOT emit:** `groups`, `roles`, `permissions`, or any indicator of the user's Google Workspace organizational unit / group membership.
|
||||
|
||||
There is a **Cloud Identity Groups API** at `https://cloudidentity.googleapis.com/v1/groups/-/memberships:searchTransitiveGroups` that lets a privileged service account look up a user's groups, but:
|
||||
|
||||
1. It requires a service account with domain-wide delegation, which is a major security surface to grant to certctl.
|
||||
2. It's a separate REST call after the OIDC flow, not a claim — certctl's group-claim resolver is path-shape, not API-shape.
|
||||
3. The latency budget of an extra API call per login is non-trivial in steady state.
|
||||
|
||||
For these reasons, the broker pattern is strongly preferred. If you absolutely cannot deploy a broker, see "Direct integration without groups" at the bottom of this runbook for a degraded mode where every Google-authenticated user gets a single fixed role.
|
||||
|
||||
## Architecture: broker pattern
|
||||
|
||||
```
|
||||
end user → Google Workspace login → Keycloak (federated IdP) → certctl
|
||||
↑
|
||||
│
|
||||
adds groups claim from Keycloak's group store
|
||||
(NOT from Google)
|
||||
```
|
||||
|
||||
In this topology:
|
||||
|
||||
- The end user's authentication credentials live at Google.
|
||||
- The user's group / role assignments live at Keycloak (manually or via SCIM provisioning from Google).
|
||||
- certctl talks ONLY to Keycloak. From certctl's perspective this is identical to the [keycloak.md](keycloak.md) runbook.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- A running Keycloak instance with a realm dedicated to certctl. Read [keycloak.md](keycloak.md) and complete that runbook FIRST against a local-only test user. Verify end-to-end OIDC works against Keycloak before adding Google as a federated provider.
|
||||
- A Google Workspace tenant where you have Super Admin access OR can ask your Workspace admin to create OAuth credentials.
|
||||
- A Google Cloud project (free; managed in the Google Cloud Console, which is separate from the Workspace Admin console).
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
### Step 1: create a Google OAuth client
|
||||
|
||||
In the Google Cloud Console (`https://console.cloud.google.com/`):
|
||||
|
||||
**APIs & Services → OAuth consent screen → Configure**:
|
||||
|
||||
- User Type: **Internal** (restricts to your Workspace domain) OR **External** (any Google account; usually NOT what you want for an internal cert-management tool).
|
||||
- App name: `certctl SSO via Keycloak`.
|
||||
- User support email: your team's address.
|
||||
- Authorized domains: add the domain Keycloak runs on.
|
||||
- Save.
|
||||
|
||||
**APIs & Services → Credentials → Create Credentials → OAuth client ID**:
|
||||
|
||||
- Application type: **Web application**.
|
||||
- Name: `certctl-via-keycloak`.
|
||||
- Authorized redirect URIs: `https://<keycloak-host>/realms/<realm-name>/broker/google/endpoint` — this is Keycloak's default federated-IdP callback URL. Get the exact URL from Keycloak in step 2 below.
|
||||
- Click **Create**.
|
||||
|
||||
Copy the **Client ID** and **Client secret**.
|
||||
|
||||
### Step 2: add Google as a federated identity provider in Keycloak
|
||||
|
||||
In the Keycloak admin console (`https://<keycloak-host>/admin/`):
|
||||
|
||||
**Realm → Identity providers → Add provider → Google**:
|
||||
|
||||
- Alias: `google` (becomes part of the broker URL).
|
||||
- Display name: `Google Workspace`.
|
||||
- Client ID: paste from step 1.
|
||||
- Client secret: paste from step 1.
|
||||
- Default scopes: `openid profile email`.
|
||||
- Hosted Domain: your Workspace domain (e.g. `example.com`); restricts to your tenant.
|
||||
- Sync mode: **Force** (rewrites the user's first/last name/email from Google on every login; the alternative `Import` only writes on first login).
|
||||
- Trust email: **on** (Google verifies emails; certctl-Keycloak chain inherits the trust).
|
||||
- Click **Save**.
|
||||
|
||||
The **Redirect URI** field at the top of the saved provider's page shows the exact URL you should have entered in Google's console at step 1. Re-verify match.
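The callback URL follows a fixed pattern, so you can compute the expected value before comparing it with what Keycloak displays. A minimal sketch, where `broker_redirect_uri` is a hypothetical helper and the host, realm, and alias are placeholders you supply:

```python
from urllib.parse import quote


def broker_redirect_uri(keycloak_host: str, realm: str, alias: str = "google") -> str:
    """Build the broker callback URL in the shape shown in step 1."""
    # Realm and alias are path segments, so percent-encode anything unusual.
    realm_part = quote(realm, safe="")
    alias_part = quote(alias, safe="")
    return f"https://{keycloak_host}/realms/{realm_part}/broker/{alias_part}/endpoint"
```

Treat the value Keycloak shows as authoritative; if it differs from the computed one, paste Keycloak's value into Google's console.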
|
||||
|
||||
### Step 3: configure group assignment in Keycloak
|
||||
|
||||
This is the load-bearing step — we're explicitly NOT trusting Google for groups, so Keycloak has to provide them.
|
||||
|
||||
**Option A: Manual group assignment in Keycloak.**
|
||||
|
||||
Federated users from Google appear in **Users** in Keycloak after their first login. You assign them to `certctl-engineers` / `certctl-viewers` / etc. groups in Keycloak's UI manually. Pro: simple. Con: doesn't scale; new hires can't log in until an operator adds them to a group.
|
||||
|
||||
**Option B: Default groups via "Default Groups" realm config.**
|
||||
|
||||
**Realm settings → User registration → Default Groups → Add**: pick the lowest-privilege group (e.g. `certctl-viewers`). Every new federated user lands here automatically; operators promote individual users to higher groups as needed.
|
||||
|
||||
**Option C: Mapper that derives groups from Google claims.**
|
||||
|
||||
If your Google Workspace has organizational units that align with your role split, you can add a Keycloak **Identity Provider Mapper** that maps `hd` (hosted domain) or a custom Google directory custom-schema field to a Keycloak group. This is moderately fragile and Workspace-version-dependent; recommend B for most operators.
|
||||
|
||||
**Option D: SCIM provisioning from Google to Keycloak.**
|
||||
|
||||
Google Workspace can SCIM-push group memberships to Keycloak via the SCIM-for-Google-Cloud-Identity feature. Heavyweight; recommend only if you already have SCIM infrastructure.
|
||||
|
||||
This runbook uses **Option B** (default group) for clarity.
|
||||
|
||||
### Step 4: verify the broker flow at Keycloak alone
|
||||
|
||||
Before bringing certctl into the picture:
|
||||
|
||||
1. Log out of Keycloak's admin console.
|
||||
2. Hit `https://<keycloak-host>/realms/<realm-name>/account` in an incognito window.
|
||||
3. Click "Sign in" — Keycloak's login page should now show **Sign in with Google Workspace** as a button below the local login form.
|
||||
4. Click it; authenticate via Google; you should land on Keycloak's account page.
|
||||
5. Back in the admin console, the user appears under **Users**. Confirm they're in the default group (Option B).
|
||||
|
||||
Only proceed to step 5 when Keycloak alone works end to end.
|
||||
|
||||
### Step 5: configure certctl against Keycloak (NOT against Google)
|
||||
|
||||
Follow the [keycloak.md](keycloak.md) runbook. Use the realm + client + groups configuration you set up there. The `OIDCProvider.issuer_url` is `https://<keycloak-host>/realms/<realm-name>` — Keycloak's URL, not Google's.
|
||||
|
||||
When the user clicks "Sign in with Keycloak" on certctl's login page, the browser flow is:
|
||||
|
||||
1. certctl → Keycloak authorize endpoint.
|
||||
2. Keycloak's login page shows **Sign in with Google Workspace** + the local login form. User clicks Google.
|
||||
3. Keycloak → Google authorize endpoint. User authenticates at Google.
|
||||
4. Google → Keycloak callback (`/broker/google/endpoint`). Keycloak resolves the user, assigns the default group.
|
||||
5. Keycloak → certctl callback. certctl sees a normal Keycloak ID token with the `groups` claim populated by Keycloak.
|
||||
6. certctl mints the session.
|
||||
|
||||
End to end, after starting the login at certctl, the user clicks twice more (Keycloak's "Sign in with Google Workspace" button, then Google's consent / login). Subsequent logins skip the consent screen if Google's session is fresh.
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end login + audit + Sessions checks are identical to Keycloak. The key Google-Workspace-specific check:
|
||||
|
||||
- The `users.oidc_subject` column in certctl's database should contain the Keycloak-side stable subject (a UUID), NOT the Google subject. Decode the certctl-side ID token and confirm `iss` is Keycloak's URL, `sub` is the Keycloak UUID. Don't confuse the certctl ID token with Google's ID token (which lives one hop upstream and certctl never sees directly).
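The check above can be expressed as a small function over the decoded claims. This is a sketch, not certctl code: `check_brokered_claims` is a hypothetical name, and it assumes the Keycloak subject parses as a UUID while a Google subject does not.

```python
import uuid


def check_brokered_claims(claims: dict, keycloak_issuer: str) -> list:
    """Return a list of problems; an empty list means the token looks brokered."""
    problems = []
    if claims.get("iss") != keycloak_issuer:
        problems.append("iss is not the Keycloak realm URL")
    try:
        # Assumption: Keycloak subjects are UUIDs; Google's are numeric strings.
        uuid.UUID(str(claims.get("sub")))
    except ValueError:
        problems.append("sub is not a UUID")
    return problems
```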
|
||||
|
||||
## Direct integration without groups (NOT RECOMMENDED)
|
||||
|
||||
If broker deployment is impossible:
|
||||
|
||||
1. Configure certctl with `issuer_url = https://accounts.google.com`, `client_id` + `client_secret` from your Google OAuth client (with redirect URI pointed at certctl directly).
|
||||
2. Add a single group→role mapping. This is where the mode breaks: Google sends no group claim to map, and certctl rejects an empty `group_name`, because the fail-closed contract requires a real group claim to match.
|
||||
|
||||
The actual workaround is to manually add EVERY operator's email to a per-email mapping, OR to add a custom claim emitter at a thin proxy in front of Google. Both are hacks; the broker pattern is strictly better. We document the constraint here so future operators don't burn cycles trying to make it work.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Federated Google login completes at Keycloak but the user lands on "no roles assigned" at certctl.**
|
||||
|
||||
The user authenticated through Google → Keycloak successfully but Keycloak didn't assign them a group (Option A wasn't completed for that user, or Option B's default group isn't mapped on the certctl side). Check:
|
||||
|
||||
- Keycloak → Users → <user> → Groups: is the user in any `certctl-*` group?
|
||||
- certctl → Auth → OIDC Providers → Keycloak → Group → role mappings: is that group mapped?
|
||||
|
||||
**Google login fails with "redirect_uri_mismatch".**
|
||||
|
||||
The Google OAuth client's authorized redirect URI doesn't match Keycloak's broker callback URL exactly. Re-fetch the URL from Keycloak (Identity Providers → Google → Redirect URI field) and paste it verbatim into Google's console.
|
||||
|
||||
**Google auto-closes the consent prompt and returns "access_denied".**
|
||||
|
||||
Workspace admin policies may block third-party app access. Either the Google OAuth client wasn't approved by the Workspace admin (Google Workspace Admin Console → Security → API controls → Trusted apps), or the OAuth consent screen is configured for "External" but the user is from a different Workspace. Switch to "Internal" if everyone signing in is in the same Workspace.
|
||||
|
||||
**Keycloak log shows "Federated identity returned no email claim".**
|
||||
|
||||
The Default Scopes on the Keycloak Identity Provider config are missing `email` (they should be `openid profile email`). Re-add `email` there.
|
||||
|
||||
**Sign-out from certctl doesn't sign the user out of Google.**
|
||||
|
||||
Expected. certctl revokes its own session; Google's session continues independently. If the user needs to fully log out, they sign out at https://accounts.google.com/Logout. The certctl + Keycloak chain is the standard "single sign-on, separate sign-outs" model.
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Same as [keycloak.md](keycloak.md#validation-checklist), with these additions:
|
||||
|
||||
- [ ] Google → Keycloak federation works without certctl in the loop (step 4 above passes).
|
||||
- [ ] A first-time Google sign-in lands the user in the Keycloak default group (or whatever Option you picked).
|
||||
- [ ] The certctl audit row's `details.subject` is the Keycloak UUID, NOT Google's `sub` (which would be a Google account ID).
|
||||
- [ ] Removing a user from Google Workspace causes their NEXT certctl session-validate to fail (after their existing session expires) — verify with a deactivated test user.
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,55 @@
|
||||
# OIDC / SSO runbooks — per-IdP setup guides
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This is the index for the per-IdP setup runbooks for certctl's OIDC SSO surface. Pick the runbook that matches your identity provider; each one walks you through the IdP-side configuration, the certctl-side configuration, end-to-end verification, and the most common troubleshooting paths.
|
||||
|
||||
For the threat model behind certctl's OIDC implementation, see [`auth-threat-model.md`](../auth-threat-model.md). For the RBAC primitive that group→role mappings target, see [`rbac.md`](../rbac.md). For the underlying protocol details (PKCE, state, nonce, JWKS rotation, fail-closed semantics), see the OIDC service docstring at [`internal/auth/oidc/service.go`](../../../internal/auth/oidc/service.go).
|
||||
|
||||
## Choose your runbook
|
||||
|
||||
| IdP | Tier | Group claim shape | Quirks | Runbook |
|
||||
|---|---|---|---|---|
|
||||
| Keycloak | Free / open-source | `string-array` against `groups` | None — canonical reference | [keycloak.md](keycloak.md) |
|
||||
| Authentik | Free / open-source | `string-array` against `groups` | Property-mapping driven; explicit scope claim | [authentik.md](authentik.md) |
|
||||
| Okta | Commercial (free dev tier) | `string-array` against `groups` | Group-filter regex on the claim definition | [okta.md](okta.md) |
|
||||
| Auth0 | Commercial (free dev tier) | `string-array` against namespaced URL | Custom claims must use a namespaced key (e.g. `https://your-namespace/groups`) and are emitted via an Action | [auth0.md](auth0.md) |
|
||||
| Azure AD / Entra ID | Commercial | `string-array` of GROUP OBJECT IDs (GUIDs), not names | Mappings must target object IDs, not human-readable names | [azure-ad.md](azure-ad.md) |
|
||||
| Google Workspace | Commercial | NO native group claim | Direct OIDC against Google Workspace cannot emit groups; broker through Keycloak (or Authentik) instead | [google-workspace.md](google-workspace.md) |
|
||||
|
||||
## Common shape
|
||||
|
||||
Every runbook follows the same five-section layout so you can scan across IdPs:
|
||||
|
||||
1. **Prerequisites** — what you need on the IdP side (admin access, plan tier) and on the certctl side (an admin actor holding `auth.oidc.create` + `auth.oidc.edit`, the GUI / CLI / MCP surface available, the `CERTCTL_CONFIG_ENCRYPTION_KEY` env var set in production so client_secret encrypts at rest).
|
||||
2. **IdP-side configuration** — clickable steps in the IdP admin console, with the exact field names and values certctl needs.
|
||||
3. **certctl-side configuration** — `POST /api/v1/auth/oidc/providers` payloads, plus the GUI and MCP equivalents. The wire shape is the same across every IdP; only the values differ.
|
||||
4. **Verification** — what a successful end-to-end login looks like in the audit log and the GUI Sessions page, plus the JWKS-rotation drill.
|
||||
5. **Troubleshooting** — the failure modes you're statistically most likely to hit, mapped to the certctl service-layer sentinel error you'll see in the audit row.
|
||||
|
||||
## Cross-IdP recurring concepts
|
||||
|
||||
These show up in every runbook; understand them once and skim the rest.
|
||||
|
||||
**Redirect URI.** Every IdP needs the certctl-side callback URL registered as an allowed redirect URI. The format is `https://<your-certctl-host>:8443/auth/oidc/callback`; 8443 is the default port for the HTTPS-only control plane (Decision: post-v2.2 the platform is HTTPS-only, no plaintext port). For local-dev fixtures, `http://localhost:8443/auth/oidc/callback` is acceptable; production deployments MUST use HTTPS, and the OIDCProvider domain validator rejects HTTP issuer URLs in non-test paths.
|
||||
|
||||
**Client secret rotation.** Every IdP issues a `client_secret` for the confidential client (certctl is always a confidential client; public clients aren't supported because we have a server-side place to keep the secret). Rotating at the IdP requires the operator to PUT the new secret into certctl via the GUI's "Edit provider" dialog or `certctl_auth_update_oidc_provider` MCP tool. Leaving `client_secret` empty in the update payload preserves the existing ciphertext; providing a value rotates it.
|
||||
|
||||
**JWKS cache TTL.** The certctl service caches the IdP's JWKS document for `jwks_cache_ttl_seconds` (default 3600). When the IdP rotates a signing key, in-flight logins that try to validate a new-key-signed token against the stale cache fail with `ErrJWKSUnreachable` until the next refresh. Operators have two options: wait out the TTL, or click "Refresh discovery cache" in the GUI's OIDC Provider Detail page (`POST /api/v1/auth/oidc/providers/{id}/refresh`) to force-evict the cache. The Keycloak integration test exercises this drill end to end.
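The TTL-plus-forced-eviction behavior can be pictured with a small cache. This is a sketch of the general technique, not certctl's implementation: `JWKSCache`, `fetch`, and `clock` are hypothetical names, and only the 3600-second default comes from this document.

```python
import time
from typing import Callable, Optional


class JWKSCache:
    """Cache a JWKS document for a fixed TTL, with an explicit evict hook."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: int = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._doc: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        """Return the cached document, refetching once the TTL has elapsed."""
        now = self._clock()
        if self._doc is None or now - self._fetched_at >= self._ttl:
            self._doc = self._fetch()
            self._fetched_at = now
        return self._doc

    def evict(self) -> None:
        """Drop the cached document so the next get() refetches."""
        self._doc = None
```

Waiting out the TTL corresponds to `get()` refetching on its own; the "Refresh discovery cache" button corresponds to calling `evict()` first.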
|
||||
|
||||
**Group→role mappings are fail-closed.** The certctl service refuses to mint a session for a user whose IdP-supplied groups don't match ANY configured mapping (`ErrGroupsUnmapped` → HTTP 401 to the user with a "no roles assigned" page). This is intentional — empty mapping ≠ "let everyone in," it means "this provider is not yet configured for any role." Operators add at least one mapping (typically `<engineers-group>` → `r-operator`) BEFORE rolling out OIDC to users.
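A minimal sketch of the fail-closed rule, assuming mappings are a plain group-to-role dictionary: `resolve_roles` and `GroupsUnmappedError` are hypothetical stand-ins for the service's resolver and its `ErrGroupsUnmapped` sentinel.

```python
class GroupsUnmappedError(Exception):
    """Stand-in for the ErrGroupsUnmapped sentinel named in this runbook."""


def resolve_roles(token_groups: list, mappings: dict) -> set:
    """Map IdP groups to roles; refuse when nothing matches."""
    roles = {mappings[g] for g in token_groups if g in mappings}
    if not roles:
        # Fail closed: no match (or no mappings at all) means no session.
        raise GroupsUnmappedError("no configured mapping matches the user's groups")
    return roles
```

An empty `mappings` dictionary always raises, which mirrors the point that an unconfigured provider admits nobody.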
|
||||
|
||||
**Nonce + state + PKCE-S256 are non-negotiable.** Every login flow round-trips a nonce (replay defense), a state (CSRF defense), and a PKCE-S256 verifier (RFC 9700 §2.1.1 mandate). `plain` PKCE is rejected at the service-layer sentinel level. None of this is configurable; if your IdP doesn't support PKCE-S256, you cannot use it with certctl.
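For readers who have not seen PKCE-S256 written out, the challenge is the unpadded base64url encoding of the SHA-256 digest of the verifier (RFC 7636). The sketch below shows that derivation; the function names are illustrative and say nothing about how certctl structures its own code.

```python
import base64
import hashlib
import secrets


def new_code_verifier() -> str:
    """Generate a random verifier (32 random bytes, base64url, no padding)."""
    return secrets.token_urlsafe(32)


def s256_challenge(verifier: str) -> str:
    """Derive the S256 code_challenge for a given code_verifier."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # RFC 7636 uses base64url without trailing '=' padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge on the authorize request and the verifier on the token request; the IdP recomputes the hash and compares.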
|
||||
|
||||
**IdP downgrade-attack defense.** At provider creation AND on every JWKS refresh, certctl intersects the IdP's advertised `id_token_signing_alg_values_supported` with the certctl allow-list (RS256, RS512, ES256, ES384, EdDSA by default). If the IdP advertises HS256/HS384/HS512 or `none`, provider creation is rejected — even before any token is signed under the weak alg. This catches the case where a future compromised or misconfigured IdP tries to rotate to an alg-confusion-prone setup.
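The check described above can be sketched as a set comparison over the discovery document's advertised algorithms. The allow-list and deny-list values come from this document; the function name, the exception, and the rejection of a discovery doc with no allow-listed algorithm are assumptions for illustration.

```python
ALLOWED_ALGS = frozenset({"RS256", "RS512", "ES256", "ES384", "EdDSA"})
DENIED_ALGS = frozenset({"HS256", "HS384", "HS512", "none"})


class WeakAlgorithmError(Exception):
    """Raised when the IdP's advertised algorithms are unacceptable."""


def usable_signing_algs(advertised: list) -> set:
    """Return the allow-listed algorithms, rejecting any weak advertisement."""
    advertised_set = set(advertised)
    weak = advertised_set & DENIED_ALGS
    if weak:
        raise WeakAlgorithmError(f"IdP advertises denied algorithms: {sorted(weak)}")
    usable = advertised_set & ALLOWED_ALGS
    if not usable:
        # Assumption: no overlap with the allow-list is also treated as a failure.
        raise WeakAlgorithmError("IdP advertises no allow-listed algorithm")
    return usable
```

Note that a single denied entry rejects the provider even when strong algorithms are advertised alongside it.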
|
||||
|
||||
## When you finish a runbook
|
||||
|
||||
Each per-IdP runbook ends with a **validation checklist** the operator runs against a real production-tier deployment. Run through the matrix end-to-end against your IdP and mark your sign-off in the runbook's footer — that gives the next operator (or the next you) a dated record of what's been verified to work.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [RBAC operator reference](../rbac.md) — roles, permissions, scope-down + bootstrap flow.
|
||||
- [Auth threat model](../auth-threat-model.md) — API-key + OIDC + session compromise scenarios; v3 WebAuthn pairing.
|
||||
- [Security posture](../security.md) — overall auth surface including this OIDC layer.
|
||||
- [API keys → RBAC migration](../../migration/api-keys-to-rbac.md) — the v2.0.x → v2.1.0 RBAC upgrade flow your operator likely already ran.
|
||||
@@ -0,0 +1,245 @@
|
||||
# Keycloak OIDC runbook
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This is the canonical reference runbook for wiring certctl's OIDC SSO surface against [Keycloak](https://www.keycloak.org/). Keycloak is a free / open-source identity provider that runs on-prem or self-hosted; it is also the load-bearing test fixture for certctl's OIDC integration tests (`internal/auth/oidc/testfixtures/keycloak.go`), so the certctl-side validation pipeline is exhaustively exercised against it.
|
||||
|
||||
If your IdP is something else (Okta, Auth0, Azure AD, Authentik, Google Workspace), see the per-IdP siblings in [this directory](index.md). The mental model + certctl-side wiring are identical; only the IdP-side console differs.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**On the Keycloak side:**
|
||||
|
||||
- Keycloak ≥ 25.0 (older versions work but the screen flows differ slightly — the integration test fixture pins 25.0).
|
||||
- Admin access to a realm — either an existing tenant realm or a fresh one created for certctl. Don't share Keycloak's `master` realm; create a dedicated realm.
|
||||
- Network reachability from certctl-server to the Keycloak `https://<keycloak-host>/realms/<realm-name>` discovery endpoint. The certctl service fetches `/.well-known/openid-configuration` at provider creation and at every `RefreshKeys` call.
|
||||
- Keycloak's signing alg set to RS256 (default) or any of: RS512, ES256, ES384, EdDSA. HS256/HS384/HS512 + `none` are rejected by certctl's IdP-downgrade-attack defense at provider creation time.
|
||||
|
||||
**On the certctl side:**
|
||||
|
||||
- `CERTCTL_CONFIG_ENCRYPTION_KEY` set to a stable secret (production deployments only — the encryption-at-rest layer for the OIDC client_secret depends on it).
|
||||
- An admin actor holding `auth.oidc.create` + `auth.oidc.edit` (held by `r-admin` by default; granted via `certctl_auth_assign_role_to_key` MCP tool or the GUI's Auth → Keys page).
|
||||
- Server build ≥ v2.1.0.
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
The same configuration you'll do by hand here is what the testcontainers fixture imports from `internal/auth/oidc/testfixtures/keycloak-realm.json` — read that file alongside this runbook to see the exact JSON shape Keycloak persists.
|
||||
|
||||
### 1. Create or pick a realm
|
||||
|
||||
In the Keycloak admin console (`https://<keycloak-host>/admin/`), drop into the realm you'll use. If creating a new one, the realm name will become part of the issuer URL: `https://<keycloak-host>/realms/<realm-name>`.
|
||||
|
||||
### 2. Create the OIDC client
|
||||
|
||||
**Clients → Create client**:
|
||||
|
||||
- Client type: **OpenID Connect**
|
||||
- Client ID: `certctl` (or whatever you prefer; it goes into `OIDCProvider.client_id` on the certctl side).
|
||||
- Always display in console: off.
|
||||
- Click **Next**.
|
||||
|
||||
On the capability config page:
|
||||
|
||||
- Client authentication: **On** (this makes the client confidential, which is what certctl requires).
|
||||
- Authorization: off.
|
||||
- Standard flow: **on** (auth-code with PKCE — this is the path certctl uses).
|
||||
- Direct access grants: off (ROPC; the test fixture turns this on for ROPC convenience but production should NOT).
|
||||
- Implicit flow: off.
|
||||
- Service accounts roles: off.
|
||||
- Click **Next**.
|
||||
|
||||
Login settings:
|
||||
|
||||
- Root URL: leave blank.
|
||||
- Home URL: blank.
|
||||
- Valid redirect URIs: `https://<your-certctl-host>:8443/auth/oidc/callback` — ONE entry, exact match. Wildcards (`*`) work for local dev (`http://localhost:*`) but production should pin the exact host.
|
||||
- Valid post logout redirect URIs: blank or `+` (matches the redirect URI list).
|
||||
- Web origins: `+` (matches the redirect URI origin) or empty.
|
||||
- Click **Save**.
|
||||
|
||||
On the saved client's **Credentials** tab, copy the **Client secret** — you'll need it for the certctl-side payload.
|
||||
|
||||
### 3. Create the groups
|
||||
|
||||
**Groups → Create group**:
|
||||
|
||||
- Repeat for every certctl role you want to map to a group. A typical setup creates two:
|
||||
- `certctl-engineers` (intended target: `r-operator`)
|
||||
- `certctl-viewers` (intended target: `r-viewer`)
|
||||
  - Optionally a `certctl-admins` group → `r-admin` for break-glass-free first-admin bootstrap; see the [`auth-threat-model.md`](../auth-threat-model.md) section on bootstrap admins.
|
||||
|
||||
### 4. Configure the group-membership claim mapper
|
||||
|
||||
This is the load-bearing step — without it, the ID token won't carry a `groups` claim and every login fails closed with `ErrGroupsUnmapped`.
|
||||
|
||||
**Clients → certctl → Client scopes → certctl-dedicated → Add mapper → By configuration → Group Membership**:
|
||||
|
||||
- Name: `groups`
|
||||
- Token Claim Name: `groups`
|
||||
- Full group path: **off** (so the claim emits `certctl-engineers`, not `/certctl-engineers`; matches the certctl `string-array` group-claim format).
|
||||
- Add to ID token: **on**.
|
||||
- Add to access token: **on** (optional but recommended; the userinfo-fallback path uses it).
|
||||
- Add to userinfo: **on**.
|
||||
- Click **Save**.
|
||||
|
||||
### 5. Create the user(s)
|
||||
|
||||
**Users → Add user**:
|
||||
|
||||
- Username: `alice` (or however you identify operators).
|
||||
- Email: required (used as the certctl-side `User.Email`).
|
||||
- First name + last name: optional but populates `User.DisplayName`.
|
||||
- Email verified: **on** if you trust the user.
|
||||
- Click **Create**.
|
||||
|
||||
On the saved user's **Credentials** tab:
|
||||
- Set a password. Mark **Temporary** if you want the user to reset on first login.
|
||||
|
||||
On the **Groups** tab:
|
||||
- Join the user to the group(s) you created in step 3.
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
### Via the GUI
|
||||
|
||||
1. Sign in as an admin actor.
|
||||
2. Navigate to **Auth → OIDC Providers** in the sidebar.
|
||||
3. Click **Configure provider**.
|
||||
4. Fill in:
|
||||
- **Display name**: `Keycloak` (free-text; what end-users see on the login page button).
|
||||
- **Issuer URL**: `https://<keycloak-host>/realms/<realm-name>`.
|
||||
- **Client ID**: `certctl` (matches step 2 above).
|
||||
- **Client secret**: paste the secret from step 2's Credentials tab.
|
||||
- **Redirect URI**: `https://<your-certctl-host>:8443/auth/oidc/callback`.
|
||||
- **Groups claim path**: `groups` (the default; matches step 4's Token Claim Name).
|
||||
- **Groups claim format**: `string-array` (the default).
|
||||
- **Fetch userinfo**: off (Keycloak emits groups in the ID token; userinfo fallback is for IdPs that don't).
|
||||
- **Scopes**: `openid profile email` (the certctl service prepends `openid` if missing).
|
||||
- **IAT window seconds**: 300 (default).
|
||||
- **JWKS cache TTL seconds**: 3600 (default).
|
||||
5. Click **Save**.
|
||||
|
||||
If the discovery doc fetch fails, the modal surfaces the error inline. The most common cause is a typo in the issuer URL — Keycloak emits 404 for any path under `/realms/` that doesn't match an actual realm.
|
||||
|
||||
### Via the API
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Keycloak",
|
||||
"issuer_url": "https://keycloak.example.com/realms/certctl",
|
||||
"client_id": "certctl",
|
||||
"client_secret": "<paste-the-secret>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"fetch_userinfo": false,
|
||||
"scopes": ["openid", "profile", "email"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
### Via MCP
|
||||
|
||||
```
|
||||
certctl_auth_create_oidc_provider {
|
||||
"name": "Keycloak",
|
||||
"issuer_url": "https://keycloak.example.com/realms/certctl",
|
||||
"client_id": "certctl",
|
||||
"client_secret": "<paste-the-secret>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"scopes": ["openid", "profile", "email"]
|
||||
}
|
||||
```
|
||||
|
||||
### Add the group→role mappings
|
||||
|
||||
GUI: **Auth → OIDC Providers → Keycloak → Group → role mappings → Add**.
|
||||
|
||||
- IdP group: `certctl-engineers` → certctl role: `r-operator`.
|
||||
- IdP group: `certctl-viewers` → certctl role: `r-viewer`.
|
||||
|
||||
API equivalent: `POST /api/v1/auth/oidc/group-mappings` with `{"provider_id": "<id>", "group_name": "certctl-engineers", "role_id": "r-operator"}`. MCP equivalent: `certctl_auth_add_group_mapping`.
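When you have several mappings to add, scripting the API call is less error-prone than clicking through the GUI. The sketch below only builds the request; the endpoint path, JSON fields, and Bearer header come from this runbook, while `build_group_mapping_request` and its parameters are hypothetical.

```python
import json
import urllib.request


def build_group_mapping_request(base_url: str, api_key: str, provider_id: str,
                                group_name: str, role_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that adds one group-to-role mapping."""
    if not group_name:
        # certctl rejects empty group names, so fail before sending anything.
        raise ValueError("group_name must not be empty")
    body = json.dumps({
        "provider_id": provider_id,
        "group_name": group_name,
        "role_id": role_id,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/auth/oidc/group-mappings",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Pass the returned object to `urllib.request.urlopen` to send it, once you have confirmed the URL and payload look right.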
|
||||
|
||||
Empty mapping list = nobody can log in via Keycloak (the fail-closed contract). Add at least one before announcing the SSO endpoint to users.
|
||||
|
||||
## Verification
|
||||
|
||||
### End-to-end login
|
||||
|
||||
1. Open `https://<your-certctl-host>:8443/login` in a fresh incognito window.
|
||||
2. The page renders an OIDC button block with `Sign in with Keycloak` (the display name from the create-provider step).
|
||||
3. Click it. The browser redirects to Keycloak, you authenticate as `alice`, Keycloak redirects back to certctl, and you land on the dashboard.
|
||||
4. Navigate to **Auth → Sessions**. You should see a row with your own actor ID, the IP you logged in from, and the current timestamp under "last seen".
|
||||
|
||||
### Audit trail
|
||||
|
||||
```bash
|
||||
curl -s "https://<your-certctl-host>:8443/api/v1/audit?category=auth" \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" | jq '.events[] | select(.action == "auth.oidc_login_succeeded")'
|
||||
```
|
||||
|
||||
You should see a row for the login above, with `details.provider_id` matching the Keycloak provider's id and `details.subject` set to the Keycloak user's `sub` claim (typically a UUID).
|
||||
|
||||
### JWKS-rotation drill
|
||||
|
||||
Operator action when Keycloak rotates its realm signing key:
|
||||
|
||||
1. In Keycloak: **Realm settings → Keys → Providers → Add provider → rsa-generated**, set priority higher than the current key (e.g. 200), enabled = on, active = on.
|
||||
2. In certctl: click the **Refresh discovery cache** button under **Auth → OIDC Providers → Keycloak**, or call the HTTP API equivalent: `POST /api/v1/auth/oidc/providers/<id>/refresh`.
|
||||
3. Run another login. The new ID token is signed under the new key; the certctl service validates it against the freshly-fetched JWKS doc.
|
||||
|
||||
The Keycloak integration test `TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey` exercises this exact flow end to end.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**"Discovery doc fetch failed" at provider creation.**
|
||||
The most common cause is a wrong issuer URL — typo in realm name, missing `/realms/` segment, or HTTP→HTTPS redirect that the Go client doesn't follow without explicit headers. Curl the URL manually:
|
||||
```
|
||||
curl -v https://<keycloak-host>/realms/<realm-name>/.well-known/openid-configuration
|
||||
```
|
||||
If that returns 404, fix the realm name. If it returns 200 but certctl still fails, check `cmd/server` logs for the wrapped error.
|
||||
|
||||
**"IdP downgrade-attack defense" rejected provider creation.**
|
||||
Keycloak's realm advertises an algorithm in `id_token_signing_alg_values_supported` that is on certctl's deny-list (HS256/HS384/HS512/`none`). Check **Realm settings → Keys → Providers**, disable any HMAC key providers, and then create the provider in certctl again.
|
||||
|
||||
**Login redirects to Keycloak, the user authenticates, but the callback redirects back to `/login` with "no roles assigned".**
|
||||
The user authenticated successfully but their groups didn't match any configured mapping (`ErrGroupsUnmapped`). Check:
|
||||
- The user is actually a member of the group you mapped (Users → user → Groups tab in Keycloak).
|
||||
- The group-membership mapper is configured correctly (Clients → certctl → Client scopes → certctl-dedicated → mappers → groups → "Full group path: off" matters).
|
||||
- The group name in your certctl mapping exactly matches what Keycloak emits — case-sensitive, no leading slash if "Full group path: off".
|
||||
|
||||
You can confirm what Keycloak is actually emitting by decoding the ID token at jwt.io against the Keycloak public key, or by enabling certctl's debug logging on the OIDC service for one login (logs are scrubbed of token contents per the OIDC service's token-leak hygiene contract; debug logs surface only the resolved group list and the mapping decision).
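
As a local alternative to pasting a token into a website, the payload segment can be decoded in the shell. A minimal sketch, assuming `base64 -d` is available (GNU coreutils and recent macOS); `jwt_payload` is a hypothetical helper. It only decodes the claims and does not verify the signature.

```bash
# Hypothetical helper: print the JSON payload (second segment) of a JWT.
jwt_payload() {
  local seg
  # Take the middle segment and map the base64url alphabet back to base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the trailing padding; restore it before decoding.
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Usage (ID_TOKEN is whatever token you captured during a test login):
#   jwt_payload "$ID_TOKEN"
```

Look for the `groups` array in the output and compare each entry, character for character, with the group names in your certctl mappings.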
|
||||
|
||||
**"id_token verify failed: token used before issued"**
|
||||
Clock skew between Keycloak and certctl-server. Either align both to NTP, or bump `iat_window_seconds` on the OIDC provider config (default 300 = 5 minutes). The certctl service caps `iat_window_seconds` at 600.
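
The check behind this error can be pictured as follows. This is a sketch under stated assumptions, not certctl's code: it assumes a token is rejected when its `iat` lies more than the window ahead of the server clock, and that a configured window above the 600-second cap is clamped rather than rejected. `iat_ok` is a hypothetical name.

```bash
# Hypothetical sketch of an issued-at check with a skew window.
# Arguments: server time, token iat, configured window (all in seconds).
iat_ok() {
  local now=$1 iat=$2 window=$3
  # Assumed behaviour: values above the cap are clamped to 600.
  if [ "$window" -gt 600 ]; then window=600; fi
  # Accept only tokens issued no later than now + window.
  [ "$iat" -le $(( now + window )) ]
}

# Usage:
#   iat_ok "$(date +%s)" "$TOKEN_IAT" 300 && echo accepted
```

With the default window of 300, an IdP clock running six minutes ahead of certctl-server is enough to trip the error.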
|
||||
|
||||
**"oidc: pre-login session not found or already consumed"**
|
||||
The user clicked the OIDC login button, then the browser tab idled past the 10-minute pre-login TTL OR the user opened the IdP login in a new tab and consumed the row from the first one. Have them retry.
|
||||
|
||||
**"oidc: state parameter mismatch (replay or forgery)"**
|
||||
Either the user double-submitted a callback URL (clicked it twice from email or browser history), or a CSRF attempt. The pre-login row is single-use; second consumption returns `ErrPreLoginNotFound`. Have them retry from the login page.
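
The single-use behaviour can be illustrated with a file standing in for the pre-login row. This is a toy model that assumes nothing about certctl's storage; `consume_prelogin` and the directory layout are hypothetical.

```bash
# Toy model: each pending login is a file named after its state value.
consume_prelogin() {
  local store=$1 state=$2
  # rm fails when the file is already gone, so a second consume is rejected.
  rm -- "$store/$state" 2>/dev/null
}

# Usage:
#   store=$(mktemp -d); : > "$store/abc123"
#   consume_prelogin "$store" abc123   # succeeds
#   consume_prelogin "$store" abc123   # fails: already consumed
```

The first callback to arrive wins; any replay of the same callback URL finds nothing left to consume.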
|
||||
|
||||
**Sessions revoked but the user can still hit the API.**
|
||||
Check the session contract: the cookie is HMAC-validated on every request, and `Revoke` deletes the backing database row. After revocation, any request that still carries the `__Host-certctl_session` cookie (for example because it was never cleared on the client) reaches the session middleware, fails the row lookup, and gets a 401. The middleware never serves stale data, so if the user still sees successful responses, the issue is upstream of certctl: most likely a reverse proxy serving cached responses.
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Before signing off this runbook for production rollout, validate these end-to-end:
|
||||
|
||||
- [ ] `auth.oidc_provider_created` audit row appears after the create-provider POST.
|
||||
- [ ] `Sign in with Keycloak` button renders on the login page after `getAuthInfo` returns the configured provider.
|
||||
- [ ] A user with mapped groups completes the auth-code flow and lands on the dashboard.
|
||||
- [ ] A user WITHOUT mapped groups gets the "no roles assigned" landing (not the dashboard).
|
||||
- [ ] The `auth.oidc_login_succeeded` and `auth.oidc_login_failed` audit rows correctly distinguish the two cases.
|
||||
- [ ] The Sessions page shows the new session, with self-pill on the caller's row.
|
||||
- [ ] Revoking the session via the GUI causes the next API request from that browser to 401 + redirect to login.
|
||||
- [ ] Running the JWKS-rotation drill (steps above) does not break in-flight logins; rotated tokens validate against the refreshed JWKS.
|
||||
- [ ] Editing the provider with `client_secret` blank preserves the existing ciphertext (operator confirms by reading the `oidc_providers.client_secret_encrypted` column before + after the PUT — bytes unchanged).
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,143 @@
|
||||
# Okta OIDC runbook
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This runbook wires certctl's OIDC SSO surface against [Okta](https://www.okta.com/), a commercial cloud IdP. Okta offers a free developer tier (`https://dev-NNNNN.okta.com`) suitable for evaluation; production runs on a paid Workforce Identity tenant.
|
||||
|
||||
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook only documents the Okta-specific deltas.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
**On the Okta side:**
|
||||
|
||||
- A Workforce Identity tenant (or free Developer Edition account at <https://developer.okta.com/signup/>).
|
||||
- Super Admin or Application Admin role in your Okta tenant.
|
||||
- Network reachability from certctl-server to `https://<your-org>.okta.com/.well-known/openid-configuration` OR to a custom authorization server endpoint if you're using one (`https://<your-org>.okta.com/oauth2/<auth-server-id>/.well-known/openid-configuration`).
|
||||
|
||||
**On the certctl side:** same as Keycloak.
|
||||
|
||||
## IdP-side configuration
|
||||
|
||||
### 1. Create the OIDC application
|
||||
|
||||
In the Okta admin console:
|
||||
|
||||
**Applications → Applications → Create App Integration**:
|
||||
|
||||
- Sign-in method: **OIDC - OpenID Connect**.
|
||||
- Application type: **Web Application**.
|
||||
- Click **Next**.
|
||||
|
||||
App config:
|
||||
|
||||
- App integration name: `certctl`.
|
||||
- Logo: optional.
|
||||
- Grant types: **Authorization Code** (CHECK). Leave Refresh Token unchecked unless you have a specific reason — certctl doesn't currently use refresh tokens.
|
||||
- Sign-in redirect URIs: `https://<your-certctl-host>:8443/auth/oidc/callback`.
|
||||
- Sign-out redirect URIs: optional; leave empty unless you also configure RP-initiated logout.
|
||||
- Trusted Origins: leave default.
|
||||
- Assignments → Controlled access: **Limit access to selected groups** (recommended; pick the `certctl-*` groups from step 3 below).
|
||||
- Click **Save**.
|
||||
|
||||
On the saved app's **General** tab, copy the **Client ID** and **Client secret** (under Client Credentials). The secret is shown once on creation — copy it immediately or rotate via "Generate new secret".
|
||||
|
||||
### 2. Pick or create an authorization server
|
||||
|
||||
Okta has TWO authorization-server tiers:
|
||||
|
||||
- **The Org Authorization Server** at `https://<your-org>.okta.com` — emits ID tokens with limited claims; cannot host custom claims directly. Use for the simplest setup.
|
||||
- **A Custom Authorization Server** at `https://<your-org>.okta.com/oauth2/<auth-server-id>` — fully configurable scopes + claims + access policies. The free developer tier ships with a default custom server at `/oauth2/default`. Recommended for production.
|
||||
|
||||
For this runbook we use the default custom server: `https://<your-org>.okta.com/oauth2/default`.
|
||||
|
||||
### 3. Create the groups + assign users
|
||||
|
||||
**Directory → Groups → Add Group**:
|
||||
|
||||
- Repeat for `certctl-engineers`, `certctl-viewers`, optionally `certctl-admins`.
|
||||
|
||||
**Directory → People → <user> → Groups**: assign each user to the appropriate `certctl-*` group(s).
|
||||
|
||||
Then go back to the App from step 1 and on the **Assignments** tab, assign the `certctl-*` groups to the application. Without this assignment Okta will reject the user's login attempt at the IdP layer with "User is not assigned to the client application".
|
||||
|
||||
### 4. Configure the groups claim
|
||||
|
||||
This is the load-bearing Okta-specific step. The default authorization server does NOT emit a `groups` claim out of the box — you have to define it.
|
||||
|
||||
**Security → API → Authorization Servers → default → Claims → Add Claim**:
|
||||
|
||||
- Name: `groups`.
|
||||
- Include in token type: **ID Token, Always** (also tick Access Token if you want the userinfo-fallback path to work).
|
||||
- Value type: **Groups**.
|
||||
- Filter: pick **Matches regex** with the value `certctl-.*` so only the `certctl-*` groups are emitted (saves on token size; users in dozens of unrelated groups get a bloated token otherwise).
|
||||
- Disable claim: off.
|
||||
- Include in: **Any scope** (or pin to `openid` if you want the claim only on the certctl-flow).
|
||||
- Click **Create**.
|
||||
|
||||
### 5. (Optional) Add `email` and `profile` claims
|
||||
|
||||
The default custom server already emits `name` under the `profile` scope and `email` under the `email` scope, so no action is needed unless you've stripped them from a custom config.
|
||||
|
||||
## certctl-side configuration
|
||||
|
||||
```bash
|
||||
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
|
||||
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Okta",
|
||||
"issuer_url": "https://your-org.okta.com/oauth2/default",
|
||||
"client_id": "<paste-from-step-1>",
|
||||
"client_secret": "<paste-from-step-1>",
|
||||
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
|
||||
"groups_claim_path": "groups",
|
||||
"groups_claim_format": "string-array",
|
||||
"fetch_userinfo": false,
|
||||
"scopes": ["openid", "profile", "email"],
|
||||
"iat_window_seconds": 300,
|
||||
"jwks_cache_ttl_seconds": 3600
|
||||
}'
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `issuer_url` MUST match exactly what Okta emits as the `iss` claim. For the default custom server it's `https://<your-org>.okta.com/oauth2/default` (no trailing slash). The org server's issuer is just `https://<your-org>.okta.com` (no `/oauth2/...` path). Mismatching either side trips certctl's `ErrIssuerMismatch` sentinel.
|
||||
- The `groups` scope is NOT required in the scopes list — Okta emits the claim based on the claim definition's "Include in: any scope" setting. Adding `groups` to the scopes list is harmless if your custom server has the scope defined.
|
||||
|
||||
Add the group→role mappings: `certctl-engineers` → `r-operator`, `certctl-viewers` → `r-viewer`, `certctl-admins` → `r-admin`.
|
||||
|
||||
## Verification
|
||||
|
||||
End-to-end login + audit + Sessions checks are identical to Keycloak.
|
||||
|
||||
**Okta-specific:** the audit row's `details.subject` will be Okta's user UID (a 20-char alphanumeric string starting with `00u`), stable across email changes. The certctl `users` table's `oidc_subject` column will hold this UID.
|
||||
|
||||
**Optional Okta smoke test in CI:** certctl ships an opt-in smoke test at `internal/auth/oidc/integration_okta_smoke_test.go` (build tags `integration && okta_smoke`). Set `OKTA_ISSUER` + `OKTA_CLIENT_ID` + `OKTA_CLIENT_SECRET` env vars and run `make okta-smoke-test` to drive a discovery + RefreshKeys round-trip against your live tenant. Pre-reqs: enable the Resource Owner Password (ROPC) grant on the application (Sign-On tab → Grant types → Resource Owner Password) for the smoke test only; production certctl uses auth-code-with-PKCE.
|
||||
|
||||
**JWKS-rotation drill:** Okta auto-rotates signing keys every ~3 months and publishes the new key alongside the old in the JWKS doc for ~1 month overlap. Manual rotation: **Security → API → Authorization Servers → default → Keys → "Generate new key"**. After rotation, click "Refresh discovery cache" in certctl's GUI; new tokens validate immediately.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**"User is not assigned to the client application" at the Okta login screen.**
|
||||
You created the app + the user but never assigned the user to the app. Either assign the user directly (App → Assignments → Assign to People) or assign the `certctl-*` groups to the app (App → Assignments → Assign to Groups).
|
||||
|
||||
**Login completes but `groups` claim is empty in the ID token.**
|
||||
Most common Okta gotcha — the default custom server doesn't emit `groups` until you define the claim (step 4 above). Decode the ID token at jwt.io to confirm. If the claim is defined but empty, check the regex filter in step 4 — `certctl-.*` matches names like `certctl-engineers` but NOT `engineers`.
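
To preview which group names a filter would let through, the pattern can be tried locally. A sketch that assumes the filter has to match the whole group name, which is how the `certctl-.*` example above behaves; `filter_groups` is a hypothetical helper and Okta's own regex engine may differ in corner cases.

```bash
# Hypothetical helper: read one group name per line, keep whole-line matches.
filter_groups() {
  # -E: extended regex, -x: the pattern must match the entire line.
  # "|| true" keeps the exit status clean when nothing matches.
  grep -Ex 'certctl-.*' || true
}

# Usage:
#   printf '%s\n' certctl-engineers engineers certctl-viewers | filter_groups
```

Any group that is missing from the output would also be missing from the `groups` claim under that assumption.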
|
||||
|
||||
**`ErrIssuerMismatch` after correctly configuring the discovery URL.**
|
||||
The issuer claim Okta puts in the ID token MUST match `OIDCProvider.IssuerURL` byte-for-byte, including trailing slash. The default custom server emits `https://<your-org>.okta.com/oauth2/default` (no trailing slash); the org server emits `https://<your-org>.okta.com`. Don't append a trailing slash to either.
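
The comparison is plain string equality, which a short sketch makes concrete. `issuer_matches` is a hypothetical helper, not certctl's implementation; the commented `curl` line assumes `jq` is installed and that `ISSUER` holds the URL you configured.

```bash
# Hypothetical helper: succeed only when both issuer strings are identical.
issuer_matches() {
  # No normalisation is applied, so a trailing slash counts as a difference.
  [ "$1" = "$2" ]
}

# Usage: compare the configured value with what the discovery doc advertises.
#   advertised=$(curl -s "$ISSUER/.well-known/openid-configuration" | jq -r .issuer)
#   issuer_matches "$ISSUER" "$advertised" || echo "issuer mismatch"
```

Copying the `issuer` value from the discovery document into the provider config avoids retyping mistakes.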
|
||||
|
||||
**Login succeeds but the certctl `User.Email` is empty.**
|
||||
The `email` scope wasn't requested OR the user's email isn't verified at Okta. Add `email` to the certctl scopes config and ensure Okta's user has a verified primary email.
|
||||
|
||||
**Okta returns "PKCE code verifier required".**
|
||||
The certctl service hard-codes PKCE-S256 on every login (RFC 9700 mandate). If Okta is rejecting the verifier, the most likely cause is a misconfigured app type — confirm the Okta application is "Web Application" (which supports auth-code + PKCE), not "Single-Page Application" (which has different token-binding rules) or "Native App".
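
For reference, the S256 transform itself is small. A sketch using `openssl` and `base64`, assuming both are installed; `pkce_challenge` is a hypothetical helper name and this is not certctl's implementation.

```bash
# Hypothetical helper: derive the PKCE code_challenge from a code_verifier.
pkce_challenge() {
  # code_challenge = BASE64URL(SHA256(code_verifier)), padding removed (RFC 7636).
  printf '%s' "$1" \
    | openssl dgst -sha256 -binary \
    | base64 \
    | tr '+/' '-_' \
    | tr -d '=\n'
}

# Usage:
#   pkce_challenge "$CODE_VERIFIER"
```

The client sends the challenge on the authorization request and the original verifier on the token request, so the IdP can recompute the hash and compare.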
|
||||
|
||||
**Custom-server access policies blocking the login.**
|
||||
By default the `default` custom authorization server has an "Access Policy" with one rule allowing all clients + all users. If you've tightened this (production hygiene), add a rule that allows the `certctl` client + the `certctl-*` groups: **Security → API → Authorization Servers → default → Access Policies → <policy> → Add Rule**.
|
||||
|
||||
## Validation checklist
|
||||
|
||||
Same as [keycloak.md](keycloak.md#validation-checklist), with Okta-specific values + the access-policy check above.
|
||||
|
||||
Sign-off: _______________ (operator) on _______________ (date).
|
||||
@@ -0,0 +1,106 @@
|
||||
# Performance Baselines
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Operator-runnable benchmarks for spot-checking certctl performance against published baselines. Useful as a regression detector after upgrades or infra changes.
|
||||
|
||||
## Why these specific spots?
|
||||
|
||||
certctl's hot paths are dominated by three workloads:
|
||||
|
||||
1. **API request handling** — auth, rate-limit decision, route dispatch, DB read
|
||||
2. **Renewal scheduler** — periodic scan + dispatch
|
||||
3. **Certificate inventory queries** — large list returns with sparse fields
|
||||
|
||||
The baselines below cover those three, plus a fourth for bulk revoke.
|
||||
|
||||
## Baseline #1: API request handling (single endpoint)
|
||||
|
||||
Hit a hot read endpoint with a tight loop and compare against the baseline.
|
||||
|
||||
```bash
|
||||
SERVER=https://localhost:8443
|
||||
CACERT="--cacert ./deploy/test/certs/ca.crt"
|
||||
AUTH="Authorization: Bearer change-me-in-production"
|
||||
|
||||
# Warm the connection pool (5 requests, discard timing)
|
||||
for i in $(seq 1 5); do
|
||||
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
|
||||
done
|
||||
|
||||
# Measured run: 100 requests, capture mean latency
|
||||
time (for i in $(seq 1 100); do
|
||||
curl -s $CACERT -H "$AUTH" $SERVER/api/v1/stats/summary > /dev/null
|
||||
done)
|
||||
```
|
||||
|
||||
**Baseline (M3 MacBook Pro, Docker Desktop):** real time under 5 seconds for 100 sequential requests, i.e. a mean of under ~50ms per request. Timing the whole loop yields a mean, not a percentile.
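
If a median is wanted alongside the mean, per-request timings can be collected with curl's `%{time_total}` write-out variable and summarised. A sketch; `latency_stats` is a hypothetical helper that reads one number per line, and the commented loop reuses the `SERVER`, `CACERT` and `AUTH` variables from the block above.

```bash
# Hypothetical helper: print the mean and median of numbers read from stdin.
latency_stats() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      # Median: middle value, or the average of the two middle values.
      mid = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "mean=%.3f p50=%.3f\n", sum / NR, mid
    }'
}

# Usage (values are in seconds, as reported by curl):
#   for i in $(seq 1 100); do
#     curl -s -o /dev/null -w '%{time_total}\n' $CACERT -H "$AUTH" "$SERVER/api/v1/stats/summary"
#   done | latency_stats
```

A mean well above the median usually points at a few slow outliers rather than a uniformly slow endpoint.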
|
||||
|
||||
If you're seeing > 100ms mean, something is wrong: PostgreSQL connection pool exhaustion, agent flooding the work-poll endpoint, or rate-limiter mis-tuned.
|
||||
|
||||
## Baseline #2: Inventory list with cursor pagination
|
||||
|
||||
```bash
|
||||
# Cursor-paginated full inventory walk
|
||||
NEXT=""
|
||||
PAGES=0
|
||||
START=$(date +%s)
|
||||
while true; do
|
||||
RESP=$(curl -s $CACERT -H "$AUTH" "$SERVER/api/v1/certificates?limit=100&cursor=$NEXT")
|
||||
NEXT=$(echo "$RESP" | jq -r '.next_cursor // empty')
|
||||
PAGES=$((PAGES + 1))
|
||||
[ -z "$NEXT" ] && break
|
||||
done
|
||||
END=$(date +%s)
|
||||
echo "Walked $PAGES pages in $((END - START))s"
|
||||
```
|
||||
|
||||
**Baseline:** for the demo dataset (15 certificates, 1 page), under 1 second total. For a 1000-cert inventory (10 pages of 100), under 3 seconds total = ~300ms per page.
|
||||
|
||||
If you're seeing > 1s per page on a 1000-cert inventory, the cursor index on `managed_certificates(created_at, id)` is missing or the query plan went wrong.
|
||||
|
||||
## Baseline #3: Scheduler tick (renewal scan)
|
||||
|
||||
The renewal scheduler runs every hour by default. Force a tick and observe the time-to-completion in the logs:
|
||||
|
||||
```bash
|
||||
# Trigger an immediate renewal scan via the admin endpoint
|
||||
curl -s $CACERT -H "$AUTH" -X POST $SERVER/api/v1/admin/scheduler/run-now/renewal | jq .
|
||||
|
||||
# Tail the log and look for the matching `renewal scan complete` line
|
||||
docker compose logs -f certctl-server | grep 'renewal'
|
||||
```
|
||||
|
||||
**Baseline (15-cert demo dataset):** "renewal scan complete" within 100ms of the trigger.
|
||||
|
||||
For a 1000-cert inventory: under 5 seconds. The dominant cost is the per-cert profile + policy + alert-channel resolve plus the threshold-comparison math. If you're seeing > 10 seconds, profile resolution is likely doing N+1 queries.
|
||||
|
||||
## Baseline #4: Bulk revoke
|
||||
|
||||
```bash
|
||||
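# Content-Type header for the JSON body (not defined by the earlier snippets)
CT="Content-Type: application/json"

# Bulk-revoke all certs from a (test) issuer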
|
||||
TIME=$(date +%s)
|
||||
curl -s $CACERT -H "$AUTH" -H "$CT" -X POST $SERVER/api/v1/certificates/bulk-revoke \
|
||||
-d '{"filter":{"issuer_id":"iss-test"},"reason":"superseded"}' | jq .
|
||||
echo "Bulk revoke: $(($(date +%s) - TIME))s"
|
||||
```
|
||||
|
||||
**Baseline:** linear in cert count. For 100 certs from one issuer: under 5 seconds. For 1000 certs: under 30 seconds (dominated by per-cert audit row + per-cert CRL refresh).
|
||||
|
||||
## When to re-baseline
|
||||
|
||||
After any of:
|
||||
|
||||
- Postgres major-version upgrade
|
||||
- Go major-version upgrade
|
||||
- Significant migration (add a column to `managed_certificates`, add an index)
|
||||
- Connection pool config change
|
||||
- Changing the renewal scheduler interval
|
||||
|
||||
Capture timing in your own loadtest-baselines log so future regressions surface against a real baseline rather than the operator's gut feeling.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/contributor/ci-pipeline.md`](../contributor/ci-pipeline.md) — CI guard for performance regression
|
||||
- [`docs/operator/security.md`](security.md) — rate limit tuning
|
||||
- [`docs/reference/architecture.md`](../reference/architecture.md) — request path through handler → service → repository
|
||||
@@ -0,0 +1,356 @@
|
||||
# RBAC operator reference
|
||||
|
||||
> Last reviewed: 2026-05-11
|
||||
>
|
||||
> Audit 2026-05-11 A-8 follow-on: demo-mode residual-grants detector
|
||||
> + cleanup endpoint shipped. New env var:
|
||||
> `CERTCTL_DEMO_MODE_RESIDUAL_STRICT` (default `false`). Operator
|
||||
> workflow at
|
||||
> [`security.md#demo-to-production-cutover-audit-2026-05-11-a-8`](security.md#demo-to-production-cutover-audit-2026-05-11-a-8).
|
||||
|
||||
This is the operator-facing reference for the role-based access
|
||||
control primitive in certctl.
|
||||
Read this if you're running certctl in production and need to grant /
|
||||
revoke access to API keys, set up the auditor split, or onboard the
|
||||
first admin.
|
||||
|
||||
For the threat model behind these controls, see
|
||||
[`auth-threat-model.md`](auth-threat-model.md). For the migration
|
||||
flow from a pre-RBAC (v2.0.x) deployment, see
|
||||
[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md).
|
||||
|
||||
## Mental model
|
||||
|
||||
Every action against the certctl HTTP / CLI / MCP / GUI surface is
|
||||
performed by an **actor** (an API key, an agent's machine identity,
|
||||
the synthetic demo-anon actor when the server runs in
|
||||
`CERTCTL_AUTH_TYPE=none` mode). Each actor holds zero or more
|
||||
**roles**. Each role grants a set of **permissions** at a **scope**.
|
||||
A request to a gated endpoint succeeds when the actor's effective
|
||||
permission set (the union across all held roles) contains the
|
||||
permission the endpoint requires.
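
The union rule can be sketched in a few lines of shell. This is an illustration, not certctl's authorizer: `role_perms`, `effective_perms` and `can` are hypothetical names, and the `r-viewer` permission list is deliberately trimmed.

```bash
# Hypothetical lookup: permissions held by a role (trimmed for the example).
role_perms() {
  case "$1" in
    r-viewer)  printf '%s\n' cert.read profile.read ;;
    r-auditor) printf '%s\n' audit.read audit.export ;;
  esac
}

# Effective permission set = union of the permissions of every held role.
effective_perms() {
  local r
  for r in "$@"; do role_perms "$r"; done | sort -u
}

# can <permission> <role>...: succeed when the union contains the permission.
can() {
  local want=$1
  shift
  effective_perms "$@" | grep -Fxq "$want"
}

# Usage:
#   can audit.export r-viewer r-auditor && echo allowed
```

An actor holding no roles has an empty union, so every gated request is denied.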
|
||||
|
||||
The schema lives in `migrations/000029_rbac.up.sql` and ships with
|
||||
seven seeded default roles + a 33-permission canonical catalogue.
|
||||
The middleware that gates requests lives at
|
||||
`internal/auth/require_permission.go`. The service-layer authorizer
|
||||
that resolves "actor → permissions" lives at
|
||||
`internal/service/auth/authorizer.go`.
|
||||
|
||||
## Default roles (seeded by migration 000029)
|
||||
|
||||
| Role | ID | Use case | Permission shape |
|
||||
|---|---|---|---|
|
||||
| Admin | `r-admin` | Operator with full control | Every permission in the canonical catalogue |
|
||||
| Operator | `r-operator` | Day-to-day cert lifecycle | `cert.*`, `profile.read`, `issuer.read`, `target.*`, `agent.read`, `audit.read` |
|
||||
| Viewer | `r-viewer` | Read-only console access | `*.read` for every resource type |
|
||||
| Agent | `r-agent` | Machine identity for `certctl-agent` | `cert.read` + `agent.heartbeat` + `agent.job.poll` + `agent.job.complete` + `agent.job.report` |
|
||||
| MCP | `r-mcp` | Operator-equivalent for the MCP server, minus destructive ops | Like Operator without `*.delete` |
|
||||
| CLI | `r-cli` | Day-to-day operator CLI | Like Operator + `auth.key.list` / `auth.key.create` / `auth.key.rotate` |
|
||||
| Auditor | `r-auditor` | Compliance reviewer | `audit.read` + `audit.export` ONLY |
|
||||
|
||||
**Note on actor-type binding (Audit 2026-05-10 LOW-8):** Roles in
|
||||
the catalogue are NOT bound to a specific `actor_type`. `r-mcp` is
|
||||
named for clarity ("the role MCP service accounts hold") but the
|
||||
schema permits granting it to any actor — including a human OIDC
|
||||
user. Same goes for `r-cli` and `r-agent`. The role-grant API accepts
|
||||
`{actor_id, actor_type, role_id}` tuples; the `actor_type` constraint
|
||||
lives on the grant row, not the role definition. Operators who want
|
||||
to enforce "only API-key actors hold r-mcp" should write that as an
|
||||
operator-side policy + verify via a periodic audit query against
|
||||
`actor_roles` joined to `api_keys` / `users`. Native role-to-
|
||||
actor-type binding is on the v2 roadmap.
|
||||
|
||||
The auditor split is the load-bearing one: an auditor cannot read
|
||||
certificates, profiles, or issuers - only audit events. That makes the
|
||||
role legitimate to hand to a SOC 2 / FedRAMP / PCI auditor without
|
||||
giving them the keys to the kingdom. The
|
||||
`internal/domain/auth/auditor_test.go` invariants pin this set going
|
||||
forward.
|
||||
|
||||
The five **admin-only fine-grained perms** seeded by migration
|
||||
000030 gate the high-blast-radius endpoints:
|
||||
|
||||
- `cert.bulk_revoke` - `POST /api/v1/certificates/bulk-revoke` and the EST sibling
|
||||
- `crl.admin` - `/api/v1/admin/crl/cache`
|
||||
- `scep.admin` - `/api/v1/admin/scep/intune/*`
|
||||
- `est.admin` - `/api/v1/admin/est/*`
|
||||
- `ca.hierarchy.manage` - `/api/v1/issuers/{id}/intermediates`, `/api/v1/intermediates/{id}`
|
||||
|
||||
Only `r-admin` holds these by default. To delegate one, create a
|
||||
custom role with the specific perm and grant it to the right actor.
|
||||
|
||||
## Permission catalogue
|
||||
|
||||
The catalogue is namespaced. Permission strings are stable across
|
||||
releases; new permissions add to the namespace, never reshape an
|
||||
existing one. Run
|
||||
`certctl-cli auth permissions list` (or `GET /api/v1/auth/permissions`)
|
||||
for the live catalogue.
|
||||
|
||||
| Namespace | Examples | What the namespace gates |
|
||||
|---|---|---|
|
||||
| `cert.*` | `cert.read`, `cert.issue`, `cert.revoke`, `cert.delete`, `cert.bulk_revoke` | The certificate lifecycle surface (`/api/v1/certificates`) |
|
||||
| `profile.*` | `profile.read`, `profile.edit`, `profile.delete` | `CertificateProfile` CRUD |
|
||||
| `issuer.*` | `issuer.read`, `issuer.edit`, `issuer.delete` | Issuer connector config |
|
||||
| `target.*` | `target.read`, `target.edit`, `target.delete` | Deployment target config |
|
||||
| `agent.*` | `agent.read`, `agent.edit`, `agent.retire`, `agent.heartbeat`, `agent.job.*` | Agent fleet + agent self-service endpoints |
|
||||
| `audit.*` | `audit.read`, `audit.export` | The audit-events surface |
|
||||
| `auth.role.*` | `auth.role.list`, `auth.role.create`, `auth.role.edit`, `auth.role.delete`, `auth.role.assign` | RBAC management |
|
||||
| `auth.key.*` | `auth.key.list`, `auth.key.create`, `auth.key.rotate`, `auth.key.delete` | API key management |
|
||||
| `auth.bootstrap.*` | `auth.bootstrap.use` | Day-0 first-admin path |
|
||||
| `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage` | (single perms) | Four of the five admin-only fine-grained perms (see above); the fifth, `cert.bulk_revoke`, is listed under `cert.*` |
|
||||
| `job.*` | `job.read`, `job.cancel` | Deployment job lifecycle |
|
||||
| `approval.*` | `approval.read`, `approval.approve`, `approval.reject` | Two-person approval workflow (cert-issuance + profile-edit) |
|
||||
| `policy.*` | `policy.read`, `policy.edit`, `policy.delete` | Compliance policies + renewal policies |
|
||||
| `team.*`, `owner.*` | `team.read`, `team.edit`, `team.delete`, `owner.*` | Organizational metadata |
|
||||
| `notification.*` | `notification.read`, `notification.edit` | Notification queue + requeue |
|
||||
| `discovery.*` | `discovery.read`, `discovery.run`, `discovery.claim` | Agent + cloud-secret-store discovery |
|
||||
| `network_scan.*` | `network_scan.read`, `network_scan.edit`, `network_scan.run` | TLS network scanning + SCEP probing |
|
||||
| `healthcheck.*` | `healthcheck.read`, `healthcheck.edit`, `healthcheck.delete`, `healthcheck.acknowledge` | Uptime monitors |
|
||||
| `digest.*` | `digest.read`, `digest.send` | Operator-summary digest emails |
|
||||
| `verification.*` | `verification.read`, `verification.run` | Post-deploy verification |
|
||||
| `stats.read`, `metrics.read` | (single perms) | Dashboard summary + Prometheus exposition |
|
||||
|
||||
The full catalogue lives in
|
||||
[`internal/domain/auth/validate.go`](../../internal/domain/auth/validate.go).
|
||||
The router-level enforcement sits in
|
||||
[`internal/api/router/router.go`](../../internal/api/router/router.go);
|
||||
the AST-level CI guard
|
||||
[`TestRouterRBACGateCoverage`](../../internal/api/router/router_rbac_coverage_test.go)
|
||||
pins the contract — adding a new state-changing or read endpoint
|
||||
without an `rbacGate` / `rbacGateScoped` wrap fails CI.
|
||||
|
||||
## Scope semantics
|
||||
|
||||
Permissions are granted at one of three scopes:
|
||||
|
||||
- **`global`** - applies to every resource in the tenant. The
|
||||
default for the seeded role grants. A `cert.read` grant at global
|
||||
scope lets the actor read any certificate.
|
||||
- **`profile`** - applies only to the named `CertificateProfile`
|
||||
(matched by ID). `cert.issue` at scope `profile`/`p-corp-cdn` lets
|
||||
the actor issue against `p-corp-cdn` only.
|
||||
- **`issuer`** - applies only to the named issuer. Lets you grant
|
||||
`issuer.edit` on the production issuer to a senior operator
|
||||
without giving them edit on every issuer.
|
||||
|
||||
Global beats specific: an actor with `cert.read` at global scope
|
||||
passes a `cert.read` check against any specific profile or issuer
|
||||
even if no scoped grant exists. The reverse does not hold - a
|
||||
scoped grant doesn't satisfy a request against a different scope.
|
||||
The Authorizer's `CheckPermission` is the single point of truth.
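
The scope rule can be pictured with a toy check over a list of grants. This is a sketch and not the Authorizer's code: `check_scoped` is a hypothetical name, and the one-grant-per-line input format is invented for the example.

```bash
# Toy model. stdin: one grant per line as "permission scope_type scope_id"
# (use "-" as the scope_id of a global grant).
# Arguments: the permission and the scope the request needs.
check_scoped() {
  local want_perm=$1 want_type=$2 want_id=$3 perm type id
  while read -r perm type id; do
    [ "$perm" = "$want_perm" ] || continue
    # A global grant satisfies a check against any specific scope.
    if [ "$type" = "global" ]; then return 0; fi
    # A scoped grant only satisfies a check against exactly that scope.
    if [ "$type" = "$want_type" ] && [ "$id" = "$want_id" ]; then return 0; fi
  done
  return 1
}

# Usage:
#   printf '%s\n' 'cert.issue profile p-corp-cdn' \
#     | check_scoped cert.issue profile p-corp-cdn && echo allowed
```

A grant at `profile`/`p-corp-cdn` passes for `p-corp-cdn` and fails for every other profile, while a global grant passes for all of them.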
|
||||
|
||||
> **Note (deferral):** the `scope_id` column is not
|
||||
> currently FK-constrained against the resource tables. An
|
||||
> operator can grant a permission at scope `profile`/`p-bogus`
|
||||
> without `p-bogus` existing; the gate still works (no rows match
|
||||
> at request time), but the API does not 404 the grant. Strict-FK
|
||||
> closure is tracked for a follow-on release. See
|
||||
> `internal/repository/postgres/auth.go::AddPermission`'s
|
||||
> `TODO` comment.
|
||||
|
||||
## Granting + revoking access
|
||||
|
||||
### From the GUI
|
||||
|
||||
`/auth/roles` lists every role; click into one to see its
|
||||
permissions and (if you hold `auth.role.edit`) add or remove a
|
||||
permission. `/auth/keys` lists every actor with role grants;
|
||||
click "Assign role" to grant, click the × on a role tag to revoke.
|
||||
|
||||
The synthetic `actor-demo-anon` row is shown but flagged
|
||||
"system-managed" with the mutation buttons hidden - the server-side
|
||||
reserved-actor guard rejects mutations against it regardless.
|
||||
|
||||
### From the CLI
|
||||
|
||||
```bash
|
||||
# Identity probe - what can the current API key actually do?
|
||||
certctl-cli auth me
|
||||
|
||||
# Roles
|
||||
certctl-cli auth roles list
|
||||
certctl-cli auth roles get r-admin
|
||||
|
||||
# Permissions catalogue
|
||||
certctl-cli auth permissions list
|
||||
|
||||
# Key → role assignment
|
||||
certctl-cli auth keys list
|
||||
certctl-cli auth keys assign alice --role r-operator
|
||||
certctl-cli auth keys revoke alice --role r-admin
|
||||
|
||||
# Walk-every-key prompt for downgrade
|
||||
certctl-cli auth keys scope-down
|
||||
|
||||
# Audit-driven role suggestion (last 30 days of audit events)
|
||||
certctl-cli auth keys scope-down --suggest
|
||||
certctl-cli auth keys scope-down --suggest --apply
|
||||
|
||||
# JSON-driven scope-down for automation (Helm post-upgrade hook etc.)
|
||||
certctl-cli auth keys scope-down --non-interactive ./scope-down.json
|
||||
```
|
||||
|
||||
The mutating role-lifecycle commands (`certctl-cli auth roles
|
||||
create / update / delete` + `roles add-permission / remove-permission`)
|
||||
are tracked as a follow-on; today, manage custom
|
||||
roles via the HTTP API or GUI.
|
||||
|
||||
### From the HTTP API
|
||||
|
||||
Every endpoint is documented in `api/openapi.yaml` under the `[Auth]`
|
||||
tag. Quick reference:
|
||||
|
||||
| Endpoint | Permission |
|
||||
|---|---|
|
||||
| `GET /v1/auth/me` | (none - own data) |
|
||||
| `GET /v1/auth/roles` | `auth.role.list` |
|
||||
| `GET /v1/auth/roles/{id}` | `auth.role.list` |
|
||||
| `POST /v1/auth/roles` | `auth.role.create` |
|
||||
| `PUT /v1/auth/roles/{id}` | `auth.role.edit` |
|
||||
| `DELETE /v1/auth/roles/{id}` | `auth.role.delete` |
|
||||
| `GET /v1/auth/permissions` | `auth.role.list` |
|
||||
| `POST /v1/auth/roles/{id}/permissions` | `auth.role.edit` |
|
||||
| `DELETE /v1/auth/roles/{id}/permissions/{perm}` | `auth.role.edit` |
|
||||
| `GET /v1/auth/keys` | `auth.role.list` |
|
||||
| `POST /v1/auth/keys/{id}/roles` | `auth.role.assign` |
|
||||
| `DELETE /v1/auth/keys/{id}/roles/{role_id}` (+ optional `?scope_type=` / `?scope_id=`) | `auth.role.assign` |
|
||||
| `GET /v1/auth/check` | (authenticated; surfaces effective perms) |
|
||||
| `GET /v1/auth/bootstrap` + `POST /v1/auth/bootstrap` | (auth-exempt; gated by env-var token) |
|
||||
|
||||
#### Revoke: legacy "all variants" vs scope-selective (Audit 2026-05-11 A-4)
|
||||
|
||||
`DELETE /v1/auth/keys/{id}/roles/{role_id}` runs in one of two modes,
|
||||
selected by presence of the optional query parameters:
|
||||
|
||||
- **No query params (legacy "revoke all variants")** — every scoped grant of
|
||||
this role held by this actor is dropped. Idempotent: zero-row deletes
|
||||
return 204 (no error). This is the pre-A-4 behaviour and remains the
|
||||
default for the CLI / GUI buttons that don't know about scope.
|
||||
|
||||
```bash
|
||||
# Drop EVERY variant of r-operator from alice (global, profile-scoped,
|
||||
# issuer-scoped — all gone).
|
||||
curl -X DELETE https://certctl.example.com/api/v1/auth/keys/alice/roles/r-operator
|
||||
```
|
||||
|
||||
- **`?scope_type=` (+ optional `?scope_id=`)** — drop ONE variant. Used
|
||||
when an actor holds the same role at multiple scopes (HIGH-10 made
|
||||
that representable; A-4 makes it selectively revocable).
|
||||
`scope_type=global` requires `scope_id` to be absent; `scope_type=profile`
|
||||
/ `issuer` require `scope_id`. No match returns 404 so operators get
|
||||
feedback when they target a scope variant the actor doesn't hold.
|
||||
|
||||
```bash
|
||||
# Alice holds r-operator scoped to p-acme AND p-globex.
|
||||
# Drop ONLY the p-acme grant; the p-globex grant stays.
|
||||
curl -X DELETE 'https://certctl.example.com/api/v1/auth/keys/alice/roles/r-operator?scope_type=profile&scope_id=p-acme'
|
||||
|
||||
# Drop ONLY the global grant of r-operator (keeps any profile / issuer variants):
|
||||
curl -X DELETE 'https://certctl.example.com/api/v1/auth/keys/alice/roles/r-operator?scope_type=global'
|
||||
```
|
||||
|
||||
The audit row's `details` payload records which mode fired —
|
||||
`scope: "all_variants"` for the legacy path, or the explicit
|
||||
`scope_type` + `scope_id` for selective revoke — so SOC / SIEM can
|
||||
distinguish wide cleanups from targeted demotions in the access log.
|
||||
|
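The two revoke modes can be sketched as a small client-side URL builder. This is an illustrative helper, not part of certctl: the function name `revoke_url` is hypothetical, and rejecting a lone `scope_id` on the client is an assumption (the text above only states the `scope_type` rules).

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Scope types that, per the rules above, require a scope_id.
SCOPED_TYPES = {"profile", "issuer"}


def revoke_url(base: str, key_id: str, role_id: str,
               scope_type: Optional[str] = None,
               scope_id: Optional[str] = None) -> str:
    """Build the DELETE URL for either revoke mode (hypothetical helper)."""
    path = (f"{base.rstrip('/')}/api/v1/auth/keys/"
            f"{quote(key_id, safe='')}/roles/{quote(role_id, safe='')}")
    if scope_type is None:
        # Assumption: a scope_id without a scope_type is treated as a caller bug.
        if scope_id is not None:
            raise ValueError("scope_id given without scope_type")
        return path  # legacy mode: revoke all variants
    if scope_type == "global":
        if scope_id is not None:
            raise ValueError("scope_type=global requires scope_id to be absent")
        return f"{path}?{urlencode({'scope_type': 'global'})}"
    if scope_type in SCOPED_TYPES:
        if not scope_id:
            raise ValueError(f"scope_type={scope_type} requires scope_id")
        return f"{path}?{urlencode({'scope_type': scope_type, 'scope_id': scope_id})}"
    raise ValueError(f"unknown scope_type: {scope_type}")
```

Validating the combination before sending keeps a typo from silently falling into the legacy "all variants" mode, which is the wider of the two operations.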
||||
### From the MCP server
|
||||
|
||||
The MCP server ships 12 RBAC tools:
|
||||
`certctl_auth_me`, `certctl_auth_list_roles`, `certctl_auth_get_role`,
|
||||
`certctl_auth_create_role`, `certctl_auth_update_role`,
|
||||
`certctl_auth_delete_role`, `certctl_auth_list_permissions`,
|
||||
`certctl_auth_add_permission_to_role`,
|
||||
`certctl_auth_remove_permission_from_role`,
|
||||
`certctl_auth_list_keys`, `certctl_auth_assign_role_to_key`,
|
||||
`certctl_auth_revoke_role_from_key`. Each routes through the same
|
||||
HTTP surface above; permission gates fire server-side.
|
||||
|
||||
## The auditor pattern
|
||||
|
||||
Hand the auditor key to compliance reviewers. They get:
|
||||
|
||||
- `GET /api/v1/audit?category=auth` - every auth/authz mutation
|
||||
in the system (role creates, role grants on actors, bootstrap
|
||||
consumption, etc.).
|
||||
- `GET /api/v1/audit?category=cert_lifecycle` - every cert event.
|
||||
- `GET /api/v1/audit?category=config` - every issuer / target /
|
||||
settings edit.
|
||||
- `GET /api/v1/audit/export` - bulk export.
|
||||
|
||||
They do NOT get cert read, profile read, issuer read, or any
|
||||
mutating permission. The categorization is enforced by the database
|
||||
CHECK constraint (migration 000032); the WORM trigger from
|
||||
migration 000018 keeps the audit table append-only at the DB layer.
|
||||
|
||||
To create an auditor key:
|
||||
|
||||
1. `certctl-cli auth keys assign <key-id> --role r-auditor`
|
||||
2. (Optional) Revoke any other roles the key holds with
|
||||
`certctl-cli auth keys revoke <key-id> --role r-...`
|
||||
3. Confirm via `certctl-cli auth me` while authenticated as the
|
||||
auditor key - the response should show only `audit.read` and
|
||||
`audit.export` in `effective_permissions`.
|
||||
|
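Step 3 can be automated with a small check over the `auth me` response. This is a sketch only: it assumes the response is JSON with `effective_permissions` as a flat list of permission-name strings, and the function name is hypothetical.

```python
# The only permissions an auditor key should hold, per the text above.
AUDITOR_PERMISSIONS = frozenset({"audit.read", "audit.export"})


def auditor_scope_violations(me_response: dict) -> list:
    """Return human-readable problems; an empty list means auditor-only."""
    # Assumption: effective_permissions is a list of strings.
    effective = set(me_response.get("effective_permissions", []))
    extra = sorted(effective - AUDITOR_PERMISSIONS)
    missing = sorted(AUDITOR_PERMISSIONS - effective)
    return ([f"unexpected: {p}" for p in extra]
            + [f"missing: {p}" for p in missing])
```

An empty result means the key holds exactly the two audit permissions; any `unexpected:` entry points at a role that still needs revoking in step 2.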
||||
## Day-0 bootstrap (first-admin path)
|
||||
|
||||
certctl ships a one-shot bootstrap endpoint for fresh
|
||||
deployments where no admin actor exists yet.
|
||||
|
||||
1. Set `CERTCTL_BOOTSTRAP_TOKEN=$(openssl rand -hex 32)` in the
|
||||
server environment.
|
||||
2. Boot the server. Logs include
|
||||
"bootstrap endpoint enabled - POST /api/v1/auth/bootstrap to
|
||||
mint the first admin key (one-shot)" when the path is callable.
|
||||
3. Run a single curl:
|
||||
|
||||
```bash
|
||||
curl -X POST $URL/api/v1/auth/bootstrap \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{"token":"<the-token>","actor_name":"first-admin"}'
|
||||
```
|
||||
|
||||
4. Capture the `key_value` from the response. **It is shown ONCE.**
|
||||
The server never logs it.
|
||||
5. Use the new key to authenticate against the rest of the API.
|
||||
The bootstrap path is now closed: subsequent calls return HTTP
|
||||
410 Gone, even with the same valid token, because an admin
|
||||
actor exists.
|
||||
|
||||
The token is constant-time-compared. The server logs a startup
|
||||
warning if `CERTCTL_BOOTSTRAP_TOKEN` is set AND admin actors
|
||||
already exist (config-drift signal). For the OIDC-first-admin
|
||||
path (the "first user who signs in via SSO becomes admin"
|
||||
pattern), see
|
||||
[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md).
|
||||
|
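The one-shot behaviour described above can be modelled in a few lines. This is a simplified, single-threaded sketch and not certctl's implementation: the class name is hypothetical, and the 401 and 200 status codes are assumptions (only the 410 after an admin exists is stated above).

```python
import hmac


class BootstrapGate:
    """Toy model of the first-admin path: constant-time compare, then close."""

    def __init__(self, configured_token: str, admin_exists: bool = False):
        self._token = configured_token.encode()
        self._admin_exists = admin_exists

    def attempt(self, presented: str) -> int:
        # The admin-existence probe runs first, so even a valid token gets 410.
        if self._admin_exists:
            return 410
        # hmac.compare_digest avoids leaking how many leading bytes matched.
        # Assumption: an unset token keeps the path closed.
        if not self._token or not hmac.compare_digest(self._token,
                                                      presented.encode()):
            return 401  # assumed status for a bad token
        self._admin_exists = True  # the path is now closed
        return 200  # assumed success status
```

Checking for an existing admin before comparing the token is what makes a replayed valid token harmless once the first admin has been minted.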
||||
## Demo mode (`CERTCTL_AUTH_TYPE=none`)
|
||||
|
||||
When auth is disabled, the server injects a synthetic actor
|
||||
`actor-demo-anon` into every request context. That actor holds
|
||||
`r-admin` at global scope (seeded by migration 000029), so every
|
||||
gated route resolves with a populated actor and admin grants. The
|
||||
synthetic actor is reserved: the API rejects any mutation that
|
||||
targets it (HTTP 409 with `ErrAuthReservedActor`).
|
||||
|
||||
Production deployments MUST NOT use demo mode - there is no
|
||||
per-request actor identity for the audit trail, and every request
|
||||
flows as admin. Use it for the `docker compose up` demo + the five
|
||||
example folders only.
|
||||
|
||||
## Where to look next
|
||||
|
||||
- [Threat model](auth-threat-model.md) - what attacks this primitive
|
||||
defends against and which it does not
|
||||
- [Migration guide](../migration/api-keys-to-rbac.md) - moving
|
||||
pre-RBAC (v2.0.x) deployments onto RBAC
|
||||
- [Profiles](../reference/profiles.md) - the `RequiresApproval=true`
|
||||
flow with the flip-flop-bypass closure
|
||||
- [Approval workflow](approval-workflow.md) - the two-person
|
||||
integrity primitive backing `RequiresApproval`
|
||||
- `internal/auth/` - the middleware + keystore + RequirePermission
|
||||
- `internal/service/auth/` - the service-layer Authorizer
|
||||
- `cowork/auth-bundle-1-prompt.md` - the design + phase plan
|
||||
- `cowork/auth-bundles-index.md` - the per-phase status tracker
|
||||
@@ -1,5 +1,7 @@
|
||||
# Runbook: cloud-target deployment connectors (AWS ACM + Azure Key Vault)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This runbook covers the SDK-driven cloud target connectors that ship in
|
||||
certctl post-2026-05-03 (Rank 5 of the Infisical deep-research
|
||||
deliverable). It complements the operator-facing
|
||||
@@ -316,7 +318,7 @@ az monitor activity-log list \
|
||||
|
||||
## V3-Pro forward path
|
||||
|
||||
Tracked at `cowork/WORKSPACE-ROADMAP.md` under "Adapter hardening":
|
||||
Tracked under "Adapter hardening" on the project roadmap:
|
||||
|
||||
- **AWS CloudFront direct-attach** — UpdateDistribution after an ACM
|
||||
ImportCertificate so the CloudFront edge picks up the new cert
|
||||
@@ -1,16 +1,17 @@
|
||||
# Disaster recovery runbook
|
||||
|
||||
> **Status (this document):** Production hardening II Phase 10
|
||||
> deliverable. Codifies the fail-safe behaviors that already exist in
|
||||
> the codebase and the operator procedures for recovering from
|
||||
> common failure modes. Nothing in this runbook requires new code —
|
||||
> if a procedure here doesn't work as documented, that's a bug in
|
||||
> docs (file an issue).
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This runbook is the SOC 2 / PCI procurement-team deliverable: it tells
|
||||
auditors and on-call operators what to do when a piece of certctl's
|
||||
state corrupts, when a CA key needs rotation, or when Postgres needs
|
||||
a point-in-time restore. Read it once when you set up certctl; print
|
||||
> **Status (this document):** Operator runbook codifying the
|
||||
> fail-safe behaviors that already exist in the codebase and the
|
||||
> procedures for recovering from common failure modes. Nothing in
|
||||
> this runbook requires new code — if a procedure here doesn't work
|
||||
> as documented, that's a bug in docs (file an issue).
|
||||
|
||||
This runbook is the on-call deliverable: it tells reviewers and
|
||||
on-call operators what to do when a piece of certctl's state
|
||||
corrupts, when a CA key needs rotation, or when Postgres needs a
|
||||
point-in-time restore. Read it once when you set up certctl; print
|
||||
the [DR checklist](#dr-checklist) and pin it near your on-call rotation.
|
||||
|
||||
## Contents
|
||||
@@ -55,7 +56,7 @@ without operator action. The fail-safes in the codebase:
|
||||
These fail-safes mean most of this runbook is "delete the corrupt
|
||||
row + wait for the next tick" rather than "restore from backup +
|
||||
manually re-issue." The runbook documents the full procedures
|
||||
anyway because compliance auditors need to see them written down.
|
||||
anyway because reviewers need to see them written down.
|
||||
|
||||
## CRL cache recovery
|
||||
|
||||
@@ -236,7 +237,7 @@ remains trusted by relying parties until its `notAfter` (typical
|
||||
openssl x509 -in new-cert -noout -issuer
|
||||
```
|
||||
|
||||
**Future:** when the HSM/PKCS#11 driver bundle (`cowork/hsm-pkcs11-
|
||||
**Future:** when the HSM/PKCS#11 driver bundle (planned;
|
||||
driver-prompt.md`) ships, this rotation procedure changes
|
||||
substantially — the HSM-backed key never moves, only the cert wrap
|
||||
rotates. The signer interface seam is the load-bearing prerequisite
|
||||
@@ -286,7 +287,7 @@ backups. Without them, a restored DB is unusable.
|
||||
## Trust-bundle reload semantics
|
||||
|
||||
This section codifies the fail-safe behavior that's already in code,
|
||||
for compliance auditors who need to see the procedure documented.
|
||||
for reviewers who need to see the procedure documented.
|
||||
|
||||
**Pattern:** every trust-bundle holder (`internal/trustanchor.Holder`,
|
||||
used by SCEP/Intune dispatcher + EST mTLS sibling route) implements
|
||||
@@ -340,9 +341,9 @@ Print this. Pin it near your on-call rotation.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`crl-ocsp.md`](crl-ocsp.md) — CRL/OCSP responder operator guide.
|
||||
- [`tls.md`](tls.md) — control-plane TLS bootstrap.
|
||||
- [`security.md`](security.md) — production-grade security posture.
|
||||
- [`scep-intune.md`](scep-intune.md) — SCEP/Intune trust-anchor
|
||||
- [`crl-ocsp.md`](../../reference/protocols/crl-ocsp.md) — CRL/OCSP responder operator guide.
|
||||
- [`tls.md`](../../operator/tls.md) — control-plane TLS bootstrap.
|
||||
- [`security.md`](../../operator/security.md) — production-grade security posture.
|
||||
- [`scep-intune.md`](../../reference/protocols/scep-intune.md) — SCEP/Intune trust-anchor
|
||||
rotation specifics.
|
||||
- [`est.md`](est.md) — EST mTLS trust-bundle rotation specifics.
|
||||
- [`est.md`](../../reference/protocols/est.md) — EST mTLS trust-bundle rotation specifics.
|
||||
@@ -1,5 +1,7 @@
|
||||
# Runbook: certificate-expiry alerts (multi-channel)
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
This runbook covers the per-policy multi-channel expiry-alert dispatch
|
||||
path that ships in certctl post-2026-05-03 (Rank 4 of the Infisical
|
||||
deep-research deliverable). It complements the operator-facing
|
||||
@@ -215,7 +217,7 @@ dedup on the `notification_events` table guards against that).
|
||||
|
||||
## V3-Pro forward path
|
||||
|
||||
Tracked at `cowork/WORKSPACE-ROADMAP.md` under "Adapter hardening":
|
||||
Tracked under "Adapter hardening" on the project roadmap:
|
||||
|
||||
- Per-owner / per-team / per-tenant channel routing (the matrix is
|
||||
per-policy today, not per-owner).
|
||||
@@ -0,0 +1,409 @@
|
||||
# certctl Security Posture & Operator Guidance
|
||||
|
||||
> Last reviewed: 2026-05-11
|
||||
|
||||
This document collects the operator-facing security guidance that the source
|
||||
code's per-finding comment blocks reference. Each section names the audit
|
||||
finding it closes, the threat model, and the operator action required (if
|
||||
any).
|
||||
|
||||
## OCSP responder availability
|
||||
|
||||
**Audit reference:** CWE-770 (uncontrolled resource consumption); RFC
|
||||
6960 (OCSP); RFC 7633 (Must-Staple).
|
||||
|
||||
certctl ships an OCSP responder at `/.well-known/pki/ocsp/{issuer_id}/{serial}`
|
||||
that signs a fresh response per request. The unauthenticated handler chain
|
||||
applies the same per-key rate limiter the authenticated chain uses;
|
||||
per-IP keying applies because OCSP traffic is unauthenticated. Without
|
||||
this defense an attacker could DoS the responder and force fail-open
|
||||
relying parties to accept revoked certificates as valid.
|
||||
|
||||
The rate limiter alone does not solve the underlying revocation-bypass risk.
|
||||
**The architectural fix is for issued certificates to carry the OCSP
|
||||
Must-Staple TLS Feature extension** (RFC 7633, OID 1.3.6.1.5.5.7.1.24). When
|
||||
present, conforming TLS clients refuse to negotiate a session unless the
|
||||
server staples a fresh signed OCSP response in the TLS handshake. This shifts
|
||||
revocation enforcement from the client's discretion (most clients fail open by
|
||||
default) to a hard requirement that the connection cannot complete without
|
||||
proof of non-revocation.
|
||||
|
||||
### Operator action
|
||||
|
||||
For certificates issued to systems where revocation correctness matters:
|
||||
|
||||
1. **Configure the issuer profile to set `must-staple: true`.** Out-of-the-box
|
||||
profiles in `migrations/seed.sql` do not set this; operators add it at
|
||||
profile-creation time via the API or by editing seed data.
|
||||
2. **Confirm the relying party honors the extension.** Firefox enforces
   Must-Staple; Chrome and other Chromium-based browsers do not, and many
   other clients silently ignore it. Test the specific client rather than
   assuming enforcement.
|
||||
3. **Confirm the deployment target is configured for OCSP stapling** so the
|
||||
server can actually deliver the stapled response in the handshake.
|
||||
- **nginx:** `ssl_stapling on; ssl_stapling_verify on;`
|
||||
- **Apache:** `SSLUseStapling on`
|
||||
- **HAProxy:** `set ssl ocsp-response /path/to/response.der`
|
||||
   - **Envoy:** `ocsp_staple_policy: MUST_STAPLE`
|
||||
|
||||
### What this does NOT cover
|
||||
|
||||
- **CRL fallback.** Must-Staple does not affect CRL behavior. Operators with
|
||||
CRL-based relying parties should use the rate-limit + caching defense
|
||||
alone; there is no client-side equivalent to Must-Staple for CRLs.
|
||||
- **Self-issued certs in air-gapped networks.** When the relying party
|
||||
cannot reach the OCSP responder at all (the threat model the audit
|
||||
cited), Must-Staple is the only mechanism that closes the bypass. CRL
|
||||
distribution similarly requires the relying party to fetch the CRL,
|
||||
which is also subject to the same network-availability concern.
|
||||
|
||||
## Postgres transport encryption
|
||||
|
||||
See [docs/database-tls.md](database-tls.md).
|
||||
|
||||
## Encryption at rest
|
||||
|
||||
Sensitive config columns are encrypted with AES-256-GCM. The key is derived
from the operator-supplied passphrase with PBKDF2-SHA256 at 600,000 rounds
(the OWASP 2024 Password Storage Cheat Sheet floor). The v3 blob format
carries a per-ciphertext random salt; v1/v2 blobs remain readable as a
fallback for legacy rows.
|
||||
See [internal/crypto/encryption.go](../../internal/crypto/encryption.go) and
|
||||
the accompanying tests for the format spec.
|
||||
|
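The key-derivation step can be illustrated with the Python standard library. This is a sketch under assumptions, not the code in `internal/crypto/encryption.go`: the function name is hypothetical, and the 16-byte salt length is taken from the v3 blob layout described in the OIDC section of this document.

```python
import hashlib

PBKDF2_ROUNDS = 600_000  # round count stated above
SALT_LEN = 16            # assumption: matches the v3 blob's salt field
KEY_LEN = 32             # 256-bit key for AES-256-GCM


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte AES key from the passphrase and a per-blob salt."""
    if len(salt) != SALT_LEN:
        raise ValueError(f"salt must be {SALT_LEN} bytes")
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, PBKDF2_ROUNDS, dklen=KEY_LEN)
```

Because the salt is random per ciphertext, two rows encrypted under the same passphrase end up with unrelated keys.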
||||
## Authentication surface
|
||||
|
||||
Two layers decide auth-exempt status:
|
||||
|
||||
1. **Router layer:** `internal/api/router/router.go::AuthExemptRouterRoutes`
|
||||
- the endpoints registered via direct `r.mux.Handle` without going
|
||||
through the middleware chain (`/health`, `/ready`, `/api/v1/auth/info`,
|
||||
`/api/v1/version`, plus `/api/v1/auth/bootstrap` GET + POST for the
|
||||
first-admin path).
|
||||
2. **Dispatch layer:** `internal/api/router/router.go::AuthExemptDispatchPrefixes`
|
||||
- URL-prefix routing in `cmd/server/main.go::buildFinalHandler` for
|
||||
`/.well-known/pki/*`, `/.well-known/est/*`, `/.well-known/est-mtls`,
|
||||
and `/scep[/...]*` (incl. `/scep-mtls`).
|
||||
|
||||
Both lists have AST-walking regression tests (`auth_exempt_test.go`) that
|
||||
fail CI if a new bypass lands without updating the documented constant.
|
||||
|
||||
### Role-based authorization
|
||||
|
||||
Role-based authorization runs on top of API-key authentication. Every
|
||||
gated handler routes through the `auth.RequirePermission` middleware
|
||||
(or its router-level wrap `rbacGate`); the middleware resolves the
|
||||
actor's effective permissions via the service-layer
|
||||
`Authorizer.CheckPermission` and returns HTTP 403 BEFORE the handler
|
||||
body runs on miss. The seven default roles (`admin` / `operator` /
|
||||
`viewer` / `agent` / `mcp` / `cli` / `auditor`), 33-permission
|
||||
canonical catalogue, and the auditor split (`r-auditor` holds only
|
||||
`audit.read` + `audit.export`) are seeded by migration 000029.
|
||||
|
||||
For the operator how-to, see [`rbac.md`](rbac.md). For the
|
||||
threat model + compliance mapping, see
|
||||
[`auth-threat-model.md`](auth-threat-model.md). For the upgrade
|
||||
flow from an API-key-only deployment, see
|
||||
[`docs/migration/api-keys-to-rbac.md`](../migration/api-keys-to-rbac.md).
|
||||
|
||||
### Day-0 admin bootstrap
|
||||
|
||||
Fresh deployments where no admin actor exists yet can mint the
|
||||
first admin via `POST /api/v1/auth/bootstrap` - set
|
||||
`CERTCTL_BOOTSTRAP_TOKEN`, POST a single curl with the token, and
|
||||
the server returns the plaintext key value once. The token is
|
||||
constant-time-compared; the strategy is one-shot via mutex; the
|
||||
admin-existence probe re-closes the path once an admin lands.
|
||||
The token is NEVER logged. The minted plaintext key flows only
|
||||
into the HTTP response body. See
|
||||
[`rbac.md`](rbac.md#day-0-bootstrap-first-admin-path) for the
|
||||
full flow.
|
||||
|
||||
### Approval-bypass closure
|
||||
|
||||
`CertificateProfile.RequiresApproval=true` profiles route both
|
||||
issuance/renewal AND profile edits through the
|
||||
`ApprovalService` two-person integrity gate. The flip-flop loophole
|
||||
(an admin disabling approval, mutating, re-enabling) is closed by
|
||||
gating profile-edit through the same approval flow. Same-actor
|
||||
self-approve is rejected at the service layer with
|
||||
`ErrApproveBySameActor`. See
|
||||
[`docs/reference/profiles.md`](../reference/profiles.md) for the
|
||||
full gate semantics.
|
||||
|
||||
### OIDC federation
|
||||
|
||||
OIDC SSO runs on top of the API-key + RBAC foundation. Operators
|
||||
configure one or more identity providers (Keycloak, Authentik, Okta,
|
||||
Auth0, Entra ID, or Google Workspace via Keycloak broker); end users
|
||||
sign in at the IdP, certctl validates the returned ID token, and a
|
||||
session cookie is minted.
|
||||
|
||||
The token-validation pipeline pins:
|
||||
|
||||
- Algorithm allow-list: RS256 / RS512 / ES256 / ES384 / EdDSA only.
|
||||
HS256 / HS384 / HS512 / `none` are rejected at the service-layer
|
||||
sentinel level.
|
||||
- IdP-downgrade-attack defense at provider creation AND every
|
||||
RefreshKeys: the IdP's advertised
|
||||
`id_token_signing_alg_values_supported` is intersected with the
|
||||
allow-list; a provider that advertises HS-family is rejected
|
||||
  before any token signed under a weak alg can be accepted.
|
||||
- Exact `iss` match (`ErrIssuerMismatch`).
|
||||
- `aud` membership + `azp` for multi-aud tokens (per OIDC core
|
||||
§3.1.3.7 step 5).
|
||||
- `at_hash` REQUIRED-when-access_token-present (a tightening of the
|
||||
spec MAY → MUST so a substituted access token cannot ride alongside
|
||||
a clean ID token).
|
||||
- Single-use state + nonce (32-byte random server-generated;
|
||||
atomic `DELETE...RETURNING` on consume).
|
||||
- PKCE-S256 mandatory; `plain` rejected.
|
||||
- Configurable `iat` window (default 300s, capped 600s).
|
||||
- JWKS cache with operator-triggered RefreshKeys + auto-refresh on
|
||||
TTL expiry (default 3600s); JWKS-fetch failure during a key
|
||||
rotation returns 503 to the in-flight login (existing sessions
|
||||
untouched).
|
||||
|
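The downgrade defense in the second bullet can be sketched as a check over the provider's advertised algorithms. The function and exception names are hypothetical, and rejecting a provider with no overlap at all is an assumption rather than documented behaviour.

```python
# Allow-list taken from the first bullet above.
ALLOWED_ALGS = ("RS256", "RS512", "ES256", "ES384", "EdDSA")


class WeakAlgorithmAdvertised(ValueError):
    """Raised when a provider advertises an HS-family algorithm."""


def usable_algs(advertised):
    """Intersect id_token_signing_alg_values_supported with the allow-list."""
    weak = [a for a in advertised if a.upper().startswith("HS")]
    if weak:
        # Rejecting the whole provider means no HS-signed token is ever trusted.
        raise WeakAlgorithmAdvertised(", ".join(weak))
    usable = [a for a in ALLOWED_ALGS if a in advertised]
    if not usable:
        # Assumption: a provider with no allowed algorithm is unusable.
        raise ValueError("provider advertises no allowed algorithm")
    return usable
```

Running the same check at provider creation and on every key refresh covers the case where an IdP starts advertising a weak algorithm after it was first configured.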
||||
OIDC `client_secret` is encrypted at rest via AES-256-GCM (v3 blob
|
||||
format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag) using
|
||||
the `CERTCTL_CONFIG_ENCRYPTION_KEY` passphrase. The encryption
|
||||
invariant is pinned by an integration test
|
||||
(`internal/repository/postgres/oidc_encryption_invariant_test.go`)
|
||||
that asserts ciphertext != plaintext + correct blob shape +
|
||||
round-trip recovery + wrong-passphrase fails.
|
||||
|
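The v3 blob layout above can be made concrete with a small parser. This is an illustrative sketch: the names are hypothetical, and the 16-byte GCM tag length is an assumption (it is the common AES-GCM default, but the text above does not state it).

```python
from typing import NamedTuple

MAGIC_V3 = 0x03
SALT_LEN = 16
NONCE_LEN = 12
TAG_LEN = 16  # assumption: default AES-GCM tag size


class V3Blob(NamedTuple):
    salt: bytes
    nonce: bytes
    ciphertext_and_tag: bytes


def split_v3_blob(blob: bytes) -> V3Blob:
    """Split magic(1) + salt(16) + nonce(12) + ciphertext+tag into fields."""
    header = 1 + SALT_LEN + NONCE_LEN
    if len(blob) < header + TAG_LEN:
        raise ValueError("blob too short for v3 layout")
    if blob[0] != MAGIC_V3:
        raise ValueError("not a v3 blob")
    salt = blob[1:1 + SALT_LEN]
    nonce = blob[1 + SALT_LEN:header]
    return V3Blob(salt, nonce, blob[header:])
```

A shape check like this is the kind of assertion the invariant test describes: a plaintext secret stored by mistake would fail the magic-byte or length check.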
||||
Per-IdP setup guides at
|
||||
[`oidc-runbooks/index.md`](oidc-runbooks/index.md) cover Keycloak,
|
||||
Authentik, Okta, Auth0, Entra ID, and Google Workspace.
|
||||
|
||||
### Sessions + back-channel logout
|
||||
|
||||
Successful OIDC login mints a session cookie:
|
||||
`v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
|
||||
The HMAC input is **length-prefixed** as `len:sid:len:kid` to defeat
|
||||
concatenation-collision attacks on bare-concat designs. Cookie
|
||||
attributes:
|
||||
|
||||
- `HttpOnly=true` (no JS access; defends XSS cookie theft).
|
||||
- `Secure=true` (HTTPS-only; defends network MITM).
|
||||
- `SameSite=Lax` default (configurable to Strict via
|
||||
`CERTCTL_SESSION_SAMESITE`).
|
||||
- `Path=/`, host-only.
|
||||
|
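The length-prefixed HMAC input can be sketched as follows. This illustrates the technique rather than certctl's exact wire format: the byte encoding of the `len:sid:len:kid` prefix and the function names are assumptions, and IDs are assumed not to contain `.`.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def mac_input(session_id: str, key_id: str) -> bytes:
    """Length-prefix each field so ("ab", "c") and ("a", "bc") differ."""
    sid, kid = session_id.encode(), key_id.encode()
    return b"%d:%s:%d:%s" % (len(sid), sid, len(kid), kid)


def sign_cookie(session_id: str, key_id: str, key: bytes) -> str:
    tag = hmac.new(key, mac_input(session_id, key_id), hashlib.sha256).digest()
    sig = base64.urlsafe_b64encode(tag).rstrip(b"=").decode()  # no padding
    return f"v1.{session_id}.{key_id}.{sig}"


def verify_cookie(cookie: str, keys: Dict[str, bytes]) -> Optional[str]:
    """Return the session ID if the cookie verifies, else None."""
    parts = cookie.split(".")
    if len(parts) != 4 or parts[0] != "v1":
        return None
    _, sid, kid, _sig = parts
    key = keys.get(kid)  # an unknown or retired signing key fails closed
    if key is None:
        return None
    expected = sign_cookie(sid, kid, key)
    ok = hmac.compare_digest(expected.encode(), cookie.encode())
    return sid if ok else None
```

With a bare concatenation, the pairs `("ab", "c")` and `("a", "bc")` would feed identical bytes to the HMAC; the length prefixes make the two inputs distinct.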
||||
Idle timeout default 1h; absolute timeout default 8h; both
|
||||
configurable via `CERTCTL_SESSION_IDLE_TIMEOUT` and
|
||||
`CERTCTL_SESSION_ABSOLUTE_TIMEOUT`. The scheduler's
|
||||
`sessionGCLoop` (default 1h interval) sweeps expired rows.
|
||||
|
||||
CSRF defense: plaintext CSRF token in the JS-readable
|
||||
`certctl_csrf` cookie (intentionally `HttpOnly=false` for the GUI
|
||||
to echo into the `X-CSRF-Token` header); SHA-256 hash on the
|
||||
session row; `subtle.ConstantTimeCompare` in `CSRFMiddleware`.
|
||||
API-key actors are CSRF-exempt (no session row in context).
|
||||
|
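The CSRF check described above can be sketched as follows. The hex encoding of the stored hash and the function names are assumptions; `hmac.compare_digest` plays the role that `subtle.ConstantTimeCompare` plays in the Go middleware.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_csrf_token() -> Tuple[str, str]:
    """Return (plaintext for the cookie, SHA-256 hex for the session row)."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()


def csrf_ok(header_token: str, stored_hash: str) -> bool:
    """Hash the X-CSRF-Token header value and compare in constant time."""
    presented = hashlib.sha256(header_token.encode()).hexdigest()
    return hmac.compare_digest(presented, stored_hash)
```

Storing only the hash means a read of the session table does not hand an attacker a usable CSRF token.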
||||
Session signing keys rotate via `RotateSigningKey`; the old key
|
||||
stays valid for `CERTCTL_SESSION_SIGNING_KEY_RETENTION` (default
|
||||
24h) so existing cookies validate during rollover. Past retention,
|
||||
the old key's row is dropped and any cookie still signed under it
|
||||
returns `ErrSigningKeyNotFound`. `EnsureInitialSigningKey` is
|
||||
fail-fatal at server boot.
|
||||
|
||||
Back-channel logout per **OpenID Connect Back-Channel Logout 1.0**
|
||||
(NOT RFC 8414): `POST /auth/oidc/back-channel-logout` accepts a
|
||||
JWT-signed logout token from the IdP, validates the JWT against
|
||||
the IdP's JWKS (same alg allow-list as login), pins required
|
||||
claims (`iss` / `aud` / `iat` / `jti` / `events`; exactly one of
|
||||
`sub` / `sid`; `nonce` MUST be absent), defeats replay via
|
||||
`jti`-based deduplication, and revokes matching sessions.
|
||||
|
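The claim checks listed above can be sketched as a function that runs after signature verification. This is illustrative only: the names are hypothetical, the `sub` / `sid` rule follows the wording of this document, and the in-memory set stands in for whatever store certctl uses for `jti` deduplication.

```python
# Event member defined by OpenID Connect Back-Channel Logout 1.0.
BACKCHANNEL_EVENT = "http://schemas.openid.net/event/backchannel-logout"
REQUIRED_CLAIMS = ("iss", "aud", "iat", "jti", "events")


def logout_claim_errors(claims: dict, seen_jtis: set) -> list:
    """Return a list of problems; empty means the logout token is acceptable."""
    errors = [f"missing {c}" for c in REQUIRED_CLAIMS if c not in claims]
    if ("sub" in claims) == ("sid" in claims):
        errors.append("need exactly one of sub / sid")
    if "nonce" in claims:
        errors.append("nonce must be absent")
    events = claims.get("events")
    if "events" in claims and (not isinstance(events, dict)
                               or BACKCHANNEL_EVENT not in events):
        errors.append("events lacks the back-channel logout member")
    jti = claims.get("jti")
    if jti is not None and jti in seen_jtis:
        errors.append("replayed jti")
    if not errors:
        seen_jtis.add(jti)  # record only tokens that passed every check
    return errors
```

Recording the `jti` only on success keeps a malformed token from burning an identifier that a later, valid token might legitimately carry.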
||||
For threat-model coverage of these surfaces, see
|
||||
[`auth-threat-model.md`](auth-threat-model.md). For the
|
||||
operator-runnable performance baselines, see
|
||||
[`auth-benchmarks.md`](auth-benchmarks.md).
|
||||
|
||||
### OIDC first-admin bootstrap
|
||||
|
||||
This path coexists with the env-var-token bootstrap path. When the
|
||||
operator sets `CERTCTL_BOOTSTRAP_ADMIN_GROUPS` + (optionally)
|
||||
`CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID`, the first user with one of
|
||||
those IdP groups becomes admin on first login per tenant.
|
||||
Subsequent users go through normal mapping. The admin-existence
|
||||
probe ensures only one wins between the two bootstrap paths;
|
||||
once any actor holds `r-admin`, the OIDC bootstrap hook silently
|
||||
falls through to normal mapping. Audit row on every grant
|
||||
(`bootstrap.oidc_first_admin`, `event_category=auth`).
|
||||
|
||||
### Break-glass admin
|
||||
|
||||
Break-glass is off by default (`CERTCTL_BREAKGLASS_ENABLED=false`). When enabled,
|
||||
the local-password admin path bypasses OIDC + group-claim layers;
|
||||
intended ONLY for SSO-broken incidents.
|
||||
|
||||
- Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte
|
||||
salt, 32-byte output, per-password random salt, PHC-format
|
||||
hash). Hash column is `json:"-"` so handlers cannot wire-leak.
|
||||
- Lockout state machine: 5 failures (default; configurable via
|
||||
`CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD`) within 1h reset window
|
||||
(`_LOCKOUT_RESET_INTERVAL`) trips a 30s lockout (`_LOCKOUT_DURATION`).
|
||||
Atomic single-statement IncrementFailure defeats concurrent
|
||||
racing attempts.
|
||||
- Constant-time across all failure paths via `verifyDummy()` —
|
||||
wrong-password / locked-account / no-actor all take statistically
|
||||
indistinguishable time.
|
||||
- Surface invisibility: when disabled, ALL four endpoints return
|
||||
HTTP 404 (NOT 403). Scanners cannot distinguish "endpoint
|
||||
disabled" from "endpoint doesn't exist".
|
||||
- WARN log at server boot when `ENABLED=true`; audit row on every
|
||||
break-glass login (`auth.breakglass_login_*`,
|
||||
`event_category=auth`); WebAuthn/FIDO2 second factor pairing
|
||||
on the v3 roadmap (Decision 12).
|
||||
|
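The lockout bullet can be modelled as a small in-memory state machine. This is a sketch under assumptions: the class name is hypothetical, the reset window is modelled as a fixed window starting at the first failure, and the real implementation performs the increment atomically in the database.

```python
from typing import Optional


class Lockout:
    """Toy model: `threshold` failures inside `reset_window` trip a lockout."""

    def __init__(self, threshold: int = 5, reset_window: float = 3600.0,
                 lock_duration: float = 30.0):
        self.threshold = threshold
        self.reset_window = reset_window
        self.lock_duration = lock_duration
        self.failures = 0
        self.first_failure_at: Optional[float] = None
        self.locked_until = 0.0

    def is_locked(self, now: float) -> bool:
        return now < self.locked_until

    def record_failure(self, now: float) -> None:
        # Assumption: the counter resets once the window since the first
        # failure has elapsed.
        if (self.first_failure_at is None
                or now - self.first_failure_at >= self.reset_window):
            self.failures = 0
            self.first_failure_at = now
        self.failures += 1
        if self.failures >= self.threshold:
            self.locked_until = now + self.lock_duration
            self.failures = 0
            self.first_failure_at = None

    def record_success(self) -> None:
        self.failures = 0
        self.first_failure_at = None
```

Passing `now` in explicitly keeps the model deterministic, which makes the threshold and window behaviour easy to exercise in tests.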
||||
Operators should DISABLE break-glass within 24h of SSO recovery
|
||||
to avoid a permanent backdoor; the runbook at
|
||||
[`auth-threat-model.md#break-glass-risks-phase-75`](auth-threat-model.md#break-glass-risks-phase-75)
|
||||
documents the full state machine.
|
||||
|
||||
### Demo-to-production cutover (Audit 2026-05-11 A-8)
|
||||
|
||||
Migration `000029_rbac.up.sql` unconditionally seeds an
|
||||
`actor-demo-anon → r-admin` row into `actor_roles`. This row is the
|
||||
runtime principal injected by the demo-mode middleware when
|
||||
`CERTCTL_AUTH_TYPE=none`. Under any non-`none` auth type the row is
|
||||
DORMANT — the middleware chain never resolves to it. But its existence
|
||||
is a footgun: a future regression that resolves an unauthenticated
|
||||
request to `actor-demo-anon` (a misrouted CORS preflight, a fallback in
|
||||
a new auth-exempt route) would silently re-elevate to admin.
|
||||
|
||||
certctl-server detects this residue at startup and emits a WARN log +
|
||||
an `auth.demo_residual_grants_detected` audit row listing every grant
|
||||
present on `actor-demo-anon`. **Every production deploy will see this
|
||||
WARN on first boot** — the migration baseline is part of the install,
|
||||
not a side effect of running demo mode.
|
||||
|
||||
Operator workflow at production cutover:
|
||||
|
||||
1. Drain the WARN by calling the cleanup endpoint with an admin API key:
|
||||
|
||||
```bash
|
||||
curl -X POST --cacert deploy/test/certs/ca.crt \
|
||||
-H "Authorization: Bearer $ADMIN_KEY" \
|
||||
https://certctl.example.com:8443/api/v1/auth/demo-residual/cleanup
|
||||
# → {"removed": 1}
|
||||
```
|
||||
|
||||
The endpoint is gated `auth.role.assign` (admin-class) and refuses
|
||||
to run when `CERTCTL_AUTH_TYPE=none` (HTTP 503 — the residue IS the
|
||||
active runtime state at that auth type). The cleanup is idempotent;
|
||||
a second call returns `{"removed": 0}` and still leaves an audit row.
|
||||
|
||||
Equivalent SQL for operators preferring direct DB access:
|
||||
|
||||
```sql
|
||||
DELETE FROM actor_roles WHERE actor_id = 'actor-demo-anon';
|
||||
```
|
||||
|
||||
2. To make subsequent boots refuse startup if the row reappears (the
|
||||
most paranoid stance), set:
|
||||
|
||||
```
|
||||
CERTCTL_DEMO_MODE_RESIDUAL_STRICT=true
|
||||
```
|
||||
|
||||
With the flag set, any `actor-demo-anon` row under a non-`none`
|
||||
auth type causes certctl-server to log the WARN AND exit non-zero
|
||||
before binding the HTTPS listener. Default is `false` (WARN only).
|
||||
|
||||
3. The CI guard `scripts/ci-guards/no-new-synthetic-admin.sh` pins the
|
||||
set of source files that may reference the `actor-demo-anon` literal.
|
||||
New runtime code paths that resolve to the synthetic actor are
|
||||
rejected at PR time so the credibility gap stays closed.
|
||||
|
||||
### Migrating an existing deployment to OIDC
|
||||
|
||||
An existing API-key-only deployment that wants to add OIDC follows
|
||||
the step-by-step at
|
||||
[`docs/migration/oidc-enable.md`](../migration/oidc-enable.md):
|
||||
configure `CERTCTL_CONFIG_ENCRYPTION_KEY`, pick + configure an IdP
|
||||
per the relevant runbook, configure the certctl-side OIDCProvider
|
||||
+ group→role mappings, verify the login flow against a single
|
||||
test user, then announce the SSO endpoint to the rest of the
|
||||
organization.
|
||||
|
||||
## Per-user rate limiting
|
||||
|
||||
Authenticated callers are bucketed by API-key name;
|
||||
unauthenticated callers (probes, OCSP relying parties, EST/SCEP enrollees)
|
||||
are bucketed by source IP. `RPS` and `BurstSize` are per-key budgets.
|
||||
`PerUserRPS` / `PerUserBurstSize` give authenticated clients a separate
|
||||
budget when set non-zero.
|
||||
|
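The bucketing rule above can be illustrated with a key selector plus a token bucket. This is a sketch: the text names `RPS` and `BurstSize` but does not state the algorithm, so the token-bucket model, the key prefixes, and all names here are assumptions.

```python
from typing import Optional


def bucket_key(api_key_name: Optional[str], remote_ip: str) -> str:
    """Authenticated callers bucket by key name, everyone else by source IP."""
    return f"user:{api_key_name}" if api_key_name else f"ip:{remote_ip}"


class TokenBucket:
    """Minimal token bucket: refills at `rps`, holds at most `burst` tokens."""

    def __init__(self, rps: float, burst: int, now: float = 0.0):
        self.rps = rps
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keying by name rather than by key value means both keys in a rotation window share one budget, which matches the rotation behaviour described in the next section.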
||||
## API key rotation
|
||||
|
||||
**Audit reference:** L-004. CWE-924 (improper enforcement of message integrity during transmission in a communication channel) - operator UX variant.
|
||||
|
||||
certctl's API keys are configured via the `CERTCTL_API_KEYS_NAMED` env var
|
||||
(format `name1:key1,name2:key2:admin`) and parsed at startup into an
|
||||
in-memory list. There is no DB-resident key store, no GUI, no `/api/v1/keys`
|
||||
endpoint - the env var IS the key inventory.
|
||||
|
||||
The env var supports a **double-key rotation window**: two entries can share a
|
||||
name during the rollover, and both keys validate. Operators run the
|
||||
rotation as:
|
||||
|
||||
1. **Generate the new key.** `openssl rand -hex 32` produces a 256-bit
|
||||
value with sufficient entropy.
|
||||
|
||||
2. **Append the new entry to `CERTCTL_API_KEYS_NAMED`** alongside the
|
||||
existing one:
|
||||
```
|
||||
CERTCTL_API_KEYS_NAMED="alice:OLDKEY:admin,alice:NEWKEY:admin"
|
||||
```
|
||||
Both entries MUST carry the same admin flag - startup fails loud if
|
||||
they don't (a non-admin shouldn't share an identity with an admin).
|
||||
|
||||
3. **Restart certctl.** A startup INFO log confirms the rotation window
|
||||
is active:
|
||||
```
|
||||
INFO api-key rotation window active name=alice entries=2 see=docs/security.md::api-key-rotation
|
||||
```
|
||||
|
||||
4. **Roll the new key out to all clients.** Both keys validate during
|
||||
this phase. Audit-trail actor + per-user rate-limit bucket stay
|
||||
consistent across the rollover (both entries produce the same
|
||||
`UserKey` context value, the shared name).
|
||||
|
||||
5. **Remove the old entry** from `CERTCTL_API_KEYS_NAMED`:
|
||||
```
|
||||
CERTCTL_API_KEYS_NAMED="alice:NEWKEY:admin"
|
||||
```
|
||||
|
||||
6. **Restart certctl.** OLDKEY now fails with 401. Rotation complete.
|
||||
|
||||
The rotation window has no operator-set timeout - it lasts for as long
|
||||
as both entries are in the env var. Best practice is a 24-72h window
|
||||
covering a full deploy cadence; if a client hasn't rolled to NEWKEY by
|
||||
the end of step 4, extend the window before step 5.
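
For operators who script step 1 rather than shelling out to `openssl`, the same 256-bit value can be produced with Go's standard library. This is a minimal sketch: `newAPIKey` is a hypothetical helper name, not part of certctl.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
)

// newAPIKey returns 32 random bytes as 64 hex characters,
// the same shape `openssl rand -hex 32` prints.
func newAPIKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	key, err := newAPIKey()
	if err != nil {
		fmt.Fprintln(os.Stderr, "key generation failed:", err)
		os.Exit(1)
	}
	// Paste the value into CERTCTL_API_KEYS_NAMED as the NEWKEY entry.
	fmt.Println(key)
}
```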

### What the contract guarantees

- Two entries with the same `name`: **allowed** if both have the same
  `admin` flag.
- Two entries with the same `name` but mismatched admin: **rejected at
  startup** (privilege escalation guard).
- Two entries with the same `(name, key)` pair: **rejected at startup**
  (typo guard - rotation requires DIFFERENT keys under the same name).
- Single-entry steady state: the simple legacy behaviour.
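
The guarantees above amount to three checks at parse time. The sketch below shows one way they could be expressed. It is an assumption-laden illustration rather than certctl's parser: `NamedKey` and `parseNamedKeys` are hypothetical names, and the exact handling of whitespace, empty input, and unknown third fields is guessed from the documented `name:key[:admin]` format.

```go
package main

import (
	"fmt"
	"strings"
)

// NamedKey is a hypothetical in-memory form of one env-var entry.
type NamedKey struct {
	Name  string
	Key   string
	Admin bool
}

// parseNamedKeys parses comma-separated "name:key[:admin]" entries and applies
// the startup checks listed above. Errors name the entry position only, so key
// material never reaches a log line.
func parseNamedKeys(raw string) ([]NamedKey, error) {
	var keys []NamedKey
	adminByName := map[string]bool{}
	seenPair := map[[2]string]bool{}
	for i, entry := range strings.Split(raw, ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || len(parts) > 3 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("entry %d: want name:key or name:key:admin", i+1)
		}
		if len(parts) == 3 && parts[2] != "admin" {
			return nil, fmt.Errorf("entry %d: unknown flag", i+1)
		}
		k := NamedKey{Name: parts[0], Key: parts[1], Admin: len(parts) == 3}
		// Privilege escalation guard: a shared name must share the admin flag.
		if prev, ok := adminByName[k.Name]; ok && prev != k.Admin {
			return nil, fmt.Errorf("entry %d: admin flag mismatch for name %q", i+1, k.Name)
		}
		// Typo guard: rotation needs different keys under the same name.
		pair := [2]string{k.Name, k.Key}
		if seenPair[pair] {
			return nil, fmt.Errorf("entry %d: duplicate (name, key) pair for %q", i+1, k.Name)
		}
		adminByName[k.Name] = k.Admin
		seenPair[pair] = true
		keys = append(keys, k)
	}
	return keys, nil
}

func main() {
	keys, err := parseNamedKeys("alice:OLDKEY:admin,alice:NEWKEY:admin")
	fmt.Println(len(keys), err)
}
```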

### What the contract does NOT do

- **No automatic expiration of OLDKEY.** The operator removes the entry
  in step 5; certctl doesn't track timestamps. A future enhancement
  could add a `rotated_at` annotation if operators ask for it.
- **No GUI / API for key management.** Keys are env-var only by design;
  building a key-management surface is a separate feature project.
- **No revocation list.** If a key leaks, the only path is to remove it
  from the env var and restart. That's appropriate for a small env-var
  inventory; it would not scale to a per-user-key-issued model.

## Reporting a vulnerability

Email `certctl@proton.me`. Coordinated disclosure preferred; we will
acknowledge within 72h.
|
||||
@@ -1,8 +1,10 @@
|
||||
# TLS on the Control Plane
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl's control plane is HTTPS-only as of v2.2. There is no plaintext `http://` listener, no `auto` mode, no dual-listener bridge, no TLS 1.2 escape hatch. The server refuses to start without a cert+key pair, the agent/CLI/MCP clients reject `http://` URLs at startup, and the Helm chart refuses to render without either an operator-supplied Secret or a cert-manager Certificate CR.
|
||||
|
||||
This doc covers four cert provisioning patterns, SIGHUP-based cert rotation, and the client-side CA-trust configuration agents and the CLI need to talk to the server. If you are upgrading from a pre-HTTPS release and want the step-by-step cutover procedure, read [`upgrade-to-tls.md`](upgrade-to-tls.md) first and come back here for reference.
|
||||
This doc covers four cert provisioning patterns, SIGHUP-based cert rotation, and the client-side CA-trust configuration agents and the CLI need to talk to the server. If you are upgrading from a pre-HTTPS release and want the step-by-step cutover procedure, read [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md) first and come back here for reference.
|
||||
|
||||
## What you get
|
||||
|
||||
@@ -154,7 +156,7 @@ Same three controls as CLI, env-var-driven only (no flags — MCP runs as a stdi
|
||||
- `CERTCTL_SERVER_CA_BUNDLE_PATH` optional CA bundle
|
||||
- `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY` optional skip
|
||||
|
||||
Claude Desktop / other MCP client configs should set all three in the tool's env block.
|
||||
MCP-client configs should set all three in the tool's env block.
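
A client honouring these controls might be wired roughly as follows. This is a sketch under stated assumptions, not the certctl client code: `requireHTTPS` and `clientTLSConfig` are hypothetical helper names, the `"true"` string comparison for the skip-verify variable is a guess at the parsing rule, and pinning the client to TLS 1.3 is an assumption drawn from the server having no TLS 1.2 fallback.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/url"
	"os"
)

// requireHTTPS rejects any server URL that is not https://, mirroring the
// documented "clients reject http:// URLs at startup" behaviour.
func requireHTTPS(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return fmt.Errorf("server URL must use https://, got scheme %q", u.Scheme)
	}
	return nil
}

// clientTLSConfig builds a tls.Config from an optional CA bundle path and the
// dev-only skip-verify switch. An empty path keeps the system trust store.
func clientTLSConfig(caBundlePath string, insecureSkipVerify bool) (*tls.Config, error) {
	cfg := &tls.Config{
		MinVersion:         tls.VersionTLS13, // assumption: match the server's TLS 1.3-only listener
		InsecureSkipVerify: insecureSkipVerify,
	}
	if caBundlePath != "" {
		pem, err := os.ReadFile(caBundlePath)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, errors.New("no certificates found in CA bundle")
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	if err := requireHTTPS(os.Getenv("CERTCTL_SERVER_URL")); err != nil {
		fmt.Fprintln(os.Stderr, "refusing to start:", err)
		os.Exit(1)
	}
	cfg, err := clientTLSConfig(
		os.Getenv("CERTCTL_SERVER_CA_BUNDLE_PATH"),
		os.Getenv("CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY") == "true",
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, "TLS setup failed:", err)
		os.Exit(1)
	}
	fmt.Printf("TLS config ready (custom CA pool: %v)\n", cfg.RootCAs != nil)
}
```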
|
||||
|
||||
## Troubleshooting: fail-loud preflight errors
|
||||
|
||||
@@ -173,7 +175,7 @@ Both files exist but `tls.LoadX509KeyPair` refused them. Typical causes: the pri
|
||||
The client did not trust the CA that signed the server cert. Either mount the CA bundle via `CERTCTL_SERVER_CA_BUNDLE_PATH`, add the CA to the system trust store on the client host, or (dev only) set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`.
|
||||
|
||||
**Client side: `tls: first record does not look like a TLS handshake`**
|
||||
The client is speaking plaintext HTTP to an HTTPS server (or vice-versa). Check that `CERTCTL_SERVER_URL` starts with `https://`. If you are upgrading from a pre-v2.2 release and your agents are old, they will surface this error until you roll the DaemonSet — see [`upgrade-to-tls.md`](upgrade-to-tls.md).
|
||||
The client is speaking plaintext HTTP to an HTTPS server (or vice-versa). Check that `CERTCTL_SERVER_URL` starts with `https://`. If you are upgrading from a pre-v2.2 release and your agents are old, they will surface this error until you roll the DaemonSet — see [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md).
|
||||
|
||||
## InsecureSkipVerify justifications (Audit L-001)
|
||||
|
||||
@@ -208,8 +210,8 @@ ignores `_test.go`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`upgrade-to-tls.md`](upgrade-to-tls.md) — one-step cutover from pre-HTTPS releases
|
||||
- [`quickstart.md`](quickstart.md) — docker-compose walkthrough with HTTPS examples
|
||||
- [`test-env.md`](test-env.md) — integration test environment (also HTTPS-only)
|
||||
- [`upgrade-to-tls.md`](../archive/upgrades/to-tls-v2.2.md) — one-step cutover from pre-HTTPS releases
|
||||
- [`quickstart.md`](../getting-started/quickstart.md) — docker-compose walkthrough with HTTPS examples
|
||||
- [`test-env.md`](../contributor/test-environment.md) — integration test environment (also HTTPS-only)
|
||||
- [`security.md`](security.md) — overall security posture, OCSP Must-Staple guidance, encryption-at-rest spec
|
||||
- Milestone spec: `prompts/https-everywhere-milestone.md` (authoritative source for locked decisions)
|
||||
@@ -1,6 +1,8 @@
|
||||
# OpenAPI Specification Guide
|
||||
|
||||
certctl ships with a complete OpenAPI 3.1 specification at `api/openapi.yaml`. This spec documents all 78 API operations currently specified, every request/response schema, pagination conventions, authentication requirements, and error formats. It's the single source of truth for the documented REST API. (Note: The spec will be updated to include 7 additional certificate discovery endpoints from M18b.)
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
certctl ships with a complete OpenAPI 3.1 specification at `api/openapi.yaml`. The spec documents every operation (re-derive count via `grep -cE '^\s+operationId:' api/openapi.yaml`), every request/response schema, pagination conventions, authentication requirements, and error formats. It's the single source of truth for the documented REST API.
|
||||
|
||||
This guide covers how to use the spec for API exploration, client SDK generation, and integration testing.
|
||||
|
||||
@@ -12,9 +14,8 @@ The spec lives at `api/openapi.yaml` in the repository root. It's versioned alon
|
||||
# View the spec
|
||||
cat api/openapi.yaml
|
||||
|
||||
# Count operations
|
||||
grep "operationId:" api/openapi.yaml | wc -l
|
||||
# 78 (includes health + ready, 7 discovery endpoints pending spec update)
|
||||
# Count operations (includes health + ready)
|
||||
grep -cE '^\s+operationId:' api/openapi.yaml
|
||||
```
|
||||
|
||||
## Viewing with Swagger UI
|
||||
@@ -151,7 +152,7 @@ npx @apidevtools/swagger-cli validate api/openapi.yaml
|
||||
Import the spec directly into Postman:
|
||||
|
||||
1. Open Postman → Import → File → select `api/openapi.yaml`
|
||||
2. Postman creates a collection with all 78 documented operations organized by tag
|
||||
2. Postman creates a collection with every documented operation organized by tag
|
||||
3. Set the `baseUrl` variable to `https://localhost:8443` (HTTPS-only as of v2.2)
|
||||
4. Add an `Authorization: Bearer your-api-key` header to the collection
|
||||
5. Import the demo stack CA bundle (`deploy/test/certs/ca.crt`) into Postman's Settings → Certificates → CA Certificates, or disable certificate verification for the `localhost` host (Settings → General → SSL certificate verification)
|
||||
@@ -191,6 +192,6 @@ This sends randomized valid requests to every endpoint and verifies the response
|
||||
## What's Next
|
||||
|
||||
- [MCP Server Guide](mcp.md) — AI-native access to the certctl API
|
||||
- [Quick Start](quickstart.md) — Get certctl running locally
|
||||
- [Connector Guide](connectors.md) — Build custom issuer and target connectors
|
||||
- [Quick Start](../getting-started/quickstart.md) — Get certctl running locally
|
||||
- [Connector Guide](connectors/index.md) — Build custom issuer and target connectors
|
||||
- [Architecture](architecture.md) — System design deep dive
|
||||
@@ -1,5 +1,7 @@
|
||||
# Architecture Guide
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Overview](#overview)
|
||||
@@ -61,7 +63,7 @@ flowchart TB
|
||||
API["REST API\n(Go net/http, :8443)"]
|
||||
SVC["Service Layer"]
|
||||
REPO["Repository Layer\n(database/sql + lib/pq)"]
|
||||
SCHED["Background Scheduler\n8 always-on + 4 optional loops"]
|
||||
SCHED["Background Scheduler\n9 always-on + 5 opt-in loops"]
|
||||
DASH["Web Dashboard\n(React SPA)"]
|
||||
end
|
||||
|
||||
@@ -493,11 +495,11 @@ Short-lived certificates (those with profile TTL < 1 hour) return "good" from OC
|
||||
|
||||
#### Bulk Revocation
|
||||
|
||||
For compliance events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
|
||||
For incident-response events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
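
The partial-failure behaviour described above can be sketched as a loop that never aborts on a single error. This is a simplified illustration: `Summary`, `bulkRevoke`, and `errIssuerUnavailable` are hypothetical names, and the real flow creates one revocation job per certificate rather than calling a function inline.

```go
package main

import (
	"errors"
	"fmt"
)

// errIssuerUnavailable stands in for any per-certificate failure.
var errIssuerUnavailable = errors.New("issuer unavailable")

// Summary mirrors the counts the text says are reported (names are illustrative).
type Summary struct{ Total, Succeeded, Failed int }

// bulkRevoke applies revoke to every matching certificate ID and keeps going
// when one fails, so a single unavailable issuer cannot stall the sweep.
func bulkRevoke(certIDs []string, revoke func(id string) error) Summary {
	s := Summary{Total: len(certIDs)}
	for _, id := range certIDs {
		if err := revoke(id); err != nil {
			s.Failed++
			continue
		}
		s.Succeeded++
	}
	return s
}

func main() {
	revoked := map[string]bool{"mc-2": true} // mc-2 was revoked earlier
	revoke := func(id string) error {
		if id == "mc-3" {
			return errIssuerUnavailable
		}
		revoked[id] = true // re-revoking mc-2 changes nothing: idempotent no-op
		return nil
	}
	fmt.Printf("%+v\n", bulkRevoke([]string{"mc-1", "mc-2", "mc-3"}, revoke))
}
```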
|
||||
|
||||
### 4. Automatic Renewal
|
||||
|
||||
The control plane runs a scheduler with 8 always-on loops plus up to 4 optional loops (enabled by configuration). `internal/scheduler/scheduler.go:262-265` is the authoritative count.
|
||||
The control plane runs a scheduler with 9 always-on loops plus up to 5 opt-in loops (enabled by configuration). Re-derive the count via `grep -cE '^func \(s \*Scheduler\) [a-zA-Z]+Loop' internal/scheduler/scheduler.go`; the opt-in gating lives in `cmd/server/main.go` startup wiring (`cfg.NetworkScan.Enabled`, `digestService != nil`, `healthCheckService != nil`, `cloudDiscoveryService != nil`, `cfg.ACMEServer.Enabled && cfg.ACMEServer.GCInterval > 0`).
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
@@ -1042,7 +1044,7 @@ For deployments that need JWT/OIDC/mTLS, the standard pattern is to put an authe
|
||||
|
||||
### Concurrency Safety
|
||||
|
||||
The background scheduler uses `sync/atomic.Bool` idempotency guards on every loop (8 always-on plus up to 4 optional) — if a tick fires while the previous iteration is still running, it skips. A `sync.WaitGroup` tracks all in-flight goroutines. `WaitForCompletion(timeout)` blocks during shutdown until all work finishes or the timeout expires, preventing state corruption from mid-flight database operations during process exit.
|
||||
The background scheduler uses `sync/atomic.Bool` idempotency guards on every loop (9 always-on plus up to 5 opt-in) — if a tick fires while the previous iteration is still running, it skips. A `sync.WaitGroup` tracks all in-flight goroutines. `WaitForCompletion(timeout)` blocks during shutdown until all work finishes or the timeout expires, preventing state corruption from mid-flight database operations during process exit.
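
The skip-if-still-running guard can be illustrated with `atomic.Bool.CompareAndSwap` from the standard library. The `loopGuard` type and `tryRun` method below are hypothetical names for a minimal sketch. The scheduler's own guard and its `sync.WaitGroup` bookkeeping are not reproduced here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// loopGuard lets at most one iteration of a loop run at a time.
type loopGuard struct{ running atomic.Bool }

// tryRun executes work unless a previous iteration is still in flight.
// It reports whether work actually ran.
func (g *loopGuard) tryRun(work func()) bool {
	// Flip false -> true atomically; if the flag was already true, skip this tick.
	if !g.running.CompareAndSwap(false, true) {
		return false
	}
	defer g.running.Store(false)
	work()
	return true
}

func main() {
	var g loopGuard
	g.tryRun(func() {
		// A tick that fires while the previous iteration is running is skipped.
		fmt.Println("overlapping tick ran:", g.tryRun(func() {}))
	})
	fmt.Println("next tick ran:", g.tryRun(func() {}))
}
```

Skipping rather than queueing means a slow iteration never builds a backlog of ticks behind it.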
|
||||
|
||||
The job-processor tick fans the per-job work out across up to `CERTCTL_RENEWAL_CONCURRENCY` goroutines (default 25), gated by `golang.org/x/sync/semaphore.Weighted`. The cap is the operator's lever for "how many concurrent CA calls per scheduler tick" — operators with permissive upstream limits and large fleets (>10k certs) can bump to 100; operators with strict limits or async-CA-heavy fleets should stay at 25 or lower. Values ≤ 0 normalise to 1 (sequential). The Acquire is ctx-aware so a shutdown-driven ctx cancel interrupts the dispatch loop promptly; in-flight goroutines drain via Wait before the tick returns. Closes the #9 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (pre-fix the fan-out had no cap, so a 5,000-cert sweep tripped DigiCert / Entrust / Sectigo rate limits and the next tick re-fanned-out the same calls).
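
A bounded fan-out of this shape can be sketched with the standard library alone. Note the substitution: the text names `golang.org/x/sync/semaphore.Weighted`, while this self-contained sketch uses a buffered channel as the semaphore so it needs no external module. `fanOut` and `normalizeConcurrency` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// normalizeConcurrency maps values <= 0 to 1 (sequential), as the text describes.
func normalizeConcurrency(n int) int {
	if n <= 0 {
		return 1
	}
	return n
}

// fanOut runs jobs on at most limit goroutines and returns how many were
// dispatched. A cancelled ctx stops dispatch; in-flight work drains first.
func fanOut(ctx context.Context, jobs []string, limit int, run func(context.Context, string)) int {
	sem := make(chan struct{}, normalizeConcurrency(limit))
	var wg sync.WaitGroup
	dispatched := 0
dispatch:
	for _, job := range jobs {
		if ctx.Err() != nil {
			break
		}
		select {
		case sem <- struct{}{}: // acquire a slot
		case <-ctx.Done(): // shutdown interrupts a blocked acquire
			break dispatch
		}
		dispatched++
		wg.Add(1)
		go func(job string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			run(ctx, job)
		}(job)
	}
	wg.Wait() // drain in-flight goroutines before the tick returns
	return dispatched
}

func main() {
	var mu sync.Mutex
	done := 0
	n := fanOut(context.Background(), []string{"job-1", "job-2", "job-3"}, 2,
		func(_ context.Context, _ string) {
			mu.Lock()
			done++
			mu.Unlock()
		})
	fmt.Println(n, done)
}
```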
|
||||
|
||||
@@ -1094,11 +1096,11 @@ Health checks live outside the API prefix: `GET /health` and `GET /ready`.
|
||||
|
||||
## MCP Server
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server as a separate binary (`cmd/mcp-server/`) that enables AI assistants to interact with the certificate platform. The MCP server uses the official MCP Go SDK (`modelcontextprotocol/go-sdk`) with stdio transport for integration with Claude, Cursor, and other MCP-compatible tools.
|
||||
certctl includes an MCP (Model Context Protocol) server as a separate binary (`cmd/mcp-server/`) that enables AI assistants to interact with the certificate platform. The MCP server uses the official MCP Go SDK (`modelcontextprotocol/go-sdk`) with stdio transport for integration with any MCP-compatible AI client.
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
AI["AI Assistant\n(Claude, Cursor)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
AI["AI Assistant\n(any MCP client)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
MCP -->|"HTTP + Bearer token"| API["certctl REST API\n:8443"]
|
||||
|
||||
subgraph "MCP Tools"
|
||||
@@ -1248,7 +1250,7 @@ flowchart TB
|
||||
|
||||
1. **Pluggable sources** — Each cloud provider implements the `DiscoverySource` interface (Name, Type, Discover, ValidateConfig). Three built-in sources: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
|
||||
2. **CloudDiscoveryService orchestrator** — Iterates registered sources, calls `Discover()` on each, feeds reports into `ProcessDiscoveryReport()`. Errors from one source don't prevent other sources from running
|
||||
3. **Scheduler integration** — opt-in cloud discovery scheduler loop (6h default; see `docs/architecture.md` 12-loop topology), runs immediately on startup, `atomic.Bool` idempotency guard
|
||||
3. **Scheduler integration** — opt-in cloud discovery scheduler loop (6h default; one of the 14 loops in the scheduler topology — see the Background Scheduler section above), runs immediately on startup, `atomic.Bool` idempotency guard
|
||||
4. **Sentinel agents** — Each source uses its own sentinel agent ID (`cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`) for dedup and triage filtering
|
||||
5. **Source path format** — `aws-sm://{region}/{secret}`, `azure-kv://{cert-name}/{version}`, `gcp-sm://{project}/{secret}`
|
||||
6. **No new schema** — Reuses existing `discovered_certificates` and `discovery_scans` tables. Sentinel agent IDs leverage existing `(fingerprint_sha256, agent_id, source_path)` dedup constraint
|
||||
@@ -1262,7 +1264,7 @@ flowchart TB
|
||||
- **Claims it** via `POST /discovered-certificates/{id}/claim` — links to existing managed cert or creates new enrollment
|
||||
- **Dismisses it** via `POST /discovered-certificates/{id}/dismiss` — removes from triage, marked as "Dismissed"
|
||||
9. **Status tracking** — `discovery_cert_claimed` and `discovery_cert_dismissed` events audit the operator's decision
|
||||
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for compliance reporting)
|
||||
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for inventory reporting)
|
||||
|
||||
This data flow is pull-based and non-blocking. Agents discover at their own pace; the server stores results for later review. There's no pressure to claim or dismiss; operators can leave certificates in "Unmanaged" status indefinitely.
|
||||
|
||||
@@ -1316,7 +1318,7 @@ For detailed test procedures, smoke tests, and the release sign-off checklist, s
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit (see `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md`). Pre-audit, certctl had no benchmarks or load tests for any API path, so any throughput claim was hand-waved; the harness in `deploy/test/loadtest/` substantiates the API-tier capacity numbers with reproducible methodology.
|
||||
Closes the #8 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. Pre-audit, certctl had no benchmarks or load tests for any API path, so any throughput claim was hand-waved; the harness in `deploy/test/loadtest/` substantiates the API-tier capacity numbers with reproducible methodology.
|
||||
|
||||
The harness drives a k6 client at sustained 50 req/s × 2 scenarios × 5 minutes against a docker-compose stack of postgres + tls-init + certctl-server. Two scenarios run in parallel: `POST /api/v1/certificates` (issuance-acceptance hot path: auth + JSON decode + validation + service `CreateCertificate` + `managed_certificates` insert) and `GET /api/v1/certificates?per_page=50` (most-trafficked read endpoint). Hard regression-guard thresholds: p99 < 5 s for issuance-acceptance, p99 < 2 s for list, error rate < 1% globally. k6 exits non-zero on any threshold breach so a future PR that pushes p99 above the bar fails `make loadtest`. Run via `make loadtest` from the repo root or via `.github/workflows/loadtest.yml` (`workflow_dispatch` + weekly cron — never per-push).
|
||||
|
||||
@@ -1326,11 +1328,10 @@ Captured baseline numbers are committed in `deploy/test/loadtest/README.md` once
|
||||
|
||||
## What's Next
|
||||
|
||||
- [Quick Start](quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](demo-advanced.md) — Issue a certificate end-to-end
|
||||
- [Connector Guide](connectors.md) — Build custom connectors
|
||||
- [Compliance Mapping](compliance.md) — SOC 2, PCI-DSS 4.0, and NIST SP 800-57 alignment
|
||||
- [Quick Start](../getting-started/quickstart.md) — Get certctl running locally
|
||||
- [Advanced Demo](../getting-started/advanced-demo.md) — Issue a certificate end-to-end
|
||||
- [Connector Guide](connectors/index.md) — Build custom connectors
|
||||
- [MCP Server Guide](mcp.md) — AI-native access to the API
|
||||
- [OpenAPI Spec](openapi.md) — Full API reference and SDK generation
|
||||
- [Testing Guide](testing-guide.md) — Test procedures and release sign-off
|
||||
- [Test Environment](test-env.md) — Docker Compose test environment setup
|
||||
- [API Reference](api.md) — OpenAPI 3.1 spec and SDK generation
|
||||
- [QA Test Suite](../contributor/qa-test-suite.md) — Test procedures and release sign-off
|
||||
- [Test Environment](../contributor/test-environment.md) — Docker Compose test environment setup
|
||||
@@ -0,0 +1,83 @@
|
||||
# Authentication standards implemented
|
||||
|
||||
> Last reviewed: 2026-05-10
|
||||
|
||||
This document is an honest informational reference for operators, external testers, and acquirers who want to know which RFCs and standards certctl's authentication surface (API keys + RBAC + OIDC + sessions + back-channel logout + break-glass admin) implements, and which CWE weakness classes the implementation closes. Every row points at a real file or migration in this repository.
|
||||
|
||||
This document is intentionally NOT a compliance-mapping doc. The operator retired the framework-mapping subtree (`docs/compliance/{index,soc2,pci-dss,nist-sp-800-57}.md`) on 2026-05-05; framework name-drops (SOC 2 / PCI-DSS / HIPAA / NIST SSDF / FedRAMP) were also removed from prose across `README.md` and `docs/` under the same decision. RFC and CWE references stay because they are precise technical pointers; framework labels were marketing-flavored and prone to overclaim. If you are an auditor mapping certctl's controls to a framework, treat the rows below as evidence and map them yourself to the framework you are auditing against.
|
||||
|
||||
For the wider security posture, see [`security.md`](../operator/security.md). For the threat model behind these controls, see [`auth-threat-model.md`](../operator/auth-threat-model.md). For the per-IdP setup guides, see [`oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md).
|
||||
|
||||
## Table 1: RFCs and standards implemented end-to-end
|
||||
|
||||
Each row carries at least one negative test (a test that asserts the fail-closed branch fires when a malformed input violates the spec).
|
||||
|
||||
| Standard | What we implement | Source | Negative-test anchor |
|
||||
|---|---|---|---|
|
||||
| RFC 6749 (OAuth 2.0) | Authorization-code grant via OIDC; confidential-client credentials only | `internal/auth/oidc/service.go` (HandleAuthRequest, HandleCallback) | `internal/auth/oidc/service_test.go` (21+ negatives covering wrong aud / wrong iss / expired / etc.) |
|
||||
| RFC 7636 (PKCE) | S256 challenge mandatory; `plain` rejected at the service-layer sentinel; verifier persisted in pre-login row, single-use | `internal/auth/oidc/service.go` (oauth2.S256ChallengeOption hard-coded), `internal/auth/oidc/prelogin.go` | `TestService_PKCEPlainRejectedSentinel`, `TestService_StateReplayDeniedByConsumeOnce` |
|
||||
| RFC 7519 (JWT) | ID-token validation via go-oidc; service-layer alg allow-list (RS256/RS512/ES256/ES384/EdDSA); HS-family + `none` rejected | `internal/auth/oidc/service.go` (disallowedAlgs map, isDisallowedAlg) | `TestService_HandleCallback_RejectsHSAlgsConfusion`, `TestService_IdPDowngradeDefense_RejectsHSAdvertised` |
|
||||
| RFC 7517 (JWK) | JWKS fetch + cache + rotation handled transparently by coreos/go-oidc; operator-triggered RefreshKeys + auto-refresh on TTL expiry | `internal/auth/oidc/service.go` (RefreshKeys; cfg.JWKSCacheTTLSeconds default 3600) | `TestService_RefreshKeys_CatchesPostLoadDowngrade`, `TestKeycloakIntegration_JWKSRotation_RefreshKeysPicksUpNewKey` (Keycloak integration) |
|
||||
| OIDC Core 1.0 §3.1.3.7 | `iss` exact match, `aud` membership, `azp` for multi-aud, `at_hash` REQUIRED-when-access_token-present (certctl tightens the spec MAY → MUST), `nonce` constant-time-compare | `internal/auth/oidc/service.go` (HandleCallback steps 5-9) | `TestService_HandleCallback_RejectsWrongAudience`, `TestService_HandleCallback_AZPRequiredOnMultiAud`, `TestService_HandleCallback_ATHashRequiredWhenAccessTokenPresent`, `TestService_HandleCallback_RejectsNonceMismatch` |
|
||||
| OIDC Core 1.0 §5.3.2 (UserInfo endpoint) | Optional fallback when ID-token groups claim is empty; bounded by configured FetchUserinfo bool | `internal/auth/oidc/service.go` (fetchUserinfoGroups) | 4-case userinfo-fallback matrix in `service_test.go` (happy + endpoint-missing + endpoint-failing + userinfo-also-empty) |
|
||||
| OpenID Connect Back-Channel Logout 1.0 | `events` claim + `sid`/`sub` revocation; `nonce` MUST be absent; `jti`-based replay defense | `internal/api/handler/auth_session_oidc.go` (BackChannelLogout, DefaultBCLVerifier) | 6 negatives in `auth_session_oidc_test.go`: BCL missing events, BCL nonce-present, BCL unknown-key-sig, etc. |
|
||||
| RFC 6265 (HTTP State Management) | Session cookie attributes: `Secure` + `HttpOnly` + `SameSite=Lax` (default; configurable to Strict via `CERTCTL_SESSION_SAMESITE`); `Path=/`; host-only | `internal/auth/session/service.go` (cookie minting), `internal/api/handler/auth_session_oidc.go` (Set-Cookie wiring) | 7-case middleware-chain test matrix in `internal/auth/session/middleware_test.go` |
|
||||
| RFC 9700 (OAuth 2.0 Security Best Current Practice) | PKCE mandatory; no implicit flow; strict redirect_uri (registered + exact-match per OIDCProvider.RedirectURI); state non-guessable (32-byte random); single-use | `internal/auth/oidc/service.go`; `OIDCProvider.Validate()` enforces redirect_uri shape | `TestOIDCProvider_Validate_RejectsHTTPRedirectInProd`, state-replay test |
|
||||
| RFC 8414 (OAuth 2.0 Authorization Server Metadata) | Discovery doc fetched via go-oidc at provider creation + RefreshKeys; `id_token_signing_alg_values_supported` consulted for IdP-downgrade-attack defense | `internal/auth/oidc/service.go` (getOrLoad, guardAdvertisedAlgs) | `TestService_IdPDowngradeDefense_RejectsHSAdvertised` and `RejectsNoneAdvertised` |
|
||||
| RFC 7633 (X.509 TLS Feature Extension; Must-Staple) | Per-profile certctl issuance flag; out-of-scope for the auth surface but cited here because RFC 7633 OID `id-pe-tlsfeature` is in the same crypto-stack umbrella | `internal/connector/issuer/local/local.go` | SCEP master-bundle must-staple tests; not auth-surface territory |
|
||||
| RFC 8555 §7 (ACME directory metadata) | certctl-side ACME server tier; out-of-scope for the auth surface but cited because it shares the alg-pinning + nonce-handling discipline the auth surface carries forward | `internal/api/handler/acme/*` | per-route handler tests in `internal/api/handler/acme/` |
|
||||
| RFC 7515 (JWS) | JWS verification delegated to go-oidc/v3 + go-jose/v4; alg pin enforced at `gooidc.NewIDTokenVerifier` config + service-layer re-check | `internal/auth/oidc/service.go` (oauthConfig + verifier wiring) | `TestService_HandleCallback_RejectsExpired` and `TestService_HandleCallback_RejectsIATInFuture` |
|
||||
|
||||
## Table 2: CWE / weakness classes the implementation closes
|
||||
|
||||
Each row points at the file(s) that implement the defense and the test file(s) that pin the invariant.
|
||||
|
||||
| CWE | Description | Where defended | Where pinned |
|
||||
|---|---|---|---|
|
||||
| CWE-287 (Improper Authentication) | Session-cookie HMAC verification (length-prefixed input defeats concat-collision) + alg-pinned ID-token verify | `internal/auth/session/service.go` (computeHMAC, parseCookie, Validate); `internal/auth/oidc/service.go` (HandleCallback) | `TestComputeHMAC_LengthPrefixDefeatsConcatCollision`; `TestService_Validate_ConcatenationCollisionDefeatedByLengthPrefix`; full 21+ OIDC negatives matrix |
|
||||
| CWE-352 (Cross-Site Request Forgery) | Double-submit cookie + `SameSite=Lax`/`Strict` + hashed CSRF token on session row; constant-time compare in CSRFMiddleware | `internal/auth/session/middleware.go` (CSRFMiddleware) | 7-case middleware-chain matrix (`internal/auth/session/middleware_test.go`); `TestSessionMiddleware_CSRFRequiredOnStateChangingMethods` |
|
||||
| CWE-384 (Session Fixation) | Session ID is opaque random `ses-<base64url>` (32 bytes entropy) generated server-side at login; cookie value rotates on every login (no inheritance from pre-login); CSRF token rotates alongside | `internal/auth/session/service.go` (Create, RotateCSRFToken) | `TestService_Create_AssignsFreshSessionID`; CSRF rotation pinned via `TestService_RotateCSRFToken_AfterLogin` |
|
||||
| CWE-294 (Authentication Bypass by Capture-Replay) | Single-use state, single-use nonce (both stored in pre-login row, atomic `DELETE...RETURNING` on consume); single-use authorization code (Keycloak/IdP-side); `jti`-based BCL replay defense | `internal/auth/oidc/prelogin.go` (LookupAndConsume); `internal/api/handler/auth_session_oidc.go` (BCL handler) | `TestService_StateReplayDeniedByConsumeOnce`; `TestService_HandleCallback_RejectsForgedPreLoginCookie`; BCL replay negative in handler tests |
|
||||
| CWE-916 / CWE-329 (Use of Password Hash With Insufficient Computational Effort / Generation of Predictable IV with CBC Mode) | Argon2id with OWASP 2024 params (m=64 MiB, t=3, p=4, 16-byte salt, 32-byte output) for break-glass passwords; per-credential random salt; PHC-format hash | `internal/auth/breakglass/service.go` (HashPassword, VerifyPassword); v3 ciphertext blob format with PBKDF2-SHA256 600,000 rounds for config-at-rest encryption | `TestPhase7_5_HashPasswordOWASP2024Params`; `TestPhase7_5_HashFormatPHC`; `internal/crypto/encryption_test.go` for v3 PBKDF2 floor |
|
||||
| CWE-307 (Improper Restriction of Excessive Authentication Attempts) | Failure count + lockout window on break-glass credential; threshold default 5, reset window default 1h, lockout duration default 30s; atomic single-statement IncrementFailure defeats concurrent racing attempts | `internal/auth/breakglass/service.go` (Login, IncrementFailure); `internal/repository/postgres/breakglass.go` | `TestPhase7_5_LockoutAfterThresholdFailures`; `TestPhase7_5_FailureCountResetsAfterWindow` |
|
||||
| CWE-345 (Insufficient Verification of Data Authenticity) | OIDC `at_hash` REQUIRED-when-access_token-present ties access token to ID token (certctl tightens OIDC core MAY → MUST); OIDC `iss` + `aud` + `azp` checks ensure token came from the configured IdP for the configured client | `internal/auth/oidc/service.go` (HandleCallback steps 5-9, atHashMatches) | `TestService_HandleCallback_ATHashRequiredWhenAccessTokenPresent`; `TestService_HandleCallback_RejectsATHashMismatch` |
|
||||
| CWE-200 (Information Exposure) | Token-leak hygiene tests on every secret-bearing path: ID tokens, access tokens, refresh tokens, authorization codes, PKCE verifiers, state, nonce, signing keys, break-glass passwords NEVER appear in any log line at any level | `internal/auth/oidc/service.go`, `internal/auth/session/service.go`, `internal/auth/breakglass/service.go` (all log calls audited); `internal/service/audit_redact.go` (audit redactor) | `internal/auth/oidc/logging_test.go` (4 grep-asserts); `internal/auth/breakglass/service_test.go` (token-leak hygiene + json.Marshal probe); `internal/auth/bootstrap/service_test.go` (canonical pattern) |
|
||||
| CWE-770 (Allocation of Resources Without Limits or Throttling) | Per-IP rate limit on `/auth/breakglass/login` via the global middleware.NewRateLimiter (default RPS / burst from `CERTCTL_RATE_LIMIT_*` env vars) wrapped around the entire mux; the breakglass login endpoint inherits this protection. Per-route override available via `middleware.NewRateLimiter` per-bucket configuration if the operator wants stricter caps | `cmd/server/main.go` (rateLimiter wiring at the root middleware stack); `internal/api/middleware/middleware.go` (NewRateLimiter) | `internal/api/middleware/ratelimit_test.go`; `internal/api/middleware/ratelimit_keyed_test.go` |
|
||||
| CWE-330 (Use of Insufficiently Random Values) | `crypto/rand` for state, nonce, PKCE verifier (via `oauth2.GenerateVerifier`), session signing keys (32 random bytes), session IDs (`ses-<base64url-no-pad>` from 32 random bytes), pre-login IDs (`pl-<base64url-no-pad>` from 16 random bytes), CSRF tokens (32 random bytes), break-glass salts (16 random bytes via `crypto/rand`) | `internal/auth/oidc/service.go` (randomB64URL); `internal/auth/session/service.go` (newOpaqueID, newCSRFToken); `internal/auth/oidc/prelogin.go` (newID); `internal/auth/breakglass/service.go` (HashPassword salt) | `TestPreLoginAdapter_CreatePreLogin_RNGFailure` (entropy-source error path); RNG failure pinned for every callsite |
|
||||
| CWE-311 (Missing Encryption of Sensitive Data) | OIDC `client_secret` AES-256-GCM encrypted at rest (v3 blob format: magic 0x03 + salt(16) + nonce(12) + ciphertext+tag); session signing keys same scheme; empty `CERTCTL_CONFIG_ENCRYPTION_KEY` returns `ErrEncryptionKeyRequired` (fail-closed) | `internal/crypto/encryption.go` (EncryptIfKeySet, DecryptIfKeySet); `internal/api/handler/auth_session_oidc.go` (encryptClientSecret); `internal/auth/session/service.go` (KeyMaterialEncrypted) | `internal/repository/postgres/oidc_encryption_invariant_test.go` (invariant test: ciphertext != plaintext, v2/v3 blob shape, round-trip + wrong-passphrase fails) |
|
||||
| CWE-326 (Inadequate Encryption Strength) | TLS 1.3 only on the certctl control plane (post-v2.2 milestone); HSTS-equivalent posture via HTTPS-only listener; AES-256-GCM for at-rest config encryption; PBKDF2-SHA256 600,000 rounds for v3 blob key derivation (OWASP 2024 floor) | `cmd/server/main.go` (TLS 1.3 listener config); `internal/crypto/encryption.go` (v3 PBKDF2 iteration count) | `TestServerTLSConfig_RejectsTLS12`; `TestEncryption_V3IterationCount_PinnedAtOWASP2024Floor` |
|
||||
| CWE-1004 (Sensitive Cookie Without HttpOnly) | Session cookie set with `HttpOnly=true`; CSRF cookie intentionally `HttpOnly=false` so the GUI can read it for the `X-CSRF-Token` header (the read is by-design per the double-submit-cookie pattern) | `internal/auth/session/service.go` (cookie attrs); `internal/api/handler/auth_session_oidc.go` (Set-Cookie wiring) | Cookie-attribute pinning in handler tests; documented in [auth-threat-model.md](../operator/auth-threat-model.md) "Session minting + cookies" subsection |
|
||||
| CWE-614 (Sensitive Cookie in HTTPS Session Without 'Secure' Attribute) | Session + CSRF cookies set with `Secure=true`; rejected at cookie-write time on `http://` listeners (HTTPS-only control plane post-v2.2) | `internal/auth/session/service.go`; `cmd/server/main.go` HTTPS-only listener | TLS-listener tests in `cmd/server/`; cookie attrs pinned in handler tests |
|
||||
| CWE-1275 (Sensitive Cookie with Improper SameSite Attribute) | Session cookie `SameSite=Lax` default (configurable to Strict via `CERTCTL_SESSION_SAMESITE`); CSRF defense via the double-submit pattern means `Lax` is sufficient even if the operator does not flip to Strict | `internal/auth/session/service.go` (cookie attrs); `internal/config/config.go` (SAMESITE env var) | Cookie-attribute pinning; SameSite enforcement is per-cookie |
|
||||
|
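The `at_hash` binding in the CWE-345 row follows OpenID Connect Core §3.1.3.6: hash the access token's ASCII bytes, keep the left-most half of the digest, and base64url-encode it without padding. The following is a minimal sketch of that check, assuming a SHA-256 based signing algorithm (RS256, ES256, PS256). `computeATHash` is a hypothetical helper, and the body of `atHashMatches` here is illustrative rather than copied from `internal/auth/oidc/service.go`.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// computeATHash returns base64url(left half of SHA-256(accessToken)), unpadded.
// Assumption: the ID token alg uses SHA-256; other algs need a different hash.
func computeATHash(accessToken string) string {
	sum := sha256.Sum256([]byte(accessToken))
	return base64.RawURLEncoding.EncodeToString(sum[:len(sum)/2])
}

// atHashMatches applies the "required when an access token is present" rule:
// a missing claim counts as a failure, not as a skipped check.
func atHashMatches(accessToken, claimed string) bool {
	if accessToken == "" {
		return true // no access token, nothing to bind
	}
	if claimed == "" {
		return false
	}
	want := computeATHash(accessToken)
	return subtle.ConstantTimeCompare([]byte(want), []byte(claimed)) == 1
}

func main() {
	tok := "example-access-token"
	h := computeATHash(tok)
	fmt.Println(len(h), atHashMatches(tok, h), atHashMatches(tok, ""))
}
```

The constant-time comparison is a conservative choice for this sketch; the claim is not a secret, so a plain comparison would also be defensible.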
||||
## API-key + RBAC standards covered separately
|
||||
|
||||
The above tables focus on the OIDC + sessions + back-channel logout + break-glass surface. The RBAC primitive carries its own implementation pointers; the [`auth-threat-model.md`](../operator/auth-threat-model.md) section "API-key + RBAC defenses" enumerates the full RBAC + bootstrap + auditor + approval-workflow surface. CWE-pointers that apply to the RBAC surface:
|
||||
|
||||
- CWE-285 (Improper Authorization) — defended by the RequirePermission middleware + Authorizer.CheckPermission service-layer call. Pinned by 90+ tests across `internal/auth/` and `internal/service/auth/`.
|
||||
- CWE-862 (Missing Authorization) — pinned by `phase12_protocol_allowlist_test.go` (asserts protocol endpoints are explicitly allowlisted, NOT silently bypassing the gate).
|
||||
- CWE-863 (Incorrect Authorization) — pinned by the auditor-split invariant in `internal/domain/auth/auditor_test.go` (auditor role holds exactly `audit.read` + `audit.export` ONLY).
|
||||
- CWE-732 (Incorrect Permission Assignment for Critical Resource) — five admin-only fine-grained perms (`cert.bulk_revoke`, `crl.admin`, `scep.admin`, `est.admin`, `ca.hierarchy.manage`) seeded into `r-admin` only; pinned by migration 000030 + `r-admin`-only seed test.
|
||||
|
||||
## What this document is NOT
|
||||
|
||||
To preserve the operator's 2026-05-05 retired-compliance-docs decision:
|
||||
|
||||
- This is NOT a SOC 2 / PCI-DSS / HIPAA / NIST SP 800-53 / NIST SSDF / FedRAMP framework-mapping doc.
|
||||
- This is NOT a marketing claim that certctl "satisfies CC6.1" or "complies with §164.312(a)(2)(iii)" or any similar framework label.
|
||||
- This IS an evidence list. An auditor doing framework mapping for their own compliance purposes can use this list as the source-of-truth pointer, then map each row to the framework control they are auditing against under their own judgment.
|
||||
|
||||
If you are an external tester, an operator's auditor, or an acquirer doing technical diligence, this document gives you concrete file paths to read and concrete tests to run. If you want a framework-mapping document, build it yourself against the rows here using the framework-mapping methodology your audit firm prescribes; this project does not own that mapping.
|
||||
|
||||
## Cross-references
|
||||
|
||||
- [`auth-threat-model.md`](../operator/auth-threat-model.md) — threat model behind these defenses.
|
||||
- [`security.md`](../operator/security.md) — overall security posture.
|
||||
- [`oidc-runbooks/index.md`](../operator/oidc-runbooks/index.md) — per-IdP operator setup guides.
|
||||
- [`auth-benchmarks.md`](../operator/auth-benchmarks.md) — performance baselines for the validation paths cited above.
|
||||
- `internal/auth/oidc/` — OIDC service + groupclaim resolver + pre-login adapter + bootstrap hook.
|
||||
- `internal/auth/session/` — Session service + middleware + CSRF + signing-key rotation.
|
||||
- `internal/auth/breakglass/` — break-glass admin (Argon2id + lockout + constant-time + surface-invisibility).
|
||||
- `internal/crypto/encryption.go` — AES-256-GCM v3 blob format for at-rest encryption.
|
||||
- `migrations/000029` through `000038` — schema for RBAC, OIDC providers, sessions, signing keys, users, group mappings, pre-login, break-glass.
|
||||
- `scripts/ci-guards/multi-tenant-query-coverage.sh` — forward-compat multi-tenant query coverage guard.
|
||||
@@ -0,0 +1,156 @@
|
||||
# certctl CLI
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
`certctl-cli` is the command-line interface to certctl. It wraps the REST API as terminal commands so operators and CI/CD pipelines can drive certctl without writing curl invocations.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
go install github.com/certctl-io/certctl/cmd/cli@latest
|
||||
```
|
||||
|
||||
The binary lands at `$GOBIN/cli` (or `$GOPATH/bin/cli`, by default `$HOME/go/bin/cli`, if `GOBIN` is unset). Rename it to `certctl-cli` to match the examples on this page.
|
||||
|
||||
## Configure
|
||||
|
||||
The CLI reads three environment variables:
|
||||
|
||||
```bash
|
||||
export CERTCTL_SERVER_URL=https://localhost:8443
|
||||
export CERTCTL_API_KEY=your-api-key
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt
|
||||
```
|
||||
|
||||
Or pass them per-invocation:
|
||||
|
||||
```bash
|
||||
certctl-cli --server https://localhost:8443 --api-key your-key --ca-bundle ca.crt certs list
|
||||
```
|
||||
|
||||
For local development against a self-signed bootstrap cert, `--insecure` skips TLS verification. **Never set this in production.**
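To make the effect of `--ca-bundle` and `--insecure` concrete, here is a minimal sketch of how a Go client could turn them into a `tls.Config`. It is an illustration under assumptions, not the CLI's actual code: `tlsConfigFor` is a hypothetical name, and the TLS 1.3 minimum is assumed from the TLS 1.3-only control plane described in the security evidence list.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// tlsConfigFor maps a CA bundle and an insecure toggle onto a client TLS config.
func tlsConfigFor(caPEM []byte, insecure bool) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS13} // assumed minimum
	if insecure {
		cfg.InsecureSkipVerify = true // development only
		return cfg, nil
	}
	if len(caPEM) == 0 {
		return cfg, nil // nil RootCAs means the system trust store
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("CA bundle contains no usable PEM certificates")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

func main() {
	_, err := tlsConfigFor([]byte("not a pem"), false)
	fmt.Println("bad bundle rejected:", err != nil)
}
```

Failing on an unparseable bundle, instead of silently falling back to system roots, keeps a typo in the bundle path from weakening verification unnoticed.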
|
||||
|
||||
## Command groups
|
||||
|
||||
The CLI is organized by resource:
|
||||
|
||||
```
|
||||
certctl-cli certs [list|get|renew|revoke]
|
||||
certctl-cli agents [list|get]
|
||||
certctl-cli jobs [list|get|cancel]
|
||||
certctl-cli import [bulk PEM import]
|
||||
certctl-cli est [enroll|reenroll]
|
||||
certctl-cli status [server health + summary stats]
|
||||
certctl-cli version [CLI + server version]
|
||||
```
|
||||
|
||||
## Common workflows
|
||||
|
||||
### List + filter certificates
|
||||
|
||||
```bash
|
||||
# All certs
|
||||
certctl-cli certs list
|
||||
|
||||
# Filter by environment
|
||||
certctl-cli certs list --env production
|
||||
|
||||
# JSON output (default is table)
|
||||
certctl-cli certs list --format json
|
||||
|
||||
# Sort + paginate
|
||||
certctl-cli certs list --sort -expires_at --limit 50
|
||||
|
||||
# Time-range filter (RFC 3339)
|
||||
certctl-cli certs list --expires-before 2026-06-01T00:00:00Z
|
||||
|
||||
# Sparse fields — only return the columns you need
|
||||
certctl-cli certs list --fields id,common_name,expires_at,status
|
||||
```
|
||||
|
||||
### Trigger renewal
|
||||
|
||||
```bash
|
||||
certctl-cli certs renew mc-api-prod
|
||||
# Returns the job id; track with: certctl-cli jobs get <job-id>
|
||||
|
||||
# Recovery: clear a stuck in-flight renewal so a new one can start
|
||||
certctl-cli certs renew mc-api-prod --force
|
||||
```
|
||||
|
||||
`--force` clears the server-side `RenewalInProgress` block — used when a previous renewal job hung without releasing the status flag. `--force` does NOT override `Archived` or `Expired` (those are terminal states; archived = decommissioned, expired = issue a new cert instead of renewing a dead one).
|
||||
|
||||
### Revoke
|
||||
|
||||
```bash
|
||||
# Single revoke — --reason is REQUIRED (no silent fallback to 'unspecified')
|
||||
certctl-cli certs revoke mc-api-prod --reason keyCompromise
|
||||
|
||||
# snake_case is accepted and normalised to camelCase before dispatch
|
||||
certctl-cli certs revoke mc-api-prod --reason key_compromise
|
||||
|
||||
# Bulk revoke by filter
|
||||
certctl-cli certs revoke --profile prof-deprecated --reason superseded
|
||||
certctl-cli certs revoke --team t-payments --reason cessationOfOperation
|
||||
certctl-cli certs revoke --issuer iss-old-vault --reason caCompromise
|
||||
```
|
||||
|
||||
`--reason` is mandatory: omitting it prints the canonical RFC 5280 §5.3.1 menu and exits non-zero. Compliance reporting (PCI-DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful, so the CLI no longer falls back silently. Valid camelCase set: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aaCompromise`. snake_case variants (`key_compromise`, `cessation_of_operation`, etc.) are accepted and normalised.
|
||||
|
||||
### Bulk import
|
||||
|
||||
```bash
|
||||
# Import a directory of PEMs
|
||||
certctl-cli import /etc/letsencrypt/live/
|
||||
|
||||
# Import a single concatenated bundle
|
||||
certctl-cli import certs.pem
|
||||
```
|
||||
|
||||
Each cert lands in the inventory as `Unmanaged` (per the discovery model). Triage from the dashboard or via `certctl-cli certs claim <id>` once you've decided to actively manage it.
|
||||
|
||||
### EST enrollment
|
||||
|
||||
```bash
|
||||
# Enroll a new device cert via EST simpleenroll
|
||||
certctl-cli est enroll --csr device.csr --output device.crt
|
||||
|
||||
# Re-enroll (renew) an existing device cert
|
||||
certctl-cli est reenroll --csr device.csr --client-cert device.crt --client-key device.key
|
||||
```
|
||||
|
||||
### Server status
|
||||
|
||||
```bash
|
||||
certctl-cli status
|
||||
# Health: ok
|
||||
# Total certificates: 145
|
||||
# Expiring (30d): 12
|
||||
# Active jobs: 3
|
||||
# Pending renewals: 8
|
||||
```
|
||||
|
||||
## Output formats
|
||||
|
||||
- `--format table` (default) — human-readable terminal output
|
||||
- `--format json` — JSON for piping into `jq`, scripts, dashboards
|
||||
|
||||
The CLI is built with Go's standard library only — no external dependencies. The binary is small (~10MB) and statically linked.
|
||||
|
||||
## Wiring into CI/CD
|
||||
|
||||
Common pattern: a CI step that renews a certificate through certctl, waits for the job to finish, and confirms the new expiry:
|
||||
|
||||
```bash
|
||||
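# Trigger the renewal and block until the job reaches a terminal state
certctl-cli certs renew mc-api-prod --wait
# Alternative to the line above (do not run both, or two renewals are triggered):
# certctl-cli jobs get "$(certctl-cli certs renew mc-api-prod --format json | jq -r '.job_id')" --wait
# Confirm the new expiry
certctl-cli certs get mc-api-prod --format json | jq -r '.expires_at'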
|
||||
```
|
||||
|
||||
The `--wait` flag blocks until the job reaches a terminal state (Completed / Failed / Cancelled), which is what CI scripts actually need.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [`docs/reference/api.md`](api.md) — the OpenAPI 3.1 spec the CLI wraps
|
||||
- [`docs/reference/mcp.md`](mcp.md) — the MCP server that exposes the same surface to AI assistants
|
||||
- [`docs/contributor/qa-prerequisites.md`](../contributor/qa-prerequisites.md) — local environment setup before the CLI can talk to a server
|
||||
@@ -0,0 +1,122 @@
|
||||
# Configuration Reference
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
|
||||
Compact reference for `CERTCTL_*` environment variables consumed by
|
||||
`certctl-server` and `certctl-agent`. Most operators don't need to
|
||||
touch these — defaults are tuned for the common case. Reach for them
|
||||
when the system's behaviour needs tuning beyond what's exposed in the
|
||||
GUI / API.
|
||||
|
||||
This page enumerates the operator-tunable knobs that don't have a
|
||||
dedicated home elsewhere. Connector-specific env vars are documented
|
||||
on the per-connector pages under
|
||||
[`docs/reference/connectors/`](connectors/index.md). Protocol env
|
||||
vars (ACME server, EST, SCEP) are documented under
|
||||
[`docs/reference/protocols/`](protocols/). TLS env vars are
|
||||
documented in [`docs/operator/tls.md`](../operator/tls.md).
|
||||
|
||||
## Scheduler intervals
|
||||
|
||||
The scheduler runs 14 background loops; their intervals can be adjusted for
|
||||
performance / contention tuning.
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | `2m` | How often the agent-health loop scans for stale heartbeats and transitions agents to `Unhealthy` / `Offline`. |
|
||||
| `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | `30s` | How often the job-processor loop dispatches `Pending` jobs to agents. |
|
||||
| `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | `1m` | How often the notification-dispatcher loop fans out queued alerts to channels. |
|
||||
| `CERTCTL_SHORT_LIVED_EXPIRY_CHECK_INTERVAL` | `5m` | How often the short-lived-expiry loop watches certs whose TTL is less than 1h for imminent expiry. |
|
||||
|
||||
For the full scheduler topology (14 loops, 9 always-on + 5 opt-in)
|
||||
see [`architecture.md`](architecture.md) "Scheduler topology".
|
||||
|
||||
## Job lifecycle
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_JOB_AWAITING_CSR_TIMEOUT` | `24h` | How long a job stays in `AwaitingCSR` before the scheduler marks it `Failed` (the agent never picked it up). |
|
||||
|
||||
## Rate limiting
|
||||
|
||||
The control plane API is rate-limited by default; tune for
|
||||
high-volume environments (mass-rotation events, bulk imports).
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_RATE_LIMIT_ENABLED` | `true` | Master toggle. Disable only for trusted-network single-tenant deploys where the API is firewall-protected. |
|
||||
| `CERTCTL_RATE_LIMIT_PER_USER_RPS` | `0` (= use global default) | Per-user requests-per-second cap. Zero opts each user into the global default in `internal/api/middleware`. |
|
||||
| `CERTCTL_RATE_LIMIT_PER_USER_BURST` | `0` (= use global default) | Per-user token-bucket burst size. Same opt-in semantics. |
|
||||
|
||||
## Audit trail
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_AUDIT_FLUSH_TIMEOUT_SECONDS` | `30` | How long the audit-event flush worker waits for the buffered batch to drain before forcing a flush at shutdown. |
|
||||
|
||||
## Deploy verification
|
||||
|
||||
The deploy-hardening primitive wraps every cert deploy in
|
||||
atomic-write + post-verify + rollback. These env vars tune the
|
||||
post-deploy TLS verification phase.
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_VERIFY_DEPLOYMENT` | `true` | Master toggle for post-deploy TLS verify. Disable only for connectors / environments where the verify endpoint is not reachable from the agent. |
|
||||
| `CERTCTL_VERIFY_DELAY` | `2s` | How long to wait after the reload command completes before the first verify-handshake attempt (gives the daemon time to pick up new keys). |
|
||||
| `CERTCTL_VERIFY_TIMEOUT` | `10s` | Per-attempt TLS-handshake timeout. |
|
||||
| `CERTCTL_DEPLOY_BACKUP_RETENTION` | `3` | How many `.certctl-bak.<unix-nanos>.<ext>` rollback snapshots to keep per target after a successful deploy. `0` uses the default of 3; `-1` opts out of pruning entirely. |
|
||||
|
||||
For the full deploy contract see
|
||||
[`deployment-model.md`](deployment-model.md).
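The `CERTCTL_DEPLOY_BACKUP_RETENTION` semantics (`0` means the default of 3, `-1` disables pruning) are easy to get wrong in scripts that mirror them. A minimal sketch, assuming backups are identified by their unix-nanosecond timestamp; `effectiveRetention` and `toPrune` are hypothetical helpers, and treating every negative value like `-1` is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

const defaultRetention = 3

// effectiveRetention maps the raw setting to a keep-count.
// prune is false when pruning is disabled.
func effectiveRetention(raw int) (keep int, prune bool) {
	switch {
	case raw < 0: // documented value is -1; other negatives are treated the same here
		return 0, false
	case raw == 0:
		return defaultRetention, true
	default:
		return raw, true
	}
}

// toPrune returns the backup timestamps to delete, oldest last-kept first.
func toPrune(stamps []int64, raw int) []int64 {
	keep, prune := effectiveRetention(raw)
	if !prune || len(stamps) <= keep {
		return nil
	}
	sorted := append([]int64(nil), stamps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] }) // newest first
	return sorted[keep:]
}

func main() {
	stamps := []int64{5, 1, 4, 2, 3}
	fmt.Println(toPrune(stamps, 0))  // keeps the three newest
	fmt.Println(toPrune(stamps, -1)) // pruning disabled
}
```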
|
||||
|
||||
## Database
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_DATABASE_MIGRATIONS_PATH` | `./migrations` | Filesystem path to the `*.up.sql` / `*.down.sql` migration set. Override only when running `certctl-server` from a non-standard layout. |
|
||||
|
||||
## Agent
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_AGENT_ID` | (none — required) | The agent's unique ID, issued by `POST /api/v1/agents/register` and bundled into the agent's registration response. Pass via this env var when the agent runs as a systemd unit / container without the `-agent-id` CLI flag. |
|
||||
|
||||
## Auth (RBAC + OIDC + sessions + break-glass)
|
||||
|
||||
Configuration knobs for the RBAC + OIDC + sessions + break-glass
|
||||
auth surface. Full operator guidance lives in
|
||||
[`operator/rbac.md`](../operator/rbac.md),
|
||||
[`operator/oidc-runbooks/`](../operator/oidc-runbooks/index.md), and
|
||||
[`operator/auth-threat-model.md`](../operator/auth-threat-model.md).
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_SESSION_BIND_USER_AGENT` | `false` | Bind every session cookie to the User-Agent header captured at login; mismatch -> 401. Defense in depth against stolen cookies on the same network. |
|
||||
| `CERTCTL_SESSION_GC_INTERVAL` | `1h` | How often the scheduler's session-GC loop sweeps expired/revoked rows out of `sessions`. Trade-off: shorter = smaller table, more DB churn; longer = pile-up. |
|
||||
| `CERTCTL_OIDC_BCL_MAX_AGE_SECONDS` | `60` | Back-channel logout `iat` freshness window. Logout tokens whose `iat` is more than this many seconds in the past or in the future are rejected. |
|
||||
| `CERTCTL_OIDC_PRELOGIN_REQUIRE_UA` | `false` | Reject the OIDC callback if the User-Agent at callback differs from the UA captured at pre-login. RFC 9700 §4.7.1 defense-in-depth. |
|
||||
| `CERTCTL_OIDC_PRELOGIN_REQUIRE_IP` | `false` | Same as `_UA` but for client IP. Set carefully — corporate networks with carrier-grade NAT can change apparent IP mid-flow. |
|
||||
| `CERTCTL_DEMO_MODE_ACK` | `false` | Operator acknowledgement that demo mode is intentional in this deploy. Required when `CERTCTL_AUTH_TYPE=none` to allow server startup; safety net against demo-mode-in-production leakage. |
|
||||
| `CERTCTL_TRUSTED_PROXIES` | (empty) | Comma-separated list of trusted-proxy CIDRs (e.g. `10.0.0.0/8,192.0.2.1`). XFF is consulted for client-IP derivation only when the immediate peer sits in this allowlist. |
|
||||
| `CERTCTL_TRUSTED_PROXIES_COUNT` | (synthesised) | Read-only counter exposed by `/api/v1/auth/runtime-config`; mirrors `len(CERTCTL_TRUSTED_PROXIES)`. Not operator-settable; documented here so the G-3 env-docs-drift guard catches drift. |
|
||||
| `CERTCTL_BOOTSTRAP_TOKEN` | (empty) | One-shot token used to mint the first admin role binding via `POST /api/v1/auth/bootstrap`. Once consumed, the token is cleared from memory and the bootstrap endpoint is disabled. |
|
||||
| `CERTCTL_BOOTSTRAP_TOKEN_SET` | (synthesised) | Boolean exposed by `/api/v1/auth/runtime-config`; `true` when `CERTCTL_BOOTSTRAP_TOKEN` was set at server start. Not operator-settable; documented here so the G-3 guard catches drift. |
|
||||
| `CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID` | (empty) | When OIDC is enabled, restricts the first-admin OIDC strategy to the named provider only — any other provider's tokens won't trigger the bootstrap hook. |
|
||||
| `CERTCTL_BOOTSTRAP_ADMIN_GROUPS_COUNT` | (synthesised) | Read-only counter exposed by `/api/v1/auth/runtime-config`; mirrors `len(CERTCTL_BOOTSTRAP_ADMIN_GROUPS)`. Documented here so the G-3 guard catches drift. |
|
||||
| `CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD` | `5` | Number of consecutive failed `/auth/breakglass/login` attempts that lock the credential. |
|
||||
|
||||
## SCEP profile binding (single-profile back-compat)
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_SCEP_PROFILE_ID` | (empty) | Optional certificate profile ID for the legacy single-profile SCEP path. The multi-profile path uses `CERTCTL_SCEP_PROFILES=<list>` + `CERTCTL_SCEP_PROFILE_<NAME>_PROFILE_ID` instead — see [`scep-server.md`](protocols/scep-server.md). |
|
||||
|
||||
## Related references
|
||||
|
||||
- [`architecture.md`](architecture.md) — scheduler topology, system design, security model
|
||||
- [`deployment-model.md`](deployment-model.md) — atomic write + verify + rollback contract
|
||||
- [`operator/security.md`](../operator/security.md) — full security posture (auth, rate limits, encryption at rest)
|
||||
- [`operator/tls.md`](../operator/tls.md) — control-plane TLS env vars
|
||||
- Per-connector pages under [`reference/connectors/`](connectors/index.md) for connector-specific config
|
||||
- Per-protocol pages under [`reference/protocols/`](protocols/) for ACME / SCEP / EST / CRL+OCSP / async-CA polling
|
||||
@@ -0,0 +1,235 @@
|
||||
# ACME Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the outbound ACME v2 issuer
|
||||
> connector (certctl as an ACME *client*). For the inbound ACME
|
||||
> server (certctl as an ACME *server*), see
|
||||
> [acme-server.md](../protocols/acme-server.md). For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The ACME connector implements the full ACME v2 protocol (RFC 8555)
|
||||
using Go's `golang.org/x/crypto/acme` package. It supports three
|
||||
challenge methods and ARI (RFC 9773) for renewal-window negotiation.
|
||||
|
||||
Compatible CAs include Let's Encrypt, ZeroSSL, Sectigo, Buypass,
|
||||
Google Trust Services, SSL.com, and any other RFC 8555 ACME
|
||||
implementation. step-ca's ACME directory is also compatible if you
|
||||
prefer ACME over the native step-ca connector.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/acme/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the ACME connector when:
|
||||
|
||||
- You need public-trust certificates (Let's Encrypt, ZeroSSL,
|
||||
Sectigo via ACME, Google Trust Services, SSL.com).
|
||||
- You want certctl to drive renewal lifecycle on top of the ACME
|
||||
CA's free or paid issuance.
|
||||
- You want one tool that covers both internal PKI (Local, Vault,
|
||||
step-ca) and public-trust ACME issuance.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You need OV / EV certificates and your CA doesn't expose them
|
||||
via ACME — use the DigiCert or Sectigo SCM REST connectors.
|
||||
- You're standing up internal-only PKI and don't want to operate
|
||||
ACME challenge infrastructure — use Local CA or Vault PKI for a
|
||||
simpler synchronous path.
|
||||
|
||||
## Challenge methods
|
||||
|
||||
### HTTP-01 (default)
|
||||
|
||||
A built-in temporary HTTP server starts on demand during
|
||||
certificate issuance. The domain being validated must resolve to
|
||||
the machine running the connector, and the configured HTTP port
|
||||
must be reachable from the internet.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-staging-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"http_port": 80
|
||||
}
|
||||
```
|
||||
|
||||
### DNS-01 (for wildcards)
|
||||
|
||||
Creates DNS TXT records via user-provided scripts. Required for
|
||||
wildcard certificates (`*.example.com`) and hosts that can't serve
|
||||
HTTP on port 80. The connector invokes external scripts to create
|
||||
and clean up `_acme-challenge` TXT records, making it compatible
|
||||
with any DNS provider (Cloudflare, Route53, Azure DNS, etc.).
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"challenge_type": "dns-01",
|
||||
"dns_present_script": "/etc/certctl/dns/create-record.sh",
|
||||
"dns_cleanup_script": "/etc/certctl/dns/delete-record.sh",
|
||||
"dns_propagation_wait": 30
|
||||
}
|
||||
```
|
||||
|
||||
DNS hook scripts receive these environment variables:
|
||||
|
||||
- `CERTCTL_DNS_DOMAIN` — domain being validated
|
||||
- `CERTCTL_DNS_FQDN` — full record name (`_acme-challenge.<domain>`
|
||||
for dns-01, `_validation-persist.<domain>` for dns-persist-01)
|
||||
- `CERTCTL_DNS_VALUE` — TXT record value
|
||||
- `CERTCTL_DNS_TOKEN` — ACME challenge token
|
||||
|
||||
The present script must create the TXT record and exit 0; the
|
||||
cleanup script removes it (dns-01 only).
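For dns-01, RFC 8555 §8.4 defines the TXT value as the unpadded base64url encoding of the SHA-256 digest of the key authorization. The sketch below computes that value and assembles the four environment variables listed above. `hookEnv` is a hypothetical helper; how the connector actually builds the environment is not shown in this document.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// dns01TXTValue is the RFC 8555 §8.4 record value:
// base64url(SHA-256(keyAuthorization)) without padding.
func dns01TXTValue(keyAuthorization string) string {
	sum := sha256.Sum256([]byte(keyAuthorization))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// hookEnv builds the variables handed to a dns-01 present or cleanup script.
// domain is assumed to be the bare identifier (no "*." prefix).
func hookEnv(domain, token, keyAuthorization string) []string {
	return []string{
		"CERTCTL_DNS_DOMAIN=" + domain,
		"CERTCTL_DNS_FQDN=_acme-challenge." + domain,
		"CERTCTL_DNS_VALUE=" + dns01TXTValue(keyAuthorization),
		"CERTCTL_DNS_TOKEN=" + token,
	}
}

func main() {
	// A caller would append this slice to os.Environ() and set it as exec.Cmd.Env.
	for _, kv := range hookEnv("example.com", "tok123", "tok123.thumbprint") {
		fmt.Println(kv)
	}
}
```

A hook script only needs `CERTCTL_DNS_FQDN` and `CERTCTL_DNS_VALUE` to create the record; the other two variables are mainly useful for logging or for picking the right DNS zone.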
|
||||
|
||||
### DNS-PERSIST-01 (standing record)
|
||||
|
||||
Creates a one-time persistent TXT record at
|
||||
`_validation-persist.<domain>` containing the CA's issuer domain
|
||||
and your ACME account URI. Once set, this record authorizes
|
||||
unlimited future certificate issuances without per-renewal DNS
|
||||
updates. Based on
|
||||
[draft-ietf-acme-dns-persist](https://datatracker.ietf.org/doc/draft-ietf-acme-dns-persist/)
|
||||
and CA/Browser Forum ballot SC-088v3.
|
||||
|
||||
If the CA doesn't offer dns-persist-01 yet, the connector falls
|
||||
back to dns-01 automatically.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme-v02.api.letsencrypt.org/directory",
|
||||
"email": "admin@example.com",
|
||||
"challenge_type": "dns-persist-01",
|
||||
"dns_present_script": "/etc/certctl/dns/create-record.sh",
|
||||
"dns_persist_issuer_domain": "letsencrypt.org",
|
||||
"dns_propagation_wait": 30
|
||||
}
|
||||
```
|
||||
|
||||
The present script creates a TXT record at
|
||||
`_validation-persist.<domain>` with the value
|
||||
`letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/<your-id>`.
|
||||
This record is permanent — no cleanup script is needed.
|
||||
|
||||
## ACME Renewal Information (ARI, RFC 9773)
|
||||
|
||||
Instead of using fixed renewal thresholds (e.g. renew 30 days
|
||||
before expiry), certctl can ask the CA when it should renew.
|
||||
Enable with `CERTCTL_ACME_ARI_ENABLED=true`.
|
||||
|
||||
The ARI protocol lets the CA specify a `suggestedWindow` (start
|
||||
and end times) for when you should renew — useful for distributing
|
||||
load during maintenance windows or coordinating mass-revocation
|
||||
scenarios. RFC 9773 §4.1 defines the certificate identifier as `base64url(AKI keyIdentifier)`, a literal `.`, then `base64url(DER serial number)`.
|
||||
|
||||
If the CA doesn't support ARI (404 response), certctl
|
||||
automatically falls back to threshold-based renewal with no
|
||||
operator intervention required.
|
||||
|
||||
## External Account Binding (EAB)
|
||||
|
||||
ZeroSSL, Google Trust Services, and SSL.com require EAB for ACME
|
||||
account registration. For most CAs, get your EAB credentials from
|
||||
the CA's dashboard and provide them via `eab_kid` and `eab_hmac`.
|
||||
The HMAC key must be base64url-encoded (no padding). CAs that
|
||||
don't require EAB (Let's Encrypt, Buypass) ignore these fields.
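Because the HMAC key must be unpadded base64url, a padded or standard-alphabet value is a common misconfiguration. The sketch below validates the string before use; `decodeEABHMAC` is a hypothetical helper, not part of certctl or of `golang.org/x/crypto/acme`.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeEABHMAC decodes an eab_hmac value, rejecting padded input explicitly
// so the operator gets a clearer message than a generic decode error.
func decodeEABHMAC(s string) ([]byte, error) {
	if strings.ContainsRune(s, '=') {
		return nil, errors.New("eab_hmac must be base64url without padding")
	}
	key, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("eab_hmac is not valid base64url: %w", err)
	}
	if len(key) == 0 {
		return nil, errors.New("eab_hmac is empty")
	}
	return key, nil
}

func main() {
	key, err := decodeEABHMAC("3q2-7w")
	fmt.Printf("%x %v\n", key, err)
	_, err = decodeEABHMAC("3q2+7w==") // standard alphabet with padding
	fmt.Println(err != nil)
}
```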
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme.zerossl.com/v2/DV90",
|
||||
"email": "admin@example.com",
|
||||
"eab_kid": "your-zerossl-eab-kid",
|
||||
"eab_hmac": "your-zerossl-eab-hmac-base64url"
|
||||
}
|
||||
```
|
||||
|
||||
### ZeroSSL auto-EAB
|
||||
|
||||
When the directory URL points to ZeroSSL and no EAB credentials
|
||||
are provided, certctl automatically fetches them from ZeroSSL's
|
||||
public API (`api.zerossl.com/acme/eab-credentials-email`) using
|
||||
your configured email address. No dashboard visit required — just
|
||||
set the directory URL and email. Same approach used by Caddy and
|
||||
acme.sh.
|
||||
|
||||
```json
|
||||
{
|
||||
"directory_url": "https://acme.zerossl.com/v2/DV90",
|
||||
"email": "admin@example.com"
|
||||
}
|
||||
```
|
||||
|
||||
## Certificate profiles (Let's Encrypt, GA January 2026)
|
||||
|
||||
Let's Encrypt supports ACME certificate profile selection. Set
|
||||
`CERTCTL_ACME_PROFILE=shortlived` to request 6-day certificates —
|
||||
ideal for ephemeral workloads where short validity substitutes for
|
||||
revocation. The `tlsserver` profile produces standard TLS
|
||||
certificates. When the profile field is empty (default), the CA
|
||||
uses its default profile.
|
||||
|
||||
## Environment variables
|
||||
|
||||
- `CERTCTL_ACME_DIRECTORY_URL` — ACME directory URL
|
||||
- `CERTCTL_ACME_EMAIL` — Contact email for account registration
|
||||
- `CERTCTL_ACME_EAB_KID` — External Account Binding Key ID
|
||||
- `CERTCTL_ACME_EAB_HMAC` — External Account Binding HMAC key
|
||||
(base64url-encoded)
|
||||
- `CERTCTL_ACME_CHALLENGE_TYPE` — `http-01` (default), `dns-01`,
|
||||
or `dns-persist-01`
|
||||
- `CERTCTL_ACME_DNS_PRESENT_SCRIPT` — Path to DNS record creation
|
||||
script
|
||||
- `CERTCTL_ACME_DNS_CLEANUP_SCRIPT` — Path to DNS record cleanup
|
||||
script (dns-01 only)
|
||||
- `CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN` — CA issuer domain for
|
||||
persistent record (dns-persist-01 only)
|
||||
- `CERTCTL_ACME_PROFILE` — Certificate profile for the newOrder
|
||||
request
|
||||
|
||||
## Revocation by serial number (Top-10 fix #7)
|
||||
|
||||
RFC 8555 §7.6 requires the certificate DER bytes (not just the
|
||||
serial) on the revoke wire — but a CLM platform's job is to
|
||||
abstract over that limitation. Operators routinely have only the
|
||||
serial in hand: the original PEM was lost, the private key was
|
||||
rotated, the operator clicked "revoke" in the GUI based on a row
|
||||
in the certs list.
|
||||
|
||||
certctl's ACME
|
||||
`RevokeCertificate(ctx, RevocationRequest{Serial: ...})` looks the
|
||||
serial up in the local cert store
|
||||
(`certificate_versions.pem_chain`), decodes the leaf-cert PEM into
|
||||
DER, and calls the ACME revoke endpoint with
|
||||
`(accountKey, der, reasonCode)` — RFC 8555 §7.6 case 1,
|
||||
"revocation request signed with account key". This works because
|
||||
the same account key issued the cert, so authority is intrinsic.
|
||||
|
||||
The cert version must exist in the local store: this means the
|
||||
cert was issued through certctl, not imported. If
|
||||
`GetVersionBySerial` returns `sql.ErrNoRows`, the connector
|
||||
returns an actionable error pointing at the local-store
|
||||
requirement. Revoke-by-serial is therefore only available for
|
||||
ACME certs that certctl issued.
|
||||
|
||||
Reason codes follow RFC 5280 §5.3.1: nil reason maps to
|
||||
`unspecified` (0), and the connector accepts the canonical
|
||||
camelCase form (`keyCompromise`, `cACompromise`,
|
||||
`affiliationChanged`, `superseded`, `cessationOfOperation`,
|
||||
`certificateHold`, `removeFromCRL`, `privilegeWithdrawn`,
|
||||
`aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE
|
||||
variants. An unknown reason returns an error rather than silently
|
||||
demoting to `unspecified` — operators rely on the reason for
|
||||
audit reporting.
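
One way to accept all three spellings is to strip underscores and lower-case the input before the lookup. The sketch below assumes that approach; `parseReason` and `reasonCodes` are illustrative names, not the connector's real identifiers. The numeric values are the CRLReason codes from RFC 5280 §5.3.1 (value 7 is unassigned).

```go
package main

import (
	"fmt"
	"strings"
)

// reasonCodes maps a lower-cased, underscore-free reason name to its
// RFC 5280 CRLReason value.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keycompromise":        1,
	"cacompromise":         2,
	"affiliationchanged":   3,
	"superseded":           4,
	"cessationofoperation": 5,
	"certificatehold":      6,
	"removefromcrl":        8,
	"privilegewithdrawn":   9,
	"aacompromise":         10,
}

// parseReason accepts camelCase, underscore_lower, and ALL_CAPS_UNDERSCORE
// spellings. A nil reason maps to unspecified (0); an unknown reason is an
// error rather than a silent fallback.
func parseReason(reason *string) (int, error) {
	if reason == nil {
		return 0, nil
	}
	key := strings.ToLower(strings.ReplaceAll(*reason, "_", ""))
	code, ok := reasonCodes[key]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", *reason)
	}
	return code, nil
}

func main() {
	for _, r := range []string{"keyCompromise", "key_compromise", "KEY_COMPROMISE", "because"} {
		r := r
		code, err := parseReason(&r)
		fmt.Println(r, code, err)
	}
}
```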
|
||||
|
||||
## Related docs
|
||||
|
||||
- [ACME server](../protocols/acme-server.md) — certctl *as* an ACME server (the inverse direction)
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [migration/acme-from-cert-manager.md](../../migration/acme-from-cert-manager.md) — point cert-manager at certctl's ACME server
|
||||
- [migration/acme-from-traefik.md](../../migration/acme-from-traefik.md) — point Traefik at certctl's ACME server
|
||||
@@ -0,0 +1,112 @@
|
||||
# Active Directory Certificate Services (ADCS) Integration — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for integrating certctl with Microsoft
|
||||
> ADCS as the enterprise root. For the connector-development context
|
||||
> (interface contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
ADCS integration is **not** a separate connector. certctl integrates
|
||||
with ADCS via the **sub-CA mode** of the Local CA issuer: certctl
|
||||
operates as a subordinate CA whose signing certificate was issued by
|
||||
ADCS, so all certctl-issued certificates chain back to the enterprise
|
||||
ADCS root.
|
||||
|
||||
This is the canonical pattern for Windows-shop deployments where
|
||||
ADCS is already the root of trust and operators want certctl to
|
||||
handle automation (lifecycle, renewal, deployment, alerts) without
|
||||
ADCS having to support a non-Microsoft REST API surface.
|
||||
|
||||
## When to use this integration
|
||||
|
||||
Use ADCS sub-CA mode when:
|
||||
|
||||
- ADCS is your enterprise root and you don't want to introduce a
|
||||
parallel root of trust.
|
||||
- You want all certctl-issued certificates to validate against the
|
||||
ADCS chain that's already in your Windows trust stores, mobile
|
||||
device profiles, and load-balancer configurations.
|
||||
- You need certctl's automation surface (ACME, SCEP, EST, profile
|
||||
policy, scheduler, deployment connectors) but want ADCS to remain
|
||||
the signing authority for the root.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You want certctl to issue from its own root of trust — use the
|
||||
Local CA issuer in self-signed mode.
|
||||
- ADCS is being decommissioned or replaced — the migration path
|
||||
from ADCS to Vault PKI / step-ca / Local CA needs its own
|
||||
rollout plan; that's not what this integration covers.
|
||||
|
||||
## How sub-CA mode works
|
||||
|
||||
The Local CA issuer loads a pre-signed CA certificate and key from
|
||||
disk:
|
||||
|
||||
- `CERTCTL_CA_CERT_PATH` — path to the certctl signing cert PEM
|
||||
(the one ADCS issued).
|
||||
- `CERTCTL_CA_KEY_PATH` — path to the matching private key PEM.
|
||||
|
||||
Every leaf certctl issues is signed with this key, and the chain
|
||||
returned to clients includes both the certctl signing cert and the
|
||||
ADCS root (so verifying clients see a complete chain to the
|
||||
enterprise root).
|
||||
|
||||
The signing certificate certctl uses is just a normal CA cert with
|
||||
`Basic Constraints: CA=true` and an appropriate path-length
|
||||
constraint. ADCS issues this certificate using its standard
|
||||
"Subordinate Certification Authority" template; the operator just
|
||||
takes the resulting cert + key and points certctl at them.
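
Before pointing certctl at an ADCS-issued cert, it can help to confirm the cert really carries the CA properties described above. The check below is a sketch under assumptions: `checkSigningCert` is a hypothetical helper, not part of certctl, and `selfSigned` only builds throwaway certificates so the example runs on its own.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// checkSigningCert reports whether a certificate is usable as a sub-CA
// signing cert: Basic Constraints CA=true plus the keyCertSign key usage.
func checkSigningCert(cert *x509.Certificate) error {
	if !cert.BasicConstraintsValid || !cert.IsCA {
		return errors.New("certificate is not a CA (Basic Constraints CA=true missing)")
	}
	if cert.KeyUsage&x509.KeyUsageCertSign == 0 {
		return errors.New("certificate lacks the keyCertSign key usage")
	}
	return nil
}

// selfSigned builds a throwaway self-signed certificate for the demo only.
func selfSigned(isCA bool) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageCRLSign
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	for _, isCA := range []bool{true, false} {
		cert, err := selfSigned(isCA)
		if err != nil {
			fmt.Println("generate:", err)
			return
		}
		fmt.Printf("isCA=%v check=%v\n", isCA, checkSigningCert(cert))
	}
}
```

In practice the certificate would be read from the file at `CERTCTL_CA_CERT_PATH` and decoded with `encoding/pem` before being passed to a check like this.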
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Provisioning the certctl sub-CA
|
||||
|
||||
1. Generate a new keypair for certctl on the host that will run it
|
||||
(or in the HSM / KMS the operator wants to delegate signing to,
|
||||
via the `internal/crypto/signer/` driver interface when alternate
|
||||
drivers are configured).
|
||||
2. Build a CSR with `Basic Constraints: CA=true`, the operator's
|
||||
chosen path-length constraint, and key usages including
|
||||
`keyCertSign` and `cRLSign`.
|
||||
3. Submit the CSR to ADCS using the Subordinate Certification
|
||||
Authority template (or a custom template that grants those key
|
||||
usages).
|
||||
4. Place the signed certctl-cert and the matching key at
|
||||
`CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH`.
|
||||
5. Restart certctl-server (or rebuild the issuer via the API).
|
||||
Subsequent issuance chains to the ADCS root.
|
||||
|
||||
### Rotating the sub-CA cert
|
||||
|
||||
When the certctl sub-CA cert is approaching expiry:
|
||||
|
||||
1. Generate a new keypair (re-keying is recommended at sub-CA
|
||||
rotation time).
|
||||
2. Repeat the CSR and ADCS signing cycle described above.
|
||||
3. Stage the new cert and key at fresh on-disk paths and follow the
|
||||
[intermediate-CA hierarchy
|
||||
runbook](../intermediate-ca-hierarchy.md) for the cutover (rotate
|
||||
`CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH` to the new files
|
||||
when ready). The
|
||||
key concern is overlap: both the old and new sub-CA certs must
|
||||
chain to the ADCS root during the rollover so existing leaves
|
||||
keep validating.
|
||||
|
||||
### Revocation chain
|
||||
|
||||
CRL and OCSP for ADCS-rooted leaves are handled by certctl's CRL
|
||||
distribution point and OCSP responder
|
||||
([crl-ocsp.md](../protocols/crl-ocsp.md)). The ADCS root publishes
|
||||
its own CRL covering the certctl sub-CA cert; relying parties walk
|
||||
both CDP entries to determine the full revocation status.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Local CA issuer](index.md#built-in-local-ca) — the connector this integration uses
|
||||
- [Intermediate CA hierarchy](../intermediate-ca-hierarchy.md) — how certctl manages multi-level CA trees, including ADCS-rooted setups
|
||||
- [CRL and OCSP](../protocols/crl-ocsp.md) — how relying parties validate ADCS-rooted leaves
|
||||
- [Architecture](../architecture.md) — `internal/crypto/signer/` driver interface for HSM / KMS / cloud-KMS alternatives to file-on-disk for the certctl sub-CA private key
|
||||
@@ -1,6 +1,11 @@
|
||||
# Apache httpd Connector — Operator Deep-Dive
|
||||
|
||||
> Per Phase 14 of the deploy-hardening II master bundle.
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Per Phase 14 of the deploy-hardening II master bundle. For the
|
||||
> connector-development context (interface contract, registry, atomic
|
||||
> deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
@@ -73,7 +78,7 @@ per-file ownership is preserved per Bundle I Phase 5.
|
||||
`TestVendorEdge_Apache_ReloadVsRestart_PreservesConnections_E2E`
|
||||
|
||||
In-flight TLS sessions survive `apachectl graceful` worker
|
||||
swap. Documented in `docs/deployment-atomicity.md`.
|
||||
swap. Documented in `docs/reference/deployment-model.md`.
|
||||
|
||||
### SNI server_name binding
|
||||
|
||||
@@ -97,5 +102,5 @@ supplied ordering across rotation.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Atomic deploy + post-verify + rollback](deployment-atomicity.md)
|
||||
- [Vendor compatibility matrix](deployment-vendor-matrix.md)
|
||||
- [Atomic deploy + post-verify + rollback](../deployment-model.md)
|
||||
- [Vendor compatibility matrix](../vendor-matrix.md)
|
||||
@@ -0,0 +1,165 @@
|
||||
# AWS ACM Private CA Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the AWS Certificate Manager
|
||||
> Private Certificate Authority (ACM PCA) issuer connector. For the
|
||||
> connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
AWS ACM Private CA is a managed private CA on AWS. The connector
|
||||
calls `IssueCertificate` (which is asynchronous at the ACM PCA API
|
||||
level), then runs the SDK's `NewCertificateIssuedWaiter` until the
|
||||
cert reaches `CERTIFICATE_ISSUED` state, then `GetCertificate` to
|
||||
retrieve the PEM. Default waiter timeout is 5 minutes; tune by
|
||||
editing `defaultWaiterTimeout` in
|
||||
`internal/connector/issuer/awsacmpca/awsacmpca.go`.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/awsacmpca/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the AWS ACM PCA connector when:
|
||||
|
||||
- Your workloads are AWS-native and you want the CA to live inside
|
||||
your AWS account (for blast-radius, IAM, and audit reasons).
|
||||
- You need ACM PCA's CRL distribution and OCSP responder to serve
|
||||
status to relying parties without certctl being in the OCSP path.
|
||||
- You want IAM-based access control (no API keys to rotate) for
|
||||
certctl's signing path.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You're not on AWS — Google CAS or Azure Key Vault are the cloud-
|
||||
native equivalents on those platforms.
|
||||
- You need public-trust certificates — ACM PCA is private only.
|
||||
- You don't already pay for ACM PCA (it has a non-trivial monthly
|
||||
cost). Vault, step-ca, or the Local CA issuer are free
|
||||
self-hosted alternatives.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `CERTCTL_AWS_PCA_REGION` | Yes | — | AWS region (e.g. `us-east-1`) |
|
||||
| `CERTCTL_AWS_PCA_CA_ARN` | Yes | — | ARN of the ACM Private CA |
|
||||
| `CERTCTL_AWS_PCA_SIGNING_ALGORITHM` | No | `SHA256WITHRSA` | Signing algorithm |
|
||||
| `CERTCTL_AWS_PCA_VALIDITY_DAYS` | No | `365` | Certificate validity in days |
|
||||
| `CERTCTL_AWS_PCA_TEMPLATE_ARN` | No | — | Optional certificate template ARN |
|
||||
|
||||
Supported signing algorithms: `SHA256WITHRSA`, `SHA384WITHRSA`,
|
||||
`SHA512WITHRSA`, `SHA256WITHECDSA`, `SHA384WITHECDSA`,
|
||||
`SHA512WITHECDSA`.
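
The defaults and the supported-algorithm list above translate into a small validation step. The sketch below is illustrative only: `pcaConfig` and its field names are assumptions, not the connector's real config struct.

```go
package main

import (
	"errors"
	"fmt"
)

// pcaConfig mirrors the settings table above; field names are hypothetical.
type pcaConfig struct {
	Region           string
	CAArn            string
	SigningAlgorithm string
	ValidityDays     int
	TemplateArn      string
}

// supportedAlgorithms is the list documented for this connector.
var supportedAlgorithms = map[string]bool{
	"SHA256WITHRSA":   true,
	"SHA384WITHRSA":   true,
	"SHA512WITHRSA":   true,
	"SHA256WITHECDSA": true,
	"SHA384WITHECDSA": true,
	"SHA512WITHECDSA": true,
}

// normalize checks required fields and fills in the documented defaults.
func (c *pcaConfig) normalize() error {
	if c.Region == "" {
		return errors.New("region is required")
	}
	if c.CAArn == "" {
		return errors.New("CA ARN is required")
	}
	if c.SigningAlgorithm == "" {
		c.SigningAlgorithm = "SHA256WITHRSA"
	}
	if !supportedAlgorithms[c.SigningAlgorithm] {
		return fmt.Errorf("unsupported signing algorithm %q", c.SigningAlgorithm)
	}
	if c.ValidityDays == 0 {
		c.ValidityDays = 365
	}
	if c.ValidityDays < 0 {
		return errors.New("validity days must be positive")
	}
	return nil
}

func main() {
	cfg := pcaConfig{
		Region: "us-east-1",
		CAArn:  "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/12345678-1234-1234-1234-123456789012",
	}
	if err := cfg.normalize(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%s, %d days\n", cfg.SigningAlgorithm, cfg.ValidityDays)
}
```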
|
||||
|
||||
## Authentication
|
||||
|
||||
Standard AWS credential chain via
|
||||
`aws-sdk-go-v2/config.LoadDefaultConfig()`. The chain can resolve
|
||||
credentials from these sources (the SDK defines the exact precedence):
|
||||
|
||||
1. Environment variables (`AWS_ACCESS_KEY_ID`,
|
||||
`AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`).
|
||||
2. Shared config files (`~/.aws/config`, `~/.aws/credentials`,
|
||||
profile via `AWS_PROFILE`).
|
||||
3. IAM Roles for Service Accounts (IRSA) on EKS.
|
||||
4. EC2 instance profiles.
|
||||
5. ECS task roles.
|
||||
6. SSO.
|
||||
|
||||
certctl never stores AWS credentials directly — set them in the
|
||||
certctl process's environment or via the IAM role attached to the
|
||||
host.
|
||||
|
||||
## Minimal IAM policy
|
||||
|
||||
The IAM principal that certctl authenticates as needs the following
|
||||
actions against the CA's ARN:
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [
|
||||
{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"acm-pca:IssueCertificate",
|
||||
"acm-pca:GetCertificate",
|
||||
"acm-pca:RevokeCertificate",
|
||||
"acm-pca:GetCertificateAuthorityCertificate"
|
||||
],
|
||||
"Resource": "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/12345678-1234-1234-1234-123456789012"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Replace the `Resource` ARN with your own CA ARN. If you use a
|
||||
`TemplateArn` (subordinate-CA template), the policy needs no
|
||||
additional permissions — `IssueCertificate` covers it.
|
||||
|
||||
## Worked example: add the issuer via API
|
||||
|
||||
```bash
|
||||
curl -k -X POST https://localhost:8443/api/v1/issuers \
|
||||
-H 'Content-Type: application/json' \
|
||||
-d '{
|
||||
"id": "iss-aws-prod",
|
||||
"name": "AWS ACM PCA (prod)",
|
||||
"type": "AWSACMPCA",
|
||||
"config": {
|
||||
"region": "us-east-1",
|
||||
"ca_arn": "arn:aws:acm-pca:us-east-1:123456789012:certificate-authority/12345678-1234-1234-1234-123456789012",
|
||||
"signing_algorithm": "SHA256WITHRSA",
|
||||
"validity_days": 90
|
||||
}
|
||||
}'
|
||||
```
|
||||
|
||||
The certctl server process must have AWS credentials available
|
||||
before the issuer is created (or before any subsequent issuance
|
||||
call). For a local dev run with shared-config creds:
|
||||
`export AWS_PROFILE=my-profile` before `docker compose up`. For an
|
||||
EKS deployment: attach an IRSA-bound IAM role to the certctl pod's
|
||||
service account.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### `AccessDeniedException: User ... is not authorized to perform: acm-pca:IssueCertificate`
|
||||
|
||||
The IAM principal certctl is using lacks the required actions.
|
||||
Apply the IAM policy above (scoped to your CA ARN) to the
|
||||
role/user. The principal can be inspected with
|
||||
`aws sts get-caller-identity` from the certctl host.
|
||||
|
||||
### `ResourceNotFoundException: Could not find Certificate Authority`
|
||||
|
||||
The `CAArn` doesn't match any CA in the configured region. Common
|
||||
causes: region mismatch (CA is in `us-west-2`, certctl region is
|
||||
set to `us-east-1`), CA was deleted, ARN typo. Verify with
|
||||
`aws acm-pca describe-certificate-authority --certificate-authority-arn <arn> --region <region>`.
|
||||
|
||||
### `acmpca waiter (waiting for issuance): exceeded max wait time`
|
||||
|
||||
The cert was submitted but didn't reach `CERTIFICATE_ISSUED` state
|
||||
within 5 minutes. Check the CA's CloudWatch metrics for backlog;
|
||||
check the CA's audit reports for any policy violations on the
|
||||
request. If the wait is consistently slow, edit
|
||||
`defaultWaiterTimeout` in
|
||||
`internal/connector/issuer/awsacmpca/awsacmpca.go` and rebuild.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by AWS ACM PCA directly. certctl records
|
||||
revocations locally and notifies AWS via the `RevokeCertificate`
|
||||
API with RFC 5280 reason mapping (e.g. `keyCompromise` →
|
||||
`KEY_COMPROMISE`). AWS ACM PCA's CRL distribution point and OCSP
|
||||
responder serve the resulting status to verifying clients —
|
||||
certctl is **not** in the OCSP path for this connector.
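
The reason mapping mentioned above could look like the table below. It is a sketch, not the connector's code: `toPCAReason` is a hypothetical name, the right-hand strings are the ACM PCA `RevocationReason` values, and rejecting reasons with no ACM PCA counterpart (such as `certificateHold`) is an assumption about the desired behaviour.

```go
package main

import "fmt"

// pcaReasons maps RFC 5280 camelCase reason names to ACM PCA
// RevocationReason values.
var pcaReasons = map[string]string{
	"unspecified":          "UNSPECIFIED",
	"keyCompromise":        "KEY_COMPROMISE",
	"cACompromise":         "CERTIFICATE_AUTHORITY_COMPROMISE",
	"affiliationChanged":   "AFFILIATION_CHANGED",
	"superseded":           "SUPERSEDED",
	"cessationOfOperation": "CESSATION_OF_OPERATION",
	"privilegeWithdrawn":   "PRIVILEGE_WITHDRAWN",
	"aACompromise":         "A_A_COMPROMISE",
}

// toPCAReason translates a reason name and fails for reasons ACM PCA has
// no equivalent for (assumed behaviour for this sketch).
func toPCAReason(rfcReason string) (string, error) {
	v, ok := pcaReasons[rfcReason]
	if !ok {
		return "", fmt.Errorf("revocation reason %q has no ACM PCA equivalent", rfcReason)
	}
	return v, nil
}

func main() {
	for _, r := range []string{"keyCompromise", "superseded", "certificateHold"} {
		v, err := toPCAReason(r)
		fmt.Println(r, v, err)
	}
}
```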
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — bounded-polling primitive (ACM PCA uses the SDK waiter, not certctl's polling, but the same operator concerns apply)
|
||||
- [Disaster recovery runbook](../../operator/runbooks/disaster-recovery.md) — what happens to ACM PCA-issued certs if the CA is deleted
|
||||
@@ -0,0 +1,208 @@
|
||||
# AWS Certificate Manager (ACM) Target Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the AWS Certificate Manager
|
||||
> (ACM) target connector. For the connector-development context
|
||||
> (interface contract, registry, atomic deploy primitive shared
|
||||
> across all targets), see the [connector index](index.md).
|
||||
>
|
||||
> **Note:** this is the **target** connector that deploys
|
||||
> certificates *into* ACM for ALB / CloudFront / API Gateway / App
|
||||
> Runner consumption. The **issuer** connector that pulls certs
|
||||
> *from* AWS ACM Private CA is documented separately at
|
||||
> [aws-acm-pca.md](aws-acm-pca.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The AWS ACM target connector deploys certificates into AWS
|
||||
Certificate Manager — the public AWS service that ALB /
|
||||
CloudFront / API Gateway / App Runner consume by ARN. Closes the
|
||||
"we terminate TLS at AWS, how do we get certctl-issued certs to
|
||||
ALB?" question for cloud-first deployments. Rank 5 of the
|
||||
2026-05-03 Infisical deep-research deliverable.
|
||||
|
||||
Implementation lives at `internal/connector/target/awsacm/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the AWS ACM target connector when:
|
||||
|
||||
- TLS terminates at AWS-managed edges (ALB, CloudFront, API
|
||||
Gateway, App Runner) and those services consume certs by ACM
|
||||
ARN.
|
||||
- You want certctl to drive the rotation while Terraform /
|
||||
CloudFormation handles the ARN-to-resource attachment.
|
||||
- You need short-lived IAM credentials (IRSA, instance profiles)
|
||||
rather than long-lived access keys.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The target is an EC2 instance running NGINX / HAProxy / Apache
|
||||
directly — those connectors are simpler than the ACM round-trip.
|
||||
- You're using ACM Private CA for internal trust — that's the
|
||||
[aws-acm-pca.md](aws-acm-pca.md) issuer, a different connector.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"region": "us-east-1",
|
||||
"certificate_arn": "arn:aws:acm:us-east-1:123456789012:certificate/abcdef01-2345-6789-abcd-ef0123456789",
|
||||
"tags": {"env": "production", "app": "api-gateway"}
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|---|---|---|
|
||||
| `region` | (required) | AWS region for the ACM endpoint (e.g. `us-east-1`). CloudFront-attached certs MUST live in `us-east-1`; ALB / API Gateway use the same region as the load balancer. |
|
||||
| `certificate_arn` | — | ARN of an existing ACM certificate to rotate in place. Empty on first deploy — the adapter creates a new ACM cert via `ImportCertificate` and the deployment record's Metadata captures the resulting ARN. Operators can also pre-create the ARN out-of-band (Terraform, CloudFormation) and pin it here. |
|
||||
| `tags` | — | Tags applied to the ACM cert at first import + re-applied via `AddTagsToCertificate` on every subsequent import (ACM strips tags on re-import). The reserved keys `certctl-managed-by` and `certctl-certificate-id` are set automatically and cannot be overridden. |
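
The reserved-key rule in the `tags` row amounts to writing the provenance tags last so they always win. A minimal sketch, assuming a helper named `mergeTags` (hypothetical, not the connector's real function):

```go
package main

import (
	"fmt"
	"sort"
)

// mergeTags copies the operator's tags and then sets the reserved
// provenance keys, so an operator-supplied value cannot override them.
func mergeTags(operator map[string]string, certID string) map[string]string {
	out := make(map[string]string, len(operator)+2)
	for k, v := range operator {
		out[k] = v
	}
	out["certctl-managed-by"] = "certctl"
	out["certctl-certificate-id"] = certID
	return out
}

func main() {
	tags := mergeTags(map[string]string{
		"env":                "production",
		"certctl-managed-by": "someone-else", // ignored: reserved key
	}, "mc-api-prod")

	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, tags[k])
	}
}
```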
|
||||
|
||||
## IAM policy (minimum permissions)
|
||||
|
||||
```json
|
||||
{
|
||||
"Version": "2012-10-17",
|
||||
"Statement": [{
|
||||
"Effect": "Allow",
|
||||
"Action": [
|
||||
"acm:ImportCertificate",
|
||||
"acm:GetCertificate",
|
||||
"acm:DescribeCertificate",
|
||||
"acm:ListCertificates",
|
||||
"acm:AddTagsToCertificate"
|
||||
],
|
||||
"Resource": "arn:aws:acm:*:*:certificate/*"
|
||||
}]
|
||||
}
|
||||
```
|
||||
|
||||
## Auth recipes
|
||||
|
||||
- **IRSA (IAM Roles for Service Accounts) — recommended for K8s
|
||||
deploys.** Annotate the agent's ServiceAccount with
|
||||
`eks.amazonaws.com/role-arn=arn:aws:iam::<account>:role/certctl-acm-deployer`.
|
||||
The role's trust policy allows the cluster's OIDC provider;
|
||||
permission policy is the JSON above. Short-lived STS
|
||||
credentials are auto-rotated by EKS — no long-lived access
|
||||
keys.
|
||||
- **EC2 instance profile — recommended for VM-based agents.**
|
||||
Attach an instance profile referencing the same role. The SDK's
|
||||
`LoadDefaultConfig` picks credentials up via the instance metadata
|
||||
service.
|
||||
- **AWS SSO / `aws configure sso` — recommended for operator
|
||||
workstations.** The SDK reads `~/.aws/config` for the SSO profile
|
||||
and refreshes tokens via the existing CLI session.
|
||||
- **Long-lived access keys are NOT supported in connector
|
||||
Config** — the credential chain is configured at the SDK
|
||||
level, not the connector level. This is a procurement-
|
||||
readability decision: a security reviewer reading the
|
||||
`deployment_targets` table should never find an access key.
|
||||
|
||||
## Atomic-rollback contract
|
||||
|
||||
Every `DeployCertificate` snapshots the existing cert via
|
||||
`DescribeCertificate` + `GetCertificate` BEFORE calling
|
||||
`ImportCertificate` with the new bytes. After import, the
|
||||
connector re-fetches the cert metadata and compares serial
|
||||
numbers.
|
||||
|
||||
On serial-mismatch (post-verify failure), the connector calls
|
||||
`ImportCertificate` again with the snapshotted bytes to restore
|
||||
the previous cert. The rollback path emits a `WARN`-level slog
|
||||
entry; the rollback's own success or failure is exposed via
|
||||
`certctl_deploy_rollback_total{target_type="AWSACM",outcome="restored"|"also_failed"}`
|
||||
per the deploy-hardening I Phase 10 metric exposer.
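
The snapshot, import, verify, rollback sequence can be sketched against a small interface. Everything here is an assumption made for illustration: `acmClient`, `bundle`, `deploy`, and `fakeACM` are hypothetical names, `Snapshot` stands in for `DescribeCertificate` + `GetCertificate`, and `Import` stands in for `ImportCertificate`. The returned outcome strings match the metric's `outcome` label values. The first-deploy case (no existing cert) is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// bundle is the minimal state the sketch tracks for a deployed cert.
type bundle struct {
	Serial string
	PEM    []byte
}

// acmClient abstracts the two ACM interactions the contract relies on.
type acmClient interface {
	Snapshot() (bundle, error)
	Import(b bundle) error
}

// deploy snapshots the current cert, imports the new one, verifies the
// serial, and re-imports the snapshot if verification fails.
func deploy(c acmClient, next bundle) (string, error) {
	prev, err := c.Snapshot()
	if err != nil {
		return "", fmt.Errorf("snapshot before import: %w", err)
	}
	if err := c.Import(next); err != nil {
		return "", fmt.Errorf("import: %w", err)
	}
	got, err := c.Snapshot()
	if err == nil && got.Serial == next.Serial {
		return "deployed", nil
	}
	if rerr := c.Import(prev); rerr != nil {
		return "also_failed", fmt.Errorf("post-verify failed and rollback failed: %w", rerr)
	}
	return "restored", errors.New("post-verify failed: serial mismatch, previous cert restored")
}

// fakeACM is an in-memory stand-in; dropNext simulates an import that
// reports success but does not take effect.
type fakeACM struct {
	stored   bundle
	dropNext bool
}

func (f *fakeACM) Snapshot() (bundle, error) { return f.stored, nil }

func (f *fakeACM) Import(b bundle) error {
	if f.dropNext {
		f.dropNext = false
		return nil
	}
	f.stored = b
	return nil
}

func main() {
	f := &fakeACM{stored: bundle{Serial: "01"}, dropNext: true}
	outcome, err := deploy(f, bundle{Serial: "02"})
	fmt.Println(outcome, err, "stored serial:", f.stored.Serial)
}
```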
|
||||
|
||||
Mirrors the Bundle 5+ pre-deploy-snapshot pattern shipped for
|
||||
IIS / WinCertStore / JavaKeystore.
|
||||
|
||||
## ALB attachment recipe
|
||||
|
||||
certctl creates / rotates the ACM cert; the operator (or
|
||||
Terraform / CloudFormation) attaches it to the ALB listener
|
||||
separately. For Terraform-driven deployments, look up the ARN by
|
||||
tag:
|
||||
|
||||
```hcl
|
||||
data "aws_acm_certificate" "certctl_managed" {
|
||||
domain = "api.example.com"
|
||||
most_recent = true
|
||||
|
||||
# Filter by certctl provenance tags so an unrelated ACM cert with
|
||||
# the same SAN doesn't get picked up.
|
||||
tags = {
|
||||
"certctl-managed-by" = "certctl"
|
||||
"certctl-certificate-id" = "mc-api-prod"
|
||||
}
|
||||
}
|
||||
|
||||
resource "aws_lb_listener" "https" {
|
||||
load_balancer_arn = aws_lb.api.arn
|
||||
port = 443
|
||||
protocol = "HTTPS"
|
||||
certificate_arn = data.aws_acm_certificate.certctl_managed.arn
|
||||
# ...
|
||||
}
|
||||
```
|
||||
|
||||
The ARN updates in place across renewals (ACM `ImportCertificate`
|
||||
is upsert-style when given an ARN), so the ALB listener's
|
||||
`certificate_arn` reference doesn't change. CloudFront / API
|
||||
Gateway distributions can reference the same ARN via their
|
||||
respective Terraform resources.
|
||||
|
||||
## Threat model carve-outs
|
||||
|
||||
- **Cert key bytes never written to disk on the agent.**
|
||||
`DeployCertificate` reads `request.KeyPEM` from memory and
|
||||
passes it to the SDK's `ImportCertificate` call. No temp file.
|
||||
No swap-out window.
|
||||
- **Provenance tags are mandatory.** The reserved
|
||||
`certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>`
|
||||
pair is set automatically on every import. Operators
|
||||
identifying a stray ACM cert in their account can match
|
||||
against `certctl-managed-by` to confirm it was certctl-issued
|
||||
(or NOT — the absence of the tag means a manual import).
|
||||
- **No long-lived AWS credentials in `Config`.** `Config`
|
||||
carries region + ARN + operator tags only. AWS auth is the
|
||||
SDK credential chain (IRSA / instance profile / SSO).
|
||||
- **`ListCertificates` IAM permission is required for the V2
|
||||
ARN-discovery dance to work.** Operators who pin
|
||||
`Config.CertificateArn` after the first deploy can drop this
|
||||
permission; the V2 fallback emits a warning and reverts to
|
||||
"always create new ARN" if the operator forgets to update
|
||||
`certificate_arn` post-first-deploy.
|
||||
|
||||
## Procurement checklist crib
|
||||
|
||||
Paste into security review:
|
||||
|
||||
- certctl uses short-lived IAM-role credentials via IRSA /
|
||||
instance profile, not long-lived access keys.
|
||||
- The cert key is held only in agent memory during the import
|
||||
call; never written to disk.
|
||||
- Every imported ACM cert is tagged with
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` for forensic traceability.
|
||||
- Failed imports trigger automatic rollback to the snapshotted
|
||||
previous cert; both outcomes are surfaced via Prometheus.
|
||||
- The minimum IAM policy is 5 actions on
|
||||
`arn:aws:acm:*:*:certificate/*`; CloudTrail captures every
|
||||
API call for audit.
|
||||
|
||||
## ValidateOnly contract
|
||||
|
||||
ACM has no dry-run API for `ImportCertificate`; `ValidateOnly`
|
||||
returns `target.ErrValidateOnlyNotSupported` per the deploy-
|
||||
hardening I Phase 3 sentinel contract. Operators preview deploys
|
||||
via `ValidateConfig` + `aws acm describe-certificate
|
||||
--certificate-arn <arn>` against the current ARN.
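
Callers can treat the sentinel as "dry-run unavailable" rather than as a failure. The sketch below declares its own `ErrValidateOnlyNotSupported` as a stand-in for `target.ErrValidateOnlyNotSupported`; the `target` interface shape, `acmTarget`, and `preview` are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrValidateOnlyNotSupported stands in for the sentinel exported by the
// target package.
var ErrValidateOnlyNotSupported = errors.New("validate-only not supported by this target")

// target is a hypothetical slice of the connector interface.
type target interface {
	ValidateOnly() error
}

// acmTarget models a connector whose backing API has no dry-run mode.
type acmTarget struct{}

func (acmTarget) ValidateOnly() error { return ErrValidateOnlyNotSupported }

// preview distinguishes "dry-run unavailable" from a real validation error.
func preview(t target) (string, error) {
	err := t.ValidateOnly()
	switch {
	case err == nil:
		return "dry-run passed", nil
	case errors.Is(err, ErrValidateOnlyNotSupported):
		return "dry-run unavailable; fall back to ValidateConfig", nil
	default:
		return "", err
	}
}

func main() {
	msg, err := preview(acmTarget{})
	fmt.Println(msg, err)
}
```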
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [Azure Key Vault](azure-kv.md) — Azure equivalent target
|
||||
- [AWS ACM Private CA issuer](aws-acm-pca.md) — the *issuer* counterpart (same vendor, opposite direction)
|
||||
- [Cloud targets runbook](../../operator/runbooks/cloud-targets.md) — operator playbook covering both AWS ACM and Azure KV
|
||||
@@ -0,0 +1,195 @@
|
||||
# Azure Key Vault Target Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Azure Key Vault target
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, atomic deploy primitive shared across all
|
||||
> targets), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Azure Key Vault target connector deploys certificates into
|
||||
Azure Key Vault — the Azure-managed cert/secret store that
|
||||
Application Gateway / Front Door / App Service / Container Apps
|
||||
consume by KID URI. Rank 5 (Azure half) of the 2026-05-03
|
||||
Infisical deep-research deliverable.
|
||||
|
||||
Implementation lives at `internal/connector/target/azurekv/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Azure Key Vault target connector when:
|
||||
|
||||
- TLS terminates at Azure-managed edges (Application Gateway,
|
||||
Front Door, App Service, Container Apps) and those services
|
||||
consume certs by Key Vault KID URI.
|
||||
- You need short-lived Azure credentials (managed identity,
|
||||
workload identity) rather than long-lived service-principal
|
||||
secrets.
|
||||
- You need cross-region or cross-cloud-environment Key Vault
|
||||
endpoints (US-Gov `.vault.usgovcloudapi.net`, China
|
||||
`.vault.azure.cn`).
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- The target is an Azure VM running NGINX / IIS / HAProxy
|
||||
directly — those connectors are simpler.
|
||||
- The cert is for an internal Azure service that doesn't read
|
||||
from Key Vault (e.g. a custom .NET app reading PEM from disk).
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"vault_url": "https://my-vault.vault.azure.net",
|
||||
"certificate_name": "api-prod",
|
||||
"tags": {"env": "production", "app": "api-gateway"},
|
||||
"credential_mode": "managed_identity"
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Default | Description |
|
||||
|---|---|---|
|
||||
| `vault_url` | (required) | Key Vault DNS endpoint (`https://<vault-name>.vault.azure.net`). For US-Gov: `.vault.usgovcloudapi.net`; for China: `.vault.azure.cn`. |
|
||||
| `certificate_name` | (required) | Cert object name in the vault (1-127 chars, alphanumeric + hyphens). Versions are auto-generated per import. |
|
||||
| `tags` | — | Tags applied at every import (Key Vault carries tags forward across versions, unlike ACM). Reserved keys `certctl-managed-by` + `certctl-certificate-id` are set automatically. |
|
||||
| `credential_mode` | `default` | One of `default` / `managed_identity` / `client_secret` / `workload_identity`. See "Auth recipes" below. |
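
The constraints in the table above can be checked before any Azure call is made. This is a sketch under assumptions: `kvConfig` and its field names are hypothetical, and the name pattern simply encodes the "1-127 chars, alphanumeric + hyphens" rule stated in the table.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"regexp"
)

// certNameRE encodes the documented rule: 1-127 alphanumerics or hyphens.
var certNameRE = regexp.MustCompile(`^[A-Za-z0-9-]{1,127}$`)

// credentialModes lists the documented credential_mode values.
var credentialModes = map[string]bool{
	"default":           true,
	"managed_identity":  true,
	"client_secret":     true,
	"workload_identity": true,
}

// kvConfig mirrors the fields in the table above; names are illustrative.
type kvConfig struct {
	VaultURL        string
	CertificateName string
	CredentialMode  string
}

// normalize validates the config and applies the credential_mode default.
func (c *kvConfig) normalize() error {
	u, err := url.Parse(c.VaultURL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		return errors.New("vault_url must be an https URL")
	}
	if !certNameRE.MatchString(c.CertificateName) {
		return fmt.Errorf("invalid certificate_name %q", c.CertificateName)
	}
	if c.CredentialMode == "" {
		c.CredentialMode = "default"
	}
	if !credentialModes[c.CredentialMode] {
		return fmt.Errorf("unknown credential_mode %q", c.CredentialMode)
	}
	return nil
}

func main() {
	cfg := kvConfig{VaultURL: "https://my-vault.vault.azure.net", CertificateName: "api-prod"}
	fmt.Println(cfg.normalize(), cfg.CredentialMode)
}
```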
|
||||
|
||||
## RBAC role (minimum permissions)
|
||||
|
||||
The off-the-shelf builtin role **Key Vault Certificates Officer**
|
||||
covers everything. For minimum-permission deploys, use a custom
|
||||
role with these data-plane operations on the vault scope
|
||||
(`/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>`):
|
||||
|
||||
```
|
||||
Microsoft.KeyVault/vaults/certificates/import/action
|
||||
Microsoft.KeyVault/vaults/certificates/read
|
||||
Microsoft.KeyVault/vaults/certificates/listversions/read
|
||||
```
|
||||
|
||||
## Auth recipes
|
||||
|
||||
- **AKS workload identity (`credential_mode: workload_identity`)
|
||||
— recommended for AKS deploys.** Annotate the agent's
|
||||
ServiceAccount with
|
||||
`azure.workload.identity/client-id=<app-id>`. The AKS
|
||||
cluster's OIDC issuer + the federated credential on the app
|
||||
registration handle token exchange; no long-lived secrets.
|
||||
- **Managed identity (`credential_mode: managed_identity`) —
|
||||
recommended for VM / App Service deploys.** Assign a
|
||||
system-assigned or user-assigned managed identity to the
|
||||
host; certctl-server / agent picks it up via IMDS. Pin
|
||||
`credential_mode` rather than letting `default` fall through
|
||||
to env vars (defends against accidental local-dev creds
|
||||
leaking into production).
|
||||
- **Service principal (`credential_mode: client_secret`).**
|
||||
Configure `AZURE_TENANT_ID` + `AZURE_CLIENT_ID` +
|
||||
`AZURE_CLIENT_SECRET` env vars on the agent. NOT recommended
|
||||
for production — long-lived client secret risk; rotate via
|
||||
Key Vault soft-delete recovery if leaked.
|
||||
- **Default (`credential_mode: default` or unset).** The SDK's
|
||||
`DefaultAzureCredential` walks env vars → managed identity →
|
||||
Azure CLI fallback. Useful for local-dev where the operator
|
||||
already has `az login` active.
|
||||
- **Long-lived secrets in connector Config NOT supported** —
|
||||
same procurement-readability rule as AWS ACM.
|
||||
|
||||
## Atomic-rollback contract + Azure-version semantics
|
||||
|
||||
Every `DeployCertificate` snapshots the existing latest version
|
||||
via `GetCertificate(name, "" /* latest */)` BEFORE calling
|
||||
`ImportCertificate`. After import, the connector re-fetches the
|
||||
latest version and compares serial numbers.
|
||||
|
||||
On serial-mismatch, the connector calls `ImportCertificate`
|
||||
again with the snapshotted CER bytes (re-PFX'd with the
|
||||
operator's key) — **as a NEW VERSION**. Key Vault doesn't
|
||||
support "version-restore" without soft-delete recovery (which we
|
||||
keep off the minimum-RBAC surface). The version history will
|
||||
show e.g. v1=initial, v2=failed-renewal, v3=rollback-of-v2;
|
||||
operators reading audit dashboards filter by tag.
|
||||
|
||||
### Soft-delete caveat
|
||||
|
||||
V2 doesn't manage Key Vault soft-delete recovery. If a previous
|
||||
version was soft-deleted out-of-band (e.g. operator ran
|
||||
`az keyvault certificate delete`), the rollback re-imports the
|
||||
snapshot bytes as a new version rather than restoring the
|
||||
soft-deleted version. Operators alerting on rollback frequency
|
||||
should also watch for soft-delete events.
|
||||
|
||||
## App Gateway / Front Door attachment recipe
|
||||
|
||||
```hcl
|
||||
data "azurerm_key_vault_certificate" "certctl_managed" {
|
||||
name = "api-prod"
|
||||
key_vault_id = azurerm_key_vault.main.id
|
||||
}
|
||||
|
||||
resource "azurerm_application_gateway" "main" {
|
||||
# ...
|
||||
ssl_certificate {
|
||||
name = "certctl-managed"
|
||||
key_vault_secret_id = data.azurerm_key_vault_certificate.certctl_managed.versionless_secret_id
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Application Gateway / Front Door reference the cert by KID URI;
|
||||
certctl rotates the version under the same name, and the AGW /
|
||||
Front Door reference auto-resolves to the latest version (the
|
||||
SDK's behaviour when the KID points to
|
||||
`/certificates/<name>/<version>` vs `/certificates/<name>`
|
||||
differs — the latter auto-tracks "latest"; the former pins).
|
||||
**Use the version-less KID so renewals are tracked automatically.**
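
When a tool only has the versioned identifier in hand, the version-less form can be derived by dropping the trailing path segment. The helper below is a hypothetical sketch (`versionlessID` is not part of certctl or the Azure SDK), and the version string in the example is a made-up placeholder.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// versionlessID strips the version segment from a Key Vault object URL of
// the form https://<vault>/<certificates|secrets>/<name>[/<version>].
func versionlessID(id string) (string, error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 2 || len(parts) > 3 || parts[1] == "" {
		return "", fmt.Errorf("unexpected Key Vault object path %q", u.Path)
	}
	if parts[0] != "certificates" && parts[0] != "secrets" {
		return "", fmt.Errorf("unexpected Key Vault collection %q", parts[0])
	}
	u.Path = "/" + parts[0] + "/" + parts[1]
	return u.String(), nil
}

func main() {
	id, err := versionlessID("https://my-vault.vault.azure.net/certificates/api-prod/0123abcd")
	fmt.Println(id, err)
}
```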
|
||||
|
||||
## Threat model carve-outs
|
||||
|
||||
- **Cert key bytes never written to disk on the agent.** PFX
|
||||
wrapping happens in memory (PKCS#12 via
|
||||
`software.sslmate.com/src/go-pkcs12`); the base64-encoded PFX
|
||||
is passed straight to the SDK's `ImportCertificate` call.
|
||||
- **Provenance tags are mandatory.** Same
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` shape as AWS ACM. Operators
|
||||
identifying a stray Key Vault cert match against
|
||||
`certctl-managed-by`.
|
||||
- **No long-lived Azure credentials in `Config`.** `Config`
|
||||
carries vault URL + cert name + operator tags + credential
|
||||
mode only. Auth is the Azure SDK credential chain.
|
||||
- **`credential_mode: managed_identity` is the recommended
|
||||
production posture.** Defends against accidental env-var
|
||||
creds leaking into deployments where the host already has a
|
||||
managed identity assigned.
|
||||
|
||||
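The provenance-tag match from the list above could look roughly like this in Go. The tag keys and the `certctl` value come from the text. The function name, the plain `map[string]string` tag type, and the rule that a missing certificate ID counts as "not managed" are assumptions made for the sketch.

```go
package main

// Tag keys and value as documented above.
const (
	tagManagedBy     = "certctl-managed-by"
	tagCertificateID = "certctl-certificate-id"
	managedByValue   = "certctl"
)

// managedCertificateID returns the certctl certificate ID recorded on a
// Key Vault certificate and true when both provenance tags are present.
// Hypothetical helper: the tags are modelled as a plain string map.
func managedCertificateID(tags map[string]string) (string, bool) {
	if tags[tagManagedBy] != managedByValue {
		return "", false
	}
	id, ok := tags[tagCertificateID]
	if !ok || id == "" {
		// Assumption: a managed-by tag without an ID is treated as unmanaged.
		return "", false
	}
	return id, true
}
```
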
## Procurement checklist crib
|
||||
|
||||
Paste into security review:
|
||||
|
||||
- certctl uses Azure managed identity (or workload identity for
|
||||
AKS), not long-lived service-principal secrets.
|
||||
- The cert key is held only in agent memory during the PFX wrap
|
||||
+ import call; never written to disk.
|
||||
- Every imported Key Vault cert is tagged with
|
||||
`certctl-managed-by=certctl` +
|
||||
`certctl-certificate-id=<mc-id>` for forensic traceability.
|
||||
- Failed imports trigger automatic rollback by re-importing the
|
||||
snapshotted previous version's bytes; both outcomes are
|
||||
surfaced via Prometheus.
|
||||
- The minimum RBAC role grants only three data-plane actions; Activity Log
|
||||
captures every API call for audit.
|
||||
|
||||
## ValidateOnly contract
|
||||
|
||||
Key Vault has no dry-run API; `ValidateOnly` returns
|
||||
`target.ErrValidateOnlyNotSupported`. Operators preview deploys
|
||||
via `ValidateConfig` + `az keyvault certificate show
|
||||
--vault-name <name> --name <cert>`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [AWS ACM target](aws-acm.md) — AWS equivalent target
|
||||
- [Cloud targets runbook](../../operator/runbooks/cloud-targets.md) — operator playbook covering both AWS ACM and Azure KV
|
||||
@@ -0,0 +1,100 @@
|
||||
# Caddy Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Caddy target connector. For
|
||||
> the connector-development context (interface contract, registry,
|
||||
> atomic deploy primitive shared across all targets), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Caddy connector supports two deployment modes:
|
||||
|
||||
- **API mode (recommended).** Posts the certificate directly to
|
||||
Caddy's admin API for zero-downtime hot reload.
|
||||
- **File mode (fallback).** Writes cert and key files to disk,
|
||||
relying on Caddy's built-in file watcher or a manual reload.
|
||||
|
||||
Implementation lives at `internal/connector/target/caddy/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Caddy connector when:
|
||||
|
||||
- Caddy fronts your services and you want certctl-managed certs
|
||||
rather than letting Caddy run its own ACME client.
|
||||
- You want zero-downtime hot reload via Caddy's admin API.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You'd rather Caddy keep running its own ACME client — point it
|
||||
at certctl's ACME server (see
|
||||
[migration/acme-from-caddy.md](../../migration/acme-from-caddy.md))
|
||||
for the cleanest pattern.
|
||||
|
||||
## Configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"mode": "api",
|
||||
"admin_api": "http://localhost:2019",
|
||||
"cert_dir": "/etc/caddy/certs",
|
||||
"cert_file": "site.crt",
|
||||
"key_file": "site.key"
|
||||
}
|
||||
```
|
||||
|
||||
When `mode` is `"api"`, the connector posts the certificate to
|
||||
the admin API endpoint. When `mode` is `"file"`, it writes files
|
||||
to `cert_dir` (same pattern as Traefik). The `admin_api` field is
|
||||
ignored in file mode.
|
||||
|
||||
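As a rough Go sketch of how the two modes constrain the config: the JSON field names match the example above, but the type name, the function name, and the exact required-field rules are assumptions, not the connector's actual validation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// caddyConfig mirrors the JSON fields shown above.
type caddyConfig struct {
	Mode     string `json:"mode"`
	AdminAPI string `json:"admin_api"`
	CertDir  string `json:"cert_dir"`
	CertFile string `json:"cert_file"`
	KeyFile  string `json:"key_file"`
}

// parseCaddyConfig decodes the config and applies per-mode checks.
// The required-field rules are illustrative assumptions.
func parseCaddyConfig(raw []byte) (caddyConfig, error) {
	var cfg caddyConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return caddyConfig{}, fmt.Errorf("decode caddy config: %w", err)
	}
	switch cfg.Mode {
	case "api":
		if cfg.AdminAPI == "" {
			return caddyConfig{}, errors.New("admin_api is required in api mode")
		}
	case "file":
		// admin_api is ignored in file mode, so it is not checked here.
		if cfg.CertDir == "" || cfg.CertFile == "" || cfg.KeyFile == "" {
			return caddyConfig{}, errors.New("cert_dir, cert_file and key_file are required in file mode")
		}
	default:
		return caddyConfig{}, fmt.Errorf("unknown mode %q", cfg.Mode)
	}
	return cfg, nil
}
```
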
## Mode trade-offs
|
||||
|
||||
### API mode
|
||||
|
||||
- Zero-downtime hot reload via `POST /load` or
|
||||
certificate-specific endpoints.
|
||||
- Requires Caddy's admin API to be enabled and reachable from the
|
||||
deployment agent.
|
||||
- Best fit for production deployments where Caddy is configured
|
||||
with an admin endpoint.
|
||||
|
||||
### File mode
|
||||
|
||||
- Writes cert and key files to `cert_dir`; Caddy picks them up
|
||||
via its file watcher or on next config reload.
|
||||
- Use when the admin API isn't available or when Caddy is
|
||||
configured to read certificates from disk.
|
||||
- Behaviorally equivalent to the [Traefik](traefik.md) connector.
|
||||
|
||||
## Deploy contract
|
||||
|
||||
API mode bypasses the Bundle I file-write deploy primitive and
|
||||
talks directly to the Caddy admin API. File mode follows the
|
||||
standard atomic-write + verify path (idempotency check → backup
|
||||
→ atomic write → optional reload → post-deploy TLS verify).
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### Admin API exposure
|
||||
|
||||
Caddy's admin API is an unauthenticated control surface by
|
||||
default. In API mode, ensure the admin API is bound to a
|
||||
loopback or trusted network; exposing it to the public internet would
|
||||
let anyone reload Caddy's config. Run the agent on the same host
|
||||
as Caddy and use `http://localhost:2019` for the safest posture.
|
||||
|
||||
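One way to enforce the loopback posture before deploying is a small pre-flight check like the following. This is a sketch under assumptions: `isLoopbackAdminAPI` is a hypothetical helper, and it only recognises the literal host `localhost` or a loopback IP, so a trusted non-loopback network would need its own allow-list.

```go
package main

import (
	"net"
	"net/url"
)

// isLoopbackAdminAPI reports whether the admin API URL points at the local
// host, either by the name "localhost" or by a loopback IP address.
// Hypothetical helper; it does not resolve DNS names.
func isLoopbackAdminAPI(adminAPI string) bool {
	u, err := url.Parse(adminAPI)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and any IPv6 brackets
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}
```
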
### Falling back to file mode
|
||||
|
||||
If the admin API is intermittently unreachable, switch the
|
||||
target's `mode` to `file` via `PUT /api/v1/targets/{id}`. The
|
||||
deploy still lands; reload behaviour is whatever the operator's
|
||||
Caddy config does with file changes.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, deploy primitive
|
||||
- [Traefik](traefik.md) — comparable file-provider target
|
||||
- [Migration: point Caddy at certctl's ACME](../../migration/acme-from-caddy.md) — alternative pattern when Caddy should keep its ACME client
|
||||
@@ -0,0 +1,106 @@
|
||||
# DigiCert CertCentral Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the DigiCert CertCentral issuer
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The DigiCert connector integrates with DigiCert's CertCentral REST
|
||||
API for ordering and managing certificates from DigiCert's commercial
|
||||
public CA. It supports Domain Validated (DV), Organization Validated
|
||||
(OV), and Extended Validation (EV) certificates, with async order
|
||||
processing for OV/EV.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/digicert/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the DigiCert connector when:
|
||||
|
||||
- You're already a DigiCert CertCentral customer and want certctl to
|
||||
drive issuance, renewal, and deployment from the same platform that
|
||||
manages your internal PKI.
|
||||
- You need OV or EV certificates that require DigiCert to validate
|
||||
organization details before issuance.
|
||||
- You want one tool that covers both internal CAs (Vault, Local,
|
||||
step-ca) and a public-trust commercial CA.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You only need DV certificates and Let's Encrypt / ZeroSSL is an
|
||||
acceptable issuer — use the ACME connector instead.
|
||||
- You need self-hosted PKI with no commercial CA dependency — use
|
||||
Vault PKI, step-ca, or the Local CA issuer.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_DIGICERT_API_KEY` | — | DigiCert API key (sent in `X-DC-DEVKEY` header) |
|
||||
| `CERTCTL_DIGICERT_ORG_ID` | — | DigiCert organization ID |
|
||||
| `CERTCTL_DIGICERT_PRODUCT_TYPE` | `ssl_basic` | Certificate product (e.g. `ssl_basic`, `ssl_plus`, `ssl_ev`) |
|
||||
| `CERTCTL_DIGICERT_BASE_URL` | `https://www.digicert.com/services/v2` | DigiCert API base URL |
|
||||
| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus` |
|
||||
|
||||
## Authentication
|
||||
|
||||
API key passed via `X-DC-DEVKEY` header. The organization ID is sent
|
||||
in the request body (not the header). No mTLS or OAuth2 required.
|
||||
|
||||
## Issuance model
|
||||
|
||||
- **DV certificates** — typically issue immediately; the
|
||||
`/order/certificate/create` API may return the PEM in the same
|
||||
response.
|
||||
- **OV / EV certificates** — require DigiCert-side validation
|
||||
(vetting org documents, checking domain ownership). The API
|
||||
returns 201 with an order ID; certctl's `GetOrderStatus` polls
|
||||
until the certificate is retrievable.
|
||||
|
||||
`GetOrderStatus` runs bounded internal polling (5s/15s/45s/2m/5m
|
||||
capped, ±20% jitter, default 10-minute deadline). For OV/EV orders
|
||||
where humans approve enrollments, bump
|
||||
`CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` to a value that comfortably
|
||||
covers the approval window — see
|
||||
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
|
||||
schedule shape and tuning guidance.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by DigiCert. Clients should validate
|
||||
certificate status against DigiCert's infrastructure. certctl
|
||||
records the revocation locally (audit row + cert state) but does
|
||||
**not** call DigiCert's revoke endpoint — operators revoke through
|
||||
DigiCert's dashboard or the CertCentral REST API directly. This
|
||||
keeps the certctl revocation flow simple at the cost of one extra
|
||||
manual step on revocation.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### API key rotation
|
||||
|
||||
Rotate the API key in DigiCert's dashboard, then either restart
|
||||
certctl-server with the new value in `CERTCTL_DIGICERT_API_KEY` or
|
||||
hot-swap via `PUT /api/v1/issuers/{id}` so the registry's Rebuild
|
||||
path replaces the connector with the new key. No certificate
|
||||
state is invalidated by the rotation; the new key simply authenticates
|
||||
future API calls.
|
||||
|
||||
### Diagnosing slow OV/EV issuance
|
||||
|
||||
DigiCert's OV/EV vetting is a human process and can take hours to
|
||||
days. Bumping `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` lets a
|
||||
single tick wait through the full approval window, but the better
|
||||
operational pattern is to issue OV/EV certs well ahead of expiry
|
||||
so the bounded poll deadline can stay short. The renewal scheduler's
|
||||
"alert at T-30 days" default exists for exactly this reason.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
- [ACME server](../protocols/acme-server.md) — alternative issuer for DV-only workflows
|
||||
@@ -0,0 +1,115 @@
|
||||
# EJBCA (Keyfactor) Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the EJBCA issuer connector. For
|
||||
> the connector-development context (interface contract, registry,
|
||||
> ports/adapters), see the [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The EJBCA connector calls the EJBCA REST API for self-hosted
|
||||
open-source and Keyfactor enterprise CAs. It supports dual
|
||||
authentication: mTLS (default) or OAuth2 Bearer token, selectable
|
||||
via configuration.
|
||||
|
||||
Implementation lives at `internal/connector/issuer/ejbca/`.
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the EJBCA connector when:
|
||||
|
||||
- You already run EJBCA Community Edition or Keyfactor EJBCA
|
||||
Enterprise as your internal CA and want certctl to drive the
|
||||
lifecycle automation (renewal, deployment, alerts) on top.
|
||||
- You need EJBCA's certificate-profile and end-entity-profile
|
||||
policy enforcement — those policies stay in EJBCA and certctl
|
||||
passes the profile names through.
|
||||
- You need approval-pending workflows (humans approve enrollments)
|
||||
— EJBCA supports the 201-Accepted async path.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You want a simpler internal CA without EJBCA's operational weight
|
||||
— Vault PKI, step-ca, or the Local CA issuer are lighter.
|
||||
- You need a managed CA (no servers to run) — Google CAS or AWS
|
||||
ACM PCA on cloud, or DigiCert / Sectigo for commercial PKI.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `CERTCTL_EJBCA_API_URL` | Yes | — | EJBCA REST API base URL |
|
||||
| `CERTCTL_EJBCA_AUTH_MODE` | No | `mtls` | Auth mode: `mtls` or `oauth2` |
|
||||
| `CERTCTL_EJBCA_CLIENT_CERT_PATH` | mTLS | — | Path to client certificate PEM (mTLS mode) |
|
||||
| `CERTCTL_EJBCA_CLIENT_KEY_PATH` | mTLS | — | Path to client key PEM (mTLS mode) |
|
||||
| `CERTCTL_EJBCA_TOKEN` | OAuth2 | — | Bearer token (oauth2 mode) |
|
||||
| `CERTCTL_EJBCA_CA_NAME` | Yes | — | EJBCA CA name |
|
||||
| `CERTCTL_EJBCA_CERT_PROFILE` | No | — | EJBCA certificate profile |
|
||||
| `CERTCTL_EJBCA_EE_PROFILE` | No | — | EJBCA end-entity profile |
|
||||
|
||||
## Authentication
|
||||
|
||||
Configurable via `auth_mode`:
|
||||
|
||||
- **`mtls`** — client certificate and key are loaded for the TLS
|
||||
handshake. This is the default and the more common deployment
|
||||
mode for EJBCA.
|
||||
- **`oauth2`** — the token is sent as `Authorization: Bearer
|
||||
{token}`. Use when EJBCA is fronted by an OAuth2-aware reverse
|
||||
proxy or when integrating with Keyfactor's identity provider.
|
||||
|
||||
The mTLS keypair is cached on the connector after the first API
|
||||
call and reused for the lifetime of the process; rotation is
|
||||
picked up automatically via mtime polling on the cert file (see
|
||||
the mTLS keypair caching note in the [connector
|
||||
index](index.md#built-in-ejbca-keyfactor)).
|
||||
|
||||
## Issuance model
|
||||
|
||||
`POST /v1/certificate/pkcs10enroll` with base64-encoded CSR.
|
||||
Returns base64-encoded certificate PEM. EJBCA 9.3+ creates
|
||||
the end-entity and issues the certificate in a single call. Approval-pending
|
||||
enrollments return 201 with a tracking ID; certctl's
|
||||
`GetOrderStatus` polls until the certificate is available.
|
||||
|
||||
## Revocation
|
||||
|
||||
EJBCA requires both issuer DN and serial number for revocation.
|
||||
The connector stores these as a composite `OrderID` in
|
||||
`issuer_dn::serial` format.
|
||||
|
||||
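A minimal Go sketch of the composite format follows. The `issuer_dn::serial` shape comes from the text. The function names are hypothetical, and splitting on the last `::` is an assumption chosen so that a DN containing the separator would still parse.

```go
package main

import (
	"fmt"
	"strings"
)

// orderIDSep separates issuer DN and serial in the composite OrderID.
const orderIDSep = "::"

// buildOrderID joins issuer DN and serial into issuer_dn::serial.
func buildOrderID(issuerDN, serial string) string {
	return issuerDN + orderIDSep + serial
}

// parseOrderID splits a composite OrderID at the last separator and rejects
// IDs with an empty issuer DN or an empty serial.
func parseOrderID(orderID string) (issuerDN, serial string, err error) {
	i := strings.LastIndex(orderID, orderIDSep)
	if i <= 0 || i+len(orderIDSep) >= len(orderID) {
		return "", "", fmt.Errorf("malformed order ID %q: want issuer_dn::serial", orderID)
	}
	return orderID[:i], orderID[i+len(orderIDSep):], nil
}
```
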
CRL and OCSP are managed by the EJBCA instance. certctl records
|
||||
revocations locally and notifies EJBCA via
|
||||
`PUT /v1/certificate/{issuer_dn}/{serial}/revoke`.
|
||||
|
||||
## Operator playbook
|
||||
|
||||
### mTLS rotation without downtime
|
||||
|
||||
`mv -f new.crt /etc/certctl/ejbca/client.crt` (mtime changes), no
|
||||
process restart required. The next API call re-parses the file
|
||||
and rebuilds the `*http.Transport`. `os.Stat` errors during
|
||||
rotation surface as connector errors rather than silently serving
|
||||
stale credentials.
|
||||
|
||||
### Switching from mTLS to OAuth2
|
||||
|
||||
Update the issuer config via `PUT /api/v1/issuers/{id}` with the
|
||||
new `auth_mode: oauth2` and `token`. The registry's Rebuild path
|
||||
replaces the connector without restart. Prior issuance state
|
||||
(serial numbers, cert state) is unaffected.
|
||||
|
||||
### Diagnosing approval-pending hangs
|
||||
|
||||
If `GetOrderStatus` consistently times out, the operator approval
|
||||
queue in EJBCA is the most common cause. The connector consumes
|
||||
the shared bounded-polling primitive — see
|
||||
[async-ca-polling.md](../protocols/async-ca-polling.md) for the
|
||||
schedule shape and tuning approach.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — bounded-polling primitive
|
||||
- [Approval workflow](../../operator/approval-workflow.md) — certctl-side two-person integrity (separate from EJBCA's approval queue, but addresses the same shape of risk on the certctl side)
|
||||
@@ -0,0 +1,96 @@
|
||||
# Entrust Certificate Services Issuer Connector — Operator Deep-Dive
|
||||
|
||||
> Last reviewed: 2026-05-05
|
||||
>
|
||||
> Operator-grade documentation for the Entrust CA Gateway issuer
|
||||
> connector. For the connector-development context (interface
|
||||
> contract, registry, ports/adapters), see the
|
||||
> [connector index](index.md).
|
||||
|
||||
## Overview
|
||||
|
||||
The Entrust connector calls the Entrust CA Gateway REST API with
|
||||
mutual TLS client-certificate authentication. It supports
|
||||
synchronous issuance (200 OK with PEM) and approval-pending flows
|
||||
(201 Accepted with async polling).
|
||||
|
||||
Implementation lives at `internal/connector/issuer/entrust/` (the
|
||||
mTLS keypair cache is shared at
|
||||
`internal/connector/issuer/mtlscache/`).
|
||||
|
||||
## When to use this connector
|
||||
|
||||
Use the Entrust connector when:
|
||||
|
||||
- You're an Entrust Certificate Services customer using the CA
|
||||
Gateway as the integration surface.
|
||||
- You need approval-pending workflows where humans approve
|
||||
enrollments before issuance.
|
||||
- You want mTLS-authenticated issuance against a commercial CA
|
||||
with no API keys to rotate.
|
||||
|
||||
Look elsewhere when:
|
||||
|
||||
- You only need DV / OV public-trust and your CA is reachable via
|
||||
ACME — use the [ACME connector](acme.md) for a simpler path.
|
||||
- You're not already an Entrust customer — DigiCert, Sectigo, and
|
||||
GlobalSign are comparable commercial alternatives, with
|
||||
different auth shapes.
|
||||
|
||||
## Configuration
|
||||
|
||||
| Setting | Required | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `CERTCTL_ENTRUST_API_URL` | Yes | — | Entrust CA Gateway base URL |
|
||||
| `CERTCTL_ENTRUST_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
|
||||
| `CERTCTL_ENTRUST_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
|
||||
| `CERTCTL_ENTRUST_CA_ID` | Yes | — | Certificate Authority ID (from `GET /certificate-authorities`) |
|
||||
| `CERTCTL_ENTRUST_PROFILE_ID` | No | — | Optional enrollment profile ID |
|
||||
| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus` |
|
||||
|
||||
For approval-pending workflows where humans approve enrollments,
|
||||
bump `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` to `86400` (24h) so a
|
||||
single tick can wait through the approval window.
|
||||
|
||||
## Authentication
|
||||
|
||||
Mutual TLS — the client certificate and key are loaded via
|
||||
`tls.LoadX509KeyPair()` and attached to the HTTP transport. No API
|
||||
key or token required.
|
||||
|
||||
## Issuance model
|
||||
|
||||
Enrollment via
|
||||
`POST /v1/certificate-authorities/{caId}/enrollments`. Returns 200
|
||||
with PEM immediately for auto-approved enrollments, or 201
|
||||
Accepted with a tracking ID for approval-pending orders.
|
||||
`GetOrderStatus` polls the enrollment endpoint.
|
||||
|
||||
## mTLS keypair caching (audit fix #10)
|
||||
|
||||
The parsed client certificate plus a precomputed `*http.Transport`
|
||||
are cached on the connector after the first API call. Steady-state
|
||||
calls reuse the cached transport — no per-call disk read or
|
||||
`tls.X509KeyPair` parse.
|
||||
|
||||
Rotation is picked up automatically via mtime polling: when the
|
||||
cert file's mtime advances beyond the last-loaded value, the next
|
||||
API call re-parses and rebuilds the transport.
|
||||
|
||||
Operator workflow: `mv -f new.crt /etc/certctl/entrust/client.crt`
|
||||
(mtime changes), no process restart required, takes effect on the
|
||||
next API call. `os.Stat` errors during rotation surface as
|
||||
connector errors rather than silently serving stale credentials.
|
||||
|
||||
## Revocation
|
||||
|
||||
CRL and OCSP are managed by Entrust. certctl records revocations
|
||||
locally and notifies Entrust via
|
||||
`PUT /v1/certificate-authorities/{caId}/certificates/{serial}/revoke`.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Connector index](index.md) — interface contract, registry, port/adapter wiring
|
||||
- [GlobalSign Atlas HVCA](globalsign.md) — comparable mTLS-authenticated commercial CA
|
||||
- [Async CA polling](../protocols/async-ca-polling.md) — the bounded-polling primitive
|
||||
- [Approval workflow](../../operator/approval-workflow.md) — certctl-side two-person integrity (separate from Entrust's approval queue)
|
||||