Compare commits
235 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| 677524d9ec | |||
| 9dc0742e77 | |||
| 1440a30d28 | |||
| a3d8b9c607 | |||
| aa6fafdee9 | |||
| 86fffa305a | |||
| e17788355b | |||
| 87213128cc | |||
| 697fa792ea | |||
| 9c1d446e40 | |||
| 3192cd15c5 | |||
| af47d19ae2 | |||
| cfc234ec42 | |||
| a91197014f | |||
| d6959a75c1 | |||
| 97b23e98d9 | |||
| 4cf5fcdb4f | |||
| 1ee67b7792 | |||
| 128d0eeaa8 | |||
| 9834b4e4a4 | |||
| cab579368b | |||
| 4e5522a999 | |||
| 55ce86b132 | |||
| 52248be717 | |||
| 04c7eca615 | |||
| 6e646e0fe8 | |||
| 675b87ba63 | |||
| 707d8de4fb | |||
| 0725713e19 | |||
| 1ee77c89f8 | |||
| 4bc8b3e723 | |||
| 469611650c | |||
| 91642e2860 | |||
| 0200c7f4a4 | |||
| fe7e766510 | |||
| ff7357f889 | |||
| 3287e174dc | |||
| a53a4b845b | |||
| 9143da5fa8 | |||
| b3cc7cbdb2 | |||
| eef1db0f0a | |||
| 72f5246ce3 | |||
| cb308bb4c7 | |||
| ad93e99158 | |||
| 9d0c3dfa15 | |||
| 2c9602db71 | |||
| ef670fa6da | |||
| 5a6ec39cfd | |||
| e3196e7b50 | |||
| bea69efd12 | |||
| 283ec27ca4 | |||
| a67a6b6c30 | |||
| ccd89c348f | |||
| 478a141498 | |||
| 2497be496d | |||
| 25dd6c07f3 | |||
| eb14236166 | |||
| bbb628243f | |||
| cdc9d03d5b | |||
| e951d319d0 | |||
| d14a45401b | |||
| 655e2879e6 | |||
| e757ef1471 | |||
| 27afa4463d | |||
| 80450c7180 | |||
| c655e0f8c5 | |||
| 5abeeb882b | |||
| b1df6dab27 | |||
| 672e1d991d | |||
| 89b910a8f1 | |||
| 6315ef102a | |||
| 119986fa7e | |||
| 3853b7460c | |||
| e9947dc0fe | |||
| b813660c74 | |||
| 387fb555ac | |||
| f549a7aa79 | |||
| b219e5d68a | |||
| 1f6cf0eafa | |||
| a49eae8155 | |||
| 1c7d085f16 | |||
| cc6eec3608 | |||
| 86fb140414 | |||
| 13cd4d98ba | |||
| 84bc1245a1 | |||
| e1bcde4cf1 | |||
| 3f619bcaac | |||
| f3a85d6b08 | |||
| 596d86a206 | |||
| f2e60b93a3 | |||
| f16a9c767a | |||
| 3a27c87b3f | |||
| 0ed8676066 | |||
| bcefb11e65 | |||
| 75cf8475f5 | |||
| c015cab2f4 | |||
| 3da6584ab8 | |||
| 68f6fd474b | |||
| 614e4e636b | |||
| 370f856725 | |||
| 7382e5f03b | |||
| 5567d4b411 | |||
| e5516d7286 | |||
| fd94e0bd19 | |||
| d0415d3b5e | |||
| c6efa4ab39 | |||
| dedf7fa3a9 | |||
| 4b5927dfff | |||
| cc03f55006 | |||
| 93e1dc598c | |||
| 25f33b830f | |||
| 7d6ef44e21 | |||
| dfa4dbbcbd | |||
| f92c997a50 | |||
| 697c0be9f3 | |||
| 8f146e08d6 | |||
| e6088c79a3 | |||
| e19b8c95fe | |||
| 995b72df05 | |||
| 9954fd1100 | |||
| 2a14a1da01 | |||
| 5a53b648b1 | |||
| cb72292b83 | |||
| 3a11e447cf | |||
| bad02e6f23 | |||
| 4c3b7cbb16 | |||
| e8c64b47dd | |||
| 9feb6c796d | |||
| fd05bacb76 | |||
| f51571297d | |||
| 9a41d0ca39 | |||
| 8b52da6aef | |||
| adfb682754 | |||
| 0822f748a5 | |||
| 368ea681a5 | |||
| b059ec930f | |||
| 2238f28610 | |||
| bbba618beb | |||
| cfc4d3f3e8 | |||
| c06d23dd7a | |||
| 6c8d4eca40 | |||
| 836534f2a7 | |||
| 648e2f7ab1 | |||
| 6375909591 | |||
| 3e5ff4b9c3 | |||
| 76d0ce2a0f | |||
| 207f2c6879 | |||
| 46a58d518a | |||
| c5be6d059f | |||
| ec209c9736 | |||
| d4f02c5f4b | |||
| 2409f2e464 | |||
| 225c7141b8 | |||
| 8807a7303d | |||
| a6515b4323 | |||
| 11173a74c6 | |||
| ec0e7a3560 | |||
| a0b9285323 | |||
| 2655493ac8 | |||
| a8fc177118 | |||
| 20378ea7bb | |||
| bcf2c3ae92 | |||
| 5f81de3219 | |||
| 397d2a1588 | |||
| 65567d0d83 | |||
| 0abd984285 | |||
| ec21c9bb29 | |||
| cb2ef9d0e7 | |||
| da79dde611 | |||
| 935ea1bf9f | |||
| 11e752ac01 | |||
| 03472072b8 | |||
| 63e6f3ef91 | |||
| a00bb349c4 | |||
| 78c7bc16b0 | |||
| 1f98f31f83 | |||
| 6d508cf53f | |||
| 591dcfb139 | |||
| 4881056528 | |||
| 6da60d1287 | |||
| baafab50c5 | |||
| 9b5b9ad3a2 | |||
| 1b4c55af65 | |||
| 01607f8614 | |||
| d27cf3545b | |||
| 144bd5fdf9 | |||
| c617a686d6 | |||
| 09ff51c5ae | |||
| 5716d227b1 | |||
| 67ccbb46fd | |||
| 6d5ca5ec9d | |||
| fde5b39d53 | |||
| de9264baf7 | |||
| 305c7dc851 | |||
| 10f9574bcd | |||
| a0afa7ab6f | |||
| 4655f68e87 | |||
| 677c28aeca | |||
| 1f065d67bb | |||
| fe70910755 | |||
| fd6f236a5c | |||
| 200bdf990f | |||
| 3e5cc86c5a | |||
| 3e3e68fd3a | |||
| fd6ae98222 | |||
| b4ac0cda43 | |||
| a41f271c58 | |||
| be72627aeb | |||
| ef92b07448 | |||
| 5b301f9354 | |||
| 2e297b430e | |||
| 7bc6ad9823 | |||
| 6ccdf45179 | |||
| 69483786aa | |||
| 1f5ab16b18 | |||
| a8d04cded4 | |||
| 8308beb5bb | |||
| b9633e5b1a | |||
| d55807947e | |||
| d9fd0a147e | |||
| 03593d4304 | |||
| 87355c3efb | |||
| f92d148881 | |||
| 50c520e1ff | |||
| 8380cb7946 | |||
| 6d8ab54f46 | |||
| e19c240a79 | |||
| 5c38bc3bfe | |||
| b5687aece8 | |||
| cdb6ebdb6a | |||
| bb85f1a56e | |||
| 44c4d89011 | |||
| eaccbcdcf1 | |||
| 4e3cff0729 | |||
| 09c819d424 |
@@ -13,22 +13,43 @@ POSTGRES_PASSWORD=change-me-in-production
|
||||
# Certctl Server
|
||||
# All server vars use the CERTCTL_ prefix (see internal/config/config.go)
|
||||
# ==============================================================================
|
||||
CERTCTL_DATABASE_URL=postgres://certctl:certctl@postgres:5432/certctl?sslmode=disable
|
||||
# IMPORTANT: keep the password segment of CERTCTL_DATABASE_URL in sync with
|
||||
# POSTGRES_PASSWORD above. If you deploy via `deploy/docker-compose.yml`,
|
||||
# this value is *overridden* by the compose file's
|
||||
# `postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/...`
|
||||
# interpolation — but if you run the binary directly with this .env loaded
|
||||
# (e.g. `set -a; source .env; ./certctl-server`), update *both* lines.
|
||||
# Background: editing POSTGRES_PASSWORD after the postgres data directory
|
||||
# has been initialized once does NOT rotate the password — initdb only
|
||||
# seeds pg_authid on first boot of an empty volume. See docs/quickstart.md
|
||||
# "Warning" callout and `internal/repository/postgres/db.go::wrapPingError`
|
||||
# for the SQLSTATE 28P01 diagnostic that fires when the two drift.
|
||||
CERTCTL_DATABASE_URL=postgres://certctl:change-me-in-production@postgres:5432/certctl?sslmode=disable
|
||||
CERTCTL_SERVER_HOST=0.0.0.0
|
||||
CERTCTL_SERVER_PORT=8443
|
||||
CERTCTL_LOG_LEVEL=info
|
||||
CERTCTL_LOG_FORMAT=json
|
||||
|
||||
# Auth type: "api-key", "jwt", or "none" (for demo/development)
|
||||
# Auth type: "api-key" (production) or "none" (demo/development).
|
||||
# For JWT/OIDC, run an authenticating gateway in front of certctl
|
||||
# (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and
|
||||
# set CERTCTL_AUTH_TYPE=none on the upstream — see
|
||||
# docs/architecture.md "Authenticating-gateway pattern". G-1 removed
|
||||
# the in-process "jwt" option (no JWT middleware shipped — silent auth
|
||||
# downgrade); see docs/upgrade-to-v2-jwt-removal.md if you previously
|
||||
# set CERTCTL_AUTH_TYPE=jwt.
|
||||
CERTCTL_AUTH_TYPE=none
|
||||
# Required when CERTCTL_AUTH_TYPE is "api-key" or "jwt"
|
||||
# Required when CERTCTL_AUTH_TYPE is "api-key".
|
||||
# Generate with: openssl rand -base64 32
|
||||
# CERTCTL_AUTH_SECRET=change-me-in-production
|
||||
|
||||
# ==============================================================================
|
||||
# Certctl Agent
|
||||
# ==============================================================================
|
||||
CERTCTL_SERVER_URL=http://localhost:8443
|
||||
# HTTPS-only as of v2.2 (TLS 1.3 pinned). Agents reject http:// URLs at
|
||||
# startup. Use the docker-compose self-signed bootstrap CA bundle from
|
||||
# `deploy/test/certs/ca.crt` or supply your own via CERTCTL_SERVER_CA_BUNDLE_PATH.
|
||||
CERTCTL_SERVER_URL=https://localhost:8443
|
||||
CERTCTL_API_KEY=change-me-in-production
|
||||
CERTCTL_AGENT_NAME=local-agent
|
||||
|
||||
|
||||
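The requirement to keep `POSTGRES_PASSWORD` and the password segment of `CERTCTL_DATABASE_URL` in sync can be checked mechanically before the binary is started. This is a minimal sketch, assuming a URL of the form `postgres://user:password@host:port/db` whose password contains no `@` or `:`. The helper names `db_url_password` and `check_db_password_sync` are hypothetical and not part of certctl.

```shell
# Hypothetical helper: print the password segment of a postgres:// URL.
# Assumes postgres://user:password@host... and does no URL-decoding.
db_url_password() {
  rest=${1#*://}              # strip the scheme, leaving user:password@host...
  creds=${rest%%@*}           # keep everything before the first '@'
  printf '%s\n' "${creds#*:}" # drop the user name and the ':' separator
}

# Hypothetical helper: exit 0 when the URL password equals the expected one.
check_db_password_sync() {
  [ "$(db_url_password "$1")" = "$2" ]
}
```

After `set -a; . ./.env; set +a`, calling `check_db_password_sync "$CERTCTL_DATABASE_URL" "$POSTGRES_PASSWORD"` returns non-zero when the two values have drifted. It only compares the two strings, so it cannot tell whether an already-initialized postgres volume still holds an older password.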
@@ -19,7 +19,7 @@ jobs:
|
||||
- name: Set up Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: '1.25'
|
||||
go-version: '1.25.9'
|
||||
|
||||
- name: Go Build
|
||||
run: |
|
||||
@@ -31,9 +31,325 @@ jobs:
|
||||
- name: Go Vet
|
||||
run: go vet ./...
|
||||
|
||||
- name: Install golangci-lint
|
||||
run: |
|
||||
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.11.4
|
||||
|
||||
- name: Run golangci-lint
|
||||
run: golangci-lint run ./... --timeout 5m
|
||||
|
||||
- name: Install govulncheck
|
||||
run: go install golang.org/x/vuln/cmd/govulncheck@latest
|
||||
|
||||
- name: Run govulncheck
|
||||
run: govulncheck ./...
|
||||
|
||||
- name: Forbidden auth-type literal regression guard (G-1)
|
||||
# G-1 closed the JWT silent auth downgrade by removing "jwt" from the
|
||||
# accepted CERTCTL_AUTH_TYPE values. This step grep-fails the build
|
||||
# if "jwt" reappears in any of the *additive* auth-type surfaces:
|
||||
# the validAuthTypes / ValidAuthTypes() set, the OpenAPI enum, the
|
||||
# helm chart's allowed-types list, or the .env.example default.
|
||||
# Comment lines and the dedicated rejection branch in config.go
|
||||
# (`c.Auth.Type == "jwt"`) are intentionally exempt — those are the
|
||||
# G-1 fix itself, not a regression.
|
||||
#
|
||||
# Connector packages (internal/connector/) are exempt because the
|
||||
# Google OAuth2 service-account JWT and step-ca provisioner one-
|
||||
# time-token JWT are external-protocol uses, unrelated to certctl's
|
||||
# own auth shape. Test files (_test.go) are exempt so negative
|
||||
# tests can pass the literal.
|
||||
#
|
||||
# See docs/upgrade-to-v2-jwt-removal.md for the closure rationale,
|
||||
# or internal/config/config.go::ValidAuthTypes for the allowed set.
|
||||
run: |
|
||||
set -e
|
||||
|
||||
# Scoped patterns that indicate "jwt" being added back to an
|
||||
# allowed-set surface. Each catches a regression shape we've
|
||||
# actually seen in pre-G-1 code:
|
||||
# - Go map/slice literal: "jwt": true or "jwt",
|
||||
# - Go switch case: case "jwt"
|
||||
# - YAML enum: enum: [..., jwt, ...] or - jwt
|
||||
# - .env conditional: AUTH_TYPE.*"jwt"|=jwt$
|
||||
BAD=$(grep -rnEH \
|
||||
-e '"jwt"\s*:\s*true' \
|
||||
-e '"jwt"\s*,' \
|
||||
-e 'case\s+"jwt"' \
|
||||
-e 'enum:.*\bjwt\b' \
|
||||
-e '^\s*-\s*jwt\s*$' \
|
||||
-e 'AUTH_TYPE\s*=\s*jwt\s*$' \
|
||||
-e 'AUTH_TYPE\s*=\s*jwt\s*#' \
|
||||
-e 'auth\.type\s*=\s*jwt\s*$' \
|
||||
-e 'AuthType\("jwt"\)' \
|
||||
internal/config/ \
|
||||
internal/api/ \
|
||||
cmd/ \
|
||||
api/openapi.yaml \
|
||||
.env.example \
|
||||
deploy/.env.example \
|
||||
deploy/helm/certctl/values.yaml \
|
||||
deploy/helm/certctl/templates/ \
|
||||
2>/dev/null \
|
||||
| grep -v '_test.go' \
|
||||
| grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
|
||||
| grep -v 'is no longer accepted' \
|
||||
|| true)
|
||||
if [ -n "$BAD" ]; then
|
||||
echo "G-1 regression: \"jwt\" reappeared in an allowed-set surface:"
|
||||
echo "$BAD"
|
||||
echo ""
|
||||
echo "Allowed surface for 'jwt' literals: comment lines, the"
|
||||
echo "dedicated rejection branch in internal/config/config.go,"
|
||||
echo "and connector packages (Google OAuth2, step-ca)."
|
||||
echo "See docs/upgrade-to-v2-jwt-removal.md and"
|
||||
echo "internal/config/config.go::ValidAuthTypes()."
|
||||
exit 1
|
||||
fi
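The G-1 guard can be reproduced locally against a scratch directory. The sketch below assumes GNU grep (for `\s`) and uses a hypothetical function name, `g1_scan`. It applies a subset of the step's patterns and the same three exemption filters to whatever directory it is given.

```shell
# g1_scan DIR: print forbidden "jwt" hits under DIR.
# Patterns and filters are copied from the CI step; the function name and the
# idea of pointing it at an arbitrary directory are illustrative only.
g1_scan() {
  grep -rnEH \
    -e '"jwt"\s*:\s*true' \
    -e '"jwt"\s*,' \
    -e 'case\s+"jwt"' \
    -e 'AUTH_TYPE\s*=\s*jwt\s*$' \
    "$1" 2>/dev/null \
    | grep -v '_test.go' \
    | grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
    | grep -v 'is no longer accepted' \
    || true
}
```

The exemptions work on grep's `file:line:content` output rather than on the source files, so the comment filter only recognises lines whose content starts with `//` or `#`. A forbidden literal followed by a trailing comment is still reported.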
|
||||
|
||||
- name: Forbidden api_key_hash JSON-shape regression guard (G-2)
|
||||
# G-2 closed cat-s5-apikey_leak by tagging Agent.APIKeyHash
|
||||
# `json:"-"` and adding a defense-in-depth Agent.MarshalJSON that
|
||||
# zeroes the field on the marshal-time copy. This step grep-fails
|
||||
# the build if `api_key_hash` reappears in any of the *additive*
|
||||
# JSON-emitting surfaces: a Go struct json tag in internal/domain/,
|
||||
# an OpenAPI Agent schema property, a TypeScript field declaration
|
||||
# in web/src/, or an enum-list / discriminator in handler
|
||||
# production code.
|
||||
#
|
||||
# Repository, migration, seed, service, integration-test, and
|
||||
# unit-test files are exempt — those are server-internal use
|
||||
# sites (the DB column stays, the in-memory struct field stays,
|
||||
# the auth-lookup path stays). Comment lines are exempt so the
|
||||
# G-2 closure rationale can stay in the source.
|
||||
#
|
||||
# See coverage-gap-audit-2026-04-24-v5/unified-audit.md
|
||||
# cat-s5-apikey_leak for the closure rationale, or
|
||||
# internal/domain/connector.go::Agent::MarshalJSON for the
|
||||
# redaction enforcement.
|
||||
run: |
|
||||
set -e
|
||||
|
||||
# Scoped patterns that indicate api_key_hash being added back
|
||||
# to a JSON-emitting surface. Each catches a regression shape
|
||||
# that pre-G-2 actually shipped or that a future refactor
|
||||
# could plausibly introduce:
|
||||
# - Go struct tag: `json:"api_key_hash"`
|
||||
# - Frontend interface: api_key_hash[?]: string
|
||||
# - OpenAPI schema property: api_key_hash: (column-aligned)
|
||||
# - YAML enum / array: - api_key_hash
|
||||
BAD=$(grep -rnEH \
|
||||
-e 'json:"api_key_hash[",]' \
|
||||
-e '^\s*api_key_hash\??\s*:' \
|
||||
-e '^\s*-\s*api_key_hash\s*$' \
|
||||
internal/domain/ \
|
||||
internal/api/ \
|
||||
cmd/ \
|
||||
api/openapi.yaml \
|
||||
web/src/ \
|
||||
2>/dev/null \
|
||||
| grep -v '_test.go' \
|
||||
| grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
|
||||
|| true)
|
||||
if [ -n "$BAD" ]; then
|
||||
echo "G-2 regression: api_key_hash reappeared in a JSON-emitting surface:"
|
||||
echo "$BAD"
|
||||
echo ""
|
||||
echo "Allowed surface for api_key_hash literals: comment lines,"
|
||||
echo "the database column (migrations/), the in-memory struct"
|
||||
echo "field tagged \`json:\"-\"\`, and the repository / service"
|
||||
echo "use sites. See internal/domain/connector.go::Agent and"
|
||||
echo "coverage-gap-audit-2026-04-24-v5/unified-audit.md"
|
||||
echo "cat-s5-apikey_leak for the closure rationale."
|
||||
exit 1
|
||||
fi
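To see which declaration shapes the three G-2 patterns flag, candidate lines can be piped through them directly. This is a sketch assuming GNU grep. `g2_hits` is a hypothetical name, and it reads stdin instead of scanning the repository paths the step uses.

```shell
# g2_hits: read candidate source lines on stdin and print the ones that any
# of the three G-2 patterns (copied from the CI step) would flag.
g2_hits() {
  grep -E \
    -e 'json:"api_key_hash[",]' \
    -e '^\s*api_key_hash\??\s*:' \
    -e '^\s*-\s*api_key_hash\s*$' \
    || true
}
```

A Go tag such as `json:"api_key_hash,omitempty"` is caught by the first pattern because the character class accepts either a closing quote or a comma. The redacted form `json:"-"` and a repository-layer mention inside a trailing comment pass through unflagged.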
|
||||
|
||||
- name: Forbidden plaintext HEALTHCHECK regression guard (U-2)
|
||||
# U-2 closed cat-u-healthcheck_protocol_mismatch by switching the
|
||||
# published image's HEALTHCHECK from `curl -f http://localhost:
|
||||
# 8443/health` (always failed against the HTTPS-only listener) to
|
||||
# `curl -fsk https://localhost:8443/health`. This step grep-fails
|
||||
# the build if any Dockerfile in the repo carries the pre-U-2
|
||||
# plaintext shape — either explicitly (`http://localhost:8443/
|
||||
# health` in a HEALTHCHECK) or via the looser pattern of any
|
||||
# HEALTHCHECK that targets `http://` against the certctl server
|
||||
# port.
|
||||
#
|
||||
# Comment lines and the docs/upgrade-to-tls.md:182 expected-to-
|
||||
# fail invariant ("plaintext is gone, expect Connection refused")
|
||||
# are intentionally exempt — we DO want the upgrade-doc string
|
||||
# `http://localhost:8443/health` to remain there, since it
|
||||
# documents what operators should test for to confirm plaintext
|
||||
# is dead. The guardrail is scoped to Dockerfile* only, so docs
|
||||
# are out of its reach.
|
||||
#
|
||||
# See coverage-gap-audit-2026-04-24-v5/unified-audit.md
|
||||
# cat-u-healthcheck_protocol_mismatch for the closure rationale,
|
||||
# or deploy/test/healthcheck_test.go for the binary-image
|
||||
# contract the runtime test pins.
|
||||
run: |
|
||||
set -e
|
||||
|
||||
# Patterns that catch the actual regression shapes:
|
||||
# - HEALTHCHECK directive carrying any http:// (even if the
|
||||
# port differs, no plaintext probe should ship).
|
||||
# - The exact pre-U-2 string for grep-friendliness.
|
||||
BAD=$(grep -rnEH \
|
||||
-e 'HEALTHCHECK.*http://' \
|
||||
-e 'curl[^|&;]*-f[^|&;]*http://localhost:8443/health' \
|
||||
Dockerfile Dockerfile.agent Dockerfile.* 2>/dev/null \
|
||||
| grep -vE '^\s*[^:]+:[0-9]+:\s*#' \
|
||||
|| true)
|
||||
if [ -n "$BAD" ]; then
|
||||
echo "U-2 regression: plaintext HEALTHCHECK reappeared in a Dockerfile:"
|
||||
echo "$BAD"
|
||||
echo ""
|
||||
echo "Allowed: HTTPS HEALTHCHECK with -k (acceptable for"
|
||||
echo "localhost-to-localhost), or non-HTTP probe shapes"
|
||||
echo "(pgrep, /proc check). See Dockerfile / Dockerfile.agent"
|
||||
echo "for the post-U-2 reference shape and"
|
||||
echo "coverage-gap-audit-2026-04-24-v5/unified-audit.md"
|
||||
echo "cat-u-healthcheck_protocol_mismatch for rationale."
|
||||
exit 1
|
||||
fi
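The two U-2 patterns cover different Dockerfile layouts, which is easiest to see by feeding sample lines through them. This sketch uses a hypothetical function, `u2_hits`, that reads stdin and omits the comment filter. The sample lines in the note below are invented for illustration.

```shell
# u2_hits: read Dockerfile lines on stdin and print those matched by either
# U-2 pattern (both copied from the CI step).
u2_hits() {
  grep -E \
    -e 'HEALTHCHECK.*http://' \
    -e 'curl[^|&;]*-f[^|&;]*http://localhost:8443/health' \
    || true
}
```

A single-line `HEALTHCHECK CMD curl -f http://localhost:8443/health` matches both patterns. When the directive is split with a backslash, the continuation line carries no `HEALTHCHECK` keyword, so only the second pattern can catch it. A continuation-line probe that does not use `curl -f` against that exact URL matches neither pattern.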
|
||||
|
||||
- name: Forbidden migration mount in compose initdb (U-3)
|
||||
# U-3 closed cat-u-seed_initdb_schema_drift (GitHub #10) by
|
||||
# eliminating the dual-source-of-truth between
|
||||
# `migrations/*.up.sql` mounted into postgres
|
||||
# `/docker-entrypoint-initdb.d/` and the same files re-applied at
|
||||
# runtime by `RunMigrations`. Pre-U-3 every new migration that
|
||||
# the seed depended on (000013 added `policy_rules.severity`,
|
||||
# 000017 renamed `retry_interval_seconds`, etc.) had to be added
|
||||
# by hand to the compose mount list; missing the update crashed
|
||||
# initdb on first boot, postgres flagged unhealthy, and the
|
||||
# whole stack failed to start from a fresh clone. Post-U-3 the
|
||||
# server is the single source of truth — `RunMigrations` +
|
||||
# `RunSeed` apply everything at boot.
|
||||
#
|
||||
# This step grep-fails the build if any compose file under
|
||||
# `deploy/` re-introduces a `migrations/.*\.sql` mount into
|
||||
# `/docker-entrypoint-initdb.d`. Comments are exempt so the
|
||||
# post-fix rationale block in the compose files (which
|
||||
# documents WHY the mounts were removed) doesn't trip the guard.
|
||||
# The demo overlay's `seed_demo.sql` is the explicit exception:
|
||||
# it is tolerated only when it lives behind the
|
||||
# CERTCTL_DEMO_SEED env var (post-U-3 demo path) — bare initdb
|
||||
# mounts are NOT tolerated. The grep matches all compose
|
||||
# mount-list shapes (`-` indented, `volumes:` indented, both),
|
||||
# so any future drift surfaces here before the operator hits it
|
||||
# on a fresh clone.
|
||||
#
|
||||
# See coverage-gap-audit-2026-04-24-v5/unified-audit.md
|
||||
# cat-u-seed_initdb_schema_drift for the closure rationale, or
|
||||
# internal/repository/postgres/db.go::RunSeed for the runtime
|
||||
# contract.
|
||||
run: |
|
||||
set -e
|
||||
|
||||
BAD=$(grep -rnEH \
|
||||
-e 'migrations/.*\.sql:.*docker-entrypoint-initdb' \
|
||||
-e 'seed.*\.sql:.*docker-entrypoint-initdb' \
|
||||
deploy/docker-compose.yml \
|
||||
deploy/docker-compose.test.yml \
|
||||
deploy/docker-compose.demo.yml \
|
||||
2>/dev/null \
|
||||
| grep -vE '^\s*[^:]+:[0-9]+:\s*#' \
|
||||
|| true)
|
||||
if [ -n "$BAD" ]; then
|
||||
echo "U-3 regression: migration/seed mount into postgres initdb reappeared:"
|
||||
echo "$BAD"
|
||||
echo ""
|
||||
echo "The post-U-3 contract is: postgres comes up with an empty"
|
||||
echo "schema and the server applies migrations + seed at boot via"
|
||||
echo "internal/repository/postgres.RunMigrations + RunSeed. Demo"
|
||||
echo "data lives behind CERTCTL_DEMO_SEED=true (RunDemoSeed),"
|
||||
echo "not an initdb mount. See"
|
||||
echo "coverage-gap-audit-2026-04-24-v5/unified-audit.md"
|
||||
echo "cat-u-seed_initdb_schema_drift for the closure rationale."
|
||||
exit 1
|
||||
fi
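The U-3 patterns key on the `source.sql:target` shape of a compose volume entry. The sketch below pipes sample volume lines through both patterns. `u3_hits` is a hypothetical name and the mount paths used with it are invented.

```shell
# u3_hits: read compose-file lines on stdin and print those that mount a
# migration or seed .sql file into postgres' initdb directory.
# Both patterns are copied from the CI step; the comment filter is omitted.
u3_hits() {
  grep -E \
    -e 'migrations/.*\.sql:.*docker-entrypoint-initdb' \
    -e 'seed.*\.sql:.*docker-entrypoint-initdb' \
    || true
}
```

A named data volume such as `pgdata:/var/lib/postgresql/data` passes, while any `.sql` file under `migrations/` or with `seed` in its name is reported once it targets `/docker-entrypoint-initdb.d`.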
|
||||
|
||||
- name: Forbidden StatusBadge dead-key + Certificate phantom-field regression guard (D-1)
|
||||
# D-1 master closed cat-d-359e92c20cbf (Agent: 'Stale' dead key,
|
||||
# 'Degraded' missing), cat-d-9f4c8e4a91f1 (Notification: 'dead'
|
||||
# missing), cat-d-1447e04732e7 (Cert: 'PendingIssuance' dead
|
||||
# key), cat-f-cert_detail_page_key_render_fallback (render-site
|
||||
# uses cert.X directly), and cat-f-ae0d06b6588f (Certificate
|
||||
# TS phantom fields). This step grep-fails the build if either
|
||||
# half of the closure is reverted:
|
||||
#
|
||||
# 1. The dead StatusBadge keys ('Stale' for Agent, 'PendingIssuance'
|
||||
# for Cert) reappearing as map literals, OR
|
||||
# 2. The five phantom Certificate TS fields (serial_number,
|
||||
# fingerprint_sha256, key_algorithm, key_size, issued_at)
|
||||
# reappearing on the `Certificate` interface in types.ts
|
||||
# (CertificateVersion legitimately carries them and is
|
||||
# explicitly excluded by the awk pre-filter below).
|
||||
#
|
||||
# Comments are exempt so the closure prose in StatusBadge.tsx +
|
||||
# types.ts can stay. Test files are exempt so negative tests
|
||||
# asserting the dead keys fall through to neutral keep working.
|
||||
#
|
||||
# See coverage-gap-audit-2026-04-24-v5/unified-audit.md
|
||||
# cat-d-* / cat-f-* for the closure rationale, or
|
||||
# web/src/components/StatusBadge.test.tsx for the live
|
||||
# enum-coverage contract.
|
||||
run: |
|
||||
set -e
|
||||
|
||||
BAD_BADGE=$(grep -nE "^\s*(Stale|PendingIssuance)\s*:\s*'badge-" \
|
||||
web/src/components/StatusBadge.tsx 2>/dev/null \
|
||||
| grep -v '\.test\.' \
|
||||
| grep -vE '^\s*[^:]+:[0-9]+:\s*//' \
|
||||
|| true)
|
||||
if [ -n "$BAD_BADGE" ]; then
|
||||
echo "D-1 regression: dead StatusBadge key reappeared:"
|
||||
echo "$BAD_BADGE"
|
||||
echo ""
|
||||
echo "Allowed surface: comment lines naming the removed key in"
|
||||
echo "the file's preamble. The Go-side AgentStatus values are"
|
||||
echo "Online/Offline/Degraded (no Stale); CertificateStatus values"
|
||||
echo "are Pending/Active/... (no PendingIssuance). See"
|
||||
echo "web/src/components/StatusBadge.test.tsx for the contract."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Certificate TS phantom-field check. Scoped to the
|
||||
# `export interface Certificate {` block in web/src/api/types.ts
|
||||
# — CertificateVersion legitimately declares these fields and
|
||||
# must NOT trip the guardrail. The awk window opens on the
|
||||
# exact `Certificate {` header (not `CertificateVersion {`,
|
||||
# not `CertificateProfile {`) and closes at the first `}`,
|
||||
# then the grep matches a phantom-field declaration anywhere
|
||||
# in that window.
|
||||
BAD_TS=$(awk '
|
||||
/^export interface Certificate \{/ { flag=1; next }
|
||||
flag && /^\}/ { flag=0 }
|
||||
flag { print FILENAME":"NR":"$0 }
|
||||
' web/src/api/types.ts \
|
||||
| grep -E '\b(serial_number|fingerprint_sha256|key_algorithm|key_size|issued_at)\??\s*:' \
|
||||
|| true)
|
||||
if [ -n "$BAD_TS" ]; then
|
||||
echo "D-1 regression: Certificate TS interface re-added a phantom field:"
|
||||
echo "$BAD_TS"
|
||||
echo ""
|
||||
echo "These fields live on CertificateVersion, not ManagedCertificate."
|
||||
echo "The Go-side ManagedCertificate has never carried them; the"
|
||||
echo "TS optional declarations were silently undefined on every"
|
||||
echo "list response. Render-site consumers (e.g. CertificateDetailPage)"
|
||||
echo "use latestVersion?.field as the canonical access path."
|
||||
echo "See coverage-gap-audit-2026-04-24-v5/unified-audit.md"
|
||||
echo "cat-f-ae0d06b6588f for the closure rationale."
|
||||
exit 1
|
||||
fi
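The awk window is what keeps `CertificateVersion` out of the phantom-field check, and it can be exercised on a small stand-in for `types.ts`. The sketch below wraps the step's awk program and grep in two hypothetical functions, `certificate_block` and `phantom_fields`. It assumes GNU grep for `\b` and `\s`.

```shell
# certificate_block FILE: print "file:line:content" for every line inside the
# `export interface Certificate {` block, using the awk program from the step.
certificate_block() {
  awk '
    /^export interface Certificate \{/ { flag=1; next }
    flag && /^\}/ { flag=0 }
    flag { print FILENAME":"NR":"$0 }
  ' "$1"
}

# phantom_fields FILE: print the lines in that block that declare one of the
# five phantom fields.
phantom_fields() {
  certificate_block "$1" \
    | grep -E '\b(serial_number|fingerprint_sha256|key_algorithm|key_size|issued_at)\??\s*:' \
    || true
}
```

Because the header pattern requires a space and `{` directly after `Certificate`, interfaces whose names merely start with `Certificate` never open the window. The window closes at the first line beginning with `}`, so the approach assumes the interface body has no unindented closing brace of its own.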
|
||||
|
||||
- name: Race Detection
|
||||
run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/crypto/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s
|
||||
|
||||
- name: Go Test with Coverage
|
||||
run: |
|
||||
go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/mcp/... ./internal/cli/... -count=1 -cover -coverprofile=coverage.out
|
||||
go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
|
||||
|
||||
- name: Check Coverage Thresholds
|
||||
run: |
|
||||
@@ -41,7 +357,7 @@ jobs:
|
||||
echo "=== Coverage Report ==="
|
||||
go tool cover -func=coverage.out | tail -1
|
||||
|
||||
# Check service layer coverage (target: 70%+)
|
||||
# Check service layer coverage (target: 60%+)
|
||||
SERVICE_COV=$(go tool cover -func=coverage.out | grep 'internal/service' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
||||
echo "Service layer coverage: ${SERVICE_COV}%"
|
||||
|
||||
@@ -49,13 +365,40 @@ jobs:
|
||||
HANDLER_COV=$(go tool cover -func=coverage.out | grep 'internal/api/handler' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
||||
echo "Handler layer coverage: ${HANDLER_COV}%"
|
||||
|
||||
# Check domain layer coverage (target: 40%+)
|
||||
DOMAIN_COV=$(go tool cover -func=coverage.out | grep 'internal/domain' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
||||
echo "Domain layer coverage: ${DOMAIN_COV}%"
|
||||
|
||||
# Check middleware layer coverage (target: 50%+)
|
||||
MIDDLEWARE_COV=$(go tool cover -func=coverage.out | grep 'internal/api/middleware' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
||||
echo "Middleware layer coverage: ${MIDDLEWARE_COV}%"
|
||||
|
||||
# Check crypto package coverage (target: 85%+)
|
||||
# M-8 rationale: encryption primitives are a security-critical gate.
|
||||
# v2 format, key-derivation, fallback, and fail-closed sentinel paths
|
||||
# all need exhaustive coverage to avoid silent regressions (CWE-916 / CWE-329).
|
||||
CRYPTO_COV=$(go tool cover -func=coverage.out | grep 'internal/crypto' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
||||
echo "Crypto package coverage: ${CRYPTO_COV}%"
|
||||
|
||||
# Fail if thresholds not met
|
||||
if [ "$(echo "$SERVICE_COV < 30" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Service layer coverage ${SERVICE_COV}% is below 30% threshold"
|
||||
if [ "$(echo "$SERVICE_COV < 55" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Service layer coverage ${SERVICE_COV}% is below 55% threshold"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$(echo "$HANDLER_COV < 50" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Handler layer coverage ${HANDLER_COV}% is below 50% threshold"
|
||||
if [ "$(echo "$HANDLER_COV < 60" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Handler layer coverage ${HANDLER_COV}% is below 60% threshold"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
|
||||
exit 1
|
||||
fi
|
||||
if [ "$(echo "$CRYPTO_COV < 85" | bc -l)" -eq 1 ]; then
|
||||
echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 85% threshold"
|
||||
exit 1
|
||||
fi
|
||||
echo "Coverage thresholds passed!"
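Each per-layer figure above is the plain mean of the per-function percentages that `go tool cover -func` prints for matching paths. It is not a statement-weighted total. The sketch below isolates that pipeline in a hypothetical function, `layer_cov`. The comparison helper `below` uses awk instead of the step's `bc -l`, which is a substitution made only so the sketch has no dependency on `bc`.

```shell
# layer_cov PATTERN: read `go tool cover -func` output on stdin and print the
# unweighted mean of the percentages on lines matching PATTERN ("0" if none).
layer_cov() {
  grep "$1" | awk '{print $NF}' | sed 's/%//' \
    | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}'
}

# below VALUE FLOOR: exit 0 when VALUE is numerically less than FLOOR.
# awk stands in for the workflow's `bc -l` comparison here.
below() {
  awk -v v="$1" -v f="$2" 'BEGIN { exit !((v + 0) < (f + 0)) }'
}
```

With two functions at 80.0% and 40.0% in the same layer, `layer_cov` reports 60.0 regardless of how many statements each function contains. A layer with no matching lines reports 0 and therefore fails any non-zero floor.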
|
||||
@@ -93,3 +436,46 @@ jobs:
|
||||
- name: Build Frontend
|
||||
working-directory: web
|
||||
run: npx vite build
|
||||
|
||||
helm-lint:
|
||||
name: Helm Chart Validation
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Install Helm
|
||||
uses: azure/setup-helm@v4
|
||||
with:
|
||||
version: '3.13.0'
|
||||
|
||||
# HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
|
||||
# configured. Every lint/template invocation below must pick exactly one
|
||||
# provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
|
||||
# (certctl.tls.required) and docs/tls.md.
|
||||
- name: Lint Helm Chart
|
||||
run: |
|
||||
helm lint deploy/helm/certctl/ \
|
||||
--set server.tls.existingSecret=certctl-tls-ci
|
||||
|
||||
- name: Template Helm Chart (existingSecret mode)
|
||||
run: |
|
||||
helm template certctl deploy/helm/certctl/ \
|
||||
--set server.tls.existingSecret=certctl-tls-ci \
|
||||
> /dev/null
|
||||
|
||||
- name: Template Helm Chart (cert-manager mode)
|
||||
run: |
|
||||
helm template certctl deploy/helm/certctl/ \
|
||||
--set server.tls.certManager.enabled=true \
|
||||
--set server.tls.certManager.issuerRef.name=letsencrypt-prod \
|
||||
> /dev/null
|
||||
|
||||
- name: Template Helm Chart (guard fails without TLS)
|
||||
run: |
|
||||
# Inverse test: the chart MUST refuse to render when no TLS source is
|
||||
# configured. If this ever renders successfully, the fail-loud guard
|
||||
# in certctl.tls.required has regressed.
|
||||
if helm template certctl deploy/helm/certctl/ > /dev/null 2>&1; then
|
||||
echo "::error::Helm chart rendered without a TLS source — fail-loud guard regressed"
|
||||
exit 1
|
||||
fi
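The inverse test generalises to any command that is required to fail. A minimal sketch, using a hypothetical wrapper named `expect_failure` that is not part of the workflow:

```shell
# expect_failure CMD [ARGS...]: succeed only when CMD exits non-zero.
# Output of CMD is discarded, mirroring the `> /dev/null 2>&1` in the step.
expect_failure() {
  if "$@" > /dev/null 2>&1; then
    echo "::error::command unexpectedly succeeded: $*"
    return 1
  fi
}
```

In the step above this would read `expect_failure helm template certctl deploy/helm/certctl/`. Like the original, it treats every non-zero exit as a pass, so a chart that fails to render for an unrelated reason also satisfies the check.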
|
||||
|
||||
@@ -7,14 +7,208 @@ on:
|
||||
|
||||
env:
|
||||
REGISTRY: ghcr.io
|
||||
# Keep in lock-step with .github/workflows/ci.yml (M-3).
|
||||
GO_VERSION: '1.25.9'
|
||||
IMAGE_NAMESPACE: shankar0123
|
||||
|
||||
jobs:
|
||||
build-and-push:
|
||||
# ----------------------------------------------------------------------
|
||||
# build-binaries (M-3): matrix build every (binary × OS × arch) tuple.
|
||||
# For each tuple we produce: the binary, an SPDX-JSON SBOM, a keyless
|
||||
# Cosign signature + certificate bundle, and a single-line sha256sum
|
||||
# file. All artefacts are uploaded to a workflow-scoped artifact; the
|
||||
# aggregate-checksums job fans them back in for release upload.
|
||||
# ----------------------------------------------------------------------
|
||||
build-binaries:
|
||||
name: Build ${{ matrix.binary }} (${{ matrix.os }}/${{ matrix.arch }})
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: read
|
||||
id-token: write # Cosign keyless OIDC identity token
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
binary: [agent, server, cli, mcp-server]
|
||||
os: [linux, darwin]
|
||||
arch: [amd64, arm64]
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Set up Go
|
||||
uses: actions/setup-go@v5
|
||||
with:
|
||||
go-version: ${{ env.GO_VERSION }}
|
||||
|
||||
- name: Extract version from tag
|
||||
id: version
|
||||
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Build binary
|
||||
id: build
|
||||
env:
|
||||
GOOS: ${{ matrix.os }}
|
||||
GOARCH: ${{ matrix.arch }}
|
||||
CGO_ENABLED: '0'
|
||||
VERSION: ${{ steps.version.outputs.VERSION }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
OUTPUT_NAME="certctl-${{ matrix.binary }}-${{ matrix.os }}-${{ matrix.arch }}"
|
||||
mkdir -p dist
|
||||
go build \
|
||||
-trimpath \
|
||||
-ldflags="-w -s -X main.Version=${VERSION}" \
|
||||
-o "dist/${OUTPUT_NAME}" \
|
||||
"./cmd/${{ matrix.binary }}"
|
||||
ls -lh "dist/${OUTPUT_NAME}"
|
||||
echo "output_name=${OUTPUT_NAME}" >> "$GITHUB_OUTPUT"
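The version string and artefact name are both derived with plain string operations. The sketch below restates them as two hypothetical functions, `version_from_ref` and `artifact_name`, so the behaviour on different refs is easy to check.

```shell
# version_from_ref REF: strip a leading "refs/tags/" exactly as
# ${GITHUB_REF#refs/tags/} does; any other ref is returned unchanged.
version_from_ref() {
  printf '%s\n' "${1#refs/tags/}"
}

# artifact_name BINARY OS ARCH: build the dist/ file name used by the matrix.
artifact_name() {
  printf 'certctl-%s-%s-%s\n' "$1" "$2" "$3"
}
```

The prefix removal is a no-op when the ref is not a tag, so a branch ref would be embedded verbatim in `main.Version`. The workflow's trigger section is not visible in this hunk, so whether non-tag refs can reach the step is not assumed here.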
|
||||
|
||||
- name: Generate SBOM (SPDX-JSON)
|
||||
uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
|
||||
with:
|
||||
file: dist/${{ steps.build.outputs.output_name }}
|
||||
format: spdx-json
|
||||
output-file: dist/${{ steps.build.outputs.output_name }}.sbom.spdx.json
|
||||
upload-artifact: false
|
||||
upload-release-assets: false
|
||||
|
||||
- name: Install Cosign
|
||||
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
|
||||
|
||||
- name: Keyless-sign binary with Cosign
|
||||
env:
|
||||
OUTPUT_NAME: ${{ steps.build.outputs.output_name }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
# Cosign v3.0 (shipped by cosign-installer@v4.1.1 default
|
||||
# cosign-release=v3.0.5) removed --output-signature/--output-certificate
|
||||
# on sign-blob. The replacement is --bundle, which emits a unified
|
||||
# Sigstore bundle (signature + cert chain + Rekor inclusion proof) as
|
||||
# a single .sigstore.json artefact. M-11.
|
||||
cosign sign-blob \
|
||||
--yes \
|
||||
--bundle "dist/${OUTPUT_NAME}.sigstore.json" \
|
||||
"dist/${OUTPUT_NAME}"
|
||||
|
||||
- name: Compute SHA-256 sidecar
|
||||
env:
|
||||
OUTPUT_NAME: ${{ steps.build.outputs.output_name }}
|
||||
run: |
|
||||
set -euo pipefail
|
||||
cd dist
|
||||
sha256sum "${OUTPUT_NAME}" > "${OUTPUT_NAME}.sha256"
|
||||
cat "${OUTPUT_NAME}.sha256"
|
||||
|
||||
- name: Upload build artefacts
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: binary-${{ steps.build.outputs.output_name }}
|
||||
path: |
|
||||
dist/${{ steps.build.outputs.output_name }}
|
||||
dist/${{ steps.build.outputs.output_name }}.sigstore.json
|
||||
dist/${{ steps.build.outputs.output_name }}.sbom.spdx.json
|
||||
dist/${{ steps.build.outputs.output_name }}.sha256
|
||||
if-no-files-found: error
|
||||
retention-days: 7
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# aggregate-checksums (M-3): fan in every matrix artefact, produce a
|
||||
# single checksums.txt (sha256sum format, compatible with `sha256sum
|
||||
# -c`), sign it with Cosign, upload everything to the GitHub Release,
|
||||
# and emit a base64-encoded hash manifest for the SLSA generator.
|
||||
# ----------------------------------------------------------------------
|
||||
aggregate-checksums:
|
||||
name: Aggregate checksums & sign
|
||||
runs-on: ubuntu-latest
|
||||
needs: [build-binaries]
|
||||
permissions:
|
||||
contents: write
|
||||
id-token: write # Cosign keyless OIDC identity token
|
||||
outputs:
|
||||
hashes: ${{ steps.hashes.outputs.hashes }}
|
||||
steps:
|
||||
- name: Download binary artefacts
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
pattern: binary-*
|
||||
path: artifacts
|
||||
merge-multiple: true
|
||||
|
||||
- name: Aggregate SHA-256 sums
|
||||
id: hashes
|
||||
run: |
|
||||
set -euo pipefail
|
||||
cd artifacts
|
||||
: > checksums.txt
|
||||
for f in certctl-*; do
|
||||
case "$f" in
|
||||
*.sigstore.json|*.sbom.spdx.json|*.sha256|checksums.txt)
|
||||
continue ;;
|
||||
esac
|
||||
sha256sum "$f" >> checksums.txt
|
||||
done
|
||||
echo "=== checksums.txt ==="
|
||||
cat checksums.txt
|
||||
# base64 hashes (single line, no wrapping) for SLSA generator.
|
||||
HASHES=$(base64 -w0 < checksums.txt)
|
||||
echo "hashes=${HASHES}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Install Cosign
|
||||
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
|
||||
|
||||
- name: Keyless-sign checksums.txt
|
||||
run: |
|
||||
set -euo pipefail
|
||||
cd artifacts
|
||||
# Cosign v3.0 --bundle replaces the removed v2 flag pair
|
||||
# --output-signature / --output-certificate. See M-11.
|
||||
cosign sign-blob \
|
||||
--yes \
|
||||
--bundle checksums.txt.sigstore.json \
|
||||
checksums.txt
|
||||
|
||||
- name: Upload artefacts to GitHub Release
|
||||
uses: softprops/action-gh-release@v2
|
||||
if: startsWith(github.ref, 'refs/tags/')
|
||||
with:
|
||||
files: |
|
||||
artifacts/certctl-*
|
||||
artifacts/checksums.txt
|
||||
artifacts/checksums.txt.sigstore.json
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# provenance-binaries (M-3): SLSA Level 3 provenance for every binary.
|
||||
# The SLSA generic generator reusable workflow runs in a hermetic
|
||||
# workflow run, producing multiple.intoto.jsonl from the base64 hash
|
||||
# manifest and uploading it as a release asset.
|
||||
# ----------------------------------------------------------------------
|
||||
provenance-binaries:
|
||||
name: SLSA provenance (binaries)
|
||||
needs: [aggregate-checksums]
|
||||
permissions:
|
||||
actions: read
|
||||
id-token: write
|
||||
contents: write
|
||||
uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0
|
||||
with:
|
||||
base64-subjects: "${{ needs.aggregate-checksums.outputs.hashes }}"
|
||||
upload-assets: true
|
||||
provenance-name: multiple.intoto.jsonl
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# build-and-push-docker: push container images to GHCR with native
|
||||
# SLSA L3 provenance (mode=max) and SBOM attestations emitted by
|
||||
# docker/build-push-action@v6, plus a keyless Cosign signature on the
|
||||
# image digest for identity-bound verification. The M-4 proxy-propagation
|
||||
# build-args block is retained verbatim — M-3 only adds supply-chain
|
||||
# steps; it never touches M-4 wiring.
|
||||
# ----------------------------------------------------------------------
|
||||
build-and-push-docker:
|
||||
name: Build & Push Docker Images
|
||||
runs-on: ubuntu-latest
|
||||
permissions:
|
||||
contents: write
|
||||
packages: write
|
||||
id-token: write # Cosign keyless OIDC identity token
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
@@ -28,48 +222,146 @@ jobs:
|
||||
|
||||
- name: Extract version from tag
|
||||
id: version
|
||||
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
|
||||
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Set up Docker Buildx
|
||||
uses: docker/setup-buildx-action@v3
|
||||
|
||||
- name: Install Cosign
|
||||
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
|
||||
|
||||
- name: Build and push server image
|
||||
id: server-push
|
||||
uses: docker/build-push-action@v6
|
||||
with:
|
||||
context: .
|
||||
file: ./Dockerfile
|
||||
push: true
|
||||
tags: |
|
||||
${{ env.REGISTRY }}/shankar0123/certctl-server:${{ steps.version.outputs.VERSION }}
|
||||
${{ env.REGISTRY }}/shankar0123/certctl-server:latest
|
||||
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server:${{ steps.version.outputs.VERSION }}
|
||||
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server:latest
|
||||
# Proxy propagation (M-4, Issue #9) — forwards runner-level proxy
|
||||
# secrets into the Docker build so self-hosted runners behind
|
||||
# corporate proxies can reach public registries. GitHub-hosted
|
||||
# runners don't need proxies, so the secrets are optional and
|
||||
# resolve to empty strings when unset — byte-identical to the
|
||||
# pre-fix behaviour for the public-runner path.
|
||||
build-args: |
|
||||
HTTP_PROXY=${{ secrets.HTTP_PROXY }}
|
||||
HTTPS_PROXY=${{ secrets.HTTPS_PROXY }}
|
||||
NO_PROXY=${{ secrets.NO_PROXY }}
|
||||
# Supply-chain hardening (M-3): emit native SLSA L3 provenance
|
||||
# and SBOM attestations bound to the image manifest.
|
||||
provenance: mode=max
|
||||
sbom: true
|
||||
cache-from: type=gha
|
||||
cache-to: type=gha,mode=max
|
||||
|
||||
- name: Keyless-sign server image with Cosign
|
||||
env:
|
||||
DIGEST: ${{ steps.server-push.outputs.digest }}
|
||||
IMAGE: ${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server
|
||||
run: |
|
||||
set -euo pipefail
|
||||
cosign sign --yes "${IMAGE}@${DIGEST}"
|
||||
|
||||
- name: Build and push agent image
|
||||
id: agent-push
|
||||
uses: docker/build-push-action@v6
|
||||
with:
|
||||
context: .
|
||||
file: ./Dockerfile.agent
|
||||
push: true
|
||||
tags: |
|
||||
${{ env.REGISTRY }}/shankar0123/certctl-agent:${{ steps.version.outputs.VERSION }}
|
||||
${{ env.REGISTRY }}/shankar0123/certctl-agent:latest
|
||||
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent:${{ steps.version.outputs.VERSION }}
|
||||
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent:latest
|
||||
# Proxy propagation (M-4, Issue #9) — see server-image step for
|
||||
# rationale. Empty secrets resolve to empty build args, leaving
|
||||
# the un-proxied code path byte-identical to the pre-fix tree.
|
||||
build-args: |
|
||||
HTTP_PROXY=${{ secrets.HTTP_PROXY }}
|
||||
HTTPS_PROXY=${{ secrets.HTTPS_PROXY }}
|
||||
NO_PROXY=${{ secrets.NO_PROXY }}
|
||||
# Supply-chain hardening (M-3): emit native SLSA L3 provenance
|
||||
# and SBOM attestations bound to the image manifest.
|
||||
provenance: mode=max
|
||||
sbom: true
|
||||
cache-from: type=gha
|
||||
cache-to: type=gha,mode=max
|
||||
|
||||
- name: Create GitHub Release
|
||||
- name: Keyless-sign agent image with Cosign
|
||||
env:
|
||||
DIGEST: ${{ steps.agent-push.outputs.digest }}
|
||||
IMAGE: ${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent
|
||||
run: |
|
||||
set -euo pipefail
|
||||
cosign sign --yes "${IMAGE}@${DIGEST}"
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# create-release: stamp the release body. The actual asset uploads are
|
||||
# handled by aggregate-checksums (binaries, SBOMs, Sigstore bundles,
|
||||
# checksums.txt + signature) and the SLSA generator (multiple.intoto.jsonl).
|
||||
# ----------------------------------------------------------------------
|
||||
create-release:
|
||||
name: Create Release Notes
|
||||
runs-on: ubuntu-latest
|
||||
needs: [build-binaries, aggregate-checksums, provenance-binaries, build-and-push-docker]
|
||||
permissions:
|
||||
contents: write
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: Extract version from tag
|
||||
id: version
|
||||
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
|
||||
|
||||
- name: Create release with notes
|
||||
uses: softprops/action-gh-release@v2
|
||||
with:
|
||||
generate_release_notes: true
|
||||
body: |
|
||||
## Installation
|
||||
|
||||
### Quick Install (Linux/macOS)
|
||||
|
||||
```bash
|
||||
curl -sSL https://raw.githubusercontent.com/shankar0123/certctl/master/install-agent.sh | bash
|
||||
```
|
||||
|
||||
### Manual Binary Download
|
||||
|
||||
Download the appropriate binary for your OS and architecture:
|
||||
|
||||
- **Linux x86_64**: `certctl-agent-linux-amd64`
|
||||
- **Linux ARM64**: `certctl-agent-linux-arm64`
|
||||
- **macOS x86_64**: `certctl-agent-darwin-amd64`
|
||||
- **macOS ARM64 (Apple Silicon)**: `certctl-agent-darwin-arm64`
|
||||
|
||||
Then make it executable and start the service:
|
||||
|
||||
```bash
|
||||
chmod +x certctl-agent-linux-amd64
|
||||
sudo mv certctl-agent-linux-amd64 /usr/local/bin/certctl-agent
|
||||
```
|
||||
|
||||
## Docker Images
|
||||
|
||||
Pull pre-built Docker images for server and agent:
|
||||
|
||||
```bash
|
||||
docker pull ghcr.io/shankar0123/certctl-server:${{ steps.version.outputs.VERSION }}
|
||||
docker pull ghcr.io/shankar0123/certctl-agent:${{ steps.version.outputs.VERSION }}
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
Or use the latest tag:
|
||||
|
||||
```bash
|
||||
docker pull ghcr.io/shankar0123/certctl-server:latest
|
||||
docker pull ghcr.io/shankar0123/certctl-agent:latest
|
||||
```
|
||||
|
||||
## Docker Compose Quick Start
|
||||
|
||||
```bash
|
||||
git clone https://github.com/shankar0123/certctl.git
|
||||
@@ -77,3 +369,92 @@ jobs:
|
||||
cp deploy/.env.example deploy/.env
|
||||
docker compose -f deploy/docker-compose.yml up -d
|
||||
```
|
||||
|
||||
## Server Binaries
|
||||
|
||||
Pre-compiled server binaries are also available for direct installation:
|
||||
|
||||
- **Linux x86_64**: `certctl-server-linux-amd64`
|
||||
- **Linux ARM64**: `certctl-server-linux-arm64`
|
||||
- **macOS x86_64**: `certctl-server-darwin-amd64`
|
||||
- **macOS ARM64 (Apple Silicon)**: `certctl-server-darwin-arm64`
|
||||
|
||||
## CLI & MCP Server Binaries
|
||||
|
||||
The `certctl-cli` (REST API wrapper) and `certctl-mcp-server` (Model Context
|
||||
Protocol bridge) binaries ship for all four platforms as well:
|
||||
|
||||
- `certctl-cli-{linux,darwin}-{amd64,arm64}`
|
||||
- `certctl-mcp-server-{linux,darwin}-{amd64,arm64}`
|
||||
|
||||
## Verifying this release
|
||||
|
||||
Every binary, `checksums.txt`, and container image is signed with Cosign
|
||||
keyless OIDC. Each binary ships with a SPDX-JSON SBOM. Binaries are covered
|
||||
by SLSA Level 3 provenance; container images carry native SLSA L3 provenance
|
||||
and SBOM attestations (docker/build-push-action `provenance: mode=max`,
|
||||
`sbom: true`) in addition to a Cosign signature on the digest.
|
||||
|
||||
**1. Verify SHA-256 checksums:**
|
||||
|
||||
```bash
|
||||
sha256sum -c checksums.txt
|
||||
```
|
||||
|
||||
**2. Verify the Cosign signature on checksums.txt (keyless OIDC):**
|
||||
|
||||
```bash
|
||||
cosign verify-blob \
|
||||
--bundle checksums.txt.sigstore.json \
|
||||
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
checksums.txt
|
||||
```
|
||||
|
||||
Replace `checksums.txt` with any individual binary name to verify that
|
||||
artefact directly (each binary ships with its own `.sigstore.json`
|
||||
bundle, e.g. `cosign verify-blob --bundle certctl-agent-linux-amd64.sigstore.json …`).
|
||||
|
||||
**3. Verify SLSA Level 3 provenance (binaries):**
|
||||
|
||||
```bash
|
||||
slsa-verifier verify-artifact \
|
||||
--provenance-path multiple.intoto.jsonl \
|
||||
--source-uri github.com/shankar0123/certctl \
|
||||
--source-tag ${{ steps.version.outputs.VERSION }} \
|
||||
certctl-agent-linux-amd64
|
||||
```
|
||||
|
||||
**4. Verify container image signature and attestations:**
|
||||
|
||||
```bash
|
||||
IMAGE=ghcr.io/shankar0123/certctl-server:${{ steps.version.outputs.VERSION }}
|
||||
cosign verify \
|
||||
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SBOM attestation (SPDX-JSON) emitted by docker/build-push-action
|
||||
cosign verify-attestation --type spdxjson \
|
||||
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
|
||||
# SLSA provenance attestation (mode=max)
|
||||
cosign verify-attestation --type slsaprovenance \
|
||||
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
"$IMAGE"
|
||||
```
|
||||
|
||||
## Helm Chart
|
||||
|
||||
Deploy certctl to Kubernetes using Helm:
|
||||
|
||||
```bash
|
||||
helm repo add certctl https://github.com/shankar0123/certctl/tree/master/deploy/helm
|
||||
helm repo update
|
||||
helm install certctl certctl/certctl
|
||||
```
|
||||
|
||||
See `deploy/helm/certctl/` for values customization.
|
||||
|
||||
@@ -43,6 +43,11 @@ vendor/
|
||||
tmp/
|
||||
temp/
|
||||
*.log
|
||||
*.bak
|
||||
|
||||
# Private keys (agent-generated, never commit)
|
||||
cmd/agent/*.key
|
||||
cmd/agent/*.pem
|
||||
|
||||
# Database
|
||||
*.db
|
||||
@@ -57,11 +62,29 @@ certctl-agent
|
||||
certctl-cli
|
||||
/server
|
||||
/agent
|
||||
/cli
|
||||
/mcp-server
|
||||
|
||||
# Private strategy docs
|
||||
roadmap.md
|
||||
SECURITY_REMEDIATION.md
|
||||
|
||||
# OS
|
||||
.DS_Store
|
||||
Thumbs.db
|
||||
mcp-server
|
||||
|
||||
# Local Go build/module caches (session-scoped, never committed)
|
||||
/.gocache/
|
||||
/.gomodcache/
|
||||
/.gopath/
|
||||
/.gomodcache-gopath/
|
||||
|
||||
# Design scratch files (session-scoped)
|
||||
/.i004-design.md
|
||||
/.i005-design.md
|
||||
|
||||
# HTTPS-Everywhere (M-007) Phase 6: the docker-compose.test.yml tls-init
|
||||
# container writes ca.crt / server.crt / server.key into this directory so
|
||||
# the host-side integration_test.go binary can pin the CA via
|
||||
# CERTCTL_TEST_CA_BUNDLE=./certs/ca.crt. Material is regenerated on every
|
||||
# `docker compose up` and never belongs in git.
|
||||
/deploy/test/certs/
|
||||
|
||||
@@ -0,0 +1,38 @@
|
||||
version: "2"
|
||||
|
||||
run:
|
||||
timeout: 5m
|
||||
|
||||
linters:
|
||||
default: none
|
||||
enable:
|
||||
- contextcheck
|
||||
- govet
|
||||
- staticcheck
|
||||
- unused
|
||||
settings:
|
||||
staticcheck:
|
||||
checks:
|
||||
- "all"
|
||||
- "-ST1005" # error strings should not be capitalized (pre-existing style)
|
||||
- "-ST1000" # package comment style (pre-existing)
|
||||
- "-ST1003" # naming convention (pre-existing)
|
||||
- "-ST1016" # method receiver naming (pre-existing)
|
||||
- "-QF1001" # apply De Morgan's law (style suggestion)
|
||||
- "-QF1003" # convert if/else to switch (style suggestion)
|
||||
- "-QF1012" # use fmt.Fprintf (style suggestion)
|
||||
- "-SA1019" # deprecated API usage (elliptic.Marshal — Go hasn't removed it)
|
||||
- "-SA9003" # empty branch (intentional in switch stubs)
|
||||
- "-S1009" # redundant nil check (pre-existing style)
|
||||
- "-S1011" # use single append with spread (pre-existing style)
|
||||
exclusions:
|
||||
max-issues-per-linter: 0
|
||||
max-same-issues: 0
|
||||
|
||||
# Linters temporarily disabled — re-enable incrementally as pre-existing issues are fixed:
|
||||
# - errcheck (50 issues — unchecked error returns throughout codebase)
|
||||
# - gocritic (50 issues — diagnostic/performance suggestions)
|
||||
# - gosec (23 issues — security warnings in test/stub code)
|
||||
# - ineffassign (13 issues — dead assignments)
|
||||
# - noctx (25 issues — http.Get without context)
|
||||
# - bodyclose (response body close missing)
|
||||
@@ -0,0 +1,177 @@
|
||||
# Changelog
|
||||
|
||||
All notable changes to certctl are documented in this file. Dates use ISO 8601. Versions follow [Semantic Versioning](https://semver.org/).
|
||||
|
||||
## [unreleased] — 2026-04-25
|
||||
|
||||
### D-1: StatusBadge enum drift + Certificate phantom fields — closed end-to-end
|
||||
|
||||
> The dashboard silently lied in five places. Agents in the `Degraded` state (the only Go-side AgentStatus that means "needs operator attention") rendered as default neutral grey because StatusBadge mapped `Stale` (a key Go has never emitted) to yellow and let the real `Degraded` value fall through to the dictionary default. Dead-letter notifications (`status: 'dead'`, retries exhausted) rendered as default neutral, visually equated with `read` (operator-acknowledged). The Certificate badge map carried a `PendingIssuance` key that no Go enum value ever emits — dead key, latent confusion vector. CertificateDetailPage's Key Algorithm and Key Size rows always rendered `—` even when the data was a single fetch away, because the lookup went through `cert.key_algorithm` directly — and the underlying `Certificate` TypeScript interface declared five optional fields (`serial_number`, `fingerprint_sha256`, `key_algorithm`, `key_size`, `issued_at`) that Go's `ManagedCertificate` has never carried (those values live on `CertificateVersion`). Five findings, two files, one frontend rebuild. Pre-D-1 the only reason this didn't trip a regression suite was that the regression suite never asserted "every Go-emitted enum value gets a non-default StatusBadge class" — D-1 fixes the visual lies and adds a 38-case Vitest property test that walks every Go enum and pins the contract.
|
||||
|
||||
### Breaking Changes
|
||||
|
||||
- **`Certificate` TypeScript interface no longer declares `serial_number?`, `fingerprint_sha256?`, `key_algorithm?`, `key_size?`, or `issued_at?`.** The Go `ManagedCertificate` (`internal/domain/certificate.go`) has never emitted these fields on list responses; they live on `CertificateVersion` and are reachable via `getCertificateVersions(id)`. Pre-D-5 (the cat-f phantom-fields finding) the optional declarations made `cert.X` always-undefined on lists, and downstream consumers silently rendered `—` for every cert. Post-D-5 a `cert.X` access for any of the five fields is a TypeScript compile error, forcing every consumer to acknowledge the version-fallback pattern. The OpenAPI `ManagedCertificate` schema was already correct; only the TS type had drifted.
|
||||
- **StatusBadge no longer maps `Stale` (Agent) or `PendingIssuance` (Certificate).** Both were dead keys — no Go enum value emits them. Operators with custom CSS hooked off `.badge-warning` for `Stale` will see the same color come back via the new `Degraded` mapping (same class), but JS/TS code that switches on the literal `'Stale'` will need to switch on `'Degraded'` instead. The `PendingIssuance` deletion has no documented downstream consumer.
|
||||
|
||||
### Added
|
||||
|
||||
- **`web/src/components/StatusBadge.tsx`: `Degraded` (Agent) → `badge-warning` and `dead` (Notification) → `badge-danger`.** These mappings restore the color contract for the two real Go-side values that previously fell through to the dictionary default. The `Degraded` mapping cross-references `internal/domain/connector.go::AgentStatusDegraded`; the `dead` mapping cross-references `internal/domain/notification.go::NotificationStatusDead`.
|
||||
- **`web/src/components/StatusBadge.test.tsx`: 38-case Vitest property test.** Iterates every Go-side enum value (`AgentStatus`, `CertificateStatus`, `JobStatus`, `NotificationStatus`, `DiscoveryStatus`, `HealthStatus`) plus the two frontend-synthesized `Enabled`/`Disabled` labels, asserts every value gets a non-default class (or, for the five intentionally-neutral terminal values like `Archived`/`Cancelled`/`read`, an explicit `badge badge-neutral`). Includes negative assertions on the deleted `Stale` and `PendingIssuance` keys (must fall through to neutral) and specific UX-correctness assertions on the operator-attention semantics (`dead` → danger, `Degraded` → warning).
|
||||
- **`web/src/api/types.test.ts`: D-5 Certificate phantom-fields trim regression.** A `Certificate` literal construction pinned post-trim, plus a sibling `CertificateVersion` literal pinning that the trimmed fields still live on the version envelope. The `tsc --noEmit` gate in CI is the primary enforcement; the test is the documentation of intent.
|
||||
- **CI regression guardrail in `.github/workflows/ci.yml` (`Forbidden StatusBadge dead-key + Certificate phantom-field regression guard (D-1)`).** Two grep blocks: (1) catches `Stale: 'badge-...'` or `PendingIssuance: 'badge-...'` in `web/src/components/StatusBadge.tsx`; (2) uses an awk-scoped window over the `export interface Certificate {` block in `web/src/api/types.ts` to catch any of the five phantom fields reappearing — explicitly excludes the `CertificateVersion` block which legitimately carries them. Verified locally on the post-fix tree (passes) and against synthetic regressions (each fires the guardrail).
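
The block-scoped check described in the last bullet can be illustrated outside of awk. The following is a minimal Go sketch of the same idea, not the guard that ships in `ci.yml`: the function name `phantomFieldsInCertificate` is hypothetical, and the scan assumes the interface body contains no nested braces.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// The five fields the changelog says must not reappear on Certificate.
var phantomFields = []string{
	"serial_number", "fingerprint_sha256", "key_algorithm", "key_size", "issued_at",
}

// phantomFieldsInCertificate (hypothetical name) scans TypeScript source and
// reports phantom fields declared inside `export interface Certificate {`.
// Other blocks, such as CertificateVersion, are ignored on purpose.
// Simplification: the block is assumed to end at the first line starting with "}".
func phantomFieldsInCertificate(src string) []string {
	var found []string
	inBlock := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "export interface Certificate {"):
			inBlock = true
		case inBlock && strings.HasPrefix(line, "}"):
			inBlock = false
		case inBlock:
			for _, f := range phantomFields {
				// Match both required (`name:`) and optional (`name?:`) declarations.
				if strings.HasPrefix(line, f+":") || strings.HasPrefix(line, f+"?:") {
					found = append(found, f)
				}
			}
		}
	}
	return found
}

func main() {
	src := "export interface Certificate {\n  id: string;\n  key_size?: number;\n}\n" +
		"export interface CertificateVersion {\n  key_size: number;\n}\n"
	fmt.Println("phantom fields found:", phantomFieldsInCertificate(src))
}
```

Matching on the exact opening line `export interface Certificate {` is what keeps `CertificateVersion` out of scope, mirroring the exclusion the bullet describes.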
|
||||
|
||||
### Changed
|
||||
|
||||
- **`web/src/pages/CertificateDetailPage.tsx`: Key Algorithm and Key Size rows now read from `latestVersion?.key_algorithm` / `latestVersion?.key_size`.** Mirrors the existing `latestVersion` fallback used for `serial_number` and `fingerprint_sha256` earlier in the same file. Pre-D-4 these rows accessed `cert.key_algorithm` and `cert.key_size` directly — both phantom fields per D-5 — so the rows always rendered `—`. The same file's `serial_number` / `fingerprint_sha256` / `issued_at` derivations were also simplified to drop the now-impossible `cert.X || latestVersion?.X` cert-side leg.
|
||||
- **`web/src/components/StatusBadge.tsx` adds a leading docblock** naming the Go-side source-of-truth file for every status family it maps (`AgentStatus`, `CertificateStatus`, `JobStatus`, `NotificationStatus`, `DiscoveryStatus`, `HealthStatus`) and pointing at the property test as the regression vector for future enum changes.
|
||||
- **`api/openapi.yaml::ManagedCertificate`** gets a leading comment cross-referencing the D-5 closure and explaining why per-issuance fields legitimately don't appear here (they live on `CertificateVersion`). Schema property list unchanged — the OpenAPI spec was already correct.
|
||||
|
||||
### Closed audit findings
|
||||
|
||||
- `cat-d-359e92c20cbf` (P1 primary) — Agent: `Stale` dead key + `Degraded` neutral fallthrough
|
||||
- `cat-d-9f4c8e4a91f1` (P2) — Notification: `dead` missing
|
||||
- `cat-d-1447e04732e7` (P3) — Certificate: `PendingIssuance` dead key
|
||||
- `cat-f-cert_detail_page_key_render_fallback` (P2) — render-site uses `cert.key_algorithm` directly
|
||||
- `cat-f-ae0d06b6588f` (P2) — Certificate TS phantom fields (root cause)
|
||||
|
||||
### Known follow-ups (deferred from D-1 scope)
|
||||
|
||||
The audit's broader type-drift cluster (`diff-05x06-7cdf4e78ae24` Agent TS, `diff-05x06-2044a46f4dd0` DeploymentTarget TS, `diff-05x06-caba9eb3620e` Notification TS, `diff-05x06-85ab6b98a2f7` DiscoveredCertificate TS, `diff-05x06-97fab8783a5c` Issuer TS) is out of D-1 scope. Recon for those is per-type field-by-field diff Go ↔ TS — codegen-shaped, not edit-shaped — and warrants its own D-2 master prompt.
|
||||
|
||||
### U-3: GitHub #10 reopened — fresh-clone first-up postgres init failure (P1) — closed end-to-end
|
||||
|
||||
> Operator `mikeakasully` cloned v2.0.50 fresh, ran the canonical quickstart `docker compose -f deploy/docker-compose.yml up -d --build`, and postgres reported `unhealthy` indefinitely; dependent containers (certctl-server, certctl-agent) never started. Root cause: the deploy compose stack mounted both a hand-curated subset of `migrations/*.up.sql` and `seed.sql` into postgres `/docker-entrypoint-initdb.d/`. Postgres applied them at initdb time. Once `seed.sql` referenced columns added by migrations *after* the mounted cutoff (e.g., `policy_rules.severity` from migration 000013, which the mount list never included), initdb crashed mid-seed and the container loop wedged. Two sources of truth — the mount list and the in-tree migration ladder — diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. The U-3 closure removes the dual source: postgres now boots empty and the server applies the entire migration ladder + seed at startup via `RunMigrations` + `RunSeed`. Same pattern Helm has used since day one. Bundled with four ride-along audit findings whose fixes are in adjacent code (column rename, missing column, dropped orphan columns, new build-identity endpoint) so operators take the schema-change pain only once.
|
||||
|
||||
### Breaking Changes
|
||||
|
||||
- **`deploy/docker-compose.yml` postgres no longer initdb-mounts the migration files or `seed.sql`.** Operators running on a populated `postgres_data` volume from a pre-U-3 release see no behavioral change (the schema is already in place; `RunMigrations` is `IF NOT EXISTS` and `RunSeed` is `ON CONFLICT DO NOTHING`). Operators running on a *fresh* clone now rely on the server to apply both — which is the bug fix. There is no rollback path other than re-introducing the dual-source-of-truth hazard. See `internal/repository/postgres/db.go::RunSeed` for the runtime contract.
|
||||
- **`migrations/000017_db_coupling_cleanup.up.sql` renames `renewal_policies.retry_interval_minutes` → `retry_interval_seconds`.** The column always held seconds; the column name lied (`cat-o-retry_interval_unit_mismatch`). Operators running raw SQL against the old name need to update their queries. The Go layer (`internal/repository/postgres/renewal_policy.go`) is updated in lockstep so the in-tree code path is unaffected.
|
||||
- **`migrations/000017_db_coupling_cleanup.up.sql` drops `network_scan_targets.health_check_enabled` and `network_scan_targets.health_check_interval_seconds`.** These columns were declared by a long-ago migration but never wired into Go code (`cat-o-health_check_column_orphans`) — schema noise that confused operators reading raw SQL. Anyone with custom dashboards selecting those columns will break.
|
||||
- **The compose demo overlay (`deploy/docker-compose.demo.yml`) no longer initdb-mounts `seed_demo.sql`.** It now sets `CERTCTL_DEMO_SEED=true` and the server applies the demo seed at boot via `RunDemoSeed` after baseline migrations + seed.sql are in place. Same single-source-of-truth pattern as the production path.
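
The runtime seed contract referenced above (apply the file if present, treat a missing file as a no-op, rely on the SQL itself for idempotency) could look roughly like the sketch below. It is not the code in `internal/repository/postgres/db.go`: `runSeed`, `execer`, `recordingDB`, and `demoRun` are hypothetical names, and whether a multi-statement file can be sent in a single `ExecContext` call depends on the database driver.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// execer is the subset of *sql.DB this sketch needs (assumption: the real
// function may take a concrete *sql.DB instead).
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// runSeed applies the SQL file at path. A missing file is a no-op, so custom
// packaging that strips the seed does not fail startup. Idempotency is left
// to the SQL itself (for example ON CONFLICT (id) DO NOTHING).
func runSeed(ctx context.Context, db execer, path string) error {
	sqlBytes, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	if err != nil {
		return fmt.Errorf("read seed %s: %w", path, err)
	}
	if _, err := db.ExecContext(ctx, string(sqlBytes)); err != nil {
		return fmt.Errorf("apply seed %s: %w", path, err)
	}
	return nil
}

// recordingDB is a stand-in for *sql.DB that only records what it was asked to run.
type recordingDB struct{ statements []string }

func (r *recordingDB) ExecContext(_ context.Context, q string, _ ...any) (sql.Result, error) {
	r.statements = append(r.statements, q)
	return nil, nil
}

// demoRun runs runSeed against a temp directory, optionally creating the seed
// file first, and reports how many statements reached the database.
func demoRun(createFile bool) (int, error) {
	dir, err := os.MkdirTemp("", "seed-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "seed.sql")
	if createFile {
		seed := "INSERT INTO demo (id) VALUES ('a') ON CONFLICT (id) DO NOTHING;"
		if err := os.WriteFile(path, []byte(seed), 0o600); err != nil {
			return 0, err
		}
	}
	db := &recordingDB{}
	err = runSeed(context.Background(), db, path)
	return len(db.statements), err
}

func main() {
	n, err := demoRun(false)
	fmt.Println("missing file: statements =", n, "err =", err)
	n, err = demoRun(true)
	fmt.Println("present file: statements =", n, "err =", err)
}
```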
|
||||
|
||||
### Added
|
||||
|
||||
- **Migration `000017_db_coupling_cleanup`** (up + down). Bundles three schema changes in idempotent SQL: (1) rename `renewal_policies.retry_interval_minutes` → `retry_interval_seconds` (DO $$ guard so re-application is safe), (2) add `notification_events.created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()`, (3) drop the orphan `network_scan_targets.health_check_*` columns. Reduces operator-visible "schema-change releases" from four to one.
|
||||
- **`internal/repository/postgres.RunSeed`** — runtime equivalent of the deleted initdb mount for `seed.sql`. Called from `cmd/server/main.go` immediately after `RunMigrations`. Idempotent (every INSERT in the shipped seed uses `ON CONFLICT (id) DO NOTHING`); missing-file is a no-op so operators with custom packaging that strips the seed don't break.
|
||||
- **`internal/repository/postgres.RunDemoSeed`** + **`config.DatabaseConfig.DemoSeed`** + **`CERTCTL_DEMO_SEED` env var.** Replaces the deleted `seed_demo.sql` initdb mount. The compose demo overlay sets `CERTCTL_DEMO_SEED=true` and the server applies the demo seed after baseline. Same idempotency contract as the baseline path. Default-off so a vanilla deploy never lands fake-history rows.
|
||||
- **`GET /api/v1/version` endpoint** + **`internal/api/handler.VersionHandler`**. Returns `{version, commit, modified, build_time, go_version}` from `runtime/debug.ReadBuildInfo()` with ldflags-supplied `Version` taking priority. Wired through the no-auth dispatch in `cmd/server/main.go` so probes and rollout systems can read build identity without Bearer credentials. Audit middleware excludes the path so rollout polls don't dominate the audit trail. Closes `cat-u-no_version_endpoint`.
|
||||
- **`notification_events.created_at` column** is now populated by `NotificationRepository.Create` (with a `time.Now()` fallback when the caller leaves it zero) and read back by `scanNotification`. Pre-U-3 the JSON API serialised `0001-01-01T00:00:00Z` — closes `cat-o-notification_created_at_dead_field`.
|
||||
- **Eight regression tests** for the U-3 contract: `TestRunSeed_AppliesIdempotently`, `TestRunSeed_MissingFileIsNoOp`, `TestRunDemoSeed_AppliesIdempotently`, `TestMigration000017_RetryIntervalRename`, `TestMigration000017_NotificationCreatedAt`, `TestMigration000017_HealthCheckOrphansDropped`, plus `TestNotificationRepository_CreatedAt_IsPersisted` / `TestNotificationRepository_CreatedAt_DefaultsToNow` for the round-trip. All testcontainers-gated (skipped under `-short`). Three handler-layer unit tests pin `/api/v1/version` (`TestVersion_ReturnsBuildInfo`, `TestVersion_RejectsNonGet`, `TestVersion_LdflagsOverride`).
|
||||
- **CI regression guardrail** in `.github/workflows/ci.yml` (`Forbidden migration mount in compose initdb (U-3)`) — grep-fails the build if any `migrations/.*\.sql` or `seed.*\.sql` file is re-mounted into `/docker-entrypoint-initdb.d` in any compose file. Catches future drift before a fresh-clone operator hits it.
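
For readers unfamiliar with `runtime/debug.ReadBuildInfo()`, here is a minimal sketch of how a build-identity handler along the lines of `GET /api/v1/version` might be assembled. It is an illustration under assumptions, not the shipped `VersionHandler`: the struct, the JSON field types, the `probe` helper, and the mapping of the `vcs.time` build setting to `build_time` are all guesses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime/debug"
)

// Version is meant to be set with -ldflags "-X main.Version=..."; it is empty otherwise.
var Version string

// versionInfo uses the field names listed in the changelog; the types are assumptions.
type versionInfo struct {
	Version   string `json:"version"`
	Commit    string `json:"commit"`
	Modified  bool   `json:"modified"`
	BuildTime string `json:"build_time"`
	GoVersion string `json:"go_version"`
}

// buildInfo prefers the ldflags-supplied Version and fills the rest from
// the build metadata embedded by the Go toolchain, when available.
func buildInfo() versionInfo {
	info := versionInfo{Version: Version}
	bi, ok := debug.ReadBuildInfo()
	if !ok {
		return info
	}
	info.GoVersion = bi.GoVersion
	if info.Version == "" {
		info.Version = bi.Main.Version
	}
	for _, s := range bi.Settings {
		switch s.Key {
		case "vcs.revision":
			info.Commit = s.Value
		case "vcs.modified":
			info.Modified = s.Value == "true"
		case "vcs.time": // assumption: commit time stands in for build_time
			info.BuildTime = s.Value
		}
	}
	return info
}

// versionHandler serves build identity and rejects anything but GET.
func versionHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		w.Header().Set("Allow", http.MethodGet)
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(buildInfo())
}

// probe is a demo helper that calls the handler in-process.
func probe(method string) (int, string) {
	rec := httptest.NewRecorder()
	versionHandler(rec, httptest.NewRequest(method, "/api/v1/version", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probe(http.MethodGet)
	fmt.Printf("%d %s", code, body)
}
```

Keeping the ldflags value ahead of the embedded module version matches the priority order the bullet describes, which matters for release builds where `-X main.Version=...` is set by the workflow.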
|
||||
|
||||
### Changed
|
||||
|
||||
- **`deploy/docker-compose.yml`** + **`deploy/docker-compose.test.yml`** — postgres `volumes:` no longer mount migrations or seed files; postgres healthcheck gains `start_period: 30s`; certctl-server healthcheck gains `start_period: 30s` to absorb the runtime migration + seed application window on first boot.
|
||||
- **`deploy/docker-compose.demo.yml`** — replaces the `seed_demo.sql` initdb mount with the `CERTCTL_DEMO_SEED=true` env var on `certctl-server`.
|
||||
- **`migrations/seed.sql`** — `INSERT INTO renewal_policies` updated to use the new `retry_interval_seconds` column name (lockstep with migration 000017).
|
||||
- **`internal/repository/postgres/renewal_policy.go`** — column references updated to `retry_interval_seconds` across SELECT, INSERT, and UPDATE sites (lockstep with migration 000017).
|
||||
|
||||
### Closed audit findings
|
||||
|
||||
- `cat-u-seed_initdb_schema_drift` (P1, primary U-3 finding)
|
||||
- `cat-o-retry_interval_unit_mismatch` (P1)
|
||||
- `cat-o-notification_created_at_dead_field` (P2)
|
||||
- `cat-o-health_check_column_orphans` (P1)
|
||||
- `cat-u-no_version_endpoint` (P2)
|
||||
|
||||
### G-1: JWT silent auth downgrade — closed end-to-end

> Pre-G-1 the config validator accepted `CERTCTL_AUTH_TYPE=jwt` and the startup log faithfully echoed `"authentication enabled" "type"="jwt"`. Reasonable people read that and concluded JWT was on. It wasn't. The auth-middleware wiring at `cmd/server/main.go` unconditionally routed every request through the api-key bearer middleware regardless of `cfg.Auth.Type`. So `CERTCTL_AUTH_TYPE=jwt` quietly compared incoming `Authorization: Bearer <something>` against whatever string the operator put in `CERTCTL_AUTH_SECRET` — real JWT clients got 401, and operators who treated `CERTCTL_AUTH_SECRET` as a *signing* secret (because they thought they were configuring JWT) had effectively handed an attacker an api-key. A security finding masquerading as a config option. We chose to remove the option rather than ship JWT middleware — the audit-recommended structural fix that closes the hazard. Operators who actually need JWT/OIDC front certctl with an authenticating gateway (oauth2-proxy / Envoy `ext_authz` / Traefik `ForwardAuth` / Pomerium / Authelia) and run the upstream certctl with `CERTCTL_AUTH_TYPE=none`. The same pattern works on docker-compose and Helm.

### Breaking Changes

- **`CERTCTL_AUTH_TYPE=jwt` is no longer accepted.** Pre-G-1 the value was silently downgraded to api-key middleware. Post-G-1 the server fails at startup with a dedicated diagnostic naming the authenticating-gateway pattern. Operators with this in their env block must either switch to `api-key` (if they were de facto using api-key auth all along — same Bearer token continues to work) or switch to `none` and front certctl with an oauth2-proxy / Envoy / Traefik / Pomerium gateway. See [`docs/upgrade-to-v2-jwt-removal.md`](docs/upgrade-to-v2-jwt-removal.md).
- **Helm chart `server.auth.type=jwt` now fails at `helm install` / `helm upgrade` template time.** New `certctl.validateAuthType` template helper runs on every template that depends on `.Values.server.auth.type` (`server-deployment.yaml`, `server-configmap.yaml`, `server-secret.yaml`) and fails the render with a pointer at the gateway-fronting pattern.
- **OpenAPI spec `auth_type` enum no longer includes `jwt`.** API consumers checking `/api/v1/auth/info` against the spec will see a smaller enum.

### Removed

- Documented references to JWT in the certctl auth surface (config docblocks, middleware/health-handler comments, `.env.example`, `docs/architecture.md` middleware-stack bullet). Connector-level JWT references (Google OAuth2 service-account JWT in `internal/connector/discovery/gcpsm/`, `internal/connector/issuer/googlecas/`; step-ca's provisioner one-time-token JWT in `internal/connector/issuer/stepca/`) are unrelated and untouched — those are external-protocol uses, not certctl's own auth shape.

### Added

- **`config.AuthType` typed alias** with `AuthTypeAPIKey` / `AuthTypeNone` exported constants. Single source of truth for the allowed set across the validator, the runtime defense-in-depth switch in `main.go`, and the helm chart's `validateAuthType` helper.
- **`config.ValidAuthTypes()`** helper returning the complete allowed set; pinned by a property test (`TestValidAuthTypesDoesNotContainJWT`) that fails the build if `"jwt"` is ever re-added to the slice.
- **Defense-in-depth runtime guard** in `cmd/server/main.go` immediately after `config.Load()` — a `switch config.AuthType(cfg.Auth.Type)` that exits 1 if the validator was bypassed (test harness, alt config loader, env-var rebinding).
- **`certctl.validateAuthType` Helm template helper** mirroring the existing `certctl.tls.required` pattern. Fails template render on any `server.auth.type` outside `{api-key, none}`.
- **`docs/architecture.md` "Authenticating-gateway pattern (JWT, OIDC, mTLS)"** section explaining the design rationale for the narrow in-process auth surface and listing oauth2-proxy / Envoy `ext_authz` / Traefik `ForwardAuth` / Pomerium / Authelia / Caddy `forward_auth` / Apache `mod_auth_openidc` / nginx `auth_request` as the standard fronting options.
- **`docs/upgrade-to-v2-jwt-removal.md`** migration guide. Same shape as `docs/upgrade-to-tls.md`. Walks through the dedicated startup error, both recovery paths (`api-key` vs gateway-fronting), a complete docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy `ext_authz` patterns, and rollback posture.
- **`deploy/helm/certctl/README.md`** "JWT / OIDC via authenticating gateway" section with a Kubernetes-flavored oauth2-proxy + certctl walkthrough.
- **CI regression guardrail** in `.github/workflows/ci.yml` (`Forbidden auth-type literal regression guard (G-1)`) — grep-fails the build if `"jwt"` appears as an auth-type literal in production code or spec. Connector packages exempt (legitimate external-protocol uses).
- **Negative test coverage** in `internal/config/config_test.go`: `TestValidate_JWTAuth_RejectedDedicated` (two table rows pinning that the dedicated G-1 error fires regardless of whether `Secret` is set), `TestValidAuthTypesDoesNotContainJWT` (property-level guard), `TestValidAuthTypesIsExactly_APIKey_None` (allowed-set contract), `TestValidate_GenericInvalidAuthType` (pins that other invalid values still surface the generic invalid-auth-type error, so the dedicated G-1 path doesn't accidentally swallow non-jwt typos).

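The typed alias, the allowed-set helper, and the dedicated `jwt` rejection listed above could fit together roughly as follows. This is a minimal sketch, not the code in `internal/config`: the names `AuthType`, `AuthTypeAPIKey`, `AuthTypeNone`, and `ValidAuthTypes` come from the entries above, while `ValidateAuthType`, `ErrJWTRemoved`, and the error wording are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// AuthType mirrors the typed alias named in the changelog entry.
type AuthType string

const (
	AuthTypeAPIKey AuthType = "api-key"
	AuthTypeNone   AuthType = "none"
)

// ErrJWTRemoved is a hypothetical sentinel standing in for the dedicated G-1 diagnostic.
var ErrJWTRemoved = errors.New(`auth type "jwt" is no longer accepted; use "api-key", or "none" behind an authenticating gateway`)

// ValidAuthTypes returns a fresh slice on every call so callers cannot mutate the allowed set.
func ValidAuthTypes() []AuthType {
	return []AuthType{AuthTypeAPIKey, AuthTypeNone}
}

// ValidateAuthType (hypothetical name) rejects "jwt" with the dedicated error
// and any other unknown value with a generic one, so typos are not swallowed
// by the jwt-specific path.
func ValidateAuthType(raw string) error {
	if raw == "jwt" {
		return ErrJWTRemoved
	}
	for _, t := range ValidAuthTypes() {
		if AuthType(raw) == t {
			return nil
		}
	}
	return fmt.Errorf("invalid auth type %q (allowed: %v)", raw, ValidAuthTypes())
}

func main() {
	for _, v := range []string{"api-key", "none", "jwt", "jtw"} {
		fmt.Printf("%-8s -> %v\n", v, ValidateAuthType(v))
	}
}
```

Keeping the `jwt` branch ahead of the generic loop is what lets a test such as `TestValidate_GenericInvalidAuthType` distinguish the two failure modes.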
### Changed

- `internal/api/middleware/middleware.go::AuthConfig.Type` field comment now references the typed `config.AuthType` constants instead of an inline string enumeration.
- `internal/api/handler/health.go::HealthHandler.AuthType` field comment same treatment.
- `internal/api/handler/health_test.go` — the prior `TestAuthInfo_ReturnsAuthType_JWT` (which asserted the handler echoed `"jwt"`, baking the silent-downgrade lie into the regression suite) is removed; the pre-existing `TestAuthInfo_ReturnsAuthType_APIKey` continues to cover the api-key happy path.
- Auth-disabled startup log in `main.go` now points operators at the authenticating-gateway pattern explicitly.

### U-2: Dockerfile HEALTHCHECK protocol mismatch — closed end-to-end

> Pre-U-2 the published `ghcr.io/shankar0123/certctl-server` image shipped with `HEALTHCHECK CMD curl -f http://localhost:8443/health`. The server has been HTTPS-only since the v2.2 HTTPS-Everywhere milestone (`cmd/server/main.go::ListenAndServeTLS`, no plaintext fallback, TLS 1.3 pinned), so the probe failed every interval and Docker marked the container `unhealthy` indefinitely. Operators inside docker-compose / Helm / the example stacks were unaffected — compose overrides the HEALTHCHECK with `--cacert + https://`, Helm uses explicit `httpGet` probes that ignore Docker's HEALTHCHECK, and every example compose file overrides with `curl -sfk https://localhost:8443/health`. But anyone running bare `docker run` / Docker Swarm / Nomad / ECS — exactly the "I just pulled the published image" path — saw permanent `unhealthy` status and (depending on orchestrator policy) a restart-loop. Recon for U-2 also surfaced two adjacent bugs from the same v2.2 milestone gap: the Helm chart's `readinessProbe.httpGet.path` pointed at `/readyz`, a route the server doesn't register (only `/health` and `/ready` are wired and bypass the auth middleware), so K8s readiness probes were getting 404/auth-rejection and pods stayed `NotReady`; and the agent image had no HEALTHCHECK at all (the compose override called `pgrep -f certctl-agent` against an image that didn't ship `procps` — latent always-fail). All three are closed in this commit.

### Fixed

- **`Dockerfile` HEALTHCHECK now speaks HTTPS.** Bare `docker run` / Swarm / Nomad / ECS users no longer see `unhealthy` forever. The probe uses `curl -fsk https://localhost:8443/health` — `-k` (insecure) is acceptable because the probe is localhost-to-localhost: the same process serving the cert is being probed; the probe never traverses a network. Compose / Helm / examples already perform full cert-chain validation and are unaffected.
- **Helm `server.readinessProbe.httpGet.path` corrected from `/readyz` to `/ready`.** The `/readyz` path was never registered as a no-auth route (see `internal/api/router/router.go:81` and `cmd/server/main.go:920`), so K8s readiness probes received 401 (api-key auth rejection) or 404 (when auth was disabled). Pods previously failed to report Ready under most realistic Helm deployments. Liveness probe path (`/health`) was already correct and is unchanged.
- **`docs/connectors.md` curl examples** (15 sites) updated from `http://localhost:8443/...` to `https://localhost:8443/...` with a one-time `--cacert "$CA"` extraction note matching the existing pattern in `docs/quickstart.md`. Pre-U-2 these examples silently failed against the HTTPS listener.

### Added

- **`Dockerfile.agent` HEALTHCHECK** — `pgrep -f certctl-agent` process-presence check (the agent has no HTTP listener; presence is the right primitive). Bare-`docker run` agents now report health-status the same way compose-managed ones do. Also adds `procps` to the runtime image so `pgrep` is actually available — pre-U-2 the docker-compose override at `deploy/docker-compose.yml:173` called `pgrep -f certctl-agent` against an image that lacked it (latent always-fail; container was reported unhealthy in compose too, just rarely noticed because nothing acted on the signal).
- **`deploy/test/healthcheck_test.go`** (`//go:build integration`) — image-level integration tests. `TestPublishedServerImage_HealthcheckSpecUsesHTTPS` builds the server image, inspects `Config.Healthcheck.Test` via `docker inspect`, and asserts the array contains `https://localhost:8443/health` and `-k`, and does NOT contain `http://localhost:8443/health` (negative regression contract). `TestPublishedAgentImage_HealthcheckSpecExists` builds the agent image and asserts the HEALTHCHECK uses `pgrep` against `certctl-agent`. Both tests `t.Skip` cleanly when docker isn't available (sandbox / CI without docker-in-docker). A third runtime test (`TestPublishedServerImage_HealthcheckTransitionsToHealthy`) is a `t.Skip` placeholder until the harness wires a sidecar postgres for image-level smoke — documented honestly so the next refactor adopts it instead of rediscovering the gap.
- **CI regression guardrail** in `.github/workflows/ci.yml` (`Forbidden plaintext HEALTHCHECK regression guard (U-2)`) — grep-fails the build if any `Dockerfile*` carries `HEALTHCHECK.*http://` or `curl -f http://localhost:8443/health`. Comments exempt; the `docs/upgrade-to-tls.md:182` post-cutover invariant string (which deliberately documents the expected-failure shape) is out of the guardrail's scope because the guardrail only scans Dockerfiles.

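The assertion half of the server-image test described above can be sketched as a pure function over the `Config.Healthcheck.Test` array, which keeps it checkable without a Docker daemon. This is an illustration only, not the contents of `deploy/test/healthcheck_test.go`: `checkServerHealthcheck` and `hasInsecureFlag` are hypothetical names, and the sample array assumes Docker records a shell-form `HEALTHCHECK CMD` as a `CMD-SHELL` entry followed by the command string.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	wantURL      = "https://localhost:8443/health"
	forbiddenURL = "http://localhost:8443/health"
)

// hasInsecureFlag reports whether any curl short-flag cluster (for example
// "-k" or "-fsk") includes the 'k' flag. Long options ("--cacert") are skipped.
func hasInsecureFlag(cmd string) bool {
	for _, f := range strings.Fields(cmd) {
		if strings.HasPrefix(f, "-") && !strings.HasPrefix(f, "--") &&
			strings.ContainsRune(f[1:], 'k') {
			return true
		}
	}
	return false
}

// checkServerHealthcheck (hypothetical) applies the positive and negative
// contracts from the changelog entry to a healthcheck command array.
func checkServerHealthcheck(test []string) error {
	cmd := strings.Join(test, " ")
	switch {
	case strings.Contains(cmd, forbiddenURL):
		return fmt.Errorf("plaintext probe found: %q", cmd)
	case !strings.Contains(cmd, wantURL):
		return fmt.Errorf("HTTPS probe URL missing: %q", cmd)
	case !hasInsecureFlag(cmd):
		return fmt.Errorf("curl -k flag missing: %q", cmd)
	}
	return nil
}

func main() {
	spec := []string{"CMD-SHELL", "curl -fsk https://localhost:8443/health || exit 1"}
	fmt.Println("post-U-2 spec:", checkServerHealthcheck(spec))

	old := []string{"CMD-SHELL", "curl -f http://localhost:8443/health || exit 1"}
	fmt.Println("pre-U-2 spec: ", checkServerHealthcheck(old))
}
```

Matching on flag clusters rather than the literal string `-k` matters here because the shipped probe spells the flags as `-fsk`.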
### Changed

- `Dockerfile` final-stage HEALTHCHECK lines now carry a long-form docblock explaining the `-k` design choice, the published-image vs compose vs Helm vs examples coverage matrix, and cross-references to the audit closure + the integration test.
- `Dockerfile.agent` runtime stage adds `procps` to the apk install so the new HEALTHCHECK and the existing compose override both have a working `pgrep`.
- `deploy/helm/certctl/values.yaml` server probes block now carries an explanatory comment naming the registered probe routes (`/health`, `/ready`) and the U-2 closure rationale for the `/readyz` → `/ready` correction.

## [2.2.0] — 2026-04-19

### HTTPS Everywhere — The Irony

> certctl manages other teams' certificates. Until v2.2, it didn't terminate TLS on its own control plane. We treated the server as an internal service sitting behind whatever TLS-terminating infrastructure the operator already owned — reverse proxies, Kubernetes Ingress controllers, service mesh sidecars. Working through an EST coverage-gap audit surfaced this as a credibility problem we wanted to fix head-on: a cert-lifecycle product should ship with HTTPS by default. This release flips that. Self-signed bootstrap for docker-compose demos, operator-supplied Secret for Helm (with optional cert-manager integration), and a one-step cutover with no backward-compat bridge. Out-of-date agents will fail at the TLS handshake layer on upgrade; the upgrade guide walks operators through the roll.

### Breaking Changes

- **HTTPS-only control plane. The plaintext HTTP listener is gone.** There is no `CERTCTL_TLS_ENABLED=false` escape hatch and no `:8080` fallback. Operators who were running certctl behind their own TLS terminator must either (a) continue doing so and let the downstream TLS terminator talk to certctl's HTTPS listener, or (b) bring their own cert/key and terminate on certctl directly. Either path requires config changes — see `docs/upgrade-to-tls.md` for a one-step cutover.
- **Agents reject `CERTCTL_SERVER_URL=http://...` at startup.** This is a pre-flight config validation failure with a fail-loud diagnostic pointing at `docs/upgrade-to-tls.md`. Not a TCP-refused, not a TLS-handshake-error — the agent will not even attempt the network call. Every agent deployment must be reconfigured before upgrading the server.
- **CLI and MCP clients require `https://` URLs.** Same pre-flight rejection of plaintext schemes.
- **TLS 1.2 is not supported. TLS 1.3 only.** The server's `tls.Config.MinVersion` is pinned to `tls.VersionTLS13`. Any client still negotiating TLS 1.2 will fail at the handshake. Modern curl, Go stdlib, browsers, and Kubernetes tooling all default to 1.3-capable; legacy clients may need an upgrade.
- **Helm chart requires a TLS source.** `helm install` without one of `server.tls.existingSecret`, `server.tls.certManager.enabled`, or (for eval only) `server.tls.selfSigned.enabled` fails at template time with a diagnostic pointing at `docs/tls.md`. There is no default-to-plaintext path.

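The pre-flight scheme rejection described for agents, CLI, and MCP clients amounts to validating the configured URL before any network call is made. A minimal sketch, assuming a helper named `validateServerURL` (hypothetical, as is the error wording; only the `CERTCTL_SERVER_URL` variable and the `docs/upgrade-to-tls.md` pointer come from the entries above):

```go
package main

import (
	"fmt"
	"net/url"
)

// validateServerURL (hypothetical) fails fast on anything that is not an
// https:// URL with a host, so a misconfigured client never dials out.
func validateServerURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("CERTCTL_SERVER_URL is not a valid URL: %w", err)
	}
	if u.Scheme != "https" {
		return fmt.Errorf("CERTCTL_SERVER_URL must use https:// (got scheme %q); see docs/upgrade-to-tls.md", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("CERTCTL_SERVER_URL %q has no host", raw)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://certctl.example.internal:8443",
		"http://certctl.example.internal:8443",
	} {
		fmt.Printf("%s -> %v\n", raw, validateServerURL(raw))
	}
}
```

Because the check runs on the parsed scheme rather than on a string prefix, it does not depend on the case of the scheme as typed by the operator.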
### Added

- **Self-signed bootstrap for Docker Compose demos.** A `certctl-tls-init` init container runs before the server on first boot, generates a SAN-valid self-signed cert into `deploy/test/certs/`, and exits. The server mounts the resulting cert/key. Every curl in the demo stack pins against `./deploy/test/certs/ca.crt` with `--cacert`.
- **Helm chart TLS provisioning — three modes.** Operator-supplied Secret (`server.tls.existingSecret`), cert-manager integration (`server.tls.certManager.enabled` with issuer selection), or self-signed (`server.tls.selfSigned.enabled` — eval only, not supported for production). Chart templates enforce exactly one is active.
- **Hot-reload of TLS cert/key on `SIGHUP`.** Overwrite the cert/key on disk, send `SIGHUP` to the server PID, watch the `slog.Info("tls.reload", ...)` log line, and new TLS connections use the new cert. Failure during reload is logged and does not crash the server; the previous cert remains in use.
- **Agent CA-bundle env vars.** `CERTCTL_SERVER_CA_BUNDLE_PATH` points at a PEM file the agent's HTTP client will trust. `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY` disables verification (development only — the agent logs a loud warning at startup). `install-agent.sh` writes both as commented template lines into the generated `agent.env`.
- **Integration test suite runs over HTTPS.** `go test -tags=integration ./deploy/test/...` stands up the full Compose stack, extracts the self-signed CA bundle, and exercises every certctl API over `https://localhost:8443`. All 34 subtests green.
- **`docs/tls.md`** — cert provisioning patterns: bring-your-own Secret, cert-manager, self-signed bootstrap, SAN requirements, rotation workflows, SIGHUP reload semantics, troubleshooting.
- **`docs/upgrade-to-tls.md`** — one-step cutover guide for existing v2.1 operators. Walks through the agent fleet roll, Helm upgrade sequencing, downgrade-is-not-supported warnings, and cert-provisioning decision tree.

### Changed

- `cmd/server/main.go` now calls `http.Server.ListenAndServeTLS(certFile, keyFile)`. The plaintext `ListenAndServe` code path is deleted — `grep -rn "ListenAndServe[^T]" cmd/ internal/` returns zero hits.
- All documentation curls (`docs/testing-guide.md`, `docs/quickstart.md`, `deploy/helm/INSTALLATION.md`, `deploy/helm/DEPLOYMENT_GUIDE.md`, `deploy/ENVIRONMENTS.md`, `docs/openapi.md`, migration guides, example READMEs) use `https://localhost:8443` and `--cacert` against the demo stack's bundle.
- OpenAPI spec (`api/openapi.yaml`) `servers` blocks default to `https://localhost:8443`.

### Security

- TLS 1.3 pinned via `tls.Config.MinVersion = tls.VersionTLS13`.
- Plaintext HTTP listener removed entirely — no port 8080, no `Upgrade-Insecure-Requests`, no HSTS-required redirect dance. There is only one port: 8443, TLS 1.3.
- `grep -rn "http://" cmd/ internal/` returns zero hits outside test fixtures and the agent-side URL-scheme rejection error message.

### Upgrade Notes

Read `docs/upgrade-to-tls.md` before upgrading. The short version:

1. Pick a TLS source — bring-your-own cert, cert-manager, or self-signed bootstrap.
2. Upgrade the server with TLS configured. First boot over HTTPS.
3. Roll the agent fleet: set `CERTCTL_SERVER_URL=https://...` and, if using a private CA, `CERTCTL_SERVER_CA_BUNDLE_PATH`. Old agents will fail loud at startup — expected.
4. Roll CLI/MCP clients the same way.

There is no backward-compat bridge. There is no dual-listener mode. The cutover is one step.

@@ -3,17 +3,43 @@
# Stage 1: Build frontend
FROM node:20-alpine AS frontend

# Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
# behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
# `NO_PROXY` are forwarded via `docker build --build-arg` (or compose
# `build.args`), they are re-exported as ENV with both upper- and lower-case
# names because npm/apk/curl read the lowercase variants while Go, Node, and
# most HTTP libraries read the uppercase ones.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
    HTTPS_PROXY=${HTTPS_PROXY} \
    NO_PROXY=${NO_PROXY} \
    http_proxy=${HTTP_PROXY} \
    https_proxy=${HTTPS_PROXY} \
    no_proxy=${NO_PROXY}

WORKDIR /app/web

COPY web/package.json web/package-lock.json ./
RUN npm ci

COPY web/ .
RUN npm run build
RUN npm ci --include=dev || npm ci --include=dev && \
    node_modules/.bin/tsc --version && \
    npm run build

# Stage 2: Build Go binary
FROM golang:1.25-alpine AS builder

# Proxy propagation (M-4, Issue #9) — see Stage 1 rationale.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
    HTTPS_PROXY=${HTTPS_PROXY} \
    NO_PROXY=${NO_PROXY} \
    http_proxy=${HTTP_PROXY} \
    https_proxy=${HTTPS_PROXY} \
    no_proxy=${NO_PROXY}

RUN apk add --no-cache git ca-certificates tzdata

WORKDIR /app
@@ -50,7 +76,34 @@ USER certctl

EXPOSE 8443

# Image-level HEALTHCHECK for bare `docker run` / Docker Swarm / Nomad / ECS.
#
# U-2 (P1, cat-u-healthcheck_protocol_mismatch): pre-U-2 this probe used
# `curl -f http://localhost:8443/health`, which always failed against the
# HTTPS-only listener (HTTPS-Everywhere milestone, v2.2 / tag v2.0.47 —
# `cmd/server/main.go::ListenAndServeTLS`, no plaintext fallback, TLS 1.3
# pinned). Operators outside docker-compose / Helm saw permanent
# `unhealthy` status and a restart-loop the first time they pulled the
# image. The compose stack overrides this HEALTHCHECK with `--cacert` to
# the bootstrap CA bundle (deploy/docker-compose.yml:126); the Helm chart
# uses explicit `httpGet` probes with `scheme: HTTPS` and ignores Docker's
# HEALTHCHECK; every example compose file in `examples/*/docker-compose.yml`
# overrides with `curl -sfk https://localhost:8443/health`. This image-
# level probe is for the bare-`docker run` consumer ONLY.
#
# `-k` (insecure) is acceptable here because the probe is localhost-to-
# localhost: the same process serving the cert is being probed; the probe
# never traverses a network. Pinning a `--cacert` is not viable for the
# published image because the bootstrap cert is per-deploy (generated into
# the `certs` named volume on first up; operator-supplied via Helm's
# `existingSecret` or cert-manager). Compose / Helm / examples already
# perform full cert-chain validation and are unaffected.
#
# CI grep guardrail at .github/workflows/ci.yml ("Forbidden plaintext
# HEALTHCHECK regression guard (U-2)") blocks reintroduction of the
# `http://` shape. Image-level integration test in
# deploy/test/healthcheck_test.go pins the contract end-to-end.
HEALTHCHECK --interval=10s --timeout=5s --start-period=5s --retries=5 \
    CMD curl -f http://localhost:8443/health || exit 1
    CMD curl -fsk https://localhost:8443/health || exit 1

ENTRYPOINT ["/app/server"]

@@ -2,6 +2,22 @@
# Stage 1: Build
FROM golang:1.25-alpine AS builder

# Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
# behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
# `NO_PROXY` are forwarded via `docker build --build-arg` (or compose
# `build.args`), they are re-exported as ENV with both upper- and lower-case
# names because apk and curl read the lowercase variants while Go reads the
# uppercase ones.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
    HTTPS_PROXY=${HTTPS_PROXY} \
    NO_PROXY=${NO_PROXY} \
    http_proxy=${HTTP_PROXY} \
    https_proxy=${HTTPS_PROXY} \
    no_proxy=${NO_PROXY}

RUN apk add --no-cache git ca-certificates

WORKDIR /app
@@ -20,7 +36,14 @@ RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
# Stage 2: Runtime
FROM alpine:3.19

RUN apk add --no-cache ca-certificates curl
# U-2: `procps` ships pgrep, which the HEALTHCHECK below uses to verify the
# agent process is alive. Pre-U-2 the deploy/docker-compose.yml agent
# HEALTHCHECK called `pgrep -f certctl-agent` against this image but
# pgrep wasn't installed — the compose probe was a latent always-fail.
# Adding procps here fixes both the new image-level HEALTHCHECK and the
# pre-existing compose override. Adds ~250KB to the image; acceptable for
# observability parity with the server image.
RUN apk add --no-cache ca-certificates curl procps

RUN addgroup -g 1000 certctl && \
    adduser -D -u 1000 -G certctl certctl
@@ -35,4 +58,19 @@ RUN mkdir -p /var/lib/certctl/keys && \

USER certctl

# Image-level HEALTHCHECK for bare `docker run` / Docker Swarm / Nomad / ECS.
#
# U-2 (P1, cat-u-healthcheck_protocol_mismatch — adjacent fix): the agent
# has no HTTP listener (it polls the server via outbound HTTPS), so a
# process-presence check is the correct primitive. Pre-U-2 the agent image
# shipped with no HEALTHCHECK at all, so bare-`docker run` operators got
# zero health signal and orchestrators that key off Docker's HEALTHCHECK
# (Swarm, Nomad, ECS) saw the container reported as `none`. The compose
# override at deploy/docker-compose.yml:173 used the same `pgrep -f
# certctl-agent` shape; we mirror it here so the published image has
# parity with the compose stack and the override on docker-compose.yml
# becomes redundant-but-correct rather than load-bearing.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD pgrep -f certctl-agent > /dev/null || exit 1

ENTRYPOINT ["/app/agent"]

@@ -6,20 +6,27 @@ Licensor: Shankar Reddy
Licensed Work: certctl
The Licensed Work is (c) 2026 Shankar Reddy.
Additional Use Grant: You may make use of the Licensed Work, provided that
you may not use the Licensed Work for a Certificate
Management Service. A "Certificate Management Service"
is a commercial offering that allows third parties
(other than your employees and contractors acting on
your behalf) to access and/or use the Licensed Work's
certificate lifecycle management functionality as part
of a hosted or managed service.
you may not use the Licensed Work for a Commercial
Certificate Service. A "Commercial Certificate Service"
is any product, service, or offering in which a third
party (other than your employees and contractors
acting on your behalf) accesses, uses, or benefits
from the Licensed Work's certificate management
functionality — including but not limited to lifecycle
management, discovery, monitoring, alerting, renewal
automation, deployment, and revocation — as part of
or in connection with an offering for which
compensation is received. This restriction applies
regardless of whether the Licensed Work is hosted,
managed, embedded, bundled, or integrated with
another product or service.

Change Date: March 14, 2033

Change License: Apache License, Version 2.0

For information about alternative licensing arrangements for the Licensed Work,
please contact: skreddy040@gmail.com
please contact: certctl@proton.me

Notice

@@ -7,122 +7,185 @@

# certctl — Self-Hosted Certificate Lifecycle Platform

90+ API endpoints. 21 database tables. 900+ tests. Full GUI. Ships with Docker Compose.

```mermaid
timeline
title TLS Certificate Maximum Lifespan (CA/Browser Forum Ballot SC-081v3)
2015 : 5 years
2018 : 825 days
2020 : 398 days
March 2026 : 200 days
March 2027 : 100 days
March 2029 : 47 days
```
[](LICENSE)
[](https://goreportcard.com/report/github.com/shankar0123/certctl)
[](https://github.com/shankar0123/certctl/releases)
[](https://github.com/shankar0123/certctl/stargazers)

TLS certificate lifespans are shrinking fast. The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) unanimously in April 2025, setting a phased reduction: **200 days** by March 2026, **100 days** by March 2027, and **47 days** by March 2029. Organizations managing dozens or hundreds of certificates can no longer rely on spreadsheets, calendar reminders, or manual renewal workflows. The math doesn't work — at 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever.

certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong.
certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. It's free, self-hosted, and covers the same lifecycle that enterprise platforms charge $100K+/year for.

[](LICENSE)
[](https://goreportcard.com/report/github.com/shankar0123/certctl)

```mermaid
gantt
title TLS Certificate Maximum Lifespan — CA/Browser Forum Ballot SC-081v3
dateFormat YYYY-MM-DD
axisFormat
todayMarker off
section 2015
5 years (1825 days) :done, 2020-01-01, 1825d
section 2018
825 days :done, 2020-01-01, 825d
section 2020
398 days :active, 2020-01-01, 398d
section 2026
200 days :crit, 2020-01-01, 200d
section 2027
100 days :crit, 2020-01-01, 100d
section 2029
47 days :crit, 2020-01-01, 47d
```

> **Actively maintained — shipping weekly.** Found something? [Open a GitHub issue](https://github.com/shankar0123/certctl/issues) — issues get triaged same-day. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.

**Ready to try it?** Jump to the [Quick Start](#quick-start) — you'll have a running dashboard in under 5 minutes.

## Documentation

| Guide | Description |
|-------|-------------|
| [Why certctl?](docs/why-certctl.md) | How certctl compares to ACME clients, agent-based SaaS, and enterprise platforms |
| [Concepts](docs/concepts.md) | TLS certificates explained from scratch — for beginners who know nothing about certs |
| [Quick Start](docs/quickstart.md) | Get running in 5 minutes — dashboard, API, CLI, discovery, stakeholder demo flow |
| [Quick Start](docs/quickstart.md) | 5-minute setup — dashboard, API, CLI, discovery, stakeholder demo flow |
| [Docker Compose Environments](deploy/ENVIRONMENTS.md) | Service-by-service walkthrough of all 4 compose files, env var reference |
| [Deployment Examples](docs/examples.md) | 5 turnkey scenarios (ACME+NGINX, wildcard DNS-01, private CA, step-ca, multi-issuer) with migration guides |
| [Advanced Demo](docs/demo-advanced.md) | Issue a certificate end-to-end with technical deep-dives |
| [Architecture](docs/architecture.md) | System design, data flow diagrams, security model |
| [Connectors](docs/connectors.md) | Build custom issuer, target, and notifier connectors |
| [Feature Inventory](docs/features.md) | Complete reference of all capabilities, API endpoints, and configuration |
| [Connector Reference](docs/connectors.md) | Configuration for all issuer, target, and notifier connectors |
| [MCP Server](docs/mcp.md) | AI integration via Model Context Protocol — setup, available tools, examples |
| [OpenAPI 3.1 Spec](docs/openapi.md) | API reference guide with endpoint overview ([raw spec](api/openapi.yaml)) |
| [Compliance Mapping](docs/compliance.md) | SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 alignment guides |
| [Manual Testing Guide](docs/testing-guide.md) | 284 tests across 25 areas — full V2 QA runbook with exact commands and pass/fail criteria |
| [Migrate from certbot](docs/migrate-from-certbot.md) | Step-by-step migration from certbot cron jobs to certctl |
| [Migrate from acme.sh](docs/migrate-from-acmesh.md) | Migration guide for acme.sh users, DNS hook compatibility |
| [certctl for cert-manager users](docs/certctl-for-cert-manager-users.md) | How certctl complements cert-manager for mixed infrastructure |
| [Test Environment](docs/test-env.md) | Docker Compose test environment with real CA backends |
| [Testing Guide](docs/testing-guide.md) | Comprehensive test procedures, smoke tests, and release sign-off checklist |

## Contents
## Supported Integrations

- [Why certctl Exists](#why-certctl-exists)
- [What It Does](#what-it-does)
- [Screenshots](#screenshots)
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Configuration](#configuration)
- [MCP Server (AI Integration)](#mcp-server-ai-integration)
- [CLI](#cli)
- [API Overview](#api-overview)
- [Supported Integrations](#supported-integrations)
- [Development](#development)
- [Security](#security)
- [Roadmap](#roadmap)
- [License](#license)
### Certificate Issuers

## Why certctl Exists
| Issuer | Type | Notes |
|--------|------|-------|
| Local CA (self-signed + sub-CA) | `GenericCA` | Sub-CA mode chains to enterprise root (ADCS, etc.) |
| ACME v2 (Let's Encrypt, ZeroSSL, etc.) | `ACME` | HTTP-01, DNS-01, DNS-PERSIST-01 challenges. EAB auto-fetch from ZeroSSL. Profile selection (`tlsserver`, `shortlived`). |
| step-ca (Smallstep) | `StepCA` | JWK provisioner auth, issuance + renewal + revocation |
| OpenSSL / Custom CA | `OpenSSL` | Shell script adapter — any CA with a CLI |
| HashiCorp Vault PKI | `VaultPKI` | Token auth, synchronous issuance, CRL/OCSP delegated to Vault |
| DigiCert CertCentral | `DigiCert` | Async order model, OV/EV support, PEM bundle parsing |
| Sectigo SCM | `Sectigo` | 3-header auth, DV/OV/EV, collect-not-ready graceful handling |
| Google Cloud CAS | `GoogleCAS` | OAuth2 service account, synchronous issuance, CA pool selection |
| AWS ACM Private CA | `AWSACMPCA` | Synchronous issuance, configurable signing algorithm/template ARN |
| Entrust Certificate Services | `Entrust` | mTLS client certificate auth, synchronous/approval-pending issuance |
| GlobalSign Atlas HVCA | `GlobalSign` | mTLS + API key/secret dual auth, serial-based tracking |
| EJBCA (Keyfactor) | `EJBCA` | Dual auth (mTLS or OAuth2), self-hosted open-source CA |

Certificate lifecycle tooling today falls into two camps: expensive enterprise platforms (Venafi, Keyfactor, Sectigo) that cost six figures and take months to deploy, or single-purpose tools (cert-manager, certbot) that handle one slice of the problem. If you run a mixed infrastructure — some NGINX, some Apache, a few HAProxy nodes, maybe an F5 — and you need to manage certificates from multiple CAs, there's nothing self-hosted that covers the full lifecycle without vendor lock-in.
**Note:** ADCS integration is handled via the Local CA's sub-CA mode — certctl operates as a subordinate CA with its signing certificate issued by ADCS. Any CA with a shell-accessible signing interface can be integrated via the OpenSSL/Custom CA connector.

certctl fills that gap. It's **CA-agnostic** — the issuer connector interface means you can plug in any certificate authority: a self-signed local CA for dev, Let's Encrypt via ACME for public certs, Smallstep step-ca for your private PKI, your enterprise ADCS via sub-CA mode, or any custom CA through a shell script adapter. You're never locked to a single CA vendor, and you can run multiple issuers simultaneously for different certificate types.
|
||||
### Deployment Targets
|
||||
|
||||
It's also **target-agnostic**. Agents deploy certificates to NGINX, Apache, and HAProxy today, with the same pluggable connector model for any server that accepts cert files. The control plane never initiates outbound connections — agents poll for work, which means certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
| Target | Type | Notes |
|
||||
|--------|------|-------|
|
||||
| NGINX | `NGINX` | File write, config validation, reload |
|
||||
| Apache httpd | `Apache` | Separate cert/chain/key files, configtest, graceful reload |
|
||||
| HAProxy | `HAProxy` | Combined PEM file, validate, reload |
|
||||
| Traefik | `Traefik` | File provider deployment, auto-reload via filesystem watch |
|
||||
| Caddy | `Caddy` | Dual-mode: admin API hot-reload or file-based |
|
||||
| Envoy | `Envoy` | File-based with optional SDS JSON config |
|
||||
| Postfix | `Postfix` | Mail server TLS, pairs with S/MIME support |
|
||||
| Dovecot | `Dovecot` | Mail server TLS, pairs with S/MIME support |
|
||||
| Microsoft IIS | `IIS` | Local PowerShell or remote WinRM, PEM→PFX, SNI support |
|
||||
| F5 BIG-IP | `F5` | iControl REST via proxy agent, transaction-based atomic updates |
|
||||
| SSH (Agentless) | `SSH` | SFTP cert/key deployment to any Linux/Unix server |
|
||||
| Windows Certificate Store | `WinCertStore` | PowerShell Import-PfxCertificate, configurable store/location |
|
||||
| Java Keystore | `JavaKeystore` | PEM→PKCS#12→keytool pipeline, JKS and PKCS12 formats |
|
||||
| Kubernetes Secrets | `KubernetesSecrets` | `kubernetes.io/tls` Secrets, in-cluster or kubeconfig auth |
|
||||
|
||||
## What It Does
|
||||
### Enrollment Protocols
|
||||
|
||||
certctl gives you a single pane of glass for every TLS certificate in your organization. The **web dashboard** shows your full certificate inventory — what's healthy, what's expiring, what's already expired, and who owns each one. The **REST API** (95 endpoints under `/api/v1/` + `/.well-known/est/`) lets you automate everything. **Agents** deployed on your infrastructure generate private keys locally, discover existing certificates on disk, and submit CSRs — private keys never leave your servers. The **network scanner** discovers certificates on TLS endpoints across your infrastructure without requiring agents. The **EST server** (RFC 7030) enables device and WiFi certificate enrollment via industry-standard Enrollment over Secure Transport. The background scheduler watches expiration dates and triggers renewals automatically — when certificate lifespans drop to 47 days, certctl handles the constant rotation without human involvement.
|
||||
| Protocol | Standard | Use Case |
|
||||
|----------|----------|----------|
|
||||
| EST (Enrollment over Secure Transport) | RFC 7030 | Device enrollment, WiFi/802.1X, IoT |
|
||||
| SCEP (Simple Certificate Enrollment Protocol) | RFC 8894 | MDM platforms (Jamf, Intune), network devices |
|
||||
| ACME v2 | RFC 8555 | Public CA automated issuance (Let's Encrypt, ZeroSSL) |
|
||||
| ACME ARI (Renewal Information) | RFC 9773 | CA-directed renewal timing — the CA tells you when to renew |
|
||||
|
||||
**Core capabilities:**
|
||||
### Standards & Revocation
|
||||
|
||||
- **Full lifecycle automation** — issuance, renewal, deployment, and revocation with zero human intervention. Configurable renewal policies trigger jobs automatically based on expiration thresholds.
|
||||
- **CA-agnostic issuer connectors** — Local CA (self-signed + sub-CA for enterprise root chains), ACME v2 with HTTP-01 and DNS-01 challenges (Let's Encrypt, Sectigo, any ACME-compatible CA), Smallstep step-ca (native /sign API), and OpenSSL/Custom CA (delegate to any shell script). Pluggable interface — add your own CA in one file.
|
||||
- **Agent-side key generation** — agents generate ECDSA P-256 keys locally, store them with 0600 permissions, and submit only the CSR. Private keys never touch the control plane. This is the default mode, not an opt-in feature.
|
||||
- **Certificate discovery** — agents scan filesystems for existing PEM/DER certificates and report findings for triage. The network scanner probes TLS endpoints across CIDR ranges to find certificates you didn't know existed.
|
||||
- **Revocation infrastructure** — RFC 5280 revocation with all standard reason codes, DER-encoded X.509 CRL per issuer, embedded OCSP responder, and short-lived certificate exemption (certs under 1 hour skip CRL/OCSP).
|
||||
- **Policy engine** — 5 rule types with violation tracking and severity levels. Certificate profiles enforce allowed key types, maximum TTL, and crypto constraints at enrollment time.
|
||||
- **Immutable audit trail** — every action recorded to an append-only log. Every API call recorded with method, path, actor, SHA-256 body hash, response status, and latency. No update or delete on audit records.
|
||||
- **Operational dashboard** — Full React GUI with certificate inventory, bulk operations (multi-select renew/revoke/reassign), deployment timeline visualization, inline policy editing, agent fleet overview, expiration heatmaps, and real-time short-lived credential tracking.
|
||||
- **Observability** — JSON and Prometheus metrics endpoints, 5 stats API endpoints for dashboards, structured slog logging with request ID propagation. Compatible with Prometheus, Grafana Agent, Datadog Agent, and Victoria Metrics.
|
||||
- **Notifications** — threshold-based alerting with deduplication. Routes to email, webhooks, Slack, Microsoft Teams, PagerDuty, and OpsGenie.
|
||||
- **EST enrollment (RFC 7030)** — built-in Enrollment over Secure Transport server for device certificate enrollment. Supports WiFi/802.1X, MDM, and IoT use cases. PKCS#7 certs-only wire format, accepts PEM or base64-encoded DER CSRs, configurable issuer and profile binding.
|
||||
- **AI and CLI access** — MCP server exposes all 78 API operations as tools for Claude, Cursor, and any MCP-compatible client. CLI tool with 12 subcommands for terminal workflows and scripting.
|
||||
| Capability | Standard | Notes |
|
||||
|------------|----------|-------|
|
||||
| DER-encoded X.509 CRL | RFC 5280 | Per-issuer, signed by issuing CA, 24h validity |
|
||||
| Embedded OCSP responder | RFC 6960 | Good/revoked/unknown status per issuer |
|
||||
| S/MIME certificates | RFC 8551 | Email protection EKU, adaptive KeyUsage flags |
|
||||
| Certificate export | — | PEM (JSON/file) and PKCS#12 formats |
|
||||
| ACME DNS-PERSIST-01 | IETF draft | Standing validation record, no per-renewal DNS updates |
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
subgraph "Control Plane"
|
||||
API["REST API + Dashboard\n:8443"]
|
||||
PG[("PostgreSQL")]
|
||||
end
|
||||
### Notifiers
|
||||
|
||||
subgraph "Your Infrastructure"
|
||||
A1["Agent"] --> T1["NGINX"]
|
||||
A2["Agent"] --> T2["Apache / HAProxy"]
|
||||
A3["Agent"] --> T3["F5 · IIS"]
|
||||
end
|
||||
| Notifier | Type |
|
||||
|----------|------|
|
||||
| Email (SMTP) | `Email` |
|
||||
| Webhooks | `Webhook` |
|
||||
| Slack | `Slack` |
|
||||
| Microsoft Teams | `Teams` |
|
||||
| PagerDuty | `PagerDuty` |
|
||||
| OpsGenie | `OpsGenie` |
|
||||
|
||||
API --> PG
|
||||
A1 & A2 & A3 -->|"CSR + status\n(no private keys)"| API
|
||||
API -->|"Signed certs"| A1 & A2 & A3
|
||||
API -->|"Issue/Renew"| CA["Certificate Authorities\nLocal CA · ACME · step-ca · OpenSSL"]
|
||||
```
|
||||
All connectors are pluggable — build your own by implementing the [connector interface](docs/connectors.md).
|
||||
|
||||
### Screenshots
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
|  |  |
|
||||
| **Dashboard** — real-time stats, expiration heatmap, renewal trends, issuance rate | **Certificates** — full inventory with status filters, environment, owner, team |
|
||||
|  |  |
|
||||
| **Agents** — fleet health, hostname, OS/arch, IP, version tracking | **Fleet Overview** — OS distribution, status breakdown, version analysis |
|
||||
|  |  |
|
||||
| **Jobs** — issuance, renewal, deployment job queue with status filters | **Notifications** — expiration warnings, renewal results, unread/all toggle |
|
||||
|  |  |
|
||||
| **Policies** — enforcement rules for ownership, environments, lifetime, renewal | **Profiles** — enrollment templates with key types, max TTL, crypto constraints |
|
||||
|  |  |
|
||||
| **Issuers** — CA connectors (Local CA, Let's Encrypt, step-ca, DigiCert) | **Targets** — deployment targets (NGINX, F5 BIG-IP, IIS, HAProxy) |
|
||||
|  |  |
|
||||
| **Owners** — certificate ownership with email and team assignment | **Teams** — organizational grouping for notification routing |
|
||||
|  |  |
|
||||
| **Agent Groups** — dynamic grouping by OS, arch, CIDR, version | **Audit Trail** — immutable log with filters, CSV/JSON export |
|
||||
|  | |
|
||||
| **Short-Lived Credentials** — ephemeral certs with live TTL countdown | |
|
||||
<table>
|
||||
<tr>
|
||||
<td><a href="docs/screenshots/v2-dashboard.png"><img src="docs/screenshots/v2-dashboard.png" width="400" alt="Dashboard"></a><br><b>Dashboard</b><br><sub>Stats, expiration heatmap, renewal trends, issuance rate</sub></td>
|
||||
<td><a href="docs/screenshots/v2-certificates.png"><img src="docs/screenshots/v2-certificates.png" width="400" alt="Certificates"></a><br><b>Certificates</b><br><sub>Inventory with bulk ops, status filters, owner/team columns</sub></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><a href="docs/screenshots/v2-issuers.png"><img src="docs/screenshots/v2-issuers.png" width="400" alt="Issuers"></a><br><b>Issuers</b><br><sub>Catalog with 10 CA types, GUI config, test connection</sub></td>
|
||||
<td><a href="docs/screenshots/v2-jobs.png"><img src="docs/screenshots/v2-jobs.png" width="400" alt="Jobs"></a><br><b>Jobs</b><br><sub>Issuance, renewal, deployment queue with approval workflow</sub></td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
**[See all screenshots →](docs/screenshots/)**
|
||||
|
||||
## Why certctl
|
||||
|
||||
Certificate lifecycle tooling falls into two camps: enterprise platforms (Venafi, Keyfactor) that cost six figures and take months to deploy, or single-purpose tools (certbot, cert-manager) that handle one slice of the problem. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, and target-agnostic. If you're running certbot cron jobs, manually renewing certs, or stitching together scripts across mixed infrastructure, certctl replaces all of that.
|
||||
|
||||
Built for **platform engineering and DevOps teams** managing 10–500+ certificates, **security and compliance teams** who need audit trails and policy enforcement for SOC 2, PCI-DSS 4.0, or NIST SP 800-57 ([compliance mapping included](docs/compliance.md)), and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For a detailed comparison, see [Why certctl?](docs/why-certctl.md)
|
||||
|
||||
**Architecture.** Go 1.25 control plane with handler→service→repository layering, PostgreSQL 16 backend (21 tables), and a pull-only deployment model — the server never initiates outbound connections. Agents poll for work. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). Background scheduler runs 7 loops: renewal with ARI integration (1h), job processing (30s), agent health (2m), notifications (1m), short-lived cert expiry (30s), network scanning (6h), certificate digest (24h). See [Architecture Guide](docs/architecture.md) for full system diagrams.
|
||||
|
||||
**Security-first.** Agents generate ECDSA P-256 keys locally — private keys never touch the control plane. API key auth enforced by default with SHA-256 hashing and constant-time comparison. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Atomic idempotency guards on scheduler loops. Issuer and target credentials encrypted at rest with AES-256-GCM. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit.
|
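The hashed, constant-time API key check mentioned above can be sketched roughly as follows. This is a minimal illustration using only the Go standard library, not certctl's actual code: `hashKey` and `keyMatches` are hypothetical names, and the sample key is a placeholder.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// hashKey returns the SHA-256 digest of an API key, so that only the digest
// needs to be kept by the server. (Hypothetical helper.)
func hashKey(key string) [32]byte {
	return sha256.Sum256([]byte(key))
}

// keyMatches hashes the presented key and compares it to the stored digest
// with subtle.ConstantTimeCompare, which does not return early on the first
// differing byte. (Hypothetical helper.)
func keyMatches(presented string, stored [32]byte) bool {
	got := hashKey(presented)
	return subtle.ConstantTimeCompare(got[:], stored[:]) == 1
}

func main() {
	stored := hashKey("change-me-in-production") // placeholder key
	fmt.Println(keyMatches("change-me-in-production", stored)) // true
	fmt.Println(keyMatches("wrong-key", stored))               // false
}
```

Comparing fixed-length digests rather than raw keys also means the comparison time does not depend on the length of the presented key.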
||||
|
||||
**Key design decisions.** TEXT primary keys — human-readable prefixed IDs (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resources at a glance in logs and queries. Idempotent migrations (`IF NOT EXISTS`, `ON CONFLICT DO NOTHING`) safe for repeated execution. Dynamic configuration via GUI with AES-256-GCM encrypted credential storage and env var backward compatibility. Handlers define their own service interfaces for clean dependency inversion.
|
||||
|
||||
## What It Does
|
||||
|
||||
**Automated lifecycle.** Certificates renew and deploy themselves. The scheduler monitors expiration, issues through your CA, and deploys to targets — zero human intervention. ACME ARI (RFC 9773) lets the CA direct renewal timing. Ready for 47-day (SC-081v3) and 6-day (Let's Encrypt shortlived) certificate lifetimes.
|
||||
|
||||
**Operational dashboard.** 26-page GUI covers the entire lifecycle: certificate inventory with bulk ops, deployment timeline with rollback, discovery triage, network scan management, agent fleet health, short-lived credential countdown, approval workflows, and observability metrics. Configure issuers and targets from the dashboard — no env var editing, no server restarts.
|
||||
|
||||
**Private keys stay on your servers.** Agents generate ECDSA P-256 keys locally, submit only the CSR. The control plane never touches private keys. After deployment, agents probe the live TLS endpoint and compare SHA-256 fingerprints to confirm the right certificate is actually being served.
|
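The post-deployment check described above (probe the live endpoint, compare SHA-256 fingerprints) might look like the sketch below. The function names, the 5-second timeout, and the decision to skip chain validation during the probe are assumptions made for illustration; the agent's real implementation may differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"net"
	"time"
)

// fingerprint returns the hex-encoded SHA-256 digest of a DER-encoded certificate.
func fingerprint(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// servedFingerprint connects to addr (host:port) and returns the fingerprint
// of the leaf certificate the endpoint presents. Chain validation is skipped
// on purpose in this sketch: the probe only answers "which certificate is
// being served", not "is it trusted".
func servedFingerprint(addr string) (string, error) {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("no certificate presented by %s", addr)
	}
	return fingerprint(certs[0].Raw), nil
}

func main() {
	// Placeholder bytes stand in for the DER encoding of the deployed certificate.
	deployed := []byte("placeholder DER bytes")
	fmt.Println("deployed fingerprint:", fingerprint(deployed))
	// servedFingerprint("localhost:443") would return the fingerprint of the
	// certificate actually being served, to compare against the value above.
}
```

A mismatch between the two fingerprints would indicate that the server is still serving an older certificate, for example because a reload did not take effect.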
||||
|
||||
**Discovery.** Agents scan filesystems for existing PEM/DER certificates. The network scanner probes TLS endpoints across CIDR ranges without agents. Cloud discovery finds certificates in AWS Secrets Manager, Azure Key Vault, and GCP Secret Manager. Continuous TLS health monitoring tracks endpoint status (healthy/degraded/down/cert_mismatch) with configurable thresholds and historical probe data. All discovery modes feed into a unified triage workflow — claim, dismiss, or import what you find.
|
||||
|
||||
**Policy engine.** Certificate profiles constrain key types, max TTL, and EKUs — with crypto policy enforcement that validates every CSR against profile rules before it reaches the issuer. MaxTTL caps are enforced per issuer connector. Approval workflows pause jobs for human review. Ownership tracking routes notifications to the right team. Agent groups match devices by OS, architecture, IP CIDR, and version.
|
||||
|
||||
**Enrollment protocols.** EST server (RFC 7030) for device and WiFi enrollment. SCEP server (RFC 8894) for MDM platforms and network devices. S/MIME issuance with email protection EKU.
|
||||
|
||||
**Revocation.** Single and bulk revocation (by profile, owner, agent, or issuer). DER-encoded X.509 CRL per issuer, signed by the issuing CA. Embedded OCSP responder. RFC 5280 reason codes. Short-lived certs (TTL < 1 hour) are exempt — expiry is sufficient revocation.
|
||||
|
||||
**Audit and observability.** Immutable append-only audit trail records every lifecycle action, every API call, and every approval decision. Prometheus metrics endpoint. Scheduled certificate digest emails. Continuous endpoint health monitoring with state machine transitions and real-time alerts.
|
||||
|
||||
**Notifications.** Slack, Teams, PagerDuty, OpsGenie, SMTP, webhooks. Routed by certificate owner. Daily digest emails with stats and expiring certs.
|
||||
|
||||
**Multiple interfaces.** REST API (111 routes), CLI (12 commands), MCP server (80 tools for Claude, Cursor, Windsurf), Helm chart, web dashboard. Certificate export in PEM and PKCS#12.
|
||||
|
||||
**First-run onboarding.** Wizard guides you through connecting a CA, deploying an agent, and issuing your first certificate. Or start with the pre-populated demo — 32 certificates, 10 issuers, 180 days of history.
|
||||
|
||||
For the complete capability breakdown, see the [Feature Inventory](docs/features.md).
|
||||
|
||||
## Quick Start
|
||||
|
||||
@@ -134,180 +197,167 @@ cd certctl
|
||||
docker compose -f deploy/docker-compose.yml up -d --build
|
||||
```
|
||||
|
||||
Wait ~30 seconds, then open **http://localhost:8443** in your browser.
|
||||
Wait ~30 seconds, then open **https://localhost:8443** in your browser. (The shipped `docker-compose.yml` self-signs a cert via the `certctl-tls-init` init container on first boot — accept the browser warning for the demo, or feed the generated `ca.crt` to your client.) The onboarding wizard walks you through connecting a CA, deploying an agent, and issuing your first certificate.
|
||||
|
||||
The dashboard comes pre-loaded with 15 demo certificates, 5 agents, policy rules, audit events, and notifications — a realistic snapshot of a certificate inventory so you can explore immediately.
|
||||
**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history:
|
||||
|
||||
Verify the API:
|
||||
```bash
|
||||
curl http://localhost:8443/health
|
||||
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
|
||||
```
|
||||
|
||||
The `deploy/` directory has four compose files: `docker-compose.yml` (base platform), `docker-compose.demo.yml` (demo data overlay), `docker-compose.dev.yml` (PgAdmin + debug logging), and `docker-compose.test.yml` (standalone integration tests with real CA backends). See the [Docker Compose Environments Guide](deploy/ENVIRONMENTS.md) for a service-by-service walkthrough, or the [Quick Start](docs/quickstart.md#docker-compose-environments) for a summary.
|
||||
|
||||
```bash
|
||||
curl --cacert <(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
|
||||
# {"status":"healthy"}
|
||||
|
||||
curl -s http://localhost:8443/api/v1/certificates | jq '.total'
|
||||
# 15
|
||||
```
|
||||
|
||||
### Manual Build
|
||||
The control plane is HTTPS-only (TLS 1.3, no plaintext listener). See [`docs/tls.md`](docs/tls.md) for cert provisioning patterns and [`docs/upgrade-to-tls.md`](docs/upgrade-to-tls.md) if you're upgrading from a pre-v2.2 release.
|
||||
|
||||
### Agent Install (One-Liner)
|
||||
|
||||
```bash
|
||||
# Prerequisites: Go 1.25+, PostgreSQL 16+
|
||||
go mod download
|
||||
make build
|
||||
|
||||
# Set up database
|
||||
export CERTCTL_DATABASE_URL="postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable"
|
||||
export CERTCTL_AUTH_TYPE=none
|
||||
make migrate-up
|
||||
|
||||
# Start server
|
||||
./bin/server
|
||||
|
||||
# Start agent (separate terminal)
|
||||
export CERTCTL_SERVER_URL=http://localhost:8443
|
||||
export CERTCTL_API_KEY=change-me-in-production
|
||||
export CERTCTL_AGENT_NAME=local-agent
|
||||
export CERTCTL_AGENT_ID=agent-local-01
|
||||
./bin/agent --agent-id=agent-local-01
|
||||
curl -sSL https://raw.githubusercontent.com/shankar0123/certctl/master/install-agent.sh | bash
|
||||
```
|
||||
|
||||
## Architecture
|
||||
The script detects your OS and architecture, downloads the agent binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See [install-agent.sh](install-agent.sh) for details.
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
subgraph "Control Plane (certctl-server)"
|
||||
DASH["Web Dashboard\nReact SPA"]
|
||||
API["REST API\nGo 1.25 net/http"]
|
||||
SVC["Service Layer"]
|
||||
REPO["Repository Layer\ndatabase/sql + lib/pq"]
|
||||
SCHED["Scheduler\nRenewal · Jobs · Health · Notifications · Short-Lived Expiry · Network Scan"]
|
||||
end
|
||||
### Helm Chart (Kubernetes)
|
||||
|
||||
subgraph "Data Store"
|
||||
PG[("PostgreSQL 16\n21 tables\nTEXT primary keys")]
|
||||
end
|
||||
|
||||
subgraph "Agents"
|
||||
AG["certctl-agent\nKey generation · CSR · Deployment"]
|
||||
end
|
||||
|
||||
DASH --> API
|
||||
API --> SVC --> REPO --> PG
|
||||
SCHED --> SVC
|
||||
AG -->|"Heartbeat + CSR"| API
|
||||
API -->|"Cert + Chain"| AG
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--set server.apiKey=your-api-key \
|
||||
--set postgres.password=your-db-password
|
||||
```
|
||||
|
||||
### Key Design Decisions
|
||||
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See [values.yaml](deploy/helm/certctl/values.yaml) for all configuration options.
|
||||
|
||||
- **Private keys isolated from the control plane.** Agents generate ECDSA P-256 keys locally and submit CSRs (public key only). The server signs the CSR and returns the certificate — private keys never touch the control plane. Server-side keygen is available via `CERTCTL_KEYGEN_MODE=server` for demo/development only.
|
||||
- **TEXT primary keys, not UUIDs.** IDs are human-readable prefixed strings (`mc-api-prod`, `t-platform`, `o-alice`) so you can identify resource types at a glance in logs and queries.
|
||||
- **Handler → Service → Repository layering.** Handlers define their own service interfaces for clean dependency inversion. No global service singletons.
|
||||
- **Idempotent migrations.** All schema uses `IF NOT EXISTS` and seed data uses `ON CONFLICT (id) DO NOTHING`, safe for repeated execution.
|
||||
### Docker Pull
|
||||
|
||||
### Database Schema
|
||||
```bash
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-server
|
||||
docker pull shankar0123.docker.scarf.sh/certctl-agent
|
||||
```
|
||||
|
||||
| Table | Purpose |
|
||||
|-------|---------|
|
||||
| `managed_certificates` | Certificate records with metadata, status, expiry, tags |
|
||||
| `certificate_versions` | Historical versions with PEM chains and CSRs |
|
||||
| `renewal_policies` | Renewal window, auto-renew settings, retry config, alert thresholds |
|
||||
| `issuers` | CA configurations (Local CA, ACME, etc.) |
|
||||
| `deployment_targets` | Target systems (NGINX, F5, IIS) with agent assignments |
|
||||
| `agents` | Registered agents with heartbeat tracking, OS/arch/IP metadata |
|
||||
| `jobs` | Issuance, renewal, deployment, and validation jobs |
|
||||
| `teams` | Organizational groups for certificate ownership |
|
||||
| `owners` | Individual owners with email for notifications |
|
||||
| `policy_rules` | Enforcement rules (allowed issuers, environments, metadata) |
|
||||
| `policy_violations` | Flagged non-compliance with severity levels |
|
||||
| `audit_events` | Immutable action log (append-only, no update/delete) |
|
||||
| `notification_events` | Email and webhook notification records |
|
||||
| `certificate_target_mappings` | Many-to-many cert ↔ target relationships |
|
||||
| `certificate_profiles` | Named enrollment profiles with allowed key types, max TTL, crypto constraints |
|
||||
| `agent_groups` | Dynamic device grouping by OS, architecture, IP CIDR, version |
|
||||
| `agent_group_members` | Manual include/exclude membership for agent groups |
|
||||
| `certificate_revocations` | Revocation records with RFC 5280 reason codes, serial numbers, issuer notification status |
|
||||
| `discovered_certificates` | Filesystem and network-discovered certificates with fingerprint deduplication |
|
||||
| `discovery_scans` | Discovery scan history with timestamps and agent attribution |
|
||||
| `network_scan_targets` | Network scan target definitions with CIDRs, ports, schedule, and scan metrics |
|
||||
## Verifying this release
|
||||
|
||||
## Configuration
|
||||
Every `v*` tag publishes signed, attested release artefacts. Binaries
|
||||
(`certctl-agent`, `certctl-server`, `certctl-cli`, `certctl-mcp-server` for
|
||||
`linux|darwin × amd64|arm64`) ship alongside a `checksums.txt`, per-binary
|
||||
SPDX-JSON SBOMs, Cosign signatures, and SLSA Level 3 provenance. Container
|
||||
images on `ghcr.io/shankar0123/certctl-{server,agent}` are built with
|
||||
`docker/build-push-action` `provenance: mode=max` + `sbom: true` and are
|
||||
additionally signed with Cosign at the image digest.
|
||||
|
||||
All server environment variables use the `CERTCTL_` prefix:
|
||||
All signatures use Cosign keyless OIDC; the signing identity is the
|
||||
release workflow running on a signed tag.
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_SERVER_HOST` | `127.0.0.1` | Server bind address |
|
||||
| `CERTCTL_SERVER_PORT` | `8080` | Server listen port |
|
||||
| `CERTCTL_DATABASE_URL` | `postgres://localhost/certctl` | PostgreSQL connection string |
|
||||
| `CERTCTL_DATABASE_MAX_CONNS` | `25` | Connection pool size |
|
||||
| `CERTCTL_LOG_LEVEL` | `info` | Log level: `debug`, `info`, `warn`, `error` |
|
||||
| `CERTCTL_LOG_FORMAT` | `json` | Log format: `json` or `text` |
|
||||
| `CERTCTL_AUTH_TYPE` | `api-key` | Auth mode: `api-key`, `jwt`, or `none` |
|
||||
| `CERTCTL_AUTH_SECRET` | — | Required for `api-key` and `jwt` auth types |
|
||||
| `CERTCTL_KEYGEN_MODE` | `agent` | Key generation mode: `agent` (production) or `server` (demo only) |
|
||||
| `CERTCTL_ACME_DIRECTORY_URL` | — | ACME directory URL (e.g., Let's Encrypt staging) |
|
||||
| `CERTCTL_ACME_EMAIL` | — | Contact email for ACME account registration |
|
||||
| `CERTCTL_ACME_CHALLENGE_TYPE` | — | ACME challenge type: `http-01` (default) or `dns-01` |
|
||||
| `CERTCTL_CA_CERT_PATH` | — | Path to CA certificate for sub-CA mode |
|
||||
| `CERTCTL_CA_KEY_PATH` | — | Path to CA private key for sub-CA mode |
|
||||
| `CERTCTL_CORS_ORIGINS` | — | Comma-separated allowed CORS origins (empty = same-origin, `*` = all) |
|
||||
| `CERTCTL_RATE_LIMIT_ENABLED` | `true` | Enable/disable token bucket rate limiting |
|
||||
| `CERTCTL_RATE_LIMIT_RPS` | `50` | Requests per second limit |
|
||||
| `CERTCTL_RATE_LIMIT_BURST` | `100` | Maximum burst size for rate limiter |
|
||||
| `CERTCTL_DATABASE_MIGRATIONS_PATH` | `./migrations` | Path to SQL migration files |
|
||||
| `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL` | `1h` | How often the scheduler checks for expiring certs |
|
||||
| `CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL` | `30s` | How often the scheduler processes pending jobs |
|
||||
| `CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL` | `2m` | How often the scheduler checks agent health |
|
||||
| `CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL` | `1m` | How often the scheduler processes pending notifications |
|
||||
| `CERTCTL_ACME_DNS_PRESENT_SCRIPT` | — | Script to create DNS-01 `_acme-challenge` TXT record |
|
||||
| `CERTCTL_ACME_DNS_CLEANUP_SCRIPT` | — | Script to remove DNS-01 `_acme-challenge` TXT record |
|
||||
| `CERTCTL_STEPCA_URL` | — | step-ca server URL |
|
||||
| `CERTCTL_STEPCA_PROVISIONER` | — | step-ca JWK provisioner name |
|
||||
| `CERTCTL_STEPCA_KEY_PATH` | — | Path to step-ca provisioner private key (JWK JSON) |
|
||||
| `CERTCTL_STEPCA_PASSWORD` | — | step-ca provisioner key password |
|
||||
| `CERTCTL_OPENSSL_SIGN_SCRIPT` | — | Script for OpenSSL/Custom CA certificate signing |
|
||||
| `CERTCTL_OPENSSL_REVOKE_SCRIPT` | — | Script for OpenSSL/Custom CA certificate revocation |
|
||||
| `CERTCTL_OPENSSL_CRL_SCRIPT` | — | Script for OpenSSL/Custom CA CRL generation |
|
||||
| `CERTCTL_OPENSSL_TIMEOUT_SECONDS` | `30` | Timeout for OpenSSL script execution |
|
||||
| `CERTCTL_NETWORK_SCAN_ENABLED` | `false` | Enable server-side network certificate discovery (TLS scanning) |
|
||||
| `CERTCTL_NETWORK_SCAN_INTERVAL` | `6h` | How often the scheduler runs network scans |
|
||||
| `CERTCTL_EST_ENABLED` | `false` | Enable EST (RFC 7030) enrollment endpoints under /.well-known/est/ |
|
||||
| `CERTCTL_EST_ISSUER_ID` | `iss-local` | Issuer connector ID used for EST certificate enrollment |
|
||||
| `CERTCTL_EST_PROFILE_ID` | — | Optional certificate profile ID to constrain EST enrollments |
|
||||
| `CERTCTL_SLACK_WEBHOOK_URL` | — | Slack incoming webhook URL for notifications |
|
||||
| `CERTCTL_TEAMS_WEBHOOK_URL` | — | Microsoft Teams incoming webhook URL |
|
||||
| `CERTCTL_PAGERDUTY_ROUTING_KEY` | — | PagerDuty Events API v2 routing key |
|
||||
| `CERTCTL_OPSGENIE_API_KEY` | — | OpsGenie Alert API key |
|
||||
**1. Verify SHA-256 checksums:**
|
||||
|
||||
Agent environment variables:
|
||||
```bash
|
||||
sha256sum -c checksums.txt
|
||||
```
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_SERVER_URL` | `http://localhost:8080` | Control plane URL |
|
||||
| `CERTCTL_API_KEY` | — | Agent API key |
|
||||
| `CERTCTL_AGENT_NAME` | `certctl-agent` | Agent display name |
|
||||
| `CERTCTL_AGENT_ID` | — | Registered agent ID (required) |
|
||||
| `CERTCTL_KEY_DIR` | `/var/lib/certctl/keys` | Directory for storing private keys (agent keygen mode) |
|
||||
| `CERTCTL_DISCOVERY_DIRS` | — | Comma-separated directories to scan for existing certificates (e.g., `/etc/nginx/certs,/etc/ssl/certs`) |
|
||||
**2. Verify the Cosign signature on `checksums.txt`:**
|
||||
|
||||
Docker Compose overrides these for the demo stack (see `deploy/docker-compose.yml`): port `8443`, auth type `none`, database pointing to the postgres container.
|
||||
```bash
|
||||
cosign verify-blob \
|
||||
--bundle checksums.txt.sigstore.json \
|
||||
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
|
||||
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
|
||||
checksums.txt
|
||||
```
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
Every individual binary ships with its own `.sigstore.json` bundle (a unified Sigstore bundle containing the signature, certificate chain, and Rekor inclusion proof). To verify a binary directly, swap `checksums.txt` for the binary name and point `--bundle` at the matching `<binary>.sigstore.json`.
|
||||
|
||||
certctl ships a standalone MCP (Model Context Protocol) server that exposes all 78 API endpoints as tools for AI assistants — Claude, Cursor, Windsurf, OpenClaw, VS Code Copilot, and any MCP-compatible client.
|
||||
**3. Verify SLSA Level 3 provenance on a binary:**
|
||||
|
||||
```bash
slsa-verifier verify-artifact \
  --provenance-path multiple.intoto.jsonl \
  --source-uri github.com/shankar0123/certctl \
  --source-tag v2.1.0 \
  certctl-agent-linux-amd64
```
|
||||
|
||||
**4. Verify a container image signature and its SBOM / provenance attestations:**
|
||||
|
||||
```bash
IMAGE=ghcr.io/shankar0123/certctl-server:v2.1.0

cosign verify \
  --certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  "$IMAGE"

# SBOM attestation (SPDX-JSON, emitted by docker/build-push-action)
cosign verify-attestation --type spdxjson \
  --certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  "$IMAGE"

# SLSA provenance attestation (docker/build-push-action `provenance: mode=max`)
cosign verify-attestation --type slsaprovenance \
  --certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
  --certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
  "$IMAGE"
```
|
||||
|
||||
## Examples
|
||||
|
||||
Pick the scenario closest to your setup and have it running in 2 minutes.
|
||||
|
||||
| Example | Scenario |
|---------|----------|
| [`examples/acme-nginx/`](examples/acme-nginx/) | Let's Encrypt + NGINX, HTTP-01 challenges |
| [`examples/acme-wildcard-dns01/`](examples/acme-wildcard-dns01/) | Wildcard certs via DNS-01 (Cloudflare hook included) |
| [`examples/private-ca-traefik/`](examples/private-ca-traefik/) | Local CA (self-signed or sub-CA) + Traefik file provider |
| [`examples/step-ca-haproxy/`](examples/step-ca-haproxy/) | Smallstep step-ca + HAProxy combined PEM |
| [`examples/multi-issuer/`](examples/multi-issuer/) | ACME for public + Local CA for internal, one dashboard |
|
||||
|
||||
Each directory contains a `docker-compose.yml` and a `README.md` explaining the scenario, prerequisites, and customization.
|
||||
|
||||
## CLI
|
||||
|
||||
```bash
|
||||
# Install
|
||||
go install github.com/shankar0123/certctl/cmd/mcp-server@latest
|
||||
go install github.com/shankar0123/certctl/cmd/cli@latest
|
||||
|
||||
# Configure
|
||||
export CERTCTL_SERVER_URL=http://localhost:8443 # certctl API endpoint
|
||||
export CERTCTL_API_KEY=your-api-key # optional if auth disabled
|
||||
export CERTCTL_SERVER_URL=https://localhost:8443
|
||||
export CERTCTL_API_KEY=your-api-key
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt # or --ca-bundle on the CLI; --insecure for dev self-signed
|
||||
|
||||
# Run (stdio transport — add to your AI client config)
|
||||
# Usage
|
||||
certctl-cli certs list # List all certificates
|
||||
certctl-cli certs renew mc-api-prod # Trigger renewal
|
||||
certctl-cli certs revoke mc-api-prod --reason keyCompromise
|
||||
certctl-cli agents list # List registered agents
|
||||
certctl-cli jobs list # List jobs
|
||||
certctl-cli status # Server health + summary stats
|
||||
certctl-cli import certs.pem # Bulk import from PEM file
|
||||
certctl-cli certs list --format json # JSON output (default: table)
|
||||
```
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
|
||||
certctl ships a standalone MCP (Model Context Protocol) server that exposes all 80 API endpoints as tools for AI assistants — Claude, Cursor, Windsurf, OpenClaw, VS Code Copilot, and any MCP-compatible client.
|
||||
|
||||
```bash
|
||||
# Install and run
|
||||
go install github.com/shankar0123/certctl/cmd/mcp-server@latest
|
||||
export CERTCTL_SERVER_URL=https://localhost:8443
|
||||
export CERTCTL_API_KEY=your-api-key
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH=/path/to/ca.crt # required for self-signed bootstrap
|
||||
mcp-server
|
||||
```
|
||||
|
||||
The MCP server is env-vars-only — there are no CLI flags for TLS. If you must bypass verification for local development against a self-signed cert, set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true`. Never set that in production.
|
||||
|
||||
**Claude Desktop** (`claude_desktop_config.json`):
|
||||
```json
|
||||
{
|
||||
@@ -315,315 +365,47 @@ mcp-server
|
||||
"certctl": {
|
||||
"command": "mcp-server",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "http://localhost:8443",
|
||||
"CERTCTL_API_KEY": "your-api-key"
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_API_KEY": "your-api-key",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/ca.crt"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
78 tools organized by resource: certificates (9), CRL/OCSP (3), issuers (6), targets (5), agents (8), jobs (5), policies (6), profiles (5), teams (5), owners (5), agent groups (6), audit (2), notifications (3), stats (5), metrics (1), health (4).
|
||||
|
||||
## CLI
|
||||
|
||||
certctl ships a command-line tool for terminal-based certificate management workflows.
|
||||
|
||||
```bash
|
||||
# Install
|
||||
go install github.com/shankar0123/certctl/cmd/cli@latest
|
||||
|
||||
# Configure
|
||||
export CERTCTL_SERVER_URL=http://localhost:8443
|
||||
export CERTCTL_API_KEY=your-api-key
|
||||
|
||||
# Certificate commands
|
||||
certctl-cli certs list # List all certificates
|
||||
certctl-cli certs get mc-api-prod # Get certificate details
|
||||
certctl-cli certs renew mc-api-prod # Trigger renewal
|
||||
certctl-cli certs revoke mc-api-prod --reason keyCompromise
|
||||
|
||||
# Agent and job commands
|
||||
certctl-cli agents list # List registered agents
|
||||
certctl-cli agents get ag-web-prod # Get agent details
|
||||
certctl-cli jobs list # List jobs
|
||||
certctl-cli jobs get job-123 # Get job details
|
||||
certctl-cli jobs cancel job-123 # Cancel a pending job
|
||||
|
||||
# Operations
|
||||
certctl-cli status # Server health + summary stats
|
||||
certctl-cli import certs.pem # Bulk import from PEM file
|
||||
certctl-cli version # Show CLI version
|
||||
|
||||
# Output formats
|
||||
certctl-cli certs list --format json # JSON output (default: table)
|
||||
```
|
||||
|
||||
## API Overview
|
||||
|
||||
All endpoints are under `/api/v1/` and return JSON. List endpoints support pagination (`?page=1&per_page=50`). Full request/response schemas are available in the [OpenAPI 3.1 spec](api/openapi.yaml).
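
As a small illustration of the pagination parameters mentioned above, the following Go sketch builds a list URL. The helper name `listURL` and the example host are assumptions; only the `/api/v1/` prefix and the `page` / `per_page` parameters come from the text.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// listURL is a hypothetical helper that builds a paginated list URL such as
// https://host/api/v1/certificates?page=1&per_page=50.
func listURL(base, resource string, page, perPage int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parsing base URL %q: %w", base, err)
	}
	// Append the API prefix and resource, tolerating a trailing slash on base.
	u.Path = strings.TrimRight(u.Path, "/") + "/api/v1/" + resource
	q := url.Values{}
	q.Set("page", strconv.Itoa(page))
	q.Set("per_page", strconv.Itoa(perPage))
	u.RawQuery = q.Encode() // Encode sorts keys, so output is deterministic
	return u.String(), nil
}

func main() {
	s, err := listURL("https://localhost:8443", "certificates", 1, 50)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // https://localhost:8443/api/v1/certificates?page=1&per_page=50
}
```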

### Certificates
```
GET    /api/v1/certificates                   List (filter, sort, cursor, sparse fields)
POST   /api/v1/certificates                   Create
GET    /api/v1/certificates/{id}              Get
PUT    /api/v1/certificates/{id}              Update
DELETE /api/v1/certificates/{id}              Archive (soft delete)
GET    /api/v1/certificates/{id}/versions     Version history
GET    /api/v1/certificates/{id}/deployments  List deployment targets
POST   /api/v1/certificates/{id}/renew        Trigger renewal → 202 Accepted
POST   /api/v1/certificates/{id}/deploy       Trigger deployment → 202 Accepted
POST   /api/v1/certificates/{id}/revoke       Revoke with RFC 5280 reason code
GET    /api/v1/crl                            Certificate Revocation List (JSON)
GET    /api/v1/crl/{issuer_id}                DER-encoded X.509 CRL
GET    /api/v1/ocsp/{issuer_id}/{serial}      OCSP responder (good/revoked/unknown)
```
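
The revoke endpoint takes an RFC 5280 reason code. For reference, the sketch below maps the reason names defined in RFC 5280 section 5.3.1 to their numeric values. Whether certctl accepts every name, and whether it matches them case-sensitively, is an assumption here; the CLI example in this README uses `keyCompromise`. The helper name `reasonCode` is hypothetical.

```go
package main

import "fmt"

// crlReasonCodes lists the CRLReason values from RFC 5280 section 5.3.1.
// Value 7 is unused in the RFC and therefore absent.
var crlReasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"cACompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"removeFromCRL":        8,
	"privilegeWithdrawn":   9,
	"aACompromise":         10,
}

// reasonCode is a hypothetical helper that validates a reason name before a
// revocation request is sent.
func reasonCode(name string) (int, error) {
	code, ok := crlReasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unknown RFC 5280 revocation reason %q", name)
	}
	return code, nil
}

func main() {
	code, err := reasonCode("keyCompromise")
	if err != nil {
		panic(err)
	}
	fmt.Println("keyCompromise =", code)
}
```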

### Agents
```
GET    /api/v1/agents                              List
POST   /api/v1/agents                              Register
GET    /api/v1/agents/{id}                         Get
POST   /api/v1/agents/{id}/heartbeat               Record heartbeat
POST   /api/v1/agents/{id}/csr                     Submit CSR for issuance
GET    /api/v1/agents/{id}/certificates/{certId}   Retrieve signed certificate
GET    /api/v1/agents/{id}/work                    Poll for pending deployment jobs
POST   /api/v1/agents/{id}/jobs/{jobId}/status     Report job completion/failure
POST   /api/v1/agents/{id}/discoveries             Submit certificate discovery scan results
```

### Certificate Discovery
```
GET    /api/v1/discovered-certificates               List discovered certificates (?agent_id, ?status)
GET    /api/v1/discovered-certificates/{id}          Get discovery detail
POST   /api/v1/discovered-certificates/{id}/claim    Link discovered cert to managed cert
POST   /api/v1/discovered-certificates/{id}/dismiss  Dismiss discovery
GET    /api/v1/discovery-scans                       List discovery scan history
GET    /api/v1/discovery-summary                     Aggregated discovery status (new, claimed, dismissed counts)
```

### Infrastructure
```
GET    /api/v1/issuers             List issuers
POST   /api/v1/issuers             Create
GET    /api/v1/issuers/{id}        Get
PUT    /api/v1/issuers/{id}        Update
DELETE /api/v1/issuers/{id}        Delete
POST   /api/v1/issuers/{id}/test   Test connectivity

GET    /api/v1/targets             List deployment targets
POST   /api/v1/targets             Create
GET    /api/v1/targets/{id}        Get
PUT    /api/v1/targets/{id}        Update
DELETE /api/v1/targets/{id}        Delete
```

### Organization
```
GET    /api/v1/teams         List teams
POST   /api/v1/teams         Create
GET    /api/v1/teams/{id}    Get
PUT    /api/v1/teams/{id}    Update
DELETE /api/v1/teams/{id}    Delete
GET    /api/v1/owners        List owners
POST   /api/v1/owners        Create
GET    /api/v1/owners/{id}   Get
PUT    /api/v1/owners/{id}   Update
DELETE /api/v1/owners/{id}   Delete
```

### Operations
```
GET    /api/v1/jobs                       List (filter: status, type)
GET    /api/v1/jobs/{id}                  Get
POST   /api/v1/jobs/{id}/cancel           Cancel
POST   /api/v1/jobs/{id}/approve          Approve (interactive renewal)
POST   /api/v1/jobs/{id}/reject           Reject (interactive renewal)

GET    /api/v1/policies                   List policy rules
POST   /api/v1/policies                   Create
GET    /api/v1/policies/{id}              Get
PUT    /api/v1/policies/{id}              Update (enable/disable)
DELETE /api/v1/policies/{id}              Delete
GET    /api/v1/policies/{id}/violations   List violations for rule

GET    /api/v1/profiles                   List certificate profiles
POST   /api/v1/profiles                   Create
GET    /api/v1/profiles/{id}              Get
PUT    /api/v1/profiles/{id}              Update
DELETE /api/v1/profiles/{id}              Delete

GET    /api/v1/agent-groups               List agent groups
POST   /api/v1/agent-groups               Create
GET    /api/v1/agent-groups/{id}          Get
PUT    /api/v1/agent-groups/{id}          Update
DELETE /api/v1/agent-groups/{id}          Delete
GET    /api/v1/agent-groups/{id}/members  List members

GET    /api/v1/audit                      Query audit trail
GET    /api/v1/audit/{id}                 Get audit event
GET    /api/v1/notifications              List notifications
GET    /api/v1/notifications/{id}         Get notification
POST   /api/v1/notifications/{id}/read    Mark as read
```

### Observability
```
GET    /api/v1/stats/summary                 Dashboard summary (totals, expiring, agents, jobs)
GET    /api/v1/stats/certificates-by-status  Certificate counts grouped by status
GET    /api/v1/stats/expiration-timeline     Expiration buckets (?days=30)
GET    /api/v1/stats/job-trends              Job success/failure over time (?days=7)
GET    /api/v1/stats/issuance-rate           Certificate issuance rate (?days=7)
GET    /api/v1/metrics                       JSON metrics (gauges, counters, uptime)
GET    /api/v1/metrics/prometheus            Prometheus exposition format (text/plain)
```
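
To make "Prometheus exposition format" concrete, the sketch below renders one gauge in the text format. The metric name `certctl_example_certificates` and the helper `formatGauge` are invented for illustration; the README only states that certctl's metrics carry a `certctl_` prefix.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatGauge renders a single gauge sample in the Prometheus text
// exposition format: a HELP line, a TYPE line, then "name value".
// This sketch does not escape backslashes or newlines in the help text.
func formatGauge(name, help string, value float64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s gauge\n%s %s\n",
		name, help, name, name, strconv.FormatFloat(value, 'g', -1, 64))
}

func main() {
	// Hypothetical metric name; real certctl metric names may differ.
	fmt.Print(formatGauge("certctl_example_certificates", "Example gauge of managed certificates.", 42))
}
```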

### Network Discovery
```
GET    /api/v1/network-scan-targets            List scan targets
POST   /api/v1/network-scan-targets            Create scan target (CIDRs, ports, schedule)
GET    /api/v1/network-scan-targets/{id}       Get scan target
PUT    /api/v1/network-scan-targets/{id}       Update scan target
DELETE /api/v1/network-scan-targets/{id}       Delete scan target
POST   /api/v1/network-scan-targets/{id}/scan  Trigger immediate scan
```

### Auth
```
GET    /api/v1/auth/info    Auth mode info (no auth required)
GET    /api/v1/auth/check   Validate credentials
```

### EST Enrollment (RFC 7030)
```
GET    /.well-known/est/cacerts          CA certificate chain (PKCS#7 certs-only)
POST   /.well-known/est/simpleenroll     Simple enrollment (PEM or base64-DER CSR)
POST   /.well-known/est/simplereenroll   Simple re-enrollment (certificate renewal)
GET    /.well-known/est/csrattrs         CSR attributes request
```

### Health
```
GET    /health   Server health check
GET    /ready    Readiness check
```

## Supported Integrations

### Certificate Issuers
| Issuer | Status | Type |
|--------|--------|------|
| Local CA (self-signed + sub-CA) | Implemented | `GenericCA` |
| ACME v2 (Let's Encrypt, Sectigo) | Implemented (HTTP-01 + DNS-01) | `ACME` |
| step-ca | Implemented | `StepCA` |
| OpenSSL / Custom CA | Implemented | `OpenSSL` |
| Vault PKI | Planned | — |
| DigiCert | Planned | — |

**Note:** ADCS integration is handled via the Local CA's sub-CA mode — certctl operates as a subordinate CA with its signing certificate issued by ADCS.

### Deployment Targets
| Target | Status | Type |
|--------|--------|------|
| NGINX | Implemented | `NGINX` |
| Apache httpd | Implemented | `Apache` |
| HAProxy | Implemented | `HAProxy` |
| F5 BIG-IP | Interface only | `F5` |
| Microsoft IIS | Interface only | `IIS` |
| Kubernetes Secrets | Planned | — |

### Notifiers
| Notifier | Status | Type |
|----------|--------|------|
| Email (SMTP) | Implemented | `Email` |
| Webhooks | Implemented | `Webhook` |
| Slack | Implemented | `Slack` |
| Microsoft Teams | Implemented | `Teams` |
| PagerDuty | Implemented | `PagerDuty` |
| OpsGenie | Implemented | `OpsGenie` |
|
||||
|
||||
## Development
|
||||
|
||||
```bash
|
||||
# Install dev tools (golangci-lint, migrate CLI, air)
|
||||
make install-tools
|
||||
|
||||
# Run tests
|
||||
make test
|
||||
|
||||
# Run with coverage
|
||||
make test-coverage
|
||||
|
||||
# Lint
|
||||
make lint
|
||||
|
||||
# Format
|
||||
make fmt
|
||||
make build # Build server + agent binaries
|
||||
make test # Run tests
|
||||
make lint # golangci-lint (11 linters)
|
||||
govulncheck ./... # Vulnerability scan
|
||||
make docker-up # Start Docker Compose stack
|
||||
```
|
||||
|
||||
### Docker Compose
|
||||
|
||||
```bash
|
||||
make docker-up # Start stack (server + postgres + agent)
|
||||
make docker-down # Stop stack
|
||||
make docker-logs-server # Server logs
|
||||
make docker-logs-agent # Agent logs
|
||||
make docker-clean # Stop + remove volumes
|
||||
```
|
||||
|
||||
## Security
|
||||
|
||||
### Private Key Management
|
||||
- **Agent keygen mode (default)**: Agents generate ECDSA P-256 keys locally and store them with 0600 permissions in `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`). Only the CSR, which carries the public key, is sent to the control plane; private keys never leave agent infrastructure.
|
||||
- **Server keygen mode (demo only)**: Set `CERTCTL_KEYGEN_MODE=server` for development/demo with Local CA. The control plane generates RSA-2048 keys server-side. A log warning is emitted at startup.
|
||||
|
||||
### Authentication
|
||||
- Agent-to-server: API key (registered at agent creation)
|
||||
- API key and JWT auth types supported; `none` for demo/development
|
||||
- Auth type and secret configured via `CERTCTL_AUTH_TYPE` and `CERTCTL_AUTH_SECRET`
|
||||
|
||||
### Audit Trail
|
||||
- Immutable append-only log in PostgreSQL (`audit_events` table)
|
||||
- Every lifecycle action attributed to an actor with timestamp and resource reference
|
||||
- No update or delete operations on audit records
|
||||
- Every API call recorded to audit trail with method, path, actor, SHA-256 body hash, response status, and latency (M19)
|
||||
CI runs on every push: `go vet`, `go test -race`, `golangci-lint`, `govulncheck`, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%). Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build. 1,668 Go test functions with 625+ subtests, plus frontend test suite.
|
||||
|
||||
## Roadmap
|
||||
|
||||
### V1 (v1.0.0 released)
|
||||
All nine development milestones (M1–M9) are complete. The backend covers the full certificate lifecycle: Local CA and ACME v2 issuers, NGINX/Apache/HAProxy/F5/IIS target connectors, threshold-based expiration alerting, agent-side ECDSA P-256 key generation, API auth with rate limiting, and a full React dashboard wired to the real API. The CI pipeline runs build, vet, test with coverage gates (service layer 30%+, handler layer 50%+), frontend type checking, Vitest test suite, and Vite production build on every push. Docker images are published to GitHub Container Registry on every version tag via the release workflow.
|
||||
### V1 (v1.0.0) — Shipped
|
||||
Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector, agent-side key generation, API auth + rate limiting, React dashboard, CI pipeline with coverage gates, Docker images on GHCR.
|
||||
|
||||
### V2: Operational Maturity
|
||||
- **M10: Agent Metadata + Targets** ✅ — agents report OS, architecture, IP, hostname, version via heartbeat; Apache httpd and HAProxy target connectors
|
||||
- **M11: Crypto Policy + Profiles + Ownership** ✅ — certificate profiles (named enrollment profiles with allowed key types, max TTL, crypto constraints), certificate ownership tracking (owners + teams + notification routing), dynamic agent groups (OS/arch/IP CIDR/version matching), interactive renewal approval (AwaitingApproval state)
|
||||
- **M12: Sub-CA + DNS-01 + step-ca** ✅ — Local CA sub-CA mode (enterprise root chain with RSA/ECDSA/PKCS#8), ACME DNS-01 challenges (script-based DNS hooks for any provider, wildcard cert support), step-ca issuer connector (native /sign API with JWK provisioner auth)
|
||||
- **M15a: Core Revocation** ✅ — revocation API with all RFC 5280 reason codes, JSON CRL endpoint, webhook + email revocation notifications, best-effort issuer notification, `certificate_revocations` table with idempotent recording, 48 new tests
|
||||
- **M15b: OCSP + Revocation GUI** ✅ — embedded OCSP responder (GET /api/v1/ocsp/{issuer_id}/{serial}), DER-encoded X.509 CRL (GET /api/v1/crl/{issuer_id}), short-lived cert exemption (TTL < 1h skip CRL/OCSP), revocation GUI with reason modal, ~31 new tests
|
||||
- **M13: GUI Operations** ✅ — bulk cert operations (multi-select → renew, revoke, reassign owner), deployment status timeline, inline policy/profile editor, target connector configuration wizard, audit trail export (CSV/JSON), short-lived credentials dashboard view
|
||||
- **M14: Observability** ✅ — dashboard charts (expiration heatmap, cert status distribution, job trends, issuance rate), agent fleet overview with OS/arch grouping, JSON metrics endpoint, stats API (5 endpoints), structured logging with request IDs, deployment rollback
|
||||
- **M18a: MCP Server** ✅ (V2.1) — AI-native integration, all 78 REST API endpoints exposed as MCP tools for Claude, Cursor, OpenClaw, and any MCP-compatible client
|
||||
- **M19: Immutable API Audit Log** ✅ — every API call recorded to immutable audit trail (method, path, actor, SHA-256 body hash, status, latency), async recording via goroutine, configurable path exclusions
|
||||
- **M16a: Notifier Connectors** ✅ — Slack (incoming webhook), Microsoft Teams (MessageCard), PagerDuty (Events API v2), OpsGenie (Alert API v2) — config-driven enablement via env vars
|
||||
- **M17: Additional Connectors** ✅ — OpenSSL/Custom CA issuer connector (script-based signing with configurable timeout)
|
||||
- **M16b: CLI + Bulk Import** ✅ — `certctl-cli` with 12 subcommands (certs list/get/renew/revoke, agents list/get, jobs list/get/cancel, import, status, version), stdlib-only, JSON/table output
|
||||
- **M20: Enhanced Query API** ✅ — sparse field selection (`?fields=`), sort with direction (`?sort=-notAfter`), time-range filters (`expires_before`, `created_after`, etc.), cursor-based pagination (`?cursor=&page_size=`), `GET /certificates/{id}/deployments`, additional filters (`agent_id`, `profile_id`)
|
||||
- **M18b: Filesystem Cert Discovery** ✅ — agents scan configured directories (PEM/DER), report findings to control plane, deduplication by SHA-256 fingerprint, claim/dismiss/triage workflow via API
|
||||
- **M21: Network Cert Discovery** ✅ — server-side active TLS scanning of CIDR ranges and ports, concurrent probing (50 goroutines), CIDR expansion with /20 safety cap, sentinel agent pattern for discovery pipeline reuse, CRUD API for scan targets, scheduler integration (6h default)
|
||||
- **M22: Prometheus Metrics** ✅ — `GET /api/v1/metrics/prometheus` returns Prometheus exposition format (`text/plain; version=0.0.4`), 11 metrics with `certctl_` prefix, compatible with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics
|
||||
- **M23: EST Server (RFC 7030)** ✅ — Enrollment over Secure Transport for device/WiFi certificate enrollment, 4 endpoints under /.well-known/est/, PKCS#7 certs-only wire format, base64-encoded DER CSR input, configurable issuer + profile binding, audit trail, 28 new tests
|
||||
- **Compliance Mapping** ✅ — SOC 2 Type II, PCI-DSS 4.0, NIST SP 800-57 capability mapping documentation
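
The "CIDR expansion with /20 safety cap" in M21 can be pictured with the Go sketch below. It assumes the cap means that ranges larger than a /20 (4,096 addresses) are rejected outright; the real scanner might truncate or treat network and broadcast addresses differently. `expandCIDR` and `minPrefixBits` are hypothetical names, and the sketch handles IPv4 only.

```go
package main

import (
	"fmt"
	"net/netip"
)

// minPrefixBits is the assumed safety cap: a /20 covers 4,096 addresses.
const minPrefixBits = 20

// expandCIDR returns every address in an IPv4 CIDR range, refusing ranges
// larger than /20 so a typo such as /8 cannot trigger millions of probes.
func expandCIDR(cidr string) ([]netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, fmt.Errorf("parsing CIDR %q: %w", cidr, err)
	}
	if !prefix.Addr().Is4() {
		return nil, fmt.Errorf("CIDR %q: only IPv4 is handled in this sketch", cidr)
	}
	if prefix.Bits() < minPrefixBits {
		return nil, fmt.Errorf("CIDR %q exceeds the /%d safety cap", cidr, minPrefixBits)
	}
	prefix = prefix.Masked() // start from the network address
	var addrs []netip.Addr
	// Next returns the zero Addr past the end of the address space, which
	// Contains rejects, so the loop always terminates.
	for a := prefix.Addr(); prefix.Contains(a); a = a.Next() {
		addrs = append(addrs, a)
	}
	return addrs, nil
}

func main() {
	addrs, err := expandCIDR("10.0.0.0/30")
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs) // [10.0.0.0 10.0.0.1 10.0.0.2 10.0.0.3]
}
```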
|
||||
### V2: Operational Maturity — Shipped
|
||||
30+ milestones shipping enterprise-grade features for free.

- **Issuers:** Sub-CA mode, ACME DNS-01/DNS-PERSIST-01/EAB/ARI (RFC 9773)/profile selection, step-ca, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM PCA, Entrust, GlobalSign, EJBCA, OpenSSL/Custom CA.
- **Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets.
- **Enrollment protocols:** EST server (RFC 7030) and SCEP server (RFC 8894).
- **Revocation:** RFC 5280 revocation with DER CRL + embedded OCSP responder.
- **Governance:** Certificate profiles, ownership tracking, team assignment, agent groups, interactive approval workflows.
- **Discovery:** Filesystem, network, and cloud secret manager (AWS SM, Azure KV, GCP SM) certificate discovery with triage GUI.
- **Configuration and operations:** Dynamic issuer/target configuration via GUI with AES-256-GCM encrypted storage. First-run onboarding wizard. Post-deployment TLS verification. Certificate export (PEM/PKCS#12). S/MIME support. Prometheus metrics. Scheduled certificate digest emails.
- **Notifications:** Slack, Teams, PagerDuty, OpsGenie, SMTP.
- **Tooling:** MCP server (80 tools), CLI (12 commands), Helm chart. Agent install script.
- **Compliance and docs:** Compliance mapping (SOC 2, PCI-DSS 4.0, NIST SP 800-57). 5 turnkey deployment examples. Migration guides from certbot, acme.sh, and cert-manager.

See the [Feature Inventory](docs/features.md) for details.
|
||||
|
||||
### V3: certctl Pro
|
||||
Enterprise capabilities for larger deployments are available in the commercial tier.
|
||||
|
||||
Team access controls, identity provider integration, enterprise deployment targets, compliance and risk scoring, advanced fleet operations, event-driven architecture, advanced search, real-time operational views, and premium CA integrations.
|
||||
|
||||
> **Need SSO, RBAC, F5/IIS deployment, or real-time fleet operations?** [Join the certctl Pro waitlist](https://forms.gle/YOUR_FORM_ID) — early access shipping Q2 2026.
|
||||
|
||||
### V4+: Cloud, Scale & Passive Discovery
|
||||
Passive network discovery (TLS listener), Kubernetes integration, cloud infrastructure targets (AWS ALB/ACM, Azure Key Vault), extended CA support, and platform-scale features.
|
||||
### V4+: Cloud & Scale
|
||||
Kubernetes cert-manager external issuer, cloud infrastructure targets, extended CA support, and platform-scale features.
|
||||
|
||||
## License
|
||||
|
||||
Certctl is licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not offer certctl as a managed/hosted certificate management service to third parties.
|
||||
Certctl is licensed under the [Business Source License 1.1](LICENSE). The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial offering to third parties, whether hosted, managed, embedded, bundled, or integrated. The BSL 1.1 license converts automatically to Apache 2.0 on March 14, 2033.
|
||||
|
||||
For licensing inquiries: certctl@proton.me
|
||||
|
||||
---
|
||||
|
||||
If certctl solves a problem you have, [star the repo](https://github.com/shankar0123/certctl) to help others find it. Questions, bugs, or feature requests — [open an issue](https://github.com/shankar0123/certctl/issues).
|
||||
|
||||
@@ -8,43 +8,67 @@ import (
|
||||
"crypto/rand"
|
||||
"crypto/rsa"
|
||||
"crypto/sha256"
|
||||
"crypto/tls"
|
||||
"crypto/x509"
|
||||
"crypto/x509/pkix"
|
||||
"encoding/json"
|
||||
"encoding/pem"
|
||||
"errors"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io"
|
||||
"log/slog"
|
||||
"net"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"os"
|
||||
"os/signal"
|
||||
"path/filepath"
|
||||
"runtime"
|
||||
"strings"
|
||||
"sync"
|
||||
"syscall"
|
||||
"time"
|
||||
|
||||
"github.com/shankar0123/certctl/internal/connector/target"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/apache"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/caddy"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/envoy"
|
||||
pf "github.com/shankar0123/certctl/internal/connector/target/postfix"
|
||||
sshconn "github.com/shankar0123/certctl/internal/connector/target/ssh"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/f5"
|
||||
jks "github.com/shankar0123/certctl/internal/connector/target/javakeystore"
|
||||
k8s "github.com/shankar0123/certctl/internal/connector/target/k8ssecret"
|
||||
wcs "github.com/shankar0123/certctl/internal/connector/target/wincertstore"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/haproxy"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/iis"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/nginx"
|
||||
"github.com/shankar0123/certctl/internal/connector/target/traefik"
|
||||
)
|
||||
|
||||
// AgentConfig represents the agent-side configuration.
|
||||
type AgentConfig struct {
|
||||
ServerURL string // Control plane server URL (e.g., http://localhost:8443)
|
||||
APIKey string // Agent API key for authentication
|
||||
AgentName string // Agent name for identification
|
||||
AgentID string // Agent ID for API calls (set after registration or from env)
|
||||
Hostname string // Server hostname
|
||||
KeyDir string // Directory for storing private keys (default: /var/lib/certctl/keys)
|
||||
DiscoveryDirs []string // Directories to scan for certificates (comma-separated via env)
|
||||
ServerURL string // Control plane server URL (e.g., https://localhost:8443) — must be https:// scheme
|
||||
APIKey string // Agent API key for authentication
|
||||
AgentName string // Agent name for identification
|
||||
AgentID string // Agent ID for API calls (set after registration or from env)
|
||||
Hostname string // Server hostname
|
||||
KeyDir string // Directory for storing private keys (default: /var/lib/certctl/keys)
|
||||
DiscoveryDirs []string // Directories to scan for certificates (comma-separated via env)
|
||||
CABundlePath string // Optional path to a PEM-encoded CA bundle that signed the server's cert (empty = system roots)
|
||||
InsecureSkipVerify bool // Dev-only: skip TLS certificate verification. Never enable in production. See docs/tls.md.
|
||||
}
|
||||
|
||||
// ErrAgentRetired is the sentinel returned by [Agent.Run] when the control
|
||||
// plane responds with HTTP 410 Gone to a heartbeat or work-poll request — the
|
||||
// canonical signal that this agent's row has been soft-retired server-side
|
||||
// (see I-004 in cowork/certctl-coverage-gap-audit.md). The binary must
|
||||
// terminate cleanly: an init-system restart would only produce another 410
|
||||
// and wedge the host in a restart loop. main() translates this sentinel into
|
||||
// a zero exit code so systemd (Restart=on-failure) and launchd do not respawn
|
||||
// the process. Do not wrap this error — main() matches it with errors.Is.
|
||||
var ErrAgentRetired = fmt.Errorf("agent retired by control plane")
|
||||
|
||||
// Agent represents the local agent that runs on target servers.
|
||||
// It periodically sends heartbeats, polls for work, executes deployment and CSR jobs,
|
||||
// and scans configured directories for existing certificates.
|
||||
@@ -60,6 +84,17 @@ type Agent struct {
|
||||
pollInterval time.Duration
|
||||
discoveryInterval time.Duration
|
||||
consecutiveFailures int
|
||||
|
||||
// I-004: terminal retirement signal. retiredSignal is closed exactly once
|
||||
// (guarded by retiredOnce) when either sendHeartbeat or pollForWork
|
||||
// observes HTTP 410 Gone. The Run() select loop picks up the close and
|
||||
// returns ErrAgentRetired, unwinding the goroutine cleanly so main() can
|
||||
// log + exit(0). Using a channel + sync.Once (rather than an atomic bool
|
||||
// + polling) lets us fall through the select statement immediately instead
|
||||
// of waiting for the next ticker; the zero-allocation close is safe to
|
||||
// race with ctx.Done() and other cases.
|
||||
retiredOnce sync.Once
|
||||
retiredSignal chan struct{}
|
||||
}
|
||||
|
||||
// WorkResponse represents the response from the work polling endpoint.
|
||||
@@ -82,15 +117,78 @@ type JobItem struct {
|
||||
}
|
||||
|
||||
// NewAgent creates a new agent instance.
|
||||
func NewAgent(cfg *AgentConfig, logger *slog.Logger) *Agent {
|
||||
//
|
||||
// The returned HTTP client enforces HTTPS-only control-plane access per the
|
||||
// HTTPS-Everywhere milestone (see docs/tls.md). TLS 1.3 is required; the
|
||||
// optional CABundlePath loads a PEM bundle into RootCAs so the agent can
|
||||
// trust internal / self-signed server certs without touching system trust
|
||||
// stores. InsecureSkipVerify is a dev-only escape hatch — callers must log a
|
||||
// loud warning when it's set; never enable in production (see §2.4 of the
|
||||
// milestone spec and docs/upgrade-to-tls.md).
|
||||
//
|
||||
// Returns an error if CABundlePath is set but unreadable or malformed — fail
|
||||
// loud at startup rather than silently fall back to system roots, which would
|
||||
// turn a misconfigured bundle path into a cryptic "x509: certificate signed
|
||||
// by unknown authority" on the first heartbeat.
|
||||
func NewAgent(cfg *AgentConfig, logger *slog.Logger) (*Agent, error) {
|
||||
tlsConfig := &tls.Config{
|
||||
MinVersion: tls.VersionTLS13,
|
||||
InsecureSkipVerify: cfg.InsecureSkipVerify, //nolint:gosec // opt-in dev escape hatch, documented in docs/tls.md
|
||||
}
|
||||
if cfg.CABundlePath != "" {
|
||||
pemBytes, err := os.ReadFile(cfg.CABundlePath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("reading CA bundle at %q: %w", cfg.CABundlePath, err)
|
||||
}
|
||||
pool := x509.NewCertPool()
|
||||
if !pool.AppendCertsFromPEM(pemBytes) {
|
||||
return nil, fmt.Errorf("CA bundle at %q contains no valid PEM-encoded certificates", cfg.CABundlePath)
|
||||
}
|
||||
tlsConfig.RootCAs = pool
|
||||
}
|
||||
|
||||
httpClient := &http.Client{
|
||||
Timeout: 30 * time.Second,
|
||||
Transport: &http.Transport{
|
||||
TLSClientConfig: tlsConfig,
|
||||
ForceAttemptHTTP2: true,
|
||||
MaxIdleConns: 10,
|
||||
IdleConnTimeout: 90 * time.Second,
|
||||
TLSHandshakeTimeout: 10 * time.Second,
|
||||
ExpectContinueTimeout: 1 * time.Second,
|
||||
},
|
||||
}
|
||||
|
||||
return &Agent{
|
||||
config: cfg,
|
||||
logger: logger,
|
||||
client: &http.Client{Timeout: 30 * time.Second},
|
||||
client: httpClient,
|
||||
heartbeatInterval: 60 * time.Second,
|
||||
pollInterval: 30 * time.Second,
|
||||
discoveryInterval: 6 * time.Hour, // scan for certs every 6 hours
|
||||
}
|
||||
retiredSignal: make(chan struct{}),
|
||||
}, nil
|
||||
}
|
||||
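The CA-bundle handling in NewAgent can be exercised in isolation. The following is a minimal sketch, not the code from this change: `newClientTLSConfig` is a hypothetical helper that takes the bundle bytes directly instead of a file path, and it assumes the same policy as above (TLS 1.3 minimum, hard failure on a bundle with no usable certificates).

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newClientTLSConfig is a hypothetical helper (not part of the diff). It takes
// the raw bytes of an optional CA bundle rather than a path on disk.
func newClientTLSConfig(caPEM []byte) (*tls.Config, error) {
	cfg := &tls.Config{MinVersion: tls.VersionTLS13}
	if len(caPEM) == 0 {
		// A nil RootCAs makes crypto/tls fall back to the host's root CA set.
		return cfg, nil
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		// Fail at startup instead of silently falling back to system roots.
		return nil, errors.New("CA bundle contains no valid PEM-encoded certificates")
	}
	cfg.RootCAs = pool
	return cfg, nil
}

func main() {
	_, err := newClientTLSConfig([]byte("not a PEM bundle"))
	fmt.Println("malformed bundle rejected:", err != nil)
}
```

Returning an error for a malformed bundle keeps the failure next to its cause; the alternative would surface later as a certificate verification error on the first request.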
|
||||
// markRetired records that the control plane has declared this agent retired
|
||||
// (HTTP 410 Gone on heartbeat or work poll). Idempotent via sync.Once — if
|
||||
// both the heartbeat and work-poll paths observe 410 in the same tick, only
|
||||
// the first close() runs and we avoid a runtime panic. Emits an ERROR-level
|
||||
// log line so init-system journaling captures it prominently, and includes
|
||||
// the source (heartbeat/work_poll), response body, and status code so the
|
||||
// operator can verify it's a genuine retirement signal rather than a
|
||||
// misrouted request. After this returns, the select-loop case in Run()
|
||||
// observes the closed channel on its next iteration and returns
|
||||
// ErrAgentRetired.
|
||||
func (a *Agent) markRetired(source string, statusCode int, body string) {
|
||||
a.retiredOnce.Do(func() {
|
||||
a.logger.Error("agent has been retired by control plane — shutting down",
|
||||
"source", source,
|
||||
"status", statusCode,
|
||||
"body", body,
|
||||
"agent_id", a.config.AgentID)
|
||||
close(a.retiredSignal)
|
||||
})
|
||||
}
|
||||
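The idempotent-close pattern used by markRetired can be shown on its own. This is a sketch under assumptions: the `retirement` type and its method names are hypothetical stand-ins for the `retiredOnce` / `retiredSignal` fields, and the two goroutines play the role of the heartbeat and work-poll paths observing 410 at the same time.

```go
package main

import (
	"fmt"
	"sync"
)

// retirement is a hypothetical stand-in for the agent's retiredOnce and
// retiredSignal fields.
type retirement struct {
	once sync.Once
	done chan struct{}
}

func newRetirement() *retirement {
	return &retirement{done: make(chan struct{})}
}

// mark closes done at most once, so concurrent callers cannot trigger the
// "close of closed channel" panic.
func (r *retirement) mark() {
	r.once.Do(func() { close(r.done) })
}

// retired reports whether mark has been called, without blocking.
func (r *retirement) retired() bool {
	select {
	case <-r.done:
		return true
	default:
		return false
	}
}

func main() {
	r := newRetirement()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ { // two paths racing to report retirement
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.mark()
		}()
	}
	wg.Wait()
	fmt.Println("retired:", r.retired())
}
```

A closed channel is readable by any number of select loops, which is why it works as a broadcast signal here where a single value sent on the channel would not.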
|
||||
// Run starts the agent's main loop.
|
||||
@@ -146,6 +244,19 @@ func (a *Agent) Run(ctx context.Context) error {
|
||||
a.logger.Info("agent shutting down", "reason", ctx.Err())
|
||||
return ctx.Err()
|
||||
|
||||
// I-004: retiredSignal is closed exactly once (via markRetired's
|
||||
// sync.Once) when either sendHeartbeat or pollForWork observes HTTP 410
|
||||
// Gone from the control plane. Handling this case immediately
|
||||
// (rather than waiting for the next ticker) lets the agent shut down
|
||||
// quickly once retirement is confirmed — every extra heartbeat against a
|
||||
// retired row is wasted work and noise in the audit trail. Returning
|
||||
// ErrAgentRetired propagates up to main(), which matches it with
|
||||
// errors.Is and exits(0) so systemd/launchd do not respawn the process.
|
||||
case <-a.retiredSignal:
|
||||
a.logger.Info("agent retired signal received — exiting event loop",
|
||||
"agent_id", a.config.AgentID)
|
||||
return ErrAgentRetired
|
||||
|
||||
case <-heartbeatTicker.C:
|
||||
a.sendHeartbeat(ctx)
|
||||
|
||||
@@ -158,7 +269,14 @@ func (a *Agent) Run(ctx context.Context) error {
|
||||
a.logger.Warn("backing off due to consecutive failures",
|
||||
"failures", a.consecutiveFailures,
|
||||
"backoff", backoff.String())
|
||||
time.Sleep(backoff)
|
||||
// F-003: ctx-aware wait so graceful shutdown does not stall on
|
||||
// a long backoff. If ctx cancels mid-backoff, return to the
|
||||
// outer loop so the <-ctx.Done() case can trigger clean exit.
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
continue
|
||||
case <-time.After(backoff):
|
||||
}
|
||||
}
|
||||
a.pollForWork(ctx)
|
||||
|
||||
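The ctx-aware backoff above can be factored into a small helper. The sketch below is illustrative only: `sleepCtx` and `demo` are hypothetical names, and it uses `time.NewTimer` with `Stop` rather than `time.After` so the timer is released promptly when the context wins.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx is a hypothetical helper. It blocks for d, or until ctx is
// cancelled, and reports whether the full duration elapsed.
func sleepCtx(ctx context.Context, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop() // release the timer if ctx is cancelled first
	select {
	case <-ctx.Done():
		return false
	case <-timer.C:
		return true
	}
}

// demo runs sleepCtx once against an already-cancelled context and once
// against a live one, returning both results.
func demo() (afterCancel, uncancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	afterCancel = sleepCtx(ctx, time.Hour)
	uncancelled = sleepCtx(context.Background(), time.Millisecond)
	return afterCancel, uncancelled
}

func main() {
	afterCancel, uncancelled := demo()
	fmt.Println("completed after cancel:", afterCancel)
	fmt.Println("completed without cancel:", uncancelled)
}
```

A caller in a polling loop would typically skip the next poll when the helper returns false and let the loop's own `ctx.Done()` case handle shutdown.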
@@ -201,6 +319,22 @@ func (a *Agent) sendHeartbeat(ctx context.Context) {
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
// I-004: HTTP 410 Gone is the terminal signal from the control plane that
|
||||
// this agent's row has been soft-retired (see internal/api/handler/agent.go
|
||||
// heartbeat path + AgentRetirementService). Treat it separately from the
|
||||
// generic non-200 error branch: record the event to markRetired (which closes
|
||||
// retiredSignal exactly once via sync.Once) and return without bumping
|
||||
// consecutiveFailures — this is not a transient failure, it's a clean
|
||||
// shutdown. The Run() select loop picks up the closed channel on its next
|
||||
// iteration and returns ErrAgentRetired, which main() translates into an
|
||||
// exit(0) so systemd/launchd don't respawn the process into another 410
|
||||
// loop.
|
||||
if resp.StatusCode == http.StatusGone {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
a.markRetired("heartbeat", resp.StatusCode, string(body))
|
||||
return
|
||||
}
|
||||
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
a.logger.Error("heartbeat rejected",
|
||||
@@ -229,6 +363,19 @@ func (a *Agent) pollForWork(ctx context.Context) {
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
|
||||
// other hot path that can observe an agent's soft-retirement; if the
|
||||
// heartbeat tick happens to fire after a work-poll tick within the same
|
||||
// retirement window, this branch catches it first. markRetired's sync.Once
|
||||
// guards idempotency so racing both paths in the same tick only closes the
|
||||
// signal channel once. No consecutiveFailures increment — retirement is
|
||||
// not a transient failure.
|
||||
if resp.StatusCode == http.StatusGone {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
a.markRetired("work_poll", resp.StatusCode, string(body))
|
||||
return
|
||||
}
|
||||
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
a.logger.Error("work poll rejected",
|
||||
@@ -342,11 +489,23 @@ func (a *Agent) executeCSRJob(ctx context.Context, job JobItem) {
|
||||
}
|
||||
|
||||
// Step 3: Create CSR with common name and SANs
|
||||
// Split SANs into DNS names and email addresses for proper CSR encoding
|
||||
var dnsNames []string
|
||||
var emailAddresses []string
|
||||
for _, san := range job.SANs {
|
||||
if strings.Contains(san, "@") {
|
||||
emailAddresses = append(emailAddresses, san)
|
||||
} else {
|
||||
dnsNames = append(dnsNames, san)
|
||||
}
|
||||
}
|
||||
|
||||
csrTemplate := &x509.CertificateRequest{
|
||||
Subject: pkix.Name{
|
||||
CommonName: job.CommonName,
|
||||
},
|
||||
DNSNames: job.SANs,
|
||||
DNSNames: dnsNames,
|
||||
EmailAddresses: emailAddresses,
|
||||
}
|
||||
|
||||
csrDER, err := x509.CreateCertificateRequest(rand.Reader, csrTemplate, privKey)
|
||||
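The SAN split in this hunk can be checked end to end by building a CSR and parsing it back. This is a sketch, assuming the same rule as the diff (any SAN containing "@" is treated as an email address, everything else as a DNS name); `splitSANs` is a hypothetical helper name. Note that under this rule an IP literal would still be classified as a DNS name.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"log"
	"strings"
)

// splitSANs is a hypothetical helper mirroring the loop in the diff: SANs
// containing "@" become email addresses, all others become DNS names.
func splitSANs(sans []string) (dnsNames, emails []string) {
	for _, san := range sans {
		if strings.Contains(san, "@") {
			emails = append(emails, san)
		} else {
			dnsNames = append(dnsNames, san)
		}
	}
	return dnsNames, emails
}

func main() {
	dnsNames, emails := splitSANs([]string{"www.example.com", "ops@example.com"})

	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{
		Subject:        pkix.Name{CommonName: "www.example.com"},
		DNSNames:       dnsNames,
		EmailAddresses: emails,
	}, key)
	if err != nil {
		log.Fatal(err)
	}

	// Parse the CSR back to confirm each SAN landed in the intended field.
	csr, err := x509.ParseCertificateRequest(der)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("DNS:", csr.DNSNames, "email:", csr.EmailAddresses)
}
```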
@@ -508,6 +667,16 @@ func (a *Agent) executeDeploymentJob(ctx context.Context, job JobItem) {
|
||||
"target_type", job.TargetType,
|
||||
"success", result.Success,
|
||||
"message", result.Message)
|
||||
|
||||
// Verify the deployment by probing the live TLS endpoint. Best-effort: if the target host/port cannot be extracted, log a warning and skip verification.
|
||||
targetHost, targetPort, err := extractTargetHostAndPort(job.TargetConfig)
|
||||
if err != nil {
|
||||
a.logger.Warn("could not extract target host/port for verification",
|
||||
"job_id", job.ID,
|
||||
"error", err)
|
||||
} else {
|
||||
a.verifyAndReportDeployment(ctx, job, targetHost, targetPort, certOnly)
|
||||
}
|
||||
} else {
|
||||
a.logger.Info("no target type specified, skipping connector invocation",
|
||||
"job_id", job.ID)
|
||||
@@ -559,7 +728,11 @@ func (a *Agent) createTargetConnector(targetType string, configJSON json.RawMess
|
||||
return nil, fmt.Errorf("invalid F5 config: %w", err)
|
||||
}
|
||||
}
|
||||
return f5.New(&cfg, a.logger), nil
|
||||
conn, err := f5.New(&cfg, a.logger)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to create F5 connector: %w", err)
|
||||
}
|
||||
return conn, nil
|
||||
|
||||
case "IIS":
|
||||
var cfg iis.Config
|
||||
@@ -568,7 +741,90 @@ func (a *Agent) createTargetConnector(targetType string, configJSON json.RawMess
|
||||
return nil, fmt.Errorf("invalid IIS config: %w", err)
|
||||
}
|
||||
}
|
||||
return iis.New(&cfg, a.logger), nil
|
||||
return iis.New(&cfg, a.logger)
|
||||
|
||||
case "Traefik":
|
||||
var cfg traefik.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid Traefik config: %w", err)
|
||||
}
|
||||
}
|
||||
return traefik.New(&cfg, a.logger), nil
|
||||
|
||||
case "Caddy":
|
||||
var cfg caddy.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid Caddy config: %w", err)
|
||||
}
|
||||
}
|
||||
return caddy.New(&cfg, a.logger), nil
|
||||
|
||||
case "Envoy":
|
||||
var cfg envoy.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid Envoy config: %w", err)
|
||||
}
|
||||
}
|
||||
return envoy.New(&cfg, a.logger), nil
|
||||
|
||||
case "Postfix":
|
||||
var cfg pf.Config
|
||||
cfg.Mode = "postfix"
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid Postfix config: %w", err)
|
||||
}
|
||||
}
|
||||
return pf.New(&cfg, a.logger), nil
|
||||
|
||||
case "Dovecot":
|
||||
var cfg pf.Config
|
||||
cfg.Mode = "dovecot"
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid Dovecot config: %w", err)
|
||||
}
|
||||
}
|
||||
return pf.New(&cfg, a.logger), nil
|
||||
|
||||
case "SSH":
|
||||
var cfg sshconn.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid SSH config: %w", err)
|
||||
}
|
||||
}
|
||||
return sshconn.New(&cfg, a.logger)
|
||||
|
||||
case "WinCertStore":
|
||||
var cfg wcs.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid WinCertStore config: %w", err)
|
||||
}
|
||||
}
|
||||
return wcs.New(&cfg, a.logger)
|
||||
|
||||
case "JavaKeystore":
|
||||
var cfg jks.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid JavaKeystore config: %w", err)
|
||||
}
|
||||
}
|
||||
return jks.New(&cfg, a.logger), nil
|
||||
|
||||
case "KubernetesSecrets":
|
||||
var cfg k8s.Config
|
||||
if len(configJSON) > 0 {
|
||||
if err := json.Unmarshal(configJSON, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid KubernetesSecrets config: %w", err)
|
||||
}
|
||||
}
|
||||
return k8s.New(&cfg, a.logger)
|
||||
|
||||
default:
|
||||
return nil, fmt.Errorf("unsupported target type: %s", targetType)
|
||||
@@ -914,12 +1170,14 @@ func certKeyInfo(cert *x509.Certificate) (string, int) {
|
||||
|
||||
func main() {
|
||||
// Parse command-line flags (with env var fallbacks for Docker deployment)
|
||||
serverURL := flag.String("server", getEnvDefault("CERTCTL_SERVER_URL", "http://localhost:8443"), "Control plane server URL")
|
||||
serverURL := flag.String("server", getEnvDefault("CERTCTL_SERVER_URL", "https://localhost:8443"), "Control plane server URL (must be https://)")
|
||||
apiKey := flag.String("api-key", getEnvDefault("CERTCTL_API_KEY", ""), "Agent API key")
|
||||
agentName := flag.String("name", getEnvDefault("CERTCTL_AGENT_NAME", "certctl-agent"), "Agent name")
|
||||
agentID := flag.String("agent-id", getEnvDefault("CERTCTL_AGENT_ID", ""), "Agent ID (from registration)")
|
||||
keyDir := flag.String("key-dir", getEnvDefault("CERTCTL_KEY_DIR", "/var/lib/certctl/keys"), "Directory for storing private keys")
|
||||
discoveryDirsStr := flag.String("discovery-dirs", getEnvDefault("CERTCTL_DISCOVERY_DIRS", ""), "Comma-separated directories to scan for certificates")
|
||||
caBundlePath := flag.String("ca-bundle", getEnvDefault("CERTCTL_SERVER_CA_BUNDLE_PATH", ""), "Path to a PEM-encoded CA bundle that signed the server's TLS cert (optional; falls back to system roots)")
|
||||
insecureSkipVerify := flag.Bool("insecure-skip-verify", getEnvBoolDefault("CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY", false), "Dev-only: skip TLS certificate verification. Never enable in production. See docs/tls.md.")
|
||||
flag.Parse()
|
||||
|
||||
if *apiKey == "" {
|
||||
@@ -933,6 +1191,18 @@ func main() {
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
// Pre-flight URL-scheme validation — reject plaintext http:// before any
|
||||
// network call. The HTTPS-Everywhere milestone (§2.4, §7) mandates that
|
||||
// mis-configured agents fail loudly at startup with a diagnostic pointing
|
||||
// at the upgrade guide, rather than producing a TCP-refused or
|
||||
// TLS-handshake-error that obscures the actual cause.
|
||||
if err := validateHTTPSScheme(*serverURL); err != nil {
|
||||
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
|
||||
fmt.Fprintf(os.Stderr, "\nThe certctl control plane is HTTPS-only as of v2.2.\n")
|
||||
fmt.Fprintf(os.Stderr, "See docs/upgrade-to-tls.md for the cutover walkthrough.\n")
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
// Set up structured logging
|
||||
logLevel := slog.LevelInfo
|
||||
if getEnvDefault("CERTCTL_LOG_LEVEL", "info") == "debug" {
|
||||
@@ -961,17 +1231,27 @@ func main() {
|
||||
|
||||
// Create agent configuration
|
||||
agentCfg := &AgentConfig{
|
||||
ServerURL: *serverURL,
|
||||
APIKey: *apiKey,
|
||||
AgentName: *agentName,
|
||||
AgentID: *agentID,
|
||||
Hostname: hostname,
|
||||
KeyDir: *keyDir,
|
||||
DiscoveryDirs: discoveryDirs,
|
||||
ServerURL: *serverURL,
|
||||
APIKey: *apiKey,
|
||||
AgentName: *agentName,
|
||||
AgentID: *agentID,
|
||||
Hostname: hostname,
|
||||
KeyDir: *keyDir,
|
||||
DiscoveryDirs: discoveryDirs,
|
||||
CABundlePath: *caBundlePath,
|
||||
InsecureSkipVerify: *insecureSkipVerify,
|
||||
}
|
||||
|
||||
if agentCfg.InsecureSkipVerify {
|
||||
logger.Warn("TLS certificate verification is disabled (CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true) — never enable this in production")
|
||||
}
|
||||
|
||||
// Create and start agent
|
||||
agent := NewAgent(agentCfg, logger)
|
||||
agent, err := NewAgent(agentCfg, logger)
|
||||
if err != nil {
|
||||
fmt.Fprintf(os.Stderr, "Error: failed to initialize agent: %v\n", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
// Create context with cancellation for graceful shutdown
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
@@ -1000,6 +1280,19 @@ func main() {
|
||||
cancel()
|
||||
<-errChan
|
||||
case err := <-errChan:
|
||||
// I-004: ErrAgentRetired is a terminal, *clean* shutdown — the control
|
||||
// plane responded HTTP 410 Gone on heartbeat/work-poll, meaning this
|
||||
// agent's row has been soft-retired and will never be reachable again.
|
||||
// Exit 0 so systemd's Restart=on-failure and launchd's KeepAlive do NOT
|
||||
// respawn the process into another 410 loop (which would wedge the host
|
||||
// and spam the control plane). Operators can observe the retirement via
|
||||
// audit_events or the AgentsPage retired tab; the terminal log line on
|
||||
// the way out is enough for post-mortem forensics.
|
||||
if errors.Is(err, ErrAgentRetired) {
|
||||
logger.Info("agent retired by control plane — exiting without restart",
|
||||
"agent_id", agentCfg.AgentID)
|
||||
return
|
||||
}
|
||||
if err != context.Canceled {
|
||||
logger.Error("agent error", "error", err)
|
||||
os.Exit(1)
|
||||
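The mapping from Run's error to a process exit code can be written as a pure function, which makes the restart behavior easy to test. This is a sketch with hypothetical names (`errAgentRetired`, `errTransient`, `exitCodeFor`); it uses `errors.Is` for both sentinels, whereas the hunk above compares `context.Canceled` with `!=`, so wrapped cancellation errors are treated differently here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sentinels for illustration; the real ErrAgentRetired is
// declared elsewhere in the agent.
var (
	errAgentRetired = errors.New("agent retired by control plane")
	errTransient    = errors.New("control plane unreachable")
)

// exitCodeFor returns 0 for clean shutdowns (no error, retirement, or
// cancellation) and 1 for everything else.
func exitCodeFor(err error) int {
	switch {
	case err == nil,
		errors.Is(err, errAgentRetired),
		errors.Is(err, context.Canceled):
		return 0
	default:
		return 1
	}
}

func main() {
	wrapped := fmt.Errorf("run loop: %w", errAgentRetired)
	fmt.Println("retired:", exitCodeFor(wrapped))
	fmt.Println("transient:", exitCodeFor(errTransient))
}
```

Exiting 0 on retirement matters only if the supervisor restarts on failure; with an always-restart policy the process would be respawned regardless of exit code.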
@@ -1016,3 +1309,49 @@ func getEnvDefault(key, defaultValue string) string {
|
||||
}
|
||||
return defaultValue
|
||||
}
|
||||
|
||||
// getEnvBoolDefault parses an environment variable as a boolean. After
// trimming whitespace and lowercasing, "1", "t", "true", "yes", and "on" are
// true; "0", "f", "false", "no", and "off" are false. An unset, empty, or
// unrecognized value returns the provided default. Kept permissive on purpose
// so operators can flip the dev-only TLS skip-verify toggle with any common
// spelling without having to remember exactly what we parse.
|
||||
func getEnvBoolDefault(key string, defaultValue bool) bool {
|
||||
raw := os.Getenv(key)
|
||||
if raw == "" {
|
||||
return defaultValue
|
||||
}
|
||||
switch strings.ToLower(strings.TrimSpace(raw)) {
|
||||
case "1", "t", "true", "yes", "on":
|
||||
return true
|
||||
case "0", "f", "false", "no", "off":
|
||||
return false
|
||||
default:
|
||||
return defaultValue
|
||||
}
|
||||
}
|
||||
|
||||
// validateHTTPSScheme enforces the HTTPS-Everywhere milestone's §7 acceptance
|
||||
// criterion: "Agent with CERTCTL_SERVER_URL=http://... fails at startup with
|
||||
// a fail-loud diagnostic pointing at docs/upgrade-to-tls.md. Not TCP-refused,
|
||||
// not TLS-handshake-error — a pre-flight config validation failure before any
|
||||
// network call." Returns a descriptive error; the caller prints the upgrade
|
||||
// guide pointer and exits non-zero.
|
||||
func validateHTTPSScheme(serverURL string) error {
|
||||
if serverURL == "" {
|
||||
return fmt.Errorf("CERTCTL_SERVER_URL is empty — set it to an https:// URL (e.g., https://certctl-server:8443)")
|
||||
}
|
||||
u, err := url.Parse(serverURL)
|
||||
if err != nil {
|
||||
return fmt.Errorf("CERTCTL_SERVER_URL %q is not a valid URL: %w", serverURL, err)
|
||||
}
|
||||
switch strings.ToLower(u.Scheme) {
|
||||
case "https":
|
||||
return nil
|
||||
case "http":
|
||||
return fmt.Errorf("CERTCTL_SERVER_URL %q uses plaintext http:// — the certctl control plane is HTTPS-only", serverURL)
|
||||
case "":
|
||||
return fmt.Errorf("CERTCTL_SERVER_URL %q is missing a scheme — expected https://", serverURL)
|
||||
default:
|
||||
return fmt.Errorf("CERTCTL_SERVER_URL %q uses unsupported scheme %q — expected https://", serverURL, u.Scheme)
|
||||
}
|
||||
}
|
||||
|
||||
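Which branch of validateHTTPSScheme a misconfigured value hits depends on how `url.Parse` splits it, and a bare `host:port` is a common surprise: the host is taken as the scheme. The sketch below only inspects the parsed scheme; `schemeOf` is a hypothetical helper and the sample URLs are illustrative.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// schemeOf is a hypothetical helper that returns the lowercased scheme
// url.Parse extracts from raw.
func schemeOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return strings.ToLower(u.Scheme), nil
}

func main() {
	for _, raw := range []string{
		"https://certctl-server:8443", // accepted
		"http://certctl-server:8443",  // plaintext, rejected
		"certctl-server",              // no scheme at all
		"localhost:8443",              // "localhost" is parsed as the scheme
	} {
		scheme, err := schemeOf(raw)
		fmt.Printf("%-32q scheme=%q err=%v\n", raw, scheme, err)
	}
}
```

So a value such as `localhost:8443` would be reported as an unsupported scheme rather than a missing one, which is worth knowing when reading the startup diagnostic.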
@@ -0,0 +1,285 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"context"
|
||||
"crypto/sha256"
|
||||
"crypto/tls"
|
||||
"crypto/x509"
|
||||
"encoding/json"
|
||||
"encoding/pem"
|
||||
"fmt"
|
||||
"io"
|
||||
"log/slog"
|
||||
"net"
|
||||
"net/http"
|
||||
"time"
|
||||
)
|
||||
|
||||
// verifyDeployment probes the live TLS endpoint for a deployment target and verifies
|
||||
// that the deployed certificate matches what we expect.
|
||||
//
|
||||
// Parameters:
|
||||
// - targetHost: the hostname or IP of the target (extracted from target config)
|
||||
// - targetPort: the TLS port of the target (e.g., 443)
|
||||
// - expectedCertPEM: the PEM-encoded certificate that was deployed
|
||||
// - delay: wait time before probing (e.g., 2 seconds for reload to take effect)
|
||||
// - timeout: overall timeout for TLS connection attempt (e.g., 10 seconds)
|
||||
//
|
||||
// Returns:
|
||||
// - A VerificationResult if probing succeeded (even if cert doesn't match)
|
||||
// - An error if the probe itself failed (network error, timeout, etc.)
|
||||
//
|
||||
// The function compares the SHA-256 fingerprints of the expected and actual certificates.
|
||||
// If the certificate served at the endpoint differs, Verified will be false but no error
|
||||
// is returned — this is an expected verification failure, not a probe failure.
|
||||
func verifyDeployment(
|
||||
ctx context.Context,
|
||||
targetHost string,
|
||||
targetPort int,
|
||||
expectedCertPEM string,
|
||||
delay time.Duration,
|
||||
timeout time.Duration,
|
||||
logger *slog.Logger,
|
||||
) (*VerificationResult, error) {
|
||||
// Wait for reload to take effect
|
||||
if delay > 0 {
|
||||
select {
|
||||
case <-time.After(delay):
|
||||
case <-ctx.Done():
|
||||
return nil, ctx.Err()
|
||||
}
|
||||
}
|
||||
|
||||
// Parse expected certificate to compute its fingerprint
|
||||
expectedFp, err := computeCertificateFingerprint(expectedCertPEM)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to parse expected certificate: %w", err)
|
||||
}
|
||||
|
||||
// Connect to the target's TLS endpoint
|
||||
address := net.JoinHostPort(targetHost, fmt.Sprint(targetPort)) // brackets IPv6 literals correctly
|
||||
if logger != nil {
|
||||
logger.Debug("probing TLS endpoint for verification",
|
||||
"address", address,
|
||||
"expected_fingerprint", expectedFp)
|
||||
}
|
||||
|
||||
dialer := &net.Dialer{Timeout: timeout}
|
||||
conn, err := tls.DialWithDialer(dialer, "tcp", address, &tls.Config{
|
||||
// SECURITY NOTE: InsecureSkipVerify is intentionally set to true here.
|
||||
// Post-deployment verification must probe the live endpoint to extract and
|
||||
// compare the served certificate fingerprint, regardless of its validity
|
||||
// state (expired, self-signed, internal CA, etc.). This setting is scoped
|
||||
// to verification probing only — it is NEVER used for control-plane API
|
||||
// calls, issuer connector communication, or any operation that trusts the
|
||||
// certificate. The verification result compares SHA-256 fingerprints only.
|
||||
// See TICKET-016 for full security audit rationale.
|
||||
InsecureSkipVerify: true,
|
||||
ServerName: targetHost, // For SNI
|
||||
})
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to connect to %s: %w", address, err)
|
||||
}
|
||||
defer conn.Close()
|
||||
|
||||
// Extract the leaf certificate from the TLS connection
|
||||
state := conn.ConnectionState()
|
||||
if len(state.PeerCertificates) == 0 {
|
||||
return nil, fmt.Errorf("no certificates presented by %s", address)
|
||||
}
|
||||
|
||||
leafCert := state.PeerCertificates[0]
|
||||
actualFp := fmt.Sprintf("%x", sha256.Sum256(leafCert.Raw))
|
||||
|
||||
if logger != nil {
|
||||
logger.Debug("received certificate from endpoint",
|
||||
"address", address,
|
||||
"cn", leafCert.Subject.CommonName,
|
||||
"actual_fingerprint", actualFp)
|
||||
}
|
||||
|
||||
// Compare fingerprints
|
||||
verified := actualFp == expectedFp
|
||||
if logger != nil {
|
||||
if !verified {
|
||||
logger.Warn("certificate fingerprint mismatch at endpoint",
|
||||
"address", address,
|
||||
"expected_fingerprint", expectedFp,
|
||||
"actual_fingerprint", actualFp)
|
||||
} else {
|
||||
logger.Info("certificate verification succeeded",
|
||||
"address", address,
|
||||
"fingerprint", actualFp)
|
||||
}
|
||||
}
|
||||
|
||||
return &VerificationResult{
|
||||
ExpectedFingerprint: expectedFp,
|
||||
ActualFingerprint: actualFp,
|
||||
Verified: verified,
|
||||
VerifiedAt: time.Now().UTC(),
|
||||
}, nil
|
||||
}
|
||||
|
||||
// VerificationResult represents the outcome of verifying a deployed certificate.
|
||||
type VerificationResult struct {
|
||||
ExpectedFingerprint string `json:"expected_fingerprint"`
|
||||
ActualFingerprint string `json:"actual_fingerprint"`
|
||||
Verified bool `json:"verified"`
|
||||
VerifiedAt time.Time `json:"verified_at"`
|
||||
Error string `json:"error,omitempty"`
|
||||
}
|
||||
|
||||
// computeCertificateFingerprint computes the SHA-256 fingerprint of a PEM-encoded certificate.
|
||||
func computeCertificateFingerprint(certPEM string) (string, error) {
|
||||
block, _ := pem.Decode([]byte(certPEM))
|
||||
if block == nil {
|
||||
return "", fmt.Errorf("failed to decode PEM certificate")
|
||||
}
|
||||
|
||||
cert, err := x509.ParseCertificate(block.Bytes)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("failed to parse x509 certificate: %w", err)
|
||||
}
|
||||
|
||||
fp := sha256.Sum256(cert.Raw)
|
||||
return fmt.Sprintf("%x", fp), nil
|
||||
}
|
||||
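The verification logic relies on the PEM-derived fingerprint and the fingerprint of the DER bytes served on the wire being identical. The sketch below checks that property with a throwaway self-signed certificate. All helper names are hypothetical, and it uses `hex.EncodeToString` where the code above uses `fmt.Sprintf("%x", ...)`; both produce the same lowercase hex string.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/hex"
	"encoding/pem"
	"errors"
	"fmt"
	"log"
	"math/big"
	"time"
)

// fingerprintDER hashes raw DER bytes, as a TLS probe would for the leaf cert.
func fingerprintDER(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// fingerprintPEM decodes the first PEM block, parses it, and hashes cert.Raw.
func fingerprintPEM(certPEM []byte) (string, error) {
	block, _ := pem.Decode(certPEM)
	if block == nil {
		return "", errors.New("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return "", fmt.Errorf("parsing certificate: %w", err)
	}
	return fingerprintDER(cert.Raw), nil
}

// encodePEM wraps DER bytes in a CERTIFICATE PEM block.
func encodePEM(der []byte) []byte {
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
}

// selfSignedDER creates a short-lived self-signed certificate for the demo.
func selfSignedDER() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "test.example.com"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(time.Hour),
	}
	return x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
}

func main() {
	der, err := selfSignedDER()
	if err != nil {
		log.Fatal(err)
	}
	fp, err := fingerprintPEM(encodePEM(der))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("fingerprints match:", fp == fingerprintDER(der))
}
```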
|
||||
// reportVerificationResult submits the verification result back to the control plane.
|
||||
// This is a best-effort operation — a failure to report doesn't block agent progress.
|
||||
func (a *Agent) reportVerificationResult(
|
||||
ctx context.Context,
|
||||
jobID string,
|
||||
targetID string,
|
||||
result *VerificationResult,
|
||||
) error {
|
||||
if jobID == "" || targetID == "" || result == nil {
|
||||
return fmt.Errorf("missing required fields for verification report")
|
||||
}
|
||||
|
||||
// Build the request payload
|
||||
payload := map[string]interface{}{
|
||||
"target_id": targetID,
|
||||
"expected_fingerprint": result.ExpectedFingerprint,
|
||||
"actual_fingerprint": result.ActualFingerprint,
|
||||
"verified": result.Verified,
|
||||
"error": result.Error,
|
||||
}
|
||||
|
||||
body, err := json.Marshal(payload)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to marshal verification result: %w", err)
|
||||
}
|
||||
|
||||
// POST to /api/v1/jobs/{id}/verify
|
||||
url := fmt.Sprintf("%s/api/v1/jobs/%s/verify", a.config.ServerURL, jobID)
|
||||
req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(body))
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create verification request: %w", err)
|
||||
}
|
||||
|
||||
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", a.config.APIKey))
|
||||
req.Header.Set("Content-Type", "application/json")
|
||||
|
||||
resp, err := a.client.Do(req)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to send verification result: %w", err)
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
// Check response status
|
||||
if resp.StatusCode != http.StatusOK {
|
||||
bodyBytes, _ := io.ReadAll(resp.Body)
|
||||
return fmt.Errorf("verification reporting failed with status %d: %s", resp.StatusCode, string(bodyBytes))
|
||||
}
|
||||
|
||||
if a.logger != nil {
|
||||
a.logger.Debug("verification result reported to control plane",
|
||||
"job_id", jobID,
|
||||
"verified", result.Verified)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// extractTargetHostAndPort extracts the host and port from target configuration.
// The host is read from the first non-empty "host", "hostname", "target", or
// "address" field; the port comes from a numeric "port" field and defaults to 443.
|
||||
func extractTargetHostAndPort(configJSON json.RawMessage) (string, int, error) {
|
||||
var config map[string]interface{}
|
||||
if err := json.Unmarshal(configJSON, &config); err != nil {
|
||||
return "", 0, fmt.Errorf("invalid target config JSON: %w", err)
|
||||
}
|
||||
|
||||
// Try common field names for hostname
|
||||
var host string
|
||||
for _, key := range []string{"host", "hostname", "target", "address"} {
|
||||
if h, ok := config[key].(string); ok && h != "" {
|
||||
host = h
|
||||
break
|
||||
}
|
||||
}
|
||||
if host == "" {
|
||||
return "", 0, fmt.Errorf("target config missing host field (tried host, hostname, target, address)")
|
||||
}
|
||||
|
||||
// Try common field names for port, default to 443
|
||||
port := 443
|
||||
if p, ok := config["port"].(float64); ok {
|
||||
port = int(p)
|
||||
}
|
||||
if port < 1 || port > 65535 {
|
||||
return "", 0, fmt.Errorf("invalid port: %d", port)
|
||||
}
|
||||
|
||||
return host, port, nil
|
||||
}
|
||||
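A host extracted from target config may be an IPv6 literal, in which case the dial address needs brackets around the host. The sketch below shows one way to build it; `dialAddress` is a hypothetical helper, not a function from this change.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// dialAddress is a hypothetical helper that builds a "host:port" string
// suitable for net.Dial or tls.Dial, bracketing IPv6 literals as needed.
func dialAddress(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(dialAddress("example.com", 443))  // example.com:443
	fmt.Println(dialAddress("2001:db8::1", 8443)) // [2001:db8::1]:8443
}
```

One related caveat: because the config is decoded into `map[string]interface{}`, only a JSON number is picked up as the port. A port given as a JSON string is ignored and the default 443 is used.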
|
||||
// verifyAndReportDeployment performs TLS endpoint verification and reports the result.
|
||||
// This is a best-effort operation — failures are logged but don't affect deployment status.
|
||||
func (a *Agent) verifyAndReportDeployment(
|
||||
ctx context.Context,
|
||||
job JobItem,
|
||||
targetHost string,
|
||||
targetPort int,
|
||||
certPEM string,
|
||||
) {
|
||||
// Perform verification with configured timeout and delay
|
||||
result, err := verifyDeployment(ctx, targetHost, targetPort, certPEM,
|
||||
2*time.Second, // delay before probing
|
||||
10*time.Second, // timeout for TLS connection
|
||||
a.logger)
|
||||
|
||||
if err != nil {
|
||||
if a.logger != nil {
|
||||
a.logger.Warn("verification probe failed",
|
||||
"job_id", job.ID,
|
||||
"target_host", targetHost,
|
||||
"target_port", targetPort,
|
||||
"error", err)
|
||||
}
|
||||
// Probe failure: report error but continue
|
||||
result = &VerificationResult{
|
||||
Error: err.Error(),
|
||||
VerifiedAt: time.Now().UTC(),
|
||||
}
|
||||
}
|
||||
|
||||
// Report result to control plane
|
||||
if job.TargetID == nil {
|
||||
if a.logger != nil {
|
||||
a.logger.Warn("cannot report verification: target_id is nil", "job_id", job.ID)
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
if err := a.reportVerificationResult(ctx, job.ID, *job.TargetID, result); err != nil {
|
||||
if a.logger != nil {
|
||||
a.logger.Warn("failed to report verification result",
|
||||
"job_id", job.ID,
|
||||
"error", err)
|
||||
}
|
||||
// Non-blocking: continue even if report fails
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,431 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"crypto/ecdsa"
|
||||
"crypto/elliptic"
|
||||
"crypto/rand"
|
||||
"crypto/x509"
|
||||
"crypto/x509/pkix"
|
||||
"encoding/json"
|
||||
"encoding/pem"
|
||||
"fmt"
|
||||
"math/big"
|
||||
"net"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
func TestComputeCertificateFingerprint(t *testing.T) {
|
||||
// Generate a test certificate for fingerprint validation
|
||||
cert, err := generateTestCert()
|
||||
if err != nil {
|
||||
t.Fatalf("failed to generate test cert: %v", err)
|
||||
}
|
||||
|
||||
certPEM := string(pem.EncodeToMemory(&pem.Block{
|
||||
Type: "CERTIFICATE",
|
||||
Bytes: cert.Raw,
|
||||
}))
|
||||
|
||||
fp, err := computeCertificateFingerprint(certPEM)
|
||||
if err != nil {
|
||||
t.Errorf("unexpected error: %v", err)
|
||||
}
|
||||
|
||||
if len(fp) != 64 { // SHA256 hex = 64 chars
|
||||
t.Errorf("expected 64 char fingerprint, got %d", len(fp))
|
||||
}
|
||||
}
|
||||
|
||||
func TestComputeCertificateFingerprint_InvalidPEM(t *testing.T) {
|
||||
_, err := computeCertificateFingerprint("not a valid pem")
|
||||
if err == nil {
|
||||
t.Error("expected error for invalid PEM")
|
||||
}
|
||||
}
|
||||
|
||||
func TestComputeCertificateFingerprint_EmptyString(t *testing.T) {
|
||||
_, err := computeCertificateFingerprint("")
|
||||
if err == nil {
|
||||
t.Error("expected error for empty string")
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractTargetHostAndPort_ValidConfig(t *testing.T) {
|
||||
config := map[string]interface{}{
|
||||
"host": "example.com",
|
||||
"port": 443.0,
|
||||
}
|
||||
configJSON, _ := json.Marshal(config)
|
||||
|
||||
host, port, err := extractTargetHostAndPort(configJSON)
|
||||
if err != nil {
|
||||
t.Errorf("unexpected error: %v", err)
|
||||
}
|
||||
if host != "example.com" {
|
||||
t.Errorf("expected host example.com, got %s", host)
|
||||
}
|
||||
if port != 443 {
|
||||
t.Errorf("expected port 443, got %d", port)
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractTargetHostAndPort_DefaultPort(t *testing.T) {
|
||||
config := map[string]interface{}{
|
||||
"hostname": "test.local",
|
||||
}
|
||||
configJSON, _ := json.Marshal(config)
|
||||
|
||||
host, port, err := extractTargetHostAndPort(configJSON)
|
||||
if err != nil {
|
||||
t.Errorf("unexpected error: %v", err)
|
||||
}
|
||||
if host != "test.local" {
|
||||
t.Errorf("expected host test.local, got %s", host)
|
||||
}
|
||||
if port != 443 {
|
||||
t.Errorf("expected default port 443, got %d", port)
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractTargetHostAndPort_MissingHost(t *testing.T) {
|
||||
config := map[string]interface{}{
|
||||
"port": 443.0,
|
||||
}
|
||||
configJSON, _ := json.Marshal(config)
|
||||
|
||||
_, _, err := extractTargetHostAndPort(configJSON)
|
||||
if err == nil {
|
||||
t.Error("expected error for missing host")
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractTargetHostAndPort_InvalidJSON(t *testing.T) {
|
||||
configJSON := []byte("invalid json{")
|
||||
|
||||
_, _, err := extractTargetHostAndPort(configJSON)
|
||||
if err == nil {
|
||||
t.Error("expected error for invalid JSON")
|
||||
}
|
||||
}
|
||||
|
||||
func TestExtractTargetHostAndPort_AlternativeFieldNames(t *testing.T) {
	tests := []struct {
		name     string
		config   map[string]interface{}
		expected string
	}{
		{"host", map[string]interface{}{"host": "host1.com"}, "host1.com"},
		{"hostname", map[string]interface{}{"hostname": "host2.com"}, "host2.com"},
		{"target", map[string]interface{}{"target": "host3.com"}, "host3.com"},
		{"address", map[string]interface{}{"address": "host4.com"}, "host4.com"},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			configJSON, _ := json.Marshal(tt.config)
			host, _, err := extractTargetHostAndPort(configJSON)
			if err != nil {
				t.Errorf("unexpected error: %v", err)
			}
			if host != tt.expected {
				t.Errorf("expected %s, got %s", tt.expected, host)
			}
		})
	}
}

func TestVerifyDeployment_Timeout(t *testing.T) {
	cert, _ := generateTestCert()
	certPEM := string(pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: cert.Raw,
	}))

	ctx := context.Background()
	result, err := verifyDeployment(ctx, "192.0.2.1", 443, certPEM, 0, 100*time.Millisecond, nil)

	// Connection to reserved test IP should timeout or fail
	if err == nil && result == nil {
		t.Error("expected error or result for unreachable host")
	}
}

func TestVerifyDeployment_InvalidCertPEM(t *testing.T) {
	ctx := context.Background()
	result, err := verifyDeployment(ctx, "localhost", 443, "not a cert", 0, 5*time.Second, nil)

	if err == nil {
		t.Error("expected error for invalid certificate PEM")
	}
	if result != nil {
		t.Error("expected no result on error")
	}
}

// Helper function to generate a test certificate for testing
func generateTestCert() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}

	template := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject: pkix.Name{
			CommonName: "test.example.com",
		},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		BasicConstraintsValid: true,
		DNSNames:              []string{"test.example.com"},
	}

	certDER, err := x509.CreateCertificate(rand.Reader, template, template, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}

	return x509.ParseCertificate(certDER)
}

func TestReportVerificationResult_Success(t *testing.T) {
	// Create mock HTTP server
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/jobs/j-test/verify" {
			t.Errorf("unexpected path: %s", r.URL.Path)
		}
		if r.Method != "POST" {
			t.Errorf("unexpected method: %s", r.Method)
		}

		// Check auth header
		auth := r.Header.Get("Authorization")
		if auth != "Bearer test-api-key" {
			t.Errorf("unexpected auth header: %s", auth)
		}

		// Verify request body
		var payload map[string]interface{}
		json.NewDecoder(r.Body).Decode(&payload)
		if payload["verified"] != true {
			t.Error("expected verified to be true")
		}

		w.WriteHeader(http.StatusOK)
		json.NewEncoder(w).Encode(map[string]interface{}{
			"job_id":   "j-test",
			"verified": true,
		})
	}))
	defer server.Close()

	cfg := &AgentConfig{
		ServerURL: server.URL,
		APIKey:    "test-api-key",
	}
	agent, _ := NewAgent(cfg, nil)

	result := &VerificationResult{
		ExpectedFingerprint: "abc123",
		ActualFingerprint:   "abc123",
		Verified:            true,
		VerifiedAt:          time.Now().UTC(),
	}

	err := agent.reportVerificationResult(context.Background(), "j-test", "t-nginx1", result)
	if err != nil {
		t.Errorf("unexpected error: %v", err)
	}
}

func TestReportVerificationResult_MissingFields(t *testing.T) {
	agent, _ := NewAgent(&AgentConfig{}, nil)

	result := &VerificationResult{
		Verified:   true,
		VerifiedAt: time.Now().UTC(),
	}

	err := agent.reportVerificationResult(context.Background(), "", "t-nginx1", result)
	if err == nil {
		t.Error("expected error for missing job ID")
	}
}

func TestVerifyDeployment_ContextCancellation(t *testing.T) {
	cert, _ := generateTestCert()
	certPEM := string(pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: cert.Raw,
	}))

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // Cancel immediately

	result, err := verifyDeployment(ctx, "localhost", 443, certPEM, 1*time.Second, 5*time.Second, nil)

	if err == nil {
		t.Error("expected error for cancelled context")
	}
	if result != nil {
		t.Error("expected no result on context cancellation")
	}
}

// Mock TLS server for verification testing.
// Reserved for future use when real TLS verification integration tests are added.
var _ = func(t *testing.T, cert *x509.Certificate) (string, func()) {
	// Create TLS listener with test certificate
	listener, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatalf("failed to create listener: %v", err)
	}

	address := listener.Addr().String()

	go func() {
		conn, err := listener.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		// Simple echo to keep connection alive
		buf := make([]byte, 1024)
		conn.Read(buf) //nolint:errcheck
	}()

	cleanup := func() {
		listener.Close()
	}

	return address, cleanup
}

func TestVerificationResult_JSONMarshaling(t *testing.T) {
	now := time.Now().UTC()
	result := &VerificationResult{
		ExpectedFingerprint: "abc123",
		ActualFingerprint:   "def456",
		Verified:            false,
		VerifiedAt:          now,
		Error:               "fingerprint mismatch",
	}

	data, err := json.Marshal(result)
	if err != nil {
		t.Errorf("unexpected error marshaling: %v", err)
	}

	var unmarshaled VerificationResult
	err = json.Unmarshal(data, &unmarshaled)
	if err != nil {
		t.Errorf("unexpected error unmarshaling: %v", err)
	}

	if unmarshaled.Error != "fingerprint mismatch" {
		t.Errorf("error mismatch: got %s", unmarshaled.Error)
	}
}

func TestReportVerificationResult_ServerError(t *testing.T) {
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusInternalServerError)
		w.Write([]byte("server error"))
	}))
	defer server.Close()

	cfg := &AgentConfig{
		ServerURL: server.URL,
		APIKey:    "test-api-key",
	}
	agent, _ := NewAgent(cfg, nil)

	result := &VerificationResult{
		ExpectedFingerprint: "abc123",
		ActualFingerprint:   "abc123",
		Verified:            true,
		VerifiedAt:          time.Now().UTC(),
	}

	err := agent.reportVerificationResult(context.Background(), "j-test", "t-nginx1", result)
	if err == nil {
		t.Error("expected error for server error response")
	}
}

func TestExtractTargetHostAndPort_InvalidPort(t *testing.T) {
	config := map[string]interface{}{
		"host": "example.com",
		"port": 99999.0,
	}
	configJSON, _ := json.Marshal(config)

	_, _, err := extractTargetHostAndPort(configJSON)
	if err == nil {
		t.Error("expected error for invalid port")
	}
}

func TestExtractTargetHostAndPort_ZeroPort(t *testing.T) {
	config := map[string]interface{}{
		"host": "example.com",
		"port": 0.0,
	}
	configJSON, _ := json.Marshal(config)

	_, _, err := extractTargetHostAndPort(configJSON)
	if err == nil {
		t.Error("expected error for zero port")
	}
}

func TestVerifyDeployment_FingerprintComparison(t *testing.T) {
	// Create a simple TLS server for testing
	server := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer server.Close()

	// Get the server's TLS certificate from TLS config
	if len(server.TLS.Certificates) == 0 {
		t.Skip("no TLS certificates configured on test server")
	}

	// Parse the leaf certificate from the DER bytes
	leafDER := server.TLS.Certificates[0].Certificate[0]
	leafCert, err := x509.ParseCertificate(leafDER)
	if err != nil {
		t.Fatalf("failed to parse test server certificate: %v", err)
	}

	certPEM := string(pem.EncodeToMemory(&pem.Block{
		Type:  "CERTIFICATE",
		Bytes: leafCert.Raw,
	}))

	// Get host and port from the listener address
	addr := server.Listener.Addr().String()
	host, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		t.Fatalf("failed to parse server address: %v", err)
	}
	port := 0
	fmt.Sscanf(portStr, "%d", &port)

	// Verify deployment against the live TLS server
	ctx := context.Background()
	result, _ := verifyDeployment(ctx, host, port, certPEM, 0, 5*time.Second, nil)

	// This test may fail in some environments due to TLS setup complexity
	// The key is testing the fingerprint comparison logic
	if result != nil {
		if result.Verified && result.ExpectedFingerprint != result.ActualFingerprint {
			t.Error("fingerprint mismatch: expected and actual should match if Verified is true")
		}
	}
}
@@ -3,7 +3,9 @@ package main
import (
	"flag"
	"fmt"
	"net/url"
	"os"
	"strings"

	"github.com/shankar0123/certctl/internal/cli"
)
@@ -27,35 +29,50 @@ Commands:
certs renew ID Trigger certificate renewal
certs revoke ID Revoke a certificate

agents list List agents
agents get ID Get agent details
agents list List agents (add --retired to list soft-retired agents)
agents get ID Get agent details
agents retire ID Soft-retire an agent (add --force --reason "…" to cascade)

jobs list List jobs
jobs get ID Get job details
jobs cancel ID Cancel a pending job

import FILE Bulk import certificates from PEM file(s)
Required: --owner-id, --team-id, --renewal-policy-id, --issuer-id
Optional: --name-template (default {cn}), --environment (default imported)

status Show server health + summary stats
version Show CLI version

Examples:
certctl-cli --server http://localhost:8443 --api-key mykey certs list
certctl-cli --server https://localhost:8443 --api-key mykey certs list
certctl-cli certs renew mc-prod --format json
certctl-cli import certs.pem
`)
	}

	serverURL := fs.String("server", os.Getenv("CERTCTL_SERVER_URL"), "certctl server URL (env: CERTCTL_SERVER_URL)")
	if *serverURL == "" {
		*serverURL = "http://localhost:8443"
	// HTTPS-Everywhere (v2.2): the server is HTTPS-only. The default URL uses
	// https://; plaintext http:// is rejected by validateHTTPSScheme below.
	defaultServer := os.Getenv("CERTCTL_SERVER_URL")
	if defaultServer == "" {
		defaultServer = "https://localhost:8443"
	}
	serverURL := fs.String("server", defaultServer, "certctl server URL — must be https:// (env: CERTCTL_SERVER_URL)")

	apiKey := fs.String("api-key", os.Getenv("CERTCTL_API_KEY"), "API key for authentication (env: CERTCTL_API_KEY)")
	format := fs.String("format", "table", "Output format: table, json")
	caBundlePath := fs.String("ca-bundle", os.Getenv("CERTCTL_SERVER_CA_BUNDLE_PATH"), "Path to a PEM-encoded CA bundle that signed the server cert (env: CERTCTL_SERVER_CA_BUNDLE_PATH)")
	insecure := fs.Bool("insecure", strings.EqualFold(os.Getenv("CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY"), "true"), "Skip TLS certificate verification — dev only, never set in production (env: CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY)")

	fs.Parse(os.Args[1:])

	if err := validateHTTPSScheme(*serverURL); err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		fmt.Fprintf(os.Stderr, "\nThe certctl control plane is HTTPS-only as of v2.2.\n")
		fmt.Fprintf(os.Stderr, "See docs/upgrade-to-tls.md for the cutover walkthrough.\n")
		os.Exit(1)
	}

	args := fs.Args()
	if len(args) == 0 {
		fs.Usage()
@@ -63,13 +80,16 @@ Examples:
	}

	// Create client
	client := cli.NewClient(*serverURL, *apiKey, *format)
	client, err := cli.NewClient(*serverURL, *apiKey, *format, *caBundlePath, *insecure)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}

	// Dispatch to appropriate command
	command := args[0]
	cmdArgs := args[1:]

	var err error
	switch command {
	case "certs":
		err = handleCerts(client, cmdArgs)
@@ -130,15 +150,27 @@ func handleCerts(client *cli.Client, args []string) error {
			reason = subArgs[2]
		}
		return client.RevokeCertificate(id, reason)
	case "bulk-revoke":
		return client.BulkRevokeCertificates(subArgs)
	default:
		fmt.Fprintf(os.Stderr, "unknown subcommand: certs %s\n", subcommand)
		return nil
	}
}

// handleAgents dispatches the `agents` subcommands.
//
// I-004 additions:
//
// agents list --retired — hit the opt-in /agents/retired endpoint
// instead of the default listing (which
// filters retired rows out).
// agents retire <id> — soft-retire an agent (DELETE /agents/{id}).
// --force cascades; --reason is required with
// --force (mirrors ErrForceReasonRequired).
func handleAgents(client *cli.Client, args []string) error {
	if len(args) == 0 {
		fmt.Fprintf(os.Stderr, "usage: agents <list|get> [options]\n")
		fmt.Fprintf(os.Stderr, "usage: agents <list|get|retire> [options]\n")
		return nil
	}

@@ -147,13 +179,34 @@ func handleAgents(client *cli.Client, args []string) error {

	switch subcommand {
	case "list":
		return client.ListAgents(subArgs)
		// --retired flag splits to a separate endpoint. We intercept it
		// client-side and strip it before delegating, so both code paths
		// share the --page/--per-page flag parsing inside the client.
		retired := false
		rest := make([]string, 0, len(subArgs))
		for _, a := range subArgs {
			if a == "--retired" {
				retired = true
				continue
			}
			rest = append(rest, a)
		}
		if retired {
			return client.ListRetiredAgents(rest)
		}
		return client.ListAgents(rest)
	case "get":
		if len(subArgs) == 0 {
			fmt.Fprintf(os.Stderr, "usage: agents get <id>\n")
			return nil
		}
		return client.GetAgent(subArgs[0])
	case "retire":
		if len(subArgs) == 0 {
			fmt.Fprintf(os.Stderr, "usage: agents retire <id> [--force] [--reason <reason>]\n")
			return nil
		}
		return client.RetireAgent(subArgs)
	default:
		fmt.Fprintf(os.Stderr, "unknown subcommand: agents %s\n", subcommand)
		return nil
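The `list` arm above intercepts `--retired` and forwards the remaining arguments unchanged. That filtering step can be isolated as a small function, sketched here under the assumption that a boolean flag is matched by exact string equality, as in the hunk. The name `stripFlag` is hypothetical and does not appear in the diff.

```go
package main

import "fmt"

// stripFlag is a hypothetical helper, not part of the diff. It reports whether
// name occurs in args and returns a copy of args with every occurrence removed,
// mirroring the inline loop in the `agents list` arm.
func stripFlag(args []string, name string) (found bool, rest []string) {
	rest = make([]string, 0, len(args))
	for _, a := range args {
		if a == name {
			found = true
			continue
		}
		rest = append(rest, a)
	}
	return found, rest
}

func main() {
	retired, rest := stripFlag([]string{"--page", "2", "--retired"}, "--retired")
	fmt.Println(retired, rest) // prints "true [--page 2]"
}
```

Because the match is on the exact token, a form such as `--retired=true` would pass through untouched in this sketch; whether the real client should accept that form is a design choice the diff does not settle.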
@@ -201,3 +254,26 @@ func handleImport(client *cli.Client, args []string) error {
func handleStatus(client *cli.Client) error {
	return client.GetStatus()
}

// validateHTTPSScheme rejects plaintext and empty-scheme server URLs at
// startup so operators get a fail-loud diagnostic before any network call,
// not a TCP-refused or TLS-handshake-error downstream. See docs/upgrade-to-tls.md.
func validateHTTPSScheme(serverURL string) error {
	if serverURL == "" {
		return fmt.Errorf("server URL is empty — set --server (or CERTCTL_SERVER_URL) to an https:// URL (e.g., https://certctl-server:8443)")
	}
	u, err := url.Parse(serverURL)
	if err != nil {
		return fmt.Errorf("server URL %q is not a valid URL: %w", serverURL, err)
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		return nil
	case "http":
		return fmt.Errorf("server URL %q uses plaintext http:// — the certctl control plane is HTTPS-only", serverURL)
	case "":
		return fmt.Errorf("server URL %q is missing a scheme — expected https://", serverURL)
	default:
		return fmt.Errorf("server URL %q uses unsupported scheme %q — expected https://", serverURL, u.Scheme)
	}
}
@@ -0,0 +1,96 @@
package main

import (
	"strings"
	"testing"
)

// TestValidateHTTPSScheme pins the pre-flight URL-scheme guard that the
// HTTPS-Everywhere milestone (v2.2, §3.2) requires on the certctl-cli binary
// startup path. The CLI's diagnostic is distinct from the agent and MCP server
// because it surfaces the --server flag alongside CERTCTL_SERVER_URL — so the
// empty-URL case pins that flag-name substring separately. Every other case
// mirrors the dispatch arms in cmd/cli/main.go:validateHTTPSScheme; drifting
// the substrings is what this test is here to catch.
func TestValidateHTTPSScheme(t *testing.T) {
	tests := []struct {
		name       string
		serverURL  string
		wantErr    bool
		wantErrSub string // substring that MUST appear in the error message
	}{
		{
			name:      "https URL passes",
			serverURL: "https://certctl-server:8443",
			wantErr:   false,
		},
		{
			name:      "https URL with path passes",
			serverURL: "https://certctl.example.com/api/v1",
			wantErr:   false,
		},
		{
			name:      "uppercase HTTPS scheme passes (url.Parse lowercases)",
			serverURL: "HTTPS://certctl-server:8443",
			wantErr:   false,
		},
		{
			name:       "empty URL rejected mentions --server flag",
			serverURL:  "",
			wantErr:    true,
			wantErrSub: "--server",
		},
		{
			name:       "empty URL rejected also mentions CERTCTL_SERVER_URL",
			serverURL:  "",
			wantErr:    true,
			wantErrSub: "CERTCTL_SERVER_URL",
		},
		{
			name:       "plaintext http rejected",
			serverURL:  "http://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "plaintext http://",
		},
		{
			name:      "bare host missing scheme rejected",
			serverURL: "localhost:8443",
			wantErr:   true,
			// url.Parse treats "localhost:8443" as scheme=localhost, opaque=8443
			// — exercises the default arm (unsupported scheme) rather than the
			// empty-scheme arm. Both are fail-closed, which is what we care about.
			wantErrSub: "unsupported scheme",
		},
		{
			name:       "path-only URL rejected",
			serverURL:  "//certctl-server:8443",
			wantErr:    true,
			wantErrSub: "missing a scheme",
		},
		{
			name:       "unsupported scheme rejected",
			serverURL:  "ftp://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "unsupported scheme",
		},
		{
			name:       "ws scheme rejected",
			serverURL:  "ws://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "unsupported scheme",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateHTTPSScheme(tt.serverURL)
			if (err != nil) != tt.wantErr {
				t.Fatalf("validateHTTPSScheme(%q) err=%v wantErr=%v", tt.serverURL, err, tt.wantErr)
			}
			if tt.wantErr && tt.wantErrSub != "" && !strings.Contains(err.Error(), tt.wantErrSub) {
				t.Errorf("validateHTTPSScheme(%q) err=%q must contain %q so operators see the right diagnostic",
					tt.serverURL, err.Error(), tt.wantErrSub)
			}
		})
	}
}
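Several cases in the table above depend on how `net/url` splits a string into scheme and remainder, which is easy to misjudge for inputs such as `localhost:8443`. The following sketch prints what `url.Parse` reports for three of the inputs used in the test. The helper name `schemeOf` is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// schemeOf is a hypothetical helper, not part of the diff. It returns the
// scheme that url.Parse extracts from raw, or the parse error.
func schemeOf(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Scheme, nil
}

func main() {
	inputs := []string{
		"HTTPS://certctl-server:8443", // scheme is lowercased by url.Parse
		"localhost:8443",              // "localhost" is taken as the scheme
		"//certctl-server:8443",       // scheme is empty, host is set
	}
	for _, raw := range inputs {
		scheme, err := schemeOf(raw)
		fmt.Printf("%q scheme=%q err=%v\n", raw, scheme, err)
	}
}
```

This is why the bare `localhost:8443` case lands in the unsupported-scheme arm rather than the missing-scheme arm, while the `//host:port` form reaches the missing-scheme arm.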
@@ -4,8 +4,10 @@ import (
	"context"
	"fmt"
	"log"
	"net/url"
	"os"
	"os/signal"
	"strings"

	gomcp "github.com/modelcontextprotocol/go-sdk/mcp"

@@ -16,14 +18,33 @@ import (
var Version = "dev"

func main() {
	// HTTPS-Everywhere (v2.2): the server is HTTPS-only. The default URL
	// uses https://; plaintext http:// is rejected by validateHTTPSScheme
	// below with a fail-loud pre-flight diagnostic pointing at
	// docs/upgrade-to-tls.md, so operators never get a TCP-refused or
	// TLS-handshake-error downstream. See docs/tls.md for CA bundle and
	// insecure-skip-verify guidance.
	serverURL := os.Getenv("CERTCTL_SERVER_URL")
	if serverURL == "" {
		serverURL = "http://localhost:8443"
		serverURL = "https://localhost:8443"
	}

	if err := validateHTTPSScheme(serverURL); err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		fmt.Fprintf(os.Stderr, "\nThe certctl control plane is HTTPS-only as of v2.2.\n")
		fmt.Fprintf(os.Stderr, "See docs/upgrade-to-tls.md for the cutover walkthrough.\n")
		os.Exit(1)
	}

	apiKey := os.Getenv("CERTCTL_API_KEY")
	caBundlePath := os.Getenv("CERTCTL_SERVER_CA_BUNDLE_PATH")
	insecure := strings.EqualFold(os.Getenv("CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY"), "true")

	client := mcp.NewClient(serverURL, apiKey)
	client, err := mcp.NewClient(serverURL, apiKey, caBundlePath, insecure)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		os.Exit(1)
	}

	server := gomcp.NewServer(&gomcp.Implementation{
		Name: "certctl",
@@ -41,3 +62,26 @@ func main() {
		log.Fatalf("MCP server error: %v", err)
	}
}

// validateHTTPSScheme rejects plaintext and empty-scheme server URLs at
// startup so operators get a fail-loud diagnostic before any network call,
// not a TCP-refused or TLS-handshake-error downstream. See docs/upgrade-to-tls.md.
func validateHTTPSScheme(serverURL string) error {
	if serverURL == "" {
		return fmt.Errorf("server URL is empty — set CERTCTL_SERVER_URL to an https:// URL (e.g., https://certctl-server:8443)")
	}
	u, err := url.Parse(serverURL)
	if err != nil {
		return fmt.Errorf("server URL %q is not a valid URL: %w", serverURL, err)
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		return nil
	case "http":
		return fmt.Errorf("server URL %q uses plaintext http:// — the certctl control plane is HTTPS-only", serverURL)
	case "":
		return fmt.Errorf("server URL %q is missing a scheme — expected https://", serverURL)
	default:
		return fmt.Errorf("server URL %q uses unsupported scheme %q — expected https://", serverURL, u.Scheme)
	}
}
@@ -0,0 +1,90 @@
package main

import (
	"strings"
	"testing"
)

// TestValidateHTTPSScheme pins the pre-flight URL-scheme guard that the
// HTTPS-Everywhere milestone (v2.2, §3.2) requires on the MCP server binary
// startup path. The whole point is to fail loud with a diagnostic that points
// at docs/upgrade-to-tls.md *before* any network call — not a cryptic
// TCP-refused or TLS-handshake-error two ticks later. Every case here mirrors
// the dispatch arms in cmd/mcp-server/main.go:validateHTTPSScheme; drifting
// the error-message substrings is what this test is here to catch.
func TestValidateHTTPSScheme(t *testing.T) {
	tests := []struct {
		name       string
		serverURL  string
		wantErr    bool
		wantErrSub string // substring that MUST appear in the error message
	}{
		{
			name:      "https URL passes",
			serverURL: "https://certctl-server:8443",
			wantErr:   false,
		},
		{
			name:      "https URL with path passes",
			serverURL: "https://certctl.example.com/api/v1",
			wantErr:   false,
		},
		{
			name:      "uppercase HTTPS scheme passes (url.Parse lowercases)",
			serverURL: "HTTPS://certctl-server:8443",
			wantErr:   false,
		},
		{
			name:       "empty URL rejected",
			serverURL:  "",
			wantErr:    true,
			wantErrSub: "server URL is empty",
		},
		{
			name:       "plaintext http rejected",
			serverURL:  "http://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "plaintext http://",
		},
		{
			name:      "bare host missing scheme rejected",
			serverURL: "localhost:8443",
			wantErr:   true,
			// url.Parse treats "localhost:8443" as scheme=localhost, opaque=8443
			// — exercises the default arm (unsupported scheme) rather than the
			// empty-scheme arm. Both are fail-closed, which is what we care about.
			wantErrSub: "unsupported scheme",
		},
		{
			name:       "path-only URL rejected",
			serverURL:  "//certctl-server:8443",
			wantErr:    true,
			wantErrSub: "missing a scheme",
		},
		{
			name:       "unsupported scheme rejected",
			serverURL:  "ftp://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "unsupported scheme",
		},
		{
			name:       "ws scheme rejected",
			serverURL:  "ws://certctl-server:8443",
			wantErr:    true,
			wantErrSub: "unsupported scheme",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateHTTPSScheme(tt.serverURL)
			if (err != nil) != tt.wantErr {
				t.Fatalf("validateHTTPSScheme(%q) err=%v wantErr=%v", tt.serverURL, err, tt.wantErr)
			}
			if tt.wantErr && tt.wantErrSub != "" && !strings.Contains(err.Error(), tt.wantErrSub) {
				t.Errorf("validateHTTPSScheme(%q) err=%q must contain %q so operators see the right diagnostic",
					tt.serverURL, err.Error(), tt.wantErrSub)
			}
		})
	}
}
@@ -0,0 +1,314 @@
package main

import (
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// TestBuildFinalHandler_Dispatch is the M-001 regression harness for the outer
// HTTP dispatch layer. It pins which path prefixes ride the no-auth middleware
// chain (EST, SCEP, /.well-known/pki, health/ready, /api/v1/auth/info) versus
// the authenticated chain (/api/v1/*).
//
// The concern under test is ONLY the dispatch in buildFinalHandler — the
// handlers themselves are mocked as marker handlers that stamp "AUTH" or
// "NOAUTH" into the response body. Service-layer concerns (SCEP password
// validation, EST CSR validation, API auth enforcement) are covered by their
// respective test suites.
//
// Case (i) is the central guard: EST with NO client cert / NO Bearer token
// MUST reach the no-auth handler (pre-M-001 it was 401'd by the Auth
// middleware, blocking enrollment for every real-world EST client).
func TestBuildFinalHandler_Dispatch(t *testing.T) {
	// Marker handlers — each stamps a unique body so tests can verify which
	// chain the request traversed.
	authHandler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("X-Chain", "auth")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("AUTH"))
	})
	noAuthHandler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("X-Chain", "noauth")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("NOAUTH"))
	})

	// Dashboard directory with index.html + assets/ for SPA fallback and
	// static-asset tests. Cleaned up by t.TempDir.
	webDir := t.TempDir()
	indexHTML := []byte("<!doctype html><html><body>certctl dashboard</body></html>")
	if err := os.WriteFile(filepath.Join(webDir, "index.html"), indexHTML, 0o644); err != nil {
		t.Fatalf("write index.html: %v", err)
	}
	assetsDir := filepath.Join(webDir, "assets")
	if err := os.MkdirAll(assetsDir, 0o755); err != nil {
		t.Fatalf("mkdir assets: %v", err)
	}
	assetJS := []byte("console.log('certctl');")
	if err := os.WriteFile(filepath.Join(assetsDir, "app.js"), assetJS, 0o644); err != nil {
		t.Fatalf("write app.js: %v", err)
	}

	handler := buildFinalHandler(authHandler, noAuthHandler, webDir, true /* dashboardEnabled */)

	tests := []struct {
		name           string
		method         string
		path           string
		wantBody       string // "AUTH" | "NOAUTH" | "" (== substring match against response body)
		wantBodyPrefix string
		wantStatus     int
		description    string
	}{
		// ---- Case (i): M-001 central regression guard ----
		{
			name:        "est_cacerts_no_auth_reaches_noauth_handler",
			method:      http.MethodGet,
			path:        "/.well-known/est/cacerts",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "EST clients cannot present Bearer tokens — must NOT be 401'd before reaching the handler (RFC 7030 §4.1.1)",
		},
		{
			name:        "est_simpleenroll_no_auth_reaches_noauth_handler",
			method:      http.MethodPost,
			path:        "/.well-known/est/simpleenroll",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "RFC 7030 §4.2 simpleenroll served from no-auth chain (option D)",
		},
		{
			name:        "est_simplereenroll_no_auth_reaches_noauth_handler",
			method:      http.MethodPost,
			path:        "/.well-known/est/simplereenroll",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "RFC 7030 §4.2.2 simplereenroll also on no-auth chain",
		},
		{
			name:        "est_csrattrs_no_auth_reaches_noauth_handler",
			method:      http.MethodGet,
			path:        "/.well-known/est/csrattrs",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "RFC 7030 §4.5 csrattrs also on no-auth chain",
		},

		// ---- Cases (ii) + (iii): SCEP dispatch ----
		// The actual challengePassword validation lives in the service layer
		// (internal/service/scep.go). This test pins that ALL /scep* requests
		// reach the no-auth chain — the service layer is then responsible for
		// rejecting or accepting based on password contents.
		{
			name:        "scep_exact_path_reaches_noauth_handler",
			method:      http.MethodGet,
			path:        "/scep",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "SCEP clients authenticate via CSR challengePassword, not Bearer (RFC 8894 §3.2)",
		},
		{
			name:        "scep_subpath_reaches_noauth_handler",
			method:      http.MethodPost,
			path:        "/scep/",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "Trailing-slash variant must also ride no-auth chain",
		},
		{
			name:        "scep_query_string_reaches_noauth_handler",
			method:      http.MethodGet,
			path:        "/scep?operation=GetCACaps",
			wantBody:    "NOAUTH",
			wantStatus:  http.StatusOK,
			description: "Query string does not affect dispatch — operation dispatch is handler-internal",
		},
		// Defensive: /scepxyz MUST NOT match the SCEP prefix (guards against
		// over-broad matching that would leak non-SCEP paths into no-auth).
		{
			name:        "scepxyz_does_not_match_scep_prefix",
			method:      http.MethodGet,
			path:        "/scepxyz",
			wantStatus:  http.StatusOK,
			wantBody:    "certctl dashboard",
			description: "SPA fallback — /scepxyz must not be confused with /scep or /scep/",
		},

// ---- Case (iv): RFC 5280 CRL + RFC 6960 OCSP ----
|
||||
{
|
||||
name: "pki_crl_no_auth_reaches_noauth_handler",
|
||||
method: http.MethodGet,
|
||||
path: "/.well-known/pki/crl/abc123",
|
||||
wantBody: "NOAUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "RFC 5280 CRL distribution point must be served without auth",
|
||||
},
|
||||
{
|
||||
name: "pki_ocsp_no_auth_reaches_noauth_handler",
|
||||
method: http.MethodGet,
|
||||
path: "/.well-known/pki/ocsp/abc123/serial",
|
||||
wantBody: "NOAUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "RFC 6960 OCSP responder must be served without auth",
|
||||
},
|
||||
|
||||
// ---- Case (v): Authenticated API routes ----
|
||||
{
|
||||
name: "api_v1_certificates_goes_through_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/api/v1/certificates",
|
||||
wantBody: "AUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "Primary API surface must still require Bearer token",
|
||||
},
|
||||
{
|
||||
name: "api_v1_auth_check_goes_through_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/api/v1/auth/check",
|
||||
wantBody: "AUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "auth/check validates the caller's Bearer — auth chain required",
|
||||
},
|
||||
{
|
||||
name: "api_v1_jobs_goes_through_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/api/v1/jobs",
|
||||
wantBody: "AUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "Jobs API is part of the privileged surface",
|
||||
},
|
||||
|
||||
// ---- Health probes bypass auth ----
|
||||
{
|
||||
name: "health_bypasses_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/health",
|
||||
wantBody: "NOAUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "Docker/K8s health probes cannot carry Bearer tokens",
|
||||
},
|
||||
{
|
||||
name: "ready_bypasses_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/ready",
|
||||
wantBody: "NOAUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "Readiness probe also unauthenticated",
|
||||
},
|
||||
{
|
||||
name: "auth_info_bypasses_auth",
|
||||
method: http.MethodGet,
|
||||
path: "/api/v1/auth/info",
|
||||
wantBody: "NOAUTH",
|
||||
wantStatus: http.StatusOK,
|
||||
description: "React app calls auth/info BEFORE login to discover auth mode",
|
||||
},
|
||||
|
||||
// ---- Static assets served by file server ----
|
||||
{
|
||||
name: "static_asset_served_by_file_server",
|
||||
method: http.MethodGet,
|
||||
path: "/assets/app.js",
|
||||
wantStatus: http.StatusOK,
|
||||
wantBody: "console.log('certctl');",
|
||||
description: "Built Vite assets served directly without auth",
|
||||
},
|
||||
|
||||
// ---- SPA fallback ----
|
||||
{
|
||||
name: "spa_fallback_serves_index_html",
|
||||
method: http.MethodGet,
|
||||
path: "/",
|
||||
wantStatus: http.StatusOK,
|
||||
wantBody: "certctl dashboard",
|
||||
description: "Root path serves SPA entry point",
|
||||
},
|
||||
{
|
||||
name: "spa_fallback_for_unknown_route",
|
||||
method: http.MethodGet,
|
||||
path: "/certificates",
|
||||
wantStatus: http.StatusOK,
|
||||
wantBody: "certctl dashboard",
|
||||
description: "React Router routes fall through to index.html",
|
||||
},
|
||||
{
|
||||
name: "spa_fallback_deep_route",
|
||||
method: http.MethodGet,
|
||||
path: "/certificates/mc-api-prod/detail",
|
||||
wantStatus: http.StatusOK,
|
||||
wantBody: "certctl dashboard",
|
||||
description: "Deep React Router routes also fall through to SPA",
|
||||
},
|
||||
}
|
||||
|
||||
for _, tc := range tests {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
req := httptest.NewRequest(tc.method, tc.path, nil)
|
||||
w := httptest.NewRecorder()
|
||||
handler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != tc.wantStatus {
|
||||
t.Errorf("status = %d, want %d (%s)", w.Code, tc.wantStatus, tc.description)
|
||||
}
|
||||
body := w.Body.String()
|
||||
if tc.wantBody != "" && !strings.Contains(body, tc.wantBody) {
|
||||
t.Errorf("body %q does not contain %q (%s)", body, tc.wantBody, tc.description)
|
||||
}
|
||||
if tc.wantBodyPrefix != "" && !strings.HasPrefix(body, tc.wantBodyPrefix) {
|
||||
t.Errorf("body %q does not start with %q (%s)", body, tc.wantBodyPrefix, tc.description)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// TestBuildFinalHandler_NoDashboard pins the API-only (dashboard-absent)
|
||||
// dispatch behavior. When web/dist/index.html is missing, everything that's
|
||||
// not a no-auth bypass route falls through to the authenticated apiHandler
|
||||
// (pre-M-001 behavior for headless deployments). EST/SCEP/PKI still ride the
|
||||
// no-auth chain.
|
||||
func TestBuildFinalHandler_NoDashboard(t *testing.T) {
|
||||
authHandler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
_, _ = w.Write([]byte("AUTH"))
|
||||
})
|
||||
noAuthHandler := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
_, _ = w.Write([]byte("NOAUTH"))
|
||||
})
|
||||
|
||||
handler := buildFinalHandler(authHandler, noAuthHandler, "/nonexistent", false /* dashboardEnabled */)
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
path string
|
||||
wantBody string
|
||||
}{
|
||||
{"est_still_no_auth", "/.well-known/est/cacerts", "NOAUTH"},
|
||||
{"scep_still_no_auth", "/scep", "NOAUTH"},
|
||||
{"pki_still_no_auth", "/.well-known/pki/crl/x", "NOAUTH"},
|
||||
{"health_still_no_auth", "/health", "NOAUTH"},
|
||||
{"api_still_auth", "/api/v1/certificates", "AUTH"},
|
||||
// The difference: non-API, non-special paths go through auth chain when
|
||||
// there's no dashboard to serve (preserves legacy headless behavior).
|
||||
{"unknown_path_falls_through_to_auth", "/", "AUTH"},
|
||||
{"unknown_deep_path_falls_through_to_auth", "/random/path", "AUTH"},
|
||||
}
|
||||
|
||||
for _, tc := range tests {
|
||||
t.Run(tc.name, func(t *testing.T) {
|
||||
req := httptest.NewRequest(http.MethodGet, tc.path, nil)
|
||||
w := httptest.NewRecorder()
|
||||
handler.ServeHTTP(w, req)
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("status = %d, want 200", w.Code)
|
||||
}
|
||||
if got := w.Body.String(); !strings.Contains(got, tc.wantBody) {
|
||||
t.Errorf("body = %q, want to contain %q", got, tc.wantBody)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
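The dispatch rules pinned by the two test tables above can be summarized as a small path classifier. This is a minimal sketch under stated assumptions: `isNoAuthPath` is a hypothetical helper written for illustration, not the repository's `buildFinalHandler`, and its route list is copied from the test cases rather than from main.go. It takes the value of `r.URL.Path`, which never includes the query string, so `/scep?operation=GetCACaps` is classified as `/scep`.

```go
package main

import (
	"fmt"
	"strings"
)

// isNoAuthPath is a hypothetical helper (not from the repository). It reports
// whether a request path should ride the no-auth chain, using the routes the
// tests above expect to answer "NOAUTH".
func isNoAuthPath(path string) bool {
	// Exact matches: probes, auth discovery, and the bare SCEP endpoint.
	switch path {
	case "/health", "/ready", "/api/v1/auth/info", "/scep":
		return true
	}
	// Prefix matches all end in "/" so that "/scepxyz" cannot match "/scep".
	for _, prefix := range []string{"/scep/", "/.well-known/est/", "/.well-known/pki/"} {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"/scep", "/scep/", "/scepxyz", "/api/v1/certificates"} {
		fmt.Printf("%-22s noAuth=%v\n", p, isNoAuthPath(p))
	}
}
```

Matching `/scep` exactly and `/scep/` as a prefix is what keeps `/scepxyz` out of the no-auth chain; a bare `strings.HasPrefix(path, "/scep")` would let it through.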
@@ -0,0 +1,650 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
"strings"
|
||||
"testing"
|
||||
|
||||
"github.com/shankar0123/certctl/internal/api/middleware"
|
||||
"github.com/shankar0123/certctl/internal/api/router"
|
||||
"github.com/shankar0123/certctl/internal/config"
|
||||
"github.com/shankar0123/certctl/internal/service"
|
||||
)
|
||||
|
||||
// TestMain_HealthEndpointBypassesAuth verifies that health check endpoints
|
||||
// bypass auth middleware while protected API endpoints require auth.
|
||||
// This is the most critical test — it validates the core routing pattern used in main.go.
|
||||
func TestMain_HealthEndpointBypassesAuth(t *testing.T) {
|
||||
// Simulate the finalHandler logic from main.go with minimal setup
|
||||
// Create handler functions for health endpoints
|
||||
healthHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"status":"ok"}`))
|
||||
})
|
||||
|
||||
readyHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"status":"ready"}`))
|
||||
})
|
||||
|
||||
authInfoHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"auth_type":"api-key"}`))
|
||||
})
|
||||
|
||||
// Protected API endpoint
|
||||
certHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`[]`))
|
||||
})
|
||||
|
||||
// Build the handler chain the same way main.go does
|
||||
authMiddleware := middleware.NewAuth(middleware.AuthConfig{
|
||||
Type: "api-key",
|
||||
Secret: "test-secret-key",
|
||||
})
|
||||
|
||||
// API handler with auth
|
||||
authHandler := middleware.Chain(certHandler,
|
||||
middleware.RequestID,
|
||||
middleware.Recovery,
|
||||
authMiddleware,
|
||||
)
|
||||
|
||||
// Create finalHandler matching main.go logic
|
||||
finalHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
path := r.URL.Path
|
||||
switch path {
|
||||
case "/health":
|
||||
healthHandler.ServeHTTP(w, r)
|
||||
case "/ready":
|
||||
readyHandler.ServeHTTP(w, r)
|
||||
case "/api/v1/auth/info":
|
||||
authInfoHandler.ServeHTTP(w, r)
|
||||
case "/api/v1/certificates":
|
||||
authHandler.ServeHTTP(w, r)
|
||||
default:
|
||||
http.Error(w, "Not Found", http.StatusNotFound)
|
||||
}
|
||||
})
|
||||
|
||||
tests := []struct {
|
||||
name string
|
||||
path string
|
||||
method string
|
||||
bypassesAuth bool
|
||||
expectedStatus int
|
||||
}{
|
||||
{
|
||||
name: "GET /health without auth",
|
||||
path: "/health",
|
||||
method: "GET",
|
||||
bypassesAuth: true,
|
||||
expectedStatus: http.StatusOK,
|
||||
},
|
||||
{
|
||||
name: "GET /ready without auth",
|
||||
path: "/ready",
|
||||
method: "GET",
|
||||
bypassesAuth: true,
|
||||
expectedStatus: http.StatusOK,
|
||||
},
|
||||
{
|
||||
name: "GET /api/v1/auth/info without auth",
|
||||
path: "/api/v1/auth/info",
|
||||
method: "GET",
|
||||
bypassesAuth: true,
|
||||
expectedStatus: http.StatusOK,
|
||||
},
|
||||
{
|
||||
name: "GET /api/v1/certificates without auth (should fail)",
|
||||
path: "/api/v1/certificates",
|
||||
method: "GET",
|
||||
bypassesAuth: false,
|
||||
expectedStatus: http.StatusUnauthorized,
|
||||
},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
req := httptest.NewRequest(tt.method, tt.path, nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
finalHandler.ServeHTTP(w, req)
|
||||
|
||||
if tt.bypassesAuth && w.Code != tt.expectedStatus {
|
||||
t.Errorf("endpoint %s should bypass auth, got status %d, expected %d",
|
||||
tt.path, w.Code, tt.expectedStatus)
|
||||
}
|
||||
|
||||
if !tt.bypassesAuth && w.Code != tt.expectedStatus {
|
||||
t.Errorf("endpoint %s requires auth, got status %d, expected %d",
|
||||
tt.path, w.Code, tt.expectedStatus)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_HealthHandlersRespond verifies health endpoints return correct responses.
|
||||
func TestMain_HealthHandlersRespond(t *testing.T) {
|
||||
healthHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"status":"ok"}`))
|
||||
})
|
||||
|
||||
req := httptest.NewRequest("GET", "/health", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
healthHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("expected status 200, got %d", w.Code)
|
||||
}
|
||||
|
||||
if body := w.Body.String(); body != `{"status":"ok"}` {
|
||||
t.Errorf("expected body '{\"status\":\"ok\"}', got '%s'", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_AuthMiddlewareRejectsUnauthorized verifies auth middleware works.
|
||||
func TestMain_AuthMiddlewareRejectsUnauthorized(t *testing.T) {
|
||||
// Create a protected endpoint
|
||||
protectedHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"data":"protected"}`))
|
||||
})
|
||||
|
||||
// Wrap with auth middleware
|
||||
authMiddleware := middleware.NewAuth(middleware.AuthConfig{
|
||||
Type: "api-key",
|
||||
Secret: "test-secret-key",
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(protectedHandler, authMiddleware)
|
||||
|
||||
// Request without auth should be rejected
|
||||
req := httptest.NewRequest("GET", "/api/v1/protected", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusUnauthorized {
|
||||
t.Errorf("expected status 401 for unauthorized request, got %d", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_AuthMiddlewareAllowsWithValidKey verifies auth middleware allows valid keys.
|
||||
func TestMain_AuthMiddlewareAllowsWithValidKey(t *testing.T) {
|
||||
testKey := "test-secret-key"
|
||||
|
||||
// Create a protected endpoint
|
||||
protectedHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"data":"protected"}`))
|
||||
})
|
||||
|
||||
// Wrap with auth middleware
|
||||
authMiddleware := middleware.NewAuth(middleware.AuthConfig{
|
||||
Type: "api-key",
|
||||
Secret: testKey,
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(protectedHandler, authMiddleware)
|
||||
|
||||
// Request with valid auth should be allowed
|
||||
req := httptest.NewRequest("GET", "/api/v1/protected", nil)
|
||||
req.Header.Set("Authorization", "Bearer "+testKey)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("expected status 200 for authorized request, got %d", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_ServerConfigFromEnvironment verifies config.Load() reads env vars correctly.
|
||||
func TestMain_ServerConfigFromEnvironment(t *testing.T) {
|
||||
// Save original env vars
|
||||
oldAuthType := os.Getenv("CERTCTL_AUTH_TYPE")
|
||||
oldServerHost := os.Getenv("CERTCTL_SERVER_HOST")
|
||||
oldServerPort := os.Getenv("CERTCTL_SERVER_PORT")
|
||||
oldTLSCert := os.Getenv("CERTCTL_SERVER_TLS_CERT_PATH")
|
||||
oldTLSKey := os.Getenv("CERTCTL_SERVER_TLS_KEY_PATH")
|
||||
defer func() {
|
||||
if oldAuthType != "" {
|
||||
os.Setenv("CERTCTL_AUTH_TYPE", oldAuthType)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_AUTH_TYPE")
|
||||
}
|
||||
if oldServerHost != "" {
|
||||
os.Setenv("CERTCTL_SERVER_HOST", oldServerHost)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_HOST")
|
||||
}
|
||||
if oldServerPort != "" {
|
||||
os.Setenv("CERTCTL_SERVER_PORT", oldServerPort)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_PORT")
|
||||
}
|
||||
if oldTLSCert != "" {
|
||||
os.Setenv("CERTCTL_SERVER_TLS_CERT_PATH", oldTLSCert)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_TLS_CERT_PATH")
|
||||
}
|
||||
if oldTLSKey != "" {
|
||||
os.Setenv("CERTCTL_SERVER_TLS_KEY_PATH", oldTLSKey)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_TLS_KEY_PATH")
|
||||
}
|
||||
}()
|
||||
|
||||
// HTTPS-only control plane: Validate() refuses to pass without a readable
|
||||
// cert/key pair on disk. Materialize a throwaway ECDSA P-256 pair using the
|
||||
// same generator cmd/server/tls_test.go uses for the certHolder tests.
|
||||
dir := t.TempDir()
|
||||
certPath := dir + "/server.crt"
|
||||
keyPath := dir + "/server.key"
|
||||
generateTestCert(t, certPath, keyPath, "main-test-cn")
|
||||
|
||||
// Set test env vars
|
||||
os.Setenv("CERTCTL_AUTH_TYPE", "none")
|
||||
os.Setenv("CERTCTL_SERVER_HOST", "127.0.0.1")
|
||||
os.Setenv("CERTCTL_SERVER_PORT", "8080")
|
||||
os.Setenv("CERTCTL_SERVER_TLS_CERT_PATH", certPath)
|
||||
os.Setenv("CERTCTL_SERVER_TLS_KEY_PATH", keyPath)
|
||||
|
||||
cfg, err := config.Load()
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to load config from env vars: %v", err)
|
||||
}
|
||||
|
||||
if cfg.Auth.Type != "none" {
|
||||
t.Errorf("Expected auth type 'none', got '%s'", cfg.Auth.Type)
|
||||
}
|
||||
|
||||
if cfg.Server.Host != "127.0.0.1" {
|
||||
t.Errorf("Expected server host '127.0.0.1', got '%s'", cfg.Server.Host)
|
||||
}
|
||||
|
||||
if cfg.Server.Port != 8080 {
|
||||
t.Errorf("Expected server port 8080, got %d", cfg.Server.Port)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_AuthTypeConfiguration verifies auth type is read from config.
|
||||
func TestMain_AuthTypeConfiguration(t *testing.T) {
|
||||
// Save original env vars
|
||||
oldAuthType := os.Getenv("CERTCTL_AUTH_TYPE")
|
||||
oldAuthSecret := os.Getenv("CERTCTL_AUTH_SECRET")
|
||||
oldTLSCert := os.Getenv("CERTCTL_SERVER_TLS_CERT_PATH")
|
||||
oldTLSKey := os.Getenv("CERTCTL_SERVER_TLS_KEY_PATH")
|
||||
defer func() {
|
||||
if oldAuthType != "" {
|
||||
os.Setenv("CERTCTL_AUTH_TYPE", oldAuthType)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_AUTH_TYPE")
|
||||
}
|
||||
if oldAuthSecret != "" {
|
||||
os.Setenv("CERTCTL_AUTH_SECRET", oldAuthSecret)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_AUTH_SECRET")
|
||||
}
|
||||
if oldTLSCert != "" {
|
||||
os.Setenv("CERTCTL_SERVER_TLS_CERT_PATH", oldTLSCert)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_TLS_CERT_PATH")
|
||||
}
|
||||
if oldTLSKey != "" {
|
||||
os.Setenv("CERTCTL_SERVER_TLS_KEY_PATH", oldTLSKey)
|
||||
} else {
|
||||
os.Unsetenv("CERTCTL_SERVER_TLS_KEY_PATH")
|
||||
}
|
||||
}()
|
||||
|
||||
// HTTPS-only control plane: config.Load()→Validate() refuses to pass
|
||||
// without a readable cert/key pair. Mint one throwaway pair for the whole
|
||||
// sub-test cohort — auth type toggles don't care about the TLS surface.
|
||||
dir := t.TempDir()
|
||||
certPath := dir + "/server.crt"
|
||||
keyPath := dir + "/server.key"
|
||||
generateTestCert(t, certPath, keyPath, "main-test-cn")
|
||||
os.Setenv("CERTCTL_SERVER_TLS_CERT_PATH", certPath)
|
||||
os.Setenv("CERTCTL_SERVER_TLS_KEY_PATH", keyPath)
|
||||
|
||||
// Set auth secret for api-key mode
|
||||
os.Setenv("CERTCTL_AUTH_SECRET", "test-secret")
|
||||
|
||||
testCases := []string{"api-key", "none"}
|
||||
|
||||
for _, authType := range testCases {
|
||||
t.Run(fmt.Sprintf("auth_type_%s", authType), func(t *testing.T) {
|
||||
os.Setenv("CERTCTL_AUTH_TYPE", authType)
|
||||
|
||||
cfg, err := config.Load()
|
||||
if err != nil {
|
||||
t.Fatalf("Failed to load config: %v", err)
|
||||
}
|
||||
|
||||
if cfg.Auth.Type != authType {
|
||||
t.Errorf("Expected auth type '%s', got '%s'", authType, cfg.Auth.Type)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_MiddlewareChainConstruction tests that middleware can be properly chained.
|
||||
func TestMain_MiddlewareChainConstruction(t *testing.T) {
|
||||
// Test that the middleware.Chain function works as expected
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte("success"))
|
||||
})
|
||||
|
||||
// Chain with RequestID and Recovery middleware
|
||||
chainedHandler := middleware.Chain(baseHandler,
|
||||
middleware.RequestID,
|
||||
middleware.Recovery,
|
||||
)
|
||||
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("expected status 200, got %d", w.Code)
|
||||
}
|
||||
|
||||
if body := w.Body.String(); body != "success" {
|
||||
t.Errorf("expected body 'success', got '%s'", body)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_RequestIDMiddleware verifies RequestID is added to responses.
|
||||
func TestMain_RequestIDMiddleware(t *testing.T) {
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
})
|
||||
|
||||
// Wrap with RequestID middleware
|
||||
chainedHandler := middleware.Chain(baseHandler, middleware.RequestID)
|
||||
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
// RequestID should be set in response header
|
||||
if rid := w.Header().Get("X-Request-ID"); rid == "" {
|
||||
t.Logf("X-Request-ID header not present (middleware may work differently)")
|
||||
} else {
|
||||
t.Logf("X-Request-ID header set: %s", rid)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_RecoveryMiddlewareHandlesPanic verifies recovery middleware works.
|
||||
func TestMain_RecoveryMiddlewareHandlesPanic(t *testing.T) {
|
||||
panicHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
panic("test panic")
|
||||
})
|
||||
|
||||
// Wrap with recovery middleware
|
||||
chainedHandler := middleware.Chain(panicHandler, middleware.Recovery)
|
||||
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
// Should not panic
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
// Should return 500 error
|
||||
if w.Code != http.StatusInternalServerError {
|
||||
t.Errorf("expected 500 for panicked handler, got %d", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_ServiceInitialization tests that services can be instantiated.
|
||||
// This validates the initialization pattern from main.go without needing a real DB.
|
||||
func TestMain_ServiceInitialization(t *testing.T) {
|
||||
logger := slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{
|
||||
Level: slog.LevelInfo,
|
||||
}))
|
||||
|
||||
// Create test issuer registry (same as main.go does)
|
||||
issuerRegistry := service.NewIssuerRegistry(logger)
|
||||
|
||||
if issuerRegistry == nil {
|
||||
t.Fatal("issuer registry should not be nil")
|
||||
}
|
||||
|
||||
// Verify the registry has a Len() method (used in main.go)
|
||||
count := issuerRegistry.Len()
|
||||
if count < 0 {
|
||||
t.Errorf("issuer registry length should be >= 0, got %d", count)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_CORSMiddlewareSetHeaders verifies CORS headers are set.
|
||||
func TestMain_CORSMiddlewareSetHeaders(t *testing.T) {
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
})
|
||||
|
||||
corsMiddleware := middleware.NewCORS(middleware.CORSConfig{
|
||||
AllowedOrigins: []string{"http://example.com"},
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(baseHandler, corsMiddleware)
|
||||
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
req.Header.Set("Origin", "http://example.com")
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
// CORS middleware should set access control headers
|
||||
if acah := w.Header().Get("Access-Control-Allow-Origin"); acah == "" {
|
||||
t.Logf("Access-Control-Allow-Origin not set (may be by design)")
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_AuthNoneMode verifies auth can be disabled.
|
||||
func TestMain_AuthNoneMode(t *testing.T) {
|
||||
protectedHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"data":"protected"}`))
|
||||
})
|
||||
|
||||
// Wrap with auth middleware in "none" mode
|
||||
authMiddleware := middleware.NewAuth(middleware.AuthConfig{
|
||||
Type: "none",
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(protectedHandler, authMiddleware)
|
||||
|
||||
// Request without auth should be allowed in "none" mode
|
||||
req := httptest.NewRequest("GET", "/api/v1/protected", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("expected status 200 in 'none' auth mode, got %d", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_RouterRegistration tests that router registration works.
|
||||
func TestMain_RouterRegistration(t *testing.T) {
|
||||
r := router.New()
|
||||
|
||||
// Register a test handler
|
||||
r.RegisterFunc("GET /test", func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte("test"))
|
||||
})
|
||||
|
||||
// Request the route
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
r.ServeHTTP(w, req)
|
||||
|
||||
// Route should be registered and accessible
|
||||
if w.Code == http.StatusNotFound {
|
||||
t.Errorf("route not registered, got 404")
|
||||
} else if w.Code == http.StatusOK {
|
||||
t.Logf("route registered successfully")
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_RateLimiterIntegration tests rate limiter middleware works.
|
||||
func TestMain_RateLimiterIntegration(t *testing.T) {
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
})
|
||||
|
||||
// Create rate limiter with 10 RPS, 1 burst
|
||||
rateLimiter := middleware.NewRateLimiter(middleware.RateLimitConfig{
|
||||
RPS: 10,
|
||||
BurstSize: 1,
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(baseHandler, rateLimiter)
|
||||
|
||||
// First request should succeed
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code == http.StatusServiceUnavailable {
|
||||
t.Logf("rate limiter is active")
|
||||
} else {
|
||||
t.Logf("rate limiter allowed request (status %d)", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_ContentTypeMiddleware verifies content type is set correctly.
|
||||
func TestMain_ContentTypeMiddleware(t *testing.T) {
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"status":"ok"}`))
|
||||
})
|
||||
|
||||
// Wrap with middleware that sets Content-Type
|
||||
chainedHandler := middleware.Chain(baseHandler, middleware.ContentType)
|
||||
|
||||
req := httptest.NewRequest("GET", "/api/v1/test", nil)
|
||||
w := httptest.NewRecorder()
|
||||
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
// Verify response
|
||||
if w.Code != http.StatusOK {
|
||||
t.Errorf("expected status 200, got %d", w.Code)
|
||||
}
|
||||
|
||||
// ContentType middleware should set header
|
||||
if ct := w.Header().Get("Content-Type"); ct != "" {
|
||||
t.Logf("Content-Type header set: %s", ct)
|
||||
}
|
||||
}
|
||||
|
||||
// TestMain_ContextPropagation verifies context is propagated through middleware.
|
||||
func TestMain_ContextPropagation(t *testing.T) {
|
||||
type contextKey string
|
||||
testKey := contextKey("test-key")
|
||||
testValue := "test-value"
|
||||
|
||||
baseHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
val := r.Context().Value(testKey)
|
||||
if val == testValue {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
} else {
|
||||
w.WriteHeader(http.StatusInternalServerError)
|
||||
}
|
||||
})
|
||||
|
||||
chainedHandler := middleware.Chain(baseHandler, middleware.RequestID)
|
||||
|
||||
req := httptest.NewRequest("GET", "/test", nil)
|
||||
// Add context value before request
|
||||
req = req.WithContext(context.WithValue(req.Context(), testKey, testValue))
|
||||
|
||||
w := httptest.NewRecorder()
|
||||
chainedHandler.ServeHTTP(w, req)
|
||||
|
||||
if w.Code != http.StatusOK {
|
||||
t.Logf("Context value may not be propagated (status %d), this may be expected", w.Code)
|
||||
}
|
||||
}
|
||||
|
||||
// TestPreflightSCEPChallengePassword is the H-2 regression guard for the
|
||||
// startup pre-flight check. The helper MUST return a non-nil error whenever
|
||||
// SCEP is enabled with an empty challenge password — that configuration
|
||||
// previously allowed unauthenticated certificate enrollment (CWE-306).
|
||||
// Disabled-SCEP and configured-password cases must pass cleanly.
|
||||
func TestPreflightSCEPChallengePassword(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
enabled bool
|
||||
challengePassword string
|
||||
wantErr bool
|
||||
wantErrSubstring string
|
||||
}{
|
||||
{
|
||||
name: "disabled_empty_password_ok",
|
||||
enabled: false,
|
||||
challengePassword: "",
|
||||
wantErr: false,
|
||||
},
|
||||
{
|
||||
name: "disabled_with_password_ok",
|
||||
enabled: false,
|
||||
challengePassword: "leftover-value",
|
||||
wantErr: false,
|
||||
},
|
||||
{
|
||||
name: "enabled_empty_password_rejected",
|
||||
enabled: true,
|
||||
challengePassword: "",
|
||||
wantErr: true,
|
||||
wantErrSubstring: "CERTCTL_SCEP_CHALLENGE_PASSWORD",
|
||||
},
|
||||
{
|
||||
name: "enabled_with_password_ok",
|
||||
enabled: true,
|
||||
challengePassword: "hunter2",
|
||||
wantErr: false,
|
||||
},
|
||||
{
|
||||
name: "enabled_single_char_password_ok",
|
||||
enabled: true,
|
||||
challengePassword: "x",
|
||||
wantErr: false,
|
||||
},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
err := preflightSCEPChallengePassword(tt.enabled, tt.challengePassword)
|
||||
if tt.wantErr {
|
||||
if err == nil {
|
||||
t.Fatalf("expected error, got nil")
|
||||
}
|
||||
if tt.wantErrSubstring != "" && !strings.Contains(err.Error(), tt.wantErrSubstring) {
|
||||
t.Errorf("expected error to mention %q, got: %v", tt.wantErrSubstring, err)
|
||||
}
|
||||
if !strings.Contains(err.Error(), "CWE-306") {
|
||||
t.Errorf("expected error to cite CWE-306 for traceability, got: %v", err)
|
||||
}
|
||||
} else if err != nil {
|
||||
t.Errorf("expected no error, got: %v", err)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
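For readers who want to see what a helper satisfying this table might look like, here is a minimal sketch. It is an assumption, not the repository's implementation: the function name `sketchPreflightSCEP` and the exact error wording are hypothetical. Only the `(bool, string) error` shape, the `CERTCTL_SCEP_CHALLENGE_PASSWORD` mention, and the CWE-306 citation come from the test above.

```go
package main

import (
	"errors"
	"fmt"
)

// sketchPreflightSCEP is a hypothetical stand-in for the helper under test.
// It rejects exactly one configuration: SCEP enabled with an empty password.
func sketchPreflightSCEP(enabled bool, challengePassword string) error {
	if !enabled || challengePassword != "" {
		return nil
	}
	// The message wording is an assumption; the test only requires that it
	// names the environment variable and cites CWE-306.
	return errors.New("SCEP is enabled but CERTCTL_SCEP_CHALLENGE_PASSWORD is empty; " +
		"refusing to start (CWE-306: missing authentication for enrollment)")
}

func main() {
	fmt.Println(sketchPreflightSCEP(false, ""))       // disabled: no error
	fmt.Println(sketchPreflightSCEP(true, "hunter2")) // configured: no error
	fmt.Println(sketchPreflightSCEP(true, ""))        // rejected
}
```

Note that the check is deliberately about presence only: the table accepts a one-character password, so password strength is out of scope for this gate.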
@@ -0,0 +1,164 @@
|
||||
package main

import (
	"crypto/tls"
	"fmt"
	"log/slog"
	"os"
	"os/signal"
	"sync"
	"syscall"
)
|
||||
// certHolder stores the server's TLS certificate under a mutex so it can be
// swapped atomically by a SIGHUP handler without restarting the server. A
// *tls.Config that wires GetCertificate → (*certHolder).GetCertificate reads
// through the holder on every ClientHello, so a successful reload takes
// effect on the next new connection without dropping in-flight requests.
//
// Concurrency: GetCertificate is invoked from crypto/tls handshake goroutines
// on every new inbound connection; Reload is invoked from the SIGHUP watcher
// goroutine. sync.Mutex is sufficient: TLS handshakes are not an inner-loop
// hot path and the critical section is a single pointer read.
type certHolder struct {
	mu       sync.Mutex
	cert     *tls.Certificate
	certPath string
	keyPath  string
}
|
||||
// newCertHolder loads the initial cert+key pair from disk and returns a
// holder ready to serve handshakes. Returns a non-nil error if either file
// is missing, unreadable, or the pair does not round-trip through
// tls.LoadX509KeyPair (for example, the private key does not match the
// certificate's public key). The caller is expected to treat a non-nil error
// as a fail-loud startup gate and os.Exit(1); the HTTPS-everywhere milestone
// (§3 locked decisions) prohibits plaintext HTTP fallback.
func newCertHolder(certPath, keyPath string) (*certHolder, error) {
	cert, err := tls.LoadX509KeyPair(certPath, keyPath)
	if err != nil {
		return nil, fmt.Errorf("load TLS cert/key (cert=%q key=%q): %w", certPath, keyPath, err)
	}
	return &certHolder{
		cert:     &cert,
		certPath: certPath,
		keyPath:  keyPath,
	}, nil
}
|
||||
// GetCertificate is the tls.Config.GetCertificate hook. Returns the current
// cert under the holder's mutex. ClientHelloInfo is ignored because the
// control plane does not multiplex by SNI.
func (h *certHolder) GetCertificate(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.cert, nil
}
|
||||
// Reload re-reads the cert+key pair from disk and swaps the holder
// atomically on success. On failure the holder retains its previous cert
// and the error is propagated to the caller; the SIGHUP watcher logs it and
// keeps serving the previous cert rather than crashing on a bad reload.
// This is deliberately "fail-safe on reload, fail-loud on startup": an
// operator rotating certs wants a recoverable error, not a restart loop.
func (h *certHolder) Reload() error {
	cert, err := tls.LoadX509KeyPair(h.certPath, h.keyPath)
	if err != nil {
		return fmt.Errorf("reload TLS cert/key (cert=%q key=%q): %w", h.certPath, h.keyPath, err)
	}
	h.mu.Lock()
	h.cert = &cert
	h.mu.Unlock()
	return nil
}
|
||||
// watchSIGHUP installs a signal handler that calls Reload() on each SIGHUP.
|
||||
// The returned stop function closes the internal done channel and stops
|
||||
// signal delivery so the goroutine can exit cleanly during shutdown. Errors
|
||||
// from Reload are logged but do not terminate the watcher — the operator
|
||||
// can fix the files and send another SIGHUP.
|
||||
//
|
||||
// Defensive design note: this deliberately does NOT panic on Reload error
|
||||
// even though HTTPS is mission-critical. A rotation that writes half-files
|
||||
// (operator overwrites cert.pem then key.pem as two separate copies) would
|
||||
// otherwise crash the server mid-rotation. Logging + retaining the old
|
||||
// cert gives the operator a bounded window to fix and re-SIGHUP.
|
||||
func (h *certHolder) watchSIGHUP(logger *slog.Logger) (stop func()) {
|
||||
ch := make(chan os.Signal, 1)
|
||||
signal.Notify(ch, syscall.SIGHUP)
|
||||
done := make(chan struct{})
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-ch:
|
||||
if err := h.Reload(); err != nil {
|
||||
logger.Error("TLS cert reload failed; continuing with previous cert",
|
||||
"error", err,
|
||||
"cert_path", h.certPath,
|
||||
"key_path", h.keyPath)
|
||||
continue
|
||||
}
|
||||
logger.Info("TLS cert reloaded via SIGHUP",
|
||||
"cert_path", h.certPath,
|
||||
"key_path", h.keyPath)
|
||||
case <-done:
|
||||
signal.Stop(ch)
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
return func() { close(done) }
|
||||
}
|
||||
|
||||
// buildServerTLSConfig returns the TLS 1.3-only *tls.Config for the HTTPS
|
||||
// server. Pinned per HTTPS-everywhere milestone §2.1 + §3 locked decisions:
|
||||
//
|
||||
// - MinVersion: TLS 1.3 (no TLS 1.2 escape hatch). Go 1.25's crypto/tls
|
||||
// automatically rejects older versions.
|
||||
// - CurvePreferences: explicit [X25519, P-256]. Explicit ordering keeps
|
||||
// the handshake deterministic and documents the accepted curves.
|
||||
// - No CipherSuites field: TLS 1.3 cipher suites are not configurable in
|
||||
// Go's crypto/tls. The three suites it implements (AES-128-GCM-SHA256,
|
||||
// AES-256-GCM-SHA384, CHACHA20-POLY1305-SHA256) are always enabled, and
|
||||
// the CipherSuites field only affects TLS 1.0 through 1.2.
|
||||
// - GetCertificate: reads through the holder so SIGHUP rotations take
|
||||
// effect on the next new connection without a restart. Setting
|
||||
// tls.Config.Certificates directly would pin the first-loaded cert
|
||||
// and defeat SIGHUP reload.
|
||||
func buildServerTLSConfig(holder *certHolder) *tls.Config {
|
||||
return &tls.Config{
|
||||
MinVersion: tls.VersionTLS13,
|
||||
CurvePreferences: []tls.CurveID{tls.X25519, tls.CurveP256},
|
||||
GetCertificate: holder.GetCertificate,
|
||||
}
|
||||
}
|
||||
|
||||
// preflightServerTLS is the fail-loud startup gate for HTTPS. Returns a
|
||||
// non-nil error when the TLS configuration is missing or the cert+key pair
|
||||
// cannot be parsed, so the caller refuses to start the control plane
|
||||
// (HTTPS-everywhere §3 locked decisions: no plaintext HTTP fallback).
|
||||
//
|
||||
// Duplicates the emptiness + stat + parse checks in config.Validate() for
|
||||
// defense in depth, mirroring the pattern established by
|
||||
// preflightSCEPChallengePassword (which itself duplicates
|
||||
// config.Validate()'s SCEP check for CWE-306). Extracted into a separate
|
||||
// function so the gate is unit-testable without booting the full server.
|
||||
func preflightServerTLS(certPath, keyPath string) error {
|
||||
if certPath == "" {
|
||||
return fmt.Errorf("CERTCTL_SERVER_TLS_CERT_PATH is empty: HTTPS-only control plane refuses to start (see docs/tls.md)")
|
||||
}
|
||||
if keyPath == "" {
|
||||
return fmt.Errorf("CERTCTL_SERVER_TLS_KEY_PATH is empty: HTTPS-only control plane refuses to start (see docs/tls.md)")
|
||||
}
|
||||
if _, err := os.Stat(certPath); err != nil {
|
||||
return fmt.Errorf("TLS cert file %q unreadable: %w (see docs/tls.md)", certPath, err)
|
||||
}
|
||||
if _, err := os.Stat(keyPath); err != nil {
|
||||
return fmt.Errorf("TLS key file %q unreadable: %w (see docs/tls.md)", keyPath, err)
|
||||
}
|
||||
if _, err := tls.LoadX509KeyPair(certPath, keyPath); err != nil {
|
||||
return fmt.Errorf("TLS cert/key pair invalid (cert=%q key=%q): %w (see docs/tls.md)", certPath, keyPath, err)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
@@ -0,0 +1,418 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"crypto/ecdsa"
|
||||
"crypto/elliptic"
|
||||
"crypto/rand"
|
||||
"crypto/tls"
|
||||
"crypto/x509"
|
||||
"crypto/x509/pkix"
|
||||
"encoding/pem"
|
||||
"errors"
|
||||
"io"
|
||||
"log/slog"
|
||||
"math/big"
|
||||
"net"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sync"
|
||||
"syscall"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// generateTestCert writes a PEM-encoded self-signed leaf cert + ECDSA P-256
|
||||
// key pair to certPath/keyPath. The subject is derived from cn so tests can
|
||||
// tell reloaded certs apart from original certs by re-parsing the served
|
||||
// Certificate and comparing the CN.
|
||||
func generateTestCert(t *testing.T, certPath, keyPath, cn string) {
|
||||
t.Helper()
|
||||
priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
|
||||
if err != nil {
|
||||
t.Fatalf("ecdsa.GenerateKey: %v", err)
|
||||
}
|
||||
tmpl := &x509.Certificate{
|
||||
SerialNumber: big.NewInt(time.Now().UnixNano()),
|
||||
Subject: pkix.Name{CommonName: cn},
|
||||
NotBefore: time.Now().Add(-1 * time.Hour),
|
||||
NotAfter: time.Now().Add(24 * time.Hour),
|
||||
KeyUsage: x509.KeyUsageDigitalSignature,
|
||||
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
|
||||
DNSNames: []string{"localhost"},
|
||||
IPAddresses: []net.IP{net.ParseIP("127.0.0.1"), net.ParseIP("::1")},
|
||||
}
|
||||
der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &priv.PublicKey, priv)
|
||||
if err != nil {
|
||||
t.Fatalf("x509.CreateCertificate: %v", err)
|
||||
}
|
||||
certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
|
||||
keyDER, err := x509.MarshalECPrivateKey(priv)
|
||||
if err != nil {
|
||||
t.Fatalf("MarshalECPrivateKey: %v", err)
|
||||
}
|
||||
keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
|
||||
if err := os.WriteFile(certPath, certPEM, 0o600); err != nil {
|
||||
t.Fatalf("write cert: %v", err)
|
||||
}
|
||||
if err := os.WriteFile(keyPath, keyPEM, 0o600); err != nil {
|
||||
t.Fatalf("write key: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
// readCertCN returns the CommonName from the leaf cert currently held by the
|
||||
// holder, by exercising the same GetCertificate path the tls handshake would
|
||||
// take. Lets tests assert which generation of the cert is being served.
|
||||
func readCertCN(t *testing.T, h *certHolder) string {
|
||||
t.Helper()
|
||||
c, err := h.GetCertificate(&tls.ClientHelloInfo{})
|
||||
if err != nil {
|
||||
t.Fatalf("GetCertificate: %v", err)
|
||||
}
|
||||
leaf, err := x509.ParseCertificate(c.Certificate[0])
|
||||
if err != nil {
|
||||
t.Fatalf("ParseCertificate: %v", err)
|
||||
}
|
||||
return leaf.Subject.CommonName
|
||||
}
|
||||
|
||||
func silentLogger() *slog.Logger {
|
||||
return slog.New(slog.NewTextHandler(io.Discard, &slog.HandlerOptions{Level: slog.LevelError}))
|
||||
}
|
||||
|
||||
func TestNewCertHolder_ValidPair_LoadsCert(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-initial")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
if got := readCertCN(t, h); got != "cn-initial" {
|
||||
t.Fatalf("CN mismatch: got %q want %q", got, "cn-initial")
|
||||
}
|
||||
}
|
||||
|
||||
func TestNewCertHolder_MissingFile_Fails(t *testing.T) {
|
||||
_, err := newCertHolder("/nonexistent/cert.pem", "/nonexistent/key.pem")
|
||||
if err == nil {
|
||||
t.Fatal("expected error for missing files, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestNewCertHolder_MalformedCert_Fails(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "bad.crt")
|
||||
keyPath := filepath.Join(dir, "bad.key")
|
||||
if err := os.WriteFile(certPath, []byte("not a pem cert"), 0o600); err != nil {
|
||||
t.Fatalf("write cert: %v", err)
|
||||
}
|
||||
if err := os.WriteFile(keyPath, []byte("not a pem key"), 0o600); err != nil {
|
||||
t.Fatalf("write key: %v", err)
|
||||
}
|
||||
_, err := newCertHolder(certPath, keyPath)
|
||||
if err == nil {
|
||||
t.Fatal("expected error for malformed PEM, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestCertHolder_Reload_SwapsCert(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-v1")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
if got := readCertCN(t, h); got != "cn-v1" {
|
||||
t.Fatalf("initial CN: got %q want cn-v1", got)
|
||||
}
|
||||
|
||||
// Rotate on disk and reload.
|
||||
generateTestCert(t, certPath, keyPath, "cn-v2")
|
||||
if err := h.Reload(); err != nil {
|
||||
t.Fatalf("Reload: %v", err)
|
||||
}
|
||||
if got := readCertCN(t, h); got != "cn-v2" {
|
||||
t.Fatalf("post-reload CN: got %q want cn-v2", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestCertHolder_Reload_FailureRetainsPreviousCert(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-v1")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
|
||||
// Corrupt the cert file and attempt reload.
|
||||
if err := os.WriteFile(certPath, []byte("garbage"), 0o600); err != nil {
|
||||
t.Fatalf("corrupt cert: %v", err)
|
||||
}
|
||||
if err := h.Reload(); err == nil {
|
||||
t.Fatal("expected Reload error for corrupt file, got nil")
|
||||
}
|
||||
// Holder should still serve the v1 cert.
|
||||
if got := readCertCN(t, h); got != "cn-v1" {
|
||||
t.Fatalf("post-failed-reload CN: got %q want cn-v1 (reload must not clobber on failure)", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestCertHolder_GetCertificate_Concurrent(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-concurrent")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
|
||||
// 64 readers + 1 rotator; readers run for ~300ms. Race detector catches any unsynchronized
|
||||
// swap of h.cert. Rotator writes fresh files + Reload, readers call
|
||||
// GetCertificate in a tight loop.
|
||||
var wg sync.WaitGroup
|
||||
done := make(chan struct{})
|
||||
const readers = 64
|
||||
for i := 0; i < readers; i++ {
|
||||
wg.Add(1)
|
||||
go func() {
|
||||
defer wg.Done()
|
||||
for {
|
||||
select {
|
||||
case <-done:
|
||||
return
|
||||
default:
|
||||
if _, err := h.GetCertificate(&tls.ClientHelloInfo{}); err != nil {
|
||||
t.Errorf("GetCertificate: %v", err)
|
||||
return
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
wg.Add(1)
|
||||
go func() {
|
||||
defer wg.Done()
|
||||
for i := 0; i < 20; i++ {
|
||||
generateTestCert(t, certPath, keyPath, "cn-concurrent")
|
||||
_ = h.Reload()
|
||||
time.Sleep(10 * time.Millisecond)
|
||||
}
|
||||
}()
|
||||
time.Sleep(300 * time.Millisecond)
|
||||
close(done)
|
||||
wg.Wait()
|
||||
}
|
||||
|
||||
func TestCertHolder_WatchSIGHUP_ReloadsOnSignal(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-before-sighup")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
stop := h.watchSIGHUP(silentLogger())
|
||||
defer stop()
|
||||
|
||||
// Rotate on disk, then fire SIGHUP to our own process and poll for the swap.
|
||||
generateTestCert(t, certPath, keyPath, "cn-after-sighup")
|
||||
if err := syscall.Kill(syscall.Getpid(), syscall.SIGHUP); err != nil {
|
||||
t.Fatalf("SIGHUP: %v", err)
|
||||
}
|
||||
deadline := time.Now().Add(2 * time.Second)
|
||||
for time.Now().Before(deadline) {
|
||||
if readCertCN(t, h) == "cn-after-sighup" {
|
||||
return
|
||||
}
|
||||
time.Sleep(10 * time.Millisecond)
|
||||
}
|
||||
t.Fatalf("watcher did not reload cert within 2s (CN still %q)", readCertCN(t, h))
|
||||
}
|
||||
|
||||
func TestCertHolder_WatchSIGHUP_StopExits(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-stop")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
stop := h.watchSIGHUP(silentLogger())
|
||||
|
||||
// stop() only closes the done channel; the watcher goroutine exits
|
||||
// asynchronously, so give it a moment before checking the holder.
|
||||
stop()
|
||||
time.Sleep(50 * time.Millisecond) // let goroutine exit
|
||||
|
||||
// After stop, the watcher has called signal.Stop, so its channel no longer
|
||||
// receives SIGHUP. stop() is not idempotent (a second call would close the
|
||||
// done channel twice and panic), so it is called exactly once here. The
|
||||
// check below only verifies that the held cert did not change.
|
||||
if got := readCertCN(t, h); got != "cn-stop" {
|
||||
t.Fatalf("unexpected cert rotation after stop: got %q want cn-stop", got)
|
||||
}
|
||||
}
|
||||
|
||||
func TestBuildServerTLSConfig_IsTLS13Only(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-cfg")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
cfg := buildServerTLSConfig(h)
|
||||
if cfg.MinVersion != tls.VersionTLS13 {
|
||||
t.Fatalf("MinVersion: got %#x want %#x (TLS 1.3)", cfg.MinVersion, tls.VersionTLS13)
|
||||
}
|
||||
wantCurves := []tls.CurveID{tls.X25519, tls.CurveP256}
|
||||
if len(cfg.CurvePreferences) != len(wantCurves) {
|
||||
t.Fatalf("CurvePreferences length: got %d want %d", len(cfg.CurvePreferences), len(wantCurves))
|
||||
}
|
||||
for i, c := range cfg.CurvePreferences {
|
||||
if c != wantCurves[i] {
|
||||
t.Fatalf("CurvePreferences[%d]: got %v want %v", i, c, wantCurves[i])
|
||||
}
|
||||
}
|
||||
if cfg.GetCertificate == nil {
|
||||
t.Fatal("GetCertificate: nil (holder not wired; SIGHUP reload would be broken)")
|
||||
}
|
||||
if len(cfg.Certificates) != 0 {
|
||||
t.Fatalf("Certificates: got %d want 0 (static cert would pin the first load and defeat reload)", len(cfg.Certificates))
|
||||
}
|
||||
}
|
||||
|
||||
func TestBuildServerTLSConfig_Handshake_TLS12Rejected(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-handshake")
|
||||
|
||||
h, err := newCertHolder(certPath, keyPath)
|
||||
if err != nil {
|
||||
t.Fatalf("newCertHolder: %v", err)
|
||||
}
|
||||
serverCfg := buildServerTLSConfig(h)
|
||||
|
||||
ln, err := tls.Listen("tcp", "127.0.0.1:0", serverCfg)
|
||||
if err != nil {
|
||||
t.Fatalf("tls.Listen: %v", err)
|
||||
}
|
||||
defer ln.Close()
|
||||
|
||||
// Server loop: accept and immediately close (we only care about the
|
||||
// handshake outcome).
|
||||
go func() {
|
||||
for {
|
||||
conn, err := ln.Accept()
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
// Force handshake so the server-side error surfaces.
|
||||
_ = conn.(*tls.Conn).Handshake()
|
||||
conn.Close()
|
||||
}
|
||||
}()
|
||||
|
||||
// TLS 1.3 client — should succeed.
|
||||
clientOK := &tls.Config{
|
||||
MinVersion: tls.VersionTLS13,
|
||||
MaxVersion: tls.VersionTLS13,
|
||||
InsecureSkipVerify: true,
|
||||
}
|
||||
c, err := tls.Dial("tcp", ln.Addr().String(), clientOK)
|
||||
if err != nil {
|
||||
t.Fatalf("TLS 1.3 dial failed (expected success): %v", err)
|
||||
}
|
||||
if c.ConnectionState().Version != tls.VersionTLS13 {
|
||||
t.Fatalf("negotiated version: got %#x want TLS 1.3 (%#x)", c.ConnectionState().Version, tls.VersionTLS13)
|
||||
}
|
||||
c.Close()
|
||||
|
||||
// TLS 1.2 client — must be rejected at handshake.
|
||||
clientOld := &tls.Config{
|
||||
MinVersion: tls.VersionTLS12,
|
||||
MaxVersion: tls.VersionTLS12,
|
||||
InsecureSkipVerify: true,
|
||||
}
|
||||
if _, err := tls.Dial("tcp", ln.Addr().String(), clientOld); err == nil {
|
||||
t.Fatal("TLS 1.2 dial succeeded; HTTPS-everywhere requires server to refuse TLS 1.2")
|
||||
}
|
||||
}
|
||||
|
||||
func TestPreflightServerTLS_MissingCertPath(t *testing.T) {
|
||||
err := preflightServerTLS("", "/any/key.pem")
|
||||
if err == nil {
|
||||
t.Fatal("expected error for empty cert path, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestPreflightServerTLS_MissingKeyPath(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-preflight")
|
||||
err := preflightServerTLS(certPath, "")
|
||||
if err == nil {
|
||||
t.Fatal("expected error for empty key path, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestPreflightServerTLS_CertFileNotReadable(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
if err := os.WriteFile(keyPath, []byte("k"), 0o600); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err := preflightServerTLS(filepath.Join(dir, "nope.crt"), keyPath)
|
||||
if err == nil {
|
||||
t.Fatal("expected error for unreadable cert path, got nil")
|
||||
}
|
||||
if !errors.Is(err, os.ErrNotExist) {
|
||||
t.Fatalf("expected os.ErrNotExist wrapped in error chain, got: %v", err)
|
||||
}
|
||||
}
|
||||
|
||||
func TestPreflightServerTLS_InvalidKeyPair(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
// Pair of valid cert + garbage key — files are readable but the pair
|
||||
// doesn't round-trip tls.LoadX509KeyPair.
|
||||
generateTestCert(t, certPath, keyPath, "cn-bad-pair")
|
||||
if err := os.WriteFile(keyPath, []byte("-----BEGIN EC PRIVATE KEY-----\nBAD\n-----END EC PRIVATE KEY-----\n"), 0o600); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
err := preflightServerTLS(certPath, keyPath)
|
||||
if err == nil {
|
||||
t.Fatal("expected error for invalid key pair, got nil")
|
||||
}
|
||||
}
|
||||
|
||||
func TestPreflightServerTLS_ValidPair_NoError(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
certPath := filepath.Join(dir, "tls.crt")
|
||||
keyPath := filepath.Join(dir, "tls.key")
|
||||
generateTestCert(t, certPath, keyPath, "cn-ok")
|
||||
if err := preflightServerTLS(certPath, keyPath); err != nil {
|
||||
t.Fatalf("unexpected error for valid pair: %v", err)
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,525 @@
|
||||
# certctl Docker Compose Environments
|
||||
|
||||
This guide walks through every Docker Compose file in the `deploy/` directory. Each section explains what the environment does, when to use it, every service and environment variable, and the commands to run it. If you've never used Docker before, start with the [Prerequisites](#prerequisites) section. If you're experienced, skip to the environment you need.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Prerequisites](#prerequisites)
|
||||
2. [How Docker Compose Works (30-Second Version)](#how-docker-compose-works)
|
||||
3. [Base Environment (docker-compose.yml)](#base-environment)
|
||||
4. [Demo Overlay (docker-compose.demo.yml)](#demo-overlay)
|
||||
5. [Development Overlay (docker-compose.dev.yml)](#development-overlay)
|
||||
6. [Test Environment (docker-compose.test.yml)](#test-environment)
|
||||
7. [Environment Variable Reference](#environment-variable-reference)
|
||||
8. [Common Operations](#common-operations)
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
You need two things: **Docker** (the container runtime) and **Docker Compose** (an orchestration tool that ships with Docker Desktop).
|
||||
|
||||
On macOS:
|
||||
```bash
|
||||
brew install --cask docker
|
||||
```
|
||||
|
||||
On Linux (Ubuntu/Debian):
|
||||
```bash
|
||||
curl -fsSL https://get.docker.com | sh
|
||||
sudo usermod -aG docker $USER
|
||||
# Log out and back in for group changes to take effect
|
||||
```
|
||||
|
||||
Verify the install:
|
||||
```bash
|
||||
docker --version # Docker Engine 24+ recommended
|
||||
docker compose version # Docker Compose v2+ required (note: no hyphen)
|
||||
```
|
||||
|
||||
**What Docker actually does:** Docker packages an application and all its dependencies (OS libraries, runtimes, config files) into an isolated unit called a container. When you run `docker compose up`, Docker reads a YAML file that describes multiple containers, creates a private network between them, and starts everything in the right order. Each container sees only its own filesystem and network unless you explicitly share volumes or ports.
|
||||
|
||||
**Why this matters for certctl:** Instead of installing PostgreSQL, building Go binaries, configuring the agent, and wiring everything together by hand, one command gives you the complete platform. Each compose file targets a different use case.
|
||||
|
||||
---
|
||||
|
||||
## How Docker Compose Works
|
||||
|
||||
A compose file defines **services** (containers), **networks** (how they talk to each other), and **volumes** (persistent storage). The key concepts:
|
||||
|
||||
**Services** are named containers. `certctl-server` is the API and web dashboard. `postgres` is the database. `certctl-agent` polls the server for certificate work.
|
||||
|
||||
**Depends_on + healthchecks** control startup order. The server won't start until PostgreSQL reports healthy. The agent won't start until the server reports healthy. This prevents connection errors during boot.
|
||||
|
||||
**Volumes** persist data across restarts. `postgres_data` keeps your database between `docker compose down` and `docker compose up`. Adding `-v` to `down` deletes volumes for a clean slate.
|
||||
|
||||
**Overlay files** let you layer changes. Running `docker compose -f base.yml -f overlay.yml up` merges both files. The overlay can add services, change environment variables, or mount extra volumes without editing the base.
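
To make the overlay idea concrete, here is a minimal Go sketch of the merge for a single service. It is a simplification, not Compose's actual implementation: the `service` type, `mergeService`, and the sample values are hypothetical, and it assumes only two rules (overlay environment values replace base values, overlay volumes are appended).

```go
package main

import "fmt"

// service is a hypothetical, heavily simplified stand-in for one Compose service.
type service struct {
	Environment map[string]string
	Volumes     []string
}

// mergeService layers overlay on top of base: environment values from the
// overlay win on key collisions, and overlay volumes are appended to the base list.
func mergeService(base, overlay service) service {
	out := service{Environment: map[string]string{}}
	for k, v := range base.Environment {
		out.Environment[k] = v
	}
	for k, v := range overlay.Environment {
		out.Environment[k] = v
	}
	out.Volumes = append(out.Volumes, base.Volumes...)
	out.Volumes = append(out.Volumes, overlay.Volumes...)
	return out
}

func main() {
	base := service{
		Environment: map[string]string{"CERTCTL_LOG_LEVEL": "info"},
		Volumes:     []string{"base-volume:/data"}, // hypothetical mount
	}
	overlay := service{
		Environment: map[string]string{"CERTCTL_LOG_LEVEL": "debug"},
		Volumes:     []string{"extra-volume:/extra"}, // hypothetical mount
	}
	merged := mergeService(base, overlay)
	fmt.Println(merged.Environment["CERTCTL_LOG_LEVEL"], len(merged.Volumes))
}
```

The base file stays untouched; everything environment-specific lives in the overlay.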
|
||||
|
||||
**Port mapping** (`"8443:8443"`) maps host port (left) to container port (right). After startup, `https://localhost:8443` on your machine reaches the certctl server inside its container (HTTPS-only as of v2.2; the `certctl-tls-init` init container bootstraps a self-signed cert into `deploy/test/certs/`).
|
||||
|
||||
---
|
||||
|
||||
## Base Environment
|
||||
|
||||
**File:** `docker-compose.yml`
|
||||
**When to use:** Production deployments, first-time setup, or any time you want a clean dashboard with the onboarding wizard.
|
||||
|
||||
### What it runs
|
||||
|
||||
Three services on a private bridge network:
|
||||
|
||||
| Service | Image | Purpose | Ports |
|
||||
|---------|-------|---------|-------|
|
||||
| `postgres` | `postgres:16-alpine` | Database. Stores certificates, agents, jobs, audit trail, policies, discovery results. | 5432 |
|
||||
| `certctl-server` | Built from `Dockerfile` | API server + web dashboard + background scheduler. | 8443 |
|
||||
| `certctl-agent` | Built from `Dockerfile.agent` | Polls server for work, generates keys, deploys certificates, discovers existing certs. | none |
|
||||
|
||||
### Starting it
|
||||
|
||||
```bash
|
||||
git clone https://github.com/shankar0123/certctl.git
|
||||
cd certctl
|
||||
docker compose -f deploy/docker-compose.yml up -d --build
|
||||
```
|
||||
|
||||
`--build` compiles the Go server and agent from source, including the React frontend. Without it, Docker may reuse a stale image from a previous build.
|
||||
|
||||
`-d` runs in detached mode (background). Omit it to see logs in your terminal.
|
||||
|
||||
Wait about 30 seconds, then verify:
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.yml ps
|
||||
# All three services should show "Up (healthy)"
|
||||
|
||||
curl --cacert ./deploy/test/certs/ca.crt https://localhost:8443/health
|
||||
# {"status":"healthy"}
|
||||
```
|
||||
|
||||
The control plane is HTTPS-only as of v2.2. The `certctl-tls-init` init container bootstraps a self-signed cert into `deploy/test/certs/` on first boot; pin it with `--cacert` (as above) or pass `-k` for one-off smoke tests (never in production).
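
If you call the API from Go rather than `curl`, the equivalent of `--cacert` is a client whose trust store contains only the bootstrap CA. The sketch below is one possible way to do that; `newPinnedClient` is a hypothetical helper, and the 5-second timeout is an arbitrary choice.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// newPinnedClient returns an HTTP client that trusts only the certificates
// in caPEM and refuses anything older than TLS 1.3.
func newPinnedClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no PEM certificates found in CA bundle")
	}
	return &http.Client{
		Timeout: 5 * time.Second, // arbitrary value for the sketch
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				RootCAs:    pool,
				MinVersion: tls.VersionTLS13,
			},
		},
	}, nil
}

func main() {
	// Path and URL are the ones used in the curl example above.
	caPEM, err := os.ReadFile("deploy/test/certs/ca.crt")
	if err != nil {
		fmt.Println("read CA bundle:", err)
		return
	}
	client, err := newPinnedClient(caPEM)
	if err != nil {
		fmt.Println(err)
		return
	}
	resp, err := client.Get("https://localhost:8443/health")
	if err != nil {
		fmt.Println("health check:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Unlike `-k`, this keeps certificate verification on, so it is also a reasonable starting point for non-demo use.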
|
||||
|
||||
Open **https://localhost:8443** in your browser. You'll see the onboarding wizard guiding you through: connecting a CA, deploying an agent, and adding your first certificate. Your browser will flag the self-signed cert as untrusted — accept the warning for local evaluation, or import `deploy/test/certs/ca.crt` into your OS trust store to make the warning go away.
|
||||
|
||||
### Service-by-service walkthrough
|
||||
|
||||
#### PostgreSQL
|
||||
|
||||
```yaml
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
environment:
|
||||
POSTGRES_DB: certctl
|
||||
POSTGRES_USER: certctl
|
||||
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-certctl}
|
||||
```
|
||||
|
||||
Alpine-based PostgreSQL 16. The `${POSTGRES_PASSWORD:-certctl}` syntax means: use the `POSTGRES_PASSWORD` environment variable from your shell if it is set and non-empty, otherwise default to `certctl`. For production, create a `.env` file:
|
||||
|
||||
```bash
|
||||
echo 'POSTGRES_PASSWORD=your-secure-password-here' > deploy/.env
|
||||
```
|
||||
|
||||
The `volumes` section mounts 10 migration files into PostgreSQL's init directory (`/docker-entrypoint-initdb.d/`). PostgreSQL runs these SQL files in alphabetical order on first boot only. They create the schema (tables, indexes, constraints) and seed the base data (default issuer, default policy). If the `postgres_data` volume already exists with an initialized database, these scripts are skipped entirely.
|
||||
|
||||
**Expert note:** The numbered prefix pattern (`001_`, `002_`, ..., `020_`) ensures deterministic execution order. All migrations use `IF NOT EXISTS` and `ON CONFLICT DO NOTHING` for idempotency, so re-running them against an existing database is safe.
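
Why the zero-padded prefix matters can be shown with a plain string sort. This sketch approximates the init directory's ordering with a byte-wise sort (an assumption; the real ordering comes from the image's entrypoint script), and the file names are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// initOrder returns a sorted copy of names, approximating the order in which
// init scripts would run.
func initOrder(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	// Zero-padded prefixes sort in numeric order.
	fmt.Println(initOrder([]string{"010_b.sql", "002_a.sql", "001_schema.sql"}))
	// Unpadded prefixes do not: "10_" sorts before "1_" and "2_".
	fmt.Println(initOrder([]string{"10_b.sql", "2_a.sql", "1_schema.sql"}))
}
```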
|
||||
|
||||
**Stateful volume: first-boot password binding (U-1).** The same "first boot only" semantics that govern migration scripts also govern `POSTGRES_PASSWORD`. The official `postgres` image runs `initdb` exactly once, when `/var/lib/postgresql/data` is empty, and that pass is the only time `POSTGRES_PASSWORD` is written into `pg_authid`. On every subsequent boot, the postgres container ignores the env var and authenticates against the password baked into the data directory on the original `up`. Editing `POSTGRES_PASSWORD` in `.env` after a successful first boot therefore only updates the **certctl-server** container's `CERTCTL_DATABASE_URL`: postgres still expects the previous password, and the server fails to ping with `pq: password authentication failed for user "certctl"` (SQLSTATE 28P01). The certctl-server container surfaces this case explicitly. When SQLSTATE 28P01 fires at startup, the wrap text in `internal/repository/postgres/db.go::wrapPingError` points operators at two remediation paths. The destructive path tears down the volume: `docker compose -f deploy/docker-compose.yml down -v`, then `docker compose -f deploy/docker-compose.yml up -d --build`. The non-destructive path rotates the password in place: `docker compose -f deploy/docker-compose.yml exec postgres psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new>';"`, followed by a server restart with the matching `POSTGRES_PASSWORD`. Use the destructive path for the demo or first-time setup; use the non-destructive path on any environment that holds data you want to keep.
|
||||
|
||||
#### certctl Server
|
||||
|
||||
```yaml
|
||||
certctl-server:
|
||||
depends_on:
|
||||
postgres:
|
||||
condition: service_healthy
|
||||
environment:
|
||||
CERTCTL_DATABASE_URL: postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/certctl?sslmode=disable
|
||||
CERTCTL_SERVER_HOST: 0.0.0.0
|
||||
CERTCTL_SERVER_PORT: 8443
|
||||
CERTCTL_LOG_LEVEL: info
|
||||
CERTCTL_AUTH_TYPE: none
|
||||
CERTCTL_KEYGEN_MODE: server
|
||||
CERTCTL_NETWORK_SCAN_ENABLED: "true"
|
||||
CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key}
|
||||
```
|
||||
|
||||
The server is the control plane. It serves the REST API and the React dashboard, runs 7 background scheduler loops (renewal, job processing, health checks, notifications, short-lived cert expiry, network scanning, digest emails), and manages the issuer/target registry.
|
||||
|
||||
Key environment variables explained:
|
||||
|
||||
- `CERTCTL_DATABASE_URL` references the `postgres` service by hostname. Docker's internal DNS resolves `postgres` to the container's IP on the bridge network. `sslmode=disable` is appropriate because traffic stays on the private Docker network.
|
||||
- `CERTCTL_AUTH_TYPE: none` disables API key authentication so you can explore immediately. For production, set `api-key` and configure `CERTCTL_AUTH_SECRET`.
|
||||
- `CERTCTL_KEYGEN_MODE: server` means the server generates private keys. This is convenient for demos but insecure for production. In production, set `agent` so keys are generated on agent machines and never transmitted.
|
||||
- `CERTCTL_CONFIG_ENCRYPTION_KEY` enables AES-256-GCM encryption for issuer and target configurations stored in the database (credentials, API keys). Without this, the dynamic configuration GUI (adding issuers/targets from the dashboard) won't encrypt sensitive fields. For production, generate a strong random key.
|
||||
- `CERTCTL_NETWORK_SCAN_ENABLED` activates the scheduler loop that probes TLS endpoints on your network to discover certificates you might not be managing.
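
For readers unfamiliar with AES-256-GCM, here is a minimal round-trip sketch in Go. It is not certctl's implementation: the function names are hypothetical, and deriving the 32-byte key with a single SHA-256 over the configured string is a placeholder assumption (a real deployment would use a random key or a proper KDF).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// deriveKey maps the configured string to a 32-byte AES-256 key.
// Assumption: SHA-256 is a stand-in here, not certctl's actual derivation.
func deriveKey(secret string) [32]byte { return sha256.Sum256([]byte(secret)) }

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key [32]byte, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the data was tampered with or the key is wrong.
func open(key [32]byte, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key[:])
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := deriveKey("change-me-32-char-encryption-key")
	sealed, err := seal(key, []byte("issuer-api-token"))
	if err != nil {
		fmt.Println("seal:", err)
		return
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM authenticates the ciphertext, decrypting with a different key returns an error instead of garbage, which is why a lost or changed key makes stored configs unreadable.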
|
||||
|
||||
**Expert note:** The healthcheck hits `GET /health` every 10 seconds with 5 retries. The `depends_on: condition: service_healthy` on the agent means Docker holds agent startup until this check passes. Resource limits (`cpus: '1.0'`, `memory: 512M`) prevent the server from consuming unbounded resources in shared environments.
|
||||
|
||||
#### certctl Agent
|
||||
|
||||
```yaml
|
||||
certctl-agent:
|
||||
depends_on:
|
||||
certctl-server:
|
||||
condition: service_healthy
|
||||
environment:
|
||||
CERTCTL_SERVER_URL: https://certctl-server:8443
|
||||
CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
|
||||
CERTCTL_AGENT_NAME: docker-agent
|
||||
CERTCTL_LOG_LEVEL: info
|
||||
CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys
|
||||
volumes:
|
||||
- agent_keys:/var/lib/certctl/keys
|
||||
```
|
||||
|
||||
The agent is a lightweight Go binary that polls the server for pending work (certificate deployments, CSR generation requests), executes that work locally, and reports results back. It also scans configured directories for existing certificates (filesystem discovery).
|
||||
|
||||
- `CERTCTL_SERVER_URL` uses the Docker internal hostname `certctl-server`. This resolves inside the Docker network only.
|
||||
- `CERTCTL_DISCOVERY_DIRS` tells the agent which directories to scan for existing certificates. The agent walks these directories recursively, parses PEM and DER files, and reports findings to the server for triage.
|
||||
- The `agent_keys` volume persists private keys generated by the agent across container restarts. Without this volume, keys would be lost when the container stops.
|
||||
|
||||
**Expert note:** The agent's healthcheck uses `pgrep` because the agent doesn't expose an HTTP endpoint. The `restart: unless-stopped` policy means Docker automatically restarts the agent on crashes but respects manual `docker compose stop` commands.
|
||||
|
||||
### Stopping and cleaning up
|
||||
|
||||
```bash
|
||||
# Stop containers but keep data
|
||||
docker compose -f deploy/docker-compose.yml down
|
||||
|
||||
# Stop and delete all data (database, keys, volumes)
|
||||
docker compose -f deploy/docker-compose.yml down -v
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Demo Overlay
|
||||
|
||||
**File:** `docker-compose.demo.yml`
|
||||
**When to use:** Demos, screenshots, stakeholder presentations, or any time you want a populated dashboard on first boot.
|
||||
|
||||
### What it adds
|
||||
|
||||
One line: mounts `seed_demo.sql` into PostgreSQL's init directory. This 667-line SQL file inserts 180 days of simulated operational history: teams, owners, certificates across multiple issuers, agents on different platforms, jobs with realistic timestamps, discovery scan results, audit events, policies, and profiles.
|
||||
|
||||
### Starting it
|
||||
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
|
||||
```
|
||||
|
||||
The `-f` flags are ordered: base first, overlay second. Docker Compose merges them, with later files overriding or extending earlier ones. The demo overlay adds the `seed_demo.sql` volume mount to the `postgres` service defined in the base file.
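
To inspect what the merge actually produces, `docker compose config` prints the resolved configuration for a given set of `-f` files. A minimal sketch, where the `compose_merged` wrapper name is hypothetical:

```bash
#!/usr/bin/env bash
# Print the merged Compose configuration for a base file plus any overlays.
# Files are passed in order; later files override or extend earlier ones.
compose_merged() {
  local args=() f
  for f in "$@"; do
    args+=(-f "$f")
  done
  docker compose "${args[@]}" config
}
```

For example, `compose_merged deploy/docker-compose.yml deploy/docker-compose.demo.yml` shows the `postgres` service exactly as Compose will run it, which is a quick way to confirm an overlay changed what you expected.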
|
||||
|
||||
### What you see
|
||||
|
||||
The dashboard shows pre-populated charts: expiration heatmap with upcoming renewals, status distribution across Active/Expiring/Expired/Failed states, 30-day job trends, and issuance rates. The sidebar pages (Certificates, Agents, Discovery, Jobs, etc.) all have data to explore.
|
||||
|
||||
### Resetting demo data
|
||||
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml down -v
|
||||
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
|
||||
```
|
||||
|
||||
The `down -v` deletes the `postgres_data` volume. On next boot, PostgreSQL re-runs all init scripts including the demo seed, giving you a clean starting point.
|
||||
|
||||
**Expert note:** The demo overlay is a pure data layer, not a configuration change. The server, agent, and their environment variables remain identical to the base. This means any behavior you see in the demo is exactly what the base environment produces once you populate data through normal operations.
|
||||
|
||||
---
|
||||
|
||||
## Development Overlay
|
||||
|
||||
**File:** `docker-compose.dev.yml`
|
||||
**When to use:** When you're contributing to certctl and need debug logging, database inspection, or a debugger attached to the server process.
|
||||
|
||||
### What it adds
|
||||
|
||||
| Addition | Purpose |
|
||||
|----------|---------|
|
||||
| Debug-level logging on server and agent | See every HTTP request, scheduler tick, and connector operation |
|
||||
| PgAdmin on port 5050 | Visual database browser for inspecting tables, running queries |
|
||||
| Delve debugger port 40000 | Attach a Go debugger to the running server process |
|
||||
|
||||
### Starting it
|
||||
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.dev.yml up --build
|
||||
```
|
||||
|
||||
Omit `-d` during development so you see logs streaming in your terminal.
|
||||
|
||||
### Using PgAdmin
|
||||
|
||||
Open **http://localhost:5050** in your browser. PgAdmin is pre-configured in desktop mode (no login required). To connect to the certctl database:
|
||||
|
||||
1. Right-click "Servers" in the left panel, choose "Register" > "Server"
|
||||
2. Name: `certctl`
|
||||
3. Connection tab: Host = `postgres`, Port = `5432`, Username = `certctl`, Password = `certctl` (or whatever you set in `.env`)
|
||||
|
||||
From there you can browse all 19 tables, inspect certificate records, view audit events, check the scheduler's job queue, and run arbitrary SQL.
|
||||
|
||||
### Using the Delve debugger
|
||||
|
||||
Port 40000 is exposed for remote debugging. To use it, you'd need to modify the Dockerfile to build with debug symbols and start the server under Delve:
|
||||
|
||||
```dockerfile
|
||||
# In Dockerfile, replace the CMD with:
|
||||
CMD ["dlv", "--listen=:40000", "--headless=true", "--api-version=2", "exec", "/app/server"]
|
||||
```
|
||||
|
||||
Then attach from your IDE (VS Code, GoLand) using remote debug configuration pointing to `localhost:40000`.
|
||||
|
||||
### Hot reload
|
||||
|
||||
The dev overlay includes commented-out volume mounts for source code directories. Uncomment them and install [air](https://github.com/air-verse/air) (formerly `cosmtrek/air`) to get automatic recompilation on file changes:
|
||||
|
||||
```bash
|
||||
go install github.com/air-verse/air@latest
|
||||
```
|
||||
|
||||
**Expert note:** The `build: context: ..` in the dev overlay overrides the base service's image reference, forcing a local build from the repository root. This means changes to your Go source code are compiled fresh on each `docker compose up --build`.
|
||||
|
||||
---
|
||||
|
||||
## Test Environment
|
||||
|
||||
**File:** `docker-compose.test.yml`
|
||||
**When to use:** Integration testing against real CA backends. This is a standalone environment (not an overlay) with 7 containers on a static-IP subnet.
|
||||
|
||||
### What it runs
|
||||
|
||||
| Service | IP | Purpose |
|
||||
|---------|----|---------|
|
||||
| `postgres` | 10.30.50.2 | Database (clean, no demo data) |
|
||||
| `pebble-challtestsrv` | 10.30.50.3 | DNS/HTTP challenge test server for Pebble |
|
||||
| `pebble` | 10.30.50.4 | ACME test server (simulates Let's Encrypt) |
|
||||
| `step-ca` | 10.30.50.5 | Private CA (Smallstep, JWK provisioner) |
|
||||
| `certctl-server` | 10.30.50.6 | Control plane with all issuers configured |
|
||||
| `nginx` | 10.30.50.7 | TLS target server for deployment testing |
|
||||
| `certctl-agent` | 10.30.50.8 | Agent with NGINX volume + discovery |
|
||||
|
||||
### Why static IPs?
|
||||
|
||||
Pebble (the ACME test server) validates HTTP-01 challenges by connecting to the challenge URL. It resolves domain names via `pebble-challtestsrv`, which is configured to return `10.30.50.6` (the certctl server) for all lookups. Without static IPs, container IPs would be assigned randomly on each boot, breaking the challenge validation chain.
|
||||
|
||||
The `/24` subnet (10.30.50.0/24) provides 254 usable addresses, far more than needed but standard practice for test networks.
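
The 254 figure follows from the prefix length: a `/24` leaves 8 host bits, giving 2^8 = 256 addresses, minus the network and broadcast addresses. A small sketch of that arithmetic (the `usable_hosts` function name is illustrative):

```bash
#!/usr/bin/env bash
# Usable IPv4 host addresses for a given prefix length:
# 2^(32 - prefix) total addresses, minus network and broadcast.
# Meaningful for prefixes up to /30.
usable_hosts() {
  local prefix="$1"
  echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 24   # prints 254
```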
|
||||
|
||||
### Starting it
|
||||
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.test.yml up --build
|
||||
```
|
||||
|
||||
Wait for all health checks to pass (about 60 seconds for step-ca's first-run bootstrap). Then:
|
||||
|
||||
```bash
|
||||
# Dashboard with auth enabled (HTTPS-only as of v2.2; browser will warn on the self-signed cert —
|
||||
# accept the warning or trust `deploy/test/certs/ca.crt` in your OS keychain)
|
||||
open https://localhost:8443
|
||||
# API key: test-key-2026
|
||||
|
||||
# NGINX serving a self-signed placeholder
|
||||
curl -k https://localhost:8444
|
||||
```
|
||||
|
||||
### What's different from the base
|
||||
|
||||
The test environment is configured for production-like behavior:
|
||||
|
||||
- **API key auth enabled** (`CERTCTL_AUTH_TYPE: api-key`, `CERTCTL_AUTH_SECRET: test-key-2026`). Every API request needs `Authorization: Bearer test-key-2026`.
|
||||
- **Agent-side key generation** (`CERTCTL_KEYGEN_MODE: agent`). The agent generates ECDSA P-256 keys locally and submits only the CSR to the server. Private keys never leave the agent container.
|
||||
- **Three real issuers configured:**
|
||||
- **Local CA** (self-signed) for instant issuance testing
|
||||
- **ACME via Pebble** for Let's Encrypt-compatible flow testing (HTTP-01 challenges validated through the challenge test server)
|
||||
- **step-ca** for private CA testing with JWK provisioner authentication
|
||||
- **EST server enabled** (`CERTCTL_EST_ENABLED: "true"`) for RFC 7030 enrollment testing
|
||||
- **Post-deployment verification enabled** (`CERTCTL_VERIFY_DEPLOYMENT: "true"`) so the agent probes NGINX after deploying a cert and confirms the TLS fingerprint matches
|
||||
- **Dynamic config encryption enabled** (`CERTCTL_CONFIG_ENCRYPTION_KEY`) so issuer/target configs added through the GUI are encrypted at rest
|
||||
- **TLS trust bootstrapping:** The server runs a `setup-trust.sh` entrypoint that fetches Pebble's root CA from its management API and copies step-ca's root cert from a shared volume, then runs `update-ca-certificates` before starting the server binary. This is necessary because both CAs use self-signed roots that aren't in Alpine's default trust store.
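
With API key auth enabled and a self-signed server certificate, a request from the host needs both the bearer token and the test CA bundle. The sketch below wraps `curl` accordingly. The `certctl_get` helper name is hypothetical, and `/health` is used only because it is the one path named in this guide (whether it requires auth is not stated here).

```bash
#!/usr/bin/env bash
# Defaults match the test environment described above; override via the environment.
CERTCTL_URL="${CERTCTL_URL:-https://localhost:8443}"
CERTCTL_API_KEY="${CERTCTL_API_KEY:-test-key-2026}"
CERTCTL_CA_BUNDLE="${CERTCTL_CA_BUNDLE:-deploy/test/certs/ca.crt}"

# GET a path from the certctl server, validating TLS against the test CA
# and sending the API key as a bearer token.
certctl_get() {
  local path="$1"
  curl --silent --show-error --fail \
    --cacert "$CERTCTL_CA_BUNDLE" \
    -H "Authorization: Bearer ${CERTCTL_API_KEY}" \
    "${CERTCTL_URL}${path}"
}
```

Running `certctl_get /health` from the repository root should succeed without `-k`, because the CA bundle written by the test stack is passed explicitly.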
|
||||
|
||||
### Running the Go integration tests
|
||||
|
||||
The test environment is designed to support the Go integration test suite at `deploy/test/integration_test.go`:
|
||||
|
||||
```bash
|
||||
# Start the environment
|
||||
docker compose -f deploy/docker-compose.test.yml up --build -d
|
||||
|
||||
# Wait for health checks
|
||||
sleep 30
|
||||
|
||||
# Run integration tests (from repo root)
|
||||
go test -tags integration -v ./deploy/test/...
|
||||
```
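
A fixed `sleep 30` can be too short on a slow machine, given that step-ca's first-run bootstrap alone can take about 60 seconds. One alternative is to poll each container's health status until it reports `healthy`. A sketch, assuming the container names set in `docker-compose.test.yml` (`certctl-test-postgres`, `certctl-test-stepca`); the `wait_healthy` helper is hypothetical:

```bash
#!/usr/bin/env bash
# Poll a container's healthcheck status until it is "healthy" or a timeout expires.
# Usage: wait_healthy <container-name> [timeout-seconds]
wait_healthy() {
  local name="$1" timeout="${2:-120}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $name (last status: ${status:-unknown})" >&2
  return 1
}
```

For instance, `wait_healthy certctl-test-postgres && wait_healthy certctl-test-stepca 180` can replace the `sleep` before the `go test` command. Only containers that define a healthcheck report a health status, so this works for services such as `postgres` and `step-ca` but not for ones started without a healthcheck.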
|
||||
|
||||
The integration tests exercise 12 phases: health, agent heartbeat, Local CA issuance, ACME issuance, renewal, step-ca issuance, revocation + CRL + OCSP, EST enrollment, S/MIME issuance, discovery, network scan, and deployment verification. PostgreSQL port 5432 is exposed so the test binary can query the database directly for assertions.
|
||||
|
||||
See [docs/test-env.md](../docs/test-env.md) for the full walkthrough and manual QA procedures.
|
||||
|
||||
### Stopping and cleaning up
|
||||
|
||||
```bash
|
||||
# Stop but keep data (volumes persist)
|
||||
docker compose -f deploy/docker-compose.test.yml down
|
||||
|
||||
# Full reset (delete step-ca bootstrap, database, agent keys, NGINX certs)
|
||||
docker compose -f deploy/docker-compose.test.yml down -v
|
||||
```
|
||||
|
||||
**Expert note:** The step-ca container auto-bootstraps on first run: generates a root CA, creates a JWK provisioner named "admin" with password "password123", and writes everything to the `stepca_data` volume. Subsequent starts reuse this volume. If you `down -v`, the next boot generates a new root CA, which means all previously issued step-ca certs become untrusted.
|
||||
|
||||
---
|
||||
|
||||
## Environment Variable Reference
|
||||
|
||||
Every `CERTCTL_*` environment variable is read by the server's `internal/config/config.go` via `os.Getenv`. If the prefix is missing, the variable is silently ignored.
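
Because an unprefixed name such as `LOG_LEVEL` is silently ignored, a mistake of this kind produces no error. A quick check is to scan a container's environment for names that match a known certctl setting but lack the prefix. The sketch below is illustrative: the `find_unprefixed` helper and its short list of settings are assumptions, not part of certctl.

```bash
#!/usr/bin/env bash
# Read KEY=VALUE lines on stdin and print keys that look like certctl settings
# but are missing the CERTCTL_ prefix (and would therefore be ignored).
find_unprefixed() {
  # Partial, illustrative list of setting names without their prefix.
  local known="LOG_LEVEL SERVER_HOST SERVER_PORT DATABASE_URL AUTH_TYPE AUTH_SECRET KEYGEN_MODE"
  local line key name
  while IFS= read -r line; do
    key="${line%%=*}"
    for name in $known; do
      if [ "$key" = "$name" ]; then
        echo "$key"
      fi
    done
  done
}
```

Piping a container's environment into it, for example `docker exec <server-container> env | find_unprefixed`, prints nothing when every setting is correctly prefixed.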
|
||||
|
||||
### Server
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_DATABASE_URL` | (required) | PostgreSQL connection string |
|
||||
| `CERTCTL_SERVER_HOST` | `0.0.0.0` | Listen address |
|
||||
| `CERTCTL_SERVER_PORT` | `8443` | Listen port |
|
||||
| `CERTCTL_LOG_LEVEL` | `info` | Log verbosity: `debug`, `info`, `warn`, `error` |
|
||||
| `CERTCTL_AUTH_TYPE` | `api-key` | Auth mode: `api-key` or `none` |
|
||||
| `CERTCTL_AUTH_SECRET` | (none) | API key(s), comma-separated for rotation |
|
||||
| `CERTCTL_KEYGEN_MODE` | `agent` | Key generation: `agent` (production) or `server` (demo) |
|
||||
| `CERTCTL_CONFIG_ENCRYPTION_KEY` | (none) | AES-256-GCM key for encrypting issuer/target configs in DB |
|
||||
| `CERTCTL_NETWORK_SCAN_ENABLED` | `false` | Enable network TLS scanning scheduler loop |
|
||||
| `CERTCTL_NETWORK_SCAN_INTERVAL` | `6h` | How often the network scanner runs |
|
||||
| `CERTCTL_MAX_BODY_SIZE` | `1048576` | Max request body size in bytes (1 MiB) |
|
||||
| `CERTCTL_CORS_ORIGINS` | (empty) | Allowed CORS origins, comma-separated. Empty = deny all cross-origin |
|
||||
| `CERTCTL_RATE_LIMIT_RPS` | `10` | Requests per second per client |
|
||||
| `CERTCTL_RATE_LIMIT_BURST` | `20` | Burst allowance above RPS |
|
||||
|
||||
### Agent
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_SERVER_URL` | (required) | Server API URL |
|
||||
| `CERTCTL_API_KEY` | (none) | API key for authenticating with server |
|
||||
| `CERTCTL_AGENT_NAME` | (hostname) | Display name in dashboard |
|
||||
| `CERTCTL_AGENT_ID` | (auto-generated) | Stable agent identifier |
|
||||
| `CERTCTL_KEYGEN_MODE` | `agent` | Must match server setting |
|
||||
| `CERTCTL_LOG_LEVEL` | `info` | Log verbosity |
|
||||
| `CERTCTL_KEY_DIR` | `/var/lib/certctl/keys` | Directory for private key storage (0600 perms) |
|
||||
| `CERTCTL_DISCOVERY_DIRS` | (none) | Comma-separated paths to scan for existing certs |
|
||||
|
||||
### Issuers (Server)
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `CERTCTL_ACME_DIRECTORY_URL` | ACME CA directory (e.g., Let's Encrypt, Pebble) |
|
||||
| `CERTCTL_ACME_EMAIL` | ACME account email |
|
||||
| `CERTCTL_ACME_CHALLENGE_TYPE` | `http-01`, `dns-01`, or `dns-persist-01` |
|
||||
| `CERTCTL_ACME_INSECURE` | Skip TLS verification for ACME CA (test only) |
|
||||
| `CERTCTL_ACME_EAB_KID` / `CERTCTL_ACME_EAB_HMAC` | External Account Binding for ZeroSSL, Google Trust Services |
|
||||
| `CERTCTL_ACME_ARI_ENABLED` | Enable RFC 9773 Renewal Information |
|
||||
| `CERTCTL_ACME_PROFILE` | ACME profile (`tlsserver`, `shortlived`) |
|
||||
| `CERTCTL_STEPCA_URL` | step-ca server URL |
|
||||
| `CERTCTL_STEPCA_ROOT_CERT` | Path to step-ca root CA cert |
|
||||
| `CERTCTL_STEPCA_PROVISIONER` | Provisioner name |
|
||||
| `CERTCTL_STEPCA_PASSWORD` | Provisioner password |
|
||||
| `CERTCTL_STEPCA_KEY_PATH` | Path to provisioner key |
|
||||
| `CERTCTL_CA_CERT_PATH` / `CERTCTL_CA_KEY_PATH` | Sub-CA mode: load CA cert+key from disk |
|
||||
| `CERTCTL_VAULT_ADDR` | Vault server address |
|
||||
| `CERTCTL_VAULT_TOKEN` | Vault auth token |
|
||||
| `CERTCTL_VAULT_MOUNT` | PKI secrets engine mount (default: `pki`) |
|
||||
| `CERTCTL_VAULT_ROLE` | PKI role name |
|
||||
| `CERTCTL_DIGICERT_API_KEY` | DigiCert CertCentral API key |
|
||||
| `CERTCTL_DIGICERT_ORG_ID` | DigiCert organization ID |
|
||||
| `CERTCTL_SECTIGO_CUSTOMER_URI` / `_LOGIN` / `_PASSWORD` | Sectigo SCM auth |
|
||||
| `CERTCTL_GOOGLE_CAS_PROJECT` / `_LOCATION` / `_CA_POOL` / `_CREDENTIALS` | Google CAS config |
|
||||
|
||||
### EST Server
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_EST_ENABLED` | `false` | Enable RFC 7030 EST endpoints |
|
||||
| `CERTCTL_EST_ISSUER_ID` | `iss-local` | Which issuer processes EST enrollments |
|
||||
| `CERTCTL_EST_PROFILE_ID` | (none) | Optional profile constraint |
|
||||
|
||||
### Post-Deployment Verification
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `CERTCTL_VERIFY_DEPLOYMENT` | `false` | Agent probes TLS after deploying |
|
||||
| `CERTCTL_VERIFY_TIMEOUT` | `10s` | TLS probe timeout |
|
||||
| `CERTCTL_VERIFY_DELAY` | `2s` | Wait before probing (let service reload) |
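
The verification step compares the certificate a service actually presents against the one that was deployed. A rough manual equivalent using `openssl` is sketched below. This is not certctl's implementation: the helper names are hypothetical, and comparing SHA-256 fingerprints is an assumption about what the fingerprint check involves.

```bash
#!/usr/bin/env bash
# Reduce an "openssl x509 -fingerprint" output line to bare lowercase hex.
normalize_fp() {
  printf '%s\n' "$1" | sed 's/^.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# SHA-256 fingerprint of the leaf certificate served at host:port.
endpoint_fp() {
  local host="$1" port="$2" line
  line="$(openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256)"
  normalize_fp "$line"
}

# SHA-256 fingerprint of a PEM certificate file.
file_fp() {
  normalize_fp "$(openssl x509 -in "$1" -noout -fingerprint -sha256)"
}
```

Comparing `"$(endpoint_fp localhost 8444)"` with `"$(file_fp path/to/deployed.crt)"` after a deployment shows whether NGINX has picked up the new certificate; a mismatch shortly after deploying may simply mean the service has not reloaded yet, which is what `CERTCTL_VERIFY_DELAY` allows for.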
|
||||
|
||||
### Notifications
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `CERTCTL_SMTP_HOST` / `_PORT` / `_USERNAME` / `_PASSWORD` / `_FROM_ADDRESS` / `_USE_TLS` | SMTP email |
|
||||
| `CERTCTL_SLACK_WEBHOOK_URL` / `_CHANNEL` / `_USERNAME` | Slack notifications |
|
||||
| `CERTCTL_TEAMS_WEBHOOK_URL` | Microsoft Teams |
|
||||
| `CERTCTL_PAGERDUTY_ROUTING_KEY` / `_SEVERITY` | PagerDuty alerts |
|
||||
| `CERTCTL_OPSGENIE_API_KEY` / `_PRIORITY` | OpsGenie alerts |
|
||||
| `CERTCTL_DIGEST_ENABLED` / `_INTERVAL` / `_RECIPIENTS` | Scheduled digest email |
|
||||
|
||||
---
|
||||
|
||||
## Common Operations
|
||||
|
||||
### Viewing logs
|
||||
|
||||
```bash
|
||||
# All services
|
||||
docker compose -f deploy/docker-compose.yml logs -f
|
||||
|
||||
# Single service
|
||||
docker compose -f deploy/docker-compose.yml logs -f certctl-server
|
||||
|
||||
# Last 100 lines
|
||||
docker compose -f deploy/docker-compose.yml logs --tail 100 certctl-server
|
||||
```
|
||||
|
||||
### Rebuilding after code changes
|
||||
|
||||
```bash
|
||||
docker compose -f deploy/docker-compose.yml up -d --build
|
||||
```
|
||||
|
||||
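With `--build`, Docker rebuilds the images but reuses cached layers whose inputs have not changed, so unchanged images rebuild quickly. The flag is required after editing Go code or frontend files; without it, Compose starts the previously built image.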
|
||||
|
||||
### Connecting to the database directly
|
||||
|
||||
```bash
|
||||
docker exec -it certctl-postgres psql -U certctl -d certctl
|
||||
```
|
||||
|
||||
Useful queries:
|
||||
```sql
|
||||
-- Certificate inventory
|
||||
SELECT id, common_name, status, expires_at FROM managed_certificates ORDER BY expires_at;
|
||||
|
||||
-- Recent jobs
|
||||
SELECT id, type, status, certificate_id, created_at FROM jobs ORDER BY created_at DESC LIMIT 20;
|
||||
|
||||
-- Audit trail
|
||||
SELECT event_type, actor, resource_id, created_at FROM audit_events ORDER BY created_at DESC LIMIT 20;
|
||||
|
||||
-- Issuer configurations (encrypted_config is AES-256-GCM)
|
||||
SELECT id, type, source, enabled, test_status FROM issuers;
|
||||
```
|
||||
|
||||
### Checking container resource usage
|
||||
|
||||
```bash
|
||||
docker stats --no-stream
|
||||
```
|
||||
|
||||
### Upgrading
|
||||
|
||||
```bash
|
||||
git pull
|
||||
docker compose -f deploy/docker-compose.yml up -d --build
|
||||
```
|
||||
|
||||
Migrations are idempotent (`IF NOT EXISTS`), so re-applying them is safe. However, PostgreSQL runs init scripts only on the first boot of a fresh volume, so migrations introduced by an upgrade must be applied manually (substitute the actual migration file name):
|
||||
|
||||
```bash
|
||||
docker exec -i certctl-postgres psql -U certctl -d certctl < migrations/000011_new_feature.up.sql
|
||||
```
|
||||
|
||||
Or, for a clean upgrade: `down -v` and `up --build` (loses existing data).
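
When an upgrade brings several new migration files, the manual step can be looped. The sketch below applies every `*.up.sql` file in a directory in filename order, relying on the idempotency described above so that already-applied files are harmless. The `apply_migrations` helper is hypothetical; the `migrations/` path and the `certctl-postgres` container name are taken from the commands in this section.

```bash
#!/usr/bin/env bash
# Apply all "up" migrations in filename order, stopping at the first failure.
# Usage: apply_migrations [migrations-dir]
apply_migrations() {
  local dir="${1:-migrations}" f
  for f in "$dir"/*.up.sql; do
    [ -e "$f" ] || continue   # glob matched nothing
    echo "applying $f"
    # ON_ERROR_STOP makes psql exit non-zero on the first SQL error.
    docker exec -i certctl-postgres \
      psql -v ON_ERROR_STOP=1 -U certctl -d certctl < "$f" || return 1
  done
}
```

Zero-padded numeric prefixes (as in `000011_...`) keep the shell's glob ordering aligned with the intended migration order.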
|
||||
@@ -0,0 +1,26 @@
|
||||
# Demo mode: pre-populated dashboard with 32 certificates, 8 agents, 10 issuers, etc.
|
||||
# Use this to showcase certctl's dashboard with realistic data.
|
||||
#
|
||||
# Usage:
|
||||
# docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build
|
||||
#
|
||||
# To start fresh (wipe previous data):
|
||||
# docker compose -f docker-compose.yml -f docker-compose.demo.yml down -v
|
||||
# docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build
|
||||
#
|
||||
# U-3 (P1, cat-u-seed_initdb_schema_drift): pre-U-3 this overlay mounted
|
||||
# `seed_demo.sql` into postgres `/docker-entrypoint-initdb.d/`. That worked
|
||||
# only because the production stack also mounted the migrations there, so
|
||||
# the schema existed at initdb time. Once U-3 dropped the production
|
||||
# initdb mounts (single source of truth: server runs RunMigrations + RunSeed
|
||||
# at boot), the demo seed could no longer be applied at initdb time — the
|
||||
# tables it references wouldn't exist yet.
|
||||
#
|
||||
# Post-U-3 the demo overlay just sets CERTCTL_DEMO_SEED=true; the server
|
||||
# applies seed_demo.sql at boot via postgres.RunDemoSeed AFTER baseline
|
||||
# migrations + seed.sql are in place. Same single source of truth, no
|
||||
# initdb mounts, no schema-vs-seed drift.
|
||||
services:
|
||||
certctl-server:
|
||||
environment:
|
||||
CERTCTL_DEMO_SEED: "true"
|
||||
@@ -9,11 +9,21 @@ services:
|
||||
build:
|
||||
context: ..
|
||||
dockerfile: Dockerfile
|
||||
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
|
||||
# vars into the Docker build so the Node frontend stage and Go module
|
||||
# download can reach the public registries behind corporate proxies.
|
||||
# Defaults to empty; omit the variables from the host environment for
|
||||
# un-proxied builds and the behaviour is byte-identical to the pre-fix
|
||||
# tree.
|
||||
args:
|
||||
HTTP_PROXY: ${HTTP_PROXY:-}
|
||||
HTTPS_PROXY: ${HTTPS_PROXY:-}
|
||||
NO_PROXY: ${NO_PROXY:-}
|
||||
environment:
|
||||
# Verbose logging for development
|
||||
LOG_LEVEL: debug
|
||||
SERVER_HOST: 0.0.0.0
|
||||
SERVER_PORT: 8443
|
||||
CERTCTL_LOG_LEVEL: debug
|
||||
CERTCTL_SERVER_HOST: 0.0.0.0
|
||||
CERTCTL_SERVER_PORT: "8443"
|
||||
volumes:
|
||||
# Mount local source for hot reload (requires air or similar)
|
||||
# Uncomment if using air or similar for hot reload:
|
||||
@@ -29,8 +39,17 @@ services:
|
||||
build:
|
||||
context: ..
|
||||
dockerfile: Dockerfile.agent
|
||||
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
|
||||
# vars into the Docker build so the Go module download stage can reach
|
||||
# the public Go module proxy behind corporate proxies. Defaults to
|
||||
# empty; omit the variables from the host environment for un-proxied
|
||||
# builds and the behaviour is byte-identical to the pre-fix tree.
|
||||
args:
|
||||
HTTP_PROXY: ${HTTP_PROXY:-}
|
||||
HTTPS_PROXY: ${HTTPS_PROXY:-}
|
||||
NO_PROXY: ${NO_PROXY:-}
|
||||
environment:
|
||||
LOG_LEVEL: debug
|
||||
CERTCTL_LOG_LEVEL: debug
|
||||
|
||||
# PgAdmin for database exploration
|
||||
pgadmin:
|
||||
|
||||
@@ -0,0 +1,429 @@
|
||||
# =============================================================================
|
||||
# certctl Testing Environment — Docker Compose
|
||||
# =============================================================================
|
||||
#
|
||||
# Spins up the full certctl platform with real CA backends for manual QA:
|
||||
#
|
||||
# 0. certctl-tls-init — one-shot init container; writes self-signed
|
||||
# server.crt/.key/ca.crt into ./test/certs (bind
|
||||
# mount, not a named volume — host-readable for
|
||||
# the Go integration test binary)
|
||||
# 1. PostgreSQL 16 — database (clean, no demo data)
|
||||
# 2. certctl-server — control plane API + web dashboard on :8443 (HTTPS)
|
||||
# 3. certctl-agent — polls for work, deploys certs to NGINX
|
||||
# 4. step-ca — private CA (JWK provisioner, auto-bootstraps)
|
||||
# 5. Pebble — ACME test server (simulates Let's Encrypt)
|
||||
# 6. pebble-challtestsrv — DNS/HTTP challenge test server for Pebble
|
||||
# 7. NGINX — TLS target server on :8080 (HTTP) / :8444 (HTTPS)
|
||||
#
|
||||
# Usage:
|
||||
# cd deploy
|
||||
# docker compose -f docker-compose.test.yml up --build
|
||||
#
|
||||
# Dashboard: https://localhost:8443 (self-signed — use --cacert test/certs/ca.crt)
|
||||
# API key: test-key-2026
|
||||
# NGINX: https://localhost:8444 (self-signed placeholder until cert deployed)
|
||||
#
|
||||
# Integration tests: `go test -tags integration ./deploy/test/...` picks up
|
||||
# the CA bundle at ./test/certs/ca.crt automatically via CERTCTL_TEST_CA_BUNDLE.
|
||||
#
|
||||
# See docs/test-env.md for the full walkthrough.
|
||||
# =============================================================================
|
||||
|
||||
services:
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# HTTPS-Everywhere Phase 6 — self-signed TLS bootstrap for the test harness.
|
||||
# ---------------------------------------------------------------------------
|
||||
# Mirrors the production `certctl-tls-init` (see docker-compose.yml §10-43)
|
||||
# but writes into a *host bind mount* (./test/certs) instead of a named
|
||||
# volume. The named-volume approach works fine inside Docker but hides the
|
||||
# CA bundle from the Go integration test binary that runs on the host; the
|
||||
# bind mount exposes /etc/certctl/tls/ca.crt at deploy/test/certs/ca.crt
|
||||
# so `newTestClient()` can load it into an x509.CertPool and validate the
|
||||
# self-signed server cert. Test-only divergence, explicitly documented.
|
||||
#
|
||||
# The generated cert has SAN=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1
|
||||
# so both in-cluster traffic (agent → certctl-server:8443) and host traffic
|
||||
# (go test → localhost:8443) validate cleanly. Destroy via
|
||||
# `docker compose -f docker-compose.test.yml down -v` + `rm -rf test/certs`
|
||||
# to force regeneration. Keys written 0600, certs 0644, owned 1000:1000
|
||||
# (the UID the server binary runs as inside its container per Dockerfile:64).
|
||||
certctl-tls-init:
|
||||
image: alpine/openssl:latest
|
||||
container_name: certctl-test-tls-init
|
||||
restart: "no"
|
||||
entrypoint: /bin/sh
|
||||
command:
|
||||
- -c
|
||||
- |
|
||||
set -eu
|
||||
CERT=/etc/certctl/tls/server.crt
|
||||
KEY=/etc/certctl/tls/server.key
|
||||
CA=/etc/certctl/tls/ca.crt
|
||||
if [ -f "$$CERT" ] && [ -f "$$KEY" ] && [ -f "$$CA" ]; then
|
||||
echo "TLS cert already present at $$CERT — skipping generation"
|
||||
else
|
||||
mkdir -p /etc/certctl/tls
|
||||
openssl req -x509 -newkey ec \
|
||||
-pkeyopt ec_paramgen_curve:P-256 \
|
||||
-nodes \
|
||||
-keyout "$$KEY" \
|
||||
-out "$$CERT" \
|
||||
-days 3650 \
|
||||
-subj "/CN=certctl-server" \
|
||||
-addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1"
|
||||
cp "$$CERT" "$$CA"
|
||||
echo "Generated self-signed TLS cert for certctl-test-server (ECDSA-P256/SHA-256, 3650d, CN=certctl-server)"
|
||||
fi
|
||||
# The test server container runs as root (see `user: "0:0"` below)
|
||||
# because setup-trust.sh needs to update the system trust store, so
|
||||
# the perms here are really about host-side readability — 0644 on
|
||||
# the CA/cert lets `go test` on the host read the bundle without a
|
||||
# chown dance.
|
||||
chown 1000:1000 "$$CERT" "$$KEY" "$$CA" || true
|
||||
chmod 0644 "$$CERT" "$$CA"
|
||||
chmod 0600 "$$KEY"
|
||||
volumes:
|
||||
- ./test/certs:/etc/certctl/tls
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.9
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Database
|
||||
# ---------------------------------------------------------------------------
|
||||
#
|
||||
# U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10): the test stack used
|
||||
# to mount a hand-curated subset of migrations + seed.sql + a never-checked-in
|
||||
# seed_test.sql into postgres `/docker-entrypoint-initdb.d/`. Same hazard as
|
||||
# the production compose — initdb crashed any time a new migration shipped
|
||||
# that the seed depended on without the mount list being updated. Post-U-3
|
||||
# the schema is built EXCLUSIVELY by the server at startup via
|
||||
# internal/repository/postgres.RunMigrations + RunSeed. Postgres comes up
|
||||
# empty and the server lands the full ladder + baseline seed in one shot.
|
||||
# `start_period: 30s` matches the production compose and shields slow CI
|
||||
# runners from healthcheck flap during initdb.
|
||||
postgres:
|
||||
image: postgres:16-alpine
|
||||
container_name: certctl-test-postgres
|
||||
environment:
|
||||
POSTGRES_DB: certctl
|
||||
POSTGRES_USER: certctl
|
||||
POSTGRES_PASSWORD: testpass
|
||||
volumes:
|
||||
- test_postgres_data:/var/lib/postgresql/data
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.2
|
||||
ports:
|
||||
- "5432:5432"
|
||||
healthcheck:
|
||||
test: ["CMD-SHELL", "pg_isready -U certctl -d certctl"]
|
||||
interval: 5s
|
||||
timeout: 5s
|
||||
retries: 5
|
||||
start_period: 30s
|
||||
restart: unless-stopped
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pebble — ACME test server (simulates Let's Encrypt)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pebble is the official ACME test server from Let's Encrypt (RFC 8555).
|
||||
# It validates challenges via the companion challtestsrv.
|
||||
# Root CA cert available at https://pebble:15000/roots/0 (management API).
|
||||
pebble-challtestsrv:
|
||||
image: ghcr.io/letsencrypt/pebble-challtestsrv:latest
|
||||
container_name: certctl-test-challtestsrv
|
||||
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
|
||||
# Matches the official Pebble docker-compose format.
|
||||
# -doh "" disables DoH (default :8443 would conflict with certctl server).
|
||||
# defaultIPv4 must point to the certctl-server (10.30.50.6) because that's where
|
||||
# the ACME HTTP-01 challenge server runs (port 80 inside the container).
|
||||
# Pebble resolves domains via challtestsrv, then connects to this IP to validate.
|
||||
command: -defaultIPv4 10.30.50.6 -defaultIPv6 "" -doh ""
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.3
|
||||
restart: unless-stopped
|
||||
|
||||
pebble:
|
||||
image: ghcr.io/letsencrypt/pebble:latest
|
||||
container_name: certctl-test-pebble
|
||||
depends_on:
|
||||
- pebble-challtestsrv
|
||||
environment:
|
||||
PEBBLE_VA_NOSLEEP: 1
|
||||
PEBBLE_VA_ALWAYS_VALID: 0
|
||||
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
|
||||
command:
|
||||
- -config
|
||||
- /test/config/pebble-config.json
|
||||
- -dnsserver
|
||||
- "10.30.50.3:8053"
|
||||
- -strict
|
||||
volumes:
|
||||
- ./test/pebble-config.json:/test/config/pebble-config.json:ro
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.4
|
||||
restart: unless-stopped
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# step-ca — Private CA (Smallstep)
|
||||
# ---------------------------------------------------------------------------
|
||||
# Auto-bootstraps on first run: generates root CA + JWK provisioner "admin".
|
||||
# Root cert: /home/step/certs/root_ca.crt (inside stepca_data volume)
|
||||
# Provisioner key: /home/step/secrets/provisioner_key (encrypted JWK)
|
||||
step-ca:
|
||||
image: smallstep/step-ca:latest
|
||||
container_name: certctl-test-stepca
|
||||
environment:
|
||||
DOCKER_STEPCA_INIT_NAME: "certctl-test-ca"
|
||||
DOCKER_STEPCA_INIT_DNS_NAMES: "step-ca,localhost"
|
||||
DOCKER_STEPCA_INIT_PROVISIONER_NAME: "admin"
|
||||
DOCKER_STEPCA_INIT_PASSWORD: "password123"
|
||||
DOCKER_STEPCA_INIT_ADDRESS: ":9000"
|
||||
volumes:
|
||||
- stepca_data:/home/step
|
||||
networks:
|
||||
certctl-test:
|
||||
ipv4_address: 10.30.50.5
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-fk", "https://localhost:9000/health"]
|
||||
interval: 10s
|
||||
timeout: 5s
|
||||
start_period: 15s
|
||||
retries: 10
|
||||
restart: unless-stopped
|
||||
|
||||
  # ---------------------------------------------------------------------------
  # certctl Server (Control Plane)
  # ---------------------------------------------------------------------------
  # Connects to PostgreSQL, Pebble (ACME), step-ca, and Local CA.
  #
  # TLS trust problem: Pebble and step-ca use self-signed root CAs that
  # aren't in Alpine's trust store. The ACME and step-ca connectors use
  # Go's default http.Client (no InsecureSkipVerify), so they need the
  # CA certs in the system trust store.
  #
  # Solution: setup-trust.sh runs as root, fetches Pebble CA from its
  # management API, copies step-ca root cert from the shared volume,
  # runs update-ca-certificates, then execs the server binary.
  certctl-server:
    build:
      context: ..
      dockerfile: Dockerfile
      # Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
      # vars into the Docker build so the Node frontend stage and Go module
      # download can reach the public registries behind corporate proxies.
      # Defaults to empty; omit the variables from the host environment for
      # un-proxied builds and the behaviour is byte-identical to the pre-fix
      # tree.
      args:
        HTTP_PROXY: ${HTTP_PROXY:-}
        HTTPS_PROXY: ${HTTPS_PROXY:-}
        NO_PROXY: ${NO_PROXY:-}
    container_name: certctl-test-server
    depends_on:
      postgres:
        condition: service_healthy
      pebble:
        condition: service_started
      step-ca:
        condition: service_healthy
      # HTTPS-Everywhere Phase 6: block server boot until the init container
      # has written server.crt / server.key / ca.crt into ./test/certs. The
      # init container runs once and exits 0; service_completed_successfully
      # makes that a gating dependency rather than a liveness one.
      certctl-tls-init:
        condition: service_completed_successfully
    # Run as root so update-ca-certificates can write to /etc/ssl/certs.
    # Container isolation provides the security boundary.
    user: "0:0"
    entrypoint: ["/bin/sh", "/app/setup-trust.sh"]
    environment:
      # Database
      CERTCTL_DATABASE_URL: postgres://certctl:testpass@postgres:5432/certctl?sslmode=disable

      # Server
      CERTCTL_SERVER_HOST: 0.0.0.0
      CERTCTL_SERVER_PORT: 8443
      # HTTPS-Everywhere Phase 6: point the server at the init-container-generated
      # cert/key pair (bind-mounted from ./test/certs). Same paths as production
      # compose so the server binary code path is identical; only the host-side
      # storage differs (bind mount vs named volume — see §certctl-tls-init block).
      CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
      CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
      CERTCTL_LOG_LEVEL: debug

      # Auth — API key required (production-like)
      CERTCTL_AUTH_TYPE: api-key
      CERTCTL_AUTH_SECRET: test-key-2026

      # Key generation — agent-side (production-like)
      CERTCTL_KEYGEN_MODE: agent

      # Local CA issuer (iss-local) — self-signed mode (no CA cert/key paths)
      # This is the simplest issuer, always available.

      # ACME issuer (iss-acme-staging) — pointed at Pebble
      CERTCTL_ACME_DIRECTORY_URL: https://pebble:14000/dir
      CERTCTL_ACME_EMAIL: test@certctl.dev
      CERTCTL_ACME_CHALLENGE_TYPE: http-01
      CERTCTL_ACME_INSECURE: "true"

      # step-ca issuer (iss-stepca)
      CERTCTL_STEPCA_URL: https://step-ca:9000
      CERTCTL_STEPCA_ROOT_CERT: /stepca-data/certs/root_ca.crt
      CERTCTL_STEPCA_PROVISIONER: admin
      CERTCTL_STEPCA_PASSWORD: password123
      CERTCTL_STEPCA_KEY_PATH: /stepca-data/secrets/provisioner_key

      # EST server (RFC 7030) — uses Local CA by default
      CERTCTL_EST_ENABLED: "true"
      CERTCTL_EST_ISSUER_ID: iss-local

      # Dynamic issuer/target config encryption (M34/M35)
      CERTCTL_CONFIG_ENCRYPTION_KEY: test-encryption-key-32chars!!

      # Network scanning
      CERTCTL_NETWORK_SCAN_ENABLED: "true"

      # Post-deployment TLS verification
      CERTCTL_VERIFY_DEPLOYMENT: "true"
      CERTCTL_VERIFY_TIMEOUT: "10s"
      CERTCTL_VERIFY_DELAY: "3s"
    ports:
      - "8443:8443"
    volumes:
      - ./test/setup-trust.sh:/app/setup-trust.sh:ro
      # step-ca data volume (root cert at /certs/root_ca.crt, key at /secrets/provisioner_key)
      - stepca_data:/stepca-data:ro
      # HTTPS-Everywhere Phase 6: read-only bind mount of the init-generated
      # TLS material. The init container writes here; server reads here; the
      # agent mounts the same host path at the same container path (see below)
      # so /etc/certctl/tls/ca.crt resolves to the *same* bytes on both sides.
      - ./test/certs:/etc/certctl/tls:ro
    networks:
      certctl-test:
        ipv4_address: 10.30.50.6
    healthcheck:
      # HTTPS-Everywhere Phase 6: healthcheck now speaks TLS with --cacert to
      # verify the self-signed server cert against the init-generated bundle.
      # /health requires auth when CERTCTL_AUTH_TYPE=api-key, so include the
      # Bearer token. curl exits non-zero on both TLS handshake failure and
      # non-2xx status — either failure keeps depends_on: {condition:
      # service_healthy} from unblocking the agent, which is what we want.
      test: ["CMD", "curl", "--cacert", "/etc/certctl/tls/ca.crt", "-f", "-H", "Authorization: Bearer test-key-2026", "https://localhost:8443/health"]
      interval: 10s
      timeout: 5s
      start_period: 30s
      retries: 10
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # NGINX — TLS Target Server
  # ---------------------------------------------------------------------------
  # The agent deploys certificates here via the shared nginx_certs volume.
  # nginx-entrypoint.sh generates a self-signed placeholder cert so NGINX
  # can boot before the agent deploys a real cert.
  #
  # Ports: 8080 (HTTP) / 8444 (HTTPS) — offset to avoid conflict with server.
  nginx:
    image: nginx:alpine
    container_name: certctl-test-nginx
    entrypoint: ["/bin/sh", "/entrypoint.sh"]
    volumes:
      - ./test/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./test/nginx-entrypoint.sh:/entrypoint.sh:ro
      - nginx_certs:/etc/nginx/certs
    ports:
      - "8080:80"
      - "8444:443"
    networks:
      certctl-test:
        ipv4_address: 10.30.50.7
    healthcheck:
      test: ["CMD-SHELL", "curl -fk https://localhost/health || exit 1"]
      interval: 10s
      timeout: 5s
      start_period: 15s
      retries: 5
    restart: unless-stopped

  # ---------------------------------------------------------------------------
  # certctl Agent
  # ---------------------------------------------------------------------------
  # Polls the server for work, generates ECDSA P-256 keys locally,
  # deploys certs to NGINX via the shared volume, and discovers existing
  # certs in the NGINX cert directory.
  certctl-agent:
    build:
      context: ..
      dockerfile: Dockerfile.agent
      # Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
      # vars into the Docker build so the Go module download stage can reach
      # the public Go module proxy behind corporate proxies. Defaults to
      # empty; omit the variables from the host environment for un-proxied
      # builds and the behaviour is byte-identical to the pre-fix tree.
      args:
        HTTP_PROXY: ${HTTP_PROXY:-}
        HTTPS_PROXY: ${HTTPS_PROXY:-}
        NO_PROXY: ${NO_PROXY:-}
    container_name: certctl-test-agent
    depends_on:
      certctl-server:
        condition: service_healthy
    environment:
      # HTTPS-Everywhere Phase 6: agent dials the server over TLS and validates
      # the self-signed cert against the CA bundle pinned by
      # CERTCTL_SERVER_CA_BUNDLE_PATH. Same env vars + container paths as
      # production compose so the agent binary code path (loadCABundle →
      # x509.CertPool → *tls.Config{RootCAs, MinVersion: TLS13}) is identical.
      CERTCTL_SERVER_URL: https://certctl-server:8443
      CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
      CERTCTL_API_KEY: test-key-2026
      CERTCTL_AGENT_NAME: test-agent-01
      CERTCTL_AGENT_ID: agent-test-01
      CERTCTL_KEYGEN_MODE: agent
      CERTCTL_LOG_LEVEL: debug
      CERTCTL_DISCOVERY_DIRS: /nginx-certs
    volumes:
      - agent_keys:/var/lib/certctl/keys
      - nginx_certs:/nginx-certs
      # HTTPS-Everywhere Phase 6: same bind mount as the server, same path,
      # so /etc/certctl/tls/ca.crt resolves to the identical bytes. This is
      # the only way the CN=certctl-server cert validates on the agent side.
      - ./test/certs:/etc/certctl/tls:ro
    networks:
      certctl-test:
        ipv4_address: 10.30.50.8
    restart: unless-stopped

# =============================================================================
# Network
# =============================================================================
# Static IPs are required because:
# - Pebble needs to know the challtestsrv DNS server address (10.30.50.3)
# - challtestsrv resolves all domains to certctl-server (10.30.50.6) for HTTP-01 challenges
# - Avoids DNS race conditions during startup
networks:
  certctl-test:
    driver: bridge
    ipam:
      config:
        - subnet: 10.30.50.0/24

# =============================================================================
# Volumes
# =============================================================================
volumes:
  test_postgres_data:
    driver: local
  stepca_data:
    driver: local
  agent_keys:
    driver: local
  nginx_certs:
    driver: local
@@ -1,5 +1,81 @@
services:
  # HTTPS-Everywhere Phase 3 — self-signed TLS bootstrap (init container).
  # Generates a CN=certctl-server ECDSA-P256 (SHA-256 signature) cert with
  # the SAN list locked by milestone §3.6 on first boot; subsequent boots
  # see the cert already present in the `certs` named volume and no-op out.
  # Server + agent mount the volume read-only. Destroy via `docker compose
  # down -v` to force regeneration. This bootstrap is for docker-compose
  # demos and local dev only; Helm operators supply a Secret / cert-manager
  # Certificate per docs/tls.md.
  #
  # Rationale for ECDSA-P256 (was ed25519 pre-v2.0.48): Apple's TLS stack
  # — Safari Network Framework and the macOS-bundled LibreSSL 3.3.6
  # /usr/bin/curl — does not advertise ed25519 in the ClientHello
  # signature_algorithms extension for server certs, yielding "tls: peer
  # doesn't support any of the certificate's signature algorithms" at
  # handshake. ECDSA-P256 with SHA-256 is universally supported. See
  # docs/tls.md Pattern 1.
  certctl-tls-init:
    image: alpine/openssl:latest
    container_name: certctl-tls-init
    restart: "no"
    entrypoint: /bin/sh
    command:
      - -c
      - |
        set -eu
        CERT=/etc/certctl/tls/server.crt
        KEY=/etc/certctl/tls/server.key
        CA=/etc/certctl/tls/ca.crt
        if [ -f "$$CERT" ] && [ -f "$$KEY" ] && [ -f "$$CA" ]; then
          echo "TLS cert already present at $$CERT — skipping generation"
        else
          mkdir -p /etc/certctl/tls
          openssl req -x509 -newkey ec \
            -pkeyopt ec_paramgen_curve:P-256 \
            -nodes \
            -keyout "$$KEY" \
            -out "$$CERT" \
            -days 3650 \
            -subj "/CN=certctl-server" \
            -addext "subjectAltName=DNS:certctl-server,DNS:localhost,IP:127.0.0.1,IP:::1"
          cp "$$CERT" "$$CA"
          echo "Generated self-signed TLS cert for certctl-server (ECDSA-P256/SHA-256, 3650d, CN=certctl-server)"
        fi
        # certctl binary runs as UID 1000 inside the server container per
        # Dockerfile:64-65; the cert + key must be readable by that UID.
        chown 1000:1000 "$$CERT" "$$KEY" "$$CA"
        chmod 0644 "$$CERT" "$$CA"
        chmod 0600 "$$KEY"
    volumes:
      - certs:/etc/certctl/tls
    networks:
      - certctl-network

  # PostgreSQL database
  #
  # U-3 (P1, cat-u-seed_initdb_schema_drift, GitHub #10):
  # Pre-U-3 this stack mounted a hand-curated subset of `migrations/*.up.sql`
  # plus `seed.sql` into `/docker-entrypoint-initdb.d/`, and postgres
  # initdb-applied them on first boot. The mount list rotted every time a
  # new migration shipped that the seed depended on (000013 added
  # policy_rules.severity, 000017 renames retry_interval_minutes, etc.) —
  # initdb crashed, the container reported `unhealthy` indefinitely, and
  # `docker compose -f deploy/docker-compose.yml up -d --build` from a
  # fresh clone of v2.0.50 hit it on the first try.
  #
  # Post-U-3 the schema is built EXCLUSIVELY by the server at startup via
  # internal/repository/postgres.RunMigrations + RunSeed. Single source of
  # truth, no list to keep in sync. Postgres comes up empty; the server
  # waits for it healthy, then applies the full migration ladder + seed in
  # one shot. Helm + the dev examples were already runtime-only (Path B)
  # and worked through the same window.
  #
  # `start_period: 30s` gives postgres room to bootstrap on slow runners
  # (CI macOS, low-spec laptops) before the healthcheck failure counter
  # starts ticking. Pre-U-3 a slow first-init combined with the
  # `unhealthy` flap to cascade into certctl-server's `service_healthy`
  # depends_on, blocking the whole stack.
  postgres:
    image: postgres:16-alpine
    container_name: certctl-postgres
@@ -11,9 +87,6 @@ services:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ../migrations/000001_initial_schema.up.sql:/docker-entrypoint-initdb.d/001_schema.sql
      - ../migrations/seed.sql:/docker-entrypoint-initdb.d/002_seed.sql
      - ../migrations/seed_demo.sql:/docker-entrypoint-initdb.d/003_seed_demo.sql
    networks:
      - certctl-network
    healthcheck:
@@ -21,6 +94,7 @@ services:
      interval: 5s
      timeout: 5s
      retries: 5
      start_period: 30s
    restart: unless-stopped

  # Certctl Server (API + scheduler)
@@ -28,26 +102,49 @@ services:
    build:
      context: ..
      dockerfile: Dockerfile
      # Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
      # vars into the Docker build so the Node frontend stage and Go module
      # download can reach the public registries behind corporate proxies.
      # Defaults to empty; omit the variables from the host environment for
      # un-proxied builds and the behaviour is byte-identical to the pre-fix
      # tree.
      args:
        HTTP_PROXY: ${HTTP_PROXY:-}
        HTTPS_PROXY: ${HTTPS_PROXY:-}
        NO_PROXY: ${NO_PROXY:-}
    container_name: certctl-server
    depends_on:
      postgres:
        condition: service_healthy
      certctl-tls-init:
        condition: service_completed_successfully
    environment:
      CERTCTL_DATABASE_URL: postgres://certctl:${POSTGRES_PASSWORD:-certctl}@postgres:5432/certctl?sslmode=disable
      CERTCTL_SERVER_HOST: 0.0.0.0
      CERTCTL_SERVER_PORT: 8443
      CERTCTL_SERVER_TLS_CERT_PATH: /etc/certctl/tls/server.crt
      CERTCTL_SERVER_TLS_KEY_PATH: /etc/certctl/tls/server.key
      CERTCTL_LOG_LEVEL: info
      CERTCTL_AUTH_TYPE: none
      CERTCTL_KEYGEN_MODE: server  # Demo uses server-side keygen; production should use "agent"
      CERTCTL_NETWORK_SCAN_ENABLED: "true"  # Enable network scan GUI with seeded demo targets
      CERTCTL_CONFIG_ENCRYPTION_KEY: ${CERTCTL_CONFIG_ENCRYPTION_KEY:-change-me-32-char-encryption-key}  # AES-256-GCM for dynamic issuer/target config
    ports:
      - "8443:8443"
    volumes:
      - certs:/etc/certctl/tls:ro
    networks:
      - certctl-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8443/health"]
      test: ["CMD", "curl", "--cacert", "/etc/certctl/tls/ca.crt", "-f", "https://localhost:8443/health"]
      interval: 10s
      timeout: 5s
      retries: 5
      # U-3: server boot now does RunMigrations + RunSeed before listening on
      # 8443. On a fresh clone the full migration ladder + seed application
      # can take ~10s on a small VM; start_period prevents the first few
      # healthcheck attempts from counting as failures while that work runs.
      start_period: 30s
    restart: unless-stopped
    logging:
      driver: "json-file"
@@ -65,17 +162,29 @@ services:
    build:
      context: ..
      dockerfile: Dockerfile.agent
      # Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
      # vars into the Docker build so the Go module download stage can reach
      # the public Go module proxy behind corporate proxies. Defaults to
      # empty; omit the variables from the host environment for un-proxied
      # builds and the behaviour is byte-identical to the pre-fix tree.
      args:
        HTTP_PROXY: ${HTTP_PROXY:-}
        HTTPS_PROXY: ${HTTPS_PROXY:-}
        NO_PROXY: ${NO_PROXY:-}
    container_name: certctl-agent
    depends_on:
      certctl-server:
        condition: service_healthy
    environment:
      CERTCTL_SERVER_URL: http://certctl-server:8443
      CERTCTL_SERVER_URL: https://certctl-server:8443
      CERTCTL_SERVER_CA_BUNDLE_PATH: /etc/certctl/tls/ca.crt
      CERTCTL_API_KEY: ${CERTCTL_API_KEY:-change-me-in-production}
      CERTCTL_AGENT_NAME: docker-agent
      CERTCTL_LOG_LEVEL: info
      CERTCTL_DISCOVERY_DIRS: /var/lib/certctl/keys  # Agent scans this directory for existing certificates
    volumes:
      - agent_keys:/var/lib/certctl/keys
      - certs:/etc/certctl/tls:ro
    networks:
      - certctl-network
    healthcheck:
@@ -104,3 +213,5 @@ volumes:
    driver: local
  agent_keys:
    driver: local
  certs:
    driver: local

@@ -0,0 +1,461 @@
# Certctl Helm Chart - Complete Summary

## Overview

A production-ready Helm chart for deploying certctl (self-hosted certificate lifecycle management platform) on Kubernetes. The chart provides:

- High availability support with multi-replica deployments
- Persistent PostgreSQL database with automatic schema migration
- DaemonSet or Deployment-based agent deployment
- Comprehensive security contexts and RBAC
- Multiple deployment scenarios (dev, prod, HA, external DB)
- Full documentation and examples

## Chart Metadata

- **Name**: certctl
- **Chart Version**: 0.1.0
- **App Version**: 2.1.0
- **Type**: application
- **License**: BSL-1.1 (converts to Apache 2.0 in 2033)

## File Structure

```
deploy/helm/
├── README.md                          # Main Helm chart documentation
├── DEPLOYMENT_GUIDE.md                # Step-by-step deployment guide
├── CHART_SUMMARY.md                   # This file
│
├── certctl/
│   ├── Chart.yaml                     # Chart metadata
│   ├── values.yaml                    # Default configuration values
│   ├── .helmignore                    # Files to ignore when building chart
│   │
│   └── templates/
│       ├── _helpers.tpl               # Helm template helper functions
│       ├── NOTES.txt                  # Post-deployment notes
│       │
│       ├── server-deployment.yaml     # Certctl API server deployment
│       ├── server-service.yaml        # Server Kubernetes service
│       ├── server-configmap.yaml      # Server configuration
│       ├── server-secret.yaml         # Server secrets (API key, DB password, etc)
│       │
│       ├── postgres-statefulset.yaml  # PostgreSQL database statefulset
│       ├── postgres-service.yaml      # PostgreSQL headless service
│       ├── postgres-secret.yaml       # Database credentials secret
│       │
│       ├── agent-daemonset.yaml       # Certctl agent daemonset/deployment
│       ├── agent-configmap.yaml       # Agent configuration
│       │
│       ├── ingress.yaml               # Optional ingress resource
│       └── serviceaccount.yaml        # ServiceAccount and RBAC
│
└── examples/
    ├── values-dev.yaml                # Development/testing configuration
    ├── values-prod-ha.yaml            # Production HA configuration
    ├── values-external-db.yaml        # External PostgreSQL (RDS, Cloud SQL)
    └── values-acme-dns01.yaml         # ACME with DNS-01 (Let's Encrypt)
```

## Key Components

### 1. Server Deployment

**File**: `templates/server-deployment.yaml`

- Manages certctl API server instances
- Configurable replicas (default: 1)
- Health checks (liveness & readiness probes)
- Security context: non-root user, read-only filesystem
- Resource limits (default: 500m CPU, 512Mi memory)
- Automatic restart on failure

**Values**:
```yaml
server:
  replicas: 1
  port: 8443
  auth:
    type: api-key
    apiKey: "REQUIRED"
  resources:
    requests: {cpu: 100m, memory: 128Mi}
    limits: {cpu: 500m, memory: 512Mi}
```

### 2. PostgreSQL StatefulSet

**File**: `templates/postgres-statefulset.yaml`

- Persistent database storage
- Automatic schema migrations on startup
- Single replica (can be extended with external HA tools)
- Health checks via pg_isready
- Configurable storage size and class
- Security context: non-root user (UID 999)

**Values**:
```yaml
postgresql:
  enabled: true
  storage:
    size: 10Gi
    storageClass: ""  # Use default
  auth:
    database: certctl
    username: certctl
    password: "REQUIRED"
```

### 3. Agent DaemonSet/Deployment

**File**: `templates/agent-daemonset.yaml`

- DaemonSet mode: one agent per Kubernetes node
- Deployment mode: custom number of agent replicas
- Local key storage with secure permissions (0600)
- Health checks and automatic restart
- Optional certificate discovery from filesystem

**Values**:
```yaml
agent:
  enabled: true
  kind: DaemonSet  # or Deployment
  replicas: 1  # for Deployment only
  keyDir: /var/lib/certctl/keys
  discoveryDirs: "/etc/ssl/certs"  # optional
```

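
The 0600 key-permission expectation above can be spot-checked from a shell inside the agent container. The following is a minimal sketch, not part of the chart: `find_loose_keys` is a hypothetical helper name, and it assumes every regular file under the key directory is meant to be mode 0600.

```bash
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose mode is
# not exactly 0600. Prints nothing and returns 0 when every file is 0600;
# prints the offending paths and returns 1 otherwise.
find_loose_keys() {
  dir=$1
  loose=$(find "$dir" -type f ! -perm 600)
  if [ -n "$loose" ]; then
    printf '%s\n' "$loose"
    return 1
  fi
  return 0
}

# Only runs when KEY_DIR is set and points at an existing directory.
if [ -d "${KEY_DIR:-}" ]; then
  find_loose_keys "$KEY_DIR" || echo "files above are not mode 0600" >&2
fi
```

Running it with `KEY_DIR=/var/lib/certctl/keys` (the `agent.keyDir` default shown above) checks the agent's key store.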
### 4. Ingress (Optional)

**File**: `templates/ingress.yaml`

- Optional HTTPS ingress
- cert-manager integration for automatic TLS
- Multiple host support
- Path-based routing

**Values**:
```yaml
ingress:
  enabled: false
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: certctl.example.com
      paths:
        - path: /
          pathType: Prefix
```

### 5. ConfigMaps and Secrets

**Files**:
- `server-configmap.yaml` - Non-secret server configuration
- `server-secret.yaml` - API key, database URL, SMTP password
- `postgres-secret.yaml` - Database credentials
- `agent-configmap.yaml` - Agent configuration

All secrets are base64-encoded and stored in Kubernetes Secrets.

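
Base64 is an encoding rather than encryption, so anyone who can read a Secret object can recover its value. The sketch below illustrates the round trip with a made-up value (`dev-key`); the helper names are hypothetical, and `base64 -d` is the GNU/BusyBox spelling, which may differ on other platforms.

```bash
#!/bin/sh
# Illustration only: base64 is reversible without any key.
# "dev-key" is a placeholder value, not a real credential.

encode_secret() {
  # printf avoids the trailing newline that echo would add to the payload.
  printf '%s' "$1" | base64
}

decode_secret() {
  printf '%s' "$1" | base64 -d
}

encoded=$(encode_secret "dev-key")
decoded=$(decode_secret "$encoded")
echo "encoded: $encoded"
echo "decoded: $decoded"
```

This is why access to Secrets is usually restricted through RBAC, and why the integrations listed under Security Features are worth considering for production.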
### 6. ServiceAccount and RBAC

**File**: `templates/serviceaccount.yaml`

- Optional ServiceAccount creation
- Optional RBAC (ClusterRole, ClusterRoleBinding)
- Namespace-scoped by default

## Deployment Scenarios

### Development Setup

Use `examples/values-dev.yaml`:

```bash
helm install certctl certctl/ \
  --values examples/values-dev.yaml \
  --set server.auth.apiKey="dev-key" \
  --set postgresql.auth.password="dev-password"
```

**Features**:
- Single server replica
- Demo auth (no API key required)
- Small database (5Gi)
- LoadBalancer service for easy access
- Debug logging level

### Production HA Setup

Use `examples/values-prod-ha.yaml`:

```bash
helm install certctl certctl/ \
  --values examples/values-prod-ha.yaml \
  --set server.auth.apiKey="$(openssl rand -base64 32)" \
  --set postgresql.auth.password="$(openssl rand -base64 32)"
```

**Features**:
- 3 server replicas with pod anti-affinity
- Large database storage (100Gi)
- Pod disruption budgets
- Prometheus monitoring enabled
- Production resource limits

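
The inline `$(openssl rand -base64 32)` substitutions in the production command generate values that are never shown to the operator, and repeating the same substitution on a later `helm upgrade` would produce different values. One option is to generate the credentials once and keep them in a values file. The sketch below assumes a hypothetical file name (`secrets.values.yaml`) and hypothetical helper names; the value paths `server.auth.apiKey` and `postgresql.auth.password` are the ones used in the commands above. It reads `/dev/urandom` directly so it does not depend on `openssl`.

```bash
#!/bin/sh
# Sketch: generate credentials once and store them in a values file that
# only the current user can read. File and function names are hypothetical.

random_hex() {
  # $1 = number of random bytes; prints 2*$1 lowercase hex characters.
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

write_secret_values() {
  out=$1
  if [ -e "$out" ]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  (
    umask 077  # the new file is created with mode 0600
    printf 'server:\n  auth:\n    apiKey: "%s"\npostgresql:\n  auth:\n    password: "%s"\n' \
      "$(random_hex 32)" "$(random_hex 32)" > "$out"
  )
}
```

After `write_secret_values secrets.values.yaml`, the file can be passed as an additional `--values secrets.values.yaml` argument on both `helm install` and later `helm upgrade` runs, so the same credentials are reused.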
### External PostgreSQL

Use `examples/values-external-db.yaml`:

```bash
helm install certctl certctl/ \
  --values examples/values-external-db.yaml \
  --set postgresql.enabled=false \
  --set 'server.env.CERTCTL_DATABASE_URL=postgres://...'
```

**Use cases**:
- AWS RDS
- Google Cloud SQL
- Azure Database for PostgreSQL
- External self-managed PostgreSQL

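
The connection string is elided above as `postgres://...`. When assembling one for a managed database, reserved characters in the password (such as `@`, `/`, or `:`) need percent-encoding before they are placed in the URL. The sketch below uses hypothetical helper names (`urlencode`, `build_db_url`), handles ASCII input only, and leaves any query parameters (for example an `sslmode` setting) for the operator to append.

```bash
#!/bin/sh
# Sketch: percent-encode a string and assemble a postgres:// URL.
# Helper names are hypothetical; ASCII input is assumed.

urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

build_db_url() {
  # $1 = user, $2 = password, $3 = host, $4 = port, $5 = database
  printf 'postgres://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}
```

For example, `build_db_url certctl 'p@ss' db.example.internal 5432 certctl` (the host name is a placeholder) yields a URL whose password segment is `p%40ss`.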
### ACME with DNS-01

Use `examples/values-acme-dns01.yaml`:

```bash
helm install certctl certctl/ \
  --values examples/values-acme-dns01.yaml
```

**Enables**:
- Automatic certificate issuance from Let's Encrypt
- DNS-01 challenge (wildcard support)
- Custom DNS provider scripts

## Configuration Options

### Server Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `server.replicas` | 1 | Number of server replicas |
| `server.port` | 8443 | Server port |
| `server.auth.type` | api-key | Authentication type — `api-key` or `none` (G-1: `jwt` removed; for JWT/OIDC use a fronting authenticating gateway, see `docs/architecture.md` and `docs/upgrade-to-v2-jwt-removal.md`) |
| `server.auth.apiKey` | "" | API key (REQUIRED when `auth.type=api-key`) |
| `server.logging.level` | info | Log level |
| `server.logging.format` | json | Log format |

### PostgreSQL Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `postgresql.enabled` | true | Enable internal PostgreSQL |
| `postgresql.storage.size` | 10Gi | Database storage size |
| `postgresql.storage.storageClass` | "" | Storage class name |
| `postgresql.auth.password` | "" | Database password (REQUIRED) |

### Agent Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `agent.enabled` | true | Deploy agents |
| `agent.kind` | DaemonSet | DaemonSet or Deployment |
| `agent.replicas` | 1 | Replicas (Deployment only) |
| `agent.keyDir` | /var/lib/certctl/keys | Key storage directory |

### Issuer Configuration

| Option | Default | Description |
|--------|---------|-------------|
| `server.issuer.local.enabled` | true | Enable Local CA |
| `server.issuer.acme.enabled` | false | Enable ACME |
| `server.issuer.acme.directoryURL` | "" | ACME directory URL |
| `server.issuer.acme.email` | "" | ACME email |
| `server.issuer.acme.challengeType` | http-01 | Challenge type |

See `values.yaml` for complete configuration options.

## Helm Template Functions

Defined in `templates/_helpers.tpl`:

| Function | Purpose |
|----------|---------|
| `certctl.name` | Chart name |
| `certctl.fullname` | Full release name |
| `certctl.chart` | Chart name and version |
| `certctl.labels` | Common labels |
| `certctl.selectorLabels` | Selector labels |
| `certctl.serverSelectorLabels` | Server selector labels |
| `certctl.agentSelectorLabels` | Agent selector labels |
| `certctl.postgresSelectorLabels` | PostgreSQL selector labels |
| `certctl.serviceAccountName` | ServiceAccount name |
| `certctl.serverImage` | Server image URI |
| `certctl.agentImage` | Agent image URI |
| `certctl.postgresImage` | PostgreSQL image URI |
| `certctl.databaseURL` | Database connection string |
| `certctl.serverURL` | Server URL for agents |

## Security Features

### Pod Security

- Non-root users (UID 1000 for app, UID 999 for PostgreSQL)
- Read-only root filesystems
- No privilege escalation
- Dropped capabilities (ALL)
- Resource limits to prevent DoS

### Secrets Management

- All sensitive data in Kubernetes Secrets
- Base64-encoded in the Secret objects (an encoding, not encryption; confidentiality at rest depends on cluster-level encryption or one of the integrations below)
- Can be integrated with:
  - sealed-secrets
  - external-secrets
  - Vault
  - AWS Secrets Manager

### RBAC

- ServiceAccount per release
- Optional ClusterRole/ClusterRoleBinding
- Extensible for custom permissions

### Network Security

- Support for Kubernetes NetworkPolicies
- Service-to-service communication via internal DNS
- Optional Ingress with TLS

## Monitoring and Observability

### Health Checks

- Liveness probes (detect dead containers)
- Readiness probes (detect not-ready services)
- HTTP endpoints: `/health`, `/readyz`

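
When scripting around a deployment, the same poll-until-healthy idea behind the probes can be reproduced from an operator shell. The following is a minimal sketch of a bounded retry loop; `wait_until` is a hypothetical helper, and the command it runs (for instance a `curl -f` against `/health`) is supplied by the caller.

```bash
#!/bin/sh
# Sketch: run a command until it succeeds or the attempt budget is spent.
# $1 = max attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    n=$((n + 1))
    if [ "$n" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

A call such as `wait_until 10 10 curl -fsS https://certctl.example.com/health` (the host name is a placeholder) gives up after ten failed attempts spaced ten seconds apart.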
### Logging

- Structured JSON logging
- Request ID propagation
- Configurable log levels (debug, info, warn, error)

### Metrics

- Prometheus metrics endpoint: `/api/v1/metrics/prometheus`
- Optional ServiceMonitor for Prometheus Operator
- Built-in metrics:
  - Certificate counts by status
  - Agent counts and status
  - Job completion/failure rates
  - Server uptime

## Installation Quick Reference

```bash
# Development
helm install certctl certctl/ \
  --set server.auth.apiKey=dev \
  --set postgresql.auth.password=dev

# Production HA
helm install certctl certctl/ \
  --values examples/values-prod-ha.yaml \
  --set server.auth.apiKey="$(openssl rand -base64 32)" \
  --set postgresql.auth.password="$(openssl rand -base64 32)"

# External database
helm install certctl certctl/ \
  --values examples/values-external-db.yaml \
  --set postgresql.enabled=false \
  --set 'server.env.CERTCTL_DATABASE_URL=postgres://...'

# ACME with Let's Encrypt
helm install certctl certctl/ \
  --set server.issuer.acme.enabled=true \
  --set server.issuer.acme.directoryURL=https://acme-v02.api.letsencrypt.org/directory

# Check status
kubectl get pods -l app.kubernetes.io/instance=certctl
kubectl logs -l app.kubernetes.io/component=server -f

# Upgrade
helm upgrade certctl certctl/ -f new-values.yaml

# Uninstall
helm uninstall certctl
```

## Best Practices

### 1. Use Secrets Management

```bash
# Use sealed-secrets
kubectl create secret generic certctl-secrets \
  --from-literal=api-key="$(openssl rand -base64 32)" \
  --dry-run=client -o yaml | kubeseal -f - | kubectl apply -f -
```

### 2. Configure Resource Limits

Match limits to your cluster capacity:

```yaml
server:
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {cpu: 1000m, memory: 512Mi}
```

### 3. Enable HA for Production

```yaml
server:
  replicas: 3
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution: [...]
```

### 4. Use Persistent Storage

```yaml
postgresql:
  storage:
    size: 100Gi
    storageClass: fast-ssd
```

### 5. Enable Monitoring

```yaml
monitoring:
  enabled: true
  serviceMonitor:
    enabled: true
```

## Documentation

- **README.md** - Complete Helm chart documentation
- **DEPLOYMENT_GUIDE.md** - Step-by-step deployment instructions
- **values.yaml** - Commented configuration reference

## Support

For issues, questions, or contributions:
- GitHub: https://github.com/shankar0123/certctl
- Documentation: https://github.com/shankar0123/certctl/tree/main/docs

## License

BSL-1.1 (Business Source License)
Converts to Apache 2.0 on March 14, 2033
@@ -0,0 +1,518 @@
|
||||
# Certctl Helm Deployment Guide
|
||||
|
||||
Complete guide for deploying certctl on Kubernetes with Helm.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Prerequisites](#prerequisites)
|
||||
2. [Installation Methods](#installation-methods)
|
||||
3. [Production Deployment](#production-deployment)
|
||||
4. [Configuration Examples](#configuration-examples)
|
||||
5. [Post-Deployment Setup](#post-deployment-setup)
|
||||
6. [Monitoring and Logging](#monitoring-and-logging)
|
||||
7. [Maintenance](#maintenance)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Required Tools
|
||||
|
||||
```bash
|
||||
# Verify Kubernetes cluster access
|
||||
kubectl cluster-info
|
||||
kubectl get nodes
|
||||
|
||||
# Install Helm (if not already installed)
|
||||
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
|
||||
helm version
|
||||
|
||||
# Verify Helm installation
|
||||
helm repo list
|
||||
```
|
||||
|
||||
### Kubernetes Requirements
|
||||
|
||||
- Kubernetes 1.19 or later
|
||||
- At least 2 GB of available memory
|
||||
- At least 10 GB of available storage (for PostgreSQL)
|
||||
- Network policies support (optional, for security)
|
||||
- Ingress controller (nginx, istio, etc.) - optional
|
||||
|
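The version requirement above can be checked before installing. The following is a minimal sketch, not something the chart ships: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS builds). The commented usage additionally assumes `jq` is installed.

```bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on version sort (`sort -V`): the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical usage: strip the leading "v" and any vendor suffix first.
# server_version=$(kubectl version -o json | jq -r '.serverVersion.gitVersion' | sed 's/^v//; s/[-+].*//')
# version_ge "$server_version" "1.19" || echo "Kubernetes 1.19 or later is required" >&2
```

Comparing with version sort rather than plain string comparison matters here, because `1.9` would otherwise compare as newer than `1.19`.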
||||
### Create Namespace
|
||||
|
||||
```bash
|
||||
# Create isolated namespace
|
||||
kubectl create namespace certctl
|
||||
|
||||
# Set as default namespace
|
||||
kubectl config set-context --current --namespace=certctl
|
||||
|
||||
# Label for network policies (optional)
|
||||
kubectl label namespace certctl certctl-ns=true
|
||||
```
|
||||
|
||||
## Installation Methods
|
||||
|
||||
### Method 1: Minimal Development Setup
|
||||
|
||||
Perfect for testing and development:
|
||||
|
||||
```bash
|
||||
# Install with minimal configuration
|
||||
helm install certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
--set server.auth.apiKey="dev-key-change-in-production" \
|
||||
--set postgresql.auth.password="dev-password-change-in-production"
|
||||
|
||||
# Wait for deployment
|
||||
kubectl rollout status deployment/certctl-server
|
||||
kubectl rollout status statefulset/certctl-postgres
|
||||
```
|
||||
|
||||
### Method 2: Production HA Setup
|
||||
|
||||
For production workloads:
|
||||
|
||||
```bash
|
||||
# Generate secure credentials
|
||||
API_KEY=$(openssl rand -base64 32)
|
||||
DB_PASSWORD=$(openssl rand -base64 32)
|
||||
|
||||
# Install with HA configuration
|
||||
helm install certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
--values deploy/helm/examples/values-prod-ha.yaml \
|
||||
--set server.auth.apiKey="$API_KEY" \
|
||||
--set postgresql.auth.password="$DB_PASSWORD"
|
||||
```
|
||||
|
||||
### Method 3: External PostgreSQL
|
||||
|
||||
Using a managed database service:
|
||||
|
||||
```bash
|
||||
# Install with external database
|
||||
helm install certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
--values deploy/helm/examples/values-external-db.yaml \
|
||||
--set server.auth.apiKey="$API_KEY" \
|
||||
--set 'server.env.CERTCTL_DATABASE_URL=postgres://user:pass@db.example.com:5432/certctl?sslmode=require'
|
||||
```
|
||||
|
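Passwords generated with `openssl rand -base64 32` can contain characters such as `/` that are not valid unescaped in the user-info part of a connection URL. Percent-encoding everything outside the RFC 3986 unreserved set is a safe choice. A minimal sketch, assuming ASCII-only passwords; `urlencode` and the `ENCODED` variable are hypothetical names used for illustration:

```bash
# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Assumes ASCII input. Written as a subshell function so variables stay local.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Hypothetical usage with the variables from this guide:
# ENCODED=$(urlencode "$DB_PASSWORD")
# helm install ... --set "server.env.CERTCTL_DATABASE_URL=postgres://user:${ENCODED}@db.example.com:5432/certctl?sslmode=require"
```

Encoding only the password, rather than the whole URL, keeps the `://`, `@`, and `?` delimiters intact.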
||||
### Method 4: Using Custom values.yaml
|
||||
|
||||
Recommended for GitOps workflows:
|
||||
|
||||
```bash
|
||||
# Create values file with secrets management
|
||||
cat > /tmp/certctl-values.yaml <<EOF
|
||||
server:
|
||||
auth:
|
||||
apiKey: "$API_KEY"
|
||||
logging:
|
||||
level: info
|
||||
|
||||
postgresql:
|
||||
auth:
|
||||
password: "$DB_PASSWORD"
|
||||
storage:
|
||||
size: 50Gi
|
||||
|
||||
agent:
|
||||
enabled: true
|
||||
kind: DaemonSet
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
hosts:
|
||||
- host: certctl.example.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
EOF
|
||||
|
||||
# Install using values file
|
||||
helm install certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
--values /tmp/certctl-values.yaml
|
||||
```
|
||||
|
||||
## Production Deployment
|
||||
|
||||
### Step 1: Prepare Environment
|
||||
|
||||
```bash
|
||||
# Create namespace
|
||||
kubectl create namespace certctl
|
||||
cd deploy/helm
|
||||
|
||||
# Generate credentials
|
||||
API_KEY=$(openssl rand -base64 32)
|
||||
DB_PASSWORD=$(openssl rand -base64 32)
|
||||
|
||||
echo "API Key: $API_KEY"
|
||||
echo "DB Password: $DB_PASSWORD"
|
||||
|
||||
# Save credentials in secure location (e.g., 1Password, Vault, AWS Secrets Manager)
|
||||
```
|
||||
|
||||
### Step 2: Prepare Storage
|
||||
|
||||
```bash
|
||||
# List available storage classes
|
||||
kubectl get storageclass
|
||||
|
||||
# If needed, create a high-performance storage class for production
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: fast-ssd
|
||||
provisioner: ebs.csi.aws.com # For AWS, adjust for your cloud provider
|
||||
parameters:
|
||||
type: gp3
|
||||
iops: "3000"
|
||||
throughput: "125"
|
||||
EOF
|
||||
```
|
||||
|
||||
### Step 3: Set Up TLS with cert-manager
|
||||
|
||||
```bash
|
||||
# Install cert-manager (if not already installed)
|
||||
helm repo add jetstack https://charts.jetstack.io
|
||||
helm repo update
|
||||
helm install cert-manager jetstack/cert-manager \
|
||||
--namespace cert-manager \
|
||||
--create-namespace \
|
||||
--set installCRDs=true
|
||||
|
||||
# Create ClusterIssuer for Let's Encrypt
|
||||
kubectl apply -f - <<EOF
|
||||
apiVersion: cert-manager.io/v1
|
||||
kind: ClusterIssuer
|
||||
metadata:
|
||||
name: letsencrypt-prod
|
||||
spec:
|
||||
acme:
|
||||
server: https://acme-v02.api.letsencrypt.org/directory
|
||||
email: admin@example.com
|
||||
privateKeySecretRef:
|
||||
name: letsencrypt-prod
|
||||
solvers:
|
||||
- http01:
|
||||
ingress:
|
||||
class: nginx
|
||||
EOF
|
||||
```
|
||||
|
||||
### Step 4: Install Certctl
|
||||
|
||||
```bash
|
||||
# Install using HA values
|
||||
helm install certctl certctl/ \
|
||||
--namespace certctl \
|
||||
--values examples/values-prod-ha.yaml \
|
||||
--set server.auth.apiKey="$API_KEY" \
|
||||
--set postgresql.auth.password="$DB_PASSWORD" \
|
||||
--set ingress.annotations."cert-manager\.io/cluster-issuer"=letsencrypt-prod \
|
||||
--set 'ingress.hosts[0].host=certctl.example.com'
|
||||
|
||||
# Verify installation
|
||||
kubectl get all -l app.kubernetes.io/instance=certctl
|
||||
```
|
||||
|
||||
### Step 5: Verify Deployment
|
||||
|
||||
```bash
|
||||
# Check pod status
|
||||
kubectl get pods -l app.kubernetes.io/instance=certctl
|
||||
kubectl describe pods -l app.kubernetes.io/instance=certctl
|
||||
|
||||
# Check service status
|
||||
kubectl get svc -l app.kubernetes.io/instance=certctl
|
||||
|
||||
# Check ingress status
|
||||
kubectl get ingress
|
||||
kubectl describe ingress certctl
|
||||
|
||||
# Test API connectivity (HTTPS-only as of v2.2)
|
||||
POD=$(kubectl get pods -l app.kubernetes.io/component=server -o jsonpath='{.items[0].metadata.name}')
|
||||
kubectl port-forward $POD 8443:8443 &
|
||||
# If the chart provisioned a self-signed cert, fetch the CA bundle from the TLS secret first:
|
||||
# kubectl get secret certctl-server-tls -o jsonpath='{.data.ca\.crt}' | base64 -d > /tmp/certctl-ca.crt
|
||||
curl --cacert /tmp/certctl-ca.crt -H "Authorization: Bearer $API_KEY" https://localhost:8443/health
|
||||
```
|
||||
|
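Because `kubectl port-forward` is started in the background, the first `curl` can run before the tunnel is ready. A small retry helper avoids that race. This is a sketch only; `retry` is a hypothetical helper, not something the chart provides:

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# The subshell body keeps the helper's variables out of the caller's shell.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
)

# Usage with the health check from this step:
# retry 10 2 curl --silent --fail --cacert /tmp/certctl-ca.crt \
#   -H "Authorization: Bearer $API_KEY" https://localhost:8443/health
```

The `--fail` flag makes `curl` return a non-zero status on HTTP errors, so the helper keeps retrying until the endpoint actually answers successfully.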
||||
### Step 6: Access the Dashboard
|
||||
|
||||
```bash
|
||||
# Port forward to local machine
|
||||
kubectl port-forward svc/certctl-server 8443:8443 &
|
||||
|
||||
# Or if using Ingress:
|
||||
# Open browser: https://certctl.example.com
|
||||
# Login with API key: $API_KEY
|
||||
```
|
||||
|
||||
## Configuration Examples
|
||||
|
||||
### Example 1: ACME (Let's Encrypt)
|
||||
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set server.issuer.acme.enabled=true \
|
||||
--set server.issuer.acme.directoryURL=https://acme-v02.api.letsencrypt.org/directory \
|
||||
--set server.issuer.acme.email=admin@example.com \
|
||||
--set server.issuer.acme.challengeType=http-01
|
||||
```
|
||||
|
||||
### Example 2: DNS-01 (Wildcard Certs)
|
||||
|
||||
Requires DNS scripts ConfigMap:
|
||||
|
||||
```bash
|
||||
# Create DNS scripts ConfigMap
|
||||
kubectl create configmap dns-scripts \
|
||||
--from-file=dns-present.sh=./scripts/dns-present.sh \
|
||||
--from-file=dns-cleanup.sh=./scripts/dns-cleanup.sh
|
||||
|
||||
# Install with DNS-01
|
||||
helm install certctl certctl/ \
|
||||
--set server.issuer.acme.enabled=true \
|
||||
--set server.issuer.acme.challengeType=dns-01 \
|
||||
--values examples/values-acme-dns01.yaml
|
||||
```
|
||||
|
||||
### Example 3: AWS RDS Database
|
||||
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set postgresql.enabled=false \
|
||||
--set 'server.env.CERTCTL_DATABASE_URL=postgres://user:password@mydb.c9akciq32.us-east-1.rds.amazonaws.com:5432/certctl?sslmode=require'
|
||||
```
|
||||
|
||||
### Example 4: Multiple Issuers
|
||||
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set server.issuer.local.enabled=true \
|
||||
--set server.issuer.acme.enabled=true \
|
||||
--set server.issuer.acme.directoryURL=https://acme-v02.api.letsencrypt.org/directory
|
||||
```
|
||||
|
||||
### Example 5: Email Notifications
|
||||
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set server.smtp.enabled=true \
|
||||
--set server.smtp.host=smtp.example.com \
|
||||
--set server.smtp.port=587 \
|
||||
--set server.smtp.username=alerts@example.com \
|
||||
--set server.smtp.password="$SMTP_PASSWORD" \
|
||||
--set server.smtp.fromAddress=certctl@example.com
|
||||
```
|
||||
|
||||
## Post-Deployment Setup
|
||||
|
||||
### 1. Initial Database Setup
|
||||
|
||||
```bash
|
||||
# Check database connection
|
||||
POD=$(kubectl get pods -l app.kubernetes.io/component=postgres -o jsonpath='{.items[0].metadata.name}')
|
||||
|
||||
# Execute psql commands
|
||||
kubectl exec -it $POD -- \
|
||||
psql -U certctl -d certctl -c '\dt'
|
||||
|
||||
# View database status
|
||||
kubectl logs $POD | tail -20
|
||||
```
|
||||
|
||||
### 2. Create Default Certificates
|
||||
|
||||
```bash
|
||||
# Port forward to API
|
||||
kubectl port-forward svc/certctl-server 8443:8443 &
|
||||
|
||||
# Create a test certificate (HTTPS-only as of v2.2 — pin the chart-provisioned CA bundle)
|
||||
# kubectl get secret certctl-server-tls -o jsonpath='{.data.ca\.crt}' | base64 -d > /tmp/certctl-ca.crt
|
||||
API_KEY="your-api-key"
|
||||
curl --cacert /tmp/certctl-ca.crt -X POST https://localhost:8443/api/v1/certificates \
|
||||
-H "Authorization: Bearer $API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"common_name": "test.example.com",
|
||||
"sans": ["test.example.com", "*.example.com"],
|
||||
"owner": "admin@example.com"
|
||||
}'
|
||||
```
|
||||
|
||||
### 3. Configure Agents
|
||||
|
||||
```bash
|
||||
# Get agent names
|
||||
kubectl get pods -l app.kubernetes.io/component=agent -o wide
|
||||
|
||||
# Check agent connectivity
|
||||
POD=$(kubectl get pods -l app.kubernetes.io/component=agent -o jsonpath='{.items[0].metadata.name}')
|
||||
kubectl logs $POD | grep -i heartbeat
|
||||
```
|
||||
|
||||
### 4. Set Up HTTPS for Web Dashboard
|
||||
|
||||
The Ingress terminates TLS once `ingress.tls` and a cert-manager issuer annotation are configured:
|
||||
|
||||
```bash
|
||||
# Verify ingress is ready
|
||||
kubectl get ingress
|
||||
kubectl describe ingress certctl
|
||||
|
||||
# Test HTTPS
|
||||
curl https://certctl.example.com/health
|
||||
```
|
||||
|
||||
## Monitoring and Logging
|
||||
|
||||
### 1. View Logs
|
||||
|
||||
```bash
|
||||
# Server logs
|
||||
kubectl logs -l app.kubernetes.io/component=server -f --all-containers=true
|
||||
|
||||
# PostgreSQL logs
|
||||
kubectl logs -l app.kubernetes.io/component=postgres -f
|
||||
|
||||
# Agent logs
|
||||
kubectl logs -l app.kubernetes.io/component=agent -f --all-containers=true
|
||||
|
||||
# Logs from all components
|
||||
kubectl logs -l app.kubernetes.io/instance=certctl -f --all-containers=true
|
||||
```
|
||||
|
||||
### 2. Install Prometheus Monitoring
|
||||
|
||||
```bash
|
||||
# Install Prometheus operator (if not already installed)
|
||||
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
|
||||
helm repo update
|
||||
|
||||
helm install prometheus prometheus-community/kube-prometheus-stack \
|
||||
--namespace monitoring \
|
||||
--create-namespace
|
||||
|
||||
# Certctl will automatically expose metrics if monitoring.enabled=true
|
||||
helm upgrade certctl certctl/ --reuse-values \
|
||||
--set monitoring.enabled=true \
|
||||
--set monitoring.serviceMonitor.enabled=true
|
||||
```
|
||||
|
||||
### 3. Set Up Alerts
|
||||
|
||||
```bash
|
||||
# Create Prometheus alerts
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: monitoring.coreos.com/v1
|
||||
kind: PrometheusRule
|
||||
metadata:
|
||||
name: certctl-alerts
|
||||
spec:
|
||||
groups:
|
||||
- name: certctl
|
||||
interval: 30s
|
||||
rules:
|
||||
- alert: CertctlServerDown
|
||||
expr: up{job="certctl-server"} == 0
|
||||
for: 5m
|
||||
annotations:
|
||||
summary: "Certctl server is down"
|
||||
|
||||
- alert: CertificateExpiringSoon
|
||||
expr: certctl_certificate_expiring_soon > 0
|
||||
for: 1h
|
||||
annotations:
|
||||
summary: "{{ \$value }} certificates expiring soon"
|
||||
EOF
|
||||
```
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Scaling
|
||||
|
||||
```bash
|
||||
# Scale server replicas
|
||||
helm upgrade certctl certctl/ --reuse-values \
|
||||
--set server.replicas=5
|
||||
|
||||
# Scale agents (Deployment kind only)
|
||||
helm upgrade certctl certctl/ --reuse-values \
|
||||
--set agent.kind=Deployment \
|
||||
--set agent.replicas=10
|
||||
```
|
||||
|
||||
### Updating
|
||||
|
||||
```bash
|
||||
# Update chart version
|
||||
helm repo update
|
||||
helm upgrade certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
-f values.yaml
|
||||
|
||||
# Verify update
|
||||
kubectl rollout status deployment/certctl-server
|
||||
kubectl rollout status statefulset/certctl-postgres
|
||||
```
|
||||
|
||||
### Backup and Restore
|
||||
|
||||
```bash
|
||||
# Backup PostgreSQL data
|
||||
kubectl exec -i $(kubectl get pods -l app.kubernetes.io/component=postgres -o jsonpath='{.items[0].metadata.name}') \
|
||||
-- pg_dump -U certctl certctl | gzip > certctl-backup.sql.gz
|
||||
|
||||
# Restore from backup
|
||||
gzip -dc certctl-backup.sql.gz | kubectl exec -i $(kubectl get pods -l app.kubernetes.io/component=postgres -o jsonpath='{.items[0].metadata.name}') \
|
||||
-- psql -U certctl certctl
|
||||
|
||||
# Backup PVC data
|
||||
kubectl get pvc
|
||||
kubectl exec -i $(kubectl get pods -l app.kubernetes.io/component=postgres -o jsonpath='{.items[0].metadata.name}') \
|
||||
-- tar czf - /var/lib/postgresql/data > certctl-data-backup.tar.gz
|
||||
```
|
||||
|
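The `kubectl exec ... | gzip` pipeline writes a well-formed gzip file even when `pg_dump` itself fails, so it is worth checking a backup before relying on it. A minimal sketch, assuming a plain-format dump, which `pg_dump` normally terminates with a `PostgreSQL database dump complete` comment; `verify_backup` is a hypothetical helper name:

```bash
# verify_backup FILE
# Succeeds only if FILE is a readable gzip archive that contains pg_dump's
# completion marker (assumption: plain-text dump format).
verify_backup() {
  if ! gzip -t "$1" 2>/dev/null; then
    echo "missing or not a valid gzip file: $1" >&2
    return 1
  fi
  if ! gzip -dc "$1" | grep -q 'PostgreSQL database dump complete'; then
    echo "dump looks truncated or empty: $1" >&2
    return 1
  fi
}

# Usage:
# verify_backup certctl-backup.sql.gz && echo "backup OK"
```

This only confirms that the dump ran to completion. A periodic test restore into a scratch database remains the stronger check.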
||||
### Uninstall
|
||||
|
||||
```bash
|
||||
# Remove Helm release (keeps PVCs by default)
|
||||
helm uninstall certctl --namespace certctl
|
||||
|
||||
# Delete PVCs if needed
|
||||
kubectl delete pvc --all -n certctl
|
||||
|
||||
# Delete namespace
|
||||
kubectl delete namespace certctl
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
See [README.md](README.md#troubleshooting) for detailed troubleshooting steps.
|
||||
|
||||
Common commands:
|
||||
|
||||
```bash
|
||||
# Get all resources
|
||||
kubectl get all -n certctl
|
||||
|
||||
# Describe pod for events
|
||||
kubectl describe pod <pod-name> -n certctl
|
||||
|
||||
# Stream logs
|
||||
kubectl logs -f <pod-name> -n certctl
|
||||
|
||||
# Execute commands in pod
|
||||
kubectl exec -it <pod-name> -n certctl -- /bin/sh
|
||||
|
||||
# Check events
|
||||
kubectl get events -n certctl --sort-by='.lastTimestamp'
|
||||
```
|
||||
@@ -0,0 +1,234 @@
|
||||
# Certctl Helm Chart - Complete File Index
|
||||
|
||||
## Navigation Guide
|
||||
|
||||
### Getting Started
|
||||
|
||||
1. **Start here**: `INSTALLATION.md` - Quick installation guide with one-liners
|
||||
2. **Full reference**: `README.md` - Complete Helm chart documentation
|
||||
3. **Detailed guide**: `DEPLOYMENT_GUIDE.md` - Step-by-step deployment walkthrough
|
||||
4. **Architecture**: `CHART_SUMMARY.md` - Technical overview and design
|
||||
|
||||
### Chart Directory Structure
|
||||
|
||||
```
|
||||
deploy/helm/
|
||||
│
|
||||
├── README.md Main documentation (15 KB)
|
||||
├── DEPLOYMENT_GUIDE.md Step-by-step guide (12 KB)
|
||||
├── CHART_SUMMARY.md Architecture & design (13 KB)
|
||||
├── INSTALLATION.md Quick start (2.2 KB)
|
||||
├── INDEX.md This file
|
||||
│
|
||||
├── certctl/ Helm chart package
|
||||
│ ├── Chart.yaml Chart metadata
|
||||
│ ├── values.yaml Default configuration (11 KB)
|
||||
│ ├── .helmignore Build ignore patterns
|
||||
│ │
|
||||
│ └── templates/ 13 template files
|
||||
│ ├── _helpers.tpl Helper functions
|
||||
│ ├── NOTES.txt Post-install notes
|
||||
│ ├── server-deployment.yaml API server
|
||||
│ ├── server-service.yaml Server networking
|
||||
│ ├── server-configmap.yaml Server configuration
|
||||
│ ├── server-secret.yaml Server secrets
|
||||
│ ├── postgres-statefulset.yaml Database
|
||||
│ ├── postgres-service.yaml Database networking
|
||||
│ ├── postgres-secret.yaml Database secrets
|
||||
│ ├── agent-daemonset.yaml Agents (DaemonSet/Deployment)
|
||||
│ ├── agent-configmap.yaml Agent configuration
|
||||
│ ├── ingress.yaml Optional HTTPS ingress
|
||||
│ └── serviceaccount.yaml RBAC resources
|
||||
│
|
||||
└── examples/ Example configurations
|
||||
├── values-dev.yaml Development setup
|
||||
├── values-prod-ha.yaml Production HA setup
|
||||
├── values-external-db.yaml External PostgreSQL
|
||||
└── values-acme-dns01.yaml ACME DNS-01 configuration
|
||||
```
|
||||
|
||||
## File Descriptions
|
||||
|
||||
### Documentation Files
|
||||
|
||||
| File | Purpose | Size |
|
||||
|------|---------|------|
|
||||
| `README.md` | Complete Helm chart documentation, configuration reference, security considerations | 15 KB |
|
||||
| `DEPLOYMENT_GUIDE.md` | Step-by-step installation instructions, production setup, troubleshooting | 12 KB |
|
||||
| `CHART_SUMMARY.md` | Technical overview, architecture, features, best practices | 13 KB |
|
||||
| `INSTALLATION.md` | Quick start guide, one-liner commands, verification steps | 2.2 KB |
|
||||
| `INDEX.md` | This file - complete file index and navigation | - |
|
||||
|
||||
### Chart Files
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `Chart.yaml` | Helm chart metadata (name, version, appVersion, license) |
|
||||
| `values.yaml` | Default configuration values with comprehensive comments |
|
||||
| `.helmignore` | Files to ignore when building the chart |
|
||||
|
||||
### Template Files
|
||||
|
||||
| File | Components Created |
|
||||
|------|-------------------|
|
||||
| `_helpers.tpl` | 14 Helm template helper functions |
|
||||
| `NOTES.txt` | Post-installation notes and instructions |
|
||||
| `server-deployment.yaml` | Certctl API server deployment (1-N replicas) |
|
||||
| `server-service.yaml` | Service exposing the server |
|
||||
| `server-configmap.yaml` | Non-secret server configuration |
|
||||
| `server-secret.yaml` | Secrets (API key, DB password, SMTP) |
|
||||
| `postgres-statefulset.yaml` | PostgreSQL database with persistent storage |
|
||||
| `postgres-service.yaml` | Headless service for PostgreSQL |
|
||||
| `postgres-secret.yaml` | Database credentials |
|
||||
| `agent-daemonset.yaml` | Certctl agents (DaemonSet or Deployment) |
|
||||
| `agent-configmap.yaml` | Agent configuration |
|
||||
| `ingress.yaml` | Optional HTTPS ingress resource |
|
||||
| `serviceaccount.yaml` | ServiceAccount and RBAC resources |
|
||||
|
||||
### Example Configuration Files
|
||||
|
||||
| File | Use Case | Features |
|
||||
|------|----------|----------|
|
||||
| `values-dev.yaml` | Development/testing | Single replica, debug logging, LoadBalancer, no auth |
|
||||
| `values-prod-ha.yaml` | Production HA | 3 replicas, pod anti-affinity, monitoring, large storage |
|
||||
| `values-external-db.yaml` | External PostgreSQL | AWS RDS, Cloud SQL, Azure Database, self-managed |
|
||||
| `values-acme-dns01.yaml` | Let's Encrypt | DNS-01 challenges, wildcard certs, custom DNS scripts |
|
||||
|
||||
## Quick Links
|
||||
|
||||
### Installation Commands
|
||||
|
||||
#### Development
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set server.auth.type=none \
|
||||
--set postgresql.auth.password=dev
|
||||
```
|
||||
|
||||
#### Production HA
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--values examples/values-prod-ha.yaml \
|
||||
--set server.auth.apiKey="$(openssl rand -base64 32)" \
|
||||
--set postgresql.auth.password="$(openssl rand -base64 32)"
|
||||
```
|
||||
|
||||
#### External Database
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--values examples/values-external-db.yaml \
|
||||
--set postgresql.enabled=false \
|
||||
--set 'server.env.CERTCTL_DATABASE_URL=postgres://...'
|
||||
```
|
||||
|
||||
### Verification Commands
|
||||
|
||||
```bash
|
||||
# Check chart syntax
|
||||
helm lint certctl/
|
||||
helm template certctl certctl/
|
||||
|
||||
# Install in cluster
|
||||
helm install certctl certctl/
|
||||
helm status certctl
|
||||
|
||||
# Check pod status
|
||||
kubectl get pods -l app.kubernetes.io/instance=certctl
|
||||
|
||||
# View logs
|
||||
kubectl logs -l app.kubernetes.io/component=server -f
|
||||
```
|
||||
|
||||
## Documentation Organization
|
||||
|
||||
### By User Role
|
||||
|
||||
**DevOps/Platform Engineers**
|
||||
- Start: `INSTALLATION.md`
|
||||
- Deep dive: `DEPLOYMENT_GUIDE.md`
|
||||
- Configuration reference: `README.md`
|
||||
|
||||
**Kubernetes Developers**
|
||||
- Architecture: `CHART_SUMMARY.md`
|
||||
- Configuration: `values.yaml`
|
||||
- Templates: `templates/`
|
||||
|
||||
**Security/SREs**
|
||||
- Security section: `README.md#security-considerations`
|
||||
- RBAC: `templates/serviceaccount.yaml`
|
||||
- Network policies: `DEPLOYMENT_GUIDE.md#network-policies`
|
||||
|
||||
**Database Administrators**
|
||||
- PostgreSQL config: `values.yaml` (postgresql section)
|
||||
- External DB setup: `examples/values-external-db.yaml`
|
||||
- Backup/restore: `DEPLOYMENT_GUIDE.md#backup-and-restore`
|
||||
|
||||
### By Task
|
||||
|
||||
**Getting Started**
|
||||
1. Read: `INSTALLATION.md`
|
||||
2. Install: `helm install certctl certctl/`
|
||||
3. Verify: Run commands in `INSTALLATION.md`
|
||||
|
||||
**Production Deployment**
|
||||
1. Read: `DEPLOYMENT_GUIDE.md`
|
||||
2. Choose: `examples/values-prod-ha.yaml`
|
||||
3. Deploy: Follow step-by-step guide
|
||||
4. Reference: `README.md` for detailed options
|
||||
|
||||
**Troubleshooting**
|
||||
- Common issues: `README.md#troubleshooting`
|
||||
- Detailed guide: `DEPLOYMENT_GUIDE.md#troubleshooting`
|
||||
- Error messages: kubectl logs and events
|
||||
|
||||
**Configuration**
|
||||
- All options: `values.yaml`
|
||||
- Examples: `examples/values-*.yaml`
|
||||
- Detailed docs: `README.md#configuration`
|
||||
|
||||
## Key Features
|
||||
|
||||
### High Availability
|
||||
- Multi-replica server deployment
|
||||
- Pod anti-affinity
|
||||
- StatefulSet for database
|
||||
- Pod disruption budgets
|
||||
|
||||
### Security
|
||||
- Non-root containers
|
||||
- Read-only filesystems
|
||||
- RBAC support
|
||||
- Kubernetes Secrets
|
||||
- Network policies
|
||||
|
||||
### Flexibility
|
||||
- Multiple issuers (Local CA, ACME, step-ca, OpenSSL)
|
||||
- Internal or external PostgreSQL
|
||||
- DaemonSet or Deployment agents
|
||||
- Optional Ingress with TLS
|
||||
- Email notifications
|
||||
|
||||
### Observability
|
||||
- Health checks
|
||||
- Structured logging
|
||||
- Prometheus metrics
|
||||
- ServiceMonitor support
|
||||
|
||||
## Support
|
||||
|
||||
- **GitHub**: https://github.com/shankar0123/certctl
|
||||
- **Issues**: Report on GitHub issues
|
||||
- **Documentation**: All docs are in `deploy/helm/`
|
||||
|
||||
## File Statistics
|
||||
|
||||
- **Total files**: 24
|
||||
- **Documentation**: 4 files (42 KB)
|
||||
- **Chart files**: 3 files
|
||||
- **Templates**: 13 files
|
||||
- **Examples**: 4 files
|
||||
- **Total size**: 144 KB
|
||||
|
||||
## License
|
||||
|
||||
All files are covered under the BSL-1.1 license (converts to Apache 2.0 in 2033).
|
||||
@@ -0,0 +1,97 @@
|
||||
# Quick Installation Guide
|
||||
|
||||
## One-Liner Installation
|
||||
|
||||
### Development (no auth)
|
||||
```bash
|
||||
helm install certctl certctl/ \
|
||||
--set server.auth.type=none \
|
||||
--set postgresql.auth.password=dev
|
||||
```
|
||||
|
||||
### Production (with API key)
|
||||
```bash
|
||||
API_KEY=$(openssl rand -base64 32)
|
||||
DB_PASSWORD=$(openssl rand -base64 32)
|
||||
|
||||
helm install certctl certctl/ \
|
||||
--values examples/values-prod-ha.yaml \
|
||||
--set server.auth.apiKey="$API_KEY" \
|
||||
--set postgresql.auth.password="$DB_PASSWORD"
|
||||
```
|
||||
|
||||
## Verify Installation
|
||||
|
||||
```bash
|
||||
# Wait for pods to be ready
|
||||
kubectl rollout status deployment/certctl-server
|
||||
kubectl rollout status statefulset/certctl-postgres
|
||||
|
||||
# Check all components
|
||||
kubectl get pods -l app.kubernetes.io/instance=certctl
|
||||
|
||||
# View server logs
|
||||
kubectl logs -l app.kubernetes.io/component=server -f
|
||||
|
||||
# Access the API (HTTPS-only as of v2.2; use --cacert or -k depending on your cert provisioning)
|
||||
kubectl port-forward svc/certctl-server 8443:8443 &
|
||||
# If the chart provisioned a self-signed cert, fetch the CA bundle from the secret first:
|
||||
# kubectl get secret certctl-server-tls -o jsonpath='{.data.ca\.crt}' | base64 -d > /tmp/certctl-ca.crt
|
||||
curl --cacert /tmp/certctl-ca.crt https://localhost:8443/health
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Read Documentation**
|
||||
- `README.md` - Complete reference
|
||||
- `DEPLOYMENT_GUIDE.md` - Step-by-step guide
|
||||
- `CHART_SUMMARY.md` - Architecture overview
|
||||
|
||||
2. **Configure for Your Environment**
|
||||
- Review `examples/` for your deployment scenario
|
||||
- Customize `values.yaml` as needed
|
||||
- Use `helm upgrade` to apply changes
|
||||
|
||||
3. **Set Up Monitoring**
|
||||
- Install Prometheus (optional)
|
||||
- Enable Ingress with HTTPS
|
||||
- Configure email notifications
|
||||
|
||||
4. **Deploy Agents**
|
||||
- Agents deploy automatically as DaemonSet
|
||||
- Verify with: `kubectl get pods -l app.kubernetes.io/component=agent`
|
||||
|
||||
5. **Create Certificates**
|
||||
- Configure issuer connectors (Local CA, ACME, etc.)
|
||||
- Access web dashboard at ingress or port-forward
|
||||
|
||||
## Common Commands
|
||||
|
||||
```bash
|
||||
# List installations
|
||||
helm list
|
||||
|
||||
# View chart values
|
||||
helm get values certctl
|
||||
|
||||
# Upgrade chart
|
||||
helm upgrade certctl certctl/ -f new-values.yaml
|
||||
|
||||
# Roll back to revision 1 (omit the revision number to roll back to the previous release)
|
||||
helm rollback certctl 1
|
||||
|
||||
# Uninstall chart
|
||||
helm uninstall certctl
|
||||
|
||||
# View deployment history
|
||||
helm history certctl
|
||||
|
||||
# Dry-run installation to see generated YAML
|
||||
helm install certctl certctl/ --dry-run --debug
|
||||
```
|
||||
|
||||
## Support
|
||||
|
||||
- Full documentation in `README.md`
|
||||
- Troubleshooting in `DEPLOYMENT_GUIDE.md`
|
||||
- Issues: https://github.com/shankar0123/certctl
|
||||
@@ -0,0 +1,516 @@
|
||||
# Certctl Helm Chart
|
||||
|
||||
Production-ready Helm chart for deploying certctl (self-hosted certificate lifecycle management platform) on Kubernetes.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Quick Start](#quick-start)
|
||||
2. [Chart Features](#chart-features)
|
||||
3. [Prerequisites](#prerequisites)
|
||||
4. [Installation](#installation)
|
||||
5. [Configuration](#configuration)
|
||||
6. [Usage Examples](#usage-examples)
|
||||
7. [Upgrading](#upgrading)
|
||||
8. [Uninstalling](#uninstalling)
|
||||
9. [Architecture](#architecture)
|
||||
10. [Security Considerations](#security-considerations)
|
||||
11. [Troubleshooting](#troubleshooting)
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Add the chart repository (when available)
|
||||
helm repo add certctl https://charts.example.com
|
||||
helm repo update
|
||||
|
||||
# Install with default values
|
||||
helm install certctl certctl/certctl \
|
||||
--set server.auth.apiKey="your-secure-api-key" \
|
||||
--set postgresql.auth.password="your-secure-password"
|
||||
|
||||
# Check installation status
|
||||
kubectl get pods -l app.kubernetes.io/instance=certctl
|
||||
```
|
||||
|
||||
## Chart Features
|
||||
|
||||
- **Server Deployment** — certctl control plane with configurable replicas
|
||||
- **PostgreSQL StatefulSet** — Persistent database with automatic schema migration
|
||||
- **Agent DaemonSet or Deployment** — Flexible agent deployment (per-node or custom replicas)
|
||||
- **Ingress Support** — Optional HTTPS ingress with cert-manager integration
|
||||
- **Security Contexts** — Non-root containers, read-only filesystems, minimal capabilities
|
||||
- **Resource Limits** — Configurable CPU and memory requests/limits
|
||||
- **Health Checks** — Liveness and readiness probes on all containers
|
||||
- **ConfigMaps and Secrets** — Centralized configuration management
|
||||
- **Service Account and RBAC** — Optional cluster role bindings
|
||||
- **Pod Disruption Budgets** — HA-ready with configurable disruption budgets
|
||||
- **Monitoring** — Optional Prometheus ServiceMonitor support
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Kubernetes 1.19 or later
|
||||
- Helm 3.0 or later
|
||||
- Optional: cert-manager (for automatic TLS certificate provisioning)
|
||||
- Optional: Prometheus (for metrics scraping)
|
||||
|
||||
## Installation
|
||||
|
||||
### 1. Using Chart from Repository
|
||||
|
||||
```bash
|
||||
helm repo add certctl https://charts.example.com
|
||||
helm repo update
|
||||
helm install certctl certctl/certctl -f my-values.yaml
|
||||
```
|
||||
|
||||
### 2. Using Local Chart
|
||||
|
||||
```bash
|
||||
cd deploy/helm
|
||||
helm install certctl certctl/ \
|
||||
--set server.auth.apiKey="$(openssl rand -base64 32)" \
|
||||
--set postgresql.auth.password="$(openssl rand -base64 32)"
|
||||
```
|
||||
|
||||
### 3. Minimal Production Installation
|
||||
|
||||
```bash
|
||||
helm install certctl certctl/certctl \
|
||||
--namespace certctl \
|
||||
--create-namespace \
|
||||
--set server.auth.apiKey="change-me" \
|
||||
--set postgresql.auth.password="change-me" \
|
||||
--set server.replicas=2 \
|
||||
--set server.resources.requests.cpu=200m \
|
||||
--set server.resources.requests.memory=256Mi \
|
||||
--set ingress.enabled=true \
|
||||
--set ingress.className=nginx \
|
||||
--set 'ingress.hosts[0].host=certctl.example.com'
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
### Server Configuration
|
||||
|
||||
```yaml
server:
  replicas: 1                  # Number of server replicas
  port: 8443                   # Service port
  auth:
    type: api-key              # Authentication type
    apiKey: "your-api-key"     # REQUIRED for production
  logging:
    level: info                # Log level (debug, info, warn, error)
    format: json               # Output format
  issuer:
    local:
      enabled: true            # Enable local CA issuer
    acme:
      enabled: false           # Enable ACME issuer
      directoryURL: ""         # ACME directory URL
      email: ""                # ACME registration email
      challengeType: "http-01" # Challenge type (http-01, dns-01, dns-persist-01)
```
|
||||
|
||||
### PostgreSQL Configuration
|
||||
|
||||
```yaml
postgresql:
  enabled: true                # Use managed PostgreSQL
  auth:
    database: certctl
    username: certctl
    password: "your-password"  # REQUIRED
  storage:
    size: 10Gi                 # PVC size
    storageClass: ""           # Use default StorageClass
```
|
||||
|
||||
### Agent Configuration
|
||||
|
||||
```yaml
agent:
  enabled: true       # Deploy agents
  kind: DaemonSet     # DaemonSet (one per node) or Deployment
  replicas: 1         # For Deployment kind only
  discoveryDirs: ""   # Comma-separated cert discovery paths
  nodeSelector: {}    # Node affinity for DaemonSet
```
|
||||
|
||||
### Ingress Configuration
|
||||
|
||||
```yaml
ingress:
  enabled: false
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: certctl.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: certctl-tls
      hosts:
        - certctl.example.com
```
|
||||
|
||||
See `values.yaml` for all available configuration options.
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Example 1: High Availability Setup
|
||||
|
||||
```yaml
# ha-values.yaml
server:
  replicas: 3
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi

postgresql:
  storage:
    size: 50Gi

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values: [server]
      topologyKey: kubernetes.io/hostname
```
|
||||
|
||||
Deploy with:
|
||||
```bash
|
||||
helm install certctl certctl/certctl -f ha-values.yaml
|
||||
```
|
||||
|
||||
### Example 2: External PostgreSQL Database
|
||||
|
||||
```yaml
# external-db-values.yaml
postgresql:
  enabled: false

server:
  env:
    CERTCTL_DATABASE_URL: "postgres://user:password@rds.example.com:5432/certctl?sslmode=require"
```
|
||||
|
||||
Deploy with:
|
||||
```bash
|
||||
helm install certctl certctl/certctl -f external-db-values.yaml
|
||||
```
|
||||
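Passwords generated with `openssl rand -base64` can contain `/`, `+`, and `=`, which are not safe inside the user-info part of a `postgres://` URL. A minimal sketch of percent-encoding the password before building the URL, assuming ASCII-only passwords; `urlencode` and `db_url` are hypothetical helper names, not something the chart ships.

```bash
# urlencode <string>: percent-encode everything except RFC 3986
# unreserved characters (letters, digits, "-", ".", "_", "~").
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # string without its first character
    c=${s%"$rest"}       # the first character
    case $c in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;  # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# db_url <user> <password> <host> <database>: build the connection URL
# in the same shape as the example above.
db_url() {
  printf 'postgres://%s:%s@%s:5432/%s?sslmode=require' \
    "$1" "$(urlencode "$2")" "$3" "$4"
}

# Usage:
#   db_url certctl "$DB_PASSWORD" rds.example.com certctl
```

The resulting string can be used as the `CERTCTL_DATABASE_URL` value in `external-db-values.yaml`.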
|
||||
### Example 3: ACME + Let's Encrypt
|
||||
|
||||
```yaml
# acme-values.yaml
server:
  issuer:
    acme:
      enabled: true
      directoryURL: https://acme-v02.api.letsencrypt.org/directory
      email: admin@example.com
      challengeType: dns-01
      dnsPresentScript: /scripts/dns-present.sh
      dnsCleanupScript: /scripts/dns-cleanup.sh
      dnsPropagationWait: 30s
```
|
||||
|
||||
### Example 4: Notifications via Slack + SMTP
|
||||
|
||||
```yaml
|
||||
# notifications-values.yaml
|
||||
server:
|
||||
smtp:
|
||||
enabled: true
|
||||
host: smtp.example.com
|
||||
port: 587
|
||||
username: certctl@example.com
|
||||
password: "smtp-password"
|
||||
fromAddress: certctl@example.com
|
||||
useTLS: true
|
||||
|
||||
notifiers:
|
||||
slack:
|
||||
enabled: true
|
||||
webhookUrl: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
|
||||
channel: "#certificates"
|
||||
```
|
||||
|
||||
## Upgrading
|
||||
|
||||
```bash
# Update chart repository
helm repo update

# Upgrade release
helm upgrade certctl certctl/certctl -f values.yaml

# View upgrade history
helm history certctl

# Roll back to a specific revision (here, revision 1);
# omit the revision number to roll back to the previous release
helm rollback certctl 1
```
|
||||
|
||||
## Uninstalling
|
||||
|
||||
```bash
# Delete the release (keeps data by default)
helm uninstall certctl

# Also delete persistent data (PVCs selected by the release label;
# kubectl does not accept --all together with a label selector)
kubectl delete pvc -l app.kubernetes.io/instance=certctl

# Delete namespace
kubectl delete namespace certctl
```
|
||||
|
||||
## Architecture
|
||||
|
||||
### Components
|
||||
|
||||
```
┌──────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                      │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────┐               ┌──────────────────┐      │
│  │   Ingress/LB    │               │   Agent Pod 1    │      │
│  │   (optional)    │               │   (DaemonSet)    │      │
│  └────────┬────────┘               └──────────────────┘      │
│           │                                                  │
│           ▼                        ┌──────────────────┐      │
│  ┌─────────────────────────┐       │   Agent Pod 2    │      │
│  │   Server Deployment     │       │   (DaemonSet)    │      │
│  │   (1 to N replicas)     │       └──────────────────┘      │
│  │   - REST API            │                                 │
│  │   - Scheduler           │       ┌──────────────────┐      │
│  │   - UI Dashboard        │       │   Agent Pod N    │      │
│  └────────┬────────────────┘       │   (DaemonSet)    │      │
│           │                        └──────────────────┘      │
│           │                                                  │
│           ▼                                                  │
│  ┌──────────────────────────┐                                │
│  │  PostgreSQL StatefulSet  │                                │
│  │  - Database              │                                │
│  │  - PVC (persistent)      │                                │
│  └──────────────────────────┘                                │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### Network Communication
|
||||
|
||||
- **Server → PostgreSQL**: Internal cluster DNS (`certctl-postgres:5432`)
|
||||
- **Agent → Server**: Internal cluster DNS (`certctl-server:8443`)
|
||||
- **External → Server**: Via Ingress or Service (ClusterIP/LoadBalancer/NodePort)
|
||||
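The hostnames above hold when the release is named `certctl`. For other release names the Service names change. The sketch below mirrors the naming logic of the chart's `certctl.fullname` helper in shell, ignoring the `nameOverride` and `fullnameOverride` values; the function name `chart_fullname` is illustrative.

```bash
# chart_fullname <release> [chart]: compute the resource name prefix.
# If the release name already contains the chart name, it is used as-is;
# otherwise the prefix is "<release>-<chart>". The result is truncated to
# 63 characters and a single trailing "-" is trimmed.
chart_fullname() {
  release=$1
  chart=${2:-certctl}
  case "$release" in
    *"$chart"*) name=$release ;;
    *)          name="$release-$chart" ;;
  esac
  name=$(printf '%s' "$name" | cut -c1-63)
  printf '%s' "${name%-}"
}

# Usage: in-cluster endpoints for a release named "prod"
#   echo "$(chart_fullname prod)-server:8443"    # prod-certctl-server:8443
#   echo "$(chart_fullname prod)-postgres:5432"  # prod-certctl-postgres:5432
```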
|
||||
## Security Considerations
|
||||
|
||||
### 1. Secrets Management
|
||||
|
||||
All sensitive data is stored in Kubernetes Secrets:
|
||||
- PostgreSQL credentials
|
||||
- API keys
|
||||
- SMTP passwords
|
||||
- ACME account secrets
|
||||
|
||||
**Best Practices:**
|
||||
- Use sealed-secrets or external-secrets operator
|
||||
- Enable encryption at rest in etcd
|
||||
- Rotate secrets regularly
|
||||
|
||||
```bash
|
||||
# Example: Using sealed-secrets
|
||||
kubectl create secret generic certctl-api-key --from-literal=api-key="$(openssl rand -base64 32)" --dry-run=client -o yaml | kubeseal -f - | kubectl apply -f -
|
||||
```
|
||||
|
||||
### 2. RBAC
|
||||
|
||||
The chart creates minimal RBAC by default:
|
||||
- ServiceAccount per release
|
||||
- ClusterRole (empty, extensible)
|
||||
- ClusterRoleBinding
|
||||
|
||||
**To restrict further:**
|
||||
```yaml
rbac:
  create: true
  # Add specific rules here
```
|
||||
|
||||
### 3. Pod Security
|
||||
|
||||
All containers run with:
|
||||
- Non-root user (UID 1000)
|
||||
- Read-only root filesystem
|
||||
- No privilege escalation
|
||||
- Dropped capabilities (ALL)
|
||||
|
||||
### 4. Network Policies
|
||||
|
||||
Restrict pod-to-pod communication:
|
||||
|
||||
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: certctl-default-deny
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: certctl
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: certctl # the namespace must carry this label
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: certctl
    # DNS pods usually run in another namespace (e.g. kube-system),
    # so match all namespaces here rather than only local pods
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53 # DNS
        - protocol: UDP
          port: 53
```
|
||||
|
||||
### 5. TLS/HTTPS
|
||||
|
||||
Enable HTTPS with cert-manager:
|
||||
|
||||
```bash
|
||||
helm install cert-manager jetstack/cert-manager \
|
||||
--namespace cert-manager \
|
||||
--create-namespace \
|
||||
--set installCRDs=true
|
||||
```
|
||||
|
||||
Then configure Ingress with TLS.
|
||||
|
||||
### 6. API Key Security
|
||||
|
||||
For production:
|
||||
1. Generate a strong API key: `openssl rand -base64 32`
|
||||
2. Store securely (Vault, sealed-secrets, etc.)
|
||||
3. Never commit to Git
|
||||
4. Rotate periodically
|
||||
|
||||
```bash
# Generate and deploy API key
NEW_KEY="$(openssl rand -base64 32)"
# Secret data must be base64-encoded; tr strips any line wrapping
ENCODED_KEY="$(printf '%s' "$NEW_KEY" | base64 | tr -d '\n')"
kubectl patch secret certctl-server -p "{\"data\":{\"api-key\":\"${ENCODED_KEY}\"}}"
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### 1. Pods Not Starting
|
||||
|
||||
```bash
|
||||
# Check pod status
|
||||
kubectl get pods -l app.kubernetes.io/instance=certctl
|
||||
kubectl describe pod <pod-name>
|
||||
kubectl logs <pod-name>
|
||||
```
|
||||
|
||||
### 2. Database Connection Issues
|
||||
|
||||
```bash
# Verify PostgreSQL is running
kubectl get pods -l app.kubernetes.io/component=postgres
kubectl logs -l app.kubernetes.io/component=postgres

# Test connection from server pod
kubectl exec -it <server-pod> -- \
  psql postgres://certctl:password@certctl-postgres:5432/certctl
```
|
||||
|
||||
### 3. Agent Not Connecting
|
||||
|
||||
```bash
|
||||
# Check agent logs
|
||||
kubectl logs -l app.kubernetes.io/component=agent
|
||||
|
||||
# Verify server is reachable
|
||||
kubectl exec -it <agent-pod> -- \
|
||||
wget -q -O - http://certctl-server:8443/health
|
||||
```
|
||||
|
||||
### 4. Persistent Data Loss
|
||||
|
||||
```bash
|
||||
# Check PVC status
|
||||
kubectl get pvc
|
||||
|
||||
# Verify data is being stored
|
||||
kubectl exec -it <postgres-pod> -- \
|
||||
ls -lah /var/lib/postgresql/data/postgres
|
||||
```
|
||||
|
||||
### 5. Permission Denied Errors
|
||||
|
||||
The chart runs containers as non-root (UID 1000). If you see permission errors:
|
||||
|
||||
```yaml
# Temporarily allow root for debugging
server:
  securityContext:
    runAsUser: 0  # NOT FOR PRODUCTION
```
|
||||
|
||||
### 6. Out of Memory
|
||||
|
||||
Increase resource limits:
|
||||
|
||||
```bash
helm upgrade certctl certctl/certctl \
  --set server.resources.limits.memory=1Gi \
  --set postgresql.resources.limits.memory=2Gi
```
|
||||
|
||||
### 7. Certificate Validation Issues
|
||||
|
||||
For self-signed certificates:
|
||||
|
||||
```bash
# kubectl exec runs the command directly (no shell), so set the variable via env
kubectl exec -it <pod> -- \
  env CERTCTL_TLS_INSECURE_SKIP_VERIFY=true <command>
```
|
||||
|
||||
### Common Issues and Solutions
|
||||
|
||||
| Issue | Solution |
|-------|----------|
| `ImagePullBackOff` | Update `server.image.repository` to your registry |
| `CrashLoopBackOff` | Check logs with `kubectl logs <pod>` |
| `Pending` PVC | Check storage class availability |
| Connection timeout | Verify network policies and service DNS |
| High memory usage | Adjust `postgresql.resources.limits` and `server.resources.limits` |
|
||||
|
||||
## Support and Contributing
|
||||
|
||||
For issues, questions, or contributions, visit:
|
||||
- GitHub: https://github.com/shankar0123/certctl
|
||||
- Documentation: https://github.com/shankar0123/certctl/tree/main/docs
|
||||
|
||||
## License
|
||||
|
||||
BSL-1.1 (converts to Apache 2.0 in 2033)
|
||||
@@ -0,0 +1,31 @@
|
||||
# Patterns to ignore when building packages.
|
||||
# This supports shell glob patterns, relative path patterns, and negated
|
||||
# patterns. Only one pattern per line.
|
||||
.DS_Store
|
||||
# Common VCS dirs
|
||||
.git/
|
||||
.gitignore
|
||||
.bzr/
|
||||
.bzrignore
|
||||
.hg/
|
||||
.hgignore
|
||||
.svn/
|
||||
# Common backup files
|
||||
*.swp
|
||||
*.swo
|
||||
*~
|
||||
*.pyo
|
||||
*.pyc
|
||||
.pytest_cache/
|
||||
*.egg-info/
|
||||
dist/
|
||||
build/
|
||||
# IDE
|
||||
.vscode/
|
||||
.idea/
|
||||
*.sublime-project
|
||||
*.sublime-workspace
|
||||
# OS
|
||||
Thumbs.db
|
||||
# Helm
|
||||
Chart.lock
|
||||
@@ -0,0 +1,20 @@
|
||||
apiVersion: v2
|
||||
name: certctl
|
||||
description: Self-hosted certificate lifecycle management platform
|
||||
type: application
|
||||
version: 0.1.0
|
||||
appVersion: "2.1.0"
|
||||
keywords:
|
||||
- certificate
|
||||
- tls
|
||||
- ssl
|
||||
- pki
|
||||
- acme
|
||||
- lifecycle
|
||||
- kubernetes
|
||||
maintainers:
|
||||
- name: certctl
|
||||
home: https://github.com/shankar0123/certctl
|
||||
sources:
|
||||
- https://github.com/shankar0123/certctl
|
||||
license: BSL-1.1
|
||||
@@ -0,0 +1,148 @@
|
||||
# certctl Helm Chart
|
||||
|
||||
Production-ready Helm chart for deploying [certctl](https://github.com/shankar0123/certctl) on Kubernetes. Wires up the certctl server (Deployment), PostgreSQL (StatefulSet with PVC), and the agent (DaemonSet — one per node) on a private cluster, with health probes, security contexts, and optional Ingress.
|
||||
|
||||
## Quick install
|
||||
|
||||
```bash
helm install certctl deploy/helm/certctl/ \
  --create-namespace --namespace certctl \
  --set server.auth.apiKey="$(openssl rand -base64 32)" \
  --set postgresql.auth.password="$(openssl rand -base64 24)"
```
|
||||
|
||||
This brings up:
|
||||
|
||||
- `<release>-server` Deployment (HTTPS-only on port 8443; TLS 1.3)
|
||||
- `<release>-postgres` StatefulSet (PostgreSQL 16-alpine, 1 replica, 10Gi PVC by default)
|
||||
- `<release>-agent` DaemonSet (polls server, generates ECDSA P-256 keys locally)
|
||||
- Service objects, optional Ingress, and ServiceAccount with RBAC
|
||||
|
||||
See [`values.yaml`](values.yaml) for the full configuration surface — issuer settings, target connectors, scheduler intervals, notifier credentials, and resource requests/limits all live there.
|
||||
|
||||
## Operational notes
|
||||
|
||||
### Postgres password rotation — read this before changing `postgresql.auth.password`
|
||||
|
||||
**The trap.** `postgresql.auth.password` is bound to `pg_authid` exactly once — when the StatefulSet's PVC is provisioned and `initdb` runs. The official `postgres:16-alpine` image only runs `initdb` when `/var/lib/postgresql/data` is empty, so on every subsequent rollout the `POSTGRES_PASSWORD` env var is read into the container but **ignored** by postgres itself. The certctl-server container also picks up the new value (via the database URL helper template), so the two halves diverge: server presents the new password, postgres still expects the old one.
|
||||
|
||||
**Symptom.** The certctl-server pod's startup log shows:
|
||||
|
||||
```
failed to ping database: postgres rejected the configured credentials
(SQLSTATE 28P01 — invalid_password). If you recently rotated POSTGRES_PASSWORD ...
```
|
||||
|
||||
That diagnostic is emitted by `internal/repository/postgres/db.go::wrapPingError` — it points operators at the two remediation paths below.
|
||||
|
||||
**Remediation, non-destructive (preferred for any environment with real data):**
|
||||
|
||||
```bash
# 1. Rotate the password in postgres directly
kubectl -n certctl exec -it <release>-postgres-0 -- \
  psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new-password>';"

# 2. Update the secret / Helm values to the same value
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set postgresql.auth.password='<new-password>'

# 3. Bounce the certctl-server pod so it re-reads the secret
kubectl -n certctl rollout restart deployment/<release>-server
```
|
||||
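A password that contains a single quote would break the `ALTER ROLE` statement in step 1. A minimal sketch of quoting it as a SQL string literal, assuming PostgreSQL's default `standard_conforming_strings = on` (so backslashes need no extra escaping). `sql_literal` and `NEW_PASSWORD` are hypothetical names used for illustration.

```bash
# sql_literal <string>: wrap the string in single quotes and double any
# embedded single quote, which is how SQL escapes a quote inside a literal.
sql_literal() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Usage (requires cluster access, shown for context only):
#   kubectl -n certctl exec -i <release>-postgres-0 -- \
#     psql -U certctl -c "ALTER ROLE certctl PASSWORD $(sql_literal "$NEW_PASSWORD");"
```

The same `NEW_PASSWORD` value, unquoted, is what goes into `postgresql.auth.password` in step 2.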
|
||||
**Remediation, destructive (DESTROYS ALL CERTCTL DATA — only acceptable on dev/demo clusters):**
|
||||
|
||||
```bash
helm uninstall <release> -n certctl
kubectl -n certctl delete pvc -l \
  app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres
helm install <release> deploy/helm/certctl/ \
  --namespace certctl \
  --set postgresql.auth.password='<new-password>'
```
|
||||
|
||||
The PVC re-creates empty, `initdb` runs on first boot of the new postgres pod, and `pg_authid` is seeded with the new password.
|
||||
|
||||
**Why we don't fix this in the chart.** The env-vs-`pg_authid` divergence is intrinsic to how the upstream `postgres` image bootstraps — `initdb` is run-once-per-empty-data-dir, and there is no upstream-supported way to make subsequent boots re-seed `pg_authid` from `POSTGRES_PASSWORD`. The ergonomic answer is the runtime diagnostic plus this operational note.
|
||||
|
||||
**Cross-references.** Same root cause is documented for the docker-compose path in [`docs/quickstart.md`](../../../docs/quickstart.md) (Warning callout after the `cp .env.example .env` block) and in [`deploy/ENVIRONMENTS.md`](../../ENVIRONMENTS.md) (Stateful volume — first-boot password binding section). The runtime diagnostic itself lives in `internal/repository/postgres/db.go::wrapPingError` with regression coverage in `internal/repository/postgres/db_test.go`.
|
||||
|
||||
### Server API key rotation
|
||||
|
||||
Unlike the postgres password, `server.auth.apiKey` accepts a comma-separated list, so zero-downtime rotation is straightforward:
|
||||
|
||||
```bash
# 1. Add the new key alongside the old
#    (escape the comma: --set treats an unescaped comma as a separator)
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set server.auth.apiKey='new-key\,old-key'

# 2. Roll your agents / clients over to the new key

# 3. Remove the old key
helm upgrade <release> deploy/helm/certctl/ \
  --reuse-values \
  --set server.auth.apiKey='new-key'
```
|
||||
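Helm's `--set` parser splits its argument on unescaped commas, so a comma-separated key list has to reach Helm with each comma backslash-escaped. A small sketch of building that argument from individual keys; `helm_set_list` is an illustrative helper name, not part of Helm or the chart.

```bash
# helm_set_list <item>...: join the items with "\," so that --set passes
# them through as one comma-separated value.
helm_set_list() {
  out=
  for item in "$@"; do
    if [ -n "$out" ]; then
      out="$out\\,$item"
    else
      out=$item
    fi
  done
  printf '%s' "$out"
}

# Usage (NEW_KEY and OLD_KEY are placeholders for your own variables):
#   helm upgrade <release> deploy/helm/certctl/ --reuse-values \
#     --set "server.auth.apiKey=$(helm_set_list "$NEW_KEY" "$OLD_KEY")"
```

Supplying the list through a values file with `-f` avoids the escaping entirely.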
|
||||
### JWT / OIDC via authenticating gateway
|
||||
|
||||
certctl's in-process auth surface is intentionally narrow: `server.auth.type=api-key` for production deployments and `server.auth.type=none` for development. There is no in-process JWT, OIDC, mTLS, or SAML middleware. (`server.auth.type=jwt` was accepted pre-G-1 but silently routed every request through the api-key bearer middleware — silent auth downgrade. The chart now fails at `helm install`/`helm upgrade` template time via the `certctl.validateAuthType` helper if you set it. See [`../../../docs/upgrade-to-v2-jwt-removal.md`](../../../docs/upgrade-to-v2-jwt-removal.md) if you previously had this in your values.)
|
||||
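The same check can run in a deploy script before Helm is invoked. A minimal sketch, assuming the accepted set is exactly `api-key` and `none` as described above; `validate_auth_type` here is a shell stand-in for the chart's template helper, not the helper itself.

```bash
# validate_auth_type <type>: succeed only for auth types the chart accepts.
validate_auth_type() {
  case "$1" in
    api-key|none)
      return 0
      ;;
    *)
      echo "unsupported server.auth.type: '$1' (expected api-key or none)" >&2
      return 1
      ;;
  esac
}

# Usage (AUTH_TYPE is a placeholder for your own variable):
#   validate_auth_type "$AUTH_TYPE" && helm upgrade --install ...
```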
|
||||
For deployments that need JWT/OIDC, the canonical Kubernetes-flavored shape is to put oauth2-proxy in front of the certctl Service, attach an authenticating Ingress middleware, and run certctl with `server.auth.type=none`:
|
||||
|
||||
```bash
|
||||
# 1. Install oauth2-proxy (or any OIDC-terminating sidecar) in the same namespace
|
||||
helm install oauth2-proxy oauth2-proxy/oauth2-proxy \
|
||||
--namespace certctl \
|
||||
--set config.clientID="$OIDC_CLIENT_ID" \
|
||||
--set config.clientSecret="$OIDC_CLIENT_SECRET" \
|
||||
--set config.cookieSecret="$(openssl rand -base64 32)" \
|
||||
--set config.configFile='|
|
||||
provider = "oidc"
|
||||
oidc_issuer_url = "https://your-issuer/"
|
||||
upstreams = ["https://<release>-server.certctl.svc.cluster.local:8443"]
|
||||
pass_authorization_header = true
|
||||
set_authorization_header = true
|
||||
email_domains = ["*"]
|
||||
'
|
||||
|
||||
# 2. Install certctl with type=none (gateway terminates auth)
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--namespace certctl \
|
||||
--set server.auth.type=none \
|
||||
--set postgresql.auth.password="$(openssl rand -base64 24)"
|
||||
|
||||
# 3. Attach an Ingress that routes through oauth2-proxy
|
||||
# (Traefik ForwardAuth, nginx auth_request, Envoy ext_authz, etc.)
|
||||
```
|
||||
|
||||
Same root pattern works with Pomerium, Authelia, Caddy `forward_auth`, Apache `mod_auth_openidc`, or any service-mesh `ext_authz`. See [`../../../docs/architecture.md`](../../../docs/architecture.md) "Authenticating-gateway pattern" for the full design rationale and [`../../../docs/upgrade-to-v2-jwt-removal.md`](../../../docs/upgrade-to-v2-jwt-removal.md) for the migration walkthrough.
|
||||
|
||||
### TLS certificate sourcing
|
||||
|
||||
By default the chart provisions a self-signed cert via the same init-container pattern as the docker-compose deploy. For production, supply an operator-managed Secret (cert-manager, internal CA, etc.) — see [`docs/tls.md`](../../../docs/tls.md) for the full provisioning matrix and [`docs/upgrade-to-tls.md`](../../../docs/upgrade-to-tls.md) for upgrade-from-HTTP procedures.
|
||||
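For reference, the chart's `certctl.tls.secretName` helper picks the Secret to mount in a fixed order: `server.tls.existingSecret`, then `server.tls.certManager.secretName`, then `<fullname>-tls`. A shell sketch of that precedence, with `resolve_tls_secret_name` as an illustrative name:

```bash
# resolve_tls_secret_name <existingSecret> <certManagerSecretName> <fullname>
# Empty arguments stand for unset values.
resolve_tls_secret_name() {
  existing=$1
  cm_secret=$2
  fullname=$3
  if [ -n "$existing" ]; then
    printf '%s' "$existing"          # 1. operator-supplied Secret wins
  elif [ -n "$cm_secret" ]; then
    printf '%s' "$cm_secret"         # 2. explicit cert-manager secret name
  else
    printf '%s-tls' "$fullname"      # 3. default derived from the release
  fi
}

# Usage:
#   resolve_tls_secret_name "" "" certctl   # prints certctl-tls
```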
|
||||
## Disabling embedded postgres
|
||||
|
||||
If you have an existing PostgreSQL cluster, disable the embedded one and point at it directly:
|
||||
|
||||
```bash
helm install certctl deploy/helm/certctl/ \
  --set postgresql.enabled=false \
  --set server.databaseUrl='postgres://certctl:<pw>@my-pg-host:5432/certctl?sslmode=require'
```
|
||||
|
||||
The volume-trap section above does **not** apply to this configuration — your postgres operator (or cloud DB) handles password rotation, and you control `pg_authid` directly.
|
||||
|
||||
## Uninstall
|
||||
|
||||
```bash
helm uninstall <release> -n certctl
# Optional — also delete the postgres PVC (DESTROYS DATA):
kubectl -n certctl delete pvc -l \
  app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres
```
|
||||
|
||||
By default `helm uninstall` retains the StatefulSet's PVCs, so reinstalling with the same release name preserves the database. If you've changed `postgresql.auth.password` in your values between uninstall and reinstall, you'll hit the trap on the reinstall — apply the non-destructive remediation above, or also delete the PVC.
|
||||
@@ -0,0 +1,74 @@
|
||||
1. Get the certctl Server URL by running:
|
||||
{{- if .Values.ingress.enabled }}
|
||||
https://{{ index .Values.ingress.hosts 0 "host" }}
|
||||
{{- else if contains "NodePort" .Values.server.service.type }}
|
||||
export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
|
||||
export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "certctl.fullname" . }}-server)
|
||||
echo https://$NODE_IP:$NODE_PORT
|
||||
{{- else if contains "LoadBalancer" .Values.server.service.type }}
|
||||
export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "certctl.fullname" . }}-server -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
|
||||
echo https://$SERVICE_IP:{{ .Values.server.service.port }}
|
||||
{{- else }}
|
||||
export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "certctl.name" . }},app.kubernetes.io/instance={{ .Release.Name }},app.kubernetes.io/component=server" -o jsonpath="{.items[0].metadata.name}")
|
||||
export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
|
||||
echo "Visit https://127.0.0.1:8443 to use your application"
|
||||
kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8443:$CONTAINER_PORT
|
||||
{{- end }}
|
||||
|
||||
2. Talk to the HTTPS-only server from your workstation:
|
||||
# Export the CA bundle that signed the server cert (self-signed or cert-manager-issued)
|
||||
kubectl get secret --namespace {{ .Release.Namespace }} {{ include "certctl.tls.secretName" . }} \
|
||||
-o jsonpath='{.data.ca\.crt}' | base64 --decode > /tmp/certctl-ca.crt
|
||||
# (If ca.crt is empty, fall back to tls.crt — typical when the Secret
|
||||
# was created from a self-signed bootstrap cert without a separate CA.)
|
||||
|
||||
# Adapt the URL below to match the Server URL printed in step 1.
|
||||
curl --cacert /tmp/certctl-ca.crt https://127.0.0.1:8443/health
|
||||
|
||||
3. Get the default API key:
|
||||
kubectl get secret --namespace {{ .Release.Namespace }} {{ include "certctl.fullname" . }}-server -o jsonpath="{.data.api-key}" | base64 --decode; echo
|
||||
|
||||
4. Get PostgreSQL connection details:
|
||||
Host: {{ include "certctl.fullname" . }}-postgres.{{ .Release.Namespace }}.svc.cluster.local
|
||||
Port: 5432
|
||||
Database: {{ .Values.postgresql.auth.database }}
|
||||
Username: {{ .Values.postgresql.auth.username }}
|
||||
Password: $(kubectl get secret --namespace {{ .Release.Namespace }} {{ include "certctl.fullname" . }}-postgres -o jsonpath="{.data.password}" | base64 --decode)
|
||||
|
||||
5. Check deployment status:
|
||||
kubectl get pods -n {{ .Release.Namespace }} -l app.kubernetes.io/instance={{ .Release.Name }}
|
||||
|
||||
6. View server logs:
|
||||
kubectl logs -n {{ .Release.Namespace }} -l app.kubernetes.io/name={{ include "certctl.name" . }},app.kubernetes.io/component=server -f
|
||||
|
||||
{{- if .Values.agent.enabled }}
|
||||
|
||||
7. View agent logs:
|
||||
kubectl logs -n {{ .Release.Namespace }} -l app.kubernetes.io/name={{ include "certctl.name" . }},app.kubernetes.io/component=agent -f
|
||||
|
||||
{{- end }}
|
||||
|
||||
IMPORTANT NOTES FOR PRODUCTION:
|
||||
|
||||
1. Update the API key for security:
|
||||
kubectl patch secret {{ include "certctl.fullname" . }}-server -n {{ .Release.Namespace }} \
|
||||
-p '{"data":{"api-key":"'$(echo -n "YOUR_NEW_API_KEY" | base64)'"}}'
|
||||
|
||||
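2. Update PostgreSQL password (change it inside postgres with ALTER ROLE first; patching the Secret alone does not update an already-initialized database):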
|
||||
kubectl patch secret {{ include "certctl.fullname" . }}-postgres -n {{ .Release.Namespace }} \
|
||||
-p '{"data":{"password":"'$(echo -n "YOUR_NEW_PASSWORD" | base64)'"}}'
|
||||
|
||||
3. Configure certificate issuers (ACME, step-ca, etc.) via values.yaml:
|
||||
helm upgrade {{ .Release.Name }} certctl/certctl \
|
||||
--set server.issuer.acme.enabled=true \
|
||||
--set server.issuer.acme.directoryURL=https://acme-v02.api.letsencrypt.org/directory \
|
||||
--set server.issuer.acme.email=admin@example.com
|
||||
|
||||
4. For production with persistent databases and backups:
|
||||
- Use an external PostgreSQL managed service (AWS RDS, Cloud SQL, etc.)
|
||||
- Set postgresql.enabled=false and configure CERTCTL_DATABASE_URL in values
|
||||
|
||||
5. Review security contexts and network policies:
|
||||
- All containers run as non-root
|
||||
- Implement network policies to restrict traffic between components
|
||||
- Consider enforcing Pod Security Standards for your cluster (PodSecurityPolicy was removed in Kubernetes 1.25)
|
||||
@@ -0,0 +1,194 @@
|
||||
{{/*
|
||||
Expand the name of the chart.
|
||||
*/}}
|
||||
{{- define "certctl.name" -}}
|
||||
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create a default fully qualified app name.
|
||||
*/}}
|
||||
{{- define "certctl.fullname" -}}
|
||||
{{- if .Values.fullnameOverride }}
|
||||
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- $name := default .Chart.Name .Values.nameOverride }}
|
||||
{{- if contains $name .Release.Name }}
|
||||
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
|
||||
{{- else }}
|
||||
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Create chart name and version as used by the chart label.
|
||||
*/}}
|
||||
{{- define "certctl.chart" -}}
|
||||
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Common labels
|
||||
*/}}
|
||||
{{- define "certctl.labels" -}}
|
||||
helm.sh/chart: {{ include "certctl.chart" . }}
|
||||
{{ include "certctl.selectorLabels" . }}
|
||||
{{- if .Chart.AppVersion }}
|
||||
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
|
||||
{{- end }}
|
||||
app.kubernetes.io/managed-by: {{ .Release.Service }}
|
||||
{{- with .Values.commonLabels }}
|
||||
{{ toYaml . }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Selector labels for the main service (server, agent, postgres)
|
||||
*/}}
|
||||
{{- define "certctl.selectorLabels" -}}
|
||||
app.kubernetes.io/name: {{ include "certctl.name" . }}
|
||||
app.kubernetes.io/instance: {{ .Release.Name }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Server selector labels
|
||||
*/}}
|
||||
{{- define "certctl.serverSelectorLabels" -}}
|
||||
{{ include "certctl.selectorLabels" . }}
|
||||
app.kubernetes.io/component: server
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Agent selector labels
|
||||
*/}}
|
||||
{{- define "certctl.agentSelectorLabels" -}}
|
||||
{{ include "certctl.selectorLabels" . }}
|
||||
app.kubernetes.io/component: agent
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
PostgreSQL selector labels
|
||||
*/}}
|
||||
{{- define "certctl.postgresSelectorLabels" -}}
|
||||
{{ include "certctl.selectorLabels" . }}
|
||||
app.kubernetes.io/component: postgres
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Service account name
|
||||
*/}}
|
||||
{{- define "certctl.serviceAccountName" -}}
|
||||
{{- if .Values.serviceAccount.create }}
|
||||
{{- default (include "certctl.fullname" .) .Values.serviceAccount.name }}
|
||||
{{- else }}
|
||||
{{- default "default" .Values.serviceAccount.name }}
|
||||
{{- end }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Server image
|
||||
*/}}
|
||||
{{- define "certctl.serverImage" -}}
|
||||
{{- $image := .Values.server.image }}
|
||||
{{- printf "%s:%s" $image.repository (coalesce $image.tag .Chart.AppVersion) }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Agent image
|
||||
*/}}
|
||||
{{- define "certctl.agentImage" -}}
|
||||
{{- $image := .Values.agent.image }}
|
||||
{{- printf "%s:%s" $image.repository (coalesce $image.tag .Chart.AppVersion) }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
PostgreSQL image
|
||||
*/}}
|
||||
{{- define "certctl.postgresImage" -}}
|
||||
{{- $image := .Values.postgresql.image }}
|
||||
{{- printf "%s:%s" $image.repository $image.tag }}
|
||||
{{- end }}
|
||||
|
||||
{{/*
|
||||
Database connection string
|
||||
*/}}
|
||||
{{- define "certctl.databaseURL" -}}
|
||||
postgres://{{ .Values.postgresql.auth.username }}:$(POSTGRES_PASSWORD)@{{ include "certctl.fullname" . }}-postgres:5432/{{ .Values.postgresql.auth.database }}?sslmode=disable
{{- end }}

{{/*
Server URL (for agents). HTTPS-only as of v2.2 — see docs/tls.md.
*/}}
{{- define "certctl.serverURL" -}}
https://{{ include "certctl.fullname" . }}-server:{{ .Values.server.service.port }}
{{- end }}

{{/*
TLS Secret name resolver.

Operator-facing precedence:
  1. server.tls.existingSecret — operator points at a pre-existing kubernetes.io/tls Secret
  2. server.tls.certManager.secretName — explicit secret name for the cert-manager Certificate CR
  3. "<fullname>-tls" — default when cert-manager is enabled but secretName is blank

Never emits an empty string — that case is already excluded by certctl.tls.required below,
which must be invoked by any template that depends on the resolved secret name.
*/}}
{{- define "certctl.tls.secretName" -}}
{{- if .Values.server.tls.existingSecret -}}
{{- .Values.server.tls.existingSecret -}}
{{- else if .Values.server.tls.certManager.secretName -}}
{{- .Values.server.tls.certManager.secretName -}}
{{- else -}}
{{- printf "%s-tls" (include "certctl.fullname" .) -}}
{{- end -}}
{{- end }}

{{/*
TLS configuration gate.

HTTPS is the only supported listener mode (v2.2+). The server refuses to start
without a cert/key pair mounted at server.tls.mountPath, so `helm template` /
`helm install` must fail loudly at render-time rather than shipping a broken
Deployment that crash-loops with "tls config required".

Operators MUST configure EXACTLY ONE of:
  (a) server.tls.existingSecret: <name-of-kubernetes.io/tls-secret>
  (b) server.tls.certManager.enabled: true (+ issuerRef.name populated)

Any template that mounts the TLS Secret must call
`{{ include "certctl.tls.required" . }}` at the top so this guard runs once
per affected resource. No-op when configured correctly.
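
Illustrative sketch of a consuming template, assuming the resource mounts the
Secret through a volume named "tls" (the name is only an example; use whatever
volume name the resource already declares):

  {{- include "certctl.tls.required" . }}
  ...
  volumes:
    - name: tls
      secret:
        secretName: {{ include "certctl.tls.secretName" . }}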
*/}}
{{- define "certctl.tls.required" -}}
{{- if and (not .Values.server.tls.existingSecret) (not .Values.server.tls.certManager.enabled) -}}
{{- fail "\n\ncertctl refuses to start without TLS.\n\nSet EXACTLY ONE of:\n --set server.tls.existingSecret=<your-kubernetes.io/tls-secret-name>\nOR\n --set server.tls.certManager.enabled=true \\\n --set server.tls.certManager.issuerRef.name=<your-issuer-or-clusterissuer>\n\nSee docs/tls.md for the full setup walkthrough, including bootstrap\nguidance for air-gapped clusters without cert-manager.\n" -}}
{{- end -}}
{{- if and .Values.server.tls.certManager.enabled (not .Values.server.tls.certManager.issuerRef.name) -}}
{{- fail "\n\nserver.tls.certManager.enabled=true but server.tls.certManager.issuerRef.name is empty.\n\nSet:\n --set server.tls.certManager.issuerRef.name=<your-issuer-or-clusterissuer>\n\nSee docs/tls.md.\n" -}}
{{- end -}}
{{- end }}

{{/*
Auth-type validation gate.

G-1 (P1): pre-G-1 the chart accepted server.auth.type=jwt and the
certctl-server container silently routed every request through the
api-key bearer middleware (no JWT impl ships with certctl). Post-G-1
the chart fails at template-time with a pointer at the authenticating-
gateway pattern. The valid set must stay in sync with
internal/config.ValidAuthTypes() in the Go binary; if you add a value
there you must add it here too (and update the property test in
internal/config/config_test.go that pins both surfaces).

Any template that consumes .Values.server.auth.type should call
`{{ include "certctl.validateAuthType" . }}` at the top so this guard
runs once per affected resource. No-op when configured correctly.
*/}}
{{- define "certctl.validateAuthType" -}}
{{- $valid := list "api-key" "none" -}}
{{- if not (has .Values.server.auth.type $valid) -}}
{{- fail (printf "\n\nserver.auth.type=%q is not supported (valid: %v).\n\nFor JWT/OIDC, run an authenticating gateway in front of certctl\n(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) and\nset server.auth.type=none here so the gateway terminates federated\nidentity. See docs/architecture.md \"Authenticating-gateway pattern\"\nand docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough.\n\nG-1 audit closure: pre-G-1 the chart accepted type=jwt and the binary\nsilently downgraded to api-key middleware. The chart now fails at\ntemplate time so misconfigured deployments cannot ship.\n" .Values.server.auth.type $valid) -}}
{{- end -}}
{{- end }}
@@ -0,0 +1,13 @@
{{- if .Values.agent.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "certctl.fullname" . }}-agent
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: agent
data:
  {{- if .Values.agent.discoveryDirs }}
  discovery-dirs: {{ .Values.agent.discoveryDirs | quote }}
  {{- end }}
{{- end }}
@@ -0,0 +1,181 @@
{{- if .Values.agent.enabled }}
{{- include "certctl.tls.required" . }}
{{- if eq .Values.agent.kind "DaemonSet" }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ include "certctl.fullname" . }}-agent
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: agent
spec:
  selector:
    matchLabels:
      {{- include "certctl.agentSelectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "certctl.agentSelectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.agent.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: agent
          image: {{ include "certctl.agentImage" . }}
          imagePullPolicy: {{ .Values.agent.image.pullPolicy }}
          env:
            - name: CERTCTL_SERVER_URL
              value: {{ include "certctl.serverURL" . }}
            - name: CERTCTL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: api-key
            - name: CERTCTL_AGENT_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CERTCTL_KEY_DIR
              value: {{ .Values.agent.keyDir }}
            - name: CERTCTL_SERVER_CA_BUNDLE_PATH
              value: "{{ .Values.server.tls.mountPath }}/ca.crt"
            {{- if .Values.agent.discoveryDirs }}
            - name: CERTCTL_DISCOVERY_DIRS
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-agent
                  key: discovery-dirs
            {{- end }}
            {{- with .Values.agent.env }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
          resources:
            {{- toYaml .Values.agent.resources | nindent 12 }}
          volumeMounts:
            - name: agent-keys
              mountPath: {{ .Values.agent.keyDir }}
            - name: tmp
              mountPath: /tmp
            - name: server-tls
              mountPath: {{ .Values.server.tls.mountPath }}
              readOnly: true
      volumes:
        - name: agent-keys
          emptyDir:
            sizeLimit: 1Gi
        - name: tmp
          emptyDir: {}
        - name: server-tls
          secret:
            secretName: {{ include "certctl.tls.secretName" . }}
            defaultMode: 0400
{{- else if eq .Values.agent.kind "Deployment" }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "certctl.fullname" . }}-agent
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: agent
spec:
  replicas: {{ .Values.agent.replicas }}
  selector:
    matchLabels:
      {{- include "certctl.agentSelectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "certctl.agentSelectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.agent.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.agent.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: agent
          image: {{ include "certctl.agentImage" . }}
          imagePullPolicy: {{ .Values.agent.image.pullPolicy }}
          env:
            - name: CERTCTL_SERVER_URL
              value: {{ include "certctl.serverURL" . }}
            - name: CERTCTL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: api-key
            - name: CERTCTL_AGENT_NAME
              {{- if .Values.agent.name }}
              value: {{ .Values.agent.name | quote }}
              {{- else }}
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
              {{- end }}
            - name: CERTCTL_KEY_DIR
              value: {{ .Values.agent.keyDir }}
            - name: CERTCTL_SERVER_CA_BUNDLE_PATH
              value: "{{ .Values.server.tls.mountPath }}/ca.crt"
            {{- if .Values.agent.discoveryDirs }}
            - name: CERTCTL_DISCOVERY_DIRS
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-agent
                  key: discovery-dirs
            {{- end }}
            {{- with .Values.agent.env }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
          resources:
            {{- toYaml .Values.agent.resources | nindent 12 }}
          volumeMounts:
            - name: agent-keys
              mountPath: {{ .Values.agent.keyDir }}
            - name: tmp
              mountPath: /tmp
            - name: server-tls
              mountPath: {{ .Values.server.tls.mountPath }}
              readOnly: true
      volumes:
        - name: agent-keys
          emptyDir:
            sizeLimit: 1Gi
        - name: tmp
          emptyDir: {}
        - name: server-tls
          secret:
            secretName: {{ include "certctl.tls.secretName" . }}
            defaultMode: 0400
{{- end }}
{{- end }}
@@ -0,0 +1,51 @@
{{- if .Values.ingress.enabled }}
{{- if and .Values.ingress.certManager.enabled (not .Values.ingress.certManager.issuerRef.name) -}}
{{- fail "\n\ningress.certManager.enabled=true but ingress.certManager.issuerRef.name is empty.\n\nSet:\n --set ingress.certManager.issuerRef.name=<your-issuer-or-clusterissuer>\n\nThis is separate from server.tls.certManager — it issues the external-facing\nIngress cert, not the in-cluster server TLS cert. See docs/tls.md.\n" -}}
{{- end -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "certctl.fullname" . }}
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
  annotations:
    {{- if .Values.ingress.certManager.enabled }}
    {{- if eq .Values.ingress.certManager.issuerRef.kind "ClusterIssuer" }}
    cert-manager.io/cluster-issuer: {{ .Values.ingress.certManager.issuerRef.name | quote }}
    {{- else }}
    cert-manager.io/issuer: {{ .Values.ingress.certManager.issuerRef.name | quote }}
    {{- end }}
    {{- end }}
    {{- with .Values.ingress.annotations }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            pathType: {{ .pathType }}
            backend:
              service:
                name: {{ include "certctl.fullname" $ }}-server
                port:
                  number: {{ $.Values.server.service.port }}
          {{- end }}
    {{- end }}
{{- end }}
@@ -0,0 +1,12 @@
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "certctl.fullname" . }}-postgres
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: postgres
type: Opaque
stringData:
  password: {{ .Values.postgresql.auth.password | default "changeme" | quote }}
  username: {{ .Values.postgresql.auth.username | quote }}
  database: {{ .Values.postgresql.auth.database | quote }}
@@ -0,0 +1,18 @@
{{- if .Values.postgresql.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "certctl.fullname" . }}-postgres
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: postgres
spec:
  clusterIP: None
  ports:
    - port: {{ .Values.postgresql.service.port }}
      targetPort: postgres
      protocol: TCP
      name: postgres
  selector:
    {{- include "certctl.postgresSelectorLabels" . | nindent 4 }}
{{- end }}
@@ -0,0 +1,79 @@
{{- if .Values.postgresql.enabled }}
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ include "certctl.fullname" . }}-postgres
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: postgres
spec:
  serviceName: {{ include "certctl.fullname" . }}-postgres
  replicas: 1
  selector:
    matchLabels:
      {{- include "certctl.postgresSelectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "certctl.postgresSelectorLabels" . | nindent 8 }}
    spec:
      securityContext:
        {{- toYaml .Values.postgresql.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: postgres
          image: {{ include "certctl.postgresImage" . }}
          imagePullPolicy: {{ .Values.postgresql.image.pullPolicy }}
          ports:
            - name: postgres
              containerPort: 5432
              protocol: TCP
          env:
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-postgres
                  key: database
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-postgres
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-postgres
                  key: password
            - name: POSTGRES_INITDB_ARGS
              value: "--encoding=UTF8"
          livenessProbe:
            {{- toYaml .Values.postgresql.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.postgresql.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.postgresql.resources | nindent 12 }}
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: postgres
            - name: postgres-init
              mountPath: /docker-entrypoint-initdb.d
      volumes:
        - name: postgres-init
          emptyDir: {}
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes:
          - ReadWriteOnce
        {{- if .Values.postgresql.storage.storageClass }}
        storageClassName: {{ .Values.postgresql.storage.storageClass }}
        {{- end }}
        resources:
          requests:
            storage: {{ .Values.postgresql.storage.size }}
{{- end }}
@@ -0,0 +1,31 @@
{{- if .Values.server.tls.certManager.enabled }}
{{- include "certctl.tls.required" . }}
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: {{ include "certctl.fullname" . }}-server-tls
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: server
spec:
  secretName: {{ include "certctl.tls.secretName" . }}
  commonName: {{ .Values.server.tls.certManager.commonName | quote }}
  dnsNames:
    {{- range .Values.server.tls.certManager.dnsNames }}
    - {{ . | quote }}
    {{- end }}
  duration: {{ .Values.server.tls.certManager.duration }}
  renewBefore: {{ .Values.server.tls.certManager.renewBefore }}
  usages:
    - server auth
    - digital signature
    - key encipherment
  privateKey:
    algorithm: ECDSA
    size: 256
    rotationPolicy: Always
  issuerRef:
    name: {{ .Values.server.tls.certManager.issuerRef.name | quote }}
    kind: {{ .Values.server.tls.certManager.issuerRef.kind }}
    group: {{ .Values.server.tls.certManager.issuerRef.group }}
{{- end }}
@@ -0,0 +1,37 @@
{{- include "certctl.validateAuthType" . }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "certctl.fullname" . }}-server
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: server
data:
  log-level: {{ .Values.server.logging.level | quote }}
  auth-type: {{ .Values.server.auth.type | quote }}
  keygen-mode: {{ .Values.server.keygen.mode | quote }}
  rate-limit-rps: {{ .Values.server.rateLimiting.rps | quote }}
  rate-limit-burst: {{ .Values.server.rateLimiting.burst | quote }}
  {{- if .Values.server.cors.origins }}
  cors-origins: {{ .Values.server.cors.origins | quote }}
  {{- end }}
  {{- if .Values.server.networkScan.enabled }}
  network-scan-interval: {{ .Values.server.networkScan.interval | quote }}
  {{- end }}
  {{- if .Values.server.est.enabled }}
  est-issuer-id: {{ .Values.server.est.issuerID | quote }}
  {{- if .Values.server.est.profileID }}
  est-profile-id: {{ .Values.server.est.profileID | quote }}
  {{- end }}
  {{- end }}
  {{- if .Values.server.smtp.enabled }}
  smtp-host: {{ .Values.server.smtp.host | quote }}
  smtp-port: {{ .Values.server.smtp.port | quote }}
  smtp-username: {{ .Values.server.smtp.username | quote }}
  smtp-from-address: {{ .Values.server.smtp.fromAddress | quote }}
  {{- end }}
  {{- if .Values.server.issuer.acme.enabled }}
  acme-directory-url: {{ .Values.server.issuer.acme.directoryURL | quote }}
  acme-email: {{ .Values.server.issuer.acme.email | quote }}
  acme-challenge-type: {{ .Values.server.issuer.acme.challengeType | quote }}
  {{- end }}
@@ -0,0 +1,209 @@
{{- include "certctl.tls.required" . }}
{{- include "certctl.validateAuthType" . }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "certctl.fullname" . }}-server
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: server
spec:
  {{- if gt (int .Values.server.replicas) 1 }}
  replicas: {{ .Values.server.replicas }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "certctl.serverSelectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "certctl.serverSelectorLabels" . | nindent 8 }}
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/server-configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/server-secret.yaml") . | sha256sum }}
    spec:
      serviceAccountName: {{ include "certctl.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.server.securityContext | nindent 8 }}
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      containers:
        - name: server
          image: {{ include "certctl.serverImage" . }}
          imagePullPolicy: {{ .Values.server.image.pullPolicy }}
          ports:
            - name: https
              containerPort: {{ .Values.server.port }}
              protocol: TCP
          env:
            - name: CERTCTL_SERVER_HOST
              value: "0.0.0.0"
            - name: CERTCTL_SERVER_PORT
              value: "{{ .Values.server.port }}"
            - name: CERTCTL_SERVER_TLS_CERT_PATH
              value: "{{ .Values.server.tls.mountPath }}/tls.crt"
            - name: CERTCTL_SERVER_TLS_KEY_PATH
              value: "{{ .Values.server.tls.mountPath }}/tls.key"
            - name: CERTCTL_DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: database-url
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-postgres
                  key: password
            - name: CERTCTL_LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: log-level
            - name: CERTCTL_LOG_FORMAT
              value: "json"
            - name: CERTCTL_AUTH_TYPE
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: auth-type
            {{- if eq .Values.server.auth.type "api-key" }}
            - name: CERTCTL_AUTH_SECRET
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: api-key
            {{- end }}
            - name: CERTCTL_KEYGEN_MODE
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: keygen-mode
            - name: CERTCTL_RATE_LIMIT_RPS
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: rate-limit-rps
            - name: CERTCTL_RATE_LIMIT_BURST
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: rate-limit-burst
            {{- if .Values.server.cors.origins }}
            - name: CERTCTL_CORS_ORIGINS
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: cors-origins
            {{- end }}
            {{- if .Values.server.networkScan.enabled }}
            - name: CERTCTL_NETWORK_SCAN_ENABLED
              value: "true"
            - name: CERTCTL_NETWORK_SCAN_INTERVAL
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: network-scan-interval
            {{- end }}
            {{- if .Values.server.est.enabled }}
            - name: CERTCTL_EST_ENABLED
              value: "true"
            - name: CERTCTL_EST_ISSUER_ID
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: est-issuer-id
            {{- if .Values.server.est.profileID }}
            - name: CERTCTL_EST_PROFILE_ID
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: est-profile-id
            {{- end }}
            {{- end }}
            {{- if .Values.server.smtp.enabled }}
            - name: CERTCTL_SMTP_HOST
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: smtp-host
            - name: CERTCTL_SMTP_PORT
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: smtp-port
            - name: CERTCTL_SMTP_USERNAME
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: smtp-username
            - name: CERTCTL_SMTP_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: smtp-password
            - name: CERTCTL_SMTP_FROM_ADDRESS
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: smtp-from-address
            {{- end }}
            {{- if .Values.server.issuer.acme.enabled }}
            - name: CERTCTL_ACME_DIRECTORY_URL
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: acme-directory-url
            - name: CERTCTL_ACME_EMAIL
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: acme-email
            - name: CERTCTL_ACME_CHALLENGE_TYPE
              valueFrom:
                configMapKeyRef:
                  name: {{ include "certctl.fullname" . }}-server
                  key: acme-challenge-type
            {{- end }}
            {{- with .Values.server.env }}
            {{- toYaml . | nindent 12 }}
            {{- end }}
          livenessProbe:
            {{- toYaml .Values.server.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.server.readinessProbe | nindent 12 }}
          resources:
            {{- toYaml .Values.server.resources | nindent 12 }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: tls
              mountPath: {{ .Values.server.tls.mountPath }}
              readOnly: true
            {{- if .Values.server.volumeMounts }}
            {{- toYaml .Values.server.volumeMounts | nindent 12 }}
            {{- end }}
      volumes:
        - name: tmp
          emptyDir: {}
        - name: tls
          secret:
            secretName: {{ include "certctl.tls.secretName" . }}
            defaultMode: 0400
        {{- if .Values.server.volumes }}
        {{- toYaml .Values.server.volumes | nindent 8 }}
        {{- end }}
      {{- if .Values.nodeAffinity }}
      affinity:
        nodeAffinity:
          {{- toYaml .Values.nodeAffinity | nindent 10 }}
      {{- else if .Values.podAntiAffinity }}
      affinity:
        podAntiAffinity:
          {{- toYaml .Values.podAntiAffinity | nindent 10 }}
      {{- else if .Values.podAffinity }}
      affinity:
        podAffinity:
          {{- toYaml .Values.podAffinity | nindent 10 }}
      {{- end }}
@@ -0,0 +1,17 @@
{{- include "certctl.validateAuthType" . }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "certctl.fullname" . }}-server
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: server
type: Opaque
stringData:
  database-url: postgres://{{ .Values.postgresql.auth.username }}:$(POSTGRES_PASSWORD)@{{ include "certctl.fullname" . }}-postgres:5432/{{ .Values.postgresql.auth.database }}?sslmode=disable
  {{- if and (eq .Values.server.auth.type "api-key") .Values.server.auth.apiKey }}
  api-key: {{ .Values.server.auth.apiKey | quote }}
  {{- end }}
  {{- if .Values.server.smtp.enabled }}
  smtp-password: {{ .Values.server.smtp.password | quote }}
  {{- end }}
@@ -0,0 +1,20 @@
apiVersion: v1
kind: Service
metadata:
  name: {{ include "certctl.fullname" . }}-server
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
    app.kubernetes.io/component: server
  {{- with .Values.server.service.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  type: {{ .Values.server.service.type }}
  ports:
    - port: {{ .Values.server.service.port }}
      targetPort: https
      protocol: TCP
      name: https
  selector:
    {{- include "certctl.serverSelectorLabels" . | nindent 4 }}
@@ -0,0 +1,44 @@
{{- if .Values.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "certctl.serviceAccountName" . }}
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
{{- end }}
{{- if .Values.rbac.create }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "certctl.fullname" . }}
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
rules:
  {{- if .Values.kubernetesSecrets.enabled }}
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch"]
  {{- else }}
  []
  {{- end }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "certctl.fullname" . }}
  labels:
    {{- include "certctl.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "certctl.fullname" . }}
subjects:
  - kind: ServiceAccount
    name: {{ include "certctl.serviceAccountName" . }}
    namespace: {{ .Release.Namespace }}
{{- end }}
@@ -0,0 +1,541 @@
# Default values for certctl Helm chart
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Namespace override (optional)
namespace: ""

# Global configuration
commonLabels: {}
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

# ==============================================================================
# Certctl Server Configuration
# ==============================================================================
server:
  # Number of replicas (for HA deployments)
  replicas: 1

  # Image configuration
  image:
    repository: ghcr.io/shankar0123/certctl
    tag: "" # defaults to Chart.appVersion
    pullPolicy: IfNotPresent

  # Server port
  port: 8443

  # Resource requests and limits
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi

  # Pod security context
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    readOnlyRootFilesystem: true
    allowPrivilegeEscalation: false
    capabilities:
      drop:
        - ALL

  # Liveness and readiness probes (HTTPS-only as of v2.2).
  #
  # The two paths exposed for probes are `/health` and `/ready` —
  # registered in internal/api/router/router.go:76-85 and bypassing the
  # auth middleware via the no-auth list at cmd/server/main.go:920.
  # Both serve the same JSON shape today (`{"status":"healthy"}` /
  # `{"status":"ready"}`) but exist as separate routes so liveness and
  # readiness can diverge in the future without renaming.
  livenessProbe:
    httpGet:
      path: /health
      port: https
      scheme: HTTPS
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  # U-2 (P1, cat-u-healthcheck_protocol_mismatch — adjacent fix): pre-U-2
  # the readiness probe pointed at `/readyz`, the conventional kube-flavor
  # name. The certctl server doesn't register `/readyz` (only `/health`
  # and `/ready`) — see cmd/server/main.go:920 and
  # internal/api/router/router.go:81. K8s readiness probes therefore
  # received a 404 (or, with auth enabled, a 401 from the api-key middleware
  # because `/readyz` was NOT in the no-auth bypass set), pods stayed
  # `NotReady` indefinitely, and Helm rollouts stalled. Post-U-2 the path
  # matches a registered route.
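  #
  # Illustrative manual check (a sketch only; <pod-ip> is a placeholder, 8443 is
  # the default server.port, and -k skips certificate verification, so keep it
  # to ad-hoc debugging):
  #   curl -k https://<pod-ip>:8443/ready
  # Per the probe notes in this file, /ready currently answers {"status":"ready"}.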
  readinessProbe:
    httpGet:
      path: /ready
      port: https
      scheme: HTTPS
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 2

  # TLS configuration — REQUIRED. HTTPS is the only supported mode (v2.2+).
  # Operator must configure EXACTLY ONE of:
  #   (a) server.tls.existingSecret: <name>     # pre-existing kubernetes.io/tls Secret
  #   (b) server.tls.certManager.enabled: true  # provision a cert-manager Certificate CR
  # Leaving both unset makes `helm template` fail with a diagnostic pointing at docs/tls.md.
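  #
  # Illustrative values overrides (a sketch; the Secret and issuer names below
  # are placeholders, not chart defaults):
  #
  #   # (a) bring your own kubernetes.io/tls Secret
  #   server:
  #     tls:
  #       existingSecret: my-certctl-tls
  #
  #   # (b) let cert-manager provision the Secret
  #   server:
  #     tls:
  #       certManager:
  #         enabled: true
  #         issuerRef:
  #           name: my-cluster-issuer
  #           kind: ClusterIssuer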
  tls:
    # Name of a pre-existing Secret (type kubernetes.io/tls) holding tls.crt + tls.key (+ optional ca.crt).
    # Leave empty to fall through to the cert-manager path.
    existingSecret: ""

    # Mount path for the TLS Secret inside the server + agent containers.
    mountPath: /etc/certctl/tls

    # cert-manager auto-provisioning. Opt-in (off by default per milestone §3.4).
    certManager:
      enabled: false

      # Secret name the cert-manager Certificate CR writes into. Agents and the server
      # both read from this Secret. If empty, defaults to "<fullname>-tls".
      secretName: ""

      # Cert-manager issuer reference.
      issuerRef:
        name: "" # e.g. "letsencrypt-prod" or "internal-ca"
        kind: ClusterIssuer # ClusterIssuer or Issuer
        group: cert-manager.io

      # Subject fields on the issued cert.
      commonName: "certctl-server"
      dnsNames:
        - certctl-server
        - localhost

      # Certificate lifetime + renewal window.
      duration: 2160h # 90 days
      renewBefore: 360h # 15 days

  # Service type (ClusterIP, LoadBalancer, NodePort)
  service:
    type: ClusterIP
    port: 8443
    annotations: {}

  # Authentication configuration.
  # Valid types: "api-key" (production) or "none" (demo only — disables
  # authentication on the API and logs a loud Warn at server startup).
  # For JWT/OIDC, run an authenticating gateway in front of certctl
  # (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium)
  # and set type=none here so the gateway terminates federated identity.
  # See docs/architecture.md "Authenticating-gateway pattern".
  #
  # G-1 (P1): pre-G-1 the chart accepted server.auth.type=jwt and the
  # certctl-server container silently routed every request through the
  # api-key bearer middleware — silent auth downgrade. Post-G-1 the
  # chart's `certctl.validateAuthType` template helper rejects any value
  # outside {api-key, none} at template time. See
  # docs/upgrade-to-v2-jwt-removal.md if you previously set type=jwt.
|
||||
auth:
|
||||
type: api-key
|
||||
apiKey: "" # REQUIRED when type=api-key (set via --set or values override).
|
||||
|
||||
# Logging configuration
|
||||
logging:
|
||||
level: info # debug, info, warn, error
|
||||
format: json # json or text
|
||||
|
||||
# SMTP configuration for email notifications (optional)
|
||||
smtp:
|
||||
enabled: false
|
||||
host: ""
|
||||
port: 587
|
||||
username: ""
|
||||
password: ""
|
||||
fromAddress: ""
|
||||
useTLS: true
|
||||
|
||||
# Certificate digest (periodic email summary)
|
||||
digest:
|
||||
enabled: false
|
||||
interval: "24h"
|
||||
recipients: []
|
||||
# Example:
|
||||
# - admin@example.com
|
||||
# - ops@example.com
|
||||
|
||||
# Enrollment over Secure Transport (EST) configuration
|
||||
est:
|
||||
enabled: false
|
||||
issuerID: "iss-local"
|
||||
profileID: ""
|
||||
|
||||
# Rate limiting configuration
|
||||
rateLimiting:
|
||||
rps: 100 # Requests per second
|
||||
burst: 200 # Burst capacity
|
||||
|
||||
# Network scanning configuration
|
||||
networkScan:
|
||||
enabled: false
|
||||
interval: "6h"
|
||||
|
||||
# Certificate key generation mode
|
||||
keygen:
|
||||
mode: agent # Options: agent (production), server (demo with warning)
|
||||
|
||||
# CORS configuration
|
||||
cors:
|
||||
origins: "" # Comma-separated list, empty means deny all cross-origin requests
|
||||
|
||||
# Issuer connectors configuration
|
||||
issuer:
|
||||
local:
|
||||
enabled: true
|
||||
# For sub-CA mode, provide these paths:
|
||||
# caCertPath: /path/to/ca.crt
|
||||
# caKeyPath: /path/to/ca.key
|
||||
|
||||
acme:
|
||||
enabled: false
|
||||
directoryURL: ""
|
||||
email: ""
|
||||
challengeType: "http-01" # Options: http-01, dns-01, dns-persist-01
|
||||
# DNS configuration (for dns-01 or dns-persist-01)
|
||||
# dnsPresentScript: /path/to/dns-present.sh
|
||||
# dnsCleanupScript: /path/to/dns-cleanup.sh
|
||||
# dnsPropagationWait: "30s"
|
||||
# dnsPersistIssuerDomain: "validation.example.com"
|
||||
# EAB configuration (for ZeroSSL, Google Trust Services, etc.)
|
||||
# eabKid: ""
|
||||
# eabHmac: ""
|
||||
|
||||
stepca:
|
||||
enabled: false
|
||||
# rootCAPath: /path/to/root_ca.crt
|
||||
# intermediateCAPath: /path/to/intermediate_ca.crt
|
||||
# provisionerName: ""
|
||||
# provisionerPassword: ""
|
||||
|
||||
openssl:
|
||||
enabled: false
|
||||
# signScript: /path/to/sign.sh
|
||||
# revokeScript: /path/to/revoke.sh
|
||||
# crlScript: /path/to/crl.sh
|
||||
# timeoutSeconds: 30
|
||||
|
||||
# Notifier connectors configuration
|
||||
notifiers:
|
||||
slack:
|
||||
enabled: false
|
||||
# webhookUrl: ""
|
||||
# channel: ""
|
||||
# username: ""
|
||||
# iconEmoji: ""
|
||||
|
||||
teams:
|
||||
enabled: false
|
||||
# webhookUrl: ""
|
||||
|
||||
pagerduty:
|
||||
enabled: false
|
||||
# routingKey: ""
|
||||
# severity: warning
|
||||
|
||||
opsgenie:
|
||||
enabled: false
|
||||
# apiKey: ""
|
||||
# priority: P3
|
||||
|
||||
# Additional environment variables
|
||||
# Will be passed as-is to the server container
|
||||
env: {}
|
||||
# Example:
|
||||
# CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL: "1h"
|
||||
# CERTCTL_DATABASE_MAX_CONNS: "25"
|
||||
|
||||
# Additional volume mounts for custom configurations
|
||||
# volumeMounts: []
|
||||
# - name: ca-cert
|
||||
# mountPath: /etc/ssl/certs/ca.crt
|
||||
# subPath: ca.crt
|
||||
|
||||
# Additional volumes
|
||||
# volumes: []
|
||||
# - name: ca-cert
|
||||
# secret:
|
||||
# secretName: ca-cert
|
||||
|
||||
# ==============================================================================
|
||||
# PostgreSQL Configuration
|
||||
# ==============================================================================
|
||||
postgresql:
|
||||
# Enable/disable PostgreSQL (set to false if using external database)
|
||||
enabled: true
|
||||
|
||||
# Image configuration
|
||||
image:
|
||||
repository: postgres
|
||||
tag: "16-alpine"
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
# Authentication
|
||||
auth:
|
||||
database: certctl
|
||||
username: certctl
|
||||
# REQUIRED — set via `--set postgresql.auth.password=<value>` or values override.
|
||||
#
|
||||
# WARNING (U-1): rotating this value after first deploy does NOT change the
|
||||
# database password. The `postgres:16-alpine` image runs `initdb` only when
|
||||
# /var/lib/postgresql/data is empty, so POSTGRES_PASSWORD is written into
|
||||
# pg_authid exactly once — on the first boot of the StatefulSet's PVC.
|
||||
# Subsequent rollouts inject the new value into the postgres container,
|
||||
# which ignores it, and into the certctl-server container's
|
||||
# CERTCTL_DATABASE_URL, while pg_authid still expects the old one. The result is
|
||||
# `pq: password authentication failed for user "certctl"` (SQLSTATE 28P01).
|
||||
#
|
||||
# The certctl-server emits guidance via internal/repository/postgres/db.go::
|
||||
# wrapPingError when it sees SQLSTATE 28P01 at startup. To resolve in a
|
||||
# Helm deployment:
|
||||
# - Non-destructive (preferred for environments with data):
|
||||
# kubectl exec -it <release>-postgres-0 -- \
|
||||
# psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new>';"
|
||||
# then update the secret/values to match and let the certctl-server
|
||||
# pod restart against the matching credential.
|
||||
# - Destructive (DESTROYS DATA — only acceptable on dev/demo PVCs):
|
||||
# helm uninstall <release> && \
|
||||
# kubectl delete pvc -l app.kubernetes.io/name=certctl,app.kubernetes.io/component=postgres && \
|
||||
# helm install <release> ... # PVC re-creates empty, initdb seeds new password
|
||||
password: ""
|
||||
|
||||
# Storage configuration
|
||||
storage:
|
||||
size: 10Gi
|
||||
storageClass: "" # Uses default StorageClass if empty
|
||||
# deleteOnTermination: false # Keep data on Helm uninstall
|
||||
|
||||
# Resource requests and limits
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 256Mi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 512Mi
|
||||
|
||||
# Pod security context
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 999
|
||||
runAsGroup: 999
|
||||
fsGroup: 999
|
||||
|
||||
# Liveness and readiness probes
|
||||
livenessProbe:
|
||||
exec:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- pg_isready -U certctl -d certctl
|
||||
initialDelaySeconds: 10
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 3
|
||||
|
||||
readinessProbe:
|
||||
exec:
|
||||
command:
|
||||
- /bin/sh
|
||||
- -c
|
||||
- pg_isready -U certctl -d certctl
|
||||
initialDelaySeconds: 5
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 2
|
||||
|
||||
# Service configuration
|
||||
service:
|
||||
type: ClusterIP
|
||||
port: 5432
|
||||
|
||||
# PostgreSQL-specific settings
|
||||
postgresqlConfig: {}
|
||||
# Example:
|
||||
# max_connections: "200"
|
||||
# shared_buffers: "256MB"
|
||||
|
||||
# ==============================================================================
|
||||
# Certctl Agent Configuration
|
||||
# ==============================================================================
|
||||
agent:
|
||||
# Enable/disable agent deployment
|
||||
enabled: true
|
||||
|
||||
# Workload kind: DaemonSet (recommended) or Deployment
|
||||
kind: DaemonSet # Options: DaemonSet, Deployment
|
||||
|
||||
# Image configuration
|
||||
image:
|
||||
repository: ghcr.io/shankar0123/certctl-agent
|
||||
tag: "" # defaults to Chart.appVersion
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
# Number of replicas (for Deployment kind; ignored for DaemonSet)
|
||||
replicas: 1
|
||||
|
||||
# Resource requests and limits
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 64Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
|
||||
# Pod security context
|
||||
securityContext:
|
||||
runAsNonRoot: true
|
||||
runAsUser: 1000
|
||||
runAsGroup: 1000
|
||||
fsGroup: 1000
|
||||
readOnlyRootFilesystem: true
|
||||
allowPrivilegeEscalation: false
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
|
||||
# Agent name (can be overridden per pod via StatefulSet ordinals)
|
||||
name: "" # If empty, uses release name
|
||||
|
||||
# Key storage directory
|
||||
keyDir: /var/lib/certctl/keys
|
||||
|
||||
# Certificate discovery directories (comma-separated)
|
||||
discoveryDirs: ""
|
||||
# Example: "/etc/ssl/certs,/etc/pki/tls"
|
||||
|
||||
# Node selector for agent pods (for DaemonSet)
|
||||
nodeSelector: {}
|
||||
# Example:
|
||||
# node-role.kubernetes.io/worker: "true"
|
||||
|
||||
# Tolerations for agent pods
|
||||
tolerations: []
|
||||
# Example:
|
||||
# - key: node-role
|
||||
# operator: Equal
|
||||
# value: worker
|
||||
# effect: NoSchedule
|
||||
|
||||
# Affinity rules
|
||||
affinity: {}
|
||||
|
||||
# Additional environment variables
|
||||
env: {}
|
||||
|
||||
# ==============================================================================
|
||||
# Ingress Configuration
|
||||
# ==============================================================================
|
||||
ingress:
|
||||
enabled: false
|
||||
className: ""
|
||||
annotations: {}
|
||||
# kubernetes.io/ingress.class: nginx
|
||||
|
||||
# Optional cert-manager integration for the public-facing Ingress cert.
|
||||
# This is completely independent of server.tls.* — the Ingress terminates
|
||||
# an *additional* TLS hop between the internet and the in-cluster Service.
|
||||
# Leave disabled unless an Ingress is exposing certctl to the outside world.
|
||||
certManager:
|
||||
enabled: false
|
||||
issuerRef:
|
||||
name: "" # e.g. "letsencrypt-prod"
|
||||
kind: ClusterIssuer # ClusterIssuer or Issuer
|
||||
hosts:
|
||||
- host: certctl.local
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls: []
|
||||
# - secretName: certctl-tls
|
||||
# hosts:
|
||||
# - certctl.local
|
||||
|
||||
# ==============================================================================
|
||||
# Service Account Configuration
|
||||
# ==============================================================================
|
||||
serviceAccount:
|
||||
create: true
|
||||
annotations: {}
|
||||
name: "" # defaults to release name if empty
|
||||
|
||||
# ==============================================================================
|
||||
# RBAC Configuration
|
||||
# ==============================================================================
|
||||
rbac:
|
||||
create: true
|
||||
|
||||
# ==============================================================================
|
||||
# Kubernetes Secrets Target Connector
|
||||
# ==============================================================================
|
||||
kubernetesSecrets:
|
||||
# Enable RBAC rules for managing TLS Secrets
|
||||
enabled: false
|
||||
|
||||
# ==============================================================================
|
||||
# Pod Disruption Budget (for HA deployments)
|
||||
# ==============================================================================
|
||||
podDisruptionBudget:
|
||||
enabled: false
|
||||
minAvailable: 1
|
||||
# maxUnavailable: 1
|
||||
|
||||
# ==============================================================================
|
||||
# Monitoring Configuration
|
||||
# ==============================================================================
|
||||
monitoring:
|
||||
enabled: false
|
||||
# Prometheus ServiceMonitor
|
||||
serviceMonitor:
|
||||
enabled: false
|
||||
interval: 30s
|
||||
scrapeTimeout: 10s
|
||||
# labels: {}
|
||||
# selector: {}
|
||||
|
||||
# ==============================================================================
|
||||
# Advanced Configuration
|
||||
# ==============================================================================
|
||||
|
||||
# Node affinity for server pods
|
||||
nodeAffinity: {}
|
||||
|
||||
# Pod affinity for server pods
|
||||
podAffinity: {}
|
||||
|
||||
# Pod anti-affinity for server pods (for HA)
|
||||
podAntiAffinity: {}
|
||||
# Example:
|
||||
# podAntiAffinity:
|
||||
# preferredDuringSchedulingIgnoredDuringExecution:
|
||||
# - weight: 100
|
||||
# podAffinityTerm:
|
||||
# labelSelector:
|
||||
# matchExpressions:
|
||||
# - key: app.kubernetes.io/name
|
||||
# operator: In
|
||||
# values:
|
||||
# - certctl
|
||||
# topologyKey: kubernetes.io/hostname
|
||||
|
||||
# Custom labels for all resources
|
||||
customLabels: {}
|
||||
|
||||
# Custom annotations for all resources
|
||||
customAnnotations: {}
|
||||
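The values file above describes two template-time guards: `server.auth.type` must be `api-key` or `none`, and exactly one TLS source must be configured. The chart implements these as Helm template helpers; the Go program below is only a sketch of the same decision logic, not the chart's code. The `TLSValues` type and both function names are hypothetical, and rejecting the "both set" case is an assumption drawn from the "EXACTLY ONE" wording.

```go
package main

import (
	"errors"
	"fmt"
)

// TLSValues is a hypothetical mirror of the server.tls.* keys.
type TLSValues struct {
	ExistingSecret     string // server.tls.existingSecret
	CertManagerEnabled bool   // server.tls.certManager.enabled
}

// validateAuthType accepts only the two values the values file documents.
func validateAuthType(authType string) error {
	switch authType {
	case "api-key", "none":
		return nil
	default:
		return fmt.Errorf("server.auth.type=%q is not supported; use \"api-key\" or \"none\"", authType)
	}
}

// validateTLS enforces "exactly one of existingSecret or certManager.enabled".
// Treating "both set" as an error is an assumption of this sketch.
func validateTLS(v TLSValues) error {
	hasSecret := v.ExistingSecret != ""
	switch {
	case hasSecret && v.CertManagerEnabled:
		return errors.New("set only one of server.tls.existingSecret and server.tls.certManager.enabled")
	case !hasSecret && !v.CertManagerEnabled:
		return errors.New("set server.tls.existingSecret or server.tls.certManager.enabled (see docs/tls.md)")
	}
	return nil
}

func main() {
	fmt.Println(validateAuthType("jwt"))
	fmt.Println(validateTLS(TLSValues{ExistingSecret: "certctl-tls"}))
}
```

Failing at template time rather than at pod start keeps a misconfigured release from ever reaching the cluster, which is the point of the G-1 change described above.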
@@ -0,0 +1,77 @@
|
||||
# Certctl with ACME DNS-01 Challenge (Let's Encrypt)
|
||||
# Enables automatic certificate issuance from Let's Encrypt
|
||||
# using DNS-01 verification (wildcard-capable)
|
||||
|
||||
server:
|
||||
auth:
|
||||
type: api-key
|
||||
apiKey: "CHANGE_ME"
|
||||
|
||||
issuer:
|
||||
local:
|
||||
enabled: true
|
||||
|
||||
acme:
|
||||
enabled: true
|
||||
directoryURL: https://acme-v02.api.letsencrypt.org/directory
|
||||
email: admin@example.com
|
||||
challengeType: dns-01
|
||||
dnsPresentScript: /scripts/dns-present.sh
|
||||
dnsCleanupScript: /scripts/dns-cleanup.sh
|
||||
dnsPropagationWait: 30s
|
||||
# For DNS-PERSIST-01 (standing validation record, no per-renewal updates):
|
||||
# challengeType: dns-persist-01
|
||||
# dnsPersistIssuerDomain: validation.example.com
|
||||
|
||||
# Mount DNS scripts as ConfigMap
|
||||
volumes:
|
||||
- name: dns-scripts
|
||||
configMap:
|
||||
name: dns-scripts
|
||||
defaultMode: 0755
|
||||
|
||||
volumeMounts:
|
||||
- name: dns-scripts
|
||||
mountPath: /scripts
|
||||
readOnly: true
|
||||
|
||||
postgresql:
|
||||
enabled: true
|
||||
storage:
|
||||
size: 20Gi
|
||||
|
||||
agent:
|
||||
enabled: true
|
||||
kind: DaemonSet
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
hosts:
|
||||
- host: certctl.example.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
|
||||
---
|
||||
# You'll need to create the DNS scripts ConfigMap separately:
|
||||
#
|
||||
# kubectl create configmap dns-scripts \
|
||||
# --from-file=dns-present.sh=./scripts/dns-present.sh \
|
||||
# --from-file=dns-cleanup.sh=./scripts/dns-cleanup.sh
|
||||
#
|
||||
# Example dns-present.sh (Cloudflare):
|
||||
# #!/bin/bash
|
||||
# DOMAIN=$1
|
||||
# TOKEN=$2
|
||||
#
|
||||
# curl -X POST "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records" \
|
||||
# -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
|
||||
# -d "{\"type\":\"TXT\",\"name\":\"_acme-challenge.${DOMAIN}\",\"content\":\"${TOKEN}\"}"
|
||||
#
|
||||
# Example dns-cleanup.sh (Cloudflare):
|
||||
# #!/bin/bash
|
||||
# DOMAIN=$1
|
||||
#
|
||||
# curl -X DELETE "https://api.cloudflare.com/client/v4/zones/{zone_id}/dns_records/{record_id}" \
|
||||
# -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}"
|
||||
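The example `dns-present.sh` above builds its JSON body by hand-escaping quotes inside a shell string, which breaks if the domain or value contains a quote or backslash. As a sketch of a safer way to build the same body, the Go program below serializes it with `encoding/json`. The `txtRecord` type and `challengePayload` function are hypothetical names; stripping a leading `*.` for wildcard names is an assumption about how the domain is passed in, not something the chart documents.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// txtRecord carries the same three fields as the curl example above.
type txtRecord struct {
	Type    string `json:"type"`
	Name    string `json:"name"`
	Content string `json:"content"`
}

// challengePayload returns the JSON body for the _acme-challenge TXT record.
func challengePayload(domain, value string) ([]byte, error) {
	// Wildcard and non-wildcard names share one challenge record name.
	domain = strings.TrimSuffix(strings.TrimPrefix(domain, "*."), ".")
	if domain == "" {
		return nil, errors.New("empty domain")
	}
	return json.Marshal(txtRecord{
		Type:    "TXT",
		Name:    "_acme-challenge." + domain,
		Content: value,
	})
}

func main() {
	body, err := challengePayload("*.example.com", "abc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```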
@@ -0,0 +1,99 @@
|
||||
# Certctl Development Configuration
|
||||
# Lightweight setup for development and testing
|
||||
# - Single server replica
|
||||
# - Small PostgreSQL storage
|
||||
# - Minimal resource limits
|
||||
# - No ingress or monitoring
|
||||
# - Demo auth mode (no API key required)
|
||||
|
||||
server:
|
||||
replicas: 1
|
||||
|
||||
image:
|
||||
repository: ghcr.io/shankar0123/certctl
|
||||
pullPolicy: IfNotPresent # tag is unset, so it defaults to Chart.appVersion
|
||||
|
||||
port: 8443
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 64Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
|
||||
auth:
|
||||
type: none # Demo mode - no authentication
|
||||
|
||||
logging:
|
||||
level: debug
|
||||
format: json
|
||||
|
||||
service:
|
||||
type: LoadBalancer # Easy external access for dev
|
||||
|
||||
issuer:
|
||||
local:
|
||||
enabled: true
|
||||
|
||||
rateLimiting:
|
||||
rps: 100
|
||||
burst: 200
|
||||
|
||||
postgresql:
|
||||
enabled: true
|
||||
|
||||
image:
|
||||
repository: postgres
|
||||
tag: "16-alpine"
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
auth:
|
||||
database: certctl
|
||||
username: certctl
|
||||
password: "dev-password-change-me"
|
||||
|
||||
storage:
|
||||
size: 5Gi
|
||||
storageClass: "" # Use default storage class
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
|
||||
agent:
|
||||
enabled: true
|
||||
kind: Deployment
|
||||
replicas: 1
|
||||
|
||||
image:
|
||||
repository: ghcr.io/shankar0123/certctl-agent
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 25m
|
||||
memory: 32Mi
|
||||
limits:
|
||||
cpu: 100m
|
||||
memory: 128Mi
|
||||
|
||||
ingress:
|
||||
enabled: false
|
||||
|
||||
serviceAccount:
|
||||
create: true
|
||||
|
||||
rbac:
|
||||
create: true
|
||||
|
||||
monitoring:
|
||||
enabled: false
|
||||
|
||||
customLabels:
|
||||
environment: development
|
||||
@@ -0,0 +1,50 @@
|
||||
# Certctl with External PostgreSQL Database
|
||||
# Use this when PostgreSQL is managed externally:
|
||||
# - AWS RDS
|
||||
# - Cloud SQL (Google Cloud)
|
||||
# - Azure Database for PostgreSQL
|
||||
# - Self-managed PostgreSQL server
|
||||
|
||||
server:
|
||||
replicas: 2
|
||||
|
||||
auth:
|
||||
type: api-key
|
||||
apiKey: "CHANGE_ME"
|
||||
|
||||
issuer:
|
||||
local:
|
||||
enabled: true
|
||||
|
||||
# Pass external database URL via environment variable
|
||||
env:
|
||||
CERTCTL_DATABASE_URL: "postgres://certctl:CHANGE_ME@postgres.example.com:5432/certctl?sslmode=require"
|
||||
|
||||
# Disable internal PostgreSQL
|
||||
postgresql:
|
||||
enabled: false
|
||||
|
||||
agent:
|
||||
enabled: true
|
||||
kind: DaemonSet
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
hosts:
|
||||
- host: certctl.example.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
|
||||
# For AWS RDS with IAM authentication:
|
||||
# env:
|
||||
# CERTCTL_DATABASE_URL: "postgres://certctl:CHANGE_ME@mydb.123456789.us-east-1.rds.amazonaws.com:5432/certctl?sslmode=require"
|
||||
|
||||
# For Google Cloud SQL:
|
||||
# env:
|
||||
# CERTCTL_DATABASE_URL: "postgres://certctl:CHANGE_ME@/certctl?host=/cloudsql/PROJECT:REGION:INSTANCE&sslmode=require"
|
||||
|
||||
# For Azure Database:
|
||||
# env:
|
||||
# CERTCTL_DATABASE_URL: "postgres://certctl@servername:CHANGE_ME@servername.postgres.database.azure.com:5432/certctl?sslmode=require"
|
||||
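The connection strings above embed the username and password directly in the URL, so characters such as `@`, `/`, `:` and `?` in either field need percent-encoding. The Go program below is a minimal sketch of building `CERTCTL_DATABASE_URL` with `net/url` so that the encoding is handled for you. The `databaseURL` helper is a hypothetical name and is not part of certctl.

```go
package main

import (
	"fmt"
	"net/url"
)

// databaseURL assembles a postgres:// URL and percent-encodes the
// credentials, so an Azure-style "user@server" login stays unambiguous.
func databaseURL(user, password, host string, port int, dbName, sslMode string) string {
	u := url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password),
		Host:     fmt.Sprintf("%s:%d", host, port),
		Path:     "/" + dbName,
		RawQuery: url.Values{"sslmode": {sslMode}}.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(databaseURL("certctl", "CHANGE_ME", "postgres.example.com", 5432, "certctl", "require"))
	fmt.Println(databaseURL("certctl@servername", "p@ss/word", "servername.postgres.database.azure.com", 5432, "certctl", "require"))
}
```

The resulting string can be supplied through `server.env` as in the examples above, or preferably from a Secret so the password does not sit in a values file.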
@@ -0,0 +1,159 @@
|
||||
# Certctl Production HA Configuration
|
||||
# High availability deployment with:
|
||||
# - 3 server replicas with pod anti-affinity
|
||||
# - Large PostgreSQL storage
|
||||
# - Resource limits for production
|
||||
# - Prometheus monitoring
|
||||
# - Network policies enforcement
|
||||
|
||||
namespace: certctl
|
||||
|
||||
server:
|
||||
replicas: 3
|
||||
|
||||
image:
|
||||
repository: ghcr.io/shankar0123/certctl
|
||||
tag: "2.1.0"
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
port: 8443
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 250m
|
||||
memory: 256Mi
|
||||
limits:
|
||||
cpu: 1000m
|
||||
memory: 512Mi
|
||||
|
||||
auth:
|
||||
type: api-key
|
||||
apiKey: "CHANGE_ME_IN_PRODUCTION" # Use --set or sealed-secrets
|
||||
|
||||
logging:
|
||||
level: info
|
||||
format: json
|
||||
|
||||
service:
|
||||
type: ClusterIP
|
||||
annotations:
|
||||
prometheus.io/scrape: "true"
|
||||
prometheus.io/port: "8443"
|
||||
prometheus.io/path: "/api/v1/metrics/prometheus"
|
||||
|
||||
issuer:
|
||||
local:
|
||||
enabled: true
|
||||
acme:
|
||||
enabled: true
|
||||
directoryURL: https://acme-v02.api.letsencrypt.org/directory
|
||||
email: admin@example.com
|
||||
challengeType: dns-01
|
||||
|
||||
rateLimiting:
|
||||
rps: 500
|
||||
burst: 1000
|
||||
|
||||
postgresql:
|
||||
enabled: true
|
||||
|
||||
image:
|
||||
repository: postgres
|
||||
tag: "16-alpine"
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
auth:
|
||||
database: certctl
|
||||
username: certctl
|
||||
password: "CHANGE_ME_IN_PRODUCTION" # Use --set or sealed-secrets
|
||||
|
||||
storage:
|
||||
size: 100Gi
|
||||
storageClass: "fast-ssd" # Use your high-performance storage class
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 500m
|
||||
memory: 512Mi
|
||||
limits:
|
||||
cpu: 2000m
|
||||
memory: 2Gi
|
||||
|
||||
agent:
|
||||
enabled: true
|
||||
kind: DaemonSet
|
||||
|
||||
image:
|
||||
repository: ghcr.io/shankar0123/certctl-agent
|
||||
tag: "2.1.0"
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
resources:
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 256Mi
|
||||
|
||||
discoveryDirs: "/etc/ssl/certs,/etc/pki/tls,/etc/ssl"
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
nginx.ingress.kubernetes.io/ssl-redirect: "true"
|
||||
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
|
||||
hosts:
|
||||
- host: certctl.example.com
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: certctl-tls
|
||||
hosts:
|
||||
- certctl.example.com
|
||||
|
||||
serviceAccount:
|
||||
create: true
|
||||
annotations:
|
||||
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/certctl-role # For IRSA on AWS
|
||||
|
||||
rbac:
|
||||
create: true
|
||||
|
||||
podDisruptionBudget:
|
||||
enabled: true
|
||||
minAvailable: 2
|
||||
|
||||
monitoring:
|
||||
enabled: true
|
||||
serviceMonitor:
|
||||
enabled: true
|
||||
interval: 30s
|
||||
scrapeTimeout: 10s
|
||||
|
||||
# Pod anti-affinity for HA
|
||||
podAntiAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
- labelSelector:
|
||||
matchExpressions:
|
||||
- key: app.kubernetes.io/name
|
||||
operator: In
|
||||
values:
|
||||
- certctl
|
||||
- key: app.kubernetes.io/component
|
||||
operator: In
|
||||
values:
|
||||
- server
|
||||
topologyKey: kubernetes.io/hostname
|
||||
|
||||
customLabels:
|
||||
environment: production
|
||||
team: platform
|
||||
cost-center: ops
|
||||
|
||||
customAnnotations:
|
||||
slack-alerts: "#ops"
|
||||
backup-policy: daily
|
||||
@@ -0,0 +1,216 @@
|
||||
//go:build integration
|
||||
|
||||
// Package integration_test — image-level HEALTHCHECK contract.
|
||||
//
|
||||
// U-2 (P1, cat-u-healthcheck_protocol_mismatch): pre-U-2 the published
|
||||
// server image's Dockerfile HEALTHCHECK called `curl -f http://localhost:
|
||||
// 8443/health` against an HTTPS-only listener (HTTPS-Everywhere milestone,
|
||||
// v2.2 / tag v2.0.47). Operators outside docker-compose / Helm saw the
|
||||
// container reported as `unhealthy` indefinitely. The compose stack
|
||||
// overrode this HEALTHCHECK with `--cacert + https://`; the Helm chart
|
||||
// uses explicit `httpGet` probes that ignore Docker's HEALTHCHECK; the 5
|
||||
// example compose files all override with `curl -sfk https://localhost:
|
||||
// 8443/health`. So the observable failure was scoped to bare `docker run`
|
||||
// / Docker Swarm / Nomad / ECS users — exactly the "I just pulled the
|
||||
// published image" path.
|
||||
//
|
||||
// This file's tests pin the contract at the binary-image level. The
|
||||
// matching CI grep guardrail in .github/workflows/ci.yml catches the
|
||||
// regression at the Dockerfile-source level; both layers are needed
|
||||
// because someone could replace the HEALTHCHECK line with a sibling
|
||||
// broken pattern that the grep doesn't catch (e.g., a TCP-only check
|
||||
// against the HTTPS port).
|
||||
//
|
||||
// Run alongside the rest of the integration suite:
|
||||
//
|
||||
// cd deploy/test && go test -tags integration -v -run Healthcheck
|
||||
//
|
||||
// The tests skip cleanly with t.Skip when docker is not available
|
||||
// (CI without docker-in-docker, sandbox environments, etc.) so they
|
||||
// don't block local development on machines without docker.
|
||||
package integration_test
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"os/exec"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
// dockerAvailable reports whether `docker version` exits with status 0.
|
||||
// The result is not cached, so each test that calls it re-runs the probe.
|
||||
func dockerAvailable(t *testing.T) bool {
|
||||
t.Helper()
|
||||
cmd := exec.Command("docker", "version", "--format", "{{.Server.Version}}")
|
||||
out, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
t.Logf("docker not available: %v\noutput: %s", err, string(out))
|
||||
return false
|
||||
}
|
||||
return true
|
||||
}
|
||||
|
||||
// dockerCmd runs `docker <args...>` with the caller-supplied timeout, returning stdout
|
||||
// + stderr combined and the exit error if any. Used for short-lived
|
||||
// probes (inspect, build, run -d).
|
||||
func dockerCmd(t *testing.T, timeout time.Duration, args ...string) (string, error) {
|
||||
t.Helper()
|
||||
cmd := exec.Command("docker", args...)
|
||||
done := make(chan struct{})
|
||||
var out []byte
|
||||
var err error
|
||||
go func() {
|
||||
out, err = cmd.CombinedOutput()
|
||||
close(done)
|
||||
}()
|
||||
select {
|
||||
case <-done:
|
||||
return string(out), err
|
||||
case <-time.After(timeout):
|
||||
_ = cmd.Process.Kill()
|
||||
t.Fatalf("docker %v timed out after %v", args, timeout)
|
||||
return "", err
|
||||
}
|
||||
}
|
||||
|
||||
// TestPublishedServerImage_HealthcheckSpecUsesHTTPS pins the shipped
|
||||
// image-level HEALTHCHECK spec: the inspected image's Healthcheck.Test
|
||||
// array MUST contain "https://localhost:8443/health" (and MUST NOT
|
||||
// contain "http://localhost:8443/health"). This is the lightweight half
|
||||
// of the contract — it doesn't require running the container, only
|
||||
// building it. It catches the audit-flagged bug directly.
|
||||
func TestPublishedServerImage_HealthcheckSpecUsesHTTPS(t *testing.T) {
|
||||
if !dockerAvailable(t) {
|
||||
t.Skip("docker not available — skipping image-level HEALTHCHECK test")
|
||||
}
|
||||
|
||||
const imgTag = "certctl-u2-healthcheck-spec-test"
|
||||
t.Cleanup(func() {
|
||||
_, _ = dockerCmd(t, 30*time.Second, "rmi", "-f", imgTag)
|
||||
})
|
||||
|
||||
// Build the server image. Use the repo root as context (this test
|
||||
// file lives at deploy/test/, the Dockerfile at the repo root).
|
||||
buildOut, err := dockerCmd(t, 5*time.Minute,
|
||||
"build", "-f", "../../Dockerfile", "-t", imgTag, "../..")
|
||||
if err != nil {
|
||||
t.Fatalf("docker build failed: %v\noutput:\n%s", err, buildOut)
|
||||
}
|
||||
|
||||
// Inspect the shipped HEALTHCHECK metadata.
|
||||
inspectOut, err := dockerCmd(t, 30*time.Second,
|
||||
"inspect", "--format", "{{json .Config.Healthcheck}}", imgTag)
|
||||
if err != nil {
|
||||
t.Fatalf("docker inspect failed: %v\noutput:\n%s", err, inspectOut)
|
||||
}
|
||||
|
||||
var hc struct {
|
||||
Test []string
|
||||
Interval int64
|
||||
Timeout int64
|
||||
}
|
||||
if err := json.Unmarshal([]byte(strings.TrimSpace(inspectOut)), &hc); err != nil {
|
||||
t.Fatalf("could not parse Healthcheck JSON %q: %v", inspectOut, err)
|
||||
}
|
||||
|
||||
joined := strings.Join(hc.Test, " ")
|
||||
|
||||
// Positive contract.
|
||||
if !strings.Contains(joined, "https://localhost:8443/health") {
|
||||
t.Errorf("Healthcheck.Test does not target https://localhost:8443/health\nfull: %v", hc.Test)
|
||||
}
|
||||
|
||||
// Negative contract — pre-U-2 regression shape MUST be absent.
|
||||
if strings.Contains(joined, "http://localhost:8443/health") {
|
||||
t.Errorf("Healthcheck.Test still contains the pre-U-2 plaintext shape: %v", hc.Test)
|
||||
}
|
||||
|
||||
// `-k` (or `--insecure`) must be present because the bootstrap cert
|
||||
// is per-deploy and the published image can't pin a CA bundle —
|
||||
// see the U-2 closure docblock on Dockerfile and the audit doc.
|
||||
if !strings.Contains(joined, "-k") && !strings.Contains(joined, "--insecure") {
|
||||
t.Errorf("Healthcheck.Test omits -k / --insecure flag (required for self-signed bootstrap probe): %v", hc.Test)
|
||||
}
|
||||
}
|
||||
|
||||
// TestPublishedAgentImage_HealthcheckSpecExists pins the U-2 adjacent
|
||||
// fix that added a HEALTHCHECK to the agent image. Pre-U-2 the agent
|
||||
// image had no HEALTHCHECK declaration, so bare-`docker run` agents got
|
||||
// `none` health status from Docker. Post-U-2 the agent uses pgrep to
|
||||
// verify the process is alive (mirroring the docker-compose pattern at
|
||||
// deploy/docker-compose.yml:173, which also became reliable post-U-2
|
||||
// because procps is now installed in the runtime image).
|
||||
func TestPublishedAgentImage_HealthcheckSpecExists(t *testing.T) {
|
||||
if !dockerAvailable(t) {
|
||||
t.Skip("docker not available — skipping image-level HEALTHCHECK test")
|
||||
}
|
||||
|
||||
const imgTag = "certctl-u2-agent-healthcheck-spec-test"
|
||||
t.Cleanup(func() {
|
||||
_, _ = dockerCmd(t, 30*time.Second, "rmi", "-f", imgTag)
|
||||
})
|
||||
|
||||
buildOut, err := dockerCmd(t, 5*time.Minute,
|
||||
"build", "-f", "../../Dockerfile.agent", "-t", imgTag, "../..")
|
||||
	if err != nil {
		t.Fatalf("docker build failed: %v\noutput:\n%s", err, buildOut)
	}

	inspectOut, err := dockerCmd(t, 30*time.Second,
		"inspect", "--format", "{{json .Config.Healthcheck}}", imgTag)
	if err != nil {
		t.Fatalf("docker inspect failed: %v\noutput:\n%s", err, inspectOut)
	}

	trimmed := strings.TrimSpace(inspectOut)
	if trimmed == "null" || trimmed == "" {
		t.Fatalf("agent image has no HEALTHCHECK (got %q) — U-2 adjacent fix regressed", inspectOut)
	}

	var hc struct {
		Test []string
	}
	if err := json.Unmarshal([]byte(trimmed), &hc); err != nil {
		t.Fatalf("could not parse Healthcheck JSON %q: %v", inspectOut, err)
	}

	joined := strings.Join(hc.Test, " ")
	if !strings.Contains(joined, "pgrep") {
		t.Errorf("agent Healthcheck.Test does not use pgrep (lost the process-presence shape): %v", hc.Test)
	}
	if !strings.Contains(joined, "certctl-agent") {
		t.Errorf("agent Healthcheck.Test does not target the certctl-agent process name: %v", hc.Test)
	}
}

// TestPublishedServerImage_HealthcheckTransitionsToHealthy is the
// runtime-level contract: the built image, when started, must transition
// to `healthy` within the start-period + 30s observability budget. This
// is the heavy test — it requires the server to actually start, which
// in turn requires either a reachable database OR a startup that fails
// gracefully enough to keep the HEALTHCHECK probe target alive.
//
// The container is started with CERTCTL_DATABASE_URL pointing at an
// unreachable host so the server fails its postgres bring-up — but
// importantly, fails AFTER the TLS listener has come up, because the
// HEALTHCHECK probe target is the TLS listener. We don't actually need
// the database to validate the HEALTHCHECK shape.
//
// IMPORTANT: this test is the runtime contract. If you're working on the
// server's startup ordering and the listener now comes up AFTER the
// database, this test must adapt — start a sidecar postgres via
// testcontainers-go (see internal/integration/lifecycle_test.go for the
// pattern) and connect the certctl-server container to it.
func TestPublishedServerImage_HealthcheckTransitionsToHealthy(t *testing.T) {
	if !dockerAvailable(t) {
		t.Skip("docker not available — skipping runtime HEALTHCHECK test")
	}
	if testing.Short() {
		t.Skip("runtime HEALTHCHECK test takes ~45s; skipping under -short")
	}
	t.Skip("runtime probe contract not yet wired to a sidecar postgres; " +
		"image-spec contract above (TestPublishedServerImage_HealthcheckSpecUsesHTTPS) " +
		"covers the audit-flagged regression. Re-enable once the integration " +
		"harness provisions postgres for image-level smoke.")
}
@@ -0,0 +1,27 @@
|
||||
#!/bin/sh
# Generate a self-signed placeholder certificate so NGINX can boot
# before the certctl agent deploys a real certificate.
# Once the agent deploys, it overwrites these files and reloads NGINX.

CERT_DIR="/etc/nginx/certs"
mkdir -p "$CERT_DIR"

# Make cert directory world-writable so the certctl-agent container
# (which shares this volume) can overwrite the placeholder certs.
chmod 777 "$CERT_DIR"

if [ ! -f "$CERT_DIR/cert.pem" ]; then
    echo "Generating self-signed placeholder certificate..."
    apk add --no-cache openssl > /dev/null 2>&1
    openssl req -x509 -nodes -days 1 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
        -keyout "$CERT_DIR/key.pem" \
        -out "$CERT_DIR/cert.pem" \
        -subj "/CN=placeholder.certctl.test" \
        2>/dev/null
    # Make placeholder certs writable by the agent container
    chmod 666 "$CERT_DIR/cert.pem" "$CERT_DIR/key.pem"
    echo "Placeholder certificate generated."
fi

# Start NGINX in foreground
exec nginx -g "daemon off;"
@@ -0,0 +1,42 @@
|
||||
# NGINX configuration for certctl test environment.
# The agent deploys certificates to /etc/nginx/certs/ and reloads NGINX.
# On startup, NGINX uses a self-signed placeholder so it can boot before any cert is deployed.

# Generate a self-signed placeholder on container start (see entrypoint in compose).
# Once the agent deploys a real cert, it overwrites these files and reloads.

events {
    worker_connections 1024;
}

http {
    # HTTP → redirect to HTTPS (optional, for realism)
    server {
        listen 80;
        server_name _;
        return 301 https://$host$request_uri;
    }

    # HTTPS server — serves whatever cert the agent has deployed
    server {
        listen 443 ssl;
        server_name _;

        ssl_certificate /etc/nginx/certs/cert.pem;
        ssl_certificate_key /etc/nginx/certs/key.pem;

        # Modern TLS settings
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_prefer_server_ciphers off;

        location / {
            default_type text/plain;
            return 200 'certctl test environment — NGINX is serving TLS\n';
        }

        location /health {
            default_type text/plain;
            return 200 'ok\n';
        }
    }
}
@@ -0,0 +1,16 @@
|
||||
{
  "pebble": {
    "listenAddress": "0.0.0.0:14000",
    "managementListenAddress": "0.0.0.0:15000",
    "certificate": "test/certs/localhost/cert.pem",
    "privateKey": "test/certs/localhost/key.pem",
    "httpPort": 80,
    "tlsPort": 443,
    "ocspResponderURL": "",
    "externalAccountBindingRequired": false,
    "retryAfter": {
      "authz": 3,
      "order": 5
    }
  }
}
@@ -0,0 +1,972 @@
|
||||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# DEPRECATED — prefer `go test -tags integration ./deploy/test/...`
|
||||
# =============================================================================
|
||||
#
|
||||
# This bash harness predates the Go integration test suite in
|
||||
# deploy/test/integration_test.go (build tag `integration`, 34 subtests across
|
||||
# 13 phases — health, agent heartbeat, Local CA issuance, ACME, step-ca, EST,
|
||||
# S/MIME, discovery, network scan, revocation + CRL, deployment verification).
|
||||
# The Go suite uses crypto/x509, crypto/tls, and database/sql to parse certs,
|
||||
# probe TLS, and talk to PostgreSQL directly — no openssl text-scraping or
|
||||
# brittle curl pipelines. It is the authoritative integration test surface as
|
||||
# of milestone M-007 (HTTPS Everywhere, Phase 6), where the test compose
|
||||
# stack wires the server on https://localhost:8443 behind a pinned CA bundle
|
||||
# at ./certs/ca.crt.
|
||||
#
|
||||
# Run the Go suite:
|
||||
# (cd deploy && docker compose -f docker-compose.test.yml up -d --build)
|
||||
# go test -tags integration -v -count=1 ./deploy/test/...
|
||||
#
|
||||
# Keep this bash script around because:
|
||||
# * It is cited in docs/test-env.md and muscle-memory for contributors.
|
||||
# * It exercises the CLI / curl path end-to-end (a different failure mode
|
||||
# than the Go HTTP client path).
|
||||
# But any NEW integration coverage goes in integration_test.go — not here.
|
||||
#
|
||||
# =============================================================================
|
||||
# certctl End-to-End Test Script
|
||||
# =============================================================================
|
||||
#
|
||||
# Automates the full lifecycle test from docs/test-env.md:
|
||||
# 1. Bring up all 7 containers (build from source)
|
||||
# 2. Wait for every service to be healthy
|
||||
# 3. Verify pre-seeded data (agents, issuers, targets, profiles)
|
||||
# 4. Issue a certificate via Local CA → deploy to NGINX → verify TLS
|
||||
# 5. Issue a certificate via ACME/Pebble → verify
|
||||
# 6. Issue a certificate via step-ca → verify
|
||||
# 7. Test revocation + CRL
|
||||
# 8. Test discovery
|
||||
# 9. Test renewal (re-issue step-ca cert, check version history)
|
||||
# 10. EST enrollment (RFC 7030) — cacerts + simpleenroll
|
||||
# 11. S/MIME issuance — emailProtection EKU + adaptive KeyUsage
|
||||
# 12. API spot checks + print summary
|
||||
#
|
||||
# Usage:
|
||||
# cd certctl/deploy
|
||||
# ./test/run-test.sh # full run (build + test)
|
||||
# ./test/run-test.sh --no-build # skip docker build, reuse existing containers
|
||||
# ./test/run-test.sh --no-teardown # leave containers running after test
|
||||
#
|
||||
# Requirements: docker (with Compose v2), curl, openssl, python3 (for JSON parsing)
|
||||
# =============================================================================
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config
|
||||
# ---------------------------------------------------------------------------
|
||||
COMPOSE_FILE="docker-compose.test.yml"
|
||||
API_URL="https://localhost:8443"
|
||||
API_KEY="test-key-2026"
|
||||
NGINX_TLS="localhost:8444"
|
||||
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
|
||||
CACERT="./certs/ca.crt"
|
||||
|
||||
# Flags
|
||||
BUILD=true
|
||||
TEARDOWN=true
|
||||
for arg in "$@"; do
|
||||
case "$arg" in
|
||||
--no-build) BUILD=false ;;
|
||||
--no-teardown) TEARDOWN=false ;;
|
||||
esac
|
||||
done
|
||||
|
||||
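The flag loop above silently ignores any argument it does not recognize, so a typo such as `--no-biuld` still triggers a full build. A minimal sketch of a stricter variant follows. It assumes a hypothetical `parse_flags` function (not part of the script) that sets the same `BUILD` and `TEARDOWN` variables and returns a non-zero status for an unknown flag.

```shell
# Hypothetical helper; the original script parses its flags inline.
BUILD=true
TEARDOWN=true

parse_flags() {
    for arg in "$@"; do
        case "$arg" in
            --no-build)    BUILD=false ;;
            --no-teardown) TEARDOWN=false ;;
            *)
                # Unknown flag: report it and let the caller decide to abort.
                echo "unknown flag: $arg" >&2
                return 2
                ;;
        esac
    done
}
```

A caller could use it as `parse_flags "$@" || exit 2`, which keeps the current behaviour for valid flags and stops early on anything else.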
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
CYAN='\033[0;36m'
|
||||
BOLD='\033[1m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
PASS=0
|
||||
FAIL=0
|
||||
SKIP=0
|
||||
|
||||
pass() {
|
||||
PASS=$((PASS + 1))
|
||||
echo -e " ${GREEN}PASS${NC} $1"
|
||||
}
|
||||
|
||||
fail() {
|
||||
FAIL=$((FAIL + 1))
|
||||
echo -e " ${RED}FAIL${NC} $1"
|
||||
if [ -n "${2:-}" ]; then
|
||||
echo -e " ${RED}$2${NC}"
|
||||
fi
|
||||
}
|
||||
|
||||
skip() {
|
||||
SKIP=$((SKIP + 1))
|
||||
echo -e " ${YELLOW}SKIP${NC} $1"
|
||||
}
|
||||
|
||||
info() {
|
||||
echo -e "${CYAN}==>${NC} $1"
|
||||
}
|
||||
|
||||
header() {
|
||||
echo ""
|
||||
echo -e "${BOLD}─── $1 ───${NC}"
|
||||
}
|
||||
|
||||
# API helper: GET endpoint, print JSON body. Returns non-zero on HTTP errors (curl -f).
|
||||
api_get() {
|
||||
local path="$1"
|
||||
curl -sf --cacert "${CACERT}" -H "${AUTH_HEADER}" "${API_URL}${path}" 2>/dev/null
|
||||
}
|
||||
|
||||
# API helper: POST with optional JSON body
|
||||
api_post() {
|
||||
local path="$1"
|
||||
local body="${2:-}"
|
||||
if [ -n "$body" ]; then
|
||||
curl -sf --cacert "${CACERT}" -X POST -H "${AUTH_HEADER}" -H "Content-Type: application/json" \
|
||||
-d "$body" "${API_URL}${path}" 2>/dev/null
|
||||
else
|
||||
curl -sf --cacert "${CACERT}" -X POST -H "${AUTH_HEADER}" "${API_URL}${path}" 2>/dev/null
|
||||
fi
|
||||
}
|
||||
|
||||
# Wait for an HTTP endpoint to respond without an HTTP error. Polls every 3s (fixed interval, no backoff).
wait_for_http() {
    local url="$1"
    local label="$2"
    local max_wait="${3:-120}"
    local elapsed=0
    local interval=3

    while [ $elapsed -lt $max_wait ]; do
        if curl -sf --cacert "${CACERT}" -H "${AUTH_HEADER}" "$url" >/dev/null 2>&1; then
            return 0
        fi
        sleep $interval
        elapsed=$((elapsed + interval))
    done
    return 1
}
|
||||
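`wait_for_http` polls at a fixed interval. For comparison, here is a minimal sketch of a generic retry helper with real exponential backoff. The name `retry_with_backoff`, its argument order, and the 30-second cap are assumptions made for illustration; nothing in the script defines them.

```shell
# Hypothetical helper: run a command until it succeeds or until at least
# max_wait seconds have been spent sleeping. The delay doubles after each
# failed attempt and is capped at 30 seconds.
# Usage: retry_with_backoff <max_wait_seconds> <initial_delay_seconds> <command...>
retry_with_backoff() {
    local max_wait="$1"
    local delay="$2"
    shift 2
    local elapsed=0

    # Guard against a zero delay, which would never advance `elapsed`.
    if [ "$delay" -lt 1 ]; then
        delay=1
    fi

    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$elapsed" -ge "$max_wait" ]; then
            return 1
        fi
        sleep "$delay"
        elapsed=$((elapsed + delay))
        delay=$((delay * 2))
        if [ "$delay" -gt 30 ]; then
            delay=30
        fi
    done
}
```

With such a helper, a health probe could be written as `retry_with_backoff 120 1 curl -sf --cacert "$CACERT" "$API_URL/health"`. Early attempts stay quick and a slow start does not flood the server with requests.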
# Extract a field from JSON using python3 (no jq dependency)
|
||||
json_field() {
|
||||
python3 -c "import sys,json; d=json.load(sys.stdin); print($1)" 2>/dev/null
|
||||
}
|
||||
|
||||
# Wait for a certificate's jobs to reach a terminal state (Completed, Failed, or Cancelled)
# Usage: wait_for_jobs_done <cert_id> [max_seconds]
# Returns 0 if at least one job Completed, 1 if all Failed/Cancelled or on timeout
wait_for_jobs_done() {
    local cert_id="$1"
    local max_wait="${2:-180}"
    local elapsed=0
    local interval=5

    while [ $elapsed -lt $max_wait ]; do
        local jobs_json
        jobs_json=$(api_get "/api/v1/jobs" 2>/dev/null || echo '{"data":[]}')

        # Check if all jobs for this cert are in terminal state
        # API returns jobs under "data" key (not "jobs")
        local pending
        pending=$(echo "$jobs_json" | python3 -c "
import sys, json
data = json.load(sys.stdin)
jobs = data.get('data') or data.get('jobs') or []
active = [j for j in jobs if j.get('certificate_id') == '$cert_id'
          and j.get('status') not in ('Completed', 'Failed', 'Cancelled')]
print(len(active))
" 2>/dev/null || echo "99")

        if [ "$pending" = "0" ]; then
            # Check how many jobs exist and their terminal states
            local job_counts
            job_counts=$(echo "$jobs_json" | python3 -c "
import sys, json
data = json.load(sys.stdin)
jobs = data.get('data') or data.get('jobs') or []
mine = [j for j in jobs if j.get('certificate_id') == '$cert_id']
completed = len([j for j in mine if j.get('status') == 'Completed'])
failed = len([j for j in mine if j.get('status') in ('Failed', 'Cancelled')])
print(f'{len(mine)} {completed} {failed}')
" 2>/dev/null || echo "0 0 0")
            local total_jobs completed_jobs failed_jobs
            total_jobs=$(echo "$job_counts" | cut -d' ' -f1)
            completed_jobs=$(echo "$job_counts" | cut -d' ' -f2)
            failed_jobs=$(echo "$job_counts" | cut -d' ' -f3)

            if [ "$completed_jobs" -gt 0 ]; then
                return 0 # At least one job completed successfully
            fi
            if [ "$total_jobs" -gt 0 ] && [ "$failed_jobs" -gt 0 ]; then
                return 1 # All jobs are in a terminal state but none completed
            fi
        fi

        sleep $interval
        elapsed=$((elapsed + interval))
    done
    return 1
}
|
||||
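The branch logic of `wait_for_jobs_done` is easier to reason about once it is separated from the API calls. The sketch below restates it as a pure function. The name `job_verdict` and its output strings are hypothetical; the conditions mirror the ones in the function above.

```shell
# Hypothetical pure function mirroring the decisions in wait_for_jobs_done.
# Arguments: <pending> <total> <completed> <failed> (non-negative integers).
# Prints "success", "failure", or "keep-waiting".
job_verdict() {
    local pending="$1"
    local total="$2"
    local completed="$3"
    local failed="$4"

    if [ "$pending" -ne 0 ]; then
        echo "keep-waiting"   # at least one job is still in a non-terminal state
    elif [ "$completed" -gt 0 ]; then
        echo "success"        # one Completed job is enough, even if others failed
    elif [ "$total" -gt 0 ] && [ "$failed" -gt 0 ]; then
        echo "failure"        # every job is terminal and none completed
    else
        echo "keep-waiting"   # no jobs exist yet for this certificate
    fi
}
```

Two consequences are worth noting: a certificate with one Completed and one Failed job counts as a success, and a certificate with no jobs at all keeps the loop polling until the timeout.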
# Get the TLS cert subject from NGINX for a given SNI
|
||||
get_tls_subject() {
|
||||
local sni="$1"
|
||||
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
|
||||
| openssl x509 -noout -subject 2>/dev/null \
|
||||
| sed 's/subject=//' | sed 's/^ *//'
|
||||
}
|
||||
|
||||
get_tls_issuer() {
|
||||
local sni="$1"
|
||||
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
|
||||
| openssl x509 -noout -issuer 2>/dev/null \
|
||||
| sed 's/issuer=//' | sed 's/^ *//'
|
||||
}
|
||||
|
||||
# Get the TLS cert SANs from NGINX for a given SNI
|
||||
# Modern CAs (including Let's Encrypt / Pebble) put domains only in SAN, not Subject CN.
|
||||
get_tls_san() {
|
||||
local sni="$1"
|
||||
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
|
||||
| openssl x509 -noout -ext subjectAltName 2>/dev/null \
|
||||
| grep -i "DNS:" | sed 's/^ *//'
|
||||
}
|
||||
|
||||
# Check if NGINX is serving a cert that matches the given domain (checks Subject then SAN)
|
||||
check_tls_identity() {
|
||||
local domain="$1"
|
||||
local subject issuer san
|
||||
subject=$(get_tls_subject "$domain")
|
||||
issuer=$(get_tls_issuer "$domain")
|
||||
san=$(get_tls_san "$domain")
|
||||
if echo "$subject" | grep -qi "$domain" || echo "$san" | grep -qi "$domain"; then
|
||||
echo "MATCH"
|
||||
echo "Subject: $subject"
|
||||
echo "SAN: $san"
|
||||
echo "Issuer: $issuer"
|
||||
else
|
||||
echo "NO_MATCH"
|
||||
echo "Subject: $subject"
|
||||
echo "SAN: $san"
|
||||
echo "Issuer: $issuer"
|
||||
fi
|
||||
}
|
||||
|
||||
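`check_tls_identity` matches with `grep -qi "$domain"`, which treats the dots in the domain as regex wildcards and also accepts substring hits (for example, `notlocal.certctl.test` would satisfy a check for `local.certctl.test`). A minimal sketch of an exact, case-insensitive SAN comparison follows. The function name `san_has_dns` is hypothetical, and the input is assumed to be a line in the `DNS:a, DNS:b` form printed by `openssl x509 -ext subjectAltName`. Wildcard SAN entries are not expanded.

```shell
# Hypothetical helper: succeed only if the SAN line contains a DNS entry
# that equals the wanted name exactly (case-insensitive).
# Usage: san_has_dns "DNS:a.example, DNS:b.example" "a.example"
san_has_dns() {
    local san_line="$1"
    local want
    want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')

    # One entry per line, trim whitespace, drop the "DNS:" prefix, lower-case,
    # then compare whole lines as fixed strings (-F -x) instead of regexes.
    printf '%s\n' "$san_line" \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/^DNS://' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -qFx -- "$want"
}
```

In the harness this could replace the SAN half of the check, for example `san_has_dns "$san" "$domain"`.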
# SQL exec in the postgres container
|
||||
psql_exec() {
|
||||
docker exec certctl-test-postgres psql -U certctl -d certctl -tAc "$1" 2>/dev/null
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cleanup trap
|
||||
# ---------------------------------------------------------------------------
|
||||
cleanup() {
|
||||
if [ "$TEARDOWN" = true ]; then
|
||||
info "Tearing down test environment..."
|
||||
docker compose -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true
|
||||
else
|
||||
info "Leaving containers running (--no-teardown)"
|
||||
fi
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 0: Environment Check
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 0: Environment Check"
|
||||
|
||||
# Make sure we're in the deploy directory
|
||||
if [ ! -f "$COMPOSE_FILE" ]; then
|
||||
echo -e "${RED}ERROR: $COMPOSE_FILE not found.${NC}"
|
||||
echo "Run this script from the certctl/deploy directory:"
|
||||
echo " cd certctl/deploy && ./test/run-test.sh"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
for cmd in docker curl openssl python3; do
|
||||
if command -v "$cmd" >/dev/null 2>&1; then
|
||||
pass "$cmd available"
|
||||
else
|
||||
fail "$cmd not found" "Install $cmd and try again"
|
||||
exit 1
|
||||
fi
|
||||
done
|
||||
|
||||
if docker compose version >/dev/null 2>&1; then
|
||||
pass "docker compose available"
|
||||
else
|
||||
fail "docker compose not available" "Install Docker Compose v2+"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 1: Start the Stack
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 1: Start Test Environment"
|
||||
|
||||
# Teardown any previous run
|
||||
info "Cleaning up previous test environment..."
|
||||
docker compose -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true
|
||||
|
||||
# Set the cleanup trap AFTER the initial teardown
|
||||
trap cleanup EXIT
|
||||
|
||||
if [ "$BUILD" = true ]; then
|
||||
info "Building and starting containers (this takes 2-5 minutes on first run)..."
|
||||
docker compose -f "$COMPOSE_FILE" up --build -d 2>&1 | tail -5
|
||||
else
|
||||
info "Starting containers (--no-build)..."
|
||||
docker compose -f "$COMPOSE_FILE" up -d 2>&1 | tail -5
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 2: Wait for Services
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 2: Waiting for Services"
|
||||
|
||||
info "Waiting for PostgreSQL..."
|
||||
if docker compose -f "$COMPOSE_FILE" exec -T postgres pg_isready -U certctl -d certctl >/dev/null 2>&1 ||
|
||||
wait_for_http "${API_URL}/health" "postgres" 60; then
|
||||
pass "PostgreSQL ready"
|
||||
else
|
||||
fail "PostgreSQL not ready after 60s"
|
||||
fi
|
||||
|
||||
info "Waiting for certctl server..."
|
||||
if wait_for_http "${API_URL}/health" "server" 120; then
|
||||
pass "certctl server healthy"
|
||||
# Show trust setup + connector init for debugging
|
||||
echo " --- Server startup (trust setup) ---"
|
||||
docker logs certctl-test-server 2>&1 | grep -E "trust|Added|Extract|provisioner|Pre-launch|key file|WARNING|CERTCTL_" | head -15
|
||||
echo " ---"
|
||||
else
|
||||
fail "certctl server not healthy after 120s"
|
||||
echo ""
|
||||
echo "Server logs:"
|
||||
docker logs certctl-test-server --tail 30
|
||||
exit 1
|
||||
fi
|
||||
|
||||
info "Waiting for NGINX..."
|
||||
if wait_for_http "http://localhost:8080" "nginx" 30; then
|
||||
pass "NGINX healthy"
|
||||
else
|
||||
# NGINX might not respond to plain curl on /health without the right path
|
||||
# Check docker health instead
|
||||
if docker inspect certctl-test-nginx --format='{{.State.Health.Status}}' 2>/dev/null | grep -q healthy; then
|
||||
pass "NGINX healthy (docker healthcheck)"
|
||||
else
|
||||
skip "NGINX health check inconclusive (will verify via TLS later)"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Give the agent a few seconds to register and send first heartbeat
|
||||
info "Waiting for agent heartbeat (up to 45s)..."
|
||||
AGENT_READY=false
|
||||
for i in $(seq 1 15); do
|
||||
AGENT_STATUS=$(api_get "/api/v1/agents/agent-test-01" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "")
|
||||
if [ "$AGENT_STATUS" = "online" ]; then
|
||||
AGENT_READY=true
|
||||
break
|
||||
fi
|
||||
sleep 3
|
||||
done
|
||||
if [ "$AGENT_READY" = true ]; then
|
||||
pass "Agent online"
|
||||
else
|
||||
skip "Agent not yet online (may be slow to heartbeat — continuing)"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 3: Verify Pre-Seeded Data
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 3: Verify Pre-Seeded Data"
|
||||
|
||||
# Agents
|
||||
AGENT_COUNT=$(api_get "/api/v1/agents" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$AGENT_COUNT" -ge 2 ]; then
|
||||
pass "Agents: $AGENT_COUNT found (agent-test-01 + server-scanner)"
|
||||
else
|
||||
fail "Agents: expected >= 2, got $AGENT_COUNT"
|
||||
fi
|
||||
|
||||
# Issuers
|
||||
ISSUER_COUNT=$(api_get "/api/v1/issuers" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$ISSUER_COUNT" -ge 3 ]; then
|
||||
pass "Issuers: $ISSUER_COUNT found (iss-local, iss-acme-staging, iss-stepca)"
|
||||
else
|
||||
fail "Issuers: expected >= 3, got $ISSUER_COUNT" "Check seed_test.sql loaded correctly"
|
||||
fi
|
||||
|
||||
# Targets
|
||||
TARGET_COUNT=$(api_get "/api/v1/targets" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$TARGET_COUNT" -ge 1 ]; then
|
||||
pass "Targets: $TARGET_COUNT found (target-test-nginx)"
|
||||
else
|
||||
fail "Targets: expected >= 1, got $TARGET_COUNT" "seed_test.sql may have failed after iss-local"
|
||||
fi
|
||||
|
||||
# Profile
|
||||
PROFILE_RESP=$(api_get "/api/v1/profiles" 2>/dev/null || echo '{"total":0}')
|
||||
PROFILE_COUNT=$(echo "$PROFILE_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$PROFILE_COUNT" -ge 2 ]; then
|
||||
pass "Profiles: $PROFILE_COUNT found (prof-test-tls, prof-test-smime)"
|
||||
else
|
||||
fail "Profiles: expected >= 2, got $PROFILE_COUNT"
|
||||
fi
|
||||
|
||||
# Bail if seed data is broken
|
||||
if [ "$ISSUER_COUNT" -lt 3 ] || [ "$TARGET_COUNT" -lt 1 ]; then
|
||||
echo ""
|
||||
echo -e "${RED}Seed data is incomplete. Cannot continue.${NC}"
|
||||
echo "Check PostgreSQL logs: docker logs certctl-test-postgres"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 4: Local CA Issuance
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 4: Local CA Certificate Issuance"
|
||||
|
||||
info "Creating certificate record mc-local-test..."
|
||||
CREATE_RESP=$(api_post "/api/v1/certificates" '{
|
||||
"id": "mc-local-test",
|
||||
"name": "local-test-cert",
|
||||
"common_name": "local.certctl.test",
|
||||
"sans": ["local.certctl.test"],
|
||||
"issuer_id": "iss-local",
|
||||
"owner_id": "owner-test-admin",
|
||||
"team_id": "team-test-ops",
|
||||
"renewal_policy_id": "rp-default",
|
||||
"certificate_profile_id": "prof-test-tls",
|
||||
"environment": "development"
|
||||
}' 2>/dev/null || echo "ERROR")
|
||||
|
||||
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-local-test'" 2>/dev/null; then
|
||||
pass "Certificate record created"
|
||||
else
|
||||
fail "Certificate creation failed" "$CREATE_RESP"
|
||||
fi
|
||||
|
||||
info "Linking certificate to NGINX target..."
|
||||
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-local-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
|
||||
pass "Target mapping inserted"
|
||||
|
||||
info "Triggering issuance..."
|
||||
RENEW_RESP=$(api_post "/api/v1/certificates/mc-local-test/renew" 2>/dev/null || echo "ERROR")
|
||||
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
|
||||
pass "Issuance triggered"
|
||||
else
|
||||
fail "Trigger failed" "$RENEW_RESP"
|
||||
fi
|
||||
|
||||
# Verify a job was created (regression check for the TriggerRenewalWithActor bug, where no job was enqueued)
|
||||
sleep 2
|
||||
JOB_COUNT=$(api_get "/api/v1/jobs" | python3 -c "
|
||||
import sys, json
|
||||
data = json.load(sys.stdin)
|
||||
jobs = [j for j in (data.get('data') or data.get('jobs') or []) if j.get('certificate_id') == 'mc-local-test']
|
||||
print(len(jobs))
|
||||
" 2>/dev/null || echo "0")
|
||||
|
||||
if [ "$JOB_COUNT" -gt 0 ]; then
|
||||
pass "Job created ($JOB_COUNT jobs for mc-local-test)"
|
||||
else
|
||||
fail "No jobs created — TriggerRenewalWithActor bug still present"
|
||||
fi
|
||||
|
||||
info "Waiting for issuance + deployment (up to 180s)..."
|
||||
if wait_for_jobs_done "mc-local-test" 180; then
|
||||
pass "All jobs completed"
|
||||
else
|
||||
fail "Jobs did not complete within 180s"
|
||||
echo " Current jobs:"
|
||||
api_get "/api/v1/jobs" 2>/dev/null | python3 -m json.tool 2>/dev/null | head -30
|
||||
fi
|
||||
|
||||
info "Reloading NGINX to pick up deployed certificate..."
|
||||
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
|
||||
sleep 3
|
||||
|
||||
info "Verifying TLS certificate on NGINX..."
|
||||
TLS_CHECK=$(check_tls_identity "local.certctl.test")
|
||||
TLS_RESULT=$(echo "$TLS_CHECK" | head -1)
|
||||
if [ "$TLS_RESULT" = "MATCH" ]; then
|
||||
pass "NGINX serving cert for local.certctl.test"
|
||||
echo "$TLS_CHECK" | tail -n +2 | while read -r line; do echo -e " $line"; done
|
||||
else
|
||||
fail "NGINX not serving expected cert" "$(echo "$TLS_CHECK" | tail -n +2 | tr '\n' ', ')"
|
||||
fi
|
||||
|
||||
# Check cert status in API
|
||||
CERT_STATUS=$(api_get "/api/v1/certificates/mc-local-test" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
|
||||
if [ "$CERT_STATUS" = "Active" ]; then
|
||||
pass "Certificate status: Active"
|
||||
else
|
||||
skip "Certificate status: $CERT_STATUS (expected Active — may need more time)"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 5: ACME (Pebble) Issuance
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 5: ACME (Pebble) Certificate Issuance"
|
||||
|
||||
info "Creating certificate record mc-acme-test..."
|
||||
CREATE_RESP=$(api_post "/api/v1/certificates" '{
|
||||
"id": "mc-acme-test",
|
||||
"name": "acme-test-cert",
|
||||
"common_name": "acme.certctl.test",
|
||||
"sans": ["acme.certctl.test"],
|
||||
"issuer_id": "iss-acme-staging",
|
||||
"owner_id": "owner-test-admin",
|
||||
"team_id": "team-test-ops",
|
||||
"renewal_policy_id": "rp-default",
|
||||
"certificate_profile_id": "prof-test-tls",
|
||||
"environment": "staging"
|
||||
}' 2>/dev/null || echo "ERROR")
|
||||
|
||||
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-acme-test'" 2>/dev/null; then
|
||||
pass "Certificate record created"
|
||||
else
|
||||
fail "Certificate creation failed" "$CREATE_RESP"
|
||||
fi
|
||||
|
||||
info "Linking to target and triggering issuance..."
|
||||
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-acme-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
|
||||
RENEW_RESP=$(api_post "/api/v1/certificates/mc-acme-test/renew" 2>/dev/null || echo "ERROR")
|
||||
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
|
||||
pass "Issuance triggered"
|
||||
else
|
||||
fail "Trigger failed" "$RENEW_RESP"
|
||||
fi
|
||||
|
||||
info "Waiting for ACME issuance + deployment (up to 180s)..."
|
||||
if wait_for_jobs_done "mc-acme-test" 180; then
|
||||
pass "All jobs completed"
|
||||
|
||||
info "Reloading NGINX to pick up deployed certificate..."
|
||||
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
|
||||
sleep 3
|
||||
|
||||
TLS_CHECK=$(check_tls_identity "acme.certctl.test")
|
||||
TLS_RESULT=$(echo "$TLS_CHECK" | head -1)
|
||||
if [ "$TLS_RESULT" = "MATCH" ]; then
|
||||
pass "NGINX serving cert for acme.certctl.test"
|
||||
echo "$TLS_CHECK" | tail -n +2 | while read -r line; do echo -e " $line"; done
|
||||
else
|
||||
fail "NGINX not serving expected ACME cert" "$(echo "$TLS_CHECK" | tail -n +2 | tr '\n' ', ')"
|
||||
fi
|
||||
else
|
||||
fail "ACME jobs did not complete within 180s"
|
||||
info "Checking ACME job status..."
|
||||
api_get "/api/v1/jobs" 2>/dev/null | python3 -c "
|
||||
import sys, json
|
||||
data = json.load(sys.stdin)
|
||||
for j in data.get('data', []):
|
||||
if j.get('certificate_id') == 'mc-acme-test':
|
||||
print(f\" Job {j['id']}: type={j['type']} status={j['status']} error={j.get('last_error','')}\")" 2>/dev/null || true
|
||||
echo " Server logs (last 20 lines):"
|
||||
docker logs certctl-test-server --tail 20 2>&1 | grep -i "acme\|error\|fail\|CSR" | head -10 || true
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 6: step-ca Issuance
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 6: step-ca (Private CA) Certificate Issuance"
|
||||
|
||||
info "Creating certificate record mc-stepca-test..."
|
||||
CREATE_RESP=$(api_post "/api/v1/certificates" '{
|
||||
"id": "mc-stepca-test",
|
||||
"name": "stepca-test-cert",
|
||||
"common_name": "stepca.certctl.test",
|
||||
"sans": ["stepca.certctl.test"],
|
||||
"issuer_id": "iss-stepca",
|
||||
"owner_id": "owner-test-admin",
|
||||
"team_id": "team-test-ops",
|
||||
"renewal_policy_id": "rp-default",
|
||||
"certificate_profile_id": "prof-test-tls",
|
||||
"environment": "staging"
|
||||
}' 2>/dev/null || echo "ERROR")
|
||||
|
||||
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-stepca-test'" 2>/dev/null; then
|
||||
pass "Certificate record created"
|
||||
else
|
||||
fail "Certificate creation failed" "$CREATE_RESP"
|
||||
fi
|
||||
|
||||
info "Linking to target and triggering issuance..."
|
||||
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-stepca-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
|
||||
RENEW_RESP=$(api_post "/api/v1/certificates/mc-stepca-test/renew" 2>/dev/null || echo "ERROR")
|
||||
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
|
||||
pass "Issuance triggered"
|
||||
else
|
||||
fail "Trigger failed" "$RENEW_RESP"
|
||||
fi
|
||||
|
||||
info "Waiting for step-ca issuance + deployment (up to 120s)..."
|
||||
if wait_for_jobs_done "mc-stepca-test" 120; then
|
||||
pass "All jobs completed"
|
||||
else
|
||||
fail "Jobs did not complete in time"
|
||||
info "Checking step-ca job status..."
|
||||
api_get "/api/v1/jobs" 2>/dev/null | python3 -c "
|
||||
import sys, json
|
||||
data = json.load(sys.stdin)
|
||||
for j in data.get('data', []):
|
||||
if j.get('certificate_id') == 'mc-stepca-test':
|
||||
print(f\" Job {j['id']}: type={j['type']} status={j['status']} error={j.get('last_error','')}\")" 2>/dev/null || true
|
||||
echo " Server logs (step-ca related):"
|
||||
docker logs certctl-test-server --tail 30 2>&1 | grep -i "stepca\|step-ca\|provisioner\|jwe\|decrypt\|CSR.*fail\|error" | head -10 || true
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 7: Revocation
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 7: Revocation"
|
||||
|
||||
info "Revoking mc-local-test (reason: superseded)..."
|
||||
REVOKE_RESP=$(api_post "/api/v1/certificates/mc-local-test/revoke" '{"reason": "superseded"}' 2>/dev/null || echo "ERROR")
|
||||
if echo "$REVOKE_RESP" | grep -qi "revoked\|status"; then
|
||||
pass "Certificate revoked"
|
||||
else
|
||||
fail "Revocation failed" "$REVOKE_RESP"
|
||||
fi
|
||||
|
||||
info "Checking DER CRL under /.well-known/pki (RFC 5280 §5, RFC 8615)..."
|
||||
# The JSON CRL endpoint (`GET /api/v1/crl`) was removed in M-006. RFC 5280
|
||||
# defines only the DER wire format, now served unauthenticated at
|
||||
# `/.well-known/pki/crl/{issuer_id}`. Fetch without the Bearer header to
|
||||
# prove the endpoint is reachable by relying parties with no API key.
|
||||
CRL_TMP=$(mktemp)
|
||||
CRL_HEADERS=$(mktemp)
|
||||
CRL_HTTP_CODE=$(curl -s --cacert "${CACERT}" -o "$CRL_TMP" -D "$CRL_HEADERS" -w "%{http_code}" "${API_URL}/.well-known/pki/crl/iss-local" 2>/dev/null || echo "000")
|
||||
CRL_SIZE=$(wc -c < "$CRL_TMP" | tr -d ' ')
|
||||
CRL_CONTENT_TYPE=$(awk 'tolower($1)=="content-type:" { sub(/\r$/,"",$2); print tolower($2) }' "$CRL_HEADERS" | head -n1)
|
||||
rm -f "$CRL_TMP" "$CRL_HEADERS"
|
||||
|
||||
if [ "$CRL_HTTP_CODE" = "200" ] && [ "$CRL_CONTENT_TYPE" = "application/pkix-crl" ] && [ "$CRL_SIZE" -gt 0 ]; then
|
||||
pass "DER CRL served unauthenticated (HTTP 200, Content-Type application/pkix-crl, ${CRL_SIZE} bytes)"
|
||||
else
|
||||
fail "DER CRL fetch failed: HTTP=$CRL_HTTP_CODE Content-Type=$CRL_CONTENT_TYPE size=$CRL_SIZE"
|
||||
fi
|
||||
|
||||
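The CRL check compares the second whitespace-separated field of the `Content-Type` header against `application/pkix-crl`. If the server ever appended a parameter (for example `; charset=...`), that field would end in `;` and the comparison would fail. A minimal sketch of a more tolerant extraction follows. The function name `content_type_of` is hypothetical, and its input is assumed to be a header dump written by `curl -D`.

```shell
# Hypothetical helper: print the lower-cased media type from the first
# Content-Type header in a header dump, without parameters or a trailing CR.
# Usage: content_type_of <headers_file>
content_type_of() {
    awk '
        tolower($1) == "content-type:" {
            value = tolower($2)
            sub(/\r$/, "", value)    # header dumps use CRLF line endings
            sub(/;.*$/, "", value)   # drop any parameters after the media type
            print value
            exit
        }
    ' "$1"
}
```

The harness could then assign `CRL_CONTENT_TYPE=$(content_type_of "$CRL_HEADERS")` before the temporary files are removed.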
CERT_STATUS=$(api_get "/api/v1/certificates/mc-local-test" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
|
||||
if [ "$CERT_STATUS" = "Revoked" ]; then
|
||||
pass "Certificate status updated to Revoked"
|
||||
else
|
||||
fail "Certificate status: $CERT_STATUS (expected Revoked)"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 8: Discovery
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 8: Certificate Discovery"
|
||||
|
||||
info "Checking discovered certificates..."
|
||||
DISC_RESP=$(api_get "/api/v1/discovered-certificates" 2>/dev/null || echo '{"total":0}')
|
||||
DISC_TOTAL=$(echo "$DISC_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$DISC_TOTAL" -ge 1 ]; then
|
||||
pass "Discovered $DISC_TOTAL certificate(s) on filesystem"
|
||||
else
|
||||
skip "No discovered certificates yet (agent scan may not have run)"
|
||||
fi
|
||||
|
||||
SUMMARY_RESP=$(api_get "/api/v1/discovery-summary" 2>/dev/null || echo '{}')
|
||||
echo -e " Discovery summary: $SUMMARY_RESP"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 9: Renewal (re-issue step-ca cert, falling back to ACME cert)
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 9: Renewal"
|
||||
|
||||
# Try mc-stepca-test first (mc-local-test was revoked in Phase 7).
|
||||
# Fall back to mc-acme-test if step-ca cert isn't Active.
|
||||
RENEWAL_CERT=""
|
||||
for candidate in mc-stepca-test mc-acme-test; do
|
||||
STATUS=$(api_get "/api/v1/certificates/$candidate" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
|
||||
if [ "$STATUS" = "Active" ]; then
|
||||
RENEWAL_CERT="$candidate"
|
||||
break
|
||||
fi
|
||||
done
|
||||
|
||||
if [ -z "$RENEWAL_CERT" ]; then
|
||||
skip "Cannot test renewal — no certificate in Active state"
|
||||
else
|
||||
info "Using $RENEWAL_CERT for renewal test..."
|
||||
info "Triggering renewal on $RENEWAL_CERT..."
|
||||
RENEW_RESP=$(api_post "/api/v1/certificates/$RENEWAL_CERT/renew" 2>/dev/null || echo "ERROR")
|
||||
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
|
||||
pass "Renewal triggered"
|
||||
else
|
||||
skip "Renewal trigger returned: $RENEW_RESP"
|
||||
fi
|
||||
|
||||
info "Waiting for renewal to complete (up to 180s)..."
|
||||
if wait_for_jobs_done "$RENEWAL_CERT" 180; then
|
||||
pass "Renewal jobs completed"
|
||||
|
||||
info "Reloading NGINX to pick up renewed certificate..."
|
||||
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
|
||||
sleep 3
|
||||
|
||||
# Verify version history shows multiple versions
|
||||
VERSIONS=$(api_get "/api/v1/certificates/$RENEWAL_CERT/versions" 2>/dev/null | python3 -c "import sys,json; d=json.load(sys.stdin); print(len(d) if isinstance(d, list) else d.get('total', 0))" 2>/dev/null || echo 0)
|
||||
if [ "$VERSIONS" -ge 2 ]; then
|
||||
pass "Certificate has $VERSIONS versions (original + renewal)"
|
||||
else
|
||||
skip "Expected 2+ versions, got $VERSIONS"
|
||||
fi
|
||||
else
|
||||
skip "Renewal jobs did not complete within 180s"
|
||||
fi
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 10: EST Enrollment (RFC 7030)
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 10: EST Enrollment (RFC 7030)"
|
||||
|
||||
# Test cacerts endpoint — should return PKCS#7 with CA cert chain
|
||||
info "Testing EST cacerts endpoint..."
|
||||
EST_CACERTS_RESP=$(curl -sf -H "${AUTH_HEADER}" "${API_URL}/.well-known/est/cacerts" 2>/dev/null || echo "ERROR")
|
||||
if [ "$EST_CACERTS_RESP" != "ERROR" ] && [ -n "$EST_CACERTS_RESP" ]; then
|
||||
# Response should be base64-encoded PKCS#7
|
||||
if echo "$EST_CACERTS_RESP" | base64 -d >/dev/null 2>&1; then
|
||||
pass "EST cacerts returns valid base64 PKCS#7 response"
|
||||
else
|
||||
fail "EST cacerts returned non-base64 data"
|
||||
fi
|
||||
else
|
||||
fail "EST cacerts endpoint failed" "$EST_CACERTS_RESP"
|
||||
fi
|
||||
|
||||
# Test csrattrs endpoint
|
||||
info "Testing EST csrattrs endpoint..."
|
||||
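# No -f here: with -f an HTTP error still prints the code and then trips the
# fallback, yielding values such as "404000" in the failure message.
EST_CSRATTRS_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -H "${AUTH_HEADER}" "${API_URL}/.well-known/est/csrattrs" 2>/dev/null || echo "000")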
|
||||
if [ "$EST_CSRATTRS_STATUS" = "200" ] || [ "$EST_CSRATTRS_STATUS" = "204" ]; then
|
||||
pass "EST csrattrs returns $EST_CSRATTRS_STATUS"
|
||||
else
|
||||
fail "EST csrattrs returned $EST_CSRATTRS_STATUS (expected 200 or 204)"
|
||||
fi
|
||||
|
||||
# Test simpleenroll — generate CSR, POST as base64-encoded DER
|
||||
info "Testing EST simpleenroll with generated CSR..."
|
||||
EST_KEY_FILE=$(mktemp /tmp/est-key-XXXXXX.pem)
|
||||
EST_CSR_PEM_FILE=$(mktemp /tmp/est-csr-XXXXXX.pem)
|
||||
EST_CSR_DER_FILE=$(mktemp /tmp/est-csr-XXXXXX.der)
|
||||
# Single quotes defer expansion until the trap fires; inner quotes protect the paths.
trap 'rm -f "$EST_KEY_FILE" "$EST_CSR_PEM_FILE" "$EST_CSR_DER_FILE"' EXIT
|
||||
|
||||
# Generate ECDSA key + CSR
|
||||
openssl ecparam -genkey -name prime256v1 -noout -out "$EST_KEY_FILE" 2>/dev/null
|
||||
openssl req -new -key "$EST_KEY_FILE" -out "$EST_CSR_PEM_FILE" -subj "/CN=est-device.certctl.test" 2>/dev/null
|
||||
openssl req -in "$EST_CSR_PEM_FILE" -out "$EST_CSR_DER_FILE" -outform DER 2>/dev/null
|
||||
|
||||
# base64-encode the DER CSR (EST wire format)
|
||||
EST_CSR_B64=$(base64 < "$EST_CSR_DER_FILE" | tr -d '\n')
|
||||
|
||||
EST_ENROLL_RESP=$(curl -sf \
|
||||
-X POST \
|
||||
-H "${AUTH_HEADER}" \
|
||||
-H "Content-Type: application/pkcs10" \
|
||||
-d "$EST_CSR_B64" \
|
||||
"${API_URL}/.well-known/est/simpleenroll" 2>/dev/null || echo "ERROR")
|
||||
|
||||
if [ "$EST_ENROLL_RESP" != "ERROR" ] && [ -n "$EST_ENROLL_RESP" ]; then
|
||||
# Response should be base64-encoded PKCS#7 containing the issued cert
|
||||
if echo "$EST_ENROLL_RESP" | base64 -d >/dev/null 2>&1; then
|
||||
pass "EST simpleenroll issued certificate via PKCS#7 response"
|
||||
else
|
||||
fail "EST simpleenroll returned non-base64 data"
|
||||
fi
|
||||
else
|
||||
fail "EST simpleenroll failed" "$(curl -s -X POST -H "${AUTH_HEADER}" -H "Content-Type: application/pkcs10" -d "$EST_CSR_B64" "${API_URL}/.well-known/est/simpleenroll" 2>&1 | head -5)"
|
||||
fi
|
||||
|
||||
# Test simplereenroll (should work identically)
|
||||
info "Testing EST simplereenroll..."
|
||||
EST_REENROLL_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
|
||||
-X POST \
|
||||
-H "${AUTH_HEADER}" \
|
||||
-H "Content-Type: application/pkcs10" \
|
||||
-d "$EST_CSR_B64" \
|
||||
"${API_URL}/.well-known/est/simplereenroll" 2>/dev/null || echo "000")
|
||||
|
||||
if [ "$EST_REENROLL_STATUS" = "200" ]; then
|
||||
pass "EST simplereenroll works (status 200)"
|
||||
else
|
||||
fail "EST simplereenroll returned $EST_REENROLL_STATUS (expected 200)"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 11: S/MIME Certificate Issuance
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 11: S/MIME Certificate Issuance"
|
||||
|
||||
info "Creating S/MIME certificate record..."
|
||||
SMIME_RESP=$(api_post "/api/v1/certificates" '{
|
||||
"id": "mc-smime-test",
|
||||
"name": "smime-test-cert",
|
||||
"common_name": "testuser@certctl.test",
|
||||
"sans": ["testuser@certctl.test"],
|
||||
"issuer_id": "iss-local",
|
||||
"owner_id": "owner-test-admin",
|
||||
"team_id": "team-test-ops",
|
||||
"renewal_policy_id": "rp-default",
|
||||
"certificate_profile_id": "prof-test-smime",
|
||||
"environment": "staging"
|
||||
}' 2>/dev/null || echo "ERROR")
|
||||
|
||||
if echo "$SMIME_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-smime-test'" 2>/dev/null; then
|
||||
pass "S/MIME certificate record created"
|
||||
else
|
||||
fail "S/MIME certificate creation failed" "$SMIME_RESP"
|
||||
fi
|
||||
|
||||
info "Linking S/MIME cert to target (needed for agent work routing)..."
|
||||
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-smime-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
|
||||
|
||||
info "Triggering S/MIME issuance..."
|
||||
SMIME_RENEW=$(api_post "/api/v1/certificates/mc-smime-test/renew" 2>/dev/null || echo "ERROR")
|
||||
if echo "$SMIME_RENEW" | grep -q "renewal_triggered\|status"; then
|
||||
pass "S/MIME issuance triggered"
|
||||
else
|
||||
fail "S/MIME trigger failed" "$SMIME_RENEW"
|
||||
fi
|
||||
|
||||
info "Waiting for S/MIME issuance (up to 120s)..."
|
||||
if wait_for_jobs_done "mc-smime-test" 120; then
|
||||
pass "S/MIME jobs completed"
|
||||
|
||||
# Fetch the issued cert and verify EKU
|
||||
info "Verifying S/MIME certificate EKU..."
|
||||
SMIME_VERSIONS=$(api_get "/api/v1/certificates/mc-smime-test/versions" 2>/dev/null || echo "[]")
|
||||
SMIME_PEM=$(echo "$SMIME_VERSIONS" | python3 -c "
import sys, json
data = json.load(sys.stdin)
versions = data if isinstance(data, list) else data.get('data', [])
if versions:
    print(versions[-1].get('pem_chain', versions[-1].get('pem', '')))
" 2>/dev/null || echo "")
|
||||
|
||||
if [ -n "$SMIME_PEM" ]; then
|
||||
# Parse the cert and check for emailProtection EKU
|
||||
SMIME_EKU=$(echo "$SMIME_PEM" | openssl x509 -noout -text 2>/dev/null | grep -A2 "Extended Key Usage" || echo "")
|
||||
if echo "$SMIME_EKU" | grep -qi "emailProtection\|E-mail Protection"; then
|
||||
pass "S/MIME cert has emailProtection EKU"
|
||||
else
|
||||
fail "S/MIME cert missing emailProtection EKU" "Got: $SMIME_EKU"
|
||||
fi
|
||||
|
||||
# Check KeyUsage flags (S/MIME should have Digital Signature + Content Commitment)
|
||||
SMIME_KU=$(echo "$SMIME_PEM" | openssl x509 -noout -text 2>/dev/null | awk '/X509v3 Key Usage:/{getline; print; exit}')
|
||||
if echo "$SMIME_KU" | grep -qi "Digital Signature"; then
|
||||
pass "S/MIME cert has Digital Signature KeyUsage"
|
||||
else
|
||||
fail "S/MIME cert missing Digital Signature KeyUsage" "Got: $SMIME_KU"
|
||||
fi
|
||||
|
||||
# Check that email SAN is present
|
||||
SMIME_SAN=$(echo "$SMIME_PEM" | openssl x509 -noout -ext subjectAltName 2>/dev/null || echo "")
|
||||
if echo "$SMIME_SAN" | grep -qi "email:testuser@certctl.test"; then
|
||||
pass "S/MIME cert has email SAN"
|
||||
else
|
||||
# Some implementations use rfc822Name instead of email:
|
||||
if echo "$SMIME_SAN" | grep -qi "testuser@certctl.test"; then
|
||||
pass "S/MIME cert has email SAN (rfc822Name)"
|
||||
else
|
||||
skip "S/MIME email SAN not found in cert (may be in CN only)"
|
||||
echo " SAN content: $SMIME_SAN"
|
||||
fi
|
||||
fi
|
||||
else
|
||||
skip "Could not extract S/MIME cert PEM for EKU verification"
|
||||
fi
|
||||
else
|
||||
fail "S/MIME issuance did not complete within 120s"
|
||||
info "Checking S/MIME job status..."
|
||||
api_get "/api/v1/jobs" 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
for j in data.get('data', []):
    if j.get('certificate_id') == 'mc-smime-test':
        print(f\" Job {j['id']}: type={j['type']} status={j['status']} error={j.get('last_error','')}\")" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PHASE 12: API Spot Checks
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Phase 12: API Spot Checks"
|
||||
|
||||
# Health
|
||||
if api_get "/health" >/dev/null 2>&1; then
|
||||
pass "GET /health returns 200"
|
||||
else
|
||||
fail "GET /health failed"
|
||||
fi
|
||||
|
||||
# Metrics
|
||||
METRICS_RESP=$(api_get "/api/v1/metrics" 2>/dev/null || echo "ERROR")
|
||||
if echo "$METRICS_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert 'gauge' in d" 2>/dev/null; then
|
||||
pass "GET /api/v1/metrics returns valid JSON"
|
||||
else
|
||||
fail "Metrics endpoint broken"
|
||||
fi
|
||||
|
||||
# Stats summary
|
||||
STATS_RESP=$(api_get "/api/v1/stats/summary" 2>/dev/null || echo "ERROR")
|
||||
if echo "$STATS_RESP" | python3 -c "import sys,json; json.load(sys.stdin)" 2>/dev/null; then
|
||||
pass "GET /api/v1/stats/summary returns valid JSON"
|
||||
else
|
||||
fail "Stats summary endpoint broken"
|
||||
fi
|
||||
|
||||
# Audit trail
|
||||
AUDIT_RESP=$(api_get "/api/v1/audit" 2>/dev/null || echo '{"total":0}')
|
||||
AUDIT_TOTAL=$(echo "$AUDIT_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
if [ "$AUDIT_TOTAL" -gt 0 ]; then
|
||||
pass "Audit trail: $AUDIT_TOTAL events recorded"
|
||||
else
|
||||
fail "Audit trail empty"
|
||||
fi
|
||||
|
||||
# Jobs summary
|
||||
JOBS_RESP=$(api_get "/api/v1/jobs" 2>/dev/null || echo '{"total":0}')
|
||||
JOBS_TOTAL=$(echo "$JOBS_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
|
||||
pass "Total jobs created: $JOBS_TOTAL"
|
||||
|
||||
# Prometheus
|
||||
PROM_RESP=$(curl -sf -H "${AUTH_HEADER}" "${API_URL}/api/v1/metrics/prometheus" 2>/dev/null || echo "")
|
||||
if echo "$PROM_RESP" | grep -q "certctl_certificate_total"; then
|
||||
pass "Prometheus metrics endpoint working"
|
||||
else
|
||||
fail "Prometheus metrics endpoint broken"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Summary
|
||||
# ---------------------------------------------------------------------------
|
||||
header "Test Summary"
|
||||
|
||||
TOTAL=$((PASS + FAIL + SKIP))
|
||||
echo ""
|
||||
echo -e " ${GREEN}Passed: $PASS${NC}"
|
||||
echo -e " ${RED}Failed: $FAIL${NC}"
|
||||
echo -e " ${YELLOW}Skipped: $SKIP${NC}"
|
||||
echo -e " Total: $TOTAL"
|
||||
echo ""
|
||||
|
||||
if [ "$FAIL" -eq 0 ]; then
|
||||
echo -e "${GREEN}${BOLD}All tests passed.${NC}"
|
||||
exit 0
|
||||
else
|
||||
echo -e "${RED}${BOLD}$FAIL test(s) failed.${NC}"
|
||||
echo ""
|
||||
echo "Useful debug commands:"
|
||||
echo " docker logs certctl-test-server --tail 50"
|
||||
echo " docker logs certctl-test-agent --tail 50"
|
||||
echo " docker compose -f $COMPOSE_FILE ps"
|
||||
exit 1
|
||||
fi
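
The DER CRL check in Phase 7 of the script above combines three conditions: HTTP status, media type, and a non-empty body. A minimal sketch of how that check could be factored into reusable helpers follows. The function names `content_type_from_headers` and `crl_response_ok` are hypothetical and do not appear in the script. Stripping media-type parameters such as `; charset=...` is an added assumption about what a server might send.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the test script.

# Print the lower-cased media type from a header dump written by `curl -D`.
content_type_from_headers() {
    awk 'tolower($1) == "content-type:" {
        sub(/\r$/, "", $2)   # header lines end in CRLF
        sub(/;.*$/, "", $2)  # drop a trailing ";" that introduces parameters
        print tolower($2)
    }' "$1" | head -n1
}

# Succeed only when all three conditions from the Phase 7 check hold.
# $1 = HTTP status, $2 = media type, $3 = body size in bytes
crl_response_ok() {
    [ "$1" = "200" ] && [ "$2" = "application/pkix-crl" ] && [ "$3" -gt 0 ] 2>/dev/null
}
```

With these in place, the script could assign `CRL_CONTENT_TYPE=$(content_type_from_headers "$CRL_HEADERS")` and then branch on `crl_response_ok "$CRL_HTTP_CODE" "$CRL_CONTENT_TYPE" "$CRL_SIZE"`.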
|
||||
@@ -0,0 +1,140 @@
|
||||
#!/bin/sh
|
||||
# This script runs inside the certctl-server container at startup.
|
||||
# It fetches CA certificates from Pebble and step-ca, adds them to the
|
||||
# system trust store, then starts the certctl server.
|
||||
#
|
||||
# Why: The ACME connector and step-ca connector use Go's default http.Client
|
||||
# with no InsecureSkipVerify. They rely on the system trust store to verify
|
||||
# TLS connections. Pebble and step-ca both use self-signed root CAs that
|
||||
# aren't in Alpine's default CA bundle, so we must add them manually.
|
||||
#
|
||||
# This script runs as root (user: "0:0" in docker-compose) so that
|
||||
# update-ca-certificates can write to /etc/ssl/certs/.
|
||||
|
||||
set -e
|
||||
|
||||
echo "=== certctl trust store setup ==="
|
||||
|
||||
# --- Pebble CA cert (fetched from management API) ---
|
||||
# Pebble's management API serves the root CA at /roots/0.
|
||||
# We use -k because we can't verify Pebble's TLS cert yet (chicken-and-egg).
|
||||
echo "Fetching Pebble root CA from management API..."
|
||||
PEBBLE_CA=""
|
||||
for i in 1 2 3 4 5 6 7 8 9 10; do
|
||||
# -f makes curl fail on HTTP errors so an error page is never saved as a CA cert.
if PEBBLE_CA=$(curl -sfk https://pebble:15000/roots/0 2>/dev/null); then
|
||||
if [ -n "$PEBBLE_CA" ]; then
|
||||
echo "$PEBBLE_CA" > /usr/local/share/ca-certificates/pebble-ca.crt
|
||||
echo " Added: Pebble test CA"
|
||||
break
|
||||
fi
|
||||
fi
|
||||
echo " Waiting for Pebble (attempt $i/10)..."
|
||||
sleep 2
|
||||
done
|
||||
|
||||
if [ -z "$PEBBLE_CA" ]; then
|
||||
echo " WARNING: Could not fetch Pebble CA. ACME issuance will fail."
|
||||
fi
|
||||
|
||||
# --- step-ca root cert (from shared volume) ---
|
||||
# The step-ca container writes its root CA to /home/step/certs/root_ca.crt.
|
||||
# We mount the step-ca data volume at /stepca-data inside this container.
|
||||
STEPCA_ROOT="/stepca-data/certs/root_ca.crt"
|
||||
echo "Waiting for step-ca root cert..."
|
||||
for i in 1 2 3 4 5 6 7 8 9 10; do
|
||||
if [ -f "$STEPCA_ROOT" ]; then
|
||||
cp "$STEPCA_ROOT" /usr/local/share/ca-certificates/step-ca-root.crt
|
||||
echo " Added: step-ca root CA"
|
||||
break
|
||||
fi
|
||||
echo " Waiting for step-ca root cert (attempt $i/10)..."
|
||||
sleep 2
|
||||
done
|
||||
|
||||
if [ ! -f "$STEPCA_ROOT" ]; then
|
||||
echo " WARNING: step-ca root cert not found at $STEPCA_ROOT"
|
||||
echo " step-ca issuance may fail until the cert is available."
|
||||
fi
|
||||
|
||||
# --- step-ca provisioner key (extracted from ca.json) ---
|
||||
# When step-ca auto-bootstraps via DOCKER_STEPCA_INIT_* env vars, the
|
||||
# encrypted provisioner key (JWE) is NOT written as a separate file.
|
||||
# Instead, it's embedded in ca.json under:
|
||||
# authority.provisioners[0].encryptedKey
|
||||
# We extract it here and write to /tmp so the certctl server can read it.
|
||||
# The stepca_data volume is mounted :ro, so we can't write there.
|
||||
STEPCA_CA_JSON="/stepca-data/config/ca.json"
|
||||
STEPCA_KEY_EXTRACTED="/tmp/step-ca-provisioner-key"
|
||||
echo "Extracting step-ca provisioner key from ca.json..."
|
||||
for i in 1 2 3 4 5 6 7 8 9 10; do
|
||||
if [ -f "$STEPCA_CA_JSON" ]; then
|
||||
# Extract the encryptedKey value using grep+sed (no jq in Alpine base)
|
||||
# The field looks like: "encryptedKey": "eyJhbGciOi..."
|
||||
ENCRYPTED_KEY=$(grep -o '"encryptedKey":"[^"]*"' "$STEPCA_CA_JSON" | head -1 | sed 's/"encryptedKey":"//;s/"$//')
|
||||
if [ -z "$ENCRYPTED_KEY" ]; then
|
||||
# Try with spaces around colon (JSON formatting varies)
|
||||
ENCRYPTED_KEY=$(grep -o '"encryptedKey" *: *"[^"]*"' "$STEPCA_CA_JSON" | head -1 | sed 's/"encryptedKey" *: *"//;s/"$//')
|
||||
fi
|
||||
if [ -n "$ENCRYPTED_KEY" ]; then
|
||||
# Check if it's JWE compact serialization (dot-separated) or JSON serialization
|
||||
case "$ENCRYPTED_KEY" in
|
||||
\{*)
|
||||
# Already JSON serialization — write as-is
|
||||
echo "$ENCRYPTED_KEY" > "$STEPCA_KEY_EXTRACTED"
|
||||
;;
|
||||
*)
|
||||
# JWE compact serialization: header.encrypted_key.iv.ciphertext.tag
|
||||
# Convert to JSON serialization expected by Go decryptProvisionerKey()
|
||||
JWE_PROTECTED=$(echo "$ENCRYPTED_KEY" | cut -d. -f1)
|
||||
JWE_ENCKEY=$(echo "$ENCRYPTED_KEY" | cut -d. -f2)
|
||||
JWE_IV=$(echo "$ENCRYPTED_KEY" | cut -d. -f3)
|
||||
JWE_CT=$(echo "$ENCRYPTED_KEY" | cut -d. -f4)
|
||||
JWE_TAG=$(echo "$ENCRYPTED_KEY" | cut -d. -f5)
|
||||
printf '{"protected":"%s","encrypted_key":"%s","iv":"%s","ciphertext":"%s","tag":"%s"}' \
|
||||
"$JWE_PROTECTED" "$JWE_ENCKEY" "$JWE_IV" "$JWE_CT" "$JWE_TAG" > "$STEPCA_KEY_EXTRACTED"
|
||||
;;
|
||||
esac
|
||||
echo " Extracted provisioner key to $STEPCA_KEY_EXTRACTED"
|
||||
echo " Key file size: $(wc -c < "$STEPCA_KEY_EXTRACTED") bytes"
|
||||
echo " Key starts with: $(head -c 40 "$STEPCA_KEY_EXTRACTED")..."
|
||||
# Override the env var so the server reads from the extracted file
|
||||
export CERTCTL_STEPCA_KEY_PATH="$STEPCA_KEY_EXTRACTED"
|
||||
break
|
||||
else
|
||||
echo " ca.json found but encryptedKey not found in it (attempt $i/10)"
|
||||
fi
|
||||
else
|
||||
echo " Waiting for step-ca ca.json (attempt $i/10)..."
|
||||
fi
|
||||
sleep 2
|
||||
done
|
||||
|
||||
if [ ! -f "$STEPCA_KEY_EXTRACTED" ]; then
|
||||
echo " WARNING: Could not extract step-ca provisioner key"
|
||||
echo " Listing /stepca-data/config/ for debugging:"
|
||||
ls -la /stepca-data/config/ 2>/dev/null || echo " /stepca-data/config/ does not exist"
|
||||
echo " step-ca issuance will fail."
|
||||
fi
|
||||
|
||||
# --- Update system trust store ---
|
||||
echo "Updating system CA trust store..."
|
||||
update-ca-certificates 2>/dev/null || true
|
||||
|
||||
echo "Trust store updated."
|
||||
|
||||
# --- Debug: verify configuration before starting server ---
|
||||
echo "=== Pre-launch verification ==="
|
||||
echo " CERTCTL_STEPCA_KEY_PATH=$CERTCTL_STEPCA_KEY_PATH"
|
||||
if [ -f "$CERTCTL_STEPCA_KEY_PATH" ]; then
|
||||
echo " step-ca key file exists ($(wc -c < "$CERTCTL_STEPCA_KEY_PATH") bytes)"
|
||||
echo " step-ca key preview: $(head -c 60 "$CERTCTL_STEPCA_KEY_PATH")..."
|
||||
else
|
||||
echo " WARNING: step-ca key file NOT FOUND at $CERTCTL_STEPCA_KEY_PATH"
|
||||
fi
|
||||
echo " CERTCTL_ACME_DIRECTORY_URL=$CERTCTL_ACME_DIRECTORY_URL"
|
||||
echo " CERTCTL_ACME_INSECURE=$CERTCTL_ACME_INSECURE"
|
||||
echo " Pebble CA cert: $(ls -la /usr/local/share/ca-certificates/pebble-ca.crt 2>/dev/null || echo 'NOT FOUND')"
|
||||
echo " step-ca root cert: $(ls -la /usr/local/share/ca-certificates/step-ca-root.crt 2>/dev/null || echo 'NOT FOUND')"
|
||||
echo " System CA count: $(ls /etc/ssl/certs/*.pem 2>/dev/null | wc -l) PEM files"
|
||||
echo "=== Starting certctl server ==="
|
||||
exec /app/server
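
The provisioner-key step in the script above does two things: it pulls `encryptedKey` out of `ca.json` without `jq`, and it rewrites a JWE compact string as JSON. The sketch below isolates both so they can be exercised without a running step-ca. The function names `extract_encrypted_key` and `jwe_compact_to_json` are hypothetical. The JSON member names are the ones the script's own `printf` emits, which match the flattened JWE JSON serialization in RFC 7516. The part-count check is an addition that the script does not perform.

```sh
#!/bin/sh
# Sketch only: function names are hypothetical, not part of the entrypoint script.

# Print the first "encryptedKey" string value found in a JSON file.
# The " *: *" pattern accepts both compact and pretty-printed JSON.
extract_encrypted_key() {
    grep -o '"encryptedKey" *: *"[^"]*"' "$1" | head -n1 |
        sed 's/^"encryptedKey" *: *"//; s/"$//'
}

# Convert JWE compact serialization (protected.encrypted_key.iv.ciphertext.tag)
# into a flattened JSON object. Fails if the input does not have five parts.
jwe_compact_to_json() {
    dots=$(printf '%s' "$1" | tr -cd '.' | wc -c | tr -d ' ')
    [ "$dots" -eq 4 ] || { echo "expected 5 dot-separated parts" >&2; return 1; }
    printf '{"protected":"%s","encrypted_key":"%s","iv":"%s","ciphertext":"%s","tag":"%s"}\n' \
        "$(printf '%s' "$1" | cut -d. -f1)" \
        "$(printf '%s' "$1" | cut -d. -f2)" \
        "$(printf '%s' "$1" | cut -d. -f3)" \
        "$(printf '%s' "$1" | cut -d. -f4)" \
        "$(printf '%s' "$1" | cut -d. -f5)"
}
```

Rejecting inputs with the wrong number of parts means a truncated or malformed key is reported during setup rather than written to the key file.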
|
||||
@@ -1,5 +1,41 @@
|
||||
# Architecture Guide
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Overview](#overview)
2. [System Components](#system-components)
   - [Control Plane (Server)](#control-plane-server)
   - [Agents](#agents)
   - [Web Dashboard](#web-dashboard)
   - [PostgreSQL Database](#postgresql-database)
3. [Data Flow: Certificate Lifecycle](#data-flow-certificate-lifecycle)
   - [Create Managed Certificate](#1-create-managed-certificate)
   - [Certificate Issuance](#2-certificate-issuance)
   - [Deploy Certificate to Target](#3-deploy-certificate-to-target)
   - [Revoke a Certificate](#35-revoke-a-certificate)
   - [Automatic Renewal](#4-automatic-renewal)
4. [Connector Architecture](#connector-architecture)
   - [IssuerConnectorAdapter (Dependency Inversion)](#issuerconnectoradapter-dependency-inversion)
   - [Issuer Connector](#issuer-connector)
   - [Target Connector](#target-connector)
   - [Notifier Connector](#notifier-connector)
   - [EST Server (RFC 7030)](#est-server-rfc-7030)
5. [Security Model](#security-model)
   - [Private Key Management](#private-key-management)
   - [Authentication](#authentication)
   - [Audit Trail](#audit-trail)
   - [API Audit Log](#api-audit-log)
   - [Logging](#logging)
6. [API Design](#api-design)
7. [MCP Server](#mcp-server)
8. [CLI Tool](#cli-tool)
9. [Deployment Topologies](#deployment-topologies)
   - [Docker Compose (Development / Small Deployments)](#docker-compose-development--small-deployments)
   - [Production (Kubernetes)](#production-kubernetes)
10. [Discovery Data Flow (M18b + M21)](#discovery-data-flow-m18b--m21)
11. [Testing Strategy](#testing-strategy)
12. [What's Next](#whats-next)
|
||||
|
||||
## Overview
|
||||
|
||||
Certctl is a certificate management platform with a **decoupled control-plane and agent architecture**. The control plane orchestrates certificate issuance and renewal, while agents deployed across your infrastructure handle key generation, certificate deployment, and local validation — private keys never leave the infrastructure they were generated on.
|
||||
@@ -9,7 +45,7 @@ New to certificates? Read the [Concepts Guide](concepts.md) first.
|
||||
### Design Principles
|
||||
|
||||
1. **Private Key Isolation** — Agents generate ECDSA P-256 keys locally and submit CSRs only. Private keys never touch the control plane. Server-side keygen is available via `CERTCTL_KEYGEN_MODE=server` for demo use only.
|
||||
2. **Pull-Only Deployment** — The server never initiates outbound connections to agents or targets. Agents poll for work. For network appliances and agentless targets, a proxy agent in the same network zone executes deployments via the target's API. This keeps the control plane firewalled off and limits credential scope to the proxy agent's zone.
|
||||
2. **Pull-Only Deployment** — The server never initiates outbound connections to agents or targets. Agents poll for work and receive only jobs assigned to their targets (routed via `agent_id` on jobs or through target→agent relationships). For network appliances and agentless targets, a proxy agent in the same network zone executes deployments via the target's API. This keeps the control plane firewalled off and limits credential scope to the proxy agent's zone.
|
||||
3. **Sub-CA Capable** — The Local CA can operate as a subordinate CA under an enterprise root (e.g., ADCS). Load a pre-signed CA cert+key from disk and all issued certs chain to the enterprise trust hierarchy. Self-signed mode remains the default for development/demos.
|
||||
4. **GUI as Primary Interface** — The web dashboard is the operational control plane, not a secondary viewer. Every backend feature ships with its corresponding GUI surface.
|
||||
5. **Decoupled Operations** — Agents operate autonomously; the control plane coordinates but doesn't block agent function.
|
||||
@@ -25,7 +61,7 @@ flowchart TB
|
||||
API["REST API\n(Go net/http, :8443)"]
|
||||
SVC["Service Layer"]
|
||||
REPO["Repository Layer\n(database/sql + lib/pq)"]
|
||||
SCHED["Background Scheduler\n6 loops"]
|
||||
SCHED["Background Scheduler\n8 always-on + 4 optional loops"]
|
||||
DASH["Web Dashboard\n(React SPA)"]
|
||||
end
|
||||
|
||||
@@ -41,18 +77,33 @@ flowchart TB
|
||||
|
||||
subgraph "Issuer Backends"
|
||||
CA1["Local CA\n(crypto/x509, sub-CA)"]
|
||||
CA2["ACME\n(HTTP-01 + DNS-01)"]
|
||||
CA2["ACME\n(HTTP-01 + DNS-01 + DNS-PERSIST-01)\n(EAB, ZeroSSL auto-EAB)"]
|
||||
CA3["step-ca\n(/sign API)"]
|
||||
CA4["OpenSSL / Custom CA\n(script-based)"]
|
||||
CA6["Vault PKI\n(planned)"]
|
||||
CA6["Vault PKI\n(token auth, /sign API)"]
|
||||
CA7["DigiCert CertCentral\n(async order model)"]
|
||||
CA8["Sectigo SCM\n(async order model)"]
|
||||
CA9["Google CAS\n(OAuth2, sync)"]
|
||||
CA10["AWS ACM PCA\n(sync issuance)"]
|
||||
CA11["Entrust\n(mTLS, sync/async)"]
|
||||
CA12["GlobalSign Atlas\n(mTLS + API key)"]
|
||||
CA13["EJBCA\n(mTLS or OAuth2)"]
|
||||
end
|
||||
|
||||
subgraph "Target Systems"
|
||||
T1["NGINX\n(file write + reload)"]
|
||||
T4["Apache httpd\n(file write + reload)"]
|
||||
T5["HAProxy\n(combined PEM + reload)"]
|
||||
T2["F5 BIG-IP\n(proxy agent + iControl REST, planned)"]
|
||||
T3["IIS\n(agent-local PowerShell, planned)"]
|
||||
T6["Traefik\n(file provider)"]
|
||||
T7["Caddy\n(admin API / file)"]
|
||||
T8["Envoy\n(file-based SDS)"]
|
||||
T9["Postfix/Dovecot\n(file + service reload)"]
|
||||
T2["F5 BIG-IP\n(proxy agent + iControl REST)"]
|
||||
T3["IIS\n(WinRM + local)"]
|
||||
T10["SSH\n(SFTP + reload)"]
|
||||
T11["WinCertStore\n(PowerShell import)"]
|
||||
T12["Java Keystore\n(keytool pipeline)"]
|
||||
T13["Kubernetes Secrets\n(K8s API)"]
|
||||
end
|
||||
|
||||
DASH --> API
|
||||
@@ -60,7 +111,7 @@ flowchart TB
|
||||
SVC --> REPO
|
||||
REPO --> PG
|
||||
SCHED --> SVC
|
||||
SVC -->|"Issue/Renew"| CA1 & CA2 & CA3
|
||||
SVC -->|"Issue/Renew"| CA1 & CA2 & CA3 & CA4 & CA6 & CA7 & CA8 & CA9 & CA10
|
||||
|
||||
A1 & A2 & A3 -->|"CSR + Heartbeat"| API
|
||||
API -->|"Cert + Chain\n(NO private key)"| A1 & A2 & A3
|
||||
@@ -80,7 +131,7 @@ The server exposes a REST API under `/api/v1/` and optionally serves the web das
|
||||
|
||||
### Agents
|
||||
|
||||
Lightweight Go processes that run on or near your infrastructure. Agents generate ECDSA P-256 private keys locally, create CSRs, and submit them to the control plane for signing — private keys never leave agent infrastructure. Agents also handle certificate deployment to target systems (NGINX, Apache httpd, HAProxy fully implemented; F5 BIG-IP, IIS interface only with V2 implementations planned) and report job status. They communicate with the control plane via HTTP and authenticate with API keys.
|
||||
Lightweight Go processes that run on or near your infrastructure. Agents generate ECDSA P-256 private keys locally, create CSRs, and submit them to the control plane for signing — private keys never leave agent infrastructure. Agents also handle certificate deployment to target systems (NGINX, Apache httpd, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS, F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets) and report job status. They communicate with the control plane via HTTP and authenticate with API keys.
|
||||
|
||||
The agent runs two background loops: a heartbeat (every 60 seconds) to signal it's alive, and a work poll (every 30 seconds) to check for actionable jobs via `GET /api/v1/agents/{id}/work`. Jobs may be `AwaitingCSR` (agent needs to generate key + submit CSR) or `Deployment` (agent needs to deploy a certificate). Private keys are stored in `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`) with 0600 permissions.
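
The storage rule above (keys under `CERTCTL_KEY_DIR` with 0600 permissions) can be illustrated with a short shell sketch. This is not the agent's implementation, which is a Go process. The function name `store_private_key` and the `<name>.key` file layout are assumptions made for the example.

```sh
#!/bin/sh
# Sketch only: store_private_key and the <name>.key layout are hypothetical.

# Write key material from stdin to "$CERTCTL_KEY_DIR/<name>.key" so that only
# the owning user can read it. $1 = key name.
store_private_key() {
    key_dir=${CERTCTL_KEY_DIR:-/var/lib/certctl/keys}
    (
        umask 077                    # files are created 0600, directories 0700
        mkdir -p "$key_dir" &&
        cat > "$key_dir/$1.key"
    ) || return 1
    chmod 600 "$key_dir/$1.key"      # also tighten a file that already existed
}
```

A P-256 key could then be generated and stored in one step with `openssl ecparam -genkey -name prime256v1 -noout | store_private_key web01`. Setting the umask before the file is created avoids a window in which the key is readable by other users.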
|
||||
|
||||
@@ -88,18 +139,28 @@ The agent runs two background loops: a heartbeat (every 60 seconds) to signal it
|
||||
|
||||
**Agent groups (M11b):** Dynamic device grouping allows organizing agents by metadata criteria. Agent groups can match by OS, architecture, IP CIDR, and version. Groups support both dynamic matching (agents automatically join when criteria match) and manual membership (explicit include/exclude). Renewal policies can be scoped to agent groups via the `agent_group_id` foreign key. The GUI provides full CRUD management for agent groups with visual match criteria badges.
|
||||
|
||||
**Agent soft-retirement (I-004):** `DELETE /api/v1/agents/{id}` is a soft-delete surface — the row is never removed. Retirement stamps `agents.retired_at` (TIMESTAMPTZ) and `agents.retired_reason` (TEXT) and flips the operational status to `Offline`. Default listings (`GET /api/v1/agents`, the dashboard stats counter, and the stale-offline sweeper) filter retired rows out via `AgentRepository.ListActive`; retired rows are surfaced only through the opt-in `GET /api/v1/agents/retired` view. The endpoint follows a preflight → block → escape-hatch contract:
|
||||
|
||||
- **Clean retire** (no active dependencies) — `200 OK` with `RetireAgentResponse` (`cascade=false`, zero counts).
|
||||
- **Blocked by active dependencies** — `409 Conflict` with `BlockedByDependenciesResponse`. The three counts (`active_targets`, `active_certificates`, `pending_jobs`) tell the operator exactly which rows would be orphaned. The schema diverges from `ErrorResponse` because downstream dashboards parse the stable three-key shape.
|
||||
- **Force cascade** — `DELETE /api/v1/agents/{id}?force=true&reason=...`. `reason` is required (400 otherwise). Transactionally soft-retires downstream `deployment_targets`, cancels pending jobs, and soft-retires the agent, emitting an `agent_retirement_cascaded` audit event with actor + reason + per-bucket counts.
|
||||
- **Idempotent re-retire** — a retire attempt against an already-retired agent returns `204 No Content` with an empty body (no second audit event and no response shape, so callers that repeat the `DELETE` on a retry get a clean no-op).
|
||||
- **Sentinel refusal** — the four sentinel agent IDs (`server-scanner`, `cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`) back non-agent discovery subsystems (the network scanner and the three cloud secret-manager sources). They are refused unconditionally — even with `force=true` — via `ErrAgentIsSentinel` → `403 Forbidden`. The ID list lives in `internal/domain/connector.go` (`SentinelAgentIDs`) so handler, repository, and scheduler code can filter them without importing `service`.
|
||||
|
||||
Retired agents receive `410 Gone` on subsequent heartbeats (`service.ErrAgentRetired`). `cmd/agent` treats 410 as a terminal signal and exits cleanly so retired agents stop phoning home. Migration `000015` flipped `deployment_targets.agent_id` from `ON DELETE CASCADE` to `ON DELETE RESTRICT`, making the old hard-delete path a schema error and forcing all retirement through this contract.
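
For client scripts, the retirement contract above reduces to a small status-code table. The sketch below maps each documented status to a label. The function names and label strings are hypothetical. Treating `200` as the success status for a forced cascade is an assumption, since the text states it explicitly only for the clean retire.

```sh
#!/bin/sh
# Sketch only: function names and labels are hypothetical.

# Map the HTTP status of DELETE /api/v1/agents/{id} to an outcome label.
describe_retire_status() {
    case "$1" in
        200) echo "retired" ;;                  # clean retire (assumed for cascade too)
        204) echo "already-retired" ;;          # idempotent no-op, empty body
        400) echo "force-without-reason" ;;     # force=true requires reason
        403) echo "sentinel-refused" ;;         # sentinel agent IDs are never retired
        409) echo "blocked-by-dependencies" ;;  # body carries the three counts
        *)   echo "unexpected" ;;
    esac
}

# Succeed when a heartbeat status means the agent should stop for good.
heartbeat_is_terminal() {
    [ "$1" = "410" ]
}
```

A caller that receives `blocked-by-dependencies` can show the three counts to the operator before retrying with `force=true` and a `reason`.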
|
||||
|
||||
### Web Dashboard
|
||||
|
||||
The web dashboard is the primary operational interface for certctl. It is built with Vite + React + TypeScript and uses TanStack Query for server state management (caching, background refetching, optimistic updates).
|
||||
|
||||
**Current views**: certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info + OS/architecture grouping with charts), job queue (status, retry, cancel, approve/reject), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (list with test connection + delete), targets (list with 3-step configuration wizard + delete), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), summary dashboard with charts (expiration heatmap, renewal success rate, status distribution, issuance rate), and login page.
|
||||
**Current views** (24 pages): certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info + OS/architecture grouping with charts), job queue (list + detail with verification section, timeline, audit events; approve/reject for AwaitingApproval jobs), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (catalog with 10 type cards + 3-step create wizard + detail with test connection), targets (list with 3-step configuration wizard + detail with deployment history), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), discovered certificates triage (claim/dismiss unmanaged certs discovered by agents or network scans), network scan targets management (CRUD + Scan Now button), summary dashboard with charts (expiration heatmap, renewal success rate, status distribution, issuance rate), digest preview and send, observability (health, metrics, Prometheus config), and login page.
|
||||
|
||||
The dashboard includes an **ErrorBoundary component** for graceful error recovery — if a view crashes, the boundary catches the error and displays a user-friendly message instead of breaking the entire dashboard. It also includes a **demo mode** that activates when the API is unreachable — it renders realistic mock data for screenshots and offline presentations.
|
||||
|
||||
**Tech decisions**:
|
||||
- Vite for fast builds and HMR during development
|
||||
- TanStack Query over manual fetch/useEffect for automatic cache invalidation and refetching
|
||||
- Dark theme default (ops teams live in dark mode)
|
||||
- Light content area with branded dark teal sidebar, Inter + JetBrains Mono typography
|
||||
- SSE/WebSocket planned for real-time job status updates
|
||||
|
||||
### PostgreSQL Database
|
||||
@@ -224,6 +285,9 @@ erDiagram
|
||||
text channel
|
||||
text recipient
|
||||
text status
|
||||
int retry_count
|
||||
timestamptz next_retry_at
|
||||
text last_error
|
||||
}
|
||||
certificate_profiles {
|
||||
text id PK
|
||||
@@ -284,7 +348,12 @@ erDiagram
|
||||
}
|
||||
```
|
||||
|
||||
Migrations are idempotent (`IF NOT EXISTS` on all CREATE statements, `ON CONFLICT (id) DO NOTHING` on all seed data) so they're safe to run multiple times — important for Docker Compose where both initdb and the server may run the same SQL.
|
||||
The ER diagram above documents **database shape**, not REST-API wire shape. Several columns are intentionally server-internal and never serialized to clients:
|
||||
|
||||
- `agents.api_key_hash` — SHA-256 of the agent's plaintext API key, populated by `service.RegisterAgent` (`hashAPIKey(apiKey)` at `internal/service/agent.go`) and consumed by `repository.AgentRepository::GetByAPIKey` for the auth-lookup. **Not** exposed via the REST API, **not** echoed via CLI / MCP / agent registration response, **never** logged. Enforced by `internal/domain/connector.go::Agent.MarshalJSON` (G-2 audit closure, `cat-s5-apikey_leak`); the OpenAPI Agent schema explicitly excludes the field, the frontend `Agent` interface omits it, and a CI grep guardrail at `.github/workflows/ci.yml` blocks reintroduction.
|
||||
- `issuers.config` / `deployment_targets.config` — plaintext jsonb shadow of the AES-GCM-encrypted on-disk blob; the encrypted form lives on `EncryptedConfig []byte` (Go-only field tagged `json:"-"`).
|
||||
|
||||
Migrations are idempotent (`IF NOT EXISTS` on all CREATE statements, `ON CONFLICT (id) DO NOTHING` on all seed data) so they're safe to run multiple times. Pre-U-3 (`cat-u-seed_initdb_schema_drift`, GitHub #10) the deploy compose stack mounted both a hand-curated subset of `migrations/*.up.sql` and `seed.sql` into postgres `/docker-entrypoint-initdb.d/` so initdb applied them on first boot, *and* the server re-applied the same files via `RunMigrations` on every start. The dual source of truth was the bug: every time a migration shipped that the seed depended on (e.g., 000013 added `policy_rules.severity`), the mount list had to be updated by hand, and missing the update crashed initdb on first boot. Post-U-3 the server is the single source of truth: postgres comes up with an empty schema, `RunMigrations` applies the entire ladder, then `RunSeed` lands the baseline seed (and `RunDemoSeed` lands the demo overlay when `CERTCTL_DEMO_SEED=true`). Helm has used this pattern since day one (postgres-init `emptyDir`); the docker-compose deploy now matches.
|
||||
|
||||
## Data Flow: Certificate Lifecycle
|
||||
|
||||
@@ -345,7 +414,11 @@ sequenceDiagram
|
||||
Note over A: Agent deploys using locally-held private key
|
||||
```
|
||||
|
||||
**Profile enforcement:** If the certificate is assigned to a profile (`certificate_profile_id`), the profile's `allowed_key_algorithms` and `max_validity_days` constraints are checked during CSR validation. A CSR with a disallowed key type or a validity period exceeding the profile maximum is rejected before reaching the issuer connector.
|
||||
**Profile enforcement (M11c):** Crypto policy enforcement is wired into all four issuance paths: renewal (server-side and agent CSR), agent fallback CSR signing, EST enrollment (RFC 7030), and SCEP enrollment (RFC 8894). At each path, the service layer resolves the certificate's profile and calls `ValidateCSRAgainstProfile()` to check the CSR key algorithm and minimum key size against the profile's `allowed_key_algorithms` rules. A CSR with a disallowed key type or insufficient key size is rejected before reaching the issuer connector.
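
As a rough illustration of the check described above, the sketch below validates a CSR's key algorithm and size against a list of allowed rules. It is not the real `ValidateCSRAgainstProfile()`: the `KeyRule` type, `validateCSRKey`, `keyInfo`, and `newP256CSR` are hypothetical names, and the rule shape (algorithm name plus minimum size) is an assumption about how `allowed_key_algorithms` is modeled.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"fmt"
)

// KeyRule is an assumed shape for one allowed_key_algorithms entry.
type KeyRule struct {
	Algorithm string
	MinSize   int
}

// keyInfo extracts the algorithm name and key size in bits from a public key.
func keyInfo(pub any) (string, int, error) {
	switch k := pub.(type) {
	case *rsa.PublicKey:
		return "RSA", k.N.BitLen(), nil
	case *ecdsa.PublicKey:
		return "ECDSA", k.Curve.Params().BitSize, nil
	default:
		return "", 0, fmt.Errorf("unsupported key type %T", pub)
	}
}

// validateCSRKey rejects a CSR whose signature is invalid or whose key
// matches none of the allowed rules.
func validateCSRKey(csr *x509.CertificateRequest, rules []KeyRule) error {
	if err := csr.CheckSignature(); err != nil {
		return fmt.Errorf("csr signature: %w", err)
	}
	alg, size, err := keyInfo(csr.PublicKey)
	if err != nil {
		return err
	}
	for _, r := range rules {
		if r.Algorithm == alg && size >= r.MinSize {
			return nil
		}
	}
	return fmt.Errorf("key %s-%d not allowed by profile", alg, size)
}

// newP256CSR builds a throwaway ECDSA P-256 CSR for demonstration.
func newP256CSR() (*x509.CertificateRequest, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{}, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificateRequest(der)
}

func main() {
	csr, err := newP256CSR()
	if err != nil {
		panic(err)
	}
	fmt.Println(validateCSRKey(csr, []KeyRule{{Algorithm: "ECDSA", MinSize: 256}}))
	fmt.Println(validateCSRKey(csr, []KeyRule{{Algorithm: "RSA", MinSize: 2048}}))
}
```

Running the check before the issuer connector is called means a rejected CSR never consumes a CA order or rate-limit slot.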
|
||||
|
||||
**MaxTTL enforcement:** When a profile specifies `max_ttl_seconds`, the value is forwarded through the service-layer `IssuerConnector` interface to the connector layer via `MaxTTLSeconds` on `IssuanceRequest` and `RenewalRequest`. Each issuer connector enforces the cap according to its capabilities: the Local CA caps `NotAfter` directly, Vault overrides its TTL string, step-ca caps `NotAfter` with zero-value handling, and OpenSSL logs an advisory warning (script-based signing can't enforce server-side). For CAs that control validity themselves (ACME, DigiCert, Sectigo, Google CAS, AWS ACM PCA), MaxTTLSeconds passes through but the CA makes the final decision.
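
For connectors that cap `NotAfter` directly, the logic reduces to taking the smaller of the requested validity and the profile cap. The sketch below is a minimal illustration under stated assumptions: `capValiditySeconds` and `capNotAfter` are hypothetical helpers, and treating a zero or negative `MaxTTLSeconds` as "no cap" is an assumption about the zero-value handling mentioned above.

```go
package main

import (
	"fmt"
	"time"
)

// capValiditySeconds returns the validity to use, in seconds.
// Assumption: maxTTLSeconds <= 0 means the profile sets no cap.
func capValiditySeconds(requested, maxTTLSeconds int64) int64 {
	if maxTTLSeconds <= 0 || requested <= maxTTLSeconds {
		return requested
	}
	return maxTTLSeconds
}

// capNotAfter applies the same cap to a concrete NotBefore/NotAfter pair.
func capNotAfter(notBefore, requested time.Time, maxTTLSeconds int64) time.Time {
	reqSeconds := int64(requested.Sub(notBefore) / time.Second)
	capped := capValiditySeconds(reqSeconds, maxTTLSeconds)
	return notBefore.Add(time.Duration(capped) * time.Second)
}

func main() {
	notBefore := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	requested := notBefore.Add(90 * 24 * time.Hour)
	// A 90-day request under a 1-hour profile cap.
	fmt.Println(capNotAfter(notBefore, requested, 3600))
}
```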
|
||||
|
||||
**Key metadata persistence:** Certificate versions record `key_algorithm` and `key_size` extracted from the CSR during issuance. This metadata enables post-hoc auditing — operators can verify that all issued certificates comply with the key requirements in effect at the time of issuance.
|
||||
|
||||
#### Server-Side Key Generation (Demo Only)
|
||||
|
||||
@@ -377,8 +450,8 @@ The agent deploys certificates using target connectors. Each connector knows how
|
||||
- **NGINX**: Writes cert/chain/key files to disk, validates config with `nginx -t`, reloads with `nginx -s reload` or `systemctl reload nginx`
|
||||
- **Apache httpd**: Writes separate cert/chain/key files, validates with `apachectl configtest`, graceful reload
|
||||
- **HAProxy**: Builds a combined PEM file (cert + chain + key), optionally validates config, reloads via systemctl or signal
|
||||
- **F5 BIG-IP** (planned): A proxy agent in the same network zone calls the iControl REST API to upload certificate and update SSL profile bindings. The server assigns the work; the proxy agent executes it.
|
||||
- **IIS** (planned, dual-mode): (1) Agent-local (recommended) — a Windows agent on the IIS box runs PowerShell `Import-PfxCertificate` + `Set-WebBinding` directly. (2) Proxy agent WinRM — for agentless IIS targets, a nearby Windows agent reaches the IIS box via WinRM.
|
||||
- **F5 BIG-IP**: A proxy agent in the same network zone calls the iControl REST API to upload certificate/key files, install crypto objects, and update the SSL client profile within an atomic transaction. The server assigns the work; the proxy agent executes it.
|
||||
- **IIS** (implemented, dual-mode): (1) Agent-local (recommended) — a Windows agent on the IIS box runs PowerShell `Import-PfxCertificate` + `Set-WebBinding` directly with PFX conversion and SHA-1 thumbprint computation. (2) Proxy agent WinRM — for agentless IIS targets, a nearby Windows agent reaches the IIS box via WinRM.
|
||||
|
||||
The agent handles both the certificate (public) and the private key (read from local key store at `CERTCTL_KEY_DIR`). The control plane never sees the private key and never initiates outbound connections to agents or targets (pull-only model).
|
||||
|
||||
@@ -408,41 +481,65 @@ sequenceDiagram
|
||||
API-->>U: 200 OK
|
||||
```
|
||||
|
||||
The revocation is recorded in the `certificate_revocations` table (separate from the certificate status update) for CRL generation. The DER-encoded CRL at `GET /api/v1/crl/{issuer_id}` is generated on-demand by querying this table and signing with the issuing CA's key. The OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}` checks both the certificate status and the revocations table to return signed good/revoked/unknown responses.
|
||||
The revocation is recorded in the `certificate_revocations` table (separate from the certificate status update) for CRL generation. The DER-encoded CRL at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615) is generated on-demand by querying this table and signing with the issuing CA's key. The OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960) checks both the certificate status and the revocations table to return signed good/revoked/unknown responses. Both endpoints are served unauthenticated — relying parties (TLS clients, hardware appliances, browsers) must be able to reach them without a certctl API key — and carry the IANA-registered media types `application/pkix-crl` and `application/ocsp-response` respectively.
|
||||
|
||||
Short-lived certificates (those with profile TTL < 1 hour) return "good" from OCSP and are excluded from CRL — their rapid expiry is treated as sufficient revocation.
|
||||
|
||||
#### Bulk Revocation
|
||||
|
||||
For compliance events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
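
The partial-failure behavior can be sketched as a loop that records each outcome and keeps going. This is an illustration only: `bulkRevoke`, `BulkRevokeSummary`, and both error variables are hypothetical, and counting an already-revoked certificate as a success is an assumption drawn from the "no-op" wording above.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for the sketch.
var (
	errAlreadyRevoked    = errors.New("certificate already revoked")
	errIssuerUnavailable = errors.New("issuer unavailable")
)

// BulkRevokeSummary mirrors the total/succeeded/failed counts in the text.
type BulkRevokeSummary struct {
	Total     int
	Succeeded int
	Failed    int
	Errors    map[string]string // cert ID -> failure reason
}

// bulkRevoke calls revoke for every ID and never aborts on a single failure.
func bulkRevoke(ids []string, revoke func(id string) error) BulkRevokeSummary {
	s := BulkRevokeSummary{Total: len(ids), Errors: map[string]string{}}
	for _, id := range ids {
		err := revoke(id)
		if err == nil || errors.Is(err, errAlreadyRevoked) {
			s.Succeeded++ // already-revoked is treated as an idempotent no-op
			continue
		}
		s.Failed++
		s.Errors[id] = err.Error()
	}
	return s
}

func main() {
	summary := bulkRevoke([]string{"a", "b", "c"}, func(id string) error {
		switch id {
		case "b":
			return errAlreadyRevoked
		case "c":
			return errIssuerUnavailable
		}
		return nil
	})
	fmt.Printf("%+v\n", summary)
}
```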
|
||||
|
||||
### 4. Automatic Renewal
|
||||
|
||||
The control plane runs a scheduler with six background loops:
|
||||
The control plane runs a scheduler with 8 always-on loops plus up to 4 optional loops (enabled by configuration). `internal/scheduler/scheduler.go:262-265` is the authoritative count.
|
||||
|
||||
```mermaid
flowchart LR
    subgraph "Scheduler (Background Goroutines)"
        R["Renewal Checker\n⏱ every 1h"]
        J["Job Processor\n⏱ every 30s"]
        JR["Job Retry\n⏱ every 5m"]
        JT["Job Timeout\n⏱ every 10m"]
        H["Agent Health\n⏱ every 2m"]
        N["Notification Processor\n⏱ every 1m"]
        NR["Notification Retry\n⏱ every 2m"]
        SL["Short-Lived Expiry\n⏱ every 30s"]
        NS["Network Scanner\n⏱ every 6h"]
        DG["Certificate Digest\n⏱ every 24h"]
        HC["Endpoint Health\n⏱ every 60s"]
        CD["Cloud Discovery\n⏱ every 6h"]
    end

    R -->|"Find expiring certs\nCreate renewal jobs"| DB[("PostgreSQL")]
    J -->|"Process pending jobs\nCoordinate issuance"| DB
    JR -->|"Retry Failed jobs\nFailed→Pending"| DB
    JT -->|"Reap stalled AwaitingCSR / AwaitingApproval jobs"| DB
    H -->|"Check heartbeat staleness\nMark agents offline"| DB
    N -->|"Send pending notifications\nEmail / Webhook / Slack"| DB
    NR -->|"Retry failed notifications\n2^n-min backoff, DLQ after 5 attempts"| DB
    SL -->|"Expire short-lived certs\nMark as Expired"| DB
    NS -->|"Probe TLS endpoints\nStore discovered certs"| DB
    DG -->|"Generate & send HTML digest\nEmail to recipients"| DB
    HC -->|"Probe deployed TLS endpoints\nState machine + mismatch"| DB
    CD -->|"AWS SM / Azure KV / GCP SM\nFeed discovery pipeline"| DB
```
|
||||
|
||||
| Loop | Interval | Timeout | Purpose |
|------|----------|---------|---------|
| Renewal checker | 1 hour | 5 minutes | Finds certificates approaching expiry, creates renewal jobs |
| Job processor | 30 seconds | 2 minutes | Processes pending jobs (issuance, renewal, deployment) |
| Agent health check | 2 minutes | 1 minute | Marks agents as offline if heartbeat is stale |
| Notification processor | 1 minute | 1 minute | Sends pending notifications via configured channels |
| Short-lived expiry | 30 seconds | 30 seconds | Marks expired short-lived certificates (profile TTL < 1 hour) |
| Network scanner | 6 hours | 30 minutes | Probes TLS endpoints on configured CIDR ranges, stores discovered certs (M21, opt-in via `CERTCTL_NETWORK_SCAN_ENABLED`) |
|
||||
| Loop | Interval | Always-on? | Purpose |
|------|----------|------------|---------|
| Renewal checker | 1 hour | Yes | Finds certificates approaching expiry (threshold-based or ARI-directed), creates renewal jobs |
| Job processor | 30 seconds | Yes | Processes pending jobs (issuance, renewal, deployment) |
| Job retry | 5 minutes (`CERTCTL_SCHEDULER_RETRY_INTERVAL`) | Yes | Transitions `Failed` jobs back to `Pending` for re-dispatch (I-001) |
| Job timeout | 10 minutes (`CERTCTL_JOB_TIMEOUT_INTERVAL`) | Yes | Reaps `AwaitingCSR` jobs older than 24h and `AwaitingApproval` jobs older than 7d to `Failed`, feeding the retry loop (I-003) |
| Agent health check | 2 minutes | Yes | Marks agents as offline if heartbeat is stale |
| Notification processor | 1 minute | Yes | Sends pending notifications via configured channels |
| Notification retry | 2 minutes (`CERTCTL_NOTIFICATION_RETRY_INTERVAL`) | Yes | Re-dispatches `Failed` notifications whose `next_retry_at` has elapsed; exponential backoff (2^n minutes, capped at 1h), 5-attempt budget, terminal `dead` status after exhaustion (I-005) |
| Short-lived expiry | 30 seconds | Yes | Marks expired short-lived certificates (profile TTL < 1 hour) |
| Network scanner | 6 hours | Opt-in (`CERTCTL_NETWORK_SCAN_ENABLED`) | Probes TLS endpoints on configured CIDR ranges, stores discovered certs (M21). CIDR size is validated at the API level: max /20 (4096 IPs) per range. |
| Certificate digest | 24 hours (`CERTCTL_DIGEST_INTERVAL`) | Opt-in (digest service) | Generates HTML email with certificate stats, expiration timeline, job health, agent count. Does NOT run on startup; waits for first scheduled tick. Falls back to certificate owner emails if no explicit recipients configured. |
| Endpoint health | 60 seconds (`CERTCTL_HEALTH_CHECK_INTERVAL`) | Opt-in (health check service) | Probes deployed TLS endpoints, drives the healthy/degraded/down/cert_mismatch state machine (M48) |
| Cloud discovery | 6 hours | Opt-in (at least one cloud source configured) | Walks AWS Secrets Manager / Azure Key Vault / GCP Secret Manager, feeds discovery pipeline (M50) |
|
||||
|
||||
Each loop uses `sync/atomic.Bool` idempotency guards to prevent concurrent tick execution — if a loop iteration is still running when the next tick fires, the tick is skipped with a warning log. Most loops (including short-lived expiry, job retry, job timeout, and notification retry) run immediately on startup before entering their ticker interval, ensuring no gap between scheduler start and first execution. The certificate digest loop is the exception — it does NOT run on startup, only on scheduled ticks. Graceful shutdown uses `sync.WaitGroup` with `WaitForCompletion()` to drain all in-flight work before process exit.
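
The skip-if-still-running guard can be illustrated with a compare-and-swap on an `atomic.Bool`. This is a minimal sketch of the pattern, not the scheduler's code: `loopGuard` and `tryRun` are hypothetical names, and the warning log that the real loops emit on a skipped tick is omitted.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// loopGuard prevents overlapping executions of one scheduler loop.
type loopGuard struct {
	running atomic.Bool
}

// tryRun executes fn unless a previous invocation is still in flight.
// It reports whether fn actually ran.
func (g *loopGuard) tryRun(fn func()) bool {
	// CompareAndSwap flips false -> true atomically; it fails if already true.
	if !g.running.CompareAndSwap(false, true) {
		return false // tick skipped
	}
	defer g.running.Store(false)
	fn()
	return true
}

func main() {
	var g loopGuard
	var overlapped bool
	// Simulate a tick firing while the previous iteration is still running.
	ran := g.tryRun(func() {
		overlapped = g.tryRun(func() {})
	})
	fmt.Println("outer ran:", ran, "overlapping tick ran:", overlapped)
}
```

A compare-and-swap is preferable to a separate load followed by a store, because two ticks could otherwise both observe `false` and proceed together.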
|
||||
|
||||
Each operation has a context timeout to prevent indefinite hangs if external services become unresponsive.
|
||||
|
||||
@@ -463,9 +560,16 @@ flowchart TB
|
||||
II["IssuerConnector Interface\nIssueCertificate() | RenewCertificate()\nRevokeCertificate() | GetOrderStatus()"]
|
||||
II --> LC["Local CA"]
|
||||
II --> ACME["ACME v2"]
|
||||
II --> SC["step-ca"]
|
||||
II --> SCA["step-ca"]
|
||||
II --> OC["OpenSSL / Custom CA"]
|
||||
II --> VP["Vault PKI (planned)"]
|
||||
II --> VP["Vault PKI"]
|
||||
II --> DC["DigiCert CertCentral"]
|
||||
II --> SG["Sectigo SCM"]
|
||||
II --> GC["Google CAS"]
|
||||
II --> AP2["AWS ACM PCA"]
|
||||
II --> EN["Entrust"]
|
||||
II --> GS["GlobalSign Atlas"]
|
||||
II --> EJ["EJBCA"]
|
||||
end
|
||||
|
||||
subgraph "Target Connectors"
|
||||
@@ -474,8 +578,16 @@ flowchart TB
|
||||
TI --> NG["NGINX"]
|
||||
TI --> AP["Apache httpd"]
|
||||
TI --> HP["HAProxy"]
|
||||
TI --> F5["F5 BIG-IP (interface only)"]
|
||||
TI --> IIS["IIS (interface only)"]
|
||||
TI --> TF["Traefik"]
|
||||
TI --> CD["Caddy"]
|
||||
TI --> EV["Envoy"]
|
||||
TI --> PO["Postfix/Dovecot"]
|
||||
TI --> IIS["IIS"]
|
||||
TI --> F5["F5 BIG-IP"]
|
||||
TI --> SSH["SSH"]
|
||||
TI --> WCS["WinCertStore"]
|
||||
TI --> JKS["Java Keystore"]
|
||||
TI --> K8S["K8s Secrets"]
|
||||
end
|
||||
|
||||
subgraph "Notifier Connectors"
|
||||
@@ -527,7 +639,11 @@ type Connector interface {
|
||||
}
|
||||
```
|
||||
|
||||
Built-in issuers: **Local CA** (self-signed or sub-CA mode using `crypto/x509`), **ACME v2** (HTTP-01 and DNS-01 challenges, compatible with Let's Encrypt, Sectigo, and any ACME-compliant CA), **step-ca** (Smallstep private CA via native /sign API with JWK provisioner auth), and **OpenSSL/Custom CA** (script-based signing delegating to user-provided shell scripts). The ACME connector uses `golang.org/x/crypto/acme`, generates an ECDSA P-256 account key, handles account registration with ToS acceptance, order creation, challenge solving (HTTP-01 via built-in server, DNS-01 via script-based hooks), order finalization, and DER-to-PEM chain conversion. The interface also includes `GetCACertPEM(ctx)` for CA chain distribution (used by the EST server's `/cacerts` endpoint).
|
||||
Built-in issuers (9 connectors): **Local CA** (self-signed or sub-CA mode using `crypto/x509`), **ACME v2** (HTTP-01, DNS-01, and DNS-PERSIST-01 challenges, compatible with Let's Encrypt, ZeroSSL, Sectigo, Google Trust Services, and any ACME-compliant CA), **step-ca** (Smallstep private CA via native /sign API with JWK provisioner auth), **OpenSSL/Custom CA** (script-based signing delegating to user-provided shell scripts), **Vault PKI** (HashiCorp Vault's PKI secrets engine via /sign API with token auth), **DigiCert** (commercial CA via CertCentral REST API with async order processing), **Sectigo SCM** (async order model with 3-header auth), **Google CAS** (Cloud Certificate Authority Service with OAuth2 service account auth), and **AWS ACM Private CA** (synchronous issuance via ACM PCA API). The ACME connector uses `golang.org/x/crypto/acme`, generates an ECDSA P-256 account key, handles account registration with ToS acceptance and optional External Account Binding (EAB) for CAs that require it (ZeroSSL, Google Trust Services, SSL.com), order creation, challenge solving (HTTP-01 via built-in server, DNS-01 via script-based hooks, DNS-PERSIST-01 via standing TXT records with auto-fallback to DNS-01), order finalization, and DER-to-PEM chain conversion. For ZeroSSL, EAB credentials are auto-fetched from ZeroSSL's public API when the directory URL is detected as ZeroSSL and no EAB credentials are provided — zero-friction onboarding with no dashboard visit required.
|
||||
|
||||
**ACME Renewal Information (ARI, RFC 9773):** The ACME connector supports CA-directed renewal timing via the `GetRenewalInfo()` method. Instead of using fixed thresholds (e.g., renew 30 days before expiry), the CA tells certctl when to renew by providing a `suggestedWindow` with start and end times. This is useful for distributing renewal load during maintenance windows and coordinating mass-revocation scenarios. Enable with `CERTCTL_ACME_ARI_ENABLED=true`. Cert ID is computed as `base64url(SHA-256(DER cert))` per RFC 9773. If the CA doesn't support ARI (404 from the ARI endpoint), certctl automatically falls back to threshold-based renewal — no operator intervention required. Errors from the CA are logged as warnings.
|
||||
|
||||
The interface also includes `GetCACertPEM(ctx)` for CA chain distribution (used by the EST server's `/cacerts` endpoint).
|
||||
|
||||
### Target Connector
|
||||
|
||||
@@ -543,9 +659,11 @@ type Connector interface {
|
||||
|
||||
The `DeploymentRequest` struct carries the full material needed by the target system: the signed certificate, the CA chain, the agent-generated private key, target-specific configuration, and arbitrary metadata. The key field is populated by the agent from its local key store (`CERTCTL_KEY_DIR`) — it never originates from the control plane.
|
||||
|
||||
Built-in targets: **NGINX** (writes cert/chain/key files, validates with `nginx -t`, reloads), **Apache httpd** (writes cert/chain/key files, validates with `apachectl configtest`, graceful reload), **HAProxy** (combined PEM file with cert+chain+key, validates config, reloads via systemctl/signal), **F5 BIG-IP** (interface only — proxy agent + iControl REST, implementation planned), **IIS** (interface only — dual-mode: agent-local PowerShell primary + proxy agent WinRM for agentless targets, implementation planned).
|
||||
Built-in targets (14 connector types): **NGINX** (writes cert/chain/key files, validates with `nginx -t`, reloads), **Apache httpd** (writes cert/chain/key files, validates with `apachectl configtest`, graceful reload), **HAProxy** (combined PEM file with cert+chain+key, validates config, reloads via systemctl/signal), **Traefik** (file provider — writes cert/key to watched directory, Traefik auto-reloads), **Caddy** (dual-mode: admin API hot-reload or file-based), **Envoy** (file-based with optional SDS JSON config), **F5 BIG-IP** (proxy agent + iControl REST, transaction-based atomic SSL profile updates), **IIS** (dual-mode: agent-local PowerShell + proxy agent WinRM for agentless targets), **Postfix/Dovecot** (file write + service reload), **SSH** (agentless deployment via SSH/SFTP), **Windows Certificate Store** (PowerShell-based cert import, dual-mode local/WinRM), **Java Keystore** (PEM → PKCS#12 → keytool pipeline, JKS and PKCS12 formats), **Kubernetes Secrets** (deploys as `kubernetes.io/tls` Secrets via injectable K8sClient interface, in-cluster or kubeconfig auth).
|
||||
|
||||
Additional cloud, network, and Kubernetes target connectors are planned for future releases.
|
||||
After deployment, agents can perform **post-deployment TLS verification**: the agent probes the live TLS endpoint using `crypto/tls.DialWithDialer` and compares the SHA-256 fingerprint of the served certificate against what was deployed. Results are reported via `POST /api/v1/jobs/{id}/verify` and stored on the job record. Verification is best-effort: failures neither block nor roll back deployments.
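
A fingerprint probe of this kind might look like the sketch below. It is an assumption-laden illustration, not the agent's code: `fingerprintSHA256` and `verifyDeployed` are hypothetical names, the hex encoding of the fingerprint is assumed, and `InsecureSkipVerify` is set only because the check here is identity of the served leaf certificate, not chain trust.

```go
package main

import (
	"crypto/sha256"
	"crypto/tls"
	"encoding/hex"
	"fmt"
	"net"
	"time"
)

// fingerprintSHA256 returns the lowercase hex SHA-256 of DER bytes.
func fingerprintSHA256(der []byte) string {
	sum := sha256.Sum256(der)
	return hex.EncodeToString(sum[:])
}

// verifyDeployed dials addr (host:port), reads the served leaf certificate,
// and reports whether its fingerprint matches wantFingerprint.
func verifyDeployed(addr, wantFingerprint string, timeout time.Duration) (bool, error) {
	dialer := &net.Dialer{Timeout: timeout}
	// Chain validation is skipped on purpose: only the fingerprint is compared.
	conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return false, err
	}
	defer conn.Close()

	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return false, fmt.Errorf("no peer certificate from %s", addr)
	}
	return fingerprintSHA256(certs[0].Raw) == wantFingerprint, nil
}

func main() {
	// Offline demonstration of the fingerprint helper only.
	fmt.Println(fingerprintSHA256([]byte("abc")))
}
```

Because verification is best-effort, a caller would typically record a dial error as a failed verification result on the job instead of failing the deployment.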
|
||||
|
||||
The SSH connector enables agentless deployment to any Linux/Unix server via SSH/SFTP, using the proxy agent pattern. The Kubernetes Secrets connector deploys certificates as `kubernetes.io/tls` Secrets via an injectable K8sClient interface supporting both in-cluster and out-of-cluster auth.
|
||||
|
||||
### Notifier Connector
|
||||
|
||||
@@ -563,6 +681,16 @@ Built-in notifiers: **Email** (SMTP), **Webhook** (HTTP POST), **Slack** (incomi
|
||||
|
||||
See the [Connector Development Guide](connectors.md) for details on building custom connectors.
|
||||
|
||||
### Notification Retry & Dead-Letter Queue
|
||||
|
||||
A transient notifier failure (SMTP timeout, 5xx webhook response, Slack rate-limit) must not silently drop a critical alert. Migration `000016_notification_retry` adds three columns to `notification_events` — `retry_count INTEGER NOT NULL DEFAULT 0`, `next_retry_at TIMESTAMPTZ` (nullable — only meaningful while a row is in `failed` state), and `last_error TEXT` (the most recent transient error, preserved for operator triage) — together with a partial index `idx_notification_events_retry_sweep ON notification_events(next_retry_at) WHERE status = 'failed' AND next_retry_at IS NOT NULL` so the retry hot path scales with the retry-eligible slice rather than the full notification history.
|
||||
|
||||
The scheduler's notification-retry loop (see the scheduler section above) calls `NotificationService.RetryFailedNotifications(ctx)` every `CERTCTL_NOTIFICATION_RETRY_INTERVAL` (default `2m`). Each tick pulls up to 1000 rows via `notifRepo.ListRetryEligible(ctx, now, maxAttempts, sweepLimit)` — a partial-index-driven query that filters on `status='failed' AND next_retry_at <= now() AND retry_count < 5` — and redispatches them through the same notifier registry used by `ProcessPendingNotifications`. A successful redispatch transitions the row directly to `sent` without incrementing `retry_count`, so the audit trail preserves "delivered on attempt N". A failed redispatch re-arms `next_retry_at` using exponential backoff — `wait = min(2^retry_count minutes, 1h)` — bumps `retry_count`, and stamps `last_error`. When `retry_count >= 4` (the fifth attempt has just failed) the row is promoted to the terminal `dead` status via `notifRepo.MarkAsDead`, which clears `next_retry_at` so the partial retry-sweep index stops matching and the row cannot be re-entered into the retry rotation without operator action.
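
The backoff formula and the dead-letter cutoff can be sketched as follows. The function names `retryBackoff` and `planAfterFailure` are hypothetical, and one detail is an assumption: the text does not say whether the wait is computed from `retry_count` before or after it is bumped, so this sketch uses the value before the bump.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxAttempts = 5         // 5-attempt budget from the text
	maxWait     = time.Hour // backoff cap from the text
)

// retryBackoff returns min(2^retryCount minutes, 1h).
func retryBackoff(retryCount int) time.Duration {
	if retryCount < 0 {
		retryCount = 0
	}
	if retryCount >= 6 { // 2^6 = 64 minutes already exceeds the cap
		return maxWait
	}
	wait := time.Duration(1<<uint(retryCount)) * time.Minute
	if wait > maxWait {
		return maxWait
	}
	return wait
}

// planAfterFailure decides what happens after a failed redispatch of a row
// whose current retry_count is retryCount: either the row goes to the
// terminal dead status, or it is re-armed after the returned wait.
func planAfterFailure(retryCount int) (dead bool, wait time.Duration) {
	if retryCount >= maxAttempts-1 {
		return true, 0
	}
	return false, retryBackoff(retryCount)
}

func main() {
	for n := 0; n < maxAttempts; n++ {
		dead, wait := planAfterFailure(n)
		fmt.Printf("retry_count=%d dead=%v wait=%s\n", n, dead, wait)
	}
}
```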
|
||||
|
||||
`NotificationService.RequeueNotification(ctx, id)` is the operator-driven escape hatch from `dead`. It atomically resets `retry_count → 0`, `next_retry_at → NULL`, `last_error → NULL`, and `status → pending`, handing the row back to `ProcessPendingNotifications` on the next 1m tick. This is the correct response to "the notifier outage is resolved, redeliver the queue"; it is not a retry, which is why the retry counter is reset rather than incremented.
|
||||
|
||||
The dead-letter depth is surfaced in two places. First, `DashboardSummary.NotificationsDead` is populated by `StatsService.GetDashboardSummary` via `notifRepo.CountByStatus(ctx, "dead")`. The injection uses a `SetNotifRepo` setter pattern (mirroring `CertificateService.SetTargetRepo`) rather than a new positional argument to `NewStatsService`, which keeps all nine existing `NewStatsService` call sites (main.go plus eight digest tests and stats_test.go) signature-stable — when the notification repository has not been wired in, `NotificationsDead` falls through to zero. Second, the `/api/v1/metrics/prometheus` endpoint emits `certctl_notification_dead_total` as a counter (operator alert thresholds per the I-005 spec: `> 0` warning, `> 10` critical) using the same `DashboardSummary` snapshot so the dashboard card and the Prometheus counter cannot skew. The web dashboard exposes a two-tab toolbar on `/notifications` — "All" (the pre-I-005 inbox) and "Dead letter" (threads `?status=dead` into the list query, surfaces `Retry N/5` and the truncated `last_error` with a full-text tooltip per row, and binds a Requeue button to `POST /api/v1/notifications/{id}/requeue`).
|
||||
|
||||
### EST Server (RFC 7030)
|
||||
|
||||
The EST (Enrollment over Secure Transport) server provides an industry-standard enrollment interface for devices that need certificates without using the REST API. It runs under `/.well-known/est/` per RFC 7030 and supports four operations: CA certificate distribution (`/cacerts`), initial enrollment (`/simpleenroll`), re-enrollment (`/simplereenroll`), and CSR attributes (`/csrattrs`).
|
||||
@@ -598,10 +726,52 @@ type ESTService interface {
|
||||
}
|
||||
```
|
||||
|
||||
**Issuer connector extension:** EST required adding `GetCACertPEM(ctx) (string, error)` to the issuer connector interface so the `/cacerts` endpoint can serve the CA chain. The Local CA connector returns its CA certificate PEM; ACME, step-ca, and OpenSSL connectors return errors (they don't expose a static CA chain — their chains are per-issuance).
|
||||
**Issuer connector extension:** EST required adding `GetCACertPEM(ctx) (string, error)` to the issuer connector interface so the `/cacerts` endpoint can serve the CA chain. The Local CA returns its CA certificate PEM; Vault PKI fetches via `GET /v1/{mount}/ca/pem`; Google CAS fetches via API; AWS ACM PCA retrieves via `GetCertificateAuthorityCertificate`. ACME, step-ca, OpenSSL, DigiCert, and Sectigo connectors return errors (they don't expose a static CA chain — their chains are per-issuance).
|
||||
|
||||
**Authentication:** EST endpoints are served unauthenticated at the HTTP layer under `/.well-known/est/*` — no Bearer token required. Per RFC 7030 §3.2.3 EST authentication is deployment-specific, and per §4.1.1 `/cacerts` is explicitly anonymous. certctl enforces authentication via CSR signature verification inside `ESTService.SimpleEnroll`/`SimpleReEnroll` plus profile policy gates (allowed key algorithms, minimum key size, permitted SANs, permitted EKUs, MaxTTL). The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes `/.well-known/est/*` through `noAuthHandler` (RequestID + structuredLogger + Recovery only). Operators who need stronger client identification should terminate mTLS at an upstream reverse proxy and pin the CSR's SAN to the client cert subject at the profile level.
|
||||
|
||||
**Audit:** Every EST enrollment is recorded in the audit trail with `protocol: "EST"`, the CN, SANs, issuer ID, serial number, and optional profile ID.
|
||||
|
||||
### SCEP Server (RFC 8894)
|
||||
|
||||
The SCEP (Simple Certificate Enrollment Protocol) server provides certificate enrollment for MDM platforms and network devices. It runs at `/scep` with operation-based dispatch via query parameters per RFC 8894.
|
||||
|
||||
**Architecture:** SCEP follows the exact same layering as EST — a handler-level protocol that delegates certificate issuance to an existing `IssuerConnector`. The `SCEPService` bridges the `SCEPHandler` to whichever issuer connector is configured via `CERTCTL_SCEP_ISSUER_ID`.
|
||||
|
||||
```
Client (MDM, network device, SCEP client)
    │
    ▼
SCEPHandler (handler layer)
    │  PKCS#7 envelope parsing, CSR extraction, challenge password extraction
    ▼
SCEPService (service layer)
    │  Challenge password validation, CSR validation, CN/SAN extraction, audit recording
    ▼
IssuerConnector (connector layer via IssuerConnectorAdapter)
    │  Certificate signing (Local CA, step-ca, etc.)
    ▼
Signed certificate returned as PKCS#7 certs-only
```
|
||||
|
||||
**Wire format:** SCEP clients wrap CSRs in PKCS#7 SignedData envelopes. The handler parses the outer ASN.1 ContentInfo → SignedData → EncapsulatedContentInfo to extract the CSR bytes. Fallback paths handle base64-encoded PKCS#7 and raw CSR submissions (for simpler clients). Responses use PKCS#7 certs-only via the shared `internal/pkcs7` package (same as EST). Single certs are returned as raw DER for `GetCACert`, chains as PKCS#7.
|
||||
|
||||
**Authentication:** SCEP endpoints at `/scep` and `/scep/*` are served unauthenticated at the HTTP layer (no Bearer token required), per RFC 8894 §3.2, which defines authentication via the `challengePassword` attribute (OID 1.2.840.113549.1.9.7) embedded in the PKCS#10 CSR rather than an HTTP credential. The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes `/scep` and `/scep/*` through `noAuthHandler` (RequestID + structuredLogger + Recovery only). The `challengePassword` is mandatory: at startup, `preflightSCEPChallengePassword` refuses to boot the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`, closing CWE-306 (missing authentication for a critical function). `SCEPService.PKCSReq` enforces the same invariant as defense in depth: an empty `s.challengePassword` rejects every enrollment, and the password comparison uses `crypto/subtle.ConstantTimeCompare` to prevent response-time side-channel leakage. The startup log line `SCEP server enabled` emits a `challenge_password_set` boolean for operator visibility.
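
The challenge-password check described above can be sketched as follows. This is a minimal illustration, not the certctl source: the function name `challengeOK` is hypothetical, and the only behavior taken from the text is that an empty configured password rejects every request and that the comparison goes through `crypto/subtle.ConstantTimeCompare`.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// challengeOK is a hypothetical helper (not a certctl symbol). It reports
// whether the presented SCEP challenge password matches the configured one.
// An empty configured password rejects every request, mirroring the
// defense-in-depth rule described in the text.
func challengeOK(configured, presented string) bool {
	if configured == "" {
		return false
	}
	// ConstantTimeCompare returns 1 only when both slices have equal
	// length and equal contents.
	return subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) == 1
}

func main() {
	fmt.Println(challengeOK("s3cret", "s3cret"))
	fmt.Println(challengeOK("s3cret", "guess"))
	fmt.Println(challengeOK("", ""))
}
```

Note that `ConstantTimeCompare` returns early when the two lengths differ, so the comparison hides the password contents but not its length.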
|
||||
|
||||
**Interface:** The `SCEPHandler` defines an `SCEPService` interface (dependency inversion):
|
||||
|
||||
```go
type SCEPService interface {
	GetCACaps(ctx context.Context) string
	GetCACert(ctx context.Context) (string, error)
	PKCSReq(ctx context.Context, csrPEM string, challengePassword string, transactionID string) (*domain.SCEPEnrollResult, error)
}
```
|
||||
|
||||
**Shared PKCS#7 package:** Both EST and SCEP handlers share a common `internal/pkcs7` package for building PKCS#7 certs-only responses and PEM-to-DER chain conversion, eliminating code duplication between the two enrollment protocols.
|
||||
|
||||
**Audit:** Every SCEP enrollment is recorded in the audit trail with `protocol: "SCEP"`, the CN, SANs, issuer ID, serial number, transaction ID, and optional profile ID.
|
||||
|
||||
## Security Model
|
||||
|
||||
### Private Key Management
|
||||
@@ -643,10 +813,11 @@ The control plane only handles public material: certificates, chains, and CSRs.
|
||||
|
||||
### Authentication
|
||||
|
||||
- **API clients → Server**: API key in `Authorization: Bearer` header, or `none` for demo mode
|
||||
- **API clients → Server**: API key in `Authorization: Bearer` header, or `none` for demo mode. Applies to every path under `/api/v1/*`.
|
||||
- **Agent → Server**: API key registered at agent creation, included in all requests
|
||||
- **Server → Issuers**: ACME account key, or connector-specific credentials
|
||||
- **Agent → Targets**: API tokens, WinRM credentials (stored locally on agent or proxy agent — never on server). Credential scope is limited to the agent's network zone.
|
||||
- **Standards-based enrollment and PKI distribution endpoints**: `/.well-known/est/*` (RFC 7030), `/scep` and `/scep/*` (RFC 8894), and `/.well-known/pki/crl/{issuer_id}` + `/.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer. These protocols carry their own authentication semantics — CSR signature + profile policy for EST (§3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous), `challengePassword` in CSR attributes for SCEP (§3.2), and relying-party accessibility for CRL/OCSP — and cannot present certctl Bearer tokens. The dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware). CWE-306 is closed for SCEP by `preflightSCEPChallengePassword`, which refuses to start the server when SCEP is enabled without `CERTCTL_SCEP_CHALLENGE_PASSWORD`. The 27-subtest regression harness `cmd/server/finalhandler_test.go` pins this dispatch surface (EST 4-endpoint, SCEP exact + trailing-slash + query-string, PKI CRL+OCSP, health probes, `/api/v1/*` authenticated, `/assets/*` file server, SPA fallback).
|
||||
|
||||
### Audit Trail
|
||||
|
||||
@@ -669,10 +840,75 @@ Audit events cannot be modified or deleted. They support filtering by actor, act
|
||||
|
||||
### API Audit Log
|
||||
|
||||
In addition to application-level audit events, certctl records every HTTP API call via middleware. The audit middleware captures method, path, actor (extracted from auth context), SHA-256 request body hash (truncated to 16 characters), response status code, and request latency. Health and readiness probes are excluded to avoid noise.
|
||||
In addition to application-level audit events, certctl records every HTTP API call via middleware. The audit middleware captures method, URL path (excluding query parameters — see security note below), actor (extracted from auth context), SHA-256 request body hash (truncated to 16 characters), response status code, and request latency. Health and readiness probes are excluded to avoid noise.
|
||||
|
||||
**Security: Query Parameter Exclusion** — The audit middleware intentionally records `r.URL.Path` only (not `r.URL.String()` or `r.RequestURI`). Query strings may contain cursor tokens, API keys passed as params, or other sensitive filter values. Since the audit trail is append-only with no deletion capability, any sensitive data recorded would persist permanently.
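
As a rough illustration of the two audit fields discussed here, the sketch below keeps only the URL path and derives a truncated SHA-256 body hash. The helper names `auditPath` and `bodyHash` are hypothetical, and reading "truncated to 16 characters" as the first 16 hex characters is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// auditPath (hypothetical helper) returns only the path component of a
// request URL, dropping the query string and fragment.
func auditPath(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Path, nil
}

// bodyHash (hypothetical helper) returns the first 16 hex characters of the
// SHA-256 digest of body. Truncating the hex form is an assumption.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	p, err := auditPath("/api/v1/certificates?cursor=abc&api_key=secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(p)
	fmt.Println(bodyHash([]byte(`{"name":"example"}`)))
}
```

Inside an HTTP middleware the same effect comes from reading `r.URL.Path` directly, as the text states.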
|
||||
|
||||
Audit recording is async (via goroutine) so it never blocks the HTTP response. If audit persistence fails, the error is logged immediately — the API call still succeeds. The middleware sits after the auth middleware in the stack so the actor identity is available from context.
|
||||
|
||||
### Input Validation and SSRF Protection
|
||||
|
||||
All shell-facing inputs (connector scripts, domain names, ACME tokens) are validated through `internal/validation/command.go` before reaching shell execution. `ValidateShellCommand()` denies all shell metacharacters. `ValidateDomainName()` enforces RFC 1123. `ValidateACMEToken()` restricts to base64url characters. The network scanner filters reserved IP ranges (loopback, link-local including cloud metadata 169.254.169.254, multicast, broadcast) to prevent SSRF, while preserving RFC 1918 private ranges for legitimate internal scanning.
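
The reserved-range filter can be approximated with the standard `net/netip` package. This is a sketch under assumptions: `scanAllowed` is a hypothetical name, and the real scanner may check additional ranges or operate on CIDR blocks rather than single addresses.

```go
package main

import (
	"fmt"
	"net/netip"
)

// limitedBroadcast is the IPv4 broadcast address 255.255.255.255.
var limitedBroadcast = netip.AddrFrom4([4]byte{255, 255, 255, 255})

// scanAllowed (hypothetical helper) reports whether a literal IP address may
// be probed. Loopback, link-local (which covers the cloud metadata address
// 169.254.169.254), multicast, and broadcast are refused. RFC 1918 private
// ranges are deliberately allowed.
func scanAllowed(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:a.b.c.d as plain IPv4
	switch {
	case addr.IsLoopback(),
		addr.IsLinkLocalUnicast(),
		addr.IsLinkLocalMulticast(),
		addr.IsMulticast():
		return false
	}
	return addr != limitedBroadcast
}

func main() {
	for _, ip := range []string{"10.0.0.5", "169.254.169.254", "127.0.0.1", "224.0.0.1"} {
		fmt.Println(ip, scanAllowed(ip))
	}
}
```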
|
||||
|
||||
### Request Body Size Limits
|
||||
|
||||
All incoming HTTP request bodies are capped by `http.MaxBytesReader` middleware (default 1MB, configurable via `CERTCTL_MAX_BODY_SIZE`). Requests exceeding the limit receive a 413 Request Entity Too Large response. The middleware is positioned before authentication in the chain so oversized payloads are rejected early, before any auth processing or database work occurs. Requests without bodies (GET, HEAD, nil body) skip the limit check.
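
A body-limit middleware built on `http.MaxBytesReader` might look like the sketch below. The names `bodyLimit`, `drain`, and `statusFor` are hypothetical, and where certctl actually writes the 413 status is an assumption; here the downstream handler does it when the read fails with `*http.MaxBytesError`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bodyLimit (hypothetical) caps request bodies at limit bytes. Requests
// without a body skip the wrapper entirely.
func bodyLimit(limit int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Body != nil && r.Body != http.NoBody {
			r.Body = http.MaxBytesReader(w, r.Body, limit)
		}
		next.ServeHTTP(w, r)
	})
}

// drain reads the whole body and answers 413 when the cap was exceeded.
func drain(w http.ResponseWriter, r *http.Request) {
	_, err := io.Copy(io.Discard, r.Body)
	var tooBig *http.MaxBytesError
	if errors.As(err, &tooBig) {
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

// statusFor sends a POST with bodySize bytes through the middleware and
// returns the resulting HTTP status code.
func statusFor(bodySize int, limit int64) int {
	body := strings.NewReader(strings.Repeat("x", bodySize))
	req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates", body)
	rec := httptest.NewRecorder()
	bodyLimit(limit, http.HandlerFunc(drain)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(10, 1024))
	fmt.Println(statusFor(2048, 1024))
}
```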
|
||||
|
||||
### Config Encryption at Rest
|
||||
|
||||
Dynamic issuer and target configurations (rows with `source='database'`) contain credentials — ACME EAB HMACs, Vault tokens, DigiCert/Sectigo API keys, SSH private keys, WinRM passwords, F5 BIG-IP passwords, and similar. These are sealed at rest in PostgreSQL via `internal/crypto/encryption.go` using AES-256-GCM with a key derived from the operator passphrase `CERTCTL_CONFIG_ENCRYPTION_KEY` through PBKDF2-SHA256 (100,000 rounds, 32-byte output).
|
||||
|
||||
**v2 wire format (current, M-8 remediation, CWE-916 / CWE-329):**
|
||||
|
||||
```
magic(0x02) || salt(16) || nonce(12) || ciphertext+tag
```
|
||||
|
||||
Every call to `EncryptIfKeySet` draws 16 fresh bytes from `crypto/rand` as the PBKDF2 salt, so the derived AES-256 key is distinct per ciphertext and per re-encryption. The salt is stored alongside the ciphertext; decryption reads the magic byte, splits out the salt, re-derives the key, and verifies the AEAD tag.
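
To make the v2 layout concrete, the sketch below splits a blob into its salt, nonce, and sealed parts. It is illustrative only: `splitV2` and `v2Blob` are hypothetical names, key derivation and AEAD verification are left out, and the 16-byte tag length assumes the standard AES-GCM tag size.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	magicV2   = 0x02
	saltLen   = 16 // PBKDF2 salt, per the v2 layout
	nonceLen  = 12 // AES-GCM nonce
	gcmTagLen = 16 // assumed: standard AES-GCM tag size
)

// v2Blob (hypothetical type) holds the parts of a v2 ciphertext.
type v2Blob struct {
	Salt   []byte
	Nonce  []byte
	Sealed []byte // ciphertext followed by the AEAD tag
}

// splitV2 (hypothetical helper) parses magic || salt || nonce || ciphertext+tag.
// It does not derive a key or verify the tag.
func splitV2(blob []byte) (v2Blob, error) {
	if len(blob) < 1+saltLen+nonceLen+gcmTagLen {
		return v2Blob{}, errors.New("blob too short for v2 format")
	}
	if blob[0] != magicV2 {
		return v2Blob{}, errors.New("missing v2 magic byte")
	}
	saltEnd := 1 + saltLen
	nonceEnd := saltEnd + nonceLen
	return v2Blob{
		Salt:   blob[1:saltEnd],
		Nonce:  blob[saltEnd:nonceEnd],
		Sealed: blob[nonceEnd:],
	}, nil
}

func main() {
	blob := make([]byte, 1+saltLen+nonceLen+gcmTagLen+4)
	blob[0] = magicV2
	parts, err := splitV2(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(parts.Salt), len(parts.Nonce), len(parts.Sealed))
}
```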
|
||||
|
||||
**v1 legacy format (read-only):**
|
||||
|
||||
```
nonce(12) || ciphertext+tag
```
|
||||
|
||||
Pre-M-8 blobs were sealed with a package-level fixed salt `"certctl-config-encryption-v1"`. `DecryptIfKeySet` preserves the v1 read path unchanged — a blob whose first byte is not `0x02`, or whose v2 AEAD verification fails (including the 1/256 case where a v1 nonce happens to begin with `0x02`), falls through to a v1 attempt against the legacy fixed salt. v1 blobs are never written by the post-M-8 code path; they re-seal as v2 naturally on the next UPDATE through the normal service CRUD flow. No operator migration ceremony is required.
|
||||
|
||||
**Fail-closed behavior (C-2 sentinel, CWE-311):** both `EncryptIfKeySet` and `DecryptIfKeySet` return `ErrEncryptionKeyRequired` when invoked with an empty passphrase. The server refuses to start if any `source='database'` rows already exist without `CERTCTL_CONFIG_ENCRYPTION_KEY` set.
|
||||
|
||||
**Low-level primitives preserved byte-identical.** `Encrypt`, `Decrypt`, and `DeriveKey` are kept bit-stable so v1 fixtures on disk remain decryptable unchanged and so callers outside the config-encryption path (none today, but the symbols are exported) do not see a breaking change. The new per-ciphertext salt path is reached via the helper `deriveKeyWithSalt(passphrase, salt)`.
|
||||
|
||||
**Passphrase plumbing.** Services (`IssuerService`, `TargetService`, `IssuerRegistry`) hold the operator passphrase as a raw `string` and delegate PBKDF2 to the crypto package per ciphertext. This replaces the pre-M-8 design that pre-derived a single `[]byte` key at service construction and reused it for every row, which was the direct consequence of the fixed-salt KDF.
|
||||
|
||||
**Coverage gate.** CI enforces `internal/crypto/...` coverage ≥ 85% (observed 86.7%) — the encryption primitives are a security-critical gate, and the v2 format plus v1 fallback plus C-2 sentinel paths all need exhaustive coverage to avoid silent regressions.
|
||||
|
||||
### CORS
|
||||
|
||||
CORS uses a **deny-by-default** posture: when `CERTCTL_CORS_ORIGINS` is empty, no CORS headers are set and only same-origin requests can read responses. Operators must explicitly configure allowed origins. This prevents accidental exposure of the API to cross-origin requests in production.
|
||||
|
||||
### Middleware Chain Order
|
||||
|
||||
The HTTP middleware stack processes requests in the following order (see `cmd/server/main.go`):
|
||||
|
||||
1. **RequestID** - assigns unique request ID for correlation
2. **Logging** - structured slog middleware with request ID propagation
3. **Recovery** - panic recovery (catches panics in downstream middleware/handlers)
4. **BodyLimit** - request body size cap via `http.MaxBytesReader`
5. **RateLimiter** - token bucket rate limiting (optional, when enabled)
6. **CORS** - cross-origin request handling (deny-by-default)
7. **Auth** - API key validation (or none in development; JWT/OIDC is handled by an authenticating gateway, not in-process; see below)
8. **AuditLog** - records every API call to the audit trail (requires auth context for actor)
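
The ordering above can be sketched with a small chain helper in which the first middleware listed is the outermost wrapper. Everything here is illustrative: `chain`, `trace`, and `requestOrder` are hypothetical names, and each stage is a placeholder that only records its own name rather than doing real work.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// middleware wraps a handler with extra behavior.
type middleware func(http.Handler) http.Handler

// chain applies mws so that the first entry is the outermost wrapper,
// meaning it is the first to see each request.
func chain(final http.Handler, mws ...middleware) http.Handler {
	h := final
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// trace returns a placeholder middleware that records its name and then
// calls the next handler.
func trace(name string, seen *[]string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			*seen = append(*seen, name)
			next.ServeHTTP(w, r)
		})
	}
}

// requestOrder sends one request through the documented stack and returns
// the order in which the stages ran.
func requestOrder() string {
	var seen []string
	names := []string{"RequestID", "Logging", "Recovery", "BodyLimit",
		"RateLimiter", "CORS", "Auth", "AuditLog"}
	mws := make([]middleware, 0, len(names))
	for _, n := range names {
		mws = append(mws, trace(n, &seen))
	}
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/api/v1/certificates", nil)
	chain(final, mws...).ServeHTTP(httptest.NewRecorder(), req)
	return strings.Join(seen, " > ")
}

func main() {
	fmt.Println(requestOrder())
}
```

Placing AuditLog innermost is what lets it read the actor that Auth stored in the request context.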
|
||||
|
||||
### Authenticating-gateway pattern (JWT, OIDC, mTLS)
|
||||
|
||||
certctl's in-process authentication surface is intentionally narrow: `api-key` for production deployments and `none` for development. There is no in-process JWT, OIDC, mTLS, or SAML middleware. (`CERTCTL_AUTH_TYPE=jwt` was accepted pre-G-1 but silently routed through the api-key bearer middleware — a security finding masquerading as a config option, removed at the v2.x boundary; see [`upgrade-to-v2-jwt-removal.md`](upgrade-to-v2-jwt-removal.md) if you previously set it.)
|
||||
|
||||
For deployments that need JWT/OIDC/mTLS, the standard pattern is to put an authenticating gateway in front of certctl and configure `CERTCTL_AUTH_TYPE=none` on the upstream certctl process. The gateway terminates the federated identity protocol, validates tokens / certificates / SAML assertions, and proxies the authenticated request to certctl as a same-origin call on a private network. This separation gives operators the full breadth of the modern identity ecosystem (oauth2-proxy, Envoy `ext_authz`, Traefik `ForwardAuth`, Pomerium, Authelia, Caddy `forward_auth`, Apache `mod_auth_openidc`, nginx `auth_request`) without certctl itself having to track signing-key rotation, claim mapping, audience validation, and the rest of the JWT/OIDC surface area. Operators wanting per-request actor attribution past the gateway boundary forward the gateway-resolved identity (e.g., `X-Auth-Request-User` from oauth2-proxy) and run a small authorization layer at the gateway that enforces the bearer-key contract certctl actually uses.
|
||||
|
||||
### Concurrency Safety
|
||||
|
||||
The background scheduler uses `sync/atomic.Bool` idempotency guards on every loop (8 always-on plus up to 4 optional): if a tick fires while the previous iteration is still running, that tick is skipped. A `sync.WaitGroup` tracks all in-flight goroutines. `WaitForCompletion(timeout)` blocks during shutdown until all work finishes or the timeout expires, preventing state corruption from mid-flight database operations during process exit.
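
The guard pattern can be sketched as below. The type `loop` and its methods are hypothetical and much simpler than the real scheduler; in particular, `wait` has no timeout, unlike the `WaitForCompletion(timeout)` described above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// loop (hypothetical type) guards one scheduler loop against overlapping runs.
type loop struct {
	running atomic.Bool
	wg      sync.WaitGroup
}

func newLoop() *loop { return &loop{} }

// tryRun starts work in a goroutine unless the previous iteration is still
// in flight. It reports whether the work was started.
func (l *loop) tryRun(work func()) bool {
	if !l.running.CompareAndSwap(false, true) {
		return false // previous tick still running: skip this one
	}
	l.wg.Add(1)
	go func() {
		defer l.wg.Done()
		defer l.running.Store(false) // runs before wg.Done (LIFO defers)
		work()
	}()
	return true
}

// wait blocks until all in-flight work has finished.
func (l *loop) wait() { l.wg.Wait() }

func main() {
	l := newLoop()
	release := make(chan struct{})
	fmt.Println(l.tryRun(func() { <-release })) // starts
	fmt.Println(l.tryRun(func() {}))            // skipped, still running
	close(release)
	l.wait()
	fmt.Println(l.tryRun(func() {})) // starts again
	l.wait()
}
```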
|
||||
|
||||
### Logging
|
||||
|
||||
All logging throughout the service layer uses Go's `log/slog` package for structured, queryable logs. This replaces ad-hoc `fmt.Printf` statements with consistent key-value logging that includes request context, operation names, and error details. Agents also implement exponential backoff on network failures to gracefully handle temporary connectivity issues with the control plane.
|
||||
@@ -690,10 +926,12 @@ All endpoints are under `/api/v1/` and follow consistent patterns:
|
||||
|
||||
Resources: certificates, issuers, targets, agents, jobs, policies, profiles, teams, owners, agent-groups, audit, notifications, discovered-certificates, discovery-scans, network-scan-targets, stats, metrics.
|
||||
|
||||
The full API is documented in an OpenAPI 3.1 specification at `api/openapi.yaml` with 97 endpoints across 20 resource domains (95 under `/api/v1/` + `/.well-known/est/` plus `/health` and `/ready`; includes auth, 7 discovery endpoints from M18b, 6 network scan endpoints from M21, Prometheus metrics from M22, and 4 EST enrollment endpoints from M23), all request/response schemas, and pagination conventions. See the [OpenAPI Guide](openapi.md) for usage with Swagger UI and SDK generation.
|
||||
The full API is documented in an OpenAPI 3.1 specification at `api/openapi.yaml` with 97 operations across `/api/v1/` and `/.well-known/est/` (includes auth, 7 discovery endpoints, 6 network scan endpoints, Prometheus metrics, 4 EST enrollment endpoints, 2 digest endpoints, 2 verification endpoints, 2 export endpoints), all request/response schemas, and pagination conventions. The server also registers `/health` and `/ready` outside the OpenAPI spec, bringing the total route count to 107. See the [OpenAPI Guide](openapi.md) for usage with Swagger UI and SDK generation.
|
||||
|
||||
Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST /api/v1/jobs/{id}/approve`, `POST /api/v1/jobs/{id}/reject`.
|
||||
|
||||
**Bulk Operations:** `POST /api/v1/certificates/bulk-revoke` — Bulk revocation by filter criteria (profile_id, owner_id, agent_id, issuer_id). Creates individual revocation jobs for matching certificates, with partial-failure tolerance and a summary audit event.
|
||||
|
||||
**Enhanced Query Features (M20):** Certificate list endpoints support additional query capabilities beyond basic pagination:
|
||||
|
||||
- **Sorting**: `?sort=notAfter` (ascending) or `?sort=-createdAt` (descending). Whitelist: notAfter, expiresAt, createdAt, updatedAt, commonName, name, status, environment.
|
||||
@@ -703,7 +941,9 @@ Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST
|
||||
- **Additional filters**: `?agent_id=`, `?profile_id=` (in addition to existing status, environment, owner_id, team_id, issuer_id).
|
||||
- **Deployments**: `GET /api/v1/certificates/{id}/deployments` returns deployment targets for a certificate.
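
Parsing the `sort` parameter against the whitelist could look like the following sketch. `parseSort` and `sortable` are hypothetical names. The field list is copied from the whitelist above, and returning an error for unknown fields is an assumption about how rejection is reported.

```go
package main

import (
	"fmt"
	"strings"
)

// sortable is the whitelist of sort fields listed in the text.
var sortable = map[string]bool{
	"notAfter": true, "expiresAt": true, "createdAt": true, "updatedAt": true,
	"commonName": true, "name": true, "status": true, "environment": true,
}

// parseSort (hypothetical helper) splits a value such as "-createdAt" into
// a field name and a direction. A leading "-" means descending order.
func parseSort(param string) (field string, descending bool, err error) {
	descending = strings.HasPrefix(param, "-")
	field = strings.TrimPrefix(param, "-")
	if !sortable[field] {
		return "", false, fmt.Errorf("unsupported sort field %q", field)
	}
	return field, descending, nil
}

func main() {
	for _, p := range []string{"notAfter", "-createdAt", "password"} {
		field, desc, err := parseSort(p)
		fmt.Println(field, desc, err)
	}
}
```

Checking against a fixed whitelist before the value reaches a query builder keeps arbitrary column names out of the generated SQL.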
|
||||
|
||||
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. A JSON-formatted CRL is available at `GET /api/v1/crl`, and a DER-encoded X.509 CRL signed by the issuing CA at `GET /api/v1/crl/{issuer_id}`. An embedded OCSP responder serves signed responses at `GET /api/v1/ocsp/{issuer_id}/{serial}`. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
|
||||
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`). The embedded OCSP responder serves signed responses unauthenticated at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`). Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
|
||||
|
||||
Certificate export (M27): `GET /api/v1/certificates/{id}/export/pem` returns PEM-encoded certificate and chain, and `POST /api/v1/certificates/{id}/export/pkcs12` returns a PKCS#12 bundle (binary). Private keys are never exported — they remain on agents. All exports are audited with actor, timestamp, and format.
|
||||
|
||||
Health checks live outside the API prefix: `GET /health` and `GET /ready`.
|
||||
|
||||
@@ -716,7 +956,7 @@ flowchart LR
|
||||
AI["AI Assistant\n(Claude, Cursor)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
MCP -->|"HTTP + Bearer token"| API["certctl REST API\n:8443"]
|
||||
|
||||
subgraph "78 MCP Tools"
|
||||
subgraph "MCP Tools"
|
||||
T1["Certificate CRUD"]
|
||||
T2["Agent Management"]
|
||||
T3["Job Operations"]
|
||||
@@ -730,7 +970,7 @@ flowchart LR
|
||||
|
||||
The MCP server is a stateless HTTP proxy — every MCP tool call translates to an HTTP request to the certctl REST API. It adds no new state, no new dependencies, and no new attack surface beyond what the API already exposes. Configuration is minimal: `CERTCTL_SERVER_URL` and `CERTCTL_API_KEY` environment variables.
|
||||
|
||||
The 78 tools are organized across 16 resource domains with typed input structs and `jsonschema` struct tags for automatic LLM-friendly schema generation. Binary response support handles DER CRL and OCSP endpoints.
|
||||
The tools are organized across 16 resource domains with typed input structs and `jsonschema` struct tags for automatic LLM-friendly schema generation. Binary response support handles DER CRL and OCSP endpoints.
|
||||
|
||||
## CLI Tool
|
||||
|
||||
@@ -760,7 +1000,9 @@ flowchart TB
|
||||
**Credentials & Configuration:**
|
||||
Database and API credentials are managed via environment variables defined in a `.env` file. Copy `deploy/.env.example` to `deploy/.env` for local development and customize credentials for production. The agent key directory (`CERTCTL_KEY_DIR`) is persisted as a named Docker volume (`agent_keys`) at `/var/lib/certctl/keys` for reliable key storage across container restarts.
|
||||
|
||||
### Production (Kubernetes)
|
||||
### Production (Kubernetes with Helm)
|
||||
|
||||
A production-ready Helm chart is available under `deploy/helm/certctl/` with full support for multi-replica deployments, persistent PostgreSQL, agent DaemonSet, optional Ingress, and security best practices.
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
@@ -786,11 +1028,26 @@ flowchart TB
|
||||
DS --> DEP
|
||||
```
|
||||
|
||||
**Helm Installation:**
|
||||
|
||||
```bash
# Add the chart (if published) or install from local directory
helm install certctl deploy/helm/certctl/ \
  --set server.auth.apiKey="your-secure-key" \
  --set postgresql.auth.password="your-db-password" \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host="certctl.example.com"
```
|
||||
|
||||
The Helm chart includes: server Deployment with configurable replicas, liveness/readiness probes, security context (non-root, read-only rootfs), PostgreSQL StatefulSet with persistent volumes, optional Ingress with TLS, ServiceAccount with configurable RBAC, and agent DaemonSet running one agent per node. All certctl configuration options are exposed in `values.yaml` — issuers, targets, notifiers, scheduler intervals, discovery settings, and SMTP for digest emails.
|
||||
|
||||
See `deploy/helm/certctl/values.yaml` for the full configuration reference and `deploy/helm/certctl/Chart.yaml` for version and appVersion details.
|
||||
|
||||
For production, you would also add an ingress controller, TLS termination for the certctl API itself, and external PostgreSQL (RDS, Cloud SQL, etc.).
|
||||
|
||||
## Discovery Data Flow (M18b + M21)
|
||||
## Discovery Data Flow (M18b + M21 + M50)
|
||||
|
||||
Certificate discovery enables operators to build a complete inventory of existing certificates before managing them with certctl. There are two discovery modes that feed into the same pipeline:
|
||||
Certificate discovery enables operators to build a complete inventory of existing certificates before managing them with certctl. There are three discovery modes that feed into the same pipeline:
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
@@ -799,6 +1056,7 @@ flowchart TB
|
||||
SCAN["Filesystem Scanner\n(CERTCTL_DISCOVERY_DIRS)"]
|
||||
SERVER["certctl-server\n(network discovery)"]
|
||||
NETSCAN["TLS Scanner\n(CIDR ranges + ports)"]
|
||||
CLOUD["Cloud Discovery\n(AWS SM / Azure KV / GCP SM)"]
|
||||
end
|
||||
|
||||
EXTRACT["Extract Metadata\n(CN, SANs, serial, issuer, expiry, fingerprint)"]
|
||||
@@ -814,6 +1072,7 @@ flowchart TB
|
||||
SCAN --> EXTRACT
|
||||
SERVER -->|"Scheduler loop\n(every 6h)"| NETSCAN
|
||||
NETSCAN -->|"crypto/tls.Dial\n50 goroutines"| EXTRACT
|
||||
CLOUD -->|"Scheduler loop\n(every 6h)"| EXTRACT
|
||||
EXTRACT --> SERVICE
|
||||
SERVICE --> REPO
|
||||
REPO -->|"Dedup by fingerprint\n+ agent_id + source_path"| DB
|
||||
@@ -840,7 +1099,16 @@ flowchart TB
|
||||
5. **Sentinel agent** — Results submitted using `server-scanner` as virtual agent ID, with `source_path` set to `ip:port` and `source_format` set to `network`
|
||||
6. **Same pipeline** — Feeds into the same `DiscoveryService.ProcessDiscoveryReport()` as filesystem discovery — same dedup, same audit trail, same triage workflow
|
||||
|
||||
**Common triage workflow (both sources):**
|
||||
**Cloud Secret Manager Discovery (M50):**
|
||||
|
||||
1. **Pluggable sources** — Each cloud provider implements the `DiscoverySource` interface (Name, Type, Discover, ValidateConfig). Three built-in sources: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
|
||||
2. **CloudDiscoveryService orchestrator** — Iterates registered sources, calls `Discover()` on each, feeds reports into `ProcessDiscoveryReport()`. Errors from one source don't prevent other sources from running
|
||||
3. **Scheduler integration** — opt-in cloud discovery scheduler loop (6h default; see `docs/architecture.md` 12-loop topology), runs immediately on startup, `atomic.Bool` idempotency guard
|
||||
4. **Sentinel agents** — Each source uses its own sentinel agent ID (`cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`) for dedup and triage filtering
|
||||
5. **Source path format** — `aws-sm://{region}/{secret}`, `azure-kv://{cert-name}/{version}`, `gcp-sm://{project}/{secret}`
|
||||
6. **No new schema** — Reuses existing `discovered_certificates` and `discovery_scans` tables. Sentinel agent IDs leverage existing `(fingerprint_sha256, agent_id, source_path)` dedup constraint
|
||||
|
||||
**Common triage workflow (all sources):**
|
||||
|
||||
1. **Storage** — Records stored in `discovered_certificates` table with status = "Unmanaged"
|
||||
2. **Audit** — `discovery_scan_completed` event logged with agent ID, cert count, scan timestamp
|
||||
@@ -853,25 +1121,53 @@ flowchart TB
|
||||
|
||||
This data flow is pull-based and non-blocking. Agents discover at their own pace; the server stores results for later review. There's no pressure to claim or dismiss; operators can leave certificates in "Unmanaged" status indefinitely.
|
||||
|
||||
## Continuous TLS Health Monitoring (M48)
|
||||
|
||||
Beyond one-time discovery, certctl continuously monitors TLS endpoints for certificate health using a shared TLS probing package and a state-machine-driven health check service. Endpoints transition between states (Healthy → Degraded → Down) based on consecutive failures, and a separate `cert_mismatch` status raises an alert when a deployed certificate is unexpectedly replaced.
|
||||
|
||||
**Architecture:** Probing is extracted into a shared `internal/tlsprobe/` package used by both the network scanner (M21) and the health monitor. The `HealthCheckService` manages 8 API endpoints for CRUD operations and state transitions. A dedicated opt-in endpoint health scheduler loop runs every 60 seconds (configurable via `CERTCTL_HEALTH_CHECK_INTERVAL`). Individual health check targets have their own check intervals (default 300 seconds) — the scheduler queries only endpoints due for check via `ListDueForCheck()`. Results are stored with historical tracking for 30 days (configurable via `CERTCTL_HEALTH_CHECK_HISTORY_RETENTION`). State transitions trigger notifications (critical for down endpoints, warning for degraded, high for cert_mismatch).
|
||||
|
||||
**State Machine:** Healthy → Degraded (configurable threshold, default 2 consecutive failures) → Down (default 5 failures). The `cert_mismatch` status is special — it fires whenever the observed certificate fingerprint differs from the expected (deployed) fingerprint, catching silent rollbacks and unauthorized cert replacements. Recovery from degraded/down transitions back to healthy and resets the failure counter.
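
The failure-count transitions can be sketched as a small state machine. The type and method names are hypothetical, the thresholds are the defaults quoted above (2 and 5), and the `cert_mismatch` fingerprint comparison is left out for brevity.

```go
package main

import "fmt"

type status string

const (
	healthy  status = "healthy"
	degraded status = "degraded"
	down     status = "down"
)

// endpointHealth (hypothetical type) tracks consecutive probe failures.
type endpointHealth struct {
	failures          int
	degradedThreshold int
	downThreshold     int
}

// newEndpointHealth uses the documented defaults: degraded after 2
// consecutive failures, down after 5.
func newEndpointHealth() *endpointHealth {
	return &endpointHealth{degradedThreshold: 2, downThreshold: 5}
}

// observe records one probe result and returns the resulting status.
// Any successful probe resets the failure counter.
func (e *endpointHealth) observe(probeOK bool) status {
	if probeOK {
		e.failures = 0
		return healthy
	}
	e.failures++
	switch {
	case e.failures >= e.downThreshold:
		return down
	case e.failures >= e.degradedThreshold:
		return degraded
	default:
		return healthy
	}
}

func main() {
	e := newEndpointHealth()
	for _, ok := range []bool{false, false, false, false, false, true} {
		fmt.Println(e.observe(ok))
	}
}
```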
|
||||
|
||||
**API:** 8 endpoints for list (with filters: status, certificate_id, network_scan_target_id, enabled), get, create, update, delete, history (with limit param), acknowledge (incident marking), and summary (aggregate status counts).
|
||||
|
||||
**Auto-Create:** When a deployment job completes with successful verification (M25), the system automatically creates a health check with the deployed certificate's fingerprint as the expected value. Network scan targets can also opt-in to auto-create health checks for discovered endpoints.
|
||||
|
||||
**Configuration:**
|
||||
|
||||
| Env Var | Default | Description |
|---|---|---|
| `CERTCTL_HEALTH_CHECK_ENABLED` | `false` | Enable/disable the feature |
| `CERTCTL_HEALTH_CHECK_INTERVAL` | `60s` | Scheduler tick interval |
| `CERTCTL_HEALTH_CHECK_DEFAULT_INTERVAL` | `300s` | Default per-endpoint check interval (5 min) |
| `CERTCTL_HEALTH_CHECK_DEFAULT_TIMEOUT` | `5000ms` | TLS connection timeout per probe |
| `CERTCTL_HEALTH_CHECK_MAX_CONCURRENT` | `20` | Max concurrent TLS probes |
| `CERTCTL_HEALTH_CHECK_HISTORY_RETENTION` | `30 days` | Purge probe history older than this |
| `CERTCTL_HEALTH_CHECK_AUTO_CREATE` | `true` | Auto-create checks from deployments |
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 900+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database.
|
||||
certctl is extensively tested across eight layers with CI-enforced coverage gates that act as regression floors. The goal is high-confidence regression prevention at the service and handler layers (where the most complex business logic lives), combined with integration tests that exercise the full request path from HTTP to database.
|
||||
|
||||
**Service layer unit tests** (`internal/service/*_test.go`) — ~238 test functions across 15 files with mock repositories. These test all business logic in isolation: certificate CRUD with validation, certificate revocation (success, already-revoked, archived, invalid reason, all RFC 5280 reason codes, issuer notification, notification service integration, OCSP/CRL generation), agent lifecycle (registration, heartbeat, CSR submission with both keygen modes), job state machine (creation, processing, cancellation, retry logic), policy evaluation (all 5 rule types, violation creation), renewal and issuance flow (server-side and agent-side keygen paths), notification deduplication (threshold tag matching, channel routing), team/owner/agent group CRUD with pagination and audit recording, issuer service CRUD with connection testing, and the issuer connector adapter (type translation between connector and service layers including revocation). Mock repositories are simple structs with function fields, avoiding heavy mocking frameworks — this keeps tests readable and avoids coupling to mock library APIs.
|
||||
**Service layer unit tests** (`internal/service/*_test.go`) — Mock-based tests across all service files covering certificate CRUD, revocation (all RFC 5280 reason codes, OCSP/CRL generation, bulk revocation by filter with partial-failure tolerance), agent lifecycle, job state machine, policy evaluation, renewal/issuance flow (both keygen modes), notification deduplication, team/owner/agent group CRUD, issuer service CRUD with connection testing, and the issuer connector adapter. Mock repositories are simple structs with function fields — no heavy mocking frameworks.
|
||||
|
||||
**Handler layer tests** (`internal/api/handler/*_test.go`) — ~257 test functions across 11 files using Go's `httptest` package. Every handler file has a corresponding test file: certificates (50 tests including revocation, DER CRL, and OCSP), agents (28 tests), jobs (21 tests including approve/reject), notifications (11 tests), policies (19 tests), profiles (18 tests), issuers (17 tests), targets (17 tests), agent groups (12 tests), teams (26 tests), and owners (21 tests). Each test file follows the same pattern: a mock service struct with function fields, `httptest.NewRecorder` for capturing responses, and a shared `contextWithRequestID()` helper. Tests cover the happy path, input validation (missing fields, invalid JSON, empty IDs, name length limits), error propagation from the service layer, method-not-allowed responses, and pagination parameters.
|
||||
**Handler layer tests** (`internal/api/handler/*_test.go`) — Every handler file has a corresponding test file using Go's `httptest` package: certificates (including revocation, bulk revocation by profile/owner/agent/issuer, DER CRL, OCSP), agents, jobs (including approve/reject), notifications, policies, profiles, issuers, targets, agent groups, teams, owners, discovery, network scan, verification, export, EST, digest, stats, and metrics. Tests cover the happy path, input validation, error propagation, method-not-allowed, pagination, and bulk operation partial-failure scenarios.
|
||||
|
||||
**Integration tests** (`internal/integration/`) — Two test files exercising the full stack from HTTP request through router, handler, service, and postgres repository layers. `lifecycle_test.go` has 11 subtests covering the complete certificate lifecycle: team/owner creation, certificate creation, issuer verification, renewal trigger, job verification, agent registration, CSR submission, deployment, and status reporting. `negative_test.go` has 14 subtests covering error paths, 19 M11b endpoint tests, and 8 revocation endpoint tests (M15a+M15b): nonexistent resource lookups (404s), invalid request bodies (malformed JSON, missing required fields), invalid CSR submission, heartbeat for nonexistent agents, wrong HTTP methods on list endpoints, empty list responses, renewal on nonexistent certificates, expired certificate lifecycle, team/owner/agent group CRUD validation, revocation success, already-revoked rejection, not-found revocation, JSON CRL retrieval, DER CRL retrieval, OCSP response retrieval, and short-lived cert exemption. Both use a shared `setupTestServer()` that builds a fully-wired server with real postgres repositories and the Local CA issuer connector. A third file, `e2e_test.go`, contains 8 cross-milestone test functions with 48+ subtests that exercise features across milestones end-to-end: M10 agent metadata via heartbeat, M11 profiles/teams/owners/agent-groups CRUD, M12 issuer registry verification, M13 GUI operation endpoints, M14 stats and metrics, M15 revocation and CRL, M16 notification channels, and M20 enhanced query API (sorting, cursor pagination, sparse fields, time-range filters).
|
||||
**Integration tests** (`internal/integration/`) — Three test files exercising the full stack from HTTP request through router, handler, service, and repository layers. `lifecycle_test.go` covers the complete certificate lifecycle (team/owner creation through deployment and status reporting). `negative_test.go` covers error paths, endpoint validation, and revocation scenarios. `e2e_test.go` exercises cross-milestone features end-to-end (agent metadata, profiles, issuer registry, GUI operations, stats, revocation, notifications, enhanced query API).
|
||||
|
||||
**Frontend tests** (`web/src/api/client.test.ts`, `web/src/api/utils.test.ts`) — 86 Vitest tests covering the API client, stats/metrics endpoints, and utility functions. The API client tests mock `globalThis.fetch` and verify all endpoint functions (certificates, agents, jobs, policies, issuers, targets, notifications, audit, stats, metrics, health) send correct HTTP methods, URLs, headers, and request bodies. They also test API key management (store/retrieve/clear), auth header propagation, 401 event dispatching, and error handling (server messages, error fields, status text fallback). The stats/metrics endpoint tests verify correct query parameter handling and response shape validation. The utility tests use `vi.useFakeTimers()` for deterministic date testing and cover `formatDate`, `formatDateTime`, `timeAgo`, `daysUntil`, and `expiryColor`. The test environment uses jsdom with `@testing-library/jest-dom` matchers.
|
||||
**Go integration tests** (`deploy/test/integration_test.go`) — Runs against the live Docker Compose test environment with real CA backends (Local CA, Pebble ACME, step-ca). Covers health checks, agent heartbeat, issuance, renewal, revocation, CRL/OCSP, EST enrollment, S/MIME, discovery, network scanning, and deployment verification using `crypto/x509` for cert parsing and `crypto/tls` for live TLS verification.
|
||||
|
||||
**CLI tests** (`internal/cli/client_test.go`) — 14 tests covering all 10 CLI subcommands with httptest mock servers, PEM parsing for bulk import, auth header verification, and JSON/table output formatting.
|
||||
**Frontend tests** (`web/src/api/`) — Vitest tests covering the full API client (all endpoint functions with fetch mocking), stats/metrics endpoints, utility functions, and auth flows. Test environment uses jsdom with `@testing-library/jest-dom` matchers.
|
||||
|
||||
**CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs: Go (build, vet, test with coverage, coverage threshold enforcement) and Frontend (TypeScript type check, Vitest test suite, Vite production build). The Go job runs all tests with `-coverprofile`, then enforces coverage thresholds: service layer must be at least 30% (current: ~35%) and handler layer must be at least 50% (current: ~63%). These thresholds act as regression floors — they can only go up. The service layer threshold is deliberately lower because much of the service code depends on postgres repositories and external connectors that require real infrastructure to test meaningfully. Connector tests are included via `./internal/connector/issuer/...` and `./internal/connector/target/...` (covers Local CA, ACME, step-ca, NGINX, Apache, and HAProxy packages with unit tests for certificate signing logic, DNS solver, issuer validation, and deployment flows). The Frontend job runs `npx vitest run` between the TypeScript check and production build steps.
|
||||
**Connector tests** (`internal/connector/`) — Issuer connectors (Local CA self-signed/sub-CA modes, ACME DNS-01/DNS-PERSIST-01, step-ca, OpenSSL, Vault PKI, DigiCert, Sectigo, Google CAS, AWS ACM PCA — all with httptest mock servers or injectable interface mocks). Target connectors (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS with mock PowerShell executor, F5 BIG-IP with mock iControl client, Postfix/Dovecot, SSH with mock SSH client, Windows Certificate Store with mock PowerShell executor, Java Keystore with mock command executor, Kubernetes Secrets with mock K8s client, shared certutil package). Notifier connectors (Slack, Teams, PagerDuty, OpsGenie).
|
||||
|
||||
**Connector tests** (`internal/connector/`) — 57 test functions covering issuer, target, and notifier connectors. The Local CA connector has tests for self-signed and sub-CA modes (RSA, ECDSA, config validation, non-CA cert rejection). The ACME DNS solver has 6 tests for script-based DNS-01 challenges. The step-ca connector has tests with a mock HTTP server for issuance, renewal, revocation, and error paths. The OpenSSL/Custom CA connector has 14 tests covering config validation, issuance success/failure/timeout, renewal, revocation, and CRL generation. The NGINX target connector has 13 tests covering config validation, certificate deployment (file writing, permissions, validate/reload commands), and deployment validation. Apache httpd and HAProxy connectors each have 3 tests covering config validation, deployment, and validation flows. Notifier connector tests span 20 tests across Slack (5), Teams (4), PagerDuty (6), and OpsGenie (5) — verifying channel identity, payload formatting, HTTP error handling, connection failures, auth headers, and configuration defaults.
|
||||
**Scheduler tests** (`internal/scheduler/scheduler_test.go`) — Idempotency guards (`sync/atomic.Bool`), `WaitForCompletion` success and timeout paths, and multi-loop concurrency safety.
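
As a rough illustration of the idempotency guard mentioned above, the sketch below uses `atomic.Bool.CompareAndSwap` so that a tick is skipped while a previous run of the same loop is still in flight. The `loopGuard` type and `countOverlappingRuns` helper are hypothetical names invented for this example; they are not taken from certctl's scheduler.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// loopGuard is a hypothetical helper: it lets at most one run of a
// scheduler loop body execute at a time.
type loopGuard struct {
	running atomic.Bool
}

// tryRun executes fn only if no other run is in flight and reports
// whether fn was executed.
func (g *loopGuard) tryRun(fn func()) bool {
	if !g.running.CompareAndSwap(false, true) {
		return false // a previous tick is still running; skip this one
	}
	defer g.running.Store(false)
	fn()
	return true
}

// countOverlappingRuns holds the first tick open, fires the remaining
// ticks concurrently, and returns how many ticks executed the body.
func countOverlappingRuns(ticks int) int {
	var g loopGuard
	var executed atomic.Int32
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		g.tryRun(func() {
			executed.Add(1)
			close(started)
			<-release // keep the guard held until the other ticks have tried
		})
	}()
	<-started

	var others sync.WaitGroup
	for i := 1; i < ticks; i++ {
		others.Add(1)
		go func() {
			defer others.Done()
			g.tryRun(func() { executed.Add(1) })
		}()
	}
	others.Wait()
	close(release)
	<-done
	return int(executed.Load())
}

func main() {
	fmt.Println(countOverlappingRuns(5)) // prints 1
}
```

Once the first run returns, the guard resets, so the next tick runs normally.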
|
||||
|
||||
**What's not tested and why:** Postgres repository implementations (`internal/repository/postgres/`) require a real database and are tested only through integration tests, not unit tests. Target connectors for F5 BIG-IP and IIS are interface stubs (implementation planned for a future release). Scheduler loops are time-dependent and tested manually during development. The ACME connector requires a real ACME server (tested manually against Let's Encrypt staging). These are all candidates for future expansion as the test infrastructure matures.
|
||||
**Fuzz tests** (`internal/validation/`, `internal/domain/`) — Go native fuzz tests for command validation (`ValidateShellCommand`, `ValidateDomainName`, `ValidateACMEToken`) and revocation domain parsing.
|
||||
|
||||
**CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs. Go: build, vet, `go test -race`, `golangci-lint` (11 linters), `govulncheck`, test with coverage, per-layer coverage threshold enforcement (service 55%, handler 60%, domain 40%, middleware 30%). Frontend: TypeScript type check, Vitest, Vite production build.
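
A minimal sketch of the per-layer threshold check described above, assuming the CI step has already reduced the coverage profile to one percentage per layer. The floor values are the ones listed in the paragraph; the `belowFloor` function and the measured numbers in `main` are illustrative assumptions, not the project's actual CI script.

```go
package main

import (
	"fmt"
	"sort"
)

// floors mirrors the per-layer thresholds listed above (percent).
var floors = map[string]float64{
	"service":    55,
	"handler":    60,
	"domain":     40,
	"middleware": 30,
}

// belowFloor returns the sorted names of layers whose measured coverage
// is under its floor. A layer with no measurement counts as a failure.
func belowFloor(measured map[string]float64) []string {
	var failed []string
	for layer, floor := range floors {
		got, ok := measured[layer]
		if !ok || got < floor {
			failed = append(failed, layer)
		}
	}
	sort.Strings(failed)
	return failed
}

func main() {
	// Made-up measurements for illustration only.
	measured := map[string]float64{
		"service":    57.2,
		"handler":    58.0,
		"domain":     71.0,
		"middleware": 30.0,
	}
	fmt.Println(belowFloor(measured)) // prints [handler]
}
```

A CI job built on this idea would exit non-zero whenever the returned slice is non-empty.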
|
||||
|
||||
For detailed test procedures, smoke tests, and the release sign-off checklist, see the [Testing Guide](testing-guide.md). For setting up the Docker Compose test environment with real CA backends, see [Test Environment](test-env.md).
|
||||
|
||||
## What's Next
|
||||
|
||||
@@ -881,3 +1177,5 @@ certctl uses a layered testing approach aligned with the handler → service →
|
||||
- [Compliance Mapping](compliance.md) — SOC 2, PCI-DSS 4.0, and NIST SP 800-57 alignment
|
||||
- [MCP Server Guide](mcp.md) — AI-native access to the API
|
||||
- [OpenAPI Spec](openapi.md) — Full API reference and SDK generation
|
||||
- [Testing Guide](testing-guide.md) — Test procedures and release sign-off
|
||||
- [Test Environment](test-env.md) — Docker Compose test environment setup
|
||||
|
||||
@@ -0,0 +1,145 @@
|
||||
# certctl for cert-manager Users
|
||||
|
||||
You run cert-manager inside Kubernetes and it works well for in-cluster certificates. But you also have VMs, bare-metal servers, network appliances, and legacy systems outside the cluster. cert-manager can't reach those. This guide shows how certctl complements cert-manager to give you unified certificate visibility and automation across your entire infrastructure.
|
||||
|
||||
## Not a Replacement
|
||||
|
||||
cert-manager is the right tool for in-cluster certs. It's tightly integrated with Kubernetes:
|
||||
- Native CRDs (Certificate, ClusterIssuer, Issuer)
|
||||
- Automatic cert injection into Ingress and Service objects
|
||||
- Controller-driven renewal within the cluster
|
||||
|
||||
**certctl does not replace this.** Instead, it extends your certificate management to everything outside Kubernetes: VMs, bare metal, network appliances, Windows servers, and legacy systems.
|
||||
|
||||
## The Problem
|
||||
|
||||
Your setup:
|
||||
- **cert-manager**: handles all certs in Kubernetes (TLS for Ingress, service-to-service, internal services)
|
||||
- **Everything else**: NGINX/Apache on VMs, HAProxy load balancers on bare metal, network appliances, Windows servers with IIS. These are managed inconsistently: maybe Certbot cron jobs, maybe manual renewal, maybe stale cert files sitting around.
|
||||
|
||||
Result:
|
||||
- No unified visibility — you don't know when non-Kubernetes certs expire
|
||||
- Renewal failures go unnoticed until the cert is already expired
|
||||
- Audit trail fragmented across multiple tools
|
||||
- Scaling to hundreds of machines becomes impossible
|
||||
|
||||
## The Solution
|
||||
|
||||
Deploy the certctl control plane once (Docker Compose, Kubernetes Helm chart, or self-hosted). Deploy agents on your VMs, bare metal, and network appliances. One dashboard shows:
|
||||
- **All cert-manager certs** via discovery scanning (agents find cert-manager-issued certs copied to target machines, or scan the cluster directly)
|
||||
- **All certctl-managed certs** issued by shared issuers (ACME, step-ca, Vault PKI (planned), private CA)
|
||||
- **Unified renewal and deployment** across both worlds
|
||||
- **Single pane of glass** with expiration timeline, renewal status, deployment verification, audit trail
|
||||
|
||||
## How to Set Up
|
||||
|
||||
### 1. Install certctl Control Plane
|
||||
|
||||
**Option A: Docker Compose** (quickest for evaluation)
|
||||
```bash
cd /opt/certctl
docker compose up -d
# Dashboard & API: https://localhost:8443 (self-signed cert; pin with --cacert ./deploy/test/certs/ca.crt)
```
|
||||
|
||||
**Option B: Kubernetes** (recommended for prod)
|
||||
```bash
helm install certctl deploy/helm/certctl/ \
  --set auth.apiKey=YOUR_SECURE_KEY
```
|
||||
|
||||
### 2. Deploy Agents to Non-Kubernetes Infrastructure
|
||||
|
||||
On each VM, bare-metal server, or appliance (via proxy agent):
|
||||
```bash
# Linux amd64: download the agent binary (-f makes curl fail on HTTP errors
# instead of saving an error page as the binary)
sudo curl -fsSL https://github.com/shankar0123/certctl/releases/download/v2.1.0/certctl-agent-linux-amd64 \
  -o /usr/local/bin/certctl-agent
sudo chmod +x /usr/local/bin/certctl-agent

# Config: create the directory first so tee has somewhere to write
sudo mkdir -p /etc/certctl
sudo tee /etc/certctl/agent.env > /dev/null <<EOF
CERTCTL_SERVER_URL=https://certctl-control-plane:8443
CERTCTL_SERVER_CA_BUNDLE_PATH=/etc/certctl/tls/ca.crt
CERTCTL_API_KEY=your-api-key
CERTCTL_DISCOVERY_DIRS=/etc/nginx/certs,/etc/ssl,/etc/letsencrypt/live
CERTCTL_KEY_DIR=/var/lib/certctl/keys
EOF
# The env file holds the API key, so restrict it to root
sudo chmod 600 /etc/certctl/agent.env

# Start
sudo systemctl start certctl-agent
```
|
||||
|
||||
### 3. Enable Discovery Scanning
|
||||
|
||||
Agents scan configured directories and report back all existing certs. In the dashboard:
|
||||
- **Discovery** page: all found certs grouped by agent
|
||||
- Claim cert-manager certs to link them with Kubernetes metadata
|
||||
- Dismiss obsolete certs
|
||||
|
||||
### 4. Configure Shared Issuers
|
||||
|
||||
Configure certctl with the same issuers cert-manager already uses, so that non-Kubernetes certs come from the same CAs:
|
||||
- **ACME** (Let's Encrypt, for public certs)
|
||||
- **step-ca** (Smallstep, for internal certs)
|
||||
- **Vault PKI** (HashiCorp Vault, for enterprise PKI)
|
||||
- **Private CA** (your own internal root CA)
|
||||
|
||||
No new CA infrastructure needed. If cert-manager already uses your CA, certctl points to the same one.
|
||||
|
||||
### 5. Create Policies for Non-Kubernetes Certs
|
||||
|
||||
Go to **Policies** → **+ New Policy** to create enforcement rules:
|
||||
- **Name:** e.g., "VM Certificate Policy"
|
||||
- **Type:** `expiration_window` or `key_algorithm` (enforce renewal thresholds or crypto requirements)
|
||||
- **Severity:** `high`
|
||||
- **Config:** set your enforcement parameters
|
||||
|
||||
Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other compliance rules across your fleet.
|
||||
|
||||
### 6. View Unified Inventory
|
||||
|
||||
**Dashboard** shows:
|
||||
- Certificate status heatmap (all 1000 certs: cert-manager + certctl)
|
||||
- Renewal job trends (both types)
|
||||
- Expiration timeline (30/60/90 days)
|
||||
- Agent fleet status (all infrastructure)
|
||||
|
||||
**Certificates** page filters by issuer (show me all ACME certs, or all step-ca certs):
|
||||
- cert-manager certs discovered from Kubernetes nodes
|
||||
- certctl-managed certs on VMs
|
||||
- Network appliance certs auto-discovered
|
||||
|
||||
## Shared Infrastructure
|
||||
|
||||
If cert-manager and certctl both use the same CA:
|
||||
- **ACME**: cert-manager uses ClusterIssuer + certctl uses ACME connector → same Let's Encrypt account, transparent coexistence
|
||||
- **step-ca**: cert-manager uses external issuer CRD + certctl uses step-ca connector → same provisioner, shared certificate inventory
|
||||
- **Vault PKI**: cert-manager uses external issuer CRD + certctl uses Vault connector → same mount, same audit trail
|
||||
|
||||
No conflict. They just issue certs through the same CA. certctl's discovery scanning finds cert-manager-issued certs and shows them alongside certctl-managed ones.
|
||||
|
||||
## Key Differences from cert-manager
|
||||
|
||||
| Feature | cert-manager | certctl |
|---------|--------------|---------|
| Target | In-cluster (Kubernetes) | Out-of-cluster (VMs, bare metal, appliances) |
| Configuration | CRDs (Certificate, ClusterIssuer, Issuer) | API + Dashboard (JSON REST) |
| Deployment | Injected into Secret objects, mounted by pods | Agent pulls work, deploys via target-specific API (file, service restart, proxy agent) |
| Renewal | Controller watches Certificate CRDs, triggers renewal when needed | Scheduler checks thresholds, agents poll for work |
| Audit | Kubernetes event log | Immutable append-only audit trail |
| Visibility | Per-namespace, per-resource | Fleet-wide, unified inventory |
|
||||
|
||||
## Future Integration
|
||||
|
||||
On the roadmap (V4): **cert-manager external issuer** — certctl acts as a ClusterIssuer backend for Kubernetes. This would allow cert-manager to request certificates from certctl, which could issue them via any of its connectors (step-ca, Vault, private CA, etc.). Pure integration play; no breaking changes.
|
||||
|
||||
For now: cert-manager handles Kubernetes, certctl handles everything else. They coexist seamlessly.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Run through the [Quick Start](./quickstart.md) for a 5-minute demo
|
||||
2. Try the [Multi-Issuer example](../examples/multi-issuer/multi-issuer.md) — manages public and internal certs from one dashboard
|
||||
3. Explore [Architecture](./architecture.md#agents) for deployment patterns
|
||||
4. Check the [Helm Chart](../deploy/helm/certctl/) for production Kubernetes deployment
|
||||
@@ -2,6 +2,24 @@
|
||||
|
||||
NIST SP 800-57 Part 1 Rev 5 (May 2020) is the authoritative US government guidance on cryptographic key management. This document maps certctl's implementation to its recommendations. certctl follows NIST guidance where applicable; this guide documents the alignment and identifies gaps for future roadmap planning.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Key Generation (Section 6.1)](#key-generation-section-61)
|
||||
2. [Key Storage and Protection (Sections 6.3, 6.4)](#key-storage-and-protection-sections-63-64)
|
||||
3. [Cryptoperiods (Section 5.3, Table 1)](#cryptoperiods-section-53-table-1)
|
||||
4. [Key States and Transitions (Section 5.2)](#key-states-and-transitions-section-52)
|
||||
5. [Algorithm Recommendations (Section 5.1, SP 800-131A)](#algorithm-recommendations-section-51-sp-800-131a)
|
||||
6. [Key Distribution and Transport (Section 6.2)](#key-distribution-and-transport-section-62)
|
||||
7. [Revocation and Compromise (NIST SP 800-57 Part 3)](#revocation-and-compromise-nist-sp-800-57-part-3)
|
||||
8. [Alignment Summary Table](#alignment-summary-table)
|
||||
9. [Gaps and Remediation Roadmap](#gaps-and-remediation-roadmap)
|
||||
- [V2 (Current)](#v2-current)
|
||||
- [V3 (Planned: 2026)](#v3-planned-2026)
|
||||
- [V5 (Planned: 2027+)](#v5-planned-2027)
|
||||
- [Post-Quantum (2027+)](#post-quantum-2027)
|
||||
10. [References](#references)
|
||||
11. [Questions or Corrections?](#questions-or-corrections)
|
||||
|
||||
## Key Generation (Section 6.1)
|
||||
|
||||
certctl generates certificate keys on agent infrastructure using Go's `crypto/rand` for entropy, which reads from the operating system's CSPRNG (`getrandom(2)` or `/dev/urandom` on Linux, the system random number generator API on Windows). Key generation happens as follows:
|
||||
@@ -54,7 +72,7 @@ certctl implements tiered key storage with different protection profiles based o
|
||||
- Configured via: `CERTCTL_CA_CERT_PATH=/path/to/ca.crt` and `CERTCTL_CA_KEY_PATH=/path/to/ca.key`
|
||||
|
||||
**NIST Gap: HSM Storage**
|
||||
NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for V5 roadmap, enabling integration with:
|
||||
NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for certctl Pro (V3), enabling integration with:
|
||||
- AWS CloudHSM
|
||||
- Azure Dedicated HSM
|
||||
- Thales Luna, Gemalto SafeNet, YubiHSM (on-premises)
|
||||
@@ -192,15 +210,17 @@ NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize
|
||||
- Proxy agent executes deployment via appliance API
|
||||
|
||||
**Revocation Distribution**
|
||||
- Certificate Revocation List (CRL) via `GET /api/v1/crl/{issuer_id}`
|
||||
- Returns DER-encoded X.509 CRL signed by issuing CA
|
||||
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
|
||||
- Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
|
||||
- 24-hour validity period
|
||||
- Includes all revoked serials, reasons, and revocation timestamps
|
||||
- Served unauthenticated so relying parties without certctl API credentials can fetch it
|
||||
- Subject to URL caching; OCSP preferred for real-time revocation
|
||||
- OCSP via `GET /api/v1/ocsp/{issuer_id}/{serial}`
|
||||
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure)
|
||||
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
|
||||
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
|
||||
- Signed by issuing CA (or delegated OCSP signing cert)
|
||||
- Responds with good/revoked/unknown status
|
||||
- Served unauthenticated — the RFC 6960 relying-party model does not assume API credentials
|
||||
- Real-time, more bandwidth-efficient than CRL polling
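
The relying-party side of the CRL endpoint can be sketched as follows: fetch the DER body without credentials, parse it with `x509.ParseRevocationList`, verify the signature against the issuing CA, and read the revoked serials. This is an illustration under stated assumptions, not certctl code. It builds a throwaway CA and CRL and serves them from a local `httptest` server, the issuer ID `demo-issuer` is made up, and `RevokedCertificateEntries` requires Go 1.21 or later.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"io"
	"math/big"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchRevokedSerials downloads a DER CRL, checks that it was signed by ca,
// and returns the serial numbers it lists as revoked.
func fetchRevokedSerials(url string, ca *x509.Certificate) ([]*big.Int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	der, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	crl, err := x509.ParseRevocationList(der)
	if err != nil {
		return nil, err
	}
	if err := crl.CheckSignatureFrom(ca); err != nil {
		return nil, err
	}
	var serials []*big.Int
	for _, entry := range crl.RevokedCertificateEntries {
		serials = append(serials, entry.SerialNumber)
	}
	return serials, nil
}

// demo builds a throwaway CA and CRL, serves the CRL from a local test
// server, and fetches it back. Nothing here talks to a real certctl server.
func demo() ([]*big.Int, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo CA"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		SubjectKeyId:          []byte{1, 2, 3, 4},
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	ca, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, err
	}
	crlDER, err := x509.CreateRevocationList(rand.Reader, &x509.RevocationList{
		Number:     big.NewInt(1),
		ThisUpdate: now,
		NextUpdate: now.Add(24 * time.Hour), // matches the 24-hour validity above
		RevokedCertificateEntries: []x509.RevocationListEntry{
			{SerialNumber: big.NewInt(42), RevocationTime: now, ReasonCode: 1}, // 1 = keyCompromise
		},
	}, ca, key)
	if err != nil {
		return nil, err
	}
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/pkix-crl")
		w.Write(crlDER)
	}))
	defer srv.Close()
	return fetchRevokedSerials(srv.URL+"/.well-known/pki/crl/demo-issuer", ca)
}

func main() {
	serials, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("revoked serials:", serials)
}
```

Against a real deployment, only `fetchRevokedSerials` would be needed, called with the control plane's base URL and the issuing CA certificate.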
|
||||
|
||||
## Revocation and Compromise (NIST SP 800-57 Part 3)
|
||||
@@ -254,20 +274,23 @@ NIST SP 800-57 Part 3 covers revocation (Section 2.5) when keys are suspected co
|
||||
- OCSP responder queries revocation table in real-time
|
||||
- Short-lived certificate exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
**Bulk Revocation for Large-Scale Compromise Response** (V2.2) — NIST SP 800-57 Part 3 emphasizes rapid revocation when keys are compromised. `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria (profile, owner, agent, issuer) in a single operation. This enables operators to execute fleet-wide revocation for key compromise events affecting multiple certificates. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring every certificate is recorded in the audit trail with the incident reason.
|
||||
|
||||
**Revocation Audit Trail**
|
||||
All revocation events logged:
|
||||
- Event type: `certificate_revoked`
|
||||
- Event type: `certificate_revoked` or `bulk_revocation_initiated` (for fleet operations)
|
||||
- Actor: authenticated user or service
|
||||
- Reason code: RFC 5280 enum
|
||||
- Reason code: RFC 5280 enum (or incident justification for bulk operations)
|
||||
- Timestamp: RFC3339
|
||||
- Issuer notification status: success or error reason
|
||||
- Filter criteria: profile_id, owner_id, agent_id, issuer_id (for bulk revocation)
|
||||
|
||||
## Alignment Summary Table
|
||||
|
||||
| NIST SP 800-57 Area | Status | Coverage | Notes |
|
||||
|---|---|---|---|
|
||||
| **Key Generation** | ✅ Aligned | 100% | Agent-side ECDSA P-256 using crypto/rand; server mode flagged as demo-only |
|
||||
| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned V5 |
|
||||
| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned V3 Pro |
|
||||
| **Cryptoperiods** | ✅ Aligned | 100% | Profile-enforced max_ttl; threshold-based renewal alerting |
|
||||
| **Key States** | ✅ Aligned | 100% | Full lifecycle tracking with immutable audit trail |
|
||||
| **Algorithms** | ✅ Aligned | 100% | NIST-approved algorithms only; post-quantum tracking in progress |
|
||||
@@ -283,13 +306,14 @@ All revocation events logged:
|
||||
- [x] RFC 5280 revocation support
|
||||
- [x] Immutable audit trail
|
||||
|
||||
### V2.2 (Planned: 2026)
|
||||
- Bulk revocation by profile/owner/agent/issuer (fleet-level revocation for incident response)
|
||||
|
||||
### V3 (Planned: 2026)
|
||||
- Role-based access control (limit revocation/approval to authorized operators)
|
||||
- Bulk revocation by profile/owner/agent (fleet-level revocation policy)
|
||||
|
||||
### V5 (Planned: 2027+)
|
||||
- HSM support for CA key storage
|
||||
- PKCS#11 integration for hardware tokens
|
||||
### V3 Pro (Planned)
|
||||
- HSM support for CA key storage and agent key storage (TPM 2.0, PKCS#11)
|
||||
- FIPS 140-2/3 validated crypto module (BoringCrypto build or external FIPS library)
|
||||
- Key destruction API (explicit secure erasure of agent keys)
|
||||
- Key escrow / recovery mechanism (backup encrypted private keys for disaster recovery)
|
||||
|
||||
@@ -4,6 +4,34 @@ This guide maps certctl's existing capabilities to PCI-DSS 4.0 requirements rele
|
||||
|
||||
Organizations subject to PCI-DSS typically need to demonstrate control over certificate issuance, renewal, rotation, revocation, and key management. Certctl automates the technical controls for certificate lifecycle; compliance depends on how you deploy, monitor, and audit it.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [Requirement 4: Protect Data in Transit](#requirement-4-protect-data-in-transit)
|
||||
- [4.2.1 — Strong Cryptography for Transmission](#421--strong-cryptography-for-transmission)
|
||||
- [4.2.2 — Certificate Inventory and Validation](#422--certificate-inventory-and-validation)
|
||||
3. [Requirement 3: Protect Stored Cardholder Data (Key Management)](#requirement-3-protect-stored-cardholder-data-key-management)
|
||||
- [3.6 — Cryptographic Key Documentation](#36--cryptographic-key-documentation)
|
||||
- [3.7 — Key Lifecycle Procedures](#37--key-lifecycle-procedures)
|
||||
4. [Requirement 8: Identify and Authenticate](#requirement-8-identify-and-authenticate)
|
||||
- [8.3 — Strong Authentication](#83--strong-authentication)
|
||||
- [8.6 — Application Account Management](#86--application-account-management)
|
||||
5. [Requirement 10: Log and Monitor](#requirement-10-log-and-monitor)
|
||||
- [10.2 — Implement Automated Audit Logging](#102--implement-automated-audit-logging)
|
||||
- [10.3 — Protect Audit Trail](#103--protect-audit-trail)
|
||||
- [10.4 — Promptly Review and Address Audit Trail Exceptions](#104--promptly-review-and-address-audit-trail-exceptions)
|
||||
- [10.7 — Retain and Protect Audit Trail History](#107--retain-and-protect-audit-trail-history)
|
||||
6. [Requirement 6: Develop and Maintain Secure Systems and Applications](#requirement-6-develop-and-maintain-secure-systems-and-applications)
|
||||
- [6.3.1 — Security Coding Practices](#631--security-coding-practices)
|
||||
- [6.5.10 — Broken Authentication and Cryptography Prevention](#6510--broken-authentication-and-cryptography-prevention)
|
||||
7. [Requirement 7: Restrict Access by Business Need-to-Know](#requirement-7-restrict-access-by-business-need-to-know)
|
||||
- [7.2 — Implement Access Control](#72--implement-access-control)
|
||||
8. [Evidence Summary Table](#evidence-summary-table)
|
||||
9. [Operator Responsibilities](#operator-responsibilities)
|
||||
10. [V3 Enhancements for PCI-DSS](#v3-enhancements-for-pci-dss)
|
||||
11. [Next Steps for Compliance](#next-steps-for-compliance)
|
||||
12. [Questions?](#questions)
|
||||
|
||||
## How to Use This Guide
|
||||
|
||||
Your QSA will request evidence that your certificate and key management systems meet specific PCI-DSS 4.0 requirements. For each applicable requirement, this guide identifies:
|
||||
@@ -64,9 +92,11 @@ Your QSA will request evidence that your certificate and key management systems
|
||||
|
||||
- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.
|
||||
|
||||
- **Revocation Infrastructure** (M15a, M15b):
|
||||
- CRL endpoint: `GET /api/v1/crl` (JSON format) or `GET /api/v1/crl/{issuer_id}` (DER X.509 CRL, 24h validity, signed by issuing CA)
|
||||
- OCSP responder: `GET /api/v1/ocsp/{issuer_id}/{serial}` (returns DER-encoded OCSP response: good/revoked/unknown)
|
||||
- **Revocation Infrastructure** (M15a, M15b, M-006):
|
||||
- Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
|
||||
- CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` — DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
|
||||
- OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
|
||||
- Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
|
||||
- Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
|
||||
|
||||
- **Stats API** (M14) — Real-time visibility:
|
||||
@@ -79,7 +109,7 @@ Your QSA will request evidence that your certificate and key management systems
|
||||
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
|
||||
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
|
||||
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
|
||||
- CRL/OCSP availability proof: HTTP GET requests to `/api/v1/crl` and `/api/v1/ocsp/{issuer}/{serial}` with signed responses.
|
||||
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
|
||||
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
|
||||
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.
|
||||
|
||||
@@ -298,11 +328,14 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
- Issuer notified (best-effort; ACME lacks standard revocation, Local CA skips issuer step).
|
||||
- Revocation notifications sent to owner via email/webhook/Slack/Teams/PagerDuty.
|
||||
|
||||
- **CRL and OCSP Publication** (M15b) — Revoked certificates published in:
|
||||
- CRL: `GET /api/v1/crl` (JSON format) or `GET /api/v1/crl/{issuer_id}` (DER X.509, signed by CA, 24h validity)
|
||||
- OCSP: `GET /api/v1/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain)
|
||||
- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates published in:
|
||||
- CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
|
||||
- OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
|
||||
- Both endpoints are served unauthenticated so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation — this is the RFC-compliant PKI model.
|
||||
- Clients checking certificate status via OCSP or CRL see revoked status within 24 hours.
|
||||
|
||||
- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. PCI-DSS Req 4 requires rapid response to data transmission security incidents — bulk revocation enables operators to revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.
|
||||
|
||||
- **Private Key Destruction on Agent** (when a certificate is renewed or revoked):
|
||||
- Agent removes the old private key file from `CERTCTL_KEY_DIR` when the new certificate is deployed.
|
||||
- Job status tracking confirms old key is no longer needed.
|
||||
@@ -310,8 +343,8 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
|
||||
- CRL publication: HTTP GET `/api/v1/crl` and parse JSON to show revoked serial numbers and timestamps.
|
||||
- OCSP responder validation: Query `GET /api/v1/ocsp/{issuer}/{serial}` for a known-revoked cert; response includes `revoked` status.
|
||||
- CRL publication: HTTP GET `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL — parse with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
|
||||
- OCSP responder validation: Query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert; response includes `revoked` status and can be parsed with `openssl ocsp` tooling.
|
||||
- Audit trail: Certificate status transitions (Active → Revoked) recorded in `audit_events`.
|
||||
|
||||
**Operator Responsibility**:
|
||||
@@ -354,18 +387,18 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
- API key transmitted in Authorization header (not URL parameter, not cookie).
|
||||
- Browser to server: TLS.
|
||||
- Agent to server: TLS.
|
||||
- No credential logging (API key hash only, never plaintext).
|
||||
- No credential logging (audit records the per-key actor `Name`, never the Bearer token; logs redact the `Authorization` header).
|
||||
|
||||
**Evidence You Can Provide**:
|
||||
- API configuration: `CERTCTL_AUTH_TYPE=api-key` in deployment manifest.
|
||||
- Database schema: `api_keys` table showing SHA-256 hash column, not plaintext.
|
||||
- API audit log: `GET /api/v1/audit?action=api_call` showing Bearer token validation (no plaintext keys logged).
|
||||
- Key inventory: `CERTCTL_API_KEYS_NAMED` env var (format `name:key:admin,...`) — seeds the in-memory `NamedAPIKey{Name, Key, Admin}` struct at `internal/api/middleware/middleware.go:29`. Keys are constant-time-compared (`subtle.ConstantTimeCompare`) against the Bearer token. No database table stores them; protect the env var contents at rest via a secrets manager (Vault / AWS Secrets Manager / Kubernetes Secrets / Docker Secrets).
|
||||
- API audit log: `GET /api/v1/audit?action=api_call` showing per-key actor names (`Name` field of matched `NamedAPIKey`) on every call, with zero plaintext or hashed key material recorded.
|
||||
- TLS certificate on control plane: `openssl s_client -connect {server}:8443` showing valid certificate, TLS 1.2+, strong cipher.
|
||||
- GUI login flow: browser network tab showing Authorization header (token value redacted in compliance report).
|
||||
|
||||
**Operator Responsibility**:
|
||||
- **Issue API keys to users/systems** requiring API access (outside certctl; you maintain the key registry).
|
||||
- **Rotate API keys periodically** (recommendation: annually, or when personnel changes).
|
||||
- **Rotate API keys using zero-downtime rotation** — `CERTCTL_AUTH_SECRET` supports comma-separated keys (e.g., `new-key,old-key`). Add the new key, migrate clients, then remove the old key. Recommendation: rotate at least annually, or immediately when personnel changes.
|
||||
- **Revoke API keys immediately** when a user leaves or a token is compromised (set `enabled=false` in API key management; this is not yet implemented in v1, so the owner must track revocations manually).
|
||||
- **Enforce strong TLS** on control plane: TLS 1.2+, modern ciphers (configure on reverse proxy or `CERTCTL_TLS_*` env vars if operator-controlled).
|
||||
- **Protect `.env` and credential files** where the API key is defined (restrict file system access and keep them out of version control).
|
||||
@@ -424,7 +457,7 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
- **Immutable API Audit Log** (M19) — Middleware captures every API call:
|
||||
- `audit_events` table (append-only, no UPDATE/DELETE):
|
||||
- `method`: HTTP method (GET, POST, PUT, DELETE)
|
||||
- `path`: API endpoint path (e.g., `/api/v1/certificates`)
|
||||
- `path`: API endpoint path only, excluding query parameters (e.g., `/api/v1/certificates` — query strings intentionally omitted to prevent sensitive data persistence in the append-only audit trail)
|
||||
- `actor`: authenticated user/service (extracted from API key or context)
|
||||
- `body_hash`: SHA-256 hash of request body (truncated to 16 chars, first 8 chars shown in logs)
|
||||
- `status_code`: HTTP response status (200, 201, 400, 401, 404, 500, etc.)
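As a rough illustration of the `path` and `body_hash` fields listed above, the following Go sketch drops the query string from a request URI and computes a truncated SHA-256 digest of the body. The helper names `auditPath` and `truncatedBodyHash` are hypothetical, and it is an assumption here that truncation applies to the hex-encoded digest; the actual middleware may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// auditPath returns only the path component of a request URI, dropping any
// query string so that it is never persisted in the audit trail.
func auditPath(requestURI string) (string, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", err
	}
	return u.Path, nil
}

// truncatedBodyHash returns the first 16 hex characters of the SHA-256
// digest of body (assumption: truncation is applied to the hex form).
func truncatedBodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	p, err := auditPath("/api/v1/certificates?cursor=abc123")
	if err != nil {
		panic(err)
	}
	fmt.Println("path:", p)
	fmt.Println("body_hash:", truncatedBodyHash([]byte(`{"common_name":"example.com"}`)))
}
```

Storing a hash rather than the body lets an auditor confirm that two requests carried the same payload without the payload itself ever landing in the append-only table.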
|
||||
@@ -529,6 +562,7 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
- **Alert Notifications** (M3, M16a) — Configurable escalation:
|
||||
- Email alerts: certificate approaching expiration, renewal failure, revocation notification.
|
||||
- Webhook: custom HTTP POST to your monitoring system (Slack, Teams, PagerDuty, OpsGenie, custom webhook).
|
||||
- **Retry & Dead-Letter Queue** (I-005) — Transient notifier failures (SMTP timeout, webhook 5xx) are retried with exponential backoff (`2^n` minutes capped at 1h, 5-attempt budget) before landing in the terminal `dead` status. Operators monitor DLQ depth via the `certctl_notification_dead_total` Prometheus counter and requeue via the Notifications page Dead letter tab once the underlying outage is resolved. Closes the pre-I-005 silent-drop gap where a single 5xx could lose a compliance-relevant alert without evidence.
|
||||
- Deduplication: one alert per threshold/certificate per day (avoid alert fatigue).
|
||||
|
||||
- **Audit Trail Filtering and Export** (M13) — Compliance reporting:
|
||||
@@ -689,12 +723,12 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
|
||||
| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
|
||||
|---|---|---|---|---|---|
|
||||
| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
|
||||
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /api/v1/crl`, `GET /api/v1/ocsp/{issuer}/{serial}` | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
|
||||
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (both unauthenticated, RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
|
||||
| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
|
||||
| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job_type=AwaitingCSR | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
|
||||
| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, .env excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
|
||||
| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
|
||||
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, `GET /api/v1/crl`, OCSP endpoint | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
|
||||
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
|
||||
| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | API key hash in database | `GET /api/v1/audit` showing API calls | Available |
|
||||
| **8.6** Acct Management | Credentials out of source, .env excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
|
||||
| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
|
||||
|
||||
@@ -14,6 +14,28 @@ Each section includes:
|
||||
- **V2 vs V3 status** — whether the feature is in the free community edition (V2) or the paid Pro edition (V3)
|
||||
- **Operator responsibility** — aspects your organization must handle outside of certctl
|
||||
|
||||
## Contents
|
||||
|
||||
1. [How to Use This Guide](#how-to-use-this-guide)
|
||||
2. [CC6: Logical and Physical Access Controls](#cc6-logical-and-physical-access-controls)
|
||||
- [CC6.1 — Logical Access Security](#cc61--logical-access-security)
|
||||
- [CC6.2 — Prior to Issuing System Credentials](#cc62--prior-to-issuing-system-credentials)
|
||||
- [CC6.3 — Authentication Policies](#cc63--authentication-policies)
|
||||
- [CC6.7 — Information Transmission Protection](#cc67--information-transmission-protection)
|
||||
3. [CC7: System Operations](#cc7-system-operations)
|
||||
- [CC7.1 — System Monitoring](#cc71--system-monitoring)
|
||||
- [CC7.2 — Anomaly Detection](#cc72--anomaly-detection)
|
||||
- [CC7.3 — Incident Response](#cc73--incident-response)
|
||||
- [CC7.4 — Identify and Develop Risk Mitigation Activities](#cc74--identify-and-develop-risk-mitigation-activities)
|
||||
4. [A1: Availability](#a1-availability)
|
||||
- [A1.1/A1.2 — Availability and Recovery](#a11a12--availability-and-recovery)
|
||||
5. [CC8: Change Management](#cc8-change-management)
|
||||
- [CC8.1 — Change Control](#cc81--change-control)
|
||||
6. [Evidence Summary Table](#evidence-summary-table)
|
||||
7. [What Requires Operator Action](#what-requires-operator-action)
|
||||
8. [V3 Enhancements](#v3-enhancements)
|
||||
9. [Conclusion](#conclusion)
|
||||
|
||||
## CC6: Logical and Physical Access Controls
|
||||
|
||||
### CC6.1 — Logical Access Security
|
||||
@@ -22,11 +44,13 @@ Each section includes:
|
||||
|
||||
**certctl Implementation** (V2 — Community Edition):
|
||||
|
||||
- **API Key Authentication** — All API calls require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Environment: `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in with log warning)
|
||||
- **API Key Authentication** — All `/api/v1/*` calls require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Environment: `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in with log warning)
|
||||
- **Standards-based enrollment and PKI distribution endpoints** — EST (`/.well-known/est/*`, RFC 7030), SCEP (`/scep`, `/scep/*`, RFC 8894), and CRL/OCSP (`/.well-known/pki/crl/{issuer_id}`, `/.well-known/pki/ocsp/{issuer_id}/{serial}`, RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer because these protocols cannot present certctl Bearer tokens. Authentication is enforced in-protocol: EST relies on CSR signature verification plus profile policy (RFC 7030 §3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous); SCEP requires a shared `challengePassword` in the PKCS#10 CSR attributes (OID 1.2.840.113549.1.9.7, RFC 8894 §3.2), validated with `crypto/subtle.ConstantTimeCompare`; CRL and OCSP are intentionally anonymous for relying-party accessibility. CWE-306 (missing authentication for a critical function) is closed for SCEP by `preflightSCEPChallengePassword` in `cmd/server/main.go`, which refuses to start the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`. The HTTP dispatch is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware) and is pinned by the 27-subtest regression harness at `cmd/server/finalhandler_test.go`.
|
||||
- **GUI Authentication** — Web dashboard includes login screen requiring API key entry. Failed auth redirects to login on 401. Auth context persists across page navigation. Logout clears session.
|
||||
- **Configurable CORS** — API restricts cross-origin requests via `CERTCTL_CORS_ORIGINS` allowlist or wildcard. Preflight caching prevents chatty browser auth flows.
|
||||
- **Token Bucket Rate Limiting** — Per-IP rate limiting (configurable via `CERTCTL_RATE_LIMIT_RPS` / `CERTCTL_RATE_LIMIT_BURST`) returns 429 Too Many Requests with Retry-After header. Prevents credential stuffing and brute-force attacks.
|
||||
- **No Password Storage** — certctl does not store user passwords. API keys are the sole authentication mechanism. Your API key generation, distribution, and rotation policies are your responsibility (see "Operator Responsibility" below).
|
||||
- **Zero-Downtime Key Rotation** — `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `new-key,old-key`). All listed keys are validated with constant-time comparison. Operators can add a new key, migrate clients, then remove the old key — no service restart required for the client migration phase. A single-key warning is logged at startup to encourage rotation configuration.
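The comma-separated key rotation described above can be sketched in Go as follows. This is a minimal illustration, not the actual middleware: `parseKeys` and `tokenAllowed` are hypothetical names, and trimming whitespace around each key is an assumption.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// parseKeys splits a comma-separated secret list such as "new-key,old-key",
// skipping empty entries (assumption: surrounding whitespace is ignored).
func parseKeys(secret string) []string {
	var keys []string
	for _, k := range strings.Split(secret, ",") {
		if k = strings.TrimSpace(k); k != "" {
			keys = append(keys, k)
		}
	}
	return keys
}

// tokenAllowed checks the presented token against every configured key.
// Each comparison is constant-time for equal-length inputs, and the loop
// does not exit early on a match. Note that ConstantTimeCompare returns
// immediately when the lengths differ.
func tokenAllowed(token string, keys []string) bool {
	match := 0
	for _, k := range keys {
		match |= subtle.ConstantTimeCompare([]byte(token), []byte(k))
	}
	return match == 1
}

func main() {
	keys := parseKeys("new-key,old-key")
	fmt.Println(tokenAllowed("old-key", keys)) // old clients keep working during migration
	fmt.Println(tokenAllowed("new-key", keys)) // migrated clients use the new key
	fmt.Println(tokenAllowed("stale", keys))   // anything else is rejected
}
```

Once all clients present the new key, dropping `old-key` from the list completes the rotation.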
|
||||
|
||||
**Evidence Locations**:
|
||||
|
||||
@@ -35,6 +59,11 @@ Each section includes:
|
||||
- Auth info endpoint: `GET /api/v1/auth/info` (returns current auth mode, served without auth so GUI detects mode)
|
||||
- Rate limiting middleware: `internal/api/middleware/rate_limit.go`
|
||||
- CORS configuration: `cmd/server/main.go`, search for `CERTCTL_CORS_ORIGINS`
|
||||
- Final handler dispatch (authenticated vs. unauthenticated routing): `cmd/server/main.go:buildFinalHandler`
|
||||
- SCEP preflight gate (CWE-306 closure): `cmd/server/main.go:preflightSCEPChallengePassword`
|
||||
- SCEP service-layer defense-in-depth (rejects enrollment on empty challenge password, `crypto/subtle.ConstantTimeCompare`): `internal/service/scep.go`
|
||||
- Final handler dispatch regression harness (27 subtests): `cmd/server/finalhandler_test.go`
|
||||
- OpenAPI spec `security: []` overrides on unauthenticated paths: `api/openapi.yaml` (EST `/cacerts`, `/simpleenroll`, `/simplereenroll`, `/csrattrs`; SCEP `/scep` GET+POST; PKI `/crl/{issuer_id}`, `/ocsp/{issuer_id}/{serial}`)
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
@@ -87,7 +116,7 @@ Each section includes:
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **API Key Policy** — All API access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". Configuration choice is logged at startup.
|
||||
- **API Key Policy** — All `/api/v1/*` access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". Configuration choice is logged at startup. The standards-based enrollment and PKI distribution endpoints (EST, SCEP, CRL, OCSP) are served unauthenticated at the HTTP layer per their respective RFCs; see CC6.1 for the full authentication contract and CWE-306 closure via `preflightSCEPChallengePassword`.
|
||||
- **Agent Authentication** — Agents authenticate to the server via API keys (same mechanism as users). Agent credentials are separate from user API keys.
|
||||
- **Private Key Policy** — Agent-side key generation is the default (`CERTCTL_KEYGEN_MODE=agent`). Server-side keygen (`CERTCTL_KEYGEN_MODE=server`) requires explicit configuration and logs a warning: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only".
|
||||
- **Password Policy** — Not applicable; certctl uses API keys exclusively. Password management is delegated to your organization's IAM system if you integrate OIDC/SSO (V3).
|
||||
@@ -160,14 +189,20 @@ Each section includes:
|
||||
|
||||
- **Health Endpoint** — `GET /health` returns 200 OK with service status. Consumed by Docker health checks and Kubernetes probes.
|
||||
- **Readiness Endpoint** — `GET /ready` returns 200 OK when the database is connected and migrations are applied.
|
||||
- **Background Scheduler Monitoring** — 6 background loops run on a fixed schedule:
|
||||
- Renewal loop: every 1 hour, scans for certificates approaching renewal threshold
|
||||
- Job processor loop: every 30 seconds, picks up pending/waiting jobs and advances their state
|
||||
- Health check loop: every 2 minutes, pings agents to detect downtime
|
||||
- Notification dispatcher loop: every 1 minute, sends queued alerts
|
||||
- Short-lived cert expiry loop: every 30 seconds, marks expired short-lived credentials
|
||||
- Network scanner loop: every 6 hours, scans enabled TLS endpoints for certificate discovery
|
||||
Each loop includes error handling and logs failures via structured slog.
|
||||
- **Background Scheduler Monitoring** — 12 background loops (8 always-on + 4 opt-in) run on a fixed schedule. Authoritative topology in `docs/architecture.md`:
|
||||
- Renewal loop (always-on, 1 hour): scans for certificates approaching renewal threshold
|
||||
- Job processor loop (always-on, 30 seconds): picks up pending/waiting jobs and advances their state
|
||||
- Job retry loop (always-on, 5 minutes, `CERTCTL_SCHEDULER_RETRY_INTERVAL`): retries Failed jobs (I-001)
|
||||
- Job timeout reaper loop (always-on, 10 minutes, `CERTCTL_JOB_TIMEOUT_INTERVAL`): fails AwaitingCSR/AwaitingApproval jobs past timeout (I-003)
|
||||
- Agent health check loop (always-on, 2 minutes): pings agents to detect downtime
|
||||
- Notification dispatcher loop (always-on, 1 minute): sends queued alerts
|
||||
- Notification retry loop (always-on, 2 minutes, `CERTCTL_NOTIFICATION_RETRY_INTERVAL`): exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005)
|
||||
- Short-lived cert expiry loop (always-on, 30 seconds): marks expired short-lived credentials
|
||||
- Network scanner loop (opt-in, 6 hours, `CERTCTL_NETWORK_SCAN_ENABLED`): scans enabled TLS endpoints for certificate discovery
|
||||
- Digest emailer loop (opt-in, 24 hours, `CERTCTL_DIGEST_INTERVAL`): sends scheduled certificate digest email to configured recipients
|
||||
- Endpoint health loop (opt-in, 60 seconds, `CERTCTL_HEALTH_CHECK_INTERVAL`): continuous TLS health probes (M48)
|
||||
- Cloud discovery loop (opt-in, 6 hours, `CERTCTL_CLOUD_DISCOVERY_INTERVAL`): cloud secret manager certificate discovery (M50)
|
||||
Each loop includes `atomic.Bool` idempotency guards, error handling, and structured slog failure logs.
|
||||
- **Metrics Endpoints** — Two formats for monitoring integration:
|
||||
- `GET /api/v1/metrics` — JSON object with gauges, counters, and uptime for custom dashboards
|
||||
- `GET /api/v1/metrics/prometheus` — Prometheus exposition format (`text/plain; version=0.0.4`) for native scraping by Prometheus, Grafana Agent, Datadog, and other OpenMetrics-compatible collectors
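The `atomic.Bool` idempotency guard mentioned for the scheduler loops can be sketched in Go as follows. This is a generic pattern, not code from certctl: the `guardedLoop` type and `runOnce` method are hypothetical, and the sketch assumes Go 1.19 or later for `atomic.Bool`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// guardedLoop wraps a scheduler tick so that overlapping runs are skipped
// rather than stacked when a previous tick is still in flight.
type guardedLoop struct {
	running atomic.Bool
	tick    func() error
}

// runOnce executes tick unless a previous run is still in progress.
// The first return value is false when the run was skipped.
func (g *guardedLoop) runOnce() (bool, error) {
	if !g.running.CompareAndSwap(false, true) {
		return false, nil
	}
	defer g.running.Store(false)
	return true, g.tick()
}

func main() {
	g := &guardedLoop{}
	g.tick = func() error {
		// A nested call made while the tick is in flight is skipped.
		ran, _ := g.runOnce()
		fmt.Println("nested run executed:", ran)
		return nil
	}
	ran, err := g.runOnce()
	fmt.Println("outer run executed:", ran, "err:", err)
}
```

In a real scheduler the error returned by `runOnce` would be logged, for example through `slog`, instead of printed.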
|
||||
@@ -210,7 +245,7 @@ Each section includes:
|
||||
|
||||
**certctl Implementation** (V2):
|
||||
|
||||
- **Immutable API Audit Trail** (M19) — Every API call is recorded to `audit_events` table (append-only, no update/delete). Recorded: HTTP method, path, query parameters, actor (user/agent ID), SHA-256 hash of request body (truncated 16 chars for brevity), response status code, latency in milliseconds. Excluded paths (health, ready) are configurable. Audit records are async (non-blocking) and include a timestamp.
|
||||
- **Immutable API Audit Trail** (M19) — Every API call is recorded to the `audit_events` table (append-only, no update/delete). Recorded fields: HTTP method, URL path (query parameters intentionally excluded; see the security note), actor (user/agent ID), SHA-256 hash of the request body (truncated to 16 characters for brevity), response status code, and latency in milliseconds. Excluded paths (health, ready) are configurable. Audit records are written asynchronously (non-blocking) and include a timestamp. **Security: Query parameters are excluded from the audit path** because they may contain cursor tokens, API keys, or sensitive filter values; since the audit trail is append-only with no deletion, any sensitive data recorded would persist permanently.
|
||||
- **Audit Trail API** — `GET /api/v1/audit?actor=...&action=...&resource_id=...&created_after=...&created_before=...` allows searching for anomalous patterns (e.g., "who accessed certificate XYZ and when?", "did anyone revoke certs at 2 AM?").
|
||||
- **Expiration Threshold Alerting** — Certificate renewal policies define alert thresholds (days before expiry): default `[30, 14, 7, 0]`. When a certificate approaches a threshold, a notification is enqueued. Deduplication prevents duplicate alerts for the same cert at the same threshold. Auto status transition: cert moves to `Expiring` status at 30 days, `Expired` at 0 days.
|
||||
- **Certificate Status Auto-Transitions** — When a cert is issued, it's `Active`. As expiry approaches, status auto-transitions to `Expiring` (at 30d threshold). At expiry, status becomes `Expired`. Revoked certs move to `Revoked`. These transitions are recorded in the audit trail.
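The threshold and status rules above can be sketched in Go as follows. This is a simplified model under stated assumptions: `statusFor` and `dueThresholds` are hypothetical names, "at 0 days" is read as zero or fewer days remaining, and the `sent` map stands in for whatever deduplication record certctl keeps.

```go
package main

import "fmt"

// defaultThresholds mirrors the default alert thresholds quoted above
// (days before expiry).
var defaultThresholds = []int{30, 14, 7, 0}

// statusFor derives a certificate status from the days remaining until
// expiry. Revocation takes precedence over expiry-based states.
func statusFor(daysLeft int, revoked bool) string {
	switch {
	case revoked:
		return "Revoked"
	case daysLeft <= 0:
		return "Expired"
	case daysLeft <= 30:
		return "Expiring"
	default:
		return "Active"
	}
}

// dueThresholds returns the thresholds that daysLeft has reached and that
// have not been alerted on yet; sent acts as the deduplication record.
func dueThresholds(daysLeft int, thresholds []int, sent map[int]bool) []int {
	var due []int
	for _, t := range thresholds {
		if daysLeft <= t && !sent[t] {
			due = append(due, t)
		}
	}
	return due
}

func main() {
	sent := map[int]bool{30: true} // the 30-day alert already went out
	fmt.Println(statusFor(10, false), dueThresholds(10, defaultThresholds, sent))
}
```

Keying the deduplication record by threshold is what keeps a certificate sitting at 10 days from re-triggering the 30-day alert on every scheduler pass.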
|
||||
@@ -258,12 +293,13 @@ Each section includes:
|
||||
- `certificateHold` — temporary revocation (the hold can be lifted by reissuing the certificate)
|
||||
- `privilegeWithdrawn` — access rights revoked
|
||||
Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and optional issuer notification is best-effort. All revoked certs are excluded from active deployments.
|
||||
- **CRL Endpoint** — `GET /api/v1/crl` returns a JSON-formatted Certificate Revocation List (serial, reason, timestamp for each revoked cert). `GET /api/v1/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (useful for legacy clients that don't support OCSP).
|
||||
- **OCSP Responder** — `GET /api/v1/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
|
||||
- **CRL Endpoint** — `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.
|
||||
- **OCSP Responder** — `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
|
||||
- **Revocation Notifications** — When a cert is revoked, notifications are sent to:
|
||||
- Certificate owner (email)
|
||||
- Configured webhooks (if you have a SIEM that subscribes)
|
||||
- Slack/Teams channels (if notifiers are configured)
|
||||
- **Bulk Revocation for Fleet-Wide Incidents** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. Essential for incident response: key compromise affecting multiple certs, CA distrust events, decommissioning a team's infrastructure. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring audit trail and notifications for every certificate.
|
||||
- **Short-Lived Cert Exemption** — Certificates with TTL < 1 hour (configured in profile) skip CRL/OCSP publication. Expiry is the revocation mechanism for short-lived certs (e.g., Kubernetes pod certs, session tokens).
|
||||
- **Deployment Rollback** — If a revoked cert is still deployed (shouldn't happen, but race conditions exist), operators can manually redeploy a previous version via the GUI. Rollback is audited.
|
||||
|
||||
@@ -278,7 +314,6 @@ Each section includes:
|
||||
|
||||
**V3 Enhancement**:
|
||||
|
||||
- **Bulk Revocation** — Revoke all certs issued by a specific profile, owner, or agent in a single API call (useful for large-scale incidents like CA compromise)
|
||||
- **Revocation Automation** — Trigger revocation based on external events (e.g., employee termination, security breach alert from CT Log monitoring)
|
||||
|
||||
**Operator Responsibility**:
|
||||
@@ -429,15 +464,15 @@ Each section includes:
|
||||
| | Metrics JSON Endpoint | `GET /api/v1/metrics` (gauges, counters, uptime) | ✅ | ✅ | Set thresholds, configure alerting |
|
||||
| | Stats API (time-series) | `GET /api/v1/stats/*` (summary, status, expiration, jobs, issuance) | ✅ | ✅ | Integrate into dashboards, SLO tracking |
|
||||
| | Structured Logging | `slog` middleware with request IDs | ✅ | ✅ | Aggregate logs to SIEM, define retention policy |
|
||||
| | Background Scheduler | 6 loops (renewal 1h, jobs 30s, health 2m, notifications 1m, short-lived 30s, network scan 6h) | ✅ | ✅ | Alert on scheduler loop failures |
|
||||
| | Background Scheduler | 12 loops (8 always-on: renewal 1h, jobs 30s, job retry 5m I-001, job timeout 10m I-003, health 2m, notifications 1m, notif retry 2m I-005, short-lived 30s; 4 opt-in: network scan 6h, digest 24h, endpoint health 60s M48, cloud discovery 6h M50) | ✅ | ✅ | Alert on scheduler loop failures |
|
||||
| **CC7.2** Anomaly Detection | Immutable API Audit Trail | `internal/api/middleware/audit.go`, `GET /api/v1/audit` | ✅ | Enhanced (SIEM export) | Integrate into SIEM, search for anomalies, archive long-term |
|
||||
| | Expiration Threshold Alerting | Configurable per-policy (default 30/14/7/0 days) | ✅ | ✅ | Configure thresholds, integrate notifications |
|
||||
| | Status Auto-Transitions | Active → Expiring (30d) → Expired (0d) | ✅ | ✅ | Monitor status changes in audit trail |
|
||||
| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
|
||||
| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
|
||||
| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
|
||||
| | CRL Endpoint (JSON + DER) | `GET /api/v1/crl`, `GET /api/v1/crl/{issuer_id}` | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients |
|
||||
| | OCSP Responder | `GET /api/v1/ocsp/{issuer_id}/{serial}` | ✅ | ✅ | Test revocation in staging |
|
||||
| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
|
||||
| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
|
||||
| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
|
||||
| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
|
||||
| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
|
||||
|
||||
@@ -2,6 +2,41 @@
|
||||
|
||||
If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [What Is a TLS Certificate?](#what-is-a-tls-certificate)
|
||||
2. [Why Do Certificates Expire?](#why-do-certificates-expire)
|
||||
3. [The Cast of Characters](#the-cast-of-characters)
|
||||
- [Certificate Authority (CA)](#certificate-authority-ca)
|
||||
- [ACME Protocol](#acme-protocol)
|
||||
- [EST Protocol (Enrollment over Secure Transport)](#est-protocol-enrollment-over-secure-transport)
|
||||
- [Private Key](#private-key)
|
||||
- [Subject Alternative Names (SANs)](#subject-alternative-names-sans)
|
||||
- [Certificate Chain](#certificate-chain)
|
||||
4. [How certctl Works](#how-certctl-works)
|
||||
- [The Control Plane (Server)](#the-control-plane-server)
|
||||
- [Agents](#agents)
|
||||
- [Deployment Targets](#deployment-targets)
|
||||
5. [The Certificate Lifecycle](#the-certificate-lifecycle)
|
||||
6. [Why Not Just Use Certbot?](#why-not-just-use-certbot)
|
||||
7. [Key Concepts in certctl](#key-concepts-in-certctl)
|
||||
- [Teams and Owners](#teams-and-owners)
|
||||
- [Agent Groups](#agent-groups)
|
||||
- [Certificate Profiles](#certificate-profiles)
|
||||
- [Interactive Renewal Approval](#interactive-renewal-approval)
|
||||
- [Certificate Revocation](#certificate-revocation)
|
||||
- [Short-Lived Certificates](#short-lived-certificates)
|
||||
- [Policies](#policies)
|
||||
- [Jobs](#jobs)
|
||||
- [Audit Trail](#audit-trail)
|
||||
- [Notifications](#notifications)
|
||||
- [CLI](#cli)
|
||||
- [MCP Server (AI Integration)](#mcp-server-ai-integration)
|
||||
- [EST Enrollment (Device Certificates)](#est-enrollment-device-certificates)
|
||||
- [Certificate Discovery](#certificate-discovery)
|
||||
- [Observability](#observability)
|
||||
8. [What's Next](#whats-next)
|
||||
|
||||
## What Is a TLS Certificate?
|
||||
|
||||
When you visit `https://yourbank.com`, your browser checks a digital document called a **TLS certificate** before sending any data. That certificate proves two things: (1) you're really talking to yourbank.com and not an imposter, and (2) everything sent between you and the server is encrypted.
|
||||
@@ -34,9 +69,9 @@ certctl includes a built-in **Local CA** that can operate in two modes: self-sig
|
||||
|
||||
### ACME Protocol
|
||||
|
||||
ACME (Automatic Certificate Management Environment) is the protocol Let's Encrypt created for automated certificate issuance. Instead of filling out forms and waiting for emails, ACME lets software request, validate, and receive certificates programmatically. The server proves domain ownership by responding to challenges — placing a specific file on the web server (HTTP-01) or creating a DNS record (DNS-01).
|
||||
ACME (Automatic Certificate Management Environment) is the protocol Let's Encrypt created for automated certificate issuance. Instead of filling out forms and waiting for emails, ACME lets software request, validate, and receive certificates programmatically. The server proves domain ownership by responding to challenges — placing a specific file on the web server (HTTP-01), creating a DNS record (DNS-01), or maintaining a standing DNS record that persists across renewals (DNS-PERSIST-01).
|
||||
|
||||
certctl speaks ACME natively with both HTTP-01 and DNS-01 challenges, so it can request certificates — including wildcard certificates — from Let's Encrypt or any ACME-compatible CA without manual intervention. HTTP-01 uses a built-in temporary HTTP server for domain validation; DNS-01 uses pluggable script-based hooks to create TXT records with any DNS provider (Cloudflare, Route53, Azure DNS, etc.).
|
||||
certctl speaks ACME natively with HTTP-01, DNS-01, and DNS-PERSIST-01 challenges, so it can request certificates — including wildcard certificates — from Let's Encrypt or any ACME-compatible CA without manual intervention. HTTP-01 uses a built-in temporary HTTP server for domain validation; DNS-01 uses pluggable script-based hooks to create TXT records with any DNS provider (Cloudflare, Route53, Azure DNS, etc.); DNS-PERSIST-01 creates a standing `_validation-persist` TXT record once (containing the CA domain and account URI) that the CA revalidates on every renewal — no per-renewal DNS updates needed. If the CA doesn't yet support DNS-PERSIST-01, certctl automatically falls back to DNS-01.
|
||||
|
||||
### EST Protocol (Enrollment over Secure Transport)
|
||||
|
||||
@@ -88,11 +123,13 @@ At no point does the private key leave the agent. This is a fundamental security
|
||||
|
||||
Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.
|
||||
|
||||
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
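The two retirement paths above can be sketched as a small shell helper. This is only an illustration: `retire_agent` is a hypothetical name, the Bearer-token header is an assumption about how the API key is sent, and the default values for `API` and `CA` simply mirror the demo stack.

```bash
# Defaults mirror the demo stack; override them for your environment.
: "${API:=https://localhost:8443}"
: "${CA:=$PWD/deploy/test/certs/ca.crt}"
: "${CERTCTL_API_KEY:=test-key-123}"

# retire_agent AGENT_ID [REASON]
# Without REASON: plain soft-delete, which the server blocks if dependencies remain.
# With REASON: force-retire, sent as ?force=true&reason=... (reason is URL-encoded by curl).
retire_agent() {
  local agent_id="$1" reason="${2:-}"
  if [ -n "$reason" ]; then
    curl --cacert "$CA" -s -X DELETE -G \
      -H "Authorization: Bearer $CERTCTL_API_KEY" \
      --data "force=true" --data-urlencode "reason=$reason" \
      "$API/api/v1/agents/$agent_id"
  else
    curl --cacert "$CA" -s -X DELETE \
      -H "Authorization: Bearer $CERTCTL_API_KEY" \
      "$API/api/v1/agents/$agent_id"
  fi
}
```

Calling `retire_agent a-demo-1` attempts the guarded retirement; `retire_agent a-demo-1 "host decommissioned"` takes the force path. The response body format is not shown here because the text above does not specify it.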
|
||||
|
||||
### Deployment Targets
|
||||
|
||||
Targets are the systems where certificates actually get installed — NGINX web servers, Apache httpd servers, HAProxy load balancers, F5 BIG-IP appliances, Microsoft IIS servers. Each target type has a **connector** that knows how to deploy certificates to that specific system (e.g., writing files and reloading NGINX or Apache config, building a combined PEM for HAProxy).
|
||||
Targets are the systems where certificates actually get installed — NGINX web servers, Apache httpd servers, HAProxy load balancers, Traefik reverse proxies, Caddy servers, Envoy gateways, Postfix/Dovecot mail servers, Microsoft IIS servers, and network appliances. Each target type has a **connector** that knows how to deploy certificates to that specific system (e.g., writing files and reloading NGINX or Apache config, building a combined PEM for HAProxy).
|
||||
|
||||
For targets where an agent runs directly on the machine (NGINX, Apache, HAProxy, IIS), the agent deploys certificates locally — no remote access needed. For network appliances where you can't install an agent (F5 BIG-IP, Palo Alto, etc.), a **proxy agent** in the same network zone picks up the deployment job and calls the appliance's API. The server never initiates outbound connections to any target.
|
||||
For targets where an agent runs directly on the machine (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS), the agent deploys certificates locally — no remote access needed. For network appliances where you can't install an agent (F5 BIG-IP, Palo Alto, etc.), a **proxy agent** in the same network zone picks up the deployment job and calls the appliance's API. The server never initiates outbound connections to any target.
|
||||
|
||||
## The Certificate Lifecycle
|
||||
|
||||
@@ -148,6 +185,29 @@ Profiles are managed via the API (`/api/v1/profiles`) and the GUI, and can be as
|
||||
|
||||
For policies with `auto_renew` disabled, renewal jobs enter an **AwaitingApproval** state instead of processing immediately. An operator must explicitly approve or reject the renewal via the API or GUI. Approved jobs transition to Pending and are picked up by the scheduler. Rejected jobs are cancelled with an optional reason. This is useful for high-value certificates where you want human oversight before renewal.
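The approval transitions described above can be written down as a tiny state function. This is a sketch for illustration only: `next_job_state` is a hypothetical name, and `Cancelled` is an assumed state label, since the text only says rejected jobs are cancelled.

```bash
# next_job_state STATE ACTION
# Prints the state a renewal job moves to after an operator action.
next_job_state() {
  local state="$1" action="$2"
  if [ "$state" != "AwaitingApproval" ]; then
    echo "$state"            # only jobs awaiting approval react to approve/reject
    return 0
  fi
  case "$action" in
    approve) echo "Pending" ;;      # picked up by the scheduler next
    reject)  echo "Cancelled" ;;    # assumed label for a rejected job
    *)       echo "$state" ;;       # unknown action: no transition
  esac
}

next_job_state AwaitingApproval approve   # prints: Pending
next_job_state AwaitingApproval reject    # prints: Cancelled
```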
|
||||
|
||||
### Renewal Timing: Thresholds vs. ARI (RFC 9773)
|
||||
|
||||
**Traditional approach (thresholds):** By default, certctl uses static renewal thresholds — renew a certificate at a fixed number of days before expiry (default: 30 days). This simple, predictable model works for most use cases: it avoids unnecessary renewals near expiry and gives you a predictable window to catch failures.
|
||||
|
||||
**Advanced approach (ACME ARI):** Some Certificate Authorities support ACME Renewal Information (RFC 9773), which allows the CA to tell certctl the optimal time to renew. Instead of guessing "renew 30 days before expiry," the CA responds with a precise `suggestedWindow` containing start and end times. This is useful when:
|
||||
- The CA is performing maintenance and wants to batch renewals in a specific window
- The CA is coordinating a mass revocation (e.g., due to a compromise) and needs to control renewal timing
- You want to avoid thundering herd renewal spikes by accepting the CA's suggested timing
|
||||
|
||||
**How it works:** Enable with `CERTCTL_ACME_ARI_ENABLED=true` on your ACME issuer. When a certificate approaches expiry, certctl queries the ARI endpoint with the certificate's DER encoding. The CA responds with a suggested renewal window. If the window has already opened (the current time is at or past its start), certctl renews immediately. Otherwise, it waits until the window opens.
|
||||
|
||||
**Graceful degradation:** If your CA doesn't support ARI (returns 404 from the ARI endpoint), certctl automatically falls back to the traditional threshold-based renewal. No configuration change needed — the fallback is transparent. Errors from the CA are logged as warnings and don't block the renewal process.
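The decision between the ARI path and the threshold fallback can be sketched in a few lines of shell. This is a simplified model rather than certctl's actual logic: `should_renew` is a hypothetical name, all times are Unix epoch seconds, and an empty fourth argument stands in for "the CA returned no ARI window".

```bash
# should_renew NOW NOT_AFTER THRESHOLD_DAYS [ARI_WINDOW_START]
# Exit status 0 means "renew now", non-zero means "wait".
should_renew() {
  local now="$1" not_after="$2" threshold_days="$3" ari_start="${4:-}"
  if [ -n "$ari_start" ]; then
    # ARI path: renew once the CA's suggested window has opened.
    [ "$now" -ge "$ari_start" ]
    return $?
  fi
  # Fallback path: static threshold, N days before expiry.
  [ "$now" -ge $(( not_after - threshold_days * 86400 )) ]
}

now=1000000
not_after=$(( now + 40 * 86400 ))   # certificate expires in 40 days

should_renew "$now" "$not_after" 30 && echo renew || echo wait
# prints: wait   (no ARI data, 40 days left, threshold is 30)

should_renew "$now" "$not_after" 30 $(( now - 60 )) && echo renew || echo wait
# prints: renew  (the ARI window opened a minute ago)
```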
|
||||
|
||||
### Shorter Certificate Validity (45-Day and 6-Day Certs)
|
||||
|
||||
The industry is moving toward shorter certificate lifetimes. The CA/Browser Forum's SC-081v3 ballot mandates a phased reduction: 200-day max (March 2026), 100-day max (March 2027), and 47-day max (March 2029). Let's Encrypt has already begun reducing default validity to 45 days, and offers 6-day "shortlived" certificates via ACME profile selection.
|
||||
|
||||
certctl handles shorter-lived certificates correctly out of the box:
|
||||
|
||||
- **45-day certs** with the default 30-day renewal threshold trigger renewal at day 15, one third of the way through the cert's lifetime.
|
||||
- **6-day "shortlived" certs** are always within the renewal window. ARI (RFC 9773) is the expected renewal path for these — the CA directs timing. Short-lived certs also skip CRL/OCSP since expiry is sufficient revocation (per profile TTL < 1 hour exemption).
|
||||
- **ACME profile selection** lets you request specific certificate profiles from your CA. Set `CERTCTL_ACME_PROFILE=shortlived` to get 6-day certificates from Let's Encrypt, or `CERTCTL_ACME_PROFILE=tlsserver` for standard TLS certificates.
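The arithmetic behind these cases is simple enough to check by hand. The sketch below assumes the 30-day static threshold stated earlier as the default; `renewal_day` is a hypothetical helper, not a certctl command.

```bash
# renewal_day LIFETIME_DAYS THRESHOLD_DAYS
# Prints the day (counted from issuance) on which a static threshold first triggers.
renewal_day() {
  local lifetime="$1" threshold="$2"
  local day=$(( lifetime - threshold ))
  # A negative result means the cert is inside the renewal window from issuance.
  [ "$day" -lt 0 ] && day=0
  echo "$day"
}

renewal_day 45 30   # prints 15
renewal_day 6 30    # prints 0
```

A result of 0 is why the 6-day case relies on ARI for timing: a static threshold larger than the lifetime cannot say anything useful about when to renew.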
|
||||
|
||||
### Certificate Revocation
|
||||
|
||||
When a private key is compromised, a certificate is superseded, or a service is decommissioned, you need to revoke the certificate immediately — not wait for it to expire. Revocation tells clients "stop trusting this certificate right now."
|
||||
@@ -156,9 +216,11 @@ certctl implements revocation using three complementary mechanisms:
|
||||
|
||||
**Revocation API**: `POST /api/v1/certificates/{id}/revoke` marks a certificate as revoked in the inventory, records the revocation in a dedicated `certificate_revocations` table, notifies the issuing CA (best-effort — the revocation succeeds even if the CA is unreachable), creates an audit trail entry, and sends notifications. You can specify an RFC 5280 reason code (keyCompromise, superseded, cessationOfOperation, etc.) or let it default to "unspecified."
|
||||
|
||||
**Certificate Revocation List (CRL)**: certctl serves both a JSON-formatted CRL at `GET /api/v1/crl` and DER-encoded X.509 CRLs per issuer at `GET /api/v1/crl/{issuer_id}`. The DER CRL is signed by the issuing CA's key and has 24-hour validity — clients can download it periodically to check revocation status offline.
|
||||
**Bulk Revocation** (Fleet-Level Incident Response): For large-scale incidents like CA compromise or team infrastructure decommissioning, `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria in a single operation. Filter by profile, owner, team, agent group, or issuer to target the affected certificate set. This is essential for incident response — instead of revoking certificates one-by-one, operators can revoke an entire fleet in minutes. Bulk revocation creates individual revocation jobs that reuse the existing revocation pipeline, ensuring every certificate is audited and notifications are sent.
|
||||
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}`. It returns signed OCSP responses (good, revoked, or unknown) so clients can verify certificate status without downloading the full CRL.
|
||||
**Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party — browser, TLS client, hardware appliance — can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`.
|
||||
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960). Like the CRL endpoint, it is unauthenticated and returns signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`, so clients can verify certificate status without downloading the full CRL.
|
||||
|
||||
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
|
||||
|
||||
@@ -194,7 +256,7 @@ The CLI supports both table and JSON output formats (`--format table` or `--form
|
||||
|
||||
### MCP Server (AI Integration)
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes 78 MCP tools covering the REST API. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.
|
||||
|
||||
@@ -211,10 +273,12 @@ Certificate discovery is the process of automatically finding existing certifica
|
||||
**How it works:** There are two discovery modes. *Filesystem discovery* — agents scan configured directories (configured via `CERTCTL_DISCOVERY_DIRS`) for certificate files. On startup and every 6 hours, the agent walks directories recursively, parses PEM and DER files, extracts metadata, and reports findings to the control plane. *Network discovery* — the control plane itself probes TLS endpoints across configured CIDR ranges and ports (enabled via `CERTCTL_NETWORK_SCAN_ENABLED=true`). It connects to each endpoint, extracts certificates from the TLS handshake, and feeds results into the same discovery pipeline. This finds certificates on services you may not have agents on. In both cases, the server deduplicates by fingerprint and stores discovered certs with a status: **Unmanaged** (discovered but not yet managed), **Managed** (linked to a control plane cert), or **Dismissed** (operator decided not to manage it).
|
||||
|
||||
This gives you a three-step triage workflow:
|
||||
1. **Discover** — Agents find all existing certs on your infrastructure
|
||||
2. **Triage** — Operators review discoveries and decide: claim it (enroll for management), or dismiss it (not worth managing)
|
||||
1. **Discover** — Agents scan filesystems and the server probes network endpoints to find all existing certs
|
||||
2. **Triage** — Operators review discoveries in the **Discovery** dashboard page and decide: claim it (link to a managed certificate) or dismiss it (not worth managing). The dashboard shows a summary stats bar (Unmanaged/Managed/Dismissed counts), filters by status and agent, and provides one-click claim and dismiss actions.
|
||||
3. **Baseline** — Once triaged, you have a complete baseline of what's deployed, what you're managing, and what's unmanaged
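Deduplication by fingerprint, which both discovery modes rely on, can be illustrated with a one-line filter. This is a sketch only: the `FINGERPRINT SOURCE` line format and the sample values are made up for the example, and keeping the first sighting is an assumption about tie-breaking.

```bash
# dedupe_by_fingerprint: reads "FINGERPRINT SOURCE" lines on stdin and keeps
# only the first line seen for each fingerprint.
dedupe_by_fingerprint() {
  awk '!seen[$1]++'
}

printf '%s\n' \
  "AA:11 a-demo-1" \
  "BB:22 server-scanner" \
  "AA:11 server-scanner" | dedupe_by_fingerprint
# prints:
# AA:11 a-demo-1
# BB:22 server-scanner
```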
|
||||
|
||||
Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.
|
||||
|
||||
This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
|
||||
|
||||
### Observability
|
||||
|
||||
@@ -5,6 +5,41 @@ This demo goes beyond browsing pre-loaded data. You'll create a team, register a
|
||||
**Time**: 15-20 minutes
|
||||
**Prerequisites**: certctl running via Docker Compose (see [Quick Start](quickstart.md))
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Setup](#setup)
2. [How the pieces fit together](#how-the-pieces-fit-together)
3. [Alternative Issuers Reference](#alternative-issuers-reference)
   - [Sub-CA Mode](#sub-ca-mode-local-ca-chained-to-enterprise-root)
   - [ACME with ZeroSSL](#acme-with-zerossl-auto-eab)
   - [ACME with DNS-01 Challenges](#acme-with-dns-01-challenges-wildcard-certificates)
   - [ACME with DNS-PERSIST-01](#acme-with-dns-persist-01-zero-touch-renewals)
   - [step-ca (Smallstep Private CA)](#step-ca-smallstep-private-ca)
   - [OpenSSL / Custom CA](#openssl--custom-ca-script-based)
4. [Part 1: Build the Organization Structure](#part-1-build-the-organization-structure)
5. [Part 2: Verify the Issuer](#part-2-verify-the-issuer)
6. [Part 3: Create a Managed Certificate](#part-3-create-a-managed-certificate)
7. [Part 4: Trigger Certificate Renewal](#part-4-trigger-certificate-renewal)
8. [Part 4.5: Manage Deployment Targets](#part-45-manage-deployment-targets)
9. [Part 5: Deploy the Certificate](#part-5-deploy-the-certificate)
10. [Part 6: View the Audit Trail](#part-6-view-the-audit-trail-immutable-api-audit-log)
11. [Part 7: Check Notifications](#part-7-check-notifications)
12. [Part 8: Create a Second Certificate and Compare](#part-8-create-a-second-certificate-and-compare)
13. [Part 8.5: Revoke a Certificate](#part-85-revoke-a-certificate)
14. [Part 9: Policy Violations](#part-9-policy-violations)
15. [Part 9.5: Dashboard Stats and Metrics](#part-95-dashboard-stats-and-metrics)
16. [Part 10: Certificate Profiles](#part-10-certificate-profiles)
17. [Part 11: Agent Groups](#part-11-agent-groups)
18. [Part 12: Interactive Approval Workflow](#part-12-interactive-approval-workflow)
19. [Part 13: Advanced Query Features](#part-13-advanced-query-features)
20. [Part 14: CLI Tool](#part-14-cli-tool-m16b)
21. [Part 15: MCP Server for AI Integration](#part-15-mcp-server-for-ai-integration-m18a)
22. [Part 16: Certificate Discovery](#part-16-certificate-discovery-m18b--m21)
23. [End-to-End Architecture Summary](#end-to-end-architecture-summary)
24. [Full Automated Script](#full-automated-script)
25. [What to Show Stakeholders](#what-to-show-stakeholders)
26. [Teardown](#teardown)
|
||||
|
||||
## Setup
|
||||
|
||||
Make sure certctl is running:
|
||||
@@ -15,14 +50,17 @@ docker compose -f deploy/docker-compose.yml up -d --build
|
||||
docker compose -f deploy/docker-compose.yml ps
|
||||
```
|
||||
|
||||
Open **http://localhost:8443** in your browser alongside your terminal. You'll watch changes appear in the dashboard as you make API calls.
|
||||
Open **https://localhost:8443** in your browser alongside your terminal. The default compose stack ships a self-signed cert; your browser will show a warning the first time — click through (or trust `deploy/test/certs/ca.crt` in your OS keychain). You'll watch changes appear in the dashboard as you make API calls.
|
||||
|
||||
Set up a base variable for convenience:
|
||||
Set up base variables for convenience:
|
||||
|
||||
```bash
|
||||
API="http://localhost:8443"
|
||||
API="https://localhost:8443"
|
||||
CA="$PWD/deploy/test/certs/ca.crt" # pin the self-signed CA for curl
|
||||
```
|
||||
|
||||
Every `curl` in this guide uses `--cacert "$CA"` so the TLS handshake verifies against the compose-stack CA instead of the system trust store.
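To avoid repeating the flag on every call, the pattern can be wrapped in a small function. This is a convenience sketch, not part of certctl: `api_get` is a hypothetical name, and the defaults below just repeat the values set earlier in this section.

```bash
# Defaults match the demo stack; they only apply if the variables are unset.
: "${API:=https://localhost:8443}"
: "${CA:=$PWD/deploy/test/certs/ca.crt}"

# api_get PATH
# Issues a GET against the certctl API, verifying TLS against the pinned CA.
api_get() {
  local path="$1"
  curl --cacert "$CA" -s "$API$path"
}
```

For example, `api_get /api/v1/profiles | jq .` would fetch the profile list with the pinned CA, assuming the stack is running and `jq` is installed.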
|
||||
|
||||
## How the pieces fit together
|
||||
|
||||
Before we start, here's the high-level flow of what we're about to do:
|
||||
@@ -62,6 +100,27 @@ docker compose -f deploy/docker-compose.yml restart server
|
||||
|
||||
The CA key can be RSA, ECDSA, or PKCS#8 format. The connector validates that the certificate has `IsCA=true` and `KeyUsageCertSign`.
|
||||
|
||||
### ACME with ZeroSSL (Auto-EAB)
|
||||
|
||||
ZeroSSL is a free ACME CA that requires External Account Binding (EAB) for account registration. certctl auto-fetches EAB credentials from ZeroSSL's public API when the directory URL is detected as ZeroSSL and no EAB credentials are provided — you just need an email address:
|
||||
|
||||
```bash
# Minimal config: certctl auto-fetches EAB credentials from ZeroSSL
export CERTCTL_ACME_DIRECTORY_URL="https://acme.zerossl.com/v2/DV90"
export CERTCTL_ACME_EMAIL="ops@example.com"
```
|
||||
|
||||
No dashboard visit, no manual EAB credential copy-paste. certctl calls `api.zerossl.com/acme/eab-credentials-email` with your email, gets back a KID + HMAC key, and uses them for ACME account registration automatically.
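For debugging, roughly the same request can be made by hand. The sketch below is an approximation of what certctl does internally, not its actual code: `fetch_zerossl_eab` is a hypothetical name, and sending the email as a form-encoded POST field is an assumption based on how other ACME clients call this endpoint.

```bash
# fetch_zerossl_eab EMAIL
# Asks ZeroSSL for EAB credentials tied to EMAIL and prints the raw JSON response.
fetch_zerossl_eab() {
  local email="$1"
  curl -s -X POST "https://api.zerossl.com/acme/eab-credentials-email" \
    --data-urlencode "email=$email"
}
```

Running `fetch_zerossl_eab ops@example.com` should return a JSON document containing the key ID and HMAC key. The exact field names are not stated in this guide, so inspect the response before scripting against it.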
|
||||
|
||||
If you already have EAB credentials (e.g., from the ZeroSSL dashboard or for other CAs like Google Trust Services or SSL.com), you can provide them explicitly:
|
||||
|
||||
```bash
export CERTCTL_ACME_DIRECTORY_URL="https://acme.zerossl.com/v2/DV90"
export CERTCTL_ACME_EMAIL="ops@example.com"
export CERTCTL_ACME_EAB_KID="your-key-id"
export CERTCTL_ACME_EAB_HMAC="your-base64url-hmac-key"
```
|
||||
|
||||
### ACME with DNS-01 Challenges (Wildcard Certificates)
|
||||
|
||||
For Let's Encrypt or other ACME providers with wildcard support:
|
||||
@@ -97,6 +156,21 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
}' | jq .
|
||||
```
|
||||
|
||||
### ACME with DNS-PERSIST-01 (Zero-Touch Renewals)
|
||||
|
||||
DNS-PERSIST-01 uses a standing `_validation-persist` TXT record that you set once. The CA revalidates it on every renewal — no per-renewal DNS updates, no cleanup scripts, no propagation waits. If the CA doesn't support DNS-PERSIST-01 yet, certctl falls back to DNS-01 automatically.
|
||||
|
||||
```bash
# Configure ACME DNS-PERSIST-01
export CERTCTL_ACME_CHALLENGE_TYPE="dns-persist-01"
export CERTCTL_ACME_DNS_PRESENT_SCRIPT="/usr/local/bin/dns-present.sh"
export CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN="letsencrypt.org"

# The present script creates a _validation-persist.<domain> TXT record with value:
# "letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/12345"
# This record is set once and never touched again.
```
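The record described in the comments above can be assembled with a short helper, which is handy when creating it by hand at your DNS provider. This is an illustrative sketch: `persist_txt_record` is a hypothetical name, and the account URI is the placeholder from the example, not a real account.

```bash
# persist_txt_record DOMAIN ISSUER_DOMAIN ACCOUNT_URI
# Prints the TXT record name on the first line and its value on the second.
persist_txt_record() {
  local domain="$1" issuer="$2" account_uri="$3"
  echo "_validation-persist.${domain}"
  echo "${issuer}; accounturi=${account_uri}"
}

persist_txt_record example.com letsencrypt.org \
  "https://acme-v02.api.letsencrypt.org/acme/acct/12345"
# prints:
# _validation-persist.example.com
# letsencrypt.org; accounturi=https://acme-v02.api.letsencrypt.org/acme/acct/12345
```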
|
||||
|
||||
### step-ca (Smallstep Private CA)
|
||||
|
||||
For organizations running step-ca as their private CA:
|
||||
@@ -221,7 +295,7 @@ You should see:
|
||||
|
||||
The result is a structurally valid X.509 certificate — browsers won't trust it (no root CA in their trust store), but it exercises the exact same code paths that a production ACME or Vault issuer would.
|
||||
|
||||
**Why pluggable issuers:** Different organizations use different CAs. Some use Let's Encrypt (ACME protocol), some use step-ca or internal PKI (Vault), some use commercial CAs (DigiCert, Entrust, GlobalSign), and some have custom OpenSSL-based workflows. For enterprises with ADCS, certctl can operate as a sub-CA — all issued certs chain to the enterprise root. The connector interface means certctl doesn't care — it calls `IssueCertificate()` and gets back a signed cert regardless of the backend. V1 ships with Local CA (self-signed or sub-CA), ACME (HTTP-01 + DNS-01 for wildcards), and step-ca (Smallstep private CA via native /sign API). V2 adds the OpenSSL/Custom CA connector (script-based signing). DigiCert, Vault PKI, Entrust, GlobalSign, Google CAS, and EJBCA are planned for V3+.
|
||||
**Why pluggable issuers:** Different organizations use different CAs. Some use Let's Encrypt (ACME protocol), some use step-ca or internal PKI (Vault), some use commercial CAs (DigiCert, Entrust, GlobalSign), and some have custom OpenSSL-based workflows. For enterprises with ADCS, certctl can operate as a sub-CA — all issued certs chain to the enterprise root. The connector interface means certctl doesn't care — it calls `IssueCertificate()` and gets back a signed cert regardless of the backend. V1 ships with Local CA (self-signed or sub-CA), ACME (HTTP-01 + DNS-01 + DNS-PERSIST-01 for wildcards), and step-ca (Smallstep private CA via native /sign API). V2 adds the OpenSSL/Custom CA connector (script-based signing). DigiCert, Vault PKI, Entrust, GlobalSign, Google CAS, and EJBCA are planned for V3+.
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
@@ -236,8 +310,8 @@ flowchart TD
|
||||
A --> F["ACME\n(Let's Encrypt)"]
|
||||
A --> G["step-ca\n(implemented)"]
|
||||
A --> H["OpenSSL / Custom CA\n(script-based)"]
|
||||
A --> J["DigiCert API\n(planned)"]
|
||||
A --> K["Vault PKI\n(planned)"]
|
||||
A --> J["DigiCert API\n(implemented)"]
|
||||
A --> K["Vault PKI\n(implemented)"]
|
||||
A --> L["Entrust / GlobalSign\n(planned)"]
|
||||
A --> M["Google CAS / EJBCA\n(planned)"]
|
||||
```
|
||||
@@ -653,22 +727,24 @@ curl -s -X POST $API/api/v1/certificates/mc-demo-payments/revoke \
|
||||
6. Creates an audit trail entry
|
||||
7. Sends revocation notifications via configured channels
|
||||
|
||||
Check the CRL (Certificate Revocation List):
|
||||
Check the CRL (Certificate Revocation List) — served unauthenticated under the RFC 8615 well-known namespace so relying parties without a certctl API key can still verify revocation (RFC 5280 §5):
|
||||
|
||||
```bash
|
||||
# JSON-formatted CRL
|
||||
curl -s $API/api/v1/crl | jq .
|
||||
|
||||
# DER-encoded X.509 CRL for the local CA (binary — pipe to openssl for inspection)
|
||||
curl -s $API/api/v1/crl/iss-local -o /tmp/crl.der
|
||||
# DER-encoded X.509 CRL for the local CA (binary — pipe to openssl for inspection).
|
||||
# Note: no -H "Authorization: Bearer ..." — the endpoint is deliberately
|
||||
# unauthenticated. Content-Type is application/pkix-crl.
|
||||
curl --cacert "$CA" -s https://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
|
||||
openssl crl -inform DER -in /tmp/crl.der -text -noout
|
||||
```
|
||||
|
||||
Check OCSP status:
|
||||
Check OCSP status (RFC 6960, also unauthenticated, `application/ocsp-response`):
|
||||
|
||||
```bash
|
||||
# Replace SERIAL with the actual serial number from the certificate version
|
||||
curl -s $API/api/v1/ocsp/iss-local/SERIAL | jq .
|
||||
# Replace SERIAL with the actual serial number from the certificate version.
|
||||
# The embedded OCSP responder returns a signed DER response — parse it with
|
||||
# `openssl ocsp -respin` or similar tooling.
|
||||
curl --cacert "$CA" -s https://localhost:8443/.well-known/pki/ocsp/iss-local/SERIAL -o /tmp/ocsp.der
|
||||
openssl ocsp -respin /tmp/ocsp.der -noverify -resp_text | head -40
|
||||
```
|
||||
|
||||
**Why RFC 5280 reason codes:** The reason code isn't just metadata — it tells clients *why* the certificate was revoked. A `keyCompromise` revocation means the private key was exposed and the certificate should be distrusted immediately. A `superseded` revocation means a newer certificate replaced it — less urgent. CRLs and OCSP responses include the reason code so client software can make informed trust decisions.
|
||||
@@ -805,14 +881,14 @@ curl -s -X POST $API/api/v1/agent-groups \
|
||||
|
||||
## Part 12: Interactive Approval Workflow
|
||||
|
||||
For high-value certificates, you may want human oversight before renewal proceeds. Create a policy that requires approval:
|
||||
For high-value certificates, you may want human oversight before renewal proceeds. The demo includes 2 pre-seeded `AwaitingApproval` renewal jobs (for `auth-production` and `payments-production`). Open **Jobs** in the sidebar — you'll see the amber "Pending Approval" banner and Approve/Reject buttons immediately.
|
||||
|
||||
```bash
|
||||
# Check jobs that need approval
|
||||
# Check jobs that need approval (demo includes 2)
|
||||
curl --cacert "$CA" -s "$API/api/v1/jobs?status=AwaitingApproval" | jq '.data[] | {id, type, certificate_id, status}'
|
||||
```
|
||||
|
||||
If there are jobs awaiting approval, approve or reject them:
|
||||
Approve or reject them:
|
||||
|
||||
```bash
|
||||
# Approve a job
|
||||
@@ -830,6 +906,8 @@ curl -s -X POST $API/api/v1/jobs/JOB_ID/reject \
|
||||
|
||||
**Why interactive approval:** Not every certificate renewal should be automatic. PCI-scoped certificates, certs with specific compliance requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
|
||||
|
||||
**In the dashboard:** Click "Jobs" in the sidebar, filter by status "AwaitingApproval", and you'll see a list of renewal jobs waiting for approval. Each job shows the certificate, issuer, and requested validity period. Click a job to open its detail view and see the Approve / Reject buttons with a reason text field. After approval or rejection, the job status updates in real-time and the audit trail records the decision.
|
||||
|
||||
---
|
||||
|
||||
## Part 13: Advanced Query Features
|
||||
@@ -871,7 +949,8 @@ certctl includes a standalone CLI tool for command-line users:
|
||||
cd cmd/cli && go build -o certctl-cli .
|
||||
|
||||
# Export credentials
|
||||
export CERTCTL_SERVER_URL="http://localhost:8443"
|
||||
export CERTCTL_SERVER_URL="https://localhost:8443"
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH="$PWD/deploy/test/certs/ca.crt"
|
||||
export CERTCTL_API_KEY="test-key-123"
|
||||
|
||||
# List certificates (JSON or table format)
|
||||
@@ -908,14 +987,15 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
|
||||
## Part 15: MCP Server for AI Integration (M18a)
|
||||
|
||||
certctl exposes 78 MCP tools covering the REST API via the Model Context Protocol (MCP), enabling seamless integration with Claude, Cursor, and other AI assistants:
|
||||
certctl exposes the full REST API via the Model Context Protocol (MCP), enabling seamless integration with Claude, Cursor, and other AI assistants:
|
||||
|
||||
```bash
|
||||
# Build the MCP server
|
||||
cd cmd/mcp-server && go build -o mcp-server .
|
||||
|
||||
# Export credentials
|
||||
export CERTCTL_SERVER_URL="http://localhost:8443"
|
||||
export CERTCTL_SERVER_URL="https://localhost:8443"
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH="$PWD/deploy/test/certs/ca.crt"
|
||||
export CERTCTL_API_KEY="test-key-123"
|
||||
|
||||
# Start the MCP server (listens on stdin/stdout)
|
||||
@@ -956,6 +1036,8 @@ The MCP server is perfect for:
|
||||
|
||||
certctl discovers existing certificates two ways: **filesystem scanning** (agents scan local directories) and **network scanning** (the server probes TLS endpoints). Both feed into the same triage pipeline.
|
||||
|
||||
**The demo comes pre-loaded with discovery data:** 9 discovered certificates (3 Unmanaged from filesystem scans, 3 Unmanaged from network scans, 2 Managed, 1 Dismissed), 3 discovery scans, and 3 network scan targets with recent scan results. Open **Discovery** in the sidebar to see the triage workflow immediately. The steps below show how to configure discovery from scratch.
|
||||
|
||||
### Filesystem Discovery (Agent-Side)
|
||||
|
||||
Configure the demo agent to scan for certificates. In the Docker Compose setup, agents have a `/tmp/certs` directory (created by the seed script). Restart the agent with discovery enabled:
|
||||
@@ -971,12 +1053,12 @@ docker compose -f deploy/docker-compose.yml run -e CERTCTL_DISCOVERY_DIRS=/tmp/c
|
||||
Or with the CLI flag:
|
||||
|
||||
```bash
|
||||
certctl-agent --agent-id a-demo-1 --key-dir /tmp/keys --discovery-dirs /tmp/certs --server http://localhost:8443 --api-key test-key-123
|
||||
certctl-agent --agent-id a-demo-1 --key-dir /tmp/keys --discovery-dirs /tmp/certs --server https://localhost:8443 --ca-bundle "$CA" --api-key test-key-123
|
||||
```
|
||||
|
||||
### Network Discovery (Server-Side)
|
||||
|
||||
The server can also discover certificates by actively probing TLS endpoints — no agent required. Create a scan target and trigger a scan:
|
||||
The server can also discover certificates by actively probing TLS endpoints — no agent required. Network scanning is enabled by default in the Docker Compose demo (`CERTCTL_NETWORK_SCAN_ENABLED=true`), with 3 pre-configured scan targets. You can create additional targets:
|
||||
|
||||
```bash
|
||||
# Create a network scan target
|
||||
@@ -1030,6 +1112,28 @@ curl -s -X POST "$API/api/v1/discovered-certificates/$DISCOVERED_ID/dismiss" \
|
||||
|
||||
**How it works:** Filesystem discovery: the agent scans `CERTCTL_DISCOVERY_DIRS` on startup and every 6 hours, extracts metadata (common name, SANs, issuer, expiration, key type, fingerprint) from all PEM and DER files, and POSTs findings to `POST /api/v1/agents/{id}/discoveries`. Network discovery: the server expands CIDR ranges (capped at /20 = 4096 IPs), connects to each IP:port via TLS, extracts the peer certificate chain, and stores results using `server-scanner` as a sentinel agent ID. Both sources deduplicate by fingerprint and store results with a status: **Unmanaged** (discovered, not yet managed), **Managed** (linked to a control plane cert), or **Dismissed** (operator decided not to manage). This gives you a triage workflow: discover → review → claim or dismiss.
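The /20 cap is easy to sanity-check before creating a scan target. The helpers below are a sketch with hypothetical names; whether certctl rejects or truncates larger ranges is not stated here, so the check only reports whether a prefix fits under the cap.

```bash
# cidr_ip_count PREFIX_LEN
# Prints the number of IPv4 addresses in a /PREFIX_LEN range.
cidr_ip_count() {
  echo $(( 1 << (32 - $1) ))
}

# within_scan_cap PREFIX_LEN
# Succeeds when the range holds at most 4096 addresses (a /20 or smaller).
within_scan_cap() {
  [ "$(cidr_ip_count "$1")" -le 4096 ]
}

cidr_ip_count 20                          # prints 4096
within_scan_cap 24 && echo "ok to scan"   # /24 = 256 IPs, prints: ok to scan
within_scan_cap 16 || echo "too large"    # /16 = 65536 IPs, prints: too large
```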
|
||||
|
||||
### Discovery & Network Scans in the Dashboard
|
||||
|
||||
**Discovered Certificates Page:** Click "Discovery" in the sidebar to see a triage workflow. The page lists all discovered certificates grouped by status (Unmanaged, Managed, Dismissed). For each Unmanaged certificate, you see:
|
||||
- Common name and SANs
- Issuer and subject DN
- Expiration date
- Fingerprint (helps dedup)
- Source (agent ID or `server-scanner` for network scans)
- Action buttons: Claim (manage this cert), Dismiss (ignore it)
|
||||
|
||||
Click "Claim" to bring an unmanaged certificate under certctl's control. Click "Dismiss" to remove it from the triage queue.
|
||||
|
||||
**Network Scans Page:** Click "Network Scans" in the sidebar to manage network scan targets. The page shows all configured scan targets with:
|
||||
- Target name and description
|
||||
- CIDR ranges and ports scanned
|
||||
- Enabled/disabled toggle
|
||||
- Scan interval and connection timeout
|
||||
- Last scan timestamp and result summary
|
||||
- Action buttons: Edit, Delete, Scan Now (immediate)
|
||||
|
||||
Click "Scan Now" to trigger an immediate TLS probe of the target's IP ranges. Results appear within seconds in the Discovered Certificates page as entries with `agent_id=server-scanner`.
|
||||
|
||||
**In the dashboard**, click "Discovered Certificates" in the sidebar to see what agents and network scans found — claim unmanaged certs to bring them under certctl's management, or dismiss them.
|
||||
|
||||
---
|
||||
@@ -1056,7 +1160,7 @@ flowchart TB
|
||||
API["REST API\nGo net/http"]
|
||||
SVC["Service Layer\nBusiness Logic"]
|
||||
REPO["Repository Layer\ndatabase/sql + lib/pq"]
|
||||
SCHED["Scheduler\n6 background loops"]
|
||||
SCHED["Scheduler\n12 background loops\n(8 always-on + 4 opt-in)"]
|
||||
CONN["Connector Registry\nIssuer + Target + Notifier"]
|
||||
end
|
||||
|
||||
@@ -1092,7 +1196,8 @@ Here's a single script that runs the entire demo end-to-end. Save it as `demo.sh
|
||||
#!/bin/bash
|
||||
set -e
|
||||
|
||||
API="http://localhost:8443"
|
||||
API="https://localhost:8443"
|
||||
CA="$PWD/deploy/test/certs/ca.crt" # pin the self-signed CA for curl
|
||||
BLUE='\033[0;34m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
@@ -1200,7 +1305,7 @@ echo " 5. Revoked the certificate with RFC 5280 reason codes"
|
||||
echo " 6. Checked dashboard stats and metrics"
|
||||
echo " 7. All actions recorded in the audit trail"
|
||||
echo ""
|
||||
echo -e "Open ${GREEN}http://localhost:8443${NC} to see everything in the dashboard."
|
||||
echo -e "Open ${GREEN}https://localhost:8443${NC} to see everything in the dashboard."
|
||||
echo "Look for certificate: $CERT_ID"
|
||||
```
|
||||
|
||||
|
||||
@@ -0,0 +1,120 @@
|
||||
# Deployment Examples
|
||||
|
||||
Five turnkey docker-compose scenarios, each runnable in under 5 minutes. Pick the one closest to your setup.
|
||||
|
||||
## Which Example Should I Use?
|
||||
|
||||
| I need to... | Example | Issuer | Target |
|--------------|---------|--------|--------|
| Get Let's Encrypt certs for NGINX on a public server | [ACME + NGINX](#acme--nginx) | ACME (HTTP-01) | NGINX |
| Issue wildcard certs without opening port 80 | [Wildcard DNS-01](#wildcard-dns-01) | ACME (DNS-01) | Any |
| Run an internal CA for services behind a firewall | [Private CA + Traefik](#private-ca--traefik) | Local CA | Traefik |
| Use Smallstep step-ca as my PKI backend | [step-ca + HAProxy](#step-ca--haproxy) | step-ca | HAProxy |
| Manage both public and internal certs from one dashboard | [Multi-Issuer](#multi-issuer) | ACME + Local CA | Mixed |
|
||||
|
||||
**Already using another tool?** See the migration sections below each example for Certbot, acme.sh, and cert-manager users.
|
||||
|
||||
---
|
||||
|
||||
## ACME + NGINX
|
||||
|
||||
**Scenario:** You have one or more public-facing domains, NGINX as the reverse proxy, and want automated Let's Encrypt certificates with HTTP-01 challenges.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + NGINX, all on one Docker network. The agent generates keys locally (ECDSA P-256), submits CSRs to the server, receives signed certs from Let's Encrypt, and deploys them to NGINX with automatic reload.
|
||||
|
||||
**Prerequisites:** A domain pointing to your server, ports 80 and 443 open, Docker Engine 20.10+ with Docker Compose.
|
||||
|
||||
```bash
|
||||
cd examples/acme-nginx
|
||||
cp .env.example .env # Edit with your domain and email
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../examples/acme-nginx/acme-nginx.md).
|
||||
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](migrate-from-certbot.md).
|
||||
|
||||
---
|
||||
|
||||
## Wildcard DNS-01
|
||||
|
||||
**Scenario:** You need wildcard certificates (`*.example.com`) or your servers aren't reachable from the internet (no port 80). DNS-01 validates ownership by creating a TXT record at your DNS provider.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent. Includes a Cloudflare DNS hook script as a working reference — swap in your own DNS provider (Route53, Azure DNS, Google Cloud DNS, or any provider with an API).
|
||||
|
||||
**Prerequisites:** A domain, API credentials for your DNS provider, Docker Compose.
|
||||
|
||||
```bash
|
||||
cd examples/acme-wildcard-dns01
|
||||
cp .env.example .env # Edit with domain, email, DNS provider credentials
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
|
||||
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](migrate-from-acmesh.md).
|
||||
|
||||
---
|
||||
|
||||
## Private CA + Traefik
|
||||
|
||||
**Scenario:** Internal services that don't need public CA validation. You run your own certificate authority — either a self-signed root for development, or a subordinate CA chained to your enterprise root (e.g., Active Directory Certificate Services).
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + Traefik. The Local CA issuer signs certificates directly. Traefik watches a cert directory and auto-reloads when new files appear.
|
||||
|
||||
**Prerequisites:** Docker Compose. For sub-CA mode, you'll need a CA certificate and key signed by your enterprise root.
|
||||
|
||||
```bash
|
||||
cd examples/private-ca-traefik
|
||||
docker compose up -d # Self-signed mode (no .env needed for demo)
|
||||
```
|
||||
|
||||
The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../examples/private-ca-traefik/private-ca-traefik.md).
|
||||
|
||||
---
|
||||
|
||||
## step-ca + HAProxy
|
||||
|
||||
**Scenario:** You use Smallstep's step-ca as your private PKI and want automated lifecycle management for certificates deployed to HAProxy load balancers.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + step-ca (with JWK provisioner) + HAProxy. certctl issues certs via step-ca's native `/sign` API, combines them into HAProxy's expected PEM format (cert + chain + key in one file), and reloads HAProxy.
|
||||
|
||||
**Prerequisites:** Docker Compose.
|
||||
|
||||
```bash
|
||||
cd examples/step-ca-haproxy
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../examples/step-ca-haproxy/step-ca-haproxy.md).
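The combined PEM format mentioned above (cert + chain + key in one file) can be illustrated with a short sketch. It assumes you already have the three PEM files on disk; `build_haproxy_pem` is a hypothetical helper name, and this is not how certctl itself writes the file.

```bash
#!/bin/bash
# Concatenate leaf cert, chain, and private key into one HAProxy-style PEM.
# Usage: build_haproxy_pem CERT CHAIN KEY OUTPUT
build_haproxy_pem() {
  local cert="$1" chain="$2" key="$3" out="$4"
  (
    umask 077                      # the output contains the private key
    cat "$cert" "$chain" "$key" > "$out"
  )
}
```

The subshell keeps the restrictive `umask` from leaking into the calling shell, and the function returns a non-zero status if any input file is missing.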
|
||||
|
||||
---
|
||||
|
||||
## Multi-Issuer
|
||||
|
||||
**Scenario:** You manage both public-facing services (needing Let's Encrypt or another public CA) and internal services (using a private CA) and want a single dashboard for everything.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent configured with both an ACME issuer and a Local CA issuer. Demonstrates issuer assignment via profiles — public services get ACME certs, internal services get Local CA certs, all visible in one inventory.
|
||||
|
||||
**Prerequisites:** Docker Compose. For real ACME certs, a public domain and port 80 access.
|
||||
|
||||
```bash
|
||||
cd examples/multi-issuer
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../examples/multi-issuer/multi-issuer.md).
|
||||
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](certctl-for-cert-manager-users.md).
|
||||
|
||||
---
|
||||
|
||||
## Beyond These Examples
|
||||
|
||||
These 5 scenarios cover the most common deployment patterns, but certctl supports 7 issuer backends and 10 target connectors. Once you have the basics running, you can mix and match:
|
||||
|
||||
**Issuers:** ACME (Let's Encrypt, ZeroSSL, Buypass, Google Trust Services), Local CA (self-signed or sub-CA), step-ca, Vault PKI, DigiCert CertCentral, OpenSSL/Custom CA script, Sectigo (coming soon).
|
||||
|
||||
**Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell or WinRM proxy), Postfix, Dovecot, F5 BIG-IP (coming soon).
|
||||
|
||||
See [Connector Reference](connectors.md) for configuration details on every issuer and target.
|
||||
@@ -29,15 +29,18 @@ The binary has zero runtime dependencies beyond the certctl server it connects t
|
||||
|
||||
## Configuration
|
||||
|
||||
The MCP server reads two environment variables:
|
||||
The MCP server reads three environment variables:
|
||||
|
||||
| Variable | Required | Default | Description |
|
||||
|----------|----------|---------|-------------|
|
||||
| `CERTCTL_SERVER_URL` | No | `http://localhost:8443` | URL of the certctl REST API |
|
||||
| `CERTCTL_SERVER_URL` | No | `https://localhost:8443` | URL of the certctl REST API (HTTPS-only as of v2.2) |
|
||||
| `CERTCTL_API_KEY` | No | (empty) | API key for authentication (passed as `Bearer` token) |
|
||||
| `CERTCTL_SERVER_CA_BUNDLE_PATH` | Yes (for self-signed / internal CA) | (empty) | Path to PEM CA bundle that signed the server cert. Required when the server cert isn't rooted in the system trust store (the default compose stack ships a self-signed cert at `deploy/test/certs/ca.crt`). |
|
||||
|
||||
If your certctl server has auth enabled (the default), you must provide the API key. The MCP server passes it through to every HTTP request.
|
||||
|
||||
Since v2.2 the certctl control plane is HTTPS-only. If the server cert is self-signed or chained to an internal CA, set `CERTCTL_SERVER_CA_BUNDLE_PATH` so the MCP server can verify the TLS handshake. Never set `CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY=true` outside local development — it disables all certificate validation.
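Before launching the MCP server, the environment described above can be sanity-checked. This is a minimal sketch under stated assumptions: `check_mcp_env` is a hypothetical helper, not something shipped with certctl, and it only checks the URL scheme and that the CA bundle file is readable. It does not perform a TLS handshake.

```bash
#!/bin/bash
# Preflight check for the MCP server environment variables.
# Prints a message and returns non-zero on the first problem found.
check_mcp_env() {
  local url="${CERTCTL_SERVER_URL:-https://localhost:8443}"
  case "$url" in
    https://*) ;;
    *) echo "CERTCTL_SERVER_URL must use https:// (got: $url)"; return 1 ;;
  esac
  if [ -n "${CERTCTL_SERVER_CA_BUNDLE_PATH:-}" ] && [ ! -r "$CERTCTL_SERVER_CA_BUNDLE_PATH" ]; then
    echo "CA bundle not readable: $CERTCTL_SERVER_CA_BUNDLE_PATH"; return 1
  fi
  if [ "${CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY:-}" = "true" ]; then
    echo "warning: TLS verification is disabled; local development only"
  fi
  echo "ok"
}
```

Running `check_mcp_env` in the same shell that exports the variables catches the two most likely mistakes after the HTTPS-only change: a leftover `http://` URL and a wrong CA bundle path.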
|
||||
|
||||
## Setting Up with Claude Desktop
|
||||
|
||||
Add this to your Claude Desktop MCP configuration file (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
|
||||
@@ -48,7 +51,8 @@ Add this to your Claude Desktop MCP configuration file (`~/Library/Application S
|
||||
"certctl": {
|
||||
"command": "/path/to/certctl-mcp",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "http://localhost:8443",
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/certctl/deploy/test/certs/ca.crt",
|
||||
"CERTCTL_API_KEY": "your-api-key-here"
|
||||
}
|
||||
}
|
||||
@@ -67,7 +71,8 @@ In Cursor, go to Settings → MCP Servers and add:
|
||||
"certctl": {
|
||||
"command": "/path/to/certctl-mcp",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "http://localhost:8443",
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/certctl/deploy/test/certs/ca.crt",
|
||||
"CERTCTL_API_KEY": "your-api-key-here"
|
||||
}
|
||||
}
|
||||
@@ -84,7 +89,8 @@ Add certctl as an MCP server in your project's `.mcp.json`:
|
||||
"certctl": {
|
||||
"command": "/path/to/certctl-mcp",
|
||||
"env": {
|
||||
"CERTCTL_SERVER_URL": "http://localhost:8443",
|
||||
"CERTCTL_SERVER_URL": "https://localhost:8443",
|
||||
"CERTCTL_SERVER_CA_BUNDLE_PATH": "/path/to/certctl/deploy/test/certs/ca.crt",
|
||||
"CERTCTL_API_KEY": "your-api-key-here"
|
||||
}
|
||||
}
|
||||
@@ -94,7 +100,7 @@ Add certctl as an MCP server in your project's `.mcp.json`:
|
||||
|
||||
## Available Tools
|
||||
|
||||
The MCP server registers 78 tools organized across 16 resource domains:
|
||||
The MCP server exposes the full REST API organized across 16 resource domains:
|
||||
|
||||
| Domain | Tools | Examples |
|
||||
|--------|-------|---------|
|
||||
@@ -153,7 +159,7 @@ flowchart LR
|
||||
AI <-->|"stdio"| MCP
|
||||
MCP -->|"HTTP + Bearer token"| SERVER
|
||||
|
||||
MCP ~~~ TOOLS["78 tools · 16 domains\nTyped input structs"]
|
||||
MCP ~~~ TOOLS["REST API via MCP · 16 domains\nTyped input structs"]
|
||||
```
|
||||
|
||||
The MCP server is intentionally thin:
|
||||
|
||||
@@ -0,0 +1,275 @@
|
||||
# Migrate from acme.sh to certctl
|
||||
|
||||
You use acme.sh to automate Let's Encrypt renewal across multiple servers. It works — but without centralized visibility, deployment verification, or policy enforcement.
|
||||
|
||||
This guide walks through moving your acme.sh workload to certctl while keeping your existing DNS provider setup.
|
||||
|
||||
## Why Migrate
|
||||
|
||||
**acme.sh strength:** Lightweight agent, works everywhere, integrates with any DNS provider via shell script hooks.
|
||||
|
||||
**acme.sh limitations:**
|
||||
- No inventory visibility: certificates are scattered across servers, with no unified view of expiry dates or renewal status
- No deployment verification: the cron job succeeds even if the new cert never takes effect on the service
- No policy enforcement: no way to require approval, audit who renewed what, or prevent misconfigurations
- No multi-server orchestration: each server manages its own renewals, with no way to batch-test or roll back
|
||||
|
||||
certctl adds a control plane that sees all your certificates, deploys with verification, enforces policy, and provides a complete audit trail. You keep the DNS-01 challenge scripts you already have.
|
||||
|
||||
## What You Keep
|
||||
|
||||
- **Existing certificates** — discovered automatically during migration, claimed in the dashboard
|
||||
- **DNS provider scripts** — acme.sh's `dns_*` hooks are shell-script compatible with certctl's DNS-01 implementation
|
||||
- **Same Let's Encrypt account** — ACME issuer in certctl uses the same account and email
|
||||
|
||||
## Migration Steps
|
||||
|
||||
### 1. Deploy certctl Server
|
||||
|
||||
Start with Docker Compose (5 minutes):
|
||||
|
||||
```bash
|
||||
git clone https://github.com/shankar0123/certctl.git
|
||||
cd certctl/deploy
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
Access the dashboard at `https://localhost:8443` with the API key from `.env`. The default compose stack ships a self-signed cert; pin with `--cacert ./deploy/test/certs/ca.crt` when calling the API from the host.
|
||||
|
||||
### 2. Deploy Agents
|
||||
|
||||
On each server running acme.sh certs, install the certctl agent:
|
||||
|
||||
```bash
|
||||
curl -sSL https://raw.githubusercontent.com/shankar0123/certctl/master/install-agent.sh | bash
|
||||
# Prompted for server URL and API key
|
||||
```
|
||||
|
||||
Or manually:
|
||||
|
||||
```bash
|
||||
# Download and install agent binary
|
||||
wget https://github.com/shankar0123/certctl/releases/download/v2.1.0/certctl-agent-linux-amd64
|
||||
chmod +x certctl-agent-linux-amd64
|
||||
sudo mv certctl-agent-linux-amd64 /usr/local/bin/certctl-agent
|
||||
|
||||
# Create systemd unit
|
||||
sudo tee /etc/systemd/system/certctl-agent.service > /dev/null <<EOF
|
||||
[Unit]
|
||||
Description=certctl Agent
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
ExecStart=/usr/local/bin/certctl-agent
|
||||
Environment="CERTCTL_SERVER_URL=https://certctl.internal:8443"
|
||||
Environment="CERTCTL_API_KEY=your-api-key-here"
|
||||
# systemd does not expand "~"; use an absolute path
Environment="CERTCTL_DISCOVERY_DIRS=/home/user/.acme.sh"
|
||||
Restart=always
|
||||
RestartSec=10s
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable --now certctl-agent
|
||||
```
|
||||
|
||||
### 3. Discover Existing acme.sh Certificates
|
||||
|
||||
acme.sh stores certificates in `~/.acme.sh/<domain>/` (or `/etc/acme.sh/` if installed system-wide).
|
||||
|
||||
When you start the agent with `CERTCTL_DISCOVERY_DIRS` pointing to those directories, it scans for existing PEM/DER certificates and reports fingerprints to the control plane. The dashboard's **Discovery** page shows what was found.
|
||||
|
||||
Example agent systemd service (using home directory):
|
||||
|
||||
```bash
|
||||
Environment="CERTCTL_DISCOVERY_DIRS=/home/user/.acme.sh"
|
||||
```
|
||||
|
||||
Or for system-wide acme.sh:
|
||||
|
||||
```bash
|
||||
Environment="CERTCTL_DISCOVERY_DIRS=/etc/acme.sh"
|
||||
```
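To preview what a discovery scan is likely to report, you can list the PEM certificate files under a directory yourself. The sketch below is an approximation, not the agent's implementation: `list_pem_certs` is a hypothetical helper, and it only finds text PEM files (binary DER files are not detected by a text search).

```bash
#!/bin/bash
# List files under a directory that contain at least one PEM certificate block.
# DER files are binary and are not detected by this text search.
list_pem_certs() {
  grep -rl -e '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null | sort
}
```

For example, `list_pem_certs /home/user/.acme.sh` prints one path per matching file, which you can compare against the entries that later appear on the **Discovery** page.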
|
||||
|
||||
### 4. Claim Discovered Certificates
|
||||
|
||||
In the **Discovery** page:
|
||||
1. Review the "Unmanaged" certificates found by the agent
|
||||
2. Click **Claim** on each acme.sh certificate
|
||||
3. Enter the managed certificate ID to link it (e.g., `mc-api-prod`)
|
||||
|
||||
Once claimed, the certificate appears in the main **Certificates** page with ownership, renewal history, and deployment status.
|
||||
|
||||
### 5. Create an ACME Issuer
|
||||
|
||||
In **Issuers** → **+ New Issuer:**
|
||||
|
||||
1. Select **ACME** from the issuer type grid
|
||||
2. Fill in the type-specific fields: name, directory URL (`https://acme-v02.api.letsencrypt.org/directory`), and config
|
||||
|
||||
Or configure via environment variables:
|
||||
```bash
|
||||
export CERTCTL_ACME_DIRECTORY_URL=https://acme-v02.api.letsencrypt.org/directory
|
||||
export CERTCTL_ACME_EMAIL=your-email@example.com # same as your acme.sh account
|
||||
export CERTCTL_ACME_CHALLENGE_TYPE=dns-01
|
||||
```
|
||||
|
||||
### 6. Adapt Your DNS Provider Scripts
|
||||
|
||||
acme.sh uses `dns_*` hooks (e.g., `dns_cloudflare`) that receive their inputs as positional arguments. certctl's DNS-01 hooks are also plain shell scripts, but they receive the same inputs as environment variables, so most existing scripts need only minor changes.
|
||||
|
||||
**acme.sh pattern:**
|
||||
```bash
|
||||
# acme.sh invokes: dns_cloudflare_add "full_domain" "txt_value"
dns_cloudflare_add() {
  local full_domain=$1   # full record name, e.g. _acme-challenge.example.com
  local txt_value=$2     # TXT record value
  # ... DNS API call to create TXT record ...
}
|
||||
```
|
||||
|
||||
**certctl pattern:**
|
||||
```bash
|
||||
# certctl invokes: /path/to/dns-present-script
|
||||
# Scripts receive environment variables:
|
||||
#!/bin/bash
|
||||
# CERTCTL_DNS_DOMAIN — domain name (e.g., "example.com")
|
||||
# CERTCTL_DNS_FQDN — full record name (e.g., "_acme-challenge.example.com")
|
||||
# CERTCTL_DNS_VALUE — TXT record value (key authorization digest)
|
||||
# CERTCTL_DNS_TOKEN — ACME challenge token
|
||||
# Create TXT record at "${CERTCTL_DNS_FQDN}" with value "${CERTCTL_DNS_VALUE}"
|
||||
```
|
||||
|
||||
**Example: Cloudflare DNS-01 adapter**
|
||||
|
||||
If you have an acme.sh Cloudflare hook, adapt it:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# /etc/certctl/dns/cloudflare-present.sh
|
||||
set -e
|
||||
|
||||
# certctl passes these environment variables:
|
||||
# CERTCTL_DNS_DOMAIN — domain name
|
||||
# CERTCTL_DNS_FQDN — full record name (e.g., "_acme-challenge.example.com")
|
||||
# CERTCTL_DNS_VALUE — TXT record value
|
||||
# CERTCTL_DNS_TOKEN — ACME challenge token
|
||||
|
||||
# Call your existing Cloudflare API (example using curl).
# --fail makes curl exit non-zero on HTTP errors so that "set -e" stops the script.
curl --fail -sS -X POST "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records" \
|
||||
-H "X-Auth-Email: ${CF_EMAIL}" \
|
||||
-H "X-Auth-Key: ${CF_KEY}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{\"type\":\"TXT\",\"name\":\"${CERTCTL_DNS_FQDN}\",\"content\":\"${CERTCTL_DNS_VALUE}\"}"
|
||||
|
||||
echo "Created ${CERTCTL_DNS_FQDN}"
|
||||
```
|
||||
|
||||
DNS cleanup:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# /etc/certctl/dns/cloudflare-cleanup.sh
|
||||
|
||||
# certctl passes these environment variables:
|
||||
# CERTCTL_DNS_DOMAIN — domain name
|
||||
# CERTCTL_DNS_FQDN — full record name (e.g., "_acme-challenge.example.com")
|
||||
# CERTCTL_DNS_VALUE — TXT record value
|
||||
# CERTCTL_DNS_TOKEN — ACME challenge token
|
||||
|
||||
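# Delete the TXT record. RECORD_ID is not one of the variables certctl passes;
# look it up first (for example, by listing the zone's TXT records named ${CERTCTL_DNS_FQDN}).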
|
||||
curl -X DELETE "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records/${RECORD_ID}" \
|
||||
-H "X-Auth-Email: ${CF_EMAIL}" \
|
||||
-H "X-Auth-Key: ${CF_KEY}"
|
||||
```
|
||||
|
||||
Configure the ACME issuer via environment variables:
|
||||
|
||||
```bash
|
||||
export CERTCTL_ACME_DIRECTORY_URL=https://acme-v02.api.letsencrypt.org/directory
|
||||
export CERTCTL_ACME_EMAIL=your-email@example.com
|
||||
export CERTCTL_ACME_CHALLENGE_TYPE=dns-01
|
||||
export CERTCTL_ACME_DNS_PRESENT_SCRIPT=/etc/certctl/dns/cloudflare-present.sh
|
||||
export CERTCTL_ACME_DNS_CLEANUP_SCRIPT=/etc/certctl/dns/cloudflare-cleanup.sh
|
||||
```
|
||||
|
||||
Or create the issuer through the dashboard: **Issuers** → **+ New Issuer** → select **ACME** → fill in the config fields.
|
||||
|
||||
### 7. Create Renewal Policies
|
||||
|
||||
In **Policies** → **+ New Policy:**
|
||||
|
||||
- **Name:** e.g., "ACME DNS-01 Policy"
|
||||
- **Type:** `expiration_window` (enforces renewal thresholds)
|
||||
- **Severity:** `high`
|
||||
- **Config:** set your renewal window (default: 30 days before expiry)
|
||||
|
||||
Renewal scheduling is driven by the certificate's assigned profile and issuer. Policies add enforcement guardrails on top.
|
||||
|
||||
### 8. Phase Out acme.sh Cron
|
||||
|
||||
Once you verify renewals work via certctl (manually trigger one in the dashboard first), remove the acme.sh cron job:
|
||||
|
||||
```bash
|
||||
# Remove acme.sh from crontab
|
||||
crontab -e
|
||||
# Delete the line: "0 0 * * * /home/user/.acme.sh/acme.sh --cron --home /home/user/.acme.sh"
|
||||
|
||||
# OR disable the cron service if installed
|
||||
sudo systemctl disable acme-renew.timer
|
||||
```
|
||||
|
||||
## DNS Script Compatibility
|
||||
|
||||
Most acme.sh DNS provider hooks need only minor changes:
|
||||
|
||||
| acme.sh | certctl |
|---------|---------|
| Called on every renewal | Called once per challenge window |
| Receives the full record name and TXT value as positional arguments | Receives `CERTCTL_DNS_DOMAIN`, `CERTCTL_DNS_FQDN`, `CERTCTL_DNS_VALUE`, `CERTCTL_DNS_TOKEN` as environment variables |
| Must support multiple concurrent records | Same: cleanup removes the specific token |
| Environment variables for credentials | Same: pass via agent systemd `Environment=` or `.env` file |
|
||||
|
||||
**Real example:** If you use Route53, acme.sh's `dns_aws` hook submits via AWS CLI. Adapt it to use `${CERTCTL_DNS_FQDN}` and `${CERTCTL_DNS_VALUE}` environment variables instead of positional arguments, and it works with certctl's DNS-01.
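The adaptation described above can be as small as a wrapper that reads certctl's environment variables and forwards them to your existing function. This is a sketch under stated assumptions: `legacy_dns_add` stands in for whatever positional-argument hook you already have, and `certctl_present` is a hypothetical name for the script's entry point.

```bash
#!/bin/bash
# Stand-in for an existing acme.sh-style hook that takes positional arguments.
# Replace the body with your real provider call.
legacy_dns_add() {
  echo "create TXT $1 = $2"
}

# certctl-style entry point: read inputs from the environment and
# forward them to the legacy function. Aborts if a variable is missing.
certctl_present() {
  : "${CERTCTL_DNS_FQDN:?CERTCTL_DNS_FQDN is not set}"
  : "${CERTCTL_DNS_VALUE:?CERTCTL_DNS_VALUE is not set}"
  legacy_dns_add "$CERTCTL_DNS_FQDN" "$CERTCTL_DNS_VALUE"
}
```

Keeping the provider logic in the original function means the same code can serve both tools during the coexistence period.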
|
||||
|
||||
## Coexistence Period
|
||||
|
||||
During migration, run both acme.sh and certctl in parallel:
|
||||
|
||||
1. Keep acme.sh cron running (low overhead, serves as fallback)
|
||||
2. Configure certctl policies and test renewal on 1-2 non-critical domains
|
||||
3. Monitor certctl's audit trail and deployment logs
|
||||
4. Once confident, disable acme.sh cron on those domains
|
||||
5. Roll out to remaining domains
|
||||
|
||||
This way, if certctl renewal fails, acme.sh's cron still renews the cert (you'll see duplicate renewals in the audit trail, but no gap).
|
||||
|
||||
## Next: DNS-PERSIST-01 (Zero-Touch Renewals)
|
||||
|
||||
After migrating to certctl + DNS-01, consider upgrading to **DNS-PERSIST-01**. Instead of creating/deleting DNS records on every renewal, you create one persistent TXT record at `_validation-persist.<domain>` that never changes. Let's Encrypt then validates against that standing record on each renewal, for as long as the record stays in place.
|
||||
|
||||
Benefits:
|
||||
- **Zero operational overhead per renewal** — no DNS API calls during renewal
|
||||
- **Auditable** — DNS record created once, visible to the team, never modified
|
||||
- **Vendor-agnostic** — works with any DNS provider that supports TXT records
|
||||
|
||||
To enable:
|
||||
|
||||
```bash
|
||||
export CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01
|
||||
export CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN=letsencrypt.org
|
||||
export CERTCTL_ACME_DNS_PRESENT_SCRIPT=/etc/certctl/dns/cloudflare-present.sh
|
||||
```
|
||||
|
||||
certctl automatically falls back to DNS-01 if the CA doesn't support dns-persist-01 yet.
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Try the [Wildcard DNS-01 example](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md) — a working docker-compose with Cloudflare hooks you can adapt for your DNS provider
|
||||
- See [Connector Reference](connectors.md) for advanced ACME options (EAB, ARI, custom timeouts)
|
||||
- See [Discovery Guide](concepts.md#certificate-discovery) for managing discovered certificates at scale
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (ACME+NGINX, private CA, step-ca, multi-issuer)
|
||||
@@ -0,0 +1,173 @@
|
||||
# Migrating from Certbot to certctl
|
||||
|
||||
You have 50 Let's Encrypt certificates across 10 servers, managed by a mix of Certbot cron jobs and manual renewals. Certbot handles issuance, but you lack inventory visibility, centralized alerting, and audit trails. This guide walks you through moving to certctl while keeping your existing certificates and ACME account.
|
||||
|
||||
## Why Migrate
|
||||
|
||||
Certbot renews certs in isolation. If a renewal fails on one server, you don't know until the cert expires. certctl gives you a single pane of glass: see all certs across all servers, get alerts 30/14/7 days before expiry, track who renewed what when, and verify each deployment succeeded via TLS fingerprint validation.
|
||||
|
||||
## What You Keep
|
||||
|
||||
- Your existing Certbot ACME account key and Let's Encrypt account
|
||||
- All issued certificates in `/etc/letsencrypt/live/`
|
||||
- Certbot's renewal history and hooks
|
||||
|
||||
You will not re-issue any certificates. certctl discovers them and takes over renewal scheduling.
|
||||
|
||||
## Step-by-Step Migration
|
||||
|
||||
### 1. Deploy certctl Control Plane
|
||||
|
||||
Option A: Docker Compose (quickest for evaluation)
|
||||
```bash
|
||||
cd /opt/certctl
|
||||
docker compose up -d
|
||||
# Dashboard & API: https://localhost:8443 (self-signed cert — use --cacert ./deploy/test/certs/ca.crt for the default compose stack)
|
||||
# Default API key is in the logs: docker logs certctl-server 2>&1 | grep CERTCTL_API_KEY
|
||||
```
|
||||
|
||||
Option B: Kubernetes (Helm)
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--set auth.apiKey=YOUR_SECURE_KEY
|
||||
```
|
||||
|
||||
### 2. Deploy Agents to Each Server
|
||||
|
||||
On each of your 10 servers running Certbot:
|
||||
|
||||
```bash
|
||||
# Linux amd64 (adjust for your architecture)
|
||||
sudo curl -sSL https://github.com/shankar0123/certctl/releases/download/v2.1.0/certctl-agent-linux-amd64 \
  -o /usr/local/bin/certctl-agent
sudo chmod +x /usr/local/bin/certctl-agent
|
||||
|
||||
# Create config
|
||||
sudo mkdir -p /etc/certctl /var/lib/certctl/keys
|
||||
sudo tee /etc/certctl/agent.env > /dev/null <<EOF
|
||||
CERTCTL_SERVER_URL=https://certctl-control-plane.example.com:8443
|
||||
CERTCTL_SERVER_CA_BUNDLE_PATH=/etc/certctl/tls/ca.crt
|
||||
CERTCTL_API_KEY=your-api-key-here
|
||||
CERTCTL_DISCOVERY_DIRS=/etc/letsencrypt/live
|
||||
CERTCTL_KEY_DIR=/var/lib/certctl/keys
|
||||
EOF
|
||||
sudo chmod 600 /etc/certctl/agent.env
|
||||
|
||||
# Start agent
|
||||
sudo systemctl start certctl-agent # if installed via script
|
||||
# OR manually:
|
||||
sudo certctl-agent --server https://... --api-key ... --discovery-dirs /etc/letsencrypt/live
|
||||
```
|
||||
|
||||
The agent will scan `/etc/letsencrypt/live/` and report all discovered certificates to the control plane.
|
||||
|
||||
### 3. Triage Discovered Certificates
|
||||
|
||||
In the certctl dashboard, go to **Discovery**:
|
||||
- See all discovered certs grouped by agent
|
||||
- Status shows "Unmanaged" for certificates not yet claimed
|
||||
- For each Certbot cert, click **Claim** and link it to managed inventory
|
||||
|
||||
The control plane now knows about all 50 certs and where they live.
|
||||
|
||||
### 4. Configure ACME Issuer
|
||||
|
||||
Go to **Issuers** → **+ New Issuer**:
|
||||
1. Select **ACME** from the issuer type grid
|
||||
2. Fill in the type-specific fields: name, directory URL (`https://acme-v02.api.letsencrypt.org/directory`), and any required config
|
||||
|
||||
Alternatively, configure via environment variables before starting the server:
|
||||
```bash
|
||||
export CERTCTL_ACME_DIRECTORY_URL=https://acme-v02.api.letsencrypt.org/directory
|
||||
export CERTCTL_ACME_EMAIL=your-email@example.com
|
||||
export CERTCTL_ACME_CHALLENGE_TYPE=http-01 # or dns-01 for wildcard certs
|
||||
```
|
||||
|
||||
For DNS-01, also set:
|
||||
```bash
|
||||
export CERTCTL_ACME_DNS_PRESENT_SCRIPT=/etc/certctl/dns/present.sh
|
||||
export CERTCTL_ACME_DNS_CLEANUP_SCRIPT=/etc/certctl/dns/cleanup.sh
|
||||
```
|
||||
|
||||
certctl uses the same Let's Encrypt account; no new credentials needed.
|
||||
|
||||
### 5. Create Renewal Policies
|
||||
|
||||
Go to **Policies** → **+ New Policy** to create enforcement rules:
|
||||
- Name: e.g., "ACME Renewal Policy"
|
||||
- Type: `expiration_window` (to enforce renewal thresholds)
|
||||
- Severity: `high`
|
||||
- Config: set your renewal threshold (default: 30 days before expiry)
|
||||
|
||||
Renewal scheduling is driven by the certificate's assigned profile and issuer. Policies add enforcement guardrails (key algorithm requirements, expiration windows, etc.).
|
||||
|
||||
### 6. Disable Certbot Cron, One Server at a Time
|
||||
|
||||
On the first server (start with a low-traffic one):
|
||||
|
||||
```bash
|
||||
# Stop Certbot renewal
|
||||
sudo systemctl disable certbot.timer
|
||||
sudo systemctl stop certbot.timer
|
||||
|
||||
# Or remove the cron job
|
||||
sudo rm /etc/cron.d/certbot # if managed by cron
|
||||
```
|
||||
|
||||
Monitor that server in the certctl dashboard. certctl will renew the cert ~30 days before expiry.
|
||||
|
||||
### 7. Verify First Renewal Succeeds
|
||||
|
||||
Wait for the renewal to trigger (or manually trigger it in **Certificates** → select cert → **Renew**). Check the dashboard:
|
||||
- **Certificates** page: status transitions from `Active` to `Renewing` to `Active`
|
||||
- **Jobs** page: renewal job shows `Completed` status
|
||||
- **Verification** tab: TLS check confirms the new cert is deployed and live
|
||||
|
||||
After verifying, disable Certbot on the remaining 9 servers.
|
||||
|
||||
### 8. Enable Alerting
|
||||
|
||||
Configure notifiers via environment variables before starting the server:
|
||||
```bash
|
||||
# Example: Slack alerting
|
||||
export CERTCTL_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
|
||||
docker compose up -d
|
||||
|
||||
# Or email alerting
|
||||
export CERTCTL_SMTP_HOST=smtp.gmail.com
|
||||
export CERTCTL_SMTP_PORT=587
|
||||
export CERTCTL_SMTP_USERNAME=your-email@gmail.com
|
||||
export CERTCTL_SMTP_PASSWORD=your-app-password
|
||||
export CERTCTL_SMTP_FROM_ADDRESS=certctl@example.com
|
||||
docker compose up -d
|
||||
|
||||
# Other options: CERTCTL_TEAMS_WEBHOOK_URL, CERTCTL_PAGERDUTY_ROUTING_KEY, CERTCTL_OPSGENIE_API_KEY
|
||||
```
|
||||
|
||||
Now you get 30/14/7-day warnings before any cert expires, across all 10 servers, in one place.
|
||||
|
||||
## What Changes
|
||||
|
||||
- **Renewal**: Agent polls certctl for work instead of Certbot cron triggering locally. Faster failure detection (agent heartbeat every 60 seconds vs. cron running once a day).
|
||||
- **Deployment**: certctl verifies post-deployment by probing the live TLS endpoint and comparing SHA-256 fingerprints. This catches reload failures that would otherwise go unnoticed.
|
||||
- **Audit Trail**: Every renewal, deployment, and alert is logged immutably. Answer "who renewed cert X when and why" within seconds.
|
||||
- **Alerting**: Threshold-based alerts go to Slack/email/webhook 30, 14, and 7 days before expiry, instead of when the cert expires.
|
||||
|
||||
## Coexistence and Rollback
|
||||
|
||||
During migration, certctl and Certbot can run simultaneously. The agent will discover Certbot certs even while Certbot continues renewing them. Run both for a week to build confidence.
|
||||
|
||||
**If you need to roll back**: Re-enable Certbot cron on any server:
|
||||
```bash
|
||||
sudo systemctl enable certbot.timer
|
||||
sudo systemctl start certbot.timer
|
||||
```
|
||||
|
||||
certctl will stop renewing that cert when the policy is disabled. Certbot resumes as before. Your certificates and ACME account remain untouched.
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Try the [ACME + NGINX example](../examples/acme-nginx/acme-nginx.md) — a working docker-compose you can run locally before deploying to production
|
||||
- Review the [Concepts Guide](./concepts.md) for terminology (profiles, policies, agents, jobs)
|
||||
- Explore [Network Discovery](./quickstart.md#network-discovery-agentless) to find certificates you didn't know about
|
||||
- See all [Deployment Examples](./examples.md) for other scenarios (wildcard DNS-01, private CA, step-ca, multi-issuer)
|
||||
@@ -68,8 +68,10 @@ The spec organizes endpoints into 16 tags:
|
||||
The spec declares a `bearerAuth` security scheme applied globally. All endpoints under `/api/v1/` require a Bearer token by default:
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: Bearer your-api-key" \
|
||||
http://localhost:8443/api/v1/certificates
|
||||
# The default compose stack uses a self-signed cert; pin with --cacert
|
||||
curl --cacert ./deploy/test/certs/ca.crt \
|
||||
-H "Authorization: Bearer your-api-key" \
|
||||
https://localhost:8443/api/v1/certificates
|
||||
```
|
||||
|
||||
Three endpoints are exempt from auth (declared with `security: []` in the spec): `/health`, `/ready`, and `/api/v1/auth/info`. The auth info endpoint tells clients whether authentication is enabled and what type is required — useful for GUIs that need to show/hide a login screen.
|
||||
@@ -150,8 +152,9 @@ Import the spec directly into Postman:
|
||||
|
||||
1. Open Postman → Import → File → select `api/openapi.yaml`
|
||||
2. Postman creates a collection with all 78 documented operations organized by tag
|
||||
3. Set the `baseUrl` variable to `http://localhost:8443`
|
||||
3. Set the `baseUrl` variable to `https://localhost:8443` (HTTPS-only as of v2.2)
|
||||
4. Add an `Authorization: Bearer your-api-key` header to the collection
|
||||
5. Import the demo stack CA bundle (`deploy/test/certs/ca.crt`) into Postman's Settings → Certificates → CA Certificates, or disable certificate verification for the `localhost` host (Settings → General → SSL certificate verification)
|
||||
|
||||
## Key Schemas
|
||||
|
||||
@@ -176,8 +179,10 @@ Use the spec to generate contract tests that verify the API matches the spec:
|
||||
```bash
|
||||
# Using schemathesis for fuzz testing against the spec
|
||||
pip install schemathesis
|
||||
# The default compose stack uses a self-signed cert — export a CA bundle or set REQUESTS_CA_BUNDLE
|
||||
export REQUESTS_CA_BUNDLE=$(pwd)/deploy/test/certs/ca.crt
|
||||
schemathesis run api/openapi.yaml \
|
||||
--base-url http://localhost:8443 \
|
||||
--base-url https://localhost:8443 \
|
||||
--header "Authorization: Bearer your-api-key"
|
||||
```
|
||||
|
||||
|
||||
@@ -0,0 +1,297 @@
|
||||
# QA Test Suite Guide (`qa_test.go`)
|
||||
|
||||
> **Audience:** Anyone running release QA for certctl — whether you're a first-time contributor or the maintainer cutting a release tag.
|
||||
>
|
||||
> **Companion to:** `docs/testing-guide.md` (the *what* to test). This document explains the *how* — the automated test file, what it covers, what it skips, and how to fill the gaps manually.
|
||||
|
||||
---
|
||||
|
||||
## What Is This File?
|
||||
|
||||
`deploy/test/qa_test.go` is a single Go test file (~1700 lines) that automates as much of `docs/testing-guide.md` as possible against a running certctl Docker Compose demo stack. It replaces the legacy `qa-smoke-test.sh` bash script.
|
||||
|
||||
It covers **all 54 Parts** of the testing guide:
|
||||
|
||||
- **~164 automated subtests** — API calls, database queries, source file checks, performance benchmarks
|
||||
- **11 skipped Parts** — with documented reasons (external CAs, Windows, browser-only, etc.)
|
||||
- **Remaining ~282 manual tests** — GUI flows, scheduler timing, Docker log inspection — must be done by a human following `docs/testing-guide.md`
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌────────────────────────┐ ┌──────────────────────────┐
|
||||
│ qa_test.go │────▶│ certctl demo stack │
|
||||
│ (//go:build qa) │ │ docker-compose.yml + │
|
||||
│ │ │ docker-compose.demo.yml │
|
||||
│ TestQA(t *testing.T) │ │ │
|
||||
│ ├─ Part01_Infra │ │ ┌─ certctl-server :8443 │
|
||||
│ ├─ Part02_Auth │ │ ├─ postgres :5432 │
|
||||
│ ├─ Part03_CertCRUD │ │ └─ certctl-agent │
|
||||
│ ├─ ... │ └──────────────────────────┘
|
||||
│ └─ Part52_HelmChart │
|
||||
└────────────────────────┘
|
||||
```
|
||||
|
||||
Key design choices:
|
||||
|
||||
- **Build tag:** `//go:build qa` — never runs during `go test ./...` or CI. Only runs when explicitly requested.
|
||||
- **Package:** `integration_test` — same package as `integration_test.go` (which uses `//go:build integration` for the test stack). They coexist but never run together.
|
||||
- **Zero internal imports:** Uses only stdlib + `lib/pq` (from `go.mod`). All API interactions are plain HTTP. All JSON is decoded into lightweight local structs (`qaCert`, `qaJob`, etc.) — not the internal domain types.
|
||||
- **Self-cleaning:** Tests that create data use `t.Cleanup()` to delete it afterward. The seed data is not modified.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. **Docker Compose demo stack running:**
|
||||
```bash
|
||||
cd deploy
|
||||
docker compose -f docker-compose.yml -f docker-compose.demo.yml up --build -d
|
||||
```
|
||||
Wait ~15 seconds for health checks to pass.
|
||||
|
||||
2. **Go 1.22+** installed (the project uses Go 1.25 in `go.mod`, but 1.22+ works for running tests).
|
||||
|
||||
3. **PostgreSQL port exposed** — the demo stack exposes port 5432 for database verification tests (table counts, schema checks).
|
||||
|
||||
4. **Repository checkout** — source file verification tests (`fileExists`, `fileContains`) read files relative to `qaRepoDir` (default: `../..` from `deploy/test/`).
|
||||
|
||||
## Running the Tests
|
||||
|
||||
### Full suite
|
||||
```bash
|
||||
cd deploy/test
|
||||
go test -tags qa -v -timeout 10m ./...
|
||||
```
|
||||
|
||||
### Single Part
|
||||
```bash
|
||||
go test -tags qa -v -run TestQA/Part03 ./...
|
||||
```
|
||||
|
||||
### Single subtest
|
||||
```bash
|
||||
go test -tags qa -v -run TestQA/Part03_CertCRUD/Create_Minimal ./...
|
||||
```
|
||||
|
||||
### With custom environment
|
||||
```bash
|
||||
CERTCTL_QA_SERVER_URL=https://staging.internal:8443 \
|
||||
CERTCTL_QA_API_KEY=my-staging-key \
|
||||
CERTCTL_QA_DB_URL=postgres://certctl:secret@db.internal:5432/certctl?sslmode=require \
|
||||
CERTCTL_QA_REPO_DIR=/path/to/certctl \
|
||||
go test -tags qa -v -timeout 10m ./...
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `CERTCTL_QA_SERVER_URL` | `https://localhost:8443` | certctl server URL (HTTPS-only as of v2.2) |
|
||||
| `CERTCTL_QA_API_KEY` | `change-me-in-production` | API key for Bearer auth |
|
||||
| `CERTCTL_QA_DB_URL` | `postgres://certctl:certctl@localhost:5432/certctl?sslmode=disable` | PostgreSQL connection string |
|
||||
| `CERTCTL_QA_REPO_DIR` | `../..` | Path to certctl repo root (for source file checks) |
|
||||
| `CERTCTL_QA_CA_BUNDLE` | `./certs/ca.crt` | PEM CA bundle pinned for TLS verification. The demo stack's `certctl-tls-init` container writes here. |
|
||||
| `CERTCTL_QA_INSECURE` | `false` | Set to `"true"` to skip TLS verification (e.g. before the init container finishes). Never use outside the demo harness. |
|
||||
|
||||
## Part-by-Part Coverage Map
|
||||
|
||||
This table shows what each Part tests and what's left for manual verification.
|
||||
|
||||
| Part | Testing Guide Section | Automated Subtests | What's Automated | What's Manual |
|
||||
|------|----------------------|-------------------|-----------------|--------------|
|
||||
| 1 | Infrastructure & Deployment | 8 | Table count, health/ready endpoints, seed data counts (certs, agents, issuers, targets, policies) | Docker container health, log inspection, volume mounts |
|
||||
| 2 | Authentication & Security | 4 | No-auth 401, bad-key 401, health-no-auth 200, no private keys in API | CORS preflight, rate limiting (429 + Retry-After), TLS config |
|
||||
| 3 | Certificate Lifecycle | 10 | Create (minimal + full), get, 404, list pagination, status/issuer filters, sparse fields, update, archive | Deployment trigger, version history, certificate detail UI |
|
||||
| 4 | Renewal Workflow | 3 | Trigger renewal, 404 on nonexistent, agent work endpoint | AwaitingCSR flow, agent key generation, full issuance cycle |
|
||||
| 5 | Revocation | 5 | Revoke (default reason), already-revoked, nonexistent, invalid reason, CRL JSON | DER CRL, OCSP responder, revocation notifications |
|
||||
| 6 | Policies & Profiles | 6 | Policy CRUD (create/delete), invalid type 400, profile CRUD, list | Policy violation detection, profile enforcement on CSR |
|
||||
| 7 | Ownership & Teams | 4 | Team CRUD, owner CRUD, agent groups list | Owner notification routing, dynamic group matching |
|
||||
| 8 | Job System | 2 | List jobs, 404 on nonexistent | Job state transitions, approval workflow, cancellation |
|
||||
| 9 | Issuer Connectors | 4 | List, get detail, create (GenericCA), missing name 400 | Test connection, issuer-specific issuance flow |
|
||||
| 10 | Sub-CA Mode | SKIP | — | Requires CA cert+key on disk |
|
||||
| 11 | ACME ARI | SKIP | — | Requires ARI-capable CA |
|
||||
| 12 | Vault PKI | SKIP | — | Requires live Vault server |
|
||||
| 13 | DigiCert | SKIP | — | Requires DigiCert sandbox |
|
||||
| 14 | Target Connectors | 3 | List, create NGINX target, delete 204 | Deploy to real target, validate deployment |
|
||||
| 15–17 | Apache/HAProxy, Traefik/Caddy, IIS | — | (Covered by source checks in Parts 42–46) | Requires real services or Windows |
|
||||
| 18 | Agent Operations | 3 | Heartbeat (register), metadata check, auto-create on heartbeat | Agent binary behavior, key storage, discovery scan |
|
||||
| 19 | Agent Work Routing | 1 | Empty work for agent with no targets | Scoped job assignment, multi-target fan-out |
|
||||
| 20 | Post-Deployment Verification | 1 | 404 on nonexistent job verification | TLS probing, fingerprint comparison |
|
||||
| 21 | EST Server | 2 | CACerts (200 + content-type), CSRAttrs (200/204) | simpleenroll with CSR, simplereenroll, PKCS#7 parsing |
|
||||
| 22 | Certificate Export | 3 | PEM export, PKCS#12 export, 404 on nonexistent | Download mode, file content validation |
|
||||
| 25 | Certificate Discovery | 5 | List discovered, summary, list scan targets, create target, invalid CIDR 400 | Agent filesystem scan, claim/dismiss workflow |
|
||||
| 26 | Enhanced Query API | 4 | Sort descending, cursor pagination, time-range filter, invalid sort field | Field projection correctness, cursor token cycling |
|
||||
| 27 | Request Body Size Limits | 1 | 2MB body rejected (413/400) | Exact limit boundary (1MB) |
|
||||
| 28 | CLI | SKIP | — | Requires compiled `certctl-cli` binary |
|
||||
| 29 | MCP Server | SKIP | — | Requires compiled `mcp-server` binary + stdio |
|
||||
| 30 | Observability | 7 | Dashboard summary, certs by status, expiration timeline, job trends, issuance rate, JSON metrics (uptime + gauges), Prometheus (content-type + 4 metric names) | Chart rendering (GUI), Grafana import |
|
||||
| 31 | Notifications | 2 | List, 404 on nonexistent | Notification content, mark-read, email/Slack delivery |
|
||||
| 32 | Audit Trail | 3 | List events (≥10), PUT immutability, DELETE immutability | Actor attribution, body hash, time range filters |
|
||||
| 33 | Background Scheduler | SKIP | — | Timing-dependent; verify via Docker logs |
|
||||
| 34 | Structured Logging | SKIP | — | Requires Docker log inspection |
|
||||
| 35 | GUI Testing | SKIP | — | Requires browser |
|
||||
| 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
|
||||
| 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
|
||||
| 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
|
||||
| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
|
||||
| 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
|
||||
| 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
|
||||
| 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
|
||||
| 44 | SSH Target | 4 | Domain type, connector file, agent dispatch (`sshconn`), OpenAPI | SSH deployment test (requires target host) |
|
||||
| 45 | Windows Certificate Store | 3 | Domain type, connector file, shared certutil package | Windows deployment (requires Windows) |
|
||||
| 46 | Java Keystore | 3 | Domain type, connector file, OpenAPI | JKS deployment (requires keytool) |
|
||||
| 47 | Certificate Digest Email | 3 | Preview endpoint (200/503), service file, adapter file | SMTP delivery, HTML template rendering |
|
||||
| 48 | Dynamic Issuer Config | 4 | Crypto package exists, create ACME issuer via API, config redaction check, migration exists | Test connection flow, registry rebuild |
|
||||
| 49 | Dynamic Target Config | 2 | Create NGINX target via API, migration exists | Test connection via agent heartbeat |
|
||||
| 50 | Onboarding Wizard | 2 | Wizard component exists, docker-compose split (clean vs demo) | Wizard UI flow, step completion |
|
||||
| 51 | ACME Profile Selection | 3 | Profile module exists, frontend config, RFC 9702→9773 renumber check | Profile-aware issuance against real CA |
|
||||
| 52 | Helm Chart | 5 | Chart.yaml, values.yaml, 4 templates exist, securityContext, health probes | `helm template` rendering, `helm install` |
|
||||
| 53 | Kubernetes Secrets Target Connector (M47) | 18 | Config validation (namespace DNS-1123, secret name DNS subdomain, label keys, required fields), deployment (create/update Secret, chain concatenation, error propagation), validation (serial comparison, not-found, empty cert) | GUI target wizard KubernetesSecrets fields (namespace, secret_name, labels, kubeconfig_path), Helm RBAC toggle, TargetDetailPage type label |
|
||||
| 54 | AWS ACM Private CA Issuer Connector (M47) | 23 | Config validation (region, CA ARN regex, signing algorithm whitelist, validity_days, defaults), issuance (full flow, empty CSR, errors), renewal (reuses issuance), revocation (reason mapping, default, errors), GetOrderStatus completed, GetCACertPEM (success/chain/error), GetRenewalInfo nil | GUI issuer wizard AWSACMPCA fields (region, ca_arn, signing_algorithm, validity_days, template_arn), seed data visibility, create issuer flow |
|
||||
|
||||
**Totals:** ~164 automated subtests, 11 fully skipped Parts, ~282 manual tests remaining.
|
||||
|
||||
## Test Categories
|
||||
|
||||
The automated tests fall into four categories:
|
||||
|
||||
### 1. API Integration Tests (majority)
|
||||
Make real HTTP requests to the running server and verify status codes, response structure, and JSON field values. Examples:
|
||||
- `POST /api/v1/certificates` with valid payload → 201
|
||||
- `GET /api/v1/certificates?status=Active` → all returned certs have `status: "Active"`
|
||||
- `DELETE /api/v1/certificates/mc-qa-full` → 204
|
||||
|
||||
### 2. Database Verification Tests
|
||||
Connect directly to PostgreSQL and verify schema state:
|
||||
- Table count ≥ 19 (from migrations 000001–000010)
|
||||
- Useful for catching migration regressions
|
||||
|
||||
### 3. Source File Verification Tests
|
||||
Read files from the repo checkout and verify structure:
|
||||
- Domain types exist in `internal/domain/connector.go` (e.g., `TargetTypeEnvoy`)
|
||||
- Connector implementations exist (e.g., `internal/connector/target/envoy/envoy.go`)
|
||||
- Documentation contains expected content (all issuer/target types listed)
|
||||
- No stale RFC 9702 references (replaced by RFC 9773)
|
||||
|
||||
### 4. Performance Spot Checks
|
||||
Timed API requests with threshold assertions:
|
||||
- `GET /api/v1/certificates?per_page=15` < 200ms
|
||||
- `GET /api/v1/stats/summary` < 500ms
|
||||
- `GET /api/v1/metrics/prometheus` < 300ms
|
||||
|
||||
## What This Test Does NOT Cover
|
||||
|
||||
These gaps must be filled by manual testing per `docs/testing-guide.md`:
|
||||
|
||||
### External CA Integrations (Parts 10–13)
|
||||
- **Sub-CA mode** — requires CA cert+key files on disk
|
||||
- **ACME ARI** — requires a CA that supports RFC 9773 Renewal Information
|
||||
- **Vault PKI** — requires a running HashiCorp Vault instance
|
||||
- **DigiCert / Sectigo / Google CAS** — requires sandbox API credentials
|
||||
|
||||
### Browser/GUI Testing (Parts 35–37, 50)
|
||||
- Dashboard chart rendering (Recharts)
|
||||
- Onboarding wizard step-by-step flow
|
||||
- Issuer catalog card layout and create wizard
|
||||
- Bulk operations UI (multi-select, progress bars)
|
||||
- Discovery triage workflow
|
||||
|
||||
### Real Deployment Testing (Parts 15–17)
|
||||
- NGINX/Apache/HAProxy file write + reload
|
||||
- Traefik/Caddy file provider or API reload
|
||||
- IIS PowerShell/WinRM (requires Windows)
|
||||
- F5 BIG-IP iControl REST (requires appliance or mock)
|
||||
- SSH agentless deployment (requires target host)
|
||||
|
||||
### Agent Binary Behavior (Parts 18, 28–29)
|
||||
- Agent-side ECDSA key generation and CSR submission
|
||||
- Agent filesystem discovery scan
|
||||
- CLI tool (`certctl-cli`) — all 10 subcommands
|
||||
- MCP server (`mcp-server`) — stdio transport
|
||||
|
||||
### Timing-Dependent Tests (Parts 33–34)
|
||||
- Background scheduler loop execution (renewal, jobs, health, notifications, digest, network scan)
|
||||
- Structured logging format verification (requires Docker log parsing)
|
||||
|
||||
## How This Relates to `integration_test.go`
|
||||
|
||||
Both files live in `deploy/test/` in the same Go package (`integration_test`):
|
||||
|
||||
| | `qa_test.go` | `integration_test.go` |
|
||||
|---|---|---|
|
||||
| **Build tag** | `//go:build qa` | `//go:build integration` |
|
||||
| **Target stack** | Demo (`docker-compose.yml` + `docker-compose.demo.yml`) | Test (`docker-compose.test.yml`) |
|
||||
| **Port** | 8443 | Different (test stack config) |
|
||||
| **Seed data** | `seed_demo.sql` (32 certs, 8 agents, realistic history) | Minimal (created by tests) |
|
||||
| **CA backends** | Local CA only (demo mode) | Pebble ACME, step-ca, NGINX |
|
||||
| **Purpose** | Release QA — broad coverage, spot checks | Functional — end-to-end issuance, renewal, revocation against real CAs |
|
||||
| **Run frequency** | Before each release tag | CI on every PR |
|
||||
|
||||
They are complementary. Integration tests prove the machinery works. QA tests prove the product works at release quality.
|
||||
|
||||
## Seed Data Reference
|
||||
|
||||
The QA tests depend on `migrations/seed_demo.sql`. Key IDs used:
|
||||
|
||||
### Certificates (32 total)
|
||||
`mc-api-prod`, `mc-web-prod`, `mc-pay-prod`, `mc-dash-prod`, `mc-data-prod`, `mc-search-prod`, `mc-admin-prod`, `mc-blog-prod`, `mc-docs-prod`, `mc-status-prod`, `mc-grpc-prod`, `mc-vault-prod`, `mc-consul-prod`, `mc-shop-prod`, `mc-auth-prod`, `mc-cdn-prod`, `mc-mail-prod`, `mc-ci-prod`, `mc-legacy-prod`, `mc-old-api`, `mc-wiki-prod`, `mc-api-stg`, `mc-web-stg`, `mc-pay-stg`, `mc-api-dev`, `mc-grafana-prod`, `mc-vpn-prod`, `mc-wildcard-prod`, `mc-compromised`, `mc-edge-eu`, `mc-k8s-ingress`, `mc-smime-bob`
|
||||
|
||||
### Agents (9 total)
|
||||
`ag-web-prod`, `ag-web-staging`, `ag-lb-prod`, `ag-iis-prod`, `ag-data-prod`, `ag-edge-01`, `ag-k8s-prod`, `ag-mac-dev`, `server-scanner` (sentinel)
|
||||
|
||||
### Issuers (9 total)
|
||||
`iss-local`, `iss-acme-le`, `iss-stepca`, `iss-acme-zs`, `iss-openssl`, `iss-vault`, `iss-digicert`, `iss-sectigo`, `iss-googlecas`
|
||||
|
||||
### Targets (8 total)
|
||||
`tgt-nginx-prod`, `tgt-nginx-staging`, `tgt-haproxy-prod`, `tgt-apache-prod`, `tgt-iis-prod`, `tgt-traefik-prod`, `tgt-caddy-prod`, `tgt-nginx-data`
|
||||
|
||||
### Network Scan Targets (4 total)
|
||||
`nst-dc1-web`, `nst-dc2-apps`, `nst-dmz`, `nst-edge`
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "Server unreachable" on startup
|
||||
The test pings `GET /health` before running anything. If this fails:
|
||||
```bash
|
||||
# Check if the stack is running
|
||||
docker compose -f docker-compose.yml -f docker-compose.demo.yml ps
|
||||
|
||||
# Check server logs
|
||||
docker compose -f docker-compose.yml -f docker-compose.demo.yml logs certctl-server
|
||||
|
||||
# Check if the port is exposed (self-signed cert — pin CA bundle)
|
||||
curl --cacert ./deploy/test/certs/ca.crt -s https://localhost:8443/health
|
||||
```
|
||||
|
||||
### "connect to QA DB" failure
|
||||
The database tests connect directly to PostgreSQL. Ensure port 5432 is exposed:
|
||||
```bash
|
||||
docker compose -f docker-compose.yml -f docker-compose.demo.yml port postgres 5432
|
||||
```
|
||||
|
||||
### Performance tests flaking
|
||||
The performance thresholds (200ms, 300ms, 500ms) assume a local Docker stack. On slow CI runners or remote Docker hosts, increase the thresholds or skip Part 39:
|
||||
```bash
|
||||
# Go's regexp syntax (RE2) has no lookahead, so exclude Part 39 with -skip (Go 1.20+)
go test -tags qa -v -run TestQA -skip 'TestQA/Part39' ./...
|
||||
```
|
||||
|
||||
### Source file checks failing
|
||||
The `fileExists` and `fileContains` helpers read from `CERTCTL_QA_REPO_DIR` (default `../..`). If running from a non-standard location:
|
||||
```bash
|
||||
CERTCTL_QA_REPO_DIR=/absolute/path/to/certctl go test -tags qa -v ./...
|
||||
```
|
||||
|
||||
## Adding New Tests
|
||||
|
||||
When a new feature ships:
|
||||
|
||||
1. **Add a Part section** in `qa_test.go` following the numbering in `docs/testing-guide.md`
|
||||
2. **API tests**: use `c.get()`, `c.post()`, `c.bodyStr()`, `c.getJSON()`, `c.timedGet()`
|
||||
3. **Source checks**: use `fileExists(t, "relative/path")` and `fileContains(t, "path", "substring")`
|
||||
4. **DB checks**: use `openQADB(t)` and `db.queryInt(t, "SELECT ...")`
|
||||
5. **Cleanup**: always use `t.Cleanup()` for data created during tests
|
||||
6. **Skip if external**: use `t.Skip("Requires X — manual test")` with a clear reason
|
||||
|
||||
## Version History
|
||||
|
||||
- **v1.0** (April 2026) — Initial release covering all 52 Parts of testing-guide.md v2.1. Replaces `qa-smoke-test.sh`.
|
||||
- **v1.1** (April 2026) — Added Parts 53–54 (M47: Kubernetes Secrets target + AWS ACM PCA issuer). 54 Parts total, ~164 automated subtests.
|
||||
@@ -6,6 +6,30 @@ This guide gets you running in 5 minutes and walks you through everything certct
|
||||
|
||||
New to certificates? Read the [Concepts Guide](concepts.md) first — it explains TLS, CAs, and private keys in plain language.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Prerequisites](#prerequisites)
|
||||
2. [Start Everything](#start-everything)
|
||||
3. [Open the Dashboard](#open-the-dashboard)
|
||||
4. [Explore the API](#explore-the-api)
|
||||
- [Core operations](#core-operations)
|
||||
- [Sorting, filtering, and pagination](#sorting-filtering-and-pagination)
|
||||
- [Stats and metrics](#stats-and-metrics)
|
||||
5. [Create Your First Certificate](#create-your-first-certificate)
|
||||
- [Revoke a certificate](#revoke-a-certificate)
|
||||
- [Interactive approval workflow](#interactive-approval-workflow)
|
||||
6. [Certificate Discovery](#certificate-discovery)
|
||||
- [Filesystem discovery (agent-based)](#filesystem-discovery-agent-based)
|
||||
- [Network discovery (agentless)](#network-discovery-agentless)
|
||||
- [Triage discovered certificates](#triage-discovered-certificates)
|
||||
7. [CLI Tool](#cli-tool)
|
||||
8. [MCP Server (AI Integration)](#mcp-server-ai-integration)
|
||||
9. [Demo Data Reference](#demo-data-reference)
|
||||
10. [Dashboard Demo Mode](#dashboard-demo-mode)
|
||||
11. [Presenting to Stakeholders](#presenting-to-stakeholders)
|
||||
12. [Tear Down](#tear-down)
|
||||
13. [What's Next](#whats-next)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
You need **Docker** and **Docker Compose** installed. That's it.
|
||||
@@ -19,6 +43,8 @@ On Linux, follow the official Docker install guide for your distribution.
|
||||
|
||||
## Start Everything
|
||||
|
||||
### Docker Compose (Quick Start)
|
||||
|
||||
```bash
|
||||
git clone https://github.com/shankar0123/certctl.git
|
||||
cd certctl
|
||||
@@ -34,6 +60,39 @@ cp deploy/.env.example deploy/.env
|
||||
docker compose -f deploy/docker-compose.yml up -d --build
|
||||
```
|
||||
|
||||
> **Warning:** Edit `POSTGRES_PASSWORD` *before* the very first `docker compose up`. Postgres seeds the password into its data directory only on first boot of an empty volume — after that, the password is baked into `pg_authid` and the env var is ignored. If you boot once with the default and later change `POSTGRES_PASSWORD` in `.env`, the certctl-server container picks up the new value but postgres still authenticates against the old one, and the server logs `pq: password authentication failed for user "certctl"` (SQLSTATE 28P01). Two ways out: tear down the volume with `docker compose -f deploy/docker-compose.yml down -v` (this **deletes all data**) and bring up fresh, or rotate non-destructively with `docker compose -f deploy/docker-compose.yml exec postgres psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new>';"` and then restart certctl-server with the matching `POSTGRES_PASSWORD`.
|
||||
|
||||
### Docker Compose Environments
|
||||
|
||||
The `deploy/` directory contains four compose files for different use cases:
|
||||
|
||||
| File | Purpose | How to run |
|
||||
|------|---------|------------|
|
||||
| `docker-compose.yml` | **Base platform.** PostgreSQL + certctl server + agent. Clean dashboard with onboarding wizard — use this for production or first-time setup. | `docker compose -f deploy/docker-compose.yml up --build` |
|
||||
| `docker-compose.demo.yml` | **Demo data override.** Layers 180 days of realistic seed data (15 certs, 5 agents, multiple issuers) onto the base. Dashboard charts and tables look populated on first boot. | `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up --build` |
|
||||
| `docker-compose.dev.yml` | **Development override.** Adds PgAdmin (port 5050), debug-level logging, and a Delve debugger port (40000) for the server. | `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.dev.yml up --build` |
|
||||
| `docker-compose.test.yml` | **Integration test environment.** 7 containers on a static-IP subnet: PostgreSQL, certctl server+agent, step-ca, Pebble ACME server, challenge test server, and NGINX. Runs the full issuance→deployment→verification flow against real CA backends. Standalone — does not combine with the base file. | `docker compose -f deploy/docker-compose.test.yml up --build` |
|
||||
|
||||
Override files are layered onto the base with multiple `-f` flags. The test environment is self-contained and runs independently. To reset an environment's data, run `down -v` with the same `-f` flags to remove its volumes.
|
||||
|
||||
For a deep dive into every service, environment variable, and networking decision, see the [Docker Compose Environments Guide](../deploy/ENVIRONMENTS.md).
|
||||
|
||||
### Kubernetes with Helm
|
||||
|
||||
For production deployments on Kubernetes, use the Helm chart:
|
||||
|
||||
```bash
|
||||
helm install certctl deploy/helm/certctl/ \
|
||||
--create-namespace --namespace certctl \
|
||||
--set server.auth.apiKey="your-secure-api-key" \
|
||||
--set postgresql.auth.password="your-db-password" \
|
||||
--set ingress.enabled=true \
|
||||
--set ingress.hosts[0].host="certctl.example.com" \
|
||||
--set ingress.hosts[0].tls=true
|
||||
```
|
||||
|
||||
The chart includes: server Deployment (with configurable replicas, health probes, security context), PostgreSQL StatefulSet with persistent volumes, agent DaemonSet (one agent per infrastructure node), optional Ingress with TLS, and ServiceAccount with RBAC. All certctl configuration options are exposed in `values.yaml` — customize issuer settings, target connectors, scheduler intervals, and notifier credentials there.
|
||||
|
||||
Wait about 30 seconds for PostgreSQL to initialize, then verify:
|
||||
|
||||
```bash
|
||||
@@ -48,24 +107,36 @@ certctl-server Up (healthy)
|
||||
certctl-agent Up
|
||||
```
|
||||
|
||||
The control plane is HTTPS-only as of v2.2. The `certctl-tls-init` init container in the shipped `deploy/docker-compose.yml` self-signs a cert on first boot and drops it into a named volume. Extract the CA bundle once and reuse it for every API call in this guide:
|
||||
|
||||
```bash
|
||||
curl http://localhost:8443/health
|
||||
export CA=/tmp/certctl-ca.crt
|
||||
docker compose -f deploy/docker-compose.yml exec -T certctl-server \
|
||||
cat /etc/certctl/tls/ca.crt > "$CA"
|
||||
|
||||
curl --cacert "$CA" https://localhost:8443/health
|
||||
```
|
||||
```json
|
||||
{"status":"healthy"}
|
||||
```
|
||||
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/tls.md`](tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/upgrade-to-tls.md`](upgrade-to-tls.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
|
||||
## Open the Dashboard
|
||||
|
||||
Open **http://localhost:8443** in your browser.
|
||||
Open **https://localhost:8443** in your browser. Your browser will warn about the self-signed cert — that's expected for the demo bootstrap. Trust the CA bundle you just exported, or click through the warning.
|
||||
|
||||
The dashboard comes pre-loaded with 15 demo certificates across multiple teams, environments, and statuses — expiring certs, expired certs, active certs, failed renewals. A realistic snapshot of what certificate management looks like in a real organization.
|
||||
> **Note:** The Docker Compose demo runs with authentication disabled (`CERTCTL_AUTH_TYPE=none`) so you can explore immediately. For production, set `CERTCTL_AUTH_TYPE=api-key` and `CERTCTL_AUTH_SECRET=<your-secret>` in your environment, then pass `Authorization: Bearer <your-secret>` on all API requests. The dashboard will prompt for your API key on first load.
|
||||
>
|
||||
> **Key rotation:** `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `CERTCTL_AUTH_SECRET=new-key,old-key`). Both keys are valid simultaneously, enabling zero-downtime rotation: add the new key, roll clients over, then remove the old key.
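
The rotation sequence described in the note can be sketched as three successive values of `CERTCTL_AUTH_SECRET`. This is an illustration only: `rotation_secrets` is a hypothetical helper, and the phase labels are not certctl terminology.

```bash
# rotation_secrets OLD NEW: print the CERTCTL_AUTH_SECRET value for each phase.
rotation_secrets() {
  local old="$1" new="$2"
  printf 'before: %s\n' "$old"            # only the old key is accepted
  printf 'overlap: %s,%s\n' "$new" "$old" # both keys valid while clients roll over
  printf 'after: %s\n' "$new"             # old key removed
}

# Usage:
# rotation_secrets old-key new-key
#
# During the overlap phase a client may send either key, for example:
# curl --cacert "$CA" -s -H "Authorization: Bearer new-key" \
#   https://localhost:8443/api/v1/certificates
```

How the server picks up a changed value (restart, redeploy, or otherwise) is not covered here and depends on how you run it.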
|
||||
|
||||
The dashboard comes pre-loaded with 35 demo certificates across 5 issuers, 8 agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization.
|
||||
|
||||
### What you're looking at
|
||||
|
||||
The main dashboard shows total certificates, how many are expiring soon, how many have expired, the renewal success rate, and four charts: an **expiration heatmap** (90-day weekly buckets), **renewal success rate trends** (30-day line chart), **certificate status distribution** (donut chart), and **issuance rate** (30-day bar chart).
|
||||
|
||||
Explore the sidebar: Certificates, Agents, Policies, Jobs, Audit Trail, Notifications, Profiles, Teams, Owners, Agent Groups, Fleet Overview, Short-Lived Credentials, Discovery.
|
||||
Explore the sidebar: Certificates, Agents, Policies, Jobs, Audit Trail, Notifications, Profiles, Teams, Owners, Agent Groups, Fleet Overview, Short-Lived Credentials, Discovery, and Network Scans.
|
||||
|
||||
### Scenarios to walk through
|
||||
|
||||
@@ -77,9 +148,11 @@ Explore the sidebar: Certificates, Agents, Policies, Jobs, Audit Trail, Notifica
|
||||
|
||||
**"Can I revoke a compromised cert?"** — Click any active certificate, then "Revoke." A modal offers RFC 5280 reason codes (Key Compromise, Superseded, Cessation of Operation). After revocation, the CRL and OCSP responses are served automatically, so clients that check revocation stop trusting the cert as soon as they fetch fresh status.
|
||||
|
||||
**"What about certificates already in production?"** — Click "Discovered Certificates." Agents scan local filesystems for existing certs. The server probes TLS endpoints on configured CIDR ranges. Both feed into a triage workflow: claim unmanaged certs to bring them under automation, or dismiss them.
|
||||
**"What about certificates already in production?"** — Click "Discovery" in the sidebar. The demo comes pre-loaded with 9 discovered certificates — some found by agents scanning filesystems, some found by the server probing TLS endpoints on the network. You'll see Unmanaged certs waiting for triage (including an expired printer cert and an expiring switch management cert), certs already linked to managed inventory, and one that was dismissed. Claim unmanaged certs to bring them under automation, or dismiss them. Click "Network Scans" to see the 3 configured scan targets with recent scan results.
|
||||
|
||||
**"Show me the agent fleet"** — Click "Agents." Four agents online, one offline. Click "Fleet Overview" for OS/architecture grouping, version distribution, and per-platform listing. Agents generate ECDSA P-256 keys locally — private keys never leave your infrastructure.
|
||||
**"I need to approve a renewal before it proceeds"** — Click "Jobs" in the sidebar. You'll see an amber banner: "2 jobs awaiting approval." These are renewal jobs for `auth-production` and `payments-production` that require human sign-off before proceeding. Click Approve or Reject with a reason — the decision is recorded in the audit trail.
|
||||
|
||||
**"Show me the agent fleet"** — Click "Agents." Eight agents across Linux, macOS, and Windows platforms—most online, showing OS, architecture, IP, and version metadata. A ninth entry (server-scanner) is the sentinel agent used for network certificate discovery. Click "Fleet Overview" for OS/architecture grouping, version distribution, and per-platform listing. Agents generate ECDSA P-256 keys locally — private keys never leave your infrastructure.
|
||||
|
||||
**"What about bulk operations?"** — On the Certificates page, select multiple certificates with checkboxes. A bulk action bar appears: trigger renewal, revoke with reason codes, or reassign ownership — all with progress tracking. At 47-day lifespans with hundreds of certs, bulk operations aren't optional.
|
||||
|
||||
@@ -91,62 +164,64 @@ Everything you see in the dashboard is backed by the REST API. All endpoints liv
|
||||
|
||||
### Core operations
|
||||
|
||||
Every request below uses `--cacert "$CA"` to pin the self-signed CA bundle extracted above. In production, point `$CA` at your internal CA root or the bundle you distributed to the fleet.
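
Because every call repeats the same base URL and CA flag, a small shell wrapper can cut the noise. This is a sketch under assumptions: `api` and `api_url` are hypothetical helper names, `CERTCTL_BASE` is a variable introduced here, and the base URL falls back to the demo address when `CERTCTL_SERVER_URL` is unset.

```bash
# Base URL; reuses the variable name the CLI reads, defaulting to the demo address.
CERTCTL_BASE="${CERTCTL_SERVER_URL:-https://localhost:8443}"

# api_url PATH: join the base URL, the /api/v1 prefix, and PATH (leading slash optional).
api_url() {
  printf '%s/api/v1/%s\n' "${CERTCTL_BASE%/}" "${1#/}"
}

# api PATH [curl options...]: call the endpoint with the pinned CA bundle in $CA.
api() {
  local path="$1"
  shift
  curl --cacert "$CA" -sS "$@" "$(api_url "$path")"
}

# Usage:
# api certificates | jq .
# api "certificates?status=Expiring" | jq .
# api certificates -H "Authorization: Bearer <your-secret>" | jq .
```

Extra curl options pass straight through, so the same wrapper works once API-key authentication is enabled.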
|
||||
|
||||
```bash
|
||||
# List all certificates
|
||||
curl -s http://localhost:8443/api/v1/certificates | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates | jq .
|
||||
|
||||
# Filter by status
|
||||
curl -s "http://localhost:8443/api/v1/certificates?status=Expiring" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?status=Expiring" | jq .
|
||||
|
||||
# Filter by environment
|
||||
curl -s "http://localhost:8443/api/v1/certificates?environment=production" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?environment=production" | jq .
|
||||
|
||||
# Get a specific certificate
|
||||
curl -s http://localhost:8443/api/v1/certificates/mc-api-prod | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/mc-api-prod | jq .
|
||||
|
||||
# Get deployment targets for a certificate
|
||||
curl -s http://localhost:8443/api/v1/certificates/mc-api-prod/deployments | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/mc-api-prod/deployments | jq .
|
||||
|
||||
# List agents
|
||||
curl -s http://localhost:8443/api/v1/agents | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/agents | jq .
|
||||
|
||||
# Check agent pending work
|
||||
curl -s http://localhost:8443/api/v1/agents/ag-web-prod/work | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/agents/ag-web-prod/work | jq .
|
||||
|
||||
# View audit trail
|
||||
curl -s http://localhost:8443/api/v1/audit | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/audit | jq .
|
||||
|
||||
# View policies and violations
|
||||
curl -s http://localhost:8443/api/v1/policies | jq .
|
||||
curl -s http://localhost:8443/api/v1/policies/pr-require-owner/violations | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/policies | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/policies/pr-require-owner/violations | jq .
|
||||
|
||||
# Notifications
|
||||
curl -s http://localhost:8443/api/v1/notifications | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/notifications | jq .
|
||||
|
||||
# Profiles and agent groups
|
||||
curl -s http://localhost:8443/api/v1/profiles | jq .
|
||||
curl -s http://localhost:8443/api/v1/agent-groups | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/profiles | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/agent-groups | jq .
|
||||
```
|
||||
|
||||
### Sorting, filtering, and pagination
|
||||
|
||||
```bash
|
||||
# Sort by expiration date (ascending)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?sort=notAfter" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?sort=notAfter" | jq .
|
||||
|
||||
# Sort descending (prefix with -)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?sort=-createdAt" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?sort=-createdAt" | jq .
|
||||
|
||||
# Time-range filters (RFC3339)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?expires_before=2026-05-01T00:00:00Z" | jq .
|
||||
curl -s "http://localhost:8443/api/v1/certificates?created_after=2026-03-01T00:00:00Z" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?expires_before=2026-05-01T00:00:00Z" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?created_after=2026-03-01T00:00:00Z" | jq .
|
||||
|
||||
# Sparse fields — request only what you need
|
||||
curl -s "http://localhost:8443/api/v1/certificates?fields=id,common_name,status,expires_at" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?fields=id,common_name,status,expires_at" | jq .
|
||||
|
||||
# Cursor pagination — efficient for large inventories
|
||||
curl -s "http://localhost:8443/api/v1/certificates?page_size=5" | jq '{next_cursor: .next_cursor, count: (.data | length)}'
|
||||
curl -s "http://localhost:8443/api/v1/certificates?cursor=<next_cursor_value>&page_size=5" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?page_size=5" | jq '{next_cursor: .next_cursor, count: (.data | length)}'
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?cursor=<next_cursor_value>&page_size=5" | jq .
|
||||
```
|
||||
|
||||
Supported sort fields: `notAfter`, `expiresAt`, `createdAt`, `updatedAt`, `commonName`, `name`, `status`, `environment`.
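
To walk a whole inventory, the two cursor requests above can be turned into a loop. A minimal sketch, assuming that the last page returns a null or missing `next_cursor` (the source does not state the end-of-list signal) and that cursor values are URL-safe; `page_url` and `list_all_ids` are hypothetical helper names.

```bash
BASE="${CERTCTL_SERVER_URL:-https://localhost:8443}"

# page_url CURSOR SIZE: build the list URL; an empty CURSOR requests the first page.
page_url() {
  local cursor="$1" size="$2"
  if [ -n "$cursor" ]; then
    printf '%s/api/v1/certificates?cursor=%s&page_size=%s\n' "$BASE" "$cursor" "$size"
  else
    printf '%s/api/v1/certificates?page_size=%s\n' "$BASE" "$size"
  fi
}

# list_all_ids [SIZE]: follow next_cursor until it is empty, printing each certificate id.
list_all_ids() {
  local size="${1:-50}" cursor="" page
  while :; do
    page=$(curl --cacert "$CA" -sS "$(page_url "$cursor" "$size")") || return 1
    printf '%s\n' "$page" | jq -r '.data[].id'
    # "// empty" yields nothing when next_cursor is null or absent (assumed end signal).
    cursor=$(printf '%s\n' "$page" | jq -r '.next_cursor // empty')
    [ -n "$cursor" ] || break
  done
}

# Usage:
# list_all_ids 5
```

If cursors turn out to contain characters such as `+` or `=`, pass them with `curl -G --data-urlencode` instead of interpolating them into the URL.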
|
||||
@@ -155,22 +230,22 @@ Supported sort fields: `notAfter`, `expiresAt`, `createdAt`, `updatedAt`, `commo
|
||||
|
||||
```bash
|
||||
# Dashboard summary
|
||||
curl -s http://localhost:8443/api/v1/stats/summary | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/stats/summary | jq .
|
||||
|
||||
# Certificates by status
|
||||
curl -s http://localhost:8443/api/v1/stats/certificates-by-status | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/stats/certificates-by-status | jq .
|
||||
|
||||
# Expiration timeline (next 90 days)
|
||||
curl -s "http://localhost:8443/api/v1/stats/expiration-timeline?days=90" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/stats/expiration-timeline?days=90" | jq .
|
||||
|
||||
# Job trends (last 30 days)
|
||||
curl -s "http://localhost:8443/api/v1/stats/job-trends?days=30" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/stats/job-trends?days=30" | jq .
|
||||
|
||||
# JSON metrics
|
||||
curl -s http://localhost:8443/api/v1/metrics | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/metrics | jq .
|
||||
|
||||
# Prometheus format (for Prometheus, Grafana Agent, Datadog)
|
||||
curl -s http://localhost:8443/api/v1/metrics/prometheus
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/metrics/prometheus
|
||||
```
|
||||
|
||||
## Create Your First Certificate
|
||||
@@ -178,7 +253,7 @@ curl -s http://localhost:8443/api/v1/metrics/prometheus
|
||||
Create a certificate record that certctl will track, renew, and deploy automatically.
|
||||
|
||||
```bash
|
||||
curl -s -X POST http://localhost:8443/api/v1/certificates \
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "My First Certificate",
|
||||
@@ -201,45 +276,51 @@ CERT_ID="<paste the id from the response>"
|
||||
|
||||
Trigger renewal:
|
||||
```bash
|
||||
curl -s -X POST http://localhost:8443/api/v1/certificates/$CERT_ID/renew | jq .
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates/$CERT_ID/renew | jq .
|
||||
```
|
||||
|
||||
Check the result:
|
||||
```bash
|
||||
curl -s http://localhost:8443/api/v1/certificates/$CERT_ID | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/$CERT_ID | jq .
|
||||
```
|
||||
|
||||
Refresh the dashboard at http://localhost:8443 — your new certificate appears in the inventory.
|
||||
Refresh the dashboard at https://localhost:8443 — your new certificate appears in the inventory.
|
||||
|
||||
### Revoke a certificate
|
||||
|
||||
When a private key is compromised or a service is decommissioned:
|
||||
|
||||
```bash
|
||||
curl -s -X POST http://localhost:8443/api/v1/certificates/$CERT_ID/revoke \
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates/$CERT_ID/revoke \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "superseded"}' | jq .
|
||||
```
|
||||
|
||||
Supported RFC 5280 reason codes: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn`.
|
||||
|
||||
Confirm via CRL:
|
||||
Confirm via the unauthenticated DER CRL (RFC 5280 §5, RFC 8615):
|
||||
```bash
|
||||
curl -s http://localhost:8443/api/v1/crl | jq .
|
||||
# Fetch the CRL without any API key — relying parties shouldn't need one.
|
||||
# The CRL path is unauthenticated, but it's still served over TLS.
|
||||
curl --cacert "$CA" -s https://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
|
||||
openssl crl -inform der -in /tmp/crl.der -noout -text | head -40
|
||||
```
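
Scanning 40 lines of CRL text by eye gets tedious, so the check can be scripted. The sketch below assumes `openssl crl -text` prints each revoked entry as a `Serial Number:` line in uppercase hex without separators; `normalize_serial` and `crl_lists_serial` are hypothetical helper names.

```bash
# normalize_serial SERIAL: strip colons and spaces, uppercase the hex digits.
normalize_serial() {
  printf '%s' "$1" | tr -d ': ' | tr 'abcdef' 'ABCDEF'
  echo
}

# crl_lists_serial CRL_FILE SERIAL: succeed if the DER CRL lists SERIAL as revoked.
crl_lists_serial() {
  local crl="$1" serial
  serial=$(normalize_serial "$2")
  # -x matches the whole line, so a short serial cannot match a longer one.
  openssl crl -inform der -in "$crl" -noout -text \
    | grep -qx "[[:space:]]*Serial Number: ${serial}[[:space:]]*"
}

# Usage (the serial can be read with: openssl x509 -in cert.pem -noout -serial):
# crl_lists_serial /tmp/crl.der 0a:1b:ff && echo "revoked"
```

Serials must be given with an even number of hex digits, including any leading zero, to match the printed form.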
|
||||
|
||||
### Interactive approval workflow
|
||||
|
||||
For high-value certificates where you want human oversight:
|
||||
For high-value certificates where you want human oversight. The demo includes 2 pre-seeded jobs in `AwaitingApproval` status (for `auth-production` and `payments-production`). Open **Jobs** in the sidebar and you'll see the amber "Pending Approval" banner immediately.
|
||||
|
||||
```bash
|
||||
# List jobs awaiting approval (demo includes 2)
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/jobs?status=AwaitingApproval" | jq '.data[] | {id, certificate_id, status}'
|
||||
|
||||
# Approve a pending job
|
||||
curl -s -X POST http://localhost:8443/api/v1/jobs/JOB_ID/approve \
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Approved for production deployment"}' | jq .
|
||||
|
||||
# Reject a pending job
|
||||
curl -s -X POST http://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet compliance requirements"}' | jq .
|
||||
```
|
||||
@@ -248,6 +329,8 @@ curl -s -X POST http://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
|
||||
Find certificates already running in your infrastructure — ones you didn't issue through certctl.
|
||||
|
||||
The demo environment comes pre-loaded with 9 discovered certificates (from agent filesystem scans and server-side network scans), 3 network scan targets, and recent scan history. Open **Discovery** and **Network Scans** in the sidebar to see the triage workflow immediately.
|
||||
|
||||
### Filesystem discovery (agent-based)
|
||||
|
||||
```bash
|
||||
@@ -263,7 +346,7 @@ export CERTCTL_DISCOVERY_DIRS="/etc/nginx/certs,/etc/ssl/certs,/var/lib/certs"
|
||||
export CERTCTL_NETWORK_SCAN_ENABLED=true
|
||||
|
||||
# Create a scan target
|
||||
curl -s -X POST http://localhost:8443/api/v1/network-scan-targets \
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/network-scan-targets \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"name": "Internal Network",
|
||||
@@ -275,20 +358,20 @@ curl -s -X POST http://localhost:8443/api/v1/network-scan-targets \
|
||||
}' | jq .
|
||||
|
||||
# Trigger an immediate scan
|
||||
curl -s -X POST http://localhost:8443/api/v1/network-scan-targets/nst-internal-network/scan | jq .
|
||||
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/network-scan-targets/nst-internal-network/scan | jq .
|
||||
```
|
||||
|
||||
### Triage discovered certificates
|
||||
|
||||
```bash
|
||||
# List discovered certs
|
||||
curl -s "http://localhost:8443/api/v1/discovered-certificates?agent_id=agent-nginx-prod" | jq .
|
||||
curl --cacert "$CA" -s "https://localhost:8443/api/v1/discovered-certificates?agent_id=agent-nginx-prod" | jq .
|
||||
|
||||
# Summary counts
|
||||
curl -s http://localhost:8443/api/v1/discovery-summary | jq .
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/discovery-summary | jq .
|
||||
|
||||
# Claim a discovered cert (bring under management)
|
||||
curl -s -X POST "http://localhost:8443/api/v1/discovered-certificates/DISCOVERY_ID/claim" \
|
||||
curl --cacert "$CA" -s -X POST "https://localhost:8443/api/v1/discovered-certificates/DISCOVERY_ID/claim" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"managed_certificate_id": "mc-api-prod"}' | jq .
|
||||
```
|
||||
@@ -298,8 +381,9 @@ curl -s -X POST "http://localhost:8443/api/v1/discovered-certificates/DISCOVERY_
|
||||
```bash
|
||||
cd cmd/cli && go build -o certctl-cli .
|
||||
|
||||
export CERTCTL_SERVER_URL="http://localhost:8443"
|
||||
export CERTCTL_SERVER_URL="https://localhost:8443"
|
||||
export CERTCTL_API_KEY="test-key-123"
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # or pass --ca-bundle; --insecure for dev self-signed
|
||||
|
||||
./certctl-cli certs list # List certificates
|
||||
./certctl-cli certs get mc-api-prod # Certificate details
|
||||
@@ -311,31 +395,66 @@ export CERTCTL_API_KEY="test-key-123"
|
||||
./certctl-cli status # Health + stats
|
||||
```
|
||||
|
||||
## Scheduled Certificate Digest Emails
|
||||
|
||||
Enable automatic HTML digest emails with certificate stats, expiration timeline, and job health:
|
||||
|
||||
```bash
|
||||
# Set SMTP configuration
|
||||
export CERTCTL_SMTP_HOST=smtp.gmail.com
|
||||
export CERTCTL_SMTP_PORT=587
|
||||
export CERTCTL_SMTP_USERNAME=admin@example.com
|
||||
export CERTCTL_SMTP_PASSWORD=your-app-password
|
||||
export CERTCTL_SMTP_FROM_ADDRESS=certctl@example.com
|
||||
export CERTCTL_SMTP_USE_TLS=true
|
||||
|
||||
# Enable digest and set recipients
|
||||
export CERTCTL_DIGEST_ENABLED=true
|
||||
export CERTCTL_DIGEST_INTERVAL=24h
|
||||
export CERTCTL_DIGEST_RECIPIENTS=ops@example.com,security@example.com
|
||||
```
|
||||
|
||||
Preview the digest HTML before enabling scheduled delivery:
|
||||
```bash
|
||||
curl --cacert "$CA" -s https://localhost:8443/api/v1/digest/preview | jq -r '.html' | grep -o '<html>' # Prints <html> when the preview rendered
|
||||
|
||||
# Trigger a digest send immediately (outside of schedule)
|
||||
curl --cacert "$CA" -X POST https://localhost:8443/api/v1/digest/send
|
||||
```
|
||||
|
||||
If no recipients are configured (`CERTCTL_DIGEST_RECIPIENTS` empty), the digest falls back to certificate owner emails. Digests include total certificates, expiring soon, expired, active agents, completed/failed jobs (30-day summary), and a table of expiring certs color-coded by urgency (7/14/30 days).
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
|
||||
```bash
|
||||
cd cmd/mcp-server && go build -o mcp-server .
|
||||
|
||||
export CERTCTL_SERVER_URL="http://localhost:8443"
|
||||
export CERTCTL_SERVER_URL="https://localhost:8443"
|
||||
export CERTCTL_API_KEY="test-key-123"
|
||||
export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # MCP is env-vars-only; no CLI flags
|
||||
|
||||
./mcp-server
|
||||
```
|
||||
|
||||
Exposes 78 MCP tools covering the REST API via stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
|
||||
## Demo Data Reference
|
||||
|
||||
| Resource | Count | Examples |
|
||||
|----------|-------|---------|
|
||||
| Teams | 5 | Platform, Security, Payments, Frontend, Data |
|
||||
| Owners | 5 | Alice, Bob, Carol, Dave, Eve |
|
||||
| Issuers | 4 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, DigiCert (disabled) |
|
||||
| Agents | 5 | ag-web-prod, ag-web-staging, ag-lb-prod, ag-iis-prod, ag-data-prod |
|
||||
| Targets | 5 | NGINX (prod/staging/data), F5 LB, IIS |
|
||||
| Certificates | 15 | Various statuses: Active, Expiring, Expired, Failed, Wildcard |
|
||||
| Teams | 6 | Platform, Security, Payments, Frontend, Data, DevOps |
|
||||
| Owners | 6 | Alice, Bob, Carol, Dave, Eve, Frank |
|
||||
| Issuers | 5 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, ZeroSSL (EAB), Custom OpenSSL CA |
|
||||
| Agents | 9 | 8 real agents (linux/darwin/windows, amd64/arm64) + server-scanner (network discovery) |
|
||||
| Targets | 8 | NGINX prod, NGINX staging, NGINX data, HAProxy, Apache, IIS, Traefik, Caddy |
|
||||
| Certificates | 35 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
|
||||
| Jobs | 50+ | 90 days of issuance, renewal, deployment jobs + 2 AwaitingApproval |
|
||||
| Discovered Certs | 12 | Unmanaged (filesystem + network), Managed (linked), Dismissed |
|
||||
| Discovery Scans | 8 | Historical + recent agent filesystem scans + network TLS scans |
|
||||
| Network Scan Targets | 4 | DC1 Web Servers, DC2 Application Tier, DMZ Public Endpoints, Edge Locations |
|
||||
| Audit Events | 55+ | 90 days of lifecycle events (issuance, renewal, deployment, revocation, discovery) |
|
||||
| Policies | 4 | Required owner, allowed environments, max lifetime, min renewal window |
|
||||
| Profiles | 3 | Default TLS, Short-Lived, High-Security |
|
||||
| Profiles | 5 | Standard TLS, Internal mTLS, Short-Lived, High Security, S/MIME Email |
|
||||
| Agent Groups | 5 | Linux agents, ARM agents, Production subnet, etc. |
|
||||
|
||||
## Dashboard Demo Mode
|
||||
@@ -374,7 +493,10 @@ The `-v` flag removes the PostgreSQL data volume for a clean slate.
|
||||
|
||||
## What's Next
|
||||
|
||||
**Ready to deploy with your stack?** The [Deployment Examples](examples.md) page has 5 turnkey docker-compose scenarios — pick the one closest to your setup and have it running in minutes. It also covers migration paths from Certbot, acme.sh, and cert-manager.
|
||||
|
||||
- **[Deployment Examples](examples.md)** — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer
|
||||
- **[Advanced Demo](demo-advanced.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Guide](connectors.md)** — Build custom connectors for your infrastructure
|
||||
- **[Connector Reference](connectors.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Concepts Guide](concepts.md)** — TLS certificates, CAs, and private keys explained from scratch
|
||||
|
||||