mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:11:29 +00:00
d6f4d5c5e8
Phase 4 of the certctl architecture diligence remediation closure.
Seven findings, all in deploy/helm/certctl/.
DEPL-H2 (High) — ship deploy/helm/certctl/templates/backup-cronjob.yaml
Operator opt-in via backup.enabled=true. Default OFF. CronJob runs
pg_dump --format=custom --no-owner --no-acl --dbname=certctl
matching the canonical shape in
docs/operator/runbooks/postgres-backup.md (so manual and
automated dumps use identical flags). Sink: PVC (default) or S3
via aws-cli. Documented as in-cluster-Postgres only — managed DB
deployments rely on their provider's PITR.
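As a rough illustration of the dump step described above, the sketch below assembles a `pg_dump` invocation from the flags quoted in this message and, by default, only prints it. The `BACKUP_DIR` path, the timestamped file name, the `--file` target, and the helper names `build_dump_args` / `print_dump_cmd` are assumptions made for illustration; the chart's CronJob may be wired differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical defaults; the chart's real mount path and file naming may differ.
BACKUP_DIR="${BACKUP_DIR:-/backups}"
STAMP="${STAMP:-$(date -u +%Y%m%dT%H%M%SZ)}"

# Fill the global DUMP_ARGS array with the flags quoted in the commit message
# plus an assumed --file target.
build_dump_args() {
  DUMP_ARGS=(
    --format=custom --no-owner --no-acl --dbname=certctl
    "--file=${BACKUP_DIR}/certctl-${STAMP}.dump"
  )
}

# Print the command line that would run.
print_dump_cmd() {
  build_dump_args
  echo "pg_dump ${DUMP_ARGS[*]}"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
  print_dump_cmd                 # dry run: only show the command
else
  build_dump_args
  pg_dump "${DUMP_ARGS[@]}"      # needs pg_dump plus PGHOST/PGUSER/PGPASSWORD
fi
```

Run with `DRY_RUN=0` only where `pg_dump` and the usual `PG*` connection variables are available.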
DEPL-M1 (Med) — Helm pre-install/pre-upgrade migration hook
deploy/helm/certctl/templates/migration-job.yaml — runs
`certctl-server --migrate-only` before the server Deployment
rolls. The --migrate-only flag (new in cmd/server/main.go) is a
hermetic schema-mutation pass: load config, open DB pool, run
RunMigrations + RunSeed, exit 0. No HTTP listener, no scheduler,
no signing setup.
Server's boot-time RunMigrations call is now gated on
CERTCTL_MIGRATIONS_VIA_HOOK — when set true, the server skips
the boot path (the hook owns the work). Default still runs at
boot, so Compose / VM / bare-metal deploys are unchanged.
migrations.viaHook: false in values.yaml (off by default).
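The gating described above can be pictured with a small shell sketch. The real check lives in Go in cmd/server/main.go and is not shown in this message, so the function name `boot_runs_migrations` and the assumption that only the literal string `true` enables hook mode are illustrative guesses, not the server's actual parsing rules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Return 0 (run migrations at boot) unless the Helm hook owns them.
# Assumption: only the literal string "true" enables hook mode.
boot_runs_migrations() {
  [ "${CERTCTL_MIGRATIONS_VIA_HOOK:-false}" != "true" ]
}

if boot_runs_migrations; then
  echo "boot: running migrations + seed"
else
  echo "boot: skipping migrations (hook ran certctl-server --migrate-only)"
fi
```

With the variable unset, as on Compose, VM, or bare-metal deploys, the first branch is taken and behaviour is unchanged.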
DEPL-M4 (Med) — explicit Postgres StatefulSet strategy fields
deploy/helm/certctl/templates/postgres-statefulset.yaml adds:
spec.updateStrategy.type: OnDelete
spec.podManagementPolicy: OrderedReady
Operator-controlled Postgres upgrades (the OnDelete strategy
means a chart template tweak no longer triggers an immediate
Postgres restart). OrderedReady aligns with the standard
Postgres-on-Kubernetes pattern for any future HA work.
DEPL-M5 (Med) — per-fleet-size resource ladder documentation
deploy/helm/certctl/values.yaml — extended comments next to
server.resources + agent.resources documenting:
"≤ 500 certs / 100 agents" → defaults are validated
"5K certs / 1K agents" → starter suggestions, TBD Phase 8
"50K certs / 10K agents" → starter suggestions, TBD Phase 8
Numbers for the small-fleet case derive from the measured
baselines in docs/operator/performance-baselines.md
(50ms p50, < 3s for 1000-cert inventory walk, etc.). Larger
fleet numbers explicitly marked TBD pending Phase 8 load-test
runs — operators tune empirically until then.
DEPL-L1 (Low) — Helm rollback runbook
docs/operator/runbooks/rollback.md — covers helm rollback
mechanics, the schema-migration manual-cleanup path (when
*.down.sql files apply vs. when full restore is the only safe
path), and the per-migration-class safe-to-rollback table.
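For orientation, the Helm side of a rollback can be sketched as below. The release name, namespace, and revision number are hypothetical placeholders, and the `run` wrapper only prints the commands unless `DRY_RUN=0` is set. The sketch covers the chart rollback alone; the schema cleanup described in the runbook is a separate, manual step.

```shell
#!/usr/bin/env bash
set -euo pipefail

RELEASE="${RELEASE:-certctl}"      # hypothetical release name
NAMESPACE="${NAMESPACE:-certctl}"  # hypothetical namespace
REVISION="${REVISION:-1}"          # placeholder; pick one from `helm history`

# Print the command in dry-run mode (default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List past revisions, then roll the release back to the chosen one.
rollback_release() {
  run helm history "$RELEASE" -n "$NAMESPACE"
  run helm rollback "$RELEASE" "$REVISION" -n "$NAMESPACE" --wait
}

rollback_release
```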
DEPL-L2 (Low) — Prometheus AlertManager rules
deploy/helm/certctl/templates/prometheusrules.yaml — opt-in via
monitoring.prometheusRules.enabled=true. Default OFF. Four
starter rules using verified metric names from
internal/api/handler/metrics.go:
CertctlCertificateExpiringSoon (certctl_certificate_expiring_soon)
CertctlAgentOffline ((agent_total - agent_online) > 0 for 1h)
CertctlJobFailureRateHigh (failure rate over 5% for 15m)
CertctlIssuanceFailures (any failures over 15m window)
All thresholds operator-tunable via
monitoring.prometheusRules.thresholds.* in values.
DEPL-L3 (Low) — Prometheus bearer-token setup runbook
docs/operator/runbooks/prometheus-bearer-token.md — documents
the API-key + Secret + values wiring for the RBAC-gated
/api/v1/metrics/prometheus scrape endpoint. End-to-end
procedure with troubleshooting steps + rotation guide.
CI guard: scripts/ci-guards/helm-templates-lint.sh
Six-combo matrix: defaults / backup PVC / backup S3 /
prometheusRules / migrations.viaHook / all-on. Each runs helm
template + checks render success. helm lint also gated.
Wired into the auto-pickup loop in .github/workflows/ci.yml;
azure/setup-helm@b9e51907 (v4.3.0, SHA-pinned per Phase 1
RED-2) installs helm v3.16.0 on the runner.
Verification (all pass):
ls deploy/helm/certctl/templates/{backup-cronjob,migration-job,prometheusrules}.yaml
grep -E 'updateStrategy|podManagementPolicy' deploy/helm/certctl/templates/postgres-statefulset.yaml # 2 matches
helm template deploy/helm/certctl/ --set backup.enabled=true \
--set monitoring.prometheusRules.enabled=true --set migrations.viaHook=true \
| grep -E "kind: (CronJob|PrometheusRule|Job)" # 3 matches
helm lint deploy/helm/certctl/ # 0 failed
ls docs/operator/runbooks/{rollback,prometheus-bearer-token}.md
bash scripts/ci-guards/helm-templates-lint.sh # 6/6 matrix combinations pass
Go build clean (cmd/server compiles, migrate-only path verified by
the build target). YAML validated.
Closes: cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H2
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M1
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M4
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M5
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L1
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L2
cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L3
88 lines
3.1 KiB
Bash
Executable File
#!/usr/bin/env bash
# scripts/ci-guards/helm-templates-lint.sh
#
# Phase 4 closure (2026-05-14): Helm chart lint + template-render gate.
#
# Runs `helm lint` against the chart and `helm template` against six
# representative value combinations to catch:
#   - Syntax errors in any chart template
#   - Schema violations in values.yaml
#   - Missing required values uncovered by the opt-in toggles
#     (backup, monitoring.prometheusRules, migrations.viaHook)
#   - Render errors when new templates are added without updating
#     this guard's coverage matrix
#
# The opt-in templates added in Phase 4 (backup-cronjob.yaml,
# prometheusrules.yaml, migration-job.yaml) default OFF; without
# explicit coverage in the guard's matrix they would never render in
# CI and silent breakage could ship.

set -euo pipefail

CHART_DIR="deploy/helm/certctl"

if [ ! -d "$CHART_DIR" ]; then
  echo "helm-templates-lint: skipped — $CHART_DIR not found (running outside repo root?)"
  exit 0
fi

if ! command -v helm >/dev/null 2>&1; then
  echo "helm-templates-lint: skipped — helm not on PATH."
  echo " Install: https://helm.sh/docs/intro/install/"
  exit 0
fi

echo "helm-templates-lint: running helm lint"
helm lint "$CHART_DIR" >/dev/null

# Minimal valid value set to satisfy chart preflight validators
# (server.tls.existingSecret, server.auth.apiKey, postgresql.auth.password).
# These are NOT real secrets — they're just non-empty strings to
# make the chart render in lint mode.
BASE_VALUES=(
  --set "server.tls.existingSecret=lint-test-tls"
  --set "server.auth.apiKey=lint-test-apikey"
  --set "postgresql.auth.password=lint-test-pgpass"
)

# Render the chart with BASE_VALUES plus any extra --set flags; on
# failure, print the tail of helm's output and return non-zero.
render_and_check() {
  local label="$1"
  shift
  local out
  out="$(helm template "$CHART_DIR" "${BASE_VALUES[@]}" "$@" 2>&1)" || {
    echo "helm-templates-lint: FAIL — template render error for '$label'"
    echo "$out" | tail -20
    return 1
  }
  echo "helm-templates-lint: OK — '$label'"
}

# Matrix:
#   1. Defaults (no Phase 4 opt-ins) — confirms the chart still
#      renders cleanly when every Phase 4 feature is off.
#   2. backup.enabled=true (PVC sink) — confirms backup-cronjob renders.
#   3. backup.enabled=true + sink=s3 — confirms S3 sink branch renders.
#   4. monitoring.prometheusRules.enabled=true — confirms PrometheusRule renders.
#   5. migrations.viaHook=true — confirms migration-job hook renders.
#   6. All Phase 4 opt-ins on simultaneously — confirms no template
#      interaction breaks the others.
render_and_check "defaults"
render_and_check "backup.enabled (pvc)" \
  --set "backup.enabled=true"
render_and_check "backup.enabled (s3)" \
  --set "backup.enabled=true" \
  --set "backup.sink=s3" \
  --set "backup.s3.bucket=lint-test-bucket"
render_and_check "monitoring.prometheusRules.enabled" \
  --set "monitoring.enabled=true" \
  --set "monitoring.prometheusRules.enabled=true"
render_and_check "migrations.viaHook" \
  --set "migrations.viaHook=true"
render_and_check "all phase 4 opt-ins" \
  --set "backup.enabled=true" \
  --set "monitoring.enabled=true" \
  --set "monitoring.prometheusRules.enabled=true" \
  --set "migrations.viaHook=true"

echo "helm-templates-lint: all matrix combinations rendered cleanly"