Files
certctl/deploy/helm
shankar0123 d6f4d5c5e8 deploy(helm): close Phase 4 — chart surface + DR + ops runbooks
Phase 4 of the certctl architecture diligence remediation closure.
Seven findings, all in deploy/helm/certctl/.

DEPL-H2 (High) — ship deploy/helm/certctl/templates/backup-cronjob.yaml
  Operator opt-in via backup.enabled=true. Default OFF. CronJob runs
  pg_dump --format=custom --no-owner --no-acl --dbname=certctl
  matching the canonical shape in
  docs/operator/runbooks/postgres-backup.md (so manual and
  automated dumps are byte-identical). Sink: PVC (default) OR S3
  via aws-cli. Documented as in-cluster-Postgres only — managed DB
  deployments rely on their provider's PITR.

DEPL-M1 (Med) — Helm pre-install/pre-upgrade migration hook
  deploy/helm/certctl/templates/migration-job.yaml — runs
  `certctl-server --migrate-only` before the server Deployment
  rolls. The --migrate-only flag (new in cmd/server/main.go) is a
  hermetic schema-mutation pass: load config, open DB pool, run
  RunMigrations + RunSeed, exit 0. No HTTP listener, no scheduler,
  no signing setup.

  Server's boot-time RunMigrations call is now gated on
  CERTCTL_MIGRATIONS_VIA_HOOK — when set true, the server skips
  the boot path (the hook owns the work). Default still runs at
  boot, so Compose / VM / bare-metal deploys are unchanged.

  migrations.viaHook: false in values.yaml (off by default).

DEPL-M4 (Med) — explicit Postgres StatefulSet strategy fields
  deploy/helm/certctl/templates/postgres-statefulset.yaml adds:
    spec.updateStrategy.type: OnDelete
    spec.podManagementPolicy: OrderedReady
  Operator-controlled Postgres upgrades (the OnDelete strategy
  means a chart template tweak no longer triggers an immediate
  Postgres restart). OrderedReady aligns with the standard
  Postgres-on-Kubernetes pattern for any future HA work.

DEPL-M5 (Med) — per-fleet-size resource ladder documentation
  deploy/helm/certctl/values.yaml — extended comments next to
  server.resources + agent.resources documenting:
    "≤ 500 certs / 100 agents" → defaults are validated
    "5K certs / 1K agents" → starter suggestions, TBD Phase 8
    "50K certs / 10K agents" → starter suggestions, TBD Phase 8
  Numbers for the small-fleet case derive from the measured
  baselines in docs/operator/performance-baselines.md
  (50ms p50, < 3s for 1000-cert inventory walk, etc.). Larger
  fleet numbers explicitly marked TBD pending Phase 8 load-test
  runs — operators tune empirically until then.

DEPL-L1 (Low) — Helm rollback runbook
  docs/operator/runbooks/rollback.md — covers helm rollback
  mechanics, the schema-migration manual-cleanup path (when
  *.down.sql files apply vs. when full restore is the only safe
  path), and the per-migration-class safe-to-rollback table.

DEPL-L2 (Low) — Prometheus AlertManager rules
  deploy/helm/certctl/templates/prometheusrules.yaml — opt-in via
  monitoring.prometheusRules.enabled=true. Default OFF. Four
  starter rules using verified metric names from
  internal/api/handler/metrics.go:
    CertctlCertificateExpiringSoon (certctl_certificate_expiring_soon)
    CertctlAgentOffline ((agent_total - agent_online) > 0 for 1h)
    CertctlJobFailureRateHigh (failure rate over 5% for 15m)
    CertctlIssuanceFailures (any failures over 15m window)
  All thresholds operator-tunable via
  monitoring.prometheusRules.thresholds.* in values.

DEPL-L3 (Low) — Prometheus bearer-token setup runbook
  docs/operator/runbooks/prometheus-bearer-token.md — documents
  the API-key + Secret + values wiring for the RBAC-gated
  /api/v1/metrics/prometheus scrape endpoint. End-to-end
  procedure with troubleshooting steps + rotation guide.

CI guard: scripts/ci-guards/helm-templates-lint.sh
  Six-combo matrix: defaults / backup PVC / backup S3 /
  prometheusRules / migrations.viaHook / all-on. Each runs helm
  template + checks render success. helm lint also gated.
  Wired into the auto-pickup loop in .github/workflows/ci.yml;
  azure/setup-helm@b9e51907 (v4.3.0, SHA-pinned per Phase 1
  RED-2) installs helm v3.16.0 on the runner.

Verification (all pass):
  ls deploy/helm/certctl/templates/{backup-cronjob,migration-job,prometheusrules}.yaml
  grep -E 'updateStrategy|podManagementPolicy' deploy/helm/certctl/templates/postgres-statefulset.yaml  # 2 matches
  helm template deploy/helm/certctl/ --set backup.enabled=true \
    --set monitoring.prometheusRules.enabled=true --set migrations.viaHook=true \
    | grep -E "kind: (CronJob|PrometheusRule|Job)"  # 3 matches
  helm lint deploy/helm/certctl/  # 0 failed
  ls docs/operator/runbooks/{rollback,prometheus-bearer-token}.md
  bash scripts/ci-guards/helm-templates-lint.sh  # 6/6 matrix combinations pass

Go build clean (cmd/server compiles, migrate-only path verified by
the build target). YAML validated.

Closes: cowork/certctl-architecture-diligence-audit.html#fix-DEPL-H2
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M1
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M4
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-M5
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L1
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L2
        cowork/certctl-architecture-diligence-audit.html#fix-DEPL-L3
2026-05-14 00:58:00 +00:00
..

Certctl Helm Chart

Production-ready Helm chart for deploying certctl (self-hosted certificate lifecycle management platform) on Kubernetes.

Table of Contents

  1. Quick Start
  2. Chart Features
  3. Prerequisites
  4. Installation
  5. Configuration
  6. Usage Examples
  7. Upgrading
  8. Uninstalling
  9. Architecture
  10. Security Considerations
  11. Troubleshooting

Quick Start

# Add the chart repository (when available)
helm repo add certctl https://charts.example.com
helm repo update

# Install with default values
helm install certctl certctl/certctl \
  --set server.auth.apiKey="your-secure-api-key" \
  --set postgresql.auth.password="your-secure-password"

# Check installation status
kubectl get pods -l app.kubernetes.io/instance=certctl

Chart Features

  • Server Deployment — certctl control plane with configurable replicas
  • PostgreSQL StatefulSet — Persistent database with automatic schema migration
  • Agent DaemonSet or Deployment — Flexible agent deployment (per-node or custom replicas)
  • Ingress Support — Optional HTTPS ingress with cert-manager integration
  • Security Contexts — Non-root containers, read-only filesystems, minimal capabilities
  • Resource Limits — Configurable CPU and memory requests/limits
  • Health Checks — Liveness and readiness probes on all containers
  • ConfigMaps and Secrets — Centralized configuration management
  • Service Account and RBAC — Optional cluster role bindings
  • Pod Disruption Budgets — HA-ready with configurable disruption budgets
  • Monitoring — Optional Prometheus ServiceMonitor support

Prerequisites

  • Kubernetes 1.19 or later
  • Helm 3.0 or later
  • Optional: cert-manager (for automatic TLS certificate provisioning)
  • Optional: Prometheus (for metrics scraping)

Installation

1. Using Chart from Repository

helm repo add certctl https://charts.example.com
helm repo update
helm install certctl certctl/certctl -f my-values.yaml

2. Using Local Chart

cd deploy/helm
helm install certctl certctl/ \
  --set server.auth.apiKey="$(openssl rand -base64 32)" \
  --set postgresql.auth.password="$(openssl rand -base64 32)"

3. Minimal Production Installation

helm install certctl certctl/certctl \
  --namespace certctl \
  --create-namespace \
  --set server.auth.apiKey="change-me" \
  --set postgresql.auth.password="change-me" \
  --set server.replicas=2 \
  --set server.resources.requests.cpu=200m \
  --set server.resources.requests.memory=256Mi \
  --set ingress.enabled=true \
  --set ingress.className=nginx \
  --set ingress.hosts[0].host=certctl.example.com

Configuration

Server Configuration

server:
  replicas: 1                    # Number of server replicas
  port: 8443                     # Service port
  auth:
    type: api-key               # Authentication type
    apiKey: "your-api-key"      # REQUIRED for production
  logging:
    level: info                 # Log level (debug, info, warn, error)
    format: json                # Output format
  issuer:
    local:
      enabled: true             # Enable local CA issuer
    acme:
      enabled: false            # Enable ACME issuer
      directoryURL: ""          # ACME directory URL
      email: ""                 # ACME registration email
      challengeType: "http-01"  # Challenge type (http-01, dns-01, dns-persist-01)

PostgreSQL Configuration

postgresql:
  enabled: true                 # Use managed PostgreSQL
  auth:
    database: certctl
    username: certctl
    password: "your-password"   # REQUIRED
  storage:
    size: 10Gi                  # PVC size
    storageClass: ""            # Use default StorageClass

Agent Configuration

agent:
  enabled: true                 # Deploy agents
  kind: DaemonSet              # DaemonSet (one per node) or Deployment
  replicas: 1                  # For Deployment kind only
  discoveryDirs: ""            # Comma-separated cert discovery paths
  nodeSelector: {}             # Node affinity for DaemonSet

Ingress Configuration

ingress:
  enabled: false
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: certctl.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: certctl-tls
      hosts:
        - certctl.example.com

See values.yaml for all available configuration options.

Usage Examples

Example 1: High Availability Setup

# ha-values.yaml
server:
  replicas: 3
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi

postgresql:
  storage:
    size: 50Gi

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values: [server]
      topologyKey: kubernetes.io/hostname

Deploy with:

helm install certctl certctl/certctl -f ha-values.yaml

Example 2: External PostgreSQL Database

# external-db-values.yaml
postgresql:
  enabled: false

server:
  env:
    CERTCTL_DATABASE_URL: "postgres://user:password@rds.example.com:5432/certctl?sslmode=require"

Deploy with:

helm install certctl certctl/certctl -f external-db-values.yaml

Example 3: ACME + Let's Encrypt

# acme-values.yaml
server:
  issuer:
    acme:
      enabled: true
      directoryURL: https://acme-v02.api.letsencrypt.org/directory
      email: admin@example.com
      challengeType: dns-01
      dnsPresentScript: /scripts/dns-present.sh
      dnsCleanupScript: /scripts/dns-cleanup.sh
      dnsPropagationWait: 30s

Example 4: Email Notifications via Slack + SMTP

# notifications-values.yaml
server:
  smtp:
    enabled: true
    host: smtp.example.com
    port: 587
    username: certctl@example.com
    password: "smtp-password"
    fromAddress: certctl@example.com
    useTLS: true

  notifiers:
    slack:
      enabled: true
      webhookUrl: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
      channel: "#certificates"

Upgrading

# Update chart repository
helm repo update

# Upgrade release
helm upgrade certctl certctl/certctl -f values.yaml

# View upgrade history
helm history certctl

# Rollback to previous version
helm rollback certctl 1

Uninstalling

# Delete the release (keeps data by default)
helm uninstall certctl

# Also delete persistent data
kubectl delete pvc --all -l app.kubernetes.io/instance=certctl

# Delete namespace
kubectl delete namespace certctl

Architecture

Components

┌──────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                           │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────┐                 ┌──────────────────┐  │
│  │ Ingress/LB      │                 │  Agent Pod 1     │  │
│  │ (optional)      │                 │  (DaemonSet)     │  │
│  └────────┬────────┘                 └──────────────────┘  │
│           │                                                  │
│           ▼                           ┌──────────────────┐  │
│  ┌─────────────────────────┐          │  Agent Pod 2     │  │
│  │ Server Deployment       │          │  (DaemonSet)     │  │
│  │ (1 to N replicas)       │          └──────────────────┘  │
│  │ - REST API              │                                 │
│  │ - Scheduler             │          ┌──────────────────┐  │
│  │ - UI Dashboard          │          │  Agent Pod N     │  │
│  └────────┬────────────────┘          │  (DaemonSet)     │  │
│           │                           └──────────────────┘  │
│           │                                                  │
│           ▼                                                  │
│  ┌──────────────────────────┐                               │
│  │ PostgreSQL StatefulSet   │                               │
│  │ - Database               │                               │
│  │ - PVC (persistent)       │                               │
│  └──────────────────────────┘                               │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Network Communication

  • Server → PostgreSQL: Internal cluster DNS (certctl-postgres:5432)
  • Agent → Server: Internal cluster DNS (certctl-server:8443)
  • External → Server: Via Ingress or Service (ClusterIP/LoadBalancer/NodePort)

Security Considerations

1. Secrets Management

All sensitive data is stored in Kubernetes Secrets:

  • PostgreSQL credentials
  • API keys
  • SMTP passwords
  • ACME account secrets

Best Practices:

  • Use sealed-secrets or external-secrets operator
  • Enable encryption at rest in etcd
  • Rotate secrets regularly
# Example: Using sealed-secrets
kubectl create secret generic certctl-api-key --from-literal=api-key="$(openssl rand -base64 32)" --dry-run=client -o yaml | kubeseal -f - | kubectl apply -f -

2. RBAC

The chart creates minimal RBAC by default:

  • ServiceAccount per release
  • ClusterRole (empty, extensible)
  • ClusterRoleBinding

To restrict further:

rbac:
  create: true
  # Add specific rules here

3. Pod Security

All containers run with:

  • Non-root user (UID 1000)
  • Read-only root filesystem
  • No privilege escalation
  • Dropped capabilities (ALL)

4. Network Policies

Restrict pod-to-pod communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: certctl-default-deny
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: certctl
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: certctl
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: certctl
    - to:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 53  # DNS
        - protocol: UDP
          port: 53

5. TLS/HTTPS

Enable HTTPS with cert-manager:

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Then configure Ingress with TLS.

6. API Key Security

For production:

  1. Generate a strong API key: openssl rand -base64 32
  2. Store securely (Vault, sealed-secrets, etc.)
  3. Never commit to Git
  4. Rotate periodically
# Generate and deploy API key
NEW_KEY=$(openssl rand -base64 32)
kubectl patch secret certctl-server -p "{\"data\":{\"api-key\":\"$(echo -n $NEW_KEY | base64)\"}}"

Troubleshooting

1. Pods Not Starting

# Check pod status
kubectl get pods -l app.kubernetes.io/instance=certctl
kubectl describe pod <pod-name>
kubectl logs <pod-name>

2. Database Connection Issues

# Verify PostgreSQL is running
kubectl get pods -l app.kubernetes.io/component=postgres
kubectl logs -l app.kubernetes.io/component=postgres

# Test connection from server pod
kubectl exec -it <server-pod> -- \
  psql postgres://certctl:password@certctl-postgres:5432/certctl

3. Agent Not Connecting

# Check agent logs
kubectl logs -l app.kubernetes.io/component=agent

# Verify server is reachable
kubectl exec -it <agent-pod> -- \
  wget -q -O - http://certctl-server:8443/health

4. Persistent Data Loss

# Check PVC status
kubectl get pvc

# Verify data is being stored
kubectl exec -it <postgres-pod> -- \
  ls -lah /var/lib/postgresql/data/postgres

5. Permission Denied Errors

The chart runs containers as non-root (UID 1000). If you see permission errors:

# Temporarily allow root for debugging
server:
  securityContext:
    runAsUser: 0  # NOT FOR PRODUCTION

6. Out of Memory

Increase resource limits:

helm upgrade certctl certctl/certctl \
  --set server.resources.limits.memory=1Gi \
  --set postgresql.resources.limits.memory=2Gi

7. Certificate Validation Issues

For self-signed certificates:

kubectl exec -it <pod> -- \
  CERTCTL_TLS_INSECURE_SKIP_VERIFY=true <command>

Common Issues and Solutions

Issue Solution
ImagePullBackOff Update server.image.repository to your registry
CrashLoopBackOff Check logs with kubectl logs <pod>
Pending PVC Check storage class availability
Connection timeout Verify network policies and service DNS
High memory usage Adjust postgresql.resources.limits and server.resources.limits

Support and Contributing

For issues, questions, or contributions, visit:

License

BSL-1.1