fix: end-to-end certificate lifecycle bugs + integration test environment

Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:

ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)

step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)

NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional

Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission

Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
  discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
  pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
shankar0123
2026-04-02 17:02:20 -04:00
parent 2238f28610
commit b059ec930f
19 changed files with 2102 additions and 84 deletions
+303
View File
@@ -0,0 +1,303 @@
# =============================================================================
# certctl Testing Environment — Docker Compose
# =============================================================================
#
# Spins up the full certctl platform with real CA backends for manual QA:
#
# 1. PostgreSQL 16 — database (clean, no demo data)
# 2. certctl-server — control plane API + web dashboard on :8443
# 3. certctl-agent — polls for work, deploys certs to NGINX
# 4. step-ca — private CA (JWK provisioner, auto-bootstraps)
# 5. Pebble — ACME test server (simulates Let's Encrypt)
# 6. pebble-challtestsrv — DNS/HTTP challenge test server for Pebble
# 7. NGINX — TLS target server on :8080 (HTTP) / :8444 (HTTPS)
#
# Usage:
# cd deploy
# docker compose -f docker-compose.test.yml up --build
#
# Dashboard: http://localhost:8443
# API key: test-key-2026
# NGINX: https://localhost:8444 (self-signed placeholder until cert deployed)
#
# See docs/test-env.md for the full walkthrough.
# =============================================================================
services:
# ---------------------------------------------------------------------------
# Database
# ---------------------------------------------------------------------------
postgres:
image: postgres:16-alpine
container_name: certctl-test-postgres
environment:
POSTGRES_DB: certctl
POSTGRES_USER: certctl
POSTGRES_PASSWORD: testpass
volumes:
- test_postgres_data:/var/lib/postgresql/data
- ../migrations/000001_initial_schema.up.sql:/docker-entrypoint-initdb.d/001_schema.sql
- ../migrations/000002_agent_metadata.up.sql:/docker-entrypoint-initdb.d/002_agent_metadata.sql
- ../migrations/000003_certificate_profiles.up.sql:/docker-entrypoint-initdb.d/003_certificate_profiles.sql
- ../migrations/000004_agent_groups.up.sql:/docker-entrypoint-initdb.d/004_agent_groups.sql
- ../migrations/000005_revocation.up.sql:/docker-entrypoint-initdb.d/005_revocation.sql
- ../migrations/000006_discovery.up.sql:/docker-entrypoint-initdb.d/006_discovery.sql
- ../migrations/000007_network_discovery.up.sql:/docker-entrypoint-initdb.d/007_network_discovery.sql
- ../migrations/000008_verification.up.sql:/docker-entrypoint-initdb.d/008_verification.sql
- ../migrations/seed.sql:/docker-entrypoint-initdb.d/010_seed.sql
- ../migrations/seed_test.sql:/docker-entrypoint-initdb.d/015_seed_test.sql
# No seed_demo.sql — start with a clean database for real testing
networks:
certctl-test:
ipv4_address: 10.30.50.2
healthcheck:
test: ["CMD-SHELL", "pg_isready -U certctl -d certctl"]
interval: 5s
timeout: 5s
retries: 5
restart: unless-stopped
# ---------------------------------------------------------------------------
# Pebble — ACME test server (simulates Let's Encrypt)
# ---------------------------------------------------------------------------
# Pebble is the official ACME test server from Let's Encrypt (RFC 8555).
# It validates challenges via the companion challtestsrv.
# Root CA cert available at https://pebble:15000/roots/0 (management API).
pebble-challtestsrv:
image: ghcr.io/letsencrypt/pebble-challtestsrv:latest
container_name: certctl-test-challtestsrv
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
# Matches the official Pebble docker-compose format.
# -doh "" disables DoH (default :8443 would conflict with certctl server).
# defaultIPv4 must point to the certctl-server (10.30.50.6) because that's where
# the ACME HTTP-01 challenge server runs (port 80 inside the container).
# Pebble resolves domains via challtestsrv, then connects to this IP to validate.
command: -defaultIPv4 10.30.50.6 -defaultIPv6 "" -doh ""
networks:
certctl-test:
ipv4_address: 10.30.50.3
restart: unless-stopped
pebble:
image: ghcr.io/letsencrypt/pebble:latest
container_name: certctl-test-pebble
depends_on:
- pebble-challtestsrv
environment:
PEBBLE_VA_NOSLEEP: 1
PEBBLE_VA_ALWAYS_VALID: 0
# ENTRYPOINT is /app (the binary). command: provides only the FLAGS.
command:
- -config
- /test/config/pebble-config.json
- -dnsserver
- "10.30.50.3:8053"
- -strict
volumes:
- ./test/pebble-config.json:/test/config/pebble-config.json:ro
networks:
certctl-test:
ipv4_address: 10.30.50.4
restart: unless-stopped
# ---------------------------------------------------------------------------
# step-ca — Private CA (Smallstep)
# ---------------------------------------------------------------------------
# Auto-bootstraps on first run: generates root CA + JWK provisioner "admin".
# Root cert: /home/step/certs/root_ca.crt (inside stepca_data volume)
# Provisioner key: /home/step/secrets/provisioner_key (encrypted JWK)
step-ca:
image: smallstep/step-ca:latest
container_name: certctl-test-stepca
environment:
DOCKER_STEPCA_INIT_NAME: "certctl-test-ca"
DOCKER_STEPCA_INIT_DNS_NAMES: "step-ca,localhost"
DOCKER_STEPCA_INIT_PROVISIONER_NAME: "admin"
DOCKER_STEPCA_INIT_PASSWORD: "password123"
DOCKER_STEPCA_INIT_ADDRESS: ":9000"
volumes:
- stepca_data:/home/step
networks:
certctl-test:
ipv4_address: 10.30.50.5
healthcheck:
test: ["CMD", "curl", "-fk", "https://localhost:9000/health"]
interval: 10s
timeout: 5s
start_period: 15s
retries: 10
restart: unless-stopped
# ---------------------------------------------------------------------------
# certctl Server (Control Plane)
# ---------------------------------------------------------------------------
# Connects to PostgreSQL, Pebble (ACME), step-ca, and Local CA.
#
# TLS trust problem: Pebble and step-ca use self-signed root CAs that
# aren't in Alpine's trust store. The ACME and step-ca connectors use
# Go's default http.Client (no InsecureSkipVerify), so they need the
# CA certs in the system trust store.
#
# Solution: setup-trust.sh runs as root, fetches Pebble CA from its
# management API, copies step-ca root cert from the shared volume,
# runs update-ca-certificates, then execs the server binary.
certctl-server:
build:
context: ..
dockerfile: Dockerfile
container_name: certctl-test-server
depends_on:
postgres:
condition: service_healthy
pebble:
condition: service_started
step-ca:
condition: service_healthy
# Run as root so update-ca-certificates can write to /etc/ssl/certs.
# Container isolation provides the security boundary.
user: "0:0"
entrypoint: ["/bin/sh", "/app/setup-trust.sh"]
environment:
# Database
CERTCTL_DATABASE_URL: postgres://certctl:testpass@postgres:5432/certctl?sslmode=disable
# Server
CERTCTL_SERVER_HOST: 0.0.0.0
CERTCTL_SERVER_PORT: 8443
CERTCTL_LOG_LEVEL: debug
# Auth — API key required (production-like)
CERTCTL_AUTH_TYPE: api-key
CERTCTL_AUTH_SECRET: test-key-2026
# Key generation — agent-side (production-like)
CERTCTL_KEYGEN_MODE: agent
# Local CA issuer (iss-local) — self-signed mode (no CA cert/key paths)
# This is the simplest issuer, always available.
# ACME issuer (iss-acme-staging) — pointed at Pebble
CERTCTL_ACME_DIRECTORY_URL: https://pebble:14000/dir
CERTCTL_ACME_EMAIL: test@certctl.dev
CERTCTL_ACME_CHALLENGE_TYPE: http-01
CERTCTL_ACME_INSECURE: "true"
# step-ca issuer (iss-stepca)
CERTCTL_STEPCA_URL: https://step-ca:9000
CERTCTL_STEPCA_ROOT_CERT: /stepca-data/certs/root_ca.crt
CERTCTL_STEPCA_PROVISIONER: admin
CERTCTL_STEPCA_PASSWORD: password123
CERTCTL_STEPCA_KEY_PATH: /stepca-data/secrets/provisioner_key
# Network scanning
CERTCTL_NETWORK_SCAN_ENABLED: "true"
# Post-deployment TLS verification
CERTCTL_VERIFY_DEPLOYMENT: "true"
CERTCTL_VERIFY_TIMEOUT: "10s"
CERTCTL_VERIFY_DELAY: "3s"
ports:
- "8443:8443"
volumes:
- ./test/setup-trust.sh:/app/setup-trust.sh:ro
# step-ca data volume (root cert at /certs/root_ca.crt, key at /secrets/provisioner_key)
- stepca_data:/stepca-data:ro
networks:
certctl-test:
ipv4_address: 10.30.50.6
healthcheck:
# /health requires auth when CERTCTL_AUTH_TYPE=api-key, so include the Bearer token
test: ["CMD", "curl", "-f", "-H", "Authorization: Bearer test-key-2026", "http://localhost:8443/health"]
interval: 10s
timeout: 5s
start_period: 30s
retries: 10
restart: unless-stopped
# ---------------------------------------------------------------------------
# NGINX — TLS Target Server
# ---------------------------------------------------------------------------
# The agent deploys certificates here via the shared nginx_certs volume.
# nginx-entrypoint.sh generates a self-signed placeholder cert so NGINX
# can boot before the agent deploys a real cert.
#
# Ports: 8080 (HTTP) / 8444 (HTTPS) — offset to avoid conflict with server.
nginx:
image: nginx:alpine
container_name: certctl-test-nginx
entrypoint: ["/bin/sh", "/entrypoint.sh"]
volumes:
- ./test/nginx.conf:/etc/nginx/nginx.conf:ro
- ./test/nginx-entrypoint.sh:/entrypoint.sh:ro
- nginx_certs:/etc/nginx/certs
ports:
- "8080:80"
- "8444:443"
networks:
certctl-test:
ipv4_address: 10.30.50.7
healthcheck:
test: ["CMD-SHELL", "curl -fk https://localhost/health || exit 1"]
interval: 10s
timeout: 5s
start_period: 15s
retries: 5
restart: unless-stopped
# ---------------------------------------------------------------------------
# certctl Agent
# ---------------------------------------------------------------------------
# Polls the server for work, generates ECDSA P-256 keys locally,
# deploys certs to NGINX via the shared volume, and discovers existing
# certs in the NGINX cert directory.
certctl-agent:
build:
context: ..
dockerfile: Dockerfile.agent
container_name: certctl-test-agent
depends_on:
certctl-server:
condition: service_healthy
environment:
CERTCTL_SERVER_URL: http://certctl-server:8443
CERTCTL_API_KEY: test-key-2026
CERTCTL_AGENT_NAME: test-agent-01
CERTCTL_AGENT_ID: agent-test-01
CERTCTL_KEYGEN_MODE: agent
CERTCTL_LOG_LEVEL: debug
CERTCTL_DISCOVERY_DIRS: /nginx-certs
volumes:
- agent_keys:/var/lib/certctl/keys
- nginx_certs:/nginx-certs
networks:
certctl-test:
ipv4_address: 10.30.50.8
restart: unless-stopped
# =============================================================================
# Network
# =============================================================================
# Static IPs are required because:
# - Pebble needs to know the challtestsrv DNS server address (10.30.50.3)
# - challtestsrv resolves all domains to certctl-server (10.30.50.6) for HTTP-01 challenges
# - Avoids DNS race conditions during startup
networks:
certctl-test:
driver: bridge
ipam:
config:
- subnet: 10.30.50.0/24
# =============================================================================
# Volumes
# =============================================================================
volumes:
test_postgres_data:
driver: local
stepca_data:
driver: local
agent_keys:
driver: local
nginx_certs:
driver: local
+27
View File
@@ -0,0 +1,27 @@
#!/bin/sh
# Generate a self-signed placeholder certificate so NGINX can boot
# before the certctl agent deploys a real certificate.
# Once the agent deploys, it overwrites these files and reloads NGINX.
CERT_DIR="/etc/nginx/certs"
mkdir -p "$CERT_DIR"
# Make cert directory world-writable so the certctl-agent container
# (which shares this volume) can overwrite the placeholder certs.
chmod 777 "$CERT_DIR"
if [ ! -f "$CERT_DIR/cert.pem" ]; then
echo "Generating self-signed placeholder certificate..."
apk add --no-cache openssl > /dev/null 2>&1
openssl req -x509 -nodes -days 1 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
-keyout "$CERT_DIR/key.pem" \
-out "$CERT_DIR/cert.pem" \
-subj "/CN=placeholder.certctl.test" \
2>/dev/null
# Make placeholder certs writable by the agent container
chmod 666 "$CERT_DIR/cert.pem" "$CERT_DIR/key.pem"
echo "Placeholder certificate generated."
fi
# Start NGINX in foreground
exec nginx -g "daemon off;"
+42
View File
@@ -0,0 +1,42 @@
# NGINX configuration for certctl test environment.
# The agent deploys certificates to /etc/nginx/certs/ and reloads NGINX.
# On startup, NGINX uses a self-signed placeholder so it can boot before any cert is deployed.
# Generate a self-signed placeholder on container start (see entrypoint in compose).
# Once the agent deploys a real cert, it overwrites these files and reloads.
events {
worker_connections 1024;
}
http {
# HTTP → redirect to HTTPS (optional, for realism)
server {
listen 80;
server_name _;
return 301 https://$host$request_uri;
}
# HTTPS server — serves whatever cert the agent has deployed
server {
listen 443 ssl;
server_name _;
ssl_certificate /etc/nginx/certs/cert.pem;
ssl_certificate_key /etc/nginx/certs/key.pem;
# Modern TLS settings
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
location / {
default_type text/plain;
return 200 'certctl test environment NGINX is serving TLS\n';
}
location /health {
default_type text/plain;
return 200 'ok\n';
}
}
}
+16
View File
@@ -0,0 +1,16 @@
{
"pebble": {
"listenAddress": "0.0.0.0:14000",
"managementListenAddress": "0.0.0.0:15000",
"certificate": "test/certs/localhost/cert.pem",
"privateKey": "test/certs/localhost/key.pem",
"httpPort": 80,
"tlsPort": 443,
"ocspResponderURL": "",
"externalAccountBindingRequired": false,
"retryAfter": {
"authz": 3,
"order": 5
}
}
}
+764
View File
@@ -0,0 +1,764 @@
#!/usr/bin/env bash
# =============================================================================
# certctl End-to-End Test Script
# =============================================================================
#
# Automates the full lifecycle test from docs/test-env.md:
# 1. Bring up all 7 containers (build from source)
# 2. Wait for every service to be healthy
# 3. Verify pre-seeded data (agents, issuers, targets)
# 4. Issue a certificate via Local CA → deploy to NGINX → verify TLS
# 5. Issue a certificate via ACME/Pebble → verify
# 6. Issue a certificate via step-ca → verify
# 7. Test revocation + CRL
# 8. Test discovery
# 9. Test renewal (re-issue step-ca cert, check version history)
# 10. Print summary
#
# Usage:
# cd certctl/deploy
# ./test/run-test.sh # full run (build + test)
# ./test/run-test.sh --no-build # skip docker build, reuse existing containers
# ./test/run-test.sh --no-teardown # leave containers running after test
#
# Requirements: docker, curl, openssl, jq (or python3 for json parsing)
# =============================================================================
set -euo pipefail
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
COMPOSE_FILE="docker-compose.test.yml"
API_URL="http://localhost:8443"
API_KEY="test-key-2026"
NGINX_TLS="localhost:8444"
AUTH_HEADER="Authorization: Bearer ${API_KEY}"
# Flags
BUILD=true
TEARDOWN=true
for arg in "$@"; do
case "$arg" in
--no-build) BUILD=false ;;
--no-teardown) TEARDOWN=false ;;
esac
done
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
CYAN='\033[0;36m'
BOLD='\033[1m'
NC='\033[0m' # No Color
PASS=0
FAIL=0
SKIP=0
pass() {
PASS=$((PASS + 1))
echo -e " ${GREEN}PASS${NC} $1"
}
fail() {
FAIL=$((FAIL + 1))
echo -e " ${RED}FAIL${NC} $1"
if [ -n "${2:-}" ]; then
echo -e " ${RED}$2${NC}"
fi
}
skip() {
SKIP=$((SKIP + 1))
echo -e " ${YELLOW}SKIP${NC} $1"
}
info() {
echo -e "${CYAN}==>${NC} $1"
}
header() {
echo ""
echo -e "${BOLD}─── $1 ───${NC}"
}
# API helper: GET endpoint, return JSON body. Exits 1 on HTTP error.
api_get() {
local path="$1"
curl -sf -H "${AUTH_HEADER}" "${API_URL}${path}" 2>/dev/null
}
# API helper: POST with optional JSON body
api_post() {
local path="$1"
local body="${2:-}"
if [ -n "$body" ]; then
curl -sf -X POST -H "${AUTH_HEADER}" -H "Content-Type: application/json" \
-d "$body" "${API_URL}${path}" 2>/dev/null
else
curl -sf -X POST -H "${AUTH_HEADER}" "${API_URL}${path}" 2>/dev/null
fi
}
# Wait for an HTTP endpoint to return 200. Retries with backoff.
wait_for_http() {
local url="$1"
local label="$2"
local max_wait="${3:-120}"
local elapsed=0
local interval=3
while [ $elapsed -lt $max_wait ]; do
if curl -sf -H "${AUTH_HEADER}" "$url" >/dev/null 2>&1; then
return 0
fi
sleep $interval
elapsed=$((elapsed + interval))
done
return 1
}
# Extract a field from JSON using python3 (no jq dependency)
json_field() {
python3 -c "import sys,json; d=json.load(sys.stdin); print($1)" 2>/dev/null
}
# Wait for a job to reach a terminal state (Completed or Failed)
# Usage: wait_for_job <cert_id> <max_seconds>
# Returns 0 if Completed, 1 if Failed/timeout
wait_for_jobs_done() {
local cert_id="$1"
local max_wait="${2:-180}"
local elapsed=0
local interval=5
while [ $elapsed -lt $max_wait ]; do
local jobs_json
jobs_json=$(api_get "/api/v1/jobs" 2>/dev/null || echo '{"data":[]}')
# Check if all jobs for this cert are in terminal state
# API returns jobs under "data" key (not "jobs")
local pending
pending=$(echo "$jobs_json" | python3 -c "
import sys, json
data = json.load(sys.stdin)
jobs = data.get('data') or data.get('jobs') or []
active = [j for j in jobs if j.get('certificate_id') == '$cert_id'
and j.get('status') not in ('Completed', 'Failed', 'Cancelled')]
print(len(active))
" 2>/dev/null || echo "99")
if [ "$pending" = "0" ]; then
# Check how many jobs exist and their terminal states
local job_counts
job_counts=$(echo "$jobs_json" | python3 -c "
import sys, json
data = json.load(sys.stdin)
jobs = data.get('data') or data.get('jobs') or []
mine = [j for j in jobs if j.get('certificate_id') == '$cert_id']
completed = len([j for j in mine if j.get('status') == 'Completed'])
failed = len([j for j in mine if j.get('status') in ('Failed', 'Cancelled')])
print(f'{len(mine)} {completed} {failed}')
" 2>/dev/null || echo "0 0 0")
local total_jobs completed_jobs failed_jobs
total_jobs=$(echo "$job_counts" | cut -d' ' -f1)
completed_jobs=$(echo "$job_counts" | cut -d' ' -f2)
failed_jobs=$(echo "$job_counts" | cut -d' ' -f3)
if [ "$completed_jobs" -gt 0 ]; then
return 0 # At least one job completed successfully
fi
if [ "$total_jobs" -gt 0 ] && [ "$failed_jobs" -gt 0 ]; then
return 1 # All jobs are in terminal state but none completed — all failed
fi
fi
sleep $interval
elapsed=$((elapsed + interval))
done
return 1
}
# Get the TLS cert subject from NGINX for a given SNI
get_tls_subject() {
local sni="$1"
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
| openssl x509 -noout -subject 2>/dev/null \
| sed 's/subject=//' | sed 's/^ *//'
}
get_tls_issuer() {
local sni="$1"
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
| openssl x509 -noout -issuer 2>/dev/null \
| sed 's/issuer=//' | sed 's/^ *//'
}
# Get the TLS cert SANs from NGINX for a given SNI
# Modern CAs (including Let's Encrypt / Pebble) put domains only in SAN, not Subject CN.
get_tls_san() {
local sni="$1"
echo | openssl s_client -connect "$NGINX_TLS" -servername "$sni" 2>/dev/null \
| openssl x509 -noout -ext subjectAltName 2>/dev/null \
| grep -i "DNS:" | sed 's/^ *//'
}
# Check if NGINX is serving a cert that matches the given domain (checks Subject then SAN)
check_tls_identity() {
local domain="$1"
local subject issuer san
subject=$(get_tls_subject "$domain")
issuer=$(get_tls_issuer "$domain")
san=$(get_tls_san "$domain")
if echo "$subject" | grep -qi "$domain" || echo "$san" | grep -qi "$domain"; then
echo "MATCH"
echo "Subject: $subject"
echo "SAN: $san"
echo "Issuer: $issuer"
else
echo "NO_MATCH"
echo "Subject: $subject"
echo "SAN: $san"
echo "Issuer: $issuer"
fi
}
# SQL exec in the postgres container
psql_exec() {
docker exec certctl-test-postgres psql -U certctl -d certctl -tAc "$1" 2>/dev/null
}
# ---------------------------------------------------------------------------
# Cleanup trap
# ---------------------------------------------------------------------------
cleanup() {
if [ "$TEARDOWN" = true ]; then
info "Tearing down test environment..."
docker compose -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true
else
info "Leaving containers running (--no-teardown)"
fi
}
# ---------------------------------------------------------------------------
# PHASE 0: Environment Check
# ---------------------------------------------------------------------------
header "Phase 0: Environment Check"
# Make sure we're in the deploy directory
if [ ! -f "$COMPOSE_FILE" ]; then
echo -e "${RED}ERROR: $COMPOSE_FILE not found.${NC}"
echo "Run this script from the certctl/deploy directory:"
echo " cd certctl/deploy && ./test/run-test.sh"
exit 1
fi
for cmd in docker curl openssl python3; do
if command -v "$cmd" >/dev/null 2>&1; then
pass "$cmd available"
else
fail "$cmd not found" "Install $cmd and try again"
exit 1
fi
done
if docker compose version >/dev/null 2>&1; then
pass "docker compose available"
else
fail "docker compose not available" "Install Docker Compose v2+"
exit 1
fi
# ---------------------------------------------------------------------------
# PHASE 1: Start the Stack
# ---------------------------------------------------------------------------
header "Phase 1: Start Test Environment"
# Teardown any previous run
info "Cleaning up previous test environment..."
docker compose -f "$COMPOSE_FILE" down -v >/dev/null 2>&1 || true
# Set the cleanup trap AFTER the initial teardown
trap cleanup EXIT
if [ "$BUILD" = true ]; then
info "Building and starting containers (this takes 2-5 minutes on first run)..."
docker compose -f "$COMPOSE_FILE" up --build -d 2>&1 | tail -5
else
info "Starting containers (--no-build)..."
docker compose -f "$COMPOSE_FILE" up -d 2>&1 | tail -5
fi
# ---------------------------------------------------------------------------
# PHASE 2: Wait for Services
# ---------------------------------------------------------------------------
header "Phase 2: Waiting for Services"
info "Waiting for PostgreSQL..."
if docker compose -f "$COMPOSE_FILE" exec -T postgres pg_isready -U certctl -d certctl >/dev/null 2>&1 ||
wait_for_http "${API_URL}/health" "postgres" 60; then
pass "PostgreSQL ready"
else
fail "PostgreSQL not ready after 60s"
fi
info "Waiting for certctl server..."
if wait_for_http "${API_URL}/health" "server" 120; then
pass "certctl server healthy"
# Show trust setup + connector init for debugging
echo " --- Server startup (trust setup) ---"
docker logs certctl-test-server 2>&1 | grep -E "trust|Added|Extract|provisioner|Pre-launch|key file|WARNING|CERTCTL_" | head -15
echo " ---"
else
fail "certctl server not healthy after 120s"
echo ""
echo "Server logs:"
docker logs certctl-test-server --tail 30
exit 1
fi
info "Waiting for NGINX..."
if wait_for_http "http://localhost:8080" "nginx" 30; then
pass "NGINX healthy"
else
# NGINX might not respond to plain curl on /health without the right path
# Check docker health instead
if docker inspect certctl-test-nginx --format='{{.State.Health.Status}}' 2>/dev/null | grep -q healthy; then
pass "NGINX healthy (docker healthcheck)"
else
skip "NGINX health check inconclusive (will verify via TLS later)"
fi
fi
# Give the agent a few seconds to register and send first heartbeat
info "Waiting for agent heartbeat (up to 45s)..."
AGENT_READY=false
for i in $(seq 1 15); do
AGENT_STATUS=$(api_get "/api/v1/agents/agent-test-01" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "")
if [ "$AGENT_STATUS" = "online" ]; then
AGENT_READY=true
break
fi
sleep 3
done
if [ "$AGENT_READY" = true ]; then
pass "Agent online"
else
skip "Agent not yet online (may be slow to heartbeat — continuing)"
fi
# ---------------------------------------------------------------------------
# PHASE 3: Verify Pre-Seeded Data
# ---------------------------------------------------------------------------
header "Phase 3: Verify Pre-Seeded Data"
# Agents
AGENT_COUNT=$(api_get "/api/v1/agents" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$AGENT_COUNT" -ge 2 ]; then
pass "Agents: $AGENT_COUNT found (agent-test-01 + server-scanner)"
else
fail "Agents: expected >= 2, got $AGENT_COUNT"
fi
# Issuers
ISSUER_COUNT=$(api_get "/api/v1/issuers" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$ISSUER_COUNT" -ge 3 ]; then
pass "Issuers: $ISSUER_COUNT found (iss-local, iss-acme-staging, iss-stepca)"
else
fail "Issuers: expected >= 3, got $ISSUER_COUNT" "Check seed_test.sql loaded correctly"
fi
# Targets
TARGET_COUNT=$(api_get "/api/v1/targets" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$TARGET_COUNT" -ge 1 ]; then
pass "Targets: $TARGET_COUNT found (target-test-nginx)"
else
fail "Targets: expected >= 1, got $TARGET_COUNT" "seed_test.sql may have failed after iss-local"
fi
# Profile
PROFILE_RESP=$(api_get "/api/v1/profiles" 2>/dev/null || echo '{"total":0}')
PROFILE_COUNT=$(echo "$PROFILE_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$PROFILE_COUNT" -ge 1 ]; then
pass "Profiles: $PROFILE_COUNT found (prof-test-tls)"
else
fail "Profiles: expected >= 1, got $PROFILE_COUNT"
fi
# Bail if seed data is broken
if [ "$ISSUER_COUNT" -lt 3 ] || [ "$TARGET_COUNT" -lt 1 ]; then
echo ""
echo -e "${RED}Seed data is incomplete. Cannot continue.${NC}"
echo "Check PostgreSQL logs: docker logs certctl-test-postgres"
exit 1
fi
# ---------------------------------------------------------------------------
# PHASE 4: Local CA Issuance
# ---------------------------------------------------------------------------
header "Phase 4: Local CA Certificate Issuance"
info "Creating certificate record mc-local-test..."
CREATE_RESP=$(api_post "/api/v1/certificates" '{
"id": "mc-local-test",
"name": "local-test-cert",
"common_name": "local.certctl.test",
"sans": ["local.certctl.test"],
"issuer_id": "iss-local",
"owner_id": "owner-test-admin",
"team_id": "team-test-ops",
"renewal_policy_id": "rp-default",
"certificate_profile_id": "prof-test-tls",
"environment": "development"
}' 2>/dev/null || echo "ERROR")
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-local-test'" 2>/dev/null; then
pass "Certificate record created"
else
fail "Certificate creation failed" "$CREATE_RESP"
fi
info "Linking certificate to NGINX target..."
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-local-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
pass "Target mapping inserted"
info "Triggering issuance..."
RENEW_RESP=$(api_post "/api/v1/certificates/mc-local-test/renew" 2>/dev/null || echo "ERROR")
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
pass "Issuance triggered"
else
fail "Trigger failed" "$RENEW_RESP"
fi
# Verify a job was created (this is the bug fix check)
sleep 2
JOB_COUNT=$(api_get "/api/v1/jobs" | python3 -c "
import sys, json
data = json.load(sys.stdin)
jobs = [j for j in (data.get('data') or data.get('jobs') or []) if j.get('certificate_id') == 'mc-local-test']
print(len(jobs))
" 2>/dev/null || echo "0")
if [ "$JOB_COUNT" -gt 0 ]; then
pass "Job created ($JOB_COUNT jobs for mc-local-test)"
else
fail "No jobs created — TriggerRenewalWithActor bug still present"
fi
info "Waiting for issuance + deployment (up to 180s)..."
if wait_for_jobs_done "mc-local-test" 180; then
pass "All jobs completed"
else
fail "Jobs did not complete within 180s"
echo " Current jobs:"
api_get "/api/v1/jobs" 2>/dev/null | python3 -m json.tool 2>/dev/null | head -30
fi
info "Reloading NGINX to pick up deployed certificate..."
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
sleep 3
info "Verifying TLS certificate on NGINX..."
TLS_CHECK=$(check_tls_identity "local.certctl.test")
TLS_RESULT=$(echo "$TLS_CHECK" | head -1)
if [ "$TLS_RESULT" = "MATCH" ]; then
pass "NGINX serving cert for local.certctl.test"
echo "$TLS_CHECK" | tail -n +2 | while read -r line; do echo -e " $line"; done
else
fail "NGINX not serving expected cert" "$(echo "$TLS_CHECK" | tail -n +2 | tr '\n' ', ')"
fi
# Check cert status in API
CERT_STATUS=$(api_get "/api/v1/certificates/mc-local-test" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
if [ "$CERT_STATUS" = "Active" ]; then
pass "Certificate status: Active"
else
skip "Certificate status: $CERT_STATUS (expected Active — may need more time)"
fi
# ---------------------------------------------------------------------------
# PHASE 5: ACME (Pebble) Issuance
# ---------------------------------------------------------------------------
header "Phase 5: ACME (Pebble) Certificate Issuance"
info "Creating certificate record mc-acme-test..."
CREATE_RESP=$(api_post "/api/v1/certificates" '{
"id": "mc-acme-test",
"name": "acme-test-cert",
"common_name": "acme.certctl.test",
"sans": ["acme.certctl.test"],
"issuer_id": "iss-acme-staging",
"owner_id": "owner-test-admin",
"team_id": "team-test-ops",
"renewal_policy_id": "rp-default",
"certificate_profile_id": "prof-test-tls",
"environment": "staging"
}' 2>/dev/null || echo "ERROR")
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-acme-test'" 2>/dev/null; then
pass "Certificate record created"
else
fail "Certificate creation failed" "$CREATE_RESP"
fi
info "Linking to target and triggering issuance..."
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-acme-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
RENEW_RESP=$(api_post "/api/v1/certificates/mc-acme-test/renew" 2>/dev/null || echo "ERROR")
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
pass "Issuance triggered"
else
fail "Trigger failed" "$RENEW_RESP"
fi
info "Waiting for ACME issuance + deployment (up to 180s)..."
if wait_for_jobs_done "mc-acme-test" 180; then
pass "All jobs completed"
info "Reloading NGINX to pick up deployed certificate..."
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
sleep 3
TLS_CHECK=$(check_tls_identity "acme.certctl.test")
TLS_RESULT=$(echo "$TLS_CHECK" | head -1)
if [ "$TLS_RESULT" = "MATCH" ]; then
pass "NGINX serving cert for acme.certctl.test"
echo "$TLS_CHECK" | tail -n +2 | while read -r line; do echo -e " $line"; done
else
fail "NGINX not serving expected ACME cert" "$(echo "$TLS_CHECK" | tail -n +2 | tr '\n' ', ')"
fi
else
fail "ACME jobs did not complete within 180s"
info "Checking ACME job status..."
api_get "/api/v1/jobs" 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
for j in data.get('data', []):
if j.get('certificate_id') == 'mc-acme-test':
print(f\" Job {j['id']}: type={j['type']} status={j['status']} error={j.get('last_error','')}\")" 2>/dev/null || true
echo " Server logs (last 20 lines):"
docker logs certctl-test-server --tail 20 2>&1 | grep -i "acme\|error\|fail\|CSR" | head -10 || true
fi
# ---------------------------------------------------------------------------
# PHASE 6: step-ca Issuance
# ---------------------------------------------------------------------------
header "Phase 6: step-ca (Private CA) Certificate Issuance"
info "Creating certificate record mc-stepca-test..."
CREATE_RESP=$(api_post "/api/v1/certificates" '{
"id": "mc-stepca-test",
"name": "stepca-test-cert",
"common_name": "stepca.certctl.test",
"sans": ["stepca.certctl.test"],
"issuer_id": "iss-stepca",
"owner_id": "owner-test-admin",
"team_id": "team-test-ops",
"renewal_policy_id": "rp-default",
"certificate_profile_id": "prof-test-tls",
"environment": "staging"
}' 2>/dev/null || echo "ERROR")
if echo "$CREATE_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert d.get('id')=='mc-stepca-test'" 2>/dev/null; then
pass "Certificate record created"
else
fail "Certificate creation failed" "$CREATE_RESP"
fi
info "Linking to target and triggering issuance..."
psql_exec "INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ('mc-stepca-test', 'target-test-nginx') ON CONFLICT DO NOTHING;"
RENEW_RESP=$(api_post "/api/v1/certificates/mc-stepca-test/renew" 2>/dev/null || echo "ERROR")
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
pass "Issuance triggered"
else
fail "Trigger failed" "$RENEW_RESP"
fi
info "Waiting for step-ca issuance + deployment (up to 120s)..."
if wait_for_jobs_done "mc-stepca-test" 120; then
pass "All jobs completed"
else
fail "Jobs did not complete in time"
info "Checking step-ca job status..."
api_get "/api/v1/jobs" 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
for j in data.get('data', []):
if j.get('certificate_id') == 'mc-stepca-test':
print(f\" Job {j['id']}: type={j['type']} status={j['status']} error={j.get('last_error','')}\")" 2>/dev/null || true
echo " Server logs (step-ca related):"
docker logs certctl-test-server --tail 30 2>&1 | grep -i "stepca\|step-ca\|provisioner\|jwe\|decrypt\|CSR.*fail\|error" | head -10 || true
fi
# ---------------------------------------------------------------------------
# PHASE 7: Revocation
# ---------------------------------------------------------------------------
header "Phase 7: Revocation"
info "Revoking mc-local-test (reason: superseded)..."
REVOKE_RESP=$(api_post "/api/v1/certificates/mc-local-test/revoke" '{"reason": "superseded"}' 2>/dev/null || echo "ERROR")
if echo "$REVOKE_RESP" | grep -qi "revoked\|status"; then
pass "Certificate revoked"
else
fail "Revocation failed" "$REVOKE_RESP"
fi
info "Checking CRL..."
CRL_RESP=$(api_get "/api/v1/crl" 2>/dev/null || echo '{"total":0}')
CRL_TOTAL=$(echo "$CRL_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$CRL_TOTAL" -ge 1 ]; then
pass "CRL contains $CRL_TOTAL revoked certificate(s)"
else
fail "CRL empty after revocation"
fi
CERT_STATUS=$(api_get "/api/v1/certificates/mc-local-test" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
if [ "$CERT_STATUS" = "Revoked" ]; then
pass "Certificate status updated to Revoked"
else
fail "Certificate status: $CERT_STATUS (expected Revoked)"
fi
# ---------------------------------------------------------------------------
# PHASE 8: Discovery
# ---------------------------------------------------------------------------
header "Phase 8: Certificate Discovery"
info "Checking discovered certificates..."
DISC_RESP=$(api_get "/api/v1/discovered-certificates" 2>/dev/null || echo '{"total":0}')
DISC_TOTAL=$(echo "$DISC_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$DISC_TOTAL" -ge 1 ]; then
pass "Discovered $DISC_TOTAL certificate(s) on filesystem"
else
skip "No discovered certificates yet (agent scan may not have run)"
fi
SUMMARY_RESP=$(api_get "/api/v1/discovery-summary" 2>/dev/null || echo '{}')
echo -e " Discovery summary: $SUMMARY_RESP"
# ---------------------------------------------------------------------------
# PHASE 9: Renewal (re-issue ACME cert)
# ---------------------------------------------------------------------------
header "Phase 9: Renewal"
# Try mc-stepca-test first (mc-local-test was revoked in Phase 7).
# Fall back to mc-acme-test if step-ca cert isn't Active.
RENEWAL_CERT=""
for candidate in mc-stepca-test mc-acme-test; do
STATUS=$(api_get "/api/v1/certificates/$candidate" 2>/dev/null | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
if [ "$STATUS" = "Active" ]; then
RENEWAL_CERT="$candidate"
break
fi
done
if [ -z "$RENEWAL_CERT" ]; then
skip "Cannot test renewal — no certificate in Active state"
else
info "Using $RENEWAL_CERT for renewal test..."
info "Triggering renewal on $RENEWAL_CERT..."
RENEW_RESP=$(api_post "/api/v1/certificates/$RENEWAL_CERT/renew" 2>/dev/null || echo "ERROR")
if echo "$RENEW_RESP" | grep -q "renewal_triggered\|status"; then
pass "Renewal triggered"
else
skip "Renewal trigger returned: $RENEW_RESP"
fi
info "Waiting for renewal to complete (up to 180s)..."
if wait_for_jobs_done "$RENEWAL_CERT" 180; then
pass "Renewal jobs completed"
info "Reloading NGINX to pick up renewed certificate..."
docker exec certctl-test-nginx nginx -s reload 2>/dev/null || true
sleep 3
# Verify version history shows multiple versions
VERSIONS=$(api_get "/api/v1/certificates/$RENEWAL_CERT/versions" 2>/dev/null | python3 -c "import sys,json; d=json.load(sys.stdin); print(len(d) if isinstance(d, list) else d.get('total', 0))" 2>/dev/null || echo 0)
if [ "$VERSIONS" -ge 2 ]; then
pass "Certificate has $VERSIONS versions (original + renewal)"
else
skip "Expected 2+ versions, got $VERSIONS"
fi
else
skip "Renewal jobs did not complete within 180s"
fi
fi
# ---------------------------------------------------------------------------
# PHASE 10: API Spot Checks
# ---------------------------------------------------------------------------
header "Phase 10: API Spot Checks"
# Health
if api_get "/health" >/dev/null 2>&1; then
pass "GET /health returns 200"
else
fail "GET /health failed"
fi
# Metrics
METRICS_RESP=$(api_get "/api/v1/metrics" 2>/dev/null || echo "ERROR")
if echo "$METRICS_RESP" | python3 -c "import sys,json; d=json.load(sys.stdin); assert 'gauge' in d" 2>/dev/null; then
pass "GET /api/v1/metrics returns valid JSON"
else
fail "Metrics endpoint broken"
fi
# Stats summary
STATS_RESP=$(api_get "/api/v1/stats/summary" 2>/dev/null || echo "ERROR")
if echo "$STATS_RESP" | python3 -c "import sys,json; json.load(sys.stdin)" 2>/dev/null; then
pass "GET /api/v1/stats/summary returns valid JSON"
else
fail "Stats summary endpoint broken"
fi
# Audit trail
AUDIT_RESP=$(api_get "/api/v1/audit" 2>/dev/null || echo '{"total":0}')
AUDIT_TOTAL=$(echo "$AUDIT_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$AUDIT_TOTAL" -gt 0 ]; then
pass "Audit trail: $AUDIT_TOTAL events recorded"
else
fail "Audit trail empty"
fi
# Jobs summary
JOBS_RESP=$(api_get "/api/v1/jobs" 2>/dev/null || echo '{"total":0}')
JOBS_TOTAL=$(echo "$JOBS_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
pass "Total jobs created: $JOBS_TOTAL"
# Prometheus
PROM_RESP=$(curl -sf -H "${AUTH_HEADER}" "${API_URL}/api/v1/metrics/prometheus" 2>/dev/null || echo "")
if echo "$PROM_RESP" | grep -q "certctl_certificate_total"; then
pass "Prometheus metrics endpoint working"
else
fail "Prometheus metrics endpoint broken"
fi
# ---------------------------------------------------------------------------
# Summary
# ---------------------------------------------------------------------------
header "Test Summary"
TOTAL=$((PASS + FAIL + SKIP))
echo ""
echo -e " ${GREEN}Passed: $PASS${NC}"
echo -e " ${RED}Failed: $FAIL${NC}"
echo -e " ${YELLOW}Skipped: $SKIP${NC}"
echo -e " Total: $TOTAL"
echo ""
if [ "$FAIL" -eq 0 ]; then
echo -e "${GREEN}${BOLD}All tests passed.${NC}"
exit 0
else
echo -e "${RED}${BOLD}$FAIL test(s) failed.${NC}"
echo ""
echo "Useful debug commands:"
echo " docker logs certctl-test-server --tail 50"
echo " docker logs certctl-test-agent --tail 50"
echo " docker compose -f $COMPOSE_FILE ps"
exit 1
fi
+140
View File
@@ -0,0 +1,140 @@
#!/bin/sh
# This script runs inside the certctl-server container at startup.
# It fetches CA certificates from Pebble and step-ca, adds them to the
# system trust store, then starts the certctl server.
#
# Why: The ACME connector and step-ca connector use Go's default http.Client
# with no InsecureSkipVerify. They rely on the system trust store to verify
# TLS connections. Pebble and step-ca both use self-signed root CAs that
# aren't in Alpine's default CA bundle, so we must add them manually.
#
# This script runs as root (user: "0:0" in docker-compose) so that
# update-ca-certificates can write to /etc/ssl/certs/.
set -e
echo "=== certctl trust store setup ==="
# --- Pebble CA cert (fetched from management API) ---
# Pebble's management API serves the root CA at /roots/0.
# We use -k because we can't verify Pebble's TLS cert yet (chicken-and-egg).
echo "Fetching Pebble root CA from management API..."
PEBBLE_CA=""
for i in 1 2 3 4 5 6 7 8 9 10; do
if PEBBLE_CA=$(curl -sk https://pebble:15000/roots/0 2>/dev/null); then
if [ -n "$PEBBLE_CA" ]; then
echo "$PEBBLE_CA" > /usr/local/share/ca-certificates/pebble-ca.crt
echo " Added: Pebble test CA"
break
fi
fi
echo " Waiting for Pebble (attempt $i/10)..."
sleep 2
done
if [ -z "$PEBBLE_CA" ]; then
echo " WARNING: Could not fetch Pebble CA. ACME issuance will fail."
fi
# --- step-ca root cert (from shared volume) ---
# The step-ca container writes its root CA to /home/step/certs/root_ca.crt.
# We mount the step-ca data volume at /stepca-data inside this container.
STEPCA_ROOT="/stepca-data/certs/root_ca.crt"
echo "Waiting for step-ca root cert..."
for i in 1 2 3 4 5 6 7 8 9 10; do
if [ -f "$STEPCA_ROOT" ]; then
cp "$STEPCA_ROOT" /usr/local/share/ca-certificates/step-ca-root.crt
echo " Added: step-ca root CA"
break
fi
echo " Waiting for step-ca root cert (attempt $i/10)..."
sleep 2
done
if [ ! -f "$STEPCA_ROOT" ]; then
echo " WARNING: step-ca root cert not found at $STEPCA_ROOT"
echo " step-ca issuance may fail until the cert is available."
fi
# --- step-ca provisioner key (extracted from ca.json) ---
# When step-ca auto-bootstraps via DOCKER_STEPCA_INIT_* env vars, the
# encrypted provisioner key (JWE) is NOT written as a separate file.
# Instead, it's embedded in ca.json under:
# authority.provisioners[0].encryptedKey
# We extract it here and write to /tmp so the certctl server can read it.
# The stepca_data volume is mounted :ro, so we can't write there.
STEPCA_CA_JSON="/stepca-data/config/ca.json"
STEPCA_KEY_EXTRACTED="/tmp/step-ca-provisioner-key"
echo "Extracting step-ca provisioner key from ca.json..."
for i in 1 2 3 4 5 6 7 8 9 10; do
if [ -f "$STEPCA_CA_JSON" ]; then
# Extract the encryptedKey value using grep+sed (no jq in Alpine base)
# The field looks like: "encryptedKey": "eyJhbGciOi..."
ENCRYPTED_KEY=$(grep -o '"encryptedKey":"[^"]*"' "$STEPCA_CA_JSON" | head -1 | sed 's/"encryptedKey":"//;s/"$//')
if [ -z "$ENCRYPTED_KEY" ]; then
# Try with spaces around colon (JSON formatting varies)
ENCRYPTED_KEY=$(grep -o '"encryptedKey" *: *"[^"]*"' "$STEPCA_CA_JSON" | head -1 | sed 's/"encryptedKey" *: *"//;s/"$//')
fi
if [ -n "$ENCRYPTED_KEY" ]; then
# Check if it's JWE compact serialization (dot-separated) or JSON serialization
case "$ENCRYPTED_KEY" in
\{*)
# Already JSON serialization — write as-is
echo "$ENCRYPTED_KEY" > "$STEPCA_KEY_EXTRACTED"
;;
*)
# JWE compact serialization: header.encrypted_key.iv.ciphertext.tag
# Convert to JSON serialization expected by Go decryptProvisionerKey()
JWE_PROTECTED=$(echo "$ENCRYPTED_KEY" | cut -d. -f1)
JWE_ENCKEY=$(echo "$ENCRYPTED_KEY" | cut -d. -f2)
JWE_IV=$(echo "$ENCRYPTED_KEY" | cut -d. -f3)
JWE_CT=$(echo "$ENCRYPTED_KEY" | cut -d. -f4)
JWE_TAG=$(echo "$ENCRYPTED_KEY" | cut -d. -f5)
printf '{"protected":"%s","encrypted_key":"%s","iv":"%s","ciphertext":"%s","tag":"%s"}' \
"$JWE_PROTECTED" "$JWE_ENCKEY" "$JWE_IV" "$JWE_CT" "$JWE_TAG" > "$STEPCA_KEY_EXTRACTED"
;;
esac
echo " Extracted provisioner key to $STEPCA_KEY_EXTRACTED"
echo " Key file size: $(wc -c < "$STEPCA_KEY_EXTRACTED") bytes"
echo " Key starts with: $(head -c 40 "$STEPCA_KEY_EXTRACTED")..."
# Override the env var so the server reads from the extracted file
export CERTCTL_STEPCA_KEY_PATH="$STEPCA_KEY_EXTRACTED"
break
else
echo " ca.json found but encryptedKey not found in it (attempt $i/10)"
fi
else
echo " Waiting for step-ca ca.json (attempt $i/10)..."
fi
sleep 2
done
if [ ! -f "$STEPCA_KEY_EXTRACTED" ]; then
echo " WARNING: Could not extract step-ca provisioner key"
echo " Listing /stepca-data/config/ for debugging:"
ls -la /stepca-data/config/ 2>/dev/null || echo " /stepca-data/config/ does not exist"
echo " step-ca issuance will fail."
fi
# --- Update system trust store ---
echo "Updating system CA trust store..."
update-ca-certificates 2>/dev/null || true
echo "Trust store updated."
# --- Debug: verify configuration before starting server ---
echo "=== Pre-launch verification ==="
echo " CERTCTL_STEPCA_KEY_PATH=$CERTCTL_STEPCA_KEY_PATH"
if [ -f "$CERTCTL_STEPCA_KEY_PATH" ]; then
echo " step-ca key file exists ($(wc -c < "$CERTCTL_STEPCA_KEY_PATH") bytes)"
echo " step-ca key preview: $(head -c 60 "$CERTCTL_STEPCA_KEY_PATH")..."
else
echo " WARNING: step-ca key file NOT FOUND at $CERTCTL_STEPCA_KEY_PATH"
fi
echo " CERTCTL_ACME_DIRECTORY_URL=$CERTCTL_ACME_DIRECTORY_URL"
echo " CERTCTL_ACME_INSECURE=$CERTCTL_ACME_INSECURE"
echo " Pebble CA cert: $(ls -la /usr/local/share/ca-certificates/pebble-ca.crt 2>/dev/null || echo 'NOT FOUND')"
echo " step-ca root cert: $(ls -la /usr/local/share/ca-certificates/step-ca-root.crt 2>/dev/null || echo 'NOT FOUND')"
echo " System CA count: $(ls /etc/ssl/certs/*.pem 2>/dev/null | wc -l) PEM files"
echo "=== Starting certctl server ==="
exec /app/server