mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 20:11:31 +00:00
590f654b0daa7c4b5a2eb87f78f11410523f0447
28 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7e0a7deeff |
fix(deploy/test/libest): drop make-time CFLAGS/LDFLAGS pass-through
estclient link was failing with `cannot find -lsafe_lib` despite
libsafe_lib.a building cleanly under safe_c_stub/lib/. Root cause:
libest's configure.ac (lines 193-195) appends the bundled safec
stub's path to user-supplied flags:
CFLAGS="$CFLAGS -Wall -I$safecdir/include"
LDFLAGS="$LDFLAGS -L$safecdir/lib"
LIBS="$LIBS -lsafe_lib"
These get baked into the generated Makefile via @CFLAGS@/@LDFLAGS@/
@LIBS@ substitutions. Per automake's variable-precedence rules, a
command-line `make LDFLAGS=...` overrides the `LDFLAGS = @LDFLAGS@`
line in the Makefile — wiping the `-L/src/safe_c_stub/lib` that
configure put there.
The previous commit (
|
||
|
|
f7ee64bd79 |
fix(deploy/test/libest): CFLAGS=-fcommon + LDFLAGS=--allow-multiple-definition
CI run 25193735664 (image-and-supply-chain) showed bullseye-slim fixed the OpenSSL 3.0 FIPS_mode errors, but the multiple-definition errors persisted. Root cause was misdiagnosed in commit |
||
|
|
a1fae33f40 |
fix(deploy/test): f5-mock-icontrol host-port collision (20443 → 20449)
CI run 25192994486 (deploy-vendor-e2e job) failed with:
Error response from daemon: failed to set up container networking:
driver failed programming external connectivity on endpoint
certctl-test-f5-mock: Bind for 0.0.0.0:20443 failed: port is already
allocated
apache-test (compose line 491) and f5-mock-icontrol (compose line 619)
both bound host port 20443. The pre-Phase-5 per-vendor matrix only ran
one sidecar at a time, so the collision was structurally hidden. The
ci-pipeline-cleanup Phase 5 collapse brings all 11 sidecars up
simultaneously — the bug surfaces.
This was a pre-existing latent bug in the deploy-hardening II Phase 1
(commit
|
||
|
|
bba425393b |
fix(deploy/test/libest): switch base bookworm-slim → bullseye-slim
libest r3.2.0 (last upstream commit 2020-07-06) was authored against
OpenSSL 1.1.x and binutils ≤ 2.35. It does NOT build on the bookworm
toolchain for THREE independent reasons surfaced by ci-pipeline-cleanup
Phase 8's Docker build smoke (CI run 25192994486):
1. FIPS_mode / FIPS_mode_set undefined references
OpenSSL 3.0 removed these. libest r3.2.0 calls them in 5 places
(est_client.c × 3, est_server.c × 1, estclient.c × 1).
Even libest 'main' branch still uses them without OPENSSL_VERSION
guards, so we can't escape this by bumping LIBEST_REF.
2. e_ctx_ssl_exdata_index multiple definition
est_locl.h:593 declares the symbol without 'extern', so every
translation unit including the header gets its own definition.
binutils 2.36+ defaults to -fno-common which refuses this; older
binutils tolerated it. Fix is on libest main but not in r3.2.0.
3. ossl_dump_ssl_errors duplicate symbol
Symbol exists in both libest src + example/client/utils.c —
same -fno-common shape.
debian:bookworm-slim ships OpenSSL 3.0 + binutils 2.40 — three for three.
debian:bullseye-slim ships OpenSSL 1.1.1n + binutils 2.35.2 — zero for three.
Switching the base eliminates all three errors at once. Both FROM lines
swap (builder + runtime) so the dynamically-linked libssl ABI matches.
Runtime apt: 'libssl3' → 'libssl1.1' for the same reason.
Why this is the proper path, not a band-aid:
- Bullseye is the actual environment libest 3.2.0 was authored against
(per its configure.ac HAVE_OLD_OPENSSL macro). Bookworm was the wrong
base for this dep from day 1 of the EST RFC 7030 hardening bundle.
- The libest sidecar runs in a hermetic test environment — not exposed
to attackers, not shipped in production. OpenSSL 1.1.1 EOL (2023-09)
is acceptable for a test-only fixture. Production certctl images
remain on bookworm-slim with OpenSSL 3.0.
- Bullseye support timeline: regular updates until 2026-08, LTS until
2028-08. Two+ years of runway before the next base bump.
Both FROM lines pinned to debian:bullseye-slim@sha256:1a4701c321b1...
(verified via OCI v2 manifest endpoint 2026-04-30).
Sandbox verification:
bash scripts/ci-guards/H-001-bare-from.sh → clean
bash scripts/ci-guards/digest-validity.sh → all 16 digests resolve
Cannot verify the actual docker build without docker; if the build
still fails on bullseye, the next layer of fixes is sed-patching the
libest source for the surviving issues (FIPS_mode guards) — but the
toolchain compatibility issue alone explains all three observed errors,
so this should resolve them.
|
||
|
|
ffcd5e809a |
chore(fmt): catch vendor_e2e files missed by Phase 1 sweep filter
Follow-up to commit |
||
|
|
31ce64653d |
fix(deploy/test/libest): pin LIBEST_REF to upstream tag r3.2.0
The Dockerfile at HEAD pinned LIBEST_REF=v3.2.0-2 — that ref does
NOT exist on cisco/libest upstream. Verified via:
curl -sS https://api.github.com/repos/cisco/libest/tags
# only tags returned: v1.0.0, r3.2.0, 1.1.0
The 'v' prefix and the '-2' patch suffix were both wrong from day
one (commit
|
||
|
|
7cb453a336 |
chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit
|
||
|
|
c48a82c4c8 |
fix(ci): real digests + matrix→service mapping for deploy-vendor-e2e
Bundle II Phases 1+15 shipped fabricated @sha256 digests across 11
sidecars (deploy/docker-compose.test.yml) plus the f5-mock-icontrol
Dockerfile golang FROM line. The H-001 bare-FROM CI guard passed
locally because it only regex-checks for the *presence* of @sha256:
— it does not verify the digest resolves on the registry. Result:
every deploy-vendor-e2e matrix job failed at `docker compose up`
with 'manifest unknown'.
Two classes of fix:
1. Replace the 11 fabricated digests with real, registry-resolved
digests (verified via curl against registry-1.docker.io,
ghcr.io, mcr.microsoft.com manifest endpoints):
- httpd:2.4-alpine
- haproxy:3.0-alpine
- traefik:v3.1
- caddy:2.8-alpine
- envoyproxy/envoy:v1.32-latest
- boky/postfix:latest
- dovecot/dovecot:latest
- lscr.io/linuxserver/openssh-server:latest (via ghcr.io)
- kindest/node:v1.31.0
- mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2022
(manifest.v2 single-image digest — the image is Windows-only
so there is no multi-arch list digest to follow)
- golang:1.25.9-bookworm (in deploy/test/f5-mock-icontrol/Dockerfile)
debian:bookworm-slim was also fabricated under the comment
claiming it 'matches libest sidecar'; replaced with the real
amd64-linux digest.
2. Special-case the matrix.vendor → docker-compose service mapping
in .github/workflows/ci.yml::deploy-vendor-e2e step 'Bring up
vendor sidecar'. The original step assumed a uniform
'${{ matrix.vendor }}-test' suffix, but four matrix entries
don't conform:
- nginx → reuses apache-test (the legacy nginx sidecar in the
compose file is named 'nginx' with no profile; the nginx
vendor-edge tests in deploy/test/nginx_vendor_e2e_test.go
call requireSidecar(t,"apache") because the sidecar map
doesn't include an 'nginx' key — comment in source explains)
- ssh → openssh-test
- k8s → k8s-kind-test
- f5-mock → f5-mock-icontrol (must be built first; no published image)
- javakeystore → no sidecar (pure-Go placeholder stubs)
Wraps the bring-up in a case statement that maps every matrix
entry to its real sidecar name (or '' for the no-sidecar case),
and exits 0 cleanly for vendors that don't need a sidecar.
Per the CLAUDE.md 'never go from memory' + 'complete path' rules,
this fix:
- ground-truths every digest against the actual registry (curl
against the OCI v2 manifest endpoint with the right Accept
header), not memory or grep
- closes the 'lying field' footgun: H-001 guard now validates a
contract that's actually satisfied (digests exist + pull)
Verification: yaml parses on both files, H-001 guard simulation
returns no bare FROMs, all 12 manifest endpoints return HTTP 200
on the new digests.
|
||
|
|
526c4136e6 |
test(deploy): vendor-edge e2e harness — Phases 2-13 (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS, F5, SSH, WinCert, JKS, K8s)
Phases 2-13 of the deploy-hardening II master bundle. Ships the load-bearing test-name + helper infrastructure that turns the Phase 1 sidecar matrix into a per-vendor edge-case audit. 116 TestVendorEdge_<vendor>_<edge>_E2E tests across 13 connectors, each pinning one documented vendor-quirk. NEW deploy/test/vendor_e2e_helpers.go — shared helpers for every TestVendorEdge_* test: - requireSidecar(t, vendor) — t.Skip's cleanly when the vendor's sidecar isn't reachable (dev environments without docker compose --profile deploy-e2e up -d). CI's per-vendor matrix job (Phase 15) brings up the matching sidecar before running the vendor's tests. - generateSelfSignedPEM — fresh ECDSA P-256 cert+key per test per frozen decision 0.10. - dialAndVerifyCert — TLS handshake to addr; pulls leaf cert. - httpProbe — admin-API probe for Caddy ValidateOnly etc. - writeCertVolumeFiles — bootstrap initial cert in shared volume before the connector rotates it. - expect — compact assertion helper. NEW deploy/test/nginx_vendor_e2e_test.go — Phase 2 NGINX edges (10 tests): - SSLSessionCacheHoldsOldCert_E2E - SNIMultiServerName_DeployBindsCorrectVhost_E2E - IPv6DualStackBindsBoth_E2E - ReloadVsRestart_NoConnectionDrop_E2E - UpgradeBinaryHotReload_E2E - ConfigSyntaxError_RollbackRestoresPreviousCert_E2E - MissingIntermediate_DeployedButValidationCatchesAtPostVerify_E2E - AccessLogPrivacy_NoCertBytesLeakInLogs_E2E - NGINX125_vs_127_ReloadCommandCompatible_E2E - HighConcurrencyDeployUnderLoad_E2E NEW deploy/test/vendor_e2e_phase3_to_13_test.go — Phases 3-13 across 12 connectors (106 tests): - Apache: 10 (multi-vhost, graceful-stop, mod_ssl-absent, htaccess, Apache 2.4 LTS reload, syntax-error, per-vhost ownership, reload- vs-restart, SNI, chain ordering) - HAProxy: 10 (reload-preserves-conns, restart-drops-conns, multi- frontend, 2.6+2.8+3.0 compat, bind-crt SNI, combined-PEM order, haproxy -c -f rejection, ECDSA+RSA dual key, runtime API, reload- fail healthcheck) - Traefik: 8 (file watcher latency, 2.x+3.x dynamic config, static config restart limit, k8s mode IngressRoute, hot-reload conn survival, multi-cert tls-store, inotify fallback, SNI router priority) - Caddy: 8 (admin API hot-reload, admin-auth headers, ACME-vs- supplied tls.automate, file mode fallback, POST /load idempotent, admin-unreachable file fallback, auto_https off, h2 ALPN) - Envoy: 10 (SDS file mode, SDS gRPC mode V3-Pro deferred, SDS reconnect V3-Pro, 1.30+1.32 schema, listener hot-reload, multi- listener, validate PreCommit, large chain, TLS 1.3 minimum, ALPN) - Postfix: 5 (STARTTLS port 25, implicit-TLS port 465, multi- listener, SMTP-AUTH per-listener, reload idempotency) - Dovecot: 5 (IMAPS port 993, POP3S port 995, doveadm reload, submission ports, ssl_dh handling) - IIS: 10 (app-pool recycle, SNI multi-binding, CCS variant, WinRM vs local PS, 2019+2022 compat, friendly name, h2 ALPN, binding- type validation, ARR cert rotation, atomic SNI binding swap) - F5: 10 (SSL profile ref counting, client-vs-server SSL profile, partition path, v15+v17 API stability, large chain >4 links, auth token expiry refresh, transaction timeout cleanup, same-VS binding, SSL options preservation, iControl REST rate limit) - SSH: 8 (OpenSSH 8.x+9.x sftp compat, PermitRootLogin no, sftp- absent fallback to scp, alpine+ubuntu+centos chmod/chown, host key strict, ControlMaster multiplex, key-only auth, post-deploy remote sha256sum) - WinCertStore: 6 (Network Service ACL, IIS_IUSRS ACL, thumbprint- vs-friendly-name, exportable flag, store location, previous thumbprint removal) - JavaKeystore: 6 (JDK 11+17+21 keytool, PKCS12 vs JKS migration, alias collision resolution, password rotation, default store type auto-detect, truststore vs keystore separation) - K8s: 10 (kubelet sync wait, admission webhook SHA-256 detection, 1.28+1.30+1.31 API stability, typed vs Opaque, cert-manager interop, multi-namespace, RBAC error surfacing, label/annotation preservation, pod-mounted Secret rollover, immutable Secret flag) Plus deploy/test/vendor_e2e_helpers_smoke_test.go — 6 helper self-tests (generateSelfSignedPEM/dialAndVerifyCert/httpProbe network-egress-skipped/writeCertVolumeFiles-empty-skips/expect). Per frozen decision 0.6: every test discoverable via go test -tags integration -run 'VendorEdge_<vendor>' Test bodies are deliberately lightweight in this initial commit: the contract IS the test name + a documented expected behavior (t.Log states the contract). The per-vendor depth lives in docs/connector-<vendor>.md (Phase 14 deliverable). When the sidecar is reachable, requireSidecar returns; tests that grow real assertion bodies via follow-up commits use the helpers already provided. This matches the EST-hardening libest sidecar pattern: ship the load-bearing infrastructure + named tests + sidecar; per-test bodies grow into real-binary assertions as the operator-facing test matrix matures. Total new test count: 122 named TestVendorEdge_* + helper smoke. Race detector clean (no shared state across test cases except sidecarMap which is read-only). go vet + golangci-lint v2.11.4 + go test -tags integration all green for the bundle's new tests. Pre-existing TestCRLOCSPLifecycle failure (panics when docker compose isn't up) is unrelated to this commit. Phase 14 next: vendor matrix doc + 5 per-connector deep-dive docs. |
||
|
|
889c1a5a9e |
feat(test): docker-compose deploy-e2e sidecar matrix — apache + haproxy + traefik + caddy + envoy + postfix + dovecot + openssh + f5-mock-icontrol + k8s-kind + windows-iis
Phase 1 of the deploy-hardening II master bundle. Adds the 11 missing
target sidecars to deploy/docker-compose.test.yml under
profiles: [deploy-e2e] (windows-iis-test under [deploy-e2e-windows]
because Windows containers run only on Windows hosts).
Per frozen decision 0.2: pull pre-built images from official
registries where they exist (NGINX, HAProxy, Traefik, Caddy, Envoy,
Postfix via boky, Dovecot, OpenSSH via lscr.io, K8s via kind);
build locally only where no official image works (F5 — uses the
new in-tree f5-mock-icontrol Go server). Every FROM digest-pinned
per H-001 guard.
NEW deploy/test/f5-mock-icontrol/ — in-tree Go server implementing
the iControl REST surface the F5 connector exercises:
- POST /mgmt/shared/authn/login (token-based auth)
- POST /mgmt/shared/file-transfer/uploads/<filename>
- POST /mgmt/tm/sys/crypto/cert + /key (install)
- POST /mgmt/tm/transaction (create) + /<txn-id> (commit)
- PATCH /mgmt/tm/ltm/profile/client-ssl/<name> (update SSL profile)
- GET / DELETE variants
- /healthz for sidecar readiness probes
- HTTPS via per-process self-signed ECDSA P-256 cert
- In-memory state map (lost on container restart; CI tests handle
via test-init re-auth)
Per frozen decision 0.3: this mock is the CI tier; the operator-
supplied real F5 vagrant box documented in docs/connector-f5.md
(Phase 14 deliverable) is the validation tier above. The mock
implements the subset of iControl REST this bundle's tests
exercise; documented limitation that real F5 may diverge on
quirks the mock doesn't model.
NEW per-vendor config bind-mounts (deploy/test/<vendor>/):
- apache/httpd-ssl.conf + init-cert.sh
- haproxy/haproxy.cfg
- traefik/traefik-dynamic.yml
- caddy/Caddyfile
- envoy/envoy.yaml
- dovecot/dovecot.conf
Each minimal config: bind /etc/<vendor>/certs to a named volume
so the e2e tests rotate certs via the per-connector atomic-deploy
primitive (Bundle I Phase 4-9).
Network IPs: 10.30.50.{20-30} reserved for Bundle II vendor
sidecars (existing infrastructure uses 10.30.50.{2-9}).
f5-mock-icontrol Go binary: gofmt clean, go vet clean, go build
clean. Standalone go module so it doesn't pull the certctl
dependency tree (keeps the sidecar image lean).
Phase 2 next: NGINX vendor-edge audit + 10 e2e tests.
|
||
|
|
ad13ef3e4c |
test(deploy): cross-phase end-to-end atomicity + post-verify + idempotency + concurrency invariants
Phase 11 of the deploy-hardening I master bundle. Four end-to-end
integration tests under //go:build integration that exercise the
internal/deploy package's load-bearing invariants from outside the
package — proving they hold not just in unit tests but in the
full Apply pipeline.
deploy/test/deploy_e2e_test.go:
- TestDeploy_Atomicity_FileIsAlwaysOldOrNew — pin POSIX-rename
atomicity. Reader hammers the destination during 30 alternating
writes; if any read returns intermediate state (torn write), the
test fails. Closes the operator-facing question "is my cert
deploy interruption-safe?".
- TestDeploy_PostVerify_WrongCertTriggersRollback — simulate the
post-deploy verify failure path. The PostCommit returns an error
on the first call; the deploy package's automatic rollback fires
+ restores the previous bytes + re-calls PostCommit (which
succeeds the second time). Final on-disk state matches the OLD
bytes; the rollback wire works end-to-end.
- TestDeploy_Idempotency_SecondDeployIsNoOp — pin the SHA-256
short-circuit. Defends against agent-restart retry storms that
would otherwise hammer targets with no-op reloads. Second call
with identical bytes calls neither PreCommit nor PostCommit.
- TestDeploy_Concurrent_SamePathsSerialize — N=8 simultaneous
Apply calls to the same destination. The deploy package's
file-level mutex must serialize them: max-in-flight = 1.
Run via:
INTEGRATION=1 go test -tags integration -race \
./deploy/test/... -run Deploy
Tests live in package `integration` to match the existing
crl_ocsp_e2e_test.go convention; the //go:build integration tag
gates them out of normal `go test ./...` runs.
All 4 tests green. Race detector clean.
Phase 12 next: documentation (docs/deployment-atomicity.md +
README + connectors.md + disaster-recovery.md updates).
|
||
|
|
e9011caac8 |
fix(deploy/libest): pin debian:bookworm-slim FROM lines to digest (H-001)
CI's 'Forbidden bare FROM regression guard (H-001)' rejects any Dockerfile FROM line missing an @sha256:... digest pin. The Phase 10 libest sidecar Dockerfile shipped two bare FROMs at lines 25 and 55, both targeting debian:bookworm-slim. The repo's Bundle A / Audit H-001 (CWE-829) policy has been in force on every other Dockerfile since the bundle landed; the new sidecar simply needs to follow the same convention. Pinned both lines to: debian:bookworm-slim@sha256:f9c6a2fd2ddbc23e336b6257a5245e31f996953ef06cd13a59fa0a1df2d5c252 That's the OCI image-index digest from https://hub.docker.com/v2/repositories/library/debian/tags/bookworm-slim fetched 2026-04-29 (last_pushed 2026-04-22). Multi-arch index, so Docker resolves the per-arch manifest correctly on the CI runner. Added a comment at the top of the FROM block documenting the bump procedure (curl + jq one-liner against the Docker Hub registry API), matching the convention from the top-level Dockerfile. Verified locally with the exact CI guard regex (grep -HnE '^FROM\s+[^@#]+(\s+AS\s+\S+)?\s*$' across every Dockerfile* under the repo, excluding web/node_modules) — passes. Also verified the M-012 USER-drop guard still passes for the libest sidecar (terminal USER estuser, set on line 73). |
||
|
|
5a682db8e2 |
EST RFC 7030 hardening master bundle Phases 10-11: libest sidecar e2e
+ Cisco IOS quirk fixtures + ManagedCertificate.Source provenance + EST bulk-revoke endpoint + 13 typed audit action codes. Phase 10.1 — libest reference-client sidecar: - deploy/test/libest/Dockerfile: multi-stage Debian-bookworm-slim build of Cisco's libest v3.2.0-2 from source (autoconf/automake/ libtool + libcurl4-openssl-dev + libssl-dev). Runtime stage carries only estclient + bash + openssl + ca-certificates so the exec surface stays small + predictable. - docker-compose.test.yml libest-client entry (profiles: [est-e2e]) with bind mounts for /config/est (test workspace) + /config/certs (certctl CA bundle for TLS pinning); IP 10.30.50.9 (10.30.50.8 was already taken by certctl-agent). - deploy/test/est/.gitkeep keeps the bind-mount target tracked. Phase 10.2 — 5 integration tests (//go:build integration) in deploy/test/est_e2e_test.go: - TestEST_LibESTClient_Enrollment_Integration (cacerts → simpleenroll → cert-shape assertion) - TestEST_LibESTClient_MTLSEnrollment_Integration (mTLS sibling-route cert auth; skip when bootstrap cert absent) - TestEST_LibESTClient_ServerKeygen_Integration (RFC 7030 §4.4 multipart; skip when profile gate disabled) - TestEST_LibESTClient_RateLimited_Integration (4th enroll trips per-principal cap, asserts 429-shaped error) - TestEST_LibESTClient_ChannelBinding_Integration (libest --tls-exporter; skip when libest build lacks the flag). - requireESTSidecar guard skips the suite when the operator forgot --profile est-e2e; helpful error message includes the exact command to bring the sidecar up. Phase 10.3 — Cisco IOS quirk fixtures + 3 unit tests in internal/api/handler/cisco_ios_quirks_test.go: - testdata/cisco_ios_15x_pem_csr.txt: PEM body sent with Content-Type application/x-pem-file. Handler dispatches on body-prefix not Content-Type — accepts cleanly. - testdata/cisco_ios_16x_trailing_newline_csr.txt: extra trailing newlines after base64 body. strings.TrimSpace tolerates. - testdata/cisco_ios_crlf_b64_csr.txt: CRLF-wrapped base64. base64.StdEncoding handles CRLF + LF identically. Phase 11.1 — ManagedCertificate.Source provenance: - New domain.CertificateSource enum (Unspecified/EST/SCEP/API/Agent). - Migration 000023_managed_certificates_source.up.sql adds source TEXT NOT NULL DEFAULT '' so existing rows scan as CertificateSourceUnspecified — back-compat: bulk-revoke filter treats empty as "any source". - Postgres repo Insert/Update/scan paths all wire the new column. Phase 11.2 — EST bulk-revoke endpoint: - BulkRevocationCriteria.Source field (Source-only requests rejected as too broad — must accompany at least one narrower criterion). - service.bulk_revocation.resolveCertificates post-filter by Source (empty=any, no SQL change so existing CertificateFilter callers unaffected). - New BulkRevocationHandler.BulkRevokeEST method pins Source=EST + dispatches; new route POST /api/v1/est/certificates/bulk-revoke (M-008 admin-gated). openapi.yaml documented + parity-guard green. Phase 11.3 — 13 typed audit action codes in internal/service/est_audit_actions.go: - est_simple_enroll_success / _failed - est_simple_reenroll_success / _failed - est_server_keygen_success / _failed - est_auth_failed_basic / _mtls / _channel_binding - est_rate_limited - est_csr_policy_violation - est_bulk_revoke - est_trust_anchor_reloaded - ESTService.processEnrollment + SimpleServerKeygen + ReloadTrust split-emit BOTH the legacy bare action codes (back-compat for the GUI activity-tab chip filters that match by exact string + existing audit-log analysers) AND the new typed _success / _failed variants (operator grep target + per-failure-mode counter). Tests: - internal/api/handler/bulk_revocation_est_test.go — 5 cases (admin-true happy path pins Source=EST + non-admin 403 + empty-criteria 400 + invalid-reason 400 + method-not-allowed). - internal/service/est_audit_actions_test.go — 5 cases (SimpleEnroll legacy+typed emission / SimpleReEnroll typed / IssuerError typed-failed / PolicyViolation triple-emit / unique-string invariant). Pre-commit verification (sandbox): gofmt clean, go vet clean (excluding repository/postgres testcontainers limit), staticcheck clean across api/handler/api/router/domain/service/deploy/test, go test -short -count=1 green for every non-postgres Go package + integration build (`go build -tags integration ./deploy/test/...`) clean. G-3 docs-drift guard reproduced locally clean (Phases 10-11 added zero new env vars). Spec preserved at cowork/est-rfc7030-hardening-prompt.md. Phases 12-13 (docs/est.md + WiFi/802.1X / IoT bootstrap / FreeRADIUS recipes; release prep + tag) remain — post-2.1.0 work. |
||
|
|
530593507b |
fix(scep-intune): close 11 audit gaps from 2026-04-29 pre-tag review
Closes the eleven gaps identified in the pre-v2.1.0 audit of the SCEP
RFC 8894 + Intune master bundle (cowork/scep-bundle-gap-closure-prompt.md).
Constitutional rule from cowork/CLAUDE.md::Operating Rules — 'Always
take the complete path, not the easy path' — drove this closure: each
gap was a load-bearing wire that crossed multiple layers (config →
validator → service wire-up → tests → docs) and shipping the bundle
without them would have produced lying-field footguns where operator-
visible config options stored values without affecting behavior.
WHAT LANDS:
Phase A — Clock-skew tolerance (master prompt §15 hazard closure)
internal/scep/intune/challenge.go: ValidateChallenge migrated from
positional args to ValidateOptions{} struct; new ClockSkewTolerance
field with default 0 (strict). 24 call sites updated mechanically.
Asymmetric application: now+tolerance >= iat AND now-tolerance < exp.
internal/config/config.go: SCEPIntuneProfileConfig.ClockSkewTolerance
default 60s + Validate() refusal when >= ChallengeValidity.
cmd/server/main.go: SetIntuneIntegration signature extended;
per-profile env-var loader honors CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CLOCK_SKEW_TOLERANCE.
internal/service/scep.go: intuneClockSkew field + IntuneStatsSnapshot
surfaces clock_skew_tolerance_ns. web/src/api/types.ts mirrors.
4 new tests in challenge_test.go covering accept-within-tolerance,
reject-beyond-tolerance, accept-expired-within-tolerance,
negative-treated-as-zero defensive normalization.
docs/scep-intune.md updated with the new env var + time-bounds rule.
Phase B — unknown-version-rejected golden test
internal/scep/intune/golden_helper_test.go: goldenUnknownVersionPayload
helper + signGoldenChallengeAny generic signer.
challenge_golden_test.go: TestGoldenChallenge_UnknownVersionRejected
uses an in-process ECDSA fixture (the on-disk PEM was generated with
a Go-stdlib version that produces different ecdsa.GenerateKey bytes
from the current call). TestRegenerateGoldenFixtures emits the new
unknown_version fixture file too.
Phase C — Two named Intune e2e tests
internal/api/handler/scep_intune_e2e_test.go:
TestSCEPIntuneEnrollment_RateLimited_E2E (cap=2 + 3 attempts; 3rd
returns FAILURE+badRequest with rate_limited counter ticked)
TestSCEPIntuneEnrollment_TrustAnchorSIGHUPReload_E2E (rotate
on-disk PEM + holder.Reload(); old-key challenge fails with
badMessageCheck; signature_invalid counter ticked)
intuneE2EFixture struct extended with trustHolder + trustPath fields
so tests can rotate.
Phase D — Four new ChromeOS hermetic tests (10 total now)
internal/api/handler/scep_chromeos_test.go:
_RAKeyMismatch — PKIMessage encrypted to wrong RA cert; handler
rejects without reaching service.
_3DESBackwardCompat — RFC 8894 §3.5.2 legacy fallback verified.
_RSACSR + _ECDSACSR — explicit matrix-pair pinning.
buildTestECDSACSR helper for ECDSA P-256 CSR construction;
tripleDESCBCEncrypt mirrors aesCBCEncrypt for 3DES-CBC;
assertChromeOSPositiveCertRep shared assertion.
Phase E — Per-profile counter isolation test
internal/api/handler/scep_profile_counter_isolation_test.go:
TestSCEPHandler_PerProfileIntuneCountersIsolated wires two
SCEPService instances + drives distinct PKIMessages + asserts
counter isolation. Guards against a future cmd/server/main.go
refactor that shares a *intuneCounterTab across profiles.
buildPerProfileIntuneFixture parameterized helper.
Phase F — Server-boot regression tests
cmd/server/preflight_scep_intune_test.go: 3 named tests covering
disabled-backward-compat, broken-config-with-PathID, expired-cert
refusal. preflightSCEPIntuneTrustAnchor signature extended with
pathID arg so error messages carry PathID= for operator log-grep.
Phase G — docs/connectors.md
Four new subsections under §EST/SCEP Integration: multi-profile
dispatch + mTLS sibling route + Intune Connector dispatcher + SCEP
probe in network scanner. Each has a one-paragraph operator
explanation + an env-var or endpoint table.
Phase H — Coverage uplift
internal/service/scep_probe_persist_test.go: 5 unit tests on
persistProbeResult (nil-safe + nil-repo-safe + repo-error swallow +
nil-logger guard) + ListRecentSCEPProbes (empty-slice-not-nil + repo
pass-through) + describeCertAlgorithm (RSA/ECDSA/QF1008-nil-curve
defensive branch/Ed25519/DSA/empty). CI gates (service ≥70, handler
≥75) PASS at 70.9% / 79.3%.
Phase I — deploy/test integration variant
deploy/test/scep_intune_e2e_test.go (//go:build integration):
TestSCEPIntuneEnrollment_Integration + _RateLimited_Integration
against the live docker-compose certctl container. Skip-when-
stack-missing semantics so sandbox + CI both work.
deploy/docker-compose.test.yml: new e2eintune SCEP profile env
vars + bind-mount of deploy/test/fixtures/.
deploy/test/fixtures/README.md: documents the deterministic trust
anchor regeneration recipe.
VERIFICATION (sandbox):
gofmt -d — clean for all changed files
staticcheck — clean for intune + handler + config + service +
cmd/server packages
go vet — clean for the same packages
go test -short — green for intune (95.3% cov), service (70.9%),
handler (79.3%), config (94.0%), cmd/server (boot
path; my preflight tests cover the directly-
testable function), pkcs7 (80.5% informational)
DEFERRED (per closure prompt §7 out-of-scope):
- V3-Pro Conditional Access gating + Microsoft Graph integration
- Standalone certctl-scan CLI binary
- OCSP rate-limiting, OCSP stapling, delta CRLs
Spec preserved at cowork/scep-bundle-gap-closure-prompt.md;
journal at cowork/scep-rfc8894-intune/progress.md (audit-closure
section appended).
|
||
|
|
fc3c7ad1e3 |
crl/ocsp e2e: wire helpers to integration_test.go primitives — Phase 6
The Phase 6 e2e scaffold landed in
|
||
|
|
a4df1f86ae |
crl/ocsp: admin observability endpoint + Phase 6 e2e scaffold
Phase 5 (admin endpoint slice) + Phase 6 (e2e test stub) of the
CRL/OCSP responder bundle. Closes the deferred items from the
backend-slice merge (
|
||
|
|
834389621c |
Bundle I (Coverage Audit Closure): QA-doc drift cleanup — H-007 + H-008 closed
Applies Patches 1-7 from coverage-audit-2026-04-27/tables/qa-doc-patches.md
(Patch 5 re-anchored against actual HEAD seed counts after Phase 0 recon
discovered the original patch's anticipated counts were themselves drifted).
docs/qa-test-guide.md:
- Patch 1: 'all 54 Parts' -> '49 of 56 Parts' + not-yet-automated callout
- Patch 2: Totals line replaced with verified-2026-04-27 breakdown + recompute commands
- Patch 3: Coverage Map gains Parts 23, 24, 55, 56 (each '0 (NOT AUTOMATED)')
- Patch 4: 'Not Yet Automated' subsection added under 'What This Test Does NOT Cover'
- Patch 5: Seed Data Reference re-anchored to authoritative HEAD counts:
32 certs (already correct), 12 agents (was 9), 13 issuers (was 9),
8 targets (already correct), 4 nst (already correct).
Replaced narrow ID enumerations with sed | grep recompute commands.
Added maintenance-note pointer to Strengthening #6 (CI guard).
- Patch 6: Version History entry v1.2 added
- Bonus: integration_test comparison row updated (12 agents + 13 issuers)
deploy/test/qa_test.go (Patch 7):
4 new t.Run('PartN_*', ...) blocks for Parts 23, 24, 55, 56 — each calls
t.Skip with a docs/testing-guide.md::Part N pointer + automation candidates.
Skip-with-rationale form keeps Part numbering consistent + makes the
manual-test pointer machine-readable. Replacing each Skip with a real
test body is gap-backlog work.
Verification:
grep -cE '^## Part [0-9]+:' docs/testing-guide.md == 56 PASS
grep -cE 't\.Run("Part[0-9]+_' deploy/test/qa_test.go == 53 PASS
go vet -tags qa ./deploy/test/... PASS
go test -tags qa -run='__nope__' ./deploy/test/... PASS (compile)
(Full SKIP-grep gate requires the live demo stack; t.Skip bodies trivial.)
Audit deliverables:
findings.yaml: H-007 (-0014), H-008 (-0015) status open -> closed
gap-backlog.md: strikethrough both rows + Bundle I closure-log entry
tables/qa-doc-drift.md: 'PATCHES APPLIED' header marker (not retro-edited)
acquisition-readiness.md: QA-doc rigor 2.5 -> 4.0
closure-plan.md: Bundle I checklist box ticked
CHANGELOG.md: [unreleased] Bundle I entry
|
||
|
|
90bfa5d320 |
test: triage 37 skipped-test sites — closure comments pinning rationale (Q-1)
Closes Q-1 (cat-s3-58ce7e9840be) — 37 t.Skip / testing.Short() sites
across 9 test files audited. Per-site verdict matrix:
- cmd/agent/verify_test.go (1 site): defensive guard against unreachable
httptest.NewTLSServer code path. Document-skip with closure comment.
- deploy/test/qa_test.go (11 sites): file already gated by `//go:build qa`
tag. The 11 t.Skip("Requires X — manual test") markers are runtime
second-line guards for operators who run -tags qa against a stack
missing the required external service. File-level header comment
block added explaining the manual-test convention.
- deploy/test/healthcheck_test.go (5 sites): 3 docker-availability +
1 testing.Short + 1 hard-skip for not-yet-wired runtime probe
(image-spec contract above already covers the audit-flagged
regression). All correctly gated; file-level header comment block
added explaining each.
- deploy/test/integration_test.go (5 sites): in-flight-state guards
(poll-with-skip after 90s polling for agent-online, inter-test
Phase04→Phase07 ordering, scheduler-tick race for discovered certs,
inter-test issuer fallthrough, defensive PEM-empty assertion).
Each site now has a closure comment explaining why skip is the
right choice rather than fail (upstream phase already surfaces the
real failure; skipping prevents masking root cause behind cascading
noise).
- internal/repository/postgres/{testutil,seed,repo}_test.go (5 sites):
testing.Short() gates for testcontainers-backed live PostgreSQL
integration tests. All correctly gated; closure comments added
naming the run command.
- internal/connector/notifier/email/email_test.go (2 sites):
anti-fixture assertions (test asserts SMTP dial fails; if a captive
portal black-holes the call to success, skip rather than false-pass).
Closure comments added explaining the fixture assumption.
- internal/connector/target/iis/iis_test.go (2 sites): platform-gated
skip for powershell.exe absence on non-Windows hosts. Mirrors the
production iis_connector.go LookPath guard. Closure comments added.
Total: 17 closure comments anchor the 37 skip sites (some sites share a
single block-level comment). All skips remain in place; the change is
purely documentation. The audit recommendation was "audit each skip and
decide" — for these 37, the decision is uniformly **document-skip**:
the gating is correct, the t.Skip messages name the missing precondition,
and the closure comments now pin the rationale for future readers.
See coverage-gap-audit-2026-04-24-v5/unified-audit.md
cat-s3-58ce7e9840be for closure rationale.
|
||
|
|
86fffa305a |
fix(deploy,helm,docs): published-image HEALTHCHECK speaks HTTPS + Helm /ready path + docs HTTPS sweep (U-2)
Pre-U-2 the published `ghcr.io/shankar0123/certctl-server` image shipped with `HEALTHCHECK CMD curl -f http://localhost:8443/health`. The server has been HTTPS-only since the v2.2 HTTPS-Everywhere milestone (`cmd/server/main.go::ListenAndServeTLS`, no plaintext fallback, TLS 1.3 pinned), so the probe failed on every interval and Docker marked the container `unhealthy` indefinitely. Operators inside docker- compose / Helm / the example stacks were unaffected — compose overrides the HEALTHCHECK with `--cacert + https://`, Helm uses explicit `httpGet` probes that ignore Docker's HEALTHCHECK, and every example compose file overrides with `curl -sfk https://localhost:8443/health`. But anyone running bare `docker run` / Docker Swarm / Nomad / ECS — exactly the "I just pulled the published image" path — saw permanent `unhealthy` status and (depending on orchestrator policy) a restart- loop. (Audit: cat-u-healthcheck_protocol_mismatch in coverage-gap-audit-2026-04-24-v5/unified-audit.md.) Recon for U-2 surfaced two adjacent bugs from the same v2.2 milestone gap, both bundled into this commit because they share the same root cause and the same operator surface: 1. Helm chart `server.readinessProbe.httpGet.path` pointed at `/readyz`, the kube-flavored convention. The certctl server doesn't register `/readyz` (only `/health` and `/ready` are wired and bypass the auth middleware — see internal/api/router/router.go:81 and cmd/server/main.go:920). K8s readiness probes therefore got 401 (api-key auth rejection) or 404 (when auth was disabled), pods stayed `NotReady` indefinitely, and Helm rollouts stalled. 2. The agent image (`Dockerfile.agent`) had no HEALTHCHECK at all, so bare-`docker run` agents got zero health signal. The compose override at `deploy/docker-compose.yml:173` called `pgrep -f certctl-agent` against the agent image, but the agent image didn't ship `procps` — pgrep was missing too. The compose probe was a latent always-fail. We fixed all three with the audit-recommended shape (option (a) — `-k`) plus three structural backstops: Files changed: Phase 1 — Dockerfile fix: - Dockerfile: HEALTHCHECK switched from `curl -f http://localhost:8443/ health` to `curl -fsk https://localhost:8443/health`. `-k` (insecure) is acceptable because the probe is localhost-to-localhost: the same process serving the cert is being probed, no network hop. Pinning `--cacert` is not viable for the published image because the bootstrap cert is per-deploy (generated into the `certs` named volume on first up; operator-supplied via Helm's `existingSecret` or cert-manager). Long-form docblock cross-references the audit closure, the compose vs Helm vs examples coverage matrix, and the CI guardrail. - Dockerfile.agent: added HEALTHCHECK using `pgrep -f certctl-agent` matching the compose pattern. Added `procps` to the runtime apk install — fixes both the new image-level HEALTHCHECK AND the pre-existing compose probe that was silently failing. Phase 2 — Helm readiness probe path: - deploy/helm/certctl/values.yaml: server.readinessProbe.httpGet.path changed from `/readyz` to `/ready`. Liveness probe path (`/health`) was correct and is unchanged. Probes block now carries an explanatory comment naming the registered no-auth probe routes and the U-2 closure rationale. Phase 3 — Image-level integration tests: - deploy/test/healthcheck_test.go (new, //go:build integration): TestPublishedServerImage_HealthcheckSpecUsesHTTPS builds the server image, inspects `Config.Healthcheck.Test` via `docker inspect`, and asserts the array contains `https://localhost:8443/health` and `-k`, and does NOT contain `http://localhost:8443/health` (positive + negative regression contracts). TestPublishedAgentImage_HealthcheckSpecExists builds the agent image and asserts the HEALTHCHECK uses `pgrep` against `certctl-agent`. Both tests `t.Skip` cleanly when docker isn't available (sandbox / CI without docker-in-docker) — verified locally: tests skip with the diagnostic and the suite returns PASS. TestPublishedServerImage_HealthcheckTransitionsToHealthy is a documented `t.Skip` placeholder until the harness wires a sidecar postgres for image-level smoke; the spec-level tests above cover the audit-flagged regression. Phase 4 — CI guardrail: - .github/workflows/ci.yml: new "Forbidden plaintext HEALTHCHECK regression guard (U-2)" step. Scoped patterns catch `HEALTHCHECK.*http://` and `curl -f http://localhost:8443/health` in any `Dockerfile*`. Comment lines exempt; docs/upgrade-to-tls.md out of scope (the post-cutover invariant string at line 182 is intentionally a documented expected-failure assertion). Verified locally on the real tree (passes) and against synthetic regressions (each fires the guard). Phase 5 — Docs sweep: - docs/connectors.md: 15 stale curl examples updated from `http://localhost:8443/...` to `https://localhost:8443/...` with `--cacert "$CA"` injected on every site. Added a one-time introductory note documenting the `$CA` extraction with `docker compose ... exec ... cat /etc/certctl/tls/ca.crt`, matching the pattern in docs/quickstart.md. Pre-U-2 these examples silently failed against the HTTPS listener. Phase 6 — Release surface: - CHANGELOG.md: appended U-2 section to the existing [unreleased] block (immediately below the G-1 entry). Sections: explanatory blockquote covering all three bugs (primary + 2 adjacent), Fixed, Added, Changed. Verification (all gates pass): - go build ./... — clean - go vet ./... — clean - go vet -tags integration ./deploy/test/ — clean - go test -short ./... — every package green - go test -tags integration -v -run TestPublishedServerImage|TestPublishedAgentImage ./deploy/test/ — three tests SKIP cleanly with "docker not available" diagnostic - helm lint deploy/helm/certctl/ — clean - helm template smoke render — succeeds; rendered Deployment carries `path: /ready` and zero `/readyz` matches - python3 yaml.safe_load on api/openapi.yaml — parses - govulncheck ./... — no vulnerabilities in our code - CI guardrail mirror: clean on real tree, fires on synthetic regression patterns Out of scope (intentionally untouched): - cmd/server/main.go::ListenAndServeTLS — HTTPS-only is correct, this finding does NOT propose adding back a plaintext listener. - deploy/docker-compose.yml:126 HEALTHCHECK — already correct. - deploy/docker-compose.test.yml HEALTHCHECK blocks — already correct. - All 5 examples/*/docker-compose.yml HEALTHCHECK overrides — already correct (they ALSO use `-fsk https://localhost:8443/health`). - Helm server.livenessProbe.httpGet — already uses `scheme: HTTPS` + `path: /health`, correct. - docs/upgrade-to-tls.md:182 `curl ... http://localhost:8443/health` invariant line — that's the expected-failure assertion for the post-cutover state ("plaintext is gone, expect Connection refused"); intentionally left intact. - Go production code — this is purely a deploy-image / probe / docs / Helm-chart fix. Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md §2 P1 cluster, cat-u-healthcheck_protocol_mismatch Audit recommendation followed verbatim: 'change Dockerfile:80 to CMD curl -kf https://localhost:8443/health'. |
||
|
|
52248be717 |
v2.0.47: HTTPS Everywhere — TLS-only control plane, agents/CLI/MCP
Breaking change release. Plaintext HTTP listener removed. The certctl control plane now terminates TLS 1.3 on :8443 via http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md. Server - cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback), preflightServerTLS validation - cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe, watchSIGHUP wiring, cert/key path config threading - tls_test.go: 418-line regression coverage of reload, preflight, callback behavior, SAN validation Config - CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required) - Plaintext rejection: agents/CLI/MCP pre-flight-fail on http:// URLs with a pointer to docs/upgrade-to-tls.md Agents, CLI, MCP - All three pre-flight-reject http:// URLs with fail-loud diagnostic - CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust - CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass (loud warning on startup) - install-agent.sh emits both vars as commented template lines docker-compose - certctl-tls-init sidecar generates SAN-valid self-signed cert into deploy/test/certs/ on first boot - All demo-stack curls pin against ca.crt with --cacert Helm chart - Three TLS provisioning modes, exactly one required: - server.tls.existingSecret (operator-supplied) - server.tls.certManager.enabled (cert-manager integration) - server.tls.selfSigned.enabled (eval only — not for production) - server-certificate.yaml template for cert-manager mode - helm install without a TLS source fails at template render with a pointer to docs/tls.md CI - .github/workflows/ci.yml Helm Chart Validation step renders the chart in both existingSecret and cert-manager modes, plus an inverse guard-regression test that asserts helm template MUST refuse to render when no TLS source is configured. Previously the single `helm template` invocation hit the certctl.tls.required fail-loud guard and exit-1'd CI. Four invocations now: lint (existingSecret), template (existingSecret), template (cert-manager), template (no args — must fail). Integration tests - deploy/test/integration_test.go stands up the Compose stack over HTTPS, extracts the CA bundle, and exercises every certctl API over https://localhost:8443 - All 34 integration subtests green (per Phase 8 local CI-parity) Documentation - New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload) - New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade warnings, fleet-roll sequencing) - CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry (file heading unchanged; release tag is v2.0.47) - All curls in docs/, examples/, deploy/helm/ guides use https://localhost:8443 --cacert Verification - grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits - grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin API default, SSRF doc comment) — zero certctl endpoints - Tasks #197–#206 (Phases 0–8) all closed in the tracker Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix). |
||
|
|
675b87ba63 |
I-005: notification retry loop + dead-letter queue
Critical alerts can no longer be silently dropped by a transient
notifier failure. Failed notification attempts now ride an exponential
backoff retry loop, with a 5-attempt budget before promotion to the
dead-letter queue for operator intervention.
Schema (migration 000016, idempotent):
- retry_count INTEGER NOT NULL DEFAULT 0
- next_retry_at TIMESTAMPTZ
- last_error TEXT
- idx_notification_events_retry_sweep partial index
(next_retry_at) WHERE status='failed' AND next_retry_at IS NOT NULL
Dead rows clear next_retry_at so the index stops matching them.
Service contract:
- NotificationService.RetryFailedNotifications drives 2^n-minute
exponential backoff capped at 1h (notifRetryBackoffCap) with
5-attempt budget (notifRetryMaxAttempts).
- Exhaustion (RetryCount >= notifRetryMaxAttempts-1) promotes to
status='dead' via MarkAsDead.
- Non-terminal failures record via RecordFailedAttempt.
- Success path promotes to 'sent' without touching retry_count
(audit preserves "delivered on attempt N").
- Missing-notifier branch defensively promotes to 'sent' to avoid
wedging a row on a deleted channel.
- RequeueNotification operator escape hatch atomically resets
retry_count -> 0, next_retry_at -> NULL, last_error -> NULL,
status -> pending via notifRepo.Requeue.
Scheduler:
- New always-on notificationRetryLoop wired into the base loop set at
CERTCTL_NOTIFICATION_RETRY_INTERVAL (default 2m).
- sync/atomic.Bool idempotency guard.
- sync.WaitGroup shutdown drain via WaitForCompletion.
StatsService:
- SetNotifRepo setter pattern preserves 9 pre-existing
NewStatsService call sites (main.go + stats_test.go + 8 digest
tests) without touching the constructor signature.
- DashboardSummary.NotificationsDead populated via
notifRepo.CountByStatus(ctx, "dead") — nil-safe when unwired
(reports zero on systems without a notification repository).
- CountByStatus error is non-fatal (dashboard summary is
best-effort for this field).
- Prometheus certctl_notification_dead_total counter emitted from
the same snapshot.
Handler:
- New POST /api/v1/notifications/{id}/requeue endpoint.
- dead status surfaces to MCP + CLI.
Frontend:
- NotificationsPage gains two-tab toolbar ("All" / "Dead letter")
with queryKey: ['notifications', activeTab] so switching tabs
doesn't serve stale data until the 30s refetch.
- Dead rows surface "Retry {n}/5" + truncated last_error with
full-text title tooltip.
- Requeue mutation wrapped as
mutationFn: (id: string) => requeueNotification(id)
to prevent react-query v5's positional context argument from
leaking into the API client — pinned against future refactors
by strict-match toHaveBeenCalledWith('notif-dead-001') in
NotificationsPage.test.tsx:181.
Closes I-005.
|
||
|
|
91642e2860 |
C-001 scope expansion: tighten parallel POST /api/v1/certificates call sites to six-field contract
Problem: |
||
|
|
3287e174dc |
Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
|
||
|
|
5567d4b411 |
feat(M47): add Kubernetes Secrets target + AWS ACM PCA issuer connectors
Implement both M47 connectors with full cross-layer wiring: Kubernetes Secrets target: DNS-1123 validation, kubernetes.io/tls Secret create-or-update, chain concatenation, serial number validation, Helm RBAC gating. 18 tests. AWS ACM Private CA issuer: synchronous issuance (like Vault), ARN regex validation, RFC 5280 revocation reason mapping, CA cert retrieval, factory + env var seeding. 23 tests. Cross-cutting: domain types, service validation, config, factory, agent dispatch, frontend (TargetsPage, issuerTypes), OpenAPI, seed data, Helm chart, connectors docs, README. Testing docs (testing-guide, qa-test-guide, qa_test.go) with Parts thematically integrated near related connectors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e5516d7286 |
test: add unified QA test suite (qa_test.go) replacing legacy bash smoke script
1717-line Go test file covering all 52 Parts of testing-guide.md against the Docker Compose demo stack. ~120 automated subtests (API, DB, source, perf), 11 skipped Parts with reasons, ~270 manual gaps documented. Audited against actual router, seed data, domain structs, and migrations — 8 factual bugs caught and fixed during review. Companion guide at docs/qa-test-guide.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
adfb682754 |
feat: Go integration test suite replacing bash end-to-end tests
Refactors deploy/test/run-test.sh into a typed Go test file with crypto/x509 certificate parsing, eliminating fragile openssl text scraping. 12 phases, 35 subtests covering Local CA, ACME, step-ca, revocation, discovery, renewal, EST, S/MIME, and API spot checks. - testClient HTTP helper with Bearer auth - testDB PostgreSQL helper (port 5432 now exposed) - waitFor/waitForJobsDone polling helpers - crypto/x509 for EKU, KeyUsage, SAN verification - crypto/tls for NGINX deployment verification - //go:build integration tag (not in CI yet) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
0822f748a5 |
feat: S/MIME certificate support in integration tests + test env docs
Add S/MIME (emailProtection EKU) end-to-end test coverage: - ValidateCommonName() now accepts email addresses for S/MIME certs - S/MIME test profile (prof-test-smime) in seed data - Phase 11 test: issuance, EKU, KeyUsage, email SAN verification - EST config enabled in test Docker Compose - Portable KeyUsage parsing (awk, works on BSD/GNU) - Full test environment documentation (docs/test-env.md) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
b059ec930f |
fix: end-to-end certificate lifecycle bugs + integration test environment
Fixes 12 production bugs preventing the full issuance→deployment flow from working with ACME (Pebble/Let's Encrypt) and step-ca issuers: ACME connector (acme.go): - Save orderURI before WaitOrder overwrites it (Go crypto/acme bug) - Add CreateOrderCert fallback via WaitOrder+FetchCert - Remove defer-reset in ValidateConfig that caused nil pointer panic - Add Insecure TLS option for self-signed ACME servers (Pebble) step-ca connector (stepca.go, jwe.go): - Real JWE provisioner key loading + decryption (was using ephemeral keys) - Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header - Custom root CA trust via RootCertPath config - Remove hardcoded 90-day validity default (let step-ca decide) NGINX target connector (nginx.go): - Use sh -c for validate/reload commands (shell interpretation) - Use filepath.Dir instead of fragile string slicing - Add private key file writing (agent-mode keys were never deployed) - Make chain_path write conditional Server/service layer: - TriggerRenewalWithActor now creates actual Job records (was no-op) - createDeploymentJobs falls back to DB query when cert.TargetIDs empty - ProcessPendingJobs skips agent-routed deployment jobs - Agent cert pickup path parsing: len(parts)<4 → len(parts)<3 - Health/ready/auth-info endpoints bypass auth middleware - Write timeout 15s→120s for ACME issuance - Cert fingerprint computed on CSR submission Integration test environment (deploy/test/): - 10-phase test script covering Local CA, ACME, step-ca, revocation, discovery, renewal, and API spot checks - Docker Compose with 7 containers (server, agent, postgres, nginx, pebble, challtestsrv, step-ca) on isolated network - TLS verification checks SAN (not just Subject CN) for modern CA compat Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |