mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:31:34 +00:00
v2.0.52
82 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
a3d8b9c607 |
fix(deploy,db,handler): close fresh-clone postgres init failure + 4 ride-along audit findings (U-3 master)
GitHub #10 reopened: operator mikeakasully cloned v2.0.50 fresh and ran the canonical quickstart (docker compose -f deploy/docker-compose.yml up -d --build); postgres reported unhealthy indefinitely, dependent containers never started. Root cause: deploy/docker-compose.yml mounted a hand-curated subset of migrations/*.up.sql + seed.sql into postgres /docker-entrypoint-initdb.d/. Postgres applied them at initdb time. Once seed.sql referenced columns added by migrations *after* the mounted cutoff (e.g., policy_rules.severity from migration 000013), initdb crashed mid-seed and the container loop wedged. Two sources of truth (compose mount list vs in-tree migration ladder) diverged the moment a seed-touching migration shipped, and the only thing that fixed it was hand-editing the compose file every release. Fix: remove the dual source. Postgres boots empty; the server applies migrations + seed at startup via RunMigrations + RunSeed. Helm has used this pattern since day one (postgres-init emptyDir); compose now matches. Bundled with four ride-along audit findings whose fixes share the same schema/db code surface, so operators take the schema-change pain only once: cat-u-seed_initdb_schema_drift [P1, primary] — initdb-mount fix cat-o-retry_interval_unit_mismatch [P1] — column rename minutes→seconds cat-o-notification_created_at_dead_field [P2] — add column + populate cat-o-health_check_column_orphans [P1] — drop unwired columns cat-u-no_version_endpoint [P2] — add /api/v1/version Single migration (000017_db_coupling_cleanup) bundles the three schema changes under a DO \$\$ guard so re-application is safe; reduces operator-visible 'schema-change releases' from four to one. Backend - internal/repository/postgres/db.go: add RunSeed (baseline) + RunDemoSeed (gated by CERTCTL_DEMO_SEED). Both idempotent (ON CONFLICT DO NOTHING in every shipped INSERT) so repeated boots are safe; missing-file is no-op so custom packaging that strips seeds still boots cleanly. - cmd/server/main.go: invoke RunSeed (always) + RunDemoSeed (when flag set) immediately after RunMigrations. - internal/repository/postgres/notification.go: NotificationRepository.Create now sets created_at (with time.Now() fallback when caller leaves it zero); scanNotification reads it back; List + ListRetryEligible SELECT extended. - internal/repository/postgres/renewal_policy.go: column references updated to retry_interval_seconds across SELECT/INSERT/UPDATE sites. - internal/api/handler/version.go: new VersionHandler exposes {version, commit, modified, build_time, go_version} from runtime/debug.ReadBuildInfo() with ldflags-supplied Version override. - internal/api/router/router.go: register GET /api/v1/version through the no-auth chain (CORS + ContentType) alongside /health, /ready, /api/v1/auth/info. - cmd/server/main.go: add /api/v1/version to no-auth dispatch + audit ExcludePaths so rollout polling doesn't dominate the audit trail. - internal/config/config.go: add DatabaseConfig.DemoSeed + CERTCTL_DEMO_SEED env var. Migration - migrations/000017_db_coupling_cleanup.up.sql + .down.sql: (1) renewal_policies.retry_interval_minutes → retry_interval_seconds (DO \$\$ guard, idempotent re-application) (2) notification_events ADD COLUMN created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (3) network_scan_targets DROP orphan health_check_enabled + health_check_interval_seconds - migrations/seed.sql: column reference updated to retry_interval_seconds. - migrations/seed_demo.sql: same column rename + applied at runtime now via RunDemoSeed (no longer initdb-mounted). Compose - deploy/docker-compose.yml: drop ALL initdb mounts (10 migration files + seed.sql); add start_period: 30s to postgres + certctl-server healthchecks to absorb the runtime migration + seed application window on first boot. - deploy/docker-compose.test.yml: same drop (+ ghost seed_test.sql mount removed; that file never existed); same healthcheck start_period. - deploy/docker-compose.demo.yml: replace seed_demo.sql initdb mount with CERTCTL_DEMO_SEED=true env var on certctl-server. Tests - internal/api/handler/version_handler_test.go: TestVersion_ReturnsBuildInfo, TestVersion_RejectsNonGet, TestVersion_LdflagsOverride. - internal/repository/postgres/seed_test.go: TestRunSeed_AppliesIdempotently, TestRunSeed_MissingFileIsNoOp, TestRunDemoSeed_AppliesIdempotently, TestMigration000017_RetryIntervalRename, TestMigration000017_NotificationCreatedAt, TestMigration000017_HealthCheckOrphansDropped (testcontainers, -short skips). - internal/repository/postgres/notification_test.go: TestNotificationRepository_CreatedAt_IsPersisted + TestNotificationRepository_CreatedAt_DefaultsToNow. CI guardrail - .github/workflows/ci.yml: new 'Forbidden migration mount in compose initdb (U-3)' step grep-fails the build if any migrations/*.sql or seed*.sql re-appears in /docker-entrypoint-initdb.d in any compose file. Catches future drift before a fresh-clone operator hits it. Spec / Docs - api/openapi.yaml: add /api/v1/version operation under Health tag. - docs/architecture.md: replace the 'initdb may run the same SQL' paragraph with a post-U-3 single-source-of-truth explanation. - CHANGELOG.md: full unreleased-section entry covering all 5 closures, breaking changes, and the new env var. Audit doc - coverage-gap-audit-2026-04-24-v5/unified-audit.md: add new P1 #14 cat-u-seed_initdb_schema_drift; flip the 4 ride-along findings to ✅ RESOLVED with closure prose pointing at this commit. Verification: build/vet/test -short -race all clean across all touched packages locally; govulncheck reports 0 vulnerabilities affecting our code; OpenAPI YAML parses; CI U-3 grep guardrail clears against the post-fix tree. |
||
|
|
87213128cc |
fix(security,domain): redact Agent.APIKeyHash from JSON wire shape (G-2)
Pre-G-2 internal/domain/connector.go::Agent::APIKeyHash was tagged
`json:"api_key_hash"` and shipped on every wire surface that returned
domain.Agent — GET /api/v1/agents (PagedResponse{Data: agents}),
GET /api/v1/agents/{id}, GET /api/v1/agents/retired, and the
POST /api/v1/agents registration response. Every authenticated client
(browser, CLI --json, MCP tool calls) received the SHA-256-of-the-API-key
string. The browser silently dropped it because web/src/api/types.ts
omits the field, but CLI and MCP consumers print full JSON so the hash
was visible there. Even though the value is a hash and not the plaintext
key, shipping it gives an attacker an offline brute-force target if the
API-key entropy is low (certctl doesn't enforce a minimum on operator-
supplied keys), and there's no business reason for any client to ever
receive it — the value is server-internal, used only for the lookup at
internal/repository/postgres/agent.go::GetByAPIKey. (Audit:
cat-s5-apikey_leak in coverage-gap-audit-2026-04-24-v5/unified-audit.md.)
We chose the audit's recommended fix (json:"-") plus a defense-in-depth
MarshalJSON plus a CI guardrail. Three layers because struct-tag
redaction alone is one rebase away from being silently reverted, the
custom MarshalJSON catches the case where a parent struct embeds Agent
under a different tag, and the CI grep blocks reintroduction at the spec
or frontend boundary even without a code review catching it.
Files changed:
Phase 1 — Domain redaction:
- internal/domain/connector.go: APIKeyHash tag flipped from
`json:"api_key_hash"` to `json:"-"`. New Agent.MarshalJSON
with value receiver + type-alias-recursion-break that explicitly
zeroes APIKeyHash on the marshal-time copy. Long-form docblock
explaining the G-2 closure rationale + cross-references to
service.RegisterAgent (populator), repository.AgentRepository::
GetByAPIKey (consumer), docs/architecture.md (DB-shape vs
API-shape distinction), and the audit finding.
Phase 2 — Domain tests (5 test functions):
- internal/domain/connector_test.go: TestAgent_MarshalJSON_RedactsAPIKeyHash
pins the marshal-boundary contract on a value receiver. ...RedactsViaPointer
pins the *Agent path. ...RedactsInSlice pins the []Agent path that the
ListAgents handler actually emits via PagedResponse. ...DoesNotMutateReceiver
pins the by-value-receiver contract so a future refactor that switches
to pointer-receiver gets caught. ...RoundTrip pins the wire-shape
guarantee that APIKeyHash is dropped on encode and cannot reappear on
decode. Single sentinel value ("sha256:LEAKED-CREDENTIAL-DERIVATIVE-
SENTINEL") flows through every fixture for grep-ability on regression.
Phase 3 — Handler tests (4 test functions):
- internal/api/handler/agent_handler_test.go: TestListAgents_DoesNotLeakAPIKeyHash,
TestGetAgent_DoesNotLeakAPIKeyHash, TestRegisterAgent_DoesNotLeakAPIKeyHash,
TestListRetiredAgents_DoesNotLeakAPIKeyHash. Each asserts (a) the
literal substring "api_key_hash" is absent from the httptest-captured
body, (b) the leak sentinel value is absent, (c) the non-leaked fields
ARE present (sanity that the handler is serving real data, not just
empty payloads). Shared sentinel "sha256:LEAKED-CREDENTIAL-DERIVATIVE-
HANDLER-SENTINEL" so a single grep over a failing test's output
identifies the leak surface immediately.
Phase 4 — Spec / docs:
- api/openapi.yaml: api_key_hash property REMOVED from Agent schema
(was at line 3690). Inline G-2 comment naming the closure + the
database-vs-API-shape distinction so a future spec edit doesn't
silently re-introduce the field.
- docs/architecture.md: ER-diagram block already documents the agents
table including api_key_hash (DB shape — correct). Added a sibling
note paragraph immediately below the diagram explaining that several
columns are intentionally server-internal (api_key_hash redaction
+ issuers.config / deployment_targets.config encrypted shadow), with
cross-references to the redaction enforcement site, the OpenAPI
schema, the frontend interface, and the CI guardrail.
- web/src/api/types.ts: Agent interface unchanged in shape (already
omitted the field) but added a leading comment block explaining
WHY the omission is intentional — stops a future frontend dev from
"completing" the interface from the OpenAPI spec or the Go struct.
Phase 5 — CI guardrail:
- .github/workflows/ci.yml: new "Forbidden api_key_hash JSON-shape
regression guard (G-2)" step. Scoped patterns catch the actual
regression shapes — Go struct tag (json:"api_key_hash"), frontend
interface declaration, OpenAPI schema property, YAML enum/array
membership. Repository / migration / seed / service / integration /
unit-test / comment lines exempt. Verified locally on the real tree
(passes) and against 4 synthetic regression patterns (each fires
the guardrail). Mirrors the G-1 pattern from .github/workflows/
ci.yml lines 47-108.
Phase 5b — Sweep verification (no changes, results documented for the
next reader):
- internal/api/middleware/audit.go: doesn't serialize Agent struct;
records request body only. No leak.
- service.RegisterAgent audit-event payload: `map[string]interface{}{
"name": name, "hostname": hostname}` — name + hostname only,
no APIKeyHash. No leak.
- All 9 slog sites that mention agent: scalar attrs only ("agent_id",
"error", "agent_hostname"), never the full struct. No leak.
- internal/mcp, internal/cli, cmd/cli, cmd/mcp-server: zero matches
for APIKeyHash / api_key_hash. Both pass server JSON verbatim, so
the wire-side fix transitively closes them.
Verification (all gates pass):
- go build ./...
- go vet ./...
- go test -short ./... — every package green
- go test -short -race ./internal/domain/... ./internal/api/handler/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template smoke render — succeeds
- python3 yaml.safe_load on api/openapi.yaml — parses
- OpenAPI Agent schema scan: no api_key_hash property
- CI guardrail mirror: clean on real tree, fires on all 4 synthetic
regression patterns
- Domain pkg coverage: Agent.MarshalJSON 100%, connector.go total 87.5%
- Handler pkg coverage: 79.2%
Sample response body (httptest captured during verification, GET
/api/v1/agents/{id} via the new handler test):
{"id":"agent-demo","name":"demo-agent","hostname":"demo.host",
"status":"Online","last_heartbeat_at":"2026-04-24T11:59:30Z",
"registered_at":"2026-04-24T12:00:00Z","os":"linux",
"architecture":"amd64","ip_address":"10.0.0.42",
"version":"v2.0.49"}
Note the absence of any api_key_hash key, even though the in-memory
struct passed to the handler had APIKeyHash set to a sentinel.
Out of scope (intentionally untouched):
- internal/repository/postgres/agent.go SELECT/INSERT/UPDATE/scan
paths and GetByAPIKey lookup — DB column stays, repo still
populates the struct, auth lookup still works. The redaction is a
marshal-boundary concern.
- migrations/000001_initial_schema.up.sql + migrations/seed_*.sql —
DB schema and seed data unchanged.
- internal/service/agent.go::RegisterAgent — service-side hashing
and persistence unchanged.
- Other domain types with potential credential-derivative fields
(Issuer.Config, DeploymentTarget.Config, notifier configs). Not
flagged by the audit; some are already protected (e.g.,
DeploymentTarget.EncryptedConfig []byte `json:"-"`). File a
separate audit pass if recon surfaces additional leaks.
- Per-resource DTO layer across every handler. Single audit
finding, single domain type.
- A separate possible follow-up: the v2 RegisterAgent endpoint
doesn't return the plaintext API key to the agent, which may
mean self-bootstrap via POST /api/v1/agents is broken. Verified
during recon; out of scope for G-2; should be its own ticket.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-s5-apikey_leak
Audit recommendation: 'json:"-" or API-response DTO
excluding APIKeyHash' — went with the json:"-" + MarshalJSON
defense-in-depth pair plus CI guardrail and structural docs.
|
||
|
|
9c1d446e40 |
fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.
The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.
Files changed:
Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
literal 'jwt' through a dedicated multi-line diagnostic naming the
authenticating-gateway pattern, then cross-checks against
ValidAuthTypes(). Secret-required branch simplified to api-key-only.
Field comment on AuthConfig.Type rewritten to drop jwt and point at
the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
config.Load() — exits 1 on any unsupported auth-type that bypassed the
validator. Auth-disabled startup log explicitly names the
authenticating-gateway pattern.
Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
(two table rows pinning the dedicated G-1 error fires regardless of
whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
guard against future re-introduction),
TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
hit the generic invalid-auth-type error). Removed the prior
TestValidate_JWTAuth_MissingSecret happy-path since its premise is
inverted post-G-1.
- internal/api/handler/health_test.go: removed
TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
into the regression suite). Pre-existing _APIKey test continues to
cover the api-key happy path.
Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
jwt and point at the gateway pattern; secret-required conditional
simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
section explaining the design rationale and listing oauth2-proxy /
Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
forward_auth / Apache mod_auth_openidc / nginx auth_request as the
standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
with preconditions, what-changes, both recovery paths, complete
docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
ext_authz patterns, rollback posture.
Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
helper mirroring the existing certctl.tls.required pattern. Fails
template render on any server.auth.type outside {api-key, none} with
a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
server-configmap.yaml, server-secret.yaml: invoke the helper at the
top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
walkthrough.
Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
Added / Changed sections; explicit pointer at
docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.
Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
guard (G-1)' step. Scoped patterns catch the actual regression shapes
(map literal, slice literal, switch case, OpenAPI enum, env-file
default, AuthType('jwt') cast). Comments and the dedicated rejection
branch are intentionally exempt; connector-package JWT references
(Google OAuth2 / step-ca) are exempt as out-of-scope external
protocols. Verified locally: the guard passes on the actual tree and
fires on all 4 synthetic regression patterns.
Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.
Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
longer accepted (G-1 silent auth downgrade): no JWT middleware ships
with certctl. To use JWT/OIDC, run an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
See docs/architecture.md "Authenticating-gateway pattern" and
docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'
config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-g-jwt_silent_auth_downgrade
Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.
|
||
|
|
04c7eca615 |
docs: reconcile scheduler topology across sibling docs (7 → 12 loops)
Authoritative 12-loop table lives at docs/architecture.md:522-534 (committed via
the I-001/I-003/I-005 + M48/M50 milestone commits). This change brings six sibling
docs into parity with that table so every surface — user-facing features reference,
SOC 2 compliance mapping, connectors guide, advanced demo architecture diagram,
testing guide, and in-line architecture prose — reflects the same 8 always-on + 4
opt-in topology.
Touches:
- docs/architecture.md: 2 inline ordinal references (9th / 8th loop) replaced with
descriptive names (opt-in cloud discovery / opt-in endpoint health), cross-linked
to the authoritative table to prevent future ordinal rot.
- docs/features.md: metric row (7 → 12), inline reference to 9th loop, and full
scheduler table expanded to include Always-on column + env vars + I-001/I-003/I-005
refs.
- docs/compliance-soc2.md: background scheduler monitoring bullets expanded to list
all 12 loops with env vars + I-series refs; table row updated with 8 always-on +
4 opt-in summary.
- docs/connectors.md: three inline ordinals (7th/6th/9th loop) replaced with
descriptive names, cross-linked to architecture.md.
- docs/demo-advanced.md: Mermaid SCHED node label updated from '7 background loops'
to '12 background loops (8 always-on + 4 opt-in)'.
- docs/testing-guide.md: Test 20.1.1 header + grep pattern expanded to include
job-retry / job-timeout / notification-retry / digest / endpoint-health /
cloud-discovery loops; sign-off chart row label updated.
Pure documentation reconciliation. No code changes. Master HEAD pre-commit:
|
||
|
|
6e646e0fe8 |
M-001/M-006: strip HTTP auth from EST/SCEP + fail-loud SCEP preflight
Closes CWE-306 (missing authentication for critical function) for SCEP
via a fail-loud startup gate, and aligns EST/SCEP HTTP dispatch with
their respective RFCs. CRL/OCSP remain unauthenticated under
.well-known/pki/* per RFC 5280 §5 / RFC 6960 / RFC 8615. Option (D):
no mTLS in this milestone.
- RFC 7030 §3.2.3 (EST auth is deployment-specific) and §4.1.1
(/cacerts explicitly anonymous): EST paths served unauthenticated;
CSR-signature + profile policy enforce identity inside ESTService.
- RFC 8894 §3.2: SCEP authenticates via the challengePassword
PKCS#10 attribute (OID 1.2.840.113549.1.9.7), not an HTTP credential.
HTTP dispatch is unauthenticated; preflightSCEPChallengePassword
refuses to start when CERTCTL_SCEP_ENABLED=true without
CERTCTL_SCEP_CHALLENGE_PASSWORD. SCEPService.PKCSReq enforces the
same invariant defense-in-depth and compares with
crypto/subtle.ConstantTimeCompare.
cmd/server/main.go:
- Extract buildFinalHandler(apiHandler, noAuthHandler, webDir,
dashboardEnabled); route /.well-known/est/*, /scep, /scep/*,
/.well-known/pki/crl/{id}, /.well-known/pki/ocsp/{id}/{serial},
and health probes through noAuthHandler (RequestID +
structuredLogger + Recovery only).
- Add preflightSCEPChallengePassword fail-loud gate; startup log
emits challenge_password_set boolean for operator visibility.
cmd/server/finalhandler_test.go (new, 314 lines, 27 subtests):
- TestBuildFinalHandler_Dispatch (20) + TestBuildFinalHandler_NoDashboard
(7) pin the dispatch surface: EST 4-endpoint, SCEP exact +
trailing-slash + query-string, PKI CRL+OCSP, health, /api/v1/*
authenticated, /assets/* file server, SPA fallback.
internal/api/router/router.go, internal/config/config.go:
- Router-level comments explain why EST/SCEP/PKI dispatchers sit
outside the authenticated mux; SCEP challenge password config
plumbed through.
docs/architecture.md:
- New EST Authentication subsection (RFC 7030 §3.2.3 + §4.1.1,
buildFinalHandler + noAuthHandler references).
- Rewrite SCEP Authentication subsection; replaces pre-existing
factually-incorrect "any value accepted" claim with CWE-306
preflight, service-layer defense-in-depth, and
crypto/subtle.ConstantTimeCompare.
- Top-level Authentication section: qualify /api/v1/* scope on API
clients bullet; add standards-based-endpoints bullet referencing
the 27-subtest regression harness.
docs/compliance-soc2.md:
- CC6.1: scope API Key Authentication to /api/v1/*; add
standards-based endpoints bullet citing RFCs and CWE-306 closure.
- CC6.3: scope API Key Policy to /api/v1/* with cross-reference to
CC6.1.
- Evidence Locations augmented with buildFinalHandler,
preflightSCEPChallengePassword, scep.go defense path, regression
harness, and OpenAPI security:[] overrides.
api/openapi.yaml: verified already correct (global bearerAuth
default overridden with security:[] on /cacerts, /simpleenroll,
/simplereenroll, /csrattrs, /scep GET+POST, /crl/{issuer_id},
/ocsp/{issuer_id}/{serial}); no edits needed.
|
||
|
|
675b87ba63 |
I-005: notification retry loop + dead-letter queue
Critical alerts can no longer be silently dropped by a transient
notifier failure. Failed notification attempts now ride an exponential
backoff retry loop, with a 5-attempt budget before promotion to the
dead-letter queue for operator intervention.
Schema (migration 000016, idempotent):
- retry_count INTEGER NOT NULL DEFAULT 0
- next_retry_at TIMESTAMPTZ
- last_error TEXT
- idx_notification_events_retry_sweep partial index
(next_retry_at) WHERE status='failed' AND next_retry_at IS NOT NULL
Dead rows clear next_retry_at so the index stops matching them.
Service contract:
- NotificationService.RetryFailedNotifications drives 2^n-minute
exponential backoff capped at 1h (notifRetryBackoffCap) with
5-attempt budget (notifRetryMaxAttempts).
- Exhaustion (RetryCount >= notifRetryMaxAttempts-1) promotes to
status='dead' via MarkAsDead.
- Non-terminal failures record via RecordFailedAttempt.
- Success path promotes to 'sent' without touching retry_count
(audit preserves "delivered on attempt N").
- Missing-notifier branch defensively promotes to 'sent' to avoid
wedging a row on a deleted channel.
- RequeueNotification operator escape hatch atomically resets
retry_count -> 0, next_retry_at -> NULL, last_error -> NULL,
status -> pending via notifRepo.Requeue.
Scheduler:
- New always-on notificationRetryLoop wired into the base loop set at
CERTCTL_NOTIFICATION_RETRY_INTERVAL (default 2m).
- sync/atomic.Bool idempotency guard.
- sync.WaitGroup shutdown drain via WaitForCompletion.
StatsService:
- SetNotifRepo setter pattern preserves 9 pre-existing
NewStatsService call sites (main.go + stats_test.go + 8 digest
tests) without touching the constructor signature.
- DashboardSummary.NotificationsDead populated via
notifRepo.CountByStatus(ctx, "dead") — nil-safe when unwired
(reports zero on systems without a notification repository).
- CountByStatus error is non-fatal (dashboard summary is
best-effort for this field).
- Prometheus certctl_notification_dead_total counter emitted from
the same snapshot.
Handler:
- New POST /api/v1/notifications/{id}/requeue endpoint.
- dead status surfaces to MCP + CLI.
Frontend:
- NotificationsPage gains two-tab toolbar ("All" / "Dead letter")
with queryKey: ['notifications', activeTab] so switching tabs
doesn't serve stale data until the 30s refetch.
- Dead rows surface "Retry {n}/5" + truncated last_error with
full-text title tooltip.
- Requeue mutation wrapped as
mutationFn: (id: string) => requeueNotification(id)
to prevent react-query v5's positional context argument from
leaking into the API client — pinned against future refactors
by strict-match toHaveBeenCalledWith('notif-dead-001') in
NotificationsPage.test.tsx:181.
Closes I-005.
|
||
|
|
0725713e19 |
Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.
Schema — migration 000015:
migrations/000015_agent_retire.up.sql flips
deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
boundary instead of quietly destroying targets. Both `agents` and
`deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
TEXT pair (TEXT not VARCHAR so operator comments are never
truncated), indexed via partial indexes WHERE retired_at IS NOT
NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
EXISTS) so repeated runs against partially-migrated databases
converge. migrations/000015_agent_retire.down.sql restores CASCADE
and drops the new columns for clean rollback. A dedicated
repository-layer testcontainers test
(internal/repository/postgres/migration_000015_test.go) asserts the
before/after FK action, column presence, index presence, and
round-trip idempotency under up→down→up.
Domain — sentinel guard + dependency counts:
internal/domain/connector.go gains IsRetired() on Agent, the
exported SentinelAgentIDs slice listing server-scanner,
cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
four reserved IDs documented in CLAUDE.md and created at startup in
cmd/server/main.go), IsSentinelAgent(id string) predicate,
AgentDependencyCounts{ActiveTargets, ActiveCertificates,
PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
ActorTypeSystem enum values used by audit emission downstream.
Coverage locked down by internal/domain/connector_test.go.
Service — 8-step ordered contract:
internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
opts{Force, Reason}) enforces a fixed execution order:
(1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
unconditionally; force=true does NOT bypass it.
(2) fetch — ErrAgentNotFound on miss.
(3) idempotency — if IsRetired() already, return
AgentRetirementResult{AlreadyRetired: true} with no new audit
event and no state change (safe to replay from flaky clients).
(4) preflight counts — collectAgentDependencyCounts runs
ActiveTargets, ActiveCertificates, PendingJobs sequentially
(not in parallel; keeps the per-query timeout predictable and
matches the repo's existing call-chain shape).
(5) force-reason guard — opts.Force=true with empty Reason returns
ErrForceReasonRequired (wired into the 400 status surface).
(6) dependency guard — HasDependencies() with opts.Force=false
returns BlockedByDependenciesError{Counts} (wired into the 409
body with per-bucket counts).
(7) mutation — single pinned retiredAt := time.Now(); agent
retirement first, then cascade target retirement if opts.Force,
all under the repo's single transaction so the two retired_at
stamps match to the second.
(8) best-effort audit — agent_retired always; agent_retirement_
cascaded additionally on the force path. Actor is whatever the
handler resolves from the request; actor type is mapped by
resolveActorType (system/agent-prefix→Agent/else→User). Audit
emission failures are logged via slog.Error but do not abort
the retirement (matches the house convention used by every
other scheduler-emitted event).
BlockedByDependenciesError implements Error() as
"active_targets=%d, active_certificates=%d, pending_jobs=%d" and
Unwrap() → ErrBlockedByDependencies. The single struct satisfies
errors.Is via Unwrap (used by scheduler-level tests) and errors.As
via the concrete type (used by the handler to fish out Counts for
the 409 body). ListRetiredAgents(page, perPage) adds a separate
paginated accessor with page<1→1 and perPage<1→50 normalization so
retired rows are queryable without polluting the default agent
listing.
Sentinel guard coverage is asymmetric by design: all four reserved
IDs are protected, and force=true cannot override. Regression tests
in internal/service/agent_retire_test.go assert each of the eight
steps in order, plus sentinel bypass attempts and idempotency
replay.
Handler + router — status-code surface:
internal/api/handler/agents.go:RetireAgent exposes seven status
codes on DELETE /agents/{id}:
200 on a fresh retirement (body echoes AgentRetirementResult).
204 on idempotent replay (AlreadyRetired=true; no new audit).
400 on ErrForceReasonRequired.
403 on ErrAgentIsSentinel.
404 on ErrAgentNotFound.
409 on BlockedByDependenciesError, with a custom body shape
{error, counts{active_targets, active_certificates,
pending_jobs}} that bypasses the default ErrorWithRequestID
envelope so callers get the per-bucket numbers directly.
500 on any other error.
Heartbeat HandleHeartbeat returns 410 Gone when the agent is
retired (ErrAgentRetired), signalling the agent to shut down.
Query params `force=true` and `reason=<text>` drive the cascade
path; both are forwarded as url.Values through the new MCP
transport.
internal/api/router/router.go registers GET /api/v1/agents/retired
literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
literal-beats-pattern-var precedence routes "retired" to the
paginated retired-agents listing instead of fetching a hypothetical
agent named "retired".
Agent binary — clean shutdown on 410:
cmd/agent/main.go gains the ErrAgentRetired sentinel, a
retiredOnce sync.Once, and a retiredSignal chan struct{}. A
markRetired(source, statusCode, body) helper closes the channel
exactly once; the Run() select loop observes the close and returns
ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
and exits cleanly instead of spinning in the heartbeat retry loop.
The 410 Gone surface is therefore terminal for the agent process.
MCP transport:
internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
a new additive transport method. Client.Delete is path-only; without
this method the retire tool would silently drop `force` and `reason`,
turning every cascade retire into a default soft-retire. The new
method shares do()'s 204 normalization and 4xx/5xx error
propagation so tool authors get one contract.
internal/mcp/tools.go + internal/mcp/types.go expose the
retire_agent tool with Force+Reason inputs wired through
DeleteWithQuery.
CLI:
cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
`agents list --retired` (client-side strip of --retired then
delegation to ListRetiredAgents, sharing --page/--per-page parsing
with the default listing) and `agents retire <id> [--force --reason
"…"]` (mirrors ErrForceReasonRequired — force without reason is
rejected client-side before the request is sent). JSON + table
output modes both honor the new columns.
Frontend:
web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
web/src/api/client.ts + web/src/api/types.ts expose the retire
endpoint and the retired-listing. 4 new Vitest regression cases.
OpenAPI:
api/openapi.yaml documents DELETE /agents/{id} with all seven
status codes, 410 on heartbeat, and the 409 per-bucket body shape.
Regression coverage (six new test files, all green):
internal/service/agent_retire_test.go — 8-step contract + sentinel guards
internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
internal/mcp/retire_agent_test.go — DeleteWithQuery wire-through
internal/cli/agent_retire_test.go — --retired listing + --force/--reason pairing
internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
internal/domain/connector_test.go — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies
Files:
api/openapi.yaml — DELETE + 410 + 409 body shape
cmd/agent/main.go — ErrAgentRetired, markRetired, retiredSignal
cmd/cli/main.go — handleAgents list/get/retire dispatch
docs/architecture.md, docs/concepts.md,
docs/testing-guide.md — retirement contract narrative
internal/api/handler/agents.go — RetireAgent, status surface, 410 on heartbeat
internal/api/handler/agent_handler_test.go — extended coverage
internal/api/handler/agent_retire_handler_test.go — new
internal/api/router/router.go — /agents/retired before /agents/{id}
internal/cli/agent_retire_test.go — new
internal/cli/client.go — ListRetiredAgents + RetireAgent
internal/domain/connector.go — IsRetired, SentinelAgentIDs,
IsSentinelAgent, AgentDependencyCounts,
ActorTypeAgent/System
internal/domain/connector_test.go — new
internal/integration/lifecycle_test.go — retirement fixture
internal/mcp/client.go — DeleteWithQuery additive transport
internal/mcp/retire_agent_test.go — new
internal/mcp/tools.go, internal/mcp/types.go — retire_agent tool + Force/Reason inputs
internal/repository/interfaces.go — AgentRepository retirement methods
internal/repository/postgres/agent.go — retire + cascade target retire + counts
internal/repository/postgres/migration_000015_test.go — new
internal/service/agent.go — wire into AgentService surface
internal/service/agent_retire.go — new 8-step contract
internal/service/agent_retire_test.go — new
internal/service/deployment.go — skip retired agents
internal/service/target.go — skip retired agents
internal/service/testutil_test.go — shared mocks extended
migrations/000015_agent_retire.up.sql — new
migrations/000015_agent_retire.down.sql — new
web/src/api/client.ts, types.ts + tests — retire endpoint wiring
web/src/pages/AgentsPage.tsx — retire UI
|
||
|
|
3287e174dc |
Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
|
||
|
|
5abeeb882b | fix(crypto): per-ciphertext PBKDF2 salt + v2 versioned format with v1 fallback (M-8) | ||
|
|
13cd4d98ba |
feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id, owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke), server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI spec, compliance mapping updates, and 21 new tests (12 service, 7 handler, 1 CLI, 1 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e1bcde4cf1 |
feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret managers. Three pluggable DiscoverySource connectors feed into the existing discovery pipeline via sentinel agent pattern, with a 9th scheduler loop for periodic cloud scanning. - AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests - Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests - GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests - CloudDiscoveryService orchestrator with 9 tests - 9th scheduler loop (6h default, atomic.Bool idempotency) - Discovery page: color-coded source type badges - 14 new env vars across CloudDiscoveryConfig structs - Docs: connectors.md, architecture.md, features.md, README updated 49 new tests. All CI checks pass (go vet, race, lint, coverage). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3f619bcaac |
feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA coverage. Entrust uses mTLS client certificate auth with sync/async issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for self-hosted Keyfactor CAs. Each connector implements the full issuer.Connector interface (9 methods), includes httptest-based unit tests (~14 each), and follows established patterns (injectable HTTP clients, RFC 5280 revocation reason mapping, CRL/OCSP delegated to CA). Also includes: issuer factory cases, env var seeding, config structs, domain types, seed data (3 rows, all disabled), OpenAPI enum updates, frontend issuer catalog entries with config fields, and full docs (connectors.md, architecture.md, features.md, README). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
596d86a206 |
feat(M48): continuous TLS health monitoring — endpoint state machine, shared tlsprobe, 8 API endpoints, GUI
Adds continuous TLS endpoint health monitoring that closes the deploy→verify→monitor loop. After M25 verifies a deployment succeeded once, M48 continuously confirms it stays healthy. Key components: - Shared `internal/tlsprobe/` package extracted from network scanner for reuse - Health status state machine: healthy → degraded (2 failures) → down (5 failures), plus cert_mismatch when served fingerprint differs from expected - 8th scheduler loop (60s tick, per-endpoint configurable intervals) - PostgreSQL migration 000011: endpoint_health_checks + endpoint_health_history tables - 8 REST API endpoints (CRUD, history, acknowledge, summary) - Health Monitor GUI page with summary bar, status table, create modal, auto-refresh - 38 new tests (5 tlsprobe + 11 domain + 10 service + 8 handler + 4 frontend) - All coverage thresholds maintained (service 68%, handler 83%, domain 87%, middleware 63%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f2e60b93a3 |
feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths (renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs with key algorithm/size that don't match profile rules. MaxTTL enforcement caps certificate validity per issuer connector (Local CA, Vault, step-ca enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and size are now persisted in certificate_versions for audit compliance. 16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded version number from GUI sidebar. Documentation updated across architecture, features, connectors, and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
bcefb11e65 |
feat(M51): add SCEP server (RFC 8894) for MDM and network device enrollment
Implements Simple Certificate Enrollment Protocol with single-endpoint operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7 SignedData CSR extraction with fallback for raw/base64 CSR, challenge password authentication via CSR attributes, and shared internal/pkcs7 package extracted from EST handler to eliminate code duplication. 24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c015cab2f4 |
docs: rewrite features.md, audit README + architecture against repo
Rewrote docs/features.md from scratch as authoritative feature inventory (1255 lines, every claim verified against source files). Audited README.md and architecture.md against repo — fixed 19 stale references: K8s Secrets status, issuer counts, dashboard page counts, CI thresholds, missing connectors in Mermaid diagrams, OpenAPI operation count, GetCACertPEM behavior, and V2/V4 roadmap accuracy. Also includes related fixes discovered during audit: - Scheduler skips expired/failed/revoked certs from auto-renewal - Seed demo expiry dates moved outside 31-day scheduler query window - Agent pages use correct last_heartbeat_at field name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4b5927dfff |
docs: expand README documentation table and fix orphaned doc links
- README: Add 7 missing docs to documentation table (MCP server, OpenAPI guide, migration guides for certbot/acme.sh/cert-manager, test environment, testing guide). Fix connector reference description to remove stale counts. Link OpenAPI guide instead of raw YAML. - architecture.md: Add cross-references to testing-guide.md and test-env.md from testing strategy section and What's Next links. These were the only two orphaned docs with zero inbound references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
cc03f55006 |
docs: comprehensive documentation audit — fix stale counts, V2/V3 matrix, connector status
- features.md: Fix Feature Matrix to correctly show all V2 Free features (F5/IIS/WinCertStore/JavaKeystore as Implemented, not Stub; Vault/DigiCert/ Sectigo/GoogleCAS as V2 Free, not V3 Paid). Add missing shipped features (EST, verification, export, S/MIME, ARI, digest, Helm, onboarding). Update issuer count to 9, target count to 13. - architecture.md: Fix F5/IIS from "interface only, implementation planned" to implemented. Add all 13 target connectors to built-in targets list. - why-certctl.md: Add Sectigo and Google CAS to issuer list (7→9). Fix target count (10→13). Remove hardcoded endpoint/operation counts. - connectors.md: Fix F5 BIG-IP TOC entry from "Interface Only" to "Implemented". Remove dead "Planned Issuers" TOC link. - README.md: Remove competitor product names (CertKit, KeyTalk). Remove hardcoded dashboard page count. Remove hardcoded endpoint counts. Fix V4 roadmap to remove already-shipped issuers (Sectigo, Google CAS). - Remove hardcoded MCP tool counts (78/80) across 8 files (mcp.md, architecture.md, features.md, testing-guide.md, concepts.md, quickstart.md, demo-advanced.md, why-certctl.md). Replace with "REST API exposed via MCP" to avoid future drift. - quickstart.md: Docker Compose environments table (from previous session). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f92c997a50 |
feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone: 1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support. ES256 JWS signing with kid mode, nonce management, directory discovery. Empty profile delegates to standard library path (zero behavior change). Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on ACME issuer config. 2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source, docs, README, and examples. Zero remaining occurrences of RFC 9702. 3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating renewal thresholds against SC-081v3 validity reduction timeline (200→100→47 days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the expected renewal path for 6-day shortlived certs. New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
697c0be9f3 |
feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any Linux/Unix server without installing the certctl agent binary. Uses the proxy agent pattern — a single agent in the same network zone deploys certs to remote servers over SSH/SFTP. Key additions: - SSH/SFTP connector with key auth (file/inline) + password auth - Injectable SSHClient interface for cross-platform testing (25 tests) - Shell injection prevention via validation.ValidateShellCommand() - Configurable cert/key/chain paths with octal permissions - GUI: 11 SSH config fields in target create wizard Also fixes pre-existing frontend bug where all target type strings (nginx, apache, etc.) were sent as lowercase but the backend expects proper-case (NGINX, Apache, etc.), breaking GUI-created targets. Adds missing TargetTypeSSH to validTargetTypes service map. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e19b8c95fe |
docs: remove hardcoded test counts from public-facing docs
Replace brittle test count numbers (1,554+, 1,088+, 211, etc.) with descriptions of testing approach and CI-enforced coverage gates. Counts go stale every milestone — coverage thresholds are machine- verified and never drift. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
2a14a1da01 |
feat(M40): F5 BIG-IP target connector via iControl REST
Replace 190-line stub with full iControl REST implementation (~580 lines). Token auth with 401 auto-retry, file upload + crypto object install, transaction-based atomic SSL profile updates, cleanup on failure. Injectable F5Client interface for cross-platform testing. 32 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5a53b648b1 |
feat(M44): Google CAS issuer connector
Google Cloud Certificate Authority Service integration via REST API with OAuth2 service account auth (JWT→access token). Synchronous issuance model, CA pool selection, mutex-guarded token caching, revocation with RFC 5280 reason mapping. No Google SDK dependency — all stdlib. 19 tests with httptest mock OAuth2 + CAS API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3a11e447cf |
feat(M43): Sectigo SCM issuer connector
Implement Sectigo Certificate Manager REST API connector with async order model (enroll → poll → collect PEM), 3-header auth, DV/OV/EV support, collect-not-ready (400/-183) graceful handling, and RFC 5280 revocation reason mapping. 20 tests with httptest mock API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4c3b7cbb16 |
docs: fix stale references, seed data case bugs, and convert ASCII diagrams to Mermaid
Audit all docs and examples against current codebase state. Fix seed_demo.sql domain constant casing (IssuerType, TargetType, AgentStatus) that would cause agent dispatch failures. Fix example docker-compose health endpoints (/health not /api/v1/health) and env var names (CERTCTL_DATABASE_URL). Update connector counts, test numbers, and planned→implemented status across docs. Convert 3 ASCII flow diagrams to Mermaid. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8b52da6aef |
feat(M39): IIS target connector + README overhaul
Implement full IIS target connector with PEM-to-PFX conversion via go-pkcs12, PowerShell-based deployment (Import-PfxCertificate, IIS binding management), SHA-1 thumbprint computation, and SNI support. Injectable PowerShellExecutor interface enables cross-platform testing. Regex-validated config fields prevent PowerShell injection. 28 tests. Restructure README from 563 to 313 lines: outcome-focused feature descriptions, "Who Is This For" persona section, examples promoted above the fold, configuration/API/security reference moved to docs. All numbers verified against repo (25 GUI pages, 97 OpenAPI ops, CI thresholds service 55%/handler 60%/domain 40%/middleware 30%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6c8d4eca40 |
feat: frontend audit fixes, README accuracy pass, doc updates
Frontend audit (10 categories): lifecycle fields in types, new API functions (CRL, OCSP, deployments, updateIssuer/Target, getPolicy), issuer/owner/profile filters on CertificatesPage, last_renewal_at column, error_message column on JobsPage, full crypto policy UI on ProfilesPage (key algorithms, EKUs, SAN patterns), key info + CA badge on DiscoveryPage, edit modal on TargetDetailPage, tags field on certificate creation, darwin→macOS mapping on AgentFleetPage. 211 Vitest tests passing. README accuracy: test counts (1300+ Go, 211 frontend), page count (24), demo data (32 certs, 7 issuers, 180 days), endpoint count (97), MCP tools (80), CLI subcommands (10), moved shipped items out of "Coming in v2.1.0". Docs: architecture.md diagrams updated (Vault PKI, DigiCert, Traefik, Caddy added), features.md Vault/DigiCert status updated. Version bumped to v2.0.20. cli binary removed from git tracking. Testing guide Part 41 added (12 auto + 9 manual tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6375909591 |
feat: add Vault PKI and DigiCert CertCentral issuer connectors (M32 + M37)
Vault PKI: synchronous issuance via /v1/{mount}/sign/{role}, token auth,
revocation, CA cert retrieval, 14 tests. DigiCert CertCentral: async order
model (submit → poll → download), X-DC-DEVKEY auth, OV/EV support, PEM
bundle parsing, 16 tests. Both conditionally registered based on env vars.
Includes OpenAPI enum updates, seed data, connector docs, architecture docs,
README badges, and testing guide sign-off (Parts 38 + 39, 12 automated
smoke test assertions all passing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
11173a74c6 |
feat(M31): agent work routing — scope jobs to assigned agents
Deployment jobs now set agent_id from target→agent relationship at creation time. GetPendingWork() uses ListPendingByAgentID() with a 3-way UNION query (direct match, legacy NULL fallback via target JOIN, AwaitingCSR via cert→target→agent chain) so each agent only receives its own jobs. - Added AgentID *string to Job domain struct - Added agent_id to all job SQL queries (5 SELECTs, INSERT, UPDATE, scanJob) - New ListPendingByAgentID() repository method - Rewrote GetPendingWork() from ~25 lines to single scoped query - 4 new Go tests (3 agent routing + 1 deployment agent_id) - Frontend: agent_id/target_id on Job type Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
bcf2c3ae92 |
feat(pre-2.1.0): demo data overhaul, examples, migration guides, install script
Pre-2.1.0 adoption polish delivering all four milestones: A) Demo Data Overhaul — seed_demo.sql rewritten with 35 certs across 5 issuers, 8 agents, 8 targets, 50+ jobs spanning 90 days, 55+ audit events, discovery scans, network scan targets, S/MIME cert. B) Examples Directory — 5 turnkey docker-compose configs: acme-nginx, acme-wildcard-dns01, private-ca-traefik, step-ca-haproxy, multi-issuer. C) Migration Guides — migrate-from-certbot.md, migrate-from-acmesh.md, certctl-for-cert-manager-users.md. D) Agent Install Script — install-agent.sh with cross-platform support (Linux systemd + macOS launchd), release.yml updated for 6-target cross-compilation. Triple-audited against codebase: 22 factual corrections applied across docs, examples, and config (env var names, CLI flags, ports, DNS hook interface, scheduler loop counts, license conversion date). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ec21c9bb29 |
feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing with cert ID computation, directory endpoint discovery, graceful degradation for non-ARI CAs. 19 tests. M29: Email notifier wiring + scheduled certificate digest — SMTP connector bridged to service layer via NotifierAdapter, DigestService with HTML email template, 7th scheduler loop (24h), digest preview/send API endpoints and GUI card. 21 tests. M30: Production-ready Helm chart — server Deployment, PostgreSQL StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security contexts, health probes, example values for dev/prod/ACME scenarios. Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job, documentation updates across 5 doc files and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a00bb349c4 |
feat(m27): certificate export (PEM/PKCS#12) and S/MIME EKU support
Add certificate export in PEM (JSON or file download) and PKCS#12 formats. Private keys are never included — they stay on agents. Add EKU-aware issuance threading profile EKUs (serverAuth, clientAuth, codeSigning, emailProtection, timeStamping) through the full issuance pipeline. Fix agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for S/MIME vs TLS, and a pre-existing generateID collision bug in deployment job creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6d508cf53f |
fix: security audit remediation (AUDIT-001, 003, 004, 005, 006, 018)
- AUDIT-001: Validate OpenSSL revoke inputs (hex-only serials, RFC 5280 reasons) - AUDIT-003: Enforce /20 CIDR size cap at API level (create + update) - AUDIT-004: Support comma-separated CERTCTL_AUTH_SECRET for zero-downtime key rotation - AUDIT-005: Add ReadHeaderTimeout (5s) to prevent Slowloris - AUDIT-006: Document audit trail query parameter exclusion rationale - AUDIT-018: Add immediate-run-on-start to short-lived expiry scheduler loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
de9264baf7 |
docs: synchronize project documentation with codebase
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010) and performs comprehensive documentation audit to eliminate drift between code and docs. Code changes: - TICKET-003: Repository integration tests with testcontainers-go (50+ subtests) - TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc - TICKET-010: Request body size limits via http.MaxBytesReader middleware - Fix missing slog import in certificate.go after service decomposition Documentation updates: - README: Fix endpoint count (97→93), expand env var reference (15→39 vars) - CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations - architecture.md: Add body size limits section, middleware chain ordering - CONTRIBUTING.md: New contributor guide with architecture conventions, test patterns, middleware ordering, CI thresholds - SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored) - Test files: Add doc comments to all new test files Documentation that should exist but doesn't yet: - Architecture diagrams (C4 model or similar) - Threat model document - Testing philosophy guide - Disaster recovery runbook - Upgrade guide (migration between versions) - API versioning strategy document Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
305c7dc851 |
docs: update project documentation to reflect security remediation
Update README, architecture guide, and feature inventory to document all changes from the security remediation pass (17 tickets): - README: Add CI pipeline section (race detection, golangci-lint, govulncheck, per-layer coverage thresholds), CORS deny-by-default behavior, input validation, SSRF protection, scheduler concurrency safety. Update test count to 1050+. Add race detection and govulncheck to development commands. - Architecture guide: Update testing strategy with scheduler tests, fuzz tests, and revised CI pipeline description. Add security model sections for input validation, CORS, and concurrency safety. Update test count. - Feature inventory: Document CORS deny-by-default behavior. - SECURITY_REMEDIATION.md: New file documenting all 17 remediated tickets with CWE classifications, before/after behavior, 3 deferred tickets with rationale, CI pipeline changes, and breaking CORS change. Missing docs flagged as future additions: - Formal threat model document - Disaster recovery runbook - Version upgrade guide - Capacity planning benchmarks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
be72627aeb |
feat: M25 post-deployment TLS verification + M26 Traefik/Caddy targets
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.
M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.
Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
d9fd0a147e |
feat(gui): add discovery triage, network scan management, and approval workflow pages (M24)
Three new GUI surfaces closing the backend-to-frontend gap for V2: - Discovery triage page: summary stats bar, DataTable with claim/dismiss actions, status/agent filters, collapsible scan history panel - Network scan target management: CRUD with create modal, enable/disable toggle, Scan Now button, last scan results display - Jobs page approval workflow: Approve/Reject buttons for AwaitingApproval jobs, rejection reason modal, pending approval banner with count, AwaitingApproval/AwaitingCSR added to status filter dropdown Also adds 13 new frontend tests, 4 API types, 12 API client functions, 2 sidebar nav items, 2 routes, and discovery status badge styles. Docs updated: README, architecture, quickstart, demo-advanced, CLAUDE.md, roadmap. Version bumped to v2.0.4. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
03593d4304 |
feat: wire ACME EAB into account registration + ZeroSSL auto-fetch
EAB credentials (KID + HMAC) were defined in the ACME connector config but never wired into the acme.Account registration call. This fixes the dead code and adds automatic EAB credential fetching for ZeroSSL — when the directory URL is detected as ZeroSSL and no EAB credentials are provided, certctl calls ZeroSSL's public API to get them automatically. Changes: - Wire EABKid/EABHmac into acme.Account.ExternalAccountBinding - Add isZeroSSL() detection and fetchZeroSSLEAB() auto-fetch - Add CERTCTL_ACME_EAB_KID/CERTCTL_ACME_EAB_HMAC env vars to main.go - Add 13 ACME connector tests (config validation, EAB decode, ZeroSSL auto-EAB with mock servers, URL detection) - Update docs: README, architecture, connectors, demo-advanced, testing-guide with EAB/auto-EAB documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
87355c3efb |
docs: add table of contents to all major documentation files
Navigation menus for testing guide, architecture, concepts, connectors, quickstart, advanced demo, and three compliance docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
50c520e1ff |
feat: dashboard theme overhaul — light content area with branded teal sidebar
Complete frontend visual redesign using certctl logo color palette: - Deep teal sidebar (#0c2e25) with prominent centered logo (64px in white pill) - Light content area (#f0f4f8) with white cards and visible borders - Brand colors from logo: teal (#2ea88f), blue (#3b7dd8), orange (#e8873a), green (#4ebe6e) - Inter + JetBrains Mono typography, colored stat card top borders - All 17 pages + 7 components updated (25 files, ~700 lines changed) - 15 new dashboard screenshots replacing old dark theme screenshots - Prometheus metrics e2e test added, integration test mock fixes - Docs updated: architecture.md theme description, testing-guide.md DNS-PERSIST-01 coverage Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e19c240a79 |
feat: add ACME DNS-PERSIST-01 challenge support (IETF draft-ietf-acme-dns-persist)
Standing TXT record at _validation-persist.<domain> eliminates per-renewal DNS updates. Auto-fallback to dns-01 if CA doesn't offer dns-persist-01. ScriptDNSSolver extended with PresentPersist method. Configurable via CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01 and CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN env vars. Also fixes IsExpired edge-case test in discovery_test.go that always failed due to time.Now() drift between test setup and method invocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7d14635a72 |
feat: add EST server (RFC 7030) for device certificate enrollment (M23)
Implement Enrollment over Secure Transport protocol with 4 endpoints under /.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial enrollment), simplereenroll (certificate renewal), and csrattrs (CSR attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts both PEM and base64-encoded DER CSRs, configurable issuer and profile binding, full audit trail. 28 new tests (18 handler + 10 service). Also includes: - GetCACertPEM added to issuer connector interface (all 4 issuers updated) - EST integration tests wired into e2e test suite (13 test cases) - QA testing guide Part 26 (15 manual EST test cases) - All docs updated: README, features, architecture, concepts, connectors, quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs, issuer interface, resource lists, OpenSSL status) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
0f4a1b268b |
fix: handle 204 No Content in fetchJSON, add FK-aware delete errors, v2 screenshots
Frontend: fetchJSON now returns empty object on 204 instead of failing to parse empty body — fixes silent delete failures across all entities. Added onError callbacks to owner/team delete mutations to surface errors. Backend: owner and issuer delete handlers return 409 Conflict with descriptive messages when FK constraints block deletion, instead of generic 500. Added 15 v2 dashboard screenshots, updated README screenshot section, logo asset, page count references (18→full), and QA guide with FK constraint test coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
72cda5877a |
docs: fix 16 discrepancies found by cross-validating all docs against source code
CLI syntax corrected across 5 files (concepts, demo-guide, demo-advanced, architecture, features): list-certs→certs list, get-cert→certs get, etc. Removed non-existent health/metrics commands, replaced with status. Subcommand count 10→12 everywhere. architecture.md: Go 1.22→1.25, endpoint count 91→93, ER diagram expanded from 15 to 21 tables (added renewal_policies, certificate_revocations, discovered_certificates, discovery_scans, network_scan_targets). connectors.md: added GenerateCRL and SignOCSPResponse to issuer interface, added Email and Webhook rows to notifier config table. compliance docs: fixed keygen warning messages to match actual log output, CERTCTL_STEPCA_PROVISIONER_KEY→CERTCTL_STEPCA_KEY_PATH, openssl genrsa→ crypto/ecdsa.GenerateKey, CERTCTL_SERVER_ADDR→CERTCTL_SERVER_HOST+PORT. README.md: v2.0.0 version bump, solo developer mention, feature list, table of contents, documentation table moved to top, 7 fact-check fixes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a4622d5e9a |
fix: correct stale counts across all docs (tables 19→21, MCP tools 76→78, tests 860→900+)
V2 audit found 3 critical number mismatches propagated across 8 files: - Table count was 19 everywhere but actual migrations create 21 tables - MCP tool count was 76 but tools.go registers 78 (M21/M22 additions) - README MCP breakdown claimed 83 tools with math summing to 90 - architecture.md still had stale 860+ test count - features.md OpenAPI claim said 93 ops but spec has 78 - mcp.md tool-per-domain table had wrong counts in 10 of 16 rows - Added 3 network_scan_targets to seed_demo.sql for demo completeness - Added curl examples to Agent Groups section in features.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4f90be9311 |
feat: add network certificate discovery (M21) and Prometheus metrics (M22)
M21 adds server-side active TLS scanning of CIDR ranges with concurrent probing, sentinel agent pattern for pipeline reuse, and full CRUD API for scan targets. M22 adds Prometheus exposition format endpoint alongside existing JSON metrics. Comprehensive documentation audit updates all docs to reflect 91 endpoints, 19 tables, 6 scheduler loops, and 900+ tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7bf20fce85 |
docs: add compliance mapping guides and comprehensive documentation audit
Add SOC 2 Type II, PCI-DSS 4.0, and NIST SP 800-57 compliance mapping guides — the final V2 deliverable. All claims verified against actual codebase (router.go, config.go, main.go). Also audit and update all existing docs: fix endpoint/tool/test counts in features.md, expand demo-guide.md and demo-advanced.md with CLI/MCP/discovery coverage, update connectors.md F5/IIS status to V3 paid, add compliance reference to architecture.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
667a30870d |
feat: M18b Filesystem Certificate Discovery — agent scanning, server dedup, triage API
Agent-side:
- Filesystem scanner walks configured directories (CERTCTL_DISCOVERY_DIRS)
- Parses PEM (.pem, .crt, .cer, .cert) and DER (.der) certificate files
- Extracts CN, SANs, serial, issuer/subject DN, validity, key info, SHA-256 fingerprint
- Reports discoveries to control plane on startup + every 6 hours
- Skips files >1MB and private key files
Server-side:
- Migration 000006: discovered_certificates + discovery_scans tables
- Domain model: DiscoveredCertificate, DiscoveryScan, DiscoveryReport
- Three triage states: Unmanaged, Managed (claimed), Dismissed
- Repository with upsert dedup (fingerprint + agent + path)
- Service layer: process reports, claim, dismiss, list, summary
- 7 new API endpoints (84 total):
POST /agents/{id}/discoveries, GET /discovered-certificates,
GET /discovered-certificates/{id}, POST .../claim, POST .../dismiss,
GET /discovery-scans, GET /discovery-summary
- Audit trail: scan_completed, cert_claimed, cert_dismissed events
Tests: 28 new test functions (domain, handler, service layers)
Docs: README, quickstart, demo-guide, demo-advanced, architecture,
concepts, connectors, features.md all updated
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
8768a7b3ef |
docs: add feature inventory, complete demo-advanced and architecture coverage
- Create docs/features.md — comprehensive V2 feature inventory (15+ sections covering all 77 endpoints, 4 issuers, 5 targets, 6 notifiers, profiles, agent groups, revocation, observability, CLI, MCP, and configuration) - Update docs/demo-advanced.md — add Parts 10-13 (Certificate Profiles, Agent Groups, Interactive Approval, Advanced Query Features), fix notification channel count (2→6), fix scheduler loop count (4→5), update architecture summary flowchart - Update docs/architecture.md — add revocation data flow diagram (Section 3.5), profile enforcement note, M20 Enhanced Query API section, OpenAPI spec reference, CLI Tool section, update connector test counts (23→57), add e2e_test.go mention Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1e56e35dcc |
docs: add MCP server guide and OpenAPI spec guide
New docs/mcp.md covers MCP server setup with Claude Desktop, Cursor, and Claude Code, lists all 76 tools across 16 domains, includes example conversations, and documents security considerations. New docs/openapi.md covers Swagger UI setup, SDK generation for TypeScript/Python/Go/Java, Postman import, spec validation, and contract testing with schemathesis. Updated cross-references in concepts.md and architecture.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |