mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 22:51:30 +00:00
e17788355b73da8e63abce8b3a76c5811954e712
134 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
87213128cc |
fix(security,domain): redact Agent.APIKeyHash from JSON wire shape (G-2)
Pre-G-2 internal/domain/connector.go::Agent::APIKeyHash was tagged
`json:"api_key_hash"` and shipped on every wire surface that returned
domain.Agent — GET /api/v1/agents (PagedResponse{Data: agents}),
GET /api/v1/agents/{id}, GET /api/v1/agents/retired, and the
POST /api/v1/agents registration response. Every authenticated client
(browser, CLI --json, MCP tool calls) received the SHA-256-of-the-API-key
string. The browser silently dropped it because web/src/api/types.ts
omits the field, but CLI and MCP consumers print full JSON so the hash
was visible there. Even though the value is a hash and not the plaintext
key, shipping it gives an attacker an offline brute-force target if the
API-key entropy is low (certctl doesn't enforce a minimum on operator-
supplied keys), and there's no business reason for any client to ever
receive it — the value is server-internal, used only for the lookup at
internal/repository/postgres/agent.go::GetByAPIKey. (Audit:
cat-s5-apikey_leak in coverage-gap-audit-2026-04-24-v5/unified-audit.md.)
We chose the audit's recommended fix (json:"-") plus a defense-in-depth
MarshalJSON plus a CI guardrail. Three layers because struct-tag
redaction alone is one rebase away from being silently reverted, the
custom MarshalJSON catches the case where a parent struct embeds Agent
under a different tag, and the CI grep blocks reintroduction at the spec
or frontend boundary even without a code review catching it.
Files changed:
Phase 1 — Domain redaction:
- internal/domain/connector.go: APIKeyHash tag flipped from
`json:"api_key_hash"` to `json:"-"`. New Agent.MarshalJSON
with value receiver + type-alias-recursion-break that explicitly
zeroes APIKeyHash on the marshal-time copy. Long-form docblock
explaining the G-2 closure rationale + cross-references to
service.RegisterAgent (populator), repository.AgentRepository::
GetByAPIKey (consumer), docs/architecture.md (DB-shape vs
API-shape distinction), and the audit finding.
Phase 2 — Domain tests (5 test functions):
- internal/domain/connector_test.go: TestAgent_MarshalJSON_RedactsAPIKeyHash
pins the marshal-boundary contract on a value receiver. ...RedactsViaPointer
pins the *Agent path. ...RedactsInSlice pins the []Agent path that the
ListAgents handler actually emits via PagedResponse. ...DoesNotMutateReceiver
pins the by-value-receiver contract so a future refactor that switches
to pointer-receiver gets caught. ...RoundTrip pins the wire-shape
guarantee that APIKeyHash is dropped on encode and cannot reappear on
decode. Single sentinel value ("sha256:LEAKED-CREDENTIAL-DERIVATIVE-
SENTINEL") flows through every fixture for grep-ability on regression.
Phase 3 — Handler tests (4 test functions):
- internal/api/handler/agent_handler_test.go: TestListAgents_DoesNotLeakAPIKeyHash,
TestGetAgent_DoesNotLeakAPIKeyHash, TestRegisterAgent_DoesNotLeakAPIKeyHash,
TestListRetiredAgents_DoesNotLeakAPIKeyHash. Each asserts (a) the
literal substring "api_key_hash" is absent from the httptest-captured
body, (b) the leak sentinel value is absent, (c) the non-leaked fields
ARE present (sanity that the handler is serving real data, not just
empty payloads). Shared sentinel "sha256:LEAKED-CREDENTIAL-DERIVATIVE-
HANDLER-SENTINEL" so a single grep over a failing test's output
identifies the leak surface immediately.
Phase 4 — Spec / docs:
- api/openapi.yaml: api_key_hash property REMOVED from Agent schema
(was at line 3690). Inline G-2 comment naming the closure + the
database-vs-API-shape distinction so a future spec edit doesn't
silently re-introduce the field.
- docs/architecture.md: ER-diagram block already documents the agents
table including api_key_hash (DB shape — correct). Added a sibling
note paragraph immediately below the diagram explaining that several
columns are intentionally server-internal (api_key_hash redaction
+ issuers.config / deployment_targets.config encrypted shadow), with
cross-references to the redaction enforcement site, the OpenAPI
schema, the frontend interface, and the CI guardrail.
- web/src/api/types.ts: Agent interface unchanged in shape (already
omitted the field) but added a leading comment block explaining
WHY the omission is intentional — stops a future frontend dev from
"completing" the interface from the OpenAPI spec or the Go struct.
Phase 5 — CI guardrail:
- .github/workflows/ci.yml: new "Forbidden api_key_hash JSON-shape
regression guard (G-2)" step. Scoped patterns catch the actual
regression shapes — Go struct tag (json:"api_key_hash"), frontend
interface declaration, OpenAPI schema property, YAML enum/array
membership. Repository / migration / seed / service / integration /
unit-test / comment lines exempt. Verified locally on the real tree
(passes) and against 4 synthetic regression patterns (each fires
the guardrail). Mirrors the G-1 pattern from .github/workflows/
ci.yml lines 47-108.
Phase 5b — Sweep verification (no changes, results documented for the
next reader):
- internal/api/middleware/audit.go: doesn't serialize Agent struct;
records request body only. No leak.
- service.RegisterAgent audit-event payload: `map[string]interface{}{
"name": name, "hostname": hostname}` — name + hostname only,
no APIKeyHash. No leak.
- All 9 slog sites that mention agent: scalar attrs only ("agent_id",
"error", "agent_hostname"), never the full struct. No leak.
- internal/mcp, internal/cli, cmd/cli, cmd/mcp-server: zero matches
for APIKeyHash / api_key_hash. Both pass server JSON verbatim, so
the wire-side fix transitively closes them.
Verification (all gates pass):
- go build ./...
- go vet ./...
- go test -short ./... — every package green
- go test -short -race ./internal/domain/... ./internal/api/handler/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template smoke render — succeeds
- python3 yaml.safe_load on api/openapi.yaml — parses
- OpenAPI Agent schema scan: no api_key_hash property
- CI guardrail mirror: clean on real tree, fires on all 4 synthetic
regression patterns
- Domain pkg coverage: Agent.MarshalJSON 100%, connector.go total 87.5%
- Handler pkg coverage: 79.2%
Sample response body (httptest captured during verification, GET
/api/v1/agents/{id} via the new handler test):
{"id":"agent-demo","name":"demo-agent","hostname":"demo.host",
"status":"Online","last_heartbeat_at":"2026-04-24T11:59:30Z",
"registered_at":"2026-04-24T12:00:00Z","os":"linux",
"architecture":"amd64","ip_address":"10.0.0.42",
"version":"v2.0.49"}
Note the absence of any api_key_hash key, even though the in-memory
struct passed to the handler had APIKeyHash set to a sentinel.
Out of scope (intentionally untouched):
- internal/repository/postgres/agent.go SELECT/INSERT/UPDATE/scan
paths and GetByAPIKey lookup — DB column stays, repo still
populates the struct, auth lookup still works. The redaction is a
marshal-boundary concern.
- migrations/000001_initial_schema.up.sql + migrations/seed_*.sql —
DB schema and seed data unchanged.
- internal/service/agent.go::RegisterAgent — service-side hashing
and persistence unchanged.
- Other domain types with potential credential-derivative fields
(Issuer.Config, DeploymentTarget.Config, notifier configs). Not
flagged by the audit; some are already protected (e.g.,
DeploymentTarget.EncryptedConfig []byte `json:"-"`). File a
separate audit pass if recon surfaces additional leaks.
- Per-resource DTO layer across every handler. Single audit
finding, single domain type.
- A separate possible follow-up: the v2 RegisterAgent endpoint
doesn't return the plaintext API key to the agent, which may
mean self-bootstrap via POST /api/v1/agents is broken. Verified
during recon; out of scope for G-2; should be its own ticket.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-s5-apikey_leak
Audit recommendation: 'json:"-" or API-response DTO
excluding APIKeyHash' — went with the json:"-" + MarshalJSON
defense-in-depth pair plus CI guardrail and structural docs.
|
||
|
|
9c1d446e40 |
fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.
We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.
The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.
Files changed:
Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
literal 'jwt' through a dedicated multi-line diagnostic naming the
authenticating-gateway pattern, then cross-checks against
ValidAuthTypes(). Secret-required branch simplified to api-key-only.
Field comment on AuthConfig.Type rewritten to drop jwt and point at
the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
config.Load() — exits 1 on any unsupported auth-type that bypassed the
validator. Auth-disabled startup log explicitly names the
authenticating-gateway pattern.
Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
(two table rows pinning the dedicated G-1 error fires regardless of
whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
guard against future re-introduction),
TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
hit the generic invalid-auth-type error). Removed the prior
TestValidate_JWTAuth_MissingSecret happy-path since its premise is
inverted post-G-1.
- internal/api/handler/health_test.go: removed
TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
into the regression suite). Pre-existing _APIKey test continues to
cover the api-key happy path.
Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
jwt and point at the gateway pattern; secret-required conditional
simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
section explaining the design rationale and listing oauth2-proxy /
Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
forward_auth / Apache mod_auth_openidc / nginx auth_request as the
standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
with preconditions, what-changes, both recovery paths, complete
docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
ext_authz patterns, rollback posture.
Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
helper mirroring the existing certctl.tls.required pattern. Fails
template render on any server.auth.type outside {api-key, none} with
a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
server-configmap.yaml, server-secret.yaml: invoke the helper at the
top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
walkthrough.
Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
Added / Changed sections; explicit pointer at
docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.
Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
guard (G-1)' step. Scoped patterns catch the actual regression shapes
(map literal, slice literal, switch case, OpenAPI enum, env-file
default, AuthType('jwt') cast). Comments and the dedicated rejection
branch are intentionally exempt; connector-package JWT references
(Google OAuth2 / step-ca) are exempt as out-of-scope external
protocols. Verified locally: the guard passes on the actual tree and
fires on all 4 synthetic regression patterns.
Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.
Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
longer accepted (G-1 silent auth downgrade): no JWT middleware ships
with certctl. To use JWT/OIDC, run an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
See docs/architecture.md "Authenticating-gateway pattern" and
docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'
config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-g-jwt_silent_auth_downgrade
Audit recommendation followed verbatim: 'Remove jwt from
validAuthTypes until middleware ships'.
|
||
|
|
a91197014f |
fix(db): emit volume-state guidance on postgres auth failure (U-1, #10)
The shipped quickstart instructs operators to copy deploy/.env.example to
deploy/.env, edit POSTGRES_PASSWORD, and run docker compose up. On the
*first* boot of a fresh checkout this works. On the *second* boot — i.e.,
when an operator first booted with the default POSTGRES_PASSWORD=certctl,
then edited .env and re-ran up — the certctl-server container picks up the
new password (env interpolated at every container start) but postgres does
not. The postgres docker-entrypoint runs initdb only when the data dir is
empty; on subsequent boots the persistent named volume postgres_data is
non-empty so pg_authid retains the password baked in on first boot. The
server connects with the new credentials, postgres rejects them, and the
operator sees an opaque `pq: password authentication failed for user
"certctl"` in the server log with no pointer to the actual cause. New-
operator onboarding gets blocked on the documented production path.
Why a doc fix alone is not sufficient. Operators don't reread the docs
after a successful first boot — the trap fires on the *second* up, when
they think they've already learned the system. The opaque pq error is
indistinguishable in the log from a typo'd password or a misconfigured
secret store. The diagnostic has to fire at the moment the failure is
observed.
Why we don't try to fix the bootstrap. The env-vs-pg_authid divergence is
intrinsic to how the official postgres image bootstraps (see
docker-entrypoint.sh: initdb runs only if PGDATA is empty). Switching to a
bind mount or ephemeral volume breaks the production path; switching to
POSTGRES_PASSWORD_FILE + ALTER ROLE adds operator surface without
eliminating the divergence. The ergonomic fix is to surface the failure
mode loudly, with both remediation paths, at the exact log line where it
becomes visible.
Two remediation paths, surfaced together. Destructive: `docker compose
-f deploy/docker-compose.yml down -v && up -d --build` — wipes the
postgres volume so initdb re-runs with the new env value. Use this on
demos / first-time setup where data loss is acceptable. Non-destructive:
`docker compose exec postgres psql -U certctl -c "ALTER ROLE certctl
PASSWORD '<new>';"` followed by a server restart with the matching
POSTGRES_PASSWORD. Use this on any environment that holds data you want
to keep. Surfacing both means the operator can pick based on their
environment without us assuming.
Files changed:
- internal/repository/postgres/db.go — extract wrapPingError(err) helper.
errors.As against *pq.Error; on SQLSTATE 28P01 (invalid_password) emit
the multi-line guidance preserving the %w wrap chain. Non-28P01 errors
retain the original `failed to ping database: %w` shape so transient
connection-refused / timeout paths don't get noisy. Add
pgErrInvalidPassword = "28P01" constant. Convert blank
`_ "github.com/lib/pq"` import to direct import (driver registration
still works via init()) so we can name the *pq.Error type at compile
time. NewDB now calls wrapPingError(err) instead of inlining the wrap.
- internal/repository/postgres/db_test.go (new) — 4 internal-package
unit tests covering wrapPingError. AuthFailureGuidance pins the
contract substrings ("SQLSTATE 28P01", "POSTGRES_PASSWORD",
"first boot", "down -v", "ALTER ROLE"). NonAuthErrorPreservesOriginalWrap
pins the no-leak contract for SQLSTATE 08006 (connection_failure).
NonPqErrorPreservesOriginalWrap pins the network-level path.
NilReturnsNil pins defensive contract. All run in -short without
testcontainers — package postgres (internal) so the unexported helper
is callable directly.
- docs/quickstart.md — `> **Warning:**` callout immediately after the
`cp deploy/.env.example deploy/.env` block at lines 56-61. Names the
trap, names the SQLSTATE, gives both remediation paths. Uses the
in-file `> **Note:**` blockquote convention.
- deploy/ENVIRONMENTS.md — `**Stateful volume — first-boot password
binding (U-1)**` paragraph appended to the Postgres expert-note block.
Explains the env-vs-pg_authid divergence, points at wrapPingError as
the runtime diagnostic, lists both remediation paths. Uses the in-file
`**Expert note:**` convention.
Out of scope (separate follow-ups):
- deploy/helm/certctl/templates/postgres-statefulset.yaml has the same
root cause via PVC retention. The wrapPingError diagnostic covers the
Helm path because the same NewDB code runs at server startup; the
Helm-specific doc warning lands separately.
- /.env.example at repo root (line 16 hardcodes the password literally
inside CERTCTL_DATABASE_URL rather than interpolating) — adjacent
trap, separate fix.
- examples/{acme-nginx,private-ca-traefik,step-ca-haproxy,multi-issuer,
acme-wildcard-dns01}/docker-compose.yml all carry the pattern. The
diagnostic covers them; targeted doc warnings are scoped to the
canonical quickstart + ENVIRONMENTS docs.
Out of consideration:
- Switch to bind mount / ephemeral volume — breaks the production path.
- POSTGRES_PASSWORD_FILE + Docker secret + ALTER ROLE rotation — adds
operator surface without fixing the env-vs-pg_authid divergence.
Verification (all passing):
- go build ./...
- go vet ./...
- go test -short -race ./internal/repository/postgres/ — 4/4 new tests
pass plus existing tests
- go test -short ./... — every package green
- govulncheck ./... — no vulnerabilities in our code
- wrapPingError coverage 100%; postgres pkg total unchanged in shape
(NewDB/RunMigrations were 0% pre-fix, still 0% post-fix; new helper
adds 100%-covered statements)
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-u-quickstart_postgres_password_volume_trap
GitHub Issue #10 (mikeakasully)
|
||
|
|
55ce86b132 |
v2.0.48: swap self-signed TLS bootstrap algorithm ed25519 → ECDSA-P256
Follow-up to v2.0.47 (HTTPS-Everywhere). The Phase-3 self-signed bootstrap sidecar shipped an ed25519 server cert. Apple's TLS stack — Safari Network Framework and the macOS-bundled LibreSSL 3.3.6 /usr/bin/curl — does not advertise ed25519 in the ClientHello signature_algorithms extension for server certs, so the handshake fails with the server-side log line: tls: peer doesn't support any of the certificate's signature algorithms Homebrew OpenSSL 3.x, Chrome, Firefox, and Linux curl all accept ed25519 server certs fine. Apple is the outlier. Rather than gate the demo stack behind "install Homebrew OpenSSL first," swap the bootstrap algorithm to ECDSA-P256 with SHA-256 — universally supported, including on the Apple stack. Changes - deploy/docker-compose.yml: certctl-tls-init openssl invocation swapped to `-newkey ec -pkeyopt ec_paramgen_curve:P-256 -nodes`; header comment + echo line updated; multi-line rationale paragraph added. - deploy/docker-compose.test.yml: same openssl swap + echo update for the test harness sidecar that writes to the bind-mounted ./test/certs directory the Go integration_test.go pins via CERTCTL_TEST_CA_BUNDLE. - docs/tls.md: Pattern 1 description + code block updated; "Why ECDSA-P256 and not ed25519" rationale paragraph added covering pre-v2.0.48 history, the Apple diagnosis, accepting clients, and the operator migration command. Patterns 2 (existing Secret) and 3 (cert-manager) explicitly called out as unaffected. - docs/upgrade-to-tls.md: docker-compose procedure sentence updated with cross-reference to tls.md Pattern 1. - docs/test-env.md: "Get the CA bundle for curl" sentence updated. Migration Existing demo installs must tear the `certs` named volume down to pick up the new algorithm: docker compose -f deploy/docker-compose.yml down -v docker compose -f deploy/docker-compose.yml up -d --build Not touched - cmd/server/tls.go: algorithm-agnostic. TLS 1.3 min version with [X25519, P-256] curve preferences for key exchange is orthogonal to the server cert's signature algorithm. No Go code change needed. - Helm chart: Patterns 2 and 3 operators supply their own cert; this patch does not affect them. - Unrelated ed25519 uses (agent key algorithm detection, profile algorithm options, SSH key path examples, tlsprobe key metadata, cloud discovery key-algo display): all orthogonal to the server TLS bootstrap cert. Incidental cleanup - .gitignore: dropped dangling `strategy.md` entry (file doesn't exist in repo; entry was cruft). |
||
|
|
52248be717 |
v2.0.47: HTTPS Everywhere — TLS-only control plane, agents/CLI/MCP
Breaking change release. Plaintext HTTP listener removed. The certctl control plane now terminates TLS 1.3 on :8443 via http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md. Server - cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback), preflightServerTLS validation - cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe, watchSIGHUP wiring, cert/key path config threading - tls_test.go: 418-line regression coverage of reload, preflight, callback behavior, SAN validation Config - CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required) - Plaintext rejection: agents/CLI/MCP pre-flight-fail on http:// URLs with a pointer to docs/upgrade-to-tls.md Agents, CLI, MCP - All three pre-flight-reject http:// URLs with fail-loud diagnostic - CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust - CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass (loud warning on startup) - install-agent.sh emits both vars as commented template lines docker-compose - certctl-tls-init sidecar generates SAN-valid self-signed cert into deploy/test/certs/ on first boot - All demo-stack curls pin against ca.crt with --cacert Helm chart - Three TLS provisioning modes, exactly one required: - server.tls.existingSecret (operator-supplied) - server.tls.certManager.enabled (cert-manager integration) - server.tls.selfSigned.enabled (eval only — not for production) - server-certificate.yaml template for cert-manager mode - helm install without a TLS source fails at template render with a pointer to docs/tls.md CI - .github/workflows/ci.yml Helm Chart Validation step renders the chart in both existingSecret and cert-manager modes, plus an inverse guard-regression test that asserts helm template MUST refuse to render when no TLS source is configured. Previously the single `helm template` invocation hit the certctl.tls.required fail-loud guard and exit-1'd CI. Four invocations now: lint (existingSecret), template (existingSecret), template (cert-manager), template (no args — must fail). Integration tests - deploy/test/integration_test.go stands up the Compose stack over HTTPS, extracts the CA bundle, and exercises every certctl API over https://localhost:8443 - All 34 integration subtests green (per Phase 8 local CI-parity) Documentation - New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload) - New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade warnings, fleet-roll sequencing) - CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry (file heading unchanged; release tag is v2.0.47) - All curls in docs/, examples/, deploy/helm/ guides use https://localhost:8443 --cacert Verification - grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits - grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin API default, SSRF doc comment) — zero certctl endpoints - Tasks #197–#206 (Phases 0–8) all closed in the tracker Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix). |
||
|
|
04c7eca615 |
docs: reconcile scheduler topology across sibling docs (7 → 12 loops)
Authoritative 12-loop table lives at docs/architecture.md:522-534 (committed via
the I-001/I-003/I-005 + M48/M50 milestone commits). This change brings six sibling
docs into parity with that table so every surface — user-facing features reference,
SOC 2 compliance mapping, connectors guide, advanced demo architecture diagram,
testing guide, and in-line architecture prose — reflects the same 8 always-on + 4
opt-in topology.
Touches:
- docs/architecture.md: 2 inline ordinal references (9th / 8th loop) replaced with
descriptive names (opt-in cloud discovery / opt-in endpoint health), cross-linked
to the authoritative table to prevent future ordinal rot.
- docs/features.md: metric row (7 → 12), inline reference to 9th loop, and full
scheduler table expanded to include Always-on column + env vars + I-001/I-003/I-005
refs.
- docs/compliance-soc2.md: background scheduler monitoring bullets expanded to list
all 12 loops with env vars + I-series refs; table row updated with 8 always-on +
4 opt-in summary.
- docs/connectors.md: three inline ordinals (7th/6th/9th loop) replaced with
descriptive names, cross-linked to architecture.md.
- docs/demo-advanced.md: Mermaid SCHED node label updated from '7 background loops'
to '12 background loops (8 always-on + 4 opt-in)'.
- docs/testing-guide.md: Test 20.1.1 header + grep pattern expanded to include
job-retry / job-timeout / notification-retry / digest / endpoint-health /
cloud-discovery loops; sign-off chart row label updated.
Pure documentation reconciliation. No code changes. Master HEAD pre-commit:
|
||
|
|
6e646e0fe8 |
M-001/M-006: strip HTTP auth from EST/SCEP + fail-loud SCEP preflight
Closes CWE-306 (missing authentication for critical function) for SCEP
via a fail-loud startup gate, and aligns EST/SCEP HTTP dispatch with
their respective RFCs. CRL/OCSP remain unauthenticated under
.well-known/pki/* per RFC 5280 §5 / RFC 6960 / RFC 8615. Option (D):
no mTLS in this milestone.
- RFC 7030 §3.2.3 (EST auth is deployment-specific) and §4.1.1
(/cacerts explicitly anonymous): EST paths served unauthenticated;
CSR-signature + profile policy enforce identity inside ESTService.
- RFC 8894 §3.2: SCEP authenticates via the challengePassword
PKCS#10 attribute (OID 1.2.840.113549.1.9.7), not an HTTP credential.
HTTP dispatch is unauthenticated; preflightSCEPChallengePassword
refuses to start when CERTCTL_SCEP_ENABLED=true without
CERTCTL_SCEP_CHALLENGE_PASSWORD. SCEPService.PKCSReq enforces the
same invariant defense-in-depth and compares with
crypto/subtle.ConstantTimeCompare.
cmd/server/main.go:
- Extract buildFinalHandler(apiHandler, noAuthHandler, webDir,
dashboardEnabled); route /.well-known/est/*, /scep, /scep/*,
/.well-known/pki/crl/{id}, /.well-known/pki/ocsp/{id}/{serial},
and health probes through noAuthHandler (RequestID +
structuredLogger + Recovery only).
- Add preflightSCEPChallengePassword fail-loud gate; startup log
emits challenge_password_set boolean for operator visibility.
cmd/server/finalhandler_test.go (new, 314 lines, 27 subtests):
- TestBuildFinalHandler_Dispatch (20) + TestBuildFinalHandler_NoDashboard
(7) pin the dispatch surface: EST 4-endpoint, SCEP exact +
trailing-slash + query-string, PKI CRL+OCSP, health, /api/v1/*
authenticated, /assets/* file server, SPA fallback.
internal/api/router/router.go, internal/config/config.go:
- Router-level comments explain why EST/SCEP/PKI dispatchers sit
outside the authenticated mux; SCEP challenge password config
plumbed through.
docs/architecture.md:
- New EST Authentication subsection (RFC 7030 §3.2.3 + §4.1.1,
buildFinalHandler + noAuthHandler references).
- Rewrite SCEP Authentication subsection; replaces pre-existing
factually-incorrect "any value accepted" claim with CWE-306
preflight, service-layer defense-in-depth, and
crypto/subtle.ConstantTimeCompare.
- Top-level Authentication section: qualify /api/v1/* scope on API
clients bullet; add standards-based-endpoints bullet referencing
the 27-subtest regression harness.
docs/compliance-soc2.md:
- CC6.1: scope API Key Authentication to /api/v1/*; add
standards-based endpoints bullet citing RFCs and CWE-306 closure.
- CC6.3: scope API Key Policy to /api/v1/* with cross-reference to
CC6.1.
- Evidence Locations augmented with buildFinalHandler,
preflightSCEPChallengePassword, scep.go defense path, regression
harness, and OpenAPI security:[] overrides.
api/openapi.yaml: verified already correct (global bearerAuth
default overridden with security:[] on /cacerts, /simpleenroll,
/simplereenroll, /csrattrs, /scep GET+POST, /crl/{issuer_id},
/ocsp/{issuer_id}/{serial}); no edits needed.
|
||
|
|
675b87ba63 |
I-005: notification retry loop + dead-letter queue
Critical alerts can no longer be silently dropped by a transient
notifier failure. Failed notification attempts now ride an exponential
backoff retry loop, with a 5-attempt budget before promotion to the
dead-letter queue for operator intervention.
Schema (migration 000016, idempotent):
- retry_count INTEGER NOT NULL DEFAULT 0
- next_retry_at TIMESTAMPTZ
- last_error TEXT
- idx_notification_events_retry_sweep partial index
(next_retry_at) WHERE status='failed' AND next_retry_at IS NOT NULL
Dead rows clear next_retry_at so the index stops matching them.
Service contract:
- NotificationService.RetryFailedNotifications drives 2^n-minute
exponential backoff capped at 1h (notifRetryBackoffCap) with
5-attempt budget (notifRetryMaxAttempts).
- Exhaustion (RetryCount >= notifRetryMaxAttempts-1) promotes to
status='dead' via MarkAsDead.
- Non-terminal failures record via RecordFailedAttempt.
- Success path promotes to 'sent' without touching retry_count
(audit preserves "delivered on attempt N").
- Missing-notifier branch defensively promotes to 'sent' to avoid
wedging a row on a deleted channel.
- RequeueNotification operator escape hatch atomically resets
retry_count -> 0, next_retry_at -> NULL, last_error -> NULL,
status -> pending via notifRepo.Requeue.
Scheduler:
- New always-on notificationRetryLoop wired into the base loop set at
CERTCTL_NOTIFICATION_RETRY_INTERVAL (default 2m).
- sync/atomic.Bool idempotency guard.
- sync.WaitGroup shutdown drain via WaitForCompletion.
StatsService:
- SetNotifRepo setter pattern preserves 9 pre-existing
NewStatsService call sites (main.go + stats_test.go + 8 digest
tests) without touching the constructor signature.
- DashboardSummary.NotificationsDead populated via
notifRepo.CountByStatus(ctx, "dead") — nil-safe when unwired
(reports zero on systems without a notification repository).
- CountByStatus error is non-fatal (dashboard summary is
best-effort for this field).
- Prometheus certctl_notification_dead_total counter emitted from
the same snapshot.
Handler:
- New POST /api/v1/notifications/{id}/requeue endpoint.
- dead status surfaces to MCP + CLI.
Frontend:
- NotificationsPage gains two-tab toolbar ("All" / "Dead letter")
with queryKey: ['notifications', activeTab] so switching tabs
doesn't serve stale data until the 30s refetch.
- Dead rows surface "Retry {n}/5" + truncated last_error with
full-text title tooltip.
- Requeue mutation wrapped as
mutationFn: (id: string) => requeueNotification(id)
to prevent react-query v5's positional context argument from
leaking into the API client — pinned against future refactors
by strict-match toHaveBeenCalledWith('notif-dead-001') in
NotificationsPage.test.tsx:181.
Closes I-005.
|
||
|
|
0725713e19 |
Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.
Schema — migration 000015:
migrations/000015_agent_retire.up.sql flips
deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
boundary instead of quietly destroying targets. Both `agents` and
`deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
TEXT pair (TEXT not VARCHAR so operator comments are never
truncated), indexed via partial indexes WHERE retired_at IS NOT
NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
EXISTS) so repeated runs against partially-migrated databases
converge. migrations/000015_agent_retire.down.sql restores CASCADE
and drops the new columns for clean rollback. A dedicated
repository-layer testcontainers test
(internal/repository/postgres/migration_000015_test.go) asserts the
before/after FK action, column presence, index presence, and
round-trip idempotency under up→down→up.
Domain — sentinel guard + dependency counts:
internal/domain/connector.go gains IsRetired() on Agent, the
exported SentinelAgentIDs slice listing server-scanner,
cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
four reserved IDs documented in CLAUDE.md and created at startup in
cmd/server/main.go), IsSentinelAgent(id string) predicate,
AgentDependencyCounts{ActiveTargets, ActiveCertificates,
PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
ActorTypeSystem enum values used by audit emission downstream.
Coverage locked down by internal/domain/connector_test.go.
Service — 8-step ordered contract:
internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
opts{Force, Reason}) enforces a fixed execution order:
(1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
unconditionally; force=true does NOT bypass it.
(2) fetch — ErrAgentNotFound on miss.
(3) idempotency — if IsRetired() already, return
AgentRetirementResult{AlreadyRetired: true} with no new audit
event and no state change (safe to replay from flaky clients).
(4) preflight counts — collectAgentDependencyCounts runs
ActiveTargets, ActiveCertificates, PendingJobs sequentially
(not in parallel; keeps the per-query timeout predictable and
matches the repo's existing call-chain shape).
(5) force-reason guard — opts.Force=true with empty Reason returns
ErrForceReasonRequired (wired into the 400 status surface).
(6) dependency guard — HasDependencies() with opts.Force=false
returns BlockedByDependenciesError{Counts} (wired into the 409
body with per-bucket counts).
(7) mutation — single pinned retiredAt := time.Now(); agent
retirement first, then cascade target retirement if opts.Force,
all under the repo's single transaction so the two retired_at
stamps match to the second.
(8) best-effort audit — agent_retired always; agent_retirement_
cascaded additionally on the force path. Actor is whatever the
handler resolves from the request; actor type is mapped by
resolveActorType (system/agent-prefix→Agent/else→User). Audit
emission failures are logged via slog.Error but do not abort
the retirement (matches the house convention used by every
other scheduler-emitted event).
BlockedByDependenciesError implements Error() as
"active_targets=%d, active_certificates=%d, pending_jobs=%d" and
Unwrap() → ErrBlockedByDependencies. The single struct satisfies
errors.Is via Unwrap (used by scheduler-level tests) and errors.As
via the concrete type (used by the handler to fish out Counts for
the 409 body). ListRetiredAgents(page, perPage) adds a separate
paginated accessor with page<1→1 and perPage<1→50 normalization so
retired rows are queryable without polluting the default agent
listing.
Sentinel guard coverage is asymmetric by design: all four reserved
IDs are protected, and force=true cannot override. Regression tests
in internal/service/agent_retire_test.go assert each of the eight
steps in order, plus sentinel bypass attempts and idempotency
replay.
Handler + router — status-code surface:
internal/api/handler/agents.go:RetireAgent exposes seven status
codes on DELETE /agents/{id}:
200 on a fresh retirement (body echoes AgentRetirementResult).
204 on idempotent replay (AlreadyRetired=true; no new audit).
400 on ErrForceReasonRequired.
403 on ErrAgentIsSentinel.
404 on ErrAgentNotFound.
409 on BlockedByDependenciesError, with a custom body shape
{error, counts{active_targets, active_certificates,
pending_jobs}} that bypasses the default ErrorWithRequestID
envelope so callers get the per-bucket numbers directly.
500 on any other error.
Heartbeat HandleHeartbeat returns 410 Gone when the agent is
retired (ErrAgentRetired), signalling the agent to shut down.
Query params `force=true` and `reason=<text>` drive the cascade
path; both are forwarded as url.Values through the new MCP
transport.
internal/api/router/router.go registers GET /api/v1/agents/retired
literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
literal-beats-pattern-var precedence routes "retired" to the
paginated retired-agents listing instead of fetching a hypothetical
agent named "retired".
Agent binary — clean shutdown on 410:
cmd/agent/main.go gains the ErrAgentRetired sentinel, a
retiredOnce sync.Once, and a retiredSignal chan struct{}. A
markRetired(source, statusCode, body) helper closes the channel
exactly once; the Run() select loop observes the close and returns
ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
and exits cleanly instead of spinning in the heartbeat retry loop.
The 410 Gone surface is therefore terminal for the agent process.
MCP transport:
internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
a new additive transport method. Client.Delete is path-only; without
this method the retire tool would silently drop `force` and `reason`,
turning every cascade retire into a default soft-retire. The new
method shares do()'s 204 normalization and 4xx/5xx error
propagation so tool authors get one contract.
internal/mcp/tools.go + internal/mcp/types.go expose the
retire_agent tool with Force+Reason inputs wired through
DeleteWithQuery.
CLI:
cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
`agents list --retired` (client-side strip of --retired then
delegation to ListRetiredAgents, sharing --page/--per-page parsing
with the default listing) and `agents retire <id> [--force --reason
"…"]` (mirrors ErrForceReasonRequired — force without reason is
rejected client-side before the request is sent). JSON + table
output modes both honor the new columns.
Frontend:
web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
web/src/api/client.ts + web/src/api/types.ts expose the retire
endpoint and the retired-listing. 4 new Vitest regression cases.
OpenAPI:
api/openapi.yaml documents DELETE /agents/{id} with all seven
status codes, 410 on heartbeat, and the 409 per-bucket body shape.
Regression coverage (six new test files, all green):
internal/service/agent_retire_test.go — 8-step contract + sentinel guards
internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
internal/mcp/retire_agent_test.go — DeleteWithQuery wire-through
internal/cli/agent_retire_test.go — --retired listing + --force/--reason pairing
internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
internal/domain/connector_test.go — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies
Files:
api/openapi.yaml — DELETE + 410 + 409 body shape
cmd/agent/main.go — ErrAgentRetired, markRetired, retiredSignal
cmd/cli/main.go — handleAgents list/get/retire dispatch
docs/architecture.md, docs/concepts.md,
docs/testing-guide.md — retirement contract narrative
internal/api/handler/agents.go — RetireAgent, status surface, 410 on heartbeat
internal/api/handler/agent_handler_test.go — extended coverage
internal/api/handler/agent_retire_handler_test.go — new
internal/api/router/router.go — /agents/retired before /agents/{id}
internal/cli/agent_retire_test.go — new
internal/cli/client.go — ListRetiredAgents + RetireAgent
internal/domain/connector.go — IsRetired, SentinelAgentIDs,
IsSentinelAgent, AgentDependencyCounts,
ActorTypeAgent/System
internal/domain/connector_test.go — new
internal/integration/lifecycle_test.go — retirement fixture
internal/mcp/client.go — DeleteWithQuery additive transport
internal/mcp/retire_agent_test.go — new
internal/mcp/tools.go, internal/mcp/types.go — retire_agent tool + Force/Reason inputs
internal/repository/interfaces.go — AgentRepository retirement methods
internal/repository/postgres/agent.go — retire + cascade target retire + counts
internal/repository/postgres/migration_000015_test.go — new
internal/service/agent.go — wire into AgentService surface
internal/service/agent_retire.go — new 8-step contract
internal/service/agent_retire_test.go — new
internal/service/deployment.go — skip retired agents
internal/service/target.go — skip retired agents
internal/service/testutil_test.go — shared mocks extended
migrations/000015_agent_retire.up.sql — new
migrations/000015_agent_retire.down.sql — new
web/src/api/client.ts, types.ts + tests — retire endpoint wiring
web/src/pages/AgentsPage.tsx — retire UI
|
||
|
|
3287e174dc |
Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
|
||
|
|
5abeeb882b | fix(crypto): per-ciphertext PBKDF2 salt + v2 versioned format with v1 fallback (M-8) | ||
|
|
6315ef102a |
security(globalsign): remove InsecureSkipVerify and pin CA pool (H-5)
The GlobalSign Atlas HVCA connector previously used InsecureSkipVerify:true on its mTLS TLS config, disabling server certificate validation and defeating the purpose of the client-side mTLS handshake. This was a CWE-295 Improper Certificate Validation vulnerability silently degrading trust on every production call to GlobalSign's signing API. Remediation (per H-5 audit finding, Lens 4.4): - Remove InsecureSkipVerify from all three http.Client construction sites (ValidateConfig, getHTTPClient, and legacy initialisation path). - Introduce buildServerTLSConfig() helper that constructs tls.Config with MinVersion: tls.VersionTLS12 (addresses adjacent L-1 recommendation). - New optional config field `server_ca_path` (env: CERTCTL_GLOBALSIGN_SERVER_CA_PATH). When unset the connector trusts the system root CA bundle (correct default for GlobalSign's publicly-trusted HVCA endpoints). When set the bundle is loaded via x509.NewCertPool() + AppendCertsFromPEM, and only those roots are trusted (supports private HVCA deployments and defence-in-depth root pinning). - Error wrapping chain: "failed to read server CA bundle at %s" and "no valid PEM certificates found in server CA bundle at %s" surface config problems at ValidateConfig time instead of silently failing at request time. Docs, config, service env-seed, and GUI issuer type definition updated to expose the new field. Tests: 9 dead `InsecureSkipVerify: true` client TLSClientConfig blocks (no-ops against httptest.NewServer plain-HTTP) replaced with bare http.Client; new TestGlobalSign_ServerTLSConfig covers pinned-CA trust, untrusted-server rejection, missing-file and invalid-PEM error paths. Verification: - go build ./... clean - go vet ./... clean - go test -race ./internal/connector/issuer/globalsign/... ./internal/config/... ./internal/service/... ok - go test ./... (excluding testcontainers-gated repo layer) ok - golangci-lint run ./... 0 issues - govulncheck ./... 0 reachable vulns - Per-layer coverage: service 68.7% (≥55), handler 83.6% (≥60), domain 82.0% (≥40), middleware 63.8% (≥30) - globalsign package coverage: 75.9% - Invariant sweep: 0 InsecureSkipVerify references remain in globalsign package (only a test-file comment documenting the removal). |
||
|
|
a49eae8155 |
fix: correct BSL 1.1 change date to March 14, 2033
why-certctl.md said March 1, CHART_SUMMARY.md said March 28. The LICENSE file is authoritative: Change Date is March 14, 2033. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
13cd4d98ba |
feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id, owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke), server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI spec, compliance mapping updates, and 21 new tests (12 service, 7 handler, 1 CLI, 1 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e1bcde4cf1 |
feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret managers. Three pluggable DiscoverySource connectors feed into the existing discovery pipeline via sentinel agent pattern, with a 9th scheduler loop for periodic cloud scanning. - AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests - Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests - GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests - CloudDiscoveryService orchestrator with 9 tests - 9th scheduler loop (6h default, atomic.Bool idempotency) - Discovery page: color-coded source type badges - 14 new env vars across CloudDiscoveryConfig structs - Docs: connectors.md, architecture.md, features.md, README updated 49 new tests. All CI checks pass (go vet, race, lint, coverage). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3f619bcaac |
feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA coverage. Entrust uses mTLS client certificate auth with sync/async issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for self-hosted Keyfactor CAs. Each connector implements the full issuer.Connector interface (9 methods), includes httptest-based unit tests (~14 each), and follows established patterns (injectable HTTP clients, RFC 5280 revocation reason mapping, CRL/OCSP delegated to CA). Also includes: issuer factory cases, env var seeding, config structs, domain types, seed data (3 rows, all disabled), OpenAPI enum updates, frontend issuer catalog entries with config fields, and full docs (connectors.md, architecture.md, features.md, README). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
596d86a206 |
feat(M48): continuous TLS health monitoring — endpoint state machine, shared tlsprobe, 8 API endpoints, GUI
Adds continuous TLS endpoint health monitoring that closes the deploy→verify→monitor loop. After M25 verifies a deployment succeeded once, M48 continuously confirms it stays healthy. Key components: - Shared `internal/tlsprobe/` package extracted from network scanner for reuse - Health status state machine: healthy → degraded (2 failures) → down (5 failures), plus cert_mismatch when served fingerprint differs from expected - 8th scheduler loop (60s tick, per-endpoint configurable intervals) - PostgreSQL migration 000011: endpoint_health_checks + endpoint_health_history tables - 8 REST API endpoints (CRUD, history, acknowledge, summary) - Health Monitor GUI page with summary bar, status table, create modal, auto-refresh - 38 new tests (5 tlsprobe + 11 domain + 10 service + 8 handler + 4 frontend) - All coverage thresholds maintained (service 68%, handler 83%, domain 87%, middleware 63%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f2e60b93a3 |
feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths (renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs with key algorithm/size that don't match profile rules. MaxTTL enforcement caps certificate validity per issuer connector (Local CA, Vault, step-ca enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and size are now persisted in certificate_versions for audit compliance. 16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded version number from GUI sidebar. Documentation updated across architecture, features, connectors, and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
bcefb11e65 |
feat(M51): add SCEP server (RFC 8894) for MDM and network device enrollment
Implements Simple Certificate Enrollment Protocol with single-endpoint operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7 SignedData CSR extraction with fallback for raw/base64 CSR, challenge password authentication via CSR attributes, and shared internal/pkcs7 package extracted from EST handler to eliminate code duplication. 24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
75cf8475f5 |
tighten BSL license scope, fix documentation underselling shipped features
Broadened BSL Additional Use Grant from "hosted or managed service" to cover any commercial offering (embedded, bundled, integrated). Updated README to promote all shipped connectors from Beta to Implemented, added EST/ARI/S/MIME highlight, Helm quickstart, and corrected license description. Fixed connectors.md stale claims (AWS ACM PCA listed as planned, K8s Secrets listed as coming soon) and updated overview with exact connector counts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c015cab2f4 |
docs: rewrite features.md, audit README + architecture against repo
Rewrote docs/features.md from scratch as authoritative feature inventory (1255 lines, every claim verified against source files). Audited README.md and architecture.md against repo — fixed 19 stale references: K8s Secrets status, issuer counts, dashboard page counts, CI thresholds, missing connectors in Mermaid diagrams, OpenAPI operation count, GetCACertPEM behavior, and V2/V4 roadmap accuracy. Also includes related fixes discovered during audit: - Scheduler skips expired/failed/revoked certs from auto-renewal - Seed demo expiry dates moved outside 31-day scheduler query window - Agent pages use correct last_heartbeat_at field name Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3da6584ab8 |
fix: correct K8s Secrets status to 'Coming in 2.1', increase audit trail page size to 200
The Kubernetes Secrets target connector has config validation, tests, UI, and Helm RBAC implemented but the realK8sClient is a stub — runtime deployment will fail. Update README and connectors.md to reflect actual status instead of misleading 'Beta' label. Also increase the audit trail GUI default from 50 to 200 events per page (backend already permits up to 500). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5567d4b411 |
feat(M47): add Kubernetes Secrets target + AWS ACM PCA issuer connectors
Implement both M47 connectors with full cross-layer wiring: Kubernetes Secrets target: DNS-1123 validation, kubernetes.io/tls Secret create-or-update, chain concatenation, serial number validation, Helm RBAC gating. 18 tests. AWS ACM Private CA issuer: synchronous issuance (like Vault), ARN regex validation, RFC 5280 revocation reason mapping, CA cert retrieval, factory + env var seeding. 23 tests. Cross-cutting: domain types, service validation, config, factory, agent dispatch, frontend (TargetsPage, issuerTypes), OpenAPI, seed data, Helm chart, connectors docs, README. Testing docs (testing-guide, qa-test-guide, qa_test.go) with Parts thematically integrated near related connectors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e5516d7286 |
test: add unified QA test suite (qa_test.go) replacing legacy bash smoke script
1717-line Go test file covering all 52 Parts of testing-guide.md against the Docker Compose demo stack. ~120 automated subtests (API, DB, source, perf), 11 skipped Parts with reasons, ~270 manual gaps documented. Audited against actual router, seed data, domain structs, and migrations — 8 factual bugs caught and fixed during review. Companion guide at docs/qa-test-guide.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
fd94e0bd19 |
docs: comprehensive testing guide audit — expand thin Parts, add 11 new connector/feature test sections
Refactored testing-guide.md from V2.0 (42 Parts, 444 tests) to V2.1 (52 Parts, 507 tests): - Expanded Part 11 (ARI) and Part 19 (Agent Work Routing) with What/Why intro paragraphs and per-test annotations explaining the production impact - Replaced Part 40 (Documentation) passive table with 8 executable verification tests (README screenshots, issuer/target type matching, OpenAPI parity, etc.) - Added Part 39 benchmark tests for Prometheus endpoint and audit trail queries - Added 11 new Part sections (42-52) covering all previously untested features: Envoy, Postfix/Dovecot, SSH, WinCertStore, JavaKeystore, Digest Email, Dynamic Issuer/Target Config, Onboarding Wizard, ACME Profiles, Helm Chart - Fixed stale TOC entries (regenerated from actual headings) - Removed duplicate TOC block left from previous reorder - Added sign-off chart entries for all new Parts - Updated summary: 144 auto (passed) + 88 auto (pending) + 5 skipped + 270 manual = 507 total Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
d0415d3b5e |
chore: move HSM/TPM to V3 paid tier, rename roadmap.md to strategy.md
- HSM/TPM agent key storage and CA key storage moved from V5+ to V3 Pro (enterprise compliance gate, not adoption driver) - Renamed roadmap.md to strategy.md (gitignored, never committed) - Updated compliance-nist.md HSM references from V5 to V3 Pro Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c6efa4ab39 |
docs: add Docker Compose environments guide and fix compose files
- New deploy/ENVIRONMENTS.md: comprehensive walkthrough of all 4 compose files with service-by-service explanations, beginner-friendly Docker concepts, and expert-level networking/config details - Fix docker-compose.dev.yml: agent LOG_LEVEL → CERTCTL_LOG_LEVEL (was silently ignored without the CERTCTL_ prefix) - Add CERTCTL_CONFIG_ENCRYPTION_KEY to base and test compose (enables M34/M35 dynamic issuer/target config encryption) - Add CERTCTL_DISCOVERY_DIRS to base compose agent (enables filesystem certificate discovery in default deployment) - Cross-link ENVIRONMENTS.md from README doc table and quickstart.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4b5927dfff |
docs: expand README documentation table and fix orphaned doc links
- README: Add 7 missing docs to documentation table (MCP server, OpenAPI guide, migration guides for certbot/acme.sh/cert-manager, test environment, testing guide). Fix connector reference description to remove stale counts. Link OpenAPI guide instead of raw YAML. - architecture.md: Add cross-references to testing-guide.md and test-env.md from testing strategy section and What's Next links. These were the only two orphaned docs with zero inbound references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
cc03f55006 |
docs: comprehensive documentation audit — fix stale counts, V2/V3 matrix, connector status
- features.md: Fix Feature Matrix to correctly show all V2 Free features (F5/IIS/WinCertStore/JavaKeystore as Implemented, not Stub; Vault/DigiCert/ Sectigo/GoogleCAS as V2 Free, not V3 Paid). Add missing shipped features (EST, verification, export, S/MIME, ARI, digest, Helm, onboarding). Update issuer count to 9, target count to 13. - architecture.md: Fix F5/IIS from "interface only, implementation planned" to implemented. Add all 13 target connectors to built-in targets list. - why-certctl.md: Add Sectigo and Google CAS to issuer list (7→9). Fix target count (10→13). Remove hardcoded endpoint/operation counts. - connectors.md: Fix F5 BIG-IP TOC entry from "Interface Only" to "Implemented". Remove dead "Planned Issuers" TOC link. - README.md: Remove competitor product names (CertKit, KeyTalk). Remove hardcoded dashboard page count. Remove hardcoded endpoint counts. Fix V4 roadmap to remove already-shipped issuers (Sectigo, Google CAS). - Remove hardcoded MCP tool counts (78/80) across 8 files (mcp.md, architecture.md, features.md, testing-guide.md, concepts.md, quickstart.md, demo-advanced.md, why-certctl.md). Replace with "REST API exposed via MCP" to avoid future drift. - quickstart.md: Docker Compose environments table (from previous session). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7d6ef44e21 |
feat(M46): Windows Certificate Store + Java Keystore target connectors, shared certutil package
Extract shared certutil helpers (CreatePFX, ParsePrivateKey, ComputeThumbprint, GenerateRandomPassword, ParseCertificatePEM) from IIS connector for reuse. Add WinCertStore connector (PowerShell Import-PfxCertificate, dual local/WinRM mode, configurable store/location, expired cert cleanup) and JavaKeystore connector (PEM→PKCS#12→keytool pipeline, JKS/PKCS12 support, shell injection prevention, path traversal protection). 53 new tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f92c997a50 |
feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone: 1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support. ES256 JWS signing with kid mode, nonce management, directory discovery. Empty profile delegates to standard library path (zero behavior change). Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on ACME issuer config. 2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source, docs, README, and examples. Zero remaining occurrences of RFC 9702. 3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating renewal thresholds against SC-081v3 validity reduction timeline (200→100→47 days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the expected renewal path for 6-day shortlived certs. New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
697c0be9f3 |
feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any Linux/Unix server without installing the certctl agent binary. Uses the proxy agent pattern — a single agent in the same network zone deploys certs to remote servers over SSH/SFTP. Key additions: - SSH/SFTP connector with key auth (file/inline) + password auth - Injectable SSHClient interface for cross-platform testing (25 tests) - Shell injection prevention via validation.ValidateShellCommand() - Configurable cert/key/chain paths with octal permissions - GUI: 11 SSH config fields in target create wizard Also fixes pre-existing frontend bug where all target type strings (nginx, apache, etc.) were sent as lowercase but the backend expects proper-case (NGINX, Apache, etc.), breaking GUI-created targets. Adds missing TargetTypeSSH to validTargetTypes service map. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e19b8c95fe |
docs: remove hardcoded test counts from public-facing docs
Replace brittle test count numbers (1,554+, 1,088+, 211, etc.) with descriptions of testing approach and CI-enforced coverage gates. Counts go stale every milestone — coverage thresholds are machine- verified and never drift. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
2a14a1da01 |
feat(M40): F5 BIG-IP target connector via iControl REST
Replace 190-line stub with full iControl REST implementation (~580 lines). Token auth with 401 auto-retry, file upload + crypto object install, transaction-based atomic SSL profile updates, cleanup on failure. Injectable F5Client interface for cross-platform testing. 32 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5a53b648b1 |
feat(M44): Google CAS issuer connector
Google Cloud Certificate Authority Service integration via REST API with OAuth2 service account auth (JWT→access token). Synchronous issuance model, CA pool selection, mutex-guarded token caching, revocation with RFC 5280 reason mapping. No Google SDK dependency — all stdlib. 19 tests with httptest mock OAuth2 + CAS API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3a11e447cf |
feat(M43): Sectigo SCM issuer connector
Implement Sectigo Certificate Manager REST API connector with async order model (enroll → poll → collect PEM), 3-header auth, DV/OV/EV support, collect-not-ready (400/-183) graceful handling, and RFC 5280 revocation reason mapping. 20 tests with httptest mock API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
bad02e6f23 |
docs: add deployment examples index and cross-link migration guides
Create docs/examples.md as the central entry point for all 5 turnkey docker-compose scenarios with a decision matrix, per-example summaries, and contextual migration guide links. Update quickstart.md to bridge from demo to real deployment. Consolidate README docs table (10 rows from 13). Fix Vault PKI "(planned)" in cert-manager guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4c3b7cbb16 |
docs: fix stale references, seed data case bugs, and convert ASCII diagrams to Mermaid
Audit all docs and examples against current codebase state. Fix seed_demo.sql domain constant casing (IssuerType, TargetType, AgentStatus) that would cause agent dispatch failures. Fix example docker-compose health endpoints (/health not /api/v1/health) and env var names (CERTCTL_DATABASE_URL). Update connector counts, test numbers, and planned→implemented status across docs. Convert 3 ASCII flow diagrams to Mermaid. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e8c64b47dd |
docs: rewrite why-certctl positioning page
Fix stale competitive claims (IIS shipped in M39, target count now 10), add 47-day operational math as forcing function, add credibility signals (1554 tests, 97 API operations, CI pipeline), restructure competitive comparisons by category for scannability, add "What Else Ships Free" feature surface section, add "Who Should Look Elsewhere" disqualification, move ownership message to opening paragraph. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
9feb6c796d |
feat(M42): Postfix/Dovecot mail server target connector
Dual-mode TLS connector for mail servers — single package with mode field selecting Postfix or Dovecot defaults. File-based cert/key deployment with correct permissions (cert 0644, key 0600), optional chain append, shell injection prevention, and configurable reload/validate commands. 18 tests covering config validation, deployment, and security. GUI wizard fields and OpenAPI enum updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
fd05bacb76 |
feat(M41): Envoy target connector with SDS support
File-based deployment for Envoy service mesh — writes cert/key/chain to watched directory with optional SDS JSON config for xDS bootstrap. Path traversal prevention, configurable filenames, 15 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
9a41d0ca39 |
feat(M39): IIS WinRM proxy agent mode + front-to-back wiring
Complete the IIS target connector with dual-mode deployment: - WinRM proxy agent mode via masterzen/winrm for remote Windows servers - Base64 PFX transfer with try/finally cleanup on remote host - GUI wizard updated with 13 IIS config fields including WinRM settings - TargetDetailPage sensitive field redaction (password/secret/token/key) - OpenAPI TargetType enum updated (added Traefik, Caddy) - connectors.md fully documented with WinRM proxy config example - 38 total IIS tests (10 new WinRM tests), all passing with race detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8b52da6aef |
feat(M39): IIS target connector + README overhaul
Implement full IIS target connector with PEM-to-PFX conversion via go-pkcs12, PowerShell-based deployment (Import-PfxCertificate, IIS binding management), SHA-1 thumbprint computation, and SNI support. Injectable PowerShellExecutor interface enables cross-platform testing. Regex-validated config fields prevent PowerShell injection. 28 tests. Restructure README from 563 to 313 lines: outcome-focused feature descriptions, "Who Is This For" persona section, examples promoted above the fold, configuration/API/security reference moved to docs. All numbers verified against repo (25 GUI pages, 97 OpenAPI ops, CI thresholds service 55%/handler 60%/domain 40%/middleware 30%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
0822f748a5 |
feat: S/MIME certificate support in integration tests + test env docs
Add S/MIME (emailProtection EKU) end-to-end test coverage: - ValidateCommonName() now accepts email addresses for S/MIME certs - S/MIME test profile (prof-test-smime) in seed data - Phase 11 test: issuance, EKU, KeyUsage, email SAN verification - EST config enabled in test Docker Compose - Portable KeyUsage parsing (awk, works on BSD/GNU) - Full test environment documentation (docs/test-env.md) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6c8d4eca40 |
feat: frontend audit fixes, README accuracy pass, doc updates
Frontend audit (10 categories): lifecycle fields in types, new API functions (CRL, OCSP, deployments, updateIssuer/Target, getPolicy), issuer/owner/profile filters on CertificatesPage, last_renewal_at column, error_message column on JobsPage, full crypto policy UI on ProfilesPage (key algorithms, EKUs, SAN patterns), key info + CA badge on DiscoveryPage, edit modal on TargetDetailPage, tags field on certificate creation, darwin→macOS mapping on AgentFleetPage. 211 Vitest tests passing. README accuracy: test counts (1300+ Go, 211 frontend), page count (24), demo data (32 certs, 7 issuers, 180 days), endpoint count (97), MCP tools (80), CLI subcommands (10), moved shipped items out of "Coming in v2.1.0". Docs: architecture.md diagrams updated (Vault PKI, DigiCert, Traefik, Caddy added), features.md Vault/DigiCert status updated. Version bumped to v2.0.20. cli binary removed from git tracking. Testing guide Part 41 added (12 auto + 9 manual tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
836534f2a7 |
feat: add issuer catalog page with type discovery + fix cert creation defaults (M33)
Issuer Catalog (M33): - Shared issuer type config (issuerTypes.ts) with 6 supported + 2 coming-soon types - Composable wizard components (TypeSelector, ConfigForm, ConfigDetailModal) - Catalog card layout with Connected/Available/Coming Soon badges - VaultPKI and DigiCert added to create wizard with full config fields - ACME EAB fields (eab_kid, eab_hmac with sensitive flag) - Issuer type filter dropdown on configured issuers table - Config detail modal replacing 60-char truncation - IssuerDetailPage uses shared typeLabels/redactConfig, Edit button, enabled/disabled status - StatusBadge extended with Enabled/Disabled styles - 2 new frontend tests (VaultPKI + DigiCert create payload verification) Bug fixes: - CertificateService.CreateCertificate now defaults Status to Pending and Tags to empty map when not set (DB column DEFAULTs only apply when columns are omitted from INSERT, but our repo always includes all columns) - CreateCertificate handler now logs actual error via slog.Error before returning generic 500, enabling root cause debugging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6375909591 |
feat: add Vault PKI and DigiCert CertCentral issuer connectors (M32 + M37)
Vault PKI: synchronous issuance via /v1/{mount}/sign/{role}, token auth,
revocation, CA cert retrieval, 14 tests. DigiCert CertCentral: async order
model (submit → poll → download), X-DC-DEVKEY auth, OV/EV support, PEM
bundle parsing, 16 tests. Both conditionally registered based on env vars.
Includes OpenAPI enum updates, seed data, connector docs, architecture docs,
README badges, and testing guide sign-off (Parts 38 + 39, 12 automated
smoke test assertions all passing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
a6515b4323 |
feat(Pre-2.1.0-E): GUI completeness — 5 new pages, clickable nav, verification badges
Wire all remaining backend features to the frontend GUI: New pages: - DigestPage: preview digest HTML via iframe + send with confirmation - ObservabilityPage: health status, metrics gauges, Prometheus config + live output - JobDetailPage: full job details, verification section, timeline, audit events - IssuerDetailPage: redacted config, test connection, issued certificates list - TargetDetailPage: config, agent link, deployment history with verification Existing page updates: - JobsPage: clickable job IDs, verification column with VerificationBadge - IssuersPage: clickable issuer names linking to detail page - TargetsPage: clickable target names linking to detail page - Sidebar: Digest and Observability nav items - 5 new routes in main.tsx API client: getJob, getIssuer, getTarget, getJobVerification, getPrometheusMetrics Tests: 7 new Vitest tests (203 total), testing-guide Part 37 (17 manual tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
11173a74c6 |
feat(M31): agent work routing — scope jobs to assigned agents
Deployment jobs now set agent_id from target→agent relationship at creation time. GetPendingWork() uses ListPendingByAgentID() with a 3-way UNION query (direct match, legacy NULL fallback via target JOIN, AwaitingCSR via cert→target→agent chain) so each agent only receives its own jobs. - Added AgentID *string to Job domain struct - Added agent_id to all job SQL queries (5 SELECTs, INSERT, UPDATE, scanJob) - New ListPendingByAgentID() repository method - Rewrote GetPendingWork() from ~25 lines to single scoped query - 4 new Go tests (3 agent routing + 1 deployment agent_id) - Frontend: agent_id/target_id on Job type Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ec0e7a3560 |
feat: wire ARI (RFC 9702) into renewal scheduler
CheckExpiringCertificates() now queries each issuer's ARI endpoint before creating renewal jobs. If the CA says "not yet" (suggested window hasn't opened), renewal is deferred. ARI errors fall back gracefully to threshold-based logic. Audit trail records renewal_trigger=ari when ARI drives the decision. 4 new unit tests: ShouldRenewNow, NotYet, NilFallback, ErrorFallback. 3 new smoke tests in testing-guide.md Part 35. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |