Commit Graph

9 Commits

Author SHA1 Message Date
shankar0123 9c1d446e40 fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1)
The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the
startup log faithfully echoed 'authentication enabled type=jwt'.
Reasonable people read that and concluded JWT auth was on. It wasn't.
The auth-middleware wiring at cmd/server/main.go unconditionally routed
every request through the api-key bearer middleware regardless of
cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming
'Authorization: Bearer <token>' against whatever string the operator put
in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who
treated CERTCTL_AUTH_SECRET as a *signing* secret (because they thought
they were configuring JWT) had effectively handed an attacker an api-key.
A security finding masquerading as a config option.

We chose the audit-recommended structural fix: remove the option, fail
fast at startup, and add the gateway-fronting pattern as the documented
forward path. Implementing JWT middleware would have meant jwks vs
static-secret rotation, claim mapping, expiry enforcement, audience and
issuer validation, key rollover semantics, and regression coverage at the
same depth as the existing api-key path — a feature, not a fix. Operators
who genuinely need JWT/OIDC front certctl with an authenticating gateway
(oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium /
Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same
shape works on docker-compose and Helm.

The change is comprehensive across 7 phases — every surface that
mentioned 'jwt' as a certctl-auth-type is updated, plus structural
backstops (typed enum, runtime guard, helm template validation, CI grep
guard) so the lie can't reappear.

Files changed:

Phase 1 — production code (typed enum + jwt removal):
- internal/config/config.go: AuthType typed alias + AuthTypeAPIKey /
  AuthTypeNone constants + ValidAuthTypes() helper. Validate() routes
  literal 'jwt' through a dedicated multi-line diagnostic naming the
  authenticating-gateway pattern, then cross-checks against
  ValidAuthTypes(). Secret-required branch simplified to api-key-only.
  Field comment on AuthConfig.Type rewritten to drop jwt and point at
  the gateway pattern.
- internal/api/middleware/middleware.go: AuthConfig.Type field comment
  references the typed config.AuthType constants.
- internal/api/handler/health.go: same treatment for HealthHandler.AuthType.
- cmd/server/main.go: defense-in-depth runtime switch immediately after
  config.Load() — exits 1 on any unsupported auth-type that bypassed the
  validator. Auth-disabled startup log explicitly names the
  authenticating-gateway pattern.

Phase 2 — tests (Red→Green, contract pinning):
- internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated
  (two table rows pinning the dedicated G-1 error fires regardless of
  whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property
  guard against future re-introduction),
  TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract),
  TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still
  hit the generic invalid-auth-type error). Removed the prior
  TestValidate_JWTAuth_MissingSecret happy-path since its premise is
  inverted post-G-1.
- internal/api/handler/health_test.go: removed
  TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie
  into the regression suite). Pre-existing _APIKey test continues to
  cover the api-key happy path.

Phase 3 — spec, docs, env templates:
- api/openapi.yaml: auth_type enum dropped to [api-key, none] with
  inline comment naming the G-1 closure.
- .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop
  jwt and point at the gateway pattern; secret-required conditional
  simplified to api-key-only.
- docs/architecture.md: middleware-stack bullet rewritten to drop the
  JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)'
  section explaining the design rationale and listing oauth2-proxy /
  Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy
  forward_auth / Apache mod_auth_openidc / nginx auth_request as the
  standard fronting options.
- docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide
  with preconditions, what-changes, both recovery paths, complete
  docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy
  ext_authz patterns, rollback posture.

Phase 4 — Helm chart (template validation + docs):
- deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType
  helper mirroring the existing certctl.tls.required pattern. Fails
  template render on any server.auth.type outside {api-key, none} with
  a multi-line diagnostic.
- deploy/helm/certctl/templates/server-deployment.yaml,
  server-configmap.yaml, server-secret.yaml: invoke the helper at the
  top of each template that depends on .Values.server.auth.type.
- deploy/helm/certctl/values.yaml: auth: block comment expanded with the
  G-1 rationale and gateway-pattern cross-reference.
- deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces
  the allowed set and points at the upgrade doc.
- deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating
  gateway' section with a Kubernetes-flavored oauth2-proxy + certctl
  walkthrough.

Phase 5 — release surface:
- CHANGELOG.md: new [unreleased] top entry with Breaking / Removed /
  Added / Changed sections; explicit pointer at
  docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection.

Phase 6 — CI guardrail:
- .github/workflows/ci.yml: new 'Forbidden auth-type literal regression
  guard (G-1)' step. Scoped patterns catch the actual regression shapes
  (map literal, slice literal, switch case, OpenAPI enum, env-file
  default, AuthType('jwt') cast). Comments and the dedicated rejection
  branch are intentionally exempt; connector-package JWT references
  (Google OAuth2 / step-ca) are exempt as out-of-scope external
  protocols. Verified locally: the guard passes on the actual tree and
  fires on all 4 synthetic regression patterns.

Out of scope (explicitly untouched):
- internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service-
  account JWT (external protocol).
- internal/connector/issuer/googlecas/googlecas.go — same.
- internal/connector/issuer/stepca/stepca.go — step-ca's provisioner
  one-time-token JWT for /sign API.
- docs/test-env.md, docs/connectors.md, docs/features.md — describe
  external CAs' use of JWT, not certctl's auth shape.
- Implementing actual JWT middleware. Feature, not a fix.

Verification (all gates pass):
- go build ./... — clean
- go vet ./... — clean
- go test -short ./... — every package green
- go test -short -race ./internal/config/... ./internal/api/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template with auth.type=api-key — renders OK
- helm template with auth.type=none — renders OK
- helm template with auth.type=jwt — fails with validateAuthType
  diagnostic (exit 1)
- python3 yaml.safe_load on api/openapi.yaml — parses
- CI guardrail mirror — clean on real tree, fires on all 4 synthetic
  regression patterns
- Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero
  with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no
  longer accepted (G-1 silent auth downgrade): no JWT middleware ships
  with certctl. To use JWT/OIDC, run an authenticating gateway
  (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in
  front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream.
  See docs/architecture.md "Authenticating-gateway pattern" and
  docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough'

config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%.

Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
      §2 P1 cluster, cat-g-jwt_silent_auth_downgrade
      Audit recommendation followed verbatim: 'Remove jwt from
      validAuthTypes until middleware ships'.
2026-04-25 00:22:23 +00:00
shankar0123 ff7357f889 fix(lint): godoc comment on NewAuthWithNamedKeys must lead with function name (ST1020)
CI failure on master (commit 3287e17) — staticcheck ST1020:

  internal/api/middleware/middleware.go:125:1: ST1020: comment on exported
  function NewAuthWithNamedKeys should be of the form
  "NewAuthWithNamedKeys ..." (staticcheck)

When NewAuth was renamed to NewAuthWithNamedKeys during the M-002 auth
unification, the leading godoc sentence was left pointing at the old name.
Rewrite the comment so its first sentence starts with the new function
name, and expand the body to describe the named-key + admin-flag contract
introduced in 3287e17.

Also gitignore /.gopath/ — session-scoped tool install cache, same
category as /.gocache/ and /.gomodcache/.

Verification:
  go vet ./internal/api/middleware/...          — clean
  go build ./internal/api/middleware/...        — clean
  go test ./internal/api/middleware/...         — PASS (0.245s)
  staticcheck -checks=all,<project exclusions>  — clean across
    middleware, handler, service, domain, cmd/server, scheduler

Closes: CI failure on 3287e17.
2026-04-18 21:38:46 +00:00
shankar0123 3287e174dc Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
a53a4b8. The work lands as a single commit spanning server, docs, tests,
and the React client.

M-002 — Named API keys with per-key actor propagation
  * Migration 000014 adds the 'api_keys' table (id, name, hash,
    principal, role, created_at, last_used_at, disabled_at) so every
    credential carries an identifiable principal instead of the
    opaque 'anonymous'/'api-key' sentinel.
  * Auth middleware now rotates through configured keys, performs
    constant-time hash comparison, stamps 'last_used_at', and emits
    an actor struct via contextWithActor(). The audit middleware,
    bulk-revocation handler, approval handlers, and MCP tool layer
    now read the principal off the context and persist it on every
    audit_events row.
  * Regression coverage:
      - internal/api/middleware/audit_test.go — actor propagation,
        principal redaction for disabled keys, anonymous fallback for
        unauthenticated endpoints.
      - internal/api/handler/bulk_revocation_handler_test.go,
        job_handler_test.go — principal-on-audit assertions.

M-003 — Authorization gates (Phase B)
  * Approval handler rejects self-approval / self-rejection with 403
    when the actor principal equals the job's requested_by field.
  * Bulk revocation is gated behind the 'admin' role; operators and
    viewers receive 403.
  * Regression coverage:
      - internal/service/job_test.go — TestApproveJob_NotSelf,
        TestRejectJob_NotSelf.
      - internal/api/handler/bulk_revocation_handler_test.go —
        TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.

M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
  * Per RFC 8615, relying parties cannot reasonably be asked to
    authenticate against the issuing certctl instance to retrieve
    revocation material. CRL and OCSP move off the authenticated
    '/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
        GET /.well-known/pki/crl/{issuer_id}
            Content-Type: application/pkix-crl   (RFC 5280 §5)
        GET /.well-known/pki/ocsp/{issuer_id}/{serial}
            Content-Type: application/ocsp-response  (RFC 6960)
  * Non-standard JSON CRL shape is removed; only DER is served.
  * Short-lived certificate exemption (profile TTL < 1h → skip
    CRL/OCSP) is preserved; the response simply omits the serial.
  * Routes are registered on the unauthenticated 'finalHandler' mux
    in cmd/server/main.go alongside EST ('/.well-known/est/*') and
    SCEP ('/scep'). Legacy authenticated paths return 404.
  * Regression coverage:
      - internal/api/handler/certificate_handler_test.go — content
        type, DER parseability, 404 for unknown issuer.
      - internal/api/handler/adversarial_path_test.go — unauthenticated
        access asserted for CRL, OCSP, EST, SCEP.
      - internal/api/router/router_test.go — route-table assertion
        that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
        mounted on the unauthenticated branch.

M-001 — Auto-closed by M-002
  EST and SCEP were already registered on the unauthenticated
  'finalHandler' mux; the router comment at
  internal/api/router/router.go:247 now matches reality. The
  adversarial-path tests above lock the behavior in.

Verification (all gates green):
  * go vet ./...                                           — clean
  * go build ./...                                         — ok
  * go test -short ./... (55+ packages)                    — all pass
  * web/ : npm test (225 Vitest tests)                     — all pass
  * web/ : npx tsc --noEmit                                — clean
  * grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
    all intentional M-006 tombstone/relocation comments.

Documentation:
  * coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
    Fixed, with per-finding resolution paragraphs citing regression
    test IDs. (Audit file lives outside this repo; see cowork root.)
  * CLAUDE.md Project Status line updated with the auth-unification
    closure note.
  * docs/features.md, docs/architecture.md, docs/quickstart.md,
    docs/concepts.md, docs/connectors.md, docs/test-env.md,
    docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
    — refreshed for the new '.well-known/pki/*' namespace and named
    API keys.
  * api/openapi.yaml — documents the new unauthenticated endpoints
    and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.

.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
2026-04-18 18:17:41 +00:00
shankar0123 e3196e7b50 M-2 PR-F: Middleware/ACME ctx-propagation + contextcheck linter + audit closeout
Final PR in the six-commit M-2 sequence (PR-A: CertificateService cluster
cdc9d03, PR-B: IssuerService+TargetService eb14236, PR-C: Policy/Profile/
Owner/Team 2497be4, PR-D: Job/Notification/Audit ccd89c3, PR-E: AgentService
283ec27, PR-F: this commit). PR-A through PR-E collapsed the service-layer
shim methods and deleted every in-production context.Background() /
context.TODO() call from internal/service/; this PR completes the sweep
across the non-service tiers (HTTP middleware + ACME connector) and wires
the contextcheck linter so regressions fail CI.

Three narrow edits land the D-3 pattern (context.WithoutCancel for
subsidiary async writes and deferred shutdown contexts):

  - internal/api/middleware/audit.go  -- async audit goroutine now runs
    on auditCtx := context.WithoutCancel(r.Context()) instead of
    context.Background(). Preserves request-scoped values (trace ID, auth)
    while detaching from the request's cancellation so the audit write
    does not get killed when the response completes. Goroutine is still
    tracked via a.wg (M-1 shutdown drain) so Flush(ctx) behaviour is
    unchanged. CWE-770 Missing Release (goroutine leak potential) +
    CWE-400 Resource Exhaustion (missed cancellation propagation).

  - internal/api/middleware/middleware.go -- Recovery panic path now
    logs via slog.ErrorContext(ctx, ...) instead of log.Printf. Request-
    scoped trace/auth metadata now carries through the panic log, matching
    every other request log. D-3 non-bypass: the context is r.Context()
    captured before the defer, so even a panic mid-handler propagates
    the ctx's trace ID into the ERROR log line.

  - internal/connector/issuer/acme/acme.go (HTTP-01 challenge server
    shutdown) -- defer shutdown context derived from
    context.WithTimeout(context.WithoutCancel(ctx), 5s) instead of
    context.Background(). Preserves parent ctx values, detaches from
    parent cancellation so Shutdown always gets its full 5-second
    budget even when the parent was cancelled. Matches the same pattern
    applied in ACME's solveAuthorizationsDNS01 and solveAuthorizationsDNSPersist01.

Linter wiring: .golangci.yml adds `contextcheck` to the enabled set.
golangci-lint v2.11.4 now fails CI on any function that takes a
context.Context parameter but calls into context.Background() or
context.TODO() instead of propagating -- regression guard for all five
prior PRs.

Verification (CI parity, GOCACHE=/tmp/gocache GOMODCACHE=/tmp/gomodcache
GOLANGCI_LINT_CACHE=/tmp/lintcache):

  - go build ./... -> 0
  - go vet ./... -> 0
  - golangci-lint run (contextcheck enabled) -> 0 issues
  - go test -race -short ./internal/api/middleware/... -> PASS
  - go test -race -short ./internal/scheduler/... -> PASS
  - go test -race -short ./internal/connector/issuer/acme/... -> PASS
  - go test -race -short ./internal/service/... -> PASS
  - rg "context\.(Background|TODO)\(\)" internal/service/ internal/scheduler/
    internal/connector/ internal/api/middleware/ -> 0 non-test hits
    (one pedagogical godoc reference in audit.go documenting why
    context.Background() would be wrong remains intentional)

Wire-format invariants preserved: 0 API routes, 0 SQL migrations, 0
frontend bytes, 0 OpenAPI bytes, 0 connector interface signature changes,
0 new env vars, 0 new external dependencies (pure context stdlib). The
AuditRecorder interface signature, the body-hash algorithm (SHA-256 16
hex chars), the excluded-path short-circuit, the actor-extraction path,
the responseWriter status-capture wrapper, the AuditServiceAdapter, and
all 116 API routes under /api/v1/, /.well-known/est/, /scep, /health,
/auth are byte-identical.

M-2 aggregate across PR-A through PR-F: 57 files, +635 / -613 (PR-A 12f
+227/-237, PR-B 9f +150/-146, PR-C 17f +156/-148, PR-D 11f +67/-63,
PR-E 4f +9/-15, PR-F 4f +26/-4). With M-2 closed, 8 of 10 Medium
findings resolved; M-9, M-10, L-1..L-4, I-1..I-8 remain post-v2.1.0
hardening batch.

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:43:47 +00:00
shankar0123 6d508cf53f fix: security audit remediation (AUDIT-001, 003, 004, 005, 006, 018)
- AUDIT-001: Validate OpenSSL revoke inputs (hex-only serials, RFC 5280 reasons)
- AUDIT-003: Enforce /20 CIDR size cap at API level (create + update)
- AUDIT-004: Support comma-separated CERTCTL_AUTH_SECRET for zero-downtime key rotation
- AUDIT-005: Add ReadHeaderTimeout (5s) to prevent Slowloris
- AUDIT-006: Document audit trail query parameter exclusion rationale
- AUDIT-018: Add immediate-run-on-start to short-lived expiry scheduler loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 14:11:16 -04:00
shankar0123 3e3e68fd3a fix(security): TICKET-009 add HTTP timeouts to notifier clients
- Added TestSlack_ClientHasTimeout to verify 10-second timeout
- Added TestTeams_ClientHasTimeout to verify 10-second timeout
- Added TestPagerDuty_ClientHasTimeout to verify 10-second timeout
- Added TestOpsGenie_ClientHasTimeout to verify 10-second timeout
- All notifiers already configured with 10 second timeout in New()
- Tests verify timeout is set and matches expected value
2026-03-27 21:33:31 -04:00
shankar0123 ee75f149ae feat: M14 — Observability (dashboard charts, agent fleet, stats API, metrics, structured logging, rollback)
Backend: StatsService with 5 aggregation methods, JSON metrics endpoint, slog-based
structured logging middleware. Stats API: dashboard summary, certificates-by-status,
expiration timeline, job trends, issuance rate. 23 new backend tests.

Frontend: Recharts-powered dashboard with 4 charts (status pie, expiration heatmap,
job trends line, issuance bar), agent fleet overview page with OS/arch grouping and
version breakdown, deployment rollback buttons on version history. 7 new frontend tests.

78 API endpoints, 744+ total tests (658 Go + 86 Vitest).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 19:46:13 -04:00
shankar0123 28205e1131 Implement M7: auth middleware, rate limiting, CORS, and GUI login flow
Add SHA-256 API key authentication with constant-time comparison, configurable
token bucket rate limiter, CORS origin allowlist middleware, and React auth
context with login page. Auth info endpoint bootstraps GUI without credentials.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 11:58:13 -04:00
shankar0123 d395776a95 Initial scaffold: certificate control plane v0.1.0 2026-03-14 08:22:17 -04:00