Compare commits

...

51 Commits

Author SHA1 Message Date
shankar0123 707d8de4fb UX-001: sidebar re-entry + inline team/owner creation in wizard
Closes UX-001 (OnboardingWizard CertificateStep dead-end): users no
longer have to navigate away from the wizard and lose their in-flight
state when the required Owner/Team dropdowns are empty.

Layout.tsx
  - Adds persistent 'Setup guide' button in the left sidebar.
  - Clears localStorage 'certctl:onboarding-dismissed' then navigates
    to /?onboarding=1 as a re-entry signal that overrides dismissal.
  - localStorage.removeItem wrapped in try/catch to tolerate storage
    access errors (private browsing, quota, etc.).

DashboardPage.tsx
  - Reads ?onboarding=1 via useSearchParams as a forceOnboarding flag.
  - forceOnboarding bypasses the latched first-run gate so the wizard
    reopens even after dismissal or with certs/issuers already present.
  - onDismiss now also strips ?onboarding=1 via setSearchParams(next,
    { replace: true }) so a page refresh does not relaunch the wizard.

OnboardingWizard.tsx
  - Adds CreateTeamModalInline and CreateOwnerModalInline inside
    CertificateStep. Both wire through React Query: createTeam /
    createOwner mutation on success invalidates ['teams'] / ['owners']
    and calls onCreated(id) so the parent select auto-selects the new
    row as soon as the refetch lands.
  - '+ New team' and '+ New owner' buttons placed next to the select
    labels; empty-state copy replaced with inline 'create one now'
    buttons (no more Link back to /owners /teams).
  - CreateOwner coerces empty teamId to undefined before mutation so
    the server contract matches OwnersPage.

Tests (12 new, all green; total suite 252 passed / 0 failed):
  - Layout.test.tsx (4): Setup guide button renders, clicking it clears
    the dismissal key and navigates to /?onboarding=1, tolerates
    localStorage.removeItem throwing.
  - DashboardPage.test.tsx (4): first-run auto-open, ?onboarding=1
    re-entry after dismissal, onDismiss writes localStorage + strips
    the query param, dismissed-with-no-param stays closed.
  - OnboardingWizard.test.tsx (4): Skip-Skip reaches CertificateStep
    with '+ New team' / '+ New owner' buttons visible; '+ New team'
    happy path with React Query invalidation + parent-select
    auto-select via option-parent traversal (label is a sibling, not
    htmlFor-linked); '+ New owner' happy path pins team_id: undefined
    coercion; Cancel abort never mutates.

Test infrastructure notes:
  - Closure-driven vi.fn().mockImplementation pattern drives the
    post-invalidation refetch: the mutation mock mutates a closure
    variable that the getTeams/getOwners mock reads, so the parent
    select's new <option> exists by the time the refetch lands.
  - Anchored regex (/^Create Team$/, /^Create Owner$/) disambiguates
    the modal submit from the '+ New team' / '+ New owner' triggers.

Verification gates (all green):
  - vitest run: 252 passed / 0 failed (8 files, 13.98s)
  - tsc --noEmit: 0 errors
  - vite build: clean production bundle (851.77 kB js / 226.81 kB gzip)

No new runtime dependencies. Frontend-only change.
2026-04-19 14:49:04 +00:00
shankar0123 0725713e19 Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.

Schema — migration 000015:
  migrations/000015_agent_retire.up.sql flips
  deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
  RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
  boundary instead of quietly destroying targets. Both `agents` and
  `deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
  TEXT pair (TEXT not VARCHAR so operator comments are never
  truncated), indexed via partial indexes WHERE retired_at IS NOT
  NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
  CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
  EXISTS) so repeated runs against partially-migrated databases
  converge. migrations/000015_agent_retire.down.sql restores CASCADE
  and drops the new columns for clean rollback. A dedicated
  repository-layer testcontainers test
  (internal/repository/postgres/migration_000015_test.go) asserts the
  before/after FK action, column presence, index presence, and
  round-trip idempotency under up→down→up.

Domain — sentinel guard + dependency counts:
  internal/domain/connector.go gains IsRetired() on Agent, the
  exported SentinelAgentIDs slice listing server-scanner,
  cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
  four reserved IDs documented in CLAUDE.md and created at startup in
  cmd/server/main.go), IsSentinelAgent(id string) predicate,
  AgentDependencyCounts{ActiveTargets, ActiveCertificates,
  PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
  ActorTypeSystem enum values used by audit emission downstream.
  Coverage locked down by internal/domain/connector_test.go.

Service — 8-step ordered contract:
  internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
  opts{Force, Reason}) enforces a fixed execution order:
  (1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
      unconditionally; force=true does NOT bypass it.
  (2) fetch — ErrAgentNotFound on miss.
  (3) idempotency — if IsRetired() already, return
      AgentRetirementResult{AlreadyRetired: true} with no new audit
      event and no state change (safe to replay from flaky clients).
  (4) preflight counts — collectAgentDependencyCounts runs
      ActiveTargets, ActiveCertificates, PendingJobs sequentially
      (not in parallel; keeps the per-query timeout predictable and
      matches the repo's existing call-chain shape).
  (5) force-reason guard — opts.Force=true with empty Reason returns
      ErrForceReasonRequired (wired into the 400 status surface).
  (6) dependency guard — HasDependencies() with opts.Force=false
      returns BlockedByDependenciesError{Counts} (wired into the 409
      body with per-bucket counts).
  (7) mutation — single pinned retiredAt := time.Now(); agent
      retirement first, then cascade target retirement if opts.Force,
      all under the repo's single transaction so the two retired_at
      stamps match to the second.
  (8) best-effort audit — agent_retired always; agent_retirement_
      cascaded additionally on the force path. Actor is whatever the
      handler resolves from the request; actor type is mapped by
      resolveActorType (system/agent-prefix→Agent/else→User). Audit
      emission failures are logged via slog.Error but do not abort
      the retirement (matches the house convention used by every
      other scheduler-emitted event).

  BlockedByDependenciesError implements Error() as
  "active_targets=%d, active_certificates=%d, pending_jobs=%d" and
  Unwrap() → ErrBlockedByDependencies. The single struct satisfies
  errors.Is via Unwrap (used by scheduler-level tests) and errors.As
  via the concrete type (used by the handler to fish out Counts for
  the 409 body). ListRetiredAgents(page, perPage) adds a separate
  paginated accessor with page<1→1 and perPage<1→50 normalization so
  retired rows are queryable without polluting the default agent
  listing.

  Sentinel guard coverage is asymmetric by design: all four reserved
  IDs are protected, and force=true cannot override. Regression tests
  in internal/service/agent_retire_test.go assert each of the eight
  steps in order, plus sentinel bypass attempts and idempotency
  replay.

Handler + router — status-code surface:
  internal/api/handler/agents.go:RetireAgent exposes seven status
  codes on DELETE /agents/{id}:
    200 on a fresh retirement (body echoes AgentRetirementResult).
    204 on idempotent replay (AlreadyRetired=true; no new audit).
    400 on ErrForceReasonRequired.
    403 on ErrAgentIsSentinel.
    404 on ErrAgentNotFound.
    409 on BlockedByDependenciesError, with a custom body shape
        {error, counts{active_targets, active_certificates,
        pending_jobs}} that bypasses the default ErrorWithRequestID
        envelope so callers get the per-bucket numbers directly.
    500 on any other error.
  Heartbeat HandleHeartbeat returns 410 Gone when the agent is
  retired (ErrAgentRetired), signalling the agent to shut down.
  Query params `force=true` and `reason=<text>` drive the cascade
  path; both are forwarded as url.Values through the new MCP
  transport.

  internal/api/router/router.go registers GET /api/v1/agents/retired
  literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
  literal-beats-pattern-var precedence routes "retired" to the
  paginated retired-agents listing instead of fetching a hypothetical
  agent named "retired".

Agent binary — clean shutdown on 410:
  cmd/agent/main.go gains the ErrAgentRetired sentinel, a
  retiredOnce sync.Once, and a retiredSignal chan struct{}. A
  markRetired(source, statusCode, body) helper closes the channel
  exactly once; the Run() select loop observes the close and returns
  ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
  and exits cleanly instead of spinning in the heartbeat retry loop.
  The 410 Gone surface is therefore terminal for the agent process.

MCP transport:
  internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
  a new additive transport method. Client.Delete is path-only; without
  this method the retire tool would silently drop `force` and `reason`,
  turning every cascade retire into a default soft-retire. The new
  method shares do()'s 204 normalization and 4xx/5xx error
  propagation so tool authors get one contract.
  internal/mcp/tools.go + internal/mcp/types.go expose the
  retire_agent tool with Force+Reason inputs wired through
  DeleteWithQuery.

CLI:
  cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
  `agents list --retired` (client-side strip of --retired then
  delegation to ListRetiredAgents, sharing --page/--per-page parsing
  with the default listing) and `agents retire <id> [--force --reason
  "…"]` (mirrors ErrForceReasonRequired — force without reason is
  rejected client-side before the request is sent). JSON + table
  output modes both honor the new columns.

Frontend:
  web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
  web/src/api/client.ts + web/src/api/types.ts expose the retire
  endpoint and the retired-listing. 4 new Vitest regression cases.

OpenAPI:
  api/openapi.yaml documents DELETE /agents/{id} with all seven
  status codes, 410 on heartbeat, and the 409 per-bucket body shape.

Regression coverage (six new test files, all green):
  internal/service/agent_retire_test.go           — 8-step contract + sentinel guards
  internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
  internal/mcp/retire_agent_test.go               — DeleteWithQuery wire-through
  internal/cli/agent_retire_test.go               — --retired listing + --force/--reason pairing
  internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
  internal/domain/connector_test.go               — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies

Files:
  api/openapi.yaml                                — DELETE + 410 + 409 body shape
  cmd/agent/main.go                               — ErrAgentRetired, markRetired, retiredSignal
  cmd/cli/main.go                                 — handleAgents list/get/retire dispatch
  docs/architecture.md, docs/concepts.md,
    docs/testing-guide.md                         — retirement contract narrative
  internal/api/handler/agents.go                  — RetireAgent, status surface, 410 on heartbeat
  internal/api/handler/agent_handler_test.go      — extended coverage
  internal/api/handler/agent_retire_handler_test.go — new
  internal/api/router/router.go                   — /agents/retired before /agents/{id}
  internal/cli/agent_retire_test.go               — new
  internal/cli/client.go                          — ListRetiredAgents + RetireAgent
  internal/domain/connector.go                    — IsRetired, SentinelAgentIDs,
                                                    IsSentinelAgent, AgentDependencyCounts,
                                                    ActorTypeAgent/System
  internal/domain/connector_test.go               — new
  internal/integration/lifecycle_test.go          — retirement fixture
  internal/mcp/client.go                          — DeleteWithQuery additive transport
  internal/mcp/retire_agent_test.go               — new
  internal/mcp/tools.go, internal/mcp/types.go    — retire_agent tool + Force/Reason inputs
  internal/repository/interfaces.go               — AgentRepository retirement methods
  internal/repository/postgres/agent.go           — retire + cascade target retire + counts
  internal/repository/postgres/migration_000015_test.go — new
  internal/service/agent.go                       — wire into AgentService surface
  internal/service/agent_retire.go                — new 8-step contract
  internal/service/agent_retire_test.go           — new
  internal/service/deployment.go                  — skip retired agents
  internal/service/target.go                      — skip retired agents
  internal/service/testutil_test.go               — shared mocks extended
  migrations/000015_agent_retire.up.sql           — new
  migrations/000015_agent_retire.down.sql         — new
  web/src/api/client.ts, types.ts + tests         — retire endpoint wiring
  web/src/pages/AgentsPage.tsx                    — retire UI
2026-04-19 05:24:00 +00:00
shankar0123 1ee77c89f8 I-003: job timeout reaper closes AwaitingCSR/AwaitingApproval gap
Add 11th always-on scheduler loop that transitions jobs stuck in
AwaitingCSR (default 24h TTL) or AwaitingApproval (default 168h TTL)
to Failed. I-001's retry loop then auto-promotes eligible Failed jobs
back to Pending. No new status enum, no schema migration.

- JobRepository.ListTimedOutAwaitingJobs with per-status cutoff WHERE
- JobService.ReapTimedOutJobs mirrors RetryFailedJobs structure
- Scheduler jobTimeoutLoop with atomic.Bool idempotency guard, 2m
  per-tick context, WaitGroup shutdown drain
- Config: CERTCTL_JOB_TIMEOUT_INTERVAL (10m), CERTCTL_JOB_AWAITING_CSR_TIMEOUT
  (24h), CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT (168h)
- Audit event per transition: actor=system, actorType=System,
  action=job_timeout, details={old_status, new_status, timeout_reason,
  age_hours}
- 14 new tests: 3 config, 7 service, 4 scheduler
2026-04-19 01:37:18 +00:00
shankar0123 4bc8b3e723 fix(config): add RetryInterval to TestValidate_ValidConfig + TestValidate_AuthTypeNone fixtures (I-001 follow-up)
Problem:
  TestValidate_ValidConfig and TestValidate_AuthTypeNone construct a
  SchedulerConfig without RetryInterval, so Validate() fails the
  'retry interval must be at least 1 second' check at config.go:1086
  with 'retry interval must be at least 1 second'. Both tests expect
  success, so they fail whenever run.

Root cause (re-derived from source, not inherited from memory):
  git log -S 'retry interval must be at least' --source --all shows
  the validation was introduced in 0200c7f (I-001, RetryFailedJobs
  scheduler wiring). git log -- internal/config/config_test.go shows
  the test file was last touched in 7382e5f, which predates 0200c7f.
  I-001 added a new Validate() rule without updating the two positive
  test fixtures — a gap in I-001's verification pass.

  This is NOT C-001 fallout. The config_test.go file was untouched by
  the C-001 closure commits 91642e2 and 4696116. The failure surfaced
  during the full test suite run after C-001 landed because no one
  had run 'go test ./internal/config/...' since I-001.

Scope:
  - internal/config/config_test.go (2 fixtures: TestValidate_ValidConfig,
    TestValidate_AuthTypeNone).

Implementation:
  Added 'RetryInterval: 5 * time.Minute' to both SchedulerConfig
  literals. 5 minutes matches the I-001 default at config.go:818:

    RetryInterval: getEnvDuration("CERTCTL_SCHEDULER_RETRY_INTERVAL", 5*time.Minute)

  The other two TestValidate_* tests (InvalidAuthType, APIKeyAuth_
  MissingSecret) are unaffected because they expect Validate() to
  error at the auth-type check (line 1052) or auth-secret check
  (line 1057), both of which fire before the RetryInterval check at
  line 1086.

Verification:
  - go test -count=1 -run 'TestValidate_' ./internal/config/...: PASS
  - go test -short -count=1 ./...: all packages PASS
  - go vet ./...: exit 0

Residual:
  None. This is a pure test-fixture fix — production code is unchanged.

Commit:
  0200c7f (I-001) should have included this edit. Attributed here for
  traceability.
2026-04-19 00:33:22 +00:00
shankar0123 469611650c fix(cli): add missing os + path/filepath imports to client_test.go
Follow-up to 91642e2. TestClient_ImportCertificates_SixFieldPayload
uses filepath.Join(t.TempDir(), ...) and os.WriteFile to stage a
test PEM, but the import block only listed encoding/json,
encoding/pem, net/http, etc. — neither os nor path/filepath was
imported. go vet rejected the package with 'undefined: filepath'
(and would have caught 'undefined: os' next).

Add both imports. No behavioral change — the referenced symbols
are the standard library's usual names for their respective
packages, so the test compiles and runs exactly as intended.
CI should now pass go build + go vet on the cli package.
2026-04-19 00:27:11 +00:00
shankar0123 91642e2860 C-001 scope expansion: tighten parallel POST /api/v1/certificates call sites to six-field contract
Problem:
a53a4b8 closed C-001 at the handler boundary by tightening the
ValidateRequired contract on POST /api/v1/certificates to require six
fields: name, common_name, renewal_policy_id, issuer_id, owner_id,
team_id. (Correction re-derived from source: the handler
ValidateRequired calls on owner_id/team_id/renewal_policy_id were
actually installed in 3287e17 under M-002/M-003/M-006 auth unification
— a53a4b8's commit message overstates scope.) Post-audit on
2026-04-18 found three parallel call sites still shipping
three-to-four-field payloads that the newly strict handler would
reject with HTTP 400:
  - GUI: OnboardingWizard CertificateStep (common_name + sans +
    issuer_id + environment only)
  - CLI: certctl-cli import (common_name + issuer_id + status only;
    no required-flag gating)
  - Tests: deploy/test/qa_test.go Part03 positive paths

Scope:
Bring every POST /api/v1/certificates caller to six-field parity. No
handler changes — the contract is authoritative; the callers must
conform.

Implementation:

  GUI — OnboardingWizard CertificateStep expansion:
    web/src/pages/OnboardingWizard.tsx adds name/owner_id/team_id/
    renewal_policy_id state. React Query hooks for getOwners/
    getTeams/getPolicies use per_page: '500' to populate dropdowns
    without pagination-driven truncation. Payload ships all six
    required fields plus sans/certificate_profile_id/environment.
    nextDisabled gate enforces all six before the Continue button
    activates.

  CLI — ImportCertificates rewrite:
    internal/cli/client.go rewrites ImportCertificates with
    flag.NewFlagSet("import", flag.ContinueOnError). Required flags:
    --owner-id, --team-id, --renewal-policy-id, --issuer-id. Optional:
    --name-template (default {cn}, templated via strings.ReplaceAll
    against cert.Subject.CommonName), --environment (default
    imported). Missing required flags fail pre-HTTP with a clear
    error. Request map ships all six required fields plus sans/
    environment/status/optional serial_number.
    cmd/cli/main.go — usage string updated to document the new
    required/optional flags.

  Tests — qa_test.go Part03 positive paths:
    deploy/test/qa_test.go Part03 Create_Minimal and Create_Full
    updated to include all six fields. Uses seed_demo.sql-supplied IDs
    (o-alice, t-platform, rp-standard) — docker-compose.demo.yml is
    the run context. C-001 explanatory comment added above
    Create_Minimal so future readers understand why the minimal
    payload is no longer minimal.

  MCP parity:
    Verified no-op. internal/mcp/types.go:28 CreateCertificateInput
    already declares all six fields; internal/mcp/tools.go:102
    forwards the typed struct unchanged.

Verification:

  Go CLI regression tests (internal/cli/client_test.go):
    * TestClient_ImportCertificates_MissingRequiredFlags — 5 subtests,
      one per missing required flag, confirms flag.ContinueOnError
      rejects with non-nil error before any HTTP call is attempted.
    * TestClient_ImportCertificates_MissingPositionalArgs — confirms
      the "usage: import <file>" error path when no PEM file is
      supplied after the flags.
    * TestClient_ImportCertificates_SixFieldPayload — uses httptest
      to decode the POST body and assert all six required fields
      plus sans/environment are present on the wire.

  Frontend regression test (web/src/api/client.test.ts):
    'createCertificate accepts and transmits all six required fields'
    pins the wire shape for both GUI call sites (OnboardingWizard
    CertificateStep + CertificatesPage CreateCertificateModal). If
    either UI surface accidentally drops a field, this assertion
    fails in CI rather than surfacing as a 400 at runtime.

  Grep-based call-site sweep:
    Enumerated every POST /api/v1/certificates create caller. Four
    total: OnboardingWizard, CertificatesPage, MCP tools, CLI import.
    All four now ship six-field payloads. Claim path
    (internal/service/discovery.go) updates existing rows and does
    not POST. EST/SCEP handlers invoke internal
    certService.CreateVersion, not the public API. Negative-path
    tests (qa_test.go:1085/1267/1274/1288/1298) remain valid: they
    assert 400/non-500 on oversized/malformed/missing-CN/UTF-8/empty
    bodies, and these properties still hold under the stricter
    handler.

  Static gates:
    go build ./..., go vet ./..., go test ./internal/cli/..., and
    cd web && npm run test deferred to operator pre-push — the Go
    toolchain is not available in the session sandbox. Grep-based
    verification confirms the syntactic shape of every changed file.

Residual:
None. Every POST /api/v1/certificates call site now conforms to the
six-field contract; the wire shape is pinned by both Go and
TypeScript regression tests.

Commit:
TBD-SHA (audit doc + CLAUDE.md carry TBD-SHA placeholders to be
amended after commit)
2026-04-19 00:25:10 +00:00
shankar0123 0200c7f4a4 Close I-001 (RetryFailedJobs never invoked) coverage-gap finding
Operator decision answered as Option A: JobService.RetryFailedJobs is
now wired into the scheduler as an always-on 10th loop. Prior to this
commit the method was implemented, unit-tested, and exported but had
zero runtime callers — any job that transitioned to status=Failed stayed
Failed forever regardless of how many attempts it had remaining.

Scheduler — 10th loop:
  internal/scheduler/scheduler.go grows a jobRetryLoop alongside the
  existing nine loops (renewal, jobs, health, notifications, short-lived,
  network scan, digest, health check, cloud discovery). The loop follows
  the established run-immediately-then-tick pattern (same shape as
  jobProcessorLoop), gated by a sync/atomic.Bool idempotency guard and
  joined into the scheduler's sync.WaitGroup so WaitForCompletion drains
  it on graceful shutdown. Each tick runs under a 2-minute context
  timeout mirroring jobProcessorLoop's opCtx budget. The runJobRetry
  helper invokes jobService.RetryFailedJobs(ctx, 3) — the advisory
  maxRetries cap is belt-and-suspenders; per-job eligibility is still
  enforced inside the service via Attempts < MaxAttempts.

  The JobServicer scheduler-interface gains RetryFailedJobs so the
  scheduler's dependency surface stays explicit and mockable.

Service — audit trail per retry:
  internal/service/job.go:RetryFailedJobs now emits an audit event for
  every Failed→Pending transition. Following the house convention used
  by all scheduler-emitted events, actor='system' and actorType=
  domain.ActorTypeSystem; action='job_retry'; details capture
  old_status, new_status, attempts, max_attempts. JobService carries an
  optional *AuditService (SetAuditService) that nil-guards to preserve
  test-wiring ergonomics — existing tests that construct JobService
  without an audit service continue to pass unchanged.

Config — env var with sane default:
  internal/config/config.go:SchedulerConfig grows RetryInterval, wired
  to CERTCTL_SCHEDULER_RETRY_INTERVAL with a 5-minute default. Validate
  rejects intervals below 1 second (matches other scheduler interval
  validators).

Server wiring:
  cmd/server/main.go calls jobService.SetAuditService(auditService)
  after JobService construction and sched.SetJobRetryInterval(
  cfg.Scheduler.RetryInterval) alongside the other SetXxxInterval calls.

Regression coverage:
  internal/service/job_test.go (3 new)
    - TestJobService_RetryFailedJobs_EligibleJobTransitionsAndAudits
    - TestJobService_RetryFailedJobs_SkipsJobsAtMaxAttempts
    - TestJobService_RetryFailedJobs_NoAuditServiceOK
  internal/scheduler/scheduler_test.go (3 new)
    - TestScheduler_JobRetryLoop_CallsService
    - TestScheduler_JobRetryLoop_IdempotencyGuard
    - TestScheduler_JobRetryLoop_WaitForCompletion

  The service tests assert status transitions, attempt-cap short-
  circuiting, and audit event shape (actor='system', action='job_retry',
  details keys). The scheduler tests assert the loop invokes the service,
  the atomic.Bool guard skips overlapping ticks with the expected
  'still running, skipping tick' log, and WaitForCompletion drains the
  in-flight tick on Stop.

Residual follow-up (not in scope for this commit):
  internal/service/renewal.go:RetryFailedJobs is a parallel dead-code
  duplicate of the same logic on RenewalService — untested and has no
  runtime caller. The audit finding called this out as 'implemented
  twice'. Removing it is a separate cleanup and does not block the
  Option-A wiring this commit delivers.

Files:
  cmd/server/main.go                     — SetAuditService + SetJobRetryInterval
  internal/config/config.go              — RetryInterval field + env + validate
  internal/scheduler/scheduler.go        — 10th loop, interface, field, setter
  internal/scheduler/scheduler_test.go   — 3 new scheduler-loop tests
  internal/service/job.go                — RetryFailedJobs audit emission + SetAuditService
  internal/service/job_test.go           — 3 new service-layer tests
2026-04-18 23:24:54 +00:00
shankar0123 fe7e766510 Close M-004 (OCSP issuer binding) and M-005 (discovery actor propagation) coverage-gap findings
M-004 — OCSP issuer binding (composite key):
  The OCSP lookup path now binds (issuer_id, serial) as a composite key
  rather than resolving by serial alone. CertificateRepository and
  RevocationRepository gain GetByIssuerAndSerial methods; ca_operations.go
  scopes both lookups by the issuer_id path param. When no managed cert
  binds to that (issuer, serial) tuple, GetOCSPResponse constructs an
  RFC 6960 §2.2 'unknown' response (CertStatus=2) instead of the prior
  default 'good'. Short-lived cert exemption (profile TTL < 1h) is
  preserved. Real repo errors (non-sql.ErrNoRows) fail closed with a log.

  Regression coverage: internal/service/ca_operations_test.go
    - TestCAOperationsSvc_GetOCSPResponse_Unknown_CrossIssuer
    - TestCAOperationsSvc_GetOCSPResponse_Unknown_UnknownSerial

M-005 — Discovery Claim/Dismiss actor propagation:
  DiscoveryService.ClaimDiscovered and DismissDiscovered now accept an
  explicit 'actor string' parameter (propagation pattern mirrors
  bulk_revocation.go / revocation_svc.go). The handler layer passes
  resolveActor(r.Context()) — the named-key identity established by the
  M-002 auth unification — and the service falls back to 'api' (the same
  safe sentinel resolveActor uses when no auth context is present) only
  when the caller passes an empty string. Never falls back to 'operator'.

  Regression coverage: internal/service/discovery_test.go
    - TestDiscoveryService_ClaimDiscovered_AuditActor
    - TestDiscoveryService_DismissDiscovered_AuditActor
    - TestDiscoveryService_ClaimDiscovered_EmptyActorFallsBackToAPI
    - TestDiscoveryService_DismissDiscovered_EmptyActorFallsBackToAPI

Each new test asserts event.Actor matches the caller-supplied string (or
'api' on empty input) and explicitly asserts event.Actor != 'operator'
to lock in the historical fix intent.

Files:
  internal/api/handler/discovery.go          — pass resolveActor(ctx)
  internal/api/handler/discovery_handler_test.go — updated call sites
  internal/integration/lifecycle_test.go     — updated mock wiring
  internal/repository/interfaces.go          — GetByIssuerAndSerial on
                                               CertificateRepository +
                                               RevocationRepository
  internal/repository/postgres/certificate.go — composite key lookup
  internal/service/ca_operations.go          — (issuer_id, serial) scoping
  internal/service/ca_operations_test.go     — 2 new M-004 tests
  internal/service/discovery.go              — actor parameter + 'api' fallback
  internal/service/discovery_test.go         — 4 new M-005 tests
  internal/service/shortlived_test.go        — mock signature update
  internal/service/testutil_test.go          — mock GetByIssuerAndSerial
2026-04-18 22:20:25 +00:00
shankar0123 ff7357f889 fix(lint): godoc comment on NewAuthWithNamedKeys must lead with function name (ST1020)
CI failure on master (commit 3287e17) — staticcheck ST1020:

  internal/api/middleware/middleware.go:125:1: ST1020: comment on exported
  function NewAuthWithNamedKeys should be of the form
  "NewAuthWithNamedKeys ..." (staticcheck)

When NewAuth was renamed to NewAuthWithNamedKeys during the M-002 auth
unification, the leading godoc sentence was left pointing at the old name.
Rewrite the comment so its first sentence starts with the new function
name, and expand the body to describe the named-key + admin-flag contract
introduced in 3287e17.

Also gitignore /.gopath/ — session-scoped tool install cache, same
category as /.gocache/ and /.gomodcache/.

Verification:
  go vet ./internal/api/middleware/...          — clean
  go build ./internal/api/middleware/...        — clean
  go test ./internal/api/middleware/...         — PASS (0.245s)
  staticcheck -checks=all,<project exclusions>  — clean across
    middleware, handler, service, domain, cmd/server, scheduler

Closes: CI failure on 3287e17.
2026-04-18 21:38:46 +00:00
shankar0123 3287e174dc Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
a53a4b8. The work lands as a single commit spanning server, docs, tests,
and the React client.

M-002 — Named API keys with per-key actor propagation
  * Migration 000014 adds the 'api_keys' table (id, name, hash,
    principal, role, created_at, last_used_at, disabled_at) so every
    credential carries an identifiable principal instead of the
    opaque 'anonymous'/'api-key' sentinel.
  * Auth middleware now rotates through configured keys, performs
    constant-time hash comparison, stamps 'last_used_at', and emits
    an actor struct via contextWithActor(). The audit middleware,
    bulk-revocation handler, approval handlers, and MCP tool layer
    now read the principal off the context and persist it on every
    audit_events row.
  * Regression coverage:
      - internal/api/middleware/audit_test.go — actor propagation,
        principal redaction for disabled keys, anonymous fallback for
        unauthenticated endpoints.
      - internal/api/handler/bulk_revocation_handler_test.go,
        job_handler_test.go — principal-on-audit assertions.

M-003 — Authorization gates (Phase B)
  * Approval handler rejects self-approval / self-rejection with 403
    when the actor principal equals the job's requested_by field.
  * Bulk revocation is gated behind the 'admin' role; operators and
    viewers receive 403.
  * Regression coverage:
      - internal/service/job_test.go — TestApproveJob_NotSelf,
        TestRejectJob_NotSelf.
      - internal/api/handler/bulk_revocation_handler_test.go —
        TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.

M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
  * Per RFC 8615, relying parties cannot reasonably be asked to
    authenticate against the issuing certctl instance to retrieve
    revocation material. CRL and OCSP move off the authenticated
    '/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
        GET /.well-known/pki/crl/{issuer_id}
            Content-Type: application/pkix-crl   (RFC 5280 §5)
        GET /.well-known/pki/ocsp/{issuer_id}/{serial}
            Content-Type: application/ocsp-response  (RFC 6960)
  * Non-standard JSON CRL shape is removed; only DER is served.
  * Short-lived certificate exemption (profile TTL < 1h → skip
    CRL/OCSP) is preserved; the response simply omits the serial.
  * Routes are registered on the unauthenticated 'finalHandler' mux
    in cmd/server/main.go alongside EST ('/.well-known/est/*') and
    SCEP ('/scep'). Legacy authenticated paths return 404.
  * Regression coverage:
      - internal/api/handler/certificate_handler_test.go — content
        type, DER parseability, 404 for unknown issuer.
      - internal/api/handler/adversarial_path_test.go — unauthenticated
        access asserted for CRL, OCSP, EST, SCEP.
      - internal/api/router/router_test.go — route-table assertion
        that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
        mounted on the unauthenticated branch.

M-001 — Auto-closed by M-002
  EST and SCEP were already registered on the unauthenticated
  'finalHandler' mux; the router comment at
  internal/api/router/router.go:247 now matches reality. The
  adversarial-path tests above lock the behavior in.

Verification (all gates green):
  * go vet ./...                                           — clean
  * go build ./...                                         — ok
  * go test -short ./... (55+ packages)                    — all pass
  * web/ : npm test (225 Vitest tests)                     — all pass
  * web/ : npx tsc --noEmit                                — clean
  * grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
    all intentional M-006 tombstone/relocation comments.

Documentation:
  * coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
    Fixed, with per-finding resolution paragraphs citing regression
    test IDs. (Audit file lives outside this repo; see cowork root.)
  * CLAUDE.md Project Status line updated with the auth-unification
    closure note.
  * docs/features.md, docs/architecture.md, docs/quickstart.md,
    docs/concepts.md, docs/connectors.md, docs/test-env.md,
    docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
    — refreshed for the new '.well-known/pki/*' namespace and named
    API keys.
  * api/openapi.yaml — documents the new unauthenticated endpoints
    and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.

.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
2026-04-18 18:17:41 +00:00
shankar0123 a53a4b845b fix(gui,api): close C-001 + C-002 — ownership + agent FK contract
C-001 — CreateCertificate was server-accepted with null owner_id,
team_id, renewal_policy_id because the GUI neither collected the fields
nor enforced them, even though the backend's ManagedCertificate schema
and handler contract treat them as required. Fix the contract at all
four layers:

  - web/src/pages/CertificatesPage.tsx: replace owner_id/team_id free-
    text inputs with <select> elements fed by getOwners/getTeams/
    getPolicies queries; mark all three required; gate the Create
    button on owner_id + team_id + renewal_policy_id being set.
  - internal/api/handler/certificates.go: ValidateRequired for
    owner_id, team_id, renewal_policy_id on CreateCertificate so the
    handler returns HTTP 400 with the offending field name before the
    service layer is reached.
  - internal/mcp/types.go: drop ',omitempty' from
    CreateCertificateInput.RenewalPolicyID so the MCP schema reflects
    the required contract; Update inputs keep partial-update semantics.
  - api/openapi.yaml: 'required: [name, common_name, renewal_policy_id,
    issuer_id, owner_id, team_id]' was already present on the Create
    schema; clarified DeploymentTarget.agent_id description to note the
    FK contract.

C-002 — CreateTargetWizard accepted an empty or bogus agent_id and the
service inserted directly, producing a Postgres 23503 FK-violation that
bubbled out as a generic HTTP 500. The FK itself (migration 000001 line
104: agent_id TEXT NOT NULL REFERENCES agents(id)) is correct; we keep
the schema strict and add validation at three layers:

  - internal/service/target.go: introduce
    ErrAgentNotFound sentinel and pre-validate agent_id in
    TargetService.CreateTarget — empty string returns
    'agent_id is required'; a nonexistent id returns the full
    'referenced agent does not exist: <id>' error. Both wrap
    ErrAgentNotFound via fmt.Errorf %w so callers can use errors.Is.
  - internal/api/handler/targets.go: ValidateRequired on agent_id; map
    errors.Is(err, service.ErrAgentNotFound) to HTTP 400 instead of
    letting it fall through to the generic 500 branch.
  - internal/mcp/types.go: drop ',omitempty' from
    CreateTargetInput.AgentID to match the required contract.
  - web/src/pages/TargetsPage.tsx: replace the free-text Agent ID input
    with a <select> populated from getAgents(); include agent in the
    canProceedToReview gate so Next is disabled until an agent is
    chosen.

Regression coverage (21 new subtests total):

  - TestCreateCertificate_MissingRequiredField_Returns400 — 6 subtests,
    one per required field, each proves the handler guard fires before
    the mock service is called.
  - TestCreateTarget_MissingAgentID_Returns400 — handler guard.
  - TestCreateTarget_NonexistentAgent_Returns400 — pins the
    ErrAgentNotFound -> 400 translation.
  - TestTargetService_CreateTarget_MissingAgentID — errors.Is sentinel.
  - TestTargetService_CreateTarget_NonexistentAgentID — errors.Is.
  - The existing TestTargetService_CreateTarget_Success, along with
    TestCreateTarget_{MissingName,MissingType,NameTooLong}_* handler
    tests, were updated to seed a real agent or include agent_id in
    the request body so the happy paths still run cleanly.

Gates (Phase 4):
  - go build/vet/test/race: green
  - go test -cover: internal/service 68.7% (gate 55%),
    internal/api/handler 78.9% (gate 60%)
  - golangci-lint on service+handler+mcp: 0 issues
  - govulncheck: no reachable vulns
  - tsc --noEmit: clean
  - vitest: 223/223 passing

See cowork/certctl-coverage-gap-audit.md entries C-001 and C-002.
2026-04-18 16:01:40 +00:00
shankar0123 9143da5fa8 Merge branch 'fix/d-008-policy-engine-drift' 2026-04-18 14:56:06 +00:00
shankar0123 b3cc7cbdb2 fix(policies): close the D-006 loop — TitleCase seed canonicals + severity-aware, config-consuming rule engine (D-008)
D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:

  (a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
      types ('ownership', 'environment', 'lifetime', 'renewal_window')
      that the handler validator rejects on Create/Update but that
      raw SQL INSERTs bypassed entirely. At runtime evaluateRule's
      switch fell through to the default "unknown policy rule type"
      error branch on every demo rule × every cert × every cycle,
      flooding logs while emitting zero violations.

  (b) migrations/seed_demo.sql persisted lowercase severity values
      ('critical', 'error', 'warning') on policy_violations rows.
      INSERT succeeded because that column had no CHECK, but any
      frontend comparing against the canonical PolicySeverity enum
      mis-categorized every seeded violation.

  (c) evaluateRule hardcoded Severity: PolicySeverityWarning on
      every emitted violation and ignored rule.Config entirely —
      so the D-006 per-rule severity column (000013) and every
      per-arm Config JSON ({allowed_issuer_ids, allowed_domains,
      required_keys, allowed, lead_time_days, max_days}) was dead
      data below the evaluation layer.

This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.

## Changes

Domain (internal/domain/policy.go):
  * Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
    Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
    arm — routing it through RenewalLeadTime would conflate "how
    close to expiry before we renew" with "how long can the cert
    possibly be", two distinct semantics. The new type accepts
    config {"max_days": int} and flags certs whose
    NotAfter - NotBefore exceeds the cap.

Handler validator (internal/api/handler/validation.go):
  * ValidatePolicyType allowlist grown to 6 canonicals
    (AllowedIssuers, AllowedDomains, RequiredMetadata,
    AllowedEnvironments, RenewalLeadTime, CertificateLifetime).

OpenAPI (api/openapi.yaml):
  * PolicyType enum grown to match domain.

Frontend (web/src/api/types.ts, types.test.ts):
  * POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
    all 6 canonicals and rejects casing drift.

Migration 000014 (policy_violations severity CHECK):
  * Named CHECK constraint (policy_violations_severity_check)
    mirroring 000013's allowlist, defense-in-depth at the DB layer
    against future drift from bypassed writes (migrations, psql
    sessions, future callers). Symmetric down migration drops by
    name.

Seed data:
  * migrations/seed.sql rewritten to emit TitleCase canonicals with
    per-arm config JSON that actually exercises the config-consuming
    paths (not the missing-field backstops):
      - pr-require-owner         → RequiredMetadata     {"required_keys":["owner"]}                        Warning
      - pr-allowed-environments  → AllowedEnvironments  {"allowed":["production","staging","development"]} Error
      - pr-max-certificate-lifetime → CertificateLifetime {"max_days":90}                                   Critical
      - pr-min-renewal-window    → RenewalLeadTime      {"lead_time_days":14}                              Warning
    Severities are now differentiated per rule (D-006 intent).
  * migrations/seed_demo.sql violation rows flipped to TitleCase
    severity ('Critical', 'Error', 'Warning') so migration 000014
    applies cleanly on upgrade paths.

Engine rewrite (internal/service/policy.go):
  * evaluateRule rewritten. All six arms now:
      1. Parse rule.Config into the per-arm typed struct.
      2. Bad JSON → log at ValidateCertificate boundary and skip
         this rule (no co-located poisoning of other rules in the
         same batch).
      3. Empty/null Config → emit the pre-D-008 missing-field
         violation (backwards compat invariant — operators who
         haven't reconfigured still see the same output).
      4. Violations emitted carry rule.Severity (no more hardcoded
         Warning); D-006 column is now load-bearing.
  * CertificateLifetime arm reads NotBefore/NotAfter from the
    certificate's latest version via CertRepo. Injected via
    PolicyService.SetCertRepo() setter — avoids churning ~36
    NewPolicyService call sites while keeping the lifetime arm
    optional (degrades to a log+skip if the setter is not wired).

Server wiring (cmd/server/main.go):
  * policyService.SetCertRepo(certRepo) wired after construction.

Tests (internal/service/policy_test.go):
  * 25 new subtests across 5 groups:
      - TestEvaluateRule_SeverityPassThrough (6): every rule type
        emits violations carrying rule.Severity, not hardcoded.
      - TestEvaluateRule_ConfigConsumed (12): every per-arm Config
        path exercised positive + negative.
      - TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
        Config still emits pre-D-008 missing-field violations.
      - TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
        and skips cleanly without poisoning neighbors.
      - TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
        ok when repo wired, log+skip when not, handles missing
        NotBefore/NotAfter edges.

Provenance: D-008 surfaced during D-005/D-006 remediation review
in eef1db0. That commit added persistence and CI pins for the
severity field but did not re-verify the evaluation layer
consumed it; this finding and fix close the audit-process gap.
2026-04-18 14:55:56 +00:00
shankar0123 eef1db0f0a fix(policies): stop 400ing the "+ New Policy" button + add per-rule severity (D-005, D-006)
Coverage Gap Audit findings D-005 (P0) + D-006 (P1) fixed together in a
single commit because they share the same root cause — policy CRUD sending
values the backend silently rejects — and splitting them would leave a
half-working UI between commits.

## D-005 (P0): PoliciesPage dropdown 400s every Create Policy

Root cause
----------
`web/src/pages/PoliciesPage.tsx` populated the Type `<select>` from a
hardcoded `['key_algorithm', 'ownership', 'allowed_issuers', ...]` array.
The backend's `internal/api/handler/validators.go::ValidatePolicyType`
enforces the TitleCase allowlist `AllowedIssuers`, `AllowedDomains`,
`RequiredMetadata`, `AllowedEnvironments`, `RenewalLeadTime` — defined in
`internal/domain/policy.go`. Every Create Policy request was rejected with
`400 invalid policy type`. The error surfaced only as a transient toast;
the modal closed anyway. Silent user-visible failure.

Fix
---
- `web/src/api/types.ts`: added `POLICY_TYPES` and `POLICY_SEVERITIES`
  tuples with `as const` and narrowed `PolicyRule.type`, `.severity`, and
  `PolicyViolation.severity` to the literal-union types. Dropdown is now
  sourced from the tuple; casing drift becomes a compile error.
- `web/src/pages/PoliciesPage.tsx`: rekeyed `severityStyles` /
  `severityDots` to the TitleCase values, added `humanize()` for display
  (AllowedIssuers → "Allowed Issuers"), removed the `badge-neutral`
  fallback that was papering over the mismatch.
- `web/src/api/types.test.ts` (new): pins both tuples exactly. If anyone
  edits one side of the frontend/backend contract without the other, CI
  fails with a clear assertion. Pure-TS vitest, no RTL dependency.

## D-006 (P1): `severity` field silently dropped on create/update

Root cause
----------
`PolicyRule` had no `Severity` field in `internal/domain/policy.go`. The
frontend has always sent `severity` on create/update, but Go's
`json.Decoder` (default settings, no `DisallowUnknownFields`) silently
dropped it. The value never reached PostgreSQL. Every rule rendered with
the same severity because there was no severity — just a display
computation downstream.

Fix: option (b), full-stack schema add (not delete-the-field)
-------------------------------------------------------------
- Migration `000013_policy_rule_severity` (up + down): adds
  `severity VARCHAR(50) NOT NULL DEFAULT 'Warning'` to `policy_rules` with
  CHECK constraint `severity IN ('Warning', 'Error', 'Critical')`. No
  index — three-value column on a low-thousands-rows table, planner will
  seq-scan regardless. PG 11+ metadata-only ADD COLUMN, safe on live data.
- `internal/domain/policy.go`: added `Severity PolicySeverity` field.
- `internal/repository/postgres/policy.go`: plumbed `severity` through
  ListRules SELECT + Scan, GetRule SELECT + Scan, CreateRule INSERT,
  UpdateRule UPDATE (4 queries).
- `internal/service/policy.go::UpdatePolicy`: if the client omits
  severity on a PUT (zero-value empty string), fetch the existing rule
  and preserve its severity. Without this, partial updates would trip the
  NOT NULL CHECK and 500. Preserves pre-existing behavior for Name/Type
  (out of scope).
- `internal/api/handler/policies.go::CreatePolicy`: default empty severity
  to `'Warning'`, then validate via `ValidatePolicySeverity`. 400 with
  clear message instead of 500 on CHECK violation. `UpdatePolicy`:
  validates severity only when provided.
- `internal/mcp/types.go` + `internal/mcp/tools.go`: added optional
  `severity` on the MCP `create_policy` / `update_policy` tool inputs so
  LLM callers stay in sync with the wire contract.
- `api/openapi.yaml`: added `severity` to the `PolicyRule` schema with
  the enum and default.

Acceptance criterion (user-defined)
-----------------------------------
"Create a rule with severity=Critical, reload the page, and still see
Critical — no silent drops." Verified end-to-end: frontend sends
`severity: "Critical"`, handler validates, service persists, DB stores,
GET returns, React renders the correct badge.

Seed data
---------
`migrations/seed.sql`: four demo rules now have differentiated severities
— `pr-require-owner` → Warning, `pr-allowed-environments` → Error,
`pr-max-certificate-lifetime` → Critical, `pr-min-renewal-window` →
Warning. The user called out that seeding all four at the same severity
makes the feature look decorative; differentiation demonstrates the
column carries real signal.

## Integration test fix (side effect of D-006)

`internal/integration/e2e_test.go::TestCrossResourceWorkflow/CreatePolicy`
was sending `"severity": "High"` — a value from the pre-audit severity
vocabulary that the new `ValidatePolicySeverity` correctly rejects with
400. Changed to `"Error"` (closest semantic match in the new TitleCase
allowlist). Only severity reference in the integration/ directory;
verified via grep.

## Out of scope, logged for follow-up (d/D-008)

Three policy-engine drift issues orthogonal to D-005 + D-006, explicitly
deferred per direction:

1. `migrations/seed.sql` policy_rules INSERTs use lowercase TYPE values
   (`'ownership'`, `'environment'`, `'lifetime'`, `'renewal_window'`).
   These are load-bearing on `internal/service/policy.go::evaluateRule`'s
   `switch rule.Type` (which also uses the lowercase strings). Migrating
   requires coordinated changes across seed + evaluation engine.
2. `migrations/seed_demo.sql:482-483` contains lowercase `'critical'`
   severity — will now fail the new CHECK constraint. Separate fix.
3. `evaluateRule` hardcodes `Severity: domain.PolicySeverityWarning` on
   emitted violations and ignores the configured `rule.Config`. The new
   severity column is read correctly on the CRUD path but not yet
   consulted during evaluation.

## Verification

Backend:
- `go build ./...` — clean
- `go vet ./...` — clean
- `go test -short ./...` — all packages green, including
  `internal/service` (policy service), `internal/api/handler` (policy +
  MCP handler tests), `internal/integration` (e2e_test.go after fix),
  `internal/domain`, `internal/repository/postgres`.

Frontend:
- `tsc --noEmit` — clean
- `vitest run` — 223/223 passing (4 new assertions in types.test.ts)
- `vite build` — clean (only the pre-existing chunk-size warning)
2026-04-18 13:02:04 +00:00
shankar0123 72f5246ce3 Merge branch 'fix/m11-cosign-v3-sign-blob-bundle': M-11 cosign v3 sign-blob migration 2026-04-18 09:29:25 +00:00
shankar0123 cb308bb4c7 ci(release): migrate cosign sign-blob to --bundle (cosign v3.0)
Cosign v3.0 (shipped by default with sigstore/cosign-installer@cad07c2e,
release v3.0.5) removed --output-signature and --output-certificate from
the sign-blob subcommand. The replacement is a single --bundle flag that
emits a unified Sigstore bundle (.sigstore.json) containing the
signature, certificate chain, and Rekor inclusion proof in one file.

This change migrates both sign-blob invocations in .github/workflows/
release.yml (per-binary matrix signing and aggregate checksums.txt
signing), updates the artefact upload paths, the artefact aggregation
case filter, the GitHub Release asset list, and the release-notes body
verify-blob example. The README cosign verification snippet and sidecar
description are also updated to the --bundle / .sigstore.json shape.

No cosign version pinning. No legacy fallback. OCI image signing
(cosign sign on image digest) is unchanged — only sign-blob flags
changed in v3.0. See M-11 in certctl-audit-report.md.

Verification gates:
- YAML parse: OK
- go vet ./...: exit 0
- go build ./...: exit 0
- grep 'cosign sign-blob' release.yml: 2 (expected: 2)
- grep '.sigstore.json' release.yml: 9 (expected: >=5)
- grep '.sig/.pem' release.yml non-comment: 0 (expected: 0)
- README legacy cosign refs: 0 (expected: 0)
- docs/ legacy cosign refs: 0 (expected: 0)

Coverage: unchanged (CI workflow edit + README — zero Go code touched).
2026-04-18 09:29:20 +00:00
shankar0123 ad93e99158 Merge branch 'fix/m10-openapi-spec-drift': M-10 OpenAPI spec drift reconciliation 2026-04-18 03:21:45 +00:00
shankar0123 9d0c3dfa15 docs(openapi): reconcile api/openapi.yaml with router routes (M-10)
Add 9 missing operations to api/openapi.yaml that exist in router.go but
were absent from the spec. Spec-only change with no runtime Go code
changes; all 106 pre-existing operationIds preserved byte-identical.

New operationIds:
  - testTargetConnection (POST /api/v1/targets/{id}/test)
  - verifyDeployment    (POST /api/v1/jobs/{id}/verify)
  - getJobVerification  (GET  /api/v1/jobs/{id}/verification)
  - estCACerts          (GET  /.well-known/est/cacerts)
  - estSimpleEnroll     (POST /.well-known/est/simpleenroll)
  - estSimpleReEnroll   (POST /.well-known/est/simplereenroll)
  - estCSRAttrs         (GET  /.well-known/est/csrattrs)
  - scepGet             (GET  /scep)
  - scepPost            (POST /scep)

Spec operations: 106 → 115 (matches 115 router routes exactly).

Verification:
  - openapi-spec-validator: OK
  - go build ./...: clean
  - go vet ./...:   clean
  - go test -race -count=1 -short ./...: 54 packages ok, 0 FAIL
  - golangci-lint run ./...: 0 issues
  - govulncheck ./...: 0 vulnerabilities in our code
  - tsc --noEmit: 0 errors
  - vitest run: 3 files, 218 tests passed

sha256 before: 7c14f77107a86f8de82fe91b7f5e16cca11206d1e1fab7b7bd77ff396620fdf3
sha256 after:  87bd92d0407d63643bec612d27261bf489563beb90d0791ea71cde26346f83d3
2026-04-18 03:21:40 +00:00
shankar0123 2c9602db71 Merge branch 'fix/m9-sentinel-discovery-log-levels': M-9 sentinel discovery log-level fix 2026-04-18 02:53:50 +00:00
shankar0123 ef670fa6da fix(m-9): aggregate per-endpoint scan errors in NetworkScanService
Before this fix, RunScan declared `scanErrors []string` but never
appended to it. As a result:

  - the summary Info log ("network target scan completed") always
    reported `"errors": 0`, regardless of how many endpoints failed
  - the DiscoveryReport's `Errors` field — stored on the scan record
    and surfaced in the GUI scan history — was always nil

Operators who needed to understand scan failures had to enable Debug
logging and grep through the noise of expected sweep-scan connection
refusals. The per-endpoint log level (Debug) is deliberate and correct
— scanning a /24 typically produces 200+ connection-refused results,
and logging each at Warn would create massive log spam at default
verbosity. The bug was the silent loss of the aggregate count.

This commit:

  - extracts the partitioning logic into `collectScanResults`, a pure
    method that splits per-endpoint results into discovered certificate
    entries and a list of endpoint error strings
  - populates the errors list with "<address>: <error>" so the scan
    record correlates failures back to specific endpoints
  - preserves the existing Debug-level per-endpoint log (sweep noise
    discipline) — no change to default-verbosity log output

The summary Info log's "errors" field and the DiscoveryReport's Errors
field now reflect the true failure count. Debug detail remains
available for operators diagnosing specific endpoints.

Audit scope note: the M-9 finding narrative implied broad Debug-level
hiding of real errors across AWS SM, Azure KV, GCP SM, and network
scan sentinel agents. On investigation, the three cloud-discovery
connectors (awssm, azurekv, gcpsm) already use appropriate Warn/Error
discipline for per-item and root-level failures. Only the network
scanner had a silent observability gap, and it was a missed append
rather than a misapplied log level. See audit resolution log for
full details.

CWE: CWE-778 (Insufficient Logging) — aggregate failure count lost.

Tests: 4 new unit tests on collectScanResults covering the
aggregation path (success + failure mix), all-success, all-failed,
and empty-input degenerate cases. All tests pass with -race.

Verification:
  - go build ./cmd/server/... ./cmd/agent/... ./cmd/mcp-server/... ./cmd/cli/...  exit 0
  - go vet ./...                                                                    exit 0
  - go test -race -count=1 -timeout 300s [full CI race path]                        exit 0
  - golangci-lint run ./... --timeout 5m (v2.11.4)                                  0 issues
  - govulncheck ./... (@latest)                                                     0 in-code vulnerabilities
  - go test -count=1 -cover ./internal/service/...                                  68.0% (> 55% threshold)

Invariants preserved:
  - collectScanResults signature: method on *NetworkScanService,
    input []domain.NetworkScanResult, return ([]DiscoveredCertEntry, []string)
  - Debug log key names unchanged ("address", "error")
  - DiscoveryReport schema unchanged (Errors field already existed)
  - Sentinel agent ID "server-scanner" unchanged
  - No migration, no API, no wire-format change

Refs: M-9 Medium finding; audit resolution log appended in follow-up
commit on workspace-level audit report.
2026-04-18 02:34:14 +00:00
shankar0123 5a6ec39cfd Merge branch 'fix/m2-pr-f-scheduler-contextcheck-audit-closeout' 2026-04-18 01:43:56 +00:00
shankar0123 e3196e7b50 M-2 PR-F: Middleware/ACME ctx-propagation + contextcheck linter + audit closeout
Final PR in the six-commit M-2 sequence (PR-A: CertificateService cluster
cdc9d03, PR-B: IssuerService+TargetService eb14236, PR-C: Policy/Profile/
Owner/Team 2497be4, PR-D: Job/Notification/Audit ccd89c3, PR-E: AgentService
283ec27, PR-F: this commit). PR-A through PR-E collapsed the service-layer
shim methods and deleted every in-production context.Background() /
context.TODO() call from internal/service/; this PR completes the sweep
across the non-service tiers (HTTP middleware + ACME connector) and wires
the contextcheck linter so regressions fail CI.

Three narrow edits land the D-3 pattern (context.WithoutCancel for
subsidiary async writes and deferred shutdown contexts):

  - internal/api/middleware/audit.go  -- async audit goroutine now runs
    on auditCtx := context.WithoutCancel(r.Context()) instead of
    context.Background(). Preserves request-scoped values (trace ID, auth)
    while detaching from the request's cancellation so the audit write
    does not get killed when the response completes. Goroutine is still
    tracked via a.wg (M-1 shutdown drain) so Flush(ctx) behaviour is
    unchanged. CWE-770 Missing Release (goroutine leak potential) +
    CWE-400 Resource Exhaustion (missed cancellation propagation).

  - internal/api/middleware/middleware.go -- Recovery panic path now
    logs via slog.ErrorContext(ctx, ...) instead of log.Printf. Request-
    scoped trace/auth metadata now carries through the panic log, matching
    every other request log. D-3 non-bypass: the context is r.Context()
    captured before the defer, so even a panic mid-handler propagates
    the ctx's trace ID into the ERROR log line.

  - internal/connector/issuer/acme/acme.go (HTTP-01 challenge server
    shutdown) -- defer shutdown context derived from
    context.WithTimeout(context.WithoutCancel(ctx), 5s) instead of
    context.Background(). Preserves parent ctx values, detaches from
    parent cancellation so Shutdown always gets its full 5-second
    budget even when the parent was cancelled. Matches the same pattern
    applied in ACME's solveAuthorizationsDNS01 and solveAuthorizationsDNSPersist01.

Linter wiring: .golangci.yml adds `contextcheck` to the enabled set.
golangci-lint v2.11.4 now fails CI on any function that takes a
context.Context parameter but calls into context.Background() or
context.TODO() instead of propagating -- regression guard for all five
prior PRs.

Verification (CI parity, GOCACHE=/tmp/gocache GOMODCACHE=/tmp/gomodcache
GOLANGCI_LINT_CACHE=/tmp/lintcache):

  - go build ./... -> 0
  - go vet ./... -> 0
  - golangci-lint run (contextcheck enabled) -> 0 issues
  - go test -race -short ./internal/api/middleware/... -> PASS
  - go test -race -short ./internal/scheduler/... -> PASS
  - go test -race -short ./internal/connector/issuer/acme/... -> PASS
  - go test -race -short ./internal/service/... -> PASS
  - rg "context\.(Background|TODO)\(\)" internal/service/ internal/scheduler/
    internal/connector/ internal/api/middleware/ -> 0 non-test hits
    (one pedagogical godoc reference in audit.go documenting why
    context.Background() would be wrong remains intentional)

Wire-format invariants preserved: 0 API routes, 0 SQL migrations, 0
frontend bytes, 0 OpenAPI bytes, 0 connector interface signature changes,
0 new env vars, 0 new external dependencies (pure context stdlib). The
AuditRecorder interface signature, the body-hash algorithm (SHA-256 16
hex chars), the excluded-path short-circuit, the actor-extraction path,
the responseWriter status-capture wrapper, the AuditServiceAdapter, and
all 116 API routes under /api/v1/, /.well-known/est/, /scep, /health,
/auth are byte-identical.

M-2 aggregate across PR-A through PR-F: 57 files, +635 / -613 (PR-A 12f
+227/-237, PR-B 9f +150/-146, PR-C 17f +156/-148, PR-D 11f +67/-63,
PR-E 4f +9/-15, PR-F 4f +26/-4). With M-2 closed, 8 of 10 Medium
findings resolved; M-9, M-10, L-1..L-4, I-1..I-8 remain post-v2.1.0
hardening batch.

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:43:47 +00:00
shankar0123 bea69efd12 Merge branch 'fix/m2-pr-e-agent-service'
PR-E of 6: AgentService ctx-first collapse.

Collapses the HeartbeatWithContext wrapper into a single Heartbeat
method. Handler-facing method name is preserved (D-4); the handler
service interface and mock already expected ctx-first, so this PR
touches only the service layer and its tests (4 files, 9+/15-).

Verification on the feature branch: build, vet, test (-short),
test -race, full-module test -short, and golangci-lint all clean.

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:25:30 +00:00
shankar0123 283ec27ca4 fix(m2-pr-e): collapse AgentService.HeartbeatWithContext into Heartbeat
PR-E of 6 in the M-2 end-to-end remediation sequence. Collapses the
HeartbeatWithContext wrapper into a single ctx-first Heartbeat method,
matching D-1 (ctx-only signatures, no dual forms). The handler-facing
method name is preserved (D-4) — internal/api/handler/agents.go already
declares `Heartbeat(ctx, ...)` on its local service interface, and the
handler mock at internal/api/handler/agent_handler_test.go already
takes `_ context.Context` as its first param, so no handler churn.

Changes
-------
internal/service/agent.go
  - Delete the zero-body Heartbeat wrapper that forwarded to
    HeartbeatWithContext with context.Background().
  - Rename HeartbeatWithContext → Heartbeat (ctx-bearing body
    folded directly into the canonical method).

internal/service/agent_test.go
  - TestHeartbeat (L95) and TestHeartbeat_NotFound (L128):
    agentService.HeartbeatWithContext(ctx, ...) → .Heartbeat(ctx, ...).

internal/service/concurrent_test.go
  - L162: agentSvc.HeartbeatWithContext(ctx, agentID, metadata)
    → .Heartbeat(ctx, agentID, metadata).

internal/service/context_test.go
  - L179 + L232: agentSvc.HeartbeatWithContext(ctx, ...) → .Heartbeat(...)
  - L185 + L238 t.Logf strings: "HeartbeatWithContext with ..." →
    "Heartbeat with ..." to match the collapsed method name.

Verification (Go 1.25.9 linux/arm64, CI-parity caches)
------------------------------------------------------
  go build ./...                 clean
  go vet ./...                   clean
  go test -short ./internal/service/... ./internal/api/handler/... \
    ./internal/integration/...   all ok
  go test -race -short same set  all ok
  go test -short ./...           all packages ok
  golangci-lint run ./...        0 issues

Locked decisions from the M-2 plan:
  D-1 ctx-only signatures (no dual forms)
  D-4 preserve handler method names facing the router
  D-5 domain types stay ctx-free

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:25:20 +00:00
shankar0123 a67a6b6c30 Merge branch 'fix/m2-pr-d-job-notification-audit'
PR-D: Thread ctx through Job + Notification + Audit service cluster.
Collapse CancelJobWithContext into CancelJob; eliminate 10
context.Background() hits.

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:20:58 +00:00
shankar0123 ccd89c348f fix(m2-pr-d): thread ctx through Job/Notification/Audit services
Collapse CancelJobWithContext into CancelJob; eliminate 10 context.Background()
hits across the Job+Notification+Audit service cluster by threading ctx
through their handler-facing service interfaces.

Services (ctx-first):
- service/job.go: ListJobs, GetJob, CancelJob, ApproveJob, RejectJob now
  accept ctx; the CancelJobWithContext wrapper is removed (handler callers
  continue to invoke CancelJob, now ctx-aware).
- service/notification.go: ListNotifications, GetNotification, MarkAsRead
  accept ctx.
- service/audit.go: ListAuditEvents, GetAuditEvent accept ctx.

Handlers (interface + callsites):
- handler/jobs.go, handler/notifications.go, handler/audit.go: local
  service interfaces updated, r.Context() threaded at every callsite.

Tests:
- Mock services updated to match the new interfaces (ctx accepted and
  ignored via '_ context.Context' first parameter; Fn closure fields
  unchanged).
- job_test.go / notification_test.go callsites thread context.Background()
  to match production shape.

Verification:
  go build ./...                 ok
  go vet ./...                   ok
  go test -short ./...           ok
  go test -race -short ./...     ok
  golangci-lint run ./...        0 issues

Locked decisions from the M-2 plan:
  D-1 ctx-only signatures (no dual forms)
  D-4 preserve handler method names facing the router
  D-5 domain types stay ctx-free

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:20:46 +00:00
shankar0123 478a141498 Merge branch 'fix/m2-pr-c-crud-cluster' 2026-04-18 01:10:10 +00:00
shankar0123 2497be496d M-2 PR-C: Collapse Policy/Profile/Owner/Team services to ctx-first signatures
- Add ctx first param to 21 service-layer handler-interface methods
  across policy.go (6), profile.go (5), owner.go (5), team.go (5)
- Replace 24 context.Background() call sites with received ctx; use
  context.WithoutCancel(ctx) for subsidiary audit-recording ops to
  preserve fire-and-forget audit semantics without inheriting caller
  cancellation
- Add ctx first param to 21 handler-interface method signatures across
  policies.go (6), profiles.go (5), owners.go (5), teams.go (5)
- Thread r.Context() through 21 HTTP handler sites (ListPolicies,
  GetPolicy, CreatePolicy, UpdatePolicy, DeletePolicy, ListViolations,
  ListProfiles, GetProfile, CreateProfile, UpdateProfile, DeleteProfile,
  ListOwners, GetOwner, CreateOwner, UpdateOwner, DeleteOwner,
  ListTeams, GetTeam, CreateTeam, UpdateTeam, DeleteTeam)
- Update MockPolicyService/MockProfileService/MockOwnerService/
  MockTeamService mock method impls with _ context.Context first param
  (Fn fields unchanged — closures do not need ctx); update mock impls
  in integration/lifecycle_test.go for all four services
- Update 12 service-layer test callsites (policy_test.go ×2,
  owner_test.go ×5, team_test.go ×5, profile_test.go ×13) to pass
  context.Background() at the call site

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:10:06 +00:00
shankar0123 25dd6c07f3 Merge branch 'fix/m2-pr-b-issuer-target' 2026-04-18 00:47:02 +00:00
shankar0123 eb14236166 M-2 PR-B: Collapse IssuerService + TargetService to ctx-first signatures
- Delete bare TestConnection wrapper in IssuerService; rename
  TestConnectionWithContext → TestConnection
- Delete TestTargetConnection delegate shim in TargetService (canonical
  TestConnection already ctx-first)
- Add ctx first param to 10 handler-interface methods
  (ListIssuers/GetIssuer/CreateIssuer/UpdateIssuer/DeleteIssuer and
  ListTargets/GetTarget/CreateTarget/UpdateTarget/DeleteTarget)
- Replace 16 context.Background() call sites with received ctx
- Thread r.Context() through 12 HTTP handler sites in issuers.go and
  targets.go (outer TargetHandler.TestTargetConnection HTTP method name
  preserved for router compatibility)
- Update MockIssuerService, MockTargetService, and mockTargetService
  (integration) for ctx-first forwarding; update test callsite literals

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 00:46:58 +00:00
shankar0123 bbb628243f Merge branch 'fix/m2-pr-a-certificate-cluster' 2026-04-18 00:29:40 +00:00
shankar0123 cdc9d03d5b fix(m-2): thread context through CertificateService cluster
Collapses CertificateService, RevocationSvc, and CAOperationsSvc to
ctx-accepting method signatures. Removes context.Background() synthesis
at 24 internal call sites across certificate.go, revocation_svc.go, and
ca_operations.go.

- Primary repo calls inherit request cancellation via the passed ctx.
- Audit and notification dispatches use context.WithoutCancel(ctx) so
  they survive client disconnect.
- Collapses TriggerRenewal/TriggerRenewalWithActor,
  TriggerDeployment/TriggerDeploymentWithActor, and
  RevokeCertificate/RevokeCertificateWithActor sibling pairs into single
  canonical ctx-accepting methods (decisions D-1, D-2).

Handlers pass r.Context(). Mocks and tests updated to match new
signatures. No HTTP surface change, no OpenAPI change.

PR 1 of 6 in the M-2 remediation chain. Master green at this commit.

Refs: certctl-audit-report.md M-2 (L143, L224)
2026-04-18 00:29:37 +00:00
shankar0123 e951d319d0 Merge branch 'fix/m1-audit-shutdown-drain'
Resolves M-1 (Medium): Audit recorder shutdown drain.

The API audit middleware's detached recording goroutines now drain
during graceful shutdown via AuditMiddleware.Flush (sync.WaitGroup +
timeout-aware select), called between http.Server.Shutdown and
db.Close. Prevents silent audit-event loss on SIGTERM
(CWE-662 / CWE-400).
2026-04-17 17:29:54 +00:00
shankar0123 d14a45401b fix(audit): drain in-flight recording goroutines on shutdown (M-1)
Audit events spawned from the HTTP middleware ran in detached goroutines
using context.Background(). On SIGTERM the DB pool was closed before
those goroutines finished writing, silently dropping audit events
(CWE-662 Improper Synchronization / CWE-400 Uncontrolled Resource
Consumption).

NewAuditLog now returns an *AuditMiddleware struct that tracks every
spawned goroutine with sync.WaitGroup. Callers wire the middleware via
its Middleware method value (preserves the existing
func(http.Handler) http.Handler shape) and drain the WaitGroup with
Flush(ctx), which blocks until in-flight recordings complete or the
provided context is cancelled — mirroring scheduler.WaitForCompletion.

Flush is invoked in cmd/server/main.go between http.Server.Shutdown
(no new requests accepted) and db.Close (pool torn down), with a
timeout returning ErrAuditFlushTimeout wrapping ctx.Err().

Request-derived inputs (method, path, status) are snapshotted before
the goroutine spawn so the worker does not race with http.Server
reusing r after the handler returns.

Tests:
  TestAuditLog_FlushDrainsInFlightGoroutines
  TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout

Verification:
  go build ./...                            : 0
  go vet ./...                              : 0
  go test -race -short ./...                : 0 (all packages)
  go test -cover ./internal/api/middleware  : 81.4%
  golangci-lint run                         : 0 issues
  govulncheck ./...                         : 0 vulns in called code
2026-04-17 17:29:48 +00:00
shankar0123 655e2879e6 feat(frontend): add Owner field to OnboardingWizard Certificate step
The first-run onboarding wizard's Certificate step now surfaces an
Owner dropdown (required) alongside Issuer and Profile, matching the
ownership model introduced in M11b. Prevents newly-created certs from
being unowned and bypassing notification routing.

- web/src/pages/OnboardingWizard.tsx: getOwners query, ownerId state,
  Owner <select>, required-field guard (nextDisabled), empty-state link
  to /owners page when no owners exist yet.

Frontend-only change; no backend wiring or schema impact. Separated
from the M-6 sentinel-agent idempotency commit per scope-guard.
2026-04-17 16:55:44 +00:00
shankar0123 e757ef1471 Merge branch 'fix/m6-sentinel-idempotent-create'
Resolves M-6 (Medium): swallowed sentinel agent INSERT errors.
CWE-662 / CWE-209-adjacent.

Shape A: CreateIfNotExists helper + 4 sentinel call sites.
2026-04-17 16:32:12 +00:00
shankar0123 27afa4463d fix(repository): idempotent sentinel agent creation via ON CONFLICT (M-6)
Sentinel agents (server-scanner, cloud-aws-sm, cloud-azure-kv,
cloud-gcp-sm) were created on startup with a plain INSERT whose
duplicate-key error was swallowed unconditionally. That silenced every
other DB failure too (connectivity drop, permissions change, unrelated
constraint violation) — a restart after the first boot quietly
de-fanged cloud discovery and the network scanner (CWE-662, CWE-209-
adjacent).

Shape A: add AgentRepository.CreateIfNotExists using ON CONFLICT (id)
DO NOTHING RETURNING id + sql.ErrNoRows discrimination. This keeps the
strict Create semantics (duplicate-key is an error) intact for real
agent registration and gives sentinels their own idempotent path.

- repo: CreateIfNotExists returns (created bool, err error); false,nil
  on pre-existing row; false,wrapped err on anything else.
- interface: CreateIfNotExists added to AgentRepository.
- main.go: 4 sentinel sites log Error/Info/Debug distinctly.
- mocks: service + integration mocks implement the new method.
- tests: 4 new testcontainers integration tests cover first-insert,
  idempotent second-call, concurrent 16-goroutine race (exactly one
  creator, no duplicate-key panic), and pre-cancelled context
  surfacing.

Coverage gates (go test -cover): service 67.6%/55, handler 78.6%/60,
domain 92.7%/40, middleware 80.0%/30, crypto 86.7%/85. Race/vet/
golangci-lint v2.11.4 (0 issues)/govulncheck v1.2.0 clean across all
touched packages.
2026-04-17 16:32:07 +00:00
shankar0123 80450c7180 fix(repository): populate TargetIDs in certificate scan helper (M-7)
scanCertificate never queried the certificate_target_mappings junction
table, so Certificate.TargetIDs was always nil on reads. This silently
broke deployment lookups, bulk revocation filters, cert detail pages,
and any code path that iterated TargetIDs to dispatch target work.

Fix:
- Convert scanCertificate to a receiver method (r *CertificateRepository)
  so it has access to the DB for the secondary junction query.
- Get(): scan the row, then call r.getTargetIDs(ctx, certID) to populate
  TargetIDs with a single targeted query.
- List() and GetExpiringCertificates(): inline the scan loop so we can
  collect all certIDs first, then call getTargetIDsForCertificates once
  with pq.Array(certIDs) to avoid N+1 round-trips. Build a map and
  attach TargetIDs to each certificate in the result set.
- Default TargetIDs to []string{} (not nil) when a cert has no mappings
  so JSON marshals as [] rather than null.

Tests:
- New integration test file certificate_targetids_test.go with 5
  subtests exercising Get / List / GetExpiringCertificates single
  and multi-target cases plus the empty-slice vs nil contract.
- Uses the shared testcontainers-go setupTestDB infrastructure and
  skips under 'go test -short' so CI (which excludes ./internal/repository/...
  from coverage paths anyway) stays green.

Addresses M-7 from certctl-audit-report.md.
2026-04-17 15:41:08 +00:00
shankar0123 c655e0f8c5 fix(crypto/local-ca): reject expired or not-yet-valid sub-CA certificates on disk load (M-5)
loadCAFromDisk now validates the upstream sub-CA certificate's NotBefore
and NotAfter fields before accepting it, returning a fail-closed error
at server startup instead of silently loading an out-of-window CA.

Before this fix, loadCAFromDisk checked BasicConstraints.IsCA and
KeyUsage=CertSign but not the validity window. An expired enterprise
sub-CA (e.g. an ADCS subordinate whose rollover slipped) would load
without warning and the scheduler would mint child certs that every
RFC 5280 path validator rejects — outages show up at relying parties,
not at certctl, and only after thresholds trip.

CWE-672 (Operation on a Resource after Expiration or Release); secondary
CWE-295 (Improper Certificate Validation). Error strings include the CA
subject CommonName and both RFC3339 timestamps so the log line is
actionable in a 3am incident.

Tests: TestSubCAMode gains three subtests exercising the new gate —
SubCA_ExpiredCert_IsRejected (CA expired 1h ago → error mentions
'expired' and the CN), SubCA_NotYetValid_IsRejected (CA valid +1h →
error mentions 'not yet valid' and the CN), and SubCA_BarelyValid_IsAccepted
(CA valid [now-1m, now+1h] → issuance succeeds, proving no
over-rejection). Adds generateTestSubCAWithValidity helper; the
original generateTestSubCA wrapper preserves the [now, now+5y] default
for existing tests.

Package coverage: 67.7% -> 68.3%.

Verification: go build, go vet, go test -race, go test -cover all
green locally; golangci-lint v2.11.4 clean; govulncheck clean. All CI
coverage floors met with margin (service 67.6/55, handler 78.6/60,
domain 92.7/40, middleware 80.0/30, crypto 86.7/85).

Parent: 5abeeb8 (M-8 per-ciphertext salt).
Closes: audit finding M-5 in certctl-audit-report.md.
2026-04-17 14:10:23 +00:00
shankar0123 5abeeb882b fix(crypto): per-ciphertext PBKDF2 salt + v2 versioned format with v1 fallback (M-8) 2026-04-17 05:36:29 +00:00
shankar0123 b1df6dab27 ci(release): add CLI/MCP binaries, checksums, SBOM, Cosign, SLSA provenance (M-3) 2026-04-17 04:04:55 +00:00
shankar0123 672e1d991d build: propagate HTTP_PROXY/HTTPS_PROXY/NO_PROXY through Docker build (M-4, Issue #9)
Addresses Medium finding M-4 in the audit report. The multi-stage
Dockerfiles previously had no ARG declarations for HTTP_PROXY,
HTTPS_PROXY, or NO_PROXY, so corporate-proxy environments silently
failed at 'npm ci' (frontend stage) and 'go mod download' (Go builder).
The npm retry idiom (`npm ci --include=dev || npm ci --include=dev`)
masked the failure because the upstream 'Exit handler never called!'
bug exits 0 despite the install crash.

Fix: thread HTTP_PROXY / HTTPS_PROXY / NO_PROXY ARGs through every
Docker build stage that performs network I/O, re-export them as ENV
with both upper- and lower-case aliases (apk/curl/npm read lowercase;
Go/Node read uppercase), and forward the host shell's environment via
`build.args:` in every compose file and `build-args:` in the release
workflow's docker/build-push-action steps. Defaults are empty strings
so un-proxied builds remain byte-identical to the pre-fix tree.

Scope: Dockerfile (frontend + Go builder stages), Dockerfile.agent
(Go builder stage), deploy/docker-compose.yml (server + agent),
deploy/docker-compose.dev.yml (server + agent), deploy/docker-compose.test.yml
(server + agent), .github/workflows/release.yml (both docker/build-push-action
v6 invocations). Zero Go, web, test, or runtime code changes. Zero
base-image changes. Existing npm `||` retry idiom and `ARG TARGETARCH`
preserved verbatim.

CWE-1173 (Improper Use of Validated Input) / CWE-16 (Configuration).

Verification:
- YAML parses clean across all four compose files and release.yml.
- yamllint -d relaxed: clean exit across all five YAML files.
- All six `build.args:` blocks expose HTTP_PROXY, HTTPS_PROXY, NO_PROXY
  with default-empty ${VAR:-} substitution.
- Both release.yml docker/build-push-action steps expose the same
  three keys sourced from ${{ secrets.HTTP_PROXY }}, etc.
- Dockerfiles contain 5 proxy ARG declarations total (Dockerfile has 2
  stages × 3 ARGs = 6 lines, Dockerfile.agent has 1 stage × 3 ARGs = 3
  lines); lowercase ENV aliases verified present in every stage.
- git diff --shortstat: 6 files changed, 117 insertions(+), 0 deletions.
  Pure additive.

Docker-live verification (`docker build`, `docker compose config`)
deferred to CI / post-commit smoke because the sandbox has no Docker
runtime. hadolint, go, golangci-lint, govulncheck likewise unavailable
in the sandbox; per-layer CI coverage gates (service 55%, handler 60%,
domain 40%, middleware 30%) are trivially unaffected as M-4 touches
zero Go source files.
2026-04-17 03:12:45 +00:00
shankar0123 89b910a8f1 security: atomic pending-job claim with FOR UPDATE SKIP LOCKED (H-6)
Fixes H-6 (CWE-362) — GetPendingJobs returned pending rows without row
locks, so two scheduler replicas in an HA deployment could both read the
same row, both decide it was theirs, and race on UpdateStatus, producing
duplicate Running jobs and duplicate certificate issuances.

Remediation: a claim-style repository API that selects + transitions
Pending -> Running in one transaction with SELECT ... FOR UPDATE SKIP
LOCKED. Concurrent claimants observe disjoint row sets; no worker ever
sees another worker's claimed row.

Repository changes (internal/repository/postgres/job.go):
  - New ClaimPendingJobs(ctx, jobType, limit): BEGIN; SELECT id,...
    FROM jobs WHERE status='Pending' (optional type filter, optional
    LIMIT) FOR UPDATE SKIP LOCKED; UPDATE jobs SET status='Running',
    updated_at=NOW() WHERE id = ANY($ids); COMMIT. Returns the claimed
    rows with status already flipped.
  - New ClaimPendingByAgentID(ctx, agentID): mirrors M31 UNION ALL
    semantics (direct agent_id match, target->agent JOIN fallback,
    certificate->target->agent chain for AwaitingCSR) but wraps each
    branch in FOR UPDATE SKIP LOCKED and flips Deployment/Renewal rows
    to Running. AwaitingCSR rows are returned in place (state
    transition deferred until SubmitCSR, consistent with M8 semantics).
  - Existing GetPendingJobs / ListPendingByAgentID retained for legacy
    compatibility; their godoc now directs production callers to the
    Claim* variants.

Production caller switches:
  - internal/service/job.go ProcessPendingJobs: ListByStatus(Pending)
    -> ClaimPendingJobs(ctx, "", 0). Eliminates the real scheduler
    race between two replicas tick-firing simultaneously.
  - internal/service/agent.go GetPendingWork: ListPendingByAgentID ->
    ClaimPendingByAgentID. Eliminates the race between two pollers
    for the same agent (e.g. brief network blip causing duplicate
    poll) and between a scheduler tick and an agent poll.

Safety argument for pre-flipping Pending -> Running inside the claim
transaction: ProcessRenewalJob and ProcessDeploymentJob both call
UpdateStatus(Running) unconditionally on entry, so an early flip is
idempotent. On panic, the scheduler's panic recovery leaves the job
in Running which the existing stale-running reaper handles.

Tests (internal/repository/postgres/repo_test.go, skipped in -short):
  - TestJobRepository_ClaimPendingJobs_FlipsToRunning: seed 5 Pending,
    claim once, assert all 5 returned + DB rows Running, residual
    claim returns 0.
  - TestJobRepository_ClaimPendingJobs_ConcurrentDisjoint: seed M=40
    Pending Renewals, spawn N=8 goroutines each calling
    ClaimPendingJobs(_, JobTypeRenewal, 1) in a loop. Invariants:
    (a) no job ID claimed by more than one worker, (b) sum of claims
    == 40, (c) all 40 rows in Running state in the DB. Bounded
    empty-streak guard (20 iterations) covers SKIP LOCKED transient
    zeros under contention.
  - TestJobRepository_ClaimPendingByAgentID_TransitionsDeployments:
    seeds 2 Pending Deployment + 1 AwaitingCSR for agent A plus 1
    Pending Renewal for agent B (scope check). Asserts deployments
    flip to Running, AwaitingCSR is returned but preserved, agent B's
    renewal never appears.

Mock updates: testutil_test.go, lifecycle_test.go, verification_test.go
gained ClaimPendingJobs/ClaimPendingByAgentID on their mock job repos
mirroring the real Pending -> Running semantics. Mocks intentionally
do NOT write to StatusUpdates (that map tracks UpdateStatus() call
history specifically; the real claim path uses a bulk UPDATE, not
UpdateStatus).

Verification (CI-scope):
  - go build ./cmd/...: ok
  - go vet ./...: ok
  - go test -race -short on service, api/handler, api/middleware,
    scheduler, connector/..., domain, validation, tlsprobe: ok
  - Coverage gates: service 67.6% (>=55), handler 78.6% (>=60),
    middleware 80.0% (>=30), domain 92.7% (>=40). All hold.
  - golangci-lint 2.11.4: 0 issues
  - govulncheck: no vulnerabilities in call graph
  - Frontend: tsc clean, 218 vitest tests pass, vite build ok
  - helm lint + helm template: ok
  - Invariant sweeps: FOR UPDATE SKIP LOCKED present in job.go;
    H-1 through H-5 fixtures unchanged.

Refs: H-6 in certctl-audit-report.md
2026-04-17 02:34:56 +00:00
shankar0123 6315ef102a security(globalsign): remove InsecureSkipVerify and pin CA pool (H-5)
The GlobalSign Atlas HVCA connector previously used InsecureSkipVerify:true
on its mTLS TLS config, disabling server certificate validation and
defeating the purpose of the client-side mTLS handshake. This was a
CWE-295 Improper Certificate Validation vulnerability silently degrading
trust on every production call to GlobalSign's signing API.

Remediation (per H-5 audit finding, Lens 4.4):

- Remove InsecureSkipVerify from all three http.Client construction sites
  (ValidateConfig, getHTTPClient, and legacy initialisation path).
- Introduce buildServerTLSConfig() helper that constructs tls.Config with
  MinVersion: tls.VersionTLS12 (addresses adjacent L-1 recommendation).
- New optional config field `server_ca_path` (env:
  CERTCTL_GLOBALSIGN_SERVER_CA_PATH). When unset the connector trusts the
  system root CA bundle (correct default for GlobalSign's publicly-trusted
  HVCA endpoints). When set the bundle is loaded via x509.NewCertPool() +
  AppendCertsFromPEM, and only those roots are trusted (supports private
  HVCA deployments and defence-in-depth root pinning).
- Error wrapping chain: "failed to read server CA bundle at %s" and
  "no valid PEM certificates found in server CA bundle at %s" surface
  config problems at ValidateConfig time instead of silently failing at
  request time.

Docs, config, service env-seed, and GUI issuer type definition updated to
expose the new field. Tests: 9 dead `InsecureSkipVerify: true` client
TLSClientConfig blocks (no-ops against httptest.NewServer plain-HTTP)
replaced with bare http.Client; new TestGlobalSign_ServerTLSConfig covers
pinned-CA trust, untrusted-server rejection, missing-file and invalid-PEM
error paths.

Verification:
- go build ./... clean
- go vet ./... clean
- go test -race ./internal/connector/issuer/globalsign/... ./internal/config/... ./internal/service/... ok
- go test ./... (excluding testcontainers-gated repo layer) ok
- golangci-lint run ./... 0 issues
- govulncheck ./... 0 reachable vulns
- Per-layer coverage: service 68.7% (≥55), handler 83.6% (≥60), domain 82.0% (≥40), middleware 63.8% (≥30)
- globalsign package coverage: 75.9%
- Invariant sweep: 0 InsecureSkipVerify references remain in globalsign
  package (only a test-file comment documenting the removal).
2026-04-17 01:40:58 +00:00
shankar0123 119986fa7e security: add SSRF defence-in-depth for webhook notifier (fixes H-4)
The webhook notifier would previously accept any operator-configured URL
and hand it to http.Client without validation. That exposed two
SSRF classes (CWE-918):

  * Reserved-address reachability — a misconfigured or adversarial
    webhook URL pointing at 127.0.0.1, ::1, 169.254.169.254 (cloud
    metadata), or 0.0.0.0 would succeed, exfiltrating request bodies
    to local services or leaking short-lived cloud credentials.
  * DNS rebinding — a hostname resolving to a public IP at validation
    time and to a reserved IP at dial time would bypass any
    URL-string-only check.

Fix installs two independent layers:

  * validation.ValidateSafeURL runs at config-ingest time and before
    every outbound POST. It rejects non-HTTP(S) schemes, empty hosts,
    and literal reserved-IP hosts with a clear operator-facing error.
    This is a fast early diagnostic.
  * validation.SafeHTTPDialContext is installed on the webhook
    http.Transport. It re-resolves the host at dial time, rejects any
    resolved address whose address lies in a reserved range (loopback,
    link-local, multicast, broadcast, unspecified, IPv6
    link-local/multicast), and pins the resolved IP into the final
    dial address so the TLS handshake targets the exact IP the guard
    approved. This is the authoritative, TOCTOU-safe defence against
    DNS rebinding.

The two layers are complementary — validateURL fails fast on obvious
misconfiguration; SafeHTTPDialContext fails closed when DNS changes
between validation and dial.

The existing unexported isReservedIP helper in
internal/service/network_scan.go is extracted into
internal/validation.IsReservedIP with byte-identical behaviour so the
webhook notifier and the network scanner share a single authoritative
reserved-address list. RFC 1918 ranges remain intentionally allowed
(certctl's self-hosted design). Broader unspecified / IPv6 link-local
coverage lives only in the stricter dial-time policy, where it belongs
for outbound HTTP egress.

Test seam: Connector gains an unexported validateURL func field and a
same-package newForTest constructor that installs a permissive
validator and the stdlib default transport. Production callers cannot
reach this constructor because it is unexported; only same-package
tests (package webhook) can use it. Same-package happy-path tests call
newForTest so they can point at httptest loopback servers without
being blocked by the production guard. The four SSRF-rejection tests
that verify the guard itself still call New so they exercise the real,
strict validator. This keeps the production SSRF defence
unconditionally on in real code while preserving legitimate unit-test
coverage.

Tests
-----
  * internal/validation/ssrf_test.go (new) — 16-subtest pin on
    IsReservedIP that is byte-identical with the original network-
    scanner behaviour; ValidateSafeURL accept/reject matrix covering
    HTTPS/HTTP, reserved-literal IPv4/IPv6, dangerous schemes
    (file/gopher/ftp/javascript/data/ldap/dict/jar), missing hosts,
    and malformed inputs; SafeHTTPDialContext rejects literal reserved
    addresses and hosts resolving to reserved addresses (DNS-rebinding
    coverage via localhost).
  * internal/connector/notifier/webhook/webhook_test.go — happy-path
    tests switched to newForTest; production-guard SSRF-rejection
    tests (TestValidateConfig_RejectsReservedURLs,
    TestValidateConfig_RejectsDangerousScheme,
    TestPostWebhook_RejectsReservedURL,
    TestPostWebhook_RejectsDangerousScheme) continue to call New so
    they exercise the unconditionally-installed production validator.

Wire-format invariants preserved
--------------------------------
  * Outbound HTTP request shape (method, headers, body, HMAC
    signature) unchanged.
  * network_scan.go behaviour unchanged — validation.IsReservedIP is
    byte-identical with the deleted helper.
  * RFC 1918 (10/8, 172.16/12, 192.168/16) remain allowed for both
    outbound webhook and CIDR expansion, matching the self-hosted
    design.

Verification
------------
  * go test -race ./internal/validation/... ./internal/connector/
    notifier/webhook/... ./internal/service/... — green.
  * Full-suite go test -race ./... — green (GOTMPDIR=/dev/shm to
    sidestep full /tmp on the sandbox host).
  * Coverage gates pass: service 68.8% >= 55%, handler 83.6% >= 60%,
    domain 82.0% >= 40%, middleware 63.8% >= 30%. Overall 67.8%.
    Webhook package 91.5% line coverage; validation package
    ValidateSafeURL/SafeHTTPDialContext 78-100% per function.
  * govulncheck ./... — no vulnerabilities found.
  * golangci-lint run on touched H-4 production code — clean. Pre-
    existing errcheck/gosimple warnings in scope-adjacent files
    (webhook_test.go:270 w.Write, network_scan.go:120/173/265/305)
    verified against 3853b74 to predate this commit; left alone per
    scope guard.

Operational notes
-----------------
  * No migration needed. The guard is pure Go code; existing webhook
    configs continue to work unless they point at reserved addresses,
    in which case they now fail closed with a clear error.
  * Existing operators who rely on webhook POST to 127.0.0.1 or
    ::1 (e.g., local receivers on the same host as certctl-server)
    must expose their receiver on an RFC 1918 address or public IP.
    This is deliberate — the threat model for webhook notifiers
    includes untrusted operator-supplied URLs.

Scope guard: H-4 only. H-5, H-6, M-*, L-*, and I-* findings remain
open and are tracked separately. No drive-by refactors.
2026-04-17 00:34:47 +00:00
shankar0123 3853b7460c security: reject CRLF/NUL in email headers to prevent SMTP injection (fixes H-3)
H-3 in certctl-audit-report.md: caller-supplied From/To/Subject were
interpolated directly into the SMTP DATA payload and handed to
client.Mail / client.Rcpt with no sanitization, allowing an attacker
who controls any of those values to inject extra headers (Bcc:,
Reply-To:), split the message body (CRLFCRLF), or tamper with the
SMTP envelope. CWE-113.

Fix:
- New package helper internal/validation.ValidateHeaderValue(field,
  value). Rejects CR ("\r"), LF ("\n"), and NUL ("\x00") with an error
  that names the offending field but does NOT echo the raw value,
  so log readers cannot be attacked with injected content. Silent
  stripping was considered and rejected: authentication-relevant
  headers must fail visibly.
- Two-layer defense in internal/connector/notifier/email/email.go:
    (1) primary guard at the top of sendEmail / sendHTMLEmail, which
        blocks tampering of the SMTP envelope (client.Mail, client.Rcpt)
        since net/smtp does not sanitize those arguments; and
    (2) defense-in-depth guard inside formatEmailMessage /
        formatHTMLEmailMessage, catching any future caller that
        bypasses sendEmail. Both format functions now return an error.
- Body content is intentionally NOT validated — CR/LF in body is legal
  RFC 5322 content and net/smtp handles dot-stuffing.

Tests:
- internal/validation/headers_test.go: 3 functions (AcceptsSafeInput,
  RejectsControlCharacters, DefaultFieldName) covering plain ASCII,
  UTF-8 multibyte, tabs, typical email addresses, CRLF injection,
  lone CR, lone LF, NUL, CRLFCRLF body split, trailing CR, leading LF.
  Each reject case asserts the field name IS in the error and the
  raw offending value IS NOT (anti-log-injection).
- internal/connector/notifier/email/email_test.go: added
  TestEmail_FormatEmailMessage_RejectsCRLFInjection and
  TestEmail_FormatHTMLEmailMessage_RejectsCRLFInjection. Existing
  format tests updated for the new (bytes, error) signature.

Wire-format invariants preserved:
- SMTP DATA headers still use CRLF separators and RFC 1123Z Date
  (unchanged).
- Content-Type headers unchanged (text/plain for plain, text/html +
  MIME-Version: 1.0 for HTML).
- No change to message encoding or transport.

Verification (Go 1.25.9 linux-arm64, parent e9947dc):
- go build ./...                                 clean
- go vet ./...                                   clean
- go test -race ./internal/validation/...        ok
- go test -race ./internal/connector/notifier/email/...   ok
- go test -race ./internal/connector/notifier/webhook/... ok
- Per-layer coverage gates all pass:
    validation  95.1% (+0.7 vs baseline 94.4%)
    email       39.7% (+1.4 vs baseline 38.3%)
    service     67.8% (unchanged)
    handler     78.6% (unchanged)
    middleware  80.0% (unchanged)
    domain      92.7% (unchanged)
- govulncheck ./...                              No vulnerabilities found
- golangci-lint run ./internal/validation/... ./internal/connector/notifier/email/...
                                                 0 issues

Operational note: SMTP sends that would previously deliver a
tampered message now fail fast at the notifier with a clear error.
Operators who were relying on header-injection-shaped inputs (there
should be none in practice — all callers are internal certctl code)
will see "failed to format message: <field> contains disallowed
control character" in logs.

Scope: H-3 only. H-4 (webhook SSRF) follows in a separate commit.
2026-04-17 00:08:20 +00:00
shankar0123 e9947dc0fe docs: redact V3 feature specifics from README (fixes H-7)
Problem
-------
H-7 (CWE-200 / information disclosure, strategic-policy class): the
public README's V3 section enumerated the paid-tier feature set --
"Role-based access control with profile-gating", "Event-driven
architecture with real-time operational views", "Advanced search",
"compliance scoring", "HSM/TPM integration" -- violating the
CLAUDE.md directive "Keep V3+ deliberately vague -- one-liner
descriptions only. Don't telegraph the paid feature set." The prior
wording also carried factual drift: `compliance scoring` was pulled
forward to V2.2 per the V2.2 Roadmap, so pairing it with V3 in the
README misrepresented the open-core line.

Fix
---
Replace the two-sentence enumeration at README.md:322-323 with a
single deliberately-vague sentence:

  Enterprise capabilities for larger deployments are available in
  the commercial tier.

No named features. No SKU enumeration. Matches the policy one-liner
shape used in neighboring V1 / V2 / V4+ sections. Net -1 line of
prose.

Files
-----
  README.md                          1 -, 1 +

Wire-format invariants preserved
--------------------------------
This is a docs-only change. All protocol surfaces are byte-identical:
  - RFC 7030 EST handler (internal/api/handler/est.go) -- untouched
  - RFC 8894 SCEP handler (internal/api/handler/scep.go) -- untouched
  - Shared internal/pkcs7/ package -- untouched
  - H-1 revocation composite key (migration 000012) -- untouched
  - H-2 SCEP challenge-password preflight + PKCSReq guard -- untouched
  - C-2 AES-256-GCM config encryption contract -- untouched
  - CRL DER bytes, OCSP response bytes -- untouched

Verification
------------
  git diff 387fb55 HEAD -- internal/ cmd/ migrations/ api/ deploy/
    -> 0 code changes (only README.md modified after H-1)

Operational note
----------------
No behavioral change. Product positioning only. The V3 feature set
itself remains documented in the gitignored roadmap.md / strategy.md,
which are the intended sources of truth for the paid tier.

Audit report: see /Users/shankar/Desktop/cowork/certctl-audit-report.md
2026-04-16 23:46:37 +00:00
shankar0123 b813660c74 security: require SCEP challenge password when SCEP enabled (fixes H-2)
Problem (CWE-306 Missing Authentication for Critical Function):
internal/service/scep.go PKCSReq skipped the shared-secret check when
s.challengePassword was empty. An unconfigured-but-enabled SCEP server
accepted any unauthenticated client reaching /scep and issued a
certificate against the configured issuer for any CSR with a valid
signature. No audit trail distinguished authenticated from
unauthenticated enrollments. This matches the two-layer fail-closed
pattern already used for C-2 (f549a7a): reject at startup AND reject
at the service boundary.

Fix (two layers, defense-in-depth):

Layer 1 — startup pre-flight in cmd/server/main.go:
  preflightSCEPChallengePassword returns a non-nil error when SCEP is
  enabled and CERTCTL_SCEP_CHALLENGE_PASSWORD is empty. main logs and
  os.Exit(1)s before the SCEP service is constructed. Disabled SCEP is
  unaffected. The helper is unit-testable in isolation.

Layer 2 — service-layer rejection in internal/service/scep.go:
  PKCSReq refuses enrollment when s.challengePassword == "" even though
  main already blocks this state — protects future call sites (tests,
  library reuse, a REST-over-HTTPS wrapper). When a secret is
  configured, the comparison now uses crypto/subtle.ConstantTimeCompare
  so response time does not leak the configured secret through a
  short-circuiting byte compare.

Files:
- cmd/server/main.go: preflightSCEPChallengePassword helper; call site
  inside the `if cfg.SCEP.Enabled` block before issuer lookup; fatal
  slog error references CWE-306 and names the env var so operators can
  diagnose the startup failure without reading code.
- cmd/server/main_test.go: TestPreflightSCEPChallengePassword with five
  table-driven subtests (disabled empty, disabled set, enabled empty
  rejected, enabled set, single-char boundary). The enabled-empty case
  asserts the error string contains both CERTCTL_SCEP_CHALLENGE_PASSWORD
  and CWE-306 so the log message remains actionable.
- internal/config/config.go: SCEPConfig.ChallengePassword godoc now
  states the field is REQUIRED when SCEP.Enabled and cross-references
  preflightSCEPChallengePassword.
- internal/service/scep.go: imports crypto/subtle; PKCSReq rewritten
  with the two-layer check; comment block cites H-2 / CWE-306 and the
  constant-time rationale.
- internal/service/scep_test.go: existing tests that relied on the
  vulnerable empty-password path now configure a secret on both sides.
  TestSCEPService_PKCSReq_ChallengePassword_NotRequired is replaced by
  TestSCEPService_PKCSReq_ChallengePassword_EmptyServerConfigRejected
  which iterates ["", "any-value", "guess"] against an unconfigured
  server and asserts "not configured" in the error. A new
  TestSCEPService_PKCSReq_ChallengePassword_ConstantTimeLengthIndependence
  exercises same-prefix-longer and wrong-case inputs to guard against a
  regression from ConstantTimeCompare to a short-circuiting byte compare.
- internal/service/m11c_crypto_enforcement_test.go: four tests
  (RejectsWeakKey, AcceptsStrongKey, MaxTTL_ForwardedToIssuer,
  NoProfileRepo_PassesThrough) constructed NewSCEPService with an empty
  challenge password and exercised PKCSReq through the now-rejected
  vulnerable path. All four now configure "secret123" on both sides with
  an inline H-2 comment; the crypto/MaxTTL/profile behavior they assert
  is unchanged.

Wire-format / behavioral invariants preserved:
- RFC 8894 SCEP handler is untouched (internal/api/handler/scep.go and
  internal/pkcs7/*): GetCACaps/GetCACert responses, PKIOperation request
  parsing, and the PKCS#7 certs-only response format are byte-identical.
- RFC 7030 EST handler is untouched
  (internal/api/handler/est.go + internal/pkcs7/*).
- Revocation idempotency composite key (H-1, migration 000012) untouched.
- AES-256-GCM config encryption (C-2) untouched.
- CRL DER bytes and OCSP response bytes unchanged.

Verification:
- go build ./...              silent success
- go vet ./...                silent success
- go test -race -count=1 ./internal/service/ ./cmd/server/
  ./internal/api/handler/ ./internal/integration/    all OK
- Coverage with comfortable headroom over CI gates:
    service     67.8% (gate 55%)
    handler     79.0% (gate 60%)
    domain      92.7% (gate 40%)
    middleware  80.0% (gate 30%)
    cmd/server  1.6%  (preflightSCEPChallengePassword: 100%)
  internal/service/scep.go PKCSReq statement coverage: 100%.
- rg sweeps: no `s.challengePassword != ""` remains;
  no `challengePassword != s.challengePassword` remains.

Operational note: operators with SCEP enabled but no challenge password
set will see a fatal startup error and a log line citing
CERTCTL_SCEP_CHALLENGE_PASSWORD and CWE-306 after upgrading. This is the
intended fail-closed behavior. Fix by either setting the env var to a
non-empty shared secret or setting CERTCTL_SCEP_ENABLED=false.

Audit report: certctl-audit-report.md (revision 5) logs this under
H-2 Resolution Log.
2026-04-16 22:22:51 +00:00
shankar0123 387fb555ac security: scope revocation unique index to (issuer_id, serial_number) (fixes H-1)
RFC 5280 §5.2.3 defines certificate serial number uniqueness per issuing CA,
not globally. The prior unique index on `certificate_revocations.serial_number`
enforced a stricter invariant than the spec: with 12 issuer connectors (Local
CA, ACME, Vault, step-ca, OpenSSL, DigiCert, Sectigo, Google CAS, AWS ACM PCA,
Entrust, GlobalSign, EJBCA), two distinct certificates legitimately issued by
different CAs can share a serial number. Recording a revocation for the second
collision silently dropped via `ON CONFLICT DO NOTHING`, leaving the second
cert persistently absent from OCSP/CRL responses.

Changes:

- Migration 000012 drops `idx_certificate_revocations_serial` and creates
  `idx_certificate_revocations_issuer_serial` UNIQUE ON (issuer_id,
  serial_number). Adds a non-unique `idx_certificate_revocations_serial_lookup`
  to preserve the serial-only fast path for OCSP/CRL probes that already know
  the issuer scope.
- `CertificateRevocationRepository.Create` targets the new composite key in
  `ON CONFLICT` — same-issuer idempotency preserved, cross-issuer collisions
  now recorded as distinct rows.
- `GetBySerial(serial)` renamed `GetByIssuerAndSerial(issuerID, serial)` on
  the interface and Postgres impl. All callers (OCSP responder, CRL
  generator, short-lived-cert exemption check) already have `issuerID` in
  scope because the protocol paths carry it (`/api/v1/ocsp/{issuer_id}/{serial}`,
  `/api/v1/crl/{issuer_id}`).
- Repository integration test added: `TestRevocationRepository_CrossIssuerSerialCollision`
  asserts that serial `CAFEBABE01` can be stored under two issuers
  simultaneously, that lookups return the correct row per (issuer, serial),
  and that same-issuer idempotency still works (re-inserting (issuer, serial)
  does not error and does not duplicate).
- Existing tests and service/integration mocks updated for the rename.

Wire-format invariants preserved: CRL DER bytes, OCSP response bytes, and
AES-256-GCM config encryption are unaffected — this change touches only
revocation-record uniqueness scope.

CWE-664.
2026-04-16 21:49:59 +00:00
shankar0123 f549a7aa79 security: fail closed when CERTCTL_CONFIG_ENCRYPTION_KEY is unset (fixes C-2)
EncryptIfKeySet/DecryptIfKeySet in internal/crypto/encryption.go previously
returned plaintext + wasEncrypted=false when the operator had not configured
CERTCTL_CONFIG_ENCRYPTION_KEY. That produced a data-at-rest confidentiality
bypass (CWE-311): sensitive fields on dynamically-configured issuer and
target rows (source='database') were persisted to PostgreSQL without any
encryption, and no caller could distinguish the encrypted from the plaintext
branch at runtime. The only visible signal was a single warning log line
emitted once at startup.

Fail closed instead:

- EncryptIfKeySet / DecryptIfKeySet now return crypto.ErrEncryptionKeyRequired
  (a new exported sentinel, errors.Is-unwrappable) when the key is empty or
  nil, rather than silently emitting plaintext. The (result, wasEncrypted,
  err) tuple signature is preserved for source compatibility; only the
  semantics of the no-key branch changed.

- cmd/server/main.go grows a startup pre-flight check: if no encryption key
  is configured the server lists issuers and targets, counts rows with
  source='database', and refuses to start (os.Exit(1)) if any exist. Operators
  must either configure CERTCTL_CONFIG_ENCRYPTION_KEY or remove the exposed
  rows before the control plane can boot. The warning-only path is retained
  for the clean-slate case (no database rows).

- internal/service/issuer.go's SeedFromEnvVars now guards the encryption call
  with len(s.encryptionKey) > 0 so env-seeded rows (source='env', which are
  reconstructable on every boot from process env) continue to persist as
  plaintext in the 'config' column when no key is configured. Registry load
  already falls through to cfg.Config when EncryptedConfig is nil. GUI/API
  write paths (source='database') remain fail-closed via propagation of
  ErrEncryptionKeyRequired.

- Integration tests that exercise CreateIssuer via the handler layer now
  supply a real 32-byte AES-256 test key so the encrypt path runs instead of
  returning ErrEncryptionKeyRequired. Same pattern in internal/service/
  testutil_test.go for consolidated service-layer tests.

- internal/crypto/encryption_test.go grows regression guards:
  TestEncryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
  TestDecryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
  TestEncryptDecryptIfKeySet_RoundTripProducesDifferentCiphertext,
  TestDecryptIfKeySet_RejectsTamperedCiphertext, and
  TestEncryptIfKeySet_PreservesErrEncryptionKeyRequiredSentinel (verifies
  the sentinel unwraps through fmt.Errorf(%w)-style wrapping).

Wire format is unchanged: AES-256-GCM Encrypt/Decrypt/DeriveKey, the
12-byte nonce prefix, the GCM auth tag, the PBKDF2 salt
('certctl-config-encryption-v1'), and the 100,000 iteration count are all
byte-identical. Ciphertexts produced before this change remain decryptable.

Verified:
- go build ./... : clean
- go vet ./...   : clean
- go test -race ./internal/crypto/... ./internal/service/... \
    ./internal/integration/... ./cmd/server/... : pass
- golangci-lint run ./... : 0 issues
- govulncheck ./... : 0 reachable vulnerabilities
- rg 'return plaintext, false, nil' internal/ : no matches
- Coverage: crypto 85.0% (unchanged), service 67.8% (was 67.9%, noise),
  cmd/server 0.0% (unchanged baseline). All above CI thresholds.

See certctl-audit-report.md for the full finding record and resolution log.
2026-04-16 21:10:40 +00:00
shankar0123 b219e5d68a security: use crypto/rand for agent API keys (fixes C-1)
Replaces math/rand-based agent API key generation in internal/service/agent.go
with crypto/rand.Read over a 32-byte buffer encoded with base64.RawURLEncoding,
yielding a 43-character URL-safe unpadded ASCII string (256 bits of entropy).

generateAPIKey now returns (string, error); Register and RegisterAgent propagate
entropy-source failures. hashAPIKey is unchanged — the SHA-256 hashed-at-rest
invariant is preserved.

Fixes C-1 (CWE-338: Use of Cryptographically Weak Pseudo-Random Number Generator)
from certctl-audit-report.md.

Changes:
- internal/service/agent.go: new imports (crypto/rand, encoding/base64);
  generateAPIKey rewritten to return (string, error); Register and RegisterAgent
  updated to propagate the error.
- internal/service/agent_test.go: TestGenerateAPIKey_Properties regression test
  (non-empty, length 43, valid base64url, 32 decoded bytes, no collisions over
  64 calls). No entropy-failure test — Go 1.24+ (issue #66821) makes crypto/rand
  errors fatal, so that branch is defensively unreachable.

Verification:
- go build ./cmd/server/... ./cmd/agent/... ./cmd/mcp-server/... ./cmd/cli/... → pass
- go vet ./... → pass
- go test -race (CI scope, 43 packages) → pass
- golangci-lint v2.11.4 run ./... → 0 issues
- govulncheck ./... → 0 vulnerabilities in certctl code
- Coverage: service 68.9% / handler 83.6% / domain 82.0% / middleware 63.8%
  (all above CI gates 55/60/40/30)
- grep math/rand in internal/ and cmd/ → zero production hits
- No caller assumes the old 32-char length or legacy charset
2026-04-16 19:43:19 +00:00
181 changed files with 16424 additions and 1853 deletions
+13 -2
View File
@@ -45,11 +45,11 @@ jobs:
run: govulncheck ./...
- name: Race Detection
run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s
run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/crypto/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s
- name: Go Test with Coverage
run: |
go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
- name: Check Coverage Thresholds
run: |
@@ -73,6 +73,13 @@ jobs:
MIDDLEWARE_COV=$(go tool cover -func=coverage.out | grep 'internal/api/middleware' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
echo "Middleware layer coverage: ${MIDDLEWARE_COV}%"
# Check crypto package coverage (target: 85%+)
# M-8 rationale: encryption primitives are a security-critical gate.
# v2 format, key-derivation, fallback, and fail-closed sentinel paths
# all need exhaustive coverage to avoid silent regressions (CWE-916 / CWE-329).
CRYPTO_COV=$(go tool cover -func=coverage.out | grep 'internal/crypto' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
echo "Crypto package coverage: ${CRYPTO_COV}%"
# Fail if thresholds not met
if [ "$(echo "$SERVICE_COV < 55" | bc -l)" -eq 1 ]; then
echo "::error::Service layer coverage ${SERVICE_COV}% is below 55% threshold"
@@ -90,6 +97,10 @@ jobs:
echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
exit 1
fi
if [ "$(echo "$CRYPTO_COV < 85" | bc -l)" -eq 1 ]; then
echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 85% threshold"
exit 1
fi
echo "Coverage thresholds passed!"
- name: Upload Coverage Report
+292 -43
View File
@@ -7,40 +7,30 @@ on:
env:
REGISTRY: ghcr.io
GO_VERSION: '1.22'
# Keep in lock-step with .github/workflows/ci.yml (M-3).
GO_VERSION: '1.25.9'
IMAGE_NAMESPACE: shankar0123
jobs:
# Cross-compile agent and server binaries for multiple platforms
# ----------------------------------------------------------------------
# build-binaries (M-3): matrix build every (binary × OS × arch) tuple.
# For each tuple we produce: the binary, a SPDX-JSON SBOM, a keyless
# Cosign signature + certificate bundle, and a single-line sha256sum
# file. All artefacts are uploaded to a workflow-scoped artifact; the
# aggregate-checksums job fans them back in for release upload.
# ----------------------------------------------------------------------
build-binaries:
name: Build Cross-Platform Binaries
name: Build ${{ matrix.binary }} (${{ matrix.os }}/${{ matrix.arch }})
runs-on: ubuntu-latest
permissions:
contents: write
contents: read
id-token: write # Cosign keyless OIDC identity token
strategy:
fail-fast: false
matrix:
include:
# Agent binaries (4 platforms)
- os: linux
arch: amd64
binary: agent
- os: linux
arch: arm64
binary: agent
- os: darwin
arch: amd64
binary: agent
- os: darwin
arch: arm64
binary: agent
# Server binaries (2 platforms)
- os: linux
arch: amd64
binary: server
- os: linux
arch: arm64
binary: server
binary: [agent, server, cli, mcp-server]
os: [linux, darwin]
arch: [amd64, arm64]
steps:
- uses: actions/checkout@v4
@@ -51,35 +41,174 @@ jobs:
- name: Extract version from tag
id: version
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
- name: Build ${{ matrix.binary }} binary (${{ matrix.os }}-${{ matrix.arch }})
- name: Build binary
id: build
env:
GOOS: ${{ matrix.os }}
GOARCH: ${{ matrix.arch }}
CGO_ENABLED: 0
CGO_ENABLED: '0'
VERSION: ${{ steps.version.outputs.VERSION }}
run: |
set -euo pipefail
OUTPUT_NAME="certctl-${{ matrix.binary }}-${{ matrix.os }}-${{ matrix.arch }}"
go build -ldflags="-w -s -X main.Version=${{ steps.version.outputs.VERSION }}" \
mkdir -p dist
go build \
-trimpath \
-ldflags="-w -s -X main.Version=${VERSION}" \
-o "dist/${OUTPUT_NAME}" \
"./cmd/${{ matrix.binary }}"
ls -lh "dist/${OUTPUT_NAME}"
echo "output_name=${OUTPUT_NAME}" >> "$GITHUB_OUTPUT"
- name: Upload binaries to release
- name: Generate SBOM (SPDX-JSON)
uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
with:
file: dist/${{ steps.build.outputs.output_name }}
format: spdx-json
output-file: dist/${{ steps.build.outputs.output_name }}.sbom.spdx.json
upload-artifact: false
upload-release-assets: false
- name: Install Cosign
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
- name: Keyless-sign binary with Cosign
env:
OUTPUT_NAME: ${{ steps.build.outputs.output_name }}
run: |
set -euo pipefail
# Cosign v3.0 (shipped by cosign-installer@v4.1.1 default
# cosign-release=v3.0.5) removed --output-signature/--output-certificate
# on sign-blob. The replacement is --bundle, which emits a unified
# Sigstore bundle (signature + cert chain + Rekor inclusion proof) as
# a single .sigstore.json artefact. M-11.
cosign sign-blob \
--yes \
--bundle "dist/${OUTPUT_NAME}.sigstore.json" \
"dist/${OUTPUT_NAME}"
- name: Compute SHA-256 sidecar
env:
OUTPUT_NAME: ${{ steps.build.outputs.output_name }}
run: |
set -euo pipefail
cd dist
sha256sum "${OUTPUT_NAME}" > "${OUTPUT_NAME}.sha256"
cat "${OUTPUT_NAME}.sha256"
- name: Upload build artefacts
uses: actions/upload-artifact@v4
with:
name: binary-${{ steps.build.outputs.output_name }}
path: |
dist/${{ steps.build.outputs.output_name }}
dist/${{ steps.build.outputs.output_name }}.sigstore.json
dist/${{ steps.build.outputs.output_name }}.sbom.spdx.json
dist/${{ steps.build.outputs.output_name }}.sha256
if-no-files-found: error
retention-days: 7
# ----------------------------------------------------------------------
# aggregate-checksums (M-3): fan in every matrix artefact, produce a
# single checksums.txt (sha256sum format, compatible with `sha256sum
# -c`), sign it with Cosign, upload everything to the GitHub Release,
# and emit a base64-encoded hash manifest for the SLSA generator.
# ----------------------------------------------------------------------
aggregate-checksums:
name: Aggregate checksums & sign
runs-on: ubuntu-latest
needs: [build-binaries]
permissions:
contents: write
id-token: write # Cosign keyless OIDC identity token
outputs:
hashes: ${{ steps.hashes.outputs.hashes }}
steps:
- name: Download binary artefacts
uses: actions/download-artifact@v4
with:
pattern: binary-*
path: artifacts
merge-multiple: true
- name: Aggregate SHA-256 sums
id: hashes
run: |
set -euo pipefail
cd artifacts
: > checksums.txt
for f in certctl-*; do
case "$f" in
*.sigstore.json|*.sbom.spdx.json|*.sha256|checksums.txt)
continue ;;
esac
sha256sum "$f" >> checksums.txt
done
echo "=== checksums.txt ==="
cat checksums.txt
# base64 hashes (single line, no wrapping) for SLSA generator.
HASHES=$(base64 -w0 < checksums.txt)
echo "hashes=${HASHES}" >> "$GITHUB_OUTPUT"
- name: Install Cosign
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
- name: Keyless-sign checksums.txt
run: |
set -euo pipefail
cd artifacts
# Cosign v3.0 --bundle replaces the removed v2 flag pair
# --output-signature / --output-certificate. See M-11.
cosign sign-blob \
--yes \
--bundle checksums.txt.sigstore.json \
checksums.txt
- name: Upload artefacts to GitHub Release
uses: softprops/action-gh-release@v2
if: startsWith(github.ref, 'refs/tags/')
with:
files: |
dist/certctl-agent-*
dist/certctl-server-*
artifacts/certctl-*
artifacts/checksums.txt
artifacts/checksums.txt.sigstore.json
# Build and push Docker images
# ----------------------------------------------------------------------
# provenance-binaries (M-3): SLSA Level 3 provenance for every binary.
# The SLSA generic generator reusable workflow runs in a hermetic
# workflow run, producing multiple.intoto.jsonl from the base64 hash
# manifest and uploading it as a release asset.
# ----------------------------------------------------------------------
provenance-binaries:
name: SLSA provenance (binaries)
needs: [aggregate-checksums]
permissions:
actions: read
id-token: write
contents: write
uses: slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.1.0
with:
base64-subjects: "${{ needs.aggregate-checksums.outputs.hashes }}"
upload-assets: true
provenance-name: multiple.intoto.jsonl
# ----------------------------------------------------------------------
# build-and-push-docker: push container images to GHCR with native
# SLSA L3 provenance (mode=max) and SBOM attestations emitted by
# docker/build-push-action@v6, plus a keyless Cosign signature on the
# image digest for identity-bound verification. The M-4 proxy-propagation
# build-args block is retained verbatim — M-3 only adds supply-chain
# steps; it never touches M-4 wiring.
# ----------------------------------------------------------------------
build-and-push-docker:
name: Build & Push Docker Images
runs-on: ubuntu-latest
permissions:
contents: write
packages: write
id-token: write # Cosign keyless OIDC identity token
steps:
- uses: actions/checkout@v4
@@ -93,40 +222,90 @@ jobs:
- name: Extract version from tag
id: version
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Install Cosign
uses: sigstore/cosign-installer@cad07c2e89fa2edd6e2d7bab4c1aa38e53f76003 # v4.1.1
- name: Build and push server image
id: server-push
uses: docker/build-push-action@v6
with:
context: .
file: ./Dockerfile
push: true
tags: |
${{ env.REGISTRY }}/shankar0123/certctl-server:${{ steps.version.outputs.VERSION }}
${{ env.REGISTRY }}/shankar0123/certctl-server:latest
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server:${{ steps.version.outputs.VERSION }}
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server:latest
# Proxy propagation (M-4, Issue #9) — forwards runner-level proxy
# secrets into the Docker build so self-hosted runners behind
# corporate proxies can reach public registries. GitHub-hosted
# runners don't need proxies, so the secrets are optional and
# resolve to empty strings when unset — byte-identical to the
# pre-fix behaviour for the public-runner path.
build-args: |
HTTP_PROXY=${{ secrets.HTTP_PROXY }}
HTTPS_PROXY=${{ secrets.HTTPS_PROXY }}
NO_PROXY=${{ secrets.NO_PROXY }}
# Supply-chain hardening (M-3): emit native SLSA L3 provenance
# and SBOM attestations bound to the image manifest.
provenance: mode=max
sbom: true
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Keyless-sign server image with Cosign
env:
DIGEST: ${{ steps.server-push.outputs.digest }}
IMAGE: ${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-server
run: |
set -euo pipefail
cosign sign --yes "${IMAGE}@${DIGEST}"
- name: Build and push agent image
id: agent-push
uses: docker/build-push-action@v6
with:
context: .
file: ./Dockerfile.agent
push: true
tags: |
${{ env.REGISTRY }}/shankar0123/certctl-agent:${{ steps.version.outputs.VERSION }}
${{ env.REGISTRY }}/shankar0123/certctl-agent:latest
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent:${{ steps.version.outputs.VERSION }}
${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent:latest
# Proxy propagation (M-4, Issue #9) — see server-image step for
# rationale. Empty secrets resolve to empty build args, leaving
# the un-proxied code path byte-identical to the pre-fix tree.
build-args: |
HTTP_PROXY=${{ secrets.HTTP_PROXY }}
HTTPS_PROXY=${{ secrets.HTTPS_PROXY }}
NO_PROXY=${{ secrets.NO_PROXY }}
# Supply-chain hardening (M-3): emit native SLSA L3 provenance
# and SBOM attestations bound to the image manifest.
provenance: mode=max
sbom: true
cache-from: type=gha
cache-to: type=gha,mode=max
# Create release notes with all artifacts
- name: Keyless-sign agent image with Cosign
env:
DIGEST: ${{ steps.agent-push.outputs.digest }}
IMAGE: ${{ env.REGISTRY }}/${{ env.IMAGE_NAMESPACE }}/certctl-agent
run: |
set -euo pipefail
cosign sign --yes "${IMAGE}@${DIGEST}"
# ----------------------------------------------------------------------
# create-release: stamp the release body. The actual asset uploads are
# handled by aggregate-checksums (binaries, SBOMs, sigs, certs,
# checksums.txt + signature) and the SLSA generator (multiple.intoto.jsonl).
# ----------------------------------------------------------------------
create-release:
name: Create Release Notes
runs-on: ubuntu-latest
needs: [build-binaries, build-and-push-docker]
needs: [build-binaries, aggregate-checksums, provenance-binaries, build-and-push-docker]
permissions:
contents: write
@@ -135,7 +314,7 @@ jobs:
- name: Extract version from tag
id: version
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
run: echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
- name: Create release with notes
uses: softprops/action-gh-release@v2
@@ -197,6 +376,76 @@ jobs:
- **Linux x86_64**: `certctl-server-linux-amd64`
- **Linux ARM64**: `certctl-server-linux-arm64`
- **macOS x86_64**: `certctl-server-darwin-amd64`
- **macOS ARM64 (Apple Silicon)**: `certctl-server-darwin-arm64`
## CLI & MCP Server Binaries
The `certctl-cli` (REST API wrapper) and `certctl-mcp-server` (Model Context
Protocol bridge) binaries ship for all four platforms as well:
- `certctl-cli-{linux,darwin}-{amd64,arm64}`
- `certctl-mcp-server-{linux,darwin}-{amd64,arm64}`
## Verifying this release
Every binary, `checksums.txt`, and container image is signed with Cosign
keyless OIDC. Each binary ships with a SPDX-JSON SBOM. Binaries are covered
by SLSA Level 3 provenance; container images carry native SLSA L3 provenance
and SBOM attestations (docker/build-push-action `provenance: mode=max`,
`sbom: true`) in addition to a Cosign signature on the digest.
**1. Verify SHA-256 checksums:**
```bash
sha256sum -c checksums.txt
```
**2. Verify the Cosign signature on checksums.txt (keyless OIDC):**
```bash
cosign verify-blob \
--bundle checksums.txt.sigstore.json \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
checksums.txt
```
Replace `checksums.txt` with any individual binary name to verify that
artefact directly (each binary ships with its own `.sigstore.json`
bundle, e.g. `cosign verify-blob --bundle certctl-agent-linux-amd64.sigstore.json …`).
**3. Verify SLSA Level 3 provenance (binaries):**
```bash
slsa-verifier verify-artifact \
--provenance-path multiple.intoto.jsonl \
--source-uri github.com/shankar0123/certctl \
--source-tag ${{ steps.version.outputs.VERSION }} \
certctl-agent-linux-amd64
```
**4. Verify container image signature and attestations:**
```bash
IMAGE=ghcr.io/shankar0123/certctl-server:${{ steps.version.outputs.VERSION }}
cosign verify \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
# SBOM attestation (SPDX-JSON) emitted by docker/build-push-action
cosign verify-attestation --type spdxjson \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
# SLSA provenance attestation (mode=max)
cosign verify-attestation --type slsaprovenance \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
```
## Helm Chart
+5
View File
@@ -72,3 +72,8 @@ SECURITY_REMEDIATION.md
.DS_Store
Thumbs.db
mcp-server
# Local Go build/module caches (session-scoped, never committed)
/.gocache/
/.gomodcache/
/.gopath/
+1
View File
@@ -6,6 +6,7 @@ run:
linters:
default: none
enable:
- contextcheck
- govet
- staticcheck
- unused
+27
View File
@@ -3,6 +3,22 @@
# Stage 1: Build frontend
FROM node:20-alpine AS frontend
# Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
# behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
# `NO_PROXY` are forwarded via `docker build --build-arg` (or compose
# `build.args`), they are re-exported as ENV with both upper- and lower-case
# names because npm/apk/curl read the lowercase variants while Go, Node, and
# most HTTP libraries read the uppercase ones.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
HTTPS_PROXY=${HTTPS_PROXY} \
NO_PROXY=${NO_PROXY} \
http_proxy=${HTTP_PROXY} \
https_proxy=${HTTPS_PROXY} \
no_proxy=${NO_PROXY}
WORKDIR /app/web
COPY web/ .
@@ -13,6 +29,17 @@ RUN npm ci --include=dev || npm ci --include=dev && \
# Stage 2: Build Go binary
FROM golang:1.25-alpine AS builder
# Proxy propagation (M-4, Issue #9) — see Stage 1 rationale.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
HTTPS_PROXY=${HTTPS_PROXY} \
NO_PROXY=${NO_PROXY} \
http_proxy=${HTTP_PROXY} \
https_proxy=${HTTPS_PROXY} \
no_proxy=${NO_PROXY}
RUN apk add --no-cache git ca-certificates tzdata
WORKDIR /app
+16
View File
@@ -2,6 +2,22 @@
# Stage 1: Build
FROM golang:1.25-alpine AS builder
# Proxy propagation (M-4, Issue #9) — defaulted to empty so un-proxied builds
# behave identically to the pre-fix tree. When `HTTP_PROXY`/`HTTPS_PROXY`/
# `NO_PROXY` are forwarded via `docker build --build-arg` (or compose
# `build.args`), they are re-exported as ENV with both upper- and lower-case
# names because apk and curl read the lowercase variants while Go reads the
# uppercase ones.
ARG HTTP_PROXY=
ARG HTTPS_PROXY=
ARG NO_PROXY=
ENV HTTP_PROXY=${HTTP_PROXY} \
HTTPS_PROXY=${HTTPS_PROXY} \
NO_PROXY=${NO_PROXY} \
http_proxy=${HTTP_PROXY} \
https_proxy=${HTTPS_PROXY} \
no_proxy=${NO_PROXY}
RUN apk add --no-cache git ca-certificates
WORKDIR /app
+69 -1
View File
@@ -237,6 +237,74 @@ docker pull shankar0123.docker.scarf.sh/certctl-server
docker pull shankar0123.docker.scarf.sh/certctl-agent
```
## Verifying this release
Every `v*` tag publishes signed, attested release artefacts. Binaries
(`certctl-agent`, `certctl-server`, `certctl-cli`, `certctl-mcp-server` for
`linux|darwin × amd64|arm64`) ship alongside a `checksums.txt`, per-binary
SPDX-JSON SBOMs, Cosign signatures, and SLSA Level 3 provenance. Container
images on `ghcr.io/shankar0123/certctl-{server,agent}` are built with
`docker/build-push-action` `provenance: mode=max` + `sbom: true` and are
additionally signed with Cosign at the image digest.
All signatures use Cosign keyless OIDC; the signing identity is the
release workflow running on a signed tag.
**1. Verify SHA-256 checksums:**
```bash
sha256sum -c checksums.txt
```
**2. Verify the Cosign signature on `checksums.txt`:**
```bash
cosign verify-blob \
--bundle checksums.txt.sigstore.json \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
checksums.txt
```
Every individual binary ships with its own `.sigstore.json` bundle
(unified Sigstore bundle containing signature, certificate chain, and
Rekor inclusion proof). Swap `checksums.txt` for any binary name and
point `--bundle` at the matching `<binary>.sigstore.json` to verify it
directly.
**3. Verify SLSA Level 3 provenance on a binary:**
```bash
slsa-verifier verify-artifact \
--provenance-path multiple.intoto.jsonl \
--source-uri github.com/shankar0123/certctl \
--source-tag v2.1.0 \
certctl-agent-linux-amd64
```
**4. Verify a container image signature and its SBOM / provenance attestations:**
```bash
IMAGE=ghcr.io/shankar0123/certctl-server:v2.1.0
cosign verify \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/\.github/workflows/release\.yml@refs/tags/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
# SBOM attestation (SPDX-JSON, emitted by docker/build-push-action)
cosign verify-attestation --type spdxjson \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
# SLSA provenance attestation (docker/build-push-action `provenance: mode=max`)
cosign verify-attestation --type slsaprovenance \
--certificate-identity-regexp '^https://github\.com/shankar0123/certctl/' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$IMAGE"
```
## Examples
Pick the scenario closest to your setup and have it running in 2 minutes.
@@ -320,7 +388,7 @@ Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector
30+ milestones shipping enterprise-grade features for free. Sub-CA mode, ACME DNS-01/DNS-PERSIST-01/EAB/ARI (RFC 9773)/profile selection, step-ca, Vault PKI, DigiCert CertCentral, Sectigo SCM, Google CAS, AWS ACM PCA, Entrust, GlobalSign, EJBCA, OpenSSL/Custom CA issuers. NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets targets. EST server (RFC 7030) and SCEP server (RFC 8894) enrollment protocols. RFC 5280 revocation with DER CRL + embedded OCSP responder. Certificate profiles, ownership tracking, team assignment, agent groups, interactive approval workflows. Filesystem, network, and cloud secret manager (AWS SM, Azure KV, GCP SM) certificate discovery with triage GUI. Dynamic issuer/target configuration via GUI with AES-256-GCM encrypted storage. First-run onboarding wizard. Post-deployment TLS verification. Certificate export (PEM/PKCS#12). S/MIME support. Prometheus metrics. Scheduled certificate digest emails. Slack, Teams, PagerDuty, OpsGenie, SMTP notifications. MCP server (80 tools), CLI (12 commands), Helm chart. Compliance mapping (SOC 2, PCI-DSS 4.0, NIST SP 800-57). 5 turnkey deployment examples. Agent install script. Migration guides from certbot, acme.sh, and cert-manager. See the [Feature Inventory](docs/features.md) for details.
### V3: certctl Pro
Team access controls and identity provider integration. Role-based access control with profile-gating. Event-driven architecture with real-time operational views. Advanced search, compliance scoring, and HSM/TPM integration.
Enterprise capabilities for larger deployments are available in the commercial tier.
### V4+: Cloud & Scale
Kubernetes cert-manager external issuer, cloud infrastructure targets, extended CA support, and platform-scale features.
+633 -46
View File
@@ -29,7 +29,11 @@ tags:
- name: Certificates
description: Certificate lifecycle — CRUD, versions, renewal, deployment, revocation
- name: CRL & OCSP
description: Certificate revocation list and OCSP responder
description: |
Certificate revocation list (RFC 5280) and OCSP responder (RFC 6960).
Served unauthenticated under `/.well-known/pki/*` (RFC 8615) so
relying parties can retrieve revocation status without a certctl
API key.
- name: Issuers
description: CA issuer connector management (Local CA, ACME, step-ca)
- name: Targets
@@ -66,6 +70,12 @@ tags:
description: Continuous TLS endpoint health checks with status tracking and probe history
- name: Digest
description: Scheduled certificate digest email notifications
- name: Verification
description: Post-deployment TLS endpoint fingerprint verification
- name: EST
description: Enrollment over Secure Transport (RFC 7030)
- name: SCEP
description: Simple Certificate Enrollment Protocol (RFC 8894)
paths:
# ─── Health & Auth ───────────────────────────────────────────────────
@@ -487,50 +497,28 @@ paths:
"500":
$ref: "#/components/responses/InternalError"
# ─── CRL & OCSP ─────────────────────────────────────────────────────
/api/v1/crl:
# ─── PKI (CRL & OCSP, RFC 5280 / 6960 / 8615) ──────────────────────
#
# Relying parties (browsers, OpenSSL clients, OCSP stapling sidecars,
# mTLS clients) cannot present a certctl Bearer token, so these two
# endpoints are unauthenticated and live under the RFC 8615
# `.well-known` namespace. They were previously mounted at
# /api/v1/crl/{issuer_id} and /api/v1/ocsp/{issuer_id}/{serial}; those
# paths were removed in M-006.
#
# The non-standard JSON CRL endpoint (GET /api/v1/crl) was also
# removed — RFC 5280 defines only the DER wire format.
/.well-known/pki/crl/{issuer_id}:
get:
tags: [CRL & OCSP]
summary: Get JSON CRL
description: Returns all revoked certificates in JSON format.
operationId: getCRL
responses:
"200":
description: JSON CRL
content:
application/json:
schema:
type: object
properties:
version:
type: integer
example: 1
entries:
type: array
items:
type: object
properties:
serial_number:
type: string
revocation_date:
type: string
format: date-time
revocation_reason:
type: string
total:
type: integer
generated_at:
type: string
format: date-time
"500":
$ref: "#/components/responses/InternalError"
/api/v1/crl/{issuer_id}:
get:
tags: [CRL & OCSP]
summary: Get DER-encoded X.509 CRL
description: Returns a proper DER-encoded CRL signed by the issuing CA. 24-hour validity.
summary: Get DER-encoded X.509 CRL (RFC 5280)
description: |
Returns a DER-encoded CRL signed by the issuing CA (RFC 5280 §5),
served unauthenticated per RFC 8615 `.well-known` semantics so
relying parties can retrieve it without a certctl API key.
Validity is 24 hours.
operationId: getDERCRL
security: []
parameters:
- name: issuer_id
in: path
@@ -554,12 +542,17 @@ paths:
"501":
description: Issuer does not support CRL generation
/api/v1/ocsp/{issuer_id}/{serial}:
/.well-known/pki/ocsp/{issuer_id}/{serial}:
get:
tags: [CRL & OCSP]
summary: OCSP responder
description: Returns signed OCSP response (good/revoked/unknown) for the given serial number.
summary: OCSP responder (RFC 6960)
description: |
Returns a signed OCSP response (good/revoked/unknown) for the
given serial number per RFC 6960 §2.1, served unauthenticated
per RFC 8615 so relying parties and OCSP stapling sidecars can
query revocation status without a certctl API key.
operationId: handleOCSP
security: []
parameters:
- name: issuer_id
in: path
@@ -816,6 +809,28 @@ paths:
"500":
$ref: "#/components/responses/InternalError"
/api/v1/targets/{id}/test:
post:
tags: [Targets]
summary: Test target connection
description: |
Checks target connectivity by verifying the assigned agent's heartbeat status
(agent reported within the last 5 minutes). Always returns HTTP 200 — the
connectivity result is reflected in the response body's `status` field
(`success` when the agent is reachable, `failed` otherwise).
operationId: testTargetConnection
parameters:
- $ref: "#/components/parameters/resourceId"
responses:
"200":
description: Connection test result (success or failed in body)
content:
application/json:
schema:
$ref: "#/components/schemas/StatusMessageResponse"
"400":
$ref: "#/components/responses/BadRequest"
# ─── Agents ──────────────────────────────────────────────────────────
/api/v1/agents:
get:
@@ -865,6 +880,40 @@ paths:
"500":
$ref: "#/components/responses/InternalError"
/api/v1/agents/retired:
get:
tags: [Agents]
summary: List retired agents
description: |
I-004: opt-in listing of soft-retired agents. The default
`GET /api/v1/agents` endpoint filters retired rows out; this is the
dedicated surface for reading them back (e.g., the operator UI's
"Retired" tab, audit and forensics workflows). Pagination defaults
match the default agent listing (page=1, per_page=50, max 500). Go
1.22's enhanced ServeMux routes `/agents/retired` to this handler
via the literal-beats-pattern-var precedence rule, so the sibling
`/agents/{id}` route does not shadow it.
operationId: listRetiredAgents
parameters:
- $ref: "#/components/parameters/page"
- $ref: "#/components/parameters/per_page"
responses:
"200":
description: Paginated list of retired agents
content:
application/json:
schema:
allOf:
- $ref: "#/components/schemas/PaginationEnvelope"
- type: object
properties:
data:
type: array
items:
$ref: "#/components/schemas/Agent"
"500":
$ref: "#/components/responses/InternalError"
/api/v1/agents/{id}:
get:
tags: [Agents]
@@ -885,12 +934,116 @@ paths:
$ref: "#/components/responses/NotFound"
"500":
$ref: "#/components/responses/InternalError"
delete:
tags: [Agents]
summary: Soft-retire agent
description: |
I-004: soft-retirement. The agent row is preserved (so its audit
trail and historical job links remain intact) and `retired_at` is
stamped. A retired agent receives `410 Gone` on subsequent
heartbeats so it can shut down cleanly.
Behavior matrix:
| Scenario | Query | Status | Body |
| --- | --- | --- | --- |
| Clean retire (no active dependencies) | none | `200` | `RetireAgentResponse` with `cascade=false`, zero counts |
| Blocked by active targets/certs/jobs | none | `409` | `BlockedByDependenciesResponse` with per-bucket counts |
| Force-cascade retire | `force=true&reason=...` | `200` | `RetireAgentResponse` with `cascade=true`, pre-cascade counts |
| Idempotent re-retire | either | `204` | (empty — downstream consumers break on stray bodies) |
| `force=true` without reason | `force=true` | `400` | ErrorResponse (ErrForceReasonRequired) |
| Reserved sentinel agent | any | `403` | ErrorResponse (ErrAgentIsSentinel) |
| Unknown agent id | any | `404` | ErrorResponse |
Sentinel agents are the four reserved identities backing non-agent
discovery subsystems (`server-scanner`, `cloud-aws-sm`,
`cloud-azure-kv`, `cloud-gcp-sm`). Retiring them would orphan the
scanner or a cloud secret-manager source, so the handler refuses
unconditionally — even with `force=true`.
operationId: retireAgent
parameters:
- $ref: "#/components/parameters/resourceId"
- name: force
in: query
required: false
schema:
type: boolean
default: false
description: |
Cascade-retire active downstream targets, certificates, and
jobs. When `true`, a non-empty `reason` is required. A
malformed value (anything strconv.ParseBool rejects) is
silently treated as `false` so a typoed query can never
accidentally enable the cascade.
- name: reason
in: query
required: false
schema:
type: string
description: |
Human-readable reason recorded on the retired row and in the
immutable audit trail. Required (non-empty after trimming)
when `force=true`.
responses:
"200":
description: |
Agent retired (clean retire or successful force-cascade). Body
is `RetireAgentResponse`.
content:
application/json:
schema:
$ref: "#/components/schemas/RetireAgentResponse"
"204":
description: |
Idempotent retire — the agent was already retired. Response
body is empty (the 200-path shape does not apply, and
downstream clients that tee responses into dashboards would
break on spurious bodies).
"400":
description: |
`force=true` was sent without a non-empty `reason`
(ErrForceReasonRequired).
content:
application/json:
schema:
$ref: "#/components/schemas/ErrorResponse"
"403":
description: |
Agent is a reserved sentinel and cannot be retired even with
`?force=true` (ErrAgentIsSentinel).
content:
application/json:
schema:
$ref: "#/components/schemas/ErrorResponse"
"404":
$ref: "#/components/responses/NotFound"
"409":
description: |
Blocked by active downstream dependencies. Body carries
per-bucket counts so the operator UI can show the user which
dependency is holding up the retire. Re-run with
`?force=true&reason=...` to cascade.
content:
application/json:
schema:
$ref: "#/components/schemas/BlockedByDependenciesResponse"
"405":
description: Method not allowed (only DELETE, GET are routed to this path)
"500":
$ref: "#/components/responses/InternalError"
/api/v1/agents/{id}/heartbeat:
post:
tags: [Agents]
summary: Agent heartbeat
description: Reports agent liveness and metadata (OS, architecture, IP, version).
description: |
Reports agent liveness and metadata (OS, architecture, IP, version).
I-004: a retired agent still polling the heartbeat endpoint receives
`410 Gone` so `cmd/agent` detects the terminal signal and shuts down
cleanly instead of looping forever against a decommissioned identity.
The retired-agent check runs before any "not found" string match so
it can never be masked by a sibling error branch.
operationId: agentHeartbeat
parameters:
- $ref: "#/components/parameters/resourceId"
@@ -921,6 +1074,14 @@ paths:
$ref: "#/components/responses/BadRequest"
"404":
$ref: "#/components/responses/NotFound"
"410":
description: |
I-004: the agent has been soft-retired. The agent process should
treat this as a terminal signal and shut down cleanly.
content:
application/json:
schema:
$ref: "#/components/schemas/ErrorResponse"
"500":
$ref: "#/components/responses/InternalError"
@@ -1177,6 +1338,66 @@ paths:
"500":
$ref: "#/components/responses/InternalError"
/api/v1/jobs/{id}/verify:
post:
tags: [Verification]
summary: Record post-deployment verification result
description: |
Agents submit the result of probing a deployed certificate's live TLS endpoint.
Compares the served certificate's SHA-256 fingerprint against the expected
fingerprint. Best-effort: failures are recorded on the job but do not roll
back the deployment.
operationId: verifyDeployment
parameters:
- $ref: "#/components/parameters/resourceId"
requestBody:
required: true
content:
application/json:
schema:
$ref: "#/components/schemas/VerifyDeploymentRequest"
responses:
"200":
description: Verification result recorded
content:
application/json:
schema:
type: object
properties:
job_id:
type: string
verified:
type: boolean
verified_at:
type: string
format: date-time
"400":
$ref: "#/components/responses/BadRequest"
"500":
$ref: "#/components/responses/InternalError"
/api/v1/jobs/{id}/verification:
get:
tags: [Verification]
summary: Get post-deployment verification status
description: |
Returns the stored verification result for a deployment job — expected
and observed SHA-256 fingerprints, verified flag, and timestamp.
operationId: getJobVerification
parameters:
- $ref: "#/components/parameters/resourceId"
responses:
"200":
description: Verification result for the job
content:
application/json:
schema:
$ref: "#/components/schemas/VerificationResult"
"400":
$ref: "#/components/responses/BadRequest"
"500":
$ref: "#/components/responses/InternalError"
# ─── Policies ────────────────────────────────────────────────────────
/api/v1/policies:
get:
@@ -2718,6 +2939,238 @@ paths:
"500":
$ref: "#/components/responses/InternalError"
# ─── EST (RFC 7030) ────────────────────────────────────────────────
/.well-known/est/cacerts:
get:
tags: [EST]
summary: EST CA certificates distribution
description: |
Returns the CA certificate chain used to verify certctl-issued certificates.
Response is a base64-encoded degenerate PKCS#7 SignedData (certs-only) per
RFC 7030 §4.1.3.
operationId: estCACerts
security: []
responses:
"200":
description: Base64-encoded PKCS#7 certs-only structure
headers:
Content-Transfer-Encoding:
schema:
type: string
example: base64
content:
application/pkcs7-mime:
schema:
type: string
format: byte
description: "Base64-encoded PKCS#7 (smime-type=certs-only)"
"500":
$ref: "#/components/responses/InternalError"
/.well-known/est/simpleenroll:
post:
tags: [EST]
summary: EST simple enrollment
description: |
Enrolls a new certificate from a PKCS#10 CSR per RFC 7030 §4.2.1.
The CSR MAY be supplied as base64-encoded DER (EST standard wire format)
or as PEM for convenience. Returns a base64-encoded PKCS#7 certs-only
structure containing the issued certificate.
operationId: estSimpleEnroll
security: []
requestBody:
required: true
description: "Base64-encoded DER PKCS#10 CSR, or PEM-encoded CSR"
content:
application/pkcs10:
schema:
type: string
format: byte
responses:
"200":
description: Base64-encoded PKCS#7 cert-only response with issued certificate
headers:
Content-Transfer-Encoding:
schema:
type: string
example: base64
content:
application/pkcs7-mime:
schema:
type: string
format: byte
description: "Base64-encoded PKCS#7 (smime-type=certs-only)"
"400":
$ref: "#/components/responses/BadRequest"
"405":
description: Method not allowed (only POST accepted)
"500":
$ref: "#/components/responses/InternalError"
/.well-known/est/simplereenroll:
post:
tags: [EST]
summary: EST simple re-enrollment
description: |
Re-enrolls an existing certificate (same as simpleenroll in certctl's
implementation — re-enrollment is treated as a fresh issuance) per
RFC 7030 §4.2.2.
operationId: estSimpleReEnroll
security: []
requestBody:
required: true
description: "Base64-encoded DER PKCS#10 CSR, or PEM-encoded CSR"
content:
application/pkcs10:
schema:
type: string
format: byte
responses:
"200":
description: Base64-encoded PKCS#7 cert-only response with re-issued certificate
headers:
Content-Transfer-Encoding:
schema:
type: string
example: base64
content:
application/pkcs7-mime:
schema:
type: string
format: byte
description: "Base64-encoded PKCS#7 (smime-type=certs-only)"
"400":
$ref: "#/components/responses/BadRequest"
"405":
description: Method not allowed (only POST accepted)
"500":
$ref: "#/components/responses/InternalError"
/.well-known/est/csrattrs:
get:
tags: [EST]
summary: EST CSR attributes
description: |
Returns attributes the EST client should include in its CSR per
RFC 7030 §4.5. certctl currently returns an empty attribute set
(HTTP 204) — profile-based constraints are enforced server-side
during enrollment rather than advertised here.
operationId: estCSRAttrs
security: []
responses:
"200":
description: Base64-encoded CsrAttrs (when non-empty)
headers:
Content-Transfer-Encoding:
schema:
type: string
example: base64
content:
application/csrattrs:
schema:
type: string
format: byte
"204":
description: No CSR attributes defined (empty response)
"500":
$ref: "#/components/responses/InternalError"
# ─── SCEP (RFC 8894) ──────────────────────────────────────────────
/scep:
get:
tags: [SCEP]
summary: SCEP operation dispatch (GET)
description: |
Single SCEP entry point dispatched by the `operation` query parameter
per RFC 8894. GET is used for capability discovery (`GetCACaps`) and
CA certificate retrieval (`GetCACert`).
operationId: scepGet
security: []
parameters:
- name: operation
in: query
required: true
schema:
type: string
enum: [GetCACaps, GetCACert, PKIOperation]
description: SCEP operation selector
- name: message
in: query
required: false
schema:
type: string
description: Optional SCEP message parameter (base64-encoded for GET PKIOperation)
responses:
"200":
description: |
Success. Content-Type varies by operation:
- `GetCACaps` → `text/plain` capability list
- `GetCACert` (single cert) → `application/x-x509-ca-cert` (raw DER)
- `GetCACert` (chain) → `application/x-x509-ca-ra-cert` (PKCS#7)
- `PKIOperation` → `application/x-pki-message` (PKCS#7 SignedData)
content:
text/plain:
schema:
type: string
description: "SCEP capabilities (GetCACaps only)"
application/x-x509-ca-cert:
schema:
type: string
format: binary
description: "CA certificate DER (GetCACert single)"
application/x-x509-ca-ra-cert:
schema:
type: string
format: binary
description: "CA chain PKCS#7 (GetCACert chain)"
application/x-pki-message:
schema:
type: string
format: binary
description: "PKCS#7 SignedData response (PKIOperation)"
"400":
$ref: "#/components/responses/BadRequest"
"500":
$ref: "#/components/responses/InternalError"
post:
tags: [SCEP]
summary: SCEP PKIOperation (POST)
description: |
SCEP enrollment / renewal / revocation request per RFC 8894.
Request body is a PKCS#7 SignedData envelope wrapping the PKCS#10 CSR
or a degenerate raw CSR (fallback). The challenge password in the CSR
attributes is validated against `CERTCTL_SCEP_CHALLENGE_PASSWORD` when
configured.
operationId: scepPost
security: []
parameters:
- name: operation
in: query
required: true
schema:
type: string
enum: [PKIOperation]
requestBody:
required: true
description: PKCS#7 SignedData envelope wrapping a PKCS#10 CSR (or raw CSR as fallback)
content:
application/x-pki-message:
schema:
type: string
format: binary
responses:
"200":
description: PKCS#7 SignedData PKIMessage response
content:
application/x-pki-message:
schema:
type: string
format: binary
"400":
$ref: "#/components/responses/BadRequest"
"500":
$ref: "#/components/responses/InternalError"
# ═══════════════════════════════════════════════════════════════════════
components:
securitySchemes:
@@ -3006,6 +3459,7 @@ components:
DeploymentTarget:
type: object
required: [name, type, agent_id]
properties:
id:
type: string
@@ -3015,6 +3469,12 @@ components:
$ref: "#/components/schemas/TargetType"
agent_id:
type: string
description: |
ID of the agent that manages this target. Required because
deployment_targets.agent_id is a NOT NULL foreign key to agents(id)
(migration 000001). Empty or nonexistent agent IDs are rejected
with HTTP 400 by the service layer (see C-002 in the coverage-gap
audit).
config:
type: object
description: Target-specific configuration (varies by type)
@@ -3059,6 +3519,85 @@ components:
type: string
version:
type: string
retired_at:
type: string
format: date-time
nullable: true
description: |
I-004: soft-retirement timestamp. `null` (or field absent) means the
agent is active. A non-null value is the canonical "retired" state —
the operational `status` column is preserved at retirement time as
the last-seen value, but `retired_at` is the source of truth for
filtering agents out of active listings.
retired_reason:
type: string
nullable: true
description: |
I-004: human-readable reason captured at retirement time. Only set
when the agent was retired via `?force=true&reason=...` cascade; a
default soft-retire leaves this field null.
AgentDependencyCounts:
type: object
description: |
I-004: preflight counts of active downstream rows that would be
orphaned by retiring an agent. Returned in the 409
`blocked_by_dependencies` body so the operator UI can tell the user
which bucket is blocking the retire, and also in the 200 response
body on a successful `?force=true` cascade as a snapshot of what
was cascaded.
properties:
active_targets:
type: integer
description: Deployment targets with this agent assigned and retired_at IS NULL
active_certificates:
type: integer
description: Certificates currently deployed via one of this agent's active targets
pending_jobs:
type: integer
description: Jobs with agent_id=this in status Pending, AwaitingCSR, AwaitingApproval, or Running
RetireAgentResponse:
type: object
description: |
I-004: response body for a successful retire on DELETE /api/v1/agents/{id}.
Returned on both clean retires (cascade=false, zero counts) and
force-cascade retires (cascade=true, counts snapshot of the
pre-cascade dependency state). The 204 idempotent-retire path does
NOT emit this body — re-retiring an already-retired agent returns
an empty response.
properties:
retired_at:
type: string
format: date-time
already_retired:
type: boolean
description: |
Always false on the 200 response — the already-retired path
returns 204 No Content with no body. Surfaced in the schema
only so downstream consumers have a complete field map.
cascade:
type: boolean
description: True when the retire was invoked with ?force=true
counts:
$ref: "#/components/schemas/AgentDependencyCounts"
BlockedByDependenciesResponse:
type: object
description: |
I-004: 409 response body for a retire request blocked by active
downstream dependencies. Returned when `force=true` is not set and
any of the three counts is non-zero. The operator UI renders these
counts so the human can retire or reassign the blocking rows
before re-running the retire, or tick the force checkbox to cascade.
properties:
error:
type: string
example: blocked_by_dependencies
message:
type: string
counts:
$ref: "#/components/schemas/AgentDependencyCounts"
WorkItem:
type: object
@@ -3141,6 +3680,7 @@ components:
- RequiredMetadata
- AllowedEnvironments
- RenewalLeadTime
- CertificateLifetime
PolicySeverity:
type: string
@@ -3160,6 +3700,9 @@ components:
description: Policy-specific configuration (varies by type)
enabled:
type: boolean
severity:
$ref: "#/components/schemas/PolicySeverity"
description: Severity level applied to violations of this rule. Defaults to Warning on create when omitted.
created_at:
type: string
format: date-time
@@ -3805,3 +4348,47 @@ components:
type: string
format: date-time
description: Timestamp of this probe
# ─── Verification (M25) ──────────────────────────────────────────
VerifyDeploymentRequest:
type: object
required: [target_id, expected_fingerprint, actual_fingerprint, verified]
properties:
target_id:
type: string
description: Deployment target the agent probed
expected_fingerprint:
type: string
description: SHA-256 fingerprint of the certificate that should be served (hex, lowercase)
actual_fingerprint:
type: string
description: SHA-256 fingerprint observed on the live TLS endpoint (hex, lowercase)
verified:
type: boolean
description: True when expected and actual fingerprints match
error:
type: string
nullable: true
description: Error message when probe failed or fingerprints differ
VerificationResult:
type: object
properties:
job_id:
type: string
target_id:
type: string
expected_fingerprint:
type: string
description: SHA-256 fingerprint (hex) of the certificate deployed by this job
actual_fingerprint:
type: string
description: SHA-256 fingerprint (hex) observed on the live TLS endpoint
verified:
type: boolean
verified_at:
type: string
format: date-time
error:
type: string
description: Error message when verification failed
+100
View File
@@ -12,6 +12,7 @@ import (
"crypto/x509/pkix"
"encoding/json"
"encoding/pem"
"errors"
"flag"
"fmt"
"io"
@@ -23,6 +24,7 @@ import (
"path/filepath"
"runtime"
"strings"
"sync"
"syscall"
"time"
@@ -53,6 +55,16 @@ type AgentConfig struct {
DiscoveryDirs []string // Directories to scan for certificates (comma-separated via env)
}
// ErrAgentRetired is the sentinel returned by [Agent.Run] when the control
// plane responds with HTTP 410 Gone to a heartbeat or work-poll request — the
// canonical signal that this agent's row has been soft-retired server-side
// (see I-004 in cowork/certctl-coverage-gap-audit.md). The binary must
// terminate cleanly: an init-system restart would only produce another 410
// and wedge the host in a restart loop. main() translates this sentinel into
// a zero exit code so systemd (Restart=on-failure) and launchd do not respawn
// the process. Do not wrap this error — main() matches it with errors.Is.
var ErrAgentRetired = fmt.Errorf("agent retired by control plane")
// Agent represents the local agent that runs on target servers.
// It periodically sends heartbeats, polls for work, executes deployment and CSR jobs,
// and scans configured directories for existing certificates.
@@ -68,6 +80,17 @@ type Agent struct {
pollInterval time.Duration
discoveryInterval time.Duration
consecutiveFailures int
// I-004: terminal retirement signal. retiredSignal is closed exactly once
// (guarded by retiredOnce) when either sendHeartbeat or pollForWork
// observes HTTP 410 Gone. The Run() select loop picks up the close and
// returns ErrAgentRetired, unwinding the goroutine cleanly so main() can
// log + exit(0). Using a channel + sync.Once (rather than an atomic bool
// + polling) lets us fall through the select statement immediately instead
// of waiting for the next ticker; the zero-allocation close is safe to
// race with ctx.Done() and other cases.
retiredOnce sync.Once
retiredSignal chan struct{}
}
// WorkResponse represents the response from the work polling endpoint.
@@ -98,9 +121,31 @@ func NewAgent(cfg *AgentConfig, logger *slog.Logger) *Agent {
heartbeatInterval: 60 * time.Second,
pollInterval: 30 * time.Second,
discoveryInterval: 6 * time.Hour, // scan for certs every 6 hours
retiredSignal: make(chan struct{}),
}
}
// markRetired records that the control plane has declared this agent retired
// (HTTP 410 Gone on heartbeat or work poll). Idempotent via sync.Once — if
// both the heartbeat and work-poll paths observe 410 in the same tick, only
// the first close() runs and we avoid a runtime panic. Emits an ERROR-level
// log line so init-system journaling captures it prominently, and includes
// the source (heartbeat/work_poll), response body, and status code so the
// operator can verify it's a genuine retirement signal rather than a
// misrouted request. After this returns, the select-loop case in Run()
// observes the closed channel on its next iteration and returns
// ErrAgentRetired.
func (a *Agent) markRetired(source string, statusCode int, body string) {
a.retiredOnce.Do(func() {
a.logger.Error("agent has been retired by control plane — shutting down",
"source", source,
"status", statusCode,
"body", body,
"agent_id", a.config.AgentID)
close(a.retiredSignal)
})
}
// Run starts the agent's main loop.
// It sends heartbeats, polls for work, and handles graceful shutdown via context cancellation.
func (a *Agent) Run(ctx context.Context) error {
@@ -154,6 +199,19 @@ func (a *Agent) Run(ctx context.Context) error {
a.logger.Info("agent shutting down", "reason", ctx.Err())
return ctx.Err()
// I-004: retiredSignal is closed exactly once (via markRetired's
// sync.Once) when either sendHeartbeat or pollForWork observes HTTP 410
// Gone from the control plane. Falling through this case immediately
// (rather than waiting for the next ticker) lets the agent shut down
// quickly once retirement is confirmed — every extra heartbeat against a
// retired row is wasted work and noise in the audit trail. Returning
// ErrAgentRetired propagates up to main(), which matches it with
// errors.Is and exits(0) so systemd/launchd do not respawn the process.
case <-a.retiredSignal:
a.logger.Info("agent retired signal received — exiting event loop",
"agent_id", a.config.AgentID)
return ErrAgentRetired
case <-heartbeatTicker.C:
a.sendHeartbeat(ctx)
@@ -209,6 +267,22 @@ func (a *Agent) sendHeartbeat(ctx context.Context) {
}
defer resp.Body.Close()
// I-004: HTTP 410 Gone is the terminal signal from the control plane that
// this agent's row has been soft-retired (see internal/api/handler/agent.go
// heartbeat path + AgentRetirementService). Treat it separately from the
// generic non-200 error branch: record the event to markRetired (which closes
// retiredSignal exactly once via sync.Once) and return without bumping
// consecutiveFailures — this is not a transient failure, it's a clean
// shutdown. The Run() select loop picks up the closed channel on its next
// iteration and returns ErrAgentRetired, which main() translates into an
// exit(0) so systemd/launchd don't respawn the process into another 410
// loop.
if resp.StatusCode == http.StatusGone {
body, _ := io.ReadAll(resp.Body)
a.markRetired("heartbeat", resp.StatusCode, string(body))
return
}
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
a.logger.Error("heartbeat rejected",
@@ -237,6 +311,19 @@ func (a *Agent) pollForWork(ctx context.Context) {
}
defer resp.Body.Close()
// I-004: same terminal-retirement handling as sendHeartbeat. Work-poll is the
// other hot path that can observe an agent's soft-retirement; if the
// heartbeat tick happens to fire after a work-poll tick within the same
// retirement window, this branch catches it first. markRetired's sync.Once
// guards idempotency so racing both paths in the same tick only closes the
// signal channel once. No consecutiveFailures increment — retirement is
// not a transient failure.
if resp.StatusCode == http.StatusGone {
body, _ := io.ReadAll(resp.Body)
a.markRetired("work_poll", resp.StatusCode, string(body))
return
}
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
a.logger.Error("work poll rejected",
@@ -1117,6 +1204,19 @@ func main() {
cancel()
<-errChan
case err := <-errChan:
// I-004: ErrAgentRetired is a terminal, *clean* shutdown — the control
// plane responded HTTP 410 Gone on heartbeat/work-poll, meaning this
// agent's row has been soft-retired and will never be reachable again.
// Exit 0 so systemd's Restart=on-failure and launchd's KeepAlive do NOT
// respawn the process into another 410 loop (which would wedge the host
// and spam the control plane). Operators can observe the retirement via
// audit_events or the AgentsPage retired tab; the terminal log line on
// the way out is enough for post-mortem forensics.
if errors.Is(err, ErrAgentRetired) {
logger.Info("agent retired by control plane — exiting without restart",
"agent_id", agentCfg.AgentID)
return
}
if err != context.Canceled {
logger.Error("agent error", "error", err)
os.Exit(1)
+38 -4
View File
@@ -27,14 +27,17 @@ Commands:
certs renew ID Trigger certificate renewal
certs revoke ID Revoke a certificate
agents list List agents
agents get ID Get agent details
agents list List agents (add --retired to list soft-retired agents)
agents get ID Get agent details
agents retire ID Soft-retire an agent (add --force --reason "…" to cascade)
jobs list List jobs
jobs get ID Get job details
jobs cancel ID Cancel a pending job
import FILE Bulk import certificates from PEM file(s)
Required: --owner-id, --team-id, --renewal-policy-id, --issuer-id
Optional: --name-template (default {cn}), --environment (default imported)
status Show server health + summary stats
version Show CLI version
@@ -138,9 +141,19 @@ func handleCerts(client *cli.Client, args []string) error {
}
}
// handleAgents dispatches the `agents` subcommands.
//
// I-004 additions:
//
// agents list --retired — hit the opt-in /agents/retired endpoint
// instead of the default listing (which
// filters retired rows out).
// agents retire <id> — soft-retire an agent (DELETE /agents/{id}).
// --force cascades; --reason is required with
// --force (mirrors ErrForceReasonRequired).
func handleAgents(client *cli.Client, args []string) error {
if len(args) == 0 {
fmt.Fprintf(os.Stderr, "usage: agents <list|get> [options]\n")
fmt.Fprintf(os.Stderr, "usage: agents <list|get|retire> [options]\n")
return nil
}
@@ -149,13 +162,34 @@ func handleAgents(client *cli.Client, args []string) error {
switch subcommand {
case "list":
return client.ListAgents(subArgs)
// --retired flag splits to a separate endpoint. We intercept it
// client-side and strip it before delegating, so both code paths
// share the --page/--per-page flag parsing inside the client.
retired := false
rest := make([]string, 0, len(subArgs))
for _, a := range subArgs {
if a == "--retired" {
retired = true
continue
}
rest = append(rest, a)
}
if retired {
return client.ListRetiredAgents(rest)
}
return client.ListAgents(rest)
case "get":
if len(subArgs) == 0 {
fmt.Fprintf(os.Stderr, "usage: agents get <id>\n")
return nil
}
return client.GetAgent(subArgs[0])
case "retire":
if len(subArgs) == 0 {
fmt.Fprintf(os.Stderr, "usage: agents retire <id> [--force] [--reason <reason>]\n")
return nil
}
return client.RetireAgent(subArgs)
default:
fmt.Fprintf(os.Stderr, "unknown subcommand: agents %s\n", subcommand)
return nil
+226 -25
View File
@@ -9,6 +9,7 @@ import (
"os"
"os/signal"
"strconv"
"strings"
"syscall"
"time"
@@ -16,7 +17,6 @@ import (
"github.com/shankar0123/certctl/internal/api/middleware"
"github.com/shankar0123/certctl/internal/api/router"
"github.com/shankar0123/certctl/internal/config"
"github.com/shankar0123/certctl/internal/crypto"
"github.com/shankar0123/certctl/internal/domain"
discoveryawssm "github.com/shankar0123/certctl/internal/connector/discovery/awssm"
discoveryazurekv "github.com/shankar0123/certctl/internal/connector/discovery/azurekv"
@@ -82,14 +82,60 @@ func main() {
logger.Info("initialized all repositories")
// Initialize dynamic issuer registry.
// Issuers are loaded from the database (with AES-GCM encrypted config).
// Issuers are loaded from the database (with AES-256-GCM encrypted config).
// On first boot with an empty database, env var issuers are seeded automatically.
var encryptionKey []byte
if cfg.Encryption.ConfigEncryptionKey != "" {
encryptionKey = crypto.DeriveKey(cfg.Encryption.ConfigEncryptionKey)
logger.Info("config encryption enabled (AES-256-GCM)")
//
// M-8 (CWE-916 / CWE-329): the encryption passphrase is passed as a raw
// string into IssuerService / TargetService / IssuerRegistry. Each call to
// crypto.EncryptIfKeySet generates a fresh 16-byte PBKDF2 salt and emits a
// v2 blob (magic 0x02 || salt || nonce || sealed). Decryption auto-detects
// v1 legacy blobs (no magic) and falls back to the fixed v1 salt for
// backward compatibility; v1 blobs transparently upgrade to v2 on next
// write. DO NOT pre-derive the key here with crypto.DeriveKey — that was
// the v1 fixed-salt behaviour that M-8 removes.
encryptionKey := cfg.Encryption.ConfigEncryptionKey
if encryptionKey != "" {
logger.Info("config encryption enabled (AES-256-GCM, per-ciphertext PBKDF2 salt)")
} else {
logger.Warn("CERTCTL_CONFIG_ENCRYPTION_KEY not set — issuer configs stored in plaintext (not recommended for production)")
// C-2 fix: fail closed at startup when database-sourced issuer or target
// rows exist without a configured encryption key. Previously the server
// would emit a one-line warning and silently persist new GUI-created
// configs as plaintext (CWE-311). Refuse to start instead: the operator
// must either configure CERTCTL_CONFIG_ENCRYPTION_KEY or remove the
// vulnerable rows before the control plane can boot.
ctx := context.Background()
dbIssuers, ierr := issuerRepo.List(ctx)
if ierr != nil {
logger.Error("startup check: failed to list issuers", "error", ierr)
os.Exit(1)
}
dbTargets, terr := targetRepo.List(ctx)
if terr != nil {
logger.Error("startup check: failed to list targets", "error", terr)
os.Exit(1)
}
var dbIssuerCount, dbTargetCount int
for _, iss := range dbIssuers {
if iss != nil && iss.Source == "database" {
dbIssuerCount++
}
}
for _, tgt := range dbTargets {
if tgt != nil && tgt.Source == "database" {
dbTargetCount++
}
}
if dbIssuerCount > 0 || dbTargetCount > 0 {
logger.Error(
"startup refused: CERTCTL_CONFIG_ENCRYPTION_KEY is not set but database-sourced configs exist "+
"(would expose sensitive fields as plaintext, CWE-311). "+
"Set the encryption key or remove the affected rows before restarting.",
"database_sourced_issuers", dbIssuerCount,
"database_sourced_targets", dbTargetCount,
)
os.Exit(1)
}
logger.Warn("CERTCTL_CONFIG_ENCRYPTION_KEY not set — env-seeded issuers will be stored in plaintext; GUI-created issuers and targets will be rejected until a key is configured")
}
issuerRegistry := service.NewIssuerRegistry(logger)
@@ -100,6 +146,7 @@ func main() {
// Initialize services (following the dependency graph)
auditService := service.NewAuditService(auditRepo)
policyService := service.NewPolicyService(policyRepo, auditService)
policyService.SetCertRepo(certificateRepo) // D-008: CertificateLifetime arm needs CertificateVersion.NotBefore/NotAfter
certificateService := service.NewCertificateService(certificateRepo, policyService, auditService)
notifierRegistry := make(map[string]service.Notifier)
@@ -177,7 +224,10 @@ func main() {
renewalService := service.NewRenewalService(certificateRepo, jobRepo, renewalPolicyRepo, profileRepo, auditService, notificationService, issuerRegistry, cfg.Keygen.Mode)
renewalService.SetTargetRepo(targetRepo)
deploymentService := service.NewDeploymentService(jobRepo, targetRepo, agentRepo, certificateRepo, auditService, notificationService)
jobService := service.NewJobService(jobRepo, renewalService, deploymentService, logger)
jobService := service.NewJobService(jobRepo, certificateRepo, ownerRepo, renewalService, deploymentService, logger)
// I-001: emit "job_retry" audit events when the scheduler resets Failed→Pending.
// SetAuditService is optional — JobService falls back to nil-guarded no-op if unwired.
jobService.SetAuditService(auditService)
agentService := service.NewAgentService(agentRepo, certificateRepo, jobRepo, targetRepo, auditService, issuerRegistry, renewalService)
agentService.SetProfileRepo(profileRepo)
issuerService := service.NewIssuerService(issuerRepo, auditService, issuerRegistry, encryptionKey, logger)
@@ -208,9 +258,15 @@ func main() {
Name: "Network Scanner (Server-Side)",
Status: domain.AgentStatusOnline,
}
if err := agentRepo.Create(context.Background(), sentinelAgent); err != nil {
// Ignore duplicate key errors (agent already exists)
logger.Debug("sentinel agent creation", "status", "exists or created", "id", service.SentinelAgentID)
// M-6: use CreateIfNotExists so duplicate rows on restart/upgrade are
// idempotent without swallowing unrelated DB failures (CWE-662).
created, err := agentRepo.CreateIfNotExists(context.Background(), sentinelAgent)
if err != nil {
logger.Error("sentinel agent creation failed", "id", service.SentinelAgentID, "error", err)
} else if created {
logger.Info("sentinel agent created", "id", service.SentinelAgentID)
} else {
logger.Debug("sentinel agent already exists", "id", service.SentinelAgentID)
}
}
@@ -229,8 +285,14 @@ func main() {
Name: "AWS Secrets Manager Discovery",
Status: domain.AgentStatusOnline,
}
if err := agentRepo.Create(context.Background(), sentinelAWS); err != nil {
logger.Debug("sentinel agent creation", "status", "exists or created", "id", service.SentinelAWSSecretsMgr)
// M-6: idempotent create (CWE-662).
created, err := agentRepo.CreateIfNotExists(context.Background(), sentinelAWS)
if err != nil {
logger.Error("sentinel agent creation failed", "id", service.SentinelAWSSecretsMgr, "error", err)
} else if created {
logger.Info("sentinel agent created", "id", service.SentinelAWSSecretsMgr)
} else {
logger.Debug("sentinel agent already exists", "id", service.SentinelAWSSecretsMgr)
}
}
@@ -248,8 +310,14 @@ func main() {
Name: "Azure Key Vault Discovery",
Status: domain.AgentStatusOnline,
}
if err := agentRepo.Create(context.Background(), sentinelAzure); err != nil {
logger.Debug("sentinel agent creation", "status", "exists or created", "id", service.SentinelAzureKeyVault)
// M-6: idempotent create (CWE-662).
created, err := agentRepo.CreateIfNotExists(context.Background(), sentinelAzure)
if err != nil {
logger.Error("sentinel agent creation failed", "id", service.SentinelAzureKeyVault, "error", err)
} else if created {
logger.Info("sentinel agent created", "id", service.SentinelAzureKeyVault)
} else {
logger.Debug("sentinel agent already exists", "id", service.SentinelAzureKeyVault)
}
}
@@ -262,8 +330,14 @@ func main() {
Name: "GCP Secret Manager Discovery",
Status: domain.AgentStatusOnline,
}
if err := agentRepo.Create(context.Background(), sentinelGCP); err != nil {
logger.Debug("sentinel agent creation", "status", "exists or created", "id", service.SentinelGCPSecretMgr)
// M-6: idempotent create (CWE-662).
created, err := agentRepo.CreateIfNotExists(context.Background(), sentinelGCP)
if err != nil {
logger.Error("sentinel agent creation failed", "id", service.SentinelGCPSecretMgr, "error", err)
} else if created {
logger.Info("sentinel agent created", "id", service.SentinelGCPSecretMgr)
} else {
logger.Debug("sentinel agent already exists", "id", service.SentinelGCPSecretMgr)
}
}
@@ -367,6 +441,10 @@ func main() {
// Configure scheduler intervals from config
sched.SetRenewalCheckInterval(cfg.Scheduler.RenewalCheckInterval)
sched.SetJobProcessorInterval(cfg.Scheduler.JobProcessorInterval)
// I-001: drive the failed-job retry loop. Runs on start + every RetryInterval
// (default 5m, CERTCTL_SCHEDULER_RETRY_INTERVAL). Kept adjacent to the job
// processor setter because they share the JobServicer dependency.
sched.SetJobRetryInterval(cfg.Scheduler.RetryInterval)
sched.SetAgentHealthCheckInterval(cfg.Scheduler.AgentHealthCheckInterval)
sched.SetNotificationProcessInterval(cfg.Scheduler.NotificationProcessInterval)
if cfg.NetworkScan.Enabled {
@@ -391,6 +469,17 @@ func main() {
"sources", cloudDiscoveryService.SourceCount())
}
// Wire job timeout reaper (I-003)
sched.SetJobReaperService(jobService)
sched.SetJobTimeoutInterval(cfg.Scheduler.JobTimeoutInterval)
sched.SetAwaitingCSRTimeout(cfg.Scheduler.AwaitingCSRTimeout)
sched.SetAwaitingApprovalTimeout(cfg.Scheduler.AwaitingApprovalTimeout)
logger.Info("job timeout reaper enabled",
"interval", cfg.Scheduler.JobTimeoutInterval.String(),
"csr_timeout", cfg.Scheduler.AwaitingCSRTimeout.String(),
"approval_timeout", cfg.Scheduler.AwaitingApprovalTimeout.String())
// Start scheduler
logger.Info("starting scheduler")
startedChan := sched.Start(ctx)
@@ -445,6 +534,24 @@ func main() {
// Register SCEP (RFC 8894) handlers if enabled
if cfg.SCEP.Enabled {
// H-2 fix: fail closed at startup when SCEP is enabled without a
// challenge password configured. Previously the service-layer guard
// at internal/service/scep.go:72-79 skipped the password check when
// s.challengePassword == "", meaning any client that could reach the
// /scep endpoint could enroll an arbitrary CSR against the configured
// issuer (CWE-306, missing authentication for a critical function).
// Refuse to start instead: the operator must set
// CERTCTL_SCEP_CHALLENGE_PASSWORD (or disable SCEP) before the control
// plane can boot.
if err := preflightSCEPChallengePassword(cfg.SCEP.Enabled, cfg.SCEP.ChallengePassword); err != nil {
logger.Error(
"startup refused: SCEP is enabled but CERTCTL_SCEP_CHALLENGE_PASSWORD is not set "+
"(would allow unauthenticated certificate enrollment, CWE-306). "+
"Set a non-empty challenge password or disable SCEP before restarting.",
"error", err,
)
os.Exit(1)
}
issuerConn, ok := issuerRegistry.Get(cfg.SCEP.IssuerID)
if !ok {
logger.Error("SCEP issuer not found in registry", "issuer_id", cfg.SCEP.IssuerID)
@@ -464,13 +571,63 @@ func main() {
"endpoints", "/scep?operation={GetCACaps,GetCACert,PKIOperation}")
}
// Register RFC 5280 CRL and RFC 6960 OCSP handlers under /.well-known/pki/.
// These are always enabled (no config gate) — revocation data must be
// reachable to relying parties for any cert certctl issues. The finalHandler
// routing gate below strips auth middleware for this prefix so browsers,
// OpenSSL, OCSP stapling sidecars, and mTLS clients can fetch without
// presenting certctl Bearer tokens.
apiRouter.RegisterPKIHandlers(certificateHandler)
logger.Info("PKI endpoints registered",
"endpoints", "/.well-known/pki/{crl/{issuer_id},ocsp/{issuer_id}/{serial}}")
logger.Info("registered all API handlers")
// Build middleware stack
authMiddleware := middleware.NewAuth(middleware.AuthConfig{
Type: cfg.Auth.Type,
Secret: cfg.Auth.Secret,
})
// Build middleware stack.
//
// Authentication unification (M-002): every authenticated request now
// carries a named actor in the request context so audit events record
// the real key identity instead of the hardcoded "api-key-user" string.
// Named keys come from CERTCTL_API_KEYS_NAMED (preferred). For backward
// compatibility CERTCTL_AUTH_SECRET is synthesized into legacy-key-N
// entries with Admin=false.
var namedKeys []middleware.NamedAPIKey
if cfg.Auth.Type != "none" {
// Translate typed config.NamedAPIKey -> middleware.NamedAPIKey. The
// two structs are field-compatible but live in different packages to
// preserve the config→middleware dependency direction.
for _, nk := range cfg.Auth.NamedKeys {
namedKeys = append(namedKeys, middleware.NamedAPIKey{
Name: nk.Name,
Key: nk.Key,
Admin: nk.Admin,
})
}
// Back-compat: if no named keys but legacy Secret is configured,
// synthesize named entries so the audit trail still attributes the
// action (instead of falling back to "api-key-user" / "anonymous").
if len(namedKeys) == 0 && cfg.Auth.Secret != "" {
parts := strings.Split(cfg.Auth.Secret, ",")
idx := 0
for _, p := range parts {
p = strings.TrimSpace(p)
if p == "" {
continue
}
namedKeys = append(namedKeys, middleware.NamedAPIKey{
Name: fmt.Sprintf("legacy-key-%d", idx),
Key: p,
Admin: false,
})
idx++
}
if len(namedKeys) > 0 {
logger.Warn("CERTCTL_AUTH_SECRET is deprecated — set CERTCTL_API_KEYS_NAMED for named actor attribution and admin gating",
"synthesized_keys", len(namedKeys))
}
}
}
authMiddleware := middleware.NewAuthWithNamedKeys(namedKeys)
corsMiddleware := middleware.NewCORS(middleware.CORSConfig{
AllowedOrigins: cfg.CORS.AllowedOrigins,
})
@@ -502,7 +659,7 @@ func main() {
bodyLimitMiddleware,
corsMiddleware,
authMiddleware,
auditMiddleware,
auditMiddleware.Middleware,
}
// Add rate limiter if enabled
@@ -519,7 +676,7 @@ func main() {
rateLimiter,
corsMiddleware,
authMiddleware,
auditMiddleware,
auditMiddleware.Middleware,
}
logger.Info("rate limiting enabled", "rps", cfg.RateLimit.RPS, "burst", cfg.RateLimit.BurstSize)
}
@@ -566,6 +723,14 @@ func main() {
noAuthHandler.ServeHTTP(w, r)
return
}
// RFC 5280 CRL and RFC 6960 OCSP live under /.well-known/pki/ and
// MUST be served unauthenticated — relying parties (browsers,
// OpenSSL, OCSP stapling sidecars, mTLS clients) cannot present
// certctl Bearer tokens. See router.RegisterPKIHandlers.
if len(path) >= 16 && path[:16] == "/.well-known/pki" {
noAuthHandler.ServeHTTP(w, r)
return
}
// All other API and EST routes go through the full middleware stack (with auth)
if (len(path) >= 8 && path[:8] == "/api/v1/") ||
(len(path) >= 16 && path[:16] == "/.well-known/est") {
@@ -582,13 +747,18 @@ func main() {
})
logger.Info("dashboard available at /", "web_dir", webDir)
} else {
// No dashboard: route health/auth-info without auth, everything else through full stack
// No dashboard: route health/auth-info and /.well-known/pki without
// auth, everything else through full stack.
finalHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
path := r.URL.Path
if path == "/health" || path == "/ready" || path == "/api/v1/auth/info" {
noAuthHandler.ServeHTTP(w, r)
return
}
if len(path) >= 16 && path[:16] == "/.well-known/pki" {
noAuthHandler.ServeHTTP(w, r)
return
}
apiHandler.ServeHTTP(w, r)
})
logger.Info("dashboard directory not found, serving API only")
@@ -637,6 +807,17 @@ func main() {
logger.Error("HTTP server shutdown error", "error", err)
}
// Drain in-flight audit-recording goroutines before closing the DB pool.
// The audit middleware spawns one goroutine per non-excluded request; those
// goroutines run detached from the request context and write to the
// audit_events table via the same *sql.DB. Without this drain, SIGTERM
// would close the DB pool while recordings were mid-flight, silently
// dropping audit events (M-1, CWE-662 / CWE-400).
logger.Info("flushing audit middleware in-flight recordings")
if err := auditMiddleware.Flush(shutdownCtx); err != nil {
logger.Warn("audit middleware flush did not complete in time", "error", err)
}
// Close database connection
if err := db.Close(); err != nil {
logger.Error("error closing database connection", "error", err)
@@ -645,3 +826,23 @@ func main() {
logger.Info("certctl server stopped")
}
// preflightSCEPChallengePassword enforces the H-2 fix: if SCEP is enabled, a
// non-empty challenge password MUST be configured. Returns a non-nil error
// otherwise so the caller can refuse to start the control plane (CWE-306,
// missing authentication for a critical function).
//
// This helper is extracted so the check can be unit tested without booting
// the full server. The caller (main) is responsible for translating the
// returned error into a structured log line and os.Exit(1).
func preflightSCEPChallengePassword(enabled bool, challengePassword string) error {
if !enabled {
return nil
}
if challengePassword == "" {
return fmt.Errorf("SCEP enabled but CERTCTL_SCEP_CHALLENGE_PASSWORD is empty: " +
"SCEP enrollment would accept any client (CWE-306); " +
"configure a non-empty shared secret or set CERTCTL_SCEP_ENABLED=false")
}
return nil
}
+66
View File
@@ -7,6 +7,7 @@ import (
"net/http"
"net/http/httptest"
"os"
"strings"
"testing"
"github.com/shankar0123/certctl/internal/api/middleware"
@@ -538,3 +539,68 @@ func TestMain_ContextPropagation(t *testing.T) {
t.Logf("Context value may not be propagated (status %d), this may be expected", w.Code)
}
}
// TestPreflightSCEPChallengePassword is the H-2 regression guard for the
// startup pre-flight check. The helper MUST return a non-nil error whenever
// SCEP is enabled with an empty challenge password — that configuration
// previously allowed unauthenticated certificate enrollment (CWE-306).
// Disabled-SCEP and configured-password cases must pass cleanly.
func TestPreflightSCEPChallengePassword(t *testing.T) {
tests := []struct {
name string
enabled bool
challengePassword string
wantErr bool
wantErrSubstring string
}{
{
name: "disabled_empty_password_ok",
enabled: false,
challengePassword: "",
wantErr: false,
},
{
name: "disabled_with_password_ok",
enabled: false,
challengePassword: "leftover-value",
wantErr: false,
},
{
name: "enabled_empty_password_rejected",
enabled: true,
challengePassword: "",
wantErr: true,
wantErrSubstring: "CERTCTL_SCEP_CHALLENGE_PASSWORD",
},
{
name: "enabled_with_password_ok",
enabled: true,
challengePassword: "hunter2",
wantErr: false,
},
{
name: "enabled_single_char_password_ok",
enabled: true,
challengePassword: "x",
wantErr: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := preflightSCEPChallengePassword(tt.enabled, tt.challengePassword)
if tt.wantErr {
if err == nil {
t.Fatalf("expected error, got nil")
}
if tt.wantErrSubstring != "" && !strings.Contains(err.Error(), tt.wantErrSubstring) {
t.Errorf("expected error to mention %q, got: %v", tt.wantErrSubstring, err)
}
if !strings.Contains(err.Error(), "CWE-306") {
t.Errorf("expected error to cite CWE-306 for traceability, got: %v", err)
}
} else if err != nil {
t.Errorf("expected no error, got: %v", err)
}
})
}
}
+19
View File
@@ -9,6 +9,16 @@ services:
build:
context: ..
dockerfile: Dockerfile
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Node frontend stage and Go module
# download can reach the public registries behind corporate proxies.
# Defaults to empty; omit the variables from the host environment for
# un-proxied builds and the behaviour is byte-identical to the pre-fix
# tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
environment:
# Verbose logging for development
CERTCTL_LOG_LEVEL: debug
@@ -29,6 +39,15 @@ services:
build:
context: ..
dockerfile: Dockerfile.agent
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Go module download stage can reach
# the public Go module proxy behind corporate proxies. Defaults to
# empty; omit the variables from the host environment for un-proxied
# builds and the behaviour is byte-identical to the pre-fix tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
environment:
CERTCTL_LOG_LEVEL: debug
+19
View File
@@ -150,6 +150,16 @@ services:
build:
context: ..
dockerfile: Dockerfile
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Node frontend stage and Go module
# download can reach the public registries behind corporate proxies.
# Defaults to empty; omit the variables from the host environment for
# un-proxied builds and the behaviour is byte-identical to the pre-fix
# tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-test-server
depends_on:
postgres:
@@ -266,6 +276,15 @@ services:
build:
context: ..
dockerfile: Dockerfile.agent
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Go module download stage can reach
# the public Go module proxy behind corporate proxies. Defaults to
# empty; omit the variables from the host environment for un-proxied
# builds and the behaviour is byte-identical to the pre-fix tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-test-agent
depends_on:
certctl-server:
+19
View File
@@ -36,6 +36,16 @@ services:
build:
context: ..
dockerfile: Dockerfile
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Node frontend stage and Go module
# download can reach the public registries behind corporate proxies.
# Defaults to empty; omit the variables from the host environment for
# un-proxied builds and the behaviour is byte-identical to the pre-fix
# tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-server
depends_on:
postgres:
@@ -75,6 +85,15 @@ services:
build:
context: ..
dockerfile: Dockerfile.agent
# Proxy propagation (M-4, Issue #9) — forwards host shell's proxy env
# vars into the Docker build so the Go module download stage can reach
# the public Go module proxy behind corporate proxies. Defaults to
# empty; omit the variables from the host environment for un-proxied
# builds and the behaviour is byte-identical to the pre-fix tree.
args:
HTTP_PROXY: ${HTTP_PROXY:-}
HTTPS_PROXY: ${HTTPS_PROXY:-}
NO_PROXY: ${NO_PROXY:-}
container_name: certctl-agent
depends_on:
certctl-server:
+37 -19
View File
@@ -195,16 +195,11 @@ type metricsResponse struct {
Uptime float64 `json:"uptime_seconds"`
}
// crlResponse for the CRL endpoint.
type crlResponse struct {
Version int `json:"version"`
Total int `json:"total"`
Entries []struct {
Serial string `json:"serial_number"`
Reason string `json:"reason"`
RevokedAt string `json:"revoked_at"`
} `json:"entries"`
}
// M-006: The non-standard JSON CRL endpoint (`GET /api/v1/crl`) was removed.
// RFC 5280 §5 defines only the DER wire format, which is now served
// unauthenticated at `/.well-known/pki/crl/{issuer_id}` per RFC 8615.
// The `crlResponse` Go struct that used to decode the JSON envelope is gone;
// Phase 7 parses the DER bytes directly via `x509.ParseRevocationList`.
// ---------------------------------------------------------------------------
// PostgreSQL test helper
@@ -728,18 +723,41 @@ func TestIntegrationSuite(t *testing.T) {
t.Fatalf("revocation response unexpected: %s", body)
}
// Check CRL
t.Run("CRL", func(t *testing.T) {
resp, err := c.Get("/api/v1/crl")
// Check DER CRL served unauthenticated under /.well-known/pki/ per
// RFC 5280 §5 + RFC 8615 (M-006). Use a plain http.Get — no Bearer
// token — to prove the endpoint is reachable by relying parties that
// have no certctl API credentials.
t.Run("CRL_DER_Unauthenticated", func(t *testing.T) {
resp, err := http.Get(serverURL + "/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("GET CRL: %v", err)
t.Fatalf("GET DER CRL: %v", err)
}
var crl crlResponse
if err := decodeJSON(resp, &crl); err != nil {
t.Fatalf("decode CRL: %v", err)
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
t.Fatalf("unexpected status: got %d, want 200 (body=%s)", resp.StatusCode, string(body))
}
if crl.Total < 1 {
t.Fatalf("CRL total: got %d, want >= 1", crl.Total)
if ct := resp.Header.Get("Content-Type"); ct != "application/pkix-crl" {
t.Errorf("Content-Type: got %q, want %q", ct, "application/pkix-crl")
}
body, err := io.ReadAll(resp.Body)
if err != nil {
t.Fatalf("read CRL body: %v", err)
}
if len(body) == 0 {
t.Fatal("CRL body empty")
}
// Parse the DER bytes as an X.509 CRL (RFC 5280) and verify the
// just-revoked certificate is listed.
crl, err := x509.ParseRevocationList(body)
if err != nil {
t.Fatalf("parse DER CRL: %v", err)
}
if len(crl.RevokedCertificateEntries) < 1 {
t.Fatalf("CRL entries: got %d, want >= 1", len(crl.RevokedCertificateEntries))
}
})
+45 -8
View File
@@ -26,6 +26,7 @@
package integration_test
import (
"crypto/x509"
"database/sql"
"encoding/json"
"io"
@@ -434,10 +435,19 @@ func TestQA(t *testing.T) {
// ===================================================================
t.Run("Part03_CertCRUD", func(t *testing.T) {
t.Run("Create_Minimal", func(t *testing.T) {
// C-001 scope-expansion: the handler's ValidateRequired
// contract now gates common_name, owner_id, team_id,
// issuer_id, name, and renewal_policy_id. A 3-field
// payload would 400 regardless of the id hint, so the
// "minimal" variant carries every required field.
code, body := c.bodyStr(t, "POST", "/api/v1/certificates", `{
"id": "mc-qa-minimal",
"name": "qa-minimal",
"common_name": "qa-minimal.example.com",
"issuer_id": "iss-local"
"issuer_id": "iss-local",
"owner_id": "o-alice",
"team_id": "t-platform",
"renewal_policy_id": "rp-standard"
}`)
if code != 201 && code != 200 {
t.Fatalf("create cert: status %d, body: %s", code, body)
@@ -447,11 +457,14 @@ func TestQA(t *testing.T) {
t.Run("Create_Full", func(t *testing.T) {
code, body := c.bodyStr(t, "POST", "/api/v1/certificates", `{
"id": "mc-qa-full",
"name": "qa-full",
"common_name": "qa-full.example.com",
"sans": ["qa-full-alt.example.com"],
"issuer_id": "iss-local",
"environment": "staging",
"owner_id": "o-alice"
"owner_id": "o-alice",
"team_id": "t-platform",
"renewal_policy_id": "rp-standard"
}`)
if code != 201 && code != 200 {
t.Fatalf("create cert: status %d, body: %s", code, body)
@@ -596,13 +609,37 @@ func TestQA(t *testing.T) {
}
})
t.Run("CRL_JSON", func(t *testing.T) {
code, body := c.bodyStr(t, "GET", "/api/v1/crl", "")
if code != 200 {
t.Fatalf("CRL = %d", code)
// M-006: The non-standard JSON CRL endpoint was removed. RFC 5280 §5
// defines only the DER wire format, now served unauthenticated at
// `/.well-known/pki/crl/{issuer_id}` per RFC 8615. Use a plain
// http.Get — no Bearer — to prove the endpoint is reachable by
// relying parties with no API credentials.
t.Run("CRL_DER_Unauthenticated", func(t *testing.T) {
resp, err := http.Get(qaServerURL + "/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("GET DER CRL: %v", err)
}
if !strings.Contains(body, "entries") {
t.Fatalf("CRL response missing entries field")
defer resp.Body.Close()
if resp.StatusCode != 200 {
b, _ := io.ReadAll(resp.Body)
t.Fatalf("CRL = %d (body=%s)", resp.StatusCode, string(b))
}
if ct := resp.Header.Get("Content-Type"); ct != "application/pkix-crl" {
t.Errorf("Content-Type: got %q, want %q", ct, "application/pkix-crl")
}
body, err := io.ReadAll(resp.Body)
if err != nil {
t.Fatalf("read CRL body: %v", err)
}
if len(body) == 0 {
t.Fatal("CRL body empty")
}
crl, err := x509.ParseRevocationList(body)
if err != nil {
t.Fatalf("parse DER CRL: %v", err)
}
if len(crl.RevokedCertificateEntries) < 1 {
t.Fatalf("CRL entries: got %d, want >= 1", len(crl.RevokedCertificateEntries))
}
})
})
+15 -6
View File
@@ -608,13 +608,22 @@ else
fail "Revocation failed" "$REVOKE_RESP"
fi
info "Checking CRL..."
CRL_RESP=$(api_get "/api/v1/crl" 2>/dev/null || echo '{"total":0}')
CRL_TOTAL=$(echo "$CRL_RESP" | python3 -c "import sys,json; print(json.load(sys.stdin).get('total',0))" 2>/dev/null || echo 0)
if [ "$CRL_TOTAL" -ge 1 ]; then
pass "CRL contains $CRL_TOTAL revoked certificate(s)"
info "Checking DER CRL under /.well-known/pki (RFC 5280 §5, RFC 8615)..."
# The JSON CRL endpoint (`GET /api/v1/crl`) was removed in M-006. RFC 5280
# defines only the DER wire format, now served unauthenticated at
# `/.well-known/pki/crl/{issuer_id}`. Fetch without the Bearer header to
# prove the endpoint is reachable by relying parties with no API key.
CRL_TMP=$(mktemp)
CRL_HEADERS=$(mktemp)
CRL_HTTP_CODE=$(curl -s -o "$CRL_TMP" -D "$CRL_HEADERS" -w "%{http_code}" "${API_URL}/.well-known/pki/crl/iss-local" 2>/dev/null || echo "000")
CRL_SIZE=$(wc -c < "$CRL_TMP" | tr -d ' ')
CRL_CONTENT_TYPE=$(awk 'tolower($1)=="content-type:" { sub(/\r$/,"",$2); print tolower($2) }' "$CRL_HEADERS" | head -n1)
rm -f "$CRL_TMP" "$CRL_HEADERS"
if [ "$CRL_HTTP_CODE" = "200" ] && [ "$CRL_CONTENT_TYPE" = "application/pkix-crl" ] && [ "$CRL_SIZE" -gt 0 ]; then
pass "DER CRL served unauthenticated (HTTP 200, Content-Type application/pkix-crl, ${CRL_SIZE} bytes)"
else
fail "CRL empty after revocation"
fail "DER CRL fetch failed: HTTP=$CRL_HTTP_CODE Content-Type=$CRL_CONTENT_TYPE size=$CRL_SIZE"
fi
CERT_STATUS=$(api_get "/api/v1/certificates/mc-local-test" | python3 -c "import sys,json; print(json.load(sys.stdin).get('status',''))" 2>/dev/null || echo "unknown")
+40 -2
View File
@@ -139,6 +139,16 @@ The agent runs two background loops: a heartbeat (every 60 seconds) to signal it
**Agent groups (M11b):** Dynamic device grouping allows organizing agents by metadata criteria. Agent groups can match by OS, architecture, IP CIDR, and version. Groups support both dynamic matching (agents automatically join when criteria match) and manual membership (explicit include/exclude). Renewal policies can be scoped to agent groups via the `agent_group_id` foreign key. The GUI provides full CRUD management for agent groups with visual match criteria badges.
**Agent soft-retirement (I-004):** `DELETE /api/v1/agents/{id}` is a soft-delete surface — the row is never removed. Retirement stamps `agents.retired_at` (TIMESTAMPTZ) and `agents.retired_reason` (TEXT) and flips the operational status to `Offline`. Default listings (`GET /api/v1/agents`, the dashboard stats counter, and the stale-offline sweeper) filter retired rows out via `AgentRepository.ListActive`; retired rows are surfaced only through the opt-in `GET /api/v1/agents/retired` view. The endpoint follows a preflight → block → escape-hatch contract:
- **Clean retire** (no active dependencies) — `200 OK` with `RetireAgentResponse` (`cascade=false`, zero counts).
- **Blocked by active dependencies**`409 Conflict` with `BlockedByDependenciesResponse`. The three counts (`active_targets`, `active_certificates`, `pending_jobs`) tell the operator exactly which rows would be orphaned. The schema diverges from `ErrorResponse` because downstream dashboards parse the stable three-key shape.
- **Force cascade**`DELETE /api/v1/agents/{id}?force=true&reason=...`. `reason` is required (400 otherwise). Transactionally soft-retires downstream `deployment_targets`, cancels pending jobs, and soft-retires the agent, emitting an `agent_retirement_cascaded` audit event with actor + reason + per-bucket counts.
- **Idempotent re-retire** — a retire attempt against an already-retired agent returns `204 No Content` with an empty body (no second audit event, no response shape — callers that POST again on a retry get a clean no-op).
- **Sentinel refusal** — the four sentinel agent IDs (`server-scanner`, `cloud-aws-sm`, `cloud-azure-kv`, `cloud-gcp-sm`) back non-agent discovery subsystems (the network scanner and the three cloud secret-manager sources). They are refused unconditionally — even with `force=true` — via `ErrAgentIsSentinel``403 Forbidden`. The ID list lives in `internal/domain/connector.go` (`SentinelAgentIDs`) so handler, repository, and scheduler code can filter them without importing `service`.
Retired agents receive `410 Gone` on subsequent heartbeats (`service.ErrAgentRetired`). `cmd/agent` treats 410 as a terminal signal and exits cleanly so retired agents stop phoning home. Migration `000015` flipped `deployment_targets.agent_id` from `ON DELETE CASCADE` to `ON DELETE RESTRICT`, making the old hard-delete path a schema error and forcing all retirement through this contract.
### Web Dashboard
The web dashboard is the primary operational interface for certctl. It is built with Vite + React + TypeScript and uses TanStack Query for server state management (caching, background refetching, optimistic updates).
@@ -463,7 +473,7 @@ sequenceDiagram
API-->>U: 200 OK
```
The revocation is recorded in the `certificate_revocations` table (separate from the certificate status update) for CRL generation. The DER-encoded CRL at `GET /api/v1/crl/{issuer_id}` is generated on-demand by querying this table and signing with the issuing CA's key. The OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}` checks both the certificate status and the revocations table to return signed good/revoked/unknown responses.
The revocation is recorded in the `certificate_revocations` table (separate from the certificate status update) for CRL generation. The DER-encoded CRL at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615) is generated on-demand by querying this table and signing with the issuing CA's key. The OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960) checks both the certificate status and the revocations table to return signed good/revoked/unknown responses. Both endpoints are served unauthenticated — relying parties (TLS clients, hardware appliances, browsers) must be able to reach them without a certctl API key — and carry the IANA-registered media types `application/pkix-crl` and `application/ocsp-response` respectively.
Short-lived certificates (those with profile TTL < 1 hour) return "good" from OCSP and are excluded from CRL — their rapid expiry is treated as sufficient revocation.
@@ -808,6 +818,34 @@ All shell-facing inputs (connector scripts, domain names, ACME tokens) are valid
All incoming HTTP request bodies are capped by `http.MaxBytesReader` middleware (default 1MB, configurable via `CERTCTL_MAX_BODY_SIZE`). Requests exceeding the limit receive a 413 Request Entity Too Large response. The middleware is positioned before authentication in the chain so oversized payloads are rejected early, before any auth processing or database work occurs. Requests without bodies (GET, HEAD, nil body) skip the limit check.
### Config Encryption at Rest
Dynamic issuer and target configurations (rows with `source='database'`) contain credentials — ACME EAB HMACs, Vault tokens, DigiCert/Sectigo API keys, SSH private keys, WinRM passwords, F5 BIG-IP passwords, and similar. These are sealed at rest in PostgreSQL via `internal/crypto/encryption.go` using AES-256-GCM with a key derived from the operator passphrase `CERTCTL_CONFIG_ENCRYPTION_KEY` through PBKDF2-SHA256 (100,000 rounds, 32-byte output).
**v2 wire format (current, M-8 remediation, CWE-916 / CWE-329):**
```
magic(0x02) || salt(16) || nonce(12) || ciphertext+tag
```
Every call to `EncryptIfKeySet` draws 16 fresh bytes from `crypto/rand` as the PBKDF2 salt, so the derived AES-256 key is distinct per ciphertext and per re-encryption. The salt is stored alongside the ciphertext; decryption reads the magic byte, splits out the salt, re-derives the key, and verifies the AEAD tag.
**v1 legacy format (read-only):**
```
nonce(12) || ciphertext+tag
```
Pre-M-8 blobs were sealed with a package-level fixed salt `"certctl-config-encryption-v1"`. `DecryptIfKeySet` preserves the v1 read path unchanged — a blob whose first byte is not `0x02`, or whose v2 AEAD verification fails (including the 1/256 case where a v1 nonce happens to begin with `0x02`), falls through to a v1 attempt against the legacy fixed salt. v1 blobs are never written by the post-M-8 code path; they re-seal as v2 naturally on the next UPDATE through the normal service CRUD flow. No operator migration ceremony is required.
**Fail-closed behavior (C-2 sentinel, CWE-311):** both `EncryptIfKeySet` and `DecryptIfKeySet` return `ErrEncryptionKeyRequired` when invoked with an empty passphrase. The server refuses to start if any `source='database'` rows already exist without `CERTCTL_CONFIG_ENCRYPTION_KEY` set.
**Low-level primitives preserved byte-identical.** `Encrypt`, `Decrypt`, and `DeriveKey` are kept bit-stable so v1 fixtures on disk remain decryptable unchanged and so callers outside the config-encryption path (none today, but the symbols are exported) do not see a breaking change. The new per-ciphertext salt path is reached via the helper `deriveKeyWithSalt(passphrase, salt)`.
**Passphrase plumbing.** Services (`IssuerService`, `TargetService`, `IssuerRegistry`) hold the operator passphrase as a raw `string` and delegate PBKDF2 to the crypto package per ciphertext. This replaces the pre-M-8 design that pre-derived a single `[]byte` key at service construction and reused it for every row, which was the direct consequence of the fixed-salt KDF.
**Coverage gate.** CI enforces `internal/crypto/...` coverage ≥ 85% (observed 86.7%) — the encryption primitives are a security-critical gate, and the v2 format plus v1 fallback plus C-2 sentinel paths all need exhaustive coverage to avoid silent regressions.
### CORS
CORS uses a **deny-by-default** posture: when `CERTCTL_CORS_ORIGINS` is empty, no CORS headers are set and only same-origin requests can read responses. Operators must explicitly configure allowed origins. This prevents accidental exposure of the API to cross-origin requests in production.
@@ -861,7 +899,7 @@ Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST
- **Additional filters**: `?agent_id=`, `?profile_id=` (in addition to existing status, environment, owner_id, team_id, issuer_id).
- **Deployments**: `GET /api/v1/certificates/{id}/deployments` returns deployment targets for a certificate.
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. A JSON-formatted CRL is available at `GET /api/v1/crl`, and a DER-encoded X.509 CRL signed by the issuing CA at `GET /api/v1/crl/{issuer_id}`. An embedded OCSP responder serves signed responses at `GET /api/v1/ocsp/{issuer_id}/{serial}`. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`). The embedded OCSP responder serves signed responses unauthenticated at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`). Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
Certificate export (M27): `GET /api/v1/certificates/{id}/export/pem` returns PEM-encoded certificate and chain, and `POST /api/v1/certificates/{id}/export/pkcs12` returns a PKCS#12 bundle (binary). Private keys are never exported — they remain on agents. All exports are audited with actor, timestamp, and format.
+6 -4
View File
@@ -210,15 +210,17 @@ NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize
- Proxy agent executes deployment via appliance API
**Revocation Distribution**
- Certificate Revocation List (CRL) via `GET /api/v1/crl/{issuer_id}`
- Returns DER-encoded X.509 CRL signed by issuing CA
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
- Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
- 24-hour validity period
- Includes all revoked serials, reasons, and revocation timestamps
- Served unauthenticated so relying parties without certctl API credentials can fetch it
- Subject to URL caching; OCSP preferred for real-time revocation
- OCSP via `GET /api/v1/ocsp/{issuer_id}/{serial}`
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure)
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
- Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
- Signed by issuing CA (or delegated OCSP signing cert)
- Responds with good/revoked/unknown status
- Served unauthenticated — the RFC 6960 relying-party model does not assume API credentials
- Real-time, more bandwidth-efficient than CRL polling
## Revocation and Compromise (NIST SP 800-57 Part 3)
+12 -11
View File
@@ -92,10 +92,10 @@ Your QSA will request evidence that your certificate and key management systems
- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.
- **Revocation Infrastructure** (M15a, M15b):
- **Revocation Infrastructure** (M15a, M15b, M-006):
- Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
- CRL endpoint: `GET /api/v1/crl` (JSON format) or `GET /api/v1/crl/{issuer_id}` (DER X.509 CRL, 24h validity, signed by issuing CA)
- OCSP responder: `GET /api/v1/ocsp/{issuer_id}/{serial}` (returns DER-encoded OCSP response: good/revoked/unknown)
- CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
- OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
- Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
- Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)
@@ -109,7 +109,7 @@ Your QSA will request evidence that your certificate and key management systems
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
- CRL/OCSP availability proof: HTTP GET requests to `/api/v1/crl` and `/api/v1/ocsp/{issuer}/{serial}` with signed responses.
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.
@@ -328,9 +328,10 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
- Issuer notified (best-effort; ACME lacks standard revocation, Local CA skips issuer step).
- Revocation notifications sent to owner via email/webhook/Slack/Teams/PagerDuty.
- **CRL and OCSP Publication** (M15b) — Revoked certificates published in:
- CRL: `GET /api/v1/crl` (JSON format) or `GET /api/v1/crl/{issuer_id}` (DER X.509, signed by CA, 24h validity)
- OCSP: `GET /api/v1/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain)
- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates published in:
- CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
- OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revoked status for clients validating certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
- Both endpoints are served unauthenticated so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation — this is the RFC-compliant PKI model.
- Clients checking certificate status via OCSP or CRL see revoked status within 24 hours.
- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. PCI-DSS Req 4 requires rapid response to data transmission security incidents — bulk revocation enables operators to revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.
@@ -342,8 +343,8 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
**Evidence You Can Provide**:
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
- CRL publication: HTTP GET `/api/v1/crl` and parse JSON to show revoked serial numbers and timestamps.
- OCSP responder validation: Query `GET /api/v1/ocsp/{issuer}/{serial}` for a known-revoked cert; response includes `revoked` status.
- CRL publication: HTTP GET `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL — parse with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
- OCSP responder validation: Query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert; response includes `revoked` status and can be parsed with `openssl ocsp` tooling.
- Audit trail: Certificate status transitions (Active → Revoked) recorded in `audit_events`.
**Operator Responsibility**:
@@ -721,12 +722,12 @@ This requirement covers key generation, storage, rotation, and destruction. Cert
| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
|---|---|---|---|---|---|
| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /api/v1/crl`, `GET /api/v1/ocsp/{issuer}/{serial}` | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (both unauthenticated, RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job_type=AwaitingCSR | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, .env excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, `GET /api/v1/crl`, OCSP endpoint | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | API key hash in database | `GET /api/v1/audit` showing API calls | Available |
| **8.6** Acct Management | Credentials out of source, .env excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
+4 -4
View File
@@ -282,8 +282,8 @@ Each section includes:
- `certificateHold` — temporary revocation (can be "unhold" by reissue)
- `privilegeWithdrawn` — access rights revoked
Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and optional issuer notification is best-effort. All revoked certs are excluded from active deployments.
- **CRL Endpoint**`GET /api/v1/crl` returns a JSON-formatted Certificate Revocation List (serial, reason, timestamp for each revoked cert). `GET /api/v1/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (useful for legacy clients that don't support OCSP).
- **OCSP Responder**`GET /api/v1/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
- **CRL Endpoint**`GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.
- **OCSP Responder**`GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to verify cert validity in real-time.
- **Revocation Notifications** — When a cert is revoked, notifications are sent to:
- Certificate owner (email)
- Configured webhooks (if you have a SIEM that subscribes)
@@ -460,8 +460,8 @@ Each section includes:
| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
| | CRL Endpoint (JSON + DER) | `GET /api/v1/crl`, `GET /api/v1/crl/{issuer_id}` | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients |
| | OCSP Responder | `GET /api/v1/ocsp/{issuer_id}/{serial}` | ✅ | ✅ | Test revocation in staging |
| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
+4 -2
View File
@@ -123,6 +123,8 @@ At no point does the private key leave the agent. This is a fundamental security
Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
### Deployment Targets
Targets are the systems where certificates actually get installed — NGINX web servers, Apache httpd servers, HAProxy load balancers, Traefik reverse proxies, Caddy servers, Envoy gateways, Postfix/Dovecot mail servers, Microsoft IIS servers, and network appliances. Each target type has a **connector** that knows how to deploy certificates to that specific system (e.g., writing files and reloading NGINX or Apache config, building a combined PEM for HAProxy).
@@ -216,9 +218,9 @@ certctl implements revocation using three complementary mechanisms:
**Bulk Revocation** (Fleet-Level Incident Response): For large-scale incidents like CA compromise or team infrastructure decommissioning, `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria in a single operation. Filter by profile, owner, team, agent group, or issuer to target the affected certificate set. This is essential for incident response — instead of revoking certificates one-by-one, operators can revoke an entire fleet in minutes. Bulk revocation creates individual revocation jobs that reuse the existing revocation pipeline, ensuring every certificate is audited and notifications are sent.
**Certificate Revocation List (CRL)**: certctl serves both a JSON-formatted CRL at `GET /api/v1/crl` and DER-encoded X.509 CRLs per issuer at `GET /api/v1/crl/{issuer_id}`. The DER CRL is signed by the issuing CA's key and has 24-hour validity clients can download it periodically to check revocation status offline.
**Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party — browser, TLS client, hardware appliance — can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`.
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}`. It returns signed OCSP responses (good, revoked, or unknown) so clients can verify certificate status without downloading the full CRL.
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960). Like the CRL endpoint, it is unauthenticated and returns signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`, so clients can verify certificate status without downloading the full CRL.
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
+5 -2
View File
@@ -155,7 +155,7 @@ The Local CA issuer signs certificates using Go's `crypto/x509` library. It supp
**Sub-CA mode:** Loads a CA certificate and private key from disk (`CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH`). The CA cert is signed by an upstream CA (e.g., ADCS), so all issued certificates chain to the enterprise root trust hierarchy. Clients that already trust the enterprise root automatically trust certctl-issued certs. Supports RSA, ECDSA, and PKCS#8 key formats. If the paths are not set, falls back to self-signed mode. The loaded certificate must have `IsCA=true` and `KeyUsageCertSign`.
**CRL and OCSP support (M15b):** The Local CA supports DER-encoded X.509 CRL generation via `GET /api/v1/crl/{issuer_id}` with 24-hour validity. An embedded OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}` returns signed OCSP responses for issued certificates (good/revoked/unknown status). Certificates with profile TTL < 1 hour automatically skip CRL/OCSP — expiry is treated as sufficient revocation for short-lived credentials.
**CRL and OCSP support (M15b):** The Local CA supports DER-encoded X.509 CRL generation served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`) with 24-hour validity. An embedded OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`) returns signed OCSP responses for issued certificates (good/revoked/unknown status). Both endpoints are reachable by relying parties with no certctl API credentials, which is how standard TLS clients, browsers, and hardware appliances consume these resources. Certificates with profile TTL < 1 hour automatically skip CRL/OCSP — expiry is treated as sufficient revocation for short-lived credentials.
**Extended Key Usage (EKU) support (M27):** The Local CA respects EKU constraints from certificate profiles and adjusts key usage flags accordingly. For S/MIME certificates (emailProtection EKU), it uses `DigitalSignature | ContentCommitment` instead of the TLS default. For TLS certificates (serverAuth/clientAuth EKU), it uses `DigitalSignature | KeyEncipherment`. This enables support for multiple certificate types — TLS, S/MIME, code signing, timestamping — from a single CA.
@@ -287,7 +287,7 @@ Environment variables:
The connector is registered in the issuer registry under `iss-stepca`. step-ca also works with the existing ACME connector (point `iss-acme-*` at step-ca's ACME directory URL for ACME-based issuance).
**Note:** step-ca-issued certificates rely on step-ca's own CRL/OCSP infrastructure. certctl's local CRL/OCSP endpoints (`GET /api/v1/crl/{issuer_id}` and `GET /api/v1/ocsp/{issuer_id}/{serial}`) are populated from step-ca's revocation data if available, but clients should validate against step-ca's endpoints for the authoritative status.
**Note:** step-ca-issued certificates rely on step-ca's own CRL/OCSP infrastructure. certctl's local CRL/OCSP endpoints (`GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}`, served unauthenticated per RFC 5280 §5 / RFC 6960 / RFC 8615) are populated from step-ca's revocation data if available, but clients should validate against step-ca's endpoints for the authoritative status.
**MaxTTL enforcement (M11c):** When a certificate profile defines a maximum TTL, the step-ca connector caps the `NotAfter` field to ensure the issued certificate does not exceed the profile limit, regardless of the step-ca provisioner's own maximum.
@@ -465,9 +465,12 @@ GlobalSign Atlas High Volume CA REST API with dual authentication: mTLS for the
| `CERTCTL_GLOBALSIGN_API_SECRET` | Yes | — | API secret for request authentication |
| `CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
| `CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_GLOBALSIGN_SERVER_CA_PATH` | No | system trust store | PEM bundle used to verify the Atlas API server certificate. Set this for private/lab Atlas deployments whose server TLS chain is not in the host's default trust bundle. |
**Authentication:** Dual — mTLS client certificate for TLS handshake plus `X-API-Key` and `X-API-Secret` headers on every request.
**TLS verification:** The connector always verifies the server certificate. When `server_ca_path` is set, the PEM bundle at that path is used as the trust anchor; otherwise the host's system trust store is used. TLS 1.2 is the minimum protocol version.
**Issuance model:** `POST /v2/certificates` returns a serial number. Certificate PEM is available after validation completes. Typically resolves within seconds for DV. `GetOrderStatus` polls the certificate endpoint.
**Note:** CRL and OCSP are managed by GlobalSign. certctl records revocations locally and notifies GlobalSign via `PUT /v2/certificates/{serial}/revoke`.
+11 -9
View File
@@ -724,22 +724,24 @@ curl -s -X POST $API/api/v1/certificates/mc-demo-payments/revoke \
6. Creates an audit trail entry
7. Sends revocation notifications via configured channels
Check the CRL (Certificate Revocation List):
Check the CRL (Certificate Revocation List) — served unauthenticated under the RFC 8615 well-known namespace so relying parties without a certctl API key can still verify revocation (RFC 5280 §5):
```bash
# JSON-formatted CRL
curl -s $API/api/v1/crl | jq .
# DER-encoded X.509 CRL for the local CA (binary — pipe to openssl for inspection)
curl -s $API/api/v1/crl/iss-local -o /tmp/crl.der
# DER-encoded X.509 CRL for the local CA (binary — pipe to openssl for inspection).
# Note: no -H "Authorization: Bearer ..." — the endpoint is deliberately
# unauthenticated. Content-Type is application/pkix-crl.
curl -s http://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
openssl crl -inform DER -in /tmp/crl.der -text -noout
```
Check OCSP status:
Check OCSP status (RFC 6960, also unauthenticated, `application/ocsp-response`):
```bash
# Replace SERIAL with the actual serial number from the certificate version
curl -s $API/api/v1/ocsp/iss-local/SERIAL | jq .
# Replace SERIAL with the actual serial number from the certificate version.
# The embedded OCSP responder returns a signed DER response — parse it with
# `openssl ocsp -respin` or similar tooling.
curl -s http://localhost:8443/.well-known/pki/ocsp/iss-local/SERIAL -o /tmp/ocsp.der
openssl ocsp -respin /tmp/ocsp.der -noverify -resp_text | head -40
```
**Why RFC 5280 reason codes:** The reason code isn't just metadata — it tells clients *why* the certificate was revoked. A `keyCompromise` revocation means the private key was exposed and the certificate should be distrusted immediately. A `superseded` revocation means a newer certificate replaced it — less urgent. CRLs and OCSP responses include the reason code so client software can make informed trust decisions.
+5 -4
View File
@@ -228,14 +228,15 @@ Revocation is a 7-step process: validate eligibility → get serial → update s
- Audit trail: single `bulk_revocation_initiated` event logs the criteria and actor
- Optional `--reason` defaults to `unspecified` if omitted
### CRL Endpoints
### CRL Endpoint
- `GET /api/v1/crl` — JSON-formatted CRL (version, entries array, total count, timestamp)
- `GET /api/v1/crl/{issuer_id}` — DER-encoded X.509 CRL signed by issuing CA, 24-hour validity
- `GET /.well-known/pki/crl/{issuer_id}` — DER-encoded X.509 CRL signed by the issuing CA, 24-hour validity (RFC 5280 §5 + RFC 8615). Served unauthenticated with `Content-Type: application/pkix-crl` so relying parties without certctl API credentials can fetch it.
Prior non-standard JSON CRL and authenticated `/api/v1/crl*` paths were removed in M-006 — RFC 5280 defines only the DER wire format and relying parties do not have API keys.
### OCSP Responder
`GET /api/v1/ocsp/{issuer_id}/{serial}` — signed OCSP responses (good/revoked/unknown). Signs with issuing CA key. Requires CA key access (Local CA, step-CA connectors).
`GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — signed OCSP responses (good/revoked/unknown) per RFC 6960. Served unauthenticated with `Content-Type: application/ocsp-response`. Signs with the issuing CA key; requires CA key access (Local CA, step-CA connectors).
### Short-Lived Certificate Exemption
+4 -2
View File
@@ -286,9 +286,11 @@ curl -s -X POST http://localhost:8443/api/v1/certificates/$CERT_ID/revoke \
Supported RFC 5280 reason codes: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn`.
Confirm via CRL:
Confirm via the unauthenticated DER CRL (RFC 5280 §5, RFC 8615):
```bash
curl -s http://localhost:8443/api/v1/crl | jq .
# Fetch the CRL without any API key — relying parties shouldn't need one.
curl -s http://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
openssl crl -inform der -in /tmp/crl.der -noout -text | head -40
```
### Interactive approval workflow
+6 -3
View File
@@ -512,12 +512,15 @@ curl -s -X POST http://localhost:8443/api/v1/certificates/mc-local-test/revoke \
### Step 7b: Check the CRL (Certificate Revocation List)
The CRL is a DER-encoded X.509 v2 CRL (RFC 5280 §5) served under the RFC 8615 well-known namespace. It is deliberately unauthenticated — relying parties that need to verify revocation don't have certctl API keys.
```bash
curl -s -H "Authorization: Bearer test-key-2026" \
http://localhost:8443/api/v1/crl | python3 -m json.tool
# No Authorization header — the endpoint is public by design.
curl -s http://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
openssl crl -inform der -in /tmp/crl.der -noout -text | head -40
```
**What you should see**: A list that includes the revoked certificate's serial number, the reason, and the timestamp.
**What you should see**: `openssl` prints the CRL issuer DN, `This Update` / `Next Update` timestamps, and at least one entry whose `Serial Number` matches the cert you just revoked, with `CRL Reason Code: Superseded` (or whichever reason you passed in step 7a). The response's `Content-Type` header is `application/pkix-crl`.
### Step 7c: Check in the dashboard
+279 -61
View File
@@ -1297,66 +1297,59 @@ curl -s -H "$AUTH" "$SERVER/api/v1/audit?per_page=5" | jq '[.items[] | select(.a
### 5.3 CRL & OCSP
**Test 5.3.1 — JSON CRL endpoint**
> **M-006 note:** The non-standard JSON CRL (`GET /api/v1/crl`) and the authenticated DER CRL (`GET /api/v1/crl/{issuer_id}`) and OCSP (`GET /api/v1/ocsp/{issuer_id}/{serial}`) paths were removed. Revocation-status distribution now lives under the RFC 8615 well-known namespace (`/.well-known/pki/crl/{issuer_id}` and `/.well-known/pki/ocsp/{issuer_id}/{serial}`), served unauthenticated because relying parties (browsers, TLS clients, hardware appliances) do not have certctl API keys.
**Test 5.3.1 — DER CRL endpoint (RFC 5280 §5, unauthenticated)**
```bash
curl -s -w "\nHTTP %{http_code}\n" -H "$AUTH" "$SERVER/api/v1/crl" | jq '{total: .total, entries_count: (.entries | length)}'
curl -s -D - -o /tmp/crl.der "$SERVER/.well-known/pki/crl/iss-local" | grep -i "content-type"
openssl crl -inform der -in /tmp/crl.der -noout -text | head -40
```
**What:** Fetches the JSON-formatted Certificate Revocation List.
**Why:** CRL is how relying parties check if a certificate has been revoked. The JSON CRL is the machine-readable API view.
**Expected:** HTTP 200. `total` > 0 (we revoked several certs above). Entries array contains serial numbers.
**PASS if** HTTP 200 and `total` > 0. **FAIL** if total = 0 or 500.
**What:** Fetches the DER-encoded X.509 CRL for the local issuer without presenting any API credentials.
**Why:** Relying parties (browsers, TLS libraries, network appliances) don't have certctl API keys. RFC 5280 §5 defines only the DER wire format, and RFC 8615 defines `.well-known/pki/*` as the relying-party namespace. The Content-Type must be `application/pkix-crl` and `openssl crl -inform der` must parse the body.
**Expected:** `Content-Type: application/pkix-crl`, `openssl` prints a valid CRL with the revoked serials we created above.
**PASS if** Content-Type matches and `openssl crl` parses the body. **FAIL** if JSON/HTML, 401/403, or parse error.
---
**Test 5.3.2 — DER CRL endpoint**
**Test 5.3.2 — OCSP: good response for non-revoked cert (RFC 6960, unauthenticated)**
```bash
curl -s -D - -o /dev/null -H "$AUTH" "$SERVER/api/v1/crl/iss-local" | grep -i "content-type"
curl -s -w "\nHTTP %{http_code}\n" "$SERVER/.well-known/pki/ocsp/iss-local/mc-api-prod" -o /tmp/ocsp.der
openssl ocsp -respin /tmp/ocsp.der -noverify -text 2>/dev/null | head -20
```
**What:** Fetches the DER-encoded X.509 CRL for the local issuer.
**Why:** Standard CRL consumers (browsers, TLS libraries) expect DER-encoded CRLs, not JSON. The Content-Type must be correct.
**Expected:** `Content-Type: application/pkix-crl`
**PASS if** Content-Type is `application/pkix-crl`. **FAIL** if JSON or other.
**What:** Queries the OCSP responder for a non-revoked certificate without any Authorization header.
**Why:** OCSP is the real-time alternative to CRL. RFC 6960 relying parties do not authenticate to the responder, so the endpoint must be public and return `Content-Type: application/ocsp-response`.
**Expected:** HTTP 200 with OCSP response indicating "good" status when `openssl ocsp -respin` parses the body.
**PASS if** HTTP 200 and cert status prints "good". **FAIL** if 401/403/500 or "revoked"/"unknown".
---
**Test 5.3.3 — OCSP: good response for non-revoked cert**
**Test 5.3.3 — OCSP: revoked response for revoked cert (unauthenticated)**
```bash
curl -s -w "\nHTTP %{http_code}\n" -H "$AUTH" "$SERVER/api/v1/ocsp/iss-local/mc-api-prod"
```
**What:** Queries the OCSP responder for a non-revoked certificate.
**Why:** OCSP is the real-time alternative to CRL. A "good" response means the cert is valid.
**Expected:** HTTP 200 with OCSP response indicating "good" status.
**PASS if** HTTP 200. **FAIL** if 500.
---
**Test 5.3.4 — OCSP: revoked response for revoked cert**
```bash
curl -s -w "\nHTTP %{http_code}\n" -H "$AUTH" "$SERVER/api/v1/ocsp/iss-local/mc-test-full"
curl -s -w "\nHTTP %{http_code}\n" "$SERVER/.well-known/pki/ocsp/iss-local/mc-test-full" -o /tmp/ocsp.der
openssl ocsp -respin /tmp/ocsp.der -noverify -text 2>/dev/null | grep -i "cert status"
```
**What:** Queries OCSP for a certificate we revoked earlier.
**Why:** OCSP must return "revoked" status for revoked certs. If it still returns "good," relying parties will trust a compromised certificate.
**Why:** OCSP must return "revoked" status for revoked certs. If it still returns "good," relying parties will trust a compromised certificate. Endpoint is unauthenticated per RFC 6960.
**Expected:** HTTP 200 with OCSP response indicating "revoked" status.
**PASS if** HTTP 200 and response indicates revoked. **FAIL** if response indicates "good".
**PASS if** HTTP 200 and status prints "revoked". **FAIL** if status is "good".
---
**Test 5.3.5 — OCSP: unknown serial**
**Test 5.3.4 — OCSP: unknown serial (unauthenticated)**
```bash
curl -s -w "\nHTTP %{http_code}\n" -H "$AUTH" "$SERVER/api/v1/ocsp/iss-local/nonexistent-serial"
curl -s -w "\nHTTP %{http_code}\n" "$SERVER/.well-known/pki/ocsp/iss-local/nonexistent-serial" -o /tmp/ocsp.der
openssl ocsp -respin /tmp/ocsp.der -noverify -text 2>/dev/null | grep -i "cert status"
```
**What:** Queries OCSP for a serial number the server doesn't recognize.
**Why:** OCSP must return "unknown" for serials it doesn't manage, not "good" (which would be a false positive).
**Why:** OCSP must return "unknown" for serials it doesn't manage, not "good" (which would be a false positive). Endpoint is public per RFC 6960.
**Expected:** HTTP 200 with OCSP "unknown" response, or HTTP 404.
**PASS if** response is "unknown" or 404. **FAIL** if "good".
@@ -2102,9 +2095,10 @@ go test ./internal/connector/issuer/local/ -run "TestSubCA" -v
**What:** In sub-CA mode, the DER CRL (Part 31.1) should be signed by the sub-CA key, not a self-signed root.
```bash
# After starting in sub-CA mode and revoking a cert:
curl -s -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/crl/iss-local" -o /tmp/subca-crl.der
# After starting in sub-CA mode and revoking a cert. The CRL is
# published unauthenticated under the RFC 8615 well-known namespace
# because relying parties don't carry certctl API keys.
curl -s "http://localhost:8443/.well-known/pki/crl/iss-local" -o /tmp/subca-crl.der
openssl crl -in /tmp/subca-crl.der -inform DER -noout -issuer
```
@@ -3706,23 +3700,24 @@ go test ./internal/service/ -run TestCSRRenewal -v
**Why:** TLS clients need to verify that certificates haven't been revoked. Without OCSP/CRL, a compromised certificate remains trusted until it expires. The short-lived exemption avoids bloating the CRL with certs that expire before distribution.
### 24.1: DER-Encoded CRL
> **M-006 note:** CRL and OCSP are published at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, `application/pkix-crl`) and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `application/ocsp-response`). Per RFC 8615, `.well-known/pki/*` is the relying-party namespace, and the endpoints are served **unauthenticated** — browsers, TLS libraries, and network appliances do not have certctl API keys. The legacy `GET /api/v1/crl`, `GET /api/v1/crl/{issuer_id}`, and `GET /api/v1/ocsp/{issuer_id}/{serial}` routes were removed.
**What:** `GET /api/v1/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA. Content-Type is `application/pkix-crl`. The CRL has 24-hour validity.
### 24.1: DER-Encoded CRL (unauthenticated)
**Why:** This is the standard CRL format that browsers, TLS libraries, and LDAP directories consume. The existing JSON CRL at `GET /api/v1/crl` is certctl-specific; the DER CRL is interoperable.
**What:** `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA. Content-Type is `application/pkix-crl`. The CRL has 24-hour validity.
**Why:** This is the RFC 5280 §5 wire format that browsers, TLS libraries, and LDAP directories consume. It must be reachable without any Authorization header so that relying parties — who have no certctl credentials — can fetch it.
```bash
# Request DER CRL for the local issuer
curl -s -D - -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/crl/iss-local" \
# Request DER CRL for the local issuer. No Authorization header.
curl -s -D - "http://localhost:8443/.well-known/pki/crl/iss-local" \
-o /tmp/crl.der
# Verify it's valid DER CRL with openssl
openssl crl -in /tmp/crl.der -inform DER -noout -text
```
**Expected:** 200 OK, Content-Type `application/pkix-crl`, Cache-Control `public, max-age=3600`.
**Expected:** 200 OK, Content-Type `application/pkix-crl`.
**PASS if:**
- `openssl crl` parses the DER file successfully
@@ -3730,33 +3725,34 @@ openssl crl -in /tmp/crl.der -inform DER -noout -text
- Validity period is present (thisUpdate / nextUpdate)
- If any certs have been revoked, they appear in the revocation list with serial + reason
**FAIL if:** Response is JSON (wrong endpoint), `openssl` rejects the DER format, or headers are wrong.
**FAIL if:** Response is JSON (wrong endpoint), `openssl` rejects the DER format, headers are wrong, or the server returns 401/403 (auth must NOT be required).
### 24.2: DER CRL — Nonexistent Issuer
```bash
curl -s -w "\n%{http_code}" -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/crl/iss-nonexistent"
curl -s -w "\n%{http_code}" \
"http://localhost:8443/.well-known/pki/crl/iss-nonexistent"
```
**Expected:** 404 Not Found.
**PASS if** status code is 404 and body contains "not found".
### 24.3: OCSP Responder — Good Status
### 24.3: OCSP Responder — Good Status (unauthenticated)
**What:** `GET /api/v1/ocsp/{issuer_id}/{serial}` returns a signed OCSP response. For a non-revoked certificate, the status is "good".
**What:** `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response. For a non-revoked certificate, the status is "good".
**Why:** OCSP is the real-time revocation check that TLS clients perform during the handshake. A "good" response tells the client the cert is still valid.
**Why:** OCSP is the real-time RFC 6960 revocation check that TLS clients perform during the handshake. A "good" response tells the client the cert is still valid. Relying parties fetch this without API credentials.
```bash
# First, get a certificate's serial number
# First, get a certificate's serial number (this uses the authenticated API
# because the operator has an API key — that is different from the relying
# party fetching the OCSP response).
SERIAL=$(curl -s -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/certificates/mc-api-prod" | jq -r '.latest_version.serial_number // empty')
# If serial is available, query OCSP
# Query OCSP without any Authorization header.
if [ -n "$SERIAL" ]; then
curl -s -D - -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/ocsp/iss-local/$SERIAL" \
curl -s -D - "http://localhost:8443/.well-known/pki/ocsp/iss-local/$SERIAL" \
-o /tmp/ocsp.der
# Parse OCSP response
@@ -3771,7 +3767,7 @@ fi
- Certificate status is "good" for a non-revoked cert
- Response is signed (producedAt timestamp present)
**FAIL if:** Response is JSON, OCSP status is wrong, or `openssl` rejects the response.
**FAIL if:** Response is JSON, OCSP status is wrong, `openssl` rejects the response, or the endpoint requires auth.
### 24.4: OCSP Responder — Revoked Status
@@ -3784,9 +3780,8 @@ curl -s -X POST -H "Authorization: Bearer $API_KEY" \
-d '{"reason": "keyCompromise"}' \
"http://localhost:8443/api/v1/certificates/$CERT_ID/revoke"
# Then query OCSP
curl -s -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/ocsp/iss-local/$SERIAL" \
# Then query OCSP — unauthenticated.
curl -s "http://localhost:8443/.well-known/pki/ocsp/iss-local/$SERIAL" \
-o /tmp/ocsp-revoked.der
openssl ocsp -respin /tmp/ocsp-revoked.der -text -noverify
@@ -3801,8 +3796,7 @@ openssl ocsp -respin /tmp/ocsp-revoked.der -text -noverify
**What:** Querying a serial number that doesn't exist in the inventory returns an "unknown" OCSP status (not an error — this is the correct OCSP behavior per RFC 6960).
```bash
curl -s -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/ocsp/iss-local/DEADBEEF" \
curl -s "http://localhost:8443/.well-known/pki/ocsp/iss-local/DEADBEEF" \
-o /tmp/ocsp-unknown.der
openssl ocsp -respin /tmp/ocsp-unknown.der -text -noverify
@@ -3820,9 +3814,8 @@ openssl ocsp -respin /tmp/ocsp-unknown.der -text -noverify
To test: revoke a cert that was issued under the `prof-short-lived` profile, then check the DER CRL. The revoked short-lived cert should NOT appear.
```bash
# After revoking a short-lived cert (serial SHORT_SERIAL):
curl -s -H "Authorization: Bearer $API_KEY" \
"http://localhost:8443/api/v1/crl/iss-local" -o /tmp/crl.der
# After revoking a short-lived cert (serial SHORT_SERIAL). No auth needed.
curl -s "http://localhost:8443/.well-known/pki/crl/iss-local" -o /tmp/crl.der
openssl crl -in /tmp/crl.der -inform DER -text | grep -i "$SHORT_SERIAL"
```
@@ -6594,6 +6587,231 @@ helm template certctl deploy/helm/certctl/ --set server.replicaCount=3 | grep 'r
---
## Part 55: Agent Soft-Retirement (I-004)
**What this validates:** The full `DELETE /api/v1/agents/{id}` soft-retirement contract — seven HTTP status codes (200/204/400/403/404/405/409/500), opt-in retired-agent listing, sentinel refusal, `410 Gone` heartbeat response, and the force-cascade escape hatch.
**Why it matters:** Before I-004, there was no retirement surface at all — `DELETE` did not exist and agents could only be removed via raw SQL against the `agents` table. Worse, the schema declared `deployment_targets.agent_id ON DELETE CASCADE`, so any such manual delete silently cascaded through four tables with zero audit trail. This part pins the replacement contract (soft-delete + preflight + force-cascade + sentinel guard + heartbeat 410) so regressions show up here first rather than as orphaned targets in production.
### 55.1 Migration 000015 Applied
```bash
docker compose -f deploy/docker-compose.yml exec postgres \
psql -U certctl -d certctl -c \
"SELECT column_name FROM information_schema.columns WHERE table_name='agents' AND column_name IN ('retired_at','retired_reason') ORDER BY column_name;"
```
**What:** Confirms migration 000015 added the archival columns to the `agents` table.
**PASS if** both `retired_at` and `retired_reason` rows are returned. **FAIL** if either is missing (migration did not apply).
---
### 55.2 FK Constraint Flipped to RESTRICT
```bash
docker compose -f deploy/docker-compose.yml exec postgres \
psql -U certctl -d certctl -c \
"SELECT confdeltype FROM pg_constraint WHERE conname='deployment_targets_agent_id_fkey';"
```
**What:** `confdeltype` is PostgreSQL's one-character code for the FK delete action: `r` = RESTRICT, `c` = CASCADE.
**PASS if** the value is `r`. **FAIL** if it is still `c` — that means migration 000015's FK flip did not run, and a hard `DELETE` against an agent row would silently cascade.
---
### 55.3 Clean Retire — 200
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-test-clean" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Retires an agent that has no active deployment targets, no deployed certificates, and no pending jobs.
**PASS if** status code is `200` and response body includes `"retired_at":"<ISO8601>"`, `"cascade":false`, and zero-valued counts.
---
### 55.4 Idempotent Re-Retire — 204
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-test-clean" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Retires an agent that is already retired.
**PASS if** status code is `204` and response body is completely empty (not even a trailing newline from the handler). The 200-shape must NOT be emitted — this is the terminal no-op.
---
### 55.5 Blocked by Dependencies — 409
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-with-deps" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Attempts to retire an agent that still has active targets/certificates/jobs.
**PASS if** status code is `409` and response body is the three-key `BlockedByDependenciesResponse` shape: `{"error":"blocked_by_dependencies", "message": "...", "counts": {"active_targets": N, "active_certificates": N, "pending_jobs": N}}`. Must NOT be the generic `ErrorResponse` shape — downstream dashboards parse the `counts` key.
---
### 55.6 Force Cascade — 200
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-with-deps?force=true&reason=decommissioning+rack-7" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Uses the force escape hatch to cascade-retire the dependencies.
**PASS if** status code is `200`, response includes `"cascade":true` with the pre-cascade counts, and the subsequent `GET /api/v1/audit-events?action=agent_retirement_cascaded` shows the event with the supplied `reason` and actor.
---
### 55.7 Force Without Reason — 400
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-other?force=true" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Verifies the `ErrForceReasonRequired` guard — `force=true` without `reason` must be rejected before any state mutation.
**PASS if** status code is `400` and no agent/target/job rows were modified.
---
### 55.8 Sentinel Refusal — 403
```bash
for id in server-scanner cloud-aws-sm cloud-azure-kv cloud-gcp-sm; do
echo "=== $id ==="
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/${id}?force=true&reason=attempt" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
done
```
**What:** Verifies all four sentinel agents refuse retirement even with `force=true`.
**PASS if** every request returns `403` and the response body's `error` value is `sentinel_agent` (or the equivalent `ErrAgentIsSentinel` mapping). **FAIL** if any sentinel accepts the request — retiring one silently orphans the network scanner or one of the three cloud secret-manager discovery sources.
---
### 55.9 Unknown ID — 404
```bash
curl -sS -X DELETE "http://localhost:8443/api/v1/agents/ag-does-not-exist" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-w "\nHTTP %{http_code}\n"
```
**What:** Verifies `ErrAgentNotFound` maps to 404 (not 500). Ordering matters — the not-found check must come after the sentinel check so a typo'd sentinel ID still returns 403, not 404.
**PASS if** status code is `404`.
---
### 55.10 Heartbeat on Retired Agent — 410
```bash
curl -sS -X POST "http://localhost:8443/api/v1/agents/ag-test-clean/heartbeat" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"os":"linux","architecture":"amd64","hostname":"test","ip_address":"10.0.0.1","version":"2.1.0"}' \
-w "\nHTTP %{http_code}\n"
```
**What:** Retired agents get `410 Gone` — the canonical "resource is permanently gone, stop retrying" signal — so `cmd/agent` detects it and exits cleanly.
**PASS if** status code is `410`. **FAIL** if it is `404` (wrong ordering — retired-check must run before not-found) or `200` (retired filter missing entirely — agent would keep phoning home forever).
---
### 55.11 Default List Excludes Retired
```bash
curl -sS "http://localhost:8443/api/v1/agents" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
| jq -r '.data[] | select(.id=="ag-test-clean") | .id'
```
**What:** Verifies the default `/agents` listing filters retired rows via `AgentRepository.ListActive`.
**PASS if** output is empty (the retired agent does NOT appear). **FAIL** if `ag-test-clean` shows up — default listings must not expose retired rows.
---
### 55.12 Retired Agents Opt-In View
```bash
curl -sS "http://localhost:8443/api/v1/agents/retired" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
| jq -r '.data[] | select(.id=="ag-test-clean") | {id, retired_at, retired_reason}'
```
**What:** Verifies the opt-in retired-agents view returns the row with `retired_at` and `retired_reason` populated. Go 1.22 ServeMux literal-beats-pattern-var precedence routes `/agents/retired` to this handler rather than `/agents/{id}`.
**PASS if** the row appears with non-null `retired_at`. **FAIL** if the row is missing (listing broken) or `retired_at` is null (serialization broken).
---
### 55.13 Dashboard Stats Counter Excludes Retired
```bash
curl -sS "http://localhost:8443/api/v1/stats/summary" \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
| jq -r '.total_agents'
```
**What:** Stats dashboard uses `ListActive`, not `List` — retired agents must not inflate the count.
**PASS if** the counter reflects only non-retired rows (verify against `SELECT count(*) FROM agents WHERE retired_at IS NULL`).
---
### 55.14 CLI Retire Subcommand
```bash
certctl-cli agents retire ag-cli-test --force --reason "smoke test"
certctl-cli agents list --retired | grep ag-cli-test
```
**What:** Verifies the CLI `agents retire` subcommand forwards `--force` and `--reason` via `DeleteWithQuery` and the `agents list --retired` flag hits `/agents/retired` rather than the default listing.
**PASS if** the first command succeeds and the second shows the agent in the retired view.
---
### 55.15 MCP Retire Tool Schema
```bash
go test ./internal/mcp/ -run TestRetireAgent -v -count=1
```
**What:** Verifies the `certctl_retire_agent` MCP tool's input schema accepts `id`, `force`, and `reason`, and that the tool actually propagates `force`/`reason` into the outbound DELETE query string (not the body).
**PASS if** exit code 0.
---
### 55.16 HEAD-State OpenAPI Contract
```bash
npx --yes @redocly/cli lint api/openapi.yaml \
--config '{"rules":{"operation-4xx-response":"error","no-invalid-media-type-examples":"error"}}'
python3 -c "
import yaml
spec = yaml.safe_load(open('api/openapi.yaml'))
del_op = spec['paths']['/api/v1/agents/{id}']['delete']
assert set(del_op['responses'].keys()) == {'200','204','400','403','404','405','409','500'}, del_op['responses'].keys()
hb = spec['paths']['/api/v1/agents/{id}/heartbeat']['post']
assert '410' in hb['responses'], hb['responses'].keys()
assert spec['paths']['/api/v1/agents/retired']['get']['operationId'] == 'listRetiredAgents'
print('OpenAPI I-004 contract: OK')
"
```
**What:** Two-part check. Redocly lint confirms the spec is structurally valid; the Python assertions pin the seven DELETE status codes, the 410 heartbeat response, and the retired-agents operationId.
**PASS if** redocly prints no errors and the Python script prints `OpenAPI I-004 contract: OK`.
---
## Release Sign-Off
All tests below must pass before tagging v2.1.0. Each row is one individual test from the guide above. The **Method** column indicates whether `qa-smoke-test.sh` covers the test automatically (**Auto**) or requires hands-on verification (**Manual**).
+31 -24
View File
@@ -27,6 +27,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -120,7 +121,7 @@ func TestGetCertificate_PathInjection(t *testing.T) {
handler, mock := newCertHandlerWithMock()
// Force a 404 so we can distinguish "service was called" from
// "parser accepted the ID"; a 200 with null body is also fine.
mock.GetCertificateFn = func(id string) (*domain.ManagedCertificate, error) {
mock.GetCertificateFn = func(_ context.Context, id string) (*domain.ManagedCertificate, error) {
return nil, ErrMockNotFound
}
@@ -156,7 +157,7 @@ func TestUpdateCertificate_PathInjection(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.UpdateCertificateFn = func(id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
mock.UpdateCertificateFn = func(_ context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
return nil, ErrMockNotFound
}
@@ -184,7 +185,7 @@ func TestArchiveCertificate_PathInjection(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ArchiveCertificateFn = func(id string) error { return ErrMockNotFound }
mock.ArchiveCertificateFn = func(_ context.Context, id string) error { return ErrMockNotFound }
req := httptest.NewRequest(http.MethodDelete, "/api/v1/certificates/x", nil)
req.URL.Path = "/api/v1/certificates/" + tc.input
@@ -227,7 +228,7 @@ func TestGetCertificateVersions_MultiSegment(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.GetCertificateVersionsFn = func(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
mock.GetCertificateVersionsFn = func(_ context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
return []domain.CertificateVersion{}, 0, nil
}
@@ -246,26 +247,30 @@ func TestGetCertificateVersions_MultiSegment(t *testing.T) {
}
// TestHandleOCSP_MultiSegment exercises the OCSP responder's 2-segment path
// parser (/api/v1/ocsp/{issuer_id}/{serial_hex}). Each leg is attacker-
// controlled and the serial can be arbitrary length. This is a key adversarial
// surface because the serial is passed directly to the CA-operations service,
// which is expected to treat it as an opaque identifier.
// parser (/.well-known/pki/ocsp/{issuer_id}/{serial_hex}). Each leg is
// attacker-controlled and the serial can be arbitrary length. This is a key
// adversarial surface because the serial is passed directly to the
// CA-operations service, which is expected to treat it as an opaque
// identifier.
//
// M-006 relocation: these paths were previously served at /api/v1/ocsp/*;
// under RFC 8615 and RFC 6960 they now live under /.well-known/pki/ocsp/*.
func TestHandleOCSP_MultiSegment(t *testing.T) {
cases := []struct {
name string
path string
}{
{"missing_serial", "/api/v1/ocsp/iss-local"},
{"missing_both", "/api/v1/ocsp/"},
{"empty_issuer", "/api/v1/ocsp//01ABCDEF"},
{"empty_serial", "/api/v1/ocsp/iss-local/"},
{"traversal_issuer", "/api/v1/ocsp/..%2F..%2Fetc/passwd/01"},
{"null_byte_serial", "/api/v1/ocsp/iss-local/01\x00FF"},
{"sql_injection_serial", "/api/v1/ocsp/iss-local/01'; DROP TABLE--"},
{"negative_hex_serial", "/api/v1/ocsp/iss-local/-1"},
{"unicode_serial", "/api/v1/ocsp/iss-local/01\u2010FF"},
{"extremely_long_serial", "/api/v1/ocsp/iss-local/" + strings.Repeat("F", 10000)},
{"extra_segments", "/api/v1/ocsp/iss-local/01FF/extra/segments"},
{"missing_serial", "/.well-known/pki/ocsp/iss-local"},
{"missing_both", "/.well-known/pki/ocsp/"},
{"empty_issuer", "/.well-known/pki/ocsp//01ABCDEF"},
{"empty_serial", "/.well-known/pki/ocsp/iss-local/"},
{"traversal_issuer", "/.well-known/pki/ocsp/..%2F..%2Fetc/passwd/01"},
{"null_byte_serial", "/.well-known/pki/ocsp/iss-local/01\x00FF"},
{"sql_injection_serial", "/.well-known/pki/ocsp/iss-local/01'; DROP TABLE--"},
{"negative_hex_serial", "/.well-known/pki/ocsp/iss-local/-1"},
{"unicode_serial", "/.well-known/pki/ocsp/iss-local/01\u2010FF"},
{"extremely_long_serial", "/.well-known/pki/ocsp/iss-local/" + strings.Repeat("F", 10000)},
{"extra_segments", "/.well-known/pki/ocsp/iss-local/01FF/extra/segments"},
}
for _, tc := range cases {
@@ -277,7 +282,7 @@ func TestHandleOCSP_MultiSegment(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.GetOCSPResponseFn = func(issuerID, serialHex string) ([]byte, error) {
mock.GetOCSPResponseFn = func(_ context.Context, issuerID, serialHex string) ([]byte, error) {
return nil, ErrMockNotFound
}
@@ -300,7 +305,9 @@ func TestHandleOCSP_MultiSegment(t *testing.T) {
}
}
// TestGetDERCRL_IssuerPathInjection exercises /api/v1/crl/{issuer_id}.
// TestGetDERCRL_IssuerPathInjection exercises
// /.well-known/pki/crl/{issuer_id} (RFC 5280 CRL; M-006 relocation from
// /api/v1/crl/{issuer_id}).
func TestGetDERCRL_IssuerPathInjection(t *testing.T) {
for _, tc := range adversarialPathInputs() {
t.Run(tc.name, func(t *testing.T) {
@@ -311,12 +318,12 @@ func TestGetDERCRL_IssuerPathInjection(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.GenerateDERCRLFn = func(issuerID string) ([]byte, error) {
mock.GenerateDERCRLFn = func(_ context.Context, issuerID string) ([]byte, error) {
return nil, ErrMockNotFound
}
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl/x", nil)
req.URL.Path = "/api/v1/crl/" + tc.input
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/crl/x", nil)
req.URL.Path = "/.well-known/pki/crl/" + tc.input
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
+12 -11
View File
@@ -19,6 +19,7 @@ package handler
import (
"bytes"
"context"
"fmt"
"net/http"
"net/http/httptest"
@@ -76,7 +77,7 @@ func TestListCertificates_PaginationAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
// Sanity: page/perPage on the filter must never be negative
// and perPage must never exceed 500 after parsing.
if filter.Page < 1 {
@@ -133,7 +134,7 @@ func TestListCertificates_SortAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -175,7 +176,7 @@ func TestListCertificates_FieldsAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -219,7 +220,7 @@ func TestListCertificates_TimeRangeAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -263,7 +264,7 @@ func TestListCertificates_CursorAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -314,7 +315,7 @@ func TestListCertificates_FilterInjection(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.ListCertificatesWithFilterFn = func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
mock.ListCertificatesWithFilterFn = func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -374,7 +375,7 @@ func TestCreateCertificate_BodyAbuse(t *testing.T) {
}()
handler, mock := newCertHandlerWithMock()
mock.CreateCertificateFn = func(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
mock.CreateCertificateFn = func(_ context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
// If we ever reach this, the handler accepted a malformed
// body. Return a sentinel that passes but flag it.
c := cert
@@ -419,7 +420,7 @@ func TestCreateCertificate_HugeBody(t *testing.T) {
sb.WriteString(`]}`)
handler, mock := newCertHandlerWithMock()
mock.CreateCertificateFn = func(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
mock.CreateCertificateFn = func(_ context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
c := cert
c.ID = "mc-huge"
return &c, nil
@@ -476,7 +477,7 @@ func TestRevokeCertificate_ReasonAbuse(t *testing.T) {
handler, mock := newCertHandlerWithMock()
// The mock always returns "invalid revocation reason" so we
// verify the handler's errMsg→status mapping turns it into a 400.
mock.RevokeCertificateFn = func(id string, reason string) error {
mock.RevokeCertificateFn = func(_ context.Context, id string, reason string, _ string) error {
// The service uses domain.IsValidRevocationReason. If we got
// through to here with something bogus, simulate a real
// service error.
@@ -500,7 +501,7 @@ func TestRevokeCertificate_ReasonAbuse(t *testing.T) {
// service error message, which is fragile — this test catches regressions.
func TestRevokeCertificate_AlreadyRevoked(t *testing.T) {
handler, mock := newCertHandlerWithMock()
mock.RevokeCertificateFn = func(id string, reason string) error {
mock.RevokeCertificateFn = func(_ context.Context, id string, reason string, _ string) error {
return fmt.Errorf("cannot revoke: certificate is already revoked")
}
@@ -520,7 +521,7 @@ func TestRevokeCertificate_AlreadyRevoked(t *testing.T) {
// TestRevokeCertificate_NotFound verifies 404 mapping.
func TestRevokeCertificate_NotFound(t *testing.T) {
handler, mock := newCertHandlerWithMock()
mock.RevokeCertificateFn = func(id string, reason string) error {
mock.RevokeCertificateFn = func(_ context.Context, id string, reason string, _ string) error {
return fmt.Errorf("certificate not found")
}
@@ -10,6 +10,7 @@ import (
"time"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// MockAgentService is a mock implementation of AgentService interface.
@@ -24,6 +25,11 @@ type MockAgentService struct {
GetWorkFn func(agentID string) ([]domain.Job, error)
GetWorkWithTargetsFn func(agentID string) ([]domain.WorkItem, error)
UpdateJobStatusFn func(agentID string, jobID string, status string, errMsg string) error
// I-004: soft-retirement hooks. Tests that don't set these receive nil
// results and nil errors, which mirrors the safest default (no-op) for
// unrelated suites that mock only the legacy surface.
RetireAgentFn func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error)
ListRetiredAgentsFn func(page, perPage int) ([]domain.Agent, int64, error)
}
func (m *MockAgentService) ListAgents(_ context.Context, page, perPage int) ([]domain.Agent, int64, error) {
@@ -96,6 +102,25 @@ func (m *MockAgentService) UpdateJobStatus(_ context.Context, agentID string, jo
return nil
}
// RetireAgent is the I-004 soft-retirement entrypoint. Tests that don't set
// RetireAgentFn get a nil result + nil error, which is a no-op response that
// lets unrelated suites compile without caring about the retirement surface.
func (m *MockAgentService) RetireAgent(_ context.Context, agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
if m.RetireAgentFn != nil {
return m.RetireAgentFn(agentID, actor, force, reason)
}
return nil, nil
}
// ListRetiredAgents returns retired rows for the retired-agents tab / audit
// views. Same zero-value default as RetireAgent for unrelated tests.
func (m *MockAgentService) ListRetiredAgents(_ context.Context, page, perPage int) ([]domain.Agent, int64, error) {
if m.ListRetiredAgentsFn != nil {
return m.ListRetiredAgentsFn(page, perPage)
}
return nil, 0, nil
}
// Test ListAgents - success case
func TestListAgents_Success(t *testing.T) {
now := time.Now()
@@ -0,0 +1,393 @@
package handler
import (
"context"
"encoding/json"
"errors"
"net/http"
"net/http/httptest"
"testing"
"time"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// agentRetireTestSetup builds an AgentHandler with a mock AgentService whose
// RetireAgent / ListRetiredAgents / Heartbeat behavior is driven by the
// returned mock. Keeps every I-004 handler test self-contained so a single
// failing assertion can't cascade through a shared fixture.
func agentRetireTestSetup() (*MockAgentService, AgentHandler) {
mock := &MockAgentService{}
handler := NewAgentHandler(mock)
return mock, handler
}
// TestRetireAgentHandler_Success_200 pins the happy-path contract for the
// soft-retirement HTTP surface: DELETE /api/v1/agents/{id} with no dependency
// fallout returns 200 OK and a JSON body echoing retirement metadata
// (retired_at timestamp, already_retired=false, cascade=false, zero counts).
// Operators building dashboards parse these fields; keep the shape stable.
func TestRetireAgentHandler_Success_200(t *testing.T) {
retiredAt := time.Date(2026, 4, 18, 12, 0, 0, 0, time.UTC)
mock, handler := agentRetireTestSetup()
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
if agentID != "a-prod-001" {
t.Fatalf("retire handler received agentID=%q want a-prod-001", agentID)
}
if force {
t.Fatalf("retire handler set force=true unexpectedly; default path must be force=false")
}
return &service.AgentRetirementResult{
AlreadyRetired: false,
Cascade: false,
RetiredAt: retiredAt,
Counts: domain.AgentDependencyCounts{},
}, nil
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/a-prod-001", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%s want 200", w.Code, w.Body.String())
}
var body struct {
RetiredAt time.Time `json:"retired_at"`
AlreadyRetired bool `json:"already_retired"`
Cascade bool `json:"cascade"`
Counts domain.AgentDependencyCounts `json:"counts"`
}
if err := json.NewDecoder(w.Body).Decode(&body); err != nil {
t.Fatalf("decode 200 body: %v", err)
}
if !body.RetiredAt.Equal(retiredAt) {
t.Errorf("retired_at=%v want %v", body.RetiredAt, retiredAt)
}
if body.AlreadyRetired {
t.Errorf("already_retired=true want false on clean retire")
}
if body.Cascade {
t.Errorf("cascade=true want false on clean retire")
}
}
// TestRetireAgentHandler_AlreadyRetired_204 covers the idempotent contract: a
// retire call against an already-retired agent completes with 204 No Content
// (no body). This lets operators safely re-issue the DELETE after a network
// blip without fearing duplicate audit events or state mutations.
func TestRetireAgentHandler_AlreadyRetired_204(t *testing.T) {
mock, handler := agentRetireTestSetup()
past := time.Now().Add(-24 * time.Hour)
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
return &service.AgentRetirementResult{
AlreadyRetired: true,
Cascade: false,
RetiredAt: past,
Counts: domain.AgentDependencyCounts{},
}, nil
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/a-prod-001", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusNoContent {
t.Fatalf("status=%d body=%s want 204", w.Code, w.Body.String())
}
// 204 No Content must have zero body. If anything leaks through, downstream
// clients (curl scripts, dashboards) break.
if w.Body.Len() != 0 {
t.Errorf("204 body=%q want empty", w.Body.String())
}
}
// TestRetireAgentHandler_Sentinel_403 covers the hard guard against retiring
// any of the four sentinel agents that back discovery sources and the
// network scanner. These IDs are reserved; the handler must surface the
// service-layer ErrAgentIsSentinel as 403 Forbidden regardless of force/reason
// because no operator intent can legitimately retire them.
func TestRetireAgentHandler_Sentinel_403(t *testing.T) {
sentinels := []string{"server-scanner", "cloud-aws-sm", "cloud-azure-kv", "cloud-gcp-sm"}
for _, id := range sentinels {
t.Run(id, func(t *testing.T) {
mock, handler := agentRetireTestSetup()
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
return nil, service.ErrAgentIsSentinel
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/"+id, nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusForbidden {
t.Fatalf("sentinel %q status=%d body=%s want 403", id, w.Code, w.Body.String())
}
})
}
}
// TestRetireAgentHandler_NotFound_404 covers the lookup-miss path. Service
// returns a not-found error; handler maps to 404. Keeping the error
// discrimination at the service layer (sentinel errors.Is) rather than string
// matching is the whole point of wrapping.
func TestRetireAgentHandler_NotFound_404(t *testing.T) {
mock, handler := agentRetireTestSetup()
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
return nil, errors.New("agent not found")
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/unknown-id", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusNotFound {
t.Fatalf("status=%d body=%s want 404", w.Code, w.Body.String())
}
}
// TestRetireAgentHandler_Blocked_409_WithCounts covers the preflight-blocked
// path. Service returns *BlockedByDependenciesError wrapping
// ErrBlockedByDependencies; handler unwraps via errors.As, maps to 409, and
// MUST include the counts in the response body so operators know what's
// blocking them. Without counts the 409 is useless — the operator has to
// guess which downstream dependency is holding up the retirement.
func TestRetireAgentHandler_Blocked_409_WithCounts(t *testing.T) {
mock, handler := agentRetireTestSetup()
blockCounts := domain.AgentDependencyCounts{
ActiveTargets: 3,
ActiveCertificates: 7,
PendingJobs: 2,
}
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
return nil, &service.BlockedByDependenciesError{Counts: blockCounts}
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/a-prod-001", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusConflict {
t.Fatalf("status=%d body=%s want 409", w.Code, w.Body.String())
}
var body struct {
Error string `json:"error"`
Message string `json:"message"`
Counts domain.AgentDependencyCounts `json:"counts"`
}
if err := json.NewDecoder(w.Body).Decode(&body); err != nil {
t.Fatalf("decode 409 body: %v", err)
}
if body.Counts.ActiveTargets != 3 {
t.Errorf("counts.active_targets=%d want 3", body.Counts.ActiveTargets)
}
if body.Counts.ActiveCertificates != 7 {
t.Errorf("counts.active_certificates=%d want 7", body.Counts.ActiveCertificates)
}
if body.Counts.PendingJobs != 2 {
t.Errorf("counts.pending_jobs=%d want 2", body.Counts.PendingJobs)
}
if body.Message == "" {
t.Errorf("409 body missing human-readable message; operators need guidance")
}
}
// TestRetireAgentHandler_Force_NoReason_400 covers the force-escape-hatch
// guardrail: force=true without a non-empty reason must be rejected at the
// handler seam BEFORE the service performs any DB work, because a
// reason-less cascade is unauditable. Service returns ErrForceReasonRequired;
// handler maps to 400.
func TestRetireAgentHandler_Force_NoReason_400(t *testing.T) {
mock, handler := agentRetireTestSetup()
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
if !force {
t.Fatalf("handler did not forward force=true; force query param was dropped")
}
if reason != "" {
t.Fatalf("handler passed reason=%q; empty reason must reach service for error path", reason)
}
return nil, service.ErrForceReasonRequired
}
req := httptest.NewRequest(http.MethodDelete, "/api/v1/agents/a-prod-001?force=true", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusBadRequest {
t.Fatalf("status=%d body=%s want 400", w.Code, w.Body.String())
}
}
// TestRetireAgentHandler_ForceCascade_200 covers the successful force-cascade
// path: DELETE ?force=true&reason=... → service executes transactional
// cascade → 200 with cascade=true and the pre-cascade counts echoed back so
// the operator's confirmation dialog can show "I just retired N targets,
// M certificates, K pending jobs."
func TestRetireAgentHandler_ForceCascade_200(t *testing.T) {
mock, handler := agentRetireTestSetup()
retiredAt := time.Date(2026, 4, 18, 14, 30, 0, 0, time.UTC)
mock.RetireAgentFn = func(agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error) {
if !force {
t.Fatalf("handler did not forward force=true; query-param parsing broken")
}
if reason != "decommissioning rack 7" {
t.Fatalf("handler forwarded reason=%q want %q", reason, "decommissioning rack 7")
}
return &service.AgentRetirementResult{
AlreadyRetired: false,
Cascade: true,
RetiredAt: retiredAt,
Counts: domain.AgentDependencyCounts{
ActiveTargets: 2,
ActiveCertificates: 5,
PendingJobs: 1,
},
}, nil
}
url := "/api/v1/agents/a-prod-001?force=true&reason=decommissioning+rack+7"
req := httptest.NewRequest(http.MethodDelete, url, nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%s want 200", w.Code, w.Body.String())
}
var body struct {
RetiredAt time.Time `json:"retired_at"`
AlreadyRetired bool `json:"already_retired"`
Cascade bool `json:"cascade"`
Counts domain.AgentDependencyCounts `json:"counts"`
}
if err := json.NewDecoder(w.Body).Decode(&body); err != nil {
t.Fatalf("decode force-cascade 200 body: %v", err)
}
if !body.Cascade {
t.Errorf("cascade=false want true on ?force=true successful retire")
}
if body.Counts.ActiveTargets != 2 || body.Counts.ActiveCertificates != 5 || body.Counts.PendingJobs != 1 {
t.Errorf("counts=%+v want {ActiveTargets:2 ActiveCertificates:5 PendingJobs:1}", body.Counts)
}
}
// TestHeartbeatHandler_RetiredAgent_410 covers the agent-shutdown signal. A
// retired agent that is still polling must be told its identity is gone
// (410 Gone) rather than offered the normal 200 "recorded" response.
// cmd/agent treats 410 as a terminal signal and exits rather than looping
// forever against a decommissioned identity. Service returns ErrAgentRetired;
// handler maps to 410.
func TestHeartbeatHandler_RetiredAgent_410(t *testing.T) {
mock, handler := agentRetireTestSetup()
mock.HeartbeatFn = func(agentID string, metadata *domain.AgentMetadata) error {
return service.ErrAgentRetired
}
req := httptest.NewRequest(http.MethodPost, "/api/v1/agents/a-prod-001/heartbeat", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.Heartbeat(w, req)
if w.Code != http.StatusGone {
t.Fatalf("heartbeat(retired) status=%d body=%s want 410", w.Code, w.Body.String())
}
}
// TestListRetiredAgentsHandler_Success covers the audit/forensics-facing
// endpoint GET /api/v1/agents/retired. Returns a paged list of retired rows
// alongside total count so the GUI can render a "Retired Agents" tab with
// pagination. Default listing (GET /agents) hides retired rows; this is the
// opt-in surface for them.
func TestListRetiredAgentsHandler_Success(t *testing.T) {
past := time.Now().Add(-48 * time.Hour)
reason := "old hardware"
retired := []domain.Agent{
{
ID: "agent-retired-01",
Name: "decom-01",
Hostname: "server-old",
Status: domain.AgentStatusOffline,
RegisteredAt: past,
RetiredAt: &past,
RetiredReason: &reason,
},
}
mock, handler := agentRetireTestSetup()
mock.ListRetiredAgentsFn = func(page, perPage int) ([]domain.Agent, int64, error) {
if page != 1 || perPage != 50 {
t.Fatalf("ListRetired handler received page=%d perPage=%d want 1/50 defaults", page, perPage)
}
return retired, 1, nil
}
req := httptest.NewRequest(http.MethodGet, "/api/v1/agents/retired", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.ListRetiredAgents(w, req)
if w.Code != http.StatusOK {
t.Fatalf("status=%d body=%s want 200", w.Code, w.Body.String())
}
var response PagedResponse
if err := json.NewDecoder(w.Body).Decode(&response); err != nil {
t.Fatalf("decode list-retired body: %v", err)
}
if response.Total != 1 {
t.Errorf("total=%d want 1", response.Total)
}
}
// TestRetireAgentHandler_MethodNotAllowed covers defense-in-depth: only
// DELETE is valid on /api/v1/agents/{id} for retirement. Using POST/PUT/PATCH
// must be rejected with 405 so misconfigured callers don't accidentally
// trigger retirement via a wrong-method request.
func TestRetireAgentHandler_MethodNotAllowed(t *testing.T) {
_, handler := agentRetireTestSetup()
for _, method := range []string{http.MethodPost, http.MethodPut, http.MethodPatch} {
t.Run(method, func(t *testing.T) {
req := httptest.NewRequest(method, "/api/v1/agents/a-prod-001", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.RetireAgent(w, req)
if w.Code != http.StatusMethodNotAllowed {
t.Fatalf("method=%s status=%d want 405", method, w.Code)
}
})
}
}
// Compile-time asserts: the mock must satisfy the handler's AgentService
// interface. Red state: this fails until the interface grows RetireAgent +
// ListRetiredAgents. Once Phase 2b adds those methods to AgentService, this
// assertion goes green along with every test above.
var _ AgentService = (*MockAgentService)(nil)
// Unused-import suppressor for context — the package-level tests already
// pull context from agent_handler_test.go, but leaving this here documents
// that the mock methods receive context.Context values even though this
// file's tests don't construct them directly (they ride on httptest.NewRequest).
var _ = context.Background
+199
View File
@@ -3,16 +3,24 @@ package handler
import (
"context"
"encoding/json"
"errors"
"log/slog"
"net/http"
"strconv"
"strings"
"time"
"github.com/shankar0123/certctl/internal/api/middleware"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// AgentService defines the service interface for agent operations.
//
// I-004 expansion: RetireAgent + ListRetiredAgents back the soft-retirement
// surface. The handler depends on the service-package's AgentRetirementResult
// and BlockedByDependenciesError types for result shape + errors.As unwrap,
// which is why this file imports internal/service.
type AgentService interface {
ListAgents(ctx context.Context, page, perPage int) ([]domain.Agent, int64, error)
GetAgent(ctx context.Context, id string) (*domain.Agent, error)
@@ -24,6 +32,10 @@ type AgentService interface {
GetWork(ctx context.Context, agentID string) ([]domain.Job, error)
GetWorkWithTargets(ctx context.Context, agentID string) ([]domain.WorkItem, error)
UpdateJobStatus(ctx context.Context, agentID string, jobID string, status string, errMsg string) error
// I-004 soft-retirement API. Both default to no-op (nil result / nil error)
// in mocks that don't override them — handler tests opt in per suite.
RetireAgent(ctx context.Context, agentID, actor string, force bool, reason string) (*service.AgentRetirementResult, error)
ListRetiredAgents(ctx context.Context, page, perPage int) ([]domain.Agent, int64, error)
}
// AgentHandler handles HTTP requests for agent operations.
@@ -190,6 +202,15 @@ func (h AgentHandler) Heartbeat(w http.ResponseWriter, r *http.Request) {
}
if err := h.svc.Heartbeat(r.Context(), agentID, metadata); err != nil {
// I-004: a retired agent still polling must receive 410 Gone so
// cmd/agent detects the terminal signal and shuts down cleanly
// instead of looping forever against a decommissioned identity.
// Check this FIRST — before "not found" string matching — so the
// retired-path is never masked by a sibling error branch.
if errors.Is(err, service.ErrAgentRetired) {
ErrorWithRequestID(w, http.StatusGone, "Agent has been retired", requestID)
return
}
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Agent not found", requestID)
return
@@ -376,3 +397,181 @@ func (h AgentHandler) AgentReportJobStatus(w http.ResponseWriter, r *http.Reques
"status": "updated",
})
}
// RetireAgent executes the I-004 soft-retirement surface.
// DELETE /api/v1/agents/{id}[?force=true&reason=...]
//
// Contract (pinned by agent_retire_handler_test.go):
//
// 405 any method other than DELETE
// 200 clean retire (body: retired_at, already_retired=false, cascade=false, counts=0s)
// 200 force-cascade retire (body: cascade=true, counts=pre-cascade snapshot)
// 204 idempotent retire of an already-retired agent (NO body — downstream
// clients that tee responses into dashboards break on spurious bodies)
// 400 force=true without a non-empty reason (ErrForceReasonRequired)
// 403 one of the four reserved sentinel IDs (ErrAgentIsSentinel)
// 404 agent does not exist ("not found" string match, kept for compat with
// repo error strings; sentinel checks run first so they never mask)
// 409 blocked by preflight counts (*BlockedByDependenciesError) — body
// carries the per-bucket counts so the operator UI can tell the
// human which downstream dependency is holding up the retirement,
// rather than forcing them to re-run the DELETE with ?force=true
// and guess
// 500 anything else
//
// The 409 body intentionally does NOT go through ErrorWithRequestID because
// that helper's ErrorResponse shape has no `counts` field — we inline-marshal
// a custom body instead. Keeping this shape stable is important: the GUI
// pattern is "show the 409 dialog, list the N targets / M certs / K jobs
// blocking, let the operator retire them first or tick the force checkbox."
func (h AgentHandler) RetireAgent(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodDelete {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Extract {id} from /api/v1/agents/{id}. Mirror GetAgent's pattern so
// the path parser is identical across the agent handler surface and a
// future refactor can extract it once without introducing drift.
rawID := strings.TrimPrefix(r.URL.Path, "/api/v1/agents/")
parts := strings.Split(rawID, "/")
if len(parts) == 0 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Agent ID is required", requestID)
return
}
id := parts[0]
// Parse optional force + reason. A missing `force` param is treated as
// force=false (the default, safe path); anything strconv.ParseBool rejects
// is also force=false so a malformed query can never silently enable the
// cascade. The reason string is passed through verbatim — the service
// owns the "force=true requires reason" rule.
query := r.URL.Query()
force := false
if fv := query.Get("force"); fv != "" {
if parsed, err := strconv.ParseBool(fv); err == nil {
force = parsed
}
}
reason := query.Get("reason")
actor := resolveActor(r.Context())
result, err := h.svc.RetireAgent(r.Context(), id, actor, force, reason)
if err != nil {
// Sentinel + typed-error checks run BEFORE string matching on "not
// found" so a repo error that happens to contain those words can
// never mask a structural refusal (403/400/409). Order matters.
if errors.Is(err, service.ErrAgentIsSentinel) {
ErrorWithRequestID(w, http.StatusForbidden, "Agent is a reserved sentinel and cannot be retired", requestID)
return
}
if errors.Is(err, service.ErrForceReasonRequired) {
ErrorWithRequestID(w, http.StatusBadRequest, "force=true requires a non-empty reason", requestID)
return
}
var blocked *service.BlockedByDependenciesError
if errors.As(err, &blocked) {
// Custom 409 body with per-bucket counts. ErrorResponse has no
// `counts` field, so we marshal a bespoke struct instead.
// Keep `error`/`message`/`counts` as the stable shape — any
// dashboard parsing this relies on those three keys.
body := struct {
Error string `json:"error"`
Message string `json:"message"`
Counts domain.AgentDependencyCounts `json:"counts"`
}{
Error: "blocked_by_dependencies",
Message: "Agent has active downstream dependencies. Retire or reassign them " +
"first, or re-run with ?force=true&reason=... to cascade.",
Counts: blocked.Counts,
}
JSON(w, http.StatusConflict, body)
return
}
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Agent not found", requestID)
return
}
slog.Error("RetireAgent failed", "agent_id", id, "error", err.Error())
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to retire agent", requestID)
return
}
// Idempotent retire: the agent was already retired, so we return 204 No
// Content with a ZERO-length body. The Red contract (test line 106) fails
// if even a trailing newline leaks into the response. WriteHeader alone
// emits the status without invoking the JSON encoder.
if result.AlreadyRetired {
w.WriteHeader(http.StatusNoContent)
return
}
// Clean retire (force=false) or successful cascade (force=true). Body
// shape pinned by Red contract: retired_at, already_retired, cascade,
// counts. Omitempty is deliberately NOT used — operators parsing the
// response expect every field to always be present.
JSON(w, http.StatusOK, struct {
RetiredAt time.Time `json:"retired_at"`
AlreadyRetired bool `json:"already_retired"`
Cascade bool `json:"cascade"`
Counts domain.AgentDependencyCounts `json:"counts"`
}{
RetiredAt: result.RetiredAt,
AlreadyRetired: result.AlreadyRetired,
Cascade: result.Cascade,
Counts: result.Counts,
})
}
// ListRetiredAgents returns the opt-in listing of retired agents for the
// operator UI's "Retired" tab and for audit/forensics workflows.
// GET /api/v1/agents/retired?page=1&per_page=50
//
// The default ListAgents handler hides retired rows; this is the dedicated
// surface for reading them back. Pagination defaults match ListAgents so
// the GUI can reuse the same query hook (page=1, per_page=50, cap 500).
//
// Go 1.22's enhanced ServeMux routes `/agents/retired` to this handler via
// the literal-beats-pattern-var precedence rule (literal `retired` wins over
// `{id}` in the sibling GET /api/v1/agents/{id} route), so both entries can
// coexist without conflict. If that precedence ever regresses, the failure
// mode is TestListRetiredAgentsHandler_Success blowing up with a 404 — which
// is the fast signal we want.
func (h AgentHandler) ListRetiredAgents(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
page := 1
perPage := 50
query := r.URL.Query()
if p := query.Get("page"); p != "" {
if parsed, err := strconv.Atoi(p); err == nil && parsed > 0 {
page = parsed
}
}
if pp := query.Get("per_page"); pp != "" {
if parsed, err := strconv.Atoi(pp); err == nil && parsed > 0 && parsed <= 500 {
perPage = parsed
}
}
agents, total, err := h.svc.ListRetiredAgents(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list retired agents", requestID)
return
}
JSON(w, http.StatusOK, PagedResponse{
Data: agents,
Total: total,
Page: page,
PerPage: perPage,
})
}
+5 -4
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"net/http"
"strconv"
"strings"
@@ -11,8 +12,8 @@ import (
// AuditService defines the service interface for audit event operations.
type AuditService interface {
ListAuditEvents(page, perPage int) ([]domain.AuditEvent, int64, error)
GetAuditEvent(id string) (*domain.AuditEvent, error)
ListAuditEvents(ctx context.Context, page, perPage int) ([]domain.AuditEvent, int64, error)
GetAuditEvent(ctx context.Context, id string) (*domain.AuditEvent, error)
}
// AuditHandler handles HTTP requests for audit event operations.
@@ -49,7 +50,7 @@ func (h AuditHandler) ListAuditEvents(w http.ResponseWriter, r *http.Request) {
}
}
events, total, err := h.svc.ListAuditEvents(page, perPage)
events, total, err := h.svc.ListAuditEvents(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list audit events", requestID)
return
@@ -83,7 +84,7 @@ func (h AuditHandler) GetAuditEvent(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
event, err := h.svc.GetAuditEvent(id)
event, err := h.svc.GetAuditEvent(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Audit event not found", requestID)
return
+2 -2
View File
@@ -19,14 +19,14 @@ type mockAuditService struct {
getFunc func(id string) (*domain.AuditEvent, error)
}
func (m *mockAuditService) ListAuditEvents(page, perPage int) ([]domain.AuditEvent, int64, error) {
func (m *mockAuditService) ListAuditEvents(_ context.Context, page, perPage int) ([]domain.AuditEvent, int64, error) {
if m.listFunc != nil {
return m.listFunc(page, perPage)
}
return nil, 0, nil
}
func (m *mockAuditService) GetAuditEvent(id string) (*domain.AuditEvent, error) {
func (m *mockAuditService) GetAuditEvent(_ context.Context, id string) (*domain.AuditEvent, error) {
if m.getFunc != nil {
return m.getFunc(id)
}
+17 -5
View File
@@ -37,6 +37,11 @@ type bulkRevokeRequest struct {
// BulkRevoke handles bulk certificate revocation.
// POST /api/v1/certificates/bulk-revoke
//
// M-003: admin-only. Bulk revocation is a fleet-scale destructive operation —
// a non-admin caller must not be able to invalidate certificates across
// profiles/owners/agents. The gate is enforced here (before body parsing) so a
// non-admin never sees its request criteria evaluated.
func (h BulkRevocationHandler) BulkRevoke(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
@@ -45,6 +50,16 @@ func (h BulkRevocationHandler) BulkRevoke(w http.ResponseWriter, r *http.Request
requestID := middleware.GetRequestID(r.Context())
// M-003: admin-only gate. Non-admin callers are rejected before any
// criteria/body processing to avoid leaking validation behavior to
// unauthorized actors.
if !middleware.IsAdmin(r.Context()) {
ErrorWithRequestID(w, http.StatusForbidden,
"Bulk revocation requires admin privileges",
requestID)
return
}
var req bulkRevokeRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, "Invalid request body", requestID)
@@ -78,11 +93,8 @@ func (h BulkRevocationHandler) BulkRevoke(w http.ResponseWriter, r *http.Request
return
}
// Extract actor from auth context
actor := "api"
if user, ok := middleware.GetUser(r.Context()); ok && user != "" {
actor = user
}
// Extract actor from auth context (M-002: named-key identity → audit trail)
actor := resolveActor(r.Context())
result, err := h.svc.BulkRevoke(r.Context(), criteria, req.Reason, actor)
if err != nil {
@@ -7,8 +7,10 @@ import (
"fmt"
"net/http"
"net/http/httptest"
"strings"
"testing"
"github.com/shankar0123/certctl/internal/api/middleware"
"github.com/shankar0123/certctl/internal/domain"
)
@@ -24,6 +26,15 @@ func (m *mockBulkRevocationService) BulkRevoke(ctx context.Context, criteria dom
return &domain.BulkRevocationResult{}, nil
}
// adminContext returns a context carrying the admin flag, mimicking what the
// auth middleware sets for named-key callers whose entry is admin-tagged.
// M-003: bulk revocation handler requires admin context to reach the service.
func adminContext() context.Context {
ctx := context.WithValue(context.Background(), middleware.RequestIDKey{}, "test-request-id-bulk")
ctx = context.WithValue(ctx, middleware.AdminKey{}, true)
return ctx
}
func TestBulkRevoke_Success_WithIDs(t *testing.T) {
svc := &mockBulkRevocationService{
BulkRevokeFn: func(ctx context.Context, criteria domain.BulkRevocationCriteria, reason string, actor string) (*domain.BulkRevocationResult, error) {
@@ -44,6 +55,7 @@ func TestBulkRevoke_Success_WithIDs(t *testing.T) {
body := `{"reason":"keyCompromise","certificate_ids":["mc-1","mc-2"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -82,6 +94,7 @@ func TestBulkRevoke_Success_WithProfile(t *testing.T) {
body := `{"reason":"keyCompromise","profile_id":"prof-tls"}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -97,6 +110,7 @@ func TestBulkRevoke_MissingReason_400(t *testing.T) {
body := `{"certificate_ids":["mc-1"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -112,6 +126,7 @@ func TestBulkRevoke_EmptyCriteria_400(t *testing.T) {
body := `{"reason":"keyCompromise"}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -127,6 +142,7 @@ func TestBulkRevoke_InvalidReason_400(t *testing.T) {
body := `{"reason":"totallyBogus","certificate_ids":["mc-1"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -139,6 +155,8 @@ func TestBulkRevoke_InvalidReason_400(t *testing.T) {
func TestBulkRevoke_MethodNotAllowed_405(t *testing.T) {
h := NewBulkRevocationHandler(&mockBulkRevocationService{})
// Method check fires before the admin gate, so 405 must hold even for a
// non-admin caller — asserting this keeps the ordering explicit.
req := httptest.NewRequest(http.MethodGet, "/api/v1/certificates/bulk-revoke", nil)
w := httptest.NewRecorder()
@@ -160,6 +178,7 @@ func TestBulkRevoke_ServiceError_500(t *testing.T) {
body := `{"reason":"keyCompromise","certificate_ids":["mc-1"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(adminContext())
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
@@ -168,3 +187,103 @@ func TestBulkRevoke_ServiceError_500(t *testing.T) {
t.Errorf("expected 500, got %d", w.Code)
}
}
// --- M-003: admin-only gate on bulk revocation ---
// TestBulkRevoke_NonAdmin_Returns403 is the central authorization regression
// for M-003. A caller without an admin-tagged context must be rejected with
// HTTP 403, regardless of how well-formed its body is, and the service layer
// must never see the request.
func TestBulkRevoke_NonAdmin_Returns403(t *testing.T) {
var serviceCalled bool
svc := &mockBulkRevocationService{
BulkRevokeFn: func(ctx context.Context, criteria domain.BulkRevocationCriteria, reason string, actor string) (*domain.BulkRevocationResult, error) {
serviceCalled = true
return &domain.BulkRevocationResult{}, nil
},
}
h := NewBulkRevocationHandler(svc)
// Well-formed body + well-formed reason + filter — the only thing
// missing is an admin-tagged context. The gate must still fire.
body := `{"reason":"keyCompromise","certificate_ids":["mc-1","mc-2"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
req = req.WithContext(contextWithRequestID()) // request id only, no admin flag
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
if w.Code != http.StatusForbidden {
t.Fatalf("expected status 403, got %d (body=%q)", w.Code, w.Body.String())
}
var resp map[string]any
if err := json.NewDecoder(w.Body).Decode(&resp); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
msg, _ := resp["message"].(string)
if !strings.Contains(strings.ToLower(msg), "admin") {
t.Errorf("expected message to mention admin requirement, got %q", msg)
}
if serviceCalled {
t.Errorf("service was invoked despite non-admin caller — gate failed open")
}
}
// TestBulkRevoke_AdminExplicitFalse_Returns403 pins the specific case where the
// AdminKey exists but is set to false — e.g., a non-admin named-key caller.
// Without this we could regress to "key missing == deny, key present == allow"
// which would silently grant a false flag.
func TestBulkRevoke_AdminExplicitFalse_Returns403(t *testing.T) {
h := NewBulkRevocationHandler(&mockBulkRevocationService{})
body := `{"reason":"keyCompromise","certificate_ids":["mc-1"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
ctx := context.WithValue(context.Background(), middleware.RequestIDKey{}, "test-request-id")
ctx = context.WithValue(ctx, middleware.AdminKey{}, false)
req = req.WithContext(ctx)
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
if w.Code != http.StatusForbidden {
t.Fatalf("expected status 403 for admin=false, got %d", w.Code)
}
}
// TestBulkRevoke_AdminPermitted_ForwardsActor confirms the happy path:
// an admin-tagged context reaches the service and the actor (from the auth
// UserKey) is propagated through to BulkRevoke. This keeps the admin gate and
// the M-002 actor-propagation wired together in a single regression.
func TestBulkRevoke_AdminPermitted_ForwardsActor(t *testing.T) {
var capturedActor string
svc := &mockBulkRevocationService{
BulkRevokeFn: func(ctx context.Context, criteria domain.BulkRevocationCriteria, reason string, actor string) (*domain.BulkRevocationResult, error) {
capturedActor = actor
return &domain.BulkRevocationResult{TotalMatched: 1, TotalRevoked: 1}, nil
},
}
h := NewBulkRevocationHandler(svc)
body := `{"reason":"keyCompromise","certificate_ids":["mc-1"]}`
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates/bulk-revoke", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
ctx := context.WithValue(context.Background(), middleware.RequestIDKey{}, "test-request-id")
ctx = context.WithValue(ctx, middleware.AdminKey{}, true)
ctx = context.WithValue(ctx, middleware.UserKey{}, "ops-admin")
req = req.WithContext(ctx)
w := httptest.NewRecorder()
h.BulkRevoke(w, req)
if w.Code != http.StatusOK {
t.Fatalf("expected status 200 for admin caller, got %d (body=%q)", w.Code, w.Body.String())
}
if capturedActor != "ops-admin" {
t.Errorf("expected actor ops-admin, got %q", capturedActor)
}
}
+167 -215
View File
@@ -17,116 +17,116 @@ import (
// MockCertificateService is a mock implementation of CertificateService interface.
type MockCertificateService struct {
ListCertificatesFn func(status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error)
ListCertificatesWithFilterFn func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error)
GetCertificateFn func(id string) (*domain.ManagedCertificate, error)
CreateCertificateFn func(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
UpdateCertificateFn func(id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
ArchiveCertificateFn func(id string) error
GetCertificateVersionsFn func(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error)
TriggerRenewalFn func(certID string) error
TriggerDeploymentFn func(certID string, targetID string) error
RevokeCertificateFn func(certID string, reason string) error
GetRevokedCertificatesFn func() ([]*domain.CertificateRevocation, error)
GenerateDERCRLFn func(issuerID string) ([]byte, error)
GetOCSPResponseFn func(issuerID string, serialHex string) ([]byte, error)
GetCertificateDeploymentsFn func(certID string) ([]domain.DeploymentTarget, error)
ListCertificatesFn func(ctx context.Context, status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error)
ListCertificatesWithFilterFn func(ctx context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error)
GetCertificateFn func(ctx context.Context, id string) (*domain.ManagedCertificate, error)
CreateCertificateFn func(ctx context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
UpdateCertificateFn func(ctx context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
ArchiveCertificateFn func(ctx context.Context, id string) error
GetCertificateVersionsFn func(ctx context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error)
TriggerRenewalFn func(ctx context.Context, certID string, actor string) error
TriggerDeploymentFn func(ctx context.Context, certID string, targetID string, actor string) error
RevokeCertificateFn func(ctx context.Context, certID string, reason string, actor string) error
GetRevokedCertificatesFn func(ctx context.Context) ([]*domain.CertificateRevocation, error)
GenerateDERCRLFn func(ctx context.Context, issuerID string) ([]byte, error)
GetOCSPResponseFn func(ctx context.Context, issuerID string, serialHex string) ([]byte, error)
GetCertificateDeploymentsFn func(ctx context.Context, certID string) ([]domain.DeploymentTarget, error)
}
func (m *MockCertificateService) ListCertificates(status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error) {
func (m *MockCertificateService) ListCertificates(ctx context.Context, status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error) {
if m.ListCertificatesFn != nil {
return m.ListCertificatesFn(status, environment, ownerID, teamID, issuerID, page, perPage)
return m.ListCertificatesFn(ctx, status, environment, ownerID, teamID, issuerID, page, perPage)
}
return nil, 0, nil
}
func (m *MockCertificateService) GetCertificate(id string) (*domain.ManagedCertificate, error) {
func (m *MockCertificateService) GetCertificate(ctx context.Context, id string) (*domain.ManagedCertificate, error) {
if m.GetCertificateFn != nil {
return m.GetCertificateFn(id)
return m.GetCertificateFn(ctx, id)
}
return nil, nil
}
func (m *MockCertificateService) CreateCertificate(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
func (m *MockCertificateService) CreateCertificate(ctx context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
if m.CreateCertificateFn != nil {
return m.CreateCertificateFn(cert)
return m.CreateCertificateFn(ctx, cert)
}
return nil, nil
}
func (m *MockCertificateService) UpdateCertificate(id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
func (m *MockCertificateService) UpdateCertificate(ctx context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
if m.UpdateCertificateFn != nil {
return m.UpdateCertificateFn(id, cert)
return m.UpdateCertificateFn(ctx, id, cert)
}
return nil, nil
}
func (m *MockCertificateService) ArchiveCertificate(id string) error {
func (m *MockCertificateService) ArchiveCertificate(ctx context.Context, id string) error {
if m.ArchiveCertificateFn != nil {
return m.ArchiveCertificateFn(id)
return m.ArchiveCertificateFn(ctx, id)
}
return nil
}
func (m *MockCertificateService) GetCertificateVersions(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
func (m *MockCertificateService) GetCertificateVersions(ctx context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
if m.GetCertificateVersionsFn != nil {
return m.GetCertificateVersionsFn(certID, page, perPage)
return m.GetCertificateVersionsFn(ctx, certID, page, perPage)
}
return nil, 0, nil
}
func (m *MockCertificateService) TriggerRenewal(certID string) error {
func (m *MockCertificateService) TriggerRenewal(ctx context.Context, certID string, actor string) error {
if m.TriggerRenewalFn != nil {
return m.TriggerRenewalFn(certID)
return m.TriggerRenewalFn(ctx, certID, actor)
}
return nil
}
func (m *MockCertificateService) TriggerDeployment(certID string, targetID string) error {
func (m *MockCertificateService) TriggerDeployment(ctx context.Context, certID string, targetID string, actor string) error {
if m.TriggerDeploymentFn != nil {
return m.TriggerDeploymentFn(certID, targetID)
return m.TriggerDeploymentFn(ctx, certID, targetID, actor)
}
return nil
}
func (m *MockCertificateService) RevokeCertificate(certID string, reason string) error {
func (m *MockCertificateService) RevokeCertificate(ctx context.Context, certID string, reason string, actor string) error {
if m.RevokeCertificateFn != nil {
return m.RevokeCertificateFn(certID, reason)
return m.RevokeCertificateFn(ctx, certID, reason, actor)
}
return nil
}
func (m *MockCertificateService) GetRevokedCertificates() ([]*domain.CertificateRevocation, error) {
func (m *MockCertificateService) GetRevokedCertificates(ctx context.Context) ([]*domain.CertificateRevocation, error) {
if m.GetRevokedCertificatesFn != nil {
return m.GetRevokedCertificatesFn()
return m.GetRevokedCertificatesFn(ctx)
}
return nil, nil
}
func (m *MockCertificateService) GenerateDERCRL(issuerID string) ([]byte, error) {
func (m *MockCertificateService) GenerateDERCRL(ctx context.Context, issuerID string) ([]byte, error) {
if m.GenerateDERCRLFn != nil {
return m.GenerateDERCRLFn(issuerID)
return m.GenerateDERCRLFn(ctx, issuerID)
}
return nil, nil
}
func (m *MockCertificateService) GetOCSPResponse(issuerID string, serialHex string) ([]byte, error) {
func (m *MockCertificateService) GetOCSPResponse(ctx context.Context, issuerID string, serialHex string) ([]byte, error) {
if m.GetOCSPResponseFn != nil {
return m.GetOCSPResponseFn(issuerID, serialHex)
return m.GetOCSPResponseFn(ctx, issuerID, serialHex)
}
return nil, nil
}
func (m *MockCertificateService) ListCertificatesWithFilter(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
func (m *MockCertificateService) ListCertificatesWithFilter(ctx context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if m.ListCertificatesWithFilterFn != nil {
return m.ListCertificatesWithFilterFn(filter)
return m.ListCertificatesWithFilterFn(ctx, filter)
}
return nil, 0, nil
}
func (m *MockCertificateService) GetCertificateDeployments(certID string) ([]domain.DeploymentTarget, error) {
func (m *MockCertificateService) GetCertificateDeployments(ctx context.Context, certID string) ([]domain.DeploymentTarget, error) {
if m.GetCertificateDeploymentsFn != nil {
return m.GetCertificateDeploymentsFn(certID)
return m.GetCertificateDeploymentsFn(ctx, certID)
}
return nil, nil
}
@@ -158,7 +158,7 @@ func TestListCertificates_Success(t *testing.T) {
}
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.Page == 1 && filter.PerPage == 50 {
return []domain.ManagedCertificate{cert1, cert2}, 2, nil
}
@@ -197,7 +197,7 @@ func TestListCertificates_Success(t *testing.T) {
// Test ListCertificates - with filters
func TestListCertificates_WithFilters(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.Status == "Active" && filter.Environment == "prod" {
return []domain.ManagedCertificate{}, 0, nil
}
@@ -236,7 +236,7 @@ func TestListCertificates_MethodNotAllowed(t *testing.T) {
// Test ListCertificates - service error
func TestListCertificates_ServiceError(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return nil, 0, ErrMockServiceFailed
},
}
@@ -266,7 +266,7 @@ func TestGetCertificate_Success(t *testing.T) {
}
mock := &MockCertificateService{
GetCertificateFn: func(id string) (*domain.ManagedCertificate, error) {
GetCertificateFn: func(_ context.Context, id string) (*domain.ManagedCertificate, error) {
if id == "mc-prod-001" {
return cert, nil
}
@@ -298,7 +298,7 @@ func TestGetCertificate_Success(t *testing.T) {
// Test GetCertificate - not found
func TestGetCertificate_NotFound(t *testing.T) {
mock := &MockCertificateService{
GetCertificateFn: func(id string) (*domain.ManagedCertificate, error) {
GetCertificateFn: func(_ context.Context, id string) (*domain.ManagedCertificate, error) {
return nil, ErrMockNotFound
},
}
@@ -345,7 +345,7 @@ func TestCreateCertificate_Success(t *testing.T) {
}
mock := &MockCertificateService{
CreateCertificateFn: func(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
CreateCertificateFn: func(_ context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
return created, nil
},
}
@@ -403,7 +403,7 @@ func TestCreateCertificate_InvalidBody(t *testing.T) {
// Test CreateCertificate - service error
func TestCreateCertificate_ServiceError(t *testing.T) {
mock := &MockCertificateService{
CreateCertificateFn: func(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
CreateCertificateFn: func(_ context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
return nil, ErrMockServiceFailed
},
}
@@ -432,6 +432,66 @@ func TestCreateCertificate_ServiceError(t *testing.T) {
}
}
// TestCreateCertificate_MissingRequiredField_Returns400 pins the C-001 handler
// contract: handler MUST reject a create payload that omits any of the five
// required fields (name, common_name, owner_id, team_id, issuer_id,
// renewal_policy_id) with HTTP 400 before the service is invoked. The mock
// service here would succeed if called; every subtest proving 400 therefore
// proves the handler guard fires.
func TestCreateCertificate_MissingRequiredField_Returns400(t *testing.T) {
baseBody := map[string]interface{}{
"name": "API Prod",
"common_name": "api.example.com",
"owner_id": "o-alice",
"team_id": "t-platform",
"issuer_id": "iss-local",
"renewal_policy_id": "rp-standard",
}
cases := []struct {
name string
missingField string
}{
{"missing name", "name"},
{"missing common_name", "common_name"},
{"missing owner_id", "owner_id"},
{"missing team_id", "team_id"},
{"missing issuer_id", "issuer_id"},
{"missing renewal_policy_id", "renewal_policy_id"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
body := make(map[string]interface{}, len(baseBody))
for k, v := range baseBody {
body[k] = v
}
delete(body, tc.missingField)
bodyBytes, _ := json.Marshal(body)
mock := &MockCertificateService{
CreateCertificateFn: func(_ context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
// Would succeed if handler guard did not fire.
cert.ID = "mc-would-be-created"
return &cert, nil
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates", bytes.NewReader(bodyBytes))
req = req.WithContext(contextWithRequestID())
req.Header.Set("Content-Type", "application/json")
w := httptest.NewRecorder()
handler.CreateCertificate(w, req)
if w.Code != http.StatusBadRequest {
t.Fatalf("%s: expected 400, got %d — body=%s", tc.name, w.Code, w.Body.String())
}
})
}
}
// Test UpdateCertificate - success case
func TestUpdateCertificate_Success(t *testing.T) {
updated := &domain.ManagedCertificate{
@@ -445,7 +505,7 @@ func TestUpdateCertificate_Success(t *testing.T) {
}
mock := &MockCertificateService{
UpdateCertificateFn: func(id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
UpdateCertificateFn: func(_ context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error) {
if id == "mc-prod-001" {
return updated, nil
}
@@ -501,7 +561,7 @@ func TestUpdateCertificate_InvalidBody(t *testing.T) {
// Test ArchiveCertificate - success case
func TestArchiveCertificate_Success(t *testing.T) {
mock := &MockCertificateService{
ArchiveCertificateFn: func(id string) error {
ArchiveCertificateFn: func(_ context.Context, id string) error {
if id == "mc-prod-001" {
return nil
}
@@ -524,7 +584,7 @@ func TestArchiveCertificate_Success(t *testing.T) {
// Test ArchiveCertificate - not found
func TestArchiveCertificate_NotFound(t *testing.T) {
mock := &MockCertificateService{
ArchiveCertificateFn: func(id string) error {
ArchiveCertificateFn: func(_ context.Context, id string) error {
return ErrMockNotFound
},
}
@@ -554,7 +614,7 @@ func TestGetCertificateVersions_Success(t *testing.T) {
}
mock := &MockCertificateService{
GetCertificateVersionsFn: func(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
GetCertificateVersionsFn: func(_ context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
if certID == "mc-prod-001" {
return []domain.CertificateVersion{ver1}, 1, nil
}
@@ -586,7 +646,7 @@ func TestGetCertificateVersions_Success(t *testing.T) {
// Test GetCertificateVersions - not found
func TestGetCertificateVersions_NotFound(t *testing.T) {
mock := &MockCertificateService{
GetCertificateVersionsFn: func(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
GetCertificateVersionsFn: func(_ context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error) {
return nil, 0, ErrMockNotFound
},
}
@@ -606,7 +666,7 @@ func TestGetCertificateVersions_NotFound(t *testing.T) {
// Test TriggerRenewal - success case
func TestTriggerRenewal_Success(t *testing.T) {
mock := &MockCertificateService{
TriggerRenewalFn: func(certID string) error {
TriggerRenewalFn: func(_ context.Context, certID string, _ string) error {
if certID == "mc-prod-001" {
return nil
}
@@ -638,7 +698,7 @@ func TestTriggerRenewal_Success(t *testing.T) {
// Test TriggerRenewal - service error
func TestTriggerRenewal_ServiceError(t *testing.T) {
mock := &MockCertificateService{
TriggerRenewalFn: func(certID string) error {
TriggerRenewalFn: func(_ context.Context, certID string, _ string) error {
return ErrMockServiceFailed
},
}
@@ -658,7 +718,7 @@ func TestTriggerRenewal_ServiceError(t *testing.T) {
// Test TriggerDeployment - success case
func TestTriggerDeployment_Success(t *testing.T) {
mock := &MockCertificateService{
TriggerDeploymentFn: func(certID string, targetID string) error {
TriggerDeploymentFn: func(_ context.Context, certID string, targetID string, _ string) error {
if certID == "mc-prod-001" {
return nil
}
@@ -695,7 +755,7 @@ func TestTriggerDeployment_Success(t *testing.T) {
// Test TriggerDeployment - without target ID
func TestTriggerDeployment_NoTargetID(t *testing.T) {
mock := &MockCertificateService{
TriggerDeploymentFn: func(certID string, targetID string) error {
TriggerDeploymentFn: func(_ context.Context, certID string, targetID string, _ string) error {
// Should accept empty targetID (deploy to all)
return nil
},
@@ -716,7 +776,7 @@ func TestTriggerDeployment_NoTargetID(t *testing.T) {
// Test ListCertificates - invalid page parameter
func TestListCertificates_InvalidPageParam(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
// Should default to page 1
if filter.Page == 1 {
return []domain.ManagedCertificate{}, 0, nil
@@ -740,7 +800,7 @@ func TestListCertificates_InvalidPageParam(t *testing.T) {
// Test ListCertificates - per_page exceeds max
func TestListCertificates_PerPageExceedsMax(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
// Should cap perPage at 500
if filter.PerPage == 50 { // defaults to 50 if > 500
return []domain.ManagedCertificate{}, 0, nil
@@ -765,7 +825,7 @@ func TestListCertificates_PerPageExceedsMax(t *testing.T) {
func TestRevokeCertificate_Handler_Success(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
if certID != "mc-prod-001" {
t.Errorf("expected certID mc-prod-001, got %s", certID)
}
@@ -798,7 +858,7 @@ func TestRevokeCertificate_Handler_Success(t *testing.T) {
func TestRevokeCertificate_Handler_NoBody(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
// Empty reason is OK — service defaults to "unspecified"
return nil
},
@@ -818,7 +878,7 @@ func TestRevokeCertificate_Handler_NoBody(t *testing.T) {
func TestRevokeCertificate_Handler_AlreadyRevoked(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
return fmt.Errorf("certificate is already revoked")
},
}
@@ -839,7 +899,7 @@ func TestRevokeCertificate_Handler_AlreadyRevoked(t *testing.T) {
func TestRevokeCertificate_Handler_NotFound(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
return fmt.Errorf("failed to fetch certificate: not found")
},
}
@@ -858,7 +918,7 @@ func TestRevokeCertificate_Handler_NotFound(t *testing.T) {
func TestRevokeCertificate_Handler_InvalidReason(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
return fmt.Errorf("invalid revocation reason: badReason")
},
}
@@ -922,7 +982,7 @@ func TestRevokeCertificate_Handler_EmptyID(t *testing.T) {
func TestRevokeCertificate_Handler_CannotRevokeArchived(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
return fmt.Errorf("cannot revoke archived certificate")
},
}
@@ -941,7 +1001,7 @@ func TestRevokeCertificate_Handler_CannotRevokeArchived(t *testing.T) {
func TestRevokeCertificate_Handler_ServerError(t *testing.T) {
mock := &MockCertificateService{
RevokeCertificateFn: func(certID string, reason string) error {
RevokeCertificateFn: func(_ context.Context, certID string, reason string, _ string) error {
return fmt.Errorf("database connection lost")
},
}
@@ -958,132 +1018,18 @@ func TestRevokeCertificate_Handler_ServerError(t *testing.T) {
}
}
// === CRL Handler Tests ===
func TestGetCRL_Success(t *testing.T) {
mock := &MockCertificateService{
GetRevokedCertificatesFn: func() ([]*domain.CertificateRevocation, error) {
return []*domain.CertificateRevocation{
{
ID: "rev-1",
CertificateID: "cert-1",
SerialNumber: "ABC123",
Reason: "keyCompromise",
RevokedAt: time.Date(2026, 3, 20, 10, 0, 0, 0, time.UTC),
},
{
ID: "rev-2",
CertificateID: "cert-2",
SerialNumber: "DEF456",
Reason: "superseded",
RevokedAt: time.Date(2026, 3, 21, 14, 30, 0, 0, time.UTC),
},
}, nil
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.GetCRL(w, req)
if w.Code != http.StatusOK {
t.Errorf("expected status %d, got %d", http.StatusOK, w.Code)
}
var resp map[string]interface{}
json.NewDecoder(w.Body).Decode(&resp)
if resp["version"] != float64(1) {
t.Errorf("expected version 1, got %v", resp["version"])
}
if resp["total"] != float64(2) {
t.Errorf("expected total 2, got %v", resp["total"])
}
entries, ok := resp["entries"].([]interface{})
if !ok {
t.Fatal("expected entries to be an array")
}
if len(entries) != 2 {
t.Errorf("expected 2 entries, got %d", len(entries))
}
entry1 := entries[0].(map[string]interface{})
if entry1["serial_number"] != "ABC123" {
t.Errorf("expected serial ABC123, got %v", entry1["serial_number"])
}
if entry1["revocation_reason"] != "keyCompromise" {
t.Errorf("expected reason keyCompromise, got %v", entry1["revocation_reason"])
}
}
func TestGetCRL_Empty(t *testing.T) {
mock := &MockCertificateService{
GetRevokedCertificatesFn: func() ([]*domain.CertificateRevocation, error) {
return nil, nil
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.GetCRL(w, req)
if w.Code != http.StatusOK {
t.Errorf("expected status %d, got %d", http.StatusOK, w.Code)
}
var resp map[string]interface{}
json.NewDecoder(w.Body).Decode(&resp)
if resp["total"] != float64(0) {
t.Errorf("expected total 0, got %v", resp["total"])
}
}
func TestGetCRL_ServiceError(t *testing.T) {
mock := &MockCertificateService{
GetRevokedCertificatesFn: func() ([]*domain.CertificateRevocation, error) {
return nil, fmt.Errorf("revocation repository not configured")
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.GetCRL(w, req)
if w.Code != http.StatusInternalServerError {
t.Errorf("expected status %d, got %d", http.StatusInternalServerError, w.Code)
}
}
func TestGetCRL_MethodNotAllowed(t *testing.T) {
mock := &MockCertificateService{}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/crl", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.GetCRL(w, req)
if w.Code != http.StatusMethodNotAllowed {
t.Errorf("expected status %d, got %d", http.StatusMethodNotAllowed, w.Code)
}
}
// M15b: DER CRL and OCSP Handler Tests
// === CRL and OCSP Handler Tests (RFC 5280 / RFC 6960, served under /.well-known/pki/) ===
//
// M-006 relocated these endpoints from /api/v1/crl* and /api/v1/ocsp/* to the
// RFC-compliant /.well-known/pki/ namespace and deleted the non-standard JSON
// CRL endpoint. The DER-encoded X.509 CRL (application/pkix-crl) and the
// DER-encoded OCSP response (application/ocsp-response) are the only wire
// formats certctl supports for revocation data.
func TestGetDERCRL_Success(t *testing.T) {
derCRLData := []byte{0x30, 0x82, 0x01, 0x00} // Mock DER CRL bytes
mock := &MockCertificateService{
GenerateDERCRLFn: func(issuerID string) ([]byte, error) {
GenerateDERCRLFn: func(_ context.Context, issuerID string) ([]byte, error) {
if issuerID == "iss-local" {
return derCRLData, nil
}
@@ -1092,7 +1038,7 @@ func TestGetDERCRL_Success(t *testing.T) {
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl/iss-local", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/crl/iss-local", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1107,17 +1053,20 @@ func TestGetDERCRL_Success(t *testing.T) {
if len(responseBody) == 0 {
t.Error("expected non-empty response body")
}
if ct := w.Header().Get("Content-Type"); ct != "application/pkix-crl" {
t.Errorf("expected Content-Type application/pkix-crl, got %q", ct)
}
}
func TestGetDERCRL_IssuerNotFound(t *testing.T) {
mock := &MockCertificateService{
GenerateDERCRLFn: func(issuerID string) ([]byte, error) {
GenerateDERCRLFn: func(_ context.Context, issuerID string) ([]byte, error) {
return nil, fmt.Errorf("issuer not found")
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl/nonexistent", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/crl/nonexistent", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1130,13 +1079,13 @@ func TestGetDERCRL_IssuerNotFound(t *testing.T) {
func TestGetDERCRL_NotSupported(t *testing.T) {
mock := &MockCertificateService{
GenerateDERCRLFn: func(issuerID string) ([]byte, error) {
GenerateDERCRLFn: func(_ context.Context, issuerID string) ([]byte, error) {
return nil, fmt.Errorf("issuer does not support CRL generation")
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/crl/iss-acme", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/crl/iss-acme", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1151,7 +1100,7 @@ func TestGetDERCRL_NotSupported(t *testing.T) {
func TestGetDERCRL_MethodNotAllowed(t *testing.T) {
mock := &MockCertificateService{}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/crl/iss-local", nil)
req := httptest.NewRequest(http.MethodPost, "/.well-known/pki/crl/iss-local", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1165,7 +1114,7 @@ func TestGetDERCRL_MethodNotAllowed(t *testing.T) {
func TestHandleOCSP_Success(t *testing.T) {
ocspResponseBytes := []byte{0x30, 0x82, 0x02, 0x00} // Mock OCSP response
mock := &MockCertificateService{
GetOCSPResponseFn: func(issuerID string, serialHex string) ([]byte, error) {
GetOCSPResponseFn: func(_ context.Context, issuerID string, serialHex string) ([]byte, error) {
if issuerID == "iss-local" && serialHex == "12345" {
return ocspResponseBytes, nil
}
@@ -1174,7 +1123,7 @@ func TestHandleOCSP_Success(t *testing.T) {
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/ocsp/iss-local/12345", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/ocsp/iss-local/12345", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1188,12 +1137,15 @@ func TestHandleOCSP_Success(t *testing.T) {
if len(responseBody) == 0 {
t.Error("expected non-empty OCSP response body")
}
if ct := w.Header().Get("Content-Type"); ct != "application/ocsp-response" {
t.Errorf("expected Content-Type application/ocsp-response, got %q", ct)
}
}
func TestHandleOCSP_MissingSerial(t *testing.T) {
mock := &MockCertificateService{}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/ocsp/iss-local/", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/ocsp/iss-local/", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1206,13 +1158,13 @@ func TestHandleOCSP_MissingSerial(t *testing.T) {
func TestHandleOCSP_IssuerNotFound(t *testing.T) {
mock := &MockCertificateService{
GetOCSPResponseFn: func(issuerID string, serialHex string) ([]byte, error) {
GetOCSPResponseFn: func(_ context.Context, issuerID string, serialHex string) ([]byte, error) {
return nil, fmt.Errorf("issuer not found")
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/ocsp/nonexistent/ABC123", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/ocsp/nonexistent/ABC123", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1225,13 +1177,13 @@ func TestHandleOCSP_IssuerNotFound(t *testing.T) {
func TestHandleOCSP_CertNotFound(t *testing.T) {
mock := &MockCertificateService{
GetOCSPResponseFn: func(issuerID string, serialHex string) ([]byte, error) {
GetOCSPResponseFn: func(_ context.Context, issuerID string, serialHex string) ([]byte, error) {
return nil, fmt.Errorf("certificate not found")
},
}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodGet, "/api/v1/ocsp/iss-local/UNKNOWN", nil)
req := httptest.NewRequest(http.MethodGet, "/.well-known/pki/ocsp/iss-local/UNKNOWN", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1245,7 +1197,7 @@ func TestHandleOCSP_CertNotFound(t *testing.T) {
func TestHandleOCSP_MethodNotAllowed(t *testing.T) {
mock := &MockCertificateService{}
handler := NewCertificateHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/ocsp/iss-local/12345", nil)
req := httptest.NewRequest(http.MethodPost, "/.well-known/pki/ocsp/iss-local/12345", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
@@ -1261,7 +1213,7 @@ func TestHandleOCSP_MethodNotAllowed(t *testing.T) {
// TestListCertificates_SortParam tests sort parameter parsing and passing to service.
func TestListCertificates_SortParam(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
// Handler strips the '-' prefix and sets SortDesc = true
if filter.Sort != "notAfter" || !filter.SortDesc {
t.Errorf("expected sort=notAfter desc=true, got sort=%s desc=%v", filter.Sort, filter.SortDesc)
@@ -1284,7 +1236,7 @@ func TestListCertificates_SortParam(t *testing.T) {
// TestListCertificates_SortParam_Ascending tests sort parameter without '-' prefix (ascending).
func TestListCertificates_SortParam_Ascending(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.Sort != "createdAt" || filter.SortDesc {
t.Errorf("expected sort=createdAt desc=false, got sort=%s desc=%v", filter.Sort, filter.SortDesc)
}
@@ -1309,7 +1261,7 @@ func TestListCertificates_TimeRangeFilters(t *testing.T) {
after := time.Now().AddDate(0, 0, -90)
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.ExpiresBefore == nil {
t.Error("expected ExpiresBefore to be set")
}
@@ -1339,7 +1291,7 @@ func TestListCertificates_CreatedAfterFilter(t *testing.T) {
past := time.Now().AddDate(-1, 0, 0)
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.CreatedAfter == nil {
t.Error("expected CreatedAfter to be set")
}
@@ -1369,7 +1321,7 @@ func TestListCertificates_CursorPagination(t *testing.T) {
}
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
return []domain.ManagedCertificate{cert}, 1, nil
},
}
@@ -1409,7 +1361,7 @@ func TestListCertificates_SparseFields(t *testing.T) {
}
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if len(filter.Fields) != 2 {
t.Errorf("expected 2 fields, got %d", len(filter.Fields))
}
@@ -1456,7 +1408,7 @@ func TestListCertificates_SparseFields(t *testing.T) {
// TestListCertificates_ProfileFilter tests profile_id filter.
func TestListCertificates_ProfileFilter(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.ProfileID != "prof-standard" {
t.Errorf("expected ProfileID=prof-standard, got %s", filter.ProfileID)
}
@@ -1479,7 +1431,7 @@ func TestListCertificates_ProfileFilter(t *testing.T) {
// TestListCertificates_AgentIDFilter tests agent_id filter.
func TestListCertificates_AgentIDFilter(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.AgentID != "agent-prod-001" {
t.Errorf("expected AgentID=agent-prod-001, got %s", filter.AgentID)
}
@@ -1502,7 +1454,7 @@ func TestListCertificates_AgentIDFilter(t *testing.T) {
// TestListCertificates_CombinedFilters tests multiple filters together.
func TestListCertificates_CombinedFilters(t *testing.T) {
mock := &MockCertificateService{
ListCertificatesWithFilterFn: func(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
ListCertificatesWithFilterFn: func(_ context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error) {
if filter.Status != "Active" || filter.Environment != "production" || filter.ProfileID != "prof-standard" {
t.Error("expected all filters to be set")
}
@@ -1540,7 +1492,7 @@ func TestGetCertificateDeployments_Success(t *testing.T) {
}
mock := &MockCertificateService{
GetCertificateDeploymentsFn: func(certID string) ([]domain.DeploymentTarget, error) {
GetCertificateDeploymentsFn: func(_ context.Context, certID string) ([]domain.DeploymentTarget, error) {
if certID != "mc-prod-001" {
return nil, ErrMockNotFound
}
@@ -1576,7 +1528,7 @@ func TestGetCertificateDeployments_Success(t *testing.T) {
// TestGetCertificateDeployments_NotFound tests 404 for nonexistent certificate.
func TestGetCertificateDeployments_NotFound(t *testing.T) {
mock := &MockCertificateService{
GetCertificateDeploymentsFn: func(certID string) ([]domain.DeploymentTarget, error) {
GetCertificateDeploymentsFn: func(_ context.Context, certID string) ([]domain.DeploymentTarget, error) {
return nil, fmt.Errorf("certificate not found")
},
}
@@ -1596,7 +1548,7 @@ func TestGetCertificateDeployments_NotFound(t *testing.T) {
// TestGetCertificateDeployments_Empty tests successful response with no deployments.
func TestGetCertificateDeployments_Empty(t *testing.T) {
mock := &MockCertificateService{
GetCertificateDeploymentsFn: func(certID string) ([]domain.DeploymentTarget, error) {
GetCertificateDeploymentsFn: func(_ context.Context, certID string) ([]domain.DeploymentTarget, error) {
if certID == "mc-no-deployments" {
return []domain.DeploymentTarget{}, nil
}
+46 -73
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"log/slog"
"net/http"
@@ -15,20 +16,20 @@ import (
// CertificateService defines the service interface for certificate operations.
type CertificateService interface {
ListCertificates(status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error)
ListCertificatesWithFilter(filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error)
GetCertificate(id string) (*domain.ManagedCertificate, error)
CreateCertificate(cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
UpdateCertificate(id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
ArchiveCertificate(id string) error
GetCertificateVersions(certID string, page, perPage int) ([]domain.CertificateVersion, int64, error)
TriggerRenewal(certID string) error
TriggerDeployment(certID string, targetID string) error
RevokeCertificate(certID string, reason string) error
GetRevokedCertificates() ([]*domain.CertificateRevocation, error)
GenerateDERCRL(issuerID string) ([]byte, error)
GetOCSPResponse(issuerID string, serialHex string) ([]byte, error)
GetCertificateDeployments(certID string) ([]domain.DeploymentTarget, error)
ListCertificates(ctx context.Context, status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error)
ListCertificatesWithFilter(ctx context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error)
GetCertificate(ctx context.Context, id string) (*domain.ManagedCertificate, error)
CreateCertificate(ctx context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
UpdateCertificate(ctx context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
ArchiveCertificate(ctx context.Context, id string) error
GetCertificateVersions(ctx context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error)
TriggerRenewal(ctx context.Context, certID string, actor string) error
TriggerDeployment(ctx context.Context, certID string, targetID string, actor string) error
RevokeCertificate(ctx context.Context, certID string, reason string, actor string) error
GetRevokedCertificates(ctx context.Context) ([]*domain.CertificateRevocation, error)
GenerateDERCRL(ctx context.Context, issuerID string) ([]byte, error)
GetOCSPResponse(ctx context.Context, issuerID string, serialHex string) ([]byte, error)
GetCertificateDeployments(ctx context.Context, certID string) ([]domain.DeploymentTarget, error)
}
// CertificateHandler handles HTTP requests for certificate operations.
@@ -128,7 +129,7 @@ func (h CertificateHandler) ListCertificates(w http.ResponseWriter, r *http.Requ
filter.Fields = strings.Split(fieldsStr, ",")
}
certs, total, err := h.svc.ListCertificatesWithFilter(filter)
certs, total, err := h.svc.ListCertificatesWithFilter(r.Context(), filter)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list certificates", requestID)
return
@@ -186,7 +187,7 @@ func (h CertificateHandler) GetCertificate(w http.ResponseWriter, r *http.Reques
return
}
cert, err := h.svc.GetCertificate(id)
cert, err := h.svc.GetCertificate(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
@@ -241,7 +242,7 @@ func (h CertificateHandler) CreateCertificate(w http.ResponseWriter, r *http.Req
return
}
created, err := h.svc.CreateCertificate(cert)
created, err := h.svc.CreateCertificate(r.Context(), cert)
if err != nil {
slog.Error("failed to create certificate", "error", err, "request_id", requestID, "common_name", cert.CommonName, "name", cert.Name)
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create certificate", requestID)
@@ -295,7 +296,7 @@ func (h CertificateHandler) UpdateCertificate(w http.ResponseWriter, r *http.Req
}
}
updated, err := h.svc.UpdateCertificate(id, cert)
updated, err := h.svc.UpdateCertificate(r.Context(), id, cert)
if err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
@@ -325,7 +326,7 @@ func (h CertificateHandler) ArchiveCertificate(w http.ResponseWriter, r *http.Re
return
}
if err := h.svc.ArchiveCertificate(id); err != nil {
if err := h.svc.ArchiveCertificate(r.Context(), id); err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
@@ -370,7 +371,7 @@ func (h CertificateHandler) GetCertificateVersions(w http.ResponseWriter, r *htt
}
}
versions, total, err := h.svc.GetCertificateVersions(certID, page, perPage)
versions, total, err := h.svc.GetCertificateVersions(r.Context(), certID, page, perPage)
if err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
@@ -410,7 +411,9 @@ func (h CertificateHandler) TriggerRenewal(w http.ResponseWriter, r *http.Reques
}
certID := parts[0]
if err := h.svc.TriggerRenewal(certID); err != nil {
actor := resolveActor(r.Context())
if err := h.svc.TriggerRenewal(r.Context(), certID, actor); err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
@@ -466,7 +469,9 @@ func (h CertificateHandler) TriggerDeployment(w http.ResponseWriter, r *http.Req
}
}
if err := h.svc.TriggerDeployment(certID, req.TargetID); err != nil {
actor := resolveActor(r.Context())
if err := h.svc.TriggerDeployment(r.Context(), certID, req.TargetID, actor); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to trigger deployment", requestID)
return
}
@@ -508,7 +513,9 @@ func (h CertificateHandler) RevokeCertificate(w http.ResponseWriter, r *http.Req
}
}
if err := h.svc.RevokeCertificate(certID, req.Reason); err != nil {
actor := resolveActor(r.Context())
if err := h.svc.RevokeCertificate(r.Context(), certID, req.Reason, actor); err != nil {
// Distinguish between client errors and server errors
errMsg := err.Error()
if strings.Contains(errMsg, "already revoked") ||
@@ -528,49 +535,12 @@ func (h CertificateHandler) RevokeCertificate(w http.ResponseWriter, r *http.Req
JSON(w, http.StatusOK, map[string]string{"status": "revoked"})
}
// GetCRL returns the Certificate Revocation List as structured JSON.
// GET /api/v1/crl
// Note: DER-encoded X.509 CRL generation (requiring CA key access) is planned for M15b
// alongside the embedded OCSP responder. This endpoint provides the same data in JSON format.
func (h CertificateHandler) GetCRL(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
revocations, err := h.svc.GetRevokedCertificates()
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to generate CRL", requestID)
return
}
type CRLEntry struct {
SerialNumber string `json:"serial_number"`
RevocationDate string `json:"revocation_date"`
RevocationReason string `json:"revocation_reason"`
}
entries := make([]CRLEntry, 0, len(revocations))
for _, rev := range revocations {
entries = append(entries, CRLEntry{
SerialNumber: rev.SerialNumber,
RevocationDate: rev.RevokedAt.Format("2006-01-02T15:04:05Z"),
RevocationReason: rev.Reason,
})
}
JSON(w, http.StatusOK, map[string]interface{}{
"version": 1,
"entries": entries,
"total": len(entries),
"generated_at": time.Now().UTC().Format("2006-01-02T15:04:05Z"),
})
}
// GetDERCRL returns a DER-encoded X.509 CRL signed by the specified issuer.
// GET /api/v1/crl/{issuer_id}
// GET /.well-known/pki/crl/{issuer_id}
//
// RFC 5280 § 5. Served unauthenticated under the /.well-known/pki/ namespace so
// relying parties (browsers, OpenSSL, OCSP stapling sidecars) can fetch the CRL
// without presenting certctl API credentials.
func (h CertificateHandler) GetDERCRL(w http.ResponseWriter, r *http.Request) {
requestID, _ := r.Context().Value("request_id").(string)
@@ -579,13 +549,13 @@ func (h CertificateHandler) GetDERCRL(w http.ResponseWriter, r *http.Request) {
return
}
issuerID := strings.TrimPrefix(r.URL.Path, "/api/v1/crl/")
issuerID := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/crl/")
if issuerID == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Issuer ID is required", requestID)
return
}
derBytes, err := h.svc.GenerateDERCRL(issuerID)
derBytes, err := h.svc.GenerateDERCRL(r.Context(), issuerID)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
@@ -607,8 +577,11 @@ func (h CertificateHandler) GetDERCRL(w http.ResponseWriter, r *http.Request) {
}
// HandleOCSP processes OCSP requests.
// GET /api/v1/ocsp/{issuer_id}/{serial_hex}
// For simplicity, use GET with path params instead of binary POST.
// GET /.well-known/pki/ocsp/{issuer_id}/{serial_hex}
//
// RFC 6960. Served unauthenticated under the /.well-known/pki/ namespace. For
// simplicity we accept GET with path params rather than the binary POST body
// form — the response is a valid DER-encoded OCSP response either way.
func (h CertificateHandler) HandleOCSP(w http.ResponseWriter, r *http.Request) {
requestID, _ := r.Context().Value("request_id").(string)
@@ -617,8 +590,8 @@ func (h CertificateHandler) HandleOCSP(w http.ResponseWriter, r *http.Request) {
return
}
// Extract issuer_id and serial from path: /api/v1/ocsp/{issuer_id}/{serial_hex}
path := strings.TrimPrefix(r.URL.Path, "/api/v1/ocsp/")
// Extract issuer_id and serial from path: /.well-known/pki/ocsp/{issuer_id}/{serial_hex}
path := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/ocsp/")
parts := strings.SplitN(path, "/", 2)
if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Issuer ID and serial number are required", requestID)
@@ -627,7 +600,7 @@ func (h CertificateHandler) HandleOCSP(w http.ResponseWriter, r *http.Request) {
issuerID := parts[0]
serialHex := parts[1]
derBytes, err := h.svc.GetOCSPResponse(issuerID, serialHex)
derBytes, err := h.svc.GetOCSPResponse(r.Context(), issuerID, serialHex)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
@@ -667,7 +640,7 @@ func (h CertificateHandler) GetCertificateDeployments(w http.ResponseWriter, r *
}
certID := parts[0]
deployments, err := h.svc.GetCertificateDeployments(certID)
deployments, err := h.svc.GetCertificateDeployments(r.Context(), certID)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
+9 -4
View File
@@ -11,12 +11,17 @@ import (
)
// DiscoveryService defines the interface used by the discovery handler.
// ClaimDiscovered and DismissDiscovered accept an explicit actor parameter so
// the handler can flow the authenticated named-key identity into the audit
// trail (M-005). Services that call these methods from non-request contexts
// pass a descriptive sentinel (e.g., "system") or "" (which falls back to
// "api").
type DiscoveryService interface {
ProcessDiscoveryReport(ctx context.Context, report *domain.DiscoveryReport) (*domain.DiscoveryScan, error)
ListDiscovered(ctx context.Context, agentID, status string, page, perPage int) ([]*domain.DiscoveredCertificate, int, error)
GetDiscovered(ctx context.Context, id string) (*domain.DiscoveredCertificate, error)
ClaimDiscovered(ctx context.Context, id string, managedCertID string) error
DismissDiscovered(ctx context.Context, id string) error
ClaimDiscovered(ctx context.Context, id string, managedCertID string, actor string) error
DismissDiscovered(ctx context.Context, id string, actor string) error
ListScans(ctx context.Context, agentID string, page, perPage int) ([]*domain.DiscoveryScan, int, error)
GetScan(ctx context.Context, id string) (*domain.DiscoveryScan, error)
GetDiscoverySummary(ctx context.Context) (map[string]int, error)
@@ -142,7 +147,7 @@ func (h DiscoveryHandler) ClaimDiscovered(w http.ResponseWriter, r *http.Request
return
}
if err := h.svc.ClaimDiscovered(r.Context(), id, body.ManagedCertificateID); err != nil {
if err := h.svc.ClaimDiscovered(r.Context(), id, body.ManagedCertificateID, resolveActor(r.Context())); err != nil {
Error(w, http.StatusInternalServerError, fmt.Sprintf("failed to claim certificate: %v", err))
return
}
@@ -166,7 +171,7 @@ func (h DiscoveryHandler) DismissDiscovered(w http.ResponseWriter, r *http.Reque
return
}
if err := h.svc.DismissDiscovered(r.Context(), id); err != nil {
if err := h.svc.DismissDiscovered(r.Context(), id, resolveActor(r.Context())); err != nil {
Error(w, http.StatusInternalServerError, fmt.Sprintf("failed to dismiss certificate: %v", err))
return
}
+10 -10
View File
@@ -19,8 +19,8 @@ type MockDiscoveryService struct {
ProcessDiscoveryReportFn func(ctx context.Context, report *domain.DiscoveryReport) (*domain.DiscoveryScan, error)
ListDiscoveredFn func(ctx context.Context, agentID, status string, page, perPage int) ([]*domain.DiscoveredCertificate, int, error)
GetDiscoveredFn func(ctx context.Context, id string) (*domain.DiscoveredCertificate, error)
ClaimDiscoveredFn func(ctx context.Context, id string, managedCertID string) error
DismissDiscoveredFn func(ctx context.Context, id string) error
ClaimDiscoveredFn func(ctx context.Context, id string, managedCertID string, actor string) error
DismissDiscoveredFn func(ctx context.Context, id string, actor string) error
ListScansFn func(ctx context.Context, agentID string, page, perPage int) ([]*domain.DiscoveryScan, int, error)
GetScanFn func(ctx context.Context, id string) (*domain.DiscoveryScan, error)
GetDiscoverySummaryFn func(ctx context.Context) (map[string]int, error)
@@ -47,16 +47,16 @@ func (m *MockDiscoveryService) GetDiscovered(ctx context.Context, id string) (*d
return nil, nil
}
func (m *MockDiscoveryService) ClaimDiscovered(ctx context.Context, id string, managedCertID string) error {
func (m *MockDiscoveryService) ClaimDiscovered(ctx context.Context, id string, managedCertID string, actor string) error {
if m.ClaimDiscoveredFn != nil {
return m.ClaimDiscoveredFn(ctx, id, managedCertID)
return m.ClaimDiscoveredFn(ctx, id, managedCertID, actor)
}
return nil
}
func (m *MockDiscoveryService) DismissDiscovered(ctx context.Context, id string) error {
func (m *MockDiscoveryService) DismissDiscovered(ctx context.Context, id string, actor string) error {
if m.DismissDiscoveredFn != nil {
return m.DismissDiscoveredFn(ctx, id)
return m.DismissDiscoveredFn(ctx, id, actor)
}
return nil
}
@@ -352,7 +352,7 @@ func TestGetDiscovered_NotFound(t *testing.T) {
// Test ClaimDiscovered - success case
func TestClaimDiscovered_Success(t *testing.T) {
mock := &MockDiscoveryService{
ClaimDiscoveredFn: func(ctx context.Context, id string, managedCertID string) error {
ClaimDiscoveredFn: func(ctx context.Context, id string, managedCertID string, actor string) error {
if id == "dcert-1" && managedCertID == "mc-prod-1" {
return nil
}
@@ -411,7 +411,7 @@ func TestClaimDiscovered_MissingManagedCertID(t *testing.T) {
// Test ClaimDiscovered - discovered cert not found
func TestClaimDiscovered_NotFound(t *testing.T) {
mock := &MockDiscoveryService{
ClaimDiscoveredFn: func(ctx context.Context, id string, managedCertID string) error {
ClaimDiscoveredFn: func(ctx context.Context, id string, managedCertID string, actor string) error {
return fmt.Errorf("discovered certificate not found")
},
}
@@ -438,7 +438,7 @@ func TestClaimDiscovered_NotFound(t *testing.T) {
// Test DismissDiscovered - success case
func TestDismissDiscovered_Success(t *testing.T) {
mock := &MockDiscoveryService{
DismissDiscoveredFn: func(ctx context.Context, id string) error {
DismissDiscoveredFn: func(ctx context.Context, id string, actor string) error {
if id == "dcert-1" {
return nil
}
@@ -614,7 +614,7 @@ func TestGetDiscoverySummary_MethodNotAllowed(t *testing.T) {
// Test DismissDiscovered - service error
func TestDismissDiscovered_ServiceError(t *testing.T) {
mock := &MockDiscoveryService{
DismissDiscoveredFn: func(ctx context.Context, id string) error {
DismissDiscoveredFn: func(ctx context.Context, id string, actor string) error {
return fmt.Errorf("database error")
},
}
+19 -3
View File
@@ -2,6 +2,8 @@ package handler
import (
"net/http"
"github.com/shankar0123/certctl/internal/api/middleware"
)
// HealthHandler handles health and readiness check endpoints.
@@ -55,9 +57,23 @@ func (h HealthHandler) AuthInfo(w http.ResponseWriter, r *http.Request) {
JSON(w, http.StatusOK, response)
}
// AuthCheck returns 200 if the request has valid auth credentials.
// The auth middleware runs before this handler, so reaching here means auth passed.
// AuthCheck returns 200 if the request has valid auth credentials, along with
// the resolved named-key identity and admin flag so the GUI can gate
// admin-only affordances (e.g., the bulk-revoke button).
//
// M-003 (Phase B.4): surface the admin flag so the frontend hides affordances
// that would otherwise 403 at the server. This is a hint for UX only —
// authorization remains enforced at the handler layer (bulk_revocation.go).
//
// The auth middleware runs before this handler, so reaching here means auth
// passed. `user` falls back to an empty string when auth is disabled
// (CERTCTL_AUTH_TYPE=none).
// GET /api/v1/auth/check
func (h HealthHandler) AuthCheck(w http.ResponseWriter, r *http.Request) {
JSON(w, http.StatusOK, map[string]string{"status": "authenticated"})
response := map[string]interface{}{
"status": "authenticated",
"user": middleware.GetUser(r.Context()),
"admin": middleware.IsAdmin(r.Context()),
}
JSON(w, http.StatusOK, response)
}
+115 -2
View File
@@ -1,10 +1,13 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
"github.com/shankar0123/certctl/internal/api/middleware"
)
func TestHealth_ReturnsOK(t *testing.T) {
@@ -204,8 +207,8 @@ func TestAuthCheck_ReturnsOK(t *testing.T) {
t.Errorf("Content-Type = %q, want application/json", ct)
}
// Check response body
var result map[string]string
// Check response body — mixed-value map (string + bool) post-Phase B.4.
var result map[string]any
if err := json.NewDecoder(w.Body).Decode(&result); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
@@ -232,3 +235,113 @@ func TestAuthCheck_MethodNotAllowed(t *testing.T) {
t.Logf("AuthCheck returned status %d (note: method not enforced in handler)", status)
}
}
// --- M-003 (Phase B.4): /auth/check surfaces admin flag + user identity ---
// TestAuthCheck_AdminCaller_ReportsAdminTrue confirms that when the auth
// middleware sets AdminKey{}=true (i.e., named key was admin-tagged), the
// /auth/check endpoint reports admin=true so the GUI can show admin-only
// affordances.
func TestAuthCheck_AdminCaller_ReportsAdminTrue(t *testing.T) {
handler := NewHealthHandler("api-key")
req := httptest.NewRequest(http.MethodGet, "/api/v1/auth/check", nil)
ctx := context.WithValue(req.Context(), middleware.AdminKey{}, true)
ctx = context.WithValue(ctx, middleware.UserKey{}, "ops-admin")
req = req.WithContext(ctx)
w := httptest.NewRecorder()
handler.AuthCheck(w, req)
if w.Code != http.StatusOK {
t.Fatalf("expected status 200, got %d", w.Code)
}
var result map[string]any
if err := json.NewDecoder(w.Body).Decode(&result); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
if result["status"] != "authenticated" {
t.Errorf("status = %q, want authenticated", result["status"])
}
admin, ok := result["admin"].(bool)
if !ok {
t.Fatalf("admin field missing or wrong type: %T", result["admin"])
}
if !admin {
t.Errorf("admin = false, want true")
}
if result["user"] != "ops-admin" {
t.Errorf("user = %q, want ops-admin", result["user"])
}
}
// TestAuthCheck_NonAdminCaller_ReportsAdminFalse pins the negative case: the
// auth middleware has stored AdminKey{}=false (non-admin named key) — the
// endpoint must report admin=false so the GUI hides admin-only affordances.
func TestAuthCheck_NonAdminCaller_ReportsAdminFalse(t *testing.T) {
handler := NewHealthHandler("api-key")
req := httptest.NewRequest(http.MethodGet, "/api/v1/auth/check", nil)
ctx := context.WithValue(req.Context(), middleware.AdminKey{}, false)
ctx = context.WithValue(ctx, middleware.UserKey{}, "alice")
req = req.WithContext(ctx)
w := httptest.NewRecorder()
handler.AuthCheck(w, req)
if w.Code != http.StatusOK {
t.Fatalf("expected status 200, got %d", w.Code)
}
var result map[string]any
if err := json.NewDecoder(w.Body).Decode(&result); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
admin, ok := result["admin"].(bool)
if !ok {
t.Fatalf("admin field missing or wrong type: %T", result["admin"])
}
if admin {
t.Errorf("admin = true, want false")
}
if result["user"] != "alice" {
t.Errorf("user = %q, want alice", result["user"])
}
}
// TestAuthCheck_NoAuthContext_DefaultsToEmptyUserAndFalseAdmin covers the
// CERTCTL_AUTH_TYPE=none deployment, where the auth middleware doesn't set
// any keys. Response must still be well-formed with empty user + admin=false.
func TestAuthCheck_NoAuthContext_DefaultsToEmptyUserAndFalseAdmin(t *testing.T) {
handler := NewHealthHandler("none")
req := httptest.NewRequest(http.MethodGet, "/api/v1/auth/check", nil)
w := httptest.NewRecorder()
handler.AuthCheck(w, req)
if w.Code != http.StatusOK {
t.Fatalf("expected status 200, got %d", w.Code)
}
var result map[string]any
if err := json.NewDecoder(w.Body).Decode(&result); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
if result["status"] != "authenticated" {
t.Errorf("status = %q, want authenticated", result["status"])
}
admin, ok := result["admin"].(bool)
if !ok {
t.Fatalf("admin field missing or wrong type: %T", result["admin"])
}
if admin {
t.Errorf("admin = true for no-auth context, want false")
}
if result["user"] != "" {
t.Errorf("user = %q, want empty string", result["user"])
}
}
+33 -32
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
@@ -15,52 +16,52 @@ import (
// MockIssuerService is a mock implementation of IssuerService interface.
type MockIssuerService struct {
ListIssuersFn func(page, perPage int) ([]domain.Issuer, int64, error)
GetIssuerFn func(id string) (*domain.Issuer, error)
CreateIssuerFn func(issuer domain.Issuer) (*domain.Issuer, error)
UpdateIssuerFn func(id string, issuer domain.Issuer) (*domain.Issuer, error)
DeleteIssuerFn func(id string) error
TestConnectionFn func(id string) error
ListIssuersFn func(ctx context.Context, page, perPage int) ([]domain.Issuer, int64, error)
GetIssuerFn func(ctx context.Context, id string) (*domain.Issuer, error)
CreateIssuerFn func(ctx context.Context, issuer domain.Issuer) (*domain.Issuer, error)
UpdateIssuerFn func(ctx context.Context, id string, issuer domain.Issuer) (*domain.Issuer, error)
DeleteIssuerFn func(ctx context.Context, id string) error
TestConnectionFn func(ctx context.Context, id string) error
}
func (m *MockIssuerService) ListIssuers(page, perPage int) ([]domain.Issuer, int64, error) {
func (m *MockIssuerService) ListIssuers(ctx context.Context, page, perPage int) ([]domain.Issuer, int64, error) {
if m.ListIssuersFn != nil {
return m.ListIssuersFn(page, perPage)
return m.ListIssuersFn(ctx, page, perPage)
}
return nil, 0, nil
}
func (m *MockIssuerService) GetIssuer(id string) (*domain.Issuer, error) {
func (m *MockIssuerService) GetIssuer(ctx context.Context, id string) (*domain.Issuer, error) {
if m.GetIssuerFn != nil {
return m.GetIssuerFn(id)
return m.GetIssuerFn(ctx, id)
}
return nil, nil
}
func (m *MockIssuerService) CreateIssuer(issuer domain.Issuer) (*domain.Issuer, error) {
func (m *MockIssuerService) CreateIssuer(ctx context.Context, issuer domain.Issuer) (*domain.Issuer, error) {
if m.CreateIssuerFn != nil {
return m.CreateIssuerFn(issuer)
return m.CreateIssuerFn(ctx, issuer)
}
return nil, nil
}
func (m *MockIssuerService) UpdateIssuer(id string, issuer domain.Issuer) (*domain.Issuer, error) {
func (m *MockIssuerService) UpdateIssuer(ctx context.Context, id string, issuer domain.Issuer) (*domain.Issuer, error) {
if m.UpdateIssuerFn != nil {
return m.UpdateIssuerFn(id, issuer)
return m.UpdateIssuerFn(ctx, id, issuer)
}
return nil, nil
}
func (m *MockIssuerService) DeleteIssuer(id string) error {
func (m *MockIssuerService) DeleteIssuer(ctx context.Context, id string) error {
if m.DeleteIssuerFn != nil {
return m.DeleteIssuerFn(id)
return m.DeleteIssuerFn(ctx, id)
}
return nil
}
func (m *MockIssuerService) TestConnection(id string) error {
func (m *MockIssuerService) TestConnection(ctx context.Context, id string) error {
if m.TestConnectionFn != nil {
return m.TestConnectionFn(id)
return m.TestConnectionFn(ctx, id)
}
return nil
}
@@ -85,7 +86,7 @@ func TestListIssuers_Success(t *testing.T) {
}
mock := &MockIssuerService{
ListIssuersFn: func(page, perPage int) ([]domain.Issuer, int64, error) {
ListIssuersFn: func(_ context.Context, page, perPage int) ([]domain.Issuer, int64, error) {
return []domain.Issuer{iss1, iss2}, 2, nil
},
}
@@ -113,7 +114,7 @@ func TestListIssuers_Success(t *testing.T) {
func TestListIssuers_Pagination(t *testing.T) {
var capturedPage, capturedPerPage int
mock := &MockIssuerService{
ListIssuersFn: func(page, perPage int) ([]domain.Issuer, int64, error) {
ListIssuersFn: func(_ context.Context, page, perPage int) ([]domain.Issuer, int64, error) {
capturedPage = page
capturedPerPage = perPage
return []domain.Issuer{}, 0, nil
@@ -137,7 +138,7 @@ func TestListIssuers_Pagination(t *testing.T) {
func TestListIssuers_ServiceError(t *testing.T) {
mock := &MockIssuerService{
ListIssuersFn: func(page, perPage int) ([]domain.Issuer, int64, error) {
ListIssuersFn: func(_ context.Context, page, perPage int) ([]domain.Issuer, int64, error) {
return nil, 0, ErrMockServiceFailed
},
}
@@ -169,7 +170,7 @@ func TestListIssuers_MethodNotAllowed(t *testing.T) {
func TestGetIssuer_Success(t *testing.T) {
now := time.Now()
mock := &MockIssuerService{
GetIssuerFn: func(id string) (*domain.Issuer, error) {
GetIssuerFn: func(_ context.Context, id string) (*domain.Issuer, error) {
return &domain.Issuer{
ID: id,
Name: "Local CA",
@@ -195,7 +196,7 @@ func TestGetIssuer_Success(t *testing.T) {
func TestGetIssuer_NotFound(t *testing.T) {
mock := &MockIssuerService{
GetIssuerFn: func(id string) (*domain.Issuer, error) {
GetIssuerFn: func(_ context.Context, id string) (*domain.Issuer, error) {
return nil, ErrMockNotFound
},
}
@@ -228,7 +229,7 @@ func TestGetIssuer_EmptyID(t *testing.T) {
func TestCreateIssuer_Success(t *testing.T) {
now := time.Now()
mock := &MockIssuerService{
CreateIssuerFn: func(issuer domain.Issuer) (*domain.Issuer, error) {
CreateIssuerFn: func(_ context.Context, issuer domain.Issuer) (*domain.Issuer, error) {
issuer.ID = "iss-new"
issuer.CreatedAt = now
issuer.UpdatedAt = now
@@ -328,7 +329,7 @@ func TestCreateIssuer_NameTooLong(t *testing.T) {
func TestCreateIssuer_DuplicateName(t *testing.T) {
mock := &MockIssuerService{
CreateIssuerFn: func(issuer domain.Issuer) (*domain.Issuer, error) {
CreateIssuerFn: func(_ context.Context, issuer domain.Issuer) (*domain.Issuer, error) {
return nil, fmt.Errorf("failed to create issuer: duplicate key value violates unique constraint \"issuers_name_key\"")
},
}
@@ -361,7 +362,7 @@ func TestCreateIssuer_DuplicateName(t *testing.T) {
func TestCreateIssuer_UnsupportedType(t *testing.T) {
mock := &MockIssuerService{
CreateIssuerFn: func(issuer domain.Issuer) (*domain.Issuer, error) {
CreateIssuerFn: func(_ context.Context, issuer domain.Issuer) (*domain.Issuer, error) {
return nil, fmt.Errorf("unsupported issuer type: FakeCA")
},
}
@@ -394,7 +395,7 @@ func TestCreateIssuer_UnsupportedType(t *testing.T) {
func TestCreateIssuer_GenericServiceError(t *testing.T) {
mock := &MockIssuerService{
CreateIssuerFn: func(issuer domain.Issuer) (*domain.Issuer, error) {
CreateIssuerFn: func(_ context.Context, issuer domain.Issuer) (*domain.Issuer, error) {
return nil, fmt.Errorf("failed to encrypt config: cipher error")
},
}
@@ -419,7 +420,7 @@ func TestCreateIssuer_GenericServiceError(t *testing.T) {
func TestUpdateIssuer_DuplicateName(t *testing.T) {
mock := &MockIssuerService{
UpdateIssuerFn: func(id string, issuer domain.Issuer) (*domain.Issuer, error) {
UpdateIssuerFn: func(_ context.Context, id string, issuer domain.Issuer) (*domain.Issuer, error) {
return nil, fmt.Errorf("failed to update issuer: duplicate key value violates unique constraint")
},
}
@@ -445,7 +446,7 @@ func TestUpdateIssuer_DuplicateName(t *testing.T) {
func TestDeleteIssuer_Success(t *testing.T) {
var deletedID string
mock := &MockIssuerService{
DeleteIssuerFn: func(id string) error {
DeleteIssuerFn: func(_ context.Context, id string) error {
deletedID = id
return nil
},
@@ -468,7 +469,7 @@ func TestDeleteIssuer_Success(t *testing.T) {
func TestDeleteIssuer_ServiceError(t *testing.T) {
mock := &MockIssuerService{
DeleteIssuerFn: func(id string) error {
DeleteIssuerFn: func(_ context.Context, id string) error {
return ErrMockServiceFailed
},
}
@@ -487,7 +488,7 @@ func TestDeleteIssuer_ServiceError(t *testing.T) {
func TestTestConnection_Success(t *testing.T) {
mock := &MockIssuerService{
TestConnectionFn: func(id string) error {
TestConnectionFn: func(_ context.Context, id string) error {
return nil
},
}
@@ -514,7 +515,7 @@ func TestTestConnection_Success(t *testing.T) {
func TestTestConnection_Failure(t *testing.T) {
mock := &MockIssuerService{
TestConnectionFn: func(id string) error {
TestConnectionFn: func(_ context.Context, id string) error {
return ErrMockServiceFailed
},
}
+13 -12
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"log/slog"
"net/http"
@@ -13,12 +14,12 @@ import (
// IssuerService defines the service interface for issuer operations.
type IssuerService interface {
ListIssuers(page, perPage int) ([]domain.Issuer, int64, error)
GetIssuer(id string) (*domain.Issuer, error)
CreateIssuer(issuer domain.Issuer) (*domain.Issuer, error)
UpdateIssuer(id string, issuer domain.Issuer) (*domain.Issuer, error)
DeleteIssuer(id string) error
TestConnection(id string) error
ListIssuers(ctx context.Context, page, perPage int) ([]domain.Issuer, int64, error)
GetIssuer(ctx context.Context, id string) (*domain.Issuer, error)
CreateIssuer(ctx context.Context, issuer domain.Issuer) (*domain.Issuer, error)
UpdateIssuer(ctx context.Context, id string, issuer domain.Issuer) (*domain.Issuer, error)
DeleteIssuer(ctx context.Context, id string) error
TestConnection(ctx context.Context, id string) error
}
// IssuerHandler handles HTTP requests for issuer operations.
@@ -61,7 +62,7 @@ func (h IssuerHandler) ListIssuers(w http.ResponseWriter, r *http.Request) {
}
}
issuers, total, err := h.svc.ListIssuers(page, perPage)
issuers, total, err := h.svc.ListIssuers(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list issuers", requestID)
return
@@ -93,7 +94,7 @@ func (h IssuerHandler) GetIssuer(w http.ResponseWriter, r *http.Request) {
return
}
issuer, err := h.svc.GetIssuer(id)
issuer, err := h.svc.GetIssuer(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Issuer not found", requestID)
return
@@ -132,7 +133,7 @@ func (h IssuerHandler) CreateIssuer(w http.ResponseWriter, r *http.Request) {
return
}
created, err := h.svc.CreateIssuer(issuer)
created, err := h.svc.CreateIssuer(r.Context(), issuer)
if err != nil {
h.logger.Error("failed to create issuer", "error", err, "name", issuer.Name, "type", issuer.Type)
errMsg := err.Error()
@@ -174,7 +175,7 @@ func (h IssuerHandler) UpdateIssuer(w http.ResponseWriter, r *http.Request) {
return
}
updated, err := h.svc.UpdateIssuer(id, issuer)
updated, err := h.svc.UpdateIssuer(r.Context(), id, issuer)
if err != nil {
h.logger.Error("failed to update issuer", "error", err, "id", id)
errMsg := err.Error()
@@ -208,7 +209,7 @@ func (h IssuerHandler) DeleteIssuer(w http.ResponseWriter, r *http.Request) {
return
}
if err := h.svc.DeleteIssuer(id); err != nil {
if err := h.svc.DeleteIssuer(r.Context(), id); err != nil {
if strings.Contains(err.Error(), "violates foreign key") || strings.Contains(err.Error(), "RESTRICT") {
ErrorWithRequestID(w, http.StatusConflict, "Cannot delete issuer: certificates are still using this issuer", requestID)
} else if strings.Contains(err.Error(), "not found") {
@@ -241,7 +242,7 @@ func (h IssuerHandler) TestConnection(w http.ResponseWriter, r *http.Request) {
}
issuerID := parts[0]
if err := h.svc.TestConnection(issuerID); err != nil {
if err := h.svc.TestConnection(r.Context(), issuerID); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Connection test failed", requestID)
return
}
+65 -15
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"fmt"
"net/http"
@@ -10,48 +11,51 @@ import (
"time"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// MockJobService is a mock implementation of JobService interface.
// Approve/Reject closures now take the actor string so tests can assert
// actor propagation from the auth middleware → handler → service.
type MockJobService struct {
ListJobsFn func(status, jobType string, page, perPage int) ([]domain.Job, int64, error)
GetJobFn func(id string) (*domain.Job, error)
CancelJobFn func(id string) error
ApproveJobFn func(id string) error
RejectJobFn func(id string, reason string) error
ApproveJobFn func(id, actor string) error
RejectJobFn func(id, reason, actor string) error
}
func (m *MockJobService) ListJobs(status, jobType string, page, perPage int) ([]domain.Job, int64, error) {
func (m *MockJobService) ListJobs(_ context.Context, status, jobType string, page, perPage int) ([]domain.Job, int64, error) {
if m.ListJobsFn != nil {
return m.ListJobsFn(status, jobType, page, perPage)
}
return nil, 0, nil
}
func (m *MockJobService) GetJob(id string) (*domain.Job, error) {
func (m *MockJobService) GetJob(_ context.Context, id string) (*domain.Job, error) {
if m.GetJobFn != nil {
return m.GetJobFn(id)
}
return nil, nil
}
func (m *MockJobService) CancelJob(id string) error {
func (m *MockJobService) CancelJob(_ context.Context, id string) error {
if m.CancelJobFn != nil {
return m.CancelJobFn(id)
}
return nil
}
func (m *MockJobService) ApproveJob(id string) error {
func (m *MockJobService) ApproveJob(_ context.Context, id, actor string) error {
if m.ApproveJobFn != nil {
return m.ApproveJobFn(id)
return m.ApproveJobFn(id, actor)
}
return nil
}
func (m *MockJobService) RejectJob(id string, reason string) error {
func (m *MockJobService) RejectJob(_ context.Context, id, reason, actor string) error {
if m.RejectJobFn != nil {
return m.RejectJobFn(id, reason)
return m.RejectJobFn(id, reason, actor)
}
return nil
}
@@ -347,7 +351,7 @@ func TestCancelJob_EmptyID(t *testing.T) {
func TestApproveJob_Success(t *testing.T) {
var approvedID string
mock := &MockJobService{
ApproveJobFn: func(id string) error {
ApproveJobFn: func(id, actor string) error {
approvedID = id
return nil
},
@@ -378,7 +382,7 @@ func TestApproveJob_Success(t *testing.T) {
func TestApproveJob_NotFound(t *testing.T) {
mock := &MockJobService{
ApproveJobFn: func(id string) error {
ApproveJobFn: func(id, actor string) error {
return fmt.Errorf("job not found: no rows")
},
}
@@ -397,7 +401,7 @@ func TestApproveJob_NotFound(t *testing.T) {
func TestApproveJob_BadStatus(t *testing.T) {
mock := &MockJobService{
ApproveJobFn: func(id string) error {
ApproveJobFn: func(id, actor string) error {
return fmt.Errorf("cannot approve job with status Running")
},
}
@@ -426,10 +430,56 @@ func TestApproveJob_MethodNotAllowed(t *testing.T) {
}
}
// TestApproveJob_SelfApproval_Returns403 verifies the M-003 separation-of-duties
// wire: when the service returns ErrSelfApproval the handler must surface HTTP
// 403 Forbidden (NOT 500). The error sentinel crosses the service boundary via
// errors.Is so the handler can pattern-match regardless of any fmt.Errorf
// wrapping that may be added later.
func TestApproveJob_SelfApproval_Returns403(t *testing.T) {
var capturedActor string
mock := &MockJobService{
ApproveJobFn: func(id, actor string) error {
capturedActor = actor
return service.ErrSelfApproval
},
}
h := NewJobHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/jobs/job-self/approve", nil)
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
h.ApproveJob(w, req)
if w.Code != http.StatusForbidden {
t.Fatalf("expected status 403, got %d", w.Code)
}
var resp map[string]any
if err := json.NewDecoder(w.Body).Decode(&resp); err != nil {
t.Fatalf("failed to decode response: %v", err)
}
// Response body should name the self-approval condition explicitly so
// operators triaging a 403 can distinguish it from other forbid paths.
// The ErrorResponse envelope uses "error" for the status text and
// "message" for the human-readable explanation — we assert on message.
msg, _ := resp["message"].(string)
if !strings.Contains(strings.ToLower(msg), "self-approval") {
t.Errorf("expected message to mention self-approval, got %q", msg)
}
// The handler resolves the actor from the auth context; in this test the
// request has no auth context, so the propagated actor is the anonymous
// fallback ("" or "anonymous" depending on middleware wiring). We only
// assert the closure observed *some* actor string — the detailed actor
// threading is covered by resolveActor unit tests.
_ = capturedActor
}
func TestRejectJob_Success(t *testing.T) {
var rejectedID, capturedReason string
mock := &MockJobService{
RejectJobFn: func(id string, reason string) error {
RejectJobFn: func(id, reason, actor string) error {
rejectedID = id
capturedReason = reason
return nil
@@ -457,7 +507,7 @@ func TestRejectJob_Success(t *testing.T) {
func TestRejectJob_NoReason(t *testing.T) {
mock := &MockJobService{
RejectJobFn: func(id string, reason string) error {
RejectJobFn: func(id, reason, actor string) error {
return nil
},
}
@@ -476,7 +526,7 @@ func TestRejectJob_NoReason(t *testing.T) {
func TestRejectJob_NotFound(t *testing.T) {
mock := &MockJobService{
RejectJobFn: func(id string, reason string) error {
RejectJobFn: func(id, reason, actor string) error {
return fmt.Errorf("job not found: no rows")
},
}
+29 -10
View File
@@ -1,7 +1,9 @@
package handler
import (
"context"
"encoding/json"
"errors"
"io"
"net/http"
"strconv"
@@ -9,15 +11,21 @@ import (
"github.com/shankar0123/certctl/internal/api/middleware"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// JobService defines the service interface for job operations.
type JobService interface {
ListJobs(status, jobType string, page, perPage int) ([]domain.Job, int64, error)
GetJob(id string) (*domain.Job, error)
CancelJob(id string) error
ApproveJob(id string) error
RejectJob(id string, reason string) error
ListJobs(ctx context.Context, status, jobType string, page, perPage int) ([]domain.Job, int64, error)
GetJob(ctx context.Context, id string) (*domain.Job, error)
CancelJob(ctx context.Context, id string) error
// ApproveJob approves a renewal job. actor is the named-key identity
// resolved from the auth middleware; the service returns ErrSelfApproval
// (mapped to 403) when actor matches the certificate owner.
ApproveJob(ctx context.Context, id, actor string) error
// RejectJob rejects a renewal job. actor is the named-key identity
// recorded for audit attribution; no not-self restriction.
RejectJob(ctx context.Context, id, reason, actor string) error
}
// JobHandler handles HTTP requests for job operations.
@@ -57,7 +65,7 @@ func (h JobHandler) ListJobs(w http.ResponseWriter, r *http.Request) {
}
}
jobs, total, err := h.svc.ListJobs(status, jobType, page, perPage)
jobs, total, err := h.svc.ListJobs(r.Context(), status, jobType, page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list jobs", requestID)
return
@@ -91,7 +99,7 @@ func (h JobHandler) GetJob(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
job, err := h.svc.GetJob(id)
job, err := h.svc.GetJob(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Job not found", requestID)
return
@@ -119,7 +127,7 @@ func (h JobHandler) CancelJob(w http.ResponseWriter, r *http.Request) {
}
jobID := parts[0]
if err := h.svc.CancelJob(jobID); err != nil {
if err := h.svc.CancelJob(r.Context(), jobID); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to cancel job", requestID)
return
}
@@ -149,7 +157,16 @@ func (h JobHandler) ApproveJob(w http.ResponseWriter, r *http.Request) {
}
jobID := parts[0]
if err := h.svc.ApproveJob(jobID); err != nil {
actor := resolveActor(r.Context())
if err := h.svc.ApproveJob(r.Context(), jobID, actor); err != nil {
// M-003: self-approval by the certificate owner is forbidden.
if errors.Is(err, service.ErrSelfApproval) {
ErrorWithRequestID(w, http.StatusForbidden,
"Self-approval is forbidden: the certificate owner cannot approve their own renewal",
requestID)
return
}
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Job not found", requestID)
return
@@ -193,7 +210,9 @@ func (h JobHandler) RejectJob(w http.ResponseWriter, r *http.Request) {
}
}
if err := h.svc.RejectJob(jobID, body.Reason); err != nil {
actor := resolveActor(r.Context())
if err := h.svc.RejectJob(r.Context(), jobID, body.Reason, actor); err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Job not found", requestID)
return
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -17,21 +18,21 @@ type MockNotificationService struct {
MarkAsReadFn func(id string) error
}
func (m *MockNotificationService) ListNotifications(page, perPage int) ([]domain.NotificationEvent, int64, error) {
func (m *MockNotificationService) ListNotifications(_ context.Context, page, perPage int) ([]domain.NotificationEvent, int64, error) {
if m.ListNotificationsFn != nil {
return m.ListNotificationsFn(page, perPage)
}
return nil, 0, nil
}
func (m *MockNotificationService) GetNotification(id string) (*domain.NotificationEvent, error) {
func (m *MockNotificationService) GetNotification(_ context.Context, id string) (*domain.NotificationEvent, error) {
if m.GetNotificationFn != nil {
return m.GetNotificationFn(id)
}
return nil, nil
}
func (m *MockNotificationService) MarkAsRead(id string) error {
func (m *MockNotificationService) MarkAsRead(_ context.Context, id string) error {
if m.MarkAsReadFn != nil {
return m.MarkAsReadFn(id)
}
+7 -6
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"net/http"
"strconv"
"strings"
@@ -11,9 +12,9 @@ import (
// NotificationService defines the service interface for notification operations.
type NotificationService interface {
ListNotifications(page, perPage int) ([]domain.NotificationEvent, int64, error)
GetNotification(id string) (*domain.NotificationEvent, error)
MarkAsRead(id string) error
ListNotifications(ctx context.Context, page, perPage int) ([]domain.NotificationEvent, int64, error)
GetNotification(ctx context.Context, id string) (*domain.NotificationEvent, error)
MarkAsRead(ctx context.Context, id string) error
}
// NotificationHandler handles HTTP requests for notification operations.
@@ -50,7 +51,7 @@ func (h NotificationHandler) ListNotifications(w http.ResponseWriter, r *http.Re
}
}
notifications, total, err := h.svc.ListNotifications(page, perPage)
notifications, total, err := h.svc.ListNotifications(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list notifications", requestID)
return
@@ -84,7 +85,7 @@ func (h NotificationHandler) GetNotification(w http.ResponseWriter, r *http.Requ
}
id = parts[0]
notification, err := h.svc.GetNotification(id)
notification, err := h.svc.GetNotification(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Notification not found", requestID)
return
@@ -112,7 +113,7 @@ func (h NotificationHandler) MarkAsRead(w http.ResponseWriter, r *http.Request)
}
notificationID := parts[0]
if err := h.svc.MarkAsRead(notificationID); err != nil {
if err := h.svc.MarkAsRead(r.Context(), notificationID); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to mark notification as read", requestID)
return
}
+6 -5
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -20,35 +21,35 @@ type MockOwnerService struct {
DeleteOwnerFn func(id string) error
}
func (m *MockOwnerService) ListOwners(page, perPage int) ([]domain.Owner, int64, error) {
func (m *MockOwnerService) ListOwners(_ context.Context, page, perPage int) ([]domain.Owner, int64, error) {
if m.ListOwnersFn != nil {
return m.ListOwnersFn(page, perPage)
}
return nil, 0, nil
}
func (m *MockOwnerService) GetOwner(id string) (*domain.Owner, error) {
func (m *MockOwnerService) GetOwner(_ context.Context, id string) (*domain.Owner, error) {
if m.GetOwnerFn != nil {
return m.GetOwnerFn(id)
}
return nil, nil
}
func (m *MockOwnerService) CreateOwner(owner domain.Owner) (*domain.Owner, error) {
func (m *MockOwnerService) CreateOwner(_ context.Context, owner domain.Owner) (*domain.Owner, error) {
if m.CreateOwnerFn != nil {
return m.CreateOwnerFn(owner)
}
return nil, nil
}
func (m *MockOwnerService) UpdateOwner(id string, owner domain.Owner) (*domain.Owner, error) {
func (m *MockOwnerService) UpdateOwner(_ context.Context, id string, owner domain.Owner) (*domain.Owner, error) {
if m.UpdateOwnerFn != nil {
return m.UpdateOwnerFn(id, owner)
}
return nil, nil
}
func (m *MockOwnerService) DeleteOwner(id string) error {
func (m *MockOwnerService) DeleteOwner(_ context.Context, id string) error {
if m.DeleteOwnerFn != nil {
return m.DeleteOwnerFn(id)
}
+11 -10
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"strconv"
@@ -12,11 +13,11 @@ import (
// OwnerService defines the service interface for owner operations.
type OwnerService interface {
ListOwners(page, perPage int) ([]domain.Owner, int64, error)
GetOwner(id string) (*domain.Owner, error)
CreateOwner(owner domain.Owner) (*domain.Owner, error)
UpdateOwner(id string, owner domain.Owner) (*domain.Owner, error)
DeleteOwner(id string) error
ListOwners(ctx context.Context, page, perPage int) ([]domain.Owner, int64, error)
GetOwner(ctx context.Context, id string) (*domain.Owner, error)
CreateOwner(ctx context.Context, owner domain.Owner) (*domain.Owner, error)
UpdateOwner(ctx context.Context, id string, owner domain.Owner) (*domain.Owner, error)
DeleteOwner(ctx context.Context, id string) error
}
// OwnerHandler handles HTTP requests for owner operations.
@@ -53,7 +54,7 @@ func (h OwnerHandler) ListOwners(w http.ResponseWriter, r *http.Request) {
}
}
owners, total, err := h.svc.ListOwners(page, perPage)
owners, total, err := h.svc.ListOwners(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list owners", requestID)
return
@@ -87,7 +88,7 @@ func (h OwnerHandler) GetOwner(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
owner, err := h.svc.GetOwner(id)
owner, err := h.svc.GetOwner(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Owner not found", requestID)
return
@@ -122,7 +123,7 @@ func (h OwnerHandler) CreateOwner(w http.ResponseWriter, r *http.Request) {
return
}
created, err := h.svc.CreateOwner(owner)
created, err := h.svc.CreateOwner(r.Context(), owner)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create owner", requestID)
return
@@ -155,7 +156,7 @@ func (h OwnerHandler) UpdateOwner(w http.ResponseWriter, r *http.Request) {
return
}
updated, err := h.svc.UpdateOwner(id, owner)
updated, err := h.svc.UpdateOwner(r.Context(), id, owner)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to update owner", requestID)
return
@@ -182,7 +183,7 @@ func (h OwnerHandler) DeleteOwner(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
if err := h.svc.DeleteOwner(id); err != nil {
if err := h.svc.DeleteOwner(r.Context(), id); err != nil {
if strings.Contains(err.Error(), "violates foreign key") || strings.Contains(err.Error(), "RESTRICT") {
ErrorWithRequestID(w, http.StatusConflict, "Cannot delete owner: certificates are still assigned to this owner", requestID)
} else if strings.Contains(err.Error(), "not found") {
+30 -12
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"strconv"
@@ -12,12 +13,12 @@ import (
// PolicyService defines the service interface for policy rule operations.
type PolicyService interface {
ListPolicies(page, perPage int) ([]domain.PolicyRule, int64, error)
GetPolicy(id string) (*domain.PolicyRule, error)
CreatePolicy(policy domain.PolicyRule) (*domain.PolicyRule, error)
UpdatePolicy(id string, policy domain.PolicyRule) (*domain.PolicyRule, error)
DeletePolicy(id string) error
ListViolations(policyID string, page, perPage int) ([]domain.PolicyViolation, int64, error)
ListPolicies(ctx context.Context, page, perPage int) ([]domain.PolicyRule, int64, error)
GetPolicy(ctx context.Context, id string) (*domain.PolicyRule, error)
CreatePolicy(ctx context.Context, policy domain.PolicyRule) (*domain.PolicyRule, error)
UpdatePolicy(ctx context.Context, id string, policy domain.PolicyRule) (*domain.PolicyRule, error)
DeletePolicy(ctx context.Context, id string) error
ListViolations(ctx context.Context, policyID string, page, perPage int) ([]domain.PolicyViolation, int64, error)
}
// PolicyHandler handles HTTP requests for policy rule operations.
@@ -54,7 +55,7 @@ func (h PolicyHandler) ListPolicies(w http.ResponseWriter, r *http.Request) {
}
}
policies, total, err := h.svc.ListPolicies(page, perPage)
policies, total, err := h.svc.ListPolicies(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list policies", requestID)
return
@@ -88,7 +89,7 @@ func (h PolicyHandler) GetPolicy(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
policy, err := h.svc.GetPolicy(id)
policy, err := h.svc.GetPolicy(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Policy not found", requestID)
return
@@ -126,8 +127,19 @@ func (h PolicyHandler) CreatePolicy(w http.ResponseWriter, r *http.Request) {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
// Severity is optional on create; default matches the DB default.
// Any explicit value must pass the TitleCase allowlist; the DB CHECK
// constraint enforces the same set, but catching it here gives a 400
// with a clear message instead of a 500 on constraint violation.
if policy.Severity == "" {
policy.Severity = domain.PolicySeverityWarning
}
if err := ValidatePolicySeverity(policy.Severity); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
created, err := h.svc.CreatePolicy(policy)
created, err := h.svc.CreatePolicy(r.Context(), policy)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create policy", requestID)
return
@@ -173,8 +185,14 @@ func (h PolicyHandler) UpdatePolicy(w http.ResponseWriter, r *http.Request) {
return
}
}
if policy.Severity != "" {
if err := ValidatePolicySeverity(policy.Severity); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
}
updated, err := h.svc.UpdatePolicy(id, policy)
updated, err := h.svc.UpdatePolicy(r.Context(), id, policy)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to update policy", requestID)
return
@@ -201,7 +219,7 @@ func (h PolicyHandler) DeletePolicy(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
if err := h.svc.DeletePolicy(id); err != nil {
if err := h.svc.DeletePolicy(r.Context(), id); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to delete policy", requestID)
return
}
@@ -242,7 +260,7 @@ func (h PolicyHandler) ListViolations(w http.ResponseWriter, r *http.Request) {
}
}
violations, total, err := h.svc.ListViolations(policyID, page, perPage)
violations, total, err := h.svc.ListViolations(r.Context(), policyID, page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list violations", requestID)
return
+7 -6
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -21,42 +22,42 @@ type MockPolicyService struct {
ListViolationsFn func(policyID string, page, perPage int) ([]domain.PolicyViolation, int64, error)
}
func (m *MockPolicyService) ListPolicies(page, perPage int) ([]domain.PolicyRule, int64, error) {
func (m *MockPolicyService) ListPolicies(_ context.Context, page, perPage int) ([]domain.PolicyRule, int64, error) {
if m.ListPoliciesFn != nil {
return m.ListPoliciesFn(page, perPage)
}
return nil, 0, nil
}
func (m *MockPolicyService) GetPolicy(id string) (*domain.PolicyRule, error) {
func (m *MockPolicyService) GetPolicy(_ context.Context, id string) (*domain.PolicyRule, error) {
if m.GetPolicyFn != nil {
return m.GetPolicyFn(id)
}
return nil, nil
}
func (m *MockPolicyService) CreatePolicy(policy domain.PolicyRule) (*domain.PolicyRule, error) {
func (m *MockPolicyService) CreatePolicy(_ context.Context, policy domain.PolicyRule) (*domain.PolicyRule, error) {
if m.CreatePolicyFn != nil {
return m.CreatePolicyFn(policy)
}
return nil, nil
}
func (m *MockPolicyService) UpdatePolicy(id string, policy domain.PolicyRule) (*domain.PolicyRule, error) {
func (m *MockPolicyService) UpdatePolicy(_ context.Context, id string, policy domain.PolicyRule) (*domain.PolicyRule, error) {
if m.UpdatePolicyFn != nil {
return m.UpdatePolicyFn(id, policy)
}
return nil, nil
}
func (m *MockPolicyService) DeletePolicy(id string) error {
func (m *MockPolicyService) DeletePolicy(_ context.Context, id string) error {
if m.DeletePolicyFn != nil {
return m.DeletePolicyFn(id)
}
return nil
}
func (m *MockPolicyService) ListViolations(policyID string, page, perPage int) ([]domain.PolicyViolation, int64, error) {
func (m *MockPolicyService) ListViolations(_ context.Context, policyID string, page, perPage int) ([]domain.PolicyViolation, int64, error) {
if m.ListViolationsFn != nil {
return m.ListViolationsFn(policyID, page, perPage)
}
+6 -5
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -20,35 +21,35 @@ type MockProfileService struct {
DeleteProfileFn func(id string) error
}
func (m *MockProfileService) ListProfiles(page, perPage int) ([]domain.CertificateProfile, int64, error) {
func (m *MockProfileService) ListProfiles(_ context.Context, page, perPage int) ([]domain.CertificateProfile, int64, error) {
if m.ListProfilesFn != nil {
return m.ListProfilesFn(page, perPage)
}
return nil, 0, nil
}
func (m *MockProfileService) GetProfile(id string) (*domain.CertificateProfile, error) {
func (m *MockProfileService) GetProfile(_ context.Context, id string) (*domain.CertificateProfile, error) {
if m.GetProfileFn != nil {
return m.GetProfileFn(id)
}
return nil, nil
}
func (m *MockProfileService) CreateProfile(profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
func (m *MockProfileService) CreateProfile(_ context.Context, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
if m.CreateProfileFn != nil {
return m.CreateProfileFn(profile)
}
return nil, nil
}
func (m *MockProfileService) UpdateProfile(id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
func (m *MockProfileService) UpdateProfile(_ context.Context, id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
if m.UpdateProfileFn != nil {
return m.UpdateProfileFn(id, profile)
}
return nil, nil
}
func (m *MockProfileService) DeleteProfile(id string) error {
func (m *MockProfileService) DeleteProfile(_ context.Context, id string) error {
if m.DeleteProfileFn != nil {
return m.DeleteProfileFn(id)
}
+11 -10
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"strconv"
@@ -12,11 +13,11 @@ import (
// ProfileService defines the service interface for certificate profile operations.
type ProfileService interface {
ListProfiles(page, perPage int) ([]domain.CertificateProfile, int64, error)
GetProfile(id string) (*domain.CertificateProfile, error)
CreateProfile(profile domain.CertificateProfile) (*domain.CertificateProfile, error)
UpdateProfile(id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error)
DeleteProfile(id string) error
ListProfiles(ctx context.Context, page, perPage int) ([]domain.CertificateProfile, int64, error)
GetProfile(ctx context.Context, id string) (*domain.CertificateProfile, error)
CreateProfile(ctx context.Context, profile domain.CertificateProfile) (*domain.CertificateProfile, error)
UpdateProfile(ctx context.Context, id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error)
DeleteProfile(ctx context.Context, id string) error
}
// ProfileHandler handles HTTP requests for certificate profile operations.
@@ -53,7 +54,7 @@ func (h ProfileHandler) ListProfiles(w http.ResponseWriter, r *http.Request) {
}
}
profiles, total, err := h.svc.ListProfiles(page, perPage)
profiles, total, err := h.svc.ListProfiles(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list profiles", requestID)
return
@@ -85,7 +86,7 @@ func (h ProfileHandler) GetProfile(w http.ResponseWriter, r *http.Request) {
return
}
profile, err := h.svc.GetProfile(id)
profile, err := h.svc.GetProfile(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Profile not found", requestID)
return
@@ -120,7 +121,7 @@ func (h ProfileHandler) CreateProfile(w http.ResponseWriter, r *http.Request) {
return
}
created, err := h.svc.CreateProfile(profile)
created, err := h.svc.CreateProfile(r.Context(), profile)
if err != nil {
// Check if it's a validation error from the service
if strings.Contains(err.Error(), "invalid") || strings.Contains(err.Error(), "required") ||
@@ -159,7 +160,7 @@ func (h ProfileHandler) UpdateProfile(w http.ResponseWriter, r *http.Request) {
return
}
updated, err := h.svc.UpdateProfile(id, profile)
updated, err := h.svc.UpdateProfile(r.Context(), id, profile)
if err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Profile not found", requestID)
@@ -193,7 +194,7 @@ func (h ProfileHandler) DeleteProfile(w http.ResponseWriter, r *http.Request) {
return
}
if err := h.svc.DeleteProfile(id); err != nil {
if err := h.svc.DeleteProfile(r.Context(), id); err != nil {
if strings.Contains(err.Error(), "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Profile not found", requestID)
return
+20
View File
@@ -1,14 +1,34 @@
package handler
import (
"context"
"encoding/base64"
"encoding/json"
"fmt"
"net/http"
"strings"
"time"
"github.com/shankar0123/certctl/internal/api/middleware"
)
// resolveActor extracts the authenticated named-key identity from the request
// context for audit-trail attribution. Returns the named-key name when set by
// the auth middleware, or "api" as a safe sentinel when the auth middleware
// did not populate the context (e.g., AUTH_TYPE=none, or internal/system calls
// that bypass auth).
//
// Post-M-002: this is the single source of truth for handler-layer actor
// resolution. Handlers must NOT hardcode string literals like "api-key-user"
// or "api" — always go through this helper so the named-key identity flows to
// services and the audit trail.
func resolveActor(ctx context.Context) string {
if user := middleware.GetUser(ctx); user != "" {
return user
}
return "api"
}
// PagedResponse represents a paginated API response.
type PagedResponse struct {
Data interface{} `json:"data"`
+101 -36
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -9,56 +10,57 @@ import (
"time"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// MockTargetService is a mock implementation of TargetService interface.
type MockTargetService struct {
ListTargetsFn func(page, perPage int) ([]domain.DeploymentTarget, int64, error)
GetTargetFn func(id string) (*domain.DeploymentTarget, error)
CreateTargetFn func(target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
UpdateTargetFn func(id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
DeleteTargetFn func(id string) error
TestTargetConnectionFn func(id string) error
ListTargetsFn func(ctx context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error)
GetTargetFn func(ctx context.Context, id string) (*domain.DeploymentTarget, error)
CreateTargetFn func(ctx context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
UpdateTargetFn func(ctx context.Context, id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
DeleteTargetFn func(ctx context.Context, id string) error
TestConnectionFn func(ctx context.Context, id string) error
}
func (m *MockTargetService) ListTargets(page, perPage int) ([]domain.DeploymentTarget, int64, error) {
func (m *MockTargetService) ListTargets(ctx context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error) {
if m.ListTargetsFn != nil {
return m.ListTargetsFn(page, perPage)
return m.ListTargetsFn(ctx, page, perPage)
}
return nil, 0, nil
}
func (m *MockTargetService) GetTarget(id string) (*domain.DeploymentTarget, error) {
func (m *MockTargetService) GetTarget(ctx context.Context, id string) (*domain.DeploymentTarget, error) {
if m.GetTargetFn != nil {
return m.GetTargetFn(id)
return m.GetTargetFn(ctx, id)
}
return nil, nil
}
func (m *MockTargetService) CreateTarget(target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
func (m *MockTargetService) CreateTarget(ctx context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
if m.CreateTargetFn != nil {
return m.CreateTargetFn(target)
return m.CreateTargetFn(ctx, target)
}
return nil, nil
}
func (m *MockTargetService) UpdateTarget(id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
func (m *MockTargetService) UpdateTarget(ctx context.Context, id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
if m.UpdateTargetFn != nil {
return m.UpdateTargetFn(id, target)
return m.UpdateTargetFn(ctx, id, target)
}
return nil, nil
}
func (m *MockTargetService) DeleteTarget(id string) error {
func (m *MockTargetService) DeleteTarget(ctx context.Context, id string) error {
if m.DeleteTargetFn != nil {
return m.DeleteTargetFn(id)
return m.DeleteTargetFn(ctx, id)
}
return nil
}
func (m *MockTargetService) TestTargetConnection(id string) error {
if m.TestTargetConnectionFn != nil {
return m.TestTargetConnectionFn(id)
func (m *MockTargetService) TestConnection(ctx context.Context, id string) error {
if m.TestConnectionFn != nil {
return m.TestConnectionFn(ctx, id)
}
return nil
}
@@ -85,7 +87,7 @@ func TestListTargets_Success(t *testing.T) {
}
mock := &MockTargetService{
ListTargetsFn: func(page, perPage int) ([]domain.DeploymentTarget, int64, error) {
ListTargetsFn: func(_ context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error) {
return []domain.DeploymentTarget{t1, t2}, 2, nil
},
}
@@ -113,7 +115,7 @@ func TestListTargets_Success(t *testing.T) {
func TestListTargets_Pagination(t *testing.T) {
var capturedPage, capturedPerPage int
mock := &MockTargetService{
ListTargetsFn: func(page, perPage int) ([]domain.DeploymentTarget, int64, error) {
ListTargetsFn: func(_ context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error) {
capturedPage = page
capturedPerPage = perPage
return []domain.DeploymentTarget{}, 0, nil
@@ -137,7 +139,7 @@ func TestListTargets_Pagination(t *testing.T) {
func TestListTargets_ServiceError(t *testing.T) {
mock := &MockTargetService{
ListTargetsFn: func(page, perPage int) ([]domain.DeploymentTarget, int64, error) {
ListTargetsFn: func(_ context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error) {
return nil, 0, ErrMockServiceFailed
},
}
@@ -169,7 +171,7 @@ func TestListTargets_MethodNotAllowed(t *testing.T) {
func TestGetTarget_Success(t *testing.T) {
now := time.Now()
mock := &MockTargetService{
GetTargetFn: func(id string) (*domain.DeploymentTarget, error) {
GetTargetFn: func(_ context.Context, id string) (*domain.DeploymentTarget, error) {
return &domain.DeploymentTarget{
ID: id,
Name: "NGINX Proxy",
@@ -196,7 +198,7 @@ func TestGetTarget_Success(t *testing.T) {
func TestGetTarget_NotFound(t *testing.T) {
mock := &MockTargetService{
GetTargetFn: func(id string) (*domain.DeploymentTarget, error) {
GetTargetFn: func(_ context.Context, id string) (*domain.DeploymentTarget, error) {
return nil, ErrMockNotFound
},
}
@@ -229,7 +231,7 @@ func TestGetTarget_EmptyID(t *testing.T) {
func TestCreateTarget_Success(t *testing.T) {
now := time.Now()
mock := &MockTargetService{
CreateTargetFn: func(target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
CreateTargetFn: func(_ context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
target.ID = "t-new"
target.CreatedAt = now
target.UpdatedAt = now
@@ -238,8 +240,9 @@ func TestCreateTarget_Success(t *testing.T) {
}
body := map[string]interface{}{
"name": "New Target",
"type": "nginx",
"name": "New Target",
"type": "nginx",
"agent_id": "agent-001",
}
bodyBytes, _ := json.Marshal(body)
@@ -257,7 +260,8 @@ func TestCreateTarget_Success(t *testing.T) {
func TestCreateTarget_MissingName(t *testing.T) {
body := map[string]interface{}{
"type": "nginx",
"type": "nginx",
"agent_id": "agent-001",
}
bodyBytes, _ := json.Marshal(body)
@@ -275,7 +279,8 @@ func TestCreateTarget_MissingName(t *testing.T) {
func TestCreateTarget_MissingType(t *testing.T) {
body := map[string]interface{}{
"name": "New Target",
"name": "New Target",
"agent_id": "agent-001",
}
bodyBytes, _ := json.Marshal(body)
@@ -310,8 +315,9 @@ func TestCreateTarget_NameTooLong(t *testing.T) {
longName += "x"
}
body := map[string]interface{}{
"name": longName,
"type": "nginx",
"name": longName,
"type": "nginx",
"agent_id": "agent-001",
}
bodyBytes, _ := json.Marshal(body)
@@ -339,10 +345,69 @@ func TestCreateTarget_MethodNotAllowed(t *testing.T) {
}
}
// TestCreateTarget_MissingAgentID_Returns400 pins the C-002 handler contract:
// handler MUST reject a create payload that omits agent_id with HTTP 400
// before the service is invoked. Using a mock that would return 201-worthy
// success proves the guard fires.
func TestCreateTarget_MissingAgentID_Returns400(t *testing.T) {
body := map[string]interface{}{
"name": "New Target",
"type": "nginx",
// agent_id intentionally omitted
}
bodyBytes, _ := json.Marshal(body)
mock := &MockTargetService{
CreateTargetFn: func(_ context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
// Would succeed if handler guard did not fire.
target.ID = "t-would-be-created"
return &target, nil
},
}
handler := NewTargetHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/targets", bytes.NewReader(bodyBytes))
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.CreateTarget(w, req)
if w.Code != http.StatusBadRequest {
t.Fatalf("expected 400, got %d — body=%s", w.Code, w.Body.String())
}
}
// TestCreateTarget_NonexistentAgent_Returns400 pins the C-002 handler↔service
// translation: when the service returns service.ErrAgentNotFound, the handler
// MUST map it to HTTP 400, not the generic 500 used for other service errors.
func TestCreateTarget_NonexistentAgent_Returns400(t *testing.T) {
mock := &MockTargetService{
CreateTargetFn: func(_ context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
return nil, service.ErrAgentNotFound
},
}
body := map[string]interface{}{
"name": "New Target",
"type": "nginx",
"agent_id": "agent-does-not-exist",
}
bodyBytes, _ := json.Marshal(body)
handler := NewTargetHandler(mock)
req := httptest.NewRequest(http.MethodPost, "/api/v1/targets", bytes.NewReader(bodyBytes))
req = req.WithContext(contextWithRequestID())
w := httptest.NewRecorder()
handler.CreateTarget(w, req)
if w.Code != http.StatusBadRequest {
t.Fatalf("expected 400 for nonexistent agent, got %d — body=%s", w.Code, w.Body.String())
}
}
func TestUpdateTarget_Success(t *testing.T) {
now := time.Now()
mock := &MockTargetService{
UpdateTargetFn: func(id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
UpdateTargetFn: func(_ context.Context, id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
return &domain.DeploymentTarget{
ID: id,
Name: target.Name,
@@ -375,7 +440,7 @@ func TestUpdateTarget_Success(t *testing.T) {
func TestDeleteTarget_Success(t *testing.T) {
var deletedID string
mock := &MockTargetService{
DeleteTargetFn: func(id string) error {
DeleteTargetFn: func(_ context.Context, id string) error {
deletedID = id
return nil
},
@@ -398,7 +463,7 @@ func TestDeleteTarget_Success(t *testing.T) {
func TestDeleteTarget_ServiceError(t *testing.T) {
mock := &MockTargetService{
DeleteTargetFn: func(id string) error {
DeleteTargetFn: func(_ context.Context, id string) error {
return ErrMockServiceFailed
},
}
@@ -430,7 +495,7 @@ func TestDeleteTarget_EmptyID(t *testing.T) {
func TestTestTargetConnection_Success(t *testing.T) {
mock := &MockTargetService{
TestTargetConnectionFn: func(id string) error {
TestConnectionFn: func(_ context.Context, id string) error {
return nil
},
}
@@ -457,7 +522,7 @@ func TestTestTargetConnection_Success(t *testing.T) {
func TestTestTargetConnection_Failed(t *testing.T) {
mock := &MockTargetService{
TestTargetConnectionFn: func(id string) error {
TestConnectionFn: func(_ context.Context, id string) error {
return ErrMockServiceFailed
},
}
+29 -12
View File
@@ -1,23 +1,26 @@
package handler
import (
"context"
"encoding/json"
"errors"
"net/http"
"strconv"
"strings"
"github.com/shankar0123/certctl/internal/api/middleware"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/service"
)
// TargetService defines the service interface for deployment target operations.
type TargetService interface {
ListTargets(page, perPage int) ([]domain.DeploymentTarget, int64, error)
GetTarget(id string) (*domain.DeploymentTarget, error)
CreateTarget(target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
UpdateTarget(id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
DeleteTarget(id string) error
TestTargetConnection(id string) error
ListTargets(ctx context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error)
GetTarget(ctx context.Context, id string) (*domain.DeploymentTarget, error)
CreateTarget(ctx context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
UpdateTarget(ctx context.Context, id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error)
DeleteTarget(ctx context.Context, id string) error
TestConnection(ctx context.Context, id string) error
}
// TargetHandler handles HTTP requests for deployment target operations.
@@ -54,7 +57,7 @@ func (h TargetHandler) ListTargets(w http.ResponseWriter, r *http.Request) {
}
}
targets, total, err := h.svc.ListTargets(page, perPage)
targets, total, err := h.svc.ListTargets(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list targets", requestID)
return
@@ -86,7 +89,7 @@ func (h TargetHandler) GetTarget(w http.ResponseWriter, r *http.Request) {
return
}
target, err := h.svc.GetTarget(id)
target, err := h.svc.GetTarget(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Target not found", requestID)
return
@@ -124,9 +127,23 @@ func (h TargetHandler) CreateTarget(w http.ResponseWriter, r *http.Request) {
ErrorWithRequestID(w, http.StatusBadRequest, "type is required", requestID)
return
}
// C-002: agent_id is a NOT NULL FK in deployment_targets (migration 000001
// line 104). Reject empty values at the boundary so callers get a clean 400
// with the field name rather than a generic "Failed to create target" 500.
if err := ValidateRequired("agent_id", target.AgentID); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
created, err := h.svc.CreateTarget(target)
created, err := h.svc.CreateTarget(r.Context(), target)
if err != nil {
// C-002: a nonexistent agent_id is a client error, not a server error.
// The service returns ErrAgentNotFound (wrapped via fmt.Errorf %w) when
// agentRepo.Get fails; we translate that to 400 via errors.Is.
if errors.Is(err, service.ErrAgentNotFound) {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create target", requestID)
return
}
@@ -158,7 +175,7 @@ func (h TargetHandler) UpdateTarget(w http.ResponseWriter, r *http.Request) {
return
}
updated, err := h.svc.UpdateTarget(id, target)
updated, err := h.svc.UpdateTarget(r.Context(), id, target)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to update target", requestID)
return
@@ -183,7 +200,7 @@ func (h TargetHandler) DeleteTarget(w http.ResponseWriter, r *http.Request) {
return
}
if err := h.svc.DeleteTarget(id); err != nil {
if err := h.svc.DeleteTarget(r.Context(), id); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to delete target", requestID)
return
}
@@ -210,7 +227,7 @@ func (h TargetHandler) TestTargetConnection(w http.ResponseWriter, r *http.Reque
}
id := parts[0]
if err := h.svc.TestTargetConnection(id); err != nil {
if err := h.svc.TestConnection(r.Context(), id); err != nil {
JSON(w, http.StatusOK, map[string]interface{}{
"status": "failed",
"message": err.Error(),
+6 -5
View File
@@ -2,6 +2,7 @@ package handler
import (
"bytes"
"context"
"encoding/json"
"net/http"
"net/http/httptest"
@@ -20,35 +21,35 @@ type MockTeamService struct {
DeleteTeamFn func(id string) error
}
func (m *MockTeamService) ListTeams(page, perPage int) ([]domain.Team, int64, error) {
func (m *MockTeamService) ListTeams(_ context.Context, page, perPage int) ([]domain.Team, int64, error) {
if m.ListTeamsFn != nil {
return m.ListTeamsFn(page, perPage)
}
return nil, 0, nil
}
func (m *MockTeamService) GetTeam(id string) (*domain.Team, error) {
func (m *MockTeamService) GetTeam(_ context.Context, id string) (*domain.Team, error) {
if m.GetTeamFn != nil {
return m.GetTeamFn(id)
}
return nil, nil
}
func (m *MockTeamService) CreateTeam(team domain.Team) (*domain.Team, error) {
func (m *MockTeamService) CreateTeam(_ context.Context, team domain.Team) (*domain.Team, error) {
if m.CreateTeamFn != nil {
return m.CreateTeamFn(team)
}
return nil, nil
}
func (m *MockTeamService) UpdateTeam(id string, team domain.Team) (*domain.Team, error) {
func (m *MockTeamService) UpdateTeam(_ context.Context, id string, team domain.Team) (*domain.Team, error) {
if m.UpdateTeamFn != nil {
return m.UpdateTeamFn(id, team)
}
return nil, nil
}
func (m *MockTeamService) DeleteTeam(id string) error {
func (m *MockTeamService) DeleteTeam(_ context.Context, id string) error {
if m.DeleteTeamFn != nil {
return m.DeleteTeamFn(id)
}
+11 -10
View File
@@ -1,6 +1,7 @@
package handler
import (
"context"
"encoding/json"
"net/http"
"strconv"
@@ -12,11 +13,11 @@ import (
// TeamService defines the service interface for team operations.
type TeamService interface {
ListTeams(page, perPage int) ([]domain.Team, int64, error)
GetTeam(id string) (*domain.Team, error)
CreateTeam(team domain.Team) (*domain.Team, error)
UpdateTeam(id string, team domain.Team) (*domain.Team, error)
DeleteTeam(id string) error
ListTeams(ctx context.Context, page, perPage int) ([]domain.Team, int64, error)
GetTeam(ctx context.Context, id string) (*domain.Team, error)
CreateTeam(ctx context.Context, team domain.Team) (*domain.Team, error)
UpdateTeam(ctx context.Context, id string, team domain.Team) (*domain.Team, error)
DeleteTeam(ctx context.Context, id string) error
}
// TeamHandler handles HTTP requests for team operations.
@@ -53,7 +54,7 @@ func (h TeamHandler) ListTeams(w http.ResponseWriter, r *http.Request) {
}
}
teams, total, err := h.svc.ListTeams(page, perPage)
teams, total, err := h.svc.ListTeams(r.Context(), page, perPage)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list teams", requestID)
return
@@ -87,7 +88,7 @@ func (h TeamHandler) GetTeam(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
team, err := h.svc.GetTeam(id)
team, err := h.svc.GetTeam(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Team not found", requestID)
return
@@ -122,7 +123,7 @@ func (h TeamHandler) CreateTeam(w http.ResponseWriter, r *http.Request) {
return
}
created, err := h.svc.CreateTeam(team)
created, err := h.svc.CreateTeam(r.Context(), team)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create team", requestID)
return
@@ -155,7 +156,7 @@ func (h TeamHandler) UpdateTeam(w http.ResponseWriter, r *http.Request) {
return
}
updated, err := h.svc.UpdateTeam(id, team)
updated, err := h.svc.UpdateTeam(r.Context(), id, team)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to update team", requestID)
return
@@ -182,7 +183,7 @@ func (h TeamHandler) DeleteTeam(w http.ResponseWriter, r *http.Request) {
}
id = parts[0]
if err := h.svc.DeleteTeam(id); err != nil {
if err := h.svc.DeleteTeam(r.Context(), id); err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to delete team", requestID)
return
}
+2 -1
View File
@@ -71,10 +71,11 @@ func ValidatePolicyType(policyType interface{}) error {
"RequiredMetadata": true,
"AllowedEnvironments": true,
"RenewalLeadTime": true,
"CertificateLifetime": true,
}
typeStr := fmt.Sprintf("%v", policyType)
if !validTypes[typeStr] {
return ValidationError{Field: "type", Message: "type must be one of: AllowedIssuers, AllowedDomains, RequiredMetadata, AllowedEnvironments, RenewalLeadTime"}
return ValidationError{Field: "type", Message: "type must be one of: AllowedIssuers, AllowedDomains, RequiredMetadata, AllowedEnvironments, RenewalLeadTime, CertificateLifetime"}
}
return nil
}
+159 -58
View File
@@ -4,16 +4,22 @@ import (
"context"
"crypto/sha256"
"encoding/hex"
"errors"
"fmt"
"io"
"log/slog"
"net/http"
"strings"
"sync"
"time"
)
// AuditRecorder is the interface that the audit middleware uses to record API calls.
// This avoids importing the service package directly, maintaining dependency inversion.
//
// Implementations may perform I/O (e.g., database writes). The middleware invokes
// RecordAPICall from a tracked goroutine so that callers can drain in-flight
// recordings during graceful shutdown via AuditMiddleware.Flush.
type AuditRecorder interface {
RecordAPICall(ctx context.Context, method, path, actor string, bodyHash string, status int, latencyMs int64) error
}
@@ -26,10 +32,42 @@ type AuditConfig struct {
Logger *slog.Logger
}
// NewAuditLog creates a middleware that records every API call to the audit trail.
// It captures method, path, authenticated actor, request body hash, response status, and latency.
// Audit recording is best-effort — failures are logged but don't affect the HTTP response.
func NewAuditLog(recorder AuditRecorder, cfg AuditConfig) func(http.Handler) http.Handler {
// ErrAuditFlushTimeout is returned by AuditMiddleware.Flush when in-flight audit
// recordings do not complete before the provided context is cancelled or its
// deadline elapses. It mirrors scheduler.ErrSchedulerShutdownTimeout so callers
// can branch on graceful-shutdown timeouts consistently across subsystems.
var ErrAuditFlushTimeout = errors.New("audit middleware flush timeout")
// AuditMiddleware is the handle returned by NewAuditLog. It wraps the audit
// logging HTTP middleware and tracks the goroutines spawned to record each API
// call, so that callers can drain them during graceful shutdown (M-1, CWE-662
// / CWE-400). The goroutines themselves still run detached from the request
// context — the shutdown-drain signal flows through this struct's WaitGroup
// instead of the per-request context.
type AuditMiddleware struct {
recorder AuditRecorder
logger *slog.Logger
excludeSet map[string]bool
// wg tracks every audit-recording goroutine spawned by Middleware so Flush
// can block until they complete before the DB pool is torn down.
wg sync.WaitGroup
}
// NewAuditLog constructs the API audit logging middleware. The returned
// *AuditMiddleware exposes the HTTP middleware via the Middleware method value
// (same func(http.Handler) http.Handler shape) and a Flush method that the
// process shutdown path must call after the HTTP server has stopped accepting
// new requests but before the audit recorder's backing store (e.g., the
// database connection pool) is closed.
//
// The middleware records method, path, authenticated actor, request body hash,
// response status, and latency. Recording is best-effort — individual failures
// are logged and do not affect the HTTP response. Shutdown is NOT best-effort:
// Flush must succeed (or time out, returning ErrAuditFlushTimeout) so that
// in-flight events are not lost when the audit recorder's connection pool is
// closed out from under the goroutines.
func NewAuditLog(recorder AuditRecorder, cfg AuditConfig) *AuditMiddleware {
excludeSet := make(map[string]bool, len(cfg.ExcludePaths))
for _, p := range cfg.ExcludePaths {
excludeSet[p] = true
@@ -40,68 +78,131 @@ func NewAuditLog(recorder AuditRecorder, cfg AuditConfig) func(http.Handler) htt
logger = slog.Default()
}
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Skip excluded paths (health, readiness probes)
for prefix := range excludeSet {
if strings.HasPrefix(r.URL.Path, prefix) {
next.ServeHTTP(w, r)
return
}
return &AuditMiddleware{
recorder: recorder,
logger: logger,
excludeSet: excludeSet,
}
}
// Middleware is the http.Handler wrapper. It has the standard
// func(http.Handler) http.Handler middleware signature so it can be composed
// into an existing middleware chain via a method value (auditMiddleware.Middleware).
func (a *AuditMiddleware) Middleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Skip excluded paths (health, readiness probes)
for prefix := range a.excludeSet {
if strings.HasPrefix(r.URL.Path, prefix) {
next.ServeHTTP(w, r)
return
}
}
start := time.Now()
start := time.Now()
// Hash request body for audit (don't store raw bodies — security + size concerns)
bodyHash := ""
if r.Body != nil && r.Body != http.NoBody {
hasher := sha256.New()
body, err := io.ReadAll(r.Body)
if err == nil && len(body) > 0 {
hasher.Write(body)
bodyHash = hex.EncodeToString(hasher.Sum(nil))[:16] // truncated hash
// Restore the body for downstream handlers
r.Body = io.NopCloser(strings.NewReader(string(body)))
}
// Hash request body for audit (don't store raw bodies — security + size concerns)
bodyHash := ""
if r.Body != nil && r.Body != http.NoBody {
hasher := sha256.New()
body, err := io.ReadAll(r.Body)
if err == nil && len(body) > 0 {
hasher.Write(body)
bodyHash = hex.EncodeToString(hasher.Sum(nil))[:16] // truncated hash
// Restore the body for downstream handlers
r.Body = io.NopCloser(strings.NewReader(string(body)))
}
}
// Extract actor from auth context
actor := "anonymous"
if user, ok := GetUser(r.Context()); ok && user != "" {
actor = user
// Extract actor from auth context
actor := "anonymous"
if user := GetUser(r.Context()); user != "" {
actor = user
}
// Wrap response writer to capture status code
wrapped := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
next.ServeHTTP(wrapped, r)
latency := time.Since(start).Milliseconds()
// Snapshot request-derived inputs so the goroutine does not race with
// the http.Server reusing r after this handler returns.
method := r.Method
path := r.URL.Path
status := wrapped.statusCode
// Derive a detached context that preserves request-scoped values
// (trace IDs, auth info carried via context keys) but is not cancelled
// when the HTTP server finalizes the request. Using r.Context()
// directly would cause the async audit write to observe ctx.Done()
// as soon as the response completes; using context.Background() would
// discard useful observability metadata. WithoutCancel gives us both
// (M-2 / D-3).
auditCtx := context.WithoutCancel(r.Context())
// Record audit event asynchronously (best-effort, don't block response).
// SECURITY: We intentionally use r.URL.Path (not r.URL.String() or r.RequestURI)
// to prevent query parameters from being recorded in the immutable audit trail.
// Query strings may contain cursor tokens, API keys passed as params, or other
// sensitive filter values. Since the audit trail is append-only with no deletion
// capability, any sensitive data recorded would persist permanently.
//
// The goroutine is tracked in a.wg so AuditMiddleware.Flush can drain
// in-flight recordings during graceful shutdown. Without this (M-1,
// CWE-662 / CWE-400), SIGTERM would close the DB pool while recordings
// were still mid-flight, silently dropping audit events.
a.wg.Add(1)
go func() {
defer a.wg.Done()
if err := a.recorder.RecordAPICall(
auditCtx,
method,
path,
actor,
bodyHash,
status,
latency,
); err != nil {
a.logger.Error("failed to record API audit event",
"error", err,
"method", method,
"path", path,
)
}
}()
})
}
// Wrap response writer to capture status code
wrapped := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
// Flush blocks until every audit-recording goroutine spawned by Middleware has
// completed, or until ctx is cancelled / its deadline elapses. It must be
// called from the process shutdown path after http.Server.Shutdown has
// returned (so no new requests are being accepted) but before the backing
// audit recorder's resources (DB pool, etc.) are torn down.
//
// On timeout or cancellation Flush returns ErrAuditFlushTimeout wrapped with
// any context error; in-flight goroutines continue to run and may still write
// to the recorder once they unblock — the caller is responsible for deciding
// whether to proceed with teardown anyway or surface the error.
//
// Flush mirrors the idiom used by scheduler.Scheduler.WaitForCompletion so
// that the two subsystems drain identically at shutdown.
func (a *AuditMiddleware) Flush(ctx context.Context) error {
done := make(chan struct{})
go func() {
a.wg.Wait()
close(done)
}()
next.ServeHTTP(wrapped, r)
latency := time.Since(start).Milliseconds()
// Record audit event asynchronously (best-effort, don't block response).
// SECURITY: We intentionally use r.URL.Path (not r.URL.String() or r.RequestURI)
// to prevent query parameters from being recorded in the immutable audit trail.
// Query strings may contain cursor tokens, API keys passed as params, or other
// sensitive filter values. Since the audit trail is append-only with no deletion
// capability, any sensitive data recorded would persist permanently.
go func() {
if err := recorder.RecordAPICall(
context.Background(),
r.Method,
r.URL.Path,
actor,
bodyHash,
wrapped.statusCode,
latency,
); err != nil {
logger.Error("failed to record API audit event",
"error", err,
"method", r.Method,
"path", r.URL.Path,
)
}
}()
})
select {
case <-done:
a.logger.Info("audit middleware flush complete")
return nil
case <-ctx.Done():
a.logger.Warn("audit middleware flush did not complete before context cancellation",
"error", ctx.Err(),
)
return fmt.Errorf("%w: %w", ErrAuditFlushTimeout, ctx.Err())
}
}
+133 -14
View File
@@ -2,6 +2,7 @@ package middleware
import (
"context"
"errors"
"fmt"
"io"
"net/http"
@@ -16,7 +17,8 @@ import (
type mockAuditRecorder struct {
mu sync.Mutex
calls []auditCall
err error // if non-nil, RecordAPICall returns this
err error // if non-nil, RecordAPICall returns this
block chan struct{} // if non-nil, RecordAPICall blocks on receive before returning
}
type auditCall struct {
@@ -29,6 +31,13 @@ type auditCall struct {
}
func (m *mockAuditRecorder) RecordAPICall(ctx context.Context, method, path, actor, bodyHash string, status int, latencyMs int64) error {
// Optional: block the recorder until a signal is received so tests can
// exercise the shutdown-drain path deterministically. The block happens
// before any state mutation so Flush-timeout tests see the call
// "in-flight" (wg counter > 0) with no recorded entries yet.
if m.block != nil {
<-m.block
}
m.mu.Lock()
defer m.mu.Unlock()
m.calls = append(m.calls, auditCall{
@@ -90,7 +99,7 @@ func (w *waitableAuditRecorder) Wait(timeout time.Duration) bool {
func TestAuditLog_RecordsAPICall(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
@@ -130,7 +139,7 @@ func TestAuditLog_RecordsAPICall(t *testing.T) {
func TestAuditLog_CapturesStatusCode(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNotFound)
@@ -157,7 +166,7 @@ func TestAuditLog_ExcludesHealth(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{
ExcludePaths: []string{"/health", "/ready"},
})
}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
@@ -193,7 +202,7 @@ func TestAuditLog_ExcludesHealth(t *testing.T) {
func TestAuditLog_HashesRequestBody(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
// Handler verifies body was restored
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -228,7 +237,7 @@ func TestAuditLog_HashesRequestBody(t *testing.T) {
func TestAuditLog_EmptyBodyNoHash(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
@@ -253,15 +262,16 @@ func TestAuditLog_EmptyBodyNoHash(t *testing.T) {
func TestAuditLog_ExtractsAuthenticatedActor(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
}))
req := httptest.NewRequest(http.MethodDelete, "/api/v1/certificates/mc-1", nil)
// Simulate auth middleware having set the user in context
ctx := context.WithValue(req.Context(), UserKey{}, "api-key-user")
// Simulate auth middleware having set the named-key identity in context
// (post-M-002: actor is the named-key name, not the old "api-key-user").
ctx := context.WithValue(req.Context(), UserKey{}, "ops-admin")
req = req.WithContext(ctx)
rr := httptest.NewRecorder()
@@ -275,8 +285,8 @@ func TestAuditLog_ExtractsAuthenticatedActor(t *testing.T) {
if len(calls) != 1 {
t.Fatalf("expected 1 audit call, got %d", len(calls))
}
if calls[0].Actor != "api-key-user" {
t.Errorf("expected actor api-key-user, got %s", calls[0].Actor)
if calls[0].Actor != "ops-admin" {
t.Errorf("expected actor ops-admin, got %s", calls[0].Actor)
}
if calls[0].Method != "DELETE" {
t.Errorf("expected method DELETE, got %s", calls[0].Method)
@@ -285,7 +295,7 @@ func TestAuditLog_ExtractsAuthenticatedActor(t *testing.T) {
func TestAuditLog_RecorderErrorDoesNotBreakResponse(t *testing.T) {
recorder := &mockAuditRecorder{err: fmt.Errorf("db connection lost")}
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
@@ -304,7 +314,7 @@ func TestAuditLog_RecorderErrorDoesNotBreakResponse(t *testing.T) {
func TestAuditLog_CapturesLatency(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
time.Sleep(10 * time.Millisecond)
@@ -330,7 +340,7 @@ func TestAuditLog_CapturesLatency(t *testing.T) {
func TestAuditLog_ExcludesQueryParamsFromPath(t *testing.T) {
recorder := newWaitableAuditRecorder()
mw := NewAuditLog(recorder, AuditConfig{})
mw := NewAuditLog(recorder, AuditConfig{}).Middleware
handler := mw(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
@@ -429,3 +439,112 @@ func TestAuditServiceAdapter_PropagatesError(t *testing.T) {
t.Errorf("expected database error, got %v", err)
}
}
// TestAuditLog_FlushDrainsInFlightGoroutines verifies the M-1 shutdown-drain
// contract: Flush blocks until every audit-recording goroutine spawned by the
// middleware completes, then returns nil. Without the drain (pre-M-1 code),
// the DB pool would be closed while in-flight goroutines were still calling
// RecordAPICall, silently dropping audit events (CWE-662 / CWE-400).
func TestAuditLog_FlushDrainsInFlightGoroutines(t *testing.T) {
// Recorder blocks on `unblock` until the test releases it. This simulates
// a slow DB write still in flight when shutdown begins.
unblock := make(chan struct{})
recorder := &mockAuditRecorder{block: unblock}
auditMW := NewAuditLog(recorder, AuditConfig{})
handler := auditMW.Middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
}))
// Fire a request. Handler returns immediately; recorder goroutine is
// parked on the `unblock` channel inside RecordAPICall.
req := httptest.NewRequest(http.MethodGet, "/api/v1/certificates", nil)
rr := httptest.NewRecorder()
handler.ServeHTTP(rr, req)
// Start Flush in a goroutine — it must block on the WaitGroup until we
// release the recorder.
flushDone := make(chan error, 1)
go func() {
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
flushDone <- auditMW.Flush(ctx)
}()
// Confirm Flush is actually blocked (not returning immediately).
select {
case err := <-flushDone:
t.Fatalf("Flush returned before recorder unblocked: err=%v", err)
case <-time.After(50 * time.Millisecond):
// expected: Flush is blocked on wg.Wait
}
// Release the recorder. Flush should now observe wg counter drop to 0
// and return nil.
close(unblock)
select {
case err := <-flushDone:
if err != nil {
t.Fatalf("expected nil from Flush after drain, got %v", err)
}
case <-time.After(2 * time.Second):
t.Fatal("Flush did not return after recorder unblocked")
}
// Verify the audit event was actually recorded (i.e., the goroutine
// completed its write — not just that Flush unblocked).
calls := recorder.getCalls()
if len(calls) != 1 {
t.Fatalf("expected 1 recorded audit call, got %d", len(calls))
}
if calls[0].Path != "/api/v1/certificates" {
t.Errorf("expected path /api/v1/certificates, got %s", calls[0].Path)
}
}
// TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout verifies that Flush
// respects its context: when in-flight goroutines exceed the shutdown budget,
// Flush returns an error wrapping ErrAuditFlushTimeout plus ctx.Err(). The
// caller can then decide whether to proceed with teardown anyway.
func TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout(t *testing.T) {
// Recorder will never unblock on its own — we unblock at end of test for
// a clean race-safe teardown.
unblock := make(chan struct{})
recorder := &mockAuditRecorder{block: unblock}
auditMW := NewAuditLog(recorder, AuditConfig{})
handler := auditMW.Middleware(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
}))
req := httptest.NewRequest(http.MethodPost, "/api/v1/certificates", nil)
rr := httptest.NewRecorder()
handler.ServeHTTP(rr, req)
// Flush with a tiny deadline — must time out.
ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
defer cancel()
err := auditMW.Flush(ctx)
if err == nil {
// Release the blocked goroutine before failing so the race detector
// doesn't trip on teardown.
close(unblock)
t.Fatal("expected Flush to return an error on timeout, got nil")
}
if !errors.Is(err, ErrAuditFlushTimeout) {
close(unblock)
t.Fatalf("expected error to wrap ErrAuditFlushTimeout, got %v", err)
}
if !errors.Is(err, context.DeadlineExceeded) {
close(unblock)
t.Fatalf("expected error to wrap context.DeadlineExceeded, got %v", err)
}
// Race-safe teardown: unblock the recorder goroutine so it exits cleanly
// before the test returns. The goroutine itself is still detached and
// will record to the mock even after Flush timed out — that's the
// documented behavior (Flush surfaces the timeout; caller decides).
close(unblock)
}
+98 -31
View File
@@ -5,6 +5,7 @@ import (
"crypto/sha256"
"crypto/subtle"
"encoding/hex"
"fmt"
"log"
"log/slog"
"net/http"
@@ -21,6 +22,16 @@ type RequestIDKey struct{}
// UserKey is the context key for storing authenticated user information.
type UserKey struct{}
// AdminKey is the context key for storing admin flag information.
type AdminKey struct{}
// NamedAPIKey represents a named API key with optional admin flag.
type NamedAPIKey struct {
Name string
Key string
Admin bool
}
// RequestID middleware generates a unique request ID and adds it to the request context and response headers.
func RequestID(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -78,10 +89,17 @@ func NewLogging(logger *slog.Logger) func(http.Handler) http.Handler {
// Recovery middleware recovers from panics and returns a 500 error.
func Recovery(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
defer func() {
if err := recover(); err != nil {
requestID := getRequestID(r.Context())
log.Printf("[%s] PANIC: %v", requestID, err)
requestID := getRequestID(ctx)
// Use slog.ErrorContext so the panic log carries the same
// request-scoped trace/auth metadata as normal request logs
// (M-2 / D-3 — preserve ctx propagation on the panic path).
slog.ErrorContext(ctx, "panic recovered in HTTP handler",
"request_id", requestID,
"panic", fmt.Sprintf("%v", err),
)
http.Error(w, `{"error":"Internal Server Error"}`, http.StatusInternalServerError)
}
}()
@@ -104,35 +122,40 @@ type AuthConfig struct {
Secret string // The raw API key or comma-separated list of valid API keys
}
// NewAuth creates an authentication middleware based on config.
// When Type is "none", all requests pass through (demo/development mode).
// When Type is "api-key", requests must include a valid Bearer token.
// The Secret field supports a comma-separated list of valid API keys for
// zero-downtime key rotation. Rotation workflow:
// 1. Add new key to comma-separated list, restart server
// 2. Update all agents/clients to use new key
// 3. Remove old key from list, restart server
func NewAuth(cfg AuthConfig) func(http.Handler) http.Handler {
if cfg.Type == "none" {
// NewAuthWithNamedKeys creates an authentication middleware that validates
// Bearer tokens against a set of named API keys. Each key carries a name
// (propagated as the actor via context) and an admin flag (consulted by
// authorization gates such as bulk revocation).
//
// When namedKeys is empty the returned middleware is a no-op pass-through,
// which is used in demo/development mode (CERTCTL_AUTH_TYPE=none). When one
// or more keys are provided, requests must include a matching Bearer token
// or they are rejected with 401.
func NewAuthWithNamedKeys(namedKeys []NamedAPIKey) func(http.Handler) http.Handler {
if len(namedKeys) == 0 {
return func(next http.Handler) http.Handler {
return next
}
}
// Pre-compute hashes of all valid keys for constant-time comparison.
// Supports comma-separated list for zero-downtime key rotation.
keys := strings.Split(cfg.Secret, ",")
var expectedHashes []string
for _, k := range keys {
k = strings.TrimSpace(k)
if k != "" {
expectedHashes = append(expectedHashes, HashAPIKey(k))
}
type keyEntry struct {
hash string
name string
admin bool
}
var entries []keyEntry
for _, nk := range namedKeys {
entries = append(entries, keyEntry{
hash: HashAPIKey(nk.Key),
name: nk.Name,
admin: nk.Admin,
})
}
// Warn if only one key is configured in production mode
if len(expectedHashes) == 1 {
slog.Warn("only one API key configured — consider adding a rotation key via comma-separated CERTCTL_AUTH_SECRET for zero-downtime rotation")
if len(entries) == 1 {
slog.Warn("only one API key configured — consider adding a rotation key for zero-downtime rotation")
}
return func(next http.Handler) http.Handler {
@@ -156,27 +179,60 @@ func NewAuth(cfg AuthConfig) func(http.Handler) http.Handler {
tokenHash := HashAPIKey(token)
// Check against all valid keys using constant-time comparison
authorized := false
for _, expectedHash := range expectedHashes {
if subtle.ConstantTimeCompare([]byte(tokenHash), []byte(expectedHash)) == 1 {
authorized = true
var matched *keyEntry
for i := range entries {
if subtle.ConstantTimeCompare([]byte(tokenHash), []byte(entries[i].hash)) == 1 {
matched = &entries[i]
break
}
}
if !authorized {
if matched == nil {
w.Header().Set("Content-Type", "application/json; charset=utf-8")
http.Error(w, `{"error":"Invalid API key"}`, http.StatusUnauthorized)
return
}
// Store the authenticated identity in context
ctx := context.WithValue(r.Context(), UserKey{}, "api-key-user")
// Store the authenticated identity and admin flag in context
ctx := context.WithValue(r.Context(), UserKey{}, matched.name)
ctx = context.WithValue(ctx, AdminKey{}, matched.admin)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
}
// NewAuth is a legacy shim that converts a comma-separated Secret list into
// synthesized legacy-key-N named entries and delegates to NewAuthWithNamedKeys.
// It preserves the pre-M-002 behavior for callers that still pass raw AuthConfig
// (primarily cmd/server/main_test.go). The synthesized actor is "legacy-key-N"
// rather than the old hardcoded "api-key-user" so audit events carry
// meaningful identity even on the legacy path.
//
// Deprecated: Use NewAuthWithNamedKeys with explicit NamedAPIKey entries.
func NewAuth(cfg AuthConfig) func(http.Handler) http.Handler {
if cfg.Type == "none" {
return func(next http.Handler) http.Handler {
return next
}
}
var namedKeys []NamedAPIKey
idx := 0
for _, k := range strings.Split(cfg.Secret, ",") {
k = strings.TrimSpace(k)
if k == "" {
continue
}
namedKeys = append(namedKeys, NamedAPIKey{
Name: fmt.Sprintf("legacy-key-%d", idx),
Key: k,
Admin: false,
})
idx++
}
return NewAuthWithNamedKeys(namedKeys)
}
// RateLimitConfig holds configuration for the rate limiter.
type RateLimitConfig struct {
RPS float64 // Requests per second
@@ -336,9 +392,20 @@ func getRequestID(ctx context.Context) string {
}
// GetUser extracts the authenticated user from context.
func GetUser(ctx context.Context) (string, bool) {
// Returns the name of the matched API key and whether it was found.
func GetUser(ctx context.Context) string {
user, ok := ctx.Value(UserKey{}).(string)
return user, ok
if !ok {
return ""
}
return user
}
// IsAdmin extracts the admin flag from context.
// Returns true if the authenticated user has admin privileges.
func IsAdmin(ctx context.Context) bool {
admin, ok := ctx.Value(AdminKey{}).(bool)
return ok && admin
}
// responseWriter wraps http.ResponseWriter to capture the status code.
+31 -6
View File
@@ -109,12 +109,10 @@ func (r *Router) RegisterHandlers(reg HandlerRegistry) {
r.Register("GET /api/v1/certificates/{id}/export/pem", http.HandlerFunc(reg.Export.ExportPEM))
r.Register("POST /api/v1/certificates/{id}/export/pkcs12", http.HandlerFunc(reg.Export.ExportPKCS12))
// CRL endpoints: /api/v1/crl (JSON) and /api/v1/crl/{issuer_id} (DER)
r.Register("GET /api/v1/crl", http.HandlerFunc(reg.Certificates.GetCRL))
r.Register("GET /api/v1/crl/{issuer_id}", http.HandlerFunc(reg.Certificates.GetDERCRL))
// OCSP responder: /api/v1/ocsp/{issuer_id}/{serial}
r.Register("GET /api/v1/ocsp/{issuer_id}/{serial}", http.HandlerFunc(reg.Certificates.HandleOCSP))
// NOTE: RFC 5280 CRL and RFC 6960 OCSP endpoints are registered separately
// via RegisterPKIHandlers under /.well-known/pki/ so relying parties can
// fetch them without presenting certctl API credentials. The legacy
// /api/v1/crl and /api/v1/ocsp paths have been retired (see M-006).
// Issuers routes: /api/v1/issuers
r.Register("GET /api/v1/issuers", http.HandlerFunc(reg.Issuers.ListIssuers))
@@ -133,9 +131,21 @@ func (r *Router) RegisterHandlers(reg HandlerRegistry) {
r.Register("POST /api/v1/targets/{id}/test", http.HandlerFunc(reg.Targets.TestTargetConnection))
// Agents routes: /api/v1/agents
//
// I-004 soft-retirement surface:
// * GET /api/v1/agents/retired — opt-in listing of retired agents.
// MUST be registered before /agents/{id} so Go 1.22 ServeMux's
// literal-beats-pattern-var precedence routes the `retired` literal
// to ListRetiredAgents instead of treating "retired" as a {id}
// parameter value against GetAgent.
// * DELETE /api/v1/agents/{id} — RetireAgent. Replaces the pre-I-004
// hard-delete; the underlying repo does a soft-retire with
// optional cascade.
r.Register("GET /api/v1/agents", http.HandlerFunc(reg.Agents.ListAgents))
r.Register("POST /api/v1/agents", http.HandlerFunc(reg.Agents.RegisterAgent))
r.Register("GET /api/v1/agents/retired", http.HandlerFunc(reg.Agents.ListRetiredAgents))
r.Register("GET /api/v1/agents/{id}", http.HandlerFunc(reg.Agents.GetAgent))
r.Register("DELETE /api/v1/agents/{id}", http.HandlerFunc(reg.Agents.RetireAgent))
r.Register("POST /api/v1/agents/{id}/heartbeat", http.HandlerFunc(reg.Agents.Heartbeat))
r.Register("POST /api/v1/agents/{id}/csr", http.HandlerFunc(reg.Agents.AgentCSRSubmit))
r.Register("GET /api/v1/agents/{id}/certificates/{cert_id}", http.HandlerFunc(reg.Agents.AgentCertificatePickup))
@@ -262,6 +272,21 @@ func (r *Router) RegisterSCEPHandlers(scep handler.SCEPHandler) {
r.Register("POST /scep", http.HandlerFunc(scep.HandleSCEP))
}
// RegisterPKIHandlers sets up RFC 5280 CRL and RFC 6960 OCSP routes under
// /.well-known/pki/. These endpoints are intentionally unauthenticated so
// relying parties (browsers, OpenSSL, OCSP stapling sidecars, mTLS clients)
// can fetch revocation data without presenting certctl API credentials.
// The response bodies are DER-encoded and carry the IANA-registered content
// types application/pkix-crl and application/ocsp-response.
//
// Precedent: EST (RFC 7030) and SCEP (RFC 8894) follow the same pattern —
// standards-defined wire formats served via a dedicated router registration
// that cmd/server wires into a no-auth middleware chain.
func (r *Router) RegisterPKIHandlers(pki handler.CertificateHandler) {
r.Register("GET /.well-known/pki/crl/{issuer_id}", http.HandlerFunc(pki.GetDERCRL))
r.Register("GET /.well-known/pki/ocsp/{issuer_id}/{serial}", http.HandlerFunc(pki.HandleOCSP))
}
// GetMux returns the underlying http.ServeMux for direct access if needed.
func (r *Router) GetMux() *http.ServeMux {
return r.mux
+57 -4
View File
@@ -138,10 +138,9 @@ func TestRegisterHandlers_RoutesDispatch(t *testing.T) {
// Export
{"GET", "/api/v1/certificates/mc-test/export/pem"},
// CRL & OCSP
{"GET", "/api/v1/crl"},
{"GET", "/api/v1/crl/iss-local"},
{"GET", "/api/v1/ocsp/iss-local/12345"},
// NOTE: CRL/OCSP moved out of /api/v1/* in M-006. They are now served
// unauthenticated at /.well-known/pki/* via RegisterPKIHandlers and
// are verified in TestRegisterPKIHandlers_AllPaths below.
// Issuers
{"GET", "/api/v1/issuers"},
@@ -336,6 +335,60 @@ func TestRegisterESTHandlers_AllPaths(t *testing.T) {
}
}
// TestRegisterPKIHandlers_AllPaths verifies that RegisterPKIHandlers registers
// the two RFC-compliant unauthenticated endpoints relocated in M-006:
//
// - GET /.well-known/pki/crl/{issuer_id} (RFC 5280 §5 DER CRL)
// - GET /.well-known/pki/ocsp/{issuer_id}/{serial} (RFC 6960 §2.1 OCSP)
//
// Registration and middleware gating are complementary: this test proves the
// router matches the path; the unauthenticated contract is enforced separately
// by cmd/server/main.go's finalHandler routing /.well-known/pki/* through the
// noAuthHandler.
func TestRegisterPKIHandlers_AllPaths(t *testing.T) {
r := New()
// Zero-value CertificateHandler will panic on real calls; the only thing
// this test is verifying is that the route dispatches (i.e. the URL
// pattern is registered), so catch the downstream panic.
recoverMW := func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
defer func() {
if rv := recover(); rv != nil {
w.WriteHeader(http.StatusOK)
}
}()
next.ServeHTTP(w, r)
})
}
r.RegisterPKIHandlers(handler.CertificateHandler{})
testHandler := recoverMW(r)
routes := []struct {
method string
path string
}{
{"GET", "/.well-known/pki/crl/iss-local"},
{"GET", "/.well-known/pki/ocsp/iss-local/01ABCDEF"},
}
for _, tc := range routes {
t.Run(tc.method+" "+tc.path, func(t *testing.T) {
req := httptest.NewRequest(tc.method, tc.path, nil)
w := httptest.NewRecorder()
testHandler.ServeHTTP(w, req)
if w.Code == http.StatusNotFound {
t.Errorf("PKI route %s %s returned 404 — route not registered", tc.method, tc.path)
}
if w.Code == http.StatusMethodNotAllowed {
t.Errorf("PKI route %s %s returned 405", tc.method, tc.path)
}
})
}
}
// TestGetMux_ReturnsUnderlyingMux tests that GetMux returns the underlying mux.
func TestGetMux_ReturnsUnderlyingMux(t *testing.T) {
r := New()
+228
View File
@@ -0,0 +1,228 @@
package cli
import (
"encoding/json"
"net/http"
"net/http/httptest"
"testing"
)
// TestClient_RetireAgent_Success pins the I-004 CLI happy path: the operator
// runs `certctl-cli agents retire <id>` and the client issues a DELETE to
// /api/v1/agents/{id}, parses the 200 JSON body (retired_at, already_retired,
// cascade, counts), and reports success. The handler test already covers the
// server-side contract; this test covers the client-side wire formatting so a
// refactor of the server's 200 body shape can't silently break the CLI.
func TestClient_RetireAgent_Success(t *testing.T) {
var (
sawMethod string
sawPath string
sawForce string
sawReason string
)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
sawMethod = r.Method
sawPath = r.URL.Path
sawForce = r.URL.Query().Get("force")
sawReason = r.URL.Query().Get("reason")
if r.Method != "DELETE" || r.URL.Path != "/api/v1/agents/ag-1" {
w.WriteHeader(http.StatusNotFound)
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(map[string]interface{}{
"retired_at": "2026-04-18T12:00:00Z",
"already_retired": false,
"cascade": false,
"counts": map[string]interface{}{
"active_targets": 0,
"active_certificates": 0,
"pending_jobs": 0,
},
})
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
// Positional arg: the agent ID. No --force, no --reason — the default
// soft-retire path. Compile-fail until client.RetireAgent exists.
if err := client.RetireAgent([]string{"ag-1"}); err != nil {
t.Fatalf("RetireAgent(ag-1) err=%v want nil", err)
}
if sawMethod != "DELETE" {
t.Errorf("method=%q want DELETE", sawMethod)
}
if sawPath != "/api/v1/agents/ag-1" {
t.Errorf("path=%q want /api/v1/agents/ag-1", sawPath)
}
if sawForce != "" {
t.Errorf("force query=%q want empty (default path sends no force)", sawForce)
}
if sawReason != "" {
t.Errorf("reason query=%q want empty (default path sends no reason)", sawReason)
}
}
// TestClient_RetireAgent_Force_WithReason_Success pins the ?force=true&reason=...
// escape hatch wiring. Operators who supply --force + --reason get their values
// propagated as URL query parameters exactly once, so the server sees the same
// contract the handler test expects. Also verifies the cascade=true response
// body parses cleanly.
func TestClient_RetireAgent_Force_WithReason_Success(t *testing.T) {
var (
sawForce string
sawReason string
)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
sawForce = r.URL.Query().Get("force")
sawReason = r.URL.Query().Get("reason")
if r.Method != "DELETE" || r.URL.Path != "/api/v1/agents/ag-1" {
w.WriteHeader(http.StatusNotFound)
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(map[string]interface{}{
"retired_at": "2026-04-18T12:00:00Z",
"already_retired": false,
"cascade": true,
"counts": map[string]interface{}{
"active_targets": 2,
"active_certificates": 5,
"pending_jobs": 1,
},
})
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
if err := client.RetireAgent([]string{"ag-1", "--force", "--reason", "decommissioning rack 7"}); err != nil {
t.Fatalf("RetireAgent(force+reason) err=%v want nil", err)
}
if sawForce != "true" {
t.Errorf("force query=%q want \"true\"", sawForce)
}
if sawReason != "decommissioning rack 7" {
t.Errorf("reason query=%q want %q", sawReason, "decommissioning rack 7")
}
}
// TestClient_RetireAgent_Force_RequiresReason pins the client-side guard: using
// --force without --reason must fail BEFORE any HTTP request is made. Without
// this, the client would bounce off the server's 400 ErrForceReasonRequired
// only after a round trip — slow feedback, wasted audit-trail noise, and a
// worse operator experience. requestCount=0 enforces that no HTTP call happens.
func TestClient_RetireAgent_Force_RequiresReason(t *testing.T) {
var requestCount int
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestCount++
w.WriteHeader(http.StatusOK)
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
err := client.RetireAgent([]string{"ag-1", "--force"})
if err == nil {
t.Fatalf("RetireAgent(force, no reason) err=nil want client-side error")
}
if !containsStr(err.Error(), "reason") {
t.Errorf("err=%q should mention --reason to guide operator", err.Error())
}
if requestCount != 0 {
t.Fatalf("requestCount=%d want 0; client must short-circuit before HTTP call", requestCount)
}
}
// TestClient_RetireAgent_MissingID covers the other common operator mistake:
// invoking `certctl-cli agents retire` with no agent ID. Must be caught by the
// client with a clear error, not a malformed DELETE to /api/v1/agents/.
func TestClient_RetireAgent_MissingID(t *testing.T) {
var requestCount int
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestCount++
w.WriteHeader(http.StatusOK)
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
err := client.RetireAgent([]string{})
if err == nil {
t.Fatalf("RetireAgent([]) err=nil want missing-id error")
}
if requestCount != 0 {
t.Fatalf("requestCount=%d want 0; client must reject missing-id before HTTP", requestCount)
}
}
// TestClient_ListRetiredAgents_Success pins the audit/forensics CLI surface:
// `certctl-cli agents list-retired` must GET /api/v1/agents/retired and render
// the paged response. The server returns a PagedResponse; the client is
// responsible for printing it in table or JSON format, same as ListAgents.
func TestClient_ListRetiredAgents_Success(t *testing.T) {
var (
sawMethod string
sawPath string
)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
sawMethod = r.Method
sawPath = r.URL.Path
if r.Method != "GET" || r.URL.Path != "/api/v1/agents/retired" {
w.WriteHeader(http.StatusNotFound)
return
}
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode(map[string]interface{}{
"data": []map[string]interface{}{
{
"id": "ag-old-01",
"name": "decom-01",
"hostname": "server-old",
"status": "Offline",
"registered_at": "2024-01-01T00:00:00Z",
"retired_at": "2026-01-01T00:00:00Z",
"retired_reason": "old hardware",
},
},
"total": 1,
"page": 1,
"per_page": 50,
})
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
if err := client.ListRetiredAgents([]string{}); err != nil {
t.Fatalf("ListRetiredAgents err=%v want nil", err)
}
if sawMethod != "GET" {
t.Errorf("method=%q want GET", sawMethod)
}
if sawPath != "/api/v1/agents/retired" {
t.Errorf("path=%q want /api/v1/agents/retired", sawPath)
}
}
// TestClient_ListRetiredAgents_ServerError covers the non-happy path: server
// returns 5xx → client surfaces the error rather than silently printing an
// empty list. Without this, operators running the command as part of a
// compliance audit could miss a backend outage.
func TestClient_ListRetiredAgents_ServerError(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
http.Error(w, "db unreachable", http.StatusInternalServerError)
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
err := client.ListRetiredAgents([]string{})
if err == nil {
t.Fatalf("ListRetiredAgents(500) err=nil want propagated error")
}
}
+277 -6
View File
@@ -12,6 +12,7 @@ import (
"net/url"
"os"
"path/filepath"
"strings"
"text/tabwriter"
"time"
)
@@ -292,6 +293,194 @@ func (c *Client) ListAgents(args []string) error {
return c.outputAgentsTable(result.Data, result.Total)
}
// ListRetiredAgents lists soft-retired agents from the dedicated endpoint.
//
// I-004: hits GET /api/v1/agents/retired which is a separate route from the
// default listing (the default hides retired rows). Supports --page and
// --per-page just like the active list. Output format mirrors ListAgents
// but prepends RETIRED_AT and RETIRED_REASON columns so the operator can
// forensic-grep the output.
func (c *Client) ListRetiredAgents(args []string) error {
fs := flag.NewFlagSet("agents list --retired", flag.ContinueOnError)
page := fs.Int("page", 1, "Page number")
perPage := fs.Int("per-page", 50, "Items per page")
fs.Parse(args)
query := url.Values{}
query.Set("page", fmt.Sprintf("%d", *page))
query.Set("per_page", fmt.Sprintf("%d", *perPage))
resp, err := c.do("GET", "/api/v1/agents/retired", query, nil)
if err != nil {
return err
}
var result struct {
Data []map[string]interface{} `json:"data"`
Total int `json:"total"`
}
if err := json.Unmarshal(resp, &result); err != nil {
return fmt.Errorf("parsing response: %w", err)
}
if c.format == "json" {
return c.outputJSON(result)
}
return c.outputRetiredAgentsTable(result.Data, result.Total)
}
// RetireAgent soft-retires an agent via DELETE /api/v1/agents/{id}.
//
// I-004: wraps the full status-code matrix pinned by the handler's
// agent_retire_handler_test.go:
//
// 200 clean retire — body: retired_at, already_retired=false, cascade=false, counts=0
// 200 force-cascade retire — body: cascade=true, counts=pre-cascade snapshot
// 204 idempotent retire — agent was already retired, NO body
// 403 sentinel — reserved agent (server-scanner / cloud-*), ErrAgentIsSentinel
// 404 not found — agent doesn't exist
// 409 blocked_by_dependencies — body: error, message, counts
//
// The default (force=false) flow refuses to retire agents with active
// downstream dependencies; the operator must re-run with --force and an
// explicit --reason to cascade. The handler rejects --force without
// --reason with a 400 — we mirror that contract client-side so the
// operator gets a clear error before the round trip.
func (c *Client) RetireAgent(args []string) error {
// Convention: `agents retire <id> [--force] [--reason <reason>]` — the ID
// is a positional arg that precedes the flags. Go's flag package stops
// parsing at the first non-flag token, so we pull args[0] as the ID and
// hand args[1:] to the flag parser. Without this split, `agents retire
// ag-1 --force --reason "x"` would parse with force=false and reason=""
// because the flags land in fs.Args() instead of being recognized.
if len(args) == 0 {
return fmt.Errorf("agent ID is required: agents retire <id> [--force] [--reason <reason>]")
}
id := args[0]
fs := flag.NewFlagSet("agents retire", flag.ContinueOnError)
force := fs.Bool("force", false, "Cascade-retire downstream targets, certs, and jobs")
reason := fs.String("reason", "", "Human-readable reason (required with --force)")
if err := fs.Parse(args[1:]); err != nil {
return err
}
// Mirror the handler's ErrForceReasonRequired contract client-side so
// the operator gets a clear error before the round trip.
if *force && strings.TrimSpace(*reason) == "" {
return fmt.Errorf("--reason is required when --force is set")
}
// Build query string. Skip ?force=false; skip ?reason= when empty.
query := url.Values{}
if *force {
query.Set("force", "true")
}
if *reason != "" {
query.Set("reason", *reason)
}
u, err := url.JoinPath(c.baseURL, fmt.Sprintf("/api/v1/agents/%s", id))
if err != nil {
return fmt.Errorf("invalid URL: %w", err)
}
if len(query) > 0 {
u = u + "?" + query.Encode()
}
req, err := http.NewRequest("DELETE", u, nil)
if err != nil {
return fmt.Errorf("creating request: %w", err)
}
req.Header.Set("Accept", "application/json")
if c.apiKey != "" {
req.Header.Set("Authorization", "Bearer "+c.apiKey)
}
resp, err := c.httpClient.Do(req)
if err != nil {
return fmt.Errorf("request failed: %w", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("reading response: %w", err)
}
switch resp.StatusCode {
case http.StatusNoContent:
// 204 idempotent — the agent was already retired. No body.
if c.format == "json" {
return c.outputJSON(map[string]interface{}{
"agent_id": id,
"already_retired": true,
})
}
fmt.Printf("Agent %s was already retired (idempotent)\n", id)
return nil
case http.StatusOK:
var result struct {
RetiredAt string `json:"retired_at"`
AlreadyRetired bool `json:"already_retired"`
Cascade bool `json:"cascade"`
Counts struct {
ActiveTargets int `json:"active_targets"`
ActiveCertificates int `json:"active_certificates"`
PendingJobs int `json:"pending_jobs"`
} `json:"counts"`
}
if err := json.Unmarshal(body, &result); err != nil {
return fmt.Errorf("parsing 200 response: %w", err)
}
if c.format == "json" {
return c.outputJSON(json.RawMessage(body))
}
if result.Cascade {
fmt.Printf("Agent %s retired (cascade). Retired at: %s\n", id, result.RetiredAt)
fmt.Printf(" Cascaded: %d targets, %d certificates, %d jobs\n",
result.Counts.ActiveTargets, result.Counts.ActiveCertificates, result.Counts.PendingJobs)
} else {
fmt.Printf("Agent %s retired. Retired at: %s\n", id, result.RetiredAt)
}
return nil
case http.StatusConflict:
// 409 blocked_by_dependencies. Parse the body so we can show the
// operator which dependency counts are holding up the retire.
var blocked struct {
Error string `json:"error"`
Message string `json:"message"`
Counts struct {
ActiveTargets int `json:"active_targets"`
ActiveCertificates int `json:"active_certificates"`
PendingJobs int `json:"pending_jobs"`
} `json:"counts"`
}
if err := json.Unmarshal(body, &blocked); err != nil {
return fmt.Errorf("agent has active dependencies (HTTP 409); raw body: %s", string(body))
}
return fmt.Errorf("blocked_by_dependencies: %s (targets=%d certificates=%d jobs=%d); re-run with --force --reason \"<reason>\" to cascade",
blocked.Message, blocked.Counts.ActiveTargets, blocked.Counts.ActiveCertificates, blocked.Counts.PendingJobs)
case http.StatusForbidden:
return fmt.Errorf("agent %s is a reserved sentinel and cannot be retired (HTTP 403)", id)
case http.StatusNotFound:
return fmt.Errorf("agent %s not found (HTTP 404)", id)
case http.StatusBadRequest:
return fmt.Errorf("bad request (HTTP 400): %s", string(body))
default:
return fmt.Errorf("unexpected HTTP %d: %s", resp.StatusCode, string(body))
}
}
// GetAgent retrieves a single agent by ID.
func (c *Client) GetAgent(id string) error {
resp, err := c.do("GET", fmt.Sprintf("/api/v1/agents/%s", id), nil, nil)
@@ -430,7 +619,54 @@ func (c *Client) GetStatus() error {
}
// ImportCertificates bulk imports certificates from PEM files.
func (c *Client) ImportCertificates(files []string) error {
//
// C-001 scope-expansion closure: the create-certificate handler's
// six-field required contract (name, common_name, renewal_policy_id,
// issuer_id, owner_id, team_id) is enforced server-side via
// ValidateRequired. The bulk importer must therefore be told which
// owner / team / renewal-policy / issuer to assign to every imported
// cert — otherwise every POST comes back 400. All four IDs are
// required flags; missing flags error out with a user-legible message
// before any files are read.
func (c *Client) ImportCertificates(args []string) error {
fs := flag.NewFlagSet("import", flag.ContinueOnError)
ownerID := fs.String("owner-id", "", "Owner ID to assign to each imported certificate (required)")
teamID := fs.String("team-id", "", "Team ID to assign to each imported certificate (required)")
renewalPolicyID := fs.String("renewal-policy-id", "", "Renewal policy ID to assign to each imported certificate (required)")
issuerID := fs.String("issuer-id", "", "Issuer ID to assign to each imported certificate (required)")
nameTemplate := fs.String("name-template", "{cn}", "Template for the certificate name; {cn} is substituted with the cert's common name")
environment := fs.String("environment", "imported", "Environment tag for each imported certificate")
if err := fs.Parse(args); err != nil {
return err
}
// Validate required flags up front — a clear error here beats six
// parallel 400s from the server.
missing := []string{}
if *ownerID == "" {
missing = append(missing, "--owner-id")
}
if *teamID == "" {
missing = append(missing, "--team-id")
}
if *renewalPolicyID == "" {
missing = append(missing, "--renewal-policy-id")
}
if *issuerID == "" {
missing = append(missing, "--issuer-id")
}
if len(missing) > 0 {
return fmt.Errorf("missing required flag(s): %s", strings.Join(missing, ", "))
}
if *nameTemplate == "" {
return fmt.Errorf("--name-template must be non-empty")
}
files := fs.Args()
if len(files) == 0 {
return fmt.Errorf("at least one PEM file path is required")
}
var imported, failed int
for _, filePath := range files {
@@ -452,12 +688,18 @@ func (c *Client) ImportCertificates(files []string) error {
total := len(certs)
fmt.Printf("Importing %d/%d certificates from %s...\r", i+1, total, filepath.Base(filePath))
name := strings.ReplaceAll(*nameTemplate, "{cn}", cert.Subject.CommonName)
req := map[string]interface{}{
"common_name": cert.Subject.CommonName,
"sans": cert.DNSNames,
"issuer_id": "iss-local",
"environment": "imported",
"status": "Active",
"name": name,
"common_name": cert.Subject.CommonName,
"sans": cert.DNSNames,
"issuer_id": *issuerID,
"owner_id": *ownerID,
"team_id": *teamID,
"renewal_policy_id": *renewalPolicyID,
"environment": *environment,
"status": "Active",
}
if cert.SerialNumber != nil {
@@ -559,6 +801,35 @@ func (c *Client) outputAgentsTable(agents []map[string]interface{}, total int) e
return nil
}
// outputRetiredAgentsTable is the tab-writer view for the retired listing.
// I-004: adds RETIRED_AT + REASON columns so operators can forensic-grep.
func (c *Client) outputRetiredAgentsTable(agents []map[string]interface{}, total int) error {
w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
fmt.Fprintln(w, "ID\tHOSTNAME\tOS\tARCHITECTURE\tRETIRED AT\tREASON")
for _, agent := range agents {
id := getString(agent, "id")
hostname := getString(agent, "hostname")
osName := getString(agent, "os")
arch := getString(agent, "architecture")
retiredAt := ""
if raw, ok := agent["retired_at"].(string); ok && raw != "" {
if t, err := time.Parse(time.RFC3339, raw); err == nil {
retiredAt = t.Format("2006-01-02 15:04:05")
} else {
retiredAt = raw
}
}
reason := getString(agent, "retired_reason")
fmt.Fprintf(w, "%s\t%s\t%s\t%s\t%s\t%s\n", id, hostname, osName, arch, retiredAt, reason)
}
w.Flush()
fmt.Printf("\nTotal retired: %d\n", total)
return nil
}
func (c *Client) outputAgentDetail(agent map[string]interface{}) error {
w := tabwriter.NewWriter(os.Stdout, 0, 0, 2, ' ', 0)
+174
View File
@@ -10,6 +10,8 @@ import (
"math/big"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"testing"
"time"
)
@@ -387,6 +389,178 @@ func TestClient_AuthHeader(t *testing.T) {
}
}
// TestClient_ImportCertificates_MissingRequiredFlags verifies the CLI
// import command rejects invocations missing any of the four required
// flags (--owner-id, --team-id, --renewal-policy-id, --issuer-id)
// before any network call is attempted. This is the C-001 scope-expansion
// closure for the CLI layer: the handler now requires all six cert
// fields, so the importer must collect ownership / team / policy /
// issuer up front rather than hard-coding iss-local and letting the
// server 400 on every POST.
func TestClient_ImportCertificates_MissingRequiredFlags(t *testing.T) {
var requestCount int
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestCount++
w.WriteHeader(http.StatusOK)
}))
defer server.Close()
cases := []struct {
name string
args []string
missing string
}{
{
name: "missing owner-id",
args: []string{"--team-id", "t-platform", "--renewal-policy-id", "rp-default", "--issuer-id", "iss-local", "certs.pem"},
missing: "--owner-id",
},
{
name: "missing team-id",
args: []string{"--owner-id", "o-alice", "--renewal-policy-id", "rp-default", "--issuer-id", "iss-local", "certs.pem"},
missing: "--team-id",
},
{
name: "missing renewal-policy-id",
args: []string{"--owner-id", "o-alice", "--team-id", "t-platform", "--issuer-id", "iss-local", "certs.pem"},
missing: "--renewal-policy-id",
},
{
name: "missing issuer-id",
args: []string{"--owner-id", "o-alice", "--team-id", "t-platform", "--renewal-policy-id", "rp-default", "certs.pem"},
missing: "--issuer-id",
},
{
name: "no flags at all",
args: []string{"certs.pem"},
missing: "--owner-id",
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
client := NewClient(server.URL, "", "table")
err := client.ImportCertificates(tc.args)
if err == nil {
t.Fatalf("expected error for %s, got nil", tc.name)
}
msg := err.Error()
if !containsStr(msg, tc.missing) {
t.Fatalf("expected error to name %q, got: %v", tc.missing, err)
}
if !containsStr(msg, "required") {
t.Fatalf("expected error message to mention 'required', got: %v", err)
}
})
}
if requestCount != 0 {
t.Fatalf("expected zero HTTP requests before flag validation, got %d", requestCount)
}
}
// TestClient_ImportCertificates_MissingPositionalArgs verifies the
// import command errors out when flags are present but no PEM file
// paths follow them.
func TestClient_ImportCertificates_MissingPositionalArgs(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
t.Errorf("unexpected HTTP request: %s %s", r.Method, r.URL.Path)
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
err := client.ImportCertificates([]string{
"--owner-id", "o-alice",
"--team-id", "t-platform",
"--renewal-policy-id", "rp-default",
"--issuer-id", "iss-local",
})
if err == nil {
t.Fatal("expected error when no PEM file paths are supplied")
}
if !containsStr(err.Error(), "PEM file") {
t.Fatalf("expected error to mention 'PEM file', got: %v", err)
}
}
// TestClient_ImportCertificates_SixFieldPayload verifies the happy
// path: given all four required flags plus a PEM file, the importer
// POSTs a request containing all six required fields plus the
// name-templateresolved name. The httptest handler decodes the
// request body and asserts every required field is populated with
// the values supplied via flags.
func TestClient_ImportCertificates_SixFieldPayload(t *testing.T) {
// Generate a test cert and write it to a temp PEM file.
cert := generateTestCert()
pemBlock := &pem.Block{Type: "CERTIFICATE", Bytes: cert.Raw}
pemPath := filepath.Join(t.TempDir(), "test.pem")
if err := os.WriteFile(pemPath, pem.EncodeToMemory(pemBlock), 0o600); err != nil {
t.Fatalf("write temp PEM: %v", err)
}
var gotBody map[string]interface{}
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" || r.URL.Path != "/api/v1/certificates" {
w.WriteHeader(http.StatusNotFound)
return
}
if err := json.NewDecoder(r.Body).Decode(&gotBody); err != nil {
t.Errorf("decode request body: %v", err)
}
w.WriteHeader(http.StatusCreated)
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{"id":"mc-imported"}`))
}))
defer server.Close()
client := NewClient(server.URL, "", "table")
err := client.ImportCertificates([]string{
"--owner-id", "o-alice",
"--team-id", "t-platform",
"--renewal-policy-id", "rp-default",
"--issuer-id", "iss-local",
"--name-template", "imported-{cn}",
pemPath,
})
if err != nil {
t.Fatalf("ImportCertificates failed: %v", err)
}
// Verify every required field from the six-field contract is present.
required := []struct {
field string
want interface{}
}{
{"name", "imported-test.example.com"},
{"common_name", "test.example.com"},
{"issuer_id", "iss-local"},
{"owner_id", "o-alice"},
{"team_id", "t-platform"},
{"renewal_policy_id", "rp-default"},
}
for _, r := range required {
got, ok := gotBody[r.field]
if !ok {
t.Errorf("payload missing required field %q (body: %+v)", r.field, gotBody)
continue
}
if got != r.want {
t.Errorf("field %q = %v, want %v", r.field, got, r.want)
}
}
}
// containsStr is a tiny substring helper so the test file doesn't
// need a `strings` import dependency aside from what's already there.
func containsStr(haystack, needle string) bool {
for i := 0; i+len(needle) <= len(haystack); i++ {
if haystack[i:i+len(needle)] == needle {
return true
}
}
return false
}
// Helper function to generate a test certificate
func generateTestCert() *x509.Certificate {
now := time.Now()
+178 -6
View File
@@ -5,6 +5,7 @@ import (
"log/slog"
"os"
"strconv"
"strings"
"time"
)
@@ -116,6 +117,14 @@ type GlobalSignConfig struct {
// ClientKeyPath is the path to the mTLS client private key PEM file.
// Setting: CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH environment variable.
ClientKeyPath string
// ServerCAPath is the optional path to a PEM file containing the CA
// certificate(s) used to verify the GlobalSign Atlas HVCA API server
// certificate. If empty, the system trust store is used. Set this
// for private/lab Atlas deployments whose server TLS chain is not
// present in the host's default trust bundle.
// Setting: CERTCTL_GLOBALSIGN_SERVER_CA_PATH environment variable.
ServerCAPath string
}
// EJBCAConfig contains EJBCA (Keyfactor) issuer connector configuration.
@@ -641,7 +650,12 @@ type SCEPConfig struct {
// ChallengePassword is the shared secret used to authenticate SCEP enrollment requests.
// Clients include this in the PKCS#10 CSR challengePassword attribute.
// Required when SCEP is enabled.
//
// REQUIRED when Enabled is true. If SCEP is enabled and this value is empty,
// cmd/server/main.go's preflightSCEPChallengePassword check will refuse to
// start the server (H-2, CWE-306): an empty shared secret allowed any client
// that could reach /scep to enroll a CSR against the configured issuer. The
// service-layer PKCSReq path also rejects this configuration defense-in-depth.
ChallengePassword string
}
@@ -693,6 +707,37 @@ type SchedulerConfig struct {
// Default: 1 minute. Minimum: 1 second. Sends notifications to Slack, Teams, PagerDuty, etc.
// Setting: CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL environment variable.
NotificationProcessInterval time.Duration
// RetryInterval is how often the scheduler retries failed jobs whose Attempts
// counter is below MaxAttempts. Default: 5 minutes. Minimum: 1 second.
// Transitions eligible Failed jobs back to Pending so the job processor can
// pick them up again (closes coverage gap I-001 — JobService.RetryFailedJobs
// had no caller prior to this loop being wired).
// Setting: CERTCTL_SCHEDULER_RETRY_INTERVAL environment variable.
RetryInterval time.Duration
// JobTimeoutInterval is how often the reaper loop sweeps AwaitingCSR and
// AwaitingApproval jobs for TTL expiration. Default: 10 minutes. Minimum: 1
// second. Timed-out jobs are transitioned to Failed with a descriptive error
// message; I-001's retry loop then auto-promotes eligible Failed jobs back
// to Pending (closes coverage gap I-003).
// Setting: CERTCTL_JOB_TIMEOUT_INTERVAL environment variable.
JobTimeoutInterval time.Duration
// AwaitingCSRTimeout is the maximum age an AwaitingCSR job can remain in
// that state before the reaper transitions it to Failed. Default: 24 hours.
// An agent that hasn't submitted a CSR within this window is presumed
// unreachable. Minimum: 1 second.
// Setting: CERTCTL_JOB_AWAITING_CSR_TIMEOUT environment variable.
AwaitingCSRTimeout time.Duration
// AwaitingApprovalTimeout is the maximum age an AwaitingApproval job can
// remain in that state before the reaper transitions it to Failed. Default:
// 168 hours (7 days). Reviewers who haven't approved within this window
// force the renewal to fail loudly rather than silently stall. Minimum: 1
// second.
// Setting: CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT environment variable.
AwaitingApprovalTimeout time.Duration
}
// LogConfig contains logging configuration.
@@ -708,6 +753,19 @@ type LogConfig struct {
Format string
}
// NamedAPIKey represents a single named API key with an optional admin flag.
// Named keys allow real actor attribution in the audit trail (M-002) and provide
// the admin-gate basis for privileged endpoints like bulk revocation (M-003).
type NamedAPIKey struct {
// Name is the identifier for the key (alphanumeric, hyphens, underscores).
// This value is recorded as the actor on every audit event the key authenticates.
Name string
// Key is the raw API-key secret the client presents as `Authorization: Bearer <key>`.
Key string
// Admin controls whether the key has admin privileges (bulk revocation, etc.).
Admin bool
}
// AuthConfig contains authentication configuration.
type AuthConfig struct {
// Type sets the authentication mechanism for the REST API.
@@ -717,12 +775,19 @@ type AuthConfig struct {
// Setting: CERTCTL_AUTH_TYPE environment variable. Default: "api-key".
Type string
// Secret is the authentication secret (API key hash, JWT signing key, etc.).
// For "api-key": the base64-encoded API key to validate against.
// For "jwt": the secret used to verify JWT token signatures.
// For "none": ignored.
// Setting: CERTCTL_AUTH_SECRET environment variable. Required for "api-key" and "jwt".
// Secret is the legacy authentication secret (comma-separated API keys).
// DEPRECATED in favor of NamedKeys — retained for backward compatibility.
// When NamedKeys is empty and Secret is set, each comma-separated key is
// registered as a synthesized named key (legacy-key-0, legacy-key-1, ...)
// with actor attribution defaulting to "legacy-key-<index>".
// Setting: CERTCTL_AUTH_SECRET environment variable.
Secret string
// NamedKeys is the parsed set of named API keys. Populated from
// CERTCTL_API_KEYS_NAMED via ParseNamedAPIKeys during Load(). When
// non-empty, this takes precedence over the legacy Secret field.
// Setting: CERTCTL_API_KEYS_NAMED="name1:key1,name2:key2:admin"
NamedKeys []NamedAPIKey
}
// RateLimitConfig contains rate limiting configuration.
@@ -773,6 +838,10 @@ func Load() (*Config, error) {
JobProcessorInterval: getEnvDuration("CERTCTL_SCHEDULER_JOB_PROCESSOR_INTERVAL", 30*time.Second),
AgentHealthCheckInterval: getEnvDuration("CERTCTL_SCHEDULER_AGENT_HEALTH_CHECK_INTERVAL", 2*time.Minute),
NotificationProcessInterval: getEnvDuration("CERTCTL_SCHEDULER_NOTIFICATION_PROCESS_INTERVAL", 1*time.Minute),
RetryInterval: getEnvDuration("CERTCTL_SCHEDULER_RETRY_INTERVAL", 5*time.Minute),
JobTimeoutInterval: getEnvDuration("CERTCTL_JOB_TIMEOUT_INTERVAL", 10*time.Minute),
AwaitingCSRTimeout: getEnvDuration("CERTCTL_JOB_AWAITING_CSR_TIMEOUT", 24*time.Hour),
AwaitingApprovalTimeout: getEnvDuration("CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT", 168*time.Hour),
},
Log: LogConfig{
Level: getEnv("CERTCTL_LOG_LEVEL", "info"),
@@ -781,6 +850,8 @@ func Load() (*Config, error) {
Auth: AuthConfig{
Type: getEnv("CERTCTL_AUTH_TYPE", "api-key"),
Secret: getEnv("CERTCTL_AUTH_SECRET", ""),
// NamedKeys is populated from CERTCTL_API_KEYS_NAMED below so Load()
// can surface parse errors alongside other config errors.
},
RateLimit: RateLimitConfig{
Enabled: getEnvBool("CERTCTL_RATE_LIMIT_ENABLED", true),
@@ -882,6 +953,7 @@ func Load() (*Config, error) {
APISecret: getEnv("CERTCTL_GLOBALSIGN_API_SECRET", ""),
ClientCertPath: getEnv("CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH", ""),
ClientKeyPath: getEnv("CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH", ""),
ServerCAPath: getEnv("CERTCTL_GLOBALSIGN_SERVER_CA_PATH", ""),
},
EJBCA: EJBCAConfig{
APIUrl: getEnv("CERTCTL_EJBCA_API_URL", ""),
@@ -945,6 +1017,14 @@ func Load() (*Config, error) {
},
}
// Parse CERTCTL_API_KEYS_NAMED for named key authentication (M-002).
// Parse errors surface here so invalid config fails fast at startup.
named, err := ParseNamedAPIKeys(getEnv("CERTCTL_API_KEYS_NAMED", ""))
if err != nil {
return nil, fmt.Errorf("parse CERTCTL_API_KEYS_NAMED: %w", err)
}
cfg.Auth.NamedKeys = named
if err := cfg.Validate(); err != nil {
return nil, err
}
@@ -1029,6 +1109,22 @@ func (c *Config) Validate() error {
return fmt.Errorf("notification process interval must be at least 1 second")
}
if c.Scheduler.RetryInterval < 1*time.Second {
return fmt.Errorf("retry interval must be at least 1 second")
}
if c.Scheduler.JobTimeoutInterval < 1*time.Second {
return fmt.Errorf("job timeout interval must be at least 1 second")
}
if c.Scheduler.AwaitingCSRTimeout < 1*time.Second {
return fmt.Errorf("awaiting CSR timeout must be at least 1 second")
}
if c.Scheduler.AwaitingApprovalTimeout < 1*time.Second {
return fmt.Errorf("awaiting approval timeout must be at least 1 second")
}
return nil
}
@@ -1153,3 +1249,79 @@ func (c *Config) GetLogLevel() slog.Level {
return slog.LevelInfo
}
}
// ParseNamedAPIKeys parses the CERTCTL_API_KEYS_NAMED environment variable.
// Format: "name1:key1,name2:key2:admin,name3:key3"
// The ":admin" suffix is optional; if present, the key has admin privileges.
// Returns a typed []NamedAPIKey so main.go can pass it directly to the
// middleware layer without type assertion gymnastics.
func ParseNamedAPIKeys(input string) ([]NamedAPIKey, error) {
if input == "" {
return nil, nil
}
parts := splitComma(input)
var keys []NamedAPIKey
seen := make(map[string]bool)
for _, part := range parts {
part = trimSpace(part)
if part == "" {
continue
}
// Split by colon: name:key or name:key:admin
fields := strings.Split(part, ":")
if len(fields) < 2 || len(fields) > 3 {
return nil, fmt.Errorf("invalid named key format: %s (expected name:key or name:key:admin)", part)
}
name := trimSpace(fields[0])
key := trimSpace(fields[1])
admin := false
if len(fields) == 3 {
adminStr := trimSpace(fields[2])
if adminStr == "admin" {
admin = true
} else {
return nil, fmt.Errorf("invalid admin flag: %s (expected 'admin')", adminStr)
}
}
// Validate name format: alphanumeric, hyphens, underscores
if !isValidKeyName(name) {
return nil, fmt.Errorf("invalid key name: %s (must be alphanumeric, hyphens, underscores)", name)
}
if seen[name] {
return nil, fmt.Errorf("duplicate key name: %s", name)
}
seen[name] = true
if key == "" {
return nil, fmt.Errorf("empty key for name: %s", name)
}
keys = append(keys, NamedAPIKey{
Name: name,
Key: key,
Admin: admin,
})
}
return keys, nil
}
// isValidKeyName checks if a key name is valid (alphanumeric, hyphens, underscores).
func isValidKeyName(s string) bool {
if len(s) == 0 {
return false
}
for _, c := range s {
if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' || c == '_') {
return false
}
}
return true
}
+126
View File
@@ -4,6 +4,7 @@ import (
"log/slog"
"os"
"testing"
"strings"
"time"
)
@@ -328,6 +329,10 @@ func TestValidate_ValidConfig(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
RetryInterval: 5 * time.Minute,
JobTimeoutInterval: 10 * time.Minute,
AwaitingCSRTimeout: 24 * time.Hour,
AwaitingApprovalTimeout: 168 * time.Hour,
},
}
if err := cfg.Validate(); err != nil {
@@ -347,6 +352,10 @@ func TestValidate_AuthTypeNone(t *testing.T) {
JobProcessorInterval: 30 * time.Second,
AgentHealthCheckInterval: 2 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
RetryInterval: 5 * time.Minute,
JobTimeoutInterval: 10 * time.Minute,
AwaitingCSRTimeout: 24 * time.Hour,
AwaitingApprovalTimeout: 168 * time.Hour,
},
}
if err := cfg.Validate(); err != nil {
@@ -706,3 +715,120 @@ func TestGetEnvBool(t *testing.T) {
})
}
}
// I-003: Job timeout reaper configuration tests
func TestConfig_Scheduler_JobTimeoutDefaults(t *testing.T) {
clearCertctlEnv(t)
setMinimalValidEnv(t)
// Explicitly unset the three I-003 env vars to exercise the default path.
t.Setenv("CERTCTL_JOB_TIMEOUT_INTERVAL", "")
t.Setenv("CERTCTL_JOB_AWAITING_CSR_TIMEOUT", "")
t.Setenv("CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT", "")
cfg, err := Load()
if err != nil {
t.Fatalf("Load() error: %v", err)
}
if cfg.Scheduler.JobTimeoutInterval != 10*time.Minute {
t.Errorf("JobTimeoutInterval = %v, want 10m", cfg.Scheduler.JobTimeoutInterval)
}
if cfg.Scheduler.AwaitingCSRTimeout != 24*time.Hour {
t.Errorf("AwaitingCSRTimeout = %v, want 24h", cfg.Scheduler.AwaitingCSRTimeout)
}
if cfg.Scheduler.AwaitingApprovalTimeout != 168*time.Hour {
t.Errorf("AwaitingApprovalTimeout = %v, want 168h", cfg.Scheduler.AwaitingApprovalTimeout)
}
}
func TestConfig_Scheduler_JobTimeoutEnvOverride(t *testing.T) {
clearCertctlEnv(t)
setMinimalValidEnv(t)
t.Setenv("CERTCTL_JOB_TIMEOUT_INTERVAL", "15m")
t.Setenv("CERTCTL_JOB_AWAITING_CSR_TIMEOUT", "48h")
t.Setenv("CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT", "336h")
cfg, err := Load()
if err != nil {
t.Fatalf("Load() error: %v", err)
}
if cfg.Scheduler.JobTimeoutInterval != 15*time.Minute {
t.Errorf("JobTimeoutInterval = %v, want 15m", cfg.Scheduler.JobTimeoutInterval)
}
if cfg.Scheduler.AwaitingCSRTimeout != 48*time.Hour {
t.Errorf("AwaitingCSRTimeout = %v, want 48h", cfg.Scheduler.AwaitingCSRTimeout)
}
if cfg.Scheduler.AwaitingApprovalTimeout != 336*time.Hour {
t.Errorf("AwaitingApprovalTimeout = %v, want 336h", cfg.Scheduler.AwaitingApprovalTimeout)
}
}
func TestConfig_Scheduler_JobTimeoutValidation(t *testing.T) {
tests := []struct {
name string
field string
value time.Duration
wantErrMsg string
}{
{
"JobTimeoutInterval too small",
"JobTimeoutInterval",
500 * time.Millisecond,
"job timeout interval must be at least 1 second",
},
{
"AwaitingCSRTimeout too small",
"AwaitingCSRTimeout",
500 * time.Millisecond,
"awaiting CSR timeout must be at least 1 second",
},
{
"AwaitingApprovalTimeout too small",
"AwaitingApprovalTimeout",
500 * time.Millisecond,
"awaiting approval timeout must be at least 1 second",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Start from a fully valid config so the I-003 timeout checks
// are the only potential failure point.
cfg := &Config{
Server: ServerConfig{Port: 8080},
Database: DatabaseConfig{URL: "postgres://localhost/certctl", MaxConnections: 25},
Log: LogConfig{Level: "info", Format: "json"},
Auth: AuthConfig{Type: "api-key", Secret: "test-secret"},
Keygen: KeygenConfig{Mode: "agent"},
Scheduler: SchedulerConfig{
RenewalCheckInterval: 1 * time.Minute,
JobProcessorInterval: 1 * time.Minute,
AgentHealthCheckInterval: 1 * time.Minute,
NotificationProcessInterval: 1 * time.Minute,
RetryInterval: 1 * time.Minute,
JobTimeoutInterval: 10 * time.Minute,
AwaitingCSRTimeout: 24 * time.Hour,
AwaitingApprovalTimeout: 168 * time.Hour,
},
}
// Override the specific field under test
switch tt.field {
case "JobTimeoutInterval":
cfg.Scheduler.JobTimeoutInterval = tt.value
case "AwaitingCSRTimeout":
cfg.Scheduler.AwaitingCSRTimeout = tt.value
case "AwaitingApprovalTimeout":
cfg.Scheduler.AwaitingApprovalTimeout = tt.value
}
err := cfg.Validate()
if err == nil {
t.Fatalf("Validate() = nil, want error containing %q", tt.wantErrMsg)
}
if !strings.Contains(err.Error(), tt.wantErrMsg) {
t.Errorf("Validate() error = %q, want to contain %q", err.Error(), tt.wantErrMsg)
}
})
}
}
+5 -1
View File
@@ -547,7 +547,11 @@ func (c *Connector) solveAuthorizationsHTTP01(ctx context.Context, authzURLs []s
return fmt.Errorf("failed to start challenge server: %w", err)
}
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
// Derive the challenge-server shutdown context from the parent ctx so
// values (trace IDs, deadlines) propagate, but detach from its
// cancellation so Shutdown always gets its full budget even when the
// parent was cancelled (M-2 / D-3).
shutdownCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 5*time.Second)
defer cancel()
_ = srv.Shutdown(shutdownCtx)
c.logger.Debug("challenge server stopped")
@@ -34,6 +34,7 @@ import (
"io"
"log/slog"
"net/http"
"os"
"strings"
"time"
@@ -64,6 +65,14 @@ type Config struct {
// Must match the certificate in ClientCertPath.
// Required. Set via CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH environment variable.
ClientKeyPath string `json:"client_key_path"`
// ServerCAPath is the filesystem path to a PEM file containing the CA
// certificate(s) used to verify the GlobalSign Atlas HVCA API server certificate.
// Optional. If empty, the system trust store is used. This option exists for
// private/lab deployments of GlobalSign Atlas that terminate TLS with an
// internal CA not present in the host's default trust bundle.
// Set via CERTCTL_GLOBALSIGN_SERVER_CA_PATH environment variable.
ServerCAPath string `json:"server_ca_path,omitempty"`
}
// Connector implements the issuer.Connector interface for GlobalSign Atlas HVCA.
@@ -153,14 +162,12 @@ func (c *Connector) ValidateConfig(ctx context.Context, rawConfig json.RawMessag
return fmt.Errorf("failed to load GlobalSign client certificate: %w", err)
}
// Create an mTLS client for validation
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
// InsecureSkipVerify=true allows testing against self-signed server certs.
// In production, GlobalSign's API uses a proper certificate chain.
// This matches the pattern used by other connectors (F5, network scanner, etc.)
// that also need to bypass hostname verification for internal/lab environments.
InsecureSkipVerify: true,
// Build a verifying mTLS TLS config. If ServerCAPath is set, that PEM
// bundle is used as the trust anchor for the server certificate;
// otherwise the system trust store is used. TLS 1.2 is the minimum.
tlsConfig, err := buildServerTLSConfig(&cfg, cert)
if err != nil {
return fmt.Errorf("failed to build GlobalSign TLS config: %w", err)
}
validationClient := &http.Client{
@@ -225,9 +232,9 @@ func (c *Connector) getHTTPClient(ctx context.Context) (*http.Client, error) {
return nil, fmt.Errorf("failed to load GlobalSign client certificate: %w", err)
}
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
InsecureSkipVerify: true,
tlsConfig, err := buildServerTLSConfig(c.config, cert)
if err != nil {
return nil, fmt.Errorf("failed to build GlobalSign TLS config: %w", err)
}
return &http.Client{
@@ -238,6 +245,38 @@ func (c *Connector) getHTTPClient(ctx context.Context) (*http.Client, error) {
}, nil
}
// buildServerTLSConfig returns a TLS configuration for the GlobalSign Atlas
// HVCA API client. It always verifies the server certificate. When
// cfg.ServerCAPath is set, the PEM bundle at that path is used as the
// trust anchor (enables pinning a private/lab CA); otherwise the host's
// system trust store is used. TLS 1.2 is the minimum protocol version.
//
// This helper is the single source of truth for both the ValidateConfig
// probe client and the steady-state getHTTPClient production client, so
// any future TLS policy change applies uniformly.
func buildServerTLSConfig(cfg *Config, clientCert tls.Certificate) (*tls.Config, error) {
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{clientCert},
MinVersion: tls.VersionTLS12,
}
if cfg.ServerCAPath != "" {
caPEM, err := os.ReadFile(cfg.ServerCAPath)
if err != nil {
return nil, fmt.Errorf("failed to read server CA bundle at %s: %w", cfg.ServerCAPath, err)
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(caPEM) {
return nil, fmt.Errorf("no valid PEM certificates found in server CA bundle at %s", cfg.ServerCAPath)
}
tlsConfig.RootCAs = pool
}
return tlsConfig, nil
}
// IssueCertificate submits a certificate order to GlobalSign Atlas HVCA.
// Returns the serial number immediately; typically the cert is available within seconds (DV) to minutes (OV).
func (c *Connector) IssueCertificate(ctx context.Context, request issuer.IssuanceRequest) (*issuer.IssuanceResult, error) {
@@ -4,7 +4,6 @@ import (
"context"
"crypto/rand"
"crypto/rsa"
"crypto/tls"
"crypto/x509"
"crypto/x509/pkix"
"encoding/json"
@@ -161,11 +160,7 @@ func TestGlobalSignConnector(t *testing.T) {
testCertPEM, _ := generateTestCert(t)
testChainPEM, _ := generateTestCert(t)
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/v2/certificates" && r.Method == http.MethodPost {
@@ -223,11 +218,7 @@ func TestGlobalSignConnector(t *testing.T) {
})
t.Run("IssueCertificate_Pending", func(t *testing.T) {
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/v2/certificates" && r.Method == http.MethodPost {
@@ -271,11 +262,7 @@ func TestGlobalSignConnector(t *testing.T) {
})
t.Run("IssueCertificate_Error", func(t *testing.T) {
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/v2/certificates" && r.Method == http.MethodPost {
@@ -312,11 +299,7 @@ func TestGlobalSignConnector(t *testing.T) {
testCertPEM, _ := generateTestCert(t)
testChainPEM, _ := generateTestCert(t)
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if strings.HasPrefix(r.URL.Path, "/v2/certificates/12345") && r.Method == http.MethodGet {
@@ -356,11 +339,7 @@ func TestGlobalSignConnector(t *testing.T) {
})
t.Run("GetOrderStatus_Pending", func(t *testing.T) {
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if strings.HasPrefix(r.URL.Path, "/v2/certificates/98765") && r.Method == http.MethodGet {
@@ -401,11 +380,7 @@ func TestGlobalSignConnector(t *testing.T) {
testCertPEM, _ := generateTestCert(t)
testChainPEM, _ := generateTestCert(t)
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/v2/certificates" && r.Method == http.MethodPost {
@@ -448,11 +423,7 @@ func TestGlobalSignConnector(t *testing.T) {
})
t.Run("RevokeCertificate_Success", func(t *testing.T) {
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if strings.HasPrefix(r.URL.Path, "/v2/certificates/") && strings.HasSuffix(r.URL.Path, "/revoke") && r.Method == http.MethodPut {
@@ -492,11 +463,7 @@ func TestGlobalSignConnector(t *testing.T) {
})
t.Run("RevokeCertificate_Error", func(t *testing.T) {
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if strings.HasPrefix(r.URL.Path, "/v2/certificates/") && strings.HasSuffix(r.URL.Path, "/revoke") && r.Method == http.MethodPut {
@@ -532,11 +499,7 @@ func TestGlobalSignConnector(t *testing.T) {
testChainPEM, _ := generateTestCert(t)
authHeadersChecked := 0
httpClient := &http.Client{
Transport: &http.Transport{
TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
},
}
httpClient := &http.Client{}
mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Check for auth headers on every request
@@ -584,6 +547,177 @@ func TestGlobalSignConnector(t *testing.T) {
})
}
// TestGlobalSign_ServerTLSConfig exercises the server-side TLS verification
// policy added by H-5. The connector must always verify the GlobalSign Atlas
// HVCA API server certificate: by default against the host's system trust
// store, and when ServerCAPath is set, against the pinned PEM bundle at that
// path. InsecureSkipVerify is no longer reachable from any production code path.
func TestGlobalSign_ServerTLSConfig(t *testing.T) {
logger := slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelDebug}))
ctx := context.Background()
// writeClientMTLS generates a throwaway client cert+key pair and writes them
// to disk. ValidateConfig requires valid ClientCertPath / ClientKeyPath files
// before it reaches the server-CA validation path under test.
writeClientMTLS := func(t *testing.T) (certPath, keyPath string) {
t.Helper()
certPEM, keyPEM := generateTestCert(t)
dir := t.TempDir()
certPath = dir + "/client-cert.pem"
keyPath = dir + "/client-key.pem"
if err := os.WriteFile(certPath, []byte(certPEM), 0600); err != nil {
t.Fatalf("failed to write client cert: %v", err)
}
if err := os.WriteFile(keyPath, []byte(keyPEM), 0600); err != nil {
t.Fatalf("failed to write client key: %v", err)
}
return certPath, keyPath
}
// certToPEM re-encodes a parsed certificate as a PEM block for trust-store
// pinning. httptest.NewTLSServer.Certificate() returns the server's self-
// signed cert; pinning that cert trusts exactly that one server.
certToPEM := func(t *testing.T, cert *x509.Certificate) string {
t.Helper()
return string(pem.EncodeToMemory(&pem.Block{
Type: "CERTIFICATE",
Bytes: cert.Raw,
}))
}
t.Run("PinnedCA_TrustsExpectedServer", func(t *testing.T) {
// Mock Atlas API served over HTTPS with a self-signed cert. We pin
// that cert's PEM as the client's trust anchor; the validation probe
// should succeed because the pinned pool contains the server's issuer.
srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/v2/certificates" && r.Method == http.MethodGet {
if r.Header.Get("ApiKey") == "gs-test-key" && r.Header.Get("ApiSecret") == "gs-test-secret" {
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"certificates":[]}`))
return
}
w.WriteHeader(http.StatusForbidden)
return
}
http.NotFound(w, r)
}))
defer srv.Close()
caPEM := certToPEM(t, srv.Certificate())
caPath := t.TempDir() + "/atlas-ca.pem"
if err := os.WriteFile(caPath, []byte(caPEM), 0600); err != nil {
t.Fatalf("failed to write pinned CA: %v", err)
}
clientCert, clientKey := writeClientMTLS(t)
config := globalsign.Config{
APIUrl: srv.URL,
APIKey: "gs-test-key",
APISecret: "gs-test-secret",
ClientCertPath: clientCert,
ClientKeyPath: clientKey,
ServerCAPath: caPath,
}
connector := globalsign.New(&config, logger)
rawConfig, _ := json.Marshal(config)
if err := connector.ValidateConfig(ctx, rawConfig); err != nil {
t.Fatalf("ValidateConfig with pinned CA should succeed, got: %v", err)
}
})
t.Run("PinnedCA_RejectsUntrustedServer", func(t *testing.T) {
// Mock server presents its own self-signed cert; we pin an UNRELATED
// cert as the trust anchor. The TLS handshake must fail before any
// request is sent — this is exactly what H-5 remediates.
srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
}))
defer srv.Close()
unrelatedPEM, _ := generateTestCert(t)
caPath := t.TempDir() + "/unrelated-ca.pem"
if err := os.WriteFile(caPath, []byte(unrelatedPEM), 0600); err != nil {
t.Fatalf("failed to write unrelated CA: %v", err)
}
clientCert, clientKey := writeClientMTLS(t)
config := globalsign.Config{
APIUrl: srv.URL,
APIKey: "gs-test-key",
APISecret: "gs-test-secret",
ClientCertPath: clientCert,
ClientKeyPath: clientKey,
ServerCAPath: caPath,
}
connector := globalsign.New(&config, logger)
rawConfig, _ := json.Marshal(config)
err := connector.ValidateConfig(ctx, rawConfig)
if err == nil {
t.Fatal("ValidateConfig must fail when the server cert is not signed by the pinned CA")
}
// The failure must originate from TLS verification, not from any other path.
if !strings.Contains(err.Error(), "x509") &&
!strings.Contains(err.Error(), "certificate") &&
!strings.Contains(err.Error(), "unknown authority") {
t.Errorf("expected TLS verification error, got: %v", err)
}
t.Logf("Untrusted server cert correctly rejected: %v", err)
})
t.Run("ServerCAPath_MissingFile", func(t *testing.T) {
clientCert, clientKey := writeClientMTLS(t)
config := globalsign.Config{
APIUrl: "https://example.invalid",
APIKey: "gs-test-key",
APISecret: "gs-test-secret",
ClientCertPath: clientCert,
ClientKeyPath: clientKey,
ServerCAPath: "/nonexistent/path/to/ca.pem",
}
connector := globalsign.New(&config, logger)
rawConfig, _ := json.Marshal(config)
err := connector.ValidateConfig(ctx, rawConfig)
if err == nil {
t.Fatal("ValidateConfig must fail when ServerCAPath points to a missing file")
}
if !strings.Contains(err.Error(), "failed to read server CA bundle") {
t.Errorf("expected 'failed to read server CA bundle' error, got: %v", err)
}
t.Logf("Missing server CA file correctly rejected: %v", err)
})
t.Run("ServerCAPath_InvalidPEM", func(t *testing.T) {
clientCert, clientKey := writeClientMTLS(t)
badCAPath := t.TempDir() + "/garbage.pem"
if err := os.WriteFile(badCAPath, []byte("this is not a PEM certificate at all"), 0600); err != nil {
t.Fatalf("failed to write garbage file: %v", err)
}
config := globalsign.Config{
APIUrl: "https://example.invalid",
APIKey: "gs-test-key",
APISecret: "gs-test-secret",
ClientCertPath: clientCert,
ClientKeyPath: clientKey,
ServerCAPath: badCAPath,
}
connector := globalsign.New(&config, logger)
rawConfig, _ := json.Marshal(config)
err := connector.ValidateConfig(ctx, rawConfig)
if err == nil {
t.Fatal("ValidateConfig must fail when ServerCAPath contains no valid PEM certificates")
}
if !strings.Contains(err.Error(), "no valid PEM certificates") {
t.Errorf("expected 'no valid PEM certificates' error, got: %v", err)
}
t.Logf("Invalid PEM correctly rejected: %v", err)
})
}
// generateTestCert generates a self-signed test certificate and returns PEM strings.
func generateTestCert(t *testing.T) (certPEM string, keyPEM string) {
priv, err := rsa.GenerateKey(rand.Reader, 2048)
+19
View File
@@ -359,6 +359,25 @@ func (c *Connector) loadCAFromDisk() error {
return fmt.Errorf("loaded CA certificate does not have KeyUsageCertSign")
}
// Validate CA certificate validity window (M-5, CWE-672).
// An expired or not-yet-valid sub-CA produces child certificates that any
// RFC 5280 path-validator will reject. Fail closed at load time so operators
// learn about it at startup, not at 3am when a renewal cycle silently
// starts minting broken certs. See audit finding M-5.
now := time.Now()
if now.After(caCert.NotAfter) {
return fmt.Errorf("CA certificate %q has expired (not_after=%s, now=%s)",
caCert.Subject.CommonName,
caCert.NotAfter.UTC().Format(time.RFC3339),
now.UTC().Format(time.RFC3339))
}
if now.Before(caCert.NotBefore) {
return fmt.Errorf("CA certificate %q is not yet valid (not_before=%s, now=%s)",
caCert.Subject.CommonName,
caCert.NotBefore.UTC().Format(time.RFC3339),
now.UTC().Format(time.RFC3339))
}
// Load CA private key (supports RSA and ECDSA)
keyPEM, err := os.ReadFile(c.config.CAKeyPath)
if err != nil {
+120 -3
View File
@@ -14,6 +14,7 @@ import (
"math/big"
"os"
"path/filepath"
"strings"
"testing"
"time"
@@ -360,6 +361,114 @@ func TestSubCAMode(t *testing.T) {
t.Logf("Correctly rejected non-CA cert: %v", err)
})
t.Run("SubCA_ExpiredCert_IsRejected", func(t *testing.T) {
// Sub-CA expired 1 hour ago. M-5: loadCAFromDisk must fail closed
// instead of minting child certs that immediately fail path validation
// at every relying party (CWE-672).
notBefore := time.Now().AddDate(-1, 0, 0)
notAfter := time.Now().Add(-1 * time.Hour)
certPath, keyPath := generateTestSubCAWithValidity(t, "rsa", notBefore, notAfter)
config := &local.Config{
ValidityDays: 30,
CACertPath: certPath,
CAKeyPath: keyPath,
}
connector := local.New(config, logger)
_, csrPEM, err := generateTestCSR("app.internal.corp")
if err != nil {
t.Fatalf("Failed to generate CSR: %v", err)
}
req := issuer.IssuanceRequest{
CommonName: "app.internal.corp",
CSRPEM: csrPEM,
}
_, err = connector.IssueCertificate(ctx, req)
if err == nil {
t.Fatal("Expected error when loading expired sub-CA; got nil")
}
if !strings.Contains(err.Error(), "expired") {
t.Errorf("Expected error to mention 'expired'; got: %v", err)
}
if !strings.Contains(err.Error(), "Test Sub-CA") {
t.Errorf("Expected error to include CA subject CN 'Test Sub-CA'; got: %v", err)
}
t.Logf("Correctly rejected expired sub-CA: %v", err)
})
t.Run("SubCA_NotYetValid_IsRejected", func(t *testing.T) {
// Sub-CA is not valid for another hour (clock skew or operator error
// pushing a pre-production CA into prod). M-5: loadCAFromDisk must
// fail closed.
notBefore := time.Now().Add(1 * time.Hour)
notAfter := time.Now().AddDate(5, 0, 0)
certPath, keyPath := generateTestSubCAWithValidity(t, "rsa", notBefore, notAfter)
config := &local.Config{
ValidityDays: 30,
CACertPath: certPath,
CAKeyPath: keyPath,
}
connector := local.New(config, logger)
_, csrPEM, err := generateTestCSR("app.internal.corp")
if err != nil {
t.Fatalf("Failed to generate CSR: %v", err)
}
req := issuer.IssuanceRequest{
CommonName: "app.internal.corp",
CSRPEM: csrPEM,
}
_, err = connector.IssueCertificate(ctx, req)
if err == nil {
t.Fatal("Expected error when loading not-yet-valid sub-CA; got nil")
}
if !strings.Contains(err.Error(), "not yet valid") {
t.Errorf("Expected error to mention 'not yet valid'; got: %v", err)
}
if !strings.Contains(err.Error(), "Test Sub-CA") {
t.Errorf("Expected error to include CA subject CN 'Test Sub-CA'; got: %v", err)
}
t.Logf("Correctly rejected not-yet-valid sub-CA: %v", err)
})
t.Run("SubCA_BarelyValid_IsAccepted", func(t *testing.T) {
// Sub-CA valid from 1 minute ago to 1 hour from now. Edge case:
// proves the M-5 window check doesn't over-reject CAs that are
// legitimately live but close to the boundaries.
notBefore := time.Now().Add(-1 * time.Minute)
notAfter := time.Now().Add(1 * time.Hour)
certPath, keyPath := generateTestSubCAWithValidity(t, "rsa", notBefore, notAfter)
config := &local.Config{
ValidityDays: 30,
CACertPath: certPath,
CAKeyPath: keyPath,
}
connector := local.New(config, logger)
_, csrPEM, err := generateTestCSR("app.internal.corp")
if err != nil {
t.Fatalf("Failed to generate CSR: %v", err)
}
req := issuer.IssuanceRequest{
CommonName: "app.internal.corp",
CSRPEM: csrPEM,
}
result, err := connector.IssueCertificate(ctx, req)
if err != nil {
t.Fatalf("Barely-valid sub-CA was wrongly rejected: %v", err)
}
if result.CertPEM == "" {
t.Error("CertPEM is empty")
}
t.Logf("Correctly accepted barely-valid sub-CA: serial=%s", result.Serial)
})
t.Run("SubCA_RenewCertificate", func(t *testing.T) {
certPath, keyPath := generateTestSubCA(t, "rsa")
defer os.Remove(certPath)
@@ -396,8 +505,16 @@ func TestSubCAMode(t *testing.T) {
}
// generateTestSubCA creates a self-signed CA cert+key pair and writes them to temp files.
// keyType can be "rsa" or "ecdsa".
// keyType can be "rsa" or "ecdsa". Validity window is [now, now+5y].
func generateTestSubCA(t *testing.T, keyType string) (certPath, keyPath string) {
t.Helper()
return generateTestSubCAWithValidity(t, keyType, time.Now(), time.Now().AddDate(5, 0, 0))
}
// generateTestSubCAWithValidity creates a self-signed CA cert+key pair with an
// explicit NotBefore/NotAfter window. Used by M-5 tests that exercise expired
// and not-yet-valid CA rejection in loadCAFromDisk.
func generateTestSubCAWithValidity(t *testing.T, keyType string, notBefore, notAfter time.Time) (certPath, keyPath string) {
t.Helper()
tmpDir := t.TempDir()
certPath = filepath.Join(tmpDir, "ca.pem")
@@ -445,8 +562,8 @@ func generateTestSubCA(t *testing.T, keyType string) (certPath, keyPath string)
CommonName: "Test Sub-CA",
Organization: []string{"CertCtl Test"},
},
NotBefore: time.Now(),
NotAfter: time.Now().AddDate(5, 0, 0),
NotBefore: notBefore,
NotAfter: notAfter,
KeyUsage: x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
BasicConstraintsValid: true,
IsCA: true,
+74 -7
View File
@@ -13,6 +13,7 @@ import (
"time"
"github.com/shankar0123/certctl/internal/connector/notifier"
"github.com/shankar0123/certctl/internal/validation"
)
// Config represents the email notifier configuration.
@@ -123,7 +124,22 @@ func (c *Connector) SendEvent(ctx context.Context, event notifier.Event) error {
// sendEmail sends an email message using the configured SMTP server.
// It handles both TLS and plain authentication modes.
//
// Header values (From, To, Subject) are validated up-front to reject CR, LF,
// and NUL characters. This blocks SMTP header injection (CWE-113) and also
// prevents injection into the SMTP envelope commands MAIL FROM and RCPT TO,
// since net/smtp does not sanitize those inputs itself.
func (c *Connector) sendEmail(ctx context.Context, to, subject, body string) error {
if err := validation.ValidateHeaderValue("From", c.config.FromAddress); err != nil {
return fmt.Errorf("invalid sender: %w", err)
}
if err := validation.ValidateHeaderValue("To", to); err != nil {
return fmt.Errorf("invalid recipient: %w", err)
}
if err := validation.ValidateHeaderValue("Subject", subject); err != nil {
return fmt.Errorf("invalid subject: %w", err)
}
addr := net.JoinHostPort(c.config.SMTPHost, strconv.Itoa(c.config.SMTPPort))
// Connect to SMTP server
@@ -182,8 +198,13 @@ func (c *Connector) sendEmail(ctx context.Context, to, subject, body string) err
}
defer wc.Close()
// Format and write email headers and body
message := c.formatEmailMessage(c.config.FromAddress, to, subject, body)
// Format and write email headers and body. The format function
// re-validates header values as defense-in-depth; the early-return
// above should have already caught any injection attempt.
message, err := c.formatEmailMessage(c.config.FromAddress, to, subject, body)
if err != nil {
return fmt.Errorf("failed to format message: %w", err)
}
if _, err := wc.Write(message); err != nil {
return fmt.Errorf("failed to write message: %w", err)
}
@@ -197,7 +218,22 @@ func (c *Connector) sendEmail(ctx context.Context, to, subject, body string) err
// sendHTMLEmail sends an HTML email message using the configured SMTP server.
// Used by the digest service for rich HTML digest emails.
//
// Header values (From, To, Subject) are validated up-front to reject CR, LF,
// and NUL characters. This blocks SMTP header injection (CWE-113) and also
// prevents injection into the SMTP envelope commands MAIL FROM and RCPT TO,
// since net/smtp does not sanitize those inputs itself.
func (c *Connector) sendHTMLEmail(ctx context.Context, to, subject, htmlBody string) error {
if err := validation.ValidateHeaderValue("From", c.config.FromAddress); err != nil {
return fmt.Errorf("invalid sender: %w", err)
}
if err := validation.ValidateHeaderValue("To", to); err != nil {
return fmt.Errorf("invalid recipient: %w", err)
}
if err := validation.ValidateHeaderValue("Subject", subject); err != nil {
return fmt.Errorf("invalid subject: %w", err)
}
addr := net.JoinHostPort(c.config.SMTPHost, strconv.Itoa(c.config.SMTPPort))
var auth smtp.Auth
@@ -250,7 +286,12 @@ func (c *Connector) sendHTMLEmail(ctx context.Context, to, subject, htmlBody str
}
defer wc.Close()
message := c.formatHTMLEmailMessage(c.config.FromAddress, to, subject, htmlBody)
// The format function re-validates header values as defense-in-depth;
// the early-return above should have already caught any injection attempt.
message, err := c.formatHTMLEmailMessage(c.config.FromAddress, to, subject, htmlBody)
if err != nil {
return fmt.Errorf("failed to format message: %w", err)
}
if _, err := wc.Write(message); err != nil {
return fmt.Errorf("failed to write message: %w", err)
}
@@ -263,7 +304,20 @@ func (c *Connector) sendHTMLEmail(ctx context.Context, to, subject, htmlBody str
}
// formatEmailMessage formats an email message with standard headers.
func (c *Connector) formatEmailMessage(from, to, subject, body string) []byte {
// It rejects any header value containing CR, LF, or NUL bytes to prevent
// SMTP header injection (CWE-113). See internal/validation.ValidateHeaderValue.
// The body is not validated — CR/LF in the body is legitimate content, and
// SMTP dot-stuffing / length framing are handled by net/smtp.
func (c *Connector) formatEmailMessage(from, to, subject, body string) ([]byte, error) {
if err := validation.ValidateHeaderValue("From", from); err != nil {
return nil, err
}
if err := validation.ValidateHeaderValue("To", to); err != nil {
return nil, err
}
if err := validation.ValidateHeaderValue("Subject", subject); err != nil {
return nil, err
}
message := fmt.Sprintf(
"From: %s\r\nTo: %s\r\nSubject: %s\r\nDate: %s\r\nContent-Type: text/plain; charset=utf-8\r\n\r\n%s",
from,
@@ -272,11 +326,24 @@ func (c *Connector) formatEmailMessage(from, to, subject, body string) []byte {
time.Now().Format(time.RFC1123Z),
body,
)
return []byte(message)
return []byte(message), nil
}
// formatHTMLEmailMessage formats an HTML email message with MIME headers.
func (c *Connector) formatHTMLEmailMessage(from, to, subject, htmlBody string) []byte {
// It rejects any header value containing CR, LF, or NUL bytes to prevent
// SMTP header injection (CWE-113). See internal/validation.ValidateHeaderValue.
// The HTML body is not validated at this layer — CR/LF in HTML content is
// legitimate, and SMTP dot-stuffing / length framing are handled by net/smtp.
func (c *Connector) formatHTMLEmailMessage(from, to, subject, htmlBody string) ([]byte, error) {
if err := validation.ValidateHeaderValue("From", from); err != nil {
return nil, err
}
if err := validation.ValidateHeaderValue("To", to); err != nil {
return nil, err
}
if err := validation.ValidateHeaderValue("Subject", subject); err != nil {
return nil, err
}
message := fmt.Sprintf(
"From: %s\r\nTo: %s\r\nSubject: %s\r\nDate: %s\r\nMIME-Version: 1.0\r\nContent-Type: text/html; charset=utf-8\r\n\r\n%s",
from,
@@ -285,7 +352,7 @@ func (c *Connector) formatHTMLEmailMessage(from, to, subject, htmlBody string) [
time.Now().Format(time.RFC1123Z),
htmlBody,
)
return []byte(message)
return []byte(message), nil
}
// formatAlertBody formats an alert notification as email body text.
@@ -138,7 +138,10 @@ func TestEmail_FormatMessage_RFC822Headers(t *testing.T) {
subject := "Test Subject"
body := "Test Body"
message := conn.formatEmailMessage(from, to, subject, body)
message, err := conn.formatEmailMessage(from, to, subject, body)
if err != nil {
t.Fatalf("expected nil error, got %v", err)
}
messageStr := string(message)
if !strings.Contains(messageStr, "From: "+from) {
@@ -177,7 +180,10 @@ func TestEmail_FormatHTMLEmailMessage_Headers(t *testing.T) {
subject := "HTML Test"
htmlBody := "<html><body><h1>Test</h1></body></html>"
message := conn.formatHTMLEmailMessage(from, to, subject, htmlBody)
message, err := conn.formatHTMLEmailMessage(from, to, subject, htmlBody)
if err != nil {
t.Fatalf("expected nil error, got %v", err)
}
messageStr := string(message)
if !strings.Contains(messageStr, "From: "+from) {
@@ -200,6 +206,67 @@ func TestEmail_FormatHTMLEmailMessage_Headers(t *testing.T) {
}
}
// TestEmail_FormatEmailMessage_RejectsCRLFInjection exercises the CRLF
// sanitizer (CWE-113). A subject containing "\r\nBcc: ..." must be rejected
// rather than silently stripped — authentication-relevant headers are
// security-critical and silent mutation masks malicious intent.
func TestEmail_FormatEmailMessage_RejectsCRLFInjection(t *testing.T) {
cfg := &Config{
SMTPHost: "smtp.example.com",
SMTPPort: 587,
FromAddress: "sender@example.com",
}
logger := newTestLogger()
conn := New(cfg, logger)
cases := []struct {
name string
from, to, sub string
wantField string
}{
{"CRLF in Subject", "sender@example.com", "recipient@example.com", "hello\r\nBcc: attacker@example.com", "Subject"},
{"LF in To", "sender@example.com", "recipient@example.com\nBcc: x@y", "ok", "To"},
{"CR in From", "sender@example.com\rExtra: header", "recipient@example.com", "ok", "From"},
{"NUL in Subject", "sender@example.com", "recipient@example.com", "hi\x00there", "Subject"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
_, err := conn.formatEmailMessage(tc.from, tc.to, tc.sub, "body")
if err == nil {
t.Fatal("expected injection error, got nil")
}
if !strings.Contains(err.Error(), tc.wantField) {
t.Errorf("expected error to mention field %q, got %q", tc.wantField, err.Error())
}
})
}
}
// TestEmail_FormatHTMLEmailMessage_RejectsCRLFInjection mirrors the plain-text
// test for the HTML codepath used by the digest service.
func TestEmail_FormatHTMLEmailMessage_RejectsCRLFInjection(t *testing.T) {
cfg := &Config{
SMTPHost: "smtp.example.com",
SMTPPort: 587,
FromAddress: "sender@example.com",
}
logger := newTestLogger()
conn := New(cfg, logger)
_, err := conn.formatHTMLEmailMessage(
"sender@example.com",
"recipient@example.com",
"digest\r\nBcc: attacker@example.com",
"<p>hi</p>",
)
if err == nil {
t.Fatal("expected CRLF injection error, got nil")
}
if !strings.Contains(err.Error(), "Subject") {
t.Errorf("expected error to mention Subject field, got %q", err.Error())
}
}
func TestEmail_FormatAlertBody(t *testing.T) {
cfg := &Config{
SMTPHost: "smtp.example.com",
+82 -4
View File
@@ -14,8 +14,15 @@ import (
"time"
"github.com/shankar0123/certctl/internal/connector/notifier"
"github.com/shankar0123/certctl/internal/validation"
)
// webhookClientTimeout bounds every outbound webhook request and its
// resolution/dial phase. Kept as a package-level constant so the timeout is
// shared by the transport dialer and the http.Client, and so tests can reason
// about it without plumbing configuration.
const webhookClientTimeout = 30 * time.Second
// Config represents the webhook notifier configuration.
type Config struct {
URL string `json:"url"`
@@ -25,20 +32,69 @@ type Config struct {
// Connector implements the notifier.Connector interface for webhook notifications.
// It sends alert and event notifications via HTTP POST with optional HMAC signing.
//
// validateURL is injected so that the production constructor (New) installs the
// strict validation.ValidateSafeURL guard while newForTest can install a
// permissive validator. This is the only way to keep the production SSRF
// defence unconditionally on in real code while still allowing tests to point
// at httptest loopback servers. Without this seam, every test using
// httptest.NewServer would be blocked by the guard's loopback rejection — that
// is the correct behaviour in production but makes legitimate unit tests
// impossible to write. The test seam is unexported so no external caller can
// use it to disable the guard.
type Connector struct {
config *Config
logger *slog.Logger
client *http.Client
config *Config
logger *slog.Logger
client *http.Client
validateURL func(string) error
}
// New creates a new webhook notifier with the given configuration and logger.
//
// The returned connector uses an http.Transport whose DialContext is hardened
// by validation.SafeHTTPDialContext. That guard re-resolves the target host
// at dial time and refuses any connection whose resolved address lies in a
// reserved range (loopback, cloud-metadata link-local, multicast, broadcast,
// unspecified, IPv6 link-local/multicast). This is the authoritative SSRF
// defence; validation.ValidateSafeURL inside ValidateConfig/postWebhook is a
// fast early diagnostic. The two layers together defeat both misconfigured
// URLs and DNS-rebinding attacks where a name's resolved address changes
// between validation and dial.
func New(config *Config, logger *slog.Logger) *Connector {
transport := &http.Transport{
DialContext: validation.SafeHTTPDialContext(webhookClientTimeout),
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
ExpectContinueTimeout: 1 * time.Second,
ForceAttemptHTTP2: true,
}
return &Connector{
config: config,
logger: logger,
client: &http.Client{
Timeout: 30 * time.Second,
Timeout: webhookClientTimeout,
Transport: transport,
},
validateURL: validation.ValidateSafeURL,
}
}
// newForTest is an unexported constructor used exclusively by the webhook
// package's own tests. It installs a permissive URL validator and the stdlib
// default transport so tests can point the connector at httptest loopback
// servers (127.0.0.1), which the production SafeHTTPDialContext guard would
// correctly reject. Production callers cannot reach this constructor because
// it is unexported; only same-package tests (package webhook) can use it.
// The SSRF-rejection tests that verify the guard itself still call New so
// they exercise the real, strict validator.
func newForTest(config *Config, logger *slog.Logger) *Connector {
return &Connector{
config: config,
logger: logger,
client: &http.Client{
Timeout: webhookClientTimeout,
},
validateURL: func(string) error { return nil },
}
}
@@ -54,6 +110,18 @@ func (c *Connector) ValidateConfig(ctx context.Context, rawConfig json.RawMessag
return fmt.Errorf("webhook url is required")
}
// SSRF guard (CWE-918). Reject reserved-address URLs before issuing any
// outbound HTTP — this catches the obvious 127.0.0.1 / ::1 /
// 169.254.169.254 / 0.0.0.0 cases at config-ingestion time and produces
// a clear operator-facing error. The authoritative, TOCTOU-safe check
// still runs at dial time inside SafeHTTPDialContext. Routed through
// c.validateURL so newForTest can install a permissive validator for
// same-package unit tests; production New always wires
// validation.ValidateSafeURL here.
if err := c.validateURL(cfg.URL); err != nil {
return fmt.Errorf("webhook url rejected: %w", err)
}
c.logger.Info("validating webhook configuration", "url", cfg.URL)
// Test webhook connectivity with a HEAD request
@@ -150,7 +218,17 @@ func (c *Connector) SendEvent(ctx context.Context, event notifier.Event) error {
// postWebhook sends a payload to the webhook URL with proper headers and signing.
// If a secret is configured, it signs the payload using HMAC-SHA256 and includes
// the signature in the X-Signature header.
//
// The URL is re-validated here even though ValidateConfig already accepted it:
// configuration can be mutated in place, reloaded dynamically, or set directly
// by tests that bypass ValidateConfig, so this call is a defence-in-depth
// guard that fails closed before any outbound request is built. Authoritative
// DNS-rebinding defence still runs at dial time via SafeHTTPDialContext.
func (c *Connector) postWebhook(ctx context.Context, payload interface{}) error {
if err := c.validateURL(c.config.URL); err != nil {
return fmt.Errorf("webhook url rejected: %w", err)
}
// Marshal payload to JSON
jsonData, err := json.Marshal(payload)
if err != nil {
@@ -32,7 +32,7 @@ func TestWebhook_ValidateConfig_ValidURL(t *testing.T) {
// Create a new logger (or use test logger)
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
err := conn.ValidateConfig(context.Background(), rawConfig)
if err != nil {
@@ -47,7 +47,7 @@ func TestWebhook_ValidateConfig_MissingURL(t *testing.T) {
rawConfig, _ := json.Marshal(cfg)
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
err := conn.ValidateConfig(context.Background(), rawConfig)
if err == nil {
@@ -96,7 +96,7 @@ func TestWebhook_SendAlert_Success(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
alert := notifier.Alert{
ID: "alert-123",
@@ -160,7 +160,7 @@ func TestWebhook_SendAlert_HMACSignature(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
alert := notifier.Alert{
ID: "alert-456",
@@ -199,7 +199,7 @@ func TestWebhook_SendAlert_NoSignatureWithoutSecret(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
alert := notifier.Alert{
ID: "alert-789",
@@ -239,7 +239,7 @@ func TestWebhook_SendAlert_CustomHeaders(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
alert := notifier.Alert{
ID: "alert-custom",
@@ -276,7 +276,7 @@ func TestWebhook_SendAlert_HTTPError(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
alert := notifier.Alert{
ID: "alert-error",
@@ -318,7 +318,7 @@ func TestWebhook_SendEvent_Success(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
certID := "mc-api-prod"
event := notifier.Event{
@@ -367,7 +367,7 @@ func TestWebhook_SendEvent_WithoutCertificateID(t *testing.T) {
}
logger := newTestLogger()
conn := New(cfg, logger)
conn := newForTest(cfg, logger)
event := notifier.Event{
ID: "event-456",
@@ -389,6 +389,130 @@ func TestWebhook_SendEvent_WithoutCertificateID(t *testing.T) {
}
}
// The SSRF tests below exercise the CWE-918 guard added alongside H-4. Each
// case pairs a reserved-address URL with the call surface that should reject
// it. ValidateConfig is the early-fail path; SendAlert/SendEvent reach the
// same guard via postWebhook and are the defence-in-depth that still rejects
// even when ValidateConfig was bypassed (e.g. dynamic config reload mutating
// c.config.URL in place).
func TestWebhook_ValidateConfig_RejectsReservedURLs(t *testing.T) {
// These must all fail at config-ingestion time without ever opening a
// socket — the reserved-address filter is the whole point of H-4.
cases := []struct {
name string
url string
}{
{"loopback v4", "http://127.0.0.1/hook"},
{"loopback v4 with port", "http://127.0.0.1:8080/"},
{"loopback v6 bracketed", "http://[::1]/hook"},
{"AWS metadata", "http://169.254.169.254/latest/meta-data/"},
{"generic link-local", "http://169.254.1.2/"},
{"unspecified v4", "http://0.0.0.0/"},
{"unspecified v6", "http://[::]/"},
{"IPv6 link-local", "http://[fe80::1]/"},
{"multicast", "https://224.0.0.5/"},
{"broadcast", "http://255.255.255.255/"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
cfg := &Config{URL: tc.url}
rawConfig, _ := json.Marshal(cfg)
conn := New(cfg, newTestLogger())
err := conn.ValidateConfig(context.Background(), rawConfig)
if err == nil {
t.Fatalf("ValidateConfig(%q) returned nil, want SSRF rejection", tc.url)
}
if !strings.Contains(err.Error(), "reserved") && !strings.Contains(err.Error(), "rejected") {
t.Errorf("expected reserved/rejected error, got %q", err.Error())
}
})
}
}
func TestWebhook_ValidateConfig_RejectsDangerousSchemes(t *testing.T) {
// Only http(s) is a legitimate webhook transport. Every other scheme is
// an SSRF amplifier (file, gopher, ftp, javascript, data, ldap, dict,
// jar) and must be refused at config time.
cases := []struct {
name string
url string
}{
{"file", "file:///etc/passwd"},
{"gopher", "gopher://example.com/_x"},
{"ftp", "ftp://example.com/"},
{"javascript", "javascript:alert(1)"},
{"data", "data:text/plain;base64,SGVsbG8="},
{"ldap", "ldap://example.com/"},
{"dict", "dict://example.com:2628/d:foo"},
{"jar", "jar:http://example.com/foo.jar!/"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
cfg := &Config{URL: tc.url}
rawConfig, _ := json.Marshal(cfg)
conn := New(cfg, newTestLogger())
err := conn.ValidateConfig(context.Background(), rawConfig)
if err == nil {
t.Fatalf("ValidateConfig(%q) returned nil, want scheme rejection", tc.url)
}
if !strings.Contains(err.Error(), "rejected") && !strings.Contains(err.Error(), "scheme") {
t.Errorf("expected scheme/rejected error, got %q", err.Error())
}
})
}
}
func TestWebhook_SendAlert_RejectsReservedURLInPostWebhook(t *testing.T) {
// Simulate config drift: URL was legitimate at ValidateConfig time but
// has since been rewritten to an SSRF target. postWebhook must catch
// this on every call without ever hitting the wire.
cfg := &Config{URL: "http://169.254.169.254/latest/meta-data/"}
conn := New(cfg, newTestLogger())
alert := notifier.Alert{
ID: "alert-ssrf",
Type: "test",
Severity: "info",
Subject: "Test",
Message: "Test",
Recipient: "ops@example.com",
CreatedAt: time.Now(),
}
err := conn.SendAlert(context.Background(), alert)
if err == nil {
t.Fatal("SendAlert returned nil, want SSRF rejection from postWebhook")
}
if !strings.Contains(err.Error(), "reserved") && !strings.Contains(err.Error(), "rejected") {
t.Errorf("expected reserved/rejected error, got %q", err.Error())
}
}
func TestWebhook_SendEvent_RejectsReservedURLInPostWebhook(t *testing.T) {
cfg := &Config{URL: "http://[::1]:9/webhook"}
conn := New(cfg, newTestLogger())
event := notifier.Event{
ID: "event-ssrf",
Type: "test",
Subject: "Test",
Body: "Test",
Recipient: "ops@example.com",
CreatedAt: time.Now(),
}
err := conn.SendEvent(context.Background(), event)
if err == nil {
t.Fatal("SendEvent returned nil, want SSRF rejection from postWebhook")
}
if !strings.Contains(err.Error(), "reserved") && !strings.Contains(err.Error(), "rejected") {
t.Errorf("expected reserved/rejected error, got %q", err.Error())
}
}
// Helper function to compute HMAC-SHA256 signature
func computeHMACSHA256(data []byte, secret string) string {
h := hmac.New(sha256.New, []byte(secret))
+215 -25
View File
@@ -1,4 +1,31 @@
// Package crypto provides AES-256-GCM encryption for sensitive configuration data.
//
// The on-disk format for blobs produced by [EncryptIfKeySet] is versioned. Two
// versions coexist and both can be read by [DecryptIfKeySet]:
//
// v2 (current, M-8)
// magic(0x02) || salt(16) || nonce(12) || ciphertext+tag
// — 32-byte AES-256 key derived via PBKDF2-SHA256 from the operator
// passphrase and the per-ciphertext random salt.
//
// v1 (legacy, pre-M-8)
// nonce(12) || ciphertext+tag
// — 32-byte AES-256 key derived via PBKDF2-SHA256 from the operator
// passphrase and the package-level fixed salt
// "certctl-config-encryption-v1".
//
// v1 blobs are accepted by the read path for backward compatibility with rows
// persisted before the M-8 remediation. They are never produced by the write
// path. Any row that is updated after M-8 is re-sealed as v2 in-place via the
// normal UPDATE flow.
//
// Rationale for the per-ciphertext salt (see M-8 / CWE-916 / CWE-329): the
// pre-M-8 design reused a single 28-byte fixed salt for every ciphertext, which
// (a) removes one defense-in-depth layer against passphrase-space brute force
// and (b) makes every encrypted column across every row share the exact same
// derived key. v2 replaces the fixed salt with 16 fresh random bytes per write
// and stores the salt alongside the ciphertext. Derived keys now differ per
// row and per re-encryption.
package crypto
import (
@@ -6,17 +33,77 @@ import (
"crypto/cipher"
"crypto/rand"
"crypto/sha256"
"errors"
"fmt"
"io"
"golang.org/x/crypto/pbkdf2"
)
// ErrEncryptionKeyRequired is returned by EncryptIfKeySet and DecryptIfKeySet when
// the caller provides an empty passphrase but the data on the wire requires
// protection.
//
// Historically these helpers silently returned plaintext when no key was configured,
// which produced a data-at-rest confidentiality bypass (CWE-311): sensitive fields
// in dynamically-configured issuer and target records (source='database') were
// persisted to PostgreSQL without any encryption whenever the operator forgot to
// set CERTCTL_CONFIG_ENCRYPTION_KEY. Callers could not distinguish the encrypted
// and plaintext branches at runtime, so the only visible signal was a warning
// line emitted once at startup.
//
// The fix (C-2, commit fb4ce1a) is to fail closed: EncryptIfKeySet/DecryptIfKeySet
// now require a passphrase whenever they are invoked on sensitive material, and
// the server refuses to start if any source='database' rows already exist without
// a configured passphrase.
var ErrEncryptionKeyRequired = errors.New("crypto: CERTCTL_CONFIG_ENCRYPTION_KEY is required to encrypt or decrypt sensitive config")
// v2Magic is the first byte of every v2-format ciphertext blob. It distinguishes
// v2 blobs (per-ciphertext random salt, embedded in the blob) from v1 legacy
// blobs (no magic byte, fixed package-level salt).
//
// The choice of 0x02 is deliberate: v1 blobs begin with a random 12-byte AES-GCM
// nonce. A v1 nonce can coincidentally start with 0x02 with probability 1/256,
// which makes a pure magic-byte dispatch ambiguous. [DecryptIfKeySet] resolves
// the ambiguity by falling back to the v1 path when v2 AEAD verification fails.
const v2Magic byte = 0x02
// v2SaltSize is the length in bytes of the per-ciphertext salt embedded in a
// v2 blob. 16 bytes (128 bits) matches the lower bound recommended in NIST
// SP 800-132 §5.1 for PBKDF2 salts and is sufficient given the one-shot-per-row
// nature of the derivation.
const v2SaltSize = 16
// pbkdf2Iterations is the PBKDF2-SHA256 work factor applied uniformly to both
// v1 and v2 key derivations. The value is preserved from the pre-M-8 design so
// that v1 fallback reads stay bit-identical.
const pbkdf2Iterations = 100000
// aes256KeySize is the output length in bytes of both [DeriveKey] and
// [deriveKeyWithSalt]. It is also the only AES key length accepted by [Encrypt]
// and [Decrypt].
const aes256KeySize = 32
// legacyV1Salt is the fixed salt used by pre-M-8 config encryption. It is
// retained exclusively to preserve the v1 read path — any v1 blob that pre-dates
// M-8 remediation must be decryptable with a key derived from (passphrase,
// legacyV1Salt). The write path never uses this salt.
//
// Exposed as a package-level var rather than a local so that tests can reason
// about v1 fixture bytes symbolically.
var legacyV1Salt = []byte("certctl-config-encryption-v1")
// Encrypt encrypts plaintext using AES-256-GCM with a random 12-byte nonce prepended to the output.
// The key must be exactly 32 bytes (AES-256). Returns [12-byte nonce][ciphertext+tag].
//
// Encrypt is a low-level primitive. It is intentionally kept byte-identical to
// the pre-M-8 implementation so that existing v1 blobs on disk remain
// decryptable via [Decrypt] when paired with a [DeriveKey]-derived key. New
// callers should prefer [EncryptIfKeySet], which handles key derivation and
// emits the v2 wire format.
func Encrypt(plaintext []byte, key []byte) ([]byte, error) {
if len(key) != 32 {
return nil, fmt.Errorf("encryption key must be exactly 32 bytes, got %d", len(key))
if len(key) != aes256KeySize {
return nil, fmt.Errorf("encryption key must be exactly %d bytes, got %d", aes256KeySize, len(key))
}
block, err := aes.NewCipher(key)
@@ -40,9 +127,14 @@ func Encrypt(plaintext []byte, key []byte) ([]byte, error) {
// Decrypt decrypts ciphertext that was encrypted with Encrypt.
// Expects format: [12-byte nonce][ciphertext+tag]. Key must be exactly 32 bytes.
//
// Decrypt is a low-level primitive. It is intentionally kept byte-identical to
// the pre-M-8 implementation so that [DecryptIfKeySet] can delegate to it for
// both the v2 inner blob (after stripping the magic byte + embedded salt) and
// the v1 legacy blob (unmodified).
func Decrypt(ciphertext []byte, key []byte) ([]byte, error) {
if len(key) != 32 {
return nil, fmt.Errorf("encryption key must be exactly 32 bytes, got %d", len(key))
if len(key) != aes256KeySize {
return nil, fmt.Errorf("encryption key must be exactly %d bytes, got %d", aes256KeySize, len(key))
}
block, err := aes.NewCipher(key)
@@ -69,35 +161,133 @@ func Decrypt(ciphertext []byte, key []byte) ([]byte, error) {
return plaintext, nil
}
// DeriveKey derives a 32-byte AES-256 key from a passphrase using PBKDF2-SHA256.
// Uses a fixed application-specific salt and 100,000 iterations for resistance
// to brute-force attacks on weak passphrases.
// DeriveKey derives a 32-byte AES-256 key from a passphrase using PBKDF2-SHA256
// with the legacy v1 fixed salt.
//
// This helper is preserved byte-identical to the pre-M-8 implementation so that
// v1 ciphertexts persisted before the M-8 remediation remain decryptable
// unchanged. New code paths should prefer [EncryptIfKeySet] and
// [DecryptIfKeySet], which use a per-ciphertext random salt.
func DeriveKey(passphrase string) []byte {
// Fixed salt is acceptable here because:
// 1. Each certctl instance has its own passphrase
// 2. The salt prevents generic rainbow table attacks
// 3. Per-user salts are unnecessary (single server key, not user passwords)
salt := []byte("certctl-config-encryption-v1")
return pbkdf2.Key([]byte(passphrase), salt, 100000, 32, sha256.New)
return deriveKeyWithSalt(passphrase, legacyV1Salt)
}
// EncryptIfKeySet encrypts plaintext if a key is provided, otherwise returns plaintext unchanged.
// This supports the development/demo fallback where encryption isn't configured.
func EncryptIfKeySet(plaintext []byte, key []byte) ([]byte, bool, error) {
if len(key) == 0 {
return plaintext, false, nil
// deriveKeyWithSalt derives a 32-byte AES-256 key from a passphrase and an
// explicit salt using PBKDF2-SHA256 with [pbkdf2Iterations] rounds.
//
// The per-ciphertext random salt path (v2) calls this directly with a fresh
// 16-byte random salt embedded in the ciphertext blob. The legacy path
// ([DeriveKey]) calls it with the package-level fixed salt [legacyV1Salt].
func deriveKeyWithSalt(passphrase string, salt []byte) []byte {
return pbkdf2.Key([]byte(passphrase), salt, pbkdf2Iterations, aes256KeySize, sha256.New)
}
// IsLegacyFormat reports whether blob is in the v1 legacy wire format (no magic
// byte, fixed-salt derivation) as opposed to the v2 wire format
// (magic(0x02) || salt(16) || nonce(12) || ciphertext+tag).
//
// A return value of false is a necessary but not sufficient condition for a
// blob to be a valid v2 ciphertext: the shortest possible v2 blob is
// 1 + v2SaltSize + 12 = 29 bytes, and even a 29+ byte blob that starts with
// 0x02 may turn out to be a v1 ciphertext whose random nonce happens to begin
// with 0x02 (probability 1/256). [DecryptIfKeySet] resolves this ambiguity at
// decrypt time by falling back to v1 when v2 AEAD verification fails; callers
// of IsLegacyFormat should use it only as a heuristic (e.g. migration
// tooling, log annotation).
func IsLegacyFormat(blob []byte) bool {
if len(blob) == 0 {
return false
}
encrypted, err := Encrypt(plaintext, key)
return blob[0] != v2Magic
}
// EncryptIfKeySet encrypts plaintext with the supplied passphrase and emits a
// v2 wire-format blob: magic(0x02) || salt(16) || nonce(12) || ciphertext+tag.
//
// Key derivation is performed internally per invocation with a fresh 16-byte
// random salt, producing a distinct AES-256 key for every ciphertext. The
// operator-supplied passphrase is the only cross-ciphertext shared secret.
//
// The second return value is always true when err == nil — the "wasEncrypted"
// flag is retained for source-compatibility with callers that previously used
// it to log provenance. Callers MUST handle err: passing an empty passphrase
// returns [ErrEncryptionKeyRequired] rather than silently emitting plaintext.
// See the package-level [ErrEncryptionKeyRequired] documentation for the
// history behind this behavior change (C-2).
//
// The write path never produces a v1 blob. v1 blobs are read-only legacy
// state — see [DecryptIfKeySet] for the compatibility fallback.
func EncryptIfKeySet(plaintext []byte, passphrase string) ([]byte, bool, error) {
if passphrase == "" {
return nil, false, ErrEncryptionKeyRequired
}
salt := make([]byte, v2SaltSize)
if _, err := io.ReadFull(rand.Reader, salt); err != nil {
return nil, false, fmt.Errorf("failed to generate v2 salt: %w", err)
}
key := deriveKeyWithSalt(passphrase, salt)
inner, err := Encrypt(plaintext, key)
if err != nil {
return nil, false, err
}
return encrypted, true, nil
// v2 blob layout: magic(1) || salt(v2SaltSize) || inner
blob := make([]byte, 0, 1+v2SaltSize+len(inner))
blob = append(blob, v2Magic)
blob = append(blob, salt...)
blob = append(blob, inner...)
return blob, true, nil
}
// DecryptIfKeySet decrypts ciphertext if a key is provided, otherwise returns ciphertext unchanged.
func DecryptIfKeySet(ciphertext []byte, key []byte) ([]byte, error) {
if len(key) == 0 {
return ciphertext, nil
// DecryptIfKeySet decrypts blob with the supplied passphrase, supporting both
// v2 (M-8 and later) and v1 (legacy) on-disk formats.
//
// Dispatch is first-byte magic + AEAD fallback. If blob starts with
// [v2Magic] and is long enough to contain a v2 header plus an AEAD-authenticated
// inner ciphertext, a v2 decrypt is attempted using a key derived from the
// embedded salt. If that succeeds, its plaintext is returned. If v2 AEAD
// verification fails — which covers both the "wrong passphrase" case and the
// 1/256 case where a v1 blob's first byte happens to be 0x02 — the function
// falls through to the v1 path and attempts decryption using a key derived
// from the package-level fixed salt [legacyV1Salt].
//
// Passing an empty passphrase returns [ErrEncryptionKeyRequired]. Callers that
// legitimately store plaintext (e.g. env-seeded source='env' rows that keep the
// raw JSON in the unencrypted `config` column) must branch on the presence of
// the ciphertext themselves rather than relying on this helper to silently
// pass bytes through. See the package-level [ErrEncryptionKeyRequired]
// documentation for the history behind this behavior change (C-2).
//
// The function never re-encrypts in place. A v1 blob that is successfully
// decrypted is returned to the caller as plaintext; re-sealing as v2 happens
// naturally on the next UPDATE via [EncryptIfKeySet].
func DecryptIfKeySet(blob []byte, passphrase string) ([]byte, error) {
if passphrase == "" {
return nil, ErrEncryptionKeyRequired
}
return Decrypt(ciphertext, key)
if len(blob) == 0 {
return nil, fmt.Errorf("ciphertext is empty")
}
// v2 path: magic || salt(16) || nonce(12) || ciphertext+tag (min 29 bytes
// ignoring the GCM tag; the AEAD verify inside Decrypt enforces the tag).
if blob[0] == v2Magic && len(blob) >= 1+v2SaltSize+12 {
salt := blob[1 : 1+v2SaltSize]
sealed := blob[1+v2SaltSize:]
key := deriveKeyWithSalt(passphrase, salt)
if plaintext, err := Decrypt(sealed, key); err == nil {
return plaintext, nil
}
// v2 AEAD verification failed. Fall through to v1 so that a v1 blob
// whose first byte happens to be 0x02 (1/256 probability) is still
// decryptable. If this is truly a v2 blob with the wrong passphrase,
// the v1 attempt below will also fail and the v1 error is returned.
}
// v1 legacy path: blob is the full ciphertext with no header and was
// sealed with a key derived from (passphrase, legacyV1Salt).
key := DeriveKey(passphrase)
return Decrypt(blob, key)
}
+318 -16
View File
@@ -2,6 +2,9 @@ package crypto
import (
"bytes"
"crypto/aes"
"crypto/cipher"
"errors"
"testing"
)
@@ -125,21 +128,20 @@ func TestDeriveKeyDifferentPassphrases(t *testing.T) {
}
func TestEncryptIfKeySet_WithKey(t *testing.T) {
key := DeriveKey("test-key")
plaintext := []byte("config data")
result, wasEncrypted, err := EncryptIfKeySet(plaintext, key)
result, wasEncrypted, err := EncryptIfKeySet(plaintext, "test-passphrase")
if err != nil {
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
if !wasEncrypted {
t.Fatal("expected wasEncrypted=true when key provided")
t.Fatal("expected wasEncrypted=true when passphrase provided")
}
if bytes.Equal(result, plaintext) {
t.Fatal("result should be encrypted")
}
decrypted, err := DecryptIfKeySet(result, key)
decrypted, err := DecryptIfKeySet(result, "test-passphrase")
if err != nil {
t.Fatalf("DecryptIfKeySet failed: %v", err)
}
@@ -148,31 +150,117 @@ func TestEncryptIfKeySet_WithKey(t *testing.T) {
}
}
func TestEncryptIfKeySet_NilKey(t *testing.T) {
// TestEncryptIfKeySet_EmptyKeyFailsClosed asserts the C-2 regression guard:
// EncryptIfKeySet must refuse to silently emit plaintext when no passphrase is
// configured. The pre-fix behavior was to return plaintext with
// wasEncrypted=false, which produced a data-at-rest confidentiality bypass
// (CWE-311) for GUI-created issuer and target configs.
func TestEncryptIfKeySet_EmptyKeyFailsClosed(t *testing.T) {
plaintext := []byte("config data")
result, wasEncrypted, err := EncryptIfKeySet(plaintext, nil)
if err != nil {
t.Fatalf("EncryptIfKeySet with nil key failed: %v", err)
result, wasEncrypted, err := EncryptIfKeySet(plaintext, "")
if err == nil {
t.Fatal("expected ErrEncryptionKeyRequired, got nil")
}
if !errors.Is(err, ErrEncryptionKeyRequired) {
t.Fatalf("expected ErrEncryptionKeyRequired, got %v", err)
}
if wasEncrypted {
t.Fatal("expected wasEncrypted=false when key is nil")
t.Fatal("wasEncrypted must be false on error")
}
if !bytes.Equal(result, plaintext) {
t.Fatal("result should be unchanged plaintext when key is nil")
if result != nil {
t.Fatalf("expected nil result on error, got %q", result)
}
}
func TestDecryptIfKeySet_NilKey(t *testing.T) {
// TestDecryptIfKeySet_EmptyKeyFailsClosed asserts the matching C-2 regression
// guard on the read path: DecryptIfKeySet must refuse to pass ciphertext
// through as plaintext when no passphrase is configured.
func TestDecryptIfKeySet_EmptyKeyFailsClosed(t *testing.T) {
data := []byte("plaintext config data")
result, err := DecryptIfKeySet(data, nil)
result, err := DecryptIfKeySet(data, "")
if err == nil {
t.Fatal("expected ErrEncryptionKeyRequired, got nil")
}
if !errors.Is(err, ErrEncryptionKeyRequired) {
t.Fatalf("expected ErrEncryptionKeyRequired, got %v", err)
}
if result != nil {
t.Fatalf("expected nil result on error, got %q", result)
}
}
// TestEncryptDecryptIfKeySet_RoundTripProducesDifferentCiphertext proves the
// "if set" helpers produce real AES-GCM output (not plaintext) and that a full
// round-trip through both helpers recovers the original bytes.
func TestEncryptDecryptIfKeySet_RoundTripProducesDifferentCiphertext(t *testing.T) {
plaintext := []byte(`{"api_key":"s3cr3t","token":"abc"}`)
encrypted, wasEncrypted, err := EncryptIfKeySet(plaintext, "round-trip-key")
if err != nil {
t.Fatalf("DecryptIfKeySet with nil key failed: %v", err)
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
if !bytes.Equal(result, data) {
t.Fatal("result should be unchanged when key is nil")
if !wasEncrypted {
t.Fatal("wasEncrypted must be true when passphrase is present")
}
if bytes.Equal(encrypted, plaintext) {
t.Fatal("EncryptIfKeySet returned plaintext — would regress C-2")
}
decrypted, err := DecryptIfKeySet(encrypted, "round-trip-key")
if err != nil {
t.Fatalf("DecryptIfKeySet failed: %v", err)
}
if !bytes.Equal(decrypted, plaintext) {
t.Fatalf("round-trip mismatch: got %q, want %q", decrypted, plaintext)
}
}
// TestDecryptIfKeySet_RejectsTamperedCiphertext confirms the AEAD auth tag
// still rejects modified ciphertext when routed through the helper. The v2
// wire format is magic(1) || salt(16) || nonce(12) || ciphertext+tag, so
// flipping a byte anywhere past offset 29 lands squarely inside the AEAD body.
func TestDecryptIfKeySet_RejectsTamperedCiphertext(t *testing.T) {
plaintext := []byte("authenticated data")
encrypted, _, err := EncryptIfKeySet(plaintext, "tamper-test-key")
if err != nil {
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
// Flip a byte past the v2 header (1 + 16 + 12 = 29) to invalidate the tag.
const minV2HeaderLen = 1 + v2SaltSize + 12
if len(encrypted) <= minV2HeaderLen {
t.Fatalf("ciphertext too short to tamper: %d bytes", len(encrypted))
}
encrypted[minV2HeaderLen] ^= 0xFF
if _, err := DecryptIfKeySet(encrypted, "tamper-test-key"); err == nil {
t.Fatal("DecryptIfKeySet accepted tampered ciphertext — AEAD tag check bypassed")
}
}
// TestEncryptIfKeySet_PreservesErrEncryptionKeyRequiredSentinel guards the
// stability of the public sentinel error so audit-log detectors and callers
// outside this package can rely on errors.Is(err, ErrEncryptionKeyRequired).
func TestEncryptIfKeySet_PreservesErrEncryptionKeyRequiredSentinel(t *testing.T) {
if ErrEncryptionKeyRequired == nil {
t.Fatal("ErrEncryptionKeyRequired sentinel must be non-nil")
}
if ErrEncryptionKeyRequired.Error() == "" {
t.Fatal("ErrEncryptionKeyRequired must carry a non-empty message")
}
// Wrap it and confirm errors.Is unwraps correctly — real callers wrap with %w.
wrapped := wrapSentinel(ErrEncryptionKeyRequired)
if !errors.Is(wrapped, ErrEncryptionKeyRequired) {
t.Fatal("errors.Is must unwrap ErrEncryptionKeyRequired through %w-wrapped callers")
}
}
// wrapSentinel is a tiny helper that mimics how production callers propagate
// the sentinel (e.g. fmt.Errorf("failed to encrypt config: %w", err)).
func wrapSentinel(err error) error {
return errors.Join(errors.New("failed to encrypt config"), err)
}
func TestEncryptProducesDifferentCiphertexts(t *testing.T) {
@@ -186,3 +274,217 @@ func TestEncryptProducesDifferentCiphertexts(t *testing.T) {
t.Fatal("encrypting same plaintext twice should produce different ciphertexts (random nonce)")
}
}
// ---------------------------------------------------------------------------
// M-8 additions: per-ciphertext salt + v2 wire format + v1 backward compat.
// ---------------------------------------------------------------------------
// TestDeriveKey_DifferentSaltsProduceDifferentKeys asserts that
// deriveKeyWithSalt fans out distinct 32-byte keys for the same passphrase
// across different salts. This is the core M-8 defense-in-depth property: even
// if an attacker obtains two v2 ciphertexts encrypted with the same master
// passphrase, the derived AES keys differ, and a brute-force attempt on one
// blob cannot be amortized across the other.
func TestDeriveKey_DifferentSaltsProduceDifferentKeys(t *testing.T) {
passphrase := "master-passphrase"
saltA := bytes.Repeat([]byte{0xAA}, v2SaltSize)
saltB := bytes.Repeat([]byte{0xBB}, v2SaltSize)
keyA := deriveKeyWithSalt(passphrase, saltA)
keyB := deriveKeyWithSalt(passphrase, saltB)
if len(keyA) != aes256KeySize || len(keyB) != aes256KeySize {
t.Fatalf("derived key length wrong: %d / %d", len(keyA), len(keyB))
}
if bytes.Equal(keyA, keyB) {
t.Fatal("deriveKeyWithSalt must produce different keys for different salts")
}
// Sanity-check that deterministic behaviour is preserved under a fixed salt.
keyA2 := deriveKeyWithSalt(passphrase, saltA)
if !bytes.Equal(keyA, keyA2) {
t.Fatal("deriveKeyWithSalt must be deterministic for a fixed (passphrase, salt)")
}
}
// TestEncryptIfKeySet_ProducesV2Format asserts the exact v2 wire-format bytes:
// magic(0x02) || salt(16) || nonce(12) || ciphertext+tag.
func TestEncryptIfKeySet_ProducesV2Format(t *testing.T) {
blob, _, err := EncryptIfKeySet([]byte("hello"), "any-passphrase")
if err != nil {
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
const minLen = 1 + v2SaltSize + 12 + 16 // magic + salt + nonce + GCM tag (16)
if len(blob) < minLen {
t.Fatalf("v2 blob too short: got %d, want >= %d", len(blob), minLen)
}
if blob[0] != v2Magic {
t.Fatalf("v2 blob must start with magic byte 0x%02x, got 0x%02x", v2Magic, blob[0])
}
if IsLegacyFormat(blob) {
t.Fatal("IsLegacyFormat must return false for a freshly produced v2 blob")
}
}
// TestEncryptIfKeySet_SaltIsRandom asserts that two calls with the same
// passphrase and plaintext produce distinct embedded salts.
func TestEncryptIfKeySet_SaltIsRandom(t *testing.T) {
plaintext := []byte("same plaintext")
passphrase := "same-passphrase"
blob1, _, err := EncryptIfKeySet(plaintext, passphrase)
if err != nil {
t.Fatalf("EncryptIfKeySet #1 failed: %v", err)
}
blob2, _, err := EncryptIfKeySet(plaintext, passphrase)
if err != nil {
t.Fatalf("EncryptIfKeySet #2 failed: %v", err)
}
salt1 := blob1[1 : 1+v2SaltSize]
salt2 := blob2[1 : 1+v2SaltSize]
if bytes.Equal(salt1, salt2) {
t.Fatal("two EncryptIfKeySet invocations must produce distinct per-ciphertext salts")
}
if bytes.Equal(blob1, blob2) {
t.Fatal("two v2 blobs with same (passphrase, plaintext) must differ end-to-end")
}
}
// TestDecryptIfKeySet_V1BackwardCompat builds a deterministic v1-format
// ciphertext using the pre-M-8 recipe (DeriveKey with the fixed salt, then
// Encrypt with an all-zero nonce for reproducibility) and asserts that
// DecryptIfKeySet still decrypts it correctly. This is the migration guarantee:
// v1 blobs persisted before M-8 must remain decryptable.
func TestDecryptIfKeySet_V1BackwardCompat(t *testing.T) {
passphrase := "legacy-passphrase"
plaintext := []byte(`{"api_key":"legacy","org_id":"789"}`)
// Build a deterministic v1 blob directly: nonce(12 zero bytes) || ct+tag.
// This matches the exact wire shape that Encrypt produces, minus the random
// nonce, so the test is stable rather than 1/256 flaky.
key := DeriveKey(passphrase) // fixed-salt derivation (pre-M-8 behavior)
block, err := aes.NewCipher(key)
if err != nil {
t.Fatalf("aes.NewCipher: %v", err)
}
gcm, err := cipher.NewGCM(block)
if err != nil {
t.Fatalf("cipher.NewGCM: %v", err)
}
nonce := make([]byte, gcm.NonceSize()) // all zeros → first byte != v2Magic
v1Blob := gcm.Seal(nonce, nonce, plaintext, nil)
if v1Blob[0] == v2Magic {
t.Fatalf("fixture nonce collided with v2 magic byte — test design error")
}
decrypted, err := DecryptIfKeySet(v1Blob, passphrase)
if err != nil {
t.Fatalf("DecryptIfKeySet(v1) failed: %v", err)
}
if !bytes.Equal(decrypted, plaintext) {
t.Fatalf("v1 decrypt mismatch: got %q, want %q", decrypted, plaintext)
}
// Cross-check: IsLegacyFormat should flag this as legacy.
if !IsLegacyFormat(v1Blob) {
t.Fatal("IsLegacyFormat must return true for a v1 blob whose first byte != v2Magic")
}
}
// TestDecryptIfKeySet_V1MagicByteCollisionFallsThrough covers the 1/256 edge
// case where a v1 ciphertext's random 12-byte nonce happens to begin with
// 0x02. The dispatch must attempt v2, see AEAD failure, and fall through to
// v1 — never return a decrypt error when the passphrase is correct.
func TestDecryptIfKeySet_V1MagicByteCollisionFallsThrough(t *testing.T) {
passphrase := "collision-passphrase"
plaintext := []byte("colliding v1 blob")
// Craft a v1 blob whose first byte equals v2Magic by choosing a nonce
// starting with 0x02 and sealing manually.
key := DeriveKey(passphrase)
block, err := aes.NewCipher(key)
if err != nil {
t.Fatalf("aes.NewCipher: %v", err)
}
gcm, err := cipher.NewGCM(block)
if err != nil {
t.Fatalf("cipher.NewGCM: %v", err)
}
nonce := make([]byte, gcm.NonceSize())
nonce[0] = v2Magic // force collision
v1Blob := gcm.Seal(nonce, nonce, plaintext, nil)
if v1Blob[0] != v2Magic {
t.Fatal("fixture construction bug: first byte must equal v2Magic")
}
decrypted, err := DecryptIfKeySet(v1Blob, passphrase)
if err != nil {
t.Fatalf("DecryptIfKeySet must fall through to v1 on AEAD failure, got err: %v", err)
}
if !bytes.Equal(decrypted, plaintext) {
t.Fatalf("v1-via-fallback decrypt mismatch: got %q, want %q", decrypted, plaintext)
}
}
// TestDecryptIfKeySet_V2WithWrongPassphraseFails asserts that a v2 blob
// sealed under passphrase A cannot be decrypted under passphrase B. Both the
// v2 AEAD verify (with salt from the blob + passphrase B) and the v1 fallback
// (with fixed salt + passphrase B) must fail, and an error must be returned
// rather than silently-corrupt plaintext.
func TestDecryptIfKeySet_V2WithWrongPassphraseFails(t *testing.T) {
blob, _, err := EncryptIfKeySet([]byte("secret"), "passphrase-A")
if err != nil {
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
got, err := DecryptIfKeySet(blob, "passphrase-B")
if err == nil {
t.Fatalf("DecryptIfKeySet must return error for wrong passphrase, got plaintext %q", got)
}
if got != nil {
t.Fatalf("result must be nil on decrypt error, got %q", got)
}
}
// TestDecryptIfKeySet_TruncatedV2Blob asserts that a blob starting with the v2
// magic byte but too short to contain a full v2 header does not trip an
// out-of-bounds slice and does not succeed. It either returns an error (v1
// fallback on the short bytes fails with "ciphertext too short") or at minimum
// never returns plaintext.
func TestDecryptIfKeySet_TruncatedV2Blob(t *testing.T) {
truncated := []byte{v2Magic, 0x00, 0x01, 0x02, 0x03} // 5 bytes — well below the 29-byte v2 minimum
got, err := DecryptIfKeySet(truncated, "any-passphrase")
if err == nil {
t.Fatalf("DecryptIfKeySet must reject a truncated v2 blob, got plaintext %q", got)
}
if got != nil {
t.Fatalf("result must be nil on decrypt error, got %q", got)
}
}
// TestIsLegacyFormat covers the three branches of the public magic-byte
// heuristic: v2 blob → false, v1 blob → true, empty blob → false.
func TestIsLegacyFormat(t *testing.T) {
v2Blob, _, err := EncryptIfKeySet([]byte("data"), "p")
if err != nil {
t.Fatalf("EncryptIfKeySet failed: %v", err)
}
if IsLegacyFormat(v2Blob) {
t.Fatal("v2 blob must not be flagged as legacy")
}
// Any blob whose first byte isn't v2Magic should be reported as legacy.
v1Shape := []byte{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0xFF}
if !IsLegacyFormat(v1Shape) {
t.Fatal("non-v2-magic blob must be flagged as legacy")
}
if IsLegacyFormat(nil) {
t.Fatal("nil blob must not be flagged as legacy (undefined)")
}
if IsLegacyFormat([]byte{}) {
t.Fatal("empty blob must not be flagged as legacy (undefined)")
}
}
+63
View File
@@ -32,6 +32,8 @@ type DeploymentTarget struct {
LastTestedAt *time.Time `json:"last_tested_at,omitempty"`
TestStatus string `json:"test_status,omitempty"`
Source string `json:"source,omitempty"`
RetiredAt *time.Time `json:"retired_at,omitempty"` // I-004: soft-retirement timestamp (nil = active)
RetiredReason *string `json:"retired_reason,omitempty"` // I-004: reason captured at cascade retirement
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
@@ -49,6 +51,67 @@ type Agent struct {
Architecture string `json:"architecture"`
IPAddress string `json:"ip_address"`
Version string `json:"version"`
// I-004: soft-retirement fields. An agent with RetiredAt != nil is the
// canonical "retired" state. The Status column remains as before (Online
// / Offline / Degraded) and is preserved at retirement time as the
// last-seen operational status; RetiredAt is the source of truth for
// "should we filter this row from active listings?".
RetiredAt *time.Time `json:"retired_at,omitempty"`
RetiredReason *string `json:"retired_reason,omitempty"`
}
// IsRetired returns true when this agent has been soft-retired.
// I-004: callers that iterate active agents (stats dashboard, stale-offline
// sweeper, handler-facing list) must skip retired rows by default.
func (a *Agent) IsRetired() bool { return a != nil && a.RetiredAt != nil }
// AgentDependencyCounts captures the active downstream rows that would be
// affected by retiring an agent. Returned by the preflight pass on
// DELETE /api/v1/agents/{id}. Zero counts mean a clean soft-retire is safe;
// any non-zero count blocks a default retire with HTTP 409 and requires an
// explicit ?force=true&reason=... escape hatch from the operator.
type AgentDependencyCounts struct {
ActiveTargets int `json:"active_targets"` // deployment_targets.agent_id=id AND retired_at IS NULL
ActiveCertificates int `json:"active_certificates"` // certificates currently deployed via one of this agent's active targets
PendingJobs int `json:"pending_jobs"` // jobs.agent_id=id AND status IN (Pending, AwaitingCSR, AwaitingApproval, Running)
}
// HasDependencies reports whether any preflight counter is non-zero.
func (d AgentDependencyCounts) HasDependencies() bool {
return d.ActiveTargets > 0 || d.ActiveCertificates > 0 || d.PendingJobs > 0
}
// SentinelAgentIDs enumerates the four reserved agent identities that back
// non-agent discovery subsystems. These rows are created by cmd/server on
// startup and retiring them would orphan their subsystem — the network
// scanner and the three cloud secret-manager sources all key writes to
// these IDs via service.SentinelAgentID / service.SentinelAWSSecretsMgr /
// service.SentinelAzureKeyVault / service.SentinelGCPSecretMgr. The four
// literal IDs below MUST stay in lockstep with those service-package
// constants (see internal/service/network_scan.go line 23 and
// internal/service/cloud_discovery.go lines 14-16).
//
// The retirement service refuses them unconditionally — even with
// ?force=true — via ErrAgentIsSentinel. Living here (and not in the
// service package) lets handler, repository, and scheduler code filter
// them without importing service and creating a cycle.
var SentinelAgentIDs = []string{
"server-scanner",
"cloud-aws-sm",
"cloud-azure-kv",
"cloud-gcp-sm",
}
// IsSentinelAgent reports whether id matches one of the four reserved
// sentinel agent IDs. A linear scan is fine — the slice is length 4 and
// the check is rare (only on retirement attempts and sweeper filters).
func IsSentinelAgent(id string) bool {
for _, s := range SentinelAgentIDs {
if s == id {
return true
}
}
return false
}
// AgentMetadata contains runtime metadata reported by agents via heartbeat.
+55
View File
@@ -0,0 +1,55 @@
package domain
import (
"testing"
"time"
)
// TestAgent_IsRetired covers the I-004 soft-retirement predicate that gates
// which callers hide an agent row from active listings.
func TestAgent_IsRetired(t *testing.T) {
t.Run("nil receiver is not retired", func(t *testing.T) {
var a *Agent
if a.IsRetired() {
t.Fatalf("nil *Agent should not be retired")
}
})
t.Run("zero value is not retired", func(t *testing.T) {
a := &Agent{}
if a.IsRetired() {
t.Fatalf("zero Agent should not be retired")
}
})
t.Run("RetiredAt set is retired", func(t *testing.T) {
now := time.Now()
a := &Agent{RetiredAt: &now}
if !a.IsRetired() {
t.Fatalf("Agent with RetiredAt != nil must be retired")
}
})
}
// TestAgentDependencyCounts_HasDependencies verifies the preflight
// aggregation helper used by the 409 block path of DELETE /agents/{id}.
func TestAgentDependencyCounts_HasDependencies(t *testing.T) {
cases := []struct {
name string
counts AgentDependencyCounts
want bool
}{
{"all zero", AgentDependencyCounts{}, false},
{"active target", AgentDependencyCounts{ActiveTargets: 1}, true},
{"active cert", AgentDependencyCounts{ActiveCertificates: 1}, true},
{"pending job", AgentDependencyCounts{PendingJobs: 1}, true},
{"mixed", AgentDependencyCounts{ActiveTargets: 3, PendingJobs: 2}, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := tc.counts.HasDependencies(); got != tc.want {
t.Fatalf("HasDependencies()=%v want=%v counts=%+v", got, tc.want, tc.counts)
}
})
}
}
+7 -5
View File
@@ -12,6 +12,7 @@ type PolicyRule struct {
Type PolicyType `json:"type"`
Config json.RawMessage `json:"config"`
Enabled bool `json:"enabled"`
Severity PolicySeverity `json:"severity"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
@@ -20,11 +21,12 @@ type PolicyRule struct {
type PolicyType string
const (
PolicyTypeAllowedIssuers PolicyType = "AllowedIssuers"
PolicyTypeAllowedDomains PolicyType = "AllowedDomains"
PolicyTypeRequiredMetadata PolicyType = "RequiredMetadata"
PolicyTypeAllowedEnvironments PolicyType = "AllowedEnvironments"
PolicyTypeRenewalLeadTime PolicyType = "RenewalLeadTime"
PolicyTypeAllowedIssuers PolicyType = "AllowedIssuers"
PolicyTypeAllowedDomains PolicyType = "AllowedDomains"
PolicyTypeRequiredMetadata PolicyType = "RequiredMetadata"
PolicyTypeAllowedEnvironments PolicyType = "AllowedEnvironments"
PolicyTypeRenewalLeadTime PolicyType = "RenewalLeadTime"
PolicyTypeCertificateLifetime PolicyType = "CertificateLifetime"
)
// PolicyViolation records an instance of a certificate violating a policy rule.
+19 -11
View File
@@ -158,7 +158,7 @@ func TestCrossResourceWorkflow(t *testing.T) {
payload := map[string]interface{}{
"name": "Allowed Domains Policy",
"type": "AllowedDomains",
"severity": "High",
"severity": "Error",
"config": json.RawMessage(`{"domains": ["example.com", "*.example.com"]}`),
"description": "Restrict issuance to example.com domains",
}
@@ -517,12 +517,18 @@ func TestNotificationEndpoints(t *testing.T) {
})
}
// TestCRLEndpoint exercises the CRL listing endpoint (M15a).
// TestCRLEndpoint exercises the RFC 5280 DER-encoded CRL endpoint served
// unauthenticated at /.well-known/pki/crl/{issuer_id} (M-006 relocation from
// the pre-M-006 JSON CRL at /api/v1/crl, which was removed entirely because
// RFC 5280 §5 defines only the DER wire format).
func TestCRLEndpoint(t *testing.T) {
server, _, _, _ := setupTestServer(t)
t.Run("GetCRL_JSON", func(t *testing.T) {
resp, err := http.Get(server.URL + "/api/v1/crl")
t.Run("GetDERCRL_Unauthenticated", func(t *testing.T) {
// Intentionally no Authorization header — relying parties can't present
// a certctl API key, so the PKI endpoints are exposed under the
// RFC 8615 `.well-known` namespace with auth bypassed.
resp, err := http.Get(server.URL + "/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("request failed: %v", err)
}
@@ -531,15 +537,17 @@ func TestCRLEndpoint(t *testing.T) {
bodyBytes, _ := io.ReadAll(resp.Body)
t.Fatalf("expected 200, got %d: %s", resp.StatusCode, string(bodyBytes))
}
var crl map[string]interface{}
json.NewDecoder(resp.Body).Decode(&crl)
if crl["version"] == nil {
t.Error("expected version field in CRL response")
if ct := resp.Header.Get("Content-Type"); ct != "application/pkix-crl" {
t.Errorf("expected Content-Type application/pkix-crl, got %s", ct)
}
if crl["entries"] == nil {
t.Error("expected entries field in CRL response")
body, err := io.ReadAll(resp.Body)
if err != nil {
t.Fatalf("read body failed: %v", err)
}
t.Logf("CRL response: version=%v, entries_count=%v", crl["version"], crl["total"])
if len(body) == 0 {
t.Error("expected non-empty DER CRL body")
}
t.Logf("DER CRL response: %d bytes", len(body))
})
}
+216 -32
View File
@@ -3,6 +3,7 @@ package integration
import (
"bytes"
"context"
"database/sql"
"encoding/json"
"fmt"
"io"
@@ -64,9 +65,15 @@ func TestCertificateLifecycle(t *testing.T) {
certificateService.SetTargetRepo(targetRepo)
renewalService := service.NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, nil, auditService, notificationService, issuerRegistry, "server")
deploymentService := service.NewDeploymentService(jobRepo, targetRepo, agentRepo, certRepo, auditService, notificationService)
jobService := service.NewJobService(jobRepo, renewalService, deploymentService, logger)
ownerRepo := newMockOwnerRepository()
jobService := service.NewJobService(jobRepo, certRepo, ownerRepo, renewalService, deploymentService, logger)
agentService := service.NewAgentService(agentRepo, certRepo, jobRepo, targetRepo, auditService, issuerRegistry, renewalService)
issuerService := service.NewIssuerService(issuerRepo, auditService, issuerRegistry, nil, slog.Default())
// 32-byte AES-256 test key — C-2 remediation makes IssuerService fail closed
// without a configured CERTCTL_CONFIG_ENCRYPTION_KEY. Happy-path CRUD tests
// must supply a real key so the encrypt path runs instead of returning
// ErrEncryptionKeyRequired.
testEncryptionKey := "0123456789abcdef0123456789abcdef"
issuerService := service.NewIssuerService(issuerRepo, auditService, issuerRegistry, testEncryptionKey, slog.Default())
// Initialize handlers
certificateHandler := handler.NewCertificateHandler(certificateService)
@@ -580,6 +587,24 @@ func (m *mockCertificateRepository) GetLatestVersion(ctx context.Context, certID
return versions[len(versions)-1], nil
}
// GetByIssuerAndSerial emulates the PostgreSQL JOIN that scopes cert lookup to
// (issuer_id, serial). Returns sql.ErrNoRows when no match exists so callers
// that branch on errors.Is(err, sql.ErrNoRows) (notably the OCSP handler's
// M-004 "unknown" fallback) behave the same in-memory as against PostgreSQL.
func (m *mockCertificateRepository) GetByIssuerAndSerial(ctx context.Context, issuerID, serial string) (*domain.ManagedCertificate, error) {
for _, cert := range m.certs {
if cert.IssuerID != issuerID {
continue
}
for _, v := range m.versions[cert.ID] {
if v.SerialNumber == serial {
return cert, nil
}
}
}
return nil, sql.ErrNoRows
}
type mockJobRepository struct {
jobs map[string]*domain.Job
}
@@ -677,6 +702,65 @@ func (m *mockJobRepository) ListPendingByAgentID(ctx context.Context, agentID st
return result, nil
}
// ClaimPendingJobs mirrors the production H-6 semantics: Pending jobs of the given type
// (or any type when jobType is empty) flip to Running before being returned. limit <= 0
// means unlimited.
func (m *mockJobRepository) ClaimPendingJobs(ctx context.Context, jobType domain.JobType, limit int) ([]*domain.Job, error) {
var claimed []*domain.Job
for _, j := range m.jobs {
if j.Status != domain.JobStatusPending {
continue
}
if jobType != "" && j.Type != jobType {
continue
}
j.Status = domain.JobStatusRunning
claimed = append(claimed, j)
if limit > 0 && len(claimed) >= limit {
break
}
}
return claimed, nil
}
// ClaimPendingByAgentID mirrors the production H-6 semantics: Pending deployment rows for
// the agent flip to Running; AwaitingCSR rows are returned with state preserved.
func (m *mockJobRepository) ClaimPendingByAgentID(ctx context.Context, agentID string) ([]*domain.Job, error) {
var result []*domain.Job
for _, j := range m.jobs {
if j.AgentID == nil || *j.AgentID != agentID {
continue
}
switch {
case j.Status == domain.JobStatusPending && j.Type == domain.JobTypeDeployment:
j.Status = domain.JobStatusRunning
result = append(result, j)
case j.Status == domain.JobStatusAwaitingCSR:
result = append(result, j)
}
}
return result, nil
}
// ListTimedOutAwaitingJobs is the I-003 integration-mock stub. Returns jobs whose
// created_at predates the relevant cutoff for their status.
func (m *mockJobRepository) ListTimedOutAwaitingJobs(ctx context.Context, csrCutoff, approvalCutoff time.Time) ([]*domain.Job, error) {
var jobs []*domain.Job
for _, j := range m.jobs {
switch j.Status {
case domain.JobStatusAwaitingCSR:
if j.CreatedAt.Before(csrCutoff) {
jobs = append(jobs, j)
}
case domain.JobStatusAwaitingApproval:
if j.CreatedAt.Before(approvalCutoff) {
jobs = append(jobs, j)
}
}
}
return jobs, nil
}
type mockAuditRepository struct {
events []*domain.AuditEvent
}
@@ -727,6 +811,14 @@ func (m *mockAgentRepository) Create(ctx context.Context, agent *domain.Agent) e
return nil
}
func (m *mockAgentRepository) CreateIfNotExists(ctx context.Context, agent *domain.Agent) (bool, error) {
if _, exists := m.agents[agent.ID]; exists {
return false, nil
}
m.agents[agent.ID] = agent
return true, nil
}
func (m *mockAgentRepository) Update(ctx context.Context, agent *domain.Agent) error {
m.agents[agent.ID] = agent
return nil
@@ -756,6 +848,56 @@ func (m *mockAgentRepository) GetByAPIKey(ctx context.Context, keyHash string) (
return nil, fmt.Errorf("agent not found")
}
// I-004: the integration-level mockAgentRepository implements the 6 new
// retirement-surface methods as thin contract-satisfying stubs. The
// integration suite exercises lifecycle flows (issue → renew → deploy)
// that don't touch retirement, so these methods never need real behavior
// here — they exist purely to keep mockAgentRepository a valid
// AgentRepository implementation after migration 000015 expanded the
// interface. Dedicated retirement tests live in internal/service/
// agent_retire_test.go against the richer service-layer mockAgentRepo.
func (m *mockAgentRepository) ListRetired(ctx context.Context, page, perPage int) ([]*domain.Agent, int, error) {
var retired []*domain.Agent
for _, a := range m.agents {
if a.RetiredAt != nil {
retired = append(retired, a)
}
}
return retired, len(retired), nil
}
func (m *mockAgentRepository) SoftRetire(ctx context.Context, id string, retiredAt time.Time, reason string) error {
agent, ok := m.agents[id]
if !ok {
return fmt.Errorf("agent not found")
}
if agent.RetiredAt != nil {
return nil
}
stamped := retiredAt
agent.RetiredAt = &stamped
stampedReason := reason
agent.RetiredReason = &stampedReason
return nil
}
func (m *mockAgentRepository) RetireAgentWithCascade(ctx context.Context, id string, retiredAt time.Time, reason string) error {
return m.SoftRetire(ctx, id, retiredAt, reason)
}
func (m *mockAgentRepository) CountActiveTargets(ctx context.Context, agentID string) (int, error) {
return 0, nil
}
func (m *mockAgentRepository) CountActiveCertificates(ctx context.Context, agentID string) (int, error) {
return 0, nil
}
func (m *mockAgentRepository) CountPendingJobs(ctx context.Context, agentID string) (int, error) {
return 0, nil
}
type mockTargetRepository struct {
targets map[string]*domain.DeploymentTarget
}
@@ -809,6 +951,48 @@ func (m *mockTargetRepository) ListByCertificate(ctx context.Context, certID str
return m.List(ctx)
}
// mockOwnerRepository satisfies repository.OwnerRepository for the M-003
// not-self approval wiring. Tests that don't care about owner lookup get an
// empty map (Get returns errNotFound, which checkNotSelf permits).
type mockOwnerRepository struct {
owners map[string]*domain.Owner
}
func newMockOwnerRepository() *mockOwnerRepository {
return &mockOwnerRepository{owners: make(map[string]*domain.Owner)}
}
func (m *mockOwnerRepository) List(ctx context.Context) ([]*domain.Owner, error) {
var out []*domain.Owner
for _, o := range m.owners {
out = append(out, o)
}
return out, nil
}
func (m *mockOwnerRepository) Get(ctx context.Context, id string) (*domain.Owner, error) {
o, ok := m.owners[id]
if !ok {
return nil, fmt.Errorf("owner not found")
}
return o, nil
}
func (m *mockOwnerRepository) Create(ctx context.Context, o *domain.Owner) error {
m.owners[o.ID] = o
return nil
}
func (m *mockOwnerRepository) Update(ctx context.Context, o *domain.Owner) error {
m.owners[o.ID] = o
return nil
}
func (m *mockOwnerRepository) Delete(ctx context.Context, id string) error {
delete(m.owners, id)
return nil
}
type mockNotificationRepository struct {
notifications []*domain.NotificationEvent
}
@@ -983,8 +1167,8 @@ type mockTargetService struct {
auditService *service.AuditService
}
func (m *mockTargetService) ListTargets(page, perPage int) ([]domain.DeploymentTarget, int64, error) {
targets, err := m.targetRepo.List(context.Background())
func (m *mockTargetService) ListTargets(ctx context.Context, page, perPage int) ([]domain.DeploymentTarget, int64, error) {
targets, err := m.targetRepo.List(ctx)
if err != nil {
return nil, 0, err
}
@@ -995,99 +1179,99 @@ func (m *mockTargetService) ListTargets(page, perPage int) ([]domain.DeploymentT
return result, int64(len(result)), nil
}
func (m *mockTargetService) GetTarget(id string) (*domain.DeploymentTarget, error) {
return m.targetRepo.Get(context.Background(), id)
func (m *mockTargetService) GetTarget(ctx context.Context, id string) (*domain.DeploymentTarget, error) {
return m.targetRepo.Get(ctx, id)
}
func (m *mockTargetService) CreateTarget(target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
if err := m.targetRepo.Create(context.Background(), &target); err != nil {
func (m *mockTargetService) CreateTarget(ctx context.Context, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
if err := m.targetRepo.Create(ctx, &target); err != nil {
return nil, err
}
return &target, nil
}
func (m *mockTargetService) UpdateTarget(id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
func (m *mockTargetService) UpdateTarget(ctx context.Context, id string, target domain.DeploymentTarget) (*domain.DeploymentTarget, error) {
target.ID = id
if err := m.targetRepo.Update(context.Background(), &target); err != nil {
if err := m.targetRepo.Update(ctx, &target); err != nil {
return nil, err
}
return &target, nil
}
func (m *mockTargetService) DeleteTarget(id string) error {
return m.targetRepo.Delete(context.Background(), id)
func (m *mockTargetService) DeleteTarget(ctx context.Context, id string) error {
return m.targetRepo.Delete(ctx, id)
}
func (m *mockTargetService) TestTargetConnection(id string) error {
func (m *mockTargetService) TestConnection(ctx context.Context, id string) error {
return nil // No-op for integration tests
}
type mockTeamService struct{}
func (m *mockTeamService) ListTeams(page, perPage int) ([]domain.Team, int64, error) {
func (m *mockTeamService) ListTeams(_ context.Context, page, perPage int) ([]domain.Team, int64, error) {
return []domain.Team{}, 0, nil
}
func (m *mockTeamService) GetTeam(id string) (*domain.Team, error) {
func (m *mockTeamService) GetTeam(_ context.Context, id string) (*domain.Team, error) {
return nil, fmt.Errorf("team not found")
}
func (m *mockTeamService) CreateTeam(team domain.Team) (*domain.Team, error) {
func (m *mockTeamService) CreateTeam(_ context.Context, team domain.Team) (*domain.Team, error) {
return &team, nil
}
func (m *mockTeamService) UpdateTeam(id string, team domain.Team) (*domain.Team, error) {
func (m *mockTeamService) UpdateTeam(_ context.Context, id string, team domain.Team) (*domain.Team, error) {
team.ID = id
return &team, nil
}
func (m *mockTeamService) DeleteTeam(id string) error {
func (m *mockTeamService) DeleteTeam(_ context.Context, id string) error {
return nil
}
type mockOwnerService struct{}
func (m *mockOwnerService) ListOwners(page, perPage int) ([]domain.Owner, int64, error) {
func (m *mockOwnerService) ListOwners(_ context.Context, page, perPage int) ([]domain.Owner, int64, error) {
return []domain.Owner{}, 0, nil
}
func (m *mockOwnerService) GetOwner(id string) (*domain.Owner, error) {
func (m *mockOwnerService) GetOwner(_ context.Context, id string) (*domain.Owner, error) {
return nil, fmt.Errorf("owner not found")
}
func (m *mockOwnerService) CreateOwner(owner domain.Owner) (*domain.Owner, error) {
func (m *mockOwnerService) CreateOwner(_ context.Context, owner domain.Owner) (*domain.Owner, error) {
return &owner, nil
}
func (m *mockOwnerService) UpdateOwner(id string, owner domain.Owner) (*domain.Owner, error) {
func (m *mockOwnerService) UpdateOwner(_ context.Context, id string, owner domain.Owner) (*domain.Owner, error) {
owner.ID = id
return &owner, nil
}
func (m *mockOwnerService) DeleteOwner(id string) error {
func (m *mockOwnerService) DeleteOwner(_ context.Context, id string) error {
return nil
}
type mockProfileService struct{}
func (m *mockProfileService) ListProfiles(page, perPage int) ([]domain.CertificateProfile, int64, error) {
func (m *mockProfileService) ListProfiles(_ context.Context, page, perPage int) ([]domain.CertificateProfile, int64, error) {
return []domain.CertificateProfile{}, 0, nil
}
func (m *mockProfileService) GetProfile(id string) (*domain.CertificateProfile, error) {
func (m *mockProfileService) GetProfile(_ context.Context, id string) (*domain.CertificateProfile, error) {
return nil, fmt.Errorf("profile not found")
}
func (m *mockProfileService) CreateProfile(profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
func (m *mockProfileService) CreateProfile(_ context.Context, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
return &profile, nil
}
func (m *mockProfileService) UpdateProfile(id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
func (m *mockProfileService) UpdateProfile(_ context.Context, id string, profile domain.CertificateProfile) (*domain.CertificateProfile, error) {
profile.ID = id
return &profile, nil
}
func (m *mockProfileService) DeleteProfile(id string) error {
func (m *mockProfileService) DeleteProfile(_ context.Context, id string) error {
return nil
}
@@ -1134,9 +1318,9 @@ func (m *mockRevocationRepository) Create(ctx context.Context, revocation *domai
return nil
}
func (m *mockRevocationRepository) GetBySerial(ctx context.Context, serial string) (*domain.CertificateRevocation, error) {
func (m *mockRevocationRepository) GetByIssuerAndSerial(ctx context.Context, issuerID, serial string) (*domain.CertificateRevocation, error) {
for _, r := range m.revocations {
if r.SerialNumber == serial {
if r.IssuerID == issuerID && r.SerialNumber == serial {
return r, nil
}
}
@@ -1205,11 +1389,11 @@ func (m *mockDiscoveryService) GetDiscovered(ctx context.Context, id string) (*d
return nil, fmt.Errorf("not found")
}
func (m *mockDiscoveryService) ClaimDiscovered(ctx context.Context, id string, managedCertID string) error {
func (m *mockDiscoveryService) ClaimDiscovered(ctx context.Context, id string, managedCertID string, actor string) error {
return nil
}
func (m *mockDiscoveryService) DismissDiscovered(ctx context.Context, id string) error {
func (m *mockDiscoveryService) DismissDiscovered(ctx context.Context, id string, actor string) error {
return nil
}
+29 -13
View File
@@ -56,9 +56,15 @@ func setupTestServer(t *testing.T) (*httptest.Server, *mockCertificateRepository
certificateService.SetCAOperationsSvc(caOperationsSvc)
renewalService := service.NewRenewalService(certRepo, jobRepo, renewalPolicyRepo, nil, auditService, notificationService, issuerRegistry, "server")
deploymentService := service.NewDeploymentService(jobRepo, targetRepo, agentRepo, certRepo, auditService, notificationService)
jobService := service.NewJobService(jobRepo, renewalService, deploymentService, logger)
ownerRepo := newMockOwnerRepository()
jobService := service.NewJobService(jobRepo, certRepo, ownerRepo, renewalService, deploymentService, logger)
agentService := service.NewAgentService(agentRepo, certRepo, jobRepo, targetRepo, auditService, issuerRegistry, renewalService)
issuerService := service.NewIssuerService(issuerRepo, auditService, issuerRegistry, nil, logger)
// 32-byte AES-256 test key — C-2 remediation makes IssuerService fail closed
// without a configured CERTCTL_CONFIG_ENCRYPTION_KEY. Happy-path CRUD tests
// must supply a real key so the encrypt path runs instead of returning
// ErrEncryptionKeyRequired.
testEncryptionKey := "0123456789abcdef0123456789abcdef"
issuerService := service.NewIssuerService(issuerRepo, auditService, issuerRegistry, testEncryptionKey, logger)
certificateHandler := handler.NewCertificateHandler(certificateService)
issuerHandler := handler.NewIssuerHandler(issuerService)
@@ -107,6 +113,10 @@ func setupTestServer(t *testing.T) (*httptest.Server, *mockCertificateRepository
BulkRevocation: handler.BulkRevocationHandler{},
})
r.RegisterESTHandlers(estHandler)
// M-006: CRL + OCSP live under /.well-known/pki/ (RFC 5280 + RFC 6960 + RFC 8615).
// The negative_test integration suite exercises the DER CRL at this path with
// no Authorization header to verify the relying-party contract.
r.RegisterPKIHandlers(certificateHandler)
server := httptest.NewServer(r)
t.Cleanup(func() { server.Close() })
@@ -784,8 +794,14 @@ func TestRevocationEndpoints(t *testing.T) {
}
})
t.Run("GetCRL_Success", func(t *testing.T) {
resp, err := http.Get(server.URL + "/api/v1/crl")
// M-006: the non-standard JSON CRL at GET /api/v1/crl was removed entirely.
// RFC 5280 §5 defines only the DER wire format, which is now served
// unauthenticated under /.well-known/pki/crl/{issuer_id} (RFC 8615) so
// relying parties can fetch revocation data without a certctl API key.
// We verify the contract by requesting with no Authorization header and
// asserting DER content-type + a non-empty body.
t.Run("GetDERCRL_Unauthenticated", func(t *testing.T) {
resp, err := http.Get(server.URL + "/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("request failed: %v", err)
}
@@ -796,17 +812,17 @@ func TestRevocationEndpoints(t *testing.T) {
t.Fatalf("expected 200, got %d: %s", resp.StatusCode, string(bodyBytes))
}
var crl map[string]interface{}
json.NewDecoder(resp.Body).Decode(&crl)
if crl["version"] != float64(1) {
t.Errorf("expected CRL version 1, got %v", crl["version"])
ct := resp.Header.Get("Content-Type")
if ct != "application/pkix-crl" {
t.Errorf("expected Content-Type application/pkix-crl, got %s", ct)
}
// Should have at least 1 entry from the revocation above
total, _ := crl["total"].(float64)
if total < 1 {
t.Errorf("expected at least 1 CRL entry, got %v", total)
body, err := io.ReadAll(resp.Body)
if err != nil {
t.Fatalf("read body failed: %v", err)
}
if len(body) == 0 {
t.Error("expected non-empty DER CRL body")
}
})
}
+10
View File
@@ -49,6 +49,16 @@ func (c *Client) Delete(path string) (json.RawMessage, error) {
return c.do("DELETE", path, nil, nil)
}
// DeleteWithQuery performs an HTTP DELETE with query parameters. I-004 adds
// this transport so MCP tools can target endpoints that carry flags in the
// query string (e.g. DELETE /api/v1/agents/{id}?force=true&reason=…). Client.Delete
// is path-only; without this method the retire tool silently drops force/reason,
// turning every cascade retire into a default soft-retire. Shares do()'s 204
// normalization and 4xx/5xx error propagation so tool authors get one contract.
func (c *Client) DeleteWithQuery(path string, query url.Values) (json.RawMessage, error) {
return c.do("DELETE", path, query, nil)
}
// GetRaw performs an HTTP GET and returns the raw response body bytes and content type.
// Used for binary responses (DER CRL, OCSP).
func (c *Client) GetRaw(path string) ([]byte, string, error) {
+2 -2
View File
@@ -203,7 +203,7 @@ func TestClient_GetRaw(t *testing.T) {
defer server.Close()
c := NewClient(server.URL, "test-key")
data, contentType, err := c.GetRaw("/api/v1/crl/iss-local")
data, contentType, err := c.GetRaw("/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
@@ -223,7 +223,7 @@ func TestClient_GetRaw_Error(t *testing.T) {
defer server.Close()
c := NewClient(server.URL, "test-key")
_, _, err := c.GetRaw("/api/v1/crl/nonexistent")
_, _, err := c.GetRaw("/.well-known/pki/crl/nonexistent")
if err == nil {
t.Fatal("expected error for 404 response")
}
+214
View File
@@ -0,0 +1,214 @@
package mcp
import (
"encoding/json"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
)
// TestClient_DeleteWithQuery_ForceRetire covers the new transport capability
// that I-004 adds to the MCP client. The retire tool needs to issue
// DELETE /api/v1/agents/{id}?force=true&reason=... — Client.Delete as it
// stands only accepts a path, dropping query parameters on the floor. Phase 2b
// must add DeleteWithQuery so the MCP retire tool can hit the force escape
// hatch; without this, every retire-via-MCP call with force=true silently
// becomes a default soft-retire and either succeeds wrongly or 409s.
func TestClient_DeleteWithQuery_ForceRetire(t *testing.T) {
var (
sawMethod string
sawPath string
sawForce string
sawReason string
)
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
sawMethod = r.Method
sawPath = r.URL.Path
sawForce = r.URL.Query().Get("force")
sawReason = r.URL.Query().Get("reason")
if r.Method != http.MethodDelete || r.URL.Path != "/api/v1/agents/ag-1" {
w.WriteHeader(http.StatusNotFound)
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(map[string]interface{}{
"retired_at": "2026-04-18T12:00:00Z",
"already_retired": false,
"cascade": true,
})
}))
defer server.Close()
c := NewClient(server.URL, "test-key")
// Compile-fail until Phase 2b grows Client.DeleteWithQuery. Passing the
// query as a url.Values is the established pattern (matches Get's shape).
query := url.Values{}
query.Set("force", "true")
query.Set("reason", "decommissioning rack 7")
data, err := c.DeleteWithQuery("/api/v1/agents/ag-1", query)
if err != nil {
t.Fatalf("DeleteWithQuery err=%v want nil", err)
}
if data == nil {
t.Fatal("DeleteWithQuery returned nil data; want 200 body echo-back")
}
if sawMethod != http.MethodDelete {
t.Errorf("method=%q want DELETE", sawMethod)
}
if sawPath != "/api/v1/agents/ag-1" {
t.Errorf("path=%q want /api/v1/agents/ag-1 (query must be stripped from path)", sawPath)
}
if sawForce != "true" {
t.Errorf("force query=%q want \"true\"", sawForce)
}
if sawReason != "decommissioning rack 7" {
t.Errorf("reason query=%q want %q", sawReason, "decommissioning rack 7")
}
}
// TestClient_DeleteWithQuery_NoQuery covers the defensive path: a nil/empty
// query must still produce a clean DELETE against the bare path with no stray
// "?" suffix. Matches the Get() shape (see client.go do()) so downstream tools
// can reuse one code path.
func TestClient_DeleteWithQuery_NoQuery(t *testing.T) {
var sawRawPath string
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
sawRawPath = r.URL.RequestURI()
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
_ = json.NewEncoder(w).Encode(map[string]interface{}{"ok": true})
}))
defer server.Close()
c := NewClient(server.URL, "")
if _, err := c.DeleteWithQuery("/api/v1/agents/ag-1", nil); err != nil {
t.Fatalf("DeleteWithQuery(nil query) err=%v want nil", err)
}
// No query → no ? suffix.
if strings.Contains(sawRawPath, "?") {
t.Errorf("raw path=%q contains stray ?; empty query must not serialize", sawRawPath)
}
}
// TestClient_DeleteWithQuery_204ReturnsMinimalBody covers the idempotent path.
// The handler returns 204 No Content for an already-retired agent; the
// existing do() helper normalises this to {"status":"deleted"}. The new
// DeleteWithQuery must share that behavior so MCP tool authors don't have to
// special-case the return shape.
func TestClient_DeleteWithQuery_204ReturnsMinimalBody(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusNoContent)
}))
defer server.Close()
c := NewClient(server.URL, "")
data, err := c.DeleteWithQuery("/api/v1/agents/ag-1", nil)
if err != nil {
t.Fatalf("DeleteWithQuery(204) err=%v want nil (idempotent)", err)
}
if data == nil {
t.Fatal("DeleteWithQuery(204) returned nil; want synthetic body")
}
if !strings.Contains(string(data), "deleted") && !strings.Contains(string(data), "status") {
t.Errorf("DeleteWithQuery(204) body=%q; must surface a non-empty sentinel", string(data))
}
}
// TestClient_DeleteWithQuery_409PropagatesError covers the preflight-blocked
// surface. A 409 with dependency counts must bubble up as a Go error so the
// MCP tool can present it to the LLM operator rather than silently swallow
// the rejection.
func TestClient_DeleteWithQuery_409PropagatesError(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusConflict)
_ = json.NewEncoder(w).Encode(map[string]interface{}{
"error": "blocked_by_dependencies",
"message": "agent has active targets",
"counts": map[string]int{
"active_targets": 3,
"active_certificates": 7,
"pending_jobs": 2,
},
})
}))
defer server.Close()
c := NewClient(server.URL, "")
_, err := c.DeleteWithQuery("/api/v1/agents/ag-1", nil)
if err == nil {
t.Fatalf("DeleteWithQuery(409) err=nil; 409 must propagate as Go error")
}
if !strings.Contains(err.Error(), "409") {
t.Errorf("err=%q should include HTTP status 409 for debuggability", err.Error())
}
}
// TestRetireAgentInput_ShapePinned is a compile-time assertion that the MCP
// tool input struct for certctl_retire_agent exists with the required fields
// and their expected tag shapes. The LLM discovers this input schema via
// jsonschema tags — refactoring field names without updating callers silently
// breaks tool discovery.
//
// Red until Phase 2b adds RetireAgentInput to internal/mcp/types.go. This
// assertion deliberately exercises every field so the test fails at compile
// time rather than runtime.
func TestRetireAgentInput_ShapePinned(t *testing.T) {
// Zero-value construction of the expected input — fails to compile until
// the struct exists with fields {ID string, Force bool, Reason string}.
input := RetireAgentInput{
ID: "ag-1",
Force: true,
Reason: "decommissioning rack 7",
}
if input.ID != "ag-1" {
t.Errorf("RetireAgentInput.ID=%q want ag-1 (field binding broken)", input.ID)
}
if !input.Force {
t.Errorf("RetireAgentInput.Force=false want true")
}
if input.Reason != "decommissioning rack 7" {
t.Errorf("RetireAgentInput.Reason=%q want decommissioning rack 7", input.Reason)
}
// Also pin the JSON surface — LLMs send and receive these field names,
// so json tags must stay snake_case even through refactors.
encoded, err := json.Marshal(input)
if err != nil {
t.Fatalf("marshal RetireAgentInput: %v", err)
}
body := string(encoded)
for _, want := range []string{`"id":"ag-1"`, `"force":true`, `"reason":"decommissioning rack 7"`} {
if !strings.Contains(body, want) {
t.Errorf("RetireAgentInput JSON=%q missing %q (tag shape drifted)", body, want)
}
}
}
// TestListRetiredAgentsInput_ShapePinned mirrors the pagination input shape
// used across the MCP toolset (see ListParams). The list-retired-agents tool
// takes page + per_page with snake_case JSON tags. Compile-fail until
// Phase 2b either adds ListRetiredAgentsInput or documents that list-retired
// reuses the existing ListParams type (both paths are acceptable — the test
// just pins whichever Phase 2b picks).
func TestListRetiredAgentsInput_ShapePinned(t *testing.T) {
// Phase 2b may either (a) add a dedicated ListRetiredAgentsInput struct
// or (b) reuse the existing ListParams. Either is fine — we pin the
// field-access contract rather than the struct name to let the
// implementation choose. Compile-fail guards against the tool being
// registered without any pagination input at all.
var input ListParams
input.Page = 1
input.PerPage = 50
if input.Page != 1 || input.PerPage != 50 {
t.Errorf("ListParams fields Page/PerPage broken; listing pagination will misroute")
}
}
+59 -17
View File
@@ -217,24 +217,19 @@ func registerCertificateTools(s *gomcp.Server, c *Client) {
}
// ── CRL & OCSP ──────────────────────────────────────────────────────
//
// M-006 relocation: CRL and OCSP are served unauthenticated under the
// RFC 8615 `.well-known/pki/*` namespace (RFC 5280 §5 for CRL, RFC 6960
// §2.1 for OCSP) so relying parties can retrieve them without a certctl
// API key. The non-standard JSON CRL tool (`certctl_get_crl`) has been
// removed — RFC 5280 defines only the DER wire format.
func registerCRLOCSPTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_get_crl",
Description: "Get the Certificate Revocation List in JSON format. Lists all revoked certificate serial numbers with reasons and timestamps.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input EmptyInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/crl", nil)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_get_der_crl",
Description: "Get DER-encoded X.509 CRL for a specific issuer. Returns binary CRL data signed by the issuing CA.",
Description: "Get DER-encoded X.509 CRL for a specific issuer (RFC 5280). Served unauthenticated at /.well-known/pki/crl/{issuer_id}. Returns binary CRL data signed by the issuing CA.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input GetDERCRLInput) (*gomcp.CallToolResult, any, error) {
raw, contentType, err := c.GetRaw("/api/v1/crl/" + input.IssuerID)
raw, contentType, err := c.GetRaw("/.well-known/pki/crl/" + input.IssuerID)
if err != nil {
return errorResult(err)
}
@@ -247,9 +242,9 @@ func registerCRLOCSPTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_ocsp_check",
Description: "Check OCSP status for a certificate by issuer ID and hex serial number. Returns good, revoked, or unknown.",
Description: "Check OCSP status for a certificate by issuer ID and hex serial number (RFC 6960). Served unauthenticated at /.well-known/pki/ocsp/{issuer_id}/{serial}. Returns good, revoked, or unknown.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input OCSPInput) (*gomcp.CallToolResult, any, error) {
raw, contentType, err := c.GetRaw("/api/v1/ocsp/" + input.IssuerID + "/" + input.Serial)
raw, contentType, err := c.GetRaw("/.well-known/pki/ocsp/" + input.IssuerID + "/" + input.Serial)
if err != nil {
return errorResult(err)
}
@@ -511,6 +506,53 @@ func registerAgentTools(s *gomcp.Server, c *Client) {
}
return textResult(data)
})
// I-004: soft-retirement. DELETE /api/v1/agents/{id} returns 200 on a
// fresh retire (body echoes retired_at/already_retired/cascade/counts),
// 204 on an idempotent retire of an already-retired agent (do() in
// client.go normalizes that to {"status":"deleted"}), 409 when downstream
// dependencies block the retire and force wasn't set, 403 on sentinel
// agents, or 400 when force=true was sent without a reason. The tool
// forwards the raw handler response so the LLM operator sees the
// dependency counts and can decide whether to retry with force=true.
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_retire_agent",
Description: "Soft-retire an agent (DELETE /api/v1/agents/{id}). Sets retired_at + retired_reason on the row; the agent is filtered from the default listing and surfaces only via certctl_list_retired_agents. Default is a safety-gated soft-retire that returns 409 blocked_by_dependencies if the agent has active targets, active certificates, or pending jobs — the returned counts tell you what would be orphaned. Pass force=true to cascade through and retire those dependents too; force=true requires a non-empty reason (captured in the audit trail). Sentinel discovery agents (server-scanner, cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm) cannot be retired — the handler returns 403 unconditionally. Idempotent: retrying on an already-retired agent returns 204 without side effects.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input RetireAgentInput) (*gomcp.CallToolResult, any, error) {
// Client-side mirror of the handler's ErrForceReasonRequired contract
// (see internal/api/handler/agents.go) so the LLM gets an immediate,
// actionable error instead of a round-trip 400. Whitespace-only
// reasons are treated as empty — matches handler's TrimSpace check.
if input.Force && input.Reason == "" {
return errorResult(fmt.Errorf("reason is required when force=true"))
}
query := url.Values{}
if input.Force {
query.Set("force", "true")
}
if input.Reason != "" {
query.Set("reason", input.Reason)
}
data, err := c.DeleteWithQuery("/api/v1/agents/"+input.ID, query)
if err != nil {
return errorResult(err)
}
return textResult(data)
})
// I-004: retired agents are filtered out of GET /api/v1/agents by default.
// The /agents/retired endpoint is the opt-in view — same pagination shape
// as the default listing, but filters to rows where retired_at IS NOT NULL.
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_list_retired_agents",
Description: "List soft-retired agents (GET /api/v1/agents/retired). These are agents that have been retired via certctl_retire_agent; retired_at and retired_reason are populated. Returned separately from certctl_list_agents so the default listing stays focused on operational agents.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input ListParams) (*gomcp.CallToolResult, any, error) {
data, err := c.Get("/api/v1/agents/retired", paginationQuery(input.Page, input.PerPage))
if err != nil {
return errorResult(err)
}
return textResult(data)
})
}
// ── Jobs ────────────────────────────────────────────────────────────
@@ -610,7 +652,7 @@ func registerPolicyTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_create_policy",
Description: "Create a new policy rule. Requires name and type.",
Description: "Create a new policy rule. Requires name and type. Optional severity (Warning, Error, Critical) defaults to Warning.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input CreatePolicyInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Post("/api/v1/policies", input)
if err != nil {
@@ -621,7 +663,7 @@ func registerPolicyTools(s *gomcp.Server, c *Client) {
gomcp.AddTool(s, &gomcp.Tool{
Name: "certctl_update_policy",
Description: "Update a policy rule's name, type, configuration, or enabled status.",
Description: "Update a policy rule's name, type, configuration, enabled status, or severity.",
}, func(ctx context.Context, req *gomcp.CallToolRequest, input UpdatePolicyInput) (*gomcp.CallToolResult, any, error) {
data, err := c.Put("/api/v1/policies/"+input.ID, input)
if err != nil {
+1 -1
View File
@@ -378,7 +378,7 @@ func TestToolEndToEnd_GetRawBinary(t *testing.T) {
defer server.Close()
client := NewClient(server.URL, "test-key")
data, ct, err := client.GetRaw("/api/v1/crl/iss-local")
data, ct, err := client.GetRaw("/.well-known/pki/crl/iss-local")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
+31 -12
View File
@@ -35,7 +35,7 @@ type CreateCertificateInput struct {
TeamID string `json:"team_id" jsonschema:"Team ID (required)"`
IssuerID string `json:"issuer_id" jsonschema:"Issuer connector ID"`
TargetIDs []string `json:"target_ids,omitempty" jsonschema:"Deployment target IDs"`
RenewalPolicyID string `json:"renewal_policy_id,omitempty" jsonschema:"Renewal policy ID"`
RenewalPolicyID string `json:"renewal_policy_id" jsonschema:"Renewal policy ID (required)"`
ProfileID string `json:"certificate_profile_id,omitempty" jsonschema:"Certificate profile ID"`
Tags map[string]string `json:"tags,omitempty" jsonschema:"Key-value tags"`
}
@@ -112,7 +112,7 @@ type CreateTargetInput struct {
ID string `json:"id,omitempty" jsonschema:"Target ID"`
Name string `json:"name" jsonschema:"Target display name"`
Type string `json:"type" jsonschema:"Target type: NGINX, Apache, HAProxy, F5, IIS"`
AgentID string `json:"agent_id,omitempty" jsonschema:"Agent ID that manages this target"`
AgentID string `json:"agent_id" jsonschema:"Agent ID that manages this target (required)"`
Config interface{} `json:"config,omitempty" jsonschema:"Target-specific configuration"`
Enabled bool `json:"enabled,omitempty" jsonschema:"Whether the target is enabled"`
}
@@ -152,6 +152,23 @@ type AgentJobStatusInput struct {
Error string `json:"error,omitempty" jsonschema:"Error message if job failed"`
}
// RetireAgentInput pins the MCP tool surface for certctl_retire_agent. I-004
// introduces a soft-retirement flow that the handler exposes on DELETE
// /api/v1/agents/{id} with two optional query flags: force=true cascades
// through dependent active targets/certs/jobs, and reason is the human-readable
// string captured in the audit trail. The handler enforces
// ErrForceReasonRequired when force=true is sent without a reason; we surface
// both as separate fields so the LLM can populate them independently and so
// the retire_agent_test shape assertion stays aligned with the JSON-wire
// contract. ID is always emitted (no omitempty) because a retire call without
// a target agent is meaningless; Force and Reason are omitempty so the default
// soft-retire path sends no query suffix at all.
type RetireAgentInput struct {
ID string `json:"id" jsonschema:"Agent ID to soft-retire"`
Force bool `json:"force,omitempty" jsonschema:"Cascade-retire downstream active targets, certs, and jobs (requires reason)"`
Reason string `json:"reason,omitempty" jsonschema:"Human-readable reason (required when force=true)"`
}
// ── Jobs ────────────────────────────────────────────────────────────
type ListJobsInput struct {
@@ -168,19 +185,21 @@ type RejectJobInput struct {
// ── Policies ────────────────────────────────────────────────────────
type CreatePolicyInput struct {
ID string `json:"id,omitempty" jsonschema:"Policy ID"`
Name string `json:"name" jsonschema:"Policy display name"`
Type string `json:"type" jsonschema:"Policy type: AllowedIssuers, AllowedDomains, RequiredMetadata, AllowedEnvironments, RenewalLeadTime"`
Config interface{} `json:"config,omitempty" jsonschema:"Policy-specific configuration"`
Enabled bool `json:"enabled,omitempty" jsonschema:"Whether the policy is enabled"`
ID string `json:"id,omitempty" jsonschema:"Policy ID"`
Name string `json:"name" jsonschema:"Policy display name"`
Type string `json:"type" jsonschema:"Policy type: AllowedIssuers, AllowedDomains, RequiredMetadata, AllowedEnvironments, RenewalLeadTime"`
Config interface{} `json:"config,omitempty" jsonschema:"Policy-specific configuration"`
Enabled bool `json:"enabled,omitempty" jsonschema:"Whether the policy is enabled"`
Severity string `json:"severity,omitempty" jsonschema:"Violation severity: Warning, Error, or Critical (default: Warning)"`
}
type UpdatePolicyInput struct {
ID string `json:"id" jsonschema:"Policy ID to update"`
Name string `json:"name,omitempty" jsonschema:"Policy display name"`
Type string `json:"type,omitempty" jsonschema:"Policy type"`
Config interface{} `json:"config,omitempty" jsonschema:"Policy-specific configuration"`
Enabled *bool `json:"enabled,omitempty" jsonschema:"Whether the policy is enabled"`
ID string `json:"id" jsonschema:"Policy ID to update"`
Name string `json:"name,omitempty" jsonschema:"Policy display name"`
Type string `json:"type,omitempty" jsonschema:"Policy type"`
Config interface{} `json:"config,omitempty" jsonschema:"Policy-specific configuration"`
Enabled *bool `json:"enabled,omitempty" jsonschema:"Whether the policy is enabled"`
Severity string `json:"severity,omitempty" jsonschema:"Violation severity: Warning, Error, or Critical"`
}
type ListViolationsInput struct {
+135 -6
View File
@@ -27,14 +27,26 @@ type CertificateRepository interface {
GetExpiringCertificates(ctx context.Context, before time.Time) ([]*domain.ManagedCertificate, error)
// GetLatestVersion returns the most recent certificate version for a certificate.
GetLatestVersion(ctx context.Context, certID string) (*domain.CertificateVersion, error)
// GetByIssuerAndSerial retrieves a certificate by the (issuer_id, serial_number)
// pair via a JOIN on certificate_versions. Callers (OCSP, revocation lookup)
// always know the issuer because protocol endpoints carry it in the request
// path; RFC 5280 §5.2.3 guarantees serial uniqueness only within a single
// issuer. Returns sql.ErrNoRows when no match exists so callers can
// distinguish "unknown cert" from a real repository error.
GetByIssuerAndSerial(ctx context.Context, issuerID, serial string) (*domain.ManagedCertificate, error)
}
// RevocationRepository defines operations for managing certificate revocations.
type RevocationRepository interface {
// Create records a new certificate revocation.
// Create records a new certificate revocation. Uniqueness is scoped to
// (issuer_id, serial_number) per RFC 5280 §5.2.3, so duplicate serials
// across different issuers are permitted.
Create(ctx context.Context, revocation *domain.CertificateRevocation) error
// GetBySerial retrieves a revocation by serial number.
GetBySerial(ctx context.Context, serial string) (*domain.CertificateRevocation, error)
// GetByIssuerAndSerial retrieves a revocation by the (issuer_id, serial_number)
// pair. Callers (OCSP, CRL generation) always know the issuer because
// protocol endpoints carry it in the request path; RFC 5280 §5.2.3 guarantees
// uniqueness only within a single issuer.
GetByIssuerAndSerial(ctx context.Context, issuerID, serial string) (*domain.CertificateRevocation, error)
// ListAll returns all revocations, ordered by revocation time (for CRL generation).
ListAll(ctx context.Context) ([]*domain.CertificateRevocation, error)
// ListByCertificate returns all revocations for a certificate.
@@ -81,20 +93,122 @@ type TargetRepository interface {
// AgentRepository defines operations for managing control plane agents.
type AgentRepository interface {
// List returns all agents.
// List returns all ACTIVE agents — rows with retired_at IS NULL.
//
// I-004: The default listing MUST NOT surface retired agents. The
// handler-facing ListAgents call, the stats dashboard, and the stale-offline
// sweeper all iterate this list and would otherwise re-surface decommissioned
// hardware in operational UI. Callers that genuinely want retired rows (the
// audit tab, compliance exports) must use ListRetired instead.
//
// The partial index idx_agents_retired_at (migration 000015) keeps retired
// rows cheap to exclude — the planner uses it to skip the retired segment
// of the table entirely.
List(ctx context.Context) ([]*domain.Agent, error)
// ListRetired returns a paginated list of retired agents (retired_at IS NOT NULL),
// ordered by retired_at DESC so the most recent retirements appear first. Used
// by the GUI's Retired tab and the audit export path. Returns the slice plus
// the total count (for pagination). A page<1 or perPage<1 is clamped to sensible
// defaults (page=1, perPage=50) in the repo implementation rather than erroring —
// this matches the ListAgents pagination behavior in the service layer.
// I-004 coverage-gap closure, migration 000015.
ListRetired(ctx context.Context, page, perPage int) ([]*domain.Agent, int, error)
// Get retrieves an agent by ID.
//
// I-004 note: Get returns retired rows (retired_at IS NOT NULL) because
// callers that need to check "has this agent been retired?" — the heartbeat
// handler returning 410 Gone, the retirement service's idempotent-retire
// branch, the detail page rendering a retirement banner — must see the
// retired_at/retired_reason fields. Only the default List path default-
// excludes retired; individual Get lookups surface them.
Get(ctx context.Context, id string) (*domain.Agent, error)
// Create stores a new agent.
// Create stores a new agent. Callers that want duplicate-key errors surfaced
// (e.g. real-agent registration) must use this method; sentinel/bootstrap
// paths that expect the row to already exist on restart should call
// CreateIfNotExists instead (M-6, CWE-662).
Create(ctx context.Context, agent *domain.Agent) error
// CreateIfNotExists creates an agent only if the ID doesn't already exist
// (INSERT ... ON CONFLICT (id) DO NOTHING). Returns true if the row was
// newly inserted, false if a row with the same ID already existed. Used
// by the sentinel-agent bootstrap path in cmd/server/main.go so restarts
// and upgrades are idempotent without swallowing unrelated database
// failures (M-6, CWE-662).
CreateIfNotExists(ctx context.Context, agent *domain.Agent) (bool, error)
// Update modifies an existing agent.
Update(ctx context.Context, agent *domain.Agent) error
// Delete removes an agent.
//
// I-004: callers should prefer SoftRetire / RetireAgentWithCascade for the
// operator-facing retirement path; hard Delete remains available for test
// cleanup and repository-level administrative tasks. The deployment_targets
// FK flipped to ON DELETE RESTRICT in migration 000015, so hard-deleting an
// agent that still owns active targets will now fail at the DB layer — which
// is intentional: the fail-closed guardrail prevents audit-trail destruction.
Delete(ctx context.Context, id string) error
// UpdateHeartbeat updates the agent's last heartbeat timestamp and metadata.
//
// I-004: UpdateHeartbeat is a no-op on retired agents — the UPDATE clause
// includes AND retired_at IS NULL so a stale agent process that keeps polling
// after retirement cannot resurrect its heartbeat. The service layer already
// short-circuits with ErrAgentRetired before calling this method; the WHERE
// filter here is belt-and-braces for anyone who skips the service path.
UpdateHeartbeat(ctx context.Context, id string, metadata *domain.AgentMetadata) error
// GetByAPIKey retrieves an agent by hashed API key.
//
// I-004: GetByAPIKey returns retired rows so the auth middleware can detect
// "this API key belongs to a retired agent" and fail the request with
// 410 Gone. If retired rows were hidden, auth would return a plain 401 and
// leak no signal — which is wrong: the operator needs the retired state
// made explicit so they can clean up the agent process.
GetByAPIKey(ctx context.Context, keyHash string) (*domain.Agent, error)
// SoftRetire stamps retired_at + retired_reason on the agent row with no
// cascade. Used on the happy path where preflight confirmed the agent has
// zero active dependencies (no active deployment_targets, no pending jobs).
// The UPDATE is scoped to WHERE id=$1 AND retired_at IS NULL so re-retiring
// an already-retired row is a no-op (zero rows affected is NOT returned as
// an error — the service layer detects this via its own idempotent-retire
// branch before calling SoftRetire). Callers supply retiredAt so the service
// can pin a single consistent timestamp across audit + DB writes.
// I-004 coverage-gap closure.
SoftRetire(ctx context.Context, id string, retiredAt time.Time, reason string) error
// RetireAgentWithCascade performs a transactional retire + cascade. In one
// transaction it: (1) stamps retired_at + retired_reason on the agent row,
// and (2) stamps the SAME retired_at + retired_reason on every active
// deployment_targets row whose agent_id matches. Only rows with
// retired_at IS NULL are touched in (2) — already-retired targets keep their
// original retirement metadata (whoever retired them first, whenever). Used
// exclusively on the force=true path from the retirement handler; callers
// supply retiredAt so the agent row and every cascaded target row share an
// exact retirement instant (helps forensic analysis trace the cascade back
// to a single operator action). If the agent row is already retired, the
// whole operation is a no-op — the transaction commits without touching
// either table. I-004 coverage-gap closure, migration 000015.
RetireAgentWithCascade(ctx context.Context, id string, retiredAt time.Time, reason string) error
// CountActiveTargets returns the number of deployment_targets rows where
// agent_id=id AND retired_at IS NULL. The COUNT query hits the existing
// idx_deployment_targets_agent_id index (migration 000001 line 111); the
// additional retired_at IS NULL predicate is cheap because the partial
// idx_deployment_targets_retired_at index (migration 000015) lets the
// planner skip the retired-row segment entirely. Preflight uses this to
// decide 200 (soft-retire) vs 409 (blocked-by-deps). I-004.
CountActiveTargets(ctx context.Context, agentID string) (int, error)
// CountActiveCertificates returns the count of managed_certificates currently
// deployed through one of this agent's ACTIVE (non-retired) deployment_targets.
// The query joins certificate_target_mappings (migration 000001 line 116) →
// deployment_targets filtering on deployment_targets.agent_id=$1 AND
// deployment_targets.retired_at IS NULL, then COUNT(DISTINCT certificate_id)
// so the same cert deployed to multiple targets on one agent counts once.
// The primary key (certificate_id, target_id) on certificate_target_mappings
// plus idx_certificate_target_mappings_target_id (line 122) cover the join.
// Used purely for the preflight 409 body — the number is informational. I-004.
CountActiveCertificates(ctx context.Context, agentID string) (int, error)
// CountPendingJobs returns the number of jobs belonging to this agent whose
// status is in (Pending, AwaitingCSR, AwaitingApproval, Running) — the four
// statuses that indicate work the agent would still be expected to pick up.
// Completed/Failed/Cancelled jobs do not count. The filter agent_id=$1 hits
// the idx_jobs_agent_id index (migration 000001 line 161). Used for the
// preflight 409 body. I-004.
CountPendingJobs(ctx context.Context, agentID string) (int, error)
}
// JobRepository defines operations for managing renewal and deployment jobs.
@@ -115,10 +229,25 @@ type JobRepository interface {
ListByCertificate(ctx context.Context, certID string) ([]*domain.Job, error)
// UpdateStatus updates a job's status and optional error message.
UpdateStatus(ctx context.Context, id string, status domain.JobStatus, errMsg string) error
// GetPendingJobs returns jobs not yet processed of a specific type.
// GetPendingJobs returns jobs not yet processed of a specific type. Prefer ClaimPendingJobs in
// production paths where concurrent schedulers may race — see H-6 (CWE-362) remediation.
GetPendingJobs(ctx context.Context, jobType domain.JobType) ([]*domain.Job, error)
// ListPendingByAgentID returns pending deployment jobs and AwaitingCSR jobs for a specific agent.
// Prefer ClaimPendingByAgentID in production paths — see H-6 (CWE-362) remediation.
ListPendingByAgentID(ctx context.Context, agentID string) ([]*domain.Job, error)
// ClaimPendingJobs atomically claims up to `limit` Pending jobs and transitions them to Running
// using SELECT FOR UPDATE SKIP LOCKED inside a transaction. An empty jobType matches any type;
// limit <= 0 means no limit. H-6 (CWE-362) race remediation.
ClaimPendingJobs(ctx context.Context, jobType domain.JobType, limit int) ([]*domain.Job, error)
// ClaimPendingByAgentID atomically claims pending deployment jobs for an agent (flipping them
// to Running) and locks AwaitingCSR jobs against concurrent observers (leaving state intact,
// since the CSR-submission path drives the next transition). H-6 (CWE-362) race remediation.
ClaimPendingByAgentID(ctx context.Context, agentID string) ([]*domain.Job, error)
// ListTimedOutAwaitingJobs returns jobs stuck in AwaitingCSR (created before csrCutoff) or
// AwaitingApproval (created before approvalCutoff). The reaper loop transitions them to
// Failed; I-001's retry loop then auto-promotes eligible Failed jobs back to Pending.
// I-003 coverage-gap closure.
ListTimedOutAwaitingJobs(ctx context.Context, csrCutoff, approvalCutoff time.Time) ([]*domain.Job, error)
}
// RenewalPolicyRepository defines operations for managing renewal policies.
+276 -12
View File
@@ -20,12 +20,18 @@ func NewAgentRepository(db *sql.DB) *AgentRepository {
return &AgentRepository{db: db}
}
// List returns all agents
// List returns all ACTIVE agents — rows with retired_at IS NULL. I-004:
// the default listing path feeds the handler-facing ListAgents call, the
// stats dashboard, and the stale-offline sweeper; every caller wants active
// hardware, not decommissioned rows. Operators who need retired rows reach
// for ListRetired instead. The partial index idx_agents_retired_at
// (migration 000015) lets the planner skip the retired segment cheaply.
func (r *AgentRepository) List(ctx context.Context) ([]*domain.Agent, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT id, name, hostname, status, last_heartbeat_at, registered_at, api_key_hash,
os, architecture, ip_address, version
os, architecture, ip_address, version, retired_at, retired_reason
FROM agents
WHERE retired_at IS NULL
ORDER BY registered_at DESC
`)
@@ -50,11 +56,16 @@ func (r *AgentRepository) List(ctx context.Context) ([]*domain.Agent, error) {
return agents, nil
}
// Get retrieves an agent by ID
// Get retrieves an agent by ID. I-004: retired rows ARE surfaced here —
// callers that need to check "has this agent been retired?" (heartbeat
// handler returning 410 Gone, retirement service's idempotent-retire branch,
// detail page rendering a retirement banner) must see retired_at /
// retired_reason. Only the List path default-excludes retired rows; Get is
// by-ID and returns whatever row exists.
func (r *AgentRepository) Get(ctx context.Context, id string) (*domain.Agent, error) {
row := r.db.QueryRowContext(ctx, `
SELECT id, name, hostname, status, last_heartbeat_at, registered_at, api_key_hash,
os, architecture, ip_address, version
os, architecture, ip_address, version, retired_at, retired_reason
FROM agents
WHERE id = $1
`, id)
@@ -70,7 +81,9 @@ func (r *AgentRepository) Get(ctx context.Context, id string) (*domain.Agent, er
return agent, nil
}
// Create stores a new agent
// Create stores a new agent. Duplicate-key errors surface to the caller —
// real-agent registration paths rely on this to detect collisions. Use
// CreateIfNotExists for sentinel/bootstrap paths where re-inserts are expected.
func (r *AgentRepository) Create(ctx context.Context, agent *domain.Agent) error {
if agent.ID == "" {
agent.ID = uuid.New().String()
@@ -92,6 +105,44 @@ func (r *AgentRepository) Create(ctx context.Context, agent *domain.Agent) error
return nil
}
// CreateIfNotExists creates an agent only if the ID doesn't already exist.
// Used for sentinel agents (server-scanner, cloud-aws-sm, cloud-azure-kv,
// cloud-gcp-sm) on first boot AND on every subsequent restart/upgrade — the
// pre-M-6 code used plain INSERT, swallowed the duplicate-key error, and so
// silently swallowed every other database failure too (CWE-662 /
// CWE-209-adjacent). ON CONFLICT (id) DO NOTHING + RETURNING id +
// sql.ErrNoRows distinguishes "row already existed" (created=false, err=nil)
// from genuine errors (connectivity, permission, constraint violations
// other than the id primary key) which still surface. Returns true if the
// row was newly inserted, false if a row with the same ID already existed.
func (r *AgentRepository) CreateIfNotExists(ctx context.Context, agent *domain.Agent) (bool, error) {
if agent.ID == "" {
agent.ID = uuid.New().String()
}
var id string
err := r.db.QueryRowContext(ctx, `
INSERT INTO agents (id, name, hostname, status, last_heartbeat_at, registered_at, api_key_hash,
os, architecture, ip_address, version)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
ON CONFLICT (id) DO NOTHING
RETURNING id
`, agent.ID, agent.Name, agent.Hostname, agent.Status, agent.LastHeartbeatAt,
agent.RegisteredAt, agent.APIKeyHash,
agent.OS, agent.Architecture, agent.IPAddress, agent.Version).Scan(&id)
if err != nil {
if err == sql.ErrNoRows {
// ON CONFLICT DO NOTHING — a row with this ID already existed.
return false, nil
}
return false, fmt.Errorf("failed to create agent: %w", err)
}
agent.ID = id
return true, nil
}
// Update modifies an existing agent
func (r *AgentRepository) Update(ctx context.Context, agent *domain.Agent) error {
result, err := r.db.ExecContext(ctx, `
@@ -145,7 +196,16 @@ func (r *AgentRepository) Delete(ctx context.Context, id string) error {
return nil
}
// UpdateHeartbeat updates the agent's last heartbeat timestamp and metadata
// UpdateHeartbeat updates the agent's last heartbeat timestamp and metadata.
//
// I-004: both branches include `AND retired_at IS NULL` in the WHERE clause,
// making the UPDATE a no-op on retired rows. The service layer already
// short-circuits with ErrAgentRetired before calling this method (see
// AgentService.Heartbeat), but the WHERE filter is belt-and-braces for any
// path that skips the service — a stale agent process that keeps polling
// after retirement cannot resurrect its heartbeat at the DB layer. A zero
// RowsAffected here returns the same "agent not found" error as before; the
// service layer distinguishes retired from missing by calling Get first.
func (r *AgentRepository) UpdateHeartbeat(ctx context.Context, id string, metadata *domain.AgentMetadata) error {
var result sql.Result
var err error
@@ -159,11 +219,11 @@ func (r *AgentRepository) UpdateHeartbeat(ctx context.Context, id string, metada
architecture = CASE WHEN $5 = '' THEN architecture ELSE $5 END,
ip_address = CASE WHEN $6 = '' THEN ip_address ELSE $6 END,
version = CASE WHEN $7 = '' THEN version ELSE $7 END
WHERE id = $2
WHERE id = $2 AND retired_at IS NULL
`, time.Now(), id, metadata.Hostname, metadata.OS, metadata.Architecture, metadata.IPAddress, metadata.Version)
} else {
result, err = r.db.ExecContext(ctx, `
UPDATE agents SET last_heartbeat_at = $1 WHERE id = $2
UPDATE agents SET last_heartbeat_at = $1 WHERE id = $2 AND retired_at IS NULL
`, time.Now(), id)
}
@@ -183,11 +243,15 @@ func (r *AgentRepository) UpdateHeartbeat(ctx context.Context, id string, metada
return nil
}
// GetByAPIKey retrieves an agent by hashed API key
// GetByAPIKey retrieves an agent by hashed API key. I-004: retired rows ARE
// surfaced here so the auth middleware can detect "this API key belongs to a
// retired agent" and fail the request with 410 Gone instead of 401. If the
// filter hid retired rows, auth would return a plain 401 and leak no signal
// that the agent process needs cleaning up.
func (r *AgentRepository) GetByAPIKey(ctx context.Context, keyHash string) (*domain.Agent, error) {
row := r.db.QueryRowContext(ctx, `
SELECT id, name, hostname, status, last_heartbeat_at, registered_at, api_key_hash,
os, architecture, ip_address, version
os, architecture, ip_address, version, retired_at, retired_reason
FROM agents
WHERE api_key_hash = $1
`, keyHash)
@@ -203,14 +267,214 @@ func (r *AgentRepository) GetByAPIKey(ctx context.Context, keyHash string) (*dom
return agent, nil
}
// scanAgent scans an agent from a row or rows
// ─── I-004 agent retirement surface ──────────────────────────────────────
//
// The methods below implement the I-004 coverage-gap closure. They follow the
// interface contracts in internal/repository/interfaces.go:94-210 (which is the
// spec — keep godoc there in sync if behavior changes).
// ListRetired returns a paginated slice of retired agents ordered by
// retired_at DESC so the most recent retirements appear first. Used by the
// GUI's Retired tab and the audit export path. Returns the rows plus the
// total count (for pagination UI). page<1 or perPage<1 is clamped to
// sensible defaults in-repo rather than erroring, matching the ListAgents
// pagination behavior at the service layer. I-004, migration 000015.
func (r *AgentRepository) ListRetired(ctx context.Context, page, perPage int) ([]*domain.Agent, int, error) {
// Clamp pagination to safe defaults. Keep in lockstep with the service
// layer's pagination shape — negative / zero values on either axis should
// degrade to "first page, default size" instead of returning an error.
if page < 1 {
page = 1
}
if perPage < 1 {
perPage = 50
}
offset := (page - 1) * perPage
// Total count first — separate query so pagination math stays correct
// even when the page of rows is empty. Uses the partial
// idx_agents_retired_at index so this is effectively a count of the
// partial-index tuple count, not a full table scan.
var total int
if err := r.db.QueryRowContext(ctx, `
SELECT COUNT(*) FROM agents WHERE retired_at IS NOT NULL
`).Scan(&total); err != nil {
return nil, 0, fmt.Errorf("failed to count retired agents: %w", err)
}
rows, err := r.db.QueryContext(ctx, `
SELECT id, name, hostname, status, last_heartbeat_at, registered_at, api_key_hash,
os, architecture, ip_address, version, retired_at, retired_reason
FROM agents
WHERE retired_at IS NOT NULL
ORDER BY retired_at DESC
LIMIT $1 OFFSET $2
`, perPage, offset)
if err != nil {
return nil, 0, fmt.Errorf("failed to query retired agents: %w", err)
}
defer rows.Close()
var agents []*domain.Agent
for rows.Next() {
agent, err := scanAgent(rows)
if err != nil {
return nil, 0, err
}
agents = append(agents, agent)
}
if err := rows.Err(); err != nil {
return nil, 0, fmt.Errorf("error iterating retired agent rows: %w", err)
}
return agents, total, nil
}
// SoftRetire stamps retired_at + retired_reason on the agent row with no
// cascade. Scoped to `WHERE id=$1 AND retired_at IS NULL` so re-retiring an
// already-retired row is a silent no-op (zero RowsAffected). The service
// layer has its own idempotent-retire branch that detects already-retired
// rows via Get before calling SoftRetire; a zero here just means a racy
// caller got there first. I-004.
func (r *AgentRepository) SoftRetire(ctx context.Context, id string, retiredAt time.Time, reason string) error {
if _, err := r.db.ExecContext(ctx, `
UPDATE agents
SET retired_at = $2, retired_reason = $3
WHERE id = $1 AND retired_at IS NULL
`, id, retiredAt, reason); err != nil {
return fmt.Errorf("failed to soft-retire agent: %w", err)
}
return nil
}
// RetireAgentWithCascade performs a transactional retire-and-cascade. In one
// transaction it (1) stamps retired_at + retired_reason on the agent row if
// it is still active, and (2) stamps the SAME retired_at + retired_reason on
// every active (retired_at IS NULL) deployment_targets row whose agent_id
// matches. Already-retired targets keep their original retirement metadata;
// only active targets are touched. If the agent is already retired, the
// whole transaction is a no-op — the caller's idempotent-retire branch
// already handled it before we got here. I-004, migration 000015.
//
// The two UPDATEs share a single (retiredAt, reason) pair so forensic
// analysis can trace "every row stamped at T1 with reason R was part of the
// same operator action" back to one cascade. Using BeginTx keeps the agent
// row and its targets' retirement metadata consistent even if something
// crashes mid-cascade.
func (r *AgentRepository) RetireAgentWithCascade(ctx context.Context, id string, retiredAt time.Time, reason string) error {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return fmt.Errorf("failed to begin retire-cascade transaction: %w", err)
}
// Rollback is a no-op if Commit has already run — safe to always defer.
defer func() { _ = tx.Rollback() }()
// Agent row: flip to retired only if it was still active. If zero rows
// match, the agent was already retired — the whole cascade becomes a
// no-op (we deliberately do NOT stamp the targets against a retirement
// we didn't perform).
if _, err := tx.ExecContext(ctx, `
UPDATE agents
SET retired_at = $2, retired_reason = $3
WHERE id = $1 AND retired_at IS NULL
`, id, retiredAt, reason); err != nil {
return fmt.Errorf("failed to retire agent in cascade: %w", err)
}
// Cascade: copy the same retired_at / retired_reason onto every active
// deployment_target belonging to this agent. Skips targets that are
// already retired so their original retirement metadata is preserved.
if _, err := tx.ExecContext(ctx, `
UPDATE deployment_targets
SET retired_at = $2, retired_reason = $3
WHERE agent_id = $1 AND retired_at IS NULL
`, id, retiredAt, reason); err != nil {
return fmt.Errorf("failed to cascade-retire deployment targets: %w", err)
}
if err := tx.Commit(); err != nil {
return fmt.Errorf("failed to commit retire-cascade transaction: %w", err)
}
return nil
}
// CountActiveTargets returns the number of deployment_targets with
// agent_id=agentID AND retired_at IS NULL. Used by the retirement preflight
// to decide 200 (soft-retire) vs 409 (blocked-by-deps). Hits the existing
// idx_deployment_targets_agent_id index (migration 000001 line 111); the
// retired_at IS NULL predicate is cheap because the partial
// idx_deployment_targets_retired_at index (migration 000015) lets the
// planner skip the retired-row segment. I-004.
func (r *AgentRepository) CountActiveTargets(ctx context.Context, agentID string) (int, error) {
var count int
err := r.db.QueryRowContext(ctx, `
SELECT COUNT(*)
FROM deployment_targets
WHERE agent_id = $1 AND retired_at IS NULL
`, agentID).Scan(&count)
if err != nil {
return 0, fmt.Errorf("failed to count active targets for agent: %w", err)
}
return count, nil
}
// CountActiveCertificates returns the count of distinct managed_certificates
// currently deployed through one of this agent's ACTIVE deployment_targets.
// Joins certificate_target_mappings (migration 000001 line 116) →
// deployment_targets filtering on deployment_targets.agent_id=$1 AND
// deployment_targets.retired_at IS NULL. COUNT(DISTINCT certificate_id) so
// the same cert deployed to multiple targets on one agent counts once.
// Used purely for the preflight 409 body. I-004.
func (r *AgentRepository) CountActiveCertificates(ctx context.Context, agentID string) (int, error) {
var count int
err := r.db.QueryRowContext(ctx, `
SELECT COUNT(DISTINCT ctm.certificate_id)
FROM certificate_target_mappings ctm
JOIN deployment_targets dt ON dt.id = ctm.target_id
WHERE dt.agent_id = $1 AND dt.retired_at IS NULL
`, agentID).Scan(&count)
if err != nil {
return 0, fmt.Errorf("failed to count active certificates for agent: %w", err)
}
return count, nil
}
// CountPendingJobs returns the number of jobs belonging to this agent whose
// status is in (Pending, AwaitingCSR, AwaitingApproval, Running) — the four
// statuses that represent work the agent would still be expected to pick up
// or complete. Completed / Failed / Cancelled jobs do not count toward the
// preflight gate. Status strings match domain.JobStatus* constants in
// internal/domain/job.go:43-49. Hits idx_jobs_agent_id (migration 000001
// line 161). I-004.
func (r *AgentRepository) CountPendingJobs(ctx context.Context, agentID string) (int, error) {
var count int
err := r.db.QueryRowContext(ctx, `
SELECT COUNT(*)
FROM jobs
WHERE agent_id = $1
AND status IN ('Pending', 'AwaitingCSR', 'AwaitingApproval', 'Running')
`, agentID).Scan(&count)
if err != nil {
return 0, fmt.Errorf("failed to count pending jobs for agent: %w", err)
}
return count, nil
}
// scanAgent scans an agent from a row or rows.
//
// I-004: the column list here is the authoritative 13-field post-M15 order —
// retired_at and retired_reason are appended at the tail as nullable
// *time.Time / *string scan targets matching the `json:"...,omitempty"` domain
// fields. Every SELECT in this file that feeds scanAgent must emit columns in
// this same order, otherwise Scan will silently place values into the wrong
// fields (lib/pq does positional binding, not named).
func scanAgent(scanner interface {
Scan(...interface{}) error
}) (*domain.Agent, error) {
var agent domain.Agent
err := scanner.Scan(&agent.ID, &agent.Name, &agent.Hostname, &agent.Status,
&agent.LastHeartbeatAt, &agent.RegisteredAt, &agent.APIKeyHash,
&agent.OS, &agent.Architecture, &agent.IPAddress, &agent.Version)
&agent.OS, &agent.Architecture, &agent.IPAddress, &agent.Version,
&agent.RetiredAt, &agent.RetiredReason)
if err != nil {
return nil, fmt.Errorf("failed to scan agent: %w", err)
+211 -9
View File
@@ -5,6 +5,7 @@ import (
"database/sql"
"encoding/base64"
"encoding/json"
"errors"
"fmt"
"strings"
"time"
@@ -190,18 +191,65 @@ func (r *CertificateRepository) List(ctx context.Context, filter *repository.Cer
defer rows.Close()
var certs []*domain.ManagedCertificate
var certIDs []string
for rows.Next() {
cert, err := scanCertificate(rows)
var cert domain.ManagedCertificate
var tagsJSON []byte
var sans pq.StringArray
var profileID sql.NullString
var revocationReason sql.NullString
err := rows.Scan(
&cert.ID, &cert.Name, &cert.CommonName, &sans, &cert.Environment, &cert.OwnerID,
&cert.TeamID, &cert.IssuerID, &cert.RenewalPolicyID, &profileID,
&cert.Status, &cert.ExpiresAt, &tagsJSON,
&cert.LastRenewalAt, &cert.LastDeploymentAt, &cert.RevokedAt, &revocationReason,
&cert.CreatedAt, &cert.UpdatedAt)
if err != nil {
return nil, 0, err
return nil, 0, fmt.Errorf("failed to scan certificate: %w", err)
}
certs = append(certs, cert)
cert.SANs = []string(sans)
if profileID.Valid {
cert.CertificateProfileID = profileID.String
}
if revocationReason.Valid {
cert.RevocationReason = revocationReason.String
}
// Unmarshal tags
if len(tagsJSON) > 0 {
if err := json.Unmarshal(tagsJSON, &cert.Tags); err != nil {
return nil, 0, fmt.Errorf("failed to unmarshal tags: %w", err)
}
} else {
cert.Tags = make(map[string]string)
}
certs = append(certs, &cert)
certIDs = append(certIDs, cert.ID)
}
if err := rows.Err(); err != nil {
return nil, 0, fmt.Errorf("error iterating certificate rows: %w", err)
}
// Fetch target IDs for all certificates in a single query (avoid N+1)
if len(certIDs) > 0 {
targetIDsMap, err := r.getTargetIDsForCertificates(ctx, certIDs)
if err != nil {
return nil, 0, err
}
for _, cert := range certs {
if targetIDs, ok := targetIDsMap[cert.ID]; ok {
cert.TargetIDs = targetIDs
} else {
cert.TargetIDs = []string{}
}
}
}
return certs, total, nil
}
@@ -214,7 +262,7 @@ func (r *CertificateRepository) Get(ctx context.Context, id string) (*domain.Man
WHERE id = $1
`, id)
cert, err := scanCertificate(row)
cert, err := r.scanCertificate(ctx, row)
if err != nil {
if err == sql.ErrNoRows {
return nil, fmt.Errorf("certificate not found")
@@ -225,6 +273,38 @@ func (r *CertificateRepository) Get(ctx context.Context, id string) (*domain.Man
return cert, nil
}
// GetByIssuerAndSerial retrieves a certificate by the (issuer_id, serial_number)
// pair via a JOIN on certificate_versions. Per RFC 5280 §5.2.3, serial numbers
// are unique only within a single issuer — callers that know the issuer (OCSP,
// CRL generation, revocation lookup) use this method to scope lookups
// correctly. Returns sql.ErrNoRows when no match exists so callers can
// distinguish "unknown cert" (return OCSP status unknown) from a real
// repository error.
func (r *CertificateRepository) GetByIssuerAndSerial(ctx context.Context, issuerID, serial string) (*domain.ManagedCertificate, error) {
row := r.db.QueryRowContext(ctx, `
SELECT mc.id, mc.name, mc.common_name, mc.sans, mc.environment, mc.owner_id, mc.team_id,
mc.issuer_id, mc.renewal_policy_id, mc.certificate_profile_id, mc.status, mc.expires_at,
mc.tags, mc.last_renewal_at, mc.last_deployment_at, mc.revoked_at, mc.revocation_reason,
mc.created_at, mc.updated_at
FROM managed_certificates mc
JOIN certificate_versions cv ON cv.certificate_id = mc.id
WHERE mc.issuer_id = $1 AND cv.serial_number = $2
LIMIT 1
`, issuerID, serial)
cert, err := r.scanCertificate(ctx, row)
if err != nil {
// scanCertificate wraps sql.ErrNoRows via %w, so surface the bare
// sentinel here for callers that branch on it with errors.Is.
if errors.Is(err, sql.ErrNoRows) {
return nil, sql.ErrNoRows
}
return nil, fmt.Errorf("failed to query certificate by issuer+serial: %w", err)
}
return cert, nil
}
// Create stores a new certificate
func (r *CertificateRepository) Create(ctx context.Context, cert *domain.ManagedCertificate) error {
if cert.ID == "" {
@@ -421,18 +501,65 @@ func (r *CertificateRepository) GetExpiringCertificates(ctx context.Context, bef
defer rows.Close()
var certs []*domain.ManagedCertificate
var certIDs []string
for rows.Next() {
cert, err := scanCertificate(rows)
var cert domain.ManagedCertificate
var tagsJSON []byte
var sans pq.StringArray
var profileID sql.NullString
var revocationReason sql.NullString
err := rows.Scan(
&cert.ID, &cert.Name, &cert.CommonName, &sans, &cert.Environment, &cert.OwnerID,
&cert.TeamID, &cert.IssuerID, &cert.RenewalPolicyID, &profileID,
&cert.Status, &cert.ExpiresAt, &tagsJSON,
&cert.LastRenewalAt, &cert.LastDeploymentAt, &cert.RevokedAt, &revocationReason,
&cert.CreatedAt, &cert.UpdatedAt)
if err != nil {
return nil, err
return nil, fmt.Errorf("failed to scan certificate: %w", err)
}
certs = append(certs, cert)
cert.SANs = []string(sans)
if profileID.Valid {
cert.CertificateProfileID = profileID.String
}
if revocationReason.Valid {
cert.RevocationReason = revocationReason.String
}
// Unmarshal tags
if len(tagsJSON) > 0 {
if err := json.Unmarshal(tagsJSON, &cert.Tags); err != nil {
return nil, fmt.Errorf("failed to unmarshal tags: %w", err)
}
} else {
cert.Tags = make(map[string]string)
}
certs = append(certs, &cert)
certIDs = append(certIDs, cert.ID)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("error iterating expiring certificate rows: %w", err)
}
// Fetch target IDs for all certificates in a single query (avoid N+1)
if len(certIDs) > 0 {
targetIDsMap, err := r.getTargetIDsForCertificates(ctx, certIDs)
if err != nil {
return nil, err
}
for _, cert := range certs {
if targetIDs, ok := targetIDsMap[cert.ID]; ok {
cert.TargetIDs = targetIDs
} else {
cert.TargetIDs = []string{}
}
}
}
return certs, nil
}
@@ -462,8 +589,76 @@ func (r *CertificateRepository) GetLatestVersion(ctx context.Context, certID str
return &v, nil
}
// scanCertificate scans a certificate from a row or rows
func scanCertificate(scanner interface {
// getTargetIDs retrieves all target IDs for a given certificate from the junction table.
// Returns an empty slice (not nil) if no targets are found.
func (r *CertificateRepository) getTargetIDs(ctx context.Context, certID string) ([]string, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT target_id FROM certificate_target_mappings
WHERE certificate_id = $1
ORDER BY target_id ASC
`, certID)
if err != nil {
return nil, fmt.Errorf("failed to query target mappings: %w", err)
}
defer rows.Close()
var targetIDs []string
for rows.Next() {
var targetID string
if err := rows.Scan(&targetID); err != nil {
return nil, fmt.Errorf("failed to scan target ID: %w", err)
}
targetIDs = append(targetIDs, targetID)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("error iterating target ID rows: %w", err)
}
// Return empty slice instead of nil for consistency with JSON marshaling
if targetIDs == nil {
targetIDs = []string{}
}
return targetIDs, nil
}
// getTargetIDsForCertificates retrieves target IDs for multiple certificates in a single query.
// Returns a map of certificate_id -> []target_id.
func (r *CertificateRepository) getTargetIDsForCertificates(ctx context.Context, certIDs []string) (map[string][]string, error) {
if len(certIDs) == 0 {
return make(map[string][]string), nil
}
rows, err := r.db.QueryContext(ctx, `
SELECT certificate_id, target_id FROM certificate_target_mappings
WHERE certificate_id = ANY($1)
ORDER BY certificate_id, target_id ASC
`, pq.Array(certIDs))
if err != nil {
return nil, fmt.Errorf("failed to query target mappings: %w", err)
}
defer rows.Close()
targetIDsMap := make(map[string][]string)
for rows.Next() {
var certID, targetID string
if err := rows.Scan(&certID, &targetID); err != nil {
return nil, fmt.Errorf("failed to scan target mapping: %w", err)
}
targetIDsMap[certID] = append(targetIDsMap[certID], targetID)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("error iterating target mapping rows: %w", err)
}
return targetIDsMap, nil
}
// scanCertificate scans a certificate from a row or rows and populates its TargetIDs
// by querying the certificate_target_mappings junction table.
func (r *CertificateRepository) scanCertificate(ctx context.Context, scanner interface {
Scan(...interface{}) error
}) (*domain.ManagedCertificate, error) {
var cert domain.ManagedCertificate
@@ -500,6 +695,13 @@ func scanCertificate(scanner interface {
cert.Tags = make(map[string]string)
}
// Populate TargetIDs from junction table
targetIDs, err := r.getTargetIDs(ctx, cert.ID)
if err != nil {
return nil, err
}
cert.TargetIDs = targetIDs
return &cert, nil
}
@@ -0,0 +1,322 @@
// Package postgres_test — integration tests for M-7: Certificate.TargetIDs
// must be populated from certificate_target_mappings on read.
//
// Before M-7 the repository scan helper never consulted the junction table, so
// Get / List / GetExpiringCertificates always returned empty TargetIDs even when
// rows existed in certificate_target_mappings. These tests exercise all three
// read paths end-to-end against a real PostgreSQL 16 container.
//
// Runs against the shared testcontainer from testutil_test.go. Skipped when
// `-short` is set (CI uses short mode; local runs pick it up by default).
package postgres_test
import (
"context"
"database/sql"
"testing"
"time"
"github.com/shankar0123/certctl/internal/domain"
"github.com/shankar0123/certctl/internal/repository/postgres"
)
// insertAgentAndTargetsRaw creates one agent and N deployment_targets, returns
// the agent ID and the list of target IDs (in insertion order).
func insertAgentAndTargetsRaw(t *testing.T, db *sql.DB, ctx context.Context, suffix string, n int) (agentID string, targetIDs []string) {
t.Helper()
now := time.Now().Truncate(time.Microsecond)
agentID = "agent-" + suffix
_, err := db.ExecContext(ctx, `
INSERT INTO agents (id, name, hostname, status, registered_at, api_key_hash)
VALUES ($1, $2, $3, $4, $5, $6)
`, agentID, "agent-"+suffix, "host-"+suffix, "online", now, "hash-"+suffix)
if err != nil {
t.Fatalf("insertAgent failed: %v", err)
}
for i := 0; i < n; i++ {
tid := "t-" + suffix + "-" + intToStr(i)
_, err := db.ExecContext(ctx, `
INSERT INTO deployment_targets (id, name, type, agent_id, config, enabled, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
`, tid, tid, "NGINX", agentID, []byte(`{}`), true, now, now)
if err != nil {
t.Fatalf("insertTarget %d failed: %v", i, err)
}
targetIDs = append(targetIDs, tid)
}
return agentID, targetIDs
}
// intToStr converts a non-negative int to its decimal string.
// Local helper to avoid importing strconv for a single use.
func intToStr(n int) string {
if n == 0 {
return "0"
}
var buf [20]byte
i := len(buf)
for n > 0 {
i--
buf[i] = byte('0' + n%10)
n /= 10
}
return string(buf[i:])
}
// insertCertificateRow writes a minimal managed_certificates row via raw SQL.
// Bypasses the repository Create so we can isolate read-path tests from any
// write-path behavior. managed_certificates.sans is TEXT[], written here as an
// empty array literal.
func insertCertificateRow(t *testing.T, db *sql.DB, ctx context.Context, certID, ownerID, teamID, issuerID, policyID string, expiresAt time.Time) {
t.Helper()
now := time.Now().Truncate(time.Microsecond)
_, err := db.ExecContext(ctx, `
INSERT INTO managed_certificates (
id, name, common_name, sans, environment,
owner_id, team_id, issuer_id, renewal_policy_id,
status, expires_at, tags,
created_at, updated_at
) VALUES (
$1, $2, $3, ARRAY[]::TEXT[], $4,
$5, $6, $7, $8,
$9, $10, $11,
$12, $13
)
`,
certID, certID, certID+".example.com", "production",
ownerID, teamID, issuerID, policyID,
string(domain.CertificateStatusActive), expiresAt, []byte(`{}`),
now, now,
)
if err != nil {
t.Fatalf("insertCertificateRow failed: %v", err)
}
}
// insertMapping writes a single row into certificate_target_mappings via raw SQL.
func insertMapping(t *testing.T, db *sql.DB, ctx context.Context, certID, targetID string) {
t.Helper()
_, err := db.ExecContext(ctx,
`INSERT INTO certificate_target_mappings (certificate_id, target_id) VALUES ($1, $2)`,
certID, targetID)
if err != nil {
t.Fatalf("insertMapping(%s, %s) failed: %v", certID, targetID, err)
}
}
// --------------------------------------------------------------------
// Get() — single-cert read path
// --------------------------------------------------------------------
// TestGet_PopulatesTargetIDs_NoMappings: no mapping rows → TargetIDs must be
// an empty slice, not nil, so JSON serialisation emits "[]".
func TestGet_PopulatesTargetIDs_NoMappings(t *testing.T) {
tdb := getTestDB(t)
db := tdb.freshSchema(t)
repo := postgres.NewCertificateRepository(db)
ctx := context.Background()
ownerID, teamID, issuerID, policyID := insertCertPrereqsRaw(t, db, ctx, "getnone")
certID := "mc-getnone"
insertCertificateRow(t, db, ctx, certID, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
got, err := repo.Get(ctx, certID)
if err != nil {
t.Fatalf("Get failed: %v", err)
}
if got.TargetIDs == nil {
t.Fatalf("TargetIDs = nil, want empty slice (JSON serialises nil as null and [] as [])")
}
if len(got.TargetIDs) != 0 {
t.Errorf("len(TargetIDs) = %d, want 0; got %v", len(got.TargetIDs), got.TargetIDs)
}
}
// TestGet_PopulatesTargetIDs_SingleTarget: one mapping → one entry.
func TestGet_PopulatesTargetIDs_SingleTarget(t *testing.T) {
tdb := getTestDB(t)
db := tdb.freshSchema(t)
repo := postgres.NewCertificateRepository(db)
ctx := context.Background()
ownerID, teamID, issuerID, policyID := insertCertPrereqsRaw(t, db, ctx, "getone")
_, targets := insertAgentAndTargetsRaw(t, db, ctx, "getone", 1)
certID := "mc-getone"
insertCertificateRow(t, db, ctx, certID, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
insertMapping(t, db, ctx, certID, targets[0])
got, err := repo.Get(ctx, certID)
if err != nil {
t.Fatalf("Get failed: %v", err)
}
if len(got.TargetIDs) != 1 {
t.Fatalf("len(TargetIDs) = %d, want 1; got %v", len(got.TargetIDs), got.TargetIDs)
}
if got.TargetIDs[0] != targets[0] {
t.Errorf("TargetIDs[0] = %q, want %q", got.TargetIDs[0], targets[0])
}
}
// TestGet_PopulatesTargetIDs_MultipleTargets: many mappings → sorted by target_id ASC.
func TestGet_PopulatesTargetIDs_MultipleTargets(t *testing.T) {
tdb := getTestDB(t)
db := tdb.freshSchema(t)
repo := postgres.NewCertificateRepository(db)
ctx := context.Background()
ownerID, teamID, issuerID, policyID := insertCertPrereqsRaw(t, db, ctx, "getmany")
_, targets := insertAgentAndTargetsRaw(t, db, ctx, "getmany", 3)
certID := "mc-getmany"
insertCertificateRow(t, db, ctx, certID, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
// Insert mappings in reverse order to confirm ORDER BY target_id ASC in the query.
insertMapping(t, db, ctx, certID, targets[2])
insertMapping(t, db, ctx, certID, targets[0])
insertMapping(t, db, ctx, certID, targets[1])
got, err := repo.Get(ctx, certID)
if err != nil {
t.Fatalf("Get failed: %v", err)
}
if len(got.TargetIDs) != 3 {
t.Fatalf("len(TargetIDs) = %d, want 3; got %v", len(got.TargetIDs), got.TargetIDs)
}
// Ascending order: t-getmany-0, t-getmany-1, t-getmany-2
want := []string{targets[0], targets[1], targets[2]}
for i, tid := range want {
if got.TargetIDs[i] != tid {
t.Errorf("TargetIDs[%d] = %q, want %q (full: %v)", i, got.TargetIDs[i], tid, got.TargetIDs)
}
}
}
// --------------------------------------------------------------------
// List() — batch read path, must avoid N+1
// --------------------------------------------------------------------
// TestList_PopulatesTargetIDs_BatchFetch: three certs with different mapping counts;
// all must have their TargetIDs populated correctly, and the cert with no mapping
// must get an empty (non-nil) slice.
func TestList_PopulatesTargetIDs_BatchFetch(t *testing.T) {
tdb := getTestDB(t)
db := tdb.freshSchema(t)
repo := postgres.NewCertificateRepository(db)
ctx := context.Background()
ownerID, teamID, issuerID, policyID := insertCertPrereqsRaw(t, db, ctx, "listbatch")
_, targets := insertAgentAndTargetsRaw(t, db, ctx, "listbatch", 3)
certA := "mc-list-a"
certB := "mc-list-b"
certC := "mc-list-c"
insertCertificateRow(t, db, ctx, certA, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
insertCertificateRow(t, db, ctx, certB, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
insertCertificateRow(t, db, ctx, certC, ownerID, teamID, issuerID, policyID, time.Now().Add(30*24*time.Hour))
// certA → 2 targets (t-0, t-1)
insertMapping(t, db, ctx, certA, targets[0])
insertMapping(t, db, ctx, certA, targets[1])
// certB → 1 target (t-2)
insertMapping(t, db, ctx, certB, targets[2])
// certC → 0 targets
got, total, err := repo.List(ctx, nil)
if err != nil {
t.Fatalf("List failed: %v", err)
}
if total < 3 {
t.Fatalf("total = %d, want >= 3", total)
}
want := map[string][]string{
certA: {targets[0], targets[1]},
certB: {targets[2]},
certC: {},
}
seen := map[string]bool{}
for _, c := range got {
exp, ok := want[c.ID]
if !ok {
continue
}
seen[c.ID] = true
if c.TargetIDs == nil {
t.Errorf("cert %s: TargetIDs = nil, want %v", c.ID, exp)
continue
}
if len(c.TargetIDs) != len(exp) {
t.Errorf("cert %s: len(TargetIDs) = %d, want %d (got %v, want %v)", c.ID, len(c.TargetIDs), len(exp), c.TargetIDs, exp)
continue
}
for i, tid := range exp {
if c.TargetIDs[i] != tid {
t.Errorf("cert %s: TargetIDs[%d] = %q, want %q", c.ID, i, c.TargetIDs[i], tid)
}
}
}
for id := range want {
if !seen[id] {
t.Errorf("cert %s missing from List() result", id)
}
}
}
// --------------------------------------------------------------------
// GetExpiringCertificates() — scheduler read path
// --------------------------------------------------------------------
// TestGetExpiringCertificates_PopulatesTargetIDs: expiring certs must also carry
// their mapping information so renewal-triggered deployments can route work.
func TestGetExpiringCertificates_PopulatesTargetIDs(t *testing.T) {
tdb := getTestDB(t)
db := tdb.freshSchema(t)
repo := postgres.NewCertificateRepository(db)
ctx := context.Background()
ownerID, teamID, issuerID, policyID := insertCertPrereqsRaw(t, db, ctx, "expiring")
_, targets := insertAgentAndTargetsRaw(t, db, ctx, "expiring", 2)
// Two expiring certs (expires in 3 days). Threshold = 7 days → both selected.
certA := "mc-exp-a"
certB := "mc-exp-b"
expiresSoon := time.Now().Add(3 * 24 * time.Hour)
insertCertificateRow(t, db, ctx, certA, ownerID, teamID, issuerID, policyID, expiresSoon)
insertCertificateRow(t, db, ctx, certB, ownerID, teamID, issuerID, policyID, expiresSoon)
insertMapping(t, db, ctx, certA, targets[0])
insertMapping(t, db, ctx, certA, targets[1])
// certB has no mappings.
threshold := time.Now().Add(7 * 24 * time.Hour)
got, err := repo.GetExpiringCertificates(ctx, threshold)
if err != nil {
t.Fatalf("GetExpiringCertificates failed: %v", err)
}
found := map[string]*domain.ManagedCertificate{}
for _, c := range got {
found[c.ID] = c
}
a, ok := found[certA]
if !ok {
t.Fatalf("cert %s not in expiring list", certA)
}
if len(a.TargetIDs) != 2 || a.TargetIDs[0] != targets[0] || a.TargetIDs[1] != targets[1] {
t.Errorf("cert %s: TargetIDs = %v, want %v", certA, a.TargetIDs, []string{targets[0], targets[1]})
}
b, ok := found[certB]
if !ok {
t.Fatalf("cert %s not in expiring list", certB)
}
if b.TargetIDs == nil {
t.Errorf("cert %s: TargetIDs = nil, want empty slice", certB)
}
if len(b.TargetIDs) != 0 {
t.Errorf("cert %s: len(TargetIDs) = %d, want 0", certB, len(b.TargetIDs))
}
}
+285 -5
View File
@@ -4,6 +4,7 @@ import (
"context"
"database/sql"
"fmt"
"time"
"github.com/google/uuid"
"github.com/shankar0123/certctl/internal/domain"
@@ -237,7 +238,14 @@ func (r *JobRepository) UpdateStatus(ctx context.Context, id string, status doma
return nil
}
// GetPendingJobs returns jobs not yet processed of a specific type
// GetPendingJobs returns jobs not yet processed of a specific type.
//
// The SELECT uses FOR UPDATE SKIP LOCKED so that concurrent scheduler replicas
// cannot observe the same rows when invoked inside a transaction; combine with
// a subsequent UPDATE to Running for correct dispatch semantics. For the
// standard production dispatch path, prefer ClaimPendingJobs which wraps the
// lock, read, and state transition in a single transaction and is the
// authoritative race-free claim primitive (CWE-362 fix for H-6).
func (r *JobRepository) GetPendingJobs(ctx context.Context, jobType domain.JobType) ([]*domain.Job, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT id, type, certificate_id, target_id, agent_id, status, attempts, max_attempts,
@@ -245,6 +253,7 @@ func (r *JobRepository) GetPendingJobs(ctx context.Context, jobType domain.JobTy
FROM jobs
WHERE type = $1 AND status = $2
ORDER BY scheduled_at ASC
FOR UPDATE SKIP LOCKED
`, jobType, domain.JobStatusPending)
if err != nil {
@@ -268,10 +277,115 @@ func (r *JobRepository) GetPendingJobs(ctx context.Context, jobType domain.JobTy
return jobs, nil
}
// ListPendingByAgentID returns pending deployment jobs and AwaitingCSR jobs for a specific agent.
// Deployment jobs are matched by agent_id directly (set at creation time), with a fallback
// for legacy jobs where agent_id is NULL but target_id resolves to the agent via deployment_targets.
// AwaitingCSR jobs are matched through certificate → target mappings → agent ownership.
// ClaimPendingJobs atomically claims up to `limit` Pending jobs and transitions
// them to Running inside a single transaction. The SELECT uses FOR UPDATE SKIP
// LOCKED so concurrent scheduler replicas observe disjoint result sets — each
// row can be claimed by exactly one caller per tick (CWE-362 fix for H-6).
//
// Passing an empty jobType claims any type. Passing limit<=0 claims all
// available rows. The claimed rows are returned with Status already set to
// domain.JobStatusRunning.
//
// Downstream processors (ProcessRenewalJob, ProcessDeploymentJob) already call
// UpdateStatus(Running) unconditionally on entry, so this pre-flip is
// idempotent with respect to existing processing logic.
func (r *JobRepository) ClaimPendingJobs(ctx context.Context, jobType domain.JobType, limit int) ([]*domain.Job, error) {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return nil, fmt.Errorf("failed to begin claim transaction: %w", err)
}
// Rollback is a no-op after Commit — safe deferred cleanup if an error path
// triggers an early return before Commit().
defer func() { _ = tx.Rollback() }()
// Build the SELECT — jobType="" means any type, limit<=0 means unlimited.
query := `
SELECT id, type, certificate_id, target_id, agent_id, status, attempts, max_attempts,
last_error, scheduled_at, started_at, completed_at, created_at
FROM jobs
WHERE status = $1`
args := []interface{}{domain.JobStatusPending}
if jobType != "" {
query += ` AND type = $2`
args = append(args, jobType)
}
query += `
ORDER BY scheduled_at ASC
FOR UPDATE SKIP LOCKED`
if limit > 0 {
query += fmt.Sprintf(` LIMIT %d`, limit)
}
rows, err := tx.QueryContext(ctx, query, args...)
if err != nil {
return nil, fmt.Errorf("failed to query claimable jobs: %w", err)
}
var jobs []*domain.Job
for rows.Next() {
job, err := scanJob(rows)
if err != nil {
rows.Close()
return nil, err
}
jobs = append(jobs, job)
}
if err := rows.Err(); err != nil {
rows.Close()
return nil, fmt.Errorf("error iterating claimable job rows: %w", err)
}
rows.Close()
if len(jobs) == 0 {
// No rows to claim — commit the (read-only) tx and return.
if err := tx.Commit(); err != nil {
return nil, fmt.Errorf("failed to commit empty claim tx: %w", err)
}
return nil, nil
}
// Flip claimed rows to Running. Build IN clause safely with placeholders.
ids := make([]interface{}, len(jobs))
placeholders := make([]byte, 0, len(jobs)*5)
for i, job := range jobs {
ids[i] = job.ID
if i > 0 {
placeholders = append(placeholders, ',')
}
placeholders = append(placeholders, fmt.Sprintf("$%d", i+2)...)
}
updateQuery := fmt.Sprintf(
`UPDATE jobs SET status = $1 WHERE id IN (%s)`,
string(placeholders),
)
updateArgs := append([]interface{}{domain.JobStatusRunning}, ids...)
if _, err := tx.ExecContext(ctx, updateQuery, updateArgs...); err != nil {
return nil, fmt.Errorf("failed to transition claimed jobs to Running: %w", err)
}
if err := tx.Commit(); err != nil {
return nil, fmt.Errorf("failed to commit claim transaction: %w", err)
}
// Reflect the committed state in the returned objects.
for _, job := range jobs {
job.Status = domain.JobStatusRunning
}
return jobs, nil
}
// ListPendingByAgentID returns pending deployment jobs and AwaitingCSR jobs for
// a specific agent. Deployment jobs are matched by agent_id directly (set at
// creation time), with a fallback for legacy jobs where agent_id is NULL but
// target_id resolves to the agent via deployment_targets. AwaitingCSR jobs are
// matched through certificate → target mappings → agent ownership.
//
// The SELECT uses FOR UPDATE SKIP LOCKED so concurrent pollers (e.g. two agent
// instances running with the same agent_id) cannot observe the same rows when
// this method is invoked inside a transaction. For the production agent work
// poll path, prefer ClaimPendingByAgentID which additionally transitions
// claimed Pending deployment rows to Running atomically (H-6 CWE-362 fix).
func (r *JobRepository) ListPendingByAgentID(ctx context.Context, agentID string) ([]*domain.Job, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT id, type, certificate_id, target_id, agent_id, status, attempts, max_attempts,
@@ -326,6 +440,172 @@ func (r *JobRepository) ListPendingByAgentID(ctx context.Context, agentID string
return jobs, nil
}
// ClaimPendingByAgentID atomically claims agent work inside a single
// transaction. Pending Deployment jobs assigned to the agent (directly via
// agent_id, or via legacy target→agent fallback) are transitioned from
// Pending to Running. AwaitingCSR Renewal/Issuance jobs linked to the agent
// via certificate → target mappings are locked with FOR UPDATE SKIP LOCKED
// and returned without a state transition — the flow requires the agent to
// submit a CSR to advance state, and pre-flipping AwaitingCSR would violate
// the renewal state machine (CWE-362 fix for H-6).
//
// Claimed rows are invisible to other concurrent claim calls for the lifetime
// of the transaction; rows claimed as Running remain invisible after commit
// because ListPendingByAgentID's filter is status='Pending'.
func (r *JobRepository) ClaimPendingByAgentID(ctx context.Context, agentID string) ([]*domain.Job, error) {
tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
return nil, fmt.Errorf("failed to begin agent claim transaction: %w", err)
}
defer func() { _ = tx.Rollback() }()
// Branch 1 + 2: Pending Deployment jobs (direct agent_id match or legacy
// target fallback). These get flipped to Running atomically below.
pendingRows, err := tx.QueryContext(ctx, `
SELECT id, type, certificate_id, target_id, agent_id, status, attempts, max_attempts,
last_error, scheduled_at, started_at, completed_at, created_at
FROM jobs
WHERE agent_id = $1 AND status = 'Pending' AND type = 'Deployment'
UNION ALL
SELECT j.id, j.type, j.certificate_id, j.target_id, j.agent_id, j.status, j.attempts, j.max_attempts,
j.last_error, j.scheduled_at, j.started_at, j.completed_at, j.created_at
FROM jobs j
INNER JOIN deployment_targets dt ON j.target_id = dt.id
WHERE j.agent_id IS NULL AND j.status = 'Pending' AND j.type = 'Deployment'
AND dt.agent_id = $1
ORDER BY created_at ASC
FOR UPDATE SKIP LOCKED
`, agentID)
if err != nil {
return nil, fmt.Errorf("failed to query pending deployment jobs for agent: %w", err)
}
var pendingJobs []*domain.Job
for pendingRows.Next() {
job, err := scanJob(pendingRows)
if err != nil {
pendingRows.Close()
return nil, err
}
pendingJobs = append(pendingJobs, job)
}
if err := pendingRows.Err(); err != nil {
pendingRows.Close()
return nil, fmt.Errorf("error iterating pending deployment rows: %w", err)
}
pendingRows.Close()
// Branch 3: AwaitingCSR jobs for this agent. Locked with FOR UPDATE SKIP
// LOCKED to prevent duplicate delivery to concurrent pollers, but state is
// NOT transitioned — the agent advances state via CSR submission.
csrRows, err := tx.QueryContext(ctx, `
SELECT j.id, j.type, j.certificate_id, j.target_id, j.agent_id, j.status, j.attempts, j.max_attempts,
j.last_error, j.scheduled_at, j.started_at, j.completed_at, j.created_at
FROM jobs j
WHERE j.status = 'AwaitingCSR'
AND j.type IN ('Renewal', 'Issuance')
AND EXISTS (
SELECT 1 FROM certificate_target_mappings ctm
INNER JOIN deployment_targets dt ON ctm.target_id = dt.id
WHERE ctm.certificate_id = j.certificate_id
AND dt.agent_id = $1
)
ORDER BY j.created_at ASC
FOR UPDATE SKIP LOCKED
`, agentID)
if err != nil {
return nil, fmt.Errorf("failed to query AwaitingCSR jobs for agent: %w", err)
}
var csrJobs []*domain.Job
for csrRows.Next() {
job, err := scanJob(csrRows)
if err != nil {
csrRows.Close()
return nil, err
}
csrJobs = append(csrJobs, job)
}
if err := csrRows.Err(); err != nil {
csrRows.Close()
return nil, fmt.Errorf("error iterating AwaitingCSR rows: %w", err)
}
csrRows.Close()
// Transition locked Pending deployments to Running before commit.
if len(pendingJobs) > 0 {
ids := make([]interface{}, len(pendingJobs))
placeholders := make([]byte, 0, len(pendingJobs)*5)
for i, job := range pendingJobs {
ids[i] = job.ID
if i > 0 {
placeholders = append(placeholders, ',')
}
placeholders = append(placeholders, fmt.Sprintf("$%d", i+2)...)
}
updateQuery := fmt.Sprintf(
`UPDATE jobs SET status = $1 WHERE id IN (%s)`,
string(placeholders),
)
updateArgs := append([]interface{}{domain.JobStatusRunning}, ids...)
if _, err := tx.ExecContext(ctx, updateQuery, updateArgs...); err != nil {
return nil, fmt.Errorf("failed to transition claimed deployment jobs to Running: %w", err)
}
}
if err := tx.Commit(); err != nil {
return nil, fmt.Errorf("failed to commit agent claim transaction: %w", err)
}
// Reflect the committed state in returned Pending deployment jobs; leave
// AwaitingCSR jobs untouched.
for _, job := range pendingJobs {
job.Status = domain.JobStatusRunning
}
// Preserve the legacy ordering: Pending deployments first, AwaitingCSR
// second. Callers that want a strict created_at merge can re-sort.
return append(pendingJobs, csrJobs...), nil
}
// ListTimedOutAwaitingJobs returns jobs stuck in AwaitingCSR or AwaitingApproval past
// their respective cutoff timestamps (created_at < cutoff). The reaper loop transitions
// them to Failed; I-001's retry loop then auto-promotes eligible Failed jobs back to
// Pending. I-003 coverage-gap closure.
func (r *JobRepository) ListTimedOutAwaitingJobs(ctx context.Context, csrCutoff, approvalCutoff time.Time) ([]*domain.Job, error) {
rows, err := r.db.QueryContext(ctx, `
SELECT id, type, certificate_id, target_id, agent_id, status, attempts, max_attempts,
last_error, scheduled_at, started_at, completed_at, created_at
FROM jobs
WHERE (status = $1 AND created_at < $2)
OR (status = $3 AND created_at < $4)
ORDER BY created_at ASC
`, domain.JobStatusAwaitingCSR, csrCutoff, domain.JobStatusAwaitingApproval, approvalCutoff)
if err != nil {
return nil, fmt.Errorf("failed to query timed-out awaiting jobs: %w", err)
}
defer rows.Close()
var jobs []*domain.Job
for rows.Next() {
job, err := scanJob(rows)
if err != nil {
return nil, err
}
jobs = append(jobs, job)
}
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("error iterating timed-out job rows: %w", err)
}
return jobs, nil
}
// scanJob scans a job from a row or rows
func scanJob(scanner interface {
Scan(...interface{}) error

Some files were not shown because too many files have changed in this diff Show More