Commit Graph

308 Commits

Author SHA1 Message Date
shankar0123 ccd89c348f fix(m2-pr-d): thread ctx through Job/Notification/Audit services
Collapse CancelJobWithContext into CancelJob; eliminate 10 context.Background()
hits across the Job+Notification+Audit service cluster by threading ctx
through their handler-facing service interfaces.

Services (ctx-first):
- service/job.go: ListJobs, GetJob, CancelJob, ApproveJob, RejectJob now
  accept ctx; the CancelJobWithContext wrapper is removed (handler callers
  continue to invoke CancelJob, now ctx-aware).
- service/notification.go: ListNotifications, GetNotification, MarkAsRead
  accept ctx.
- service/audit.go: ListAuditEvents, GetAuditEvent accept ctx.

Handlers (interface + callsites):
- handler/jobs.go, handler/notifications.go, handler/audit.go: local
  service interfaces updated, r.Context() threaded at every callsite.

Tests:
- Mock services updated to match the new interfaces (ctx accepted and
  ignored via '_ context.Context' first parameter; Fn closure fields
  unchanged).
- job_test.go / notification_test.go callsites thread context.Background()
  to match production shape.

Verification:
  go build ./...                 ok
  go vet ./...                   ok
  go test -short ./...           ok
  go test -race -short ./...     ok
  golangci-lint run ./...        0 issues

Locked decisions from the M-2 plan:
  D-1 ctx-only signatures (no dual forms)
  D-4 preserve handler method names facing the router
  D-5 domain types stay ctx-free

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:20:46 +00:00
shankar0123 478a141498 Merge branch 'fix/m2-pr-c-crud-cluster' 2026-04-18 01:10:10 +00:00
shankar0123 2497be496d M-2 PR-C: Collapse Policy/Profile/Owner/Team services to ctx-first signatures
- Add ctx first param to 21 service-layer handler-interface methods
  across policy.go (6), profile.go (5), owner.go (5), team.go (5)
- Replace 24 context.Background() call sites with received ctx; use
  context.WithoutCancel(ctx) for subsidiary audit-recording ops to
  preserve fire-and-forget audit semantics without inheriting caller
  cancellation
- Add ctx first param to 21 handler-interface method signatures across
  policies.go (6), profiles.go (5), owners.go (5), teams.go (5)
- Thread r.Context() through 21 HTTP handler sites (ListPolicies,
  GetPolicy, CreatePolicy, UpdatePolicy, DeletePolicy, ListViolations,
  ListProfiles, GetProfile, CreateProfile, UpdateProfile, DeleteProfile,
  ListOwners, GetOwner, CreateOwner, UpdateOwner, DeleteOwner,
  ListTeams, GetTeam, CreateTeam, UpdateTeam, DeleteTeam)
- Update MockPolicyService/MockProfileService/MockOwnerService/
  MockTeamService mock method impls with _ context.Context first param
  (Fn fields unchanged — closures do not need ctx); update mock impls
  in integration/lifecycle_test.go for all four services
- Update 12 service-layer test callsites (policy_test.go ×2,
  owner_test.go ×5, team_test.go ×5, profile_test.go ×13) to pass
  context.Background() at the call site

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 01:10:06 +00:00
shankar0123 25dd6c07f3 Merge branch 'fix/m2-pr-b-issuer-target' 2026-04-18 00:47:02 +00:00
shankar0123 eb14236166 M-2 PR-B: Collapse IssuerService + TargetService to ctx-first signatures
- Delete bare TestConnection wrapper in IssuerService; rename
  TestConnectionWithContext → TestConnection
- Delete TestTargetConnection delegate shim in TargetService (canonical
  TestConnection already ctx-first)
- Add ctx first param to 10 handler-interface methods
  (ListIssuers/GetIssuer/CreateIssuer/UpdateIssuer/DeleteIssuer and
  ListTargets/GetTarget/CreateTarget/UpdateTarget/DeleteTarget)
- Replace 16 context.Background() call sites with received ctx
- Thread r.Context() through 12 HTTP handler sites in issuers.go and
  targets.go (outer TargetHandler.TestTargetConnection HTTP method name
  preserved for router compatibility)
- Update MockIssuerService, MockTargetService, and mockTargetService
  (integration) for ctx-first forwarding; update test callsite literals

Audit complete. Commit: 1f6cf0eafa. Sections: 12. Findings: 2/7/10/4/6.
2026-04-18 00:46:58 +00:00
shankar0123 bbb628243f Merge branch 'fix/m2-pr-a-certificate-cluster' 2026-04-18 00:29:40 +00:00
shankar0123 cdc9d03d5b fix(m-2): thread context through CertificateService cluster
Collapses CertificateService, RevocationSvc, and CAOperationsSvc to
ctx-accepting method signatures. Removes context.Background() synthesis
at 24 internal call sites across certificate.go, revocation_svc.go, and
ca_operations.go.

- Primary repo calls inherit request cancellation via the passed ctx.
- Audit and notification dispatches use context.WithoutCancel(ctx) so
  they survive client disconnect.
- Collapses TriggerRenewal/TriggerRenewalWithActor,
  TriggerDeployment/TriggerDeploymentWithActor, and
  RevokeCertificate/RevokeCertificateWithActor sibling pairs into single
  canonical ctx-accepting methods (decisions D-1, D-2).

Handlers pass r.Context(). Mocks and tests updated to match new
signatures. No HTTP surface change, no OpenAPI change.

PR 1 of 6 in the M-2 remediation chain. Master green at this commit.

Refs: certctl-audit-report.md M-2 (L143, L224)
2026-04-18 00:29:37 +00:00
shankar0123 e951d319d0 Merge branch 'fix/m1-audit-shutdown-drain'
Resolves M-1 (Medium): Audit recorder shutdown drain.

The API audit middleware's detached recording goroutines now drain
during graceful shutdown via AuditMiddleware.Flush (sync.WaitGroup +
timeout-aware select), called between http.Server.Shutdown and
db.Close. Prevents silent audit-event loss on SIGTERM
(CWE-662 / CWE-400).
2026-04-17 17:29:54 +00:00
shankar0123 d14a45401b fix(audit): drain in-flight recording goroutines on shutdown (M-1)
Audit events spawned from the HTTP middleware ran in detached goroutines
using context.Background(). On SIGTERM the DB pool was closed before
those goroutines finished writing, silently dropping audit events
(CWE-662 Improper Synchronization / CWE-400 Uncontrolled Resource
Consumption).

NewAuditLog now returns an *AuditMiddleware struct that tracks every
spawned goroutine with sync.WaitGroup. Callers wire the middleware via
its Middleware method value (preserves the existing
func(http.Handler) http.Handler shape) and drain the WaitGroup with
Flush(ctx), which blocks until in-flight recordings complete or the
provided context is cancelled — mirroring scheduler.WaitForCompletion.

Flush is invoked in cmd/server/main.go between http.Server.Shutdown
(no new requests accepted) and db.Close (pool torn down), with a
timeout returning ErrAuditFlushTimeout wrapping ctx.Err().

Request-derived inputs (method, path, status) are snapshotted before
the goroutine spawn so the worker does not race with http.Server
reusing r after the handler returns.

Tests:
  TestAuditLog_FlushDrainsInFlightGoroutines
  TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout

Verification:
  go build ./...                            : 0
  go vet ./...                              : 0
  go test -race -short ./...                : 0 (all packages)
  go test -cover ./internal/api/middleware  : 81.4%
  golangci-lint run                         : 0 issues
  govulncheck ./...                         : 0 vulns in called code
2026-04-17 17:29:48 +00:00
shankar0123 655e2879e6 feat(frontend): add Owner field to OnboardingWizard Certificate step
The first-run onboarding wizard's Certificate step now surfaces an
Owner dropdown (required) alongside Issuer and Profile, matching the
ownership model introduced in M11b. Prevents newly-created certs from
being unowned and bypassing notification routing.

- web/src/pages/OnboardingWizard.tsx: getOwners query, ownerId state,
  Owner <select>, required-field guard (nextDisabled), empty-state link
  to /owners page when no owners exist yet.

Frontend-only change; no backend wiring or schema impact. Separated
from the M-6 sentinel-agent idempotency commit per scope-guard.
2026-04-17 16:55:44 +00:00
shankar0123 e757ef1471 Merge branch 'fix/m6-sentinel-idempotent-create'
Resolves M-6 (Medium): swallowed sentinel agent INSERT errors.
CWE-662 / CWE-209-adjacent.

Shape A: CreateIfNotExists helper + 4 sentinel call sites.
2026-04-17 16:32:12 +00:00
shankar0123 27afa4463d fix(repository): idempotent sentinel agent creation via ON CONFLICT (M-6)
Sentinel agents (server-scanner, cloud-aws-sm, cloud-azure-kv,
cloud-gcp-sm) were created on startup with a plain INSERT whose
duplicate-key error was swallowed unconditionally. That silenced every
other DB failure too (connectivity drop, permissions change, unrelated
constraint violation) — a restart after the first boot quietly
de-fanged cloud discovery and the network scanner (CWE-662, CWE-209-
adjacent).

Shape A: add AgentRepository.CreateIfNotExists using ON CONFLICT (id)
DO NOTHING RETURNING id + sql.ErrNoRows discrimination. This keeps the
strict Create semantics (duplicate-key is an error) intact for real
agent registration and gives sentinels their own idempotent path.

- repo: CreateIfNotExists returns (created bool, err error); false,nil
  on pre-existing row; false,wrapped err on anything else.
- interface: CreateIfNotExists added to AgentRepository.
- main.go: 4 sentinel sites log Error/Info/Debug distinctly.
- mocks: service + integration mocks implement the new method.
- tests: 4 new testcontainers integration tests cover first-insert,
  idempotent second-call, concurrent 16-goroutine race (exactly one
  creator, no duplicate-key panic), and pre-cancelled context
  surfacing.

Coverage gates (go test -cover): service 67.6%/55, handler 78.6%/60,
domain 92.7%/40, middleware 80.0%/30, crypto 86.7%/85. Race/vet/
golangci-lint v2.11.4 (0 issues)/govulncheck v1.2.0 clean across all
touched packages.
2026-04-17 16:32:07 +00:00
shankar0123 80450c7180 fix(repository): populate TargetIDs in certificate scan helper (M-7)
scanCertificate never queried the certificate_target_mappings junction
table, so Certificate.TargetIDs was always nil on reads. This silently
broke deployment lookups, bulk revocation filters, cert detail pages,
and any code path that iterated TargetIDs to dispatch target work.

Fix:
- Convert scanCertificate to a receiver method (r *CertificateRepository)
  so it has access to the DB for the secondary junction query.
- Get(): scan the row, then call r.getTargetIDs(ctx, certID) to populate
  TargetIDs with a single targeted query.
- List() and GetExpiringCertificates(): inline the scan loop so we can
  collect all certIDs first, then call getTargetIDsForCertificates once
  with pq.Array(certIDs) to avoid N+1 round-trips. Build a map and
  attach TargetIDs to each certificate in the result set.
- Default TargetIDs to []string{} (not nil) when a cert has no mappings
  so JSON marshals as [] rather than null.

Tests:
- New integration test file certificate_targetids_test.go with 5
  subtests exercising Get / List / GetExpiringCertificates single
  and multi-target cases plus the empty-slice vs nil contract.
- Uses the shared testcontainers-go setupTestDB infrastructure and
  skips under 'go test -short' so CI (which excludes ./internal/repository/...
  from coverage paths anyway) stays green.

Addresses M-7 from certctl-audit-report.md.
2026-04-17 15:41:08 +00:00
shankar0123 c655e0f8c5 fix(crypto/local-ca): reject expired or not-yet-valid sub-CA certificates on disk load (M-5)
loadCAFromDisk now validates the upstream sub-CA certificate's NotBefore
and NotAfter fields before accepting it, returning a fail-closed error
at server startup instead of silently loading an out-of-window CA.

Before this fix, loadCAFromDisk checked BasicConstraints.IsCA and
KeyUsage=CertSign but not the validity window. An expired enterprise
sub-CA (e.g. an ADCS subordinate whose rollover slipped) would load
without warning and the scheduler would mint child certs that every
RFC 5280 path validator rejects — outages show up at relying parties,
not at certctl, and only after thresholds trip.

CWE-672 (Operation on a Resource after Expiration or Release); secondary
CWE-295 (Improper Certificate Validation). Error strings include the CA
subject CommonName and both RFC3339 timestamps so the log line is
actionable in a 3am incident.

Tests: TestSubCAMode gains three subtests exercising the new gate —
SubCA_ExpiredCert_IsRejected (CA expired 1h ago → error mentions
'expired' and the CN), SubCA_NotYetValid_IsRejected (CA valid +1h →
error mentions 'not yet valid' and the CN), and SubCA_BarelyValid_IsAccepted
(CA valid [now-1m, now+1h] → issuance succeeds, proving no
over-rejection). Adds generateTestSubCAWithValidity helper; the
original generateTestSubCA wrapper preserves the [now, now+5y] default
for existing tests.

Package coverage: 67.7% -> 68.3%.

Verification: go build, go vet, go test -race, go test -cover all
green locally; golangci-lint v2.11.4 clean; govulncheck clean. All CI
coverage floors met with margin (service 67.6/55, handler 78.6/60,
domain 92.7/40, middleware 80.0/30, crypto 86.7/85).

Parent: 5abeeb8 (M-8 per-ciphertext salt).
Closes: audit finding M-5 in certctl-audit-report.md.
2026-04-17 14:10:23 +00:00
shankar0123 5abeeb882b fix(crypto): per-ciphertext PBKDF2 salt + v2 versioned format with v1 fallback (M-8) 2026-04-17 05:36:29 +00:00
shankar0123 b1df6dab27 ci(release): add CLI/MCP binaries, checksums, SBOM, Cosign, SLSA provenance (M-3) 2026-04-17 04:04:55 +00:00
shankar0123 672e1d991d build: propagate HTTP_PROXY/HTTPS_PROXY/NO_PROXY through Docker build (M-4, Issue #9)
Addresses Medium finding M-4 in the audit report. The multi-stage
Dockerfiles previously had no ARG declarations for HTTP_PROXY,
HTTPS_PROXY, or NO_PROXY, so corporate-proxy environments silently
failed at 'npm ci' (frontend stage) and 'go mod download' (Go builder).
The npm retry idiom (`npm ci --include=dev || npm ci --include=dev`)
masked the failure because the upstream 'Exit handler never called!'
bug exits 0 despite the install crash.

Fix: thread HTTP_PROXY / HTTPS_PROXY / NO_PROXY ARGs through every
Docker build stage that performs network I/O, re-export them as ENV
with both upper- and lower-case aliases (apk/curl/npm read lowercase;
Go/Node read uppercase), and forward the host shell's environment via
`build.args:` in every compose file and `build-args:` in the release
workflow's docker/build-push-action steps. Defaults are empty strings
so un-proxied builds remain byte-identical to the pre-fix tree.

Scope: Dockerfile (frontend + Go builder stages), Dockerfile.agent
(Go builder stage), deploy/docker-compose.yml (server + agent),
deploy/docker-compose.dev.yml (server + agent), deploy/docker-compose.test.yml
(server + agent), .github/workflows/release.yml (both docker/build-push-action
v6 invocations). Zero Go, web, test, or runtime code changes. Zero
base-image changes. Existing npm `||` retry idiom and `ARG TARGETARCH`
preserved verbatim.

CWE-1173 (Improper Use of Validated Input) / CWE-16 (Configuration).

Verification:
- YAML parses clean across all four compose files and release.yml.
- yamllint -d relaxed: clean exit across all five YAML files.
- All six `build.args:` blocks expose HTTP_PROXY, HTTPS_PROXY, NO_PROXY
  with default-empty ${VAR:-} substitution.
- Both release.yml docker/build-push-action steps expose the same
  three keys sourced from ${{ secrets.HTTP_PROXY }}, etc.
- Dockerfiles contain 5 proxy ARG declarations total (Dockerfile has 2
  stages × 3 ARGs = 6 lines, Dockerfile.agent has 1 stage × 3 ARGs = 3
  lines); lowercase ENV aliases verified present in every stage.
- git diff --shortstat: 6 files changed, 117 insertions(+), 0 deletions.
  Pure additive.

Docker-live verification (`docker build`, `docker compose config`)
deferred to CI / post-commit smoke because the sandbox has no Docker
runtime. hadolint, go, golangci-lint, govulncheck likewise unavailable
in the sandbox; per-layer CI coverage gates (service 55%, handler 60%,
domain 40%, middleware 30%) are trivially unaffected as M-4 touches
zero Go source files.
v2.0.41
2026-04-17 03:12:45 +00:00
shankar0123 89b910a8f1 security: atomic pending-job claim with FOR UPDATE SKIP LOCKED (H-6)
Fixes H-6 (CWE-362) — GetPendingJobs returned pending rows without row
locks, so two scheduler replicas in an HA deployment could both read the
same row, both decide it was theirs, and race on UpdateStatus, producing
duplicate Running jobs and duplicate certificate issuances.

Remediation: a claim-style repository API that selects + transitions
Pending -> Running in one transaction with SELECT ... FOR UPDATE SKIP
LOCKED. Concurrent claimants observe disjoint row sets; no worker ever
sees another worker's claimed row.

Repository changes (internal/repository/postgres/job.go):
  - New ClaimPendingJobs(ctx, jobType, limit): BEGIN; SELECT id,...
    FROM jobs WHERE status='Pending' (optional type filter, optional
    LIMIT) FOR UPDATE SKIP LOCKED; UPDATE jobs SET status='Running',
    updated_at=NOW() WHERE id = ANY($ids); COMMIT. Returns the claimed
    rows with status already flipped.
  - New ClaimPendingByAgentID(ctx, agentID): mirrors M31 UNION ALL
    semantics (direct agent_id match, target->agent JOIN fallback,
    certificate->target->agent chain for AwaitingCSR) but wraps each
    branch in FOR UPDATE SKIP LOCKED and flips Deployment/Renewal rows
    to Running. AwaitingCSR rows are returned in place (state
    transition deferred until SubmitCSR, consistent with M8 semantics).
  - Existing GetPendingJobs / ListPendingByAgentID retained for legacy
    compatibility; their godoc now directs production callers to the
    Claim* variants.

Production caller switches:
  - internal/service/job.go ProcessPendingJobs: ListByStatus(Pending)
    -> ClaimPendingJobs(ctx, "", 0). Eliminates the real scheduler
    race between two replicas tick-firing simultaneously.
  - internal/service/agent.go GetPendingWork: ListPendingByAgentID ->
    ClaimPendingByAgentID. Eliminates the race between two pollers
    for the same agent (e.g. brief network blip causing duplicate
    poll) and between a scheduler tick and an agent poll.

Safety argument for pre-flipping Pending -> Running inside the claim
transaction: ProcessRenewalJob and ProcessDeploymentJob both call
UpdateStatus(Running) unconditionally on entry, so an early flip is
idempotent. On panic, the scheduler's panic recovery leaves the job
in Running which the existing stale-running reaper handles.

Tests (internal/repository/postgres/repo_test.go, skipped in -short):
  - TestJobRepository_ClaimPendingJobs_FlipsToRunning: seed 5 Pending,
    claim once, assert all 5 returned + DB rows Running, residual
    claim returns 0.
  - TestJobRepository_ClaimPendingJobs_ConcurrentDisjoint: seed M=40
    Pending Renewals, spawn N=8 goroutines each calling
    ClaimPendingJobs(_, JobTypeRenewal, 1) in a loop. Invariants:
    (a) no job ID claimed by more than one worker, (b) sum of claims
    == 40, (c) all 40 rows in Running state in the DB. Bounded
    empty-streak guard (20 iterations) covers SKIP LOCKED transient
    zeros under contention.
  - TestJobRepository_ClaimPendingByAgentID_TransitionsDeployments:
    seeds 2 Pending Deployment + 1 AwaitingCSR for agent A plus 1
    Pending Renewal for agent B (scope check). Asserts deployments
    flip to Running, AwaitingCSR is returned but preserved, agent B's
    renewal never appears.

Mock updates: testutil_test.go, lifecycle_test.go, verification_test.go
gained ClaimPendingJobs/ClaimPendingByAgentID on their mock job repos
mirroring the real Pending -> Running semantics. Mocks intentionally
do NOT write to StatusUpdates (that map tracks UpdateStatus() call
history specifically; the real claim path uses a bulk UPDATE, not
UpdateStatus).

Verification (CI-scope):
  - go build ./cmd/...: ok
  - go vet ./...: ok
  - go test -race -short on service, api/handler, api/middleware,
    scheduler, connector/..., domain, validation, tlsprobe: ok
  - Coverage gates: service 67.6% (>=55), handler 78.6% (>=60),
    middleware 80.0% (>=30), domain 92.7% (>=40). All hold.
  - golangci-lint 2.11.4: 0 issues
  - govulncheck: no vulnerabilities in call graph
  - Frontend: tsc clean, 218 vitest tests pass, vite build ok
  - helm lint + helm template: ok
  - Invariant sweeps: FOR UPDATE SKIP LOCKED present in job.go;
    H-1 through H-5 fixtures unchanged.

Refs: H-6 in certctl-audit-report.md
2026-04-17 02:34:56 +00:00
shankar0123 6315ef102a security(globalsign): remove InsecureSkipVerify and pin CA pool (H-5)
The GlobalSign Atlas HVCA connector previously used InsecureSkipVerify:true
on its mTLS TLS config, disabling server certificate validation and
defeating the purpose of the client-side mTLS handshake. This was a
CWE-295 Improper Certificate Validation vulnerability silently degrading
trust on every production call to GlobalSign's signing API.

Remediation (per H-5 audit finding, Lens 4.4):

- Remove InsecureSkipVerify from all three http.Client construction sites
  (ValidateConfig, getHTTPClient, and legacy initialisation path).
- Introduce buildServerTLSConfig() helper that constructs tls.Config with
  MinVersion: tls.VersionTLS12 (addresses adjacent L-1 recommendation).
- New optional config field `server_ca_path` (env:
  CERTCTL_GLOBALSIGN_SERVER_CA_PATH). When unset the connector trusts the
  system root CA bundle (correct default for GlobalSign's publicly-trusted
  HVCA endpoints). When set the bundle is loaded via x509.NewCertPool() +
  AppendCertsFromPEM, and only those roots are trusted (supports private
  HVCA deployments and defence-in-depth root pinning).
- Error wrapping chain: "failed to read server CA bundle at %s" and
  "no valid PEM certificates found in server CA bundle at %s" surface
  config problems at ValidateConfig time instead of silently failing at
  request time.

Docs, config, service env-seed, and GUI issuer type definition updated to
expose the new field. Tests: 9 dead `InsecureSkipVerify: true` client
TLSClientConfig blocks (no-ops against httptest.NewServer plain-HTTP)
replaced with bare http.Client; new TestGlobalSign_ServerTLSConfig covers
pinned-CA trust, untrusted-server rejection, missing-file and invalid-PEM
error paths.

Verification:
- go build ./... clean
- go vet ./... clean
- go test -race ./internal/connector/issuer/globalsign/... ./internal/config/... ./internal/service/... ok
- go test ./... (excluding testcontainers-gated repo layer) ok
- golangci-lint run ./... 0 issues
- govulncheck ./... 0 reachable vulns
- Per-layer coverage: service 68.7% (≥55), handler 83.6% (≥60), domain 82.0% (≥40), middleware 63.8% (≥30)
- globalsign package coverage: 75.9%
- Invariant sweep: 0 InsecureSkipVerify references remain in globalsign
  package (only a test-file comment documenting the removal).
2026-04-17 01:40:58 +00:00
shankar0123 119986fa7e security: add SSRF defence-in-depth for webhook notifier (fixes H-4)
The webhook notifier would previously accept any operator-configured URL
and hand it to http.Client without validation. That exposed two
SSRF classes (CWE-918):

  * Reserved-address reachability — a misconfigured or adversarial
    webhook URL pointing at 127.0.0.1, ::1, 169.254.169.254 (cloud
    metadata), or 0.0.0.0 would succeed, exfiltrating request bodies
    to local services or leaking short-lived cloud credentials.
  * DNS rebinding — a hostname resolving to a public IP at validation
    time and to a reserved IP at dial time would bypass any
    URL-string-only check.

Fix installs two independent layers:

  * validation.ValidateSafeURL runs at config-ingest time and before
    every outbound POST. It rejects non-HTTP(S) schemes, empty hosts,
    and literal reserved-IP hosts with a clear operator-facing error.
    This is a fast early diagnostic.
  * validation.SafeHTTPDialContext is installed on the webhook
    http.Transport. It re-resolves the host at dial time, rejects any
    resolved address whose address lies in a reserved range (loopback,
    link-local, multicast, broadcast, unspecified, IPv6
    link-local/multicast), and pins the resolved IP into the final
    dial address so the TLS handshake targets the exact IP the guard
    approved. This is the authoritative, TOCTOU-safe defence against
    DNS rebinding.

The two layers are complementary — validateURL fails fast on obvious
misconfiguration; SafeHTTPDialContext fails closed when DNS changes
between validation and dial.

The existing unexported isReservedIP helper in
internal/service/network_scan.go is extracted into
internal/validation.IsReservedIP with byte-identical behaviour so the
webhook notifier and the network scanner share a single authoritative
reserved-address list. RFC 1918 ranges remain intentionally allowed
(certctl's self-hosted design). Broader unspecified / IPv6 link-local
coverage lives only in the stricter dial-time policy, where it belongs
for outbound HTTP egress.

Test seam: Connector gains an unexported validateURL func field and a
same-package newForTest constructor that installs a permissive
validator and the stdlib default transport. Production callers cannot
reach this constructor because it is unexported; only same-package
tests (package webhook) can use it. Same-package happy-path tests call
newForTest so they can point at httptest loopback servers without
being blocked by the production guard. The four SSRF-rejection tests
that verify the guard itself still call New so they exercise the real,
strict validator. This keeps the production SSRF defence
unconditionally on in real code while preserving legitimate unit-test
coverage.

Tests
-----
  * internal/validation/ssrf_test.go (new) — 16-subtest pin on
    IsReservedIP that is byte-identical with the original network-
    scanner behaviour; ValidateSafeURL accept/reject matrix covering
    HTTPS/HTTP, reserved-literal IPv4/IPv6, dangerous schemes
    (file/gopher/ftp/javascript/data/ldap/dict/jar), missing hosts,
    and malformed inputs; SafeHTTPDialContext rejects literal reserved
    addresses and hosts resolving to reserved addresses (DNS-rebinding
    coverage via localhost).
  * internal/connector/notifier/webhook/webhook_test.go — happy-path
    tests switched to newForTest; production-guard SSRF-rejection
    tests (TestValidateConfig_RejectsReservedURLs,
    TestValidateConfig_RejectsDangerousScheme,
    TestPostWebhook_RejectsReservedURL,
    TestPostWebhook_RejectsDangerousScheme) continue to call New so
    they exercise the unconditionally-installed production validator.

Wire-format invariants preserved
--------------------------------
  * Outbound HTTP request shape (method, headers, body, HMAC
    signature) unchanged.
  * network_scan.go behaviour unchanged — validation.IsReservedIP is
    byte-identical with the deleted helper.
  * RFC 1918 (10/8, 172.16/12, 192.168/16) remain allowed for both
    outbound webhook and CIDR expansion, matching the self-hosted
    design.

Verification
------------
  * go test -race ./internal/validation/... ./internal/connector/
    notifier/webhook/... ./internal/service/... — green.
  * Full-suite go test -race ./... — green (GOTMPDIR=/dev/shm to
    sidestep full /tmp on the sandbox host).
  * Coverage gates pass: service 68.8% >= 55%, handler 83.6% >= 60%,
    domain 82.0% >= 40%, middleware 63.8% >= 30%. Overall 67.8%.
    Webhook package 91.5% line coverage; validation package
    ValidateSafeURL/SafeHTTPDialContext 78-100% per function.
  * govulncheck ./... — no vulnerabilities found.
  * golangci-lint run on touched H-4 production code — clean. Pre-
    existing errcheck/gosimple warnings in scope-adjacent files
    (webhook_test.go:270 w.Write, network_scan.go:120/173/265/305)
    verified against 3853b74 to predate this commit; left alone per
    scope guard.

Operational notes
-----------------
  * No migration needed. The guard is pure Go code; existing webhook
    configs continue to work unless they point at reserved addresses,
    in which case they now fail closed with a clear error.
  * Existing operators who rely on webhook POST to 127.0.0.1 or
    ::1 (e.g., local receivers on the same host as certctl-server)
    must expose their receiver on an RFC 1918 address or public IP.
    This is deliberate — the threat model for webhook notifiers
    includes untrusted operator-supplied URLs.

Scope guard: H-4 only. H-5, H-6, M-*, L-*, and I-* findings remain
open and are tracked separately. No drive-by refactors.
2026-04-17 00:34:47 +00:00
shankar0123 3853b7460c security: reject CRLF/NUL in email headers to prevent SMTP injection (fixes H-3)
H-3 in certctl-audit-report.md: caller-supplied From/To/Subject were
interpolated directly into the SMTP DATA payload and handed to
client.Mail / client.Rcpt with no sanitization, allowing an attacker
who controls any of those values to inject extra headers (Bcc:,
Reply-To:), split the message body (CRLFCRLF), or tamper with the
SMTP envelope. CWE-113.

Fix:
- New package helper internal/validation.ValidateHeaderValue(field,
  value). Rejects CR ("\r"), LF ("\n"), and NUL ("\x00") with an error
  that names the offending field but does NOT echo the raw value,
  so log readers cannot be attacked with injected content. Silent
  stripping was considered and rejected: authentication-relevant
  headers must fail visibly.
- Two-layer defense in internal/connector/notifier/email/email.go:
    (1) primary guard at the top of sendEmail / sendHTMLEmail, which
        blocks tampering of the SMTP envelope (client.Mail, client.Rcpt)
        since net/smtp does not sanitize those arguments; and
    (2) defense-in-depth guard inside formatEmailMessage /
        formatHTMLEmailMessage, catching any future caller that
        bypasses sendEmail. Both format functions now return an error.
- Body content is intentionally NOT validated — CR/LF in body is legal
  RFC 5322 content and net/smtp handles dot-stuffing.

Tests:
- internal/validation/headers_test.go: 3 functions (AcceptsSafeInput,
  RejectsControlCharacters, DefaultFieldName) covering plain ASCII,
  UTF-8 multibyte, tabs, typical email addresses, CRLF injection,
  lone CR, lone LF, NUL, CRLFCRLF body split, trailing CR, leading LF.
  Each reject case asserts the field name IS in the error and the
  raw offending value IS NOT (anti-log-injection).
- internal/connector/notifier/email/email_test.go: added
  TestEmail_FormatEmailMessage_RejectsCRLFInjection and
  TestEmail_FormatHTMLEmailMessage_RejectsCRLFInjection. Existing
  format tests updated for the new (bytes, error) signature.

Wire-format invariants preserved:
- SMTP DATA headers still use CRLF separators and RFC 1123Z Date
  (unchanged).
- Content-Type headers unchanged (text/plain for plain, text/html +
  MIME-Version: 1.0 for HTML).
- No change to message encoding or transport.

Verification (Go 1.25.9 linux-arm64, parent e9947dc):
- go build ./...                                 clean
- go vet ./...                                   clean
- go test -race ./internal/validation/...        ok
- go test -race ./internal/connector/notifier/email/...   ok
- go test -race ./internal/connector/notifier/webhook/... ok
- Per-layer coverage gates all pass:
    validation  95.1% (+0.7 vs baseline 94.4%)
    email       39.7% (+1.4 vs baseline 38.3%)
    service     67.8% (unchanged)
    handler     78.6% (unchanged)
    middleware  80.0% (unchanged)
    domain      92.7% (unchanged)
- govulncheck ./...                              No vulnerabilities found
- golangci-lint run ./internal/validation/... ./internal/connector/notifier/email/...
                                                 0 issues

Operational note: SMTP sends that would previously deliver a
tampered message now fail fast at the notifier with a clear error.
Operators who were relying on header-injection-shaped inputs (there
should be none in practice — all callers are internal certctl code)
will see "failed to format message: <field> contains disallowed
control character" in logs.

Scope: H-3 only. H-4 (webhook SSRF) follows in a separate commit.
2026-04-17 00:08:20 +00:00
shankar0123 e9947dc0fe docs: redact V3 feature specifics from README (fixes H-7)
Problem
-------
H-7 (CWE-200 / information disclosure, strategic-policy class): the
public README's V3 section enumerated the paid-tier feature set --
"Role-based access control with profile-gating", "Event-driven
architecture with real-time operational views", "Advanced search",
"compliance scoring", "HSM/TPM integration" -- violating the
CLAUDE.md directive "Keep V3+ deliberately vague -- one-liner
descriptions only. Don't telegraph the paid feature set." The prior
wording also carried factual drift: `compliance scoring` was pulled
forward to V2.2 per the V2.2 Roadmap, so pairing it with V3 in the
README misrepresented the open-core line.

Fix
---
Replace the two-sentence enumeration at README.md:322-323 with a
single deliberately-vague sentence:

  Enterprise capabilities for larger deployments are available in
  the commercial tier.

No named features. No SKU enumeration. Matches the policy one-liner
shape used in neighboring V1 / V2 / V4+ sections. Net -1 line of
prose.

Files
-----
  README.md                          1 -, 1 +

Wire-format invariants preserved
--------------------------------
This is a docs-only change. All protocol surfaces are byte-identical:
  - RFC 7030 EST handler (internal/api/handler/est.go) -- untouched
  - RFC 8894 SCEP handler (internal/api/handler/scep.go) -- untouched
  - Shared internal/pkcs7/ package -- untouched
  - H-1 revocation composite key (migration 000012) -- untouched
  - H-2 SCEP challenge-password preflight + PKCSReq guard -- untouched
  - C-2 AES-256-GCM config encryption contract -- untouched
  - CRL DER bytes, OCSP response bytes -- untouched

Verification
------------
  git diff 387fb55 HEAD -- internal/ cmd/ migrations/ api/ deploy/
    -> 0 code changes (only README.md modified after H-1)

Operational note
----------------
No behavioral change. Product positioning only. The V3 feature set
itself remains documented in the gitignored roadmap.md / strategy.md,
which are the intended sources of truth for the paid tier.

Audit report: see /Users/shankar/Desktop/cowork/certctl-audit-report.md
2026-04-16 23:46:37 +00:00
shankar0123 b813660c74 security: require SCEP challenge password when SCEP enabled (fixes H-2)
Problem (CWE-306 Missing Authentication for Critical Function):
internal/service/scep.go PKCSReq skipped the shared-secret check when
s.challengePassword was empty. An unconfigured-but-enabled SCEP server
accepted any unauthenticated client reaching /scep and issued a
certificate against the configured issuer for any CSR with a valid
signature. No audit trail distinguished authenticated from
unauthenticated enrollments. This matches the two-layer fail-closed
pattern already used for C-2 (f549a7a): reject at startup AND reject
at the service boundary.

Fix (two layers, defense-in-depth):

Layer 1 — startup pre-flight in cmd/server/main.go:
  preflightSCEPChallengePassword returns a non-nil error when SCEP is
  enabled and CERTCTL_SCEP_CHALLENGE_PASSWORD is empty. main logs and
  os.Exit(1)s before the SCEP service is constructed. Disabled SCEP is
  unaffected. The helper is unit-testable in isolation.

Layer 2 — service-layer rejection in internal/service/scep.go:
  PKCSReq refuses enrollment when s.challengePassword == "" even though
  main already blocks this state — protects future call sites (tests,
  library reuse, a REST-over-HTTPS wrapper). When a secret is
  configured, the comparison now uses crypto/subtle.ConstantTimeCompare
  so response time does not leak the configured secret through a
  short-circuiting byte compare.

Files:
- cmd/server/main.go: preflightSCEPChallengePassword helper; call site
  inside the `if cfg.SCEP.Enabled` block before issuer lookup; fatal
  slog error references CWE-306 and names the env var so operators can
  diagnose the startup failure without reading code.
- cmd/server/main_test.go: TestPreflightSCEPChallengePassword with five
  table-driven subtests (disabled empty, disabled set, enabled empty
  rejected, enabled set, single-char boundary). The enabled-empty case
  asserts the error string contains both CERTCTL_SCEP_CHALLENGE_PASSWORD
  and CWE-306 so the log message remains actionable.
- internal/config/config.go: SCEPConfig.ChallengePassword godoc now
  states the field is REQUIRED when SCEP.Enabled and cross-references
  preflightSCEPChallengePassword.
- internal/service/scep.go: imports crypto/subtle; PKCSReq rewritten
  with the two-layer check; comment block cites H-2 / CWE-306 and the
  constant-time rationale.
- internal/service/scep_test.go: existing tests that relied on the
  vulnerable empty-password path now configure a secret on both sides.
  TestSCEPService_PKCSReq_ChallengePassword_NotRequired is replaced by
  TestSCEPService_PKCSReq_ChallengePassword_EmptyServerConfigRejected
  which iterates ["", "any-value", "guess"] against an unconfigured
  server and asserts "not configured" in the error. A new
  TestSCEPService_PKCSReq_ChallengePassword_ConstantTimeLengthIndependence
  exercises same-prefix-longer and wrong-case inputs to guard against a
  regression from ConstantTimeCompare to a short-circuiting byte compare.
- internal/service/m11c_crypto_enforcement_test.go: four tests
  (RejectsWeakKey, AcceptsStrongKey, MaxTTL_ForwardedToIssuer,
  NoProfileRepo_PassesThrough) constructed NewSCEPService with an empty
  challenge password and exercised PKCSReq through the now-rejected
  vulnerable path. All four now configure "secret123" on both sides with
  an inline H-2 comment; the crypto/MaxTTL/profile behavior they assert
  is unchanged.

Wire-format / behavioral invariants preserved:
- RFC 8894 SCEP handler is untouched (internal/api/handler/scep.go and
  internal/pkcs7/*): GetCACaps/GetCACert responses, PKIOperation request
  parsing, and the PKCS#7 certs-only response format are byte-identical.
- RFC 7030 EST handler is untouched
  (internal/api/handler/est.go + internal/pkcs7/*).
- Revocation idempotency composite key (H-1, migration 000012) untouched.
- AES-256-GCM config encryption (C-2) untouched.
- CRL DER bytes and OCSP response bytes unchanged.

Verification:
- go build ./...              silent success
- go vet ./...                silent success
- go test -race -count=1 ./internal/service/ ./cmd/server/
  ./internal/api/handler/ ./internal/integration/    all OK
- Coverage with comfortable headroom over CI gates:
    service     67.8% (gate 55%)
    handler     79.0% (gate 60%)
    domain      92.7% (gate 40%)
    middleware  80.0% (gate 30%)
    cmd/server  1.6%  (preflightSCEPChallengePassword: 100%)
  internal/service/scep.go PKCSReq statement coverage: 100%.
- rg sweeps: no `s.challengePassword != ""` remains;
  no `challengePassword != s.challengePassword` remains.

Operational note: operators with SCEP enabled but no challenge password
set will see a fatal startup error and a log line citing
CERTCTL_SCEP_CHALLENGE_PASSWORD and CWE-306 after upgrading. This is the
intended fail-closed behavior. Fix by either setting the env var to a
non-empty shared secret or setting CERTCTL_SCEP_ENABLED=false.

Audit report: certctl-audit-report.md (revision 5) logs this under
H-2 Resolution Log.
2026-04-16 22:22:51 +00:00
shankar0123 387fb555ac security: scope revocation unique index to (issuer_id, serial_number) (fixes H-1)
RFC 5280 §5.2.3 defines certificate serial number uniqueness per issuing CA,
not globally. The prior unique index on `certificate_revocations.serial_number`
enforced a stricter invariant than the spec: with 12 issuer connectors (Local
CA, ACME, Vault, step-ca, OpenSSL, DigiCert, Sectigo, Google CAS, AWS ACM PCA,
Entrust, GlobalSign, EJBCA), two distinct certificates legitimately issued by
different CAs can share a serial number. Recording a revocation for the second
collision silently dropped via `ON CONFLICT DO NOTHING`, leaving the second
cert persistently absent from OCSP/CRL responses.

Changes:

- Migration 000012 drops `idx_certificate_revocations_serial` and creates
  `idx_certificate_revocations_issuer_serial` UNIQUE ON (issuer_id,
  serial_number). Adds a non-unique `idx_certificate_revocations_serial_lookup`
  to preserve the serial-only fast path for OCSP/CRL probes that already know
  the issuer scope.
- `CertificateRevocationRepository.Create` targets the new composite key in
  `ON CONFLICT` — same-issuer idempotency preserved, cross-issuer collisions
  now recorded as distinct rows.
- `GetBySerial(serial)` renamed `GetByIssuerAndSerial(issuerID, serial)` on
  the interface and Postgres impl. All callers (OCSP responder, CRL
  generator, short-lived-cert exemption check) already have `issuerID` in
  scope because the protocol paths carry it (`/api/v1/ocsp/{issuer_id}/{serial}`,
  `/api/v1/crl/{issuer_id}`).
- Repository integration test added: `TestRevocationRepository_CrossIssuerSerialCollision`
  asserts that serial `CAFEBABE01` can be stored under two issuers
  simultaneously, that lookups return the correct row per (issuer, serial),
  and that same-issuer idempotency still works (re-inserting (issuer, serial)
  does not error and does not duplicate).
- Existing tests and service/integration mocks updated for the rename.

Wire-format invariants preserved: CRL DER bytes, OCSP response bytes, and
AES-256-GCM config encryption are unaffected — this change touches only
revocation-record uniqueness scope.

CWE-664.
2026-04-16 21:49:59 +00:00
shankar0123 f549a7aa79 security: fail closed when CERTCTL_CONFIG_ENCRYPTION_KEY is unset (fixes C-2)
EncryptIfKeySet/DecryptIfKeySet in internal/crypto/encryption.go previously
returned plaintext + wasEncrypted=false when the operator had not configured
CERTCTL_CONFIG_ENCRYPTION_KEY. That produced a data-at-rest confidentiality
bypass (CWE-311): sensitive fields on dynamically-configured issuer and
target rows (source='database') were persisted to PostgreSQL without any
encryption, and no caller could distinguish the encrypted from the plaintext
branch at runtime. The only visible signal was a single warning log line
emitted once at startup.

Fail closed instead:

- EncryptIfKeySet / DecryptIfKeySet now return crypto.ErrEncryptionKeyRequired
  (a new exported sentinel, errors.Is-unwrappable) when the key is empty or
  nil, rather than silently emitting plaintext. The (result, wasEncrypted,
  err) tuple signature is preserved for source compatibility; only the
  semantics of the no-key branch changed.

- cmd/server/main.go grows a startup pre-flight check: if no encryption key
  is configured the server lists issuers and targets, counts rows with
  source='database', and refuses to start (os.Exit(1)) if any exist. Operators
  must either configure CERTCTL_CONFIG_ENCRYPTION_KEY or remove the exposed
  rows before the control plane can boot. The warning-only path is retained
  for the clean-slate case (no database rows).

- internal/service/issuer.go's SeedFromEnvVars now guards the encryption call
  with len(s.encryptionKey) > 0 so env-seeded rows (source='env', which are
  reconstructable on every boot from process env) continue to persist as
  plaintext in the 'config' column when no key is configured. Registry load
  already falls through to cfg.Config when EncryptedConfig is nil. GUI/API
  write paths (source='database') remain fail-closed via propagation of
  ErrEncryptionKeyRequired.

- Integration tests that exercise CreateIssuer via the handler layer now
  supply a real 32-byte AES-256 test key so the encrypt path runs instead of
  returning ErrEncryptionKeyRequired. Same pattern in internal/service/
  testutil_test.go for consolidated service-layer tests.

- internal/crypto/encryption_test.go grows regression guards:
  TestEncryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
  TestDecryptIfKeySet_EmptyKeyFailsClosed (nil_key + empty_key subtests),
  TestEncryptDecryptIfKeySet_RoundTripProducesDifferentCiphertext,
  TestDecryptIfKeySet_RejectsTamperedCiphertext, and
  TestEncryptIfKeySet_PreservesErrEncryptionKeyRequiredSentinel (verifies
  the sentinel unwraps through fmt.Errorf(%w)-style wrapping).

Wire format is unchanged: AES-256-GCM Encrypt/Decrypt/DeriveKey, the
12-byte nonce prefix, the GCM auth tag, the PBKDF2 salt
('certctl-config-encryption-v1'), and the 100,000 iteration count are all
byte-identical. Ciphertexts produced before this change remain decryptable.

Verified:
- go build ./... : clean
- go vet ./...   : clean
- go test -race ./internal/crypto/... ./internal/service/... \
    ./internal/integration/... ./cmd/server/... : pass
- golangci-lint run ./... : 0 issues
- govulncheck ./... : 0 reachable vulnerabilities
- rg 'return plaintext, false, nil' internal/ : no matches
- Coverage: crypto 85.0% (unchanged), service 67.8% (was 67.9%, noise),
  cmd/server 0.0% (unchanged baseline). All above CI thresholds.

See certctl-audit-report.md for the full finding record and resolution log.
2026-04-16 21:10:40 +00:00
shankar0123 b219e5d68a security: use crypto/rand for agent API keys (fixes C-1)
Replaces math/rand-based agent API key generation in internal/service/agent.go
with crypto/rand.Read over a 32-byte buffer encoded with base64.RawURLEncoding,
yielding a 43-character URL-safe unpadded ASCII string (256 bits of entropy).

generateAPIKey now returns (string, error); Register and RegisterAgent propagate
entropy-source failures. hashAPIKey is unchanged — the SHA-256 hashed-at-rest
invariant is preserved.

Fixes C-1 (CWE-338: Use of Cryptographically Weak Pseudo-Random Number Generator)
from certctl-audit-report.md.

Changes:
- internal/service/agent.go: new imports (crypto/rand, encoding/base64);
  generateAPIKey rewritten to return (string, error); Register and RegisterAgent
  updated to propagate the error.
- internal/service/agent_test.go: TestGenerateAPIKey_Properties regression test
  (non-empty, length 43, valid base64url, 32 decoded bytes, no collisions over
  64 calls). No entropy-failure test — Go 1.24+ (issue #66821) makes crypto/rand
  errors fatal, so that branch is defensively unreachable.

Verification:
- go build ./cmd/server/... ./cmd/agent/... ./cmd/mcp-server/... ./cmd/cli/... → pass
- go vet ./... → pass
- go test -race (CI scope, 43 packages) → pass
- golangci-lint v2.11.4 run ./... → 0 issues
- govulncheck ./... → 0 vulnerabilities in certctl code
- Coverage: service 68.9% / handler 83.6% / domain 82.0% / middleware 63.8%
  (all above CI gates 55/60/40/30)
- grep math/rand in internal/ and cmd/ → zero production hits
- No caller assumes the old 32-char length or legacy charset
2026-04-16 19:43:19 +00:00
shankar0123 1f6cf0eafa fix: add npm ci retry and install verification for proxy environments (#9)
npm has a known bug where `npm ci` can crash with "Exit handler never
called!" behind corporate proxies yet exit with code 0. This adds a
single retry on failure and verifies tsc is actually installed before
proceeding to build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.40
2026-04-16 11:21:47 -04:00
shankar0123 a49eae8155 fix: correct BSL 1.1 change date to March 14, 2033
why-certctl.md said March 1, CHART_SUMMARY.md said March 28. The
LICENSE file is authoritative: Change Date is March 14, 2033.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:12:49 -04:00
shankar0123 1c7d085f16 docs: move maintenance notice and quick start link above Documentation section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:05:47 -04:00
shankar0123 cc6eec3608 fix: merge npm install + build into single Docker layer (#9)
The previous fix (--include=dev) was necessary but insufficient. The
real issue is that node_modules created by npm ci in one layer can be
lost when COPY web/ . creates the next layer — depending on the Docker
storage driver (fuse-overlayfs, vfs). Merging install and build into a
single RUN eliminates the layer boundary entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.39
2026-04-16 10:52:50 -04:00
shankar0123 86fb140414 fix: ensure devDependencies install in Docker build (#9)
npm ci skips devDependencies when NODE_ENV=production leaks from the
host environment into the Docker build. This breaks the frontend stage
because typescript and vite are devDependencies. Adding --include=dev
makes the install hermetic regardless of host environment.

Closes #9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 10:00:06 -04:00
shankar0123 13cd4d98ba feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id,
owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure
tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke),
server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI
spec, compliance mapping updates, and 21 new tests (12 service, 7 handler,
1 CLI, 1 frontend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.38
2026-04-16 00:06:34 -04:00
shankar0123 84bc1245a1 fix: case-insensitive issuer type validation + missing M49 types (#7)
Backend rejected lowercase type strings (e.g., "acme") sent by older
cached frontends. Add normalizeIssuerType() with alias map for
case-insensitive lookup, wire into both Create paths. Add missing
Entrust/GlobalSign/EJBCA to validIssuerTypes. Add lowercase fallbacks
to issuer factory switch. 39 new test subtests covering normalization,
lowercase create flows, and M49 type acceptance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.37
2026-04-15 23:20:32 -04:00
shankar0123 e1bcde4cf1 feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret
managers. Three pluggable DiscoverySource connectors feed into the
existing discovery pipeline via sentinel agent pattern, with a 9th
scheduler loop for periodic cloud scanning.

- AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests
- Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests
- GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests
- CloudDiscoveryService orchestrator with 9 tests
- 9th scheduler loop (6h default, atomic.Bool idempotency)
- Discovery page: color-coded source type badges
- 14 new env vars across CloudDiscoveryConfig structs
- Docs: connectors.md, architecture.md, features.md, README updated

49 new tests. All CI checks pass (go vet, race, lint, coverage).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 23:01:00 -04:00
shankar0123 3f619bcaac feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA
coverage. Entrust uses mTLS client certificate auth with sync/async
issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with
serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for
self-hosted Keyfactor CAs.

Each connector implements the full issuer.Connector interface (9 methods),
includes httptest-based unit tests (~14 each), and follows established
patterns (injectable HTTP clients, RFC 5280 revocation reason mapping,
CRL/OCSP delegated to CA).

Also includes: issuer factory cases, env var seeding, config structs,
domain types, seed data (3 rows, all disabled), OpenAPI enum updates,
frontend issuer catalog entries with config fields, and full docs
(connectors.md, architecture.md, features.md, README).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 22:24:12 -04:00
shankar0123 f3a85d6b08 fix: remove unused createTestCert function in tlsprobe tests
golangci-lint (unused linter) flagged createTestCert as dead code —
only createTestCertWithKey is called by the actual tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 21:54:38 -04:00
shankar0123 596d86a206 feat(M48): continuous TLS health monitoring — endpoint state machine, shared tlsprobe, 8 API endpoints, GUI
Adds continuous TLS endpoint health monitoring that closes the deploy→verify→monitor loop.
After M25 verifies a deployment succeeded once, M48 continuously confirms it stays healthy.

Key components:
- Shared `internal/tlsprobe/` package extracted from network scanner for reuse
- Health status state machine: healthy → degraded (2 failures) → down (5 failures),
  plus cert_mismatch when served fingerprint differs from expected
- 8th scheduler loop (60s tick, per-endpoint configurable intervals)
- PostgreSQL migration 000011: endpoint_health_checks + endpoint_health_history tables
- 8 REST API endpoints (CRUD, history, acknowledge, summary)
- Health Monitor GUI page with summary bar, status table, create modal, auto-refresh
- 38 new tests (5 tlsprobe + 11 domain + 10 service + 8 handler + 4 frontend)
- All coverage thresholds maintained (service 68%, handler 83%, domain 87%, middleware 63%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 21:45:45 -04:00
shankar0123 f2e60b93a3 feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths
(renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs
with key algorithm/size that don't match profile rules. MaxTTL enforcement
caps certificate validity per issuer connector (Local CA, Vault, step-ca
enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and
size are now persisted in certificate_versions for audit compliance.

16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded
version number from GUI sidebar. Documentation updated across architecture,
features, connectors, and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 21:05:14 -04:00
shankar0123 f16a9c767a docs: consolidate README — merge architecture, security, design decisions into Why certctl
Fold Architecture, Key Design Decisions, and Security sections into the
Why certctl section as bold-header paragraphs. Removes three standalone
sections, tightening the README structure: Documentation → Integrations →
Why certctl (with architecture, security, design decisions) → What It Does →
Quick Start → Examples → CLI → MCP → Development → Roadmap → License.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 17:06:43 -04:00
shankar0123 3a27c87b3f docs: move Supported Integrations under Documentation links in README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 17:03:11 -04:00
shankar0123 0ed8676066 docs: rewrite README to highlight all adoption-driving features
Move documentation table to top (below Gantt chart). Condense screenshots
to 4 key images with "see all" link. Add Enrollment Protocols and
Standards & Revocation tables. Surface previously buried features:
dynamic GUI config, onboarding wizard, approval workflows, agent groups,
TLS verification, certificate export, SCEP, revocation infrastructure.
Fix stale numbers (26 pages, 111 routes) verified against repo source.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 17:00:09 -04:00
shankar0123 bcefb11e65 feat(M51): add SCEP server (RFC 8894) for MDM and network device enrollment
Implements Simple Certificate Enrollment Protocol with single-endpoint
operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7
SignedData CSR extraction with fallback for raw/base64 CSR, challenge
password authentication via CSR attributes, and shared internal/pkcs7
package extracted from EST handler to eliminate code duplication.

24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.36
2026-04-15 16:47:18 -04:00
shankar0123 75cf8475f5 tighten BSL license scope, fix documentation underselling shipped features
Broadened BSL Additional Use Grant from "hosted or managed service" to cover
any commercial offering (embedded, bundled, integrated). Updated README to
promote all shipped connectors from Beta to Implemented, added EST/ARI/S/MIME
highlight, Helm quickstart, and corrected license description. Fixed
connectors.md stale claims (AWS ACM PCA listed as planned, K8s Secrets
listed as coming soon) and updated overview with exact connector counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.35
2026-04-15 15:54:03 -04:00
shankar0123 c015cab2f4 docs: rewrite features.md, audit README + architecture against repo
Rewrote docs/features.md from scratch as authoritative feature inventory
(1255 lines, every claim verified against source files).

Audited README.md and architecture.md against repo — fixed 19 stale
references: K8s Secrets status, issuer counts, dashboard page counts,
CI thresholds, missing connectors in Mermaid diagrams, OpenAPI operation
count, GetCACertPEM behavior, and V2/V4 roadmap accuracy.

Also includes related fixes discovered during audit:
- Scheduler skips expired/failed/revoked certs from auto-renewal
- Seed demo expiry dates moved outside 31-day scheduler query window
- Agent pages use correct last_heartbeat_at field name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 00:22:57 -04:00
shankar0123 3da6584ab8 fix: correct K8s Secrets status to 'Coming in 2.1', increase audit trail page size to 200
The Kubernetes Secrets target connector has config validation, tests, UI,
and Helm RBAC implemented but the realK8sClient is a stub — runtime
deployment will fail. Update README and connectors.md to reflect actual
status instead of misleading 'Beta' label.

Also increase the audit trail GUI default from 50 to 200 events per page
(backend already permits up to 500).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.34
2026-04-14 12:11:01 -04:00
shankar0123 68f6fd474b fix: return 409 on duplicate issuer name, improve error handling and onboarding defaults
Closes #7. The issuer create/update handlers swallowed all service errors
as generic 500s. Now differentiates: 409 for UNIQUE constraint violations,
400 for unsupported issuer type, 404 for not-found on update, 500 for
unknown errors. Adds structured error logging via slog.

OnboardingWizard now pre-populates config field defaults when a type is
selected (matching IssuersPage behavior), preventing empty required fields
from causing silent failures.

install-agent.sh hardened for curl|bash usage: --agent-id flag, =value
syntax, /dev/tty stdin reopening, proper stderr routing in download_binary,
non-interactive install examples in help text, and updated wizard commands.

Adds adversarial security tests for EST, path traversal, and query
injection handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.33
2026-04-12 19:18:32 -04:00
shankar0123 614e4e636b chore: bump Go to 1.25.9 to patch 4 stdlib CVEs
Go 1.25.9 (released Apr 7 2026) fixes:
- GO-2026-4947: unexpected work during chain building in crypto/x509
- GO-2026-4946: inefficient policy validation in crypto/x509
- GO-2026-4870: unauthenticated TLS 1.3 KeyUpdate DoS in crypto/tls
- GO-2026-4865: JsBraceDepth context tracking XSS in html/template

Update CI workflow and go.mod to pin 1.25.9. govulncheck now reports
0 vulnerabilities in called code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:33:25 -04:00
shankar0123 370f856725 fix: resolve 8 staticcheck lint errors in test files
SA1029: use typed context key instead of string in main_test.go
S1039: remove unnecessary fmt.Sprintf in validation_test.go
SA4023: fix unreachable nil check on concrete error type
SA4006: fix unused variable assignments in stepca_test.go (4 occurrences)
SA4000: fix duplicate expression in ssh_test.go (BEGIN vs END CERTIFICATE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:27:57 -04:00
shankar0123 7382e5f03b test: comprehensive test gap closure across 24 packages
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.

All tests pass: go test -short, go vet, go test -race clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:09:40 -04:00
shankar0123 5567d4b411 feat(M47): add Kubernetes Secrets target + AWS ACM PCA issuer connectors
Implement both M47 connectors with full cross-layer wiring:

Kubernetes Secrets target: DNS-1123 validation, kubernetes.io/tls Secret
create-or-update, chain concatenation, serial number validation, Helm
RBAC gating. 18 tests.

AWS ACM Private CA issuer: synchronous issuance (like Vault), ARN regex
validation, RFC 5280 revocation reason mapping, CA cert retrieval,
factory + env var seeding. 23 tests.

Cross-cutting: domain types, service validation, config, factory, agent
dispatch, frontend (TargetsPage, issuerTypes), OpenAPI, seed data, Helm
chart, connectors docs, README. Testing docs (testing-guide, qa-test-guide,
qa_test.go) with Parts thematically integrated near related connectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v2.0.32
2026-04-07 20:21:09 -04:00