certctl

mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 21:11:30 +00:00

Author	SHA1	Message	Date
shankar0123	8f2e5771db	fix(middleware): SEC-006 — TTL-evict idle token-bucket rate-limiter entries Sprint 2 unified-master-audit closure. Pre-fix the keyed rate limiter's bucket map had no eviction. The package-level comment explicitly noted the leak: high-cardinality unauthenticated traffic (CGNAT churn, Tor exit lists, botnets, infinite-cardinality scanners) grew process memory unboundedly. Production deploys with millions of unique IPs would eventually OOM. Fix: - RateLimitConfig.BucketTTL (env CERTCTL_RATE_LIMIT_BUCKET_TTL, default 1h, clamp-floor 1m). 1h chosen to be well above realistic operator IP churn windows (returning clients keep their bucket) and well below the unbounded-leak window the pre-fix code allowed. - tokenBucket gains a lastAccess field updated on every allow() call via touch(); reading via lastAccessTime() under the bucket's own mutex. - keyedRateLimiter.sweepLoop runs in a single goroutine per limiter (production wires 2: default + no-auth fallback), waking every BucketTTL/4. sweep() removes any bucket whose lastAccess is older than the cutoff and bumps evictedTotal atomically. - Both NewRateLimiter call sites in cmd/server/main.go (default stack and no-auth fallback) now thread cfg.RateLimit.BucketTTL. Regression coverage: - TestKeyedRateLimiter_SweepEvictsIdleBuckets: 1000 synthetic IP keys populate the map, advance past TTL, call sweep() directly, assert map drained to 0 + evictedTotal=1000 + fresh key creates new bucket (map not poisoned). - TestKeyedRateLimiter_SweepKeepsActiveBuckets: inverse — a bucket touched within the TTL window survives the sweep. Catches a future regression that inverts the cutoff comparison. Closes SEC-006.	2026-05-16 04:01:18 +00:00
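The TTL eviction described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the certctl implementation: `bucket`, `keyedLimiter`, `newKeyedLimiter`, and `demoSweep` are hypothetical names, token accounting is omitted, and the clock is passed in explicitly so the sweep can be exercised without sleeping.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// bucket models only the idle-tracking half of a token bucket (hypothetical type).
type bucket struct {
	mu         sync.Mutex
	lastAccess time.Time
}

// touch records the most recent access under the bucket's own mutex.
func (b *bucket) touch(now time.Time) {
	b.mu.Lock()
	b.lastAccess = now
	b.mu.Unlock()
}

// lastAccessTime reads the timestamp under the same mutex.
func (b *bucket) lastAccessTime() time.Time {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.lastAccess
}

// keyedLimiter keeps one bucket per key and evicts idle ones.
type keyedLimiter struct {
	mu           sync.Mutex
	buckets      map[string]*bucket
	ttl          time.Duration
	evictedTotal atomic.Int64
}

func newKeyedLimiter(ttl time.Duration) *keyedLimiter {
	return &keyedLimiter{buckets: make(map[string]*bucket), ttl: ttl}
}

// allow creates the bucket on first sight and touches it on every call.
// Token accounting is left out of this sketch, so it always returns true.
func (l *keyedLimiter) allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{}
		l.buckets[key] = b
	}
	b.touch(now)
	return true
}

// sweep removes every bucket whose last access is older than now-ttl.
func (l *keyedLimiter) sweep(now time.Time) int {
	cutoff := now.Add(-l.ttl)
	l.mu.Lock()
	defer l.mu.Unlock()
	removed := 0
	for key, b := range l.buckets {
		if b.lastAccessTime().Before(cutoff) {
			delete(l.buckets, key)
			removed++
		}
	}
	l.evictedTotal.Add(int64(removed))
	return removed
}

// demoSweep populates n idle keys plus one active key, then sweeps two hours later.
func demoSweep(n int) (remaining int, evicted int64) {
	l := newKeyedLimiter(time.Hour)
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	for i := 0; i < n; i++ {
		l.allow(fmt.Sprintf("10.0.%d.%d", i/256, i%256), start)
	}
	l.allow("active", start.Add(90*time.Minute)) // touched inside the TTL window
	l.sweep(start.Add(2 * time.Hour))
	return len(l.buckets), l.evictedTotal.Load()
}

func main() {
	remaining, evicted := demoSweep(1000)
	fmt.Println("remaining:", remaining, "evicted:", evicted)
}
```

A production limiter would presumably drive `sweep` from a ticker goroutine (the commit mentions a wake-up every BucketTTL/4); the sketch calls it directly, as the regression tests in the commit do.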
shankar0123	7268d12a17	feat(web): close FE-M6 — migrate static inline-style attrs to Tailwind + correct CSP rationale comment Closes frontend-design-audit finding FE-M6 (Med): CSP allows 'unsafe-inline' for `style-src` — necessary today because of inline SVG `style=` attrs (related to FE-H2) ═══════════════════════════ GROUND-TRUTH FINDINGS ═══════════════════ Ground-truth recon found 4 audit-framing errors: (1) The "17 inline-style tsx files" count was stale — actual is 9 (8 after excluding a Layout.tsx comment match the audit's grep counted). (2) The CSP rationale comment at securityheaders.go:35 LIED about WHY 'unsafe-inline' is needed. It claimed "Tailwind (via Vite) injects per-component <style> blocks at build time." Verified against the post-build artifact: `grep -c '<style' dist/index.html` = 0; Vite's CSS output is a single .css file linked via `<link rel="stylesheet">`. The 'unsafe-inline' grant exists for React's `style={...}` attribute model, NOT for Vite or Tailwind. (3) The 9 sites split cleanly into: LOAD-BEARING DYNAMIC (5 sites; can't be Tailwind utilities because values are computed at runtime): - Tooltip.tsx Floating-UI position (left/top px per-tick) - AgentFleetPage.tsx dynamic color+width chart bars - dashboard/charts.tsx Recharts color props - CertificatesPage.tsx progress-bar percent width - IssuerHierarchyPage.tsx depth-based marginLeft STATIC PIXEL VALUES (3 files, ~12 sites; clean Tailwind migration targets): - UsersPage.tsx — filter UI + table styling - DigestPage.tsx — iframe min-height - AuthProvider.tsx — demo-mode banner (4) Fully eliminating 'unsafe-inline' would require either banning dynamic `style={...}` (CSS-in-JS rewrite of the 5 load-bearing sites) or adopting CSP nonces with React 18+'s style runtime. Neither fits the original FE-M6 phase budget. ═══════════════════════════ CHANGES ═══════════════════════════════ web/src/pages/auth/UsersPage.tsx: 9 inline-style attrs → Tailwind utility classes. 
The filter UI (mb-4, mr-2, w-[280px] p-1), the table (w-full border-collapse), the thead row (border-b-2 border-gray-300 text-left), per-row borders (border-b border-gray-200 + opacity-50/100 conditional), buttons (px-3 py-1), the empty-state cell (p-3 text-center). Behavior-preserving. web/src/pages/DigestPage.tsx: iframe `style={{ minHeight: '600px' }}` → className "min-h-[600px]" (composed into the existing className). web/src/components/AuthProvider.tsx: Demo-mode banner: 6-prop `style={{ background, color, padding, fontSize, fontWeight, textAlign }}` → className "bg-red-700 text-white px-4 py-2 text-[13px] font-semibold text-center". Same visual. internal/api/middleware/securityheaders.go: CSP rationale comment rewritten to accurately describe WHY 'unsafe-inline' is required. New comment: - Names the 5 load-bearing dynamic-style sites explicitly - Lists the 3 static sites that were migrated to Tailwind today - Documents that the OLD comment's "Tailwind/Vite injects <style> blocks" claim was factually wrong (verified against built dist/index.html — zero <style> tags emitted) - Records the future-tightening path (React style-runtime nonces OR CSS-in-JS rewrite of the 5 sites) and notes it doesn't fit the original FE-M6 phase budget ═══════════════════════════ AUDIT FRAMING ════════════════════════ The audit said FE-M6 was about "inline SVG style= attrs (related to FE-H2)." Ground-truth: FE-H2 (Phase 3 Layout SVG → Lucide icons) ALREADY happened; the remaining inline-style sites have nothing to do with SVGs. The audit's bridge from FE-H2 → FE-M6 was a red herring. The OPERATOR-VISIBLE win from this closure: • 3 production tsx files now use Tailwind utility classes for static styling — consistent with the rest of the codebase. • The CSP comment now tells the truth about why 'unsafe-inline' is needed, so the next operator who reads it doesn't waste time hunting for non-existent <style> blocks. 
• The inline-style attribute surface is reduced to ONLY load-bearing dynamic styling — making any future tightening work (nonces, CSS-in-JS migration) easier to scope. The CSP header itself is UNCHANGED ("style-src 'self' 'unsafe-inline'"). True elimination of 'unsafe-inline' is a separate workstream tracked in the corrected comment. ═══════════════════════════ VERIFICATION ═══════════════════════════ • gofmt -l internal/api/middleware/securityheaders.go — clean • go vet ./internal/api/middleware/... — exit 0 • go test -short -count=1 ./internal/api/middleware/... — ok 0.247s (existing securityheaders_test.go pins the Content-Security-Policy header value byte-string; unchanged by this commit so test stays green) • npx tsc --noEmit — exit 0 • npx vitest run AuthProvider DigestPage UsersPage — 16/16 pass • npx vite build — built in 3.42s Ground-truth: origin/master tip `9ba5ee4` (P-M2 just pushed) verified via GitHub API BEFORE commit. Falsifiable proof: a future engineer reading securityheaders.go:35 sees an accurate explanation of why 'unsafe-inline' is needed, NOT the previous false "Tailwind/Vite" claim.	2026-05-14 20:40:55 +00:00
shankar0123	03f0e08a77	fix(middleware): Hotfix #14 — staticcheck QF1008 from Hotfix #12 CI run #571 (commit `af5c392`, "Hotfix #12 — CodeQL #34 go/reflected-xss in etag.go") failed: internal/api/middleware/etag.go:261:11: QF1008: could remove embedded field "ResponseWriter" from selector (staticcheck) hdr := r.ResponseWriter.Header() Root cause: etagRecorder embeds http.ResponseWriter: type etagRecorder struct { http.ResponseWriter body *bytes.Buffer status int headerWritten bool headerWrittenOnWire bool bodyTruncated bool } etagRecorder DOES override Write() and WriteHeader() — those buffer / track instead of writing through. So r.ResponseWriter.Write(b) and r.ResponseWriter.WriteHeader(s) ARE intentional embedded-field selectors (calling the recorder's own Write would recurse infinitely; calling its WriteHeader would skip the wire flush). staticcheck recognizes those as load-bearing and doesn't flag. But etagRecorder does NOT override Header(). So r.ResponseWriter.Header() and r.Header() are equivalent — staticcheck QF1008 wants the shorter form. The Hotfix #12 change added a new r.ResponseWriter.Header() that I missed. Fix: Change r.ResponseWriter.Header() → r.Header() at line 261 (the Content-Type defense added in Hotfix #12). Behavior is byte- identical: r.Header() is the promoted method from the embedded ResponseWriter. Added a comment block immediately above the fix explaining why the neighboring r.ResponseWriter.WriteHeader / r.ResponseWriter.Write calls intentionally KEEP the explicit selector (overridden methods → embedded form required to bypass recursion). Future engineers won't get confused by the asymmetric pattern. Hotfix #13 (signer FileDriver path-injection — local commit `38f86bc`, not yet pushed) does NOT have the same risk: FileDriver has no embedded struct / interface, only direct fields, so QF1008 can't apply. 
Verification (sandbox constraints — Go unavailable): • Manual syntax inspection: brace count balanced (27/27), paren count balanced (53/53). Diff +9/-1. • No remaining r.ResponseWriter.Header() in the file (verified via grep — empty match). • All 48 CI guards pass. • Other CI noise on run #571 (windows-latest syscall.Stat_t, Node.js 20 deprecation warnings) is PRE-EXISTING and not introduced by either Hotfix #12 or #13 — see the failure log: undefined: syscall.Stat_t fires in internal/deploy/ownership.go which neither hotfix touched. Ground-truth: origin/master tip `af5c392` verified via GitHub API. Local is at `38f86bc` (Hotfix #13) which the operator hasn't pushed yet; this commit lands on top. After push the order is: `af5c392` → `38f86bc` → <this>. Operator: please run `make verify` from the repo root before pushing — sandbox can't run staticcheck/go vet/go test.	2026-05-14 19:12:43 +00:00
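The asymmetry between overridden and promoted methods that this hotfix describes can be illustrated with a small stand-alone sketch. The `recorder` type and `demoSelectors` helper below are hypothetical and much simpler than the real `etagRecorder`; they only show why the explicit embedded selector matters for `Write` but not for `Header`.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// recorder embeds http.ResponseWriter and overrides Write only (hypothetical type).
type recorder struct {
	http.ResponseWriter
	body bytes.Buffer
}

// Write buffers the bytes and then forwards them to the wrapped writer.
// The explicit r.ResponseWriter selector is required here: calling r.Write
// would invoke this method again and recurse without end.
func (r *recorder) Write(b []byte) (int, error) {
	r.body.Write(b)
	return r.ResponseWriter.Write(b)
}

// demoSelectors shows that Header is promoted (both spellings reach the same
// map) while Write goes through the override.
func demoSelectors() (sameHeader bool, buffered, wire string) {
	under := httptest.NewRecorder()
	r := &recorder{ResponseWriter: under}

	// recorder does not define Header, so r.Header() is the promoted method.
	r.Header().Set("X-Demo", "1")
	// The long form is equivalent; it is spelled out here only to compare.
	sameHeader = r.ResponseWriter.Header().Get("X-Demo") == "1"

	_, _ = r.Write([]byte("hi"))
	return sameHeader, r.body.String(), under.Body.String()
}

func main() {
	same, buffered, wire := demoSelectors()
	fmt.Println(same, buffered, wire)
}
```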
shankar0123	af5c39252f	fix(middleware): Hotfix #12 — CodeQL #34 go/reflected-xss in etag.go CodeQL alert #34 (severity: HIGH, rule: go/reflected-xss) fired on commit `8191b1e` (Phase 6 SCALE-L2 ETag middleware): internal/api/middleware/etag.go:220 return r.ResponseWriter.Write(b) "Cross-site scripting vulnerability due to user-provided value." Root cause (analysis): The etagRecorder type buffers response bytes from the wrapped handler so the ETag middleware can hash the body before deciding 304-vs-200. On the over-sized-response truncation path (body > 64 KiB), bytes are forwarded directly to the underlying ResponseWriter at line 220. CodeQL's data-flow query traces: *http.Request (source: user input) → handler reads query/path/body → handler echoes data into the JSON response payload (a cert's common_name, an audit row's actor display name, etc.) → json.NewEncoder(w).Encode(...) calls w.Write([]byte) → etagRecorder.Write forwards to r.ResponseWriter.Write(b) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ sink — CodeQL flags reflected-XSS CodeQL can't see that the wrapped handler set Content-Type: application/json via handler.JSON() before any byte was written; it sees a generic byte forwarder writing to an http.ResponseWriter with no proximate Content-Type guarantee. Browsers don't interpret application/json as HTML — so this is technically a false positive — but the data-flow path is real and a future handler that forgets to set Content-Type would convert it into a real vuln (browsers can content-sniff a JSON body as text/html when Content-Type is absent). Fix (defense-in-depth, not just suppression): Add an explicit Content-Type guard at writeHeadersToWire() — the centralized chokepoint that ALL wire-write paths funnel through (line 213 in Write's truncation branch, line 258 in flush's main branch). If Content-Type is unset at this point, default to "application/json; charset=utf-8". This: 1. 
Makes the Content-Type invariant the middleware relies on explicit at the sink, which is the standard pattern CodeQL's go/reflected-xss recognizes as "validated before write". 2. Adds REAL defense-in-depth: a hypothetical future handler wired through ETag that forgot Content-Type can no longer expose a content-sniff vuln. The middleware enforces the safe shape at the boundary. 3. Is behavior-preserving for the 5 current consumers — every wrapped list endpoint (/api/v1/{certificates,agents,jobs, audit,discovered-certificates}) routes JSON responses through handler.JSON() at internal/api/handler/response.go:60, which already sets Content-Type: application/json. Path is no-op for them. Why not a simpler approach: • Removing line 220 (refactor to avoid the data-flow): the truncation path is required behavior — once buffer > 64 KiB the middleware degrades to no-caching pass-through, which requires writing the body bytes to the wire. The data flow is structural. • html.EscapeString(b) before write: would corrupt JSON. Wrong encoder for the content type. • Bare CodeQL suppression comment: closes the alert without actually addressing the latent bug a future handler could create. Defense-in-depth is the operator's stated preference per the CLAUDE.md "always take the complete path" principle. Verification (sandbox constraints disclosed honestly): • Manual syntax inspection — diff is 21-line additive, all inside writeHeadersToWire(). Brace count balanced (27/27), paren count balanced (53/53). No imports changed (http.Header API was already in use). • CI guards: all 48 pass locally. • Existing etag_test.go has 10 contract tests covering: ETag emit on GET, 304-on-If-None-Match, 200-on-mutation, POST bypass, 5xx/4xx pass-through, OversizedResponse degradation, wildcard match, HEAD parity, PassThrough body preservation. 
Behavior analysis (see commit body): every test either (a) has the handler set Content-Type explicitly (no-op for the new guard) or (b) goes through the 304-direct-write path in ETag() which bypasses the recorder entirely. All 10 tests should remain green when `make verify` runs on workstation. • Go toolchain NOT available in sandbox (no `go vet` / `go test` / `golangci-lint` / `staticcheck`). Disk pressure on the shared /sessions partition (166 MB free of 9.8 GB) prevented installing Go for this run. The CLAUDE.md operating rule allows this fallback path provided the verification gap is disclosed and the operator runs `make verify` on workstation BEFORE pushing. Operator: please run `make verify` from the repo root on your workstation before pushing. The change is minimal + additive, but the Go test suite should be the final green-light. Falsifiable proof for the next CodeQL scan: alert #34 should auto-close on the next push to master once the post-fix run sees the Content-Type setter precede every Write to the wire. Ground-truth: origin/master tip `6c00f7b` verified via GitHub API BEFORE commit per the operating rule.	2026-05-14 19:03:50 +00:00
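The Content-Type guard this commit describes amounts to a small default applied just before headers reach the wire. A minimal sketch, assuming a helper named `ensureContentType` (hypothetical; the commit places the real logic inside `writeHeadersToWire()`):

```go
package main

import (
	"fmt"
	"net/http"
)

// ensureContentType defaults Content-Type to JSON when the handler left it
// unset, and leaves any explicit value untouched.
func ensureContentType(h http.Header) {
	if h.Get("Content-Type") == "" {
		h.Set("Content-Type", "application/json; charset=utf-8")
	}
}

// contentTypeAfterGuard applies the guard to a header set that optionally
// carries a preset Content-Type and returns the resulting value.
func contentTypeAfterGuard(preset string) string {
	h := http.Header{}
	if preset != "" {
		h.Set("Content-Type", preset)
	}
	ensureContentType(h)
	return h.Get("Content-Type")
}

func main() {
	fmt.Println(contentTypeAfterGuard(""))
	fmt.Println(contentTypeAfterGuard("text/plain"))
}
```

Because the guard only fills in a missing value, handlers that already set `Content-Type: application/json` see no change, which matches the commit's claim that the path is a no-op for the current consumers.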
shankar0123	0ad881c2bd	fix(lint): U1000 — delete dead etagRecorder.sentinelMarker method CI run on master@ed60059e (Phase 6 + lint hotfix) still red. The golangci-lint step now passes cleanly (0 issues — yesterday's ST1021 fix landed), but the workflow also has a SEPARATE `staticcheck ./...` step at the end that runs raw staticcheck without golangci-lint's directive-resolution layer: internal/api/middleware/etag.go:254:24: func (etagRecorder).sentinelMarker is unused (U1000) Root cause: Phase 6's etag.go shipped a dead no-op method `func (r etagRecorder) sentinelMarker() {}` with a `//nolint:unused` directive. golangci-lint's `unused` linter respects the directive; raw staticcheck's U1000 does NOT — `//nolint:` is a golangci-lint convention, not a staticcheck convention (staticcheck uses `//lint:ignore U1000 reason` syntax). The comment claimed the method "anchors" documentation about the `headerWrittenOnWire` field. Reading the actual code: the field is used directly in `writeHeadersToWire` (line 241); the method is pure dead code with a misleading comment. Deleting it loses nothing — the sentinel field stays where it's needed. Pattern lesson logged in the Tasks-Deferred table: golangci-lint's `//nolint:LINTER` directive is a golangci-lint invention. Raw staticcheck (or any underlying linter run outside golangci-lint) ignores it. The certctl workflow runs BOTH golangci-lint AND a standalone `staticcheck ./...` step, so any future `//nolint:unused` / `//nolint:staticcheck` use needs to be paired with `//lint:ignore U1000` (or equivalent) for staticcheck to honor it — OR the code should be deleted / exported / actually used. Verification: staticcheck ./... → exit 0, no output (mirrors CI's invocation) go vet ./internal/api/middleware/... → clean go test ./internal/api/middleware/... -count=1 -short → ok (0.25s) gofmt -l → clean Closes: CI run on master@ed60059e U1000 lint failure	2026-05-14 03:11:57 +00:00
shankar0123	8191b1ee64	scheduler+db: close Phase 6 — scale hardening across pool, jitter, ETag, asyncpoll Phase 6 of the certctl architecture diligence remediation. Five findings across the same scheduler-and-DB-pool surface. SCALE-M1 (Med) — DB pool default bumped 25 → 50 internal/config/config.go line 1972: MaxConnections: getEnvInt("CERTCTL_DATABASE_MAX_CONNS", 50) Postgres default max_connections is 100; 50 leaves headroom for pg_dump + ad-hoc psql + a server replica without exhausting the DB-side cap. Operator override env var unchanged. Operator-tune ladder for larger fleets (5K / 50K certs) lives in docs/operator/scale.md as starter values pending Phase 8 load tests — explicitly marked TBD. SCALE-M3 (Med) — async-CA poll budget operator-configurable Live state was partially-already-shipped: all 4 async-CA connectors (digicert, entrust, globalsign, sectigo) already have per-connector CERTCTL_<NAME>_POLL_MAX_WAIT_SECONDS (Audit fix #5 closed pre-Phase-6). What was missing: a global package-default override. Shipped: - internal/connector/issuer/asyncpoll/asyncpoll.go gains SetDefaultMaxWait(d) + effectiveDefaultMaxWait var + the currentDefaultMaxWait() priority resolver. - cmd/server/main.go reads CERTCTL_ASYNC_POLL_MAX_WAIT_SECONDS at boot and calls SetDefaultMaxWait. - deploy/ENVIRONMENTS.md documents the new env var (G-3 guard green). Naming deviation from the prompt's CERTCTL_ASYNC_POLL_MAX_ATTEMPTS: the live code tracks wall-clock time (MaxWait), not attempt count. Matched the existing per-connector nomenclature (_POLL_MAX_WAIT_SECONDS) so the priority chain reads naturally. SCALE-M5 (Med) — JitteredTicker wrapper for all 15 scheduler loops internal/scheduler/jitter.go ships NewJitteredTicker(interval, jitterPct) + DefaultSchedulerJitter (±10%). All 15 sites in internal/scheduler/scheduler.go migrated from bare time.NewTicker to NewJitteredTicker(interval, DefaultSchedulerJitter). 
Base intervals unchanged; only the per-tick envelope adds ±10% randomized delay so multiple loops with the same nominal cadence don't co-fire and spike CPU + DB at wall-clock boundaries. internal/scheduler/jitter_test.go pins: - Bounded envelope (each tick within ±jitterPct of interval) - Mean drift < 30% of nominal (sign-bug detector) - Stop() releases the goroutine + closes C - Stop() idempotent (no panic on repeat) - Zero-jitter behaves like time.NewTicker - Negative and >=1 jitterPct values clamped defensively CI guard scripts/ci-guards/no-bare-newticker-in-scheduler.sh blocks any future bare time.NewTicker in scheduler.go. SCALE-L1 (Low) — renewal-sweep semaphore behavior documented docs/operator/scale.md "Scheduler tick budgets" section explains the per-tick concurrency semaphore (CERTCTL_RENEWAL_CONCURRENCY=25 default), the ctx-cancellation drain on tick-budget overrun, and operator tuning advice (raise concurrency + DB pool together). No code change — the behavior is defensible as-is per the audit. SCALE-L2 (Low) — ETag middleware for top-5 read endpoints internal/api/middleware/etag.go computes SHA-256 ETag over the buffered response body, respects If-None-Match, short-circuits to 304 Not Modified on match. GET/HEAD only; non-2xx responses pass through unchanged. 64 KiB buffer cap degrades gracefully on oversized responses (no caching, body still flushes intact). Wired around the top-5 read endpoints via etagged() helper in internal/api/router/router.go: GET /api/v1/certificates GET /api/v1/agents GET /api/v1/jobs GET /api/v1/audit GET /api/v1/discovered-certificates internal/api/middleware/etag_test.go pins 11 behaviors including 304-on-repeat, 200-after-mutation-with-new-ETag, POST bypass, 4xx/5xx pass-through, oversized-response degradation, wildcard match, HEAD-treated-like-GET, byte-equal pass-through. Cross-cutting fixes: - internal/config/config_test.go::TestLoad_DefaultValues updated to assert the new 50 default (was 25). 
- deploy/helm/certctl/values.yaml comment corrected — agent pollInterval is hardcoded 30s, not env-configurable; the Phase 4 comment mistakenly referenced CERTCTL_AGENT_POLL_INTERVAL which G-3 caught as a phantom env var. - asyncpoll.go reformatted by gofmt; functionally unchanged. Verification (all pass): grep -nE 'SetMaxOpenConns' internal/repository/postgres/db.go # finds 1 site grep -nE 'CERTCTL_DATABASE_MAX_CONNS.*50' internal/config/config.go # config default is 50 grep -rnE 'CERTCTL_ASYNC_POLL_MAX_WAIT_SECONDS' internal/ deploy/ENVIRONMENTS.md # wired grep -cE 'time\.NewTicker\(' internal/scheduler/scheduler.go # 0 (all migrated) grep -cE 'JitteredTicker' internal/scheduler/scheduler.go # 15 ls internal/scheduler/jitter.go internal/api/middleware/etag.go # both exist ls docs/operator/scale.md # exists bash scripts/ci-guards/no-bare-newticker-in-scheduler.sh # clean bash scripts/ci-guards/G-3-env-docs-drift.sh # clean go test ./internal/scheduler/ ./internal/api/middleware/ \ ./internal/connector/issuer/asyncpoll/ ./internal/config/ # 4/4 packages green Closes: cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M1 cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M3 cowork/certctl-architecture-diligence-audit.html#fix-SCALE-M5 cowork/certctl-architecture-diligence-audit.html#fix-SCALE-L1 cowork/certctl-architecture-diligence-audit.html#fix-SCALE-L2	2026-05-14 01:23:03 +00:00
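The per-tick jitter envelope from SCALE-M5 can be sketched as a pure interval calculation. This is an illustration under assumptions, not the code in `internal/scheduler/jitter.go`: `clampJitter`, `jitteredInterval`, and the helper checks are hypothetical names, and the upper clamp value of 0.99 is a guess, since the commit only says out-of-range values are "clamped defensively".

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// clampJitter forces pct into [0, 1). The 0.99 ceiling is an assumption.
func clampJitter(pct float64) float64 {
	if pct < 0 {
		return 0
	}
	if pct >= 1 {
		return 0.99
	}
	return pct
}

// jitteredInterval returns base scaled by a factor drawn uniformly from
// [1-pct, 1+pct). With pct == 0 it returns base unchanged.
func jitteredInterval(base time.Duration, pct float64, rnd *rand.Rand) time.Duration {
	pct = clampJitter(pct)
	if pct == 0 {
		return base
	}
	factor := 1 + pct*(2*rnd.Float64()-1)
	return time.Duration(float64(base) * factor)
}

// allWithinEnvelope draws n intervals at ±10% and reports whether every one
// stays inside the envelope (with a microsecond of float slack).
func allWithinEnvelope(n int) bool {
	rnd := rand.New(rand.NewSource(1))
	base, pct := time.Second, 0.10
	lo := time.Duration(float64(base)*(1-pct)) - time.Microsecond
	hi := time.Duration(float64(base)*(1+pct)) + time.Microsecond
	for i := 0; i < n; i++ {
		d := jitteredInterval(base, pct, rnd)
		if d < lo || d > hi {
			return false
		}
	}
	return true
}

// zeroJitterIsExact checks that zero jitter degenerates to the base interval.
func zeroJitterIsExact() bool {
	rnd := rand.New(rand.NewSource(1))
	return jitteredInterval(time.Second, 0, rnd) == time.Second
}

func main() {
	fmt.Println(allWithinEnvelope(1000), zeroJitterIsExact())
}
```

A ticker built on this would presumably reset a `time.Timer` to a fresh `jitteredInterval` after each tick, so loops sharing a nominal cadence drift apart instead of firing together.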
shankar0123	21aeed4f4e	legal: addlicense headers + normalize legacy variants (Phase 0 RED-4) Phase 0 closure (Path B2, post-rewrite): addlicense sweep that adds the canonical certctl LLC copyright + BUSL-1.1 SPDX header to every production Go file. Template: // Copyright 2026 certctl LLC. All rights reserved. // SPDX-License-Identifier: BUSL-1.1 Coverage: 338 / 338 production Go files (cmd/ + internal/, excluding _test.go and /testdata/). Pre-sweep coverage was 22 / 338 (6.5%); post-sweep is 338 / 338 (100%). Normalized 22 pre-existing legacy headers (`// Copyright (c) certctl` + `// SPDX-License-Identifier: BSL-1.1`) and 1 file using a `Certctl Contributors` attribution. The legacy SPDX ID `BSL-1.1` is non-standard; the official SPDX identifier for Business Source License 1.1 is `BUSL-1.1` (capital U). All 338 files now share the canonical form. Generated via: addlicense -c "certctl LLC" -y 2026 \ -f cowork/legal/copyright-header.tpl \ -ignore '/testdata/' -ignore '/_test.go' \ cmd/ internal/ Verification: find cmd internal -name '*.go' -not -name '*_test.go' \ -not -path '*/testdata/*' \ -exec grep -L '^// Copyright 2026 certctl LLC' {} \; | wc -l Returns: 0 gofmt clean. Header additions are comments only, no compile impact. Closes: cowork/certctl-architecture-diligence-audit.html#fix-RED-4	2026-05-13 21:23:35 +00:00
shankar0123	630831aeac	harden(audit+session): full SHA-256 audit hash + cookie segment length cap (MED-15 + Nit-4) Audit 2026-05-10 Fix 13 Phase F + Fix 14 Phase F partial — close MED-15 + Nit-4. Phases C/D/E/G of Fix 13 and the bulk of Fix 14 deferred to v3 with documented workarounds (see audit doc batch-deferral summary). MED-15: internal/api/middleware/audit.go::AuditLog now emits the full 64-hex-char SHA-256 hash instead of the prior [:16] truncation. The audit_events.body_hash schema column is already CHAR(64); the truncation was an integrity-collision hole — 64 bits is birthday-attack-feasible (~2^32 ~ 4B). Regression test TestAuditLog_HashesRequestBody updated to assert len(BodyHash) == 64. Nit-4: internal/auth/session/service.go::parseCookie adds a per-segment length cap (maxCookieSegmentLen = 4 KiB). Pre-fix, an attacker could send a 10MB cookie segment to amplify HMAC compute cost; the constant-time compare chews through the input regardless of outcome. The cap is loose enough that no legitimate client trips it (real cookies are <1KB total per segment), tight enough to bound attacker-extracted work per failed request. Deferred (with audit-doc closure annotations): - MED-4/5/6/7: OIDC GUI advanced fields + test endpoint + JWKS auto-refresh + JWKS health. v3 OIDC-operator-experience bundle. Workarounds documented. - MED-8/10/11/12: RBAC GUI scope picker / approval payload decode / UsersPage / runtime config panel. v3 GUI-polish bundle. Backend already accepts the scope_type/scope_id fields; the gap is GUI. - MED-13: MCP tools for approvals / break-glass / bootstrap. v3 MCP-expansion bundle. - MED-14: __Host- cookie rename. Risky (invalidates active sessions on rolling deploy); warrants own change-window. - MED-16/17: Pre-login UA/IP binding + RFC 9207 iss URL check. v3 OIDC-hardening bundle. - All 12 LOWs + 4 of 5 Nits: v3 cleanup bundle. Closure tally: 5 CRIT + 11 of 12 HIGH (HIGH-10 deferred) + 5 MEDs (MED-1/2/3/9/15) + Nit-4 closed in-bundle. 
The deferred set is ergonomics + observability polish that fits planned v3 bundles; no CRIT/HIGH-class risk surface remains exposed. Refs: cowork/auth-bundles-audit-2026-05-10.md MED-15, Nit-4 Spec: cowork/auth-bundles-fixes-2026-05-10/13-med-bundle.md Phase F cowork/auth-bundles-fixes-2026-05-10/14-low-nit-cleanup.md Phase F	2026-05-10 22:02:26 +00:00
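Both hardening changes in this commit are small enough to sketch. The helper names `bodyHash` and `segmentsWithinCap` are hypothetical, and the `.` segment separator is an assumption, since the commit does not describe the cookie layout; only the 64-hex-character hash and the 4 KiB per-segment cap come from the commit text.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxCookieSegmentLen mirrors the 4 KiB cap named in the commit.
const maxCookieSegmentLen = 4 * 1024

// bodyHash returns the full SHA-256 digest as 64 hex characters,
// with no [:16] truncation.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// segmentsWithinCap rejects a cookie value when any segment exceeds the cap,
// so oversized input is refused before any HMAC work is done.
// Splitting on "." is an assumption about the cookie format.
func segmentsWithinCap(cookie string) bool {
	for _, seg := range strings.Split(cookie, ".") {
		if len(seg) > maxCookieSegmentLen {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(len(bodyHash([]byte(`{"name":"example"}`))))
	fmt.Println(segmentsWithinCap("payload.signature"))
}
```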
shankar0123	00eace8068	fix(api/cors): narrow Bundle-2 routes from wildcard to NewCORS(corsCfg) Closes CRIT-3 of the 2026-05-10 audit. Bundle 2's OIDC handshake + back-channel-logout + logout + bootstrap + breakglass-login routes were wrapped by middleware.CORS — a hard-coded Access-Control-Allow-Origin: * middleware that ignored the operator's CERTCTL_CORS_ORIGINS knob (CWE-942). The properly-configured middleware.NewCORS(corsCfg) exists right next to it but wasn't used here. The deprecation comment on middleware.CORS said "Kept for health endpoints" but Bundle 2 added four additional call sites without converting them. This commit: - Renames middleware.CORS -> middleware.CORSWildcard with a stronger doc block making the security tradeoff explicit at every remaining call site. The doc references the CI guard + the 2026-05-10 audit closure. - Adds a CorsCfg middleware.CORSConfig field to router.HandlerRegistry and threads it from cmd/server/main.go using the existing cfg.CORS.AllowedOrigins value. The same config that drives the global corsMiddleware now also drives the per-route NewCORS wraps for the auth-exempt direct r.mux.Handle blocks. - Swaps middleware.CORS -> middleware.NewCORS(reg.CorsCfg) for the 7 credentialed auth-exempt routes: - GET /auth/oidc/login - GET /auth/oidc/callback - POST /auth/oidc/back-channel-logout - POST /auth/logout - POST /auth/breakglass/login - GET /api/v1/auth/bootstrap - POST /api/v1/auth/bootstrap - Keeps middleware.CORSWildcard for the 4 credential-free probe routes: - GET /health - GET /ready - GET /api/v1/version - GET /api/v1/auth/info - Adds scripts/ci-guards/cors-wildcard-allowlist.sh — pins the 4-route allowlist; fails CI when a new middleware.CORSWildcard wrap appears outside the allowlist. Adding a new wildcard call site requires updating the allowlist AND documenting why in the commit body. 
Operators who configured CERTCTL_CORS_ORIGINS=https://admin.example.com expecting the OIDC + BCL + breakglass-login routes to honor it now do. Previously those routes ignored the knob and emitted ACAO: * regardless. Verification gate green: - gofmt -l . clean - go vet ./... clean - go test -short -count=1 ./internal/api/... ./internal/auth/... ./internal/domain/auth/ ./internal/service/auth/ ./cmd/server/ pass - go build ./... clean - scripts/ci-guards/cors-wildcard-allowlist.sh passes (4 allowlisted routes; zero violations) CRIT-1 + CRIT-2 from the same audit are already closed on this branch (commits `68ca42f`, `ca1e135`); CRIT-4 / CRIT-5 remain open and continue to block the v2.1.0 tag. Spec: cowork/auth-bundles-fixes-2026-05-10/03-crit-3-cors-narrow.md. Refs: cowork/auth-bundles-audit-2026-05-10.md CRIT-3	2026-05-10 20:12:19 +00:00
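The difference between the wildcard middleware and a config-driven one can be sketched as below. This is a simplified illustration, not certctl's `middleware.NewCORS`: `corsConfig`, `newCORS`, and `allowOriginFor` are hypothetical, and preflight handling and credential headers are left out.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// corsConfig stands in for the operator-supplied allowlist (hypothetical type).
type corsConfig struct {
	AllowedOrigins []string
}

// newCORS echoes the request Origin back only when it is on the allowlist,
// instead of emitting "Access-Control-Allow-Origin: *" unconditionally.
func newCORS(cfg corsConfig) func(http.Handler) http.Handler {
	allowed := make(map[string]bool, len(cfg.AllowedOrigins))
	for _, o := range cfg.AllowedOrigins {
		allowed[o] = true
	}
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			origin := r.Header.Get("Origin")
			if origin != "" && allowed[origin] {
				w.Header().Set("Access-Control-Allow-Origin", origin)
				w.Header().Add("Vary", "Origin")
			}
			next.ServeHTTP(w, r)
		})
	}
}

// allowOriginFor sends one request with the given Origin through the
// middleware and returns the Access-Control-Allow-Origin response header.
func allowOriginFor(origin string) string {
	cfg := corsConfig{AllowedOrigins: []string{"https://admin.example.com"}}
	h := newCORS(cfg)(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	req := httptest.NewRequest(http.MethodPost, "/auth/logout", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Result().Header.Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Printf("%q\n", allowOriginFor("https://admin.example.com"))
	fmt.Printf("%q\n", allowOriginFor("https://evil.example"))
}
```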
shankar0123	99a012e3be	auth-bundle-1 Phase 0: extract internal/auth/ from middleware package Bundle 1 / Phase 0: pure refactor splitting auth surface out of internal/api/middleware so Bundle 2 (OIDC + sessions) and the broader RBAC primitive (roles, permissions, scoped grants) have a clean home. Moved to internal/auth/: NamedAPIKey, HashAPIKey, AuthConfig, NewAuthWithNamedKeys, NewAuth, UserKey, AdminKey, GetUser, IsAdmin. Added testfixtures.go (WithActor / WithAdmin / WithActorAdmin) so handler tests don't construct context manually. Stayed in internal/api/middleware/: RequestID, Logging, NewLogging, Recovery, RateLimitConfig, NewRateLimiter (now imports auth.GetUser for per-user keying per audit Category C), CORSConfig, NewCORS, ContentType, CORS, GetRequestID, responseWriter, Chain, audit middleware (now imports auth.GetUser). Updated 22 caller files across cmd/, internal/api/handler/, internal/api/middleware/, internal/mcp/. Existing m008_admin_gate_test.go now scans for auth.IsAdmin( substring; Phase 3 will further evolve to track auth.RequirePermission. Behavior unchanged: all handler / middleware / service / connector / cmd / mcp tests pass with no test-logic edits, only import-path renames. Phase 0 exit criteria: internal/auth/ exists with 6 files; middleware.go went 575 -> 422 lines (auth-related ~150 lines moved out); grep -rE 'middleware\.(GetUser|IsAdmin|UserKey|AdminKey|NamedAPIKey|HashAPIKey|NewAuth)' returns 0 hits; context.WithValue(.*middleware.UserKey/AdminKey) returns 0 hits; go vet ./... clean; go test -short ./... green across all packages tested. Branch: dev/auth-bundle-1. Per cowork/auth-bundle-1-prompt.md, do not merge to master without (1) make verify green, (2) >= 2 external testers confirm, (3) >= 90% coverage on internal/auth/ in .github/coverage-thresholds.yml.	2026-05-09 15:51:31 +00:00
shankar0123	0f81c1b956	ci: re-fix CodeQL #32 + repair loadtest f5-mock build context Two unrelated CI failures from run #25305811340; fixed in one commit since neither needs the other to land first. CodeQL alert #32 (go/log-injection at middleware.go:68) reopened after `b0fc067`. The previous fix introduced a scrubLogValue helper backed by strings.NewReplacer; CodeQL's taint tracker only recognizes the literal strings.ReplaceAll pattern as a sanitizer (matches the OWASP example in the rule docs). Wrapper helpers and NewReplacer don't trigger the recognition, so the analyzer kept flagging. Fix: drop the helper. Inline strings.ReplaceAll chains directly at the call site for r.Method and r.URL.Path. Same runtime semantics (strip CR/LF/NUL); CodeQL pattern-matches the literal call so the alert can finally close. Loadtest CI failure (run #25305811340 'k6 throughput run' job at make loadtest): ERROR: failed to compute cache key: failed to calculate checksum of ref ...: "/deploy/test/f5-mock-icontrol": not found The f5-mock-icontrol Dockerfile has `COPY deploy/test/f5-mock-icontrol/ ./` which assumes the build context is the repo root. The docker-compose.test.yml f5-mock-icontrol service correctly uses the long-form build: build: context: .. # = repo root from deploy/docker-compose.test.yml dockerfile: deploy/test/f5-mock-icontrol/Dockerfile The loadtest compose at deploy/test/loadtest/docker-compose.yml used the shorthand: build: ../f5-mock-icontrol That sets context = the f5-mock-icontrol directory itself, breaking the Dockerfile's COPY (it tries to find the directory inside itself). Fix: change the loadtest compose to the long-form pattern matching docker-compose.test.yml, with context: ../../.. (= repo root from deploy/test/loadtest/) and explicit dockerfile path. Verified locally: gofmt: clean. go vet ./internal/api/middleware/...: exit 0. go test -short -count=1 ./internal/api/middleware/...: ok 0.253s. 
python3 -c 'import yaml; yaml.safe_load(...)' on the compose file: parses clean. grep -rnE 'scrubLogValue' internal/api/: zero references (helper fully dropped). References: https://github.com/certctl-io/certctl/security/code-scanning/32 CI run https://github.com/certctl-io/certctl/actions/runs/25305811340 Closes CodeQL #32 + restores loadtest CI.	2026-05-04 17:26:24 +00:00
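The inlined sanitizer described in commit 0f81c1b956 could look roughly like the sketch below. It is not the repository's code: `formatAccessLine` is a hypothetical stand-in for the legacy Logging call site, and replacing CR/LF/NUL with a space follows the wording of the earlier b0fc067317 message.

```go
package main

import (
	"fmt"
	"strings"
)

// formatAccessLine is a hypothetical stand-in for the legacy Logging call site.
// CR, LF and NUL in the attacker-controllable fields are replaced with spaces
// through literal strings.ReplaceAll chains rather than a wrapper helper,
// which is the shape the commit message says the analyzer recognizes.
func formatAccessLine(requestID, method, path string, status int) string {
	safeMethod := strings.ReplaceAll(strings.ReplaceAll(strings.ReplaceAll(method, "\r", " "), "\n", " "), "\x00", " ")
	safePath := strings.ReplaceAll(strings.ReplaceAll(strings.ReplaceAll(path, "\r", " "), "\n", " "), "\x00", " ")
	return fmt.Sprintf("[%s] %s %s %d", requestID, safeMethod, safePath, status)
}

func main() {
	// A decoded path carrying a forged second log entry stays on one line.
	fmt.Println(formatAccessLine("req-1", "GET", "/certs\n[req-2] GET /admin 200", 404))
}
```

The chains are deliberately repeated per field instead of factored into a helper, since the commit reports that wrapper helpers defeated the analyzer's sanitizer recognition.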
shankar0123	b0fc067317	security: close CodeQL #17 (log injection) + #23 (SSRF false-positive reopen) Two CodeQL alerts in one sweep — both medium-impact follow-ups on already-merged guards. Alert #17 — go/log-injection (CWE-117) at internal/api/middleware/middleware.go:58: log.Printf("[%s] %s %s %d %v", requestID, r.Method, r.URL.Path, ...) r.Method and r.URL.Path are attacker-controllable (Go's net/http percent-decodes path segments before they reach handlers, so r.URL.Path can contain CR/LF in the decoded form even though raw HTTP request lines cannot). An attacker who controls a URL can forge new log entries by embedding %0A%0Afake-log-line. Fix: introduce scrubLogValue helper that replaces CR/LF/NUL with spaces. Apply to both r.Method and r.URL.Path. Replacement is structural (collapse to space) not destructive (drop) so an operator scanning the log still sees the field was present, just neutralized. Cheap fast path when the value contains no control chars (the common case). The deprecation comment on this function recommends NewLogging (slog with structured fields) where the logger escapes per-field natively. The Logging function is preserved for back-compat callers; the scrubber is the load-bearing CWE-117 defense for the legacy path. Alert #23 — go/request-forgery (CWE-918) at scep_probe.go:271: CodeQL reopened the alert after commit `e6919cd`. The commit's in-function validator dispatch went through a function-pointer override hook: validateURL := s.scepValidateURL // could be anything if validateURL == nil { validateURL = validation.ValidateSafeURL } if err := validateURL(rawURL); err != nil { ... } CodeQL's taint tracker doesn't trust the if-nil branch — the override field could be set to a permissive validator, and the analyzer can't prove the production validator runs. Fix: invert the dispatch. 
Always call validation.ValidateSafeURL literally first; only consult the test-override hook to grant an EXEMPTION when the production validator rejects: if err := validation.ValidateSafeURL(rawURL); err != nil { if s.scepValidateURL == nil || s.scepValidateURL(rawURL) != nil { return ... validate url error } } Same applies to ProbeSCEP's entry-point validator. Both call sites now have the literal validation.ValidateSafeURL call in-scope of the sink (client.Do), which CodeQL recognizes as a sanitizer. Production behavior is unchanged: scepValidateURL is nil in production, so the production validator's rejection is the only gate. Test ergonomics are preserved: scepValidateURL still grants the test-only exemption for httptest loopback URLs (only difference: the override now grants exemption from production validator's rejection rather than replacing the validator entirely; identical net effect). Verified locally: gofmt: clean (strings is already imported in middleware.go). go vet ./internal/api/middleware/... + ./internal/service/...: exit 0. go test -short ./internal/api/middleware/...: ok 0.244s. go test -short ./internal/service/...: ok 4.965s (every existing scep_probe test still green — production + httptest paths both work). References: https://github.com/certctl-io/certctl/security/code-scanning/17 https://github.com/certctl-io/certctl/security/code-scanning/23 Closes CodeQL #17. Re-closes CodeQL #23 with a fix CodeQL's taint tracker can verify.	2026-05-04 05:29:35 +00:00
shankar0123	7cb453a336	chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4 Mechanical reformat. The new 'gofmt drift' CI step (added in ci-pipeline-cleanup Phase 4, commit `0f205a8`) surfaced 111 files with accumulated gofmt drift across cmd/, internal/, and deploy/test/. Each file's diff is gofmt-standard: whitespace adjustments, intra- group import sorting (alphabetical by import path within blank-line- separated groups), and struct-tag column alignment. No semantic changes — verified via 'git diff --ignore-all-space' which shows only the line-position deltas from import reordering. The gate stays in place after this commit. Going forward it catches gofmt drift at PR time.	2026-04-30 22:33:57 +00:00
shankar0123	6b5af27546	Bundle G: Final audit closure — L-004 + D-003/4/5/7 closed; 54/55 + 7/7 Closes the 2026-04-25 audit's final-closure cluster. Score 51/55 -> 54/55 (98% closed); deferred 4/7 -> 7/7 (100%). All severity-graded findings now closed except M-029 (frontend per-PR migration backlog, by design incremental). L-004 (CWE-924) — dual-key API rotation overlap window: internal/config/config.go::ParseNamedAPIKeys rewritten to allow same-name duplicate entries iff admin flag matches. Mismatched-admin entries rejected at startup (privilege escalation guard); exact (name,key) duplicates rejected (typo guard — rotation requires DIFFERENT keys under the same name). Startup INFO log per name with multiple entries surfaces the active rotation window. NewAuthWithNamedKeys was already shaped correctly (constant-time hash compare across all entries, same UserKey + AdminKey for either bearer); Bundle B's M-025 per-user rate-limit bucket and audit-trail actor inherit consistency across the rollover automatically. 8 new tests pin the contract end-to-end. docs/security.md::API key rotation walks the 6-step zero-downtime rollover. D-003 — Mutation testing wired: security-deep-scan.yml gets a go-mutesting step covering ./internal/crypto/..., ./internal/pkcs7/..., ./internal/connector/issuer/local/... with per-package summary lines extracted into go-mutesting.txt artefact. D-007 — Frontend semgrep wired (recon found Bundle 7's wiring claim was false): security-deep-scan.yml gets a 'semgrep p/react-security' step running returntocorp/semgrep:latest --config=p/react-security against /src/web/src; results uploaded as semgrep-react.json. D-004 + D-005 — Operator runbook published: docs/testing-strategy.md (NEW) consolidates per-tool local-run procedures, acceptance thresholds, and triage paths for go-mutesting, ZAP baseline DAST, testssl.sh, and semgrep p/react-security. 
Closes the 'wired CI-only, no local-run validation' framing for D-004/D-005 by giving operators the same commands the CI workflow runs. Verification: gofmt -l no diff go vet ./internal/config/... ./internal/api/middleware/... clean go test -short -count=1 ./internal/config/... ./internal/api/middleware/... PASS python3 -c 'yaml.safe_load(...)' YAML OK G-3 env-var docs guard no phantom env-vars Audit deliverables: audit-report.md: L-004 + D-003/4/5/7 boxes flipped [x]; score 51/55 -> 54/55 findings.yaml: 5 status flips; new bundle-G-final-closure closure_log entry CHANGELOG.md: Bundle G entry under [unreleased]; supersedes Bundle E + F L-004-deferred framing	2026-04-27 02:27:44 +00:00
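The L-004 duplicate rules from commit 6b5af27546 can be illustrated with a small sketch. The `namedKey` struct and `checkRotationEntries` function are hypothetical; the actual ParseNamedAPIKeys input format and error text are not shown in the log and are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// namedKey is a hypothetical parsed entry: a key name, the secret, and the admin flag.
type namedKey struct {
	Name  string
	Key   string
	Admin bool
}

// checkRotationEntries applies the two rules stated in the commit message:
// same-name entries must agree on the admin flag, and an exact (name, key)
// duplicate is rejected because rotation needs a different key per entry.
func checkRotationEntries(keys []namedKey) error {
	adminByName := make(map[string]bool)
	seenPair := make(map[[2]string]bool)
	for _, k := range keys {
		if prev, ok := adminByName[k.Name]; ok && prev != k.Admin {
			return fmt.Errorf("key name %q: admin flag differs between entries", k.Name)
		}
		adminByName[k.Name] = k.Admin
		pair := [2]string{k.Name, k.Key}
		if seenPair[pair] {
			return errors.New("exact (name, key) duplicate: rotation needs a different key under the same name")
		}
		seenPair[pair] = true
	}
	return nil
}

func main() {
	overlap := []namedKey{
		{Name: "deploy-bot", Key: "old-key"},
		{Name: "deploy-bot", Key: "new-key"},
	}
	fmt.Println("rotation overlap accepted:", checkRotationEntries(overlap) == nil)
}
```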
shankar0123	1b4de3fb2d	Bundle E: Mechanical sweeps & defensive polish — 6 findings closed; L-004 deferred Closes L-009 + L-010 + L-011 + L-013 + L-020 + L-021 from comprehensive-audit-2026-04-25. L-004 deferred — recon found NO rotation infrastructure exists at all; building it from scratch is a feature project, not a Bundle-E mechanical sweep. L-009 — ZeroSSL EAB URL configurable Audit's 'no timeout' claim was wrong: ari.go:329 has 15s timeout. internal/connector/issuer/acme/acme.go: zeroSSLEABEndpoint now lazily reads CERTCTL_ZEROSSL_EAB_URL from env at package init; defaults to ZeroSSL public endpoint. Pre-existing test override path preserved. L-010 — Verified-already-clean grep -rn 'mock\.Anything' --include='*_test.go' . returned 0. certctl uses hand-rolled struct mocks (mockJobRepo, mockAuditRepo, etc.) with explicit method bodies; no testify-style mocks anywhere. L-011 — IPv6 bracket-aware dialing pinned Every production net.Dial / DialTimeout site audited: cmd/agent/main.go:293 — intentional IPv4 literal '8.8.8.8:80' verify.go / tlsprobe / network_scan — net.Dialer (no string addr) email.go — net.JoinHostPort (bracket-aware) ssh.go — addr derives from JoinHostPort upstream ssrf.go — net.Dialer internal/connector/notifier/email/email_ipv6_test.go (NEW): TestJoinHostPort_IPv6BracketsRoundTrip pins IPv4/IPv6/zone variants; TestSMTPDialerUsesJoinHostPort source-greps email.go and fails CI if a future refactor swaps in 'host:port' concatenation. L-013 — Verified-already-clean (monotonic-safe) Only one site uses now.Sub: middleware.go:393 in tokenBucket.allow(). Both 'now' and tb.lastRefill come from time.Now() which carries monotonic-clock readings per Go's time package contract; intra-process now.Sub is monotonic-safe by construction. Doc comment block added above the call to make the invariant explicit. L-020 (CWE-563) — ineffassign sweep, 8 unique sites certificate.go:135 — sortDir initial value dropped (set unconditionally below by SortDesc branch). 
certificate.go:169,175 — argCount post-increments dropped (var not read past the LIMIT/OFFSET formatting). agent_group.go, profile.go — page/perPage truly vestigial, replaced with _ = page; _ = perPage. issuer.go:633, owner.go:131, target.go:267, team.go:131 — same treatment for the audit-flagged second-function ListXxx clamps. First-function List() in issuer/owner/target/team KEEPS its clamp because page/perPage is used for in-memory slice pagination — ineffassign correctly didn't flag those. Build + tests green post-sweep. L-021 — Transitive CVE bump go get golang.org/x/crypto@v0.45.0 golang.org/x/net@v0.47.0 (crypto required net@0.47.0). go-text@v0.31.0 transitively bumped. Per tool-output govulncheck-verbose: x/net@v0.45.0 fixes GO-2026-4441 + GO-2026-4440; x/crypto@v0.45.0 fixes GO-2025-4134 + GO-2025-4135 + GO-2025-4116 — all 5 advisories cleared. Bundle B's ISV grep guard + Bundle D's release-time govulncheck step are the going-forward monitor + bump pass. L-004 — Deferred to dedicated bundle Recon: zero hits for RotateAPIKey / rotated_at / key_status anywhere in source. API keys configured via CERTCTL_API_KEYS_NAMED env var; rotation is operator-managed (edit env + restart). Building rotation infrastructure from scratch is a feature project, not a mechanical sweep. Documented in audit-report.md with scope-pivot note. Audit deliverables: audit-report.md: score 46/55 -> 52/55 closed (Low 14/19 -> 19/19 — 100% Low closed except L-004 deferred) findings.yaml: 6 status flips certctl/CHANGELOG.md: Bundle E section Verification: go test -count=1 -short ./internal/service ./internal/connector/issuer/acme ./internal/connector/notifier/email green go vet on changed packages clean	2026-04-27 01:17:15 +00:00
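For the L-011 item in commit 1b4de3fb2d, the behavior being pinned is that net.JoinHostPort brackets IPv6 literals where plain 'host:port' concatenation would not. A minimal sketch; `dialAddr` is a hypothetical helper, not a function from the repository:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// dialAddr is a hypothetical helper that builds a dial address the
// bracket-aware way instead of concatenating host + ":" + port.
func dialAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	// IPv4, IPv6 and zoned IPv6 literals, mirroring the variants the test pins.
	for _, host := range []string{"192.0.2.10", "2001:db8::1", "fe80::1%eth0"} {
		addr := dialAddr(host, 587)
		gotHost, gotPort, err := net.SplitHostPort(addr)
		fmt.Println(addr, gotHost, gotPort, err)
	}
}
```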
shankar0123	30f9f1e712	Bundle B: Auth & transport surface tightening — 5 findings closed Closes M-001 + M-002 + M-013 + M-018 + M-025 from comprehensive-audit-2026-04-25. M-001 (CWE-916) — PBKDF2 100k -> 600k via v3 blob format internal/crypto/encryption.go: - New v3Magic (0x03), pbkdf2IterationsV3 (600,000 — OWASP 2024 Password Storage Cheat Sheet floor), v3SaltSize (16 bytes), deriveKeyWithSaltV3 helper. - EncryptIfKeySet now unconditionally writes v3: magic(0x03) || salt(16) || nonce(12) || ciphertext+tag - DecryptIfKeySet falls through v3 -> v2 -> v1 with AEAD verification at each step. Wrong-passphrase v3 reads cannot be silently misattributed to v2/v1. - IsLegacyFormat updated to recognize 0x03 as non-legacy. internal/crypto/encryption_v3_test.go (NEW, 7 tests): V3 round-trip / V2 read-fallback against deterministic v2 fixture / V3 wrong-passphrase fails / V3-vs-V2 dispatch order / V2 vs V3 keys differ for same (passphrase, salt) / iteration-count pin at OWASP 2024 floor / IsLegacyFormat-recognises-V3. Coverage internal/crypto: 86.7% -> 88.2%. M-002 (CWE-862) — Auth-exempt allowlist constants + AST regression test Recon found auth-exempt surface spans TWO layers (audit's claim was incomplete): Layer 1 (router.go direct r.mux.Handle): GET /health, GET /ready, GET /api/v1/auth/info, GET /api/v1/version Layer 2 (cmd/server/main.go::buildFinalHandler URL-prefix dispatch): /.well-known/pki/, /.well-known/est/, /scep[/...]* internal/api/router/router.go: - New AuthExemptRouterRoutes constant with per-entry justifications. - New AuthExemptDispatchPrefixes constant. internal/api/router/auth_exempt_test.go (NEW, 2 tests): AST-walks router.go for every direct mux.Handle call and asserts set equals AuthExemptRouterRoutes; reads source bytes of Register / RegisterFunc and asserts they still wrap with middleware.Chain. 
cmd/server/auth_exempt_test.go (NEW, 2 tests): 14-case table test on buildFinalHandler asserting documented prefixes route to noAuthHandler and authenticated routes route to apiHandler; inverse-overlap pin proves no documented bypass shadows an authenticated prefix. M-013 (CWE-942) — CORS deny-by-default verified-already-clean + pin Audit claim 'default allows all origins if env-var unset' was WRONG. internal/api/middleware/middleware.go::NewCORS already denies cross- origin requests when len(cfg.AllowedOrigins) == 0 (no Access-Control-Allow-Origin header is emitted, same-origin policy applies). internal/api/middleware/cors_test.go: +TestNewCORS_NilOriginsDeniesAll + TestNewCORS_M013_ContractDocumentedInOrder (5-case table test pinning the 3-arm dispatch contract). M-018 (CWE-319 / PCI-DSS Req 4) — Postgres TLS opt-in toggle deploy/helm/certctl/values.yaml: new postgresql.tls.{mode,caSecretRef} operator-facing knobs. Default 'disable' preserves in-cluster pod- network behavior; PCI-scoped operators set verify-full. deploy/helm/certctl/templates/_helpers.tpl: certctl.databaseURL helper pipes postgresql.tls.mode into ?sslmode=. deploy/helm/certctl/templates/server-secret.yaml: uses the helper instead of hardcoded sslmode=disable. deploy/docker-compose.yml: CERTCTL_DATABASE_URL is now ${CERTCTL_DATABASE_URL:-...} so operators override without editing. docs/database-tls.md (NEW): operator runbook covering 4 deployment shapes, RDS verify-full example with PGSSLROOTCERT mount, and pg_stat_ssl verification query. helm template + helm lint clean. M-025 (OWASP ASVS L2 §11.2.1) — Per-key rate limiting internal/api/middleware/middleware.go::NewRateLimiter rewritten from a single global tokenBucket to a keyedRateLimiter map keyed on 'user:'+GetUser(ctx) for authenticated callers 'ip:'+RemoteAddr-host for unauthenticated - Empty UserKey strings treated as unauthenticated. - X-Forwarded-For intentionally NOT consulted (header-spoofing risk). 
- Create-on-demand bucket allocation under sync.RWMutex with double- check pattern. RateLimitConfig.PerUserRPS / PerUserBurstSize fields with env vars CERTCTL_RATE_LIMIT_PER_USER_RPS / CERTCTL_RATE_LIMIT_PER_USER_BURST allow per-user budgets distinct from per-IP. internal/api/middleware/ratelimit_keyed_test.go (NEW, 5 tests): TwoIPsHaveIndependentBuckets / SameUserDifferentIPsShareBucket / TwoUsersHaveIndependentBuckets / PerUserBudgetOverride / EmptyUserKeyTreatedAsAnonymous. Coverage internal/api/middleware: 82.1% -> 83.7%. Audit deliverables: cowork/comprehensive-audit-2026-04-25/audit-report.md: score 25/55 -> 30/55 closed (High 7/9, Medium 7/27 -> 12/27, Low 8/19). cowork/comprehensive-audit-2026-04-25/findings.yaml: 5 status flips open -> closed with closure notes citing the Bundle B mechanism. certctl/CHANGELOG.md: Bundle B section under [unreleased]. Verification: go test -count=1 -short ./... all green staticcheck on changed packages no new SA/ST hits (the 4 pre-existing SA1019 sites in cmd/server/main_test.go are Bundle 9 / M-028 partial closure leftovers tracked in Bundle C) helm template + helm lint clean internal/repository/postgres setup-fail sandbox disk pressure, same on master HEAD before this branch — environmental, not Bundle B	2026-04-26 23:09:10 +00:00
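The M-025 keying and create-on-demand allocation from commit 30f9f1e712 might be sketched as follows. This is an illustration under assumptions: `bucket`, `keyedLimiter` and `limiterKey` are hypothetical names, and token refill is omitted so the double-check locking stays in focus.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// bucket is a simplified token bucket; refill over time is omitted here.
type bucket struct {
	mu     sync.Mutex
	tokens int
}

func (b *bucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

// keyedLimiter keeps one bucket per caller key.
type keyedLimiter struct {
	mu      sync.RWMutex
	buckets map[string]*bucket
	burst   int
}

// get looks the bucket up under the read lock, then re-checks under the
// write lock before allocating, so two racing callers share one bucket.
func (l *keyedLimiter) get(key string) *bucket {
	l.mu.RLock()
	b, ok := l.buckets[key]
	l.mu.RUnlock()
	if ok {
		return b
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	if existing, ok := l.buckets[key]; ok {
		return existing
	}
	b = &bucket{tokens: l.burst}
	l.buckets[key] = b
	return b
}

// limiterKey follows the scheme in the commit message: 'user:' + name for
// authenticated callers, 'ip:' + RemoteAddr host otherwise. An empty user
// counts as unauthenticated and X-Forwarded-For is never consulted.
func limiterKey(user, remoteAddr string) string {
	if user != "" {
		return "user:" + user
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		host = remoteAddr // fall back to the raw value if there is no port
	}
	return "ip:" + host
}

func main() {
	l := &keyedLimiter{buckets: make(map[string]*bucket), burst: 1}
	key := limiterKey("", "203.0.113.5:4444")
	fmt.Println(key, l.get(key).allow(), l.get(key).allow())
}
```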
shankar0123	3e78ecb799	feat(security): bodyLimit on noAuth + security headers + encryption-key validation (H-1 master) Closes three 2026-04-24 audit findings (all P2): - cat-s5-4936a1cf0118: noAuthHandler chain accepted arbitrary-size bodies (EST simpleenroll, SCEP, PKI CRL/OCSP, /health, /ready). Memory exhaustion vector without HTTP-layer auth gatekeeping. - cat-s11-missing_security_headers: zero security headers on any response. Clickjacking, MIME-sniffing, untrusted-origin resource loads against the dashboard and API. - cat-r-encryption_key_no_length_validation: CERTCTL_CONFIG_ENCRYPTION_KEY accepted with any non-empty value including a single character. PBKDF2-SHA256 (100k rounds) does not compensate for low-entropy passphrases at scale (CWE-916, CWE-329). Changes: - cmd/server/main.go::noAuthHandler chain — added bodyLimitMiddleware + securityHeadersMiddleware. Same default cap as authed surface (1MB via CERTCTL_MAX_BODY_SIZE), same 413 on overflow. - cmd/server/main.go::middlewareStack (authed) — added securityHeadersMiddleware before corsMiddleware. - internal/api/middleware/securityheaders.go (new) — SecurityHeaders middleware + SecurityHeadersDefaults() with conservative defaults: HSTS 1y+includeSubDomains, X-Frame-Options DENY, X-Content-Type- Options nosniff, Referrer-Policy no-referrer-when-downgrade, CSP default-src 'self' + img/data + style 'unsafe-inline' (Tailwind/Vite needs it; scripts still 'self' only) + connect 'self' + frame- ancestors 'none'. Operators behind a customising reverse proxy can disable any header by setting its config field to empty. - internal/config/config.go::Validate() — enforce minEncryptionKeyLength = 32 bytes when CERTCTL_CONFIG_ENCRYPTION_KEY is set. Empty stays accepted (downstream fail-closed sentinel handles it). Structured error names the env var, the actual length, the required minimum, and the canonical generation command (`openssl rand -base64 32`). 
Tests: - internal/api/middleware/securityheaders_test.go (new) — 4 cases (defaults present, empty value disables single header, override applied, headers on 4xx/5xx). - internal/config/config_test.go — 5 new cases for the encryption-key length check (empty accepted, 1-byte rejected, 31-byte rejected at boundary, 32-byte accepted, 44-byte realistic operator key accepted). Documentation: - CHANGELOG.md — H-1 section above D-2 under [unreleased] with Breaking-change callout (operators with low-entropy keys must rotate before upgrade). - coverage-gap-audit-2026-04-24-v5/unified-audit.md — Live Tracker 25/47 → 33/47, P1 14/14 (zero remaining), P2 11/27 → 16/27. Three H-1 findings flipped + closed-bundle row added. Verification: - go build ./... — clean - go vet ./... — clean - golangci-lint v2.11.4 run ./... — 0 issues - go test ./internal/api/middleware/... — pass (incl. 4 new SecurityHeaders cases) - go test ./internal/config/... — pass (incl. 5 new EncryptionKey cases) - tsc --noEmit (frontend) — clean - All sibling guardrails (S-1 / G-3 / D-1 / D-2 / B-1 / L-1) still pass Audit findings closed: - cat-s5-4936a1cf0118 (P2) - cat-s11-missing_security_headers (P2) - cat-r-encryption_key_no_length_validation (P2) Breaking change: - Operators with CERTCTL_CONFIG_ENCRYPTION_KEY shorter than 32 bytes must rotate before upgrade. Generate via `openssl rand -base64 32`. Deferred follow-ups: - Weak-key dictionary check (reject password123, common ASCII patterns) — adds operational friction with low marginal entropy gain at the 32-byte minimum. - CSP 'unsafe-inline' for styles — required for Tailwind/Vite per-component <style> blocks; removing requires HTML report or component refactor outside H-1 scope. - Permissions-Policy header — dashboard uses no advanced browser APIs (camera, mic, geolocation); deferred until a real consumer needs it.	2026-04-25 16:40:21 +00:00
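A rough sketch of the header middleware behavior listed in commit 3e78ecb799: defaults applied to every response, and an empty config value disabling that single header. The names `defaultHeaders`, `withSecurityHeaders` and `probe` are hypothetical, the HSTS string is one plausible rendering of "1y + includeSubDomains", and the CSP value is left out because its exact text is not given in the log.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// defaultHeaders mirrors the non-CSP defaults named in the commit message.
func defaultHeaders() map[string]string {
	return map[string]string{
		"Strict-Transport-Security": "max-age=31536000; includeSubDomains",
		"X-Frame-Options":           "DENY",
		"X-Content-Type-Options":    "nosniff",
		"Referrer-Policy":           "no-referrer-when-downgrade",
	}
}

// withSecurityHeaders sets every non-empty header before calling next, so
// the headers are present on error responses too. An empty value skips
// (disables) that header.
func withSecurityHeaders(headers map[string]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for name, value := range headers {
			if value != "" {
				w.Header().Set(name, value)
			}
		}
		next.ServeHTTP(w, r)
	})
}

// notFound is a bare 404 handler used to show headers survive error paths.
var notFound = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusNotFound)
})

// probe sends one GET through the handler and returns the response headers.
func probe(h http.Handler) http.Header {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Result().Header
}

func main() {
	cfg := defaultHeaders()
	cfg["X-Frame-Options"] = "" // operator behind a customising proxy disables one header
	got := probe(withSecurityHeaders(cfg, notFound))
	fmt.Println(got.Get("X-Content-Type-Options"), got.Get("X-Frame-Options") == "")
}
```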
shankar0123	9c1d446e40	fix(security,config): remove unimplemented JWT auth-type, close silent downgrade (G-1) The pre-G-1 config validator accepted CERTCTL_AUTH_TYPE=jwt and the startup log faithfully echoed 'authentication enabled type=jwt'. Reasonable people read that and concluded JWT auth was on. It wasn't. The auth-middleware wiring at cmd/server/main.go unconditionally routed every request through the api-key bearer middleware regardless of cfg.Auth.Type. So CERTCTL_AUTH_TYPE=jwt quietly compared the incoming 'Authorization: Bearer <token>' against whatever string the operator put in CERTCTL_AUTH_SECRET — real JWT clients got 401, and operators who treated CERTCTL_AUTH_SECRET as a signing secret (because they thought they were configuring JWT) had effectively handed an attacker an api-key. A security finding masquerading as a config option. We chose the audit-recommended structural fix: remove the option, fail fast at startup, and add the gateway-fronting pattern as the documented forward path. Implementing JWT middleware would have meant jwks vs static-secret rotation, claim mapping, expiry enforcement, audience and issuer validation, key rollover semantics, and regression coverage at the same depth as the existing api-key path — a feature, not a fix. Operators who genuinely need JWT/OIDC front certctl with an authenticating gateway (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia) and run the upstream certctl with CERTCTL_AUTH_TYPE=none. Same shape works on docker-compose and Helm. The change is comprehensive across 7 phases — every surface that mentioned 'jwt' as a certctl-auth-type is updated, plus structural backstops (typed enum, runtime guard, helm template validation, CI grep guard) so the lie can't reappear. Files changed: Phase 1 — production code (typed enum + jwt removal): - internal/config/config.go: AuthType typed alias + AuthTypeAPIKey / AuthTypeNone constants + ValidAuthTypes() helper. 
Validate() routes literal 'jwt' through a dedicated multi-line diagnostic naming the authenticating-gateway pattern, then cross-checks against ValidAuthTypes(). Secret-required branch simplified to api-key-only. Field comment on AuthConfig.Type rewritten to drop jwt and point at the gateway pattern. - internal/api/middleware/middleware.go: AuthConfig.Type field comment references the typed config.AuthType constants. - internal/api/handler/health.go: same treatment for HealthHandler.AuthType. - cmd/server/main.go: defense-in-depth runtime switch immediately after config.Load() — exits 1 on any unsupported auth-type that bypassed the validator. Auth-disabled startup log explicitly names the authenticating-gateway pattern. Phase 2 — tests (Red→Green, contract pinning): - internal/config/config_test.go: TestValidate_JWTAuth_RejectedDedicated (two table rows pinning the dedicated G-1 error fires regardless of whether Secret is set), TestValidAuthTypesDoesNotContainJWT (property guard against future re-introduction), TestValidAuthTypesIsExactly_APIKey_None (allowed-set contract), TestValidate_GenericInvalidAuthType (pins non-jwt invalid values still hit the generic invalid-auth-type error). Removed the prior TestValidate_JWTAuth_MissingSecret happy-path since its premise is inverted post-G-1. - internal/api/handler/health_test.go: removed TestAuthInfo_ReturnsAuthType_JWT (which baked the silent-downgrade lie into the regression suite). Pre-existing _APIKey test continues to cover the api-key happy path. Phase 3 — spec, docs, env templates: - api/openapi.yaml: auth_type enum dropped to [api-key, none] with inline comment naming the G-1 closure. - .env.example (root): CERTCTL_AUTH_TYPE comment block rewritten to drop jwt and point at the gateway pattern; secret-required conditional simplified to api-key-only. 
- docs/architecture.md: middleware-stack bullet rewritten to drop the JWT mention; new H3 'Authenticating-gateway pattern (JWT, OIDC, mTLS)' section explaining the design rationale and listing oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium / Authelia / Caddy forward_auth / Apache mod_auth_openidc / nginx auth_request as the standard fronting options. - docs/upgrade-to-v2-jwt-removal.md (new ~125 lines): migration guide with preconditions, what-changes, both recovery paths, complete docker-compose oauth2-proxy walkthrough, Traefik ForwardAuth and Envoy ext_authz patterns, rollback posture. Phase 4 — Helm chart (template validation + docs): - deploy/helm/certctl/templates/_helpers.tpl: new certctl.validateAuthType helper mirroring the existing certctl.tls.required pattern. Fails template render on any server.auth.type outside {api-key, none} with a multi-line diagnostic. - deploy/helm/certctl/templates/server-deployment.yaml, server-configmap.yaml, server-secret.yaml: invoke the helper at the top of each template that depends on .Values.server.auth.type. - deploy/helm/certctl/values.yaml: auth: block comment expanded with the G-1 rationale and gateway-pattern cross-reference. - deploy/helm/CHART_SUMMARY.md: server.auth.type table row now surfaces the allowed set and points at the upgrade doc. - deploy/helm/certctl/README.md: new 'JWT / OIDC via authenticating gateway' section with a Kubernetes-flavored oauth2-proxy + certctl walkthrough. Phase 5 — release surface: - CHANGELOG.md: new [unreleased] top entry with Breaking / Removed / Added / Changed sections; explicit pointer at docs/upgrade-to-v2-jwt-removal.md from the Breaking subsection. Phase 6 — CI guardrail: - .github/workflows/ci.yml: new 'Forbidden auth-type literal regression guard (G-1)' step. Scoped patterns catch the actual regression shapes (map literal, slice literal, switch case, OpenAPI enum, env-file default, AuthType('jwt') cast). 
Comments and the dedicated rejection branch are intentionally exempt; connector-package JWT references (Google OAuth2 / step-ca) are exempt as out-of-scope external protocols. Verified locally: the guard passes on the actual tree and fires on all 4 synthetic regression patterns. Out of scope (explicitly untouched): - internal/connector/discovery/gcpsm/gcpsm.go — Google OAuth2 service- account JWT (external protocol). - internal/connector/issuer/googlecas/googlecas.go — same. - internal/connector/issuer/stepca/stepca.go — step-ca's provisioner one-time-token JWT for /sign API. - docs/test-env.md, docs/connectors.md, docs/features.md — describe external CAs' use of JWT, not certctl's auth shape. - Implementing actual JWT middleware. Feature, not a fix. Verification (all gates pass): - go build ./... — clean - go vet ./... — clean - go test -short ./... — every package green - go test -short -race ./internal/config/... ./internal/api/... — clean - govulncheck ./... — no vulnerabilities in our code - helm lint deploy/helm/certctl/ — clean - helm template with auth.type=api-key — renders OK - helm template with auth.type=none — renders OK - helm template with auth.type=jwt — fails with validateAuthType diagnostic (exit 1) - python3 yaml.safe_load on api/openapi.yaml — parses - CI guardrail mirror — clean on real tree, fires on all 4 synthetic regression patterns - Smoke test: 'CERTCTL_AUTH_TYPE=jwt ./certctl-server' exits non-zero with: 'Failed to load configuration: CERTCTL_AUTH_TYPE=jwt is no longer accepted (G-1 silent auth downgrade): no JWT middleware ships with certctl. To use JWT/OIDC, run an authenticating gateway (oauth2-proxy / Envoy ext_authz / Traefik ForwardAuth / Pomerium) in front of certctl and set CERTCTL_AUTH_TYPE=none on the upstream. See docs/architecture.md "Authenticating-gateway pattern" and docs/upgrade-to-v2-jwt-removal.md for the migration walkthrough' config pkg coverage: ValidAuthTypes 100%, Validate 94.7%, total 75.5%. 
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md §2 P1 cluster, cat-g-jwt_silent_auth_downgrade Audit recommendation followed verbatim: 'Remove jwt from validAuthTypes until middleware ships'.	2026-04-25 00:22:23 +00:00
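The typed-enum validation from commit 9c1d446e40 could be sketched like this. AuthType, AuthTypeAPIKey, AuthTypeNone and ValidAuthTypes are names taken from the message; `validateAuthType`, `errJWTRemoved` and the shortened error text are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// AuthType is the typed alias named in the commit message.
type AuthType string

const (
	AuthTypeAPIKey AuthType = "api-key"
	AuthTypeNone   AuthType = "none"
)

// ValidAuthTypes returns the allowed set; 'jwt' is deliberately absent.
func ValidAuthTypes() []AuthType {
	return []AuthType{AuthTypeAPIKey, AuthTypeNone}
}

// errJWTRemoved is a shortened, hypothetical version of the dedicated diagnostic.
var errJWTRemoved = errors.New("CERTCTL_AUTH_TYPE=jwt is no longer accepted: run an authenticating gateway in front of certctl and set CERTCTL_AUTH_TYPE=none")

// validateAuthType routes the literal 'jwt' to the dedicated error first,
// then cross-checks everything else against the allowed set.
func validateAuthType(raw string) error {
	if raw == "jwt" {
		return errJWTRemoved
	}
	for _, t := range ValidAuthTypes() {
		if AuthType(raw) == t {
			return nil
		}
	}
	return fmt.Errorf("invalid auth type %q (allowed: %v)", raw, ValidAuthTypes())
}

func main() {
	for _, raw := range []string{"api-key", "none", "jwt", "basic"} {
		fmt.Printf("%s -> %v\n", raw, validateAuthType(raw))
	}
}
```

Checking 'jwt' before the generic branch keeps the migration hint separate from the ordinary invalid-value error, which is the distinction the listed tests pin.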
shankar0123	ff7357f889	fix(lint): godoc comment on NewAuthWithNamedKeys must lead with function name (ST1020) CI failure on master (commit `3287e17`) — staticcheck ST1020: internal/api/middleware/middleware.go:125:1: ST1020: comment on exported function NewAuthWithNamedKeys should be of the form "NewAuthWithNamedKeys ..." (staticcheck) When NewAuth was renamed to NewAuthWithNamedKeys during the M-002 auth unification, the leading godoc sentence was left pointing at the old name. Rewrite the comment so its first sentence starts with the new function name, and expand the body to describe the named-key + admin-flag contract introduced in `3287e17`. Also gitignore /.gopath/ — session-scoped tool install cache, same category as /.gocache/ and /.gomodcache/. Verification: go vet ./internal/api/middleware/... — clean go build ./internal/api/middleware/... — clean go test ./internal/api/middleware/... — PASS (0.245s) staticcheck -checks=all,<project exclusions> — clean across middleware, handler, service, domain, cmd/server, scheduler Closes: CI failure on `3287e17`.	2026-04-18 21:38:46 +00:00
shankar0123	3287e174dc	Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001) Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006) on top of the C-001/C-002 ownership + agent-FK contract fixes landed in `a53a4b8`. The work lands as a single commit spanning server, docs, tests, and the React client. M-002 — Named API keys with per-key actor propagation * Migration 000014 adds the 'api_keys' table (id, name, hash, principal, role, created_at, last_used_at, disabled_at) so every credential carries an identifiable principal instead of the opaque 'anonymous'/'api-key' sentinel. * Auth middleware now rotates through configured keys, performs constant-time hash comparison, stamps 'last_used_at', and emits an actor struct via contextWithActor(). The audit middleware, bulk-revocation handler, approval handlers, and MCP tool layer now read the principal off the context and persist it on every audit_events row. * Regression coverage: - internal/api/middleware/audit_test.go — actor propagation, principal redaction for disabled keys, anonymous fallback for unauthenticated endpoints. - internal/api/handler/bulk_revocation_handler_test.go, job_handler_test.go — principal-on-audit assertions. M-003 — Authorization gates (Phase B) * Approval handler rejects self-approval / self-rejection with 403 when the actor principal equals the job's requested_by field. * Bulk revocation is gated behind the 'admin' role; operators and viewers receive 403. * Regression coverage: - internal/service/job_test.go — TestApproveJob_NotSelf, TestRejectJob_NotSelf. - internal/api/handler/bulk_revocation_handler_test.go — TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds. M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux * Per RFC 8615, relying parties cannot reasonably be asked to authenticate against the issuing certctl instance to retrieve revocation material. 
CRL and OCSP move off the authenticated '/api/v1/crl' and '/api/v1/ocsp/' paths onto: GET /.well-known/pki/crl/{issuer_id} Content-Type: application/pkix-crl (RFC 5280 §5) GET /.well-known/pki/ocsp/{issuer_id}/{serial} Content-Type: application/ocsp-response (RFC 6960) * Non-standard JSON CRL shape is removed; only DER is served. * Short-lived certificate exemption (profile TTL < 1h → skip CRL/OCSP) is preserved; the response simply omits the serial. * Routes are registered on the unauthenticated 'finalHandler' mux in cmd/server/main.go alongside EST ('/.well-known/est/') and SCEP ('/scep'). Legacy authenticated paths return 404. Regression coverage: - internal/api/handler/certificate_handler_test.go — content type, DER parseability, 404 for unknown issuer. - internal/api/handler/adversarial_path_test.go — unauthenticated access asserted for CRL, OCSP, EST, SCEP. - internal/api/router/router_test.go — route-table assertion that '.well-known/pki/', '.well-known/est/', and '/scep' are mounted on the unauthenticated branch. M-001 — Auto-closed by M-002 EST and SCEP were already registered on the unauthenticated 'finalHandler' mux; the router comment at internal/api/router/router.go:247 now matches reality. The adversarial-path tests above lock the behavior in. Verification (all gates green): * go vet ./... — clean * go build ./... — ok * go test -short ./... (55+ packages) — all pass * web/ : npm test (225 Vitest tests) — all pass * web/ : npx tsc --noEmit — clean * grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits, all intentional M-006 tombstone/relocation comments. Documentation: * coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 → Fixed, with per-finding resolution paragraphs citing regression test IDs. (Audit file lives outside this repo; see cowork root.) * CLAUDE.md Project Status line updated with the auth-unification closure note. 
* docs/features.md, docs/architecture.md, docs/quickstart.md, docs/concepts.md, docs/connectors.md, docs/test-env.md, docs/testing-guide.md, docs/compliance-.md, docs/demo-advanced.md — refreshed for the new '.well-known/pki/' namespace and named API keys. * api/openapi.yaml — documents the new unauthenticated endpoints and removes the legacy '/api/v1/crl' + '/api/v1/ocsp/' paths. .gitignore: adds '/.gocache/' and '/.gomodcache/' for the session- scoped Go caches so they never enter the tree.	2026-04-18 18:17:41 +00:00
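The M-006 routing split described in the commit above can be sketched as follows. This is a minimal illustration, not the code in cmd/server/main.go: `requireKey`, `newFinalHandler`, `statusFor`, the `demo-key` credential, and the placeholder CRL bytes are all hypothetical, and prefix matching is used in place of whatever path parsing the repository actually does.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// requireKey is a hypothetical stand-in for the real auth middleware.
func requireKey(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer demo-key" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newFinalHandler mounts revocation material outside the auth wrapper and
// everything under /api/v1/ behind it.
func newFinalHandler() http.Handler {
	api := http.NewServeMux()
	api.HandleFunc("/api/v1/certificates", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "[]")
	})

	final := http.NewServeMux()
	final.HandleFunc("/.well-known/pki/crl/", func(w http.ResponseWriter, r *http.Request) {
		issuerID := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/crl/")
		if issuerID == "" || strings.Contains(issuerID, "/") {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/pkix-crl")
		w.Write([]byte{0x30, 0x00}) // placeholder bytes, not a real DER CRL
	})
	final.Handle("/api/v1/", requireKey(api))
	return final
}

// statusFor issues a GET against h and returns the response status code.
func statusFor(h http.Handler, path, apiKey string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newFinalHandler()
	fmt.Println(statusFor(h, "/.well-known/pki/crl/iss-1", ""))   // no credentials needed
	fmt.Println(statusFor(h, "/api/v1/certificates", ""))         // rejected without a key
	fmt.Println(statusFor(h, "/api/v1/certificates", "demo-key")) // accepted with a key
	fmt.Println(statusFor(h, "/api/v1/crl", "demo-key"))          // legacy path is gone
}
```

In this sketch the legacy `/api/v1/crl` path yields 404 only for authenticated callers; an unauthenticated caller is stopped by the auth wrapper first.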
shankar0123	e3196e7b50	M-2 PR-F: Middleware/ACME ctx-propagation + contextcheck linter + audit closeout Final PR in the six-commit M-2 sequence (PR-A: CertificateService cluster `cdc9d03`, PR-B: IssuerService+TargetService `eb14236`, PR-C: Policy/Profile/ Owner/Team `2497be4`, PR-D: Job/Notification/Audit `ccd89c3`, PR-E: AgentService `283ec27`, PR-F: this commit). PR-A through PR-E collapsed the service-layer shim methods and deleted every in-production context.Background() / context.TODO() call from internal/service/; this PR completes the sweep across the non-service tiers (HTTP middleware + ACME connector) and wires the contextcheck linter so regressions fail CI. Three narrow edits land the D-3 pattern (context.WithoutCancel for subsidiary async writes and deferred shutdown contexts): - internal/api/middleware/audit.go -- async audit goroutine now runs on auditCtx := context.WithoutCancel(r.Context()) instead of context.Background(). Preserves request-scoped values (trace ID, auth) while detaching from the request's cancellation so the audit write does not get killed when the response completes. Goroutine is still tracked via a.wg (M-1 shutdown drain) so Flush(ctx) behaviour is unchanged. CWE-770 Missing Release (goroutine leak potential) + CWE-400 Resource Exhaustion (missed cancellation propagation). - internal/api/middleware/middleware.go -- Recovery panic path now logs via slog.ErrorContext(ctx, ...) instead of log.Printf. Request- scoped trace/auth metadata now carries through the panic log, matching every other request log. D-3 non-bypass: the context is r.Context() captured before the defer, so even a panic mid-handler propagates the ctx's trace ID into the ERROR log line. - internal/connector/issuer/acme/acme.go (HTTP-01 challenge server shutdown) -- defer shutdown context derived from context.WithTimeout(context.WithoutCancel(ctx), 5s) instead of context.Background(). 
Preserves parent ctx values, detaches from parent cancellation so Shutdown always gets its full 5-second budget even when the parent was cancelled. Matches the same pattern applied in ACME's solveAuthorizationsDNS01 and solveAuthorizationsDNSPersist01. Linter wiring: .golangci.yml adds `contextcheck` to the enabled set. golangci-lint v2.11.4 now fails CI on any function that takes a context.Context parameter but calls into context.Background() or context.TODO() instead of propagating -- regression guard for all five prior PRs. Verification (CI parity, GOCACHE=/tmp/gocache GOMODCACHE=/tmp/gomodcache GOLANGCI_LINT_CACHE=/tmp/lintcache): - go build ./... -> 0 - go vet ./... -> 0 - golangci-lint run (contextcheck enabled) -> 0 issues - go test -race -short ./internal/api/middleware/... -> PASS - go test -race -short ./internal/scheduler/... -> PASS - go test -race -short ./internal/connector/issuer/acme/... -> PASS - go test -race -short ./internal/service/... -> PASS - rg "context\.(Background\|TODO)" internal/service/ internal/scheduler/ internal/connector/ internal/api/middleware/ -> 0 non-test hits (one pedagogical godoc reference in audit.go documenting why context.Background() would be wrong remains intentional) Wire-format invariants preserved: 0 API routes, 0 SQL migrations, 0 frontend bytes, 0 OpenAPI bytes, 0 connector interface signature changes, 0 new env vars, 0 new external dependencies (pure context stdlib). The AuditRecorder interface signature, the body-hash algorithm (SHA-256 16 hex chars), the excluded-path short-circuit, the actor-extraction path, the responseWriter status-capture wrapper, the AuditServiceAdapter, and all 116 API routes under /api/v1/, /.well-known/est/, /scep, /health, /auth are byte-identical. M-2 aggregate across PR-A through PR-F: 57 files, +635 / -613 (PR-A 12f +227/-237, PR-B 9f +150/-146, PR-C 17f +156/-148, PR-D 11f +67/-63, PR-E 4f +9/-15, PR-F 4f +26/-4). 
With M-2 closed, 8 of 10 Medium findings resolved; M-9, M-10, L-1..L-4, I-1..I-8 remain post-v2.1.0 hardening batch. Audit complete. Commit: `1f6cf0eafa`. Sections: 12. Findings: 2/7/10/4/6.	2026-04-18 01:43:47 +00:00
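The D-3 pattern named in the commit above (context.WithoutCancel for subsidiary async writes and deferred shutdown contexts) can be illustrated with a small sketch. It assumes Go 1.21 or later; `traceKey`, `detach`, and `demo` are hypothetical names, and the 5-second budget simply mirrors the figure quoted in the message.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// traceKey is a hypothetical context key carrying a request-scoped trace ID.
type traceKey struct{}

// detach keeps the parent's values but drops its cancellation, then applies
// a fresh 5-second budget of its own.
func detach(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(parent), 5*time.Second)
}

// demo cancels the parent first, then derives the detached context from it.
func demo() (trace string, parentDone, detachedDone bool) {
	base := context.WithValue(context.Background(), traceKey{}, "trace-123")
	parent, cancel := context.WithCancel(base)
	cancel() // simulate the request finishing before the async work runs

	ctx, stop := detach(parent)
	defer stop()

	trace, _ = ctx.Value(traceKey{}).(string)
	return trace, parent.Err() != nil, ctx.Err() != nil
}

func main() {
	trace, parentDone, detachedDone := demo()
	fmt.Println(trace, parentDone, detachedDone) // trace-123 true false
}
```

The detached context still sees the trace ID, while the cancelled parent no longer cuts the work short, which is the property the audit write and the shutdown path rely on.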
shankar0123	d14a45401b	fix(audit): drain in-flight recording goroutines on shutdown (M-1) Audit events spawned from the HTTP middleware ran in detached goroutines using context.Background(). On SIGTERM the DB pool was closed before those goroutines finished writing, silently dropping audit events (CWE-662 Improper Synchronization / CWE-400 Uncontrolled Resource Consumption). NewAuditLog now returns an *AuditMiddleware struct that tracks every spawned goroutine with sync.WaitGroup. Callers wire the middleware via its Middleware method value (preserves the existing func(http.Handler) http.Handler shape) and drain the WaitGroup with Flush(ctx), which blocks until in-flight recordings complete or the provided context is cancelled — mirroring scheduler.WaitForCompletion. Flush is invoked in cmd/server/main.go between http.Server.Shutdown (no new requests accepted) and db.Close (pool torn down), with a timeout returning ErrAuditFlushTimeout wrapping ctx.Err(). Request-derived inputs (method, path, status) are snapshotted before the goroutine spawn so the worker does not race with http.Server reusing r after the handler returns. Tests: TestAuditLog_FlushDrainsInFlightGoroutines TestAuditLog_FlushTimeoutReturnsErrAuditFlushTimeout Verification: go build ./... : 0 go vet ./... : 0 go test -race -short ./... : 0 (all packages) go test -cover ./internal/api/middleware : 81.4% golangci-lint run : 0 issues govulncheck ./... : 0 vulns in called code	2026-04-17 17:29:48 +00:00
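The drain-on-shutdown mechanism in the M-1 commit above could look roughly like the sketch below. `AuditMiddleware`, `Flush`, and `ErrAuditFlushTimeout` are names taken from the message; the `record` helper, the error text, and `demo` are assumptions, and the real middleware wiring is omitted. The double `%w` needs Go 1.20 or later.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrAuditFlushTimeout is returned when Flush gives up before the drain ends.
var ErrAuditFlushTimeout = errors.New("audit flush timed out")

// AuditMiddleware tracks every recording goroutine it spawns.
type AuditMiddleware struct {
	wg sync.WaitGroup
}

// record is a hypothetical helper that runs work in a tracked goroutine.
func (a *AuditMiddleware) record(work func()) {
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()
		work()
	}()
}

// Flush blocks until in-flight recordings finish or ctx is done.
func (a *AuditMiddleware) Flush(ctx context.Context) error {
	done := make(chan struct{})
	go func() {
		a.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("%w: %w", ErrAuditFlushTimeout, ctx.Err())
	}
}

// demo exercises both outcomes: a clean drain and a timeout.
func demo() (drained, timedOut bool) {
	fast := &AuditMiddleware{}
	wrote := false
	fast.record(func() { time.Sleep(20 * time.Millisecond); wrote = true })
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	drained = fast.Flush(ctx) == nil && wrote

	stuck := &AuditMiddleware{}
	release := make(chan struct{})
	stuck.record(func() { <-release })
	short, cancelShort := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancelShort()
	err := stuck.Flush(short)
	timedOut = errors.Is(err, ErrAuditFlushTimeout) && errors.Is(err, context.DeadlineExceeded)
	close(release) // let the stuck worker exit
	return drained, timedOut
}

func main() {
	fmt.Println(demo()) // true true
}
```

Calling Flush after the HTTP server stops accepting requests and before the database pool closes is what prevents the dropped writes the commit describes.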
shankar0123	7382e5f03b	test: comprehensive test gap closure across 24 packages Close coverage gaps identified by dual-audit (qualitative + quantitative). New test files for config (0%→98%), router (0%→100%), handler validation, health, audit, response helpers, webhook notifier (0%→88%), email notifier, middleware (recovery, rate limiter), domain profile, service nil-safety, config helpers, issuer bootstrap, and server bootstrap wiring. Expanded existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent (43%→63%), scheduler (88%→99%), renewal service, and issuerfactory. All tests pass: go test -short, go vet, go test -race clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-09 23:09:40 -04:00
shankar0123	6d508cf53f	fix: security audit remediation (AUDIT-001, 003, 004, 005, 006, 018) - AUDIT-001: Validate OpenSSL revoke inputs (hex-only serials, RFC 5280 reasons) - AUDIT-003: Enforce /20 CIDR size cap at API level (create + update) - AUDIT-004: Support comma-separated CERTCTL_AUTH_SECRET for zero-downtime key rotation - AUDIT-005: Add ReadHeaderTimeout (5s) to prevent Slowloris - AUDIT-006: Document audit trail query parameter exclusion rationale - AUDIT-018: Add immediate-run-on-start to short-lived expiry scheduler loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 14:11:16 -04:00
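AUDIT-004 above accepts a comma-separated secret so that an old and a new key can both be valid during rotation. One way this might work is sketched below; `parseSecrets` and `authorized` are hypothetical helpers, and hashing each key with SHA-256 before a constant-time comparison is an assumption borrowed from the M7 commit rather than a description of the actual code.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"strings"
)

// parseSecrets splits a comma-separated secret list and stores only hashes.
func parseSecrets(raw string) [][32]byte {
	var out [][32]byte
	for _, s := range strings.Split(raw, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue // ignore empty entries such as a trailing comma
		}
		out = append(out, sha256.Sum256([]byte(s)))
	}
	return out
}

// authorized reports whether presented matches any configured key. Every key
// is compared, so the loop does not exit early on a match.
func authorized(keys [][32]byte, presented string) bool {
	sum := sha256.Sum256([]byte(presented))
	match := 0
	for i := range keys {
		match |= subtle.ConstantTimeCompare(keys[i][:], sum[:])
	}
	return match == 1
}

func main() {
	// During rotation both the outgoing and the incoming key are listed.
	keys := parseSecrets("old-key, new-key")
	fmt.Println(authorized(keys, "old-key")) // true
	fmt.Println(authorized(keys, "new-key")) // true
	fmt.Println(authorized(keys, "stale"))   // false
}
```

Once every client has switched to the new key, the old entry can be dropped from the list without a window in which requests fail.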
shankar0123	de9264baf7	docs: synchronize project documentation with codebase Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010) and performs comprehensive documentation audit to eliminate drift between code and docs. Code changes: - TICKET-003: Repository integration tests with testcontainers-go (50+ subtests) - TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc - TICKET-010: Request body size limits via http.MaxBytesReader middleware - Fix missing slog import in certificate.go after service decomposition Documentation updates: - README: Fix endpoint count (97→93), expand env var reference (15→39 vars) - CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations - architecture.md: Add body size limits section, middleware chain ordering - CONTRIBUTING.md: New contributor guide with architecture conventions, test patterns, middleware ordering, CI thresholds - SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored) - Test files: Add doc comments to all new test files Documentation that should exist but doesn't yet: - Architecture diagrams (C4 model or similar) - Threat model document - Testing philosophy guide - Disaster recovery runbook - Upgrade guide (migration between versions) - API versioning strategy document Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 22:28:54 -04:00
shankar0123	4655f68e87	fix(testing): TICKET-015 replace time.Sleep with channel-based sync in audit tests The audit middleware records events asynchronously via goroutines. Tests previously used time.Sleep(50ms) to wait for audit recording, which is unreliable. Implemented waitableAuditRecorder wrapper that: - Wraps mockAuditRecorder to intercept RecordAPICall invocations - Signals via buffered channel when recording completes - Provides Wait(timeout) method for tests to synchronously wait - Returns true on successful wait, false on timeout Replaced all 7 time.Sleep(50ms) calls with recorder.Wait(1*time.Second) calls, improving test reliability and reducing flakiness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 21:40:28 -04:00
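The channel-based wait described in TICKET-015 above can be sketched like this. The type names `waitableAuditRecorder` and `mockAuditRecorder` and the methods `RecordAPICall` and `Wait` come from the message; the two-argument `RecordAPICall` signature, the buffer size, and `demo` are simplifying assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// mockAuditRecorder stores calls; the signature is simplified for the sketch.
type mockAuditRecorder struct {
	mu    sync.Mutex
	calls []string
}

func (m *mockAuditRecorder) RecordAPICall(method, path string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls = append(m.calls, method+" "+path)
}

// waitableAuditRecorder wraps the mock and signals after each recording.
type waitableAuditRecorder struct {
	*mockAuditRecorder
	done chan struct{}
}

func newWaitable() *waitableAuditRecorder {
	return &waitableAuditRecorder{
		mockAuditRecorder: &mockAuditRecorder{},
		done:              make(chan struct{}, 16), // buffer size is arbitrary
	}
}

func (w *waitableAuditRecorder) RecordAPICall(method, path string) {
	w.mockAuditRecorder.RecordAPICall(method, path)
	w.done <- struct{}{}
}

// Wait returns true once a recording has completed, false on timeout.
func (w *waitableAuditRecorder) Wait(timeout time.Duration) bool {
	select {
	case <-w.done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// demo records once asynchronously, waits for it, then waits again with
// nothing pending to show the timeout path.
func demo() (signalled, timedOut bool, calls int) {
	rec := newWaitable()
	go rec.RecordAPICall("GET", "/api/v1/certificates")
	signalled = rec.Wait(time.Second)
	timedOut = !rec.Wait(10 * time.Millisecond)
	rec.mu.Lock()
	calls = len(rec.calls)
	rec.mu.Unlock()
	return signalled, timedOut, calls
}

func main() {
	fmt.Println(demo()) // true true 1
}
```

Unlike a fixed sleep, the wait returns as soon as the recording lands, and the generous timeout only matters when something is actually broken.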
shankar0123	3e3e68fd3a	fix(security): TICKET-009 add HTTP timeouts to notifier clients - Added TestSlack_ClientHasTimeout to verify 10-second timeout - Added TestTeams_ClientHasTimeout to verify 10-second timeout - Added TestPagerDuty_ClientHasTimeout to verify 10-second timeout - Added TestOpsGenie_ClientHasTimeout to verify 10-second timeout - All notifiers already configured with 10 second timeout in New() - Tests verify timeout is set and matches expected value	2026-03-27 21:33:31 -04:00
shankar0123	9b0ff37973	feat: M19 API audit log + M16a notifier connectors (Slack, Teams, PagerDuty, OpsGenie) M19: HTTP middleware records every API call to the immutable audit trail with method, path, actor, SHA-256 body hash, status, and latency. Best-effort async recording via goroutine. Health/ready probes excluded. M16a: Four pluggable notifier connectors — Slack (incoming webhook), Teams (MessageCard), PagerDuty (Events API v2), OpsGenie (Alert API v2). Each enabled by config env var. 30 new tests across middleware and connectors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 17:58:14 -04:00
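The body hash mentioned in M19 above might be computed along these lines. The 16-hex-character length comes from the later M-2 PR-F message; that the truncation keeps the leading characters is an assumption, and `bodyHash` is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// bodyHash returns the first 16 hex characters of the SHA-256 digest of body.
// Keeping the leading characters is an assumption made for this sketch.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	fmt.Println(bodyHash([]byte(`{"name":"example"}`)))
	fmt.Println(bodyHash(nil)) // e3b0c44298fc1c14 (prefix of the empty-input digest)
}
```

Storing a short digest rather than the body lets the audit trail show that two requests carried the same payload without retaining the payload itself.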
shankar0123	ee75f149ae	feat: M14 — Observability (dashboard charts, agent fleet, stats API, metrics, structured logging, rollback) Backend: StatsService with 5 aggregation methods, JSON metrics endpoint, slog-based structured logging middleware. Stats API: dashboard summary, certificates-by-status, expiration timeline, job trends, issuance rate. 23 new backend tests. Frontend: Recharts-powered dashboard with 4 charts (status pie, expiration heatmap, job trends line, issuance bar), agent fleet overview page with OS/arch grouping and version breakdown, deployment rollback buttons on version history. 7 new frontend tests. 78 API endpoints, 744+ total tests (658 Go + 86 Vitest). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 19:46:13 -04:00
shankar0123	28205e1131	Implement M7: auth middleware, rate limiting, CORS, and GUI login flow Add SHA-256 API key authentication with constant-time comparison, configurable token bucket rate limiter, CORS origin allowlist middleware, and React auth context with login page. Auth info endpoint bootstraps GUI without credentials. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 11:58:13 -04:00
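The token bucket rate limiter introduced in M7 above follows a well-known algorithm, sketched below for a single bucket. This is not the repository's limiter: the field names, the `newTokenBucket` constructor, and the explicit `now` parameter (used to keep the example deterministic) are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// tokenBucket refills at rate tokens per second up to burst.
type tokenBucket struct {
	mu     sync.Mutex
	rate   float64
	burst  float64
	tokens float64
	last   time.Time
}

// newTokenBucket starts with a full bucket.
func newTokenBucket(rate, burst float64, now time.Time) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow refills for the elapsed time, then spends one token if available.
func (b *tokenBucket) allow(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demo sends three requests at once and two more a second later.
func demo() []bool {
	start := time.Unix(0, 0)
	b := newTokenBucket(1, 2, start) // 1 token/s, burst of 2
	later := start.Add(time.Second)
	return []bool{
		b.allow(start), b.allow(start), b.allow(start),
		b.allow(later), b.allow(later),
	}
}

func main() {
	fmt.Println(demo()) // [true true false true false]
}
```

A keyed limiter would hold one such bucket per client key, which is why idle buckets need eviction, as the SEC-006 entry at the top of this log explains.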
shankar0123	d395776a95	Initial scaffold: certificate control plane v0.1.0	2026-03-14 08:22:17 -04:00

31 Commits