Commit Graph

10 Commits

Author SHA1 Message Date
shankar0123 3287e174dc Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
a53a4b8. The work lands as a single commit spanning server, docs, tests,
and the React client.

M-002 — Named API keys with per-key actor propagation
  * Migration 000014 adds the 'api_keys' table (id, name, hash,
    principal, role, created_at, last_used_at, disabled_at) so every
    credential carries an identifiable principal instead of the
    opaque 'anonymous'/'api-key' sentinel.
  * Auth middleware now rotates through configured keys, performs
    constant-time hash comparison, stamps 'last_used_at', and emits
    an actor struct via contextWithActor(). The audit middleware,
    bulk-revocation handler, approval handlers, and MCP tool layer
    now read the principal off the context and persist it on every
    audit_events row.
  * Regression coverage:
      - internal/api/middleware/audit_test.go — actor propagation,
        principal redaction for disabled keys, anonymous fallback for
        unauthenticated endpoints.
      - internal/api/handler/bulk_revocation_handler_test.go,
        job_handler_test.go — principal-on-audit assertions.

M-003 — Authorization gates (Phase B)
  * Approval handler rejects self-approval / self-rejection with 403
    when the actor principal equals the job's requested_by field.
  * Bulk revocation is gated behind the 'admin' role; operators and
    viewers receive 403.
  * Regression coverage:
      - internal/service/job_test.go — TestApproveJob_NotSelf,
        TestRejectJob_NotSelf.
      - internal/api/handler/bulk_revocation_handler_test.go —
        TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.

M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
  * Per RFC 8615, relying parties cannot reasonably be asked to
    authenticate against the issuing certctl instance to retrieve
    revocation material. CRL and OCSP move off the authenticated
    '/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
        GET /.well-known/pki/crl/{issuer_id}
            Content-Type: application/pkix-crl   (RFC 5280 §5)
        GET /.well-known/pki/ocsp/{issuer_id}/{serial}
            Content-Type: application/ocsp-response  (RFC 6960)
  * Non-standard JSON CRL shape is removed; only DER is served.
  * Short-lived certificate exemption (profile TTL < 1h → skip
    CRL/OCSP) is preserved; the response simply omits the serial.
  * Routes are registered on the unauthenticated 'finalHandler' mux
    in cmd/server/main.go alongside EST ('/.well-known/est/*') and
    SCEP ('/scep'). Legacy authenticated paths return 404.
  * Regression coverage:
      - internal/api/handler/certificate_handler_test.go — content
        type, DER parseability, 404 for unknown issuer.
      - internal/api/handler/adversarial_path_test.go — unauthenticated
        access asserted for CRL, OCSP, EST, SCEP.
      - internal/api/router/router_test.go — route-table assertion
        that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
        mounted on the unauthenticated branch.

M-001 — Auto-closed by M-002
  EST and SCEP were already registered on the unauthenticated
  'finalHandler' mux; the router comment at
  internal/api/router/router.go:247 now matches reality. The
  adversarial-path tests above lock the behavior in.

Verification (all gates green):
  * go vet ./...                                           — clean
  * go build ./...                                         — ok
  * go test -short ./... (55+ packages)                    — all pass
  * web/ : npm test (225 Vitest tests)                     — all pass
  * web/ : npx tsc --noEmit                                — clean
  * grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
    all intentional M-006 tombstone/relocation comments.

Documentation:
  * coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
    Fixed, with per-finding resolution paragraphs citing regression
    test IDs. (Audit file lives outside this repo; see cowork root.)
  * CLAUDE.md Project Status line updated with the auth-unification
    closure note.
  * docs/features.md, docs/architecture.md, docs/quickstart.md,
    docs/concepts.md, docs/connectors.md, docs/test-env.md,
    docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
    — refreshed for the new '.well-known/pki/*' namespace and named
    API keys.
  * api/openapi.yaml — documents the new unauthenticated endpoints
    and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.

.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
2026-04-18 18:17:41 +00:00
shankar0123 a53a4b845b fix(gui,api): close C-001 + C-002 — ownership + agent FK contract
C-001 — CreateCertificate was server-accepted with null owner_id,
team_id, renewal_policy_id because the GUI neither collected the fields
nor enforced them, even though the backend's ManagedCertificate schema
and handler contract treat them as required. Fix the contract at all
four layers:

  - web/src/pages/CertificatesPage.tsx: replace owner_id/team_id free-
    text inputs with <select> elements fed by getOwners/getTeams/
    getPolicies queries; mark all three required; gate the Create
    button on owner_id + team_id + renewal_policy_id being set.
  - internal/api/handler/certificates.go: ValidateRequired for
    owner_id, team_id, renewal_policy_id on CreateCertificate so the
    handler returns HTTP 400 with the offending field name before the
    service layer is reached.
  - internal/mcp/types.go: drop ',omitempty' from
    CreateCertificateInput.RenewalPolicyID so the MCP schema reflects
    the required contract; Update inputs keep partial-update semantics.
  - api/openapi.yaml: 'required: [name, common_name, renewal_policy_id,
    issuer_id, owner_id, team_id]' was already present on the Create
    schema; clarified DeploymentTarget.agent_id description to note the
    FK contract.

C-002 — CreateTargetWizard accepted an empty or bogus agent_id and the
service inserted directly, producing a Postgres 23503 FK-violation that
bubbled out as a generic HTTP 500. The FK itself (migration 000001 line
104: agent_id TEXT NOT NULL REFERENCES agents(id)) is correct; we keep
the schema strict and add validation at three layers:

  - internal/service/target.go: introduce
    ErrAgentNotFound sentinel and pre-validate agent_id in
    TargetService.CreateTarget — empty string returns
    'agent_id is required'; a nonexistent id returns the full
    'referenced agent does not exist: <id>' error. Both wrap
    ErrAgentNotFound via fmt.Errorf %w so callers can use errors.Is.
  - internal/api/handler/targets.go: ValidateRequired on agent_id; map
    errors.Is(err, service.ErrAgentNotFound) to HTTP 400 instead of
    letting it fall through to the generic 500 branch.
  - internal/mcp/types.go: drop ',omitempty' from
    CreateTargetInput.AgentID to match the required contract.
  - web/src/pages/TargetsPage.tsx: replace the free-text Agent ID input
    with a <select> populated from getAgents(); include agent in the
    canProceedToReview gate so Next is disabled until an agent is
    chosen.

Regression coverage (21 new subtests total):

  - TestCreateCertificate_MissingRequiredField_Returns400 — 6 subtests,
    one per required field, each proves the handler guard fires before
    the mock service is called.
  - TestCreateTarget_MissingAgentID_Returns400 — handler guard.
  - TestCreateTarget_NonexistentAgent_Returns400 — pins the
    ErrAgentNotFound -> 400 translation.
  - TestTargetService_CreateTarget_MissingAgentID — errors.Is sentinel.
  - TestTargetService_CreateTarget_NonexistentAgentID — errors.Is.
  - The existing TestTargetService_CreateTarget_Success, along with
    TestCreateTarget_{MissingName,MissingType,NameTooLong}_* handler
    tests, were updated to seed a real agent or include agent_id in
    the request body so the happy paths still run cleanly.

Gates (Phase 4):
  - go build/vet/test/race: green
  - go test -cover: internal/service 68.7% (gate 55%),
    internal/api/handler 78.9% (gate 60%)
  - golangci-lint on service+handler+mcp: 0 issues
  - govulncheck: no reachable vulns
  - tsc --noEmit: clean
  - vitest: 223/223 passing

See cowork/certctl-coverage-gap-audit.md entries C-001 and C-002.
2026-04-18 16:01:40 +00:00
shankar0123 cdc9d03d5b fix(m-2): thread context through CertificateService cluster
Collapses CertificateService, RevocationSvc, and CAOperationsSvc to
ctx-accepting method signatures. Removes context.Background() synthesis
at 24 internal call sites across certificate.go, revocation_svc.go, and
ca_operations.go.

- Primary repo calls inherit request cancellation via the passed ctx.
- Audit and notification dispatches use context.WithoutCancel(ctx) so
  they survive client disconnect.
- Collapses TriggerRenewal/TriggerRenewalWithActor,
  TriggerDeployment/TriggerDeploymentWithActor, and
  RevokeCertificate/RevokeCertificateWithActor sibling pairs into single
  canonical ctx-accepting methods (decisions D-1, D-2).

Handlers pass r.Context(). Mocks and tests updated to match new
signatures. No HTTP surface change, no OpenAPI change.

PR 1 of 6 in the M-2 remediation chain. Master green at this commit.

Refs: certctl-audit-report.md M-2 (L143, L224)
2026-04-18 00:29:37 +00:00
shankar0123 a8fc177118 fix: resolve NULL csr_pem scan errors and QA smoke test failures
Root cause: certificate_versions.csr_pem is nullable in the schema but
Go code scanned it into a plain string. Used sql.NullString in
ListVersions and GetLatestVersion to handle NULL values correctly.

Also includes: partial update fetch-merge-update pattern to prevent FK
violations, nil directory guard in discovery service, diagnostic slog
logging in handlers, export handler 422 for unparseable PEM, OpenAPI
spec corrections, MCP tool description improvements, and test fixes.

Rewrites the Release Sign-Off section in testing-guide.md to individual
test-level granularity (320 rows) with smoke test results audited and
checked off (121 pass, 5 skip, 194 manual remaining).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 00:51:18 -04:00
shankar0123 e078a686bf feat: M20 Enhanced Query API — sort, time-range filters, cursor pagination, sparse fields, deployments endpoint
V2 (free) query enhancements for certificates:
- `sort` param with direction (`?sort=-notAfter` for descending)
- Time-range filters: `expires_before`, `expires_after`, `created_after`, `updated_after`
- Cursor-based pagination (`?cursor=token&page_size=100`) alongside page-based
- Sparse field selection (`?fields=id,commonName,status`)
- Additional filters: `agent_id`, `profile_id`
- New endpoint: `GET /api/v1/certificates/{id}/deployments`

25 new tests (12 handler + 13 e2e) covering all M20 features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:56:02 -04:00
shankar0123 ed989d81fd fix: wire issuer registry in revocation tests, correct CRL/OCSP handler test URLs
Service tests: newRevocationTestService() was missing SetIssuerRegistry(),
causing all 8 CRL/OCSP tests to fail with "issuer registry not configured".

Handler tests: CRL tests used /api/v1/issuers/{id}/crl but handler parses
/api/v1/crl/{id}. OCSP tests used query string ?serial=X but handler
expects path param /api/v1/ocsp/{id}/{serial}. Fixed all 9 test URLs.

All issues pre-date CI on v2-dev — introduced during M15b.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 16:02:36 -04:00
shankar0123 762c523d59 feat: M15b — OCSP responder, DER CRL, short-lived exemption, revocation GUI
Backend:
- Embedded OCSP responder: GET /api/v1/ocsp/{issuer_id}/{serial} returns
  signed OCSP responses (good/revoked/unknown) using CA key
- DER-encoded X.509 CRL: GET /api/v1/crl/{issuer_id} returns proper DER CRL
  signed by issuing CA with 24h validity window
- Short-lived cert exemption: certs with profile TTL < 1 hour skip CRL/OCSP
  (expiry is sufficient revocation for ephemeral workloads)
- Extended issuer connector interface with GenerateCRL and SignOCSPResponse
- Local CA implements full CRL/OCSP signing; ACME and step-ca return
  appropriate "use native endpoint" errors
- IssuerConnectorAdapter bridges new methods between layers

Frontend:
- Revoke button on certificate detail page with RFC 5280 reason modal
- Revocation banner with reason display and timestamp
- Revocation status indicators in lifecycle section
- "Revoked" filter option in certificates list
- API client: revokeCertificate() function and Certificate type extensions

Tests: ~31 new tests across connector, service, handler, and adapter layers
Docs: milestones renumbered (M13-M14, M16-M18), M15b marked complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 14:39:10 -04:00
shankar0123 5d98e373e3 feat: M15a — certificate revocation API, CRL endpoint, and revocation notifications
Implements core revocation infrastructure: POST /api/v1/certificates/{id}/revoke
with all 8 RFC 5280 reason codes, JSON-formatted CRL at GET /api/v1/crl, webhook
and email revocation notifications, best-effort issuer notification, and immutable
revocation audit trail. Includes 48 new tests across service, handler, integration,
and domain layers (600+ total). Fixes 3 pre-existing test bugs (team_test error
matching, agent_group delete status code, team handler per_page validation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 10:59:18 -04:00
shankar0123 9e6756d02f Implement M5: hardening, input validation, and Vite+React+TS dashboard
Backend hardening:
- Fix 6 nginx.go non-constant format string build errors
- Add validation.go with hostname, PEM, and enum validators
- Apply input validation to all POST/PUT handlers (certificates,
  agents, CSR, policies, teams, owners, targets, issuers)
- Fix unchecked JSON decode in TriggerDeployment handler

Frontend (Vite + React + TypeScript):
- Migrate from single-file SPA to proper build pipeline
- 7 pages: Dashboard, Certificates (list+detail), Agents, Jobs,
  Notifications, Policies, Audit Trail
- TanStack Query for server state with auto-refetch intervals
- Certificate detail with version history and renewal trigger
- Job cancellation, status/type filtering, expiry countdowns
- Reusable components: DataTable, StatusBadge, ErrorState, PageHeader
- Dark theme with Tailwind CSS, sidebar nav via React Router

Server integration:
- Go server serves web/dist/ (Vite output) with SPA fallback
- Falls back to web/index.html for legacy mode
- .gitignore updated for web/node_modules/ and web/dist/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 01:19:19 -04:00
shankar0123 5553568495 Implement M4: comprehensive test coverage with 120 tests
Service layer (63 tests): certificate, agent, audit, job, notification,
policy, and renewal services with mock repositories covering threshold
alerting, deduplication, status transitions, and job processing.

Handler layer (46 tests): certificate and agent HTTP handlers using
httptest with mock service interfaces, covering success/error paths,
pagination, JSON marshaling, and path parameter extraction.

Integration (11 subtests): end-to-end certificate lifecycle test
exercising real services and Local CA issuer through HTTP API —
create cert, trigger renewal, process jobs, register agent, heartbeat,
verify audit trail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 00:25:01 -04:00