Commit Graph

27 Commits

Author SHA1 Message Date
shankar0123 2519da85f0 docs: README + concepts + features reflect CRL/OCSP responder bundle
Audit pass against cowork/crl-ocsp-responder-prompt.md found three
operator-facing docs still describing the pre-bundle CRL/OCSP surface
(GET-only OCSP, CA-key-direct signing, no scheduler-driven cache). Each
claim updated below was ground-truthed against repo HEAD before edit.

README.md
  * Standards & Revocation table — CRL row now mentions
    scheduler-pre-generated cache (CERTCTL_CRL_GENERATION_INTERVAL,
    crl_cache table); OCSP row mentions GET + POST forms, dedicated
    responder cert per RFC 6960 §2.6, id-pkix-ocsp-nocheck per
    §4.2.2.2.1, 7d auto-rotation grace.
  * Revocation paragraph — corrected the 'Embedded OCSP responder'
    one-liner to call out the dedicated-responder-cert design (the CA
    private key is never used directly for OCSP signing, which is the
    load-bearing security property for the future PKCS#11/HSM driver
    path) and added the link to the relying-party guide.

docs/concepts.md
  * CRL paragraph — added the scheduler pre-generation + singleflight
    coalescing detail. Kept the existing 24h validity claim (verified
    against internal/connector/issuer/local/local.go:956 — 'NextUpdate:
    now.Add(24 * time.Hour)').
  * OCSP paragraph — corrected the description so it covers both GET
    and POST forms (POST per RFC 6960 §A.1.1 is what production
    clients use: Firefox, OpenSSL s_client -status, cert-manager,
    Intune); added the dedicated-responder-cert + nocheck-extension +
    auto-rotation explanation; cross-link to docs/crl-ocsp.md.

docs/features.md
  * Revocation Infrastructure section — CRL Endpoint, OCSP Responder,
    new Admin Cache Observability subsection, new GUI Revocation
    Endpoints Panel subsection. Corrected the previously-wrong 'Signs
    with the issuing CA key' OCSP claim — the bundle's load-bearing
    security improvement is exactly that the CA key is NOT used
    directly. Cross-link to crl-ocsp.md.
  * Local CA env vars table — added all four new
    CERTCTL_CRL_GENERATION_INTERVAL / CERTCTL_OCSP_RESPONDER_KEY_DIR
    (with the prod 'MUST set' callout) / _ROTATION_GRACE / _VALIDITY
    rows. Closes the G-3 'env var defined in Go but never documented'
    drift that broke CI on commit fc3c7ad.
  * Migrations table — added 000019_crl_cache and 000020_ocsp_responder
    rows so the table reflects the bundle's persisted surface area;
    also clarified the table is illustrative + pointed readers at
    'ls migrations/*.up.sql' for the full sequence (the table had
    drifted behind reality at 000010 even before this bundle).

docs/architecture.md was already updated in commit b4334ed with the
same content scope, so no further architecture edits.

Verification:
  * Local G-3 set difference: empty (Go-defined ∖ docs-mentioned for
    CRL/OCSP env vars).
  * 24h CRL validity claim verified against local.go:956 NextUpdate.
  * Migration numbers verified against 'ls migrations/000019* 000020*'.
  * id-pkix-ocsp-nocheck OID verified against
    internal/connector/issuer/local/ocsp_responder.go:60.
2026-04-29 03:20:44 +00:00
shankar0123 0725713e19 Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.

Schema — migration 000015:
  migrations/000015_agent_retire.up.sql flips
  deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
  RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
  boundary instead of quietly destroying targets. Both `agents` and
  `deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
  TEXT pair (TEXT not VARCHAR so operator comments are never
  truncated), indexed via partial indexes WHERE retired_at IS NOT
  NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
  CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
  EXISTS) so repeated runs against partially-migrated databases
  converge. migrations/000015_agent_retire.down.sql restores CASCADE
  and drops the new columns for clean rollback. A dedicated
  repository-layer testcontainers test
  (internal/repository/postgres/migration_000015_test.go) asserts the
  before/after FK action, column presence, index presence, and
  round-trip idempotency under up→down→up.

Domain — sentinel guard + dependency counts:
  internal/domain/connector.go gains IsRetired() on Agent, the
  exported SentinelAgentIDs slice listing server-scanner,
  cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
  four reserved IDs documented in CLAUDE.md and created at startup in
  cmd/server/main.go), IsSentinelAgent(id string) predicate,
  AgentDependencyCounts{ActiveTargets, ActiveCertificates,
  PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
  ActorTypeSystem enum values used by audit emission downstream.
  Coverage locked down by internal/domain/connector_test.go.

Service — 8-step ordered contract:
  internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
  opts{Force, Reason}) enforces a fixed execution order:
  (1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
      unconditionally; force=true does NOT bypass it.
  (2) fetch — ErrAgentNotFound on miss.
  (3) idempotency — if IsRetired() already, return
      AgentRetirementResult{AlreadyRetired: true} with no new audit
      event and no state change (safe to replay from flaky clients).
  (4) preflight counts — collectAgentDependencyCounts runs
      ActiveTargets, ActiveCertificates, PendingJobs sequentially
      (not in parallel; keeps the per-query timeout predictable and
      matches the repo's existing call-chain shape).
  (5) force-reason guard — opts.Force=true with empty Reason returns
      ErrForceReasonRequired (wired into the 400 status surface).
  (6) dependency guard — HasDependencies() with opts.Force=false
      returns BlockedByDependenciesError{Counts} (wired into the 409
      body with per-bucket counts).
  (7) mutation — single pinned retiredAt := time.Now(); agent
      retirement first, then cascade target retirement if opts.Force,
      all under the repo's single transaction so the two retired_at
      stamps match to the second.
  (8) best-effort audit — agent_retired always; agent_retirement_
      cascaded additionally on the force path. Actor is whatever the
      handler resolves from the request; actor type is mapped by
      resolveActorType (system/agent-prefix→Agent/else→User). Audit
      emission failures are logged via slog.Error but do not abort
      the retirement (matches the house convention used by every
      other scheduler-emitted event).

  BlockedByDependenciesError implements Error() as
  "active_targets=%d, active_certificates=%d, pending_jobs=%d" and
  Unwrap() → ErrBlockedByDependencies. The single struct satisfies
  errors.Is via Unwrap (used by scheduler-level tests) and errors.As
  via the concrete type (used by the handler to fish out Counts for
  the 409 body). ListRetiredAgents(page, perPage) adds a separate
  paginated accessor with page<1→1 and perPage<1→50 normalization so
  retired rows are queryable without polluting the default agent
  listing.

  Sentinel guard coverage is asymmetric by design: all four reserved
  IDs are protected, and force=true cannot override. Regression tests
  in internal/service/agent_retire_test.go assert each of the eight
  steps in order, plus sentinel bypass attempts and idempotency
  replay.

Handler + router — status-code surface:
  internal/api/handler/agents.go:RetireAgent exposes seven status
  codes on DELETE /agents/{id}:
    200 on a fresh retirement (body echoes AgentRetirementResult).
    204 on idempotent replay (AlreadyRetired=true; no new audit).
    400 on ErrForceReasonRequired.
    403 on ErrAgentIsSentinel.
    404 on ErrAgentNotFound.
    409 on BlockedByDependenciesError, with a custom body shape
        {error, counts{active_targets, active_certificates,
        pending_jobs}} that bypasses the default ErrorWithRequestID
        envelope so callers get the per-bucket numbers directly.
    500 on any other error.
  Heartbeat HandleHeartbeat returns 410 Gone when the agent is
  retired (ErrAgentRetired), signalling the agent to shut down.
  Query params `force=true` and `reason=<text>` drive the cascade
  path; both are forwarded as url.Values through the new MCP
  transport.

  internal/api/router/router.go registers GET /api/v1/agents/retired
  literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
  literal-beats-pattern-var precedence routes "retired" to the
  paginated retired-agents listing instead of fetching a hypothetical
  agent named "retired".

Agent binary — clean shutdown on 410:
  cmd/agent/main.go gains the ErrAgentRetired sentinel, a
  retiredOnce sync.Once, and a retiredSignal chan struct{}. A
  markRetired(source, statusCode, body) helper closes the channel
  exactly once; the Run() select loop observes the close and returns
  ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
  and exits cleanly instead of spinning in the heartbeat retry loop.
  The 410 Gone surface is therefore terminal for the agent process.

MCP transport:
  internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
  a new additive transport method. Client.Delete is path-only; without
  this method the retire tool would silently drop `force` and `reason`,
  turning every cascade retire into a default soft-retire. The new
  method shares do()'s 204 normalization and 4xx/5xx error
  propagation so tool authors get one contract.
  internal/mcp/tools.go + internal/mcp/types.go expose the
  retire_agent tool with Force+Reason inputs wired through
  DeleteWithQuery.

CLI:
  cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
  `agents list --retired` (client-side strip of --retired then
  delegation to ListRetiredAgents, sharing --page/--per-page parsing
  with the default listing) and `agents retire <id> [--force --reason
  "…"]` (mirrors ErrForceReasonRequired — force without reason is
  rejected client-side before the request is sent). JSON + table
  output modes both honor the new columns.

Frontend:
  web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
  web/src/api/client.ts + web/src/api/types.ts expose the retire
  endpoint and the retired-listing. 4 new Vitest regression cases.

OpenAPI:
  api/openapi.yaml documents DELETE /agents/{id} with all seven
  status codes, 410 on heartbeat, and the 409 per-bucket body shape.

Regression coverage (six new test files, all green):
  internal/service/agent_retire_test.go           — 8-step contract + sentinel guards
  internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
  internal/mcp/retire_agent_test.go               — DeleteWithQuery wire-through
  internal/cli/agent_retire_test.go               — --retired listing + --force/--reason pairing
  internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
  internal/domain/connector_test.go               — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies

Files:
  api/openapi.yaml                                — DELETE + 410 + 409 body shape
  cmd/agent/main.go                               — ErrAgentRetired, markRetired, retiredSignal
  cmd/cli/main.go                                 — handleAgents list/get/retire dispatch
  docs/architecture.md, docs/concepts.md,
    docs/testing-guide.md                         — retirement contract narrative
  internal/api/handler/agents.go                  — RetireAgent, status surface, 410 on heartbeat
  internal/api/handler/agent_handler_test.go      — extended coverage
  internal/api/handler/agent_retire_handler_test.go — new
  internal/api/router/router.go                   — /agents/retired before /agents/{id}
  internal/cli/agent_retire_test.go               — new
  internal/cli/client.go                          — ListRetiredAgents + RetireAgent
  internal/domain/connector.go                    — IsRetired, SentinelAgentIDs,
                                                    IsSentinelAgent, AgentDependencyCounts,
                                                    ActorTypeAgent/System
  internal/domain/connector_test.go               — new
  internal/integration/lifecycle_test.go          — retirement fixture
  internal/mcp/client.go                          — DeleteWithQuery additive transport
  internal/mcp/retire_agent_test.go               — new
  internal/mcp/tools.go, internal/mcp/types.go    — retire_agent tool + Force/Reason inputs
  internal/repository/interfaces.go               — AgentRepository retirement methods
  internal/repository/postgres/agent.go           — retire + cascade target retire + counts
  internal/repository/postgres/migration_000015_test.go — new
  internal/service/agent.go                       — wire into AgentService surface
  internal/service/agent_retire.go                — new 8-step contract
  internal/service/agent_retire_test.go           — new
  internal/service/deployment.go                  — skip retired agents
  internal/service/target.go                      — skip retired agents
  internal/service/testutil_test.go               — shared mocks extended
  migrations/000015_agent_retire.up.sql           — new
  migrations/000015_agent_retire.down.sql         — new
  web/src/api/client.ts, types.ts + tests         — retire endpoint wiring
  web/src/pages/AgentsPage.tsx                    — retire UI
2026-04-19 05:24:00 +00:00
shankar0123 3287e174dc Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
a53a4b8. The work lands as a single commit spanning server, docs, tests,
and the React client.

M-002 — Named API keys with per-key actor propagation
  * Migration 000014 adds the 'api_keys' table (id, name, hash,
    principal, role, created_at, last_used_at, disabled_at) so every
    credential carries an identifiable principal instead of the
    opaque 'anonymous'/'api-key' sentinel.
  * Auth middleware now rotates through configured keys, performs
    constant-time hash comparison, stamps 'last_used_at', and emits
    an actor struct via contextWithActor(). The audit middleware,
    bulk-revocation handler, approval handlers, and MCP tool layer
    now read the principal off the context and persist it on every
    audit_events row.
  * Regression coverage:
      - internal/api/middleware/audit_test.go — actor propagation,
        principal redaction for disabled keys, anonymous fallback for
        unauthenticated endpoints.
      - internal/api/handler/bulk_revocation_handler_test.go,
        job_handler_test.go — principal-on-audit assertions.

M-003 — Authorization gates (Phase B)
  * Approval handler rejects self-approval / self-rejection with 403
    when the actor principal equals the job's requested_by field.
  * Bulk revocation is gated behind the 'admin' role; operators and
    viewers receive 403.
  * Regression coverage:
      - internal/service/job_test.go — TestApproveJob_NotSelf,
        TestRejectJob_NotSelf.
      - internal/api/handler/bulk_revocation_handler_test.go —
        TestBulkRevoke_RequiresAdmin, TestBulkRevoke_AdminSucceeds.

M-006 — RFC-compliant CRL/OCSP on the unauthenticated .well-known mux
  * Per RFC 8615, relying parties cannot reasonably be asked to
    authenticate against the issuing certctl instance to retrieve
    revocation material. CRL and OCSP move off the authenticated
    '/api/v1/crl*' and '/api/v1/ocsp/*' paths onto:
        GET /.well-known/pki/crl/{issuer_id}
            Content-Type: application/pkix-crl   (RFC 5280 §5)
        GET /.well-known/pki/ocsp/{issuer_id}/{serial}
            Content-Type: application/ocsp-response  (RFC 6960)
  * Non-standard JSON CRL shape is removed; only DER is served.
  * Short-lived certificate exemption (profile TTL < 1h → skip
    CRL/OCSP) is preserved; the response simply omits the serial.
  * Routes are registered on the unauthenticated 'finalHandler' mux
    in cmd/server/main.go alongside EST ('/.well-known/est/*') and
    SCEP ('/scep'). Legacy authenticated paths return 404.
  * Regression coverage:
      - internal/api/handler/certificate_handler_test.go — content
        type, DER parseability, 404 for unknown issuer.
      - internal/api/handler/adversarial_path_test.go — unauthenticated
        access asserted for CRL, OCSP, EST, SCEP.
      - internal/api/router/router_test.go — route-table assertion
        that '.well-known/pki/*', '.well-known/est/*', and '/scep' are
        mounted on the unauthenticated branch.

M-001 — Auto-closed by M-002
  EST and SCEP were already registered on the unauthenticated
  'finalHandler' mux; the router comment at
  internal/api/router/router.go:247 now matches reality. The
  adversarial-path tests above lock the behavior in.

Verification (all gates green):
  * go vet ./...                                           — clean
  * go build ./...                                         — ok
  * go test -short ./... (55+ packages)                    — all pass
  * web/ : npm test (225 Vitest tests)                     — all pass
  * web/ : npx tsc --noEmit                                — clean
  * grep sweep for '/api/v1/(crl|ocsp)' — 13 surviving hits,
    all intentional M-006 tombstone/relocation comments.

Documentation:
  * coverage-gap-audit.md — status flips M-001/M-002/M-003/M-006 →
    Fixed, with per-finding resolution paragraphs citing regression
    test IDs. (Audit file lives outside this repo; see cowork root.)
  * CLAUDE.md Project Status line updated with the auth-unification
    closure note.
  * docs/features.md, docs/architecture.md, docs/quickstart.md,
    docs/concepts.md, docs/connectors.md, docs/test-env.md,
    docs/testing-guide.md, docs/compliance-*.md, docs/demo-advanced.md
    — refreshed for the new '.well-known/pki/*' namespace and named
    API keys.
  * api/openapi.yaml — documents the new unauthenticated endpoints
    and removes the legacy '/api/v1/crl*' + '/api/v1/ocsp/*' paths.

.gitignore: adds '/.gocache/' and '/.gomodcache/' for the session-
scoped Go caches so they never enter the tree.
2026-04-18 18:17:41 +00:00
shankar0123 13cd4d98ba feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id,
owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure
tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke),
server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI
spec, compliance mapping updates, and 21 new tests (12 service, 7 handler,
1 CLI, 1 frontend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 00:06:34 -04:00
shankar0123 cc03f55006 docs: comprehensive documentation audit — fix stale counts, V2/V3 matrix, connector status
- features.md: Fix Feature Matrix to correctly show all V2 Free features
  (F5/IIS/WinCertStore/JavaKeystore as Implemented, not Stub; Vault/DigiCert/
  Sectigo/GoogleCAS as V2 Free, not V3 Paid). Add missing shipped features
  (EST, verification, export, S/MIME, ARI, digest, Helm, onboarding). Update
  issuer count to 9, target count to 13.
- architecture.md: Fix F5/IIS from "interface only, implementation planned"
  to implemented. Add all 13 target connectors to built-in targets list.
- why-certctl.md: Add Sectigo and Google CAS to issuer list (7→9). Fix
  target count (10→13). Remove hardcoded endpoint/operation counts.
- connectors.md: Fix F5 BIG-IP TOC entry from "Interface Only" to
  "Implemented". Remove dead "Planned Issuers" TOC link.
- README.md: Remove competitor product names (CertKit, KeyTalk). Remove
  hardcoded dashboard page count. Remove hardcoded endpoint counts. Fix V4
  roadmap to remove already-shipped issuers (Sectigo, Google CAS).
- Remove hardcoded MCP tool counts (78/80) across 8 files (mcp.md,
  architecture.md, features.md, testing-guide.md, concepts.md, quickstart.md,
  demo-advanced.md, why-certctl.md). Replace with "REST API exposed via MCP"
  to avoid future drift.
- quickstart.md: Docker Compose environments table (from previous session).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 21:33:12 -04:00
shankar0123 f92c997a50 feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone:

1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with
   `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing
   acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support.
   ES256 JWS signing with kid mode, nonce management, directory discovery.
   Empty profile delegates to standard library path (zero behavior change).
   Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on
   ACME issuer config.

2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source,
   docs, README, and examples. Zero remaining occurrences of RFC 9702.

3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating
   renewal thresholds against SC-081v3 validity reduction timeline (200→100→47
   days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the
   expected renewal path for 6-day shortlived certs.

New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 13:52:13 -04:00
shankar0123 4c3b7cbb16 docs: fix stale references, seed data case bugs, and convert ASCII diagrams to Mermaid
Audit all docs and examples against current codebase state. Fix seed_demo.sql
domain constant casing (IssuerType, TargetType, AgentStatus) that would cause
agent dispatch failures. Fix example docker-compose health endpoints (/health
not /api/v1/health) and env var names (CERTCTL_DATABASE_URL). Update connector
counts, test numbers, and planned→implemented status across docs. Convert 3
ASCII flow diagrams to Mermaid.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 16:11:42 -04:00
shankar0123 ec21c9bb29 feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing
with cert ID computation, directory endpoint discovery, graceful
degradation for non-ARI CAs. 19 tests.

M29: Email notifier wiring + scheduled certificate digest — SMTP
connector bridged to service layer via NotifierAdapter, DigestService
with HTML email template, 7th scheduler loop (24h), digest preview/send
API endpoints and GUI card. 21 tests.

M30: Production-ready Helm chart — server Deployment, PostgreSQL
StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security
contexts, health probes, example values for dev/prod/ACME scenarios.

Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job,
documentation updates across 5 doc files and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 21:18:35 -04:00
shankar0123 b9633e5b1a docs: add GUI references to discovery and network scan documentation
Update concepts.md and connectors.md to mention the Discovery and
Network Scans dashboard pages alongside existing API documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 16:19:14 -04:00
shankar0123 87355c3efb docs: add table of contents to all major documentation files
Navigation menus for testing guide, architecture, concepts,
connectors, quickstart, advanced demo, and three compliance docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 23:38:28 -04:00
shankar0123 e19c240a79 feat: add ACME DNS-PERSIST-01 challenge support (IETF draft-ietf-acme-dns-persist)
Standing TXT record at _validation-persist.<domain> eliminates per-renewal
DNS updates. Auto-fallback to dns-01 if CA doesn't offer dns-persist-01.
ScriptDNSSolver extended with PresentPersist method. Configurable via
CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01 and
CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN env vars.

Also fixes IsExpired edge-case test in discovery_test.go that always failed
due to time.Now() drift between test setup and method invocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:23:46 -04:00
shankar0123 7d14635a72 feat: add EST server (RFC 7030) for device certificate enrollment (M23)
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).

Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
  quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
  issuer interface, resource lists, OpenSSL status)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:31:06 -04:00
shankar0123 72cda5877a docs: fix 16 discrepancies found by cross-validating all docs against source code
CLI syntax corrected across 5 files (concepts, demo-guide, demo-advanced,
architecture, features): list-certs→certs list, get-cert→certs get, etc.
Removed non-existent health/metrics commands, replaced with status.
Subcommand count 10→12 everywhere.

architecture.md: Go 1.22→1.25, endpoint count 91→93, ER diagram expanded
from 15 to 21 tables (added renewal_policies, certificate_revocations,
discovered_certificates, discovery_scans, network_scan_targets).

connectors.md: added GenerateCRL and SignOCSPResponse to issuer interface,
added Email and Webhook rows to notifier config table.

compliance docs: fixed keygen warning messages to match actual log output,
CERTCTL_STEPCA_PROVISIONER_KEY→CERTCTL_STEPCA_KEY_PATH, openssl genrsa→
crypto/ecdsa.GenerateKey, CERTCTL_SERVER_ADDR→CERTCTL_SERVER_HOST+PORT.

README.md: v2.0.0 version bump, solo developer mention, feature list,
table of contents, documentation table moved to top, 7 fact-check fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 03:51:33 -04:00
shankar0123 a4622d5e9a fix: correct stale counts across all docs (tables 19→21, MCP tools 76→78, tests 860→900+)
V2 audit found 3 critical number mismatches propagated across 8 files:
- Table count was 19 everywhere but actual migrations create 21 tables
- MCP tool count was 76 but tools.go registers 78 (M21/M22 additions)
- README MCP breakdown claimed 83 tools with math summing to 90
- architecture.md still had stale 860+ test count
- features.md OpenAPI claim said 93 ops but spec has 78
- mcp.md tool-per-domain table had wrong counts in 10 of 16 rows
- Added 3 network_scan_targets to seed_demo.sql for demo completeness
- Added curl examples to Agent Groups section in features.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 00:36:47 -04:00
shankar0123 4f90be9311 feat: add network certificate discovery (M21) and Prometheus metrics (M22)
M21 adds server-side active TLS scanning of CIDR ranges with concurrent
probing, sentinel agent pattern for pipeline reuse, and full CRUD API for
scan targets. M22 adds Prometheus exposition format endpoint alongside
existing JSON metrics. Comprehensive documentation audit updates all docs
to reflect 91 endpoints, 19 tables, 6 scheduler loops, and 900+ tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 23:37:47 -04:00
shankar0123 667a30870d feat: M18b Filesystem Certificate Discovery — agent scanning, server dedup, triage API
Agent-side:
- Filesystem scanner walks configured directories (CERTCTL_DISCOVERY_DIRS)
- Parses PEM (.pem, .crt, .cer, .cert) and DER (.der) certificate files
- Extracts CN, SANs, serial, issuer/subject DN, validity, key info, SHA-256 fingerprint
- Reports discoveries to control plane on startup + every 6 hours
- Skips files >1MB and private key files

Server-side:
- Migration 000006: discovered_certificates + discovery_scans tables
- Domain model: DiscoveredCertificate, DiscoveryScan, DiscoveryReport
- Three triage states: Unmanaged, Managed (claimed), Dismissed
- Repository with upsert dedup (fingerprint + agent + path)
- Service layer: process reports, claim, dismiss, list, summary
- 7 new API endpoints (84 total):
  POST /agents/{id}/discoveries, GET /discovered-certificates,
  GET /discovered-certificates/{id}, POST .../claim, POST .../dismiss,
  GET /discovery-scans, GET /discovery-summary
- Audit trail: scan_completed, cert_claimed, cert_dismissed events

Tests: 28 new test functions (domain, handler, service layers)
Docs: README, quickstart, demo-guide, demo-advanced, architecture,
      concepts, connectors, features.md all updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 00:25:00 -04:00
shankar0123 1e56e35dcc docs: add MCP server guide and OpenAPI spec guide
New docs/mcp.md covers MCP server setup with Claude Desktop, Cursor,
and Claude Code, lists all 76 tools across 16 domains, includes example
conversations, and documents security considerations.

New docs/openapi.md covers Swagger UI setup, SDK generation for
TypeScript/Python/Go/Java, Postman import, spec validation, and
contract testing with schemathesis.

Updated cross-references in concepts.md and architecture.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:18:50 -04:00
shankar0123 95165fe972 docs: comprehensive V2 documentation update across all guides
Add missing V2 concepts (Certificate Profiles, Revocation with CRL/OCSP,
Short-Lived Certificates, CLI, MCP Server, Observability) to concepts guide.
Update quickstart with revocation examples, sorting/filtering, cursor pagination,
sparse fields, stats/metrics, and approval workflows. Align 5-minute demo guide
and advanced demo to full V2 feature set including revocation workflows, bulk ops,
fleet overview, and dashboard charts. Update architecture with MCP server section,
5th scheduler loop, API audit log, and 860+ test count. Add revocation-across-issuers
section to connectors guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 19:10:57 -04:00
shankar0123 aa183efdca docs: cross-validate all documentation against codebase, fix 21 inaccuracies
Fact-checked every doc file against actual source code. Key corrections:
- Table count 14→17 (added profiles, agent_groups, agent_group_members)
- Endpoint count 55→68 (counted from router.go)
- Test count 250+→330+ (99 service + 165 handler + 53 frontend + connectors)
- Dashboard views 14→16 pages (counted from web/src/pages/)
- step-ca marked implemented (was "Planned V2") across all docs
- ACME DNS-01 marked implemented (was "planned") in concepts.md
- Removed ADCS as separate planned connector (handled via sub-CA mode)
- Fixed pointer types in connectors.md interface docs (*string, *time.Time)
- Added 3 missing tables to architecture.md ER diagram
- Added 5 missing env vars to README config table
- Updated M11/M12 to  in README roadmap
- Issuer count in quickstart demo data 3→4 (added step-ca)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 23:12:23 -04:00
shankar0123 d7a4d40d47 docs: fix SC-081v3 voting claim — not unanimous, zero opposition with 5 abstentions
The ballot passed 25-0-5 among CAs and 4-0-0 among browsers.
Not unanimous due to 5 CA abstentions (Entrust, IdenTrust,
Japan Registry Services, SECOM, TWCA).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 16:40:51 -04:00
shankar0123 9a12ee18b2 docs: codify proxy agent pattern, sub-CA capability, IIS dual-mode design
Three architectural decisions from user feedback:

1. Pull-only deployment model — server never initiates outbound
   connections. Network appliances (F5, Palo Alto, FortiGate, Citrix)
   use a proxy agent in the same network zone. Added as design principle
   #2 across all docs.

2. IIS dual-mode — agent-local PowerShell (primary/recommended) + proxy
   agent WinRM (for agentless targets). Replaces the previous WinRM-only
   design. Updated connectors.md, architecture.md, demo-advanced.md.

3. Sub-CA to ADCS — Local CA can load a pre-signed CA cert+key from
   disk, so all issued certs chain to the enterprise root. Replaces the
   planned standalone ADCS issuer connector. Updated concepts.md,
   connectors.md, demo-advanced.md issuer diagram.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 21:45:18 -04:00
shankar0123 b0549e6f05 feat: M11b — ownership tracking, agent groups, interactive renewal approval
Ownership: owners/teams GUI pages, notification email resolution via
resolveRecipient (owner_id → owner.email lookup). Agent groups: dynamic
device grouping by OS/arch/IP CIDR/version with manual include/exclude
membership, migration 000004, full CRUD stack (domain → repo → service →
handler → frontend). Interactive approval: AwaitingApproval job state,
approve/reject API endpoints with reason tracking. Tests: 12 agent group
handler tests, 8 approve/reject job handler tests, integration tests
updated for 13-param RegisterHandlers. Docs updated across architecture,
concepts, and seed data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 21:02:35 -04:00
shankar0123 7450fcfb07 docs: add 47-day cert lifespan motivation, update roadmap, cross-validate all docs
README: lead with CA/Browser Forum Ballot SC-081v3 (47-day certs by 2029)
and certctl's end-to-end automation positioning. Update architecture
diagram and target lists to include Apache/HAProxy. Update roadmap
with new M15 (Revocation Infrastructure), renumbered M16-M18, and
V3.1 cert-manager/IAM Roles Anywhere additions.

concepts.md: rewrite "Why Do Certificates Expire?" with shrinking
lifespan timeline and automation imperative.

quickstart.md: add 47-day framing in intro.

architecture.md: add Apache/HAProxy to system diagram, target connector
diagram, deployment section, and ER diagram (agent metadata columns).
Update planned targets list for V3.1. Fix test count (230+).

connectors.md: fix notifier planned version reference (V2 not V2.1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 19:28:02 -04:00
shankar0123 07275bf92f feat: M10 — agent metadata collection, Apache httpd + HAProxy target connectors
Agents now report OS, architecture, IP address, hostname, and version
via heartbeat using runtime.GOOS, runtime.GOARCH, and net.Dial. New
migration adds columns to agents table. Heartbeat handler, service,
and repository updated to accept and persist metadata. GUI shows
OS/Arch in agent list and full system info in agent detail page.

Apache httpd connector: separate cert/chain/key files, apachectl
configtest validation, graceful reload. HAProxy connector: combined
PEM file (cert+chain+key), optional config validation, reload.
Both wired into agent binary's target connector switch.

14 tests for new connectors. All existing tests updated for new
Heartbeat/UpdateHeartbeat signatures. Docs updated across README,
architecture, concepts, and connectors guides.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 02:19:28 -04:00
shankar0123 e0fa20b369 docs: clarify ACME is HTTP-01 only, DNS-01 planned for V2
The concepts guide implied DNS-01 was supported. Made it explicit
that v1 uses HTTP-01 and DNS-01 (wildcards) is on the V2 roadmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 23:39:35 -04:00
shankar0123 8e17384983 Add technical explanations to advanced demo and convert all diagrams to Mermaid
- Add how/why technical breakdowns to every step in demo-advanced.md:
  handler→service→repository code paths, SQL details, security reasoning,
  field-by-field explanations, and architectural design decisions
- Convert all ASCII box diagrams to Mermaid across docs:
  architecture.md (9 diagrams), demo-advanced.md (6), concepts.md (1)
- Diagram types: flowcharts, sequence diagrams, ER diagram, state machine
- Remove placeholder Support & Community section from README
- Zero ASCII box-drawing characters remaining in docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 21:53:34 -04:00
shankar0123 9b4122b159 Fix runtime bugs, implement service layer, and overhaul documentation
Runtime fixes:
- Fix env var mismatch (CERTCTL_DB_URL → CERTCTL_DATABASE_URL)
- Fix table name mismatches (certificates → managed_certificates, notifications → notification_events)
- Add renewal_policy_id to certificate queries
- Remove non-existent created_at from notification queries
- Add env var fallback for agent CLI flags
- Graceful degradation for missing notifiers/issuers in demo mode
- Copy web/ directory in Dockerfile for dashboard serving

Service layer:
- Implement handler-service interface pattern across all services
- Wire up certificate, agent, job, policy, team, owner, audit, notification services

Documentation:
- Add concepts.md: beginner-friendly guide to TLS, CAs, private keys
- Rewrite quickstart.md with accurate API examples matching actual handlers
- Add demo-advanced.md: interactive demo with cert issuance and automated script
- Update architecture.md with correct table names and connector interfaces
- Update connectors.md to match actual Go interface signatures
- Update demo-guide.md with cross-references to new docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 21:38:11 -04:00