Commit Graph

43 Commits

Author SHA1 Message Date
shankar0123 0725713e19 Close I-004 (agent hard-delete cascades targets) coverage-gap finding
Operator decision answered as full soft-delete with optional forced
cascade — hard-delete is not reachable from any public surface. Prior
to this commit, DELETE /agents/{id} ran a plain `DELETE FROM agents`
whose schema-level `ON DELETE CASCADE` on deployment_targets.agent_id
silently wiped every target, orphaning certs and aborting in-flight
jobs. The finding closure reshapes the agent-removal contract around
soft retirement with explicit preflight counts, an opt-in cascade
gated by a mandatory reason, and unconditional protection for the
four reserved sentinel agents used by discovery sources.

Schema — migration 000015:
  migrations/000015_agent_retire.up.sql flips
  deployment_targets_agent_id_fkey from ON DELETE CASCADE to ON DELETE
  RESTRICT, so a stray `DELETE FROM agents` now errors at the DB
  boundary instead of quietly destroying targets. Both `agents` and
  `deployment_targets` grow a retired_at TIMESTAMPTZ + retired_reason
  TEXT pair (TEXT not VARCHAR so operator comments are never
  truncated), indexed via partial indexes WHERE retired_at IS NOT
  NULL. The migration is self-healing (ADD COLUMN IF NOT EXISTS, DROP
  CONSTRAINT IF EXISTS then ADD CONSTRAINT, CREATE INDEX IF NOT
  EXISTS) so repeated runs against partially-migrated databases
  converge. migrations/000015_agent_retire.down.sql restores CASCADE
  and drops the new columns for clean rollback. A dedicated
  repository-layer testcontainers test
  (internal/repository/postgres/migration_000015_test.go) asserts the
  before/after FK action, column presence, index presence, and
  round-trip idempotency under up→down→up.

Domain — sentinel guard + dependency counts:
  internal/domain/connector.go gains IsRetired() on Agent, the
  exported SentinelAgentIDs slice listing server-scanner,
  cloud-aws-sm, cloud-azure-kv, cloud-gcp-sm verbatim (matching the
  four reserved IDs documented in CLAUDE.md and created at startup in
  cmd/server/main.go), IsSentinelAgent(id string) predicate,
  AgentDependencyCounts{ActiveTargets, ActiveCertificates,
  PendingJobs} with a HasDependencies() method, and ActorTypeAgent /
  ActorTypeSystem enum values used by audit emission downstream.
  Coverage locked down by internal/domain/connector_test.go.

Service — 8-step ordered contract:
  internal/service/agent_retire.go:RetireAgent(ctx, id, actor,
  opts{Force, Reason}) enforces a fixed execution order:
  (1) sentinel guard — IsSentinelAgent(id) returns ErrAgentIsSentinel
      unconditionally; force=true does NOT bypass it.
  (2) fetch — ErrAgentNotFound on miss.
  (3) idempotency — if IsRetired() already, return
      AgentRetirementResult{AlreadyRetired: true} with no new audit
      event and no state change (safe to replay from flaky clients).
  (4) preflight counts — collectAgentDependencyCounts runs
      ActiveTargets, ActiveCertificates, PendingJobs sequentially
      (not in parallel; keeps the per-query timeout predictable and
      matches the repo's existing call-chain shape).
  (5) force-reason guard — opts.Force=true with empty Reason returns
      ErrForceReasonRequired (wired into the 400 status surface).
  (6) dependency guard — HasDependencies() with opts.Force=false
      returns BlockedByDependenciesError{Counts} (wired into the 409
      body with per-bucket counts).
  (7) mutation — single pinned retiredAt := time.Now(); agent
      retirement first, then cascade target retirement if opts.Force,
      all under the repo's single transaction so the two retired_at
      stamps match to the second.
  (8) best-effort audit — agent_retired always; agent_retirement_
      cascaded additionally on the force path. Actor is whatever the
      handler resolves from the request; actor type is mapped by
      resolveActorType (system/agent-prefix→Agent/else→User). Audit
      emission failures are logged via slog.Error but do not abort
      the retirement (matches the house convention used by every
      other scheduler-emitted event).

  BlockedByDependenciesError implements Error() as
  "active_targets=%d, active_certificates=%d, pending_jobs=%d" and
  Unwrap() → ErrBlockedByDependencies. The single struct satisfies
  errors.Is via Unwrap (used by scheduler-level tests) and errors.As
  via the concrete type (used by the handler to fish out Counts for
  the 409 body). ListRetiredAgents(page, perPage) adds a separate
  paginated accessor with page<1→1 and perPage<1→50 normalization so
  retired rows are queryable without polluting the default agent
  listing.

  Sentinel guard coverage is asymmetric by design: all four reserved
  IDs are protected, and force=true cannot override. Regression tests
  in internal/service/agent_retire_test.go assert each of the eight
  steps in order, plus sentinel bypass attempts and idempotency
  replay.

Handler + router — status-code surface:
  internal/api/handler/agents.go:RetireAgent exposes seven status
  codes on DELETE /agents/{id}:
    200 on a fresh retirement (body echoes AgentRetirementResult).
    204 on idempotent replay (AlreadyRetired=true; no new audit).
    400 on ErrForceReasonRequired.
    403 on ErrAgentIsSentinel.
    404 on ErrAgentNotFound.
    409 on BlockedByDependenciesError, with a custom body shape
        {error, counts{active_targets, active_certificates,
        pending_jobs}} that bypasses the default ErrorWithRequestID
        envelope so callers get the per-bucket numbers directly.
    500 on any other error.
  Heartbeat HandleHeartbeat returns 410 Gone when the agent is
  retired (ErrAgentRetired), signalling the agent to shut down.
  Query params `force=true` and `reason=<text>` drive the cascade
  path; both are forwarded as url.Values through the new MCP
  transport.

  internal/api/router/router.go registers GET /api/v1/agents/retired
  literal-path BEFORE /api/v1/agents/{id} — Go 1.22 ServeMux's
  literal-beats-pattern-var precedence routes "retired" to the
  paginated retired-agents listing instead of fetching a hypothetical
  agent named "retired".

Agent binary — clean shutdown on 410:
  cmd/agent/main.go gains the ErrAgentRetired sentinel, a
  retiredOnce sync.Once, and a retiredSignal chan struct{}. A
  markRetired(source, statusCode, body) helper closes the channel
  exactly once; the Run() select loop observes the close and returns
  ErrAgentRetired; main() matches via errors.Is(err, ErrAgentRetired)
  and exits cleanly instead of spinning in the heartbeat retry loop.
  The 410 Gone surface is therefore terminal for the agent process.

MCP transport:
  internal/mcp/client.go adds Client.DeleteWithQuery(path, query),
  a new additive transport method. Client.Delete is path-only; without
  this method the retire tool would silently drop `force` and `reason`,
  turning every cascade retire into a default soft-retire. The new
  method shares do()'s 204 normalization and 4xx/5xx error
  propagation so tool authors get one contract.
  internal/mcp/tools.go + internal/mcp/types.go expose the
  retire_agent tool with Force+Reason inputs wired through
  DeleteWithQuery.

CLI:
  cmd/cli/main.go + internal/cli/client.go add two CLI surfaces:
  `agents list --retired` (client-side strip of --retired then
  delegation to ListRetiredAgents, sharing --page/--per-page parsing
  with the default listing) and `agents retire <id> [--force --reason
  "…"]` (mirrors ErrForceReasonRequired — force without reason is
  rejected client-side before the request is sent). JSON + table
  output modes both honor the new columns.

Frontend:
  web/src/pages/AgentsPage.tsx surfaces retired/retire affordances.
  web/src/api/client.ts + web/src/api/types.ts expose the retire
  endpoint and the retired-listing. 4 new Vitest regression cases.

OpenAPI:
  api/openapi.yaml documents DELETE /agents/{id} with all seven
  status codes, 410 on heartbeat, and the 409 per-bucket body shape.

Regression coverage (six new test files, all green):
  internal/service/agent_retire_test.go           — 8-step contract + sentinel guards
  internal/api/handler/agent_retire_handler_test.go — 7-status-code surface + 410 heartbeat
  internal/mcp/retire_agent_test.go               — DeleteWithQuery wire-through
  internal/cli/agent_retire_test.go               — --retired listing + --force/--reason pairing
  internal/repository/postgres/migration_000015_test.go — FK flip + columns + indexes + up↔down
  internal/domain/connector_test.go               — IsRetired, IsSentinelAgent, SentinelAgentIDs, HasDependencies

Files:
  api/openapi.yaml                                — DELETE + 410 + 409 body shape
  cmd/agent/main.go                               — ErrAgentRetired, markRetired, retiredSignal
  cmd/cli/main.go                                 — handleAgents list/get/retire dispatch
  docs/architecture.md, docs/concepts.md,
    docs/testing-guide.md                         — retirement contract narrative
  internal/api/handler/agents.go                  — RetireAgent, status surface, 410 on heartbeat
  internal/api/handler/agent_handler_test.go      — extended coverage
  internal/api/handler/agent_retire_handler_test.go — new
  internal/api/router/router.go                   — /agents/retired before /agents/{id}
  internal/cli/agent_retire_test.go               — new
  internal/cli/client.go                          — ListRetiredAgents + RetireAgent
  internal/domain/connector.go                    — IsRetired, SentinelAgentIDs,
                                                    IsSentinelAgent, AgentDependencyCounts,
                                                    ActorTypeAgent/System
  internal/domain/connector_test.go               — new
  internal/integration/lifecycle_test.go          — retirement fixture
  internal/mcp/client.go                          — DeleteWithQuery additive transport
  internal/mcp/retire_agent_test.go               — new
  internal/mcp/tools.go, internal/mcp/types.go    — retire_agent tool + Force/Reason inputs
  internal/repository/interfaces.go               — AgentRepository retirement methods
  internal/repository/postgres/agent.go           — retire + cascade target retire + counts
  internal/repository/postgres/migration_000015_test.go — new
  internal/service/agent.go                       — wire into AgentService surface
  internal/service/agent_retire.go                — new 8-step contract
  internal/service/agent_retire_test.go           — new
  internal/service/deployment.go                  — skip retired agents
  internal/service/target.go                      — skip retired agents
  internal/service/testutil_test.go               — shared mocks extended
  migrations/000015_agent_retire.up.sql           — new
  migrations/000015_agent_retire.down.sql         — new
  web/src/api/client.ts, types.ts + tests         — retire endpoint wiring
  web/src/pages/AgentsPage.tsx                    — retire UI
2026-04-19 05:24:00 +00:00
shankar0123 b3cc7cbdb2 fix(policies): close the D-006 loop — TitleCase seed canonicals + severity-aware, config-consuming rule engine (D-008)
D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:

  (a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
      types ('ownership', 'environment', 'lifetime', 'renewal_window')
      that the handler validator rejects on Create/Update but that
      raw SQL INSERTs bypassed entirely. At runtime evaluateRule's
      switch fell through to the default "unknown policy rule type"
      error branch on every demo rule × every cert × every cycle,
      flooding logs while emitting zero violations.

  (b) migrations/seed_demo.sql persisted lowercase severity values
      ('critical', 'error', 'warning') on policy_violations rows.
      INSERT succeeded because that column had no CHECK, but any
      frontend comparing against the canonical PolicySeverity enum
      mis-categorized every seeded violation.

  (c) evaluateRule hardcoded Severity: PolicySeverityWarning on
      every emitted violation and ignored rule.Config entirely —
      so the D-006 per-rule severity column (000013) and every
      per-arm Config JSON ({allowed_issuer_ids, allowed_domains,
      required_keys, allowed, lead_time_days, max_days}) was dead
      data below the evaluation layer.

This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.

## Changes

Domain (internal/domain/policy.go):
  * Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
    Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
    arm — routing it through RenewalLeadTime would conflate "how
    close to expiry before we renew" with "how long can the cert
    possibly be", two distinct semantics. The new type accepts
    config {"max_days": int} and flags certs whose
    NotAfter - NotBefore exceeds the cap.

Handler validator (internal/api/handler/validation.go):
  * ValidatePolicyType allowlist grown to 6 canonicals
    (AllowedIssuers, AllowedDomains, RequiredMetadata,
    AllowedEnvironments, RenewalLeadTime, CertificateLifetime).

OpenAPI (api/openapi.yaml):
  * PolicyType enum grown to match domain.

Frontend (web/src/api/types.ts, types.test.ts):
  * POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
    all 6 canonicals and rejects casing drift.

Migration 000014 (policy_violations severity CHECK):
  * Named CHECK constraint (policy_violations_severity_check)
    mirroring 000013's allowlist, defense-in-depth at the DB layer
    against future drift from bypassed writes (migrations, psql
    sessions, future callers). Symmetric down migration drops by
    name.

Seed data:
  * migrations/seed.sql rewritten to emit TitleCase canonicals with
    per-arm config JSON that actually exercises the config-consuming
    paths (not the missing-field backstops):
      - pr-require-owner         → RequiredMetadata     {"required_keys":["owner"]}                        Warning
      - pr-allowed-environments  → AllowedEnvironments  {"allowed":["production","staging","development"]} Error
      - pr-max-certificate-lifetime → CertificateLifetime {"max_days":90}                                   Critical
      - pr-min-renewal-window    → RenewalLeadTime      {"lead_time_days":14}                              Warning
    Severities are now differentiated per rule (D-006 intent).
  * migrations/seed_demo.sql violation rows flipped to TitleCase
    severity ('Critical', 'Error', 'Warning') so migration 000014
    applies cleanly on upgrade paths.

Engine rewrite (internal/service/policy.go):
  * evaluateRule rewritten. All six arms now:
      1. Parse rule.Config into the per-arm typed struct.
      2. Bad JSON → log at ValidateCertificate boundary and skip
         this rule (no co-located poisoning of other rules in the
         same batch).
      3. Empty/null Config → emit the pre-D-008 missing-field
         violation (backwards compat invariant — operators who
         haven't reconfigured still see the same output).
      4. Violations emitted carry rule.Severity (no more hardcoded
         Warning); D-006 column is now load-bearing.
  * CertificateLifetime arm reads NotBefore/NotAfter from the
    certificate's latest version via CertRepo. Injected via
    PolicyService.SetCertRepo() setter — avoids churning ~36
    NewPolicyService call sites while keeping the lifetime arm
    optional (degrades to a log+skip if the setter is not wired).

Server wiring (cmd/server/main.go):
  * policyService.SetCertRepo(certRepo) wired after construction.

Tests (internal/service/policy_test.go):
  * 25 new subtests across 5 groups:
      - TestEvaluateRule_SeverityPassThrough (6): every rule type
        emits violations carrying rule.Severity, not hardcoded.
      - TestEvaluateRule_ConfigConsumed (12): every per-arm Config
        path exercised positive + negative.
      - TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
        Config still emits pre-D-008 missing-field violations.
      - TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
        and skips cleanly without poisoning neighbors.
      - TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
        ok when repo wired, log+skip when not, handles missing
        NotBefore/NotAfter edges.

Provenance: D-008 surfaced during D-005/D-006 remediation review
in eef1db0. That commit added persistence and CI pins for the
severity field but did not re-verify the evaluation layer
consumed it; this finding and fix close the audit-process gap.
2026-04-18 14:55:56 +00:00
shankar0123 eef1db0f0a fix(policies): stop 400ing the "+ New Policy" button + add per-rule severity (D-005, D-006)
Coverage Gap Audit findings D-005 (P0) + D-006 (P1) fixed together in a
single commit because they share the same root cause — policy CRUD sending
values the backend silently rejects — and splitting them would leave a
half-working UI between commits.

## D-005 (P0): PoliciesPage dropdown 400s every Create Policy

Root cause
----------
`web/src/pages/PoliciesPage.tsx` populated the Type `<select>` from a
hardcoded `['key_algorithm', 'ownership', 'allowed_issuers', ...]` array.
The backend's `internal/api/handler/validators.go::ValidatePolicyType`
enforces the TitleCase allowlist `AllowedIssuers`, `AllowedDomains`,
`RequiredMetadata`, `AllowedEnvironments`, `RenewalLeadTime` — defined in
`internal/domain/policy.go`. Every Create Policy request was rejected with
`400 invalid policy type`. The error surfaced only as a transient toast;
the modal closed anyway. Silent user-visible failure.

Fix
---
- `web/src/api/types.ts`: added `POLICY_TYPES` and `POLICY_SEVERITIES`
  tuples with `as const` and narrowed `PolicyRule.type`, `.severity`, and
  `PolicyViolation.severity` to the literal-union types. Dropdown is now
  sourced from the tuple; casing drift becomes a compile error.
- `web/src/pages/PoliciesPage.tsx`: rekeyed `severityStyles` /
  `severityDots` to the TitleCase values, added `humanize()` for display
  (AllowedIssuers → "Allowed Issuers"), removed the `badge-neutral`
  fallback that was papering over the mismatch.
- `web/src/api/types.test.ts` (new): pins both tuples exactly. If anyone
  edits one side of the frontend/backend contract without the other, CI
  fails with a clear assertion. Pure-TS vitest, no RTL dependency.

## D-006 (P1): `severity` field silently dropped on create/update

Root cause
----------
`PolicyRule` had no `Severity` field in `internal/domain/policy.go`. The
frontend has always sent `severity` on create/update, but Go's
`json.Decoder` (default settings, no `DisallowUnknownFields`) silently
dropped it. The value never reached PostgreSQL. Every rule rendered with
the same severity because there was no severity — just a display
computation downstream.

Fix: option (b), full-stack schema add (not delete-the-field)
-------------------------------------------------------------
- Migration `000013_policy_rule_severity` (up + down): adds
  `severity VARCHAR(50) NOT NULL DEFAULT 'Warning'` to `policy_rules` with
  CHECK constraint `severity IN ('Warning', 'Error', 'Critical')`. No
  index — three-value column on a low-thousands-rows table, planner will
  seq-scan regardless. PG 11+ metadata-only ADD COLUMN, safe on live data.
- `internal/domain/policy.go`: added `Severity PolicySeverity` field.
- `internal/repository/postgres/policy.go`: plumbed `severity` through
  ListRules SELECT + Scan, GetRule SELECT + Scan, CreateRule INSERT,
  UpdateRule UPDATE (4 queries).
- `internal/service/policy.go::UpdatePolicy`: if the client omits
  severity on a PUT (zero-value empty string), fetch the existing rule
  and preserve its severity. Without this, partial updates would trip the
  NOT NULL CHECK and 500. Preserves pre-existing behavior for Name/Type
  (out of scope).
- `internal/api/handler/policies.go::CreatePolicy`: default empty severity
  to `'Warning'`, then validate via `ValidatePolicySeverity`. 400 with
  clear message instead of 500 on CHECK violation. `UpdatePolicy`:
  validates severity only when provided.
- `internal/mcp/types.go` + `internal/mcp/tools.go`: added optional
  `severity` on the MCP `create_policy` / `update_policy` tool inputs so
  LLM callers stay in sync with the wire contract.
- `api/openapi.yaml`: added `severity` to the `PolicyRule` schema with
  the enum and default.

Acceptance criterion (user-defined)
-----------------------------------
"Create a rule with severity=Critical, reload the page, and still see
Critical — no silent drops." Verified end-to-end: frontend sends
`severity: "Critical"`, handler validates, service persists, DB stores,
GET returns, React renders the correct badge.

Seed data
---------
`migrations/seed.sql`: four demo rules now have differentiated severities
— `pr-require-owner` → Warning, `pr-allowed-environments` → Error,
`pr-max-certificate-lifetime` → Critical, `pr-min-renewal-window` →
Warning. The user called out that seeding all four at the same severity
makes the feature look decorative; differentiation demonstrates the
column carries real signal.

## Integration test fix (side effect of D-006)

`internal/integration/e2e_test.go::TestCrossResourceWorkflow/CreatePolicy`
was sending `"severity": "High"` — a value from the pre-audit severity
vocabulary that the new `ValidatePolicySeverity` correctly rejects with
400. Changed to `"Error"` (closest semantic match in the new TitleCase
allowlist). Only severity reference in the integration/ directory;
verified via grep.

## Out of scope, logged for follow-up (d/D-008)

Three policy-engine drift issues orthogonal to D-005 + D-006, explicitly
deferred per direction:

1. `migrations/seed.sql` policy_rules INSERTs use lowercase TYPE values
   (`'ownership'`, `'environment'`, `'lifetime'`, `'renewal_window'`).
   These are load-bearing on `internal/service/policy.go::evaluateRule`'s
   `switch rule.Type` (which also uses the lowercase strings). Migrating
   requires coordinated changes across seed + evaluation engine.
2. `migrations/seed_demo.sql:482-483` contains lowercase `'critical'`
   severity — will now fail the new CHECK constraint. Separate fix.
3. `evaluateRule` hardcodes `Severity: domain.PolicySeverityWarning` on
   emitted violations and ignores the configured `rule.Config`. The new
   severity column is read correctly on the CRUD path but not yet
   consulted during evaluation.

## Verification

Backend:
- `go build ./...` — clean
- `go vet ./...` — clean
- `go test -short ./...` — all packages green, including
  `internal/service` (policy service), `internal/api/handler` (policy +
  MCP handler tests), `internal/integration` (e2e_test.go after fix),
  `internal/domain`, `internal/repository/postgres`.

Frontend:
- `tsc --noEmit` — clean
- `vitest run` — 223/223 passing (4 new assertions in types.test.ts)
- `vite build` — clean (only the pre-existing chunk-size warning)
2026-04-18 13:02:04 +00:00
shankar0123 13cd4d98ba feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id,
owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure
tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke),
server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI
spec, compliance mapping updates, and 21 new tests (12 service, 7 handler,
1 CLI, 1 frontend).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 00:06:34 -04:00
shankar0123 e1bcde4cf1 feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret
managers. Three pluggable DiscoverySource connectors feed into the
existing discovery pipeline via sentinel agent pattern, with a 9th
scheduler loop for periodic cloud scanning.

- AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests
- Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests
- GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests
- CloudDiscoveryService orchestrator with 9 tests
- 9th scheduler loop (6h default, atomic.Bool idempotency)
- Discovery page: color-coded source type badges
- 14 new env vars across CloudDiscoveryConfig structs
- Docs: connectors.md, architecture.md, features.md, README updated

49 new tests. All CI checks pass (go vet, race, lint, coverage).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 23:01:00 -04:00
shankar0123 3f619bcaac feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA
coverage. Entrust uses mTLS client certificate auth with sync/async
issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with
serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for
self-hosted Keyfactor CAs.

Each connector implements the full issuer.Connector interface (9 methods),
includes httptest-based unit tests (~14 each), and follows established
patterns (injectable HTTP clients, RFC 5280 revocation reason mapping,
CRL/OCSP delegated to CA).

Also includes: issuer factory cases, env var seeding, config structs,
domain types, seed data (3 rows, all disabled), OpenAPI enum updates,
frontend issuer catalog entries with config fields, and full docs
(connectors.md, architecture.md, features.md, README).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 22:24:12 -04:00
shankar0123 596d86a206 feat(M48): continuous TLS health monitoring — endpoint state machine, shared tlsprobe, 8 API endpoints, GUI
Adds continuous TLS endpoint health monitoring that closes the deploy→verify→monitor loop.
After M25 verifies a deployment succeeded once, M48 continuously confirms it stays healthy.

Key components:
- Shared `internal/tlsprobe/` package extracted from network scanner for reuse
- Health status state machine: healthy → degraded (2 failures) → down (5 failures),
  plus cert_mismatch when served fingerprint differs from expected
- 8th scheduler loop (60s tick, per-endpoint configurable intervals)
- PostgreSQL migration 000011: endpoint_health_checks + endpoint_health_history tables
- 8 REST API endpoints (CRUD, history, acknowledge, summary)
- Health Monitor GUI page with summary bar, status table, create modal, auto-refresh
- 38 new tests (5 tlsprobe + 11 domain + 10 service + 8 handler + 4 frontend)
- All coverage thresholds maintained (service 68%, handler 83%, domain 87%, middleware 63%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 21:45:45 -04:00
shankar0123 bcefb11e65 feat(M51): add SCEP server (RFC 8894) for MDM and network device enrollment
Implements Simple Certificate Enrollment Protocol with single-endpoint
operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7
SignedData CSR extraction with fallback for raw/base64 CSR, challenge
password authentication via CSR attributes, and shared internal/pkcs7
package extracted from EST handler to eliminate code duplication.

24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 16:47:18 -04:00
shankar0123 7382e5f03b test: comprehensive test gap closure across 24 packages
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.

All tests pass: go test -short, go vet, go test -race clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:09:40 -04:00
shankar0123 5567d4b411 feat(M47): add Kubernetes Secrets target + AWS ACM PCA issuer connectors
Implement both M47 connectors with full cross-layer wiring:

Kubernetes Secrets target: DNS-1123 validation, kubernetes.io/tls Secret
create-or-update, chain concatenation, serial number validation, Helm
RBAC gating. 18 tests.

AWS ACM Private CA issuer: synchronous issuance (like Vault), ARN regex
validation, RFC 5280 revocation reason mapping, CA cert retrieval,
factory + env var seeding. 23 tests.

Cross-cutting: domain types, service validation, config, factory, agent
dispatch, frontend (TargetsPage, issuerTypes), OpenAPI, seed data, Helm
chart, connectors docs, README. Testing docs (testing-guide, qa-test-guide,
qa_test.go) with Parts thematically integrated near related connectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-07 20:21:09 -04:00
shankar0123 7d6ef44e21 feat(M46): Windows Certificate Store + Java Keystore target connectors, shared certutil package
Extract shared certutil helpers (CreatePFX, ParsePrivateKey, ComputeThumbprint,
GenerateRandomPassword, ParseCertificatePEM) from IIS connector for reuse.
Add WinCertStore connector (PowerShell Import-PfxCertificate, dual local/WinRM
mode, configurable store/location, expired cert cleanup) and JavaKeystore
connector (PEM→PKCS#12→keytool pipeline, JKS/PKCS12 support, shell injection
prevention, path traversal protection). 53 new tests, all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 19:14:32 -04:00
shankar0123 f92c997a50 feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone:

1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with
   `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing
   acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support.
   ES256 JWS signing with kid mode, nonce management, directory discovery.
   Empty profile delegates to standard library path (zero behavior change).
   Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on
   ACME issuer config.

2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source,
   docs, README, and examples. Zero remaining occurrences of RFC 9702.

3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating
   renewal thresholds against SC-081v3 validity reduction timeline (200→100→47
   days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the
   expected renewal path for 6-day shortlived certs.

New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 13:52:13 -04:00
shankar0123 697c0be9f3 feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.

Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard

Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 12:36:01 -04:00
shankar0123 e6088c79a3 feat(M35): dynamic target configuration with encrypted config, test connection, and GUI updates
Mirror M34's dynamic issuer config pattern for deployment targets: AES-256-GCM
encrypted config storage, sensitive field redaction in API responses, agent
heartbeat-based test connection endpoint, and full frontend updates including
test status indicators, source badges, and removal of stale hostname/status
fields from the Target interface.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 01:09:53 -04:00
shankar0123 995b72df05 feat(M34): dynamic issuer configuration with encrypted config storage
Replace static env-var-based issuer wiring with GUI-driven dynamic
configuration stored encrypted in PostgreSQL. Operators can now
configure, test, enable/disable, and manage issuers from the dashboard
without restarting the server.

Key changes:
- AES-256-GCM encryption for sensitive issuer config at rest (PBKDF2
  key derivation with 100k iterations)
- Dynamic IssuerRegistry with sync.RWMutex replacing static map
- Connector factory pattern (issuerfactory.NewFromConfig) replacing
  140 lines of static wiring in main.go
- Migration 000009: encrypted_config, last_tested_at, test_status,
  source columns on issuers table
- Env var seeding on first boot with ON CONFLICT DO NOTHING
- Registry Rebuild() for atomic map swap after CRUD operations
- Issuer type validation against domain constants on Create
- Audit trail for test connection results
- Conditional seeding for step-ca/OpenSSL (only when env vars set)
- GUI: source badge, connection test status on issuer detail page

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-04 00:20:13 -04:00
shankar0123 5a53b648b1 feat(M44): Google CAS issuer connector
Google Cloud Certificate Authority Service integration via REST API
with OAuth2 service account auth (JWT→access token). Synchronous
issuance model, CA pool selection, mutex-guarded token caching,
revocation with RFC 5280 reason mapping. No Google SDK dependency —
all stdlib. 19 tests with httptest mock OAuth2 + CAS API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 21:25:34 -04:00
shankar0123 3a11e447cf feat(M43): Sectigo SCM issuer connector
Implement Sectigo Certificate Manager REST API connector with async
order model (enroll → poll → collect PEM), 3-header auth, DV/OV/EV
support, collect-not-ready (400/-183) graceful handling, and RFC 5280
revocation reason mapping. 20 tests with httptest mock API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 21:01:14 -04:00
shankar0123 9feb6c796d feat(M42): Postfix/Dovecot mail server target connector
Dual-mode TLS connector for mail servers — single package with mode
field selecting Postfix or Dovecot defaults. File-based cert/key
deployment with correct permissions (cert 0644, key 0600), optional
chain append, shell injection prevention, and configurable
reload/validate commands. 18 tests covering config validation,
deployment, and security. GUI wizard fields and OpenAPI enum updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 01:46:15 -04:00
shankar0123 fd05bacb76 feat(M41): Envoy target connector with SDS support
File-based deployment for Envoy service mesh — writes cert/key/chain
to watched directory with optional SDS JSON config for xDS bootstrap.
Path traversal prevention, configurable filenames, 15 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 01:23:35 -04:00
shankar0123 6375909591 feat: add Vault PKI and DigiCert CertCentral issuer connectors (M32 + M37)
Vault PKI: synchronous issuance via /v1/{mount}/sign/{role}, token auth,
revocation, CA cert retrieval, 14 tests. DigiCert CertCentral: async order
model (submit → poll → download), X-DC-DEVKEY auth, OV/EV support, PEM
bundle parsing, 16 tests. Both conditionally registered based on env vars.
Includes OpenAPI enum updates, seed data, connector docs, architecture docs,
README badges, and testing guide sign-off (Parts 38 + 39, 12 automated
smoke test assertions all passing).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 17:19:46 -04:00
shankar0123 11173a74c6 feat(M31): agent work routing — scope jobs to assigned agents
Deployment jobs now set agent_id from target→agent relationship at
creation time. GetPendingWork() uses ListPendingByAgentID() with a
3-way UNION query (direct match, legacy NULL fallback via target JOIN,
AwaitingCSR via cert→target→agent chain) so each agent only receives
its own jobs.

- Added AgentID *string to Job domain struct
- Added agent_id to all job SQL queries (5 SELECTs, INSERT, UPDATE, scanJob)
- New ListPendingByAgentID() repository method
- Rewrote GetPendingWork() from ~25 lines to single scoped query
- 4 new Go tests (3 agent routing + 1 deployment agent_id)
- Frontend: agent_id/target_id on Job type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:10:42 -04:00
shankar0123 ec21c9bb29 feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing
with cert ID computation, directory endpoint discovery, graceful
degradation for non-ARI CAs. 19 tests.

M29: Email notifier wiring + scheduled certificate digest — SMTP
connector bridged to service layer via NotifierAdapter, DigestService
with HTML email template, 7th scheduler loop (24h), digest preview/send
API endpoints and GUI card. 21 tests.

M30: Production-ready Helm chart — server Deployment, PostgreSQL
StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security
contexts, health probes, example values for dev/prod/ACME scenarios.

Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job,
documentation updates across 5 doc files and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 21:18:35 -04:00
shankar0123 03472072b8 test + docs: close 12 test gaps (~250 new tests) and expand testing guide to 34 parts
Implements all P0-P2 test gaps from docs/test-gap-prompt.md:
- Deployment service tests (20), target service tests (18), scheduler tests (8)
- Agent binary tests (48), CSR renewal tests (8), short-lived cert tests (7)
- Domain model tests (25), context cancellation tests (9), concurrency tests (7)
- Handler negative-path tests (23 across 5 files)
- Frontend error handling tests (86) and API client tests (7)

Expands testing-guide.md from 28 to 34 parts covering certificate export,
S/MIME/EKU, OCSP/DER CRL, body size limits, Apache/HAProxy connectors,
and sub-CA mode. Fixes stale profile count (4->5) and updates sign-off table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 17:57:25 -04:00
shankar0123 a0afa7ab6f test(security): TICKET-018 add fuzz tests for command validation and domain parsing
Added Go native fuzz tests (testing/fuzz) for security-critical input validation:

1. FuzzValidateShellCommand in internal/validation/command_fuzz_test.go
   - Tests shell command validation with injection payloads (;, |, &, $, `, etc.)
   - Seed corpus includes valid commands and dangerous metacharacters
   - Ensures function never panics under fuzzing

2. FuzzValidateDomainName in internal/validation/command_fuzz_test.go
   - Tests RFC 1123 domain validation with wildcard support
   - Seed corpus includes SQL injection, path traversal, and malformed domains
   - Ensures function never panics under fuzzing

3. FuzzValidateACMEToken in internal/validation/command_fuzz_test.go
   - Tests base64url token validation
   - Seed corpus includes injection payloads and special characters
   - Ensures function never panics under fuzzing

4. FuzzIsValidRevocationReason in internal/domain/revocation_fuzz_test.go
   - Tests RFC 5280 revocation reason validation
   - Seed corpus includes case variations, injection attempts, and null bytes
   - Ensures function never panics and returns only valid booleans

5. FuzzCRLReasonCode in internal/domain/revocation_fuzz_test.go
   - Tests CRL reason code mapping
   - Validates return codes are within 0-9 range
   - Ensures invalid reasons default to 0 (unspecified)

All fuzz tests follow Go 1.18+ testing/fuzz conventions with seed corpus
for faster discovery of edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 21:40:49 -04:00
shankar0123 be72627aeb feat: M25 post-deployment TLS verification + M26 Traefik/Caddy targets
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.

M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.

Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 21:07:16 -04:00
shankar0123 8308beb5bb fix: Docker Compose missing migrations, network scan []int crash, demo seed data
Three bugs fixed:
- Docker Compose only mounted migration 000001; migrations 000002-000007
  (profiles, agent groups, revocation, discovery, network scans) never ran,
  breaking half the demo features. Now mounts all 7 migrations in order.
- Network Scans page crashed with pq.Array scan error because lib/pq
  doesn't support []int, only []int64. Changed Ports field accordingly.
- Dashboard pie chart displayed "RenewalInProgress" without spaces.
  Added formatStatus() helper for PascalCase → spaced display.

Also adds first-run demo experience improvements:
- 9 discovered certificates (filesystem + network scan mix)
- 3 discovery scans with recent timestamps
- 2 AwaitingApproval renewal jobs for approval workflow demo
- CERTCTL_NETWORK_SCAN_ENABLED=true in Docker Compose
- Network scan targets seeded with last_scan results
- Version badge updated to v2.0.5
- Docs updated (quickstart, advanced demo) to reference seeded data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 18:33:50 -04:00
shankar0123 e19c240a79 feat: add ACME DNS-PERSIST-01 challenge support (IETF draft-ietf-acme-dns-persist)
Standing TXT record at _validation-persist.<domain> eliminates per-renewal
DNS updates. Auto-fallback to dns-01 if CA doesn't offer dns-persist-01.
ScriptDNSSolver extended with PresentPersist method. Configurable via
CERTCTL_ACME_CHALLENGE_TYPE=dns-persist-01 and
CERTCTL_ACME_DNS_PERSIST_ISSUER_DOMAIN env vars.

Also fixes IsExpired edge-case test in discovery_test.go that always failed
due to time.Now() drift between test setup and method invocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-26 14:23:46 -04:00
shankar0123 7d14635a72 feat: add EST server (RFC 7030) for device certificate enrollment (M23)
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).

Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
  quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
  issuer interface, resource lists, OpenSSL status)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:31:06 -04:00
shankar0123 4f90be9311 feat: add network certificate discovery (M21) and Prometheus metrics (M22)
M21 adds server-side active TLS scanning of CIDR ranges with concurrent
probing, sentinel agent pattern for pipeline reuse, and full CRUD API for
scan targets. M22 adds Prometheus exposition format endpoint alongside
existing JSON metrics. Comprehensive documentation audit updates all docs
to reflect 91 endpoints, 19 tables, 6 scheduler loops, and 900+ tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 23:37:47 -04:00
shankar0123 8028c14356 fix: remove unused import and variable flagged by go vet
Remove unused repository import from discovery_handler_test.go and
unused tests variable from discovery_test.go (replaced by testCases).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 01:07:16 -04:00
shankar0123 667a30870d feat: M18b Filesystem Certificate Discovery — agent scanning, server dedup, triage API
Agent-side:
- Filesystem scanner walks configured directories (CERTCTL_DISCOVERY_DIRS)
- Parses PEM (.pem, .crt, .cer, .cert) and DER (.der) certificate files
- Extracts CN, SANs, serial, issuer/subject DN, validity, key info, SHA-256 fingerprint
- Reports discoveries to control plane on startup + every 6 hours
- Skips files >1MB and private key files

Server-side:
- Migration 000006: discovered_certificates + discovery_scans tables
- Domain model: DiscoveredCertificate, DiscoveryScan, DiscoveryReport
- Three triage states: Unmanaged, Managed (claimed), Dismissed
- Repository with upsert dedup (fingerprint + agent + path)
- Service layer: process reports, claim, dismiss, list, summary
- 7 new API endpoints (84 total):
  POST /agents/{id}/discoveries, GET /discovered-certificates,
  GET /discovered-certificates/{id}, POST .../claim, POST .../dismiss,
  GET /discovery-scans, GET /discovery-summary
- Audit trail: scan_completed, cert_claimed, cert_dismissed events

Tests: 28 new test functions (domain, handler, service layers)
Docs: README, quickstart, demo-guide, demo-advanced, architecture,
      concepts, connectors, features.md all updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-24 00:25:00 -04:00
shankar0123 df1aaa37f8 feat: M17 OpenSSL/Custom CA issuer connector + M16b CLI tool with bulk import
M17: Script-based issuer connector delegating sign/revoke/CRL to user-provided
scripts. Compatible with any CA tooling (OpenSSL, cfssl, custom PKI). Configurable
timeout, environment variable passthrough. 14 tests including timeout enforcement.

M16b: certctl-cli wraps all 76 REST API endpoints for terminal workflows. Supports
certs/agents/jobs list/get/renew/revoke/cancel, bulk PEM import with progress
reporting, server health status, table and JSON output formats. Zero external
dependencies (stdlib only). 14 tests with mock HTTP server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:12:40 -04:00
shankar0123 9b0ff37973 feat: M19 API audit log + M16a notifier connectors (Slack, Teams, PagerDuty, OpsGenie)
M19: HTTP middleware records every API call to the immutable audit trail
with method, path, actor, SHA-256 body hash, status, and latency. Best-effort
async recording via goroutine. Health/ready probes excluded.

M16a: Four pluggable notifier connectors — Slack (incoming webhook), Teams
(MessageCard), PagerDuty (Events API v2), OpsGenie (Alert API v2). Each
enabled by config env var. 30 new tests across middleware and connectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 17:58:14 -04:00
shankar0123 5d98e373e3 feat: M15a — certificate revocation API, CRL endpoint, and revocation notifications
Implements core revocation infrastructure: POST /api/v1/certificates/{id}/revoke
with all 8 RFC 5280 reason codes, JSON-formatted CRL at GET /api/v1/crl, webhook
and email revocation notifications, best-effort issuer notification, and immutable
revocation audit trail. Includes 48 new tests across service, handler, integration,
and domain layers (600+ total). Fixes 3 pre-existing test bugs (team_test error
matching, agent_group delete status code, team handler per_page validation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 10:59:18 -04:00
shankar0123 f5fed74d6f feat: M12 — sub-CA mode, ACME DNS-01 challenges, step-ca issuer connector
Sub-CA mode: Local CA loads CA cert+key from disk (CERTCTL_CA_CERT_PATH +
CERTCTL_CA_KEY_PATH) to operate as subordinate CA under enterprise root
(e.g., ADCS). Supports RSA, ECDSA, PKCS#8 keys. Validates IsCA and
KeyUsageCertSign. Falls back to self-signed when paths unset.

DNS-01 challenges: Pluggable DNSSolver interface with script-based hook
implementation. User-provided scripts create/cleanup _acme-challenge TXT
records for any DNS provider. Configurable propagation wait. Enables
wildcard certs and non-HTTP-accessible hosts.

step-ca connector: Smallstep private CA via native /sign API with JWK
provisioner auth. Issuance, renewal, revocation. Registered as iss-stepca.

23 new tests across 3 files. CI test path widened to ./internal/connector/issuer/...

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 22:55:50 -04:00
shankar0123 b0549e6f05 feat: M11b — ownership tracking, agent groups, interactive renewal approval
Ownership: owners/teams GUI pages, notification email resolution via
resolveRecipient (owner_id → owner.email lookup). Agent groups: dynamic
device grouping by OS/arch/IP CIDR/version with manual include/exclude
membership, migration 000004, full CRUD stack (domain → repo → service →
handler → frontend). Interactive approval: AwaitingApproval job state,
approve/reject API endpoints with reason tracking. Tests: 12 agent group
handler tests, 8 approve/reject job handler tests, integration tests
updated for 13-param RegisterHandlers. Docs updated across architecture,
concepts, and seed data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 21:02:35 -04:00
shankar0123 a579a84c7f feat: M11a — certificate profiles, crypto policy enforcement, short-lived cert expiry
Add certificate profiles as named enrollment templates that control allowed
key algorithms, max TTL, permitted EKUs, required SAN patterns, and optional
SPIFFE URI SANs. CSR submissions are validated against profile rules at
signing time (key type + minimum size). Short-lived certs (TTL < 1 hour)
auto-expire via a new scheduler loop — expiry acts as revocation, no
CRL/OCSP needed.

New files:
- Migration 000003: certificate_profiles table, FK columns on
  managed_certificates/renewal_policies, key metadata on certificate_versions
- domain/profile.go: CertificateProfile + KeyAlgorithmRule structs
- repository/postgres/profile.go: full CRUD with JSONB marshaling
- service/profile.go: ProfileService with validation + audit logging
- service/crypto_validation.go: CSR-against-profile validation (RSA/ECDSA/Ed25519)
- handler/profiles.go: 5 HTTP endpoints under /api/v1/profiles
- web/src/pages/ProfilesPage.tsx: profiles management page

Modified:
- renewal.go: CSR validation in CompleteAgentCSRRenewal, ExpireShortLivedCertificates
- scheduler.go: 30s short-lived expiry check loop
- certificate.go (repo): nullable profile FK, key metadata on versions
- main.go: profile repo/service/handler wiring, 8-param NewRenewalService
- router.go: 12-param RegisterHandlers with profile routes
- seed_demo.sql: 4 demo profiles (standard, mtls, short-lived, high-security)
- Frontend: types, API client, routing, sidebar nav

Tests: 40 new tests across handler (15), service (13), crypto validation (12)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 20:39:49 -04:00
shankar0123 07275bf92f feat: M10 — agent metadata collection, Apache httpd + HAProxy target connectors
Agents now report OS, architecture, IP address, hostname, and version
via heartbeat using runtime.GOOS, runtime.GOARCH, and net.Dial. New
migration adds columns to agents table. Heartbeat handler, service,
and repository updated to accept and persist metadata. GUI shows
OS/Arch in agent list and full system info in agent detail page.

Apache httpd connector: separate cert/chain/key files, apachectl
configtest validation, graceful reload. HAProxy connector: combined
PEM file (cert+chain+key), optional config validation, reload.
Both wired into agent binary's target connector switch.

14 tests for new connectors. All existing tests updated for new
Heartbeat/UpdateHeartbeat signatures. Docs updated across README,
architecture, concepts, and connectors guides.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 02:19:28 -04:00
shankar0123 66f04f7afe style: run gofmt -s across all Go files
Fixes Go Report Card gofmt score from 52% to 100%.
Pure formatting changes — no logic modifications.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 19:32:29 -04:00
shankar0123 e2821c448a Implement M8: agent-side key generation with ECDSA P-256
Private keys never leave agent infrastructure. Agents generate ECDSA P-256
key pairs locally, store them with 0600 permissions, and submit only the CSR
(public key) to the control plane. New AwaitingCSR job state pauses
renewal/issuance jobs until the agent submits its CSR. Server-side keygen
retained behind CERTCTL_KEYGEN_MODE=server for demo/development.

Key changes:
- Dual keygen mode via CERTCTL_KEYGEN_MODE (agent default, server for demo)
- AwaitingCSR job state with CommonName/SANs in work response
- Agent ECDSA P-256 keygen, local key storage, CSR-only submission
- CompleteAgentCSRRenewal server-side flow for agent-submitted CSRs
- DeploymentRequest.KeyPEM for agent-provided keys during deployment
- Dockerfile.agent creates /var/lib/certctl/keys with correct ownership

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 13:51:41 -04:00
shankar0123 1d1b89c9b5 Implement M3: expiration threshold alerting with dedup and status transitions
- Add alert_thresholds_days JSONB column to renewal_policies (default [30,14,7,0])
- Add RenewalPolicy.AlertThresholdsDays field + EffectiveAlertThresholds() helper
- Add RenewalPolicyRepository interface + postgres implementation
- Rewrite CheckExpiringCertificates with per-policy threshold alerting
- Add SendThresholdAlert + HasThresholdNotification for deduplication via [threshold:N] tags
- Add Type and MessageLike filters to NotificationFilter + postgres query support
- Auto-transition certs to Expiring (>0 days) or Expired (<=0 days) status
- Record expiration_alert_sent audit events per threshold crossing
- Fix .gitignore: allow SQL migration files, scope server/agent build artifact rules
- Track previously untracked cmd/ and migrations/ directories
- Update docs (README, architecture, demo-advanced) for threshold alerting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 00:03:43 -04:00
shankar0123 ae67b10708 Complete M1, M1.1, M2: end-to-end lifecycle, agent deployment, ACME v2
- Wire issuer connector end-to-end with IssuerConnectorAdapter (dependency inversion)
- Renewal/issuance job processor: RSA key + CSR generation, Local CA signing, cert version storage
- Agent work API (GET /agents/{id}/work) and job status API (POST /agents/{id}/jobs/{job_id}/status)
- Agent-side deployment: WorkItem enrichment with target type/config, NGINX/F5/IIS connector invocation
- Full ACME v2 implementation: HTTP-01 challenge solving, account registration, order lifecycle
- Update all docs (README, architecture, connectors, demo-advanced, quickstart) for M1-M2
- Fix go vet warning in deployment.go (non-constant format string)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 23:49:45 -04:00
shankar0123 d395776a95 Initial scaffold: certificate control plane v0.1.0 2026-03-14 08:22:17 -04:00