Commit Graph

3 Commits

Author SHA1 Message Date
shankar0123 62a412c488 Bundle C: Renewal/reliability cluster — 7 findings closed
Closes M-006 + M-007 + M-008 + M-015 + M-016 + M-019 + M-020 from
comprehensive-audit-2026-04-25. M-028 was already closed by the
Bundle B CI follow-up.

M-006 (CWE-913) — Idempotent migration 000014
  migrations/000014_policy_violation_severity_check.up.sql:
    Prepended ALTER TABLE ... DROP CONSTRAINT IF EXISTS before the
    ADD. Mirrors the down migration's existing IF EXISTS shape and
    the M-7 idempotent-index idiom. Re-runs against partially-applied
    DBs now succeed.

M-007 — Bulk-op partial-failure tests (3 new)
  internal/api/handler/bulk_partial_failure_test.go:
    TestBulkRevoke_PartialFailure_ReportsBoth
    TestBulkRenew_PartialFailure_ReportsBoth
    TestBulkReassign_PartialFailure_ReportsBoth
  Each asserts HTTP 200 + both success/failure counters round-trip
  + per-cert errors[] preserved with non-empty messages so operators
  can correlate each failure to its certificate ID.

M-008 — Admin-gated handler enumeration pin (verified-already-clean)
  Recon: only one admin-gated handler — bulk_revocation.go — with
  full 3-branch test triplet already in place. health.go calls
  IsAdmin informationally to surface the flag to the GUI without
  gating.
  internal/api/handler/m008_admin_gate_test.go:
    Walks every handler .go file, asserts every middleware.IsAdmin
    call site is in AdminGatedHandlers (with required test triplet)
    or InformationalIsAdminCallers (justified). Adding a new admin
    gate without updating both the constant AND adding the test
    triplet fails CI.

M-015 — Single-profile cardinality pin (verified-already-clean)
  Audit claim 'no cardinality validation' was wrong — enforced at
  struct level. domain.ManagedCertificate.{CertificateProfileID,
  RenewalPolicyID,IssuerID,OwnerID} and RenewalPolicy.
  CertificateProfileID are bare strings, not slices.
  internal/domain/m015_cardinality_test.go:
    reflect-based pin on kind=String. Schema change to N:N would
    have to update renewal.go's lookup loop in the same commit.

M-016 (CWE-754) — Reap stale-agent jobs
  internal/repository/postgres/job.go::ListJobsWithOfflineAgents:
    JOIN jobs to agents on agent_id, filter (status=Running AND
    a.last_heartbeat_at < cutoff), exclude server-keygen jobs.
  internal/service/job.go::ReapJobsWithOfflineAgents:
    Flips matched jobs to Failed reason agent_offline so I-001
    retry loop re-queues them on a healthy agent. Records audit
    event per reap.
  internal/scheduler/scheduler.go:
    Scheduler.runJobTimeout cycle now calls both reaper arms.
    agentOfflineJobTTL default 5min (5x agent-health-check default);
    SetAgentOfflineJobTTL knob for operator override.
  internal/service/job_offline_agent_reaper_test.go: 6 unit tests
  cover happy path, server-keygen-skip, non-Running-skip, non-
  positive-TTL fail-loud, repo-error propagation, audit-event
  recording.

M-019 — Configurable ARI HTTP timeout
  Audit claim 'no fallback timeout' was wrong — ari.go:52 already
  had a 15s timeout. Bundle C makes it configurable.
  internal/connector/issuer/acme/acme.go:
    Config.ARIHTTPTimeoutSeconds field with env path
    CERTCTL_ACME_ARI_HTTP_TIMEOUT_SECONDS.
  internal/connector/issuer/acme/ari.go:
    Both HTTP clients (GetRenewalInfo + getARIEndpoint) now use the
    new ariHTTPTimeout() helper. Zero / negative / nil-config all
    fall back to the historic 15s default.
  ari_timeout_test.go: 4 dispatch arm tests.

M-020 (CWE-770) — OCSP DoS hardening
  Pre-bundle the noAuthHandler chain had no rate limit. An attacker
  could DoS the OCSP responder, which for fail-open relying parties
  is a revocation bypass.
  cmd/server/main.go:
    noAuthHandler refactored from fixed middleware.Chain(...) to a
    conditional slice that appends middleware.NewRateLimiter when
    cfg.RateLimit.Enabled. Per-IP keying applies; OCSP/CRL/EST/SCEP
    are unauth.
  docs/security.md (NEW):
    Operator runbook documenting Must-Staple TLS Feature extension
    RFC 7633 as the architectural fix for fail-open relying parties.
    Profile-flip guidance + nginx/Apache/HAProxy/Envoy stapling
    snippets + explicit scope statement on what the rate limiter
    alone does NOT solve.

Audit deliverables:
  cowork/comprehensive-audit-2026-04-25/audit-report.md: score
    31/55 -> 38/55 closed (Medium 13/27 -> 20/27).
  cowork/comprehensive-audit-2026-04-25/findings.yaml: 7 status
    flips open -> closed with closure notes citing the Bundle C
    mechanism.
  certctl/CHANGELOG.md: Bundle C section under [unreleased].

Verification:
  go vet ./internal/service ./internal/scheduler ./internal/connector/issuer/acme
    ./internal/api/handler ./internal/domain ./cmd/server     clean
  go test -count=1 -short on the same packages              all green
  helm template + helm lint                                 clean
  internal/repository/postgres setup-fail                   sandbox disk
    pressure (same on master HEAD before this branch)
2026-04-27 00:08:25 +00:00
shankar0123 f92c997a50 feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone:

1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with
   `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing
   acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support.
   ES256 JWS signing with kid mode, nonce management, directory discovery.
   Empty profile delegates to standard library path (zero behavior change).
   Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on
   ACME issuer config.

2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source,
   docs, README, and examples. Zero remaining occurrences of RFC 9702.

3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating
   renewal thresholds against SC-081v3 validity reduction timeline (200→100→47
   days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the
   expected renewal path for 6-day shortlived certs.

New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 13:52:13 -04:00
shankar0123 ec21c9bb29 feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing
with cert ID computation, directory endpoint discovery, graceful
degradation for non-ARI CAs. 19 tests.

M29: Email notifier wiring + scheduled certificate digest — SMTP
connector bridged to service layer via NotifierAdapter, DigestService
with HTML email template, 7th scheduler loop (24h), digest preview/send
API endpoints and GUI card. 21 tests.

M30: Production-ready Helm chart — server Deployment, PostgreSQL
StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security
contexts, health probes, example values for dev/prod/ACME scenarios.

Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job,
documentation updates across 5 doc files and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 21:18:35 -04:00