9 Commits

Author SHA1 Message Date
shankar0123 c5db41d3f0 openssl: add failure_test.go covering 6 shell-out error modes
Closes Top-10 fix #3 of the 2026-05-03 issuer-coverage audit (see
cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the
OpenSSL adapter (497 LOC, certctl's highest-risk issuer surface)
had openssl_test.go (8 happy-path funcs + 20 subtests) but no
dedicated _failure_test.go. Compare to ACME, Vault, DigiCert,
Sectigo, Entrust, GlobalSign, EJBCA — all peers have one. An
acquirer's diligence team flags this as an immediate blocker on
the highest-risk issuer surface.

This commit adds 6 failure-mode tests:

  1. TestOpenSSL_Issue_ScriptNotFound_OperatorActionableError —
     SignScript path doesn't exist; error wraps os.ErrNotExist
     (errors.Is); message contains 'no such file' / 'not found'
     so the operator's grep finds it in journalctl.
  2. TestOpenSSL_Issue_PermissionDenied_OperatorActionableError —
     SignScript exists with mode 0o600 (non-executable); error
     wraps os.ErrPermission; message contains 'permission'.
     Skipped under root (uid 0 bypasses chmod gating).
  3. TestOpenSSL_Issue_MalformedStdout_DistinguishedFromCSRReject
     — script exits 0 + writes garbage (no PEM markers) to the
     cert output file; error mentions PEM/certificate/parse so
     operators distinguish output-parsing failure from a script-
     side fault.
  4. TestOpenSSL_Issue_NonZeroExit_DistinguishesCAReject_From_
     ScriptError — script writes 'policy violation: …' to stderr
     and exits 2 (CA-side rejection convention); the script's
     stderr surfaces in the error message; errors.Unwrap returns
     non-nil (proving the underlying *exec.ExitError chain
     survives).
  5. TestOpenSSL_Issue_TimeoutEnforced_ContextCancellationPropagates
     — script does 'exec sleep 30' (not 'sleep 30 ' as a child;
     exec replaces bash so SIGKILL goes directly to the sleeper,
     avoiding the orphan-pipes corner case where a killed bash
     leaves sleep holding stdout/stderr open and CombinedOutput
     blocks); ctx with 100ms deadline; call returns within ~5s
     wall-clock; either errors.Is(err, context.DeadlineExceeded)
     or the error message names 'killed' / 'signal'.
  6. TestOpenSSL_Issue_SignalKilled_PartialOutputDiscarded —
     script writes a half-PEM ('-----BEGIN CERTIFICATE-----\nMII…')
     then 'kill -KILL $$'; assertion: result is nil OR
     CertPEM is empty (no half-cert leaks to caller); error
     names 'signal' / 'killed' OR 'PEM' / 'parse' (both are
     operator-actionable).

Each test pins the operator-actionable error message contract:
the message names the failure mode (so journalctl + grep find
it) and proves no half-state was created (no partial cert
returned). errors.Is / errors.Unwrap checks confirm the wrapping
chain survives.
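
The wrapping contract described above can be sketched as follows. This is a minimal illustration, not the adapter's code: runSignScript and the script path are hypothetical stand-ins, and the only claim is that wrapping with %w keeps errors.Is working while the message still names the failure.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// runSignScript is a hypothetical stand-in for the adapter's shell-out.
// It wraps the exec error with %w so callers can still use errors.Is,
// and prefixes it with text an operator can grep for.
func runSignScript(path string) error {
	if err := exec.Command(path).Run(); err != nil {
		return fmt.Errorf("openssl sign script %q failed: %w", path, err)
	}
	return nil
}

// missingScriptIsDetectable reports whether a nonexistent script path
// yields an error that still matches os.ErrNotExist after wrapping.
func missingScriptIsDetectable() bool {
	// Assumed path: a directory that does not exist under the temp dir.
	missing := filepath.Join(os.TempDir(), "certctl-no-such-dir", "sign.sh")
	err := runSignScript(missing)
	return err != nil && errors.Is(err, os.ErrNotExist)
}

func main() {
	fmt.Println("wrapped error matches os.ErrNotExist:", missingScriptIsDetectable())
}
```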

The OpenSSL adapter has no commandRunner abstraction (production
code uses exec.CommandContext directly); these tests use real
operator-supplied scripts written to t.TempDir (matches the
adapter's actual production code path; no os/exec mocking). The
'exec sleep 30' technique in Test 5 is the load-bearing fix for
the bash-orphans-sleep-and-pipes-stay-open corner case that
otherwise makes the test take 30s instead of 100ms.
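
A rough sketch of the 'exec sleep 30' technique, assuming a Unix host with sh and sleep on PATH. runWithDeadline and deadlineEnforced are hypothetical helpers, not functions from the adapter; the 100ms deadline and 5s budget mirror the numbers quoted above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithDeadline runs a shell snippet under a context deadline and returns
// how long the call took. On timeout the returned error wraps ctx.Err().
func runWithDeadline(script string, deadline time.Duration) (time.Duration, error) {
	ctx, cancel := context.WithTimeout(context.Background(), deadline)
	defer cancel()

	start := time.Now()
	// CombinedOutput waits until the stdout/stderr pipes are closed, so any
	// surviving child that still holds them would keep this call blocked.
	_, err := exec.CommandContext(ctx, "sh", "-c", script).CombinedOutput()
	elapsed := time.Since(start)
	if ctx.Err() != nil {
		return elapsed, fmt.Errorf("sign script timed out after %v: %w", deadline, ctx.Err())
	}
	return elapsed, err
}

// deadlineEnforced checks that the call returns well inside the wall-clock
// budget and that the error chain carries context.DeadlineExceeded.
func deadlineEnforced() bool {
	// "exec" replaces the shell with sleep, so the kill hits the sleeper itself.
	elapsed, err := runWithDeadline("exec sleep 30", 100*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded) && elapsed < 5*time.Second
}

func main() {
	fmt.Println("deadline enforced:", deadlineEnforced())
}
```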

Coverage delta:
  - Before this commit: openssl_test.go + openssl_stubs_test.go
    covered 8 happy-path funcs.
  - After: 79.8% statement coverage of openssl.go (up from the
    pre-existing baseline; the 6 new tests exercise every error
    path through callSignScript + parseCertificate).

Tests pass clean under '-race -count=10' (Test 5's deadline
tolerance is the only timing-sensitive case; the 5s wall-clock
budget vs the 100ms ctx deadline gives ample slack on slow CI
without masking deadline-not-enforced bugs).

Test-only commit; no production code changes. Hardening fixes
(per-call concurrency semaphore, threat-model docs) are separate
Top-10 entries.

Verified locally:
  - gofmt clean across the repo.
  - go vet ./... clean across the repo.
  - go test -race -count=10 -short
    ./internal/connector/issuer/openssl/... green.

Audit reference: cowork/issuer-coverage-audit-2026-05-03/
RESULTS.md Top-10 fix #3.
2026-05-03 20:55:44 +00:00
shankar0123 482c7e8047 chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit 71b2245) surfaced 111 files
with accumulated gofmt drift across cmd/, internal/, and deploy/test/.

Each file's diff is gofmt-standard: whitespace adjustments, intra-
group import sorting (alphabetical by import path within blank-line-
separated groups), and struct-tag column alignment. No semantic
changes — verified via 'git diff --ignore-all-space' which shows only
the line-position deltas from import reordering.

The gate stays in place after this commit. Going forward it catches
gofmt drift at PR time.
2026-04-30 22:33:57 +00:00
cowork 933edfb1d0 Bundle N (Coverage Audit Closure) [partial]: issuer-connector stubs coverage
Closes M-001 partially; M-002, M-003, and CI threshold raise #2 deferred.

Stubs coverage shipped across 8 issuer connectors via per-connector
<conn>_stubs_test.go (~50 LoC each) pinning the not-supported
issuer.Connector interface methods (GenerateCRL, SignOCSPResponse,
GetCACertPEM, GetRenewalInfo). Most CAs delegate CRL/OCSP/CA-cert
distribution to managed services, so these are documented stubs that
return errors. Pinning them ensures the stubs aren't silently replaced
with no-ops in a future refactor.
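
What "pinning" a stub might look like, as a sketch only. stubConnector, errNotSupported, and the method signatures below are hypothetical; the real issuer.Connector interface is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotSupported is an assumed sentinel; the real connectors may use a
// plain fmt.Errorf("not supported") instead.
var errNotSupported = errors.New("not supported")

// stubConnector is a hypothetical connector whose CRL and CA-cert methods
// are documented stubs that always return an error.
type stubConnector struct{}

func (stubConnector) GenerateCRL() ([]byte, error) {
	return nil, fmt.Errorf("GenerateCRL: %w", errNotSupported)
}

func (stubConnector) GetCACertPEM() (string, error) {
	return "", fmt.Errorf("GetCACertPEM: %w", errNotSupported)
}

// stubsStillReject is the pin: it fails if a refactor turns either stub
// into a silent no-op (nil error) or makes it return data.
func stubsStillReject(c stubConnector) bool {
	crl, crlErr := c.GenerateCRL()
	pem, pemErr := c.GetCACertPEM()
	return crl == nil && pem == "" &&
		errors.Is(crlErr, errNotSupported) &&
		errors.Is(pemErr, errNotSupported)
}

func main() {
	fmt.Println("stubs still reject:", stubsStillReject(stubConnector{}))
}
```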

Coverage delta:
  digicert:   79.3% -> 81.0%  (+1.7pp)
  ejbca:      75.8% -> 76.5%  (+0.7pp)
  entrust:    70.8% -> 70.8%  (stubs already covered)
  sectigo:    78.0% -> 79.4%  (+1.4pp)
  vault:      81.0% -> 84.1%  (+3.1pp)
  openssl:    76.9% -> 78.0%  (+1.1pp)
  googlecas:  81.0% -> 83.4%  (+2.4pp)
  globalsign: 75.9% -> 78.2%  (+2.3pp)

(awsacmpca not included; its 0%-coverage hotspots are stubClient methods
structurally different from the others' interface stubs. Already at 83.5%.)

Why the gates aren't yet met: the stub functions are tiny (1-2 lines
each, mostly 'return nil, fmt.Errorf("not supported")'). Lifting each
connector to >=85% requires per-connector failure-mode test files
mirroring Bundle J's ACME pattern (httptest.Server + canned 401/403/
429+Retry-After/5xx/malformed responses against the actual API methods).
That's ~200-300 LoC x 9 connectors = ~1800-2700 LoC of bespoke per-CA
mock work; exceeds this session's budget. Tracked as follow-on
Bundle N.A-extended / N.B-extended.
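
For orientation, the httptest.Server pattern with a canned 429 + Retry-After response could look roughly like this. retryAfter is a hypothetical client helper, not a method from any of the connectors, and the 7-second value is arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"time"
)

// retryAfter issues a GET and, when the server answers 429, returns the
// delay parsed from the Retry-After header (seconds form only).
func retryAfter(url string) (time.Duration, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusTooManyRequests {
		return 0, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	secs, err := strconv.Atoi(resp.Header.Get("Retry-After"))
	if err != nil {
		return 0, fmt.Errorf("malformed Retry-After header: %w", err)
	}
	return time.Duration(secs) * time.Second, nil
}

// cannedRateLimitDelay starts a fake CA API that always rate-limits and
// returns the delay in seconds the client extracted, or -1 on error.
func cannedRateLimitDelay() int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Retry-After", "7")
		w.WriteHeader(http.StatusTooManyRequests)
	}))
	defer srv.Close()

	delay, err := retryAfter(srv.URL)
	if err != nil {
		return -1
	}
	return int(delay / time.Second)
}

func main() {
	fmt.Println("retry-after seconds:", cannedRateLimitDelay())
}
```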

Deferred sub-batches:
  N.C (M-002 + M-003): internal/service (70.5%) + internal/api/handler
    (79.4%) round-out NOT YET STARTED. Tracked as Bundle N.C-extended.
  N.CI (CI threshold raise #2): prescribed raises require underlying
    coverage at proposed floors first. Premature raise would fail CI
    immediately. Tracked as Bundle N.CI-extended.

Verification:
  go vet ./internal/connector/issuer/{8-pkgs}/...   clean
  gofmt -l                                          clean
  go test -short -count=1                           PASS for all 8

Audit deliverables:
  gap-backlog.md: M-001 partial-strikethrough with per-connector table
    + Bundle N closure-log entry covering all 4 sub-batch statuses
  closure-plan.md: Bundle N [~] with per-sub-batch status breakdown
  CHANGELOG.md: [unreleased] Bundle N entry
2026-04-27 17:45:18 +00:00
Shankar ff223e2586 feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths
(renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs
with key algorithm/size that don't match profile rules. MaxTTL enforcement
caps certificate validity per issuer connector (Local CA, Vault, step-ca
enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and
size are now persisted in certificate_versions for audit compliance.
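
A sketch of the kind of check described above, under stated assumptions: the profile struct, validateCSR, and capTTL are hypothetical and are not the project's ValidateCSRAgainstProfile(). It only illustrates rejecting a CSR on key algorithm/size and capping a requested TTL.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"fmt"
	"time"
)

// profile is an assumed shape for a certificate profile's crypto rules.
type profile struct {
	Algorithm  string // "ECDSA" or "RSA"
	MinKeyBits int
	MaxTTL     time.Duration
}

// validateCSR rejects CSRs whose signature, key algorithm, or key size
// do not satisfy the profile.
func validateCSR(csr *x509.CertificateRequest, p profile) error {
	if err := csr.CheckSignature(); err != nil {
		return fmt.Errorf("CSR signature invalid: %w", err)
	}
	var alg string
	var bits int
	switch k := csr.PublicKey.(type) {
	case *ecdsa.PublicKey:
		alg, bits = "ECDSA", k.Curve.Params().BitSize
	case *rsa.PublicKey:
		alg, bits = "RSA", k.N.BitLen()
	default:
		return fmt.Errorf("unsupported key type %T", k)
	}
	if alg != p.Algorithm {
		return fmt.Errorf("key algorithm %s not allowed (profile requires %s)", alg, p.Algorithm)
	}
	if bits < p.MinKeyBits {
		return fmt.Errorf("key size %d below profile minimum %d", bits, p.MinKeyBits)
	}
	return nil
}

// capTTL clamps a requested validity period to the profile's MaxTTL.
func capTTL(requested time.Duration, p profile) time.Duration {
	if p.MaxTTL > 0 && requested > p.MaxTTL {
		return p.MaxTTL
	}
	return requested
}

// demo builds a P-256 CSR and runs it against an ECDSA and an RSA profile.
func demo() (acceptedECDSA, rejectedForRSA bool, cappedHours int) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, false, 0
	}
	der, err := x509.CreateCertificateRequest(rand.Reader, &x509.CertificateRequest{}, key)
	if err != nil {
		return false, false, 0
	}
	csr, err := x509.ParseCertificateRequest(der)
	if err != nil {
		return false, false, 0
	}
	ecProfile := profile{Algorithm: "ECDSA", MinKeyBits: 256, MaxTTL: 24 * time.Hour}
	rsaProfile := profile{Algorithm: "RSA", MinKeyBits: 2048, MaxTTL: 24 * time.Hour}

	acceptedECDSA = validateCSR(csr, ecProfile) == nil
	rejectedForRSA = validateCSR(csr, rsaProfile) != nil
	cappedHours = int(capTTL(90*24*time.Hour, ecProfile).Hours())
	return
}

func main() {
	ok, rejected, hours := demo()
	fmt.Println("ECDSA profile accepts:", ok, "RSA profile rejects:", rejected, "capped TTL hours:", hours)
}
```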

16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded
version number from GUI sidebar. Documentation updated across architecture,
features, connectors, and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 21:05:14 -04:00
Shankar 3f1f94f56b feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9773): CA-directed renewal timing
with cert ID computation, directory endpoint discovery, graceful
degradation for non-ARI CAs. 19 tests.
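
The cert ID computation can be sketched as below, following the identifier construction in the ARI specification as I understand it: the base64url-encoded Authority Key Identifier, a period, then the base64url-encoded DER serial number bytes, both unpadded. ariCertID is a hypothetical helper; how certctl computes the ID is not shown in this log. The sample values are the worked example from the specification.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"math/big"
)

// ariCertID joins base64url(AKI keyIdentifier) and base64url(serial) with
// a period. The serial is encoded as DER INTEGER content bytes, without
// the tag and length.
func ariCertID(authorityKeyID []byte, serial *big.Int) string {
	serialBytes := serial.Bytes()
	// DER integers are two's complement: a positive value whose top bit is
	// set needs a leading 0x00, and zero is encoded as a single 0x00 byte.
	if len(serialBytes) == 0 || serialBytes[0]&0x80 != 0 {
		serialBytes = append([]byte{0x00}, serialBytes...)
	}
	enc := base64.RawURLEncoding // URL-safe alphabet, no '=' padding
	return enc.EncodeToString(authorityKeyID) + "." + enc.EncodeToString(serialBytes)
}

// exampleCertID computes the ID for a sample AKI and serial 0x87654321.
func exampleCertID() string {
	aki := []byte{
		0x69, 0x88, 0x5B, 0x6B, 0x87, 0x46, 0x40, 0x41, 0xE1, 0xB3,
		0x7B, 0x84, 0x7B, 0xA0, 0xAE, 0x2C, 0xDE, 0x01, 0xC8, 0xD4,
	}
	return ariCertID(aki, big.NewInt(0x87654321))
}

func main() {
	fmt.Println(exampleCertID())
}
```

In real use the inputs would come from a parsed certificate (for example the AuthorityKeyId and SerialNumber fields of an x509.Certificate).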

M29: Email notifier wiring + scheduled certificate digest — SMTP
connector bridged to service layer via NotifierAdapter, DigestService
with HTML email template, 7th scheduler loop (24h), digest preview/send
API endpoints and GUI card. 21 tests.

M30: Production-ready Helm chart — server Deployment, PostgreSQL
StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security
contexts, health probes, example values for dev/prod/ACME scenarios.

Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job,
documentation updates across 5 doc files and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 21:18:35 -04:00
Shankar 269d341e50 fix: security audit remediation (AUDIT-001, 003, 004, 005, 006, 018)
- AUDIT-001: Validate OpenSSL revoke inputs (hex-only serials, RFC 5280 reasons)
- AUDIT-003: Enforce /20 CIDR size cap at API level (create + update)
- AUDIT-004: Support comma-separated CERTCTL_AUTH_SECRET for zero-downtime key rotation
- AUDIT-005: Add ReadHeaderTimeout (5s) to prevent Slowloris
- AUDIT-006: Document audit trail query parameter exclusion rationale
- AUDIT-018: Add immediate-run-on-start to short-lived expiry scheduler loop
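
As an illustration of the AUDIT-001 item, input validation before shelling out might look like the sketch below. validateRevokeInput is hypothetical, and the accepted reason spellings are the CRLReason names from RFC 5280; the set the project actually accepts is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// hexSerial accepts only hexadecimal digits, so shell metacharacters can
// never reach the revoke script through the serial argument.
var hexSerial = regexp.MustCompile(`^[0-9A-Fa-f]+$`)

// crlReasons lists the CRLReason names defined in RFC 5280 (assumed to be
// the spelling the revoke path accepts).
var crlReasons = map[string]bool{
	"unspecified":          true,
	"keyCompromise":        true,
	"cACompromise":         true,
	"affiliationChanged":   true,
	"superseded":           true,
	"cessationOfOperation": true,
	"certificateHold":      true,
	"removeFromCRL":        true,
	"privilegeWithdrawn":   true,
	"aACompromise":         true,
}

// validateRevokeInput rejects anything that is not a hex serial paired
// with a known revocation reason.
func validateRevokeInput(serial, reason string) error {
	if !hexSerial.MatchString(serial) {
		return fmt.Errorf("invalid serial %q: must be hex digits only", serial)
	}
	if !crlReasons[reason] {
		return fmt.Errorf("invalid revocation reason %q", reason)
	}
	return nil
}

func main() {
	fmt.Println(validateRevokeInput("0A1B2C", "keyCompromise"))
	fmt.Println(validateRevokeInput("01; rm -rf /", "keyCompromise"))
	fmt.Println(validateRevokeInput("0A1B2C", "because"))
}
```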

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 14:11:16 -04:00
Shankar 55d22c3cb2 fix(quality): TICKET-012 propagate request context instead of context.Background()
- Updated AgentService interface to accept context.Context parameter in all methods
- Replaced context.Background() calls with proper ctx parameter in agent.go
- Updated AgentGroupService interface to accept context.Context parameter
- Replaced context.Background() calls with proper ctx parameter in agent_group.go
- Updated handler methods to pass r.Context() to service methods
- Context now properly propagates through request lifecycle for timeout/cancellation
- Improved request tracing and cancellation behavior
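
A minimal sketch of the propagation pattern, with hypothetical names (agentService, ListAgents, listAgentsHandler) that are not the project's actual types. The handler hands r.Context() to the service, so a cancelled request stops the service call instead of running on context.Background().

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// agentService is a hypothetical service whose methods accept a context.
type agentService struct{}

// ListAgents honours cancellation before doing any work.
func (agentService) ListAgents(ctx context.Context) ([]string, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return []string{"agent-1"}, nil
}

// listAgentsHandler passes the request context through to the service.
func listAgentsHandler(svc agentService) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		agents, err := svc.ListAgents(r.Context()) // not context.Background()
		if err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, agents)
	}
}

// statusFor runs the handler with a live or an already-cancelled request
// context and returns the HTTP status code it produced.
func statusFor(cancelled bool) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel()
	}
	req := httptest.NewRequest(http.MethodGet, "/agents", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	listAgentsHandler(agentService{})(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("live request:", statusFor(false), "cancelled request:", statusFor(true))
}
```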
2026-03-27 21:35:22 -04:00
Shankar e4ba8d4de2 feat: add EST server (RFC 7030) for device certificate enrollment (M23)
Implement Enrollment over Secure Transport protocol with 4 endpoints under
/.well-known/est/ — cacerts (CA chain distribution), simpleenroll (initial
enrollment), simplereenroll (certificate renewal), and csrattrs (CSR
attributes). PKCS#7 certs-only wire format with hand-rolled ASN.1, accepts
both PEM and base64-encoded DER CSRs, configurable issuer and profile
binding, full audit trail. 28 new tests (18 handler + 10 service).
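
Accepting both PEM and base64-encoded DER CSRs could be handled along these lines. parseCSR is a hypothetical helper written for illustration, not the EST handler's actual parsing code.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
	"strings"
)

// parseCSR tries PEM first and falls back to base64-encoded DER.
func parseCSR(body []byte) (*x509.CertificateRequest, error) {
	if block, _ := pem.Decode(body); block != nil {
		if block.Type != "CERTIFICATE REQUEST" {
			return nil, fmt.Errorf("unexpected PEM block type %q", block.Type)
		}
		return x509.ParseCertificateRequest(block.Bytes)
	}
	// Base64 bodies may be line-wrapped, so drop all whitespace first.
	compact := strings.Join(strings.Fields(string(body)), "")
	if compact == "" {
		return nil, errors.New("empty CSR body")
	}
	der, err := base64.StdEncoding.DecodeString(compact)
	if err != nil {
		return nil, fmt.Errorf("CSR is neither PEM nor base64 DER: %w", err)
	}
	return x509.ParseCertificateRequest(der)
}

// bothEncodingsParse generates one CSR, encodes it both ways, and checks
// that parseCSR recovers the same request from each form.
func bothEncodingsParse() bool {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false
	}
	tmpl := &x509.CertificateRequest{Subject: pkix.Name{CommonName: "device-01"}}
	der, err := x509.CreateCertificateRequest(rand.Reader, tmpl, key)
	if err != nil {
		return false
	}
	pemBody := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE REQUEST", Bytes: der})
	b64Body := []byte(base64.StdEncoding.EncodeToString(der))

	fromPEM, err1 := parseCSR(pemBody)
	fromB64, err2 := parseCSR(b64Body)
	return err1 == nil && err2 == nil && bytes.Equal(fromPEM.Raw, fromB64.Raw)
}

func main() {
	fmt.Println("PEM and base64 DER both parse:", bothEncodingsParse())
}
```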

Also includes:
- GetCACertPEM added to issuer connector interface (all 4 issuers updated)
- EST integration tests wired into e2e test suite (13 test cases)
- QA testing guide Part 26 (15 manual EST test cases)
- All docs updated: README, features, architecture, concepts, connectors,
  quickstart, demo-advanced (endpoint counts, MCP wording, agent IDs,
  issuer interface, resource lists, OpenSSL status)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 15:31:06 -04:00
Shankar 502902b8c9 feat: M17 OpenSSL/Custom CA issuer connector + M16b CLI tool with bulk import
M17: Script-based issuer connector delegating sign/revoke/CRL to user-provided
scripts. Compatible with any CA tooling (OpenSSL, cfssl, custom PKI). Configurable
timeout, environment variable passthrough. 14 tests including timeout enforcement.

M16b: certctl-cli wraps all 76 REST API endpoints for terminal workflows. Supports
certs/agents/jobs list/get/renew/revoke/cancel, bulk PEM import with progress
reporting, server health status, table and JSON output formats. Zero external
dependencies (stdlib only). 14 tests with mock HTTP server.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:12:40 -04:00