Commit Graph

28 Commits

Author SHA1 Message Date
shankar0123 5dc698307b chore: rename Go module path to github.com/certctl-io/certctl
Mechanical sed across the main go.mod's module declaration, the f5-mock-icontrol
sub-module's go.mod, every Go file's import path (361 files), and a rebuild of
the checked-in f5-mock-icontrol binary so its embedded build-info reflects the
new module path. No behavior change.

Choice B from cowork/transfer-certctl-to-org.md, executed 2026-05-04. Choice A
(keep module path declared as github.com/shankar0123/certctl regardless of
repo URL) shipped on the day of the org transfer (2026-05-03) since we had no
external Go consumers; this commit closes that deferral.

Backward-compat: GitHub HTTP redirects continue to forward
github.com/shankar0123/certctl → github.com/certctl-io/certctl at the URL
level, but Go's module proxy uses the path declared in go.mod as the
canonical name. Pre-fix, anyone trying `go get github.com/certctl-io/certctl/...`
hit a "module path mismatch" error because go.mod said
github.com/shankar0123/certctl and the URL they fetched it from said
certctl-io/certctl. Post-fix, the canonical name and the URL agree, so
go get / go install / external Go consumers / Go-tooling integrations
work cleanly via either the new path (preferred) or the old path (which
redirects and Go follows the redirect for source fetch).

Anyone still importing the old path inside their own code keeps working
provided they update their go.mod's `require` line to match — the module
path declared in their consumer's go.sum / go.mod is the authoritative
import name, so a mass sed across their import statements is the migration
on the consumer side. No external consumers exist today.

Diff shape:
  361 *.go files  — import path replacement only
    2 go.mod     — module declaration replacement only
    1 binary     — deploy/test/f5-mock-icontrol/f5-mock-icontrol rebuilt
                   so embedded build-info reflects the new path (8618965 vs
                   8618933 bytes; 32-byte diff is the build-info change)

  Total: 364 files, 730 insertions / 730 deletions, net-zero size, pure
  mechanical substitution.

Verification:
  gofmt: 17 files needed re-alignment after sed (the new path is one char
    shorter than the old, so column-aligned import groups drifted). Applied
    `gofmt -w` to fix.
  go mod tidy: clean exit on both modules.
  go vet ./...: clean exit.
  go build ./...: clean exit.
  go test -short -count=1 on representative packages: all green
    (internal/domain, internal/validation, internal/crypto, internal/crypto/signer,
    cmd/agent). Test output now reads `ok github.com/certctl-io/certctl/...`
    confirming the module path resolves correctly.
  binary: f5-mock-icontrol rebuilt; `strings | grep shankar0123` returns
    nothing; `strings | grep certctl-io/certctl` shows the new module path
    embedded in build-info.

Files intentionally NOT touched in this commit:
  README.md / CHANGELOG.md / docs/ / etc. — already swept to certctl-io
    URLs in commit bc6039a (the post-transfer URL refresh). This commit is
    purely the Go-tooling layer.
  Scarf pixels (`shankar0123.docker.scarf.sh/...`) — Scarf-account
    namespace, not a Go import or GitHub repo URL. Stays.

This is a non-blocking, non-customer-impacting change. Operators pulling
container images, running `make verify`, hitting the API, or installing the
agent see no functional difference. Only Go-tooling consumers (none today)
are affected, and they're enabled — not broken — by this commit.
2026-05-04 00:30:29 +00:00
shankar0123 502823dfdf ci(go.mod): fix go mod tidy drift after Rank 5 cloud-target commits
CI failed at the "go mod tidy drift" gate on commit 89b6d71 (Rank 5
follow-up). The drift was leftover from the Azure SDK addition in
commit 14fcc82 — `go get` initially pulled the deprecated
`keyvault/azcertificates v0.9.0` path before I switched the import
to the supported `security/keyvault/azcertificates v1.4.0` path. The
v0.9.0 entries stayed in go.mod / go.sum as transitive `// indirect`
because the sandbox's `go mod tidy` couldn't run during the original
commit (disk-pressure on the modcache), so the cleanup got deferred
to CI's tidy-drift gate.

Aligning go.mod + go.sum with what `go mod tidy` produces on a clean
machine. Diff applied verbatim from the CI's `git diff --exit-code`
output:

  go.mod removed (// indirect):
    github.com/Azure/azure-sdk-for-go/sdk/keyvault/azcertificates v0.9.0
    github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1
    github.com/kr/text v0.2.0  (no longer transitive after the
                                deprecated keyvault module is gone)

  go.sum removed:
    github.com/Azure/azure-sdk-for-go/sdk/keyvault/azcertificates v0.9.0 h1: + .mod
    github.com/Azure/azure-sdk-for-go/sdk/keyvault/internal v0.7.1 h1: + .mod
    github.com/creack/pty v1.1.9/go.mod
    github.com/kr/pretty v0.3.0 h1: + .mod
    github.com/rogpeppe/go-internal v1.8.1 h1: + .mod
    github.com/stretchr/testify v1.10.0 h1: + .mod

  go.sum added:
    github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.2 h1: + .mod
    github.com/AzureAD/microsoft-authentication-extensions-for-go/cache v0.1.1 h1: + .mod
    github.com/keybase/go-keychain v0.0.1 h1: + .mod
    github.com/kr/pretty v0.3.1 h1: + .mod
    github.com/rogpeppe/go-internal v1.12.0 h1: + .mod
    github.com/stretchr/testify v1.11.1 h1: + .mod

Net: 3 lines removed from go.mod, 21 lines net from go.sum (10
insertions / 14 deletions).

Verified locally:
- go build ./internal/connector/target/...  green.
- The h1: hashes copied verbatim from the CI's `go mod tidy`
  output line numbers in the run-#???? log so the operator can
  cross-reference the diff against what CI saw.
2026-05-03 23:01:08 +00:00
shankar0123 14fcc82cda target(azurekv): SDK-driven Azure Key Vault target connector
Closes Rank 5 (Azure half) of the 2026-05-03 Infisical deep-research
deliverable (cowork/infisical-deep-research-results.md Part 5).
Pre-fix, certctl had no path to deploy certs to Azure-managed TLS-
termination endpoints (Application Gateway / Front Door / App Service
/ Container Apps) — operators terminating TLS at Azure had to use
manual `az keyvault certificate import` invocations or external
automation. This commit lands the SDK-driven Azure Key Vault target
connector that closes the gap, mirroring the AWS ACM target shape
shipped in commit 54033aa.

Architecture:
  - internal/connector/target/azurekv/azurekv.go — Connector wraps
    *azcertificates.Client behind the KeyVaultClient interface seam
    (mirrors awsacm's ACMClient + awsacmpca's ACMPCAClient). Lives
    in azurekv.go alongside the PFX (PKCS#12) wrapping helper that
    bundles the operator-supplied PEM cert + chain + key into the
    base64-PFX wire format azcertificates.ImportCertificate accepts.
  - internal/connector/target/azurekv/sdk_client.go — SDK-loading
    code isolated so the test path (NewWithClient) compiles without
    pulling azcore + azidentity transitive deps into the test
    binary. DefaultAzureCredential / ManagedIdentityCredential /
    EnvironmentCredential / WorkloadIdentityCredential selected via
    Config.CredentialMode (closed enum).
  - Pre-deploy snapshot via GetCertificate(name, "" /* latest */) so
    on-import-failure rollback restores the previous cert. Mirrors
    Bundle 5+. The Azure-specific quirk: rollback creates a NEW
    VERSION (Key Vault doesn't support version-restore without
    soft-delete recovery, which we keep off the minimum-RBAC
    surface). Operators reading audit dashboards see e.g. v1=initial,
    v2=failed-renewal, v3=rollback-of-v2; the certctl-managed-by +
    certctl-certificate-id provenance tags + future certctl-rollback-of
    metadata tag let an operator filter rollback artifacts.
  - Provenance tags identical to AWS ACM
    (certctl-managed-by=certctl + certctl-certificate-id=<mc-id>),
    automatically applied on every import. Key Vault carries tags
    forward across versions (unlike ACM which strips on re-import),
    so no separate AddTags call is required.
  - DeploymentRequest.KeyPEM held in agent memory only; PFX wrapping
    happens in-memory via software.sslmate.com/src/go-pkcs12. No
    disk write.

Tests:
  - azurekv_test.go: 13-subtest happy-path + validation matrix —
    ValidateConfig (success / missing-vault-url / malformed-vault-
    url / missing-cert-name / invalid-credential-mode / reserved-
    tag rejection), DeployCertificate (fresh import / rollback-on-
    serial-mismatch / empty-key-rejected / no-client-rejected /
    SDK-error-surfaced), ValidateOnly (returns sentinel),
    ValidateDeployment (serial match / mismatch).
  - All tests use the NewWithClient injection seam; no real-Azure
    API calls.
  - go test -short -count=1 ./internal/connector/target/azurekv/...
    green.

Wiring:
  - internal/domain/connector.go: TargetTypeAzureKeyVault =
    "AzureKeyVault".
  - internal/service/target.go: validTargetTypes set extended.
  - cmd/agent/main.go::createTargetConnector: AzureKeyVault case
    arm mirroring the AWSACM shape exactly.
  - cmd/agent/agent_test.go::TestCreateTargetConnector_AllSupported
    Types: AzureKeyVault added to the type matrix + the InvalidJSON
    matrix (16 supported target types now, up from 15).

go.mod / go.sum:
  - github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0 (direct).
  - github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.1 (direct).
  - github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/
    azcertificates v1.4.0 (direct). The deprecated
    /keyvault/azcertificates path appears as a transitive indirect
    via Microsoft's microsoft-authentication-library-for-go; we use
    the new /security/keyvault/ path exclusively.

Documentation:
  - docs/connectors.md "Azure Key Vault" section: config table, RBAC
    role recipe (off-the-shelf "Key Vault Certificates Officer" or
    custom role with 3 data-plane actions), AKS workload-identity /
    managed-identity / service-principal / default credential
    recipes, atomic-rollback contract + Azure-version semantics
    explanation, soft-delete caveat, App Gateway / Front Door
    Terraform attachment snippet, threat model carve-outs (no disk
    writes, mandatory provenance tags, no long-lived secrets in
    Config), 5-bullet procurement checklist crib.

Out of scope (intentional, flagged in V3-Pro forward path):
  - Azure Front Door direct-attach (UpdateRoutingConfig — different
    Azure RBAC scope).
  - App Gateway / App Service auto-bind (V3-Pro auto-attach).
  - Soft-delete recovery (acm:RecoverDeletedCertificate-equivalent
    requires extra RBAC; V2 keeps minimum-permission surface).
  - GCP Certificate Manager (separate cloud, separate connector).

Verified locally:
- gofmt clean.
- go vet ./internal/connector/target/azurekv/...
  ./internal/domain/... ./internal/service/...
  ./cmd/agent/...  clean.
- go test -short -count=1 ./internal/connector/target/azurekv/...
  ./cmd/agent/...  green (all 16 supported target types
  instantiate via the agent factory).

Reference: cowork/infisical-deep-research-results.md Part 5 Rank 5.
Acquisition prompt:
cowork/rank-5-aws-acm-azure-kv-target-adapters-prompt.md.
Companion commit (AWS half): 54033aa.
2026-05-03 22:43:45 +00:00
shankar0123 54033aa3aa target(awsacm): SDK-driven AWS Certificate Manager target connector
Closes Rank 5 (AWS half) of the 2026-05-03 Infisical deep-research
deliverable (cowork/infisical-deep-research-results.md Part 5).
Pre-fix, certctl had no path to deploy certs to AWS-managed TLS-
termination endpoints (ALB / CloudFront / API Gateway / App Runner)
— operators terminating TLS at AWS had to use Infisical secret-sync,
manual aws-cli imports, or external automation. This commit lands
the SDK-driven AWS Certificate Manager target connector that closes
the gap end-to-end.

Architecture:
  - internal/connector/target/awsacm/awsacm.go — Connector wraps
    *acm.Client behind the ACMClient interface seam (mirrors
    awsacmpca's ACMPCAClient pattern from the issuer side).
    LoadDefaultConfig handles the standard AWS credential chain
    (IRSA / EC2 instance profile / SSO / env vars); no embedded
    creds in connector Config.
  - Pre-deploy snapshot via DescribeCertificate + GetCertificate so
    on-import-failure rollback restores the previous cert. Mirrors
    the Bundle 5 IIS pattern + the Bundle 7/8 WinCertStore /
    JavaKeystore patterns. Surfaces rollback success/failure via
    the existing certctl_deploy_rollback_total Prometheus counter
    label set.
  - Provenance tags: certctl-managed-by=certctl + certctl-
    certificate-id=<mc-id> set automatically on every import. ACM
    strips tags on re-import, so the connector calls
    AddTagsToCertificate post-import to keep the provenance pair
    fresh. Operators looking up a cert ARN by managed-cert ID
    (Terraform data source, CloudFormation output) match against
    these tags.
  - DeploymentRequest.KeyPEM held in agent memory only — never
    written to disk. Aligns with the pull-only deployment model
    documented in CLAUDE.md.

Tests:
  - awsacm_test.go: 15-subtest happy-path + validation matrix
    covering ValidateConfig (success / missing-region / malformed-
    region / malformed-ARN / reserved-tag rejection),
    DeployCertificate (fresh import / rotate-in-place / rollback-
    on-serial-mismatch / rollback-also-fails / empty-key-rejected /
    no-client-rejected), ValidateOnly (returns sentinel),
    ValidateDeployment (serial match / mismatch / no-ARN-yet).
  - awsacm_failure_test.go: 5 per-error-class contract tests
    mirroring the awsacmpca_failure_test.go shape (commit
    60dce0b) — AccessDeniedException (smithy.GenericAPIError),
    ResourceNotFoundException (typed), ThrottlingException
    (smithy.GenericAPIError, FaultServer preserved),
    InvalidArgsException (typed, terminal), RequestInProgress
    Exception (typed). All assert errors.As against the SDK type +
    operator-actionable substring + connector-side wrap framing.
  - Coverage on awsacm.go: 54.9% of statements (matches the K8s-
    Secret + IIS connectors' 50-65% range; rollback-failure paths
    contribute most of the un-covered surface — those exercise
    only when the rollback's SDK call also returns an error).
  - go test -race -count=10 green; no goroutine leaks.

Wiring:
  - internal/domain/connector.go: TargetTypeAWSACM = "AWSACM".
  - internal/service/target.go: validTargetTypes set extended.
  - cmd/agent/main.go::createTargetConnector: AWSACM case arm
    mirroring the KubernetesSecrets shape exactly. Calls
    awsacm.New(context.Background(), &cfg, a.logger) — the
    SDK-loading happens here, not lazily, so config errors
    surface at agent boot.
  - cmd/agent/agent_test.go::TestCreateTargetConnector_AllSupported
    Types: AWSACM added to the type matrix + the InvalidJSON
    matrix.

go.mod / go.sum:
  - github.com/aws/aws-sdk-go-v2/service/acm v1.38.3 (direct).
    aws-sdk-go-v2 + service/acmpca + smithy-go were already direct
    from the awsacmpca issuer; this is the distribution-side
    companion package.

Documentation:
  - docs/connectors.md "AWS Certificate Manager (ACM)" section:
    config table, IAM policy JSON (5 actions on
    arn:aws:acm:*:*:certificate/*), IRSA / EC2 instance-profile /
    SSO auth recipes, atomic-rollback contract, Terraform ALB-
    attachment snippet, threat model carve-outs (no disk writes,
    mandatory provenance tags, no long-lived creds in Config),
    procurement checklist crib (5 bullets paste-able into a
    security review).

Out of scope (intentional, flagged in V3-Pro forward path):
  - CloudFront / ALB auto-attach (UpdateDistribution requires a
    different IAM scope than ACM ImportCertificate).
  - Cross-region ACM replication (ACM is regional; CloudFront
    forces us-east-1).
  - Tag-filtered ARN discovery (V2 uses operator-pinned
    Config.CertificateArn after first deploy; tag-scan path
    requires acm:ListTagsForCertificate which we deliberately
    keep off the minimum-IAM-policy surface).
  - Azure Key Vault (separate cloud, separate connector — Azure
    half of Rank 5 ships in a follow-on commit).

Verified locally:
- gofmt clean.
- go vet ./internal/connector/target/awsacm/...
  ./internal/domain/... ./internal/service/...
  ./cmd/agent/...  clean.
- go test -short -count=1 ./internal/connector/target/awsacm/...
  ./internal/domain/... ./cmd/agent/...  green (15 + 5 awsacm
  subtests; all 15 supported target types instantiate via the
  agent factory).
- go test -race -count=10 ./internal/connector/target/awsacm/...
  green.

Reference: cowork/infisical-deep-research-results.md Part 5 Rank 5.
Acquisition prompt:
cowork/rank-5-aws-acm-azure-kv-target-adapters-prompt.md.
2026-05-03 22:32:45 +00:00
shankar0123 60dce0bf10 googlecas, awsacmpca: add failure_test.go covering cloud-SDK error contracts
Closes Top-10 fix #4 of the 2026-05-03 issuer-coverage audit (see
cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, both
adapters had only happy-path test coverage with a single generic
ServerError pair each. Cloud CAs are typically the first-deployed
issuer in enterprise pilots; their diligence reviews dig hard into
IAM-error / cloud-error coverage. This commit lands the contract
tests.

AWSACMPCA — 5 tests in awsacmpca_failure_test.go. Each injects a
typed AWS SDK v2 error via the existing mockACMPCAClient seam and
asserts (1) error non-nil, (2) errors.As against the SDK's typed
value succeeds (so the wrap chain through fmt.Errorf("...%w", ...)
is intact), and (3) operator-actionable substring is present.

  1. Issue_AccessDenied — *smithy.GenericAPIError with
     Code="AccessDeniedException" (the SDK does NOT generate a
     typed *types.AccessDeniedException; AWS uses the smithy
     APIError shape for IAM denials). Asserts ErrorCode +
     "not authorized" + IAM resource path preserved through wrap.
  2. Issue_ResourceNotFound — *types.ResourceNotFoundException
     names the missing CA ARN.
  3. Issue_Throttling — *smithy.GenericAPIError with
     Code="ThrottlingException", Fault=FaultServer. Asserts the
     retryable class (FaultServer) is preserved through wrap so
     upstream retry logic can engage.
  4. Issue_MalformedCSR — *types.MalformedCSRException is terminal
     (operator must fix the CSR, not retry); asserts the
     validation-issue substring survives.
  5. Issue_RequestInProgress — *types.RequestInProgressException
     wraps cleanly; classification (retry vs reissue) is upstream's
     responsibility per the spec's "no new retry logic" rule.

GoogleCAS — 5 tests in googlecas_failure_test.go. The adapter uses
stdlib net/http directly (NO Google Cloud Go SDK dependency in
googlecas.go), so SDK typed-error assertions don't translate. Each
test runs an httptest.Server that returns the canonical Google API
JSON error envelope:

  {"error":{"code":N,"message":"...","status":"<STATUS>"}}

and asserts (1) error non-nil, (2) operator-actionable substring,
and (3) the canonical status string ("PERMISSION_DENIED",
"NOT_FOUND", "UNAVAILABLE") survives the wrap chain so upstream
classification can branch on it.

  1. Issue_PermissionDenied — 403 / PERMISSION_DENIED; surfaced
     error names the IAM resource path.
  2. Issue_CAPoolNotFound — 404 / NOT_FOUND; surfaced error names
     the missing pool resource.
  3. Issue_OAuth2TokenRefreshFailure — token endpoint returns 401
     invalid_grant; surfaced error mentions "token" so an operator
     reading the log immediately distinguishes a credential failure
     (rotate SA key) from a CA-side error (fix IAM binding). Test
     also asserts the CAS endpoint is NOT reached when the token
     exchange fails.
  4. Issue_RegionalAPIUnavailable — 503 / UNAVAILABLE; surfaced
     error preserves the retryable class markers (status code +
     UNAVAILABLE string) for upstream retry classification.
  5. Revoke_PermissionDenied — adapter does NOT silently swallow
     the failure; pin the contract so the audit-row atomicity
     guarantee from Bundle G (which lives in the service-layer
     wrapper, not the adapter) continues to apply. Test also
     verifies the revoke endpoint was actually reached, guarding
     against a future regression that short-circuits before the
     HTTP call.

Coverage delta:
  awsacmpca: 71.0% → 71.0% (failure tests reuse existing wrap
    code paths; behaviour-pin contract tests, not coverage tests).
  googlecas: 83.4% → 84.4% (+1.0pp).

go.mod: smithy-go moved indirect → direct, since the new AWSACMPCA
test file imports it. CI's go-mod-tidy-drift gate enforces this.

Test-only commit. No production code changes.

Verified locally:
  - gofmt clean.
  - go vet ./internal/connector/issuer/awsacmpca/...
    ./internal/connector/issuer/googlecas/...  clean.
  - go test -short -count=1 ./internal/connector/issuer/...  green.
  - go test -race -count=10 ./internal/connector/issuer/awsacmpca
    ./internal/connector/issuer/googlecas  green.

Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md
Top-10 fix #4.
2026-05-03 21:10:41 +00:00
shankar0123 ac8f92852d chore(deps): upgrade go-jose v4.0.4 → v4.1.4 + tidy duplicate require
Two-fer in one commit:

(1) Dependabot security alerts on go-jose/v4 v4.0.4. Both alerts
    flagged on commit 27bd660 (the Phase 1b push that introduced
    the dep):
      - GHSA-c6gw-w398-hv78 (CVE-2025-27144): DoS in JWS Compact
        parsing when input has many `.` characters; excessive
        memory consumption via strings.Split. Fixed in v4.0.5.
        Same shape as CVE-2025-22868 in golang.org/x/oauth2/jws.
      - GHSA-78h2-9frx-2jm8 (CVE-2026-34986): JWE decryption
        panic when alg is a key-wrapping algorithm (`*KW` other
        than the GCMKW family) and encrypted_key is empty. Maps
        to a denial-of-service via panic. Fixed in v4.1.4.

    The certctl ACME server only invokes ParseSigned for JWS verify
    (the JWS path); we never call ParseEncrypted/Decrypt. So the JWE
    panic doesn't reach our code path. The JWS DoS is a low-grade
    concern (an attacker submitting JWS objects with many dots
    could amplify memory). Both are still real CVEs; upgrading
    is cheap and right.

(2) ci: fix `go mod tidy` drift on commit 4f42dfb. When I added
    go-jose to the direct require block, I missed removing the
    duplicate `// indirect` line in the indirect block. CI's
    `go mod tidy && git diff --exit-code go.mod go.sum` flagged
    the drift. Running `go mod tidy` (combined with the v4.1.4
    upgrade above) cleans up both.

Verified locally:
  - go.mod has exactly one `github.com/go-jose/go-jose/v4 v4.1.4`
    line (in the direct require block); no `// indirect` duplicate.
  - go test -count=1 -short ./internal/api/acme/ green —
    confirms v4.1.4 has the same API surface (ParseSigned with
    SignatureAlgorithm allowlist, Header.ExtraHeaders[HeaderKey],
    JSONWebKey.Thumbprint(crypto.SHA256), Signer with
    SignerOptions.WithHeader). 16-case JWS verifier suite all
    pass.
  - go test -count=1 -short ./internal/service/ green.
  - go test -count=1 -short ./internal/api/handler/ -run TestACME
    green.
  - go build ./cmd/server → server binary clean.
2026-05-03 13:48:57 +00:00
shankar0123 4f42dfb053 ci: fix Phase 1b post-push CI failures (3 guards)
Phase 1b push (commit 27bd660) failed three CI guards. None were
caught by `make verify` locally because they're CI-only guards
that aren't part of the Makefile target. This commit fixes all
three.

1. go.mod tidy diff. The go-jose v4 dep was added with `// indirect`
   in go.mod after the initial `go get`, but the codebase imports it
   directly from internal/api/acme/jws.go + service/acme.go +
   handler/acme.go. CI's `go mod tidy && git diff --exit-code go.mod
   go.sum` flagged the staleness. Promoted to a direct require in
   the same `require (...)` block as github.com/aws/aws-sdk-go-v2
   etc.

2. G-3-env-docs-drift.sh. The guard greps `\bCERTCTL_[A-Z_]+\b` in
   docs/ and complains when the bare-prefix forms don't match
   anything defined in config.go. Phase 1a + 1b's docs/acme-server.md
   intro and migration header use bare-prefix forms `CERTCTL_ACME_*`
   and `CERTCTL_ACME_SERVER_*` to describe namespace separation
   (consumer-side ACMEConfig vs server-side ACMEServerConfig). Same
   precedent as the existing CERTCTL_SCEP_ + CERTCTL_TLS_ +
   CERTCTL_QA_* prefix entries already in the guard's ALLOWED list.
   Added CERTCTL_ACME_ + CERTCTL_ACME_SERVER_ to the ALLOWED list
   with a justification comment block matching the existing
   integration-surface allowlist convention.

3. openapi-handler-parity.sh. Distinct from
   internal/api/router/openapi_parity_test.go (which runs at `go
   test` time and has its own SpecParityExceptions map I extended
   in 1a + 1b) — this is a separate CI-only guard that reads
   api/openapi-handler-exceptions.yaml. The 6 Phase-1a routes + 4
   Phase-1b routes (10 ACME endpoints total) were never added to
   that yaml. Same rationale as the SCEP/SCEP-mTLS entries already
   in the file: ACME is a JWS-signed-JSON wire protocol per
   RFC 8555 + RFC 9773, not an OpenAPI-shape REST surface.
   Documenting every endpoint in openapi.yaml would duplicate the
   RFC. The canonical reference is docs/acme-server.md. Phases 2-4
   will add their routes to this yaml in lockstep with router.go.

Verified locally:
  - bash scripts/ci-guards/G-3-env-docs-drift.sh → clean.
  - bash scripts/ci-guards/openapi-handler-parity.sh → clean
    (152 router routes, 136 OpenAPI ops, 18 documented exceptions).
  - All other ci-guards/*.sh → clean.
  - go.mod diff after `go mod tidy` is empty.
2026-05-03 13:31:35 +00:00
shankar0123 27bd660e49 acme-server: account resource + JWS verifier (Phase 1b/7)
Layers JWS-authenticated POST machinery onto the Phase 1a foundation
(commit e146b00). After this commit, an ACME client can run

  POST /acme/profile/<id>/new-account

against certctl and successfully register an account. Account update
+ deactivation via POST /acme/profile/<id>/account/<acc-id> work.
Orders + challenges remain Phase 2 / 3.

Background:
  Two prior dispatch attempts at the original Phase 1 ("skeleton +
  directory + new-nonce + new-account" as a single commit) failed on
  go-jose v4 API speculation (jws.GetPayload, sig.Algorithm,
  jose.SHA256, etc. — none of those exist in v4). Splitting Phase 1
  into 1a (foundation, no go-jose) and 1b (this commit, all go-jose
  in one place) concentrated the JWS work where attention pays off.
  The verifier reads the actual go-jose v4 surface — ParseSigned with
  closed alg allow-list, Header struct fields (Algorithm, KeyID,
  JSONWebKey, Nonce, ExtraHeaders[HeaderKey]), JWK.Thumbprint with
  stdlib crypto.SHA256.

What ships:
  - internal/api/acme/jws.go: 487-line verifier + sentinel error
    family. Enforces RFC 8555 §6.2 + §6.4 + §6.5 invariants:
      - alg in {RS256, ES256, EdDSA} (closed allow-list passed to
        jose.ParseSigned — HS256 / none / etc. rejected at parse time)
      - exactly one of `kid` / `jwk` in protected header (per
        endpoint policy — new-account demands jwk, others demand kid)
      - protected `url` matches request URL exactly
      - protected `nonce` consumed against acme_nonces (badNonce on
        miss/replay/expiry per RFC 8555 §6.5.1)
      - kid round-trips against canonical AccountKID(accountID) URL
        (catches cross-profile / cross-host replay)
      - kid path: account exists + status=valid (deactivated /
        revoked accounts cannot authenticate)
      - signature verifies; post-Verify payload bytes equal
        UnsafePayloadWithoutVerification (defense in depth)
    + JWK persistence helpers (JWKToPEM / ParseJWKFromPEM round-
    trip a public-only JWK as a PEM-wrapped JSON envelope; stored
    as TEXT in acme_accounts.jwk_pem for diff-friendliness) +
    JWKThumbprint per RFC 7638.
  - internal/api/acme/jws_test.go: 16 cases covering happy paths
    (RS256 kid, ES256 jwk, EdDSA kid) + every named failure mode
    (alg-not-allowed, bad-sig, missing-nonce, unknown-nonce,
    replay, url-mismatch, mixed kid+jwk, deactivated-account,
    cross-host kid). Uses real keypairs + real go-jose Signer to
    build JWS objects.
  - internal/api/acme/account.go: NewAccountRequest /
    AccountUpdateRequest payload shapes (RFC 8555 §7.3 + §7.3.2 +
    §7.3.6) + AccountResponseJSON wire shape + MarshalAccount
    helper.
  - internal/domain/acme.go: ACMEAccount struct + ACMEAccountStatus
    closed enum (valid / deactivated / revoked).
  - internal/repository/postgres/acme.go: full account CRUD path
    (CreateAccountWithTx with 23505-unique-violation sentinel
    translation, GetAccountByID, GetAccountByThumbprint,
    UpdateAccountContactWithTx, UpdateAccountStatusWithTx) +
    sql.ErrNoRows-wrapped repository.ErrNotFound on lookup misses.
  - internal/service/acme.go: ACMERepo interface extended;
    SetTransactor + SetAuditService wires; NewAccount (idempotent
    re-registration per RFC 8555 §7.3.1 — same JWK returns existing
    row without an update or new audit event); LookupAccount;
    UpdateAccount; DeactivateAccount; VerifyJWS adapter that bridges
    api/acme.VerifierConfig to the service-layer ACMERepo; per-op
    metrics extended (new_account_total + _failures_total +
    _idempotent_total + update_account_total + _failures_total +
    deactivate_account_total).
  - internal/service/acme_test.go: 8 new tests covering
    new-account happy path / idempotent re-registration / only-
    return-existing match + no-match / contact update / deactivate
    / lookup-not-found / requires-transactor.
  - internal/api/handler/acme.go: NewAccount + Account handlers.
    Account dispatches POST-as-GET (RFC 8555 §6.3 — empty body or
    {} payload returns the account row), contact update, and
    deactivation from the same endpoint. Defense-in-depth check
    that the kid path-segment matches the URL path-segment (the
    verifier already round-tripped the kid against canonical URL,
    but the handler re-asserts to catch any future verifier
    refactor).
  - internal/api/handler/acme_handler_test.go: 7 new cases
    covering happy-create, idempotent-200, only-return-existing-
    no-match-400, malformed-JWS-400, kid-URL-mismatch-401,
    deactivate, contact-update, POST-as-GET.
  - internal/api/router/router.go: 4 new Register calls (per-
    profile + shorthand for new-account and account/{acc_id}).
  - internal/api/router/openapi_parity_test.go: SpecParityExceptions
    extended with the 4 new routes (RFC 8555 wire-protocol surface,
    not OpenAPI-shaped — same precedent as Phase 1a).
  - cmd/server/main.go: SetTransactor + SetAuditService on
    acmeService at startup so the WithinTx-based new-account /
    update / deactivate paths run with the same transactor instance
    shared across CertificateService / RevocationSvc / RenewalService.
  - docs/acme-server.md: Phase status updated; endpoints table grows
    new-account + account/<acc_id> rows; new "JWS verification
    (Phase 1b)" section enumerates the 7 invariants the verifier
    enforces; phases-cross-reference table marks 1b live.
  - go.mod / go.sum: github.com/go-jose/go-jose/v4 v4.0.4 added.

Atomicity: every account-state mutation writes its acme_accounts row
+ its audit_events row inside one repository.Transactor.WithinTx
call — the canonical certctl atomicity contract (matches
CertificateService.Create at internal/service/certificate.go:131).
Idempotent re-registration explicitly does NOT write an audit row
(RFC 8555 §7.3.1 returns the existing row unmodified).

Tests: 16 jws_test.go cases + 11 service tests + 11 handler tests
all pass under -short. Bad-signature test uses a real registered
account whose stored JWK is a different keypair from the signer's,
so the JWS parses cleanly but jose.Verify rejects — exercises the
ErrJWSSignatureInvalid path directly.

Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-1b".
2026-05-03 13:21:56 +00:00
shankar0123 4b73344acf scheduler: bound renewal concurrency via CERTCTL_RENEWAL_CONCURRENCY
Closes the #9 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit. Pre-fix, JobService.ProcessPendingJobs ran every
claimed job sequentially in a single goroutine: safe but slow, and
operators with large fleets had no lever to dial throughput up.
Switching to fire-and-forget per-job goroutines would have unbounded
the upstream-CA call rate and tripped DigiCert / Entrust / Sectigo
rate limits — certctl's response to 429 was to retry on the next
tick, re-fanning out the same calls and digging deeper into the
limit. Operators need a knob.

This commit:

- Adds CERTCTL_RENEWAL_CONCURRENCY env var (default 25) loaded via
  the existing getEnvInt pattern in internal/config/config.go.
  Documented inline as the cap for the per-tick renewal/issuance/
  deployment goroutine fan-out, with operator-tuning guidance:
  permissive upstream limits + large fleets (>10k certs) → 100;
  strict limits or async-CA-heavy fleets → 25 or lower.

- Wires golang.org/x/sync/semaphore.Weighted around the per-job
  goroutine launch in JobService.ProcessPendingJobs. Acquire(ctx, 1)
  is the load-bearing piece — it BLOCKS the loop when at the cap,
  providing real backpressure rather than fire-and-forget. The
  fan-out is split into processPendingJobsSequential (legacy,
  preserved for unit-test wiring that doesn't call
  SetRenewalConcurrency) and processPendingJobsConcurrent (production,
  delegates to a generic boundedFanOut helper).

- boundedFanOut takes the per-job work as a closure so the cap can
  be tested directly without standing up the renewal/deployment
  service graph. processed/failed counters use atomic.Int64 to
  avoid mutex overhead on every job completion; final log line
  reads both AFTER wg.Wait so the counts reflect every dispatched
  job. ctx-aware Acquire ensures a shutdown ctx cancel interrupts
  the dispatch loop promptly; in-flight goroutines drain via Wait
  before the function returns so no goroutine outlives the
  scheduler tick.

- shouldSkipJob extracted as a package-private helper so the
  agent-routed-deployment skip logic is shared between the
  sequential and concurrent paths byte-for-byte (the audit prompt's
  "channel-based semaphore without ctx-aware acquire" anti-pattern
  is explicitly avoided — semaphore.Weighted.Acquire returns on ctx
  done; channel <- struct{}{} would block forever).

- SetRenewalConcurrency setter on JobService normalises ≤0 to 1.
  semaphore.NewWeighted(0) constructs a semaphore that blocks every
  Acquire forever; the normalisation prevents a misconfigured env
  var from wedging the scheduler.

- cmd/server/main.go wires SetRenewalConcurrency(cfg.Scheduler.
  RenewalConcurrency) on the freshly-built jobService, immediately
  after SetAuditService. Production deployments always take the
  bounded path; tests that build JobService directly via
  NewJobService keep their strict-sequential behaviour because
  renewalConcurrency is the zero value.

- Tests in internal/service/job_concurrency_test.go:
  * TestBoundedFanOut_CapHolds — primary regression guard. 50 jobs
    × 50ms work × cap=5 → asserts peak in-flight never exceeds 5
    AND reaches 5 at least once (catches both upper-bound regressions
    and gates that incorrectly cap below the configured value).
    Lock-free max via CompareAndSwap so the measurement instrument
    doesn't itself constrain concurrency.
  * TestBoundedFanOut_AllJobsRun — lower-bound: every non-skipped
    job is dispatched.
  * TestBoundedFanOut_SkipsAgentRoutedDeployments — pins the
    shouldSkipJob contract.
  * TestBoundedFanOut_CtxCancelInterrupts — ctx cancellation
    interrupts a stuck fan-out within the timeout budget.
  * TestBoundedFanOut_FailedJobsCounted — per-job errors don't
    abort the fan-out.
  * TestSetRenewalConcurrency_NormalizesNonPositive — ≤0 → 1 fail-safe
    pinned across negative/zero/positive inputs.

- docs/features.md: scheduler-loop table augmented with the
  concurrency-cap env-var pointer alongside the job-processor row.

- docs/architecture.md: Concurrency Safety section gains a paragraph
  explaining the cap, the operator-tuning guidance, the ctx-aware
  Acquire semantics, and the audit reference.

Operator-facing impact: the first big renewal sweep no longer
takes down the upstream CA's rate-limit budget. Existing deployments
get the bounded path automatically (default 25); operators can
override via env var without code changes.

Verified locally:
- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- go test -short -count=1 across service / scheduler / config /
  integration: green
- Six new tests under TestBoundedFanOut* + TestSetRenewalConcurrency*:
  green

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix #9.
2026-05-02 14:12:30 +00:00
shankar0123 6119f26b1b awsacmpca: replace stub client with AWS SDK v2 implementation
Closes the #1 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit. The production New() constructor previously hardcoded
&stubClient{}, which returned "AWS SDK client not initialized (stub)" on
every method. Tests passed green via NewWithClient mock injection — a
path the production constructor never took. AWSACMPCA was wired into
the factory, the seed file, the test suite, and marketing collateral
but did not actually issue, retrieve, or revoke certificates.

This commit:
- Adds aws-sdk-go-v2/{config,service/acmpca,aws} to go.mod (with
  acmpca/types as a sub-package). go mod tidy could not be completed
  in the sandbox due to virtiofs concurrent-open-file ceiling on the
  module cache; the require blocks were arranged manually so the three
  directly-imported packages are non-indirect. Build, vet, staticcheck,
  and the full test suite are green; operator should run `go mod tidy`
  on the workstation to confirm cosmetic ordering before pushing.
- Implements sdkClient wrapping *acmpca.Client with local input/output
  type translation. Each method translates the connector's local input
  type to the SDK's typed input, calls the SDK, and translates the SDK
  output back to the local output type. aws-sdk-go-v2 types do not
  leak out of the awsacmpca package.
- Deletes stubClient (the four "AWS SDK client not initialized (stub)"
  methods). After this commit, there is no fall-back stub; production
  New() always wires the SDK.
- Rewrites New() to load credentials via awsconfig.LoadDefaultConfig
  with awsconfig.WithRegion(config.Region) and construct the SDK client
  via acmpca.NewFromConfig. Returns (*Connector, error). When config
  is nil or config.Region is empty, New defers SDK loading; ValidateConfig
  builds the client lazily on the first successful validation. This
  preserves the test pattern of New(nil, logger) → ValidateConfig.
- Wires acmpca.NewCertificateIssuedWaiter (5-minute default timeout)
  inside sdkClient.IssueCertificate so the connector's two-call
  pattern (IssueCertificate → GetCertificate) sees synchronous-via-
  waiter semantics. The waiter is hidden from the ACMPCAClient
  interface so mock implementations stay simple.
- Maps RFC 5280 revocation reasons to acmpcatypes.RevocationReason
  via the existing mapRevocationReason helper plus a cast at the
  sdkClient.RevokeCertificate boundary.
- Updates the issuerfactory.NewFromConfig call site at factory.go:L88
  for the new (*Connector, error) signature; the factory's outer
  signature already returns (issuer.Connector, error) so the change
  is local.
- Adds nil-client guards on the four client-using connector methods
  (IssueCertificate, RevokeCertificate, GetCACertPEM, plus the
  RenewCertificate path via IssueCertificate). When the connector is
  used before ValidateConfig has been called, these methods fail-fast
  with a "client not initialized" sentinel error instead of panicking.
- Fixes the copy-paste env-var doc-comments at awsacmpca.go:L41,L45
  (CERTCTL_GOOGLE_CAS_PROJECT / CERTCTL_GOOGLE_CAS_CA_ARN →
  CERTCTL_AWS_PCA_REGION / CERTCTL_AWS_PCA_CA_ARN). The actual config
  loader at internal/config/config.go:L1556-L1561 already used the
  correct env-var names; only the doc-comments were wrong.
- Updates the package doc-comment at awsacmpca.go:L1-L36 to clarify
  the synchronous-via-waiter behavior (issuance is asynchronous at
  the API level; the waiter inside sdkClient.IssueCertificate hides
  the asynchrony).
- Adds TestNew_ProductionPath/ValidConfigBuildsRealClient: calls
  production New() (NOT NewWithClient) with a valid config, asserts
  err is nil, then calls IssueCertificate with a bogus CSR and asserts
  the resulting error is the expected PEM-decode error rather than
  the deleted stubClient's "client not initialized" sentinel. This is
  the regression-marker test the audit's D11 blocker called out as
  missing — if anyone re-introduces a stub-style placeholder from
  production New() in the future, this test fails.
- Adds TestNew_ProductionPath/NilConfigDefersClientInit: documents the
  lazy-init contract for the New(nil, logger) → ValidateConfig pattern.
- Adds TestNew_ProductionPath/ValidateConfigBuildsClientLazily: verifies
  that ValidateConfig wires the SDK client when New was called with
  nil config.
- Adds TestNew_ProductionPath/{Revoke,GetCAPEM}BeforeInitFailsFast:
  verifies the nil-client guards on the other client-using methods.
- Adds TestNew_ErrorPaths covering AccessDeniedException-shaped errors,
  transient 5xx errors, and ctx-cancel propagation via the existing
  mockACMPCAClient.
- Updates docs/connectors.md:L490-L555 with: the synchronous-via-waiter
  behavior, a complete IAM policy example scoped to the four ACM PCA
  actions, a worked POST /api/v1/issuers example, and a troubleshooting
  section with three known failure modes (AccessDeniedException,
  ResourceNotFoundException, waiter timeout).

Live AWS integration testing is intentionally not added: ACM PCA is a
Pro-tier feature in localstack and the existing interface-mock tests
cover correctness end-to-end. Operators with AWS credentials can
validate by following the worked example in docs/connectors.md.

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix #1 (Part 3, narrative section).
2026-05-01 23:13:59 +00:00
cowork 9a01ec4023 Bundle Q (Coverage Audit Closure): property-based pilot + hygiene — L-001/L-002/L-003/L-004/I-001 closed
Five small closures wrapping the Low-tier and Info-tier audit findings.

Q.1 — cmd/cli round-out (L-001 closed)
======================================
cmd/cli/dispatch_test.go: ~30 dispatch tests across handleCerts /
handleAgents / handleJobs / handleImport / handleStatus. httptest.NewTLSServer
mocks the API; cli.NewClient(_, _, _, _, true) constructs an
insecure-skip-verify client. Each test pins the missing-args usage-print
path AND the happy-path delegation. Result: 7.1% -> 63.5% coverage
(gate: >=30%).

Q.2 — awssm round-out (L-002 closed)
======================================
internal/connector/discovery/awssm/awssm_edge_test.go: New() default
constructor, extractKeyInfo (ECDSA/Ed25519/unknown — was RSA-only),
processSecret filter arms (NamePrefix mismatch / TagFilter mismatch /
empty-value / GetSecretValue error), realSMClient stub-contract pin
(ListSecrets / GetSecretValue / NewRealSMClient), and EmailAddresses
SAN extraction. Result: 78.2% -> 96.0% coverage (gate: >=85%).

Q.3 — Property-based testing pilot (L-003 closed)
======================================
gopter@v0.2.11 added to go.mod (test-only).

internal/crypto/encryption_property_test.go:
- TestProperty_EncryptDecryptRoundTrip — 50 successful tests,
  DecryptIfKeySet(EncryptIfKeySet(x, k), k) == x
- TestProperty_WrongPassphraseRejected — 30 successful tests,
  AEAD never returns nil-error AND bytes-equal plaintext under
  wrong passphrase
Both skipped under -short to keep developer loop fast (PBKDF2 600k
rounds × 50 iters ≈ 15s on -race CI).

internal/pkcs7/length_property_test.go:
- TestProperty_ASN1LengthRoundTrip — three sub-properties:
  decodeLength(encode(x)) == x for x ∈ [0, 2³¹−1]; short-form
  invariant (length<128 → 1 byte == length); long-form invariant
  (length>=128 → high bit set + N bytes follow). 500 successful
  tests in <10ms.

Q.4 — Architecture diagram multi-agent update (L-004 closed)
======================================
docs/qa-test-guide.md::Architecture: ASCII diagram updated to show
'certctl-agent (×N)' + callout explaining seed_demo.sql provisions
12 agent rows (1 active, 2 retired, 9 reserved/sentinel) for Parts
04, 05, 55 + FSM coverage. Operators running parallel-agent topologies
guided to AGENT_COUNT=N + 'make qa-stats'.

Q.5 — Test-naming CI guard (I-001 closed)
======================================
.github/workflows/ci.yml: Test-naming convention guard added after
the QA-doc seed-count drift guard. Greps for func Test<X>( missing
the <X>_<Scenario> suffix. Prints first 20 non-conformant as
::warning:: annotations. continue-on-error: true (informational).
Excludes TestMain + TestProperty_*. Promotion to hard-fail tracked
as I-001-extended.

Verification
======================================
- python3 yaml.safe_load on ci.yml: OK
- go vet ./cmd/cli/... ./internal/connector/discovery/awssm/...
  ./internal/crypto/... ./internal/pkcs7/...: clean
- go test -short -count=1 across all four packages: PASS
- go test -count=1 (full property tests): PASS
  - crypto 15.4s (50 + 30 × 600k PBKDF2)
  - pkcs7 5ms

Audit deliverables
======================================
- gap-backlog.md: strikethroughs on L-001/L-002/L-003/L-004/I-001
  with per-finding closure note
- closure-plan.md: ticks Bundle Q [x] with per-item breakdown

Closes: L-001, L-002, L-003, L-004, I-001
Bundle: Q (Property-Based + Hygiene)
2026-04-27 18:36:47 +00:00
Shankar e776327f71 Bundle E: Mechanical sweeps & defensive polish — 6 findings closed; L-004 deferred
Closes L-009 + L-010 + L-011 + L-013 + L-020 + L-021 from
comprehensive-audit-2026-04-25. L-004 deferred — recon found NO
rotation infrastructure exists at all; building it from scratch is
a feature project, not a Bundle-E mechanical sweep.

L-009 — ZeroSSL EAB URL configurable
  Audit's 'no timeout' claim was wrong: ari.go:329 has 15s timeout.
  internal/connector/issuer/acme/acme.go: zeroSSLEABEndpoint now
  lazily reads CERTCTL_ZEROSSL_EAB_URL from env at package init;
  defaults to ZeroSSL public endpoint. Pre-existing test override
  path preserved.

L-010 — Verified-already-clean
  grep -rn 'mock\.Anything' --include='*_test.go' . returned 0.
  certctl uses hand-rolled struct mocks (mockJobRepo, mockAuditRepo,
  etc.) with explicit method bodies; no testify-style mocks anywhere.

L-011 — IPv6 bracket-aware dialing pinned
  Every production net.Dial / DialTimeout site audited:
    cmd/agent/main.go:293 — intentional IPv4 literal '8.8.8.8:80'
    verify.go / tlsprobe / network_scan — net.Dialer (no string addr)
    email.go — net.JoinHostPort (bracket-aware)
    ssh.go — addr derives from JoinHostPort upstream
    ssrf.go — net.Dialer
  internal/connector/notifier/email/email_ipv6_test.go (NEW):
    TestJoinHostPort_IPv6BracketsRoundTrip pins IPv4/IPv6/zone variants;
    TestSMTPDialerUsesJoinHostPort source-greps email.go and fails CI
    if a future refactor swaps in 'host:port' concatenation.

L-013 — Verified-already-clean (monotonic-safe)
  Only one site uses now.Sub: middleware.go:393 in tokenBucket.allow().
  Both 'now' and tb.lastRefill come from time.Now() which carries
  monotonic-clock readings per Go's time package contract;
  intra-process now.Sub is monotonic-safe by construction. Doc
  comment block added above the call to make the invariant explicit.

L-020 (CWE-563) — ineffassign sweep, 8 unique sites
  certificate.go:135 — sortDir initial value dropped (set
    unconditionally below by SortDesc branch).
  certificate.go:169,175 — argCount post-increments dropped (var
    not read past the LIMIT/OFFSET formatting).
  agent_group.go, profile.go — page/perPage truly vestigial,
    replaced with _ = page; _ = perPage.
  issuer.go:633, owner.go:131, target.go:267, team.go:131 — same
    treatment for the audit-flagged second-function ListXxx clamps.
  First-function List() in issuer/owner/target/team KEEPS its
    clamp because page/perPage is used for in-memory slice
    pagination — ineffassign correctly didn't flag those.
  Build + tests green post-sweep.

L-021 — Transitive CVE bump
  go get golang.org/x/crypto@v0.45.0 golang.org/x/net@v0.47.0
    (crypto required net@0.47.0). go-text@v0.31.0 transitively
    bumped.
  Per tool-output govulncheck-verbose: x/net@v0.45.0 fixes
    GO-2026-4441 + GO-2026-4440; x/crypto@v0.45.0 fixes
    GO-2025-4134 + GO-2025-4135 + GO-2025-4116 — all 5 advisories
    cleared. Bundle B's ISV grep guard + Bundle D's release-time
    govulncheck step are the going-forward monitor + bump pass.

L-004 — Deferred to dedicated bundle
  Recon: zero hits for RotateAPIKey / rotated_at / key_status
    anywhere in source. API keys configured via
    CERTCTL_API_KEYS_NAMED env var; rotation is operator-managed
    (edit env + restart). Building rotation infrastructure from
    scratch is a feature project, not a mechanical sweep.
  Documented in audit-report.md with scope-pivot note.

Audit deliverables:
  audit-report.md: score 46/55 -> 52/55 closed
    (Low 14/19 -> 19/19 — 100% Low closed except L-004 deferred)
  findings.yaml: 6 status flips
  certctl/CHANGELOG.md: Bundle E section

Verification:
  go test -count=1 -short ./internal/service ./internal/connector/issuer/acme
    ./internal/connector/notifier/email                      green
  go vet on changed packages                                  clean
2026-04-27 01:17:15 +00:00
Shankar 8fe00870e4 chore: bump Go to 1.25.9 to patch 4 stdlib CVEs
Go 1.25.9 (released Apr 7 2026) fixes:
- GO-2026-4947: unexpected work during chain building in crypto/x509
- GO-2026-4946: inefficient policy validation in crypto/x509
- GO-2026-4870: unauthenticated TLS 1.3 KeyUpdate DoS in crypto/tls
- GO-2026-4865: JsBraceDepth context tracking XSS in html/template

Update CI workflow and go.mod to pin 1.25.9. govulncheck now reports
0 vulnerabilities in called code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:33:25 -04:00
Shankar 3cf75ffb73 feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.

Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard

Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 12:36:01 -04:00
Shankar e555af9288 feat(M39): IIS WinRM proxy agent mode + front-to-back wiring
Complete the IIS target connector with dual-mode deployment:
- WinRM proxy agent mode via masterzen/winrm for remote Windows servers
- Base64 PFX transfer with try/finally cleanup on remote host
- GUI wizard updated with 13 IIS config fields including WinRM settings
- TargetDetailPage sensitive field redaction (password/secret/token/key)
- OpenAPI TargetType enum updated (added Traefik, Caddy)
- connectors.md fully documented with WinRM proxy config example
- 38 total IIS tests (10 new WinRM tests), all passing with race detection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 20:53:20 -04:00
Shankar cf632c0af4 fix: end-to-end certificate lifecycle bugs + integration test environment
Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:

ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)

step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)

NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional

Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission

Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
  discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
  pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 17:02:20 -04:00
Shankar c014c17cc6 feat(m27): certificate export (PEM/PKCS#12) and S/MIME EKU support
Add certificate export in PEM (JSON or file download) and PKCS#12 formats.
Private keys are never included — they stay on agents. Add EKU-aware
issuance threading profile EKUs (serverAuth, clientAuth, codeSigning,
emailProtection, timeStamping) through the full issuance pipeline. Fix
agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for
S/MIME vs TLS, and a pre-existing generateID collision bug in deployment
job creation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-28 16:16:19 -04:00
Shankar 05e2a7406f chore: update go.sum with testcontainers-go dependencies
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 22:58:10 -04:00
Shankar b898317d0f docs: synchronize project documentation with codebase
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010)
and performs comprehensive documentation audit to eliminate drift between
code and docs.

Code changes:
- TICKET-003: Repository integration tests with testcontainers-go (50+ subtests)
- TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc
- TICKET-010: Request body size limits via http.MaxBytesReader middleware
- Fix missing slog import in certificate.go after service decomposition

Documentation updates:
- README: Fix endpoint count (97→93), expand env var reference (15→39 vars)
- CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations
- architecture.md: Add body size limits section, middleware chain ordering
- CONTRIBUTING.md: New contributor guide with architecture conventions,
  test patterns, middleware ordering, CI thresholds
- SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored)
- Test files: Add doc comments to all new test files

Documentation that should exist but doesn't yet:
- Architecture diagrams (C4 model or similar)
- Threat model document
- Testing philosophy guide
- Disaster recovery runbook
- Upgrade guide (migration between versions)
- API versioning strategy document

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 22:28:54 -04:00
Shankar d460950cce fix: Go 1.25 upgrade, codebase audit fixes, MCP server tests
Upgrade from Go 1.22 to 1.25 (minimum for MCP SDK, actively supported).
CI updated to match.

Codebase audit fixes:
- Local CA parseIP() now uses net.ParseIP — IP SANs no longer silently dropped
- Nil pointer guards in agent.go GetWorkWithTargets for target/cert enrichment
- MCP CreateCertificateInput marks owner_id/team_id as required
- NGINX connector uses CombinedOutput() — captures diagnostic output on failure
- Jobs handler validates JSON decode on rejection body — returns 400 on malformed
- CRL/OCSP handlers propagate requestID for error tracing

MCP server tests (26 tests):
- client_test.go: HTTP client coverage (GET/POST/PUT/DELETE, auth, 204, errors, binary)
- tools_test.go: tool registration, pagination, end-to-end flows with mock API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 17:36:25 -04:00
Shankar 7c8d4d29ff fix: pin Go version to 1.23 (minimum for MCP SDK compatibility)
The MCP Go SDK (modelcontextprotocol/go-sdk) requires Go 1.23+. Previous
commit accidentally bumped to 1.25 via go mod tidy on a newer toolchain.
Pin to 1.23 as the minimum compatible version — closest to our original
1.22 baseline. CI updated to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 17:05:45 -04:00
Shankar 4232b4acc7 fix: add go.sum and indirect deps for MCP SDK
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 17:00:30 -04:00
Shankar 89fe48923d feat: M18a — MCP server exposing all 76 API endpoints as AI-native tools
Separate standalone binary (cmd/mcp-server/) using official MCP Go SDK
(modelcontextprotocol/go-sdk v1.4.1) with stdio transport. Stateless HTTP
proxy translates MCP tool calls to certctl REST API requests. 76 tools
across 16 resource domains with typed input structs and jsonschema tags
for automatic LLM-friendly schema generation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 16:49:39 -04:00
Shankar 9b9aca3874 Downgrade x/crypto from v0.49.0 to v0.31.0 for Go 1.22 compatibility
golang.org/x/crypto v0.49.0 requires Go 1.25.0 which doesn't exist
yet, breaking both Docker builds and CI. Downgraded to v0.31.0 which
requires only Go 1.20+ and includes the same stable ACME v2 package.

Note: go.sum needs regeneration. Run `go mod tidy` before building.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 15:21:44 -04:00
Shankar c0cdc81a94 Fix go.mod: downgrade from non-existent Go 1.25 to Go 1.22
go.mod was set to go 1.25.0 which doesn't exist yet, causing Docker
builds to fail with "go.mod requires go >= 1.25.0 (running go 1.22.12)".
The project uses Go 1.22 as documented in Dockerfiles, CI, and README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 15:03:00 -04:00
Shankar ab79dead13 Complete M1, M1.1, M2: end-to-end lifecycle, agent deployment, ACME v2
- Wire issuer connector end-to-end with IssuerConnectorAdapter (dependency inversion)
- Renewal/issuance job processor: RSA key + CSR generation, Local CA signing, cert version storage
- Agent work API (GET /agents/{id}/work) and job status API (POST /agents/{id}/jobs/{job_id}/status)
- Agent-side deployment: WorkItem enrichment with target type/config, NGINX/F5/IIS connector invocation
- Full ACME v2 implementation: HTTP-01 challenge solving, account registration, order lifecycle
- Update all docs (README, architecture, connectors, demo-advanced, quickstart) for M1-M2
- Fix go vet warning in deployment.go (non-constant format string)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 23:49:45 -04:00
shankar0123 3a9fe8ba37 Complete V1 scaffold 2026-03-14 20:01:53 -04:00
shankar0123 d395776a95 Initial scaffold: certificate control plane v0.1.0 2026-03-14 08:22:17 -04:00