Closes the #9 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit. Pre-fix, JobService.ProcessPendingJobs ran every
claimed job sequentially in a single goroutine: safe but slow, and
operators with large fleets had no lever to dial throughput up.
Switching to fire-and-forget per-job goroutines would have unbounded
the upstream-CA call rate and tripped DigiCert / Entrust / Sectigo
rate limits — certctl's response to 429 was to retry on the next
tick, re-fanning out the same calls and digging deeper into the
limit. Operators need a knob.
This commit:
- Adds CERTCTL_RENEWAL_CONCURRENCY env var (default 25) loaded via
the existing getEnvInt pattern in internal/config/config.go.
Documented inline as the cap for the per-tick renewal/issuance/
deployment goroutine fan-out, with operator-tuning guidance:
permissive upstream limits + large fleets (>10k certs) → 100;
strict limits or async-CA-heavy fleets → 25 or lower.
- Wires golang.org/x/sync/semaphore.Weighted around the per-job
goroutine launch in JobService.ProcessPendingJobs. Acquire(ctx, 1)
is the load-bearing piece — it BLOCKS the loop when at the cap,
providing real backpressure rather than fire-and-forget. The
fan-out is split into processPendingJobsSequential (legacy,
preserved for unit-test wiring that doesn't call
SetRenewalConcurrency) and processPendingJobsConcurrent (production,
delegates to a generic boundedFanOut helper).
- boundedFanOut takes the per-job work as a closure so the cap can
be tested directly without standing up the renewal/deployment
service graph. processed/failed counters use atomic.Int64 to
avoid mutex overhead on every job completion; final log line
reads both AFTER wg.Wait so the counts reflect every dispatched
job. ctx-aware Acquire ensures a shutdown ctx cancel interrupts
the dispatch loop promptly; in-flight goroutines drain via Wait
before the function returns so no goroutine outlives the
scheduler tick.
- shouldSkipJob extracted as a package-private helper so the
agent-routed-deployment skip logic is shared between the
sequential and concurrent paths byte-for-byte (the audit prompt's
"channel-based semaphore without ctx-aware acquire" anti-pattern
is explicitly avoided — semaphore.Weighted.Acquire returns on ctx
done; channel <- struct{}{} would block forever).
- SetRenewalConcurrency setter on JobService normalises ≤0 to 1.
semaphore.NewWeighted(0) constructs a semaphore that blocks every
Acquire forever; the normalisation prevents a misconfigured env
var from wedging the scheduler.
- cmd/server/main.go wires SetRenewalConcurrency(cfg.Scheduler.
RenewalConcurrency) on the freshly-built jobService, immediately
after SetAuditService. Production deployments always take the
bounded path; tests that build JobService directly via
NewJobService keep their strict-sequential behaviour because
renewalConcurrency is the zero value.
- Tests in internal/service/job_concurrency_test.go:
* TestBoundedFanOut_CapHolds — primary regression guard. 50 jobs
× 50ms work × cap=5 → asserts peak in-flight never exceeds 5
AND reaches 5 at least once (catches both upper-bound regressions
and gates that incorrectly cap below the configured value).
Lock-free max via CompareAndSwap so the measurement instrument
doesn't itself constrain concurrency.
* TestBoundedFanOut_AllJobsRun — lower-bound: every non-skipped
job is dispatched.
* TestBoundedFanOut_SkipsAgentRoutedDeployments — pins the
shouldSkipJob contract.
* TestBoundedFanOut_CtxCancelInterrupts — ctx cancellation
interrupts a stuck fan-out within the timeout budget.
* TestBoundedFanOut_FailedJobsCounted — per-job errors don't
abort the fan-out.
* TestSetRenewalConcurrency_NormalizesNonPositive — ≤0 → 1 fail-safe
pinned across negative/zero/positive inputs.
- docs/features.md: scheduler-loop table augmented with the
concurrency-cap env-var pointer alongside the job-processor row.
- docs/architecture.md: Concurrency Safety section gains a paragraph
explaining the cap, the operator-tuning guidance, the ctx-aware
Acquire semantics, and the audit reference.
Operator-facing impact: the first big renewal sweep no longer
takes down the upstream CA's rate-limit budget. Existing deployments
get the bounded path automatically (default 25); operators can
override via env var without code changes.
Verified locally:
- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- go test -short -count=1 across service / scheduler / config /
integration: green
- Six new tests under TestBoundedFanOut* + TestSetRenewalConcurrency*:
green
Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix#9.
Closes the #1 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit. The production New() constructor previously hardcoded
&stubClient{}, which returned "AWS SDK client not initialized (stub)" on
every method. Tests passed green via NewWithClient mock injection — a
path the production constructor never took. AWSACMPCA was wired into
the factory, the seed file, the test suite, and marketing collateral
but did not actually issue, retrieve, or revoke certificates.
This commit:
- Adds aws-sdk-go-v2/{config,service/acmpca,aws} to go.mod (with
acmpca/types as a sub-package). go mod tidy could not be completed
in the sandbox due to virtiofs concurrent-open-file ceiling on the
module cache; the require blocks were arranged manually so the three
directly-imported packages are non-indirect. Build, vet, staticcheck,
and the full test suite are green; operator should run `go mod tidy`
on the workstation to confirm cosmetic ordering before pushing.
- Implements sdkClient wrapping *acmpca.Client with local input/output
type translation. Each method translates the connector's local input
type to the SDK's typed input, calls the SDK, and translates the SDK
output back to the local output type. aws-sdk-go-v2 types do not
leak out of the awsacmpca package.
- Deletes stubClient (the four "AWS SDK client not initialized (stub)"
methods). After this commit, there is no fall-back stub; production
New() always wires the SDK.
- Rewrites New() to load credentials via awsconfig.LoadDefaultConfig
with awsconfig.WithRegion(config.Region) and construct the SDK client
via acmpca.NewFromConfig. Returns (*Connector, error). When config
is nil or config.Region is empty, New defers SDK loading; ValidateConfig
builds the client lazily on the first successful validation. This
preserves the test pattern of New(nil, logger) → ValidateConfig.
- Wires acmpca.NewCertificateIssuedWaiter (5-minute default timeout)
inside sdkClient.IssueCertificate so the connector's two-call
pattern (IssueCertificate → GetCertificate) sees synchronous-via-
waiter semantics. The waiter is hidden from the ACMPCAClient
interface so mock implementations stay simple.
- Maps RFC 5280 revocation reasons to acmpcatypes.RevocationReason
via the existing mapRevocationReason helper plus a cast at the
sdkClient.RevokeCertificate boundary.
- Updates the issuerfactory.NewFromConfig call site at factory.go:L88
for the new (*Connector, error) signature; the factory's outer
signature already returns (issuer.Connector, error) so the change
is local.
- Adds nil-client guards on the four client-using connector methods
(IssueCertificate, RevokeCertificate, GetCACertPEM, plus the
RenewCertificate path via IssueCertificate). When the connector is
used before ValidateConfig has been called, these methods fail-fast
with a "client not initialized" sentinel error instead of panicking.
- Fixes the copy-paste env-var doc-comments at awsacmpca.go:L41,L45
(CERTCTL_GOOGLE_CAS_PROJECT / CERTCTL_GOOGLE_CAS_CA_ARN →
CERTCTL_AWS_PCA_REGION / CERTCTL_AWS_PCA_CA_ARN). The actual config
loader at internal/config/config.go:L1556-L1561 already used the
correct env-var names; only the doc-comments were wrong.
- Updates the package doc-comment at awsacmpca.go:L1-L36 to clarify
the synchronous-via-waiter behavior (issuance is asynchronous at
the API level; the waiter inside sdkClient.IssueCertificate hides
the asynchrony).
- Adds TestNew_ProductionPath/ValidConfigBuildsRealClient: calls
production New() (NOT NewWithClient) with a valid config, asserts
err is nil, then calls IssueCertificate with a bogus CSR and asserts
the resulting error is the expected PEM-decode error rather than
the deleted stubClient's "client not initialized" sentinel. This is
the regression-marker test the audit's D11 blocker called out as
missing — if anyone re-introduces a stub-style placeholder from
production New() in the future, this test fails.
- Adds TestNew_ProductionPath/NilConfigDefersClientInit: documents the
lazy-init contract for the New(nil, logger) → ValidateConfig pattern.
- Adds TestNew_ProductionPath/ValidateConfigBuildsClientLazily: verifies
that ValidateConfig wires the SDK client when New was called with
nil config.
- Adds TestNew_ProductionPath/{Revoke,GetCAPEM}BeforeInitFailsFast:
verifies the nil-client guards on the other client-using methods.
- Adds TestNew_ErrorPaths covering AccessDeniedException-shaped errors,
transient 5xx errors, and ctx-cancel propagation via the existing
mockACMPCAClient.
- Updates docs/connectors.md:L490-L555 with: the synchronous-via-waiter
behavior, a complete IAM policy example scoped to the four ACM PCA
actions, a worked POST /api/v1/issuers example, and a troubleshooting
section with three known failure modes (AccessDeniedException,
ResourceNotFoundException, waiter timeout).
Live AWS integration testing is intentionally not added: ACM PCA is a
Pro-tier feature in localstack and the existing interface-mock tests
cover correctness end-to-end. Operators with AWS credentials can
validate by following the worked example in docs/connectors.md.
Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix#1 (Part 3, narrative section).
Closes L-009 + L-010 + L-011 + L-013 + L-020 + L-021 from
comprehensive-audit-2026-04-25. L-004 deferred — recon found NO
rotation infrastructure exists at all; building it from scratch is
a feature project, not a Bundle-E mechanical sweep.
L-009 — ZeroSSL EAB URL configurable
Audit's 'no timeout' claim was wrong: ari.go:329 has 15s timeout.
internal/connector/issuer/acme/acme.go: zeroSSLEABEndpoint now
lazily reads CERTCTL_ZEROSSL_EAB_URL from env at package init;
defaults to ZeroSSL public endpoint. Pre-existing test override
path preserved.
L-010 — Verified-already-clean
grep -rn 'mock\.Anything' --include='*_test.go' . returned 0.
certctl uses hand-rolled struct mocks (mockJobRepo, mockAuditRepo,
etc.) with explicit method bodies; no testify-style mocks anywhere.
L-011 — IPv6 bracket-aware dialing pinned
Every production net.Dial / DialTimeout site audited:
cmd/agent/main.go:293 — intentional IPv4 literal '8.8.8.8:80'
verify.go / tlsprobe / network_scan — net.Dialer (no string addr)
email.go — net.JoinHostPort (bracket-aware)
ssh.go — addr derives from JoinHostPort upstream
ssrf.go — net.Dialer
internal/connector/notifier/email/email_ipv6_test.go (NEW):
TestJoinHostPort_IPv6BracketsRoundTrip pins IPv4/IPv6/zone variants;
TestSMTPDialerUsesJoinHostPort source-greps email.go and fails CI
if a future refactor swaps in 'host:port' concatenation.
L-013 — Verified-already-clean (monotonic-safe)
Only one site uses now.Sub: middleware.go:393 in tokenBucket.allow().
Both 'now' and tb.lastRefill come from time.Now() which carries
monotonic-clock readings per Go's time package contract;
intra-process now.Sub is monotonic-safe by construction. Doc
comment block added above the call to make the invariant explicit.
L-020 (CWE-563) — ineffassign sweep, 8 unique sites
certificate.go:135 — sortDir initial value dropped (set
unconditionally below by SortDesc branch).
certificate.go:169,175 — argCount post-increments dropped (var
not read past the LIMIT/OFFSET formatting).
agent_group.go, profile.go — page/perPage truly vestigial,
replaced with _ = page; _ = perPage.
issuer.go:633, owner.go:131, target.go:267, team.go:131 — same
treatment for the audit-flagged second-function ListXxx clamps.
First-function List() in issuer/owner/target/team KEEPS its
clamp because page/perPage is used for in-memory slice
pagination — ineffassign correctly didn't flag those.
Build + tests green post-sweep.
L-021 — Transitive CVE bump
go get golang.org/x/crypto@v0.45.0 golang.org/x/net@v0.47.0
(crypto required net@0.47.0). go-text@v0.31.0 transitively
bumped.
Per tool-output govulncheck-verbose: x/net@v0.45.0 fixes
GO-2026-4441 + GO-2026-4440; x/crypto@v0.45.0 fixes
GO-2025-4134 + GO-2025-4135 + GO-2025-4116 — all 5 advisories
cleared. Bundle B's ISV grep guard + Bundle D's release-time
govulncheck step are the going-forward monitor + bump pass.
L-004 — Deferred to dedicated bundle
Recon: zero hits for RotateAPIKey / rotated_at / key_status
anywhere in source. API keys configured via
CERTCTL_API_KEYS_NAMED env var; rotation is operator-managed
(edit env + restart). Building rotation infrastructure from
scratch is a feature project, not a mechanical sweep.
Documented in audit-report.md with scope-pivot note.
Audit deliverables:
audit-report.md: score 46/55 -> 52/55 closed
(Low 14/19 -> 19/19 — 100% Low closed except L-004 deferred)
findings.yaml: 6 status flips
certctl/CHANGELOG.md: Bundle E section
Verification:
go test -count=1 -short ./internal/service ./internal/connector/issuer/acme
./internal/connector/notifier/email green
go vet on changed packages clean
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.
Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard
Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete the IIS target connector with dual-mode deployment:
- WinRM proxy agent mode via masterzen/winrm for remote Windows servers
- Base64 PFX transfer with try/finally cleanup on remote host
- GUI wizard updated with 13 IIS config fields including WinRM settings
- TargetDetailPage sensitive field redaction (password/secret/token/key)
- OpenAPI TargetType enum updated (added Traefik, Caddy)
- connectors.md fully documented with WinRM proxy config example
- 38 total IIS tests (10 new WinRM tests), all passing with race detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add certificate export in PEM (JSON or file download) and PKCS#12 formats.
Private keys are never included — they stay on agents. Add EKU-aware
issuance threading profile EKUs (serverAuth, clientAuth, codeSigning,
emailProtection, timeStamping) through the full issuance pipeline. Fix
agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for
S/MIME vs TLS, and a pre-existing generateID collision bug in deployment
job creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit downgraded x/crypto but go.sum was missing the
v0.31.0 hashes, causing CI to fail with "missing go.sum entry".
Hashes sourced from sum.golang.org.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
golang.org/x/crypto v0.49.0 requires Go 1.25.0 which doesn't exist
yet, breaking both Docker builds and CI. Downgraded to v0.31.0 which
requires only Go 1.20+ and includes the same stable ACME v2 package.
Note: go.sum needs regeneration. Run `go mod tidy` before building.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wire issuer connector end-to-end with IssuerConnectorAdapter (dependency inversion)
- Renewal/issuance job processor: RSA key + CSR generation, Local CA signing, cert version storage
- Agent work API (GET /agents/{id}/work) and job status API (POST /agents/{id}/jobs/{job_id}/status)
- Agent-side deployment: WorkItem enrichment with target type/config, NGINX/F5/IIS connector invocation
- Full ACME v2 implementation: HTTP-01 challenge solving, account registration, order lifecycle
- Update all docs (README, architecture, connectors, demo-advanced, quickstart) for M1-M2
- Fix go vet warning in deployment.go (non-constant format string)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>