mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 18:01:37 +00:00
fb54ebcb62980ca35b9044aa929cec867074f002
77 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
aebfd8bd7c |
Revert "chore: drop 'Infisical' label from internal references"
This reverts commit
|
||
|
|
19706e56b3 |
chore: drop 'Infisical' label from internal references
Strategic naming cleanup. Earlier doc-comments + commit messages framed Rank 4 / Rank 5 / Rank 7 work as 'Rank N of the 2026-05-03 Infisical deep-research deliverable' — the 'Infisical' qualifier was a holdover from the original deep-research framing where Infisical (a competing secrets-management platform) was the comparator. Keeping the comparator's name in our source adds noise without value; an external reader sees 'Infisical' and assumes a dependency or shared lineage rather than reading it as the competitive context it was. Mechanical sed across 34 files (32 source / docs + 2 follow-up Python passes to collapse 'deep-research deep-research' duplicates that emerged where the original phrase wrapped across lines): s|Infisical deep-research|deep-research|g s|infisical-deep-research-results|deep-research-results-2026-05-03|g s|infisical-deep-research-prompt|deep-research-prompt-2026-05-03|g s|infisical-deep-research|deep-research|g s|Infisical|deep-research|g s|deep-research deep-research|deep-research|g # collapse-pass Net diff: 63 insertions / 64 deletions across cmd/, docs/, internal/, migrations/. Pure text substitution; zero behavior change. Code path unchanged — go vet clean, tests for TestApproval pass on both internal/service and internal/api/handler packages. Workspace docs (cowork/) carry the same references and will be swept separately — they're not under certctl/ git control. The two filename references (cowork/infisical-deep-research-results.md + cowork/infisical-deep-research-prompt.md) get renamed alongside that sweep to deep-research-results-2026-05-03.md / deep-research-prompt-2026-05-03.md so cross-references in the certctl repo doc-comments resolve cleanly. |
||
|
|
0729ee46e0 |
chore: sweep github.com/shankar0123/certctl URL refs to certctl-io/certctl
Post-transfer cosmetic + release-critical URL refresh after moving the
repo from github.com/shankar0123/certctl to github.com/certctl-io/certctl
(2026-05-03). GitHub HTTP redirects continue to forward old URLs forever,
so existing operators are not broken — but aligns the canonical
references with the new owner so:
- procurement engineers / contributors browsing the docs see the right
URL on first read
- operators copying the agent install one-liner hit the new path
directly without going through a redirect
- the Helm chart's default image repository points at the canonical org
registry path
- the OnboardingWizard rendered to first-run UI users shows the new
URL in the install snippets and doc anchor links
- the GitHub Actions release workflow pushes container images to
ghcr.io/certctl-io/certctl-{server,agent} (was: shankar0123)
- the release-notes Markdown body in release.yml — which gets stamped
into every future release page — references the post-transfer
cert-identity (cosign keyless signing now uses the certctl-io
workflow URL) and the post-transfer SLSA provenance source-uri.
Without this, every cosign verify / slsa-verifier command on a
v2.1.0+ release would fail because the cert-identity-regexp would
not match the signing identity GitHub Actions OIDC issues post-
transfer. Old releases (v2.0.67 and earlier) keep their immutable
release-notes pointing at the shankar0123 path and remain
verifiable via their own published instructions.
Customer impact:
- Operators on ghcr.io/shankar0123/certctl-{server,agent}:latest
silently freeze on whatever tag was current at transfer time. They
get no errors; they just stop receiving updates. The next release
notes need a one-line callout (Phase 3.1 of cowork/transfer-
certctl-to-org.md) telling them to update their image path to
ghcr.io/certctl-io/certctl-{server,agent}.
- All other URLs (git clone, install one-liner, raw.githubusercontent
URLs, browser links, GitHub API) continue to resolve via permanent
HTTP redirects. The sweep is cosmetic for those.
Files swept (30 total):
.github/workflows/release.yml — IMAGE_NAMESPACE, source-uri,
cosign cert-identity-regexp, IMAGE= snippet (5 refs total).
CHANGELOG.md, README.md — anchor links, badges, install one-liner,
cosign verify snippets in operator-facing sections.
api/openapi.yaml — info / externalDocs URLs.
install-agent.sh — GITHUB_REPO const + systemd unit Documentation=
field.
deploy/ENVIRONMENTS.md, deploy/helm/{CHART_SUMMARY,INDEX,
INSTALLATION,README}.md, deploy/helm/certctl/{Chart.yaml,
README.md,values.yaml}, deploy/helm/examples/values-*.yaml —
chart docs + image repository defaults across dev / prod-ha
overrides.
docs/{certctl-for-cert-manager-users,connector-iis,connectors,
migrate-from-acmesh,migrate-from-certbot,quickstart,test-env,
why-certctl}.md — operator-facing doc URLs.
examples/{acme-nginx,acme-wildcard-dns01,multi-issuer,
private-ca-traefik,step-ca-haproxy}/docker-compose.yml +
examples/step-ca-haproxy/step-ca-haproxy.md — example image:
paths and accompanying narrative.
web/src/pages/OnboardingWizard.tsx — first-run-UI URL refs (curl
install one-liners, agent docker image path, doc anchor links).
Files intentionally NOT swept (Choice A from cowork/transfer-certctl-
to-org.md):
go.mod, go.sum — module declaration stays github.com/shankar0123/
certctl. Existing imports compile because Go uses the path
declared in go.mod, not the URL it was fetched from. Internal-
only project; no external Go consumers; rename will land as a
mechanical sed when one materializes.
~250 *.go files — every import remains github.com/shankar0123/
certctl/internal/...
deploy/test/f5-mock-icontrol/go.mod — separate test sub-module;
same Choice A logic; module path stays.
Files intentionally NOT swept (other reasons):
README.md lines 244-245 — Scarf-pixel docker-pull commands.
shankar0123.docker.scarf.sh/... is a Scarf-account hostname
(per-user, not per-repo) and the pixel keeps tracking pulls
against the operator's personal Scarf account. Migrating to a
certctl-io Scarf account is a separate decision (create org
Scarf account → re-create package → update README).
deploy/test/f5-mock-icontrol/f5-mock-icontrol — checked-in
compiled binary with shankar0123/certctl baked into Go build
info via the sub-module path. Out of scope for a URL sweep;
will refresh on the next `make test-integration` rebuild.
Verification:
gofmt: clean (no .go files touched).
go vet ./...: clean (verified at this SHA in 1.3 of the transfer
checklist; no .go changes since).
go build ./...: clean (same).
go test -short on representative packages: green (same).
Diff shape: 30 files, 74 insertions / 74 deletions, net-zero size,
pure URL substitution.
|
||
|
|
8a56a78282 |
target(azurekv): SDK-driven Azure Key Vault target connector
Closes Rank 5 (Azure half) of the 2026-05-03 Infisical deep-research deliverable (cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl had no path to deploy certs to Azure-managed TLS- termination endpoints (Application Gateway / Front Door / App Service / Container Apps) — operators terminating TLS at Azure had to use manual `az keyvault certificate import` invocations or external automation. This commit lands the SDK-driven Azure Key Vault target connector that closes the gap, mirroring the AWS ACM target shape shipped in commit |
||
|
|
edf6bee7f8 |
target(awsacm): SDK-driven AWS Certificate Manager target connector
Closes Rank 5 (AWS half) of the 2026-05-03 Infisical deep-research
deliverable (cowork/infisical-deep-research-results.md Part 5).
Pre-fix, certctl had no path to deploy certs to AWS-managed TLS-
termination endpoints (ALB / CloudFront / API Gateway / App Runner)
— operators terminating TLS at AWS had to use Infisical secret-sync,
manual aws-cli imports, or external automation. This commit lands
the SDK-driven AWS Certificate Manager target connector that closes
the gap end-to-end.
Architecture:
- internal/connector/target/awsacm/awsacm.go — Connector wraps
*acm.Client behind the ACMClient interface seam (mirrors
awsacmpca's ACMPCAClient pattern from the issuer side).
LoadDefaultConfig handles the standard AWS credential chain
(IRSA / EC2 instance profile / SSO / env vars); no embedded
creds in connector Config.
- Pre-deploy snapshot via DescribeCertificate + GetCertificate so
on-import-failure rollback restores the previous cert. Mirrors
the Bundle 5 IIS pattern + the Bundle 7/8 WinCertStore /
JavaKeystore patterns. Surfaces rollback success/failure via
the existing certctl_deploy_rollback_total Prometheus counter
label set.
- Provenance tags: certctl-managed-by=certctl + certctl-
certificate-id=<mc-id> set automatically on every import. ACM
strips tags on re-import, so the connector calls
AddTagsToCertificate post-import to keep the provenance pair
fresh. Operators looking up a cert ARN by managed-cert ID
(Terraform data source, CloudFormation output) match against
these tags.
- DeploymentRequest.KeyPEM held in agent memory only — never
written to disk. Aligns with the pull-only deployment model
documented in CLAUDE.md.
Tests:
- awsacm_test.go: 15-subtest happy-path + validation matrix
covering ValidateConfig (success / missing-region / malformed-
region / malformed-ARN / reserved-tag rejection),
DeployCertificate (fresh import / rotate-in-place / rollback-
on-serial-mismatch / rollback-also-fails / empty-key-rejected /
no-client-rejected), ValidateOnly (returns sentinel),
ValidateDeployment (serial match / mismatch / no-ARN-yet).
- awsacm_failure_test.go: 5 per-error-class contract tests
mirroring the awsacmpca_failure_test.go shape (commit
|
||
|
|
109f32ff41 |
notifications: per-policy multi-channel expiry-alert routing
Closes Rank 4 of the 2026-05-03 Infisical deep-research deliverable
(see cowork/infisical-deep-research-results.md Part 5). Pre-fix,
RenewalService.CheckExpiringCertificates already ran daily,
RenewalPolicy.AlertThresholdsDays drove per-cert thresholds, and
NotificationService.SendThresholdAlert deduped per (cert, threshold)
— but the channel was hardcoded to Email
(internal/service/notification.go:118 pre-fix). Operators who
configured PagerDuty / Slack / Teams / OpsGenie via
CERTCTL_PAGERDUTY_ROUTING_KEY etc. got nothing at any threshold
unless SMTP was also wired. Their first signal of an expired cert
was a 3 AM outage.
This commit lands the routing matrix on top of the existing
infrastructure:
1. RenewalPolicy gains AlertChannels (per-tier channel list) +
AlertSeverityMap (per-threshold tier assignment) +
EffectiveAlertChannels / EffectiveAlertSeverity accessors.
Default*() helpers preserve the back-compat Email-only
behaviour for operators who haven't touched their policies
post-upgrade. Migration 000026 adds the JSONB columns
idempotently.
2. NotificationService.SendThresholdAlertOnChannel — the new
per-channel dispatch helper. Old SendThresholdAlert stays as
an Email-only alias so non-policy callers (admin "send test
alert" surfaces) keep working byte-for-byte.
3. NotificationService.HasThresholdNotificationOnChannel — per-
(cert, threshold, channel) deduplication so a transient
PagerDuty 5xx today does NOT suppress today's Slack alert and
tomorrow's PagerDuty retry will still fire.
4. RenewalService.sendThresholdAlerts walks the resolved channel
set per threshold tier, fans out to every configured channel,
handles per-channel failures independently, defensively drops
off-enum channels with an audit row trail, and records a per-
channel audit event with metadata.channel + metadata.severity_tier.
5. service.ExpiryAlertMetrics — atomic counter table mirrored on
the VaultRenewalMetrics shape from the 2026-05-03 audit fix #5
(commit
|
||
|
|
869fc8f245 |
docs(openssl): operator playbook for shell-out threat model
Closes Top-10 fix #6 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the OpenSSL adapter's docs in docs/connectors.md explained usage but did NOT enumerate the threat model. The adapter exec's an arbitrary operator-supplied script — env-var inheritance, symlink attacks, sandbox-escape, multi-tenant process-isolation gaps. An acquirer's security reviewer reading this surface cold pattern-matches "highest-risk issuer surface with the lowest documented threat model." This commit lands a doc-side operator playbook in docs/connectors.md OpenSSL section (mirrors Bundle 8's "Operator playbook: keytool argv password exposure" subsection shape and the 2026-05-02 audit Top-10 fix #7 SSH InsecureIgnoreHostKey playbook). Six topics covered: 1. Why the adapter exists despite the risk (CLI-driven CAs without Go SDKs need an integration path). 2. Threat model the adapter accepts (trusted operator + trusted script + appropriate ownership + clear audit trail). 3. Threat model the adapter does NOT accept (operator-writable script paths, untrusted content, multi-tenant hosts). 4. Mitigations operators can layer (dedicated user, root-owned 0755 binary, audit rules, per-call timeout via CERTCTL_OPENSSL_TIMEOUT_SECONDS, env sanitisation, chroot/container, audit wrapper, per-call concurrency bound). 5. When NOT to use the adapter (compliance environments, multi-tenant servers, no-script-review environments). 6. V3-Pro forward path (hardened mode tracked in cowork/WORKSPACE-ROADMAP.md). Inline comment in internal/connector/issuer/openssl/openssl.go near the callSignScript exec call site forward-references the new doc subsection (no logic change). cowork/WORKSPACE-ROADMAP.md gains an "OpenSSL hardened mode" V3- Pro entry under "Adapter hardening" — sibling-folder doc, not in the certctl repo, so not reflected in this commit's diff. Same shape Bundle 8 used for the JavaKeystore playbook and the 2026-05-02 deployment-target audit Top-10 fix #7 used for the SSH InsecureIgnoreHostKey playbook. No code logic changes (only the explanatory comment near the exec call site). No test changes. Doc-only commit. Verified locally: - gofmt / go vet clean. - go test -short -count=1 ./internal/connector/issuer/openssl/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md Top-10 fix #6. |
||
|
|
0792271dc6 |
vault: add automatic token renewal at TTL/2 + Prometheus metric
Closes Top-10 fix #5 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the VaultPKI adapter authenticated with a static token and never called renew-self. Long-lived deploys hit token expiry; the first operator-visible signal was failed cert renewals on production targets. This commit: 1. Connector.Start(ctx) spawns a goroutine that calls POST /v1/auth/token/renew-self at TTL/2 cadence (computed from a one-shot lookup-self at startup). Honours ctx.Done() for graceful shutdown via a per-loop done channel + Stop(). 2. On `renewable: false` response (initial lookup OR any subsequent renewal), the loop emits a WARN, increments the not_renewable counter, and exits. The operator must rotate the token before Vault's Max TTL elapses. 3. New Prometheus counter certctl_vault_token_renewals_total with labels result={success,failure,not_renewable}. Registered alongside existing certctl_issuance_* counters in internal/api/handler/metrics.go. 4. ERROR-level logging on renewal failure with operator-actionable substring ("vault token renewal failed; rotate the token before TTL expires") so journalctl + grep find it. Loop keeps ticking after a failure — transient blips don't kill it. New optional issuer.Lifecycle interface: type Lifecycle interface { Start(ctx context.Context) error Stop() } Connectors that hold no background goroutines (almost all of them) do not implement this — IssuerRegistry.StartLifecycles / StopLifecycles feature-detect via type assertion. New lifecycle-bearing connectors plug in by implementing the interface; no further registry plumbing required. Wiring (cmd/server/main.go): - service.NewVaultRenewalMetrics() instance is shared between issuerRegistry.SetVaultRenewalMetrics (so Vault connectors built by Rebuild get a recorder) and metricsHandler.SetVaultRenewals (so the Prometheus exposer emits the new series). - issuerRegistry.StartLifecycles(ctx) is called after issuerService.BuildRegistry; defer issuerRegistry.StopLifecycles is paired so goroutines exit cleanly on signal. - IssuerConnectorAdapter.Underlying() exposes the wrapped issuer.Connector so registry-level machinery can reach the concrete connector behind the adapter without duplicating the wiring at every call site. Tests (internal/connector/issuer/vault/vault_renew_test.go): - TestVault_RenewLoop_TickAtHalfTTL — three ticks → three renewals, all "success". - TestVault_RenewLoop_StopsOnNotRenewable — second renewal returns renewable=false, loop exits, third tick fires no HTTP call. - TestVault_RenewLoop_FailureSurfacesViaMetric — first renewal 403 bumps "failure", second renewal succeeds → loop kept ticking. - TestVault_RenewLoop_CtxCancellation_StopsCleanly — Stop returns within 200ms after ctx cancel. - TestVault_RenewLoop_StartsNothingWhenNotRenewable — token already non-renewable at boot ⇒ no goroutine, "not_renewable" metric increments at startup so operators see it in Grafana. - TestVault_ComputeInterval — 4 cases pinning TTL/2 + minRenewInterval floor. - TestVault_RenewSelf_ParseFailure_NamesActionableInError — surfaced error contains "vault token renewal failed" + "rotate the token". Cadence is dynamic — every successful renewal re-derives TTL/2 from the renewed lease's lease_duration, so a short bootstrap token that gets renewed up to a longer Max TTL shifts to the longer cadence automatically (defends against degenerate fast ticking on a token whose Max TTL is far longer than its initial TTL). Documentation: - docs/connectors.md Vault PKI section gains "Token TTL + automatic renewal" subsection (operator-facing: cadence, metric, renewable=false rotation playbook). Out of scope (intentional, flagged in the audit follow-up): - AppRole / Kubernetes / AWS IAM auth methods (different renewal semantics). - Hot-reload of rotated token from disk (operator restarts today; future: GUI/MCP issuer-update path triggers Rebuild which Stops the old connector and Starts the new one). - Auto-re-auth after token death (operator playbook owns it). CHANGELOG.md is intentionally not hand-edited (per CHANGELOG.md itself: "no longer maintains a hand-edited per-version changelog; per-release notes are auto-generated from commit messages between consecutive tags"). Verified locally: - gofmt clean. - go vet ./internal/service/... ./internal/api/handler/... ./internal/connector/issuer/vault/... ./cmd/server/... clean. - go test -short -count=1 ./internal/connector/issuer/vault/... ./internal/service/... ./internal/api/handler/... green. - go test -race -count=10 -run 'TestVault_RenewLoop|TestVault_ComputeInterval' ./internal/connector/issuer/vault/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md Top-10 fix #5. |
||
|
|
39f065dda4 |
docs(acme-server): operator-facing reference + threat model + cert-manager walkthrough (Phase 6/7)
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.
What ships:
- docs/acme-server.md final pass: Auth-mode decision tree (when to
use trust_authenticated vs challenge), RFC 8555 + RFC 9773
conformance statement (section-by-section table of implemented
plus procurement-honest 'not implemented' rows for EAB / multi-
level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
(5 failure modes — badNonce / unknownAuthority / HTTP-01
connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
canonical fix for each), Version pinning + tested clients table
(cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
3.0+), FAQ (5 entries — why two auth modes, vs cert-manager-
against-LE, can-I-use-from-outside-K8s, migration story, audit-
log catalog), See-also cross-link block.
- docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
certctl → Certificate flow, with YAML blocks byte-equal to
deploy/test/acme-integration/{clusterissuer-trust-authenticated,
certificate-test}.yaml to prevent doc/test drift.
- docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
options (OS trust store + Caddy pki.ca block).
- docs/acme-traefik-walkthrough.md: certificatesResolvers.<name>.acme
.caServer + serversTransport.rootCAs configuration.
- docs/acme-server-threat-model.md: Threat surface map + JWS forgery
resistance (alg-confusion / HS256 substitution / replayed nonce /
URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
(pre-dial check + per-dial check + per-redirect check + body cap +
bounded redirects), DNS-01 cache-poisoning posture (default Google
Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
tuning, Audit trail catalog, Out-of-scope threats list.
- docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
top-level 'ACME Server (Built-in)' section between Issuer Connector
and Target Connector — distinguishes the consumer-side ACME
(existing) from the new server-side ACME via env-var-prefix
call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).
DoD verification:
- All 5 docs files exist with the structure prescribed by the
Phase 6 prompt.
- Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
to an actual lookup in internal/config/config.go (verified by
'grep -oE | sort -u | diff' returning empty).
- Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
byte-equal to the corresponding file in deploy/test/acme-integration/
(verified with 'diff' against awk-extracted YAML blocks).
- docs/connectors.md has the cross-link subsection with all 4 new
docs referenced.
- cowork/CLAUDE.md Architecture Decisions has the new ACME-server
bullet documenting per-profile URL family + per-profile
acme_auth_mode + Phase 4-5-6 progression.
- cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
the ACME-Server rollup spanning Phases 1a-6.
- cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
- 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.
Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.
|
||
|
|
85d247455b |
docs(postfix): add Mode=postfix vs Mode=dovecot decision matrix subsection
Closes Top-10 fix #9 of the 2026-05-02 deployment-target audit
re-run (see cowork/deployment-target-audit-2026-05-02-rerun/
RESULTS.md). Pre-fix, the Postfix connector's docs in
docs/connectors.md described the connector as a single
"Postfix / Dovecot" target without explicit guidance on when to
use Mode=postfix vs Mode=dovecot. Operators with a mail server
running both Postfix (MTA, port 25) and Dovecot (IMAPS, port
993) had to read source to figure out the dual-deploy pattern.
Bundle 11 (commit
|
||
|
|
b16e5b5e97 |
docs(ssh): operator playbook for InsecureIgnoreHostKey design choice
Closes Top-10 fix #7 of the 2026-05-02 deployment-target audit re-run (see cowork/deployment-target-audit-2026-05-02-rerun/ RESULTS.md). Pre-fix, the SSH connector's ssh.InsecureIgnoreHostKey() at internal/connector/target/ssh/ ssh.go (realSSHClient.Connect) had only an inline comment justifying the design choice. An acquirer's diligence engineer reading the connector cold pattern-matches "MITM hazard" without seeing the comment. This commit lands a doc-side operator playbook in docs/connectors.md SSH section covering: 1. Why the connector accepts any host key (operator-configured target infrastructure; mirrors network scanner's InsecureSkipVerify and F5's Insecure flag). 2. Threat model the choice accepts (passive eavesdropper on operator-controlled network; layered SSH-key auth limits blast radius). 3. Threat model the choice does NOT accept (public-internet ephemeral hosts, multi-tenant networks, strict MITM- resistance regulatory requirements). 4. Mitigations operators can layer (custom SSHClient via NewWithClient + golang.org/x/crypto/ssh/knownhosts; SSH certificate authentication via @cert-authority pinning; network segmentation; per-target key rotation). 5. When to NOT use the SSH connector (regulatory environments, dynamic IPs, multi-tenant networks). 6. V3-Pro forward path (built-in known_hosts management, tracked in WORKSPACE-ROADMAP.md). Inline comment in ssh.go realSSHClient.Connect updated to forward-reference the new doc subsection (no logic change; same HostKeyCallback: ssh.InsecureIgnoreHostKey() call). Same shape Bundle 8 used for "Operator playbook: keytool argv password exposure" in docs/connectors.md JavaKeystore section. No code-behavior changes. No test changes. Verified locally: - gofmt / go vet clean. - go test -short ./internal/connector/target/ssh/... green. Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/ RESULTS.md Top-10 fix #7. |
||
|
|
62f0a284be |
iis,wincertstore: default-deadline ctx wrapper for PowerShell exec calls
Closes Top-10 fix #4 of the 2026-05-02 deployment-target audit re-run (see cowork/deployment-target-audit-2026-05-02-rerun/ RESULTS.md). Pre-fix, both IIS and WinCertStore's realExecutor invoked PowerShell via exec.CommandContext(ctx, ...) and relied entirely on the caller's ctx to provide a deadline. If the caller forgot to attach one (context.Background() in a deeply-nested path; an operator running an ad-hoc deploy via a CLI that doesn't default-deadline its ctx), a hung WinRM session blocked the deploy worker thread indefinitely. S2 (failure isolation) bar from the audit: "does a hung WinRM take down the deploy worker pool?" — today's answer was "potentially yes" for these two connectors. Post-fix the answer is "no, capped at the configured ExecDeadline (default 60s)". This commit: 1. Adds Config.ExecDeadline (time.Duration, json: "exec_deadline") to both connectors, defaulted to 60 seconds. WinCertStore defaults via the existing applyDefaults helper; IIS defaults inline at New() and inside ValidateConfig (the IIS connector has no shared applyDefaults helper today; out-of-scope to refactor one in for this minor fix). Operators on slow Windows links can override via the JSON config field exec_deadline. 2. Wraps realExecutor.Execute with a fallback context.WithTimeout that fires ONLY when ctx has no deadline of its own. Caller- supplied deadlines always win — the wrapper is a safety net, not a hard cap. defer cancel() guards against goroutine leaks. 3. Tests: - TestIIS_RealExecutor_AttachesDefaultDeadlineWhenCallerHasNone (passes context.Background; asserts the call returns within 500ms with an error). On Linux/macOS runners powershell.exe is missing and exec.Cmd fails fast; on Windows the wrapper's ctx deadline cancels the running PowerShell process. Either path returns well under 500ms. - TestIIS_RealExecutor_RespectsCallerDeadlineWhenSet (10s fallback executor deadline, 50ms caller ctx; asserts caller deadline wins). - TestIIS_RealExecutor_NoDeadlineWiredWhenZero (deadline=0 means no fallback wrapper; caller's tight ctx still bounds). - TestIIS_New_DefaultsExecDeadlineTo60s + TestIIS_New_RespectsExplicitExecDeadline pin the constructor's defaulting behavior (uses winrm mode so the test doesn't need powershell.exe in PATH). - Same five tests in wincertstore_test.go. 4. docs/connectors.md IIS + WinCertStore sections document the new exec_deadline field with: what it is (per-PowerShell- subprocess cap), default (60 seconds), override semantics (caller ctx deadline wins). No change to behavior when the caller already attaches a deadline (the common case in production code paths). Tests using the mock executor (mockExecutor in iis_test.go / wincertstore_test.go) are unaffected — they bypass realExecutor entirely. S2 cross-cutting scorecard rating in cowork/deployment-target-audit-2026-05-02-rerun/findings.json flips from "gap" to "pass" for IIS and WinCertStore (in any future re-audit). Verified locally: - gofmt / go vet / staticcheck clean across both packages. - go test -race -count=1 ./internal/connector/target/iis/... ./internal/connector/target/wincertstore/... green. Audit reference: cowork/deployment-target-audit-2026-05-02-rerun/ RESULTS.md Top-10 fix #4. |
||
|
|
eb390b2db4 |
javakeystore: pre-deploy export snapshot + on-import-failure rollback + argv-password operator note
Closes Bundle 8 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
DeployCertificate at javakeystore.go:172-272 ran an irreversible
keytool -delete against the existing alias, then keytool
-importkeystore. If the import failed after the delete succeeded,
the keystore was missing the alias entirely — previous cert gone,
new cert never landed. docs/deployment-atomicity.md L94 promised
"keytool snapshot; rollback via keytool -delete + re-import"; the
code didn't deliver. Separately, the operator-facing keystore
password is passed via -storepass argv (a standard keytool
limitation) which is visible to ps(1) for the duration of each
subprocess; this was undocumented as an operator-playbook caveat.
This commit:
1. Pre-delete snapshot. When os.Stat(KeystorePath) succeeds,
snapshotKeystore runs keytool -exportkeystore to
<BackupDir>/.certctl-bak.<unix-nanos>.p12 BEFORE the existing
-delete step. Backup path persisted in a local variable for
the rollback path; export-step failure aborts the deploy
entirely (no mutation has happened yet — the keystore is
untouched). Snapshot skipped on first-time deploys (no
keystore file = nothing to roll back to). The "alias not
present in pre-existing keystore" case is recognised via the
well-known keytool error string and treated as a clean
first-time-on-existing-keystore signal — the deploy proceeds
without a backup, and rollback (if needed) becomes the
no-backup branch.
2. On-import-failure rollback. When keytool -importkeystore
returns error, rollbackImport(ctx, backupPath) runs:
- keytool -delete -alias <Alias> ... (best-effort; the failed
import may have created a partial alias entry).
- keytool -importkeystore from the backup PKCS#12 to restore
the previous state.
On rollback success, the deploy returns wrapped error noting
"rolled back from <backup_path>". On rollback failure,
returns operator-actionable wrapped error containing both the
import error AND the rollback error AND the backup path so
the operator can manually keytool -importkeystore from the
.p12 file to recover.
3. Backup retention. Successful deploys prune older
.certctl-bak.*.p12 files beyond Config.BackupRetention.
Sort by ModTime newest-first; keep most recent N. Defaults:
BackupRetention=0 → keep most recent 3 (the default).
BackupRetention=N → keep most recent N.
BackupRetention=-1 → opt out of pruning entirely (operators
that wire their own archival/rotation).
Pruning runs in the success path AFTER the optional reload
command so it doesn't interfere with deploy-time signals.
ReadDir / Remove failures are non-fatal (debug log only) —
the deploy already succeeded.
4. Config gains BackupRetention int and BackupDir string fields.
BackupDir defaults to filepath.Dir(KeystorePath) so backups
land on the same filesystem as the keystore (atomic-ish
writes, disk-full failures fail fast at snapshot time).
5. Helper extraction. snapshotKeystore + rollbackImport +
pruneBackups + backupDir are private methods on Connector.
Constants backupFilePrefix=".certctl-bak." and
backupFileSuffix=".p12" centralise the naming convention so
the snapshot writer, the rollback reader, and the retention
pruner all agree.
6. Operator-playbook section added to docs/connectors.md
JavaKeystore section. Documents the standard keytool
-storepass argv exposure: ps(1)-visible for the duration
of each subprocess. Lists mitigations:
- Restrict shell access to the agent host.
- Linux user namespaces / AppArmor / SystemD ProtectProc=
invisible to deny ps-visibility.
- Single-purpose container for proper PID-namespace
isolation.
- Post-deploy keystore password rotation via reload_command
for high-security environments.
- BCFKS keystore type for FIPS environments (same argv
caveat applies).
Also documents an "Atomic rollback" subsection covering the
snapshot/rollback flow, the new backup_retention /
backup_dir Config fields, and the design choice to reuse
the keystore password for the snapshot (rather than
generating a separate transient password) — operator
already trusts the connector with this secret, surface area
doesn't grow, rollback's matching -srcstorepass stays
simple.
Tests added to javakeystore_test.go (7 new tests, ~430 LOC):
- TestJKS_Snapshot_RunsBefore_Delete: mock executor records call
order; asserts -exportkeystore is call[0], -delete is call[1],
-importkeystore is call[2]. The snapshot MUST run before the
delete — otherwise the delete destroys the very state the
snapshot is meant to capture.
- TestJKS_Snapshot_FirstTimeDeploy_NoExport: no keystore file
pre-created; asserts exactly 1 keytool call (-importkeystore
only), no -exportkeystore.
- TestJKS_ImportFails_RollsBack: happy rollback path with one
same-Subject backup. Asserts rollback re-import references the
same backup path the snapshot wrote (verified via arg
comparison between call[0] and call[4]).
- TestJKS_ImportFails_RollbackAlsoFails_OperatorActionable:
wrapped-error escalation with backup path in the error
message.
- TestJKS_BackupRetention_PrunesOldBackups: 5 pre-existing
staggered-ModTime backups + 1 deploy-created → retention=3 →
exactly 3 newest survive (deploy-created + 2 newest
pre-existing); 3 oldest pre-existing pruned.
- TestJKS_BackupRetention_Zero_DefaultsTo3: BackupRetention=0
must default to 3 (not "keep none").
- TestJKS_BackupRetention_Negative_OptsOut: BackupRetention=-1
pre-existing 5 + deploy 1 = 6 total, all 6 remain.
- TestJKS_Snapshot_AliasNotInKeystore_ProceedsCleanly: keystore
exists but alias missing; -exportkeystore returns "alias does
not exist" → snapshot helper recognises this signal and
returns ("", nil) so the deploy proceeds cleanly.
mockExecutor extended with optional `onCall` hook so the
retention-pruning tests can simulate keytool -exportkeystore's
file-write side effect (via the simulateExportSideEffect helper
that parses -destkeystore from args and writes a placeholder
.p12 file). Existing tests that don't set onCall behave
identically to before — backward compatible.
docs/deployment-atomicity.md L94 unchanged from today's text —
Bundle 1 doc-realignment hasn't shipped, so the "keytool snapshot;
rollback via keytool -delete + re-import" line was never softened.
Post-Bundle-8 the claim is honest (was aspirational pre-fix).
Verified locally (sandbox lacks staticcheck install due to disk
pressure; CI runs the full lint gate):
- gofmt -l ./internal/connector/target/javakeystore/ clean
- go vet ./internal/connector/target/javakeystore/ clean
- go build ./cmd/agent/... clean
- go test -race -count=1 ./internal/connector/target/javakeystore/
green (16 tests total: 9 pre-existing + 7 new)
Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 8.
|
||
|
|
a22a1be962 |
globalsign,entrust: cache mTLS keypair with mtime-based reload
Closes the #10 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. Pre-fix, GlobalSign reloaded the mTLS cert/key from disk on every API call (globalsign.go::getHTTPClient) and Entrust loaded once in ValidateConfig with no rotation handling — both shapes were broken for different reasons. Per-call disk reads under a 100- cert renewal sweep meant 200 file opens / parses / tls.X509KeyPair calls in flight, each adding 5–50ms of latency for nothing; the single-load Entrust shape served stale credentials forever after a cert rotation, requiring a process restart. This commit: - Adds a new shared package internal/connector/issuer/mtlscache/ with a Cache type holding a parsed tls.Certificate plus a precomputed *http.Transport. RWMutex serialises reloads; reads are lock-free in the hot path (read lock briefly held to copy out the *http.Client pointer, then released — the HTTP request itself happens with no lock held, per the audit prompt's anti- pattern about holding the write lock across an API call). - RefreshIfStale stats the cert file; if mtime advanced beyond the last load, the keypair is re-parsed and the transport is rebuilt. The fast path (mtime unchanged) takes the read lock for the comparison and returns immediately. Double-checked-lock pattern (read lock → stat → release → write lock → re-stat) prevents two callers who observed the same stale mtime from both reloading. - Options.TLSConfigBuilder lets the caller customise the *tls.Config built around the parsed leaf certificate. GlobalSign uses this to inject the ServerCAPath-pinning RootCAs pool that buildServerTLSConfig already produces; entrust uses the default builder. - New() performs the initial load so a broken cert path fails fast at construction rather than at first API call. - GlobalSign.Connector gains an mtls field. getHTTPClient now: (1) preserves the test-mode short-circuit when httpClient has a non-nil Transport; (2) preserves the bare-default-client short-circuit when cert paths aren't configured; (3) lazy-builds the cache on the first call so the constructor stays cheap; (4) calls RefreshIfStale on every subsequent call. The error wrap preserves the substring "client certificate" so existing TestGlobalsign_GetHTTPClient_MTLSPathConfigured_LoadsKeyPair keeps its assertion. - Entrust.Connector gains an mtls field plus a new getHTTPClient helper mirroring GlobalSign's shape. The three IssueCertificate / RevokeCertificate / pollEnrollmentOnce sites that previously hit c.httpClient.Do(req) directly now route through getHTTPClient, which falls through to the test-injected client (same logic as GlobalSign) and otherwise serves the cached mTLS client. The legacy ValidateConfig flow that pre-built c.httpClient with its own transport stays intact — its transport wins because getHTTPClient short-circuits when c.httpClient.Transport != nil. - Tests at internal/connector/issuer/mtlscache/cache_test.go cover: * fail-fast on missing paths (constructor input validation) * load on construction (positive + negative) * NoReloadWhenMtimeStable — 100 RefreshIfStale calls, LoadedAt must stay equal to the constructor's stamp (the load-bearing regression guard against per-call disk reads) * ReloadsOnMtimeAdvance — os.Chtimes forward, next refresh must observe the new LoadedAt (the load-bearing regression guard for rotation-without-process-restart) * StatErrorBubbles — missing cert file surfaces as an error rather than silently serving stale credentials * ConcurrentNoRace — 100 goroutines × 50 iterations under -race; no race detected, all calls succeed * TLSConfigBuilderUsed — custom builder is invoked at New AND on reload; verifies MinVersion=TLS1.3 takes effect * ClientHonoursTimeout — Options.HTTPTimeout reaches the constructed *http.Client - docs/connectors.md GlobalSign + Entrust sections each gain an "mTLS keypair caching (audit fix #10)" paragraph documenting the steady-state caching, mtime-based rotation contract, and operator workflow (mv -f new.crt /etc/certctl/.../client.crt). Acquirer impact: removes the per-call disk-read latency floor and makes operator-driven cert rotation a no-restart event. Combined with audit fix #9's bounded scheduler concurrency, the renewal sweep's hot path now has predictable steady-state cost: capN concurrent goroutines, each reusing the cached keypair, no per- call file I/O. Verified locally: - gofmt -l . clean - go vet ./... clean - staticcheck ./... clean - go test -race -count=1 ./internal/connector/issuer/mtlscache/... green (8 tests) - go test -count=1 -short across globalsign / entrust / sectigo / ejbca / mtlscache / connector packages: green Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #10. Closes the audit's full Top-10 list (fixes #1-10 all shipped to master). |
||
|
|
fefa5a5fd7 |
acme: support serial-only revocation via local cert-version lookup
Closes the #7 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. Pre-fix, ACME RevokeCertificate at acme.go:L519-L529 returned the literal error "ACME revocation by serial not supported in V1; provide certificate DER". RFC 8555 §7.6 genuinely requires the cert DER bytes (not just the serial), but a CLM platform's job is to abstract over that limitation. Operators routinely have only the serial in hand: lost PEM, rotated key, GUI revoke action driven by a row in the certs list. This commit: - Adds CertificateLookupRepo interface at the ACME connector boundary (connector boundary, NOT a service/repository import — the connector accepts whatever satisfies the shape). Production wiring in cmd/server/main.go injects the postgres CertificateRepository; tests inject a fake. - Adds CertificateRepository.GetVersionBySerial(ctx, issuerID, serial) + interface declaration in repository/interfaces.go, returning the certificate_versions row whose SerialNumber matches, scoped to the issuer via JOIN on managed_certificates. Mirrors the existing GetByIssuerAndSerial shape but returns the version (where PEMChain lives). Per RFC 5280 §5.2.3 the issuer scope is required for determinism. - Adds SetCertificateLookup + SetIssuerID setters on *acme.Connector. Mirror the pattern local.Connector already uses for OCSP responder wiring. Both must be wired before serial-only revoke works; unwired state falls back to a more actionable error pointing at the wiring requirement (the historical "not supported" wording is retired). - Rewrites RevokeCertificate end-to-end: lookup → empty-PEM check → pem.Decode → block.Type == "CERTIFICATE" check → ensureClient → golang.org/x/crypto/acme.Client.RevokeCert(ctx, accountKey, der, reasonCode). RFC 8555 §7.6 case 1 (revocation request signed with account key) — the same account key issued the cert, so authority is intrinsic. The not-found path returns an actionable operator- facing error pointing at the local-store requirement. - Adds mapRevocationReason translating RFC 5280 §5.3.1 reason strings (unspecified, keyCompromise, cACompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, removeFromCRL, privilegeWithdrawn, aACompromise) into golang.org/x/crypto/acme. CRLReasonCode. Accepts canonical camelCase + underscore_lower + ALL_CAPS_UNDERSCORE. Nil reason → 0 (unspecified). Unknown reason errors rather than silently demoting (operators rely on the reason for compliance reporting). - Wiring update in service/issuer_registry.go: SetACMECertLookup setter on the registry; Rebuild type-asserts *acme.Connector and calls SetCertificateLookup + SetIssuerID, mirroring the existing *local.Connector branch. cmd/server/main.go calls issuerRegistry.SetACMECertLookup(certificateRepo) immediately after SetIssuanceMetrics — the postgres repo satisfies the interface via GetVersionBySerial. - Tests: * acme_revoke_test.go (new): TestRevokeCertificate_NoCertLookupWired, TestRevokeCertificate_NoIssuerIDWired, TestRevokeCertificate_LookupReturnsNotFound (operator-facing "may not have been issued through certctl" hint pinned), TestRevokeCertificate_LookupArbitraryError, TestRevokeCertificate_VersionPEMEmpty (corrupt-row guard), TestRevokeCertificate_PEMMalformed_NoBlock, TestRevokeCertificate_PEMMalformed_WrongType (PRIVATE KEY block rejected as not a CERTIFICATE). * TestMapRevocationReason_TableDriven: full RFC 5280 reason set plus camelCase / underscore / ALL-CAPS variants plus nil-reason and unknown-reason cases. * acme_failure_test.go: renamed TestRevokeCertificate_AlwaysError → TestRevokeCertificate_UnwiredCertLookupFallback; the test still exercises the same backward-compat branch but now asserts the new "CertificateLookup wiring" error wording. - Mock-repo updates (3 sites): mockCertificateRepository in internal/integration/lifecycle_test.go, mockCertRepo in internal/service/testutil_test.go, mockCertRepoWithGetError in internal/service/shortlived_test.go each gain a GetVersionBySerial implementation that mirrors the GetByIssuerAndSerial logic but returns the version row. - docs/connectors.md ACME section: new "Revocation by serial number" subsection covering the workflow, the local-store requirement (cert was issued through certctl, not imported), the reason-code mapping with the three accepted spelling variants, and a pointer to the audit reference. Out of scope (intentional, per spec): - Recovering the DER from outside the local cert store (CT logs, CSR + signature reconstruction). If the cert wasn't issued through certctl, revoke-by-serial via certctl isn't possible. - Revocation via the cert's private key (RFC 8555 §7.6 case 2). The account-key path covers all certctl-issued certs because the same account key issued them. - Pebble-backed integration test for the happy path. Pebble integration is the right home for that — the unit tests in this commit pin all failure-mode branches before the network call, and the wiring branch in Rebuild is exercised by the existing TestIssuerRegistryRebuild paths. Verified locally: - gofmt -l . clean - go vet ./... clean - staticcheck ./... clean - go test -short -count=1 across connector, service, repository, integration, api/middleware, api/handler: green Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #7. |
||
|
|
0509790325 |
asyncpoll: refactor Sectigo / Entrust / GlobalSign to bounded polling (Phase 2)
Phase 2 of the #5 acquisition-readiness fix from the 2026-05-01 issuer
coverage audit. Phase 1 (commit
|
||
|
|
590f654b0d |
awsacmpca: replace stub client with AWS SDK v2 implementation
Closes the #1 acquisition-readiness blocker from the 2026-05-01 issuer coverage audit. The production New() constructor previously hardcoded &stubClient{}, which returned "AWS SDK client not initialized (stub)" on every method. Tests passed green via NewWithClient mock injection — a path the production constructor never took. AWSACMPCA was wired into the factory, the seed file, the test suite, and marketing collateral but did not actually issue, retrieve, or revoke certificates. This commit: - Adds aws-sdk-go-v2/{config,service/acmpca,aws} to go.mod (with acmpca/types as a sub-package). go mod tidy could not be completed in the sandbox due to virtiofs concurrent-open-file ceiling on the module cache; the require blocks were arranged manually so the three directly-imported packages are non-indirect. Build, vet, staticcheck, and the full test suite are green; operator should run `go mod tidy` on the workstation to confirm cosmetic ordering before pushing. - Implements sdkClient wrapping *acmpca.Client with local input/output type translation. Each method translates the connector's local input type to the SDK's typed input, calls the SDK, and translates the SDK output back to the local output type. aws-sdk-go-v2 types do not leak out of the awsacmpca package. - Deletes stubClient (the four "AWS SDK client not initialized (stub)" methods). After this commit, there is no fall-back stub; production New() always wires the SDK. - Rewrites New() to load credentials via awsconfig.LoadDefaultConfig with awsconfig.WithRegion(config.Region) and construct the SDK client via acmpca.NewFromConfig. Returns (*Connector, error). When config is nil or config.Region is empty, New defers SDK loading; ValidateConfig builds the client lazily on the first successful validation. This preserves the test pattern of New(nil, logger) → ValidateConfig. - Wires acmpca.NewCertificateIssuedWaiter (5-minute default timeout) inside sdkClient.IssueCertificate so the connector's two-call pattern (IssueCertificate → GetCertificate) sees synchronous-via- waiter semantics. The waiter is hidden from the ACMPCAClient interface so mock implementations stay simple. - Maps RFC 5280 revocation reasons to acmpcatypes.RevocationReason via the existing mapRevocationReason helper plus a cast at the sdkClient.RevokeCertificate boundary. - Updates the issuerfactory.NewFromConfig call site at factory.go:L88 for the new (*Connector, error) signature; the factory's outer signature already returns (issuer.Connector, error) so the change is local. - Adds nil-client guards on the four client-using connector methods (IssueCertificate, RevokeCertificate, GetCACertPEM, plus the RenewCertificate path via IssueCertificate). When the connector is used before ValidateConfig has been called, these methods fail-fast with a "client not initialized" sentinel error instead of panicking. - Fixes the copy-paste env-var doc-comments at awsacmpca.go:L41,L45 (CERTCTL_GOOGLE_CAS_PROJECT / CERTCTL_GOOGLE_CAS_CA_ARN → CERTCTL_AWS_PCA_REGION / CERTCTL_AWS_PCA_CA_ARN). The actual config loader at internal/config/config.go:L1556-L1561 already used the correct env-var names; only the doc-comments were wrong. - Updates the package doc-comment at awsacmpca.go:L1-L36 to clarify the synchronous-via-waiter behavior (issuance is asynchronous at the API level; the waiter inside sdkClient.IssueCertificate hides the asynchrony). - Adds TestNew_ProductionPath/ValidConfigBuildsRealClient: calls production New() (NOT NewWithClient) with a valid config, asserts err is nil, then calls IssueCertificate with a bogus CSR and asserts the resulting error is the expected PEM-decode error rather than the deleted stubClient's "client not initialized" sentinel. This is the regression-marker test the audit's D11 blocker called out as missing — if anyone re-introduces a stub-style placeholder from production New() in the future, this test fails. - Adds TestNew_ProductionPath/NilConfigDefersClientInit: documents the lazy-init contract for the New(nil, logger) → ValidateConfig pattern. - Adds TestNew_ProductionPath/ValidateConfigBuildsClientLazily: verifies that ValidateConfig wires the SDK client when New was called with nil config. - Adds TestNew_ProductionPath/{Revoke,GetCAPEM}BeforeInitFailsFast: verifies the nil-client guards on the other client-using methods. - Adds TestNew_ErrorPaths covering AccessDeniedException-shaped errors, transient 5xx errors, and ctx-cancel propagation via the existing mockACMPCAClient. - Updates docs/connectors.md:L490-L555 with: the synchronous-via-waiter behavior, a complete IAM policy example scoped to the four ACM PCA actions, a worked POST /api/v1/issuers example, and a troubleshooting section with three known failure modes (AccessDeniedException, ResourceNotFoundException, waiter timeout). Live AWS integration testing is intentionally not added: ACM PCA is a Pro-tier feature in localstack and the existing interface-mock tests cover correctness end-to-end. Operators with AWS credentials can validate by following the worked example in docs/connectors.md. Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md Top-10 fix #1 (Part 3, narrative section). |
||
|
|
6622883989 |
docs(est): EST RFC 7030 operator guide + WiFi/802.1X recipe + IoT bootstrap recipe + FreeRADIUS integration + architecture + README
EST RFC 7030 hardening master bundle Phase 12 — comprehensive operator- facing documentation for the Phases 1-11 backend work that shipped on 2026-04-29. NEW docs/est.md (19 sections, ~810 lines): Concepts (host vs user enrollment, profile-driven policy, multi-profile dispatch); 5-minute single-profile Quick start with curl + openssl recipes; Multi-profile dispatch (CERTCTL_EST_PROFILES=corp,iot,wifi setup with PathID rules enforced at boot); Authentication modes (mTLS / Basic / both / empty with cross-check semantics); RFC 9266 channel binding (failure-mode HTTP mapping table — ErrChannelBindingMissing/Mismatch/NotTLS13 → 400/409/426); WiFi/802.1X recipe with end-to-end FreeRADIUS integration (EAP-TLS supplicant config, mods-available/eap tls-common block, CRL distribution endpoint cross-ref, troubleshooting playbook); IoT bootstrap recipe (factory provisioning, first boot, steady-state renewal, compromise/decommission via bulk-revoke, recommended cert lifetimes per master prompt §7.7); serverkeygen for resource-constrained devices (CMS EnvelopedData wrap, RSA-only at this revision, zeroize discipline, Phase-1 cross-check refusing _SERVERKEYGEN_ENABLED=true with empty _PROFILE_ID); HSM-backed CA signing for EST cross-ref (signer interface seam); Operator GUI tabbed surface tour (/est: Profiles / Recent Activity / Trust Bundle); CLI + 6 MCP tools; Renewal device-driven model (RFC 7030 §4.2.2 mandate, renewal-trigger ratios for laptops/IoT, operator-push via webhook); Troubleshooting matrix (one row per typed audit-action constant in internal/service/est_audit_actions.go); TLS 1.2 reverse-proxy runbook cross-ref (channel-binding caveat explained); Threat model (load-bearing properties: trust-anchor reload fail-safety, per-profile counter isolation, mTLS cross-profile bleed defense, source-IP limiter process-locality, server-keygen heap residency, HTTP Basic in-process-only, legacy-anonymous-default back-compat carve-out); V3-Pro deferrals; Appendix A (libest sidecar reproducer + 5 integration test names); Appendix B (Cisco IOS 15.x + 16.x + Apple MDM + OpenWRT + libest <v3.0 wire-format quirks tested in internal/api/handler/cisco_ios_quirks_test.go). UPDATED docs/architecture.md: new "EST Server (RFC 7030) — Production Deployment" section under the existing baseline EST section. Mermaid diagram of multi-profile dispatch + mTLS sibling route + per-profile gate ordering + audit + GUI + SIGHUP-equivalent reload. Existing authentication paragraph updated with forward-ref to the hardening section. Audit paragraph updated to enumerate the 13 typed est_* action codes operators grep on. Trust-anchor reload semantics + libest interop tested in CI both called out. UPDATED README.md::Enrollment Protocols: replaced the one-line EST row with the full production-grade surface description matching the SCEP analog. Cross-references docs/est.md. UPDATED docs/connectors.md::EST/SCEP Integration: extended the EST-or-SCEP shared paragraph to point at the per-profile env-var form for both protocols + linked the new architecture.md section. NEW "Multi-profile EST dispatch + production hardening" subsection mirrors the SCEP equivalent: 9-row env-var table, cross-ref to docs/est.md. G-3 docs-drift CI guard reproduced locally clean — every CERTCTL_EST_* mention in docs maps back to internal/config/config.go, and every defined env var is documented. The `<NAME>` placeholder convention matches the SCEP idiom so the docs grep doesn't extract per-deploy profile names as phantom env vars. No new env vars introduced — this is a pure docs commit. |
||
|
|
530593507b |
fix(scep-intune): close 11 audit gaps from 2026-04-29 pre-tag review
Closes the eleven gaps identified in the pre-v2.1.0 audit of the SCEP
RFC 8894 + Intune master bundle (cowork/scep-bundle-gap-closure-prompt.md).
Constitutional rule from cowork/CLAUDE.md::Operating Rules — 'Always
take the complete path, not the easy path' — drove this closure: each
gap was a load-bearing wire that crossed multiple layers (config →
validator → service wire-up → tests → docs) and shipping the bundle
without them would have produced lying-field footguns where operator-
visible config options stored values without affecting behavior.
WHAT LANDS:
Phase A — Clock-skew tolerance (master prompt §15 hazard closure)
internal/scep/intune/challenge.go: ValidateChallenge migrated from
positional args to ValidateOptions{} struct; new ClockSkewTolerance
field with default 0 (strict). 24 call sites updated mechanically.
Asymmetric application: now+tolerance >= iat AND now-tolerance < exp.
internal/config/config.go: SCEPIntuneProfileConfig.ClockSkewTolerance
default 60s + Validate() refusal when >= ChallengeValidity.
cmd/server/main.go: SetIntuneIntegration signature extended;
per-profile env-var loader honors CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_CLOCK_SKEW_TOLERANCE.
internal/service/scep.go: intuneClockSkew field + IntuneStatsSnapshot
surfaces clock_skew_tolerance_ns. web/src/api/types.ts mirrors.
4 new tests in challenge_test.go covering accept-within-tolerance,
reject-beyond-tolerance, accept-expired-within-tolerance,
negative-treated-as-zero defensive normalization.
docs/scep-intune.md updated with the new env var + time-bounds rule.
Phase B — unknown-version-rejected golden test
internal/scep/intune/golden_helper_test.go: goldenUnknownVersionPayload
helper + signGoldenChallengeAny generic signer.
challenge_golden_test.go: TestGoldenChallenge_UnknownVersionRejected
uses an in-process ECDSA fixture (the on-disk PEM was generated with
a Go-stdlib version that produces different ecdsa.GenerateKey bytes
from the current call). TestRegenerateGoldenFixtures emits the new
unknown_version fixture file too.
Phase C — Two named Intune e2e tests
internal/api/handler/scep_intune_e2e_test.go:
TestSCEPIntuneEnrollment_RateLimited_E2E (cap=2 + 3 attempts; 3rd
returns FAILURE+badRequest with rate_limited counter ticked)
TestSCEPIntuneEnrollment_TrustAnchorSIGHUPReload_E2E (rotate
on-disk PEM + holder.Reload(); old-key challenge fails with
badMessageCheck; signature_invalid counter ticked)
intuneE2EFixture struct extended with trustHolder + trustPath fields
so tests can rotate.
Phase D — Four new ChromeOS hermetic tests (10 total now)
internal/api/handler/scep_chromeos_test.go:
_RAKeyMismatch — PKIMessage encrypted to wrong RA cert; handler
rejects without reaching service.
_3DESBackwardCompat — RFC 8894 §3.5.2 legacy fallback verified.
_RSACSR + _ECDSACSR — explicit matrix-pair pinning.
buildTestECDSACSR helper for ECDSA P-256 CSR construction;
tripleDESCBCEncrypt mirrors aesCBCEncrypt for 3DES-CBC;
assertChromeOSPositiveCertRep shared assertion.
Phase E — Per-profile counter isolation test
internal/api/handler/scep_profile_counter_isolation_test.go:
TestSCEPHandler_PerProfileIntuneCountersIsolated wires two
SCEPService instances + drives distinct PKIMessages + asserts
counter isolation. Guards against a future cmd/server/main.go
refactor that shares a *intuneCounterTab across profiles.
buildPerProfileIntuneFixture parameterized helper.
Phase F — Server-boot regression tests
cmd/server/preflight_scep_intune_test.go: 3 named tests covering
disabled-backward-compat, broken-config-with-PathID, expired-cert
refusal. preflightSCEPIntuneTrustAnchor signature extended with
pathID arg so error messages carry PathID= for operator log-grep.
Phase G — docs/connectors.md
Four new subsections under §EST/SCEP Integration: multi-profile
dispatch + mTLS sibling route + Intune Connector dispatcher + SCEP
probe in network scanner. Each has a one-paragraph operator
explanation + an env-var or endpoint table.
Phase H — Coverage uplift
internal/service/scep_probe_persist_test.go: 5 unit tests on
persistProbeResult (nil-safe + nil-repo-safe + repo-error swallow +
nil-logger guard) + ListRecentSCEPProbes (empty-slice-not-nil + repo
pass-through) + describeCertAlgorithm (RSA/ECDSA/QF1008-nil-curve
defensive branch/Ed25519/DSA/empty). CI gates (service ≥70, handler
≥75) PASS at 70.9% / 79.3%.
Phase I — deploy/test integration variant
deploy/test/scep_intune_e2e_test.go (//go:build integration):
TestSCEPIntuneEnrollment_Integration + _RateLimited_Integration
against the live docker-compose certctl container. Skip-when-
stack-missing semantics so sandbox + CI both work.
deploy/docker-compose.test.yml: new e2eintune SCEP profile env
vars + bind-mount of deploy/test/fixtures/.
deploy/test/fixtures/README.md: documents the deterministic trust
anchor regeneration recipe.
VERIFICATION (sandbox):
gofmt -d — clean for all changed files
staticcheck — clean for intune + handler + config + service +
cmd/server packages
go vet — clean for the same packages
go test -short — green for intune (95.3% cov), service (70.9%),
handler (79.3%), config (94.0%), cmd/server (boot
path; my preflight tests cover the directly-
testable function), pkcs7 (80.5% informational)
DEFERRED (per closure prompt §7 out-of-scope):
- V3-Pro Conditional Access gating + Microsoft Graph integration
- Standalone certctl-scan CLI binary
- OCSP rate-limiting, OCSP stapling, delta CRLs
Spec preserved at cowork/scep-bundle-gap-closure-prompt.md;
journal at cowork/scep-rfc8894-intune/progress.md (audit-closure
section appended).
|
||
|
|
23603f5174 |
docs(scep): RFC 8894 hardening — README + architecture + connectors
SCEP RFC 8894 + Intune master bundle — Phase 6 of 14.
Closes Half 1 of the bundle (Phases 0-6). The certctl SCEP server now
ships full RFC 8894 wire format (EnvelopedData decrypt + signerInfo POPO
verify + CertRep PKIMessage builder), tested against ChromeOS-shape
hermetic E2E requests, with multi-profile dispatch and must-staple
per-profile policy. Half 2 (Phases 7-12) adds the Microsoft Intune
dynamic-challenge layer; Phase 6.5 (mTLS sibling route) is independently
shippable as an opt-in enterprise-procurement feature.
README.md
* Standards & Revocation table SCEP row updated to mention full RFC
8894 wire format (EnvelopedData decryption, signerInfo POPO
verification, CertRep PKIMessage builder), PKCSReq + RenewalReq +
GetCertInitial messageType dispatch, multi-profile dispatch
(/scep/<pathID>), per-profile RA cert + key, MVP fall-through for
lightweight clients.
* Enrollment protocols paragraph extended with the same scope, plus
a link to docs/legacy-est-scep.md for the operator + device-
integration guide.
docs/architecture.md
* SCEP wire format paragraph rewritten to describe the two paths
(RFC 8894 first, MVP fall-through), the messageType dispatch
table, the EnvelopedData decrypt (constant-time PKCS#7 unpad
closing the padding-oracle leg), the SET-OF Attribute
re-serialisation quirk per RFC 5652 §5.4, and the CertRep
PKIMessage shape (cert chain encrypted to req.SignerCert, NOT
the RA cert).
* SCEP service interface updated to show the three new
*WithEnvelope variants alongside the legacy PKCSReq method.
* Added 'Capabilities advertised', 'Multi-profile dispatch', and
'Must-staple per profile' subsections covering the RFC 7633
extension policy.
docs/connectors.md
* EST/SCEP Integration section extended with the per-profile
issuer-binding env-var form (CERTCTL_SCEP_PROFILE_<NAME>_ISSUER_ID).
* New SCEP RA cert + key paragraph pointing operators at the
legacy-est-scep.md openssl recipe + ChromeOS Admin Console
pointer + must-staple per-profile policy.
cowork/CLAUDE.md::Active Focus
* 2026-04-29 SCEP RFC 8894 + Intune master bundle status updated
to 'HALF 1 COMPLETE (Phases 0-5 of 14 SHIPPED)' with the full
chain of commit SHAs (
|
||
|
|
86fffa305a |
fix(deploy,helm,docs): published-image HEALTHCHECK speaks HTTPS + Helm /ready path + docs HTTPS sweep (U-2)
Pre-U-2 the published `ghcr.io/shankar0123/certctl-server` image shipped with `HEALTHCHECK CMD curl -f http://localhost:8443/health`. The server has been HTTPS-only since the v2.2 HTTPS-Everywhere milestone (`cmd/server/main.go::ListenAndServeTLS`, no plaintext fallback, TLS 1.3 pinned), so the probe failed on every interval and Docker marked the container `unhealthy` indefinitely. Operators inside docker- compose / Helm / the example stacks were unaffected — compose overrides the HEALTHCHECK with `--cacert + https://`, Helm uses explicit `httpGet` probes that ignore Docker's HEALTHCHECK, and every example compose file overrides with `curl -sfk https://localhost:8443/health`. But anyone running bare `docker run` / Docker Swarm / Nomad / ECS — exactly the "I just pulled the published image" path — saw permanent `unhealthy` status and (depending on orchestrator policy) a restart- loop. (Audit: cat-u-healthcheck_protocol_mismatch in coverage-gap-audit-2026-04-24-v5/unified-audit.md.) Recon for U-2 surfaced two adjacent bugs from the same v2.2 milestone gap, both bundled into this commit because they share the same root cause and the same operator surface: 1. Helm chart `server.readinessProbe.httpGet.path` pointed at `/readyz`, the kube-flavored convention. The certctl server doesn't register `/readyz` (only `/health` and `/ready` are wired and bypass the auth middleware — see internal/api/router/router.go:81 and cmd/server/main.go:920). K8s readiness probes therefore got 401 (api-key auth rejection) or 404 (when auth was disabled), pods stayed `NotReady` indefinitely, and Helm rollouts stalled. 2. The agent image (`Dockerfile.agent`) had no HEALTHCHECK at all, so bare-`docker run` agents got zero health signal. The compose override at `deploy/docker-compose.yml:173` called `pgrep -f certctl-agent` against the agent image, but the agent image didn't ship `procps` — pgrep was missing too. The compose probe was a latent always-fail. We fixed all three with the audit-recommended shape (option (a) — `-k`) plus three structural backstops: Files changed: Phase 1 — Dockerfile fix: - Dockerfile: HEALTHCHECK switched from `curl -f http://localhost:8443/ health` to `curl -fsk https://localhost:8443/health`. `-k` (insecure) is acceptable because the probe is localhost-to-localhost: the same process serving the cert is being probed, no network hop. Pinning `--cacert` is not viable for the published image because the bootstrap cert is per-deploy (generated into the `certs` named volume on first up; operator-supplied via Helm's `existingSecret` or cert-manager). Long-form docblock cross-references the audit closure, the compose vs Helm vs examples coverage matrix, and the CI guardrail. - Dockerfile.agent: added HEALTHCHECK using `pgrep -f certctl-agent` matching the compose pattern. Added `procps` to the runtime apk install — fixes both the new image-level HEALTHCHECK AND the pre-existing compose probe that was silently failing. Phase 2 — Helm readiness probe path: - deploy/helm/certctl/values.yaml: server.readinessProbe.httpGet.path changed from `/readyz` to `/ready`. Liveness probe path (`/health`) was correct and is unchanged. Probes block now carries an explanatory comment naming the registered no-auth probe routes and the U-2 closure rationale. Phase 3 — Image-level integration tests: - deploy/test/healthcheck_test.go (new, //go:build integration): TestPublishedServerImage_HealthcheckSpecUsesHTTPS builds the server image, inspects `Config.Healthcheck.Test` via `docker inspect`, and asserts the array contains `https://localhost:8443/health` and `-k`, and does NOT contain `http://localhost:8443/health` (positive + negative regression contracts). TestPublishedAgentImage_HealthcheckSpecExists builds the agent image and asserts the HEALTHCHECK uses `pgrep` against `certctl-agent`. Both tests `t.Skip` cleanly when docker isn't available (sandbox / CI without docker-in-docker) — verified locally: tests skip with the diagnostic and the suite returns PASS. TestPublishedServerImage_HealthcheckTransitionsToHealthy is a documented `t.Skip` placeholder until the harness wires a sidecar postgres for image-level smoke; the spec-level tests above cover the audit-flagged regression. Phase 4 — CI guardrail: - .github/workflows/ci.yml: new "Forbidden plaintext HEALTHCHECK regression guard (U-2)" step. Scoped patterns catch `HEALTHCHECK.*http://` and `curl -f http://localhost:8443/health` in any `Dockerfile*`. Comment lines exempt; docs/upgrade-to-tls.md out of scope (the post-cutover invariant string at line 182 is intentionally a documented expected-failure assertion). Verified locally on the real tree (passes) and against synthetic regressions (each fires the guard). Phase 5 — Docs sweep: - docs/connectors.md: 15 stale curl examples updated from `http://localhost:8443/...` to `https://localhost:8443/...` with `--cacert "$CA"` injected on every site. Added a one-time introductory note documenting the `$CA` extraction with `docker compose ... exec ... cat /etc/certctl/tls/ca.crt`, matching the pattern in docs/quickstart.md. Pre-U-2 these examples silently failed against the HTTPS listener. Phase 6 — Release surface: - CHANGELOG.md: appended U-2 section to the existing [unreleased] block (immediately below the G-1 entry). Sections: explanatory blockquote covering all three bugs (primary + 2 adjacent), Fixed, Added, Changed. Verification (all gates pass): - go build ./... — clean - go vet ./... — clean - go vet -tags integration ./deploy/test/ — clean - go test -short ./... — every package green - go test -tags integration -v -run TestPublishedServerImage|TestPublishedAgentImage ./deploy/test/ — three tests SKIP cleanly with "docker not available" diagnostic - helm lint deploy/helm/certctl/ — clean - helm template smoke render — succeeds; rendered Deployment carries `path: /ready` and zero `/readyz` matches - python3 yaml.safe_load on api/openapi.yaml — parses - govulncheck ./... — no vulnerabilities in our code - CI guardrail mirror: clean on real tree, fires on synthetic regression patterns Out of scope (intentionally untouched): - cmd/server/main.go::ListenAndServeTLS — HTTPS-only is correct, this finding does NOT propose adding back a plaintext listener. - deploy/docker-compose.yml:126 HEALTHCHECK — already correct. - deploy/docker-compose.test.yml HEALTHCHECK blocks — already correct. - All 5 examples/*/docker-compose.yml HEALTHCHECK overrides — already correct (they ALSO use `-fsk https://localhost:8443/health`). - Helm server.livenessProbe.httpGet — already uses `scheme: HTTPS` + `path: /health`, correct. - docs/upgrade-to-tls.md:182 `curl ... http://localhost:8443/health` invariant line — that's the expected-failure assertion for the post-cutover state ("plaintext is gone, expect Connection refused"); intentionally left intact. - Go production code — this is purely a deploy-image / probe / docs / Helm-chart fix. Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md §2 P1 cluster, cat-u-healthcheck_protocol_mismatch Audit recommendation followed verbatim: 'change Dockerfile:80 to CMD curl -kf https://localhost:8443/health'. |
||
|
|
04c7eca615 |
docs: reconcile scheduler topology across sibling docs (7 → 12 loops)
Authoritative 12-loop table lives at docs/architecture.md:522-534 (committed via
the I-001/I-003/I-005 + M48/M50 milestone commits). This change brings six sibling
docs into parity with that table so every surface — user-facing features reference,
SOC 2 compliance mapping, connectors guide, advanced demo architecture diagram,
testing guide, and in-line architecture prose — reflects the same 8 always-on + 4
opt-in topology.
Touches:
- docs/architecture.md: 2 inline ordinal references (9th / 8th loop) replaced with
descriptive names (opt-in cloud discovery / opt-in endpoint health), cross-linked
to the authoritative table to prevent future ordinal rot.
- docs/features.md: metric row (7 → 12), inline reference to 9th loop, and full
scheduler table expanded to include Always-on column + env vars + I-001/I-003/I-005
refs.
- docs/compliance-soc2.md: background scheduler monitoring bullets expanded to list
all 12 loops with env vars + I-series refs; table row updated with 8 always-on +
4 opt-in summary.
- docs/connectors.md: three inline ordinals (7th/6th/9th loop) replaced with
descriptive names, cross-linked to architecture.md.
- docs/demo-advanced.md: Mermaid SCHED node label updated from '7 background loops'
to '12 background loops (8 always-on + 4 opt-in)'.
- docs/testing-guide.md: Test 20.1.1 header + grep pattern expanded to include
job-retry / job-timeout / notification-retry / digest / endpoint-health /
cloud-discovery loops; sign-off chart row label updated.
Pure documentation reconciliation. No code changes. Master HEAD pre-commit:
|
||
|
|
3287e174dc |
Unify API auth + RFC-compliant CRL/OCSP (M-002 + M-003 + M-006, auto-closes M-001)
Closes the remaining P1 gaps from coverage-gap-audit.md (M-001/M-002/M-003/M-006)
on top of the C-001/C-002 ownership + agent-FK contract fixes landed in
|
||
|
|
6315ef102a |
security(globalsign): remove InsecureSkipVerify and pin CA pool (H-5)
The GlobalSign Atlas HVCA connector previously used InsecureSkipVerify:true on its mTLS TLS config, disabling server certificate validation and defeating the purpose of the client-side mTLS handshake. This was a CWE-295 Improper Certificate Validation vulnerability silently degrading trust on every production call to GlobalSign's signing API. Remediation (per H-5 audit finding, Lens 4.4): - Remove InsecureSkipVerify from all three http.Client construction sites (ValidateConfig, getHTTPClient, and legacy initialisation path). - Introduce buildServerTLSConfig() helper that constructs tls.Config with MinVersion: tls.VersionTLS12 (addresses adjacent L-1 recommendation). - New optional config field `server_ca_path` (env: CERTCTL_GLOBALSIGN_SERVER_CA_PATH). When unset the connector trusts the system root CA bundle (correct default for GlobalSign's publicly-trusted HVCA endpoints). When set the bundle is loaded via x509.NewCertPool() + AppendCertsFromPEM, and only those roots are trusted (supports private HVCA deployments and defence-in-depth root pinning). - Error wrapping chain: "failed to read server CA bundle at %s" and "no valid PEM certificates found in server CA bundle at %s" surface config problems at ValidateConfig time instead of silently failing at request time. Docs, config, service env-seed, and GUI issuer type definition updated to expose the new field. Tests: 9 dead `InsecureSkipVerify: true` client TLSClientConfig blocks (no-ops against httptest.NewServer plain-HTTP) replaced with bare http.Client; new TestGlobalSign_ServerTLSConfig covers pinned-CA trust, untrusted-server rejection, missing-file and invalid-PEM error paths. Verification: - go build ./... clean - go vet ./... clean - go test -race ./internal/connector/issuer/globalsign/... ./internal/config/... ./internal/service/... ok - go test ./... (excluding testcontainers-gated repo layer) ok - golangci-lint run ./... 0 issues - govulncheck ./... 0 reachable vulns - Per-layer coverage: service 68.7% (≥55), handler 83.6% (≥60), domain 82.0% (≥40), middleware 63.8% (≥30) - globalsign package coverage: 75.9% - Invariant sweep: 0 InsecureSkipVerify references remain in globalsign package (only a test-file comment documenting the removal). |
||
|
|
e1bcde4cf1 |
feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret managers. Three pluggable DiscoverySource connectors feed into the existing discovery pipeline via sentinel agent pattern, with a 9th scheduler loop for periodic cloud scanning. - AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests - Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests - GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests - CloudDiscoveryService orchestrator with 9 tests - 9th scheduler loop (6h default, atomic.Bool idempotency) - Discovery page: color-coded source type badges - 14 new env vars across CloudDiscoveryConfig structs - Docs: connectors.md, architecture.md, features.md, README updated 49 new tests. All CI checks pass (go vet, race, lint, coverage). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3f619bcaac |
feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA coverage. Entrust uses mTLS client certificate auth with sync/async issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for self-hosted Keyfactor CAs. Each connector implements the full issuer.Connector interface (9 methods), includes httptest-based unit tests (~14 each), and follows established patterns (injectable HTTP clients, RFC 5280 revocation reason mapping, CRL/OCSP delegated to CA). Also includes: issuer factory cases, env var seeding, config structs, domain types, seed data (3 rows, all disabled), OpenAPI enum updates, frontend issuer catalog entries with config fields, and full docs (connectors.md, architecture.md, features.md, README). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f2e60b93a3 |
feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths (renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs with key algorithm/size that don't match profile rules. MaxTTL enforcement caps certificate validity per issuer connector (Local CA, Vault, step-ca enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and size are now persisted in certificate_versions for audit compliance. 16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded version number from GUI sidebar. Documentation updated across architecture, features, connectors, and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
bcefb11e65 |
feat(M51): add SCEP server (RFC 8894) for MDM and network device enrollment
Implements Simple Certificate Enrollment Protocol with single-endpoint operation-based dispatch (GetCACaps, GetCACert, PKIOperation), PKCS#7 SignedData CSR extraction with fallback for raw/base64 CSR, challenge password authentication via CSR attributes, and shared internal/pkcs7 package extracted from EST handler to eliminate code duplication. 24 new tests (11 service + 13 handler) plus 5 shared pkcs7 package tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
75cf8475f5 |
tighten BSL license scope, fix documentation underselling shipped features
Broadened BSL Additional Use Grant from "hosted or managed service" to cover any commercial offering (embedded, bundled, integrated). Updated README to promote all shipped connectors from Beta to Implemented, added EST/ARI/S/MIME highlight, Helm quickstart, and corrected license description. Fixed connectors.md stale claims (AWS ACM PCA listed as planned, K8s Secrets listed as coming soon) and updated overview with exact connector counts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3da6584ab8 |
fix: correct K8s Secrets status to 'Coming in 2.1', increase audit trail page size to 200
The Kubernetes Secrets target connector has config validation, tests, UI, and Helm RBAC implemented but the realK8sClient is a stub — runtime deployment will fail. Update README and connectors.md to reflect actual status instead of misleading 'Beta' label. Also increase the audit trail GUI default from 50 to 200 events per page (backend already permits up to 500). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5567d4b411 |
feat(M47): add Kubernetes Secrets target + AWS ACM PCA issuer connectors
Implement both M47 connectors with full cross-layer wiring: Kubernetes Secrets target: DNS-1123 validation, kubernetes.io/tls Secret create-or-update, chain concatenation, serial number validation, Helm RBAC gating. 18 tests. AWS ACM Private CA issuer: synchronous issuance (like Vault), ARN regex validation, RFC 5280 revocation reason mapping, CA cert retrieval, factory + env var seeding. 23 tests. Cross-cutting: domain types, service validation, config, factory, agent dispatch, frontend (TargetsPage, issuerTypes), OpenAPI, seed data, Helm chart, connectors docs, README. Testing docs (testing-guide, qa-test-guide, qa_test.go) with Parts thematically integrated near related connectors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
cc03f55006 |
docs: comprehensive documentation audit — fix stale counts, V2/V3 matrix, connector status
- features.md: Fix Feature Matrix to correctly show all V2 Free features (F5/IIS/WinCertStore/JavaKeystore as Implemented, not Stub; Vault/DigiCert/ Sectigo/GoogleCAS as V2 Free, not V3 Paid). Add missing shipped features (EST, verification, export, S/MIME, ARI, digest, Helm, onboarding). Update issuer count to 9, target count to 13. - architecture.md: Fix F5/IIS from "interface only, implementation planned" to implemented. Add all 13 target connectors to built-in targets list. - why-certctl.md: Add Sectigo and Google CAS to issuer list (7→9). Fix target count (10→13). Remove hardcoded endpoint/operation counts. - connectors.md: Fix F5 BIG-IP TOC entry from "Interface Only" to "Implemented". Remove dead "Planned Issuers" TOC link. - README.md: Remove competitor product names (CertKit, KeyTalk). Remove hardcoded dashboard page count. Remove hardcoded endpoint counts. Fix V4 roadmap to remove already-shipped issuers (Sectigo, Google CAS). - Remove hardcoded MCP tool counts (78/80) across 8 files (mcp.md, architecture.md, features.md, testing-guide.md, concepts.md, quickstart.md, demo-advanced.md, why-certctl.md). Replace with "REST API exposed via MCP" to avoid future drift. - quickstart.md: Docker Compose environments table (from previous session). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
7d6ef44e21 |
feat(M46): Windows Certificate Store + Java Keystore target connectors, shared certutil package
Extract shared certutil helpers (CreatePFX, ParsePrivateKey, ComputeThumbprint, GenerateRandomPassword, ParseCertificatePEM) from IIS connector for reuse. Add WinCertStore connector (PowerShell Import-PfxCertificate, dual local/WinRM mode, configurable store/location, expired cert cleanup) and JavaKeystore connector (PEM→PKCS#12→keytool pipeline, JKS/PKCS12 support, shell injection prevention, path traversal protection). 53 new tests, all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f92c997a50 |
feat(M45): ACME certificate profile selection, ARI RFC 9773 renumber, 45-day renewal positioning
Three related ACME ecosystem changes shipped as a single milestone: 1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with `profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support. ES256 JWS signing with kid mode, nonce management, directory discovery. Empty profile delegates to standard library path (zero behavior change). Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on ACME issuer config. 2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source, docs, README, and examples. Zero remaining occurrences of RFC 9702. 3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating renewal thresholds against SC-081v3 validity reduction timeline (200→100→47 days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the expected renewal path for 6-day shortlived certs. New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
697c0be9f3 |
feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any Linux/Unix server without installing the certctl agent binary. Uses the proxy agent pattern — a single agent in the same network zone deploys certs to remote servers over SSH/SFTP. Key additions: - SSH/SFTP connector with key auth (file/inline) + password auth - Injectable SSHClient interface for cross-platform testing (25 tests) - Shell injection prevention via validation.ValidateShellCommand() - Configurable cert/key/chain paths with octal permissions - GUI: 11 SSH config fields in target create wizard Also fixes pre-existing frontend bug where all target type strings (nginx, apache, etc.) were sent as lowercase but the backend expects proper-case (NGINX, Apache, etc.), breaking GUI-created targets. Adds missing TargetTypeSSH to validTargetTypes service map. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
2a14a1da01 |
feat(M40): F5 BIG-IP target connector via iControl REST
Replace 190-line stub with full iControl REST implementation (~580 lines). Token auth with 401 auto-retry, file upload + crypto object install, transaction-based atomic SSL profile updates, cleanup on failure. Injectable F5Client interface for cross-platform testing. 32 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5a53b648b1 |
feat(M44): Google CAS issuer connector
Google Cloud Certificate Authority Service integration via REST API with OAuth2 service account auth (JWT→access token). Synchronous issuance model, CA pool selection, mutex-guarded token caching, revocation with RFC 5280 reason mapping. No Google SDK dependency — all stdlib. 19 tests with httptest mock OAuth2 + CAS API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3a11e447cf |
feat(M43): Sectigo SCM issuer connector
Implement Sectigo Certificate Manager REST API connector with async order model (enroll → poll → collect PEM), 3-header auth, DV/OV/EV support, collect-not-ready (400/-183) graceful handling, and RFC 5280 revocation reason mapping. 20 tests with httptest mock API. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
4c3b7cbb16 |
docs: fix stale references, seed data case bugs, and convert ASCII diagrams to Mermaid
Audit all docs and examples against current codebase state. Fix seed_demo.sql domain constant casing (IssuerType, TargetType, AgentStatus) that would cause agent dispatch failures. Fix example docker-compose health endpoints (/health not /api/v1/health) and env var names (CERTCTL_DATABASE_URL). Update connector counts, test numbers, and planned→implemented status across docs. Convert 3 ASCII flow diagrams to Mermaid. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
9feb6c796d |
feat(M42): Postfix/Dovecot mail server target connector
Dual-mode TLS connector for mail servers — single package with mode field selecting Postfix or Dovecot defaults. File-based cert/key deployment with correct permissions (cert 0644, key 0600), optional chain append, shell injection prevention, and configurable reload/validate commands. 18 tests covering config validation, deployment, and security. GUI wizard fields and OpenAPI enum updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
fd05bacb76 |
feat(M41): Envoy target connector with SDS support
File-based deployment for Envoy service mesh — writes cert/key/chain to watched directory with optional SDS JSON config for xDS bootstrap. Path traversal prevention, configurable filenames, 15 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
9a41d0ca39 |
feat(M39): IIS WinRM proxy agent mode + front-to-back wiring
Complete the IIS target connector with dual-mode deployment: - WinRM proxy agent mode via masterzen/winrm for remote Windows servers - Base64 PFX transfer with try/finally cleanup on remote host - GUI wizard updated with 13 IIS config fields including WinRM settings - TargetDetailPage sensitive field redaction (password/secret/token/key) - OpenAPI TargetType enum updated (added Traefik, Caddy) - connectors.md fully documented with WinRM proxy config example - 38 total IIS tests (10 new WinRM tests), all passing with race detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
8b52da6aef |
feat(M39): IIS target connector + README overhaul
Implement full IIS target connector with PEM-to-PFX conversion via go-pkcs12, PowerShell-based deployment (Import-PfxCertificate, IIS binding management), SHA-1 thumbprint computation, and SNI support. Injectable PowerShellExecutor interface enables cross-platform testing. Regex-validated config fields prevent PowerShell injection. 28 tests. Restructure README from 563 to 313 lines: outcome-focused feature descriptions, "Who Is This For" persona section, examples promoted above the fold, configuration/API/security reference moved to docs. All numbers verified against repo (25 GUI pages, 97 OpenAPI ops, CI thresholds service 55%/handler 60%/domain 40%/middleware 30%). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6375909591 |
feat: add Vault PKI and DigiCert CertCentral issuer connectors (M32 + M37)
Vault PKI: synchronous issuance via /v1/{mount}/sign/{role}, token auth,
revocation, CA cert retrieval, 14 tests. DigiCert CertCentral: async order
model (submit → poll → download), X-DC-DEVKEY auth, OV/EV support, PEM
bundle parsing, 16 tests. Both conditionally registered based on env vars.
Includes OpenAPI enum updates, seed data, connector docs, architecture docs,
README badges, and testing guide sign-off (Parts 38 + 39, 12 automated
smoke test assertions all passing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
bcf2c3ae92 |
feat(pre-2.1.0): demo data overhaul, examples, migration guides, install script
Pre-2.1.0 adoption polish delivering all four milestones: A) Demo Data Overhaul — seed_demo.sql rewritten with 35 certs across 5 issuers, 8 agents, 8 targets, 50+ jobs spanning 90 days, 55+ audit events, discovery scans, network scan targets, S/MIME cert. B) Examples Directory — 5 turnkey docker-compose configs: acme-nginx, acme-wildcard-dns01, private-ca-traefik, step-ca-haproxy, multi-issuer. C) Migration Guides — migrate-from-certbot.md, migrate-from-acmesh.md, certctl-for-cert-manager-users.md. D) Agent Install Script — install-agent.sh with cross-platform support (Linux systemd + macOS launchd), release.yml updated for 6-target cross-compilation. Triple-audited against codebase: 22 factual corrections applied across docs, examples, and config (env var names, CLI flags, ports, DNS hook interface, scheduler loop counts, license conversion date). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ec21c9bb29 |
feat(m28+m29+m30): ACME ARI, email digest, and Helm chart
M28: ACME Renewal Information (RFC 9702) — CA-directed renewal timing with cert ID computation, directory endpoint discovery, graceful degradation for non-ARI CAs. 19 tests. M29: Email notifier wiring + scheduled certificate digest — SMTP connector bridged to service layer via NotifierAdapter, DigestService with HTML email template, 7th scheduler loop (24h), digest preview/send API endpoints and GUI card. 21 tests. M30: Production-ready Helm chart — server Deployment, PostgreSQL StatefulSet, agent DaemonSet, ConfigMaps, Secrets, Ingress, security contexts, health probes, example values for dev/prod/ACME scenarios. Also: OpenAPI spec updates, MCP tool additions, CI helm-lint job, documentation updates across 5 doc files and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
a00bb349c4 |
feat(m27): certificate export (PEM/PKCS#12) and S/MIME EKU support
Add certificate export in PEM (JSON or file download) and PKCS#12 formats. Private keys are never included — they stay on agents. Add EKU-aware issuance threading profile EKUs (serverAuth, clientAuth, codeSigning, emailProtection, timeStamping) through the full issuance pipeline. Fix agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for S/MIME vs TLS, and a pre-existing generateID collision bug in deployment job creation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
6d508cf53f |
fix: security audit remediation (AUDIT-001, 003, 004, 005, 006, 018)
- AUDIT-001: Validate OpenSSL revoke inputs (hex-only serials, RFC 5280 reasons) - AUDIT-003: Enforce /20 CIDR size cap at API level (create + update) - AUDIT-004: Support comma-separated CERTCTL_AUTH_SECRET for zero-downtime key rotation - AUDIT-005: Add ReadHeaderTimeout (5s) to prevent Slowloris - AUDIT-006: Document audit trail query parameter exclusion rationale - AUDIT-018: Add immediate-run-on-start to short-lived expiry scheduler loop Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
be72627aeb |
feat: M25 post-deployment TLS verification + M26 Traefik/Caddy targets
M25: After deploying a certificate, the agent probes the live TLS
endpoint and compares SHA-256 fingerprints to verify the correct cert
is being served. Best-effort — failures don't block deployments.
New endpoints: POST /jobs/{id}/verify, GET /jobs/{id}/verification.
Migration 000008 adds verification columns to jobs table.
M26: Traefik target connector (file provider, auto-reload) and Caddy
target connector (dual-mode: admin API hot-reload or file-based).
Both wired into agent dispatch.
Also: restructured README to highlight supported integrations (issuers,
targets, notifiers) earlier, moved API/CLI/MCP sections lower. Updated
all docs (features, connectors, architecture, testing guide, why-certctl)
and fixed integration tests for 18-param RegisterHandlers signature.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
b9633e5b1a |
docs: add GUI references to discovery and network scan documentation
Update concepts.md and connectors.md to mention the Discovery and Network Scans dashboard pages alongside existing API documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |