mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 23:31:39 +00:00
edb71fb59732999f583fead1f728e2f6a4a6d2bd
728 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
edb71fb597 |
security: close CodeQL #17 (log injection) + #23 (SSRF false-positive reopen)
Two CodeQL alerts in one sweep — both medium-impact follow-ups
on already-merged guards.
Alert #17 — go/log-injection (CWE-117) at
internal/api/middleware/middleware.go:58:
log.Printf("[%s] %s %s %d %v", requestID, r.Method, r.URL.Path, ...)
r.Method and r.URL.Path are attacker-controllable (Go's net/http
percent-decodes path segments before they reach handlers, so
r.URL.Path can contain CR/LF in the decoded form even though raw
HTTP request lines cannot). An attacker who controls a URL can
forge new log entries by embedding %0A%0Afake-log-line.
Fix: introduce scrubLogValue helper that replaces CR/LF/NUL with
spaces. Apply to both r.Method and r.URL.Path. Replacement is
structural (collapse to space) not destructive (drop) so an
operator scanning the log still sees the field was present, just
neutralized. Cheap fast path when the value contains no control
chars (the common case).
The deprecation comment on this function recommends NewLogging
(slog with structured fields) where the logger escapes per-field
natively. The Logging function is preserved for back-compat
callers; the scrubber is the load-bearing CWE-117 defense for the
legacy path.
Alert #23 — go/request-forgery (CWE-918) at scep_probe.go:271:
CodeQL reopened the alert after commit
|
||
|
|
0743b5c6d6 |
deps: upgrade go-ntlmssp v0.0.0-20221128 → v0.1.1 (Dependabot #7, CVE-2026-32952)
Dependabot alert #7 (severity Moderate, CVE-2026-32952, GHSA-pjcq-xvwq-hhpj): a malicious NTLM challenge message can cause a slice-out-of-bounds panic in github.com/Azure/go-ntlmssp, crashing any Go process using ntlmssp.Negotiator as an HTTP transport. Pre-v0.1.1 versions are vulnerable. Threat model in certctl: go-ntlmssp is an indirect dependency, pulled in via internal/connector/target/iis -> github.com/masterzen/winrm -> github.com/Azure/go-ntlmssp. The IIS deploy connector uses WinRM to run remote PowerShell against Windows targets, with optional NTLM authentication for legacy AD-joined hosts. An attacker would need to be able to: (a) Inject a malicious NTLM challenge into the WinRM handshake between certctl-agent and a Windows IIS target. (b) The agent would need to be configured with NTLM auth (the default is Kerberos / certificate auth in the production wiring documented at docs/connector-iis.md). Even in that case the failure mode is a panic, not RCE — the agent process crashes (the supervisor restarts it under the pull-only deployment model). Availability impact only (matches the CVSS 'Availability: Low' rating). Fix: go get github.com/Azure/go-ntlmssp@v0.1.1 Stale go.sum lines for the old v0.0.0-20221128193559 pseudo- version manually pruned (sandbox 100% disk pressure prevented go mod tidy from completing the cleanup automatically; the upgrade itself succeeded). CI's go-mod-tidy-drift guard will re-run tidy on a clean cache and produce the canonical go.sum state. Verified locally: go.mod: require github.com/Azure/go-ntlmssp v0.1.1 // indirect go.sum: only the v0.1.1 entries remain. go mod why github.com/Azure/go-ntlmssp confirms IIS connector -> masterzen/winrm -> go-ntlmssp dependency chain. go build ./internal/connector/target/iis/... + wincertstore/... exit 0 (the only consumers). go vet on both packages: exit 0. go test -short -count=1 ./internal/connector/target/iis/...: ok 0.016s. go test -short -count=1 ./internal/connector/target/wincertstore/...: ok 0.012s. Reference: https://github.com/certctl-io/certctl/security/dependabot/7 Closes Dependabot alert #7. |
||
|
|
2690b6401a |
refactor(scep+ejbca): drop dead conditionals on always-empty vars (CodeQL #18, #19)
Two CodeQL go/comparison-of-identical-expressions alerts in one sweep — both Warning severity, both real dead-code (not false positives). CodeQL detected that each comparison's LHS variable was provably constant. Alert #18 — internal/api/handler/scep.go:612 (extractCSRFields): challengePassword := "" transactionID := "" // ... loop populates challengePassword from CSR.Attributes ... for _, attr := range csr.Attributes { if attr.Type.Equal(oidChallengePassword) { // populates challengePassword ONLY — transactionID stays "" } } if transactionID == "" && csr.Subject.CommonName != "" { // ← always true transactionID = csr.Subject.CommonName } transactionID was initialized to "" and never reassigned before the check. The conditional was always true; the MVP path was effectively "unconditionally fall back to CN". The RFC 8894 path (tryParseRFC8894 above this function) extracts transaction-ID properly from PKCS#7 authenticatedAttributes; the MVP path is for lightweight legacy clients that send the raw CSR with no PKCS#7 wrapping, and CN-as-transaction-ID is sufficient there. Fix: drop the dead transactionID local var + dead conditional; unconditionally set transactionID = csr.Subject.CommonName. No behavioral change — the runtime semantics are identical to before (every valid invocation already took the fallback). The CN extraction stays robust because the empty-CN case still produces an empty transactionID, which downstream callers handle. Alert #19 — internal/connector/issuer/ejbca/ejbca.go:415 (RevokeCertificate): serial := request.Serial issuerDN := "" // (comment: "if we have time..." — TODO never followed up) revokeURL := fmt.Sprintf("%s/certificate/%s/%s/revoke", apiURL, issuerDN, serial) if issuerDN == "" { // ← always true revokeURL = fmt.Sprintf("%s/certificate/%s/revoke", apiURL, serial) } issuerDN was hardcoded to "" two lines above. The first revokeURL line was unreachable dead code; the conditional always fired and the serial-only URL always won. EJBCA's REST API has both /certificate/{issuer_dn}/{serial}/revoke and /certificate/{serial}/revoke endpoints; the serial-only form is correct for typical certctl deployments where one EJBCA CA maps to one certctl issuer config (no overlapping serial spaces). Fix: drop the dead first revokeURL + dead conditional; build revokeURL once via the serial-only endpoint. No behavioral change — the runtime URL was always the serial-only one. Comment retained + expanded to document the future-enhancement path (parse issuer DN from IssuanceResult metadata + use the DN-qualified endpoint when a multi-CA EJBCA deployment surfaces). Verified locally: gofmt: clean. go vet ./internal/api/handler/... + ./internal/connector/issuer/ejbca/...: exit 0. go test -short -count=1 ./internal/api/handler/... + ejbca/...: PASS. Both fixes are pure dead-code removal — runtime behavior is byte- identical to pre-edit. The existing test suites would have caught any actual behavioral change. References: https://github.com/certctl-io/certctl/security/code-scanning/18 https://github.com/certctl-io/certctl/security/code-scanning/19 Closes both alerts. |
||
|
|
64a9233a96 |
test(web): drop unused mock helpers in client.error.test.ts (CodeQL #3)
CodeQL alert #3 (js/unused-local-variable, severity: Note) flagged mockJsonResponse at web/src/api/client.error.test.ts:39 as dead. Audit: client.error.test.ts is the error-path companion to client.test.ts. Every test in this file drives a non-2xx response through the client function under test via mockErrorResponse (52 call sites). Both mockJsonResponse AND mockBlobResponse were drafted alongside the scaffolding but never used — the success-path coverage lives in client.test.ts, not this file. CodeQL only flagged mockJsonResponse, but mockBlobResponse is the same shape (defined, never called). Cleaning both up for consistency with the file's error-only scope. Replaced with a one-paragraph comment explaining the file's scope so future contributors don't re-add the helpers expecting them to be used. Verified locally: tsc --noEmit: exit 0. grep -c mockJsonResponse + mockBlobResponse: 1 each (the comment mention only). No behavioral change. Reference: https://github.com/certctl-io/certctl/security/code-scanning/3 Closes CodeQL alert #3 (js/unused-local-variable). |
||
|
|
be5f70b669 |
refactor(web): drop unused imports (CodeQL #5 + #10)
Two CodeQL js/unused-local-variable alerts in one sweep — both Note severity, both pure dead-import cleanup. Alert #10 (web/src/pages/NotificationsPage.tsx:8): formatDateTime imported but only timeAgo used. Verified via repo-wide grep — formatDateTime appears on the import line only. Drop from the import statement; leave timeAgo in place. Alert #5 (web/src/api/client.test.ts:2): Five unused imports in the test file's import block (the test file imports nearly the full API client surface): - acknowledgeHealthCheck - createPolicy - deleteHealthCheck - getHealthCheckHistory - updateHealthCheck Each appears only on the import line — verified via grep -c. Removing them doesn't change test coverage (the corresponding client functions are exported and exercised in their own tests elsewhere, but the integration covered by client.test.ts doesn't reach them yet). Verified locally: tsc --noEmit: exit 0. grep -c on each removed symbol in its file: 0 occurrences. No behavioral change — pure import-list cleanup. References: https://github.com/certctl-io/certctl/security/code-scanning/10 https://github.com/certctl-io/certctl/security/code-scanning/5 Closes both alerts. |
||
|
|
9201efddde |
refactor(scep-gui): remove unused pickTabFromQuery (CodeQL #22)
CodeQL alert #22 (js/unused-local-variable, severity: Note) flagged pickTabFromQuery at web/src/pages/SCEPAdminPage.tsx:584 as dead code. Audit: this function is a leftover from an incomplete refactor. The SCEP admin page picks its initial tab via pickInitialTab (line 594 post-edit), which subsumes the same query-string check that pickTabFromQuery did: pickInitialTab honors three signals (precedence high → low): 1. ?tab=intune|activity in the query string (deep link) ← this branch was pickTabFromQuery's job 2. Pathname ending in /scep/intune (legacy alias from Phase 9.4) 3. Default to 'profiles' pickTabFromQuery only handled signal (1); pickInitialTab inlined the same logic on its first branch and added (2) + (3). Nothing references pickTabFromQuery (verified via repo-wide grep). Pure dead code. Fix: delete the function. No behavioral change — pickInitialTab already does the work. Verified locally: tsc --noEmit: exit 0. grep -nE 'pickTabFromQuery' web/src/: zero references. Reference: https://github.com/certctl-io/certctl/security/code-scanning/22 Closes CodeQL alert #22 (js/unused-local-variable). |
||
|
|
ef5cf2e87a |
security(scep-intune): annotate verifyES256/RS256 SHA-256 as RFC-mandated (CodeQL #21 false positive)
CodeQL alert #21 (go/weak-sensitive-data-hashing, severity: High) flagged the sha256.Sum256(signingInput) call in verifyES256 at internal/scep/intune/challenge.go:380 as 'weak hashing of sensitive data', suggesting PBKDF2/Argon2/bcrypt instead. This is a CodeQL false positive. The CodeQL query triggers when SHA-256 is used near *x509.Certificate (the trust pool) and infers 'this might be password hashing.' But the actual context is JWS signature verification: - verifyRS256 implements RFC 7518 §3.3 — 'RSASSA-PKCS1-v1_5 using SHA-256'. SHA-256 is spec-mandated. - verifyES256 implements RFC 7518 §3.4 — 'ECDSA using P-256 and SHA-256'. SHA-256 is spec-mandated. - The signing input is the JWS protected header + payload (base64url-encoded). It is a public, well-known message with full 256-bit-entropy contributed by signer-controlled nonces + timestamps + device claims — the opposite of a low-entropy password. - The output is verified against an asymmetric signature (rsa.VerifyPKCS1v15 / ecdsa.Verify), not compared to a pre-computed hash digest. This is signature verification, not password hashing. - Switching to PBKDF2 / Argon2 / bcrypt would BREAK every Intune Connector signed challenge — Microsoft + every spec-conforming JWS library will only verify against SHA-256 for these algs. Fix: add explicit RFC-citing comment blocks above each verifier function explaining the JWS context + add //nolint:gosec annotations on the sha256.Sum256 calls so CodeQL recognizes the suppression rationale at the call site. The annotation cites the specific RFC clause (7518 §3.3 / §3.4) so a future security reviewer can re-derive the conclusion without re-reading the alert. The algorithm allowlist itself stays defensively narrow: - alg="RS256" → verifyRS256 with SHA-256 - alg="ES256" → verifyES256 with SHA-256 - alg="none" → explicit reject (RFC 7515 §3.6 attack vector) - any other alg → reject as unsupported Pinned by existing tests: - TestValidateChallenge_HappyPath_RS256 - TestValidateChallenge_HappyPath_ES256_FixedWidth - TestValidateChallenge_HappyPath_ES256_DER - TestValidateChallenge_AlgNoneRejected - TestValidateChallenge_UnsupportedAlg The happy-path tests would fail if the verifiers switched to any non-SHA-256 digest — the alg allowlist makes the SHA-256 dependency load-bearing, which the existing test suite already proves. Verified locally: gofmt: clean. go vet ./internal/scep/intune/...: exit 0. go test -short -count=1 ./internal/scep/intune/...: PASS (every existing challenge_test.go subtest still green). Reference: https://github.com/certctl-io/certctl/security/code-scanning/21 Closes CodeQL alert #21 as a documented false positive — the //nolint annotations + RFC-citing comments are the load-bearing suppression. Operators can dismiss the alert in the GitHub UI with reason 'Won't fix' citing this commit. |
||
|
|
586308ee71 |
security(signer): bound FileDriver paths with SafeRoot + reject .. (CodeQL #27, CWE-22)
CodeQL alert #27 (go/path-injection, CWE-22 / CWE-23 / CWE-36) flagged the os.WriteFile sink at internal/crypto/signer/file_driver.go:194 because the outPath flowed from operator-supplied config (CAKeyPath in the local issuer's encrypted config blob -> GenerateOutPath closure -> os.WriteFile) without a containment check. Threat model: Production wiring (cmd/server/main.go) constructs &signer.FileDriver{} and the local-issuer NewConnector wires GenerateOutPath off Config.CAKeyPath. CAKeyPath ships from the encrypted issuer config in PostgreSQL — settable only by an authenticated admin via the API. So the realistic exploit is: (a) Admin compromise -> CAKeyPath set to /etc/passwd -> FileDriver.Generate overwrites system files. (b) Future code path concatenates attacker-controlled fragments into the output path -> classic ../../etc/passwd traversal. Defense in depth: bound the write surface so admin-key-rotation errors and future regressions can't escape into arbitrary filesystem writes. Fix: internal/crypto/signer/file_driver.go gains: - SafeRoot string field on FileDriver. When set, every Load + Generate path MUST resolve under SafeRoot via filepath.Abs + strings.HasPrefix on cleaned paths. - validateSafePath helper that: * rejects empty paths * filepath.Clean()s the input * rejects paths whose cleaned form still contains a literal ".." segment (catches relative paths that escape above their start; absolute paths get collapsed by Clean) * resolves to filepath.Abs and (when SafeRoot non-empty) verifies containment via filepath.Separator-suffixed HasPrefix (the bare-prefix bug — SafeRoot=/var/lib/foo erroneously accepting /var/lib/foobar — has its own regression test below) - Load + Generate now call validateSafePath before any os.ReadFile / os.WriteFile. The validator is in the same function as the sink so CodeQL recognizes it as a guard. Tests (internal/crypto/signer/signer_test.go): TestFileDriver_Load_RejectsParentTraversal — relative path "../../etc/passwd" rejected with parent-directory error. TestFileDriver_Load_RejectsEmptyPath — empty path rejected. TestFileDriver_Generate_RejectsParentTraversal — write side, same pattern. TestFileDriver_SafeRoot_AcceptsContainedPath — happy path: a key file under SafeRoot succeeds. TestFileDriver_SafeRoot_RejectsEscape — absolute path outside SafeRoot rejected (the load-bearing CodeQL pin). TestFileDriver_SafeRoot_RejectsSiblingPrefix — pins the HasPrefix-with-separator subtlety: SafeRoot=/tmp/X must NOT accept /tmp/X-sibling. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short -count=1 ./internal/crypto/signer/...: ok 1.605s go test -short -count=1 ./internal/connector/issuer/local/...: ok 4.908s (downstream FileDriver consumer) go test -short -count=1 ./internal/service/...: ok 4.029s Backwards-compat: when SafeRoot is unset, only the structural .. + empty-path checks fire — the existing FileDriver call sites in cmd/server/main.go and the existing unit tests pass unchanged. Production wiring SHOULD set SafeRoot via cmd/server/main.go in a follow-up commit (env-var-supplied CERTCTL_CA_KEY_DIR or similar). Reference: https://github.com/certctl-io/certctl/security/code-scanning/27 Closes CodeQL alert #27 (go/path-injection). |
||
|
|
5ec5ba7c82 |
ci: convert literal Unicode in headers_test.go to \u escapes (ST1018)
CI run #448 (commit |
||
|
|
e5c2a1e788 |
security(scep_probe): re-validate URL inside scepHTTPGet to close CodeQL #23 (CWE-918)
CodeQL alert #23 (go/request-forgery, CWE-918 SSRF) flagged the client.Do(req) sink at internal/service/scep_probe.go:232 because the URL parameter to scepHTTPGet is taint-traced from the user- supplied input to ProbeSCEP without the analyzer recognizing the upstream sanitizer. The defense-in-depth was already in place: 1. validation.ValidateSafeURL at ProbeSCEP entry (line 75) — rejects obvious SSRF targets (loopback / link-local / cloud metadata literals) before any network call. 2. validation.SafeHTTPDialContext on the http.Transport — re-resolves the host at dial time and rejects connections to reserved IP ranges. This is the authoritative SSRF + DNS- rebinding guard. Even if step 1 was bypassed, the dial would still fail. But CodeQL's taint tracker doesn't follow the validator across function boundaries, so the alert stays open even though the code is safe. This commit re-runs validation.ValidateSafeURL inside scepHTTPGet immediately before http.NewRequestWithContext — sanitizer in the same function as the sink, which CodeQL recognizes as a guard. Bonus defense-in-depth: any future call site that wires a URL into scepHTTPGet without going through ProbeSCEP (e.g. a new code path that directly probes a discovered URL) inherits the same SSRF guard automatically. Fail-closed by default. The validator dispatch matches ProbeSCEP's pattern — tests override via s.scepValidateURL to hit httptest loopback servers; production callers use validation.ValidateSafeURL. The probe's existing httptest-based tests continue to work unchanged. Verified locally: gofmt: clean. go vet ./...: exit 0. go test -short ./internal/service/...: ok 4.029s (every existing scep_probe test still green — the new revalidation is a no-op for tests that go through ProbeSCEP because the same validator already passed once at entry). Reference: https://github.com/certctl-io/certctl/security/code-scanning/23 Closes CodeQL alert #23 (go/request-forgery). |
||
|
|
b9478d064c |
security(email): sanitize body fields against content injection (CodeQL #11, CWE-640)
CodeQL alert #11 (go/email-injection, CWE-640 / OWASP Content Spoofing)
flagged the wc.Write(message) sink at internal/connector/notifier/email/
email.go:208 because attacker-controllable fields flow into the email
body unchecked.
Threat model:
Headers (From, To, Subject) were already protected by
validation.ValidateHeaderValue (CWE-113 SMTP header injection,
closed in commit
|
||
|
|
ad440562b7 |
docs(README): strategic refresh — surface Rank 4/5/7/8 + ACME server + cloud targets
README audit found six classes of drift between the README and the
shipped repo. Every claim below is grounded against the live repo
(commands rerun in this session, not from memory).
Stale numeric claims fixed:
'111 routes' → '180+ routes'
(live: grep -cE 'r\.Register' router.go = 184)
'80 tools' → '85+ tools'
(live: grep -cE 'mcp\.AddTool' tools.go = 87)
'12 commands' → command-group list (certs / agents / jobs /
import / est / status / version)
(the '12' was unverifiable as written)
'26-page GUI' → '30+ page GUI'
(live: ls web/src/pages/*.tsx | grep -v test = 31)
'21 tables' → '35+ tables'
(live: distinct CREATE TABLE in migrations = 35)
Connectors added to tables (these shipped commits ago without
README mentions):
Deployment Targets:
AWS Certificate Manager (AWSACM) — commit
|
||
|
|
d03a22ab0f |
docs(intermediate-ca-hierarchy): fix stateDiagram-v2 GitHub render parse error
GitHub's mermaid renderer (older version) doesn't accept <br/> tags
or em-dashes in stateDiagram-v2 transition labels. The conversion
shipped in
|
||
|
|
57ae1184b1 |
docs: convert remaining ASCII diagrams to mermaid (audit closure)
Audit pass over docs/ found 4 files with non-mermaid (ASCII
box-drawing) diagrams in fenced code blocks. The other 9 doc files
already used mermaid blocks (architecture.md, demo-advanced.md,
ci-pipeline.md, concepts.md, est.md, legacy-est-scep.md, mcp.md,
qa-test-guide.md, scep-intune.md). Rendering parity for everything
in docs/.
Conversions:
approval-workflow.md
1 ASCII swimlane → sequenceDiagram with named participants
(Operator A / CertificateService / Job+ApprovalRequest /
Operator B / ApprovalService / Scheduler). Same content: the
same-actor RBAC reject path, the AwaitingApproval gate, the
audit + Prometheus side effects.
intermediate-ca-hierarchy.md
1 lifecycle ASCII → stateDiagram-v2 (created → active → retiring
→ retired with the drain-first refusal annotation).
3 ASCII tree patterns → 3 flowchart TD diagrams (FedRAMP 4-level
boundary CA, financial-services 3-level policy CA, internal-PKI
2-level). Same depth, same path_len + permitted-DNS labels.
runbook-cloud-targets.md
1 dual-column ASCII flow → flowchart TD with two subgraphs
(AWS ACM path, Azure Key Vault path) joining at the audit +
Prometheus exposer node. Same 6-step deploy sequence on each
side with the rollback-on-mismatch step explicit.
runbook-expiry-alerts.md
1 nested-loop ASCII flow → flowchart TD with three nested
subgraphs (per-cert main loop / per-threshold inner / per-channel
fault-isolating dispatch). Same dedup + Prometheus + audit-row
side effects per channel.
Verified locally:
Audit re-run: every fenced block in docs/*.md that does NOT open
with ```mermaid contains zero ASCII box-drawing characters
(┌ └ │ ─ ━ ═ ║ ╔ ╚ ▼ ▲).
Mermaid block tally: 39 across 13 files (up from 32 across 9
files pre-audit). The +7 new blocks are the 4 conversions plus
the lifecycle + 3 tree patterns expanded out of the single
intermediate-ca-hierarchy.md ASCII section.
No code or test changes. Doc-only commit.
|
||
|
|
478c75dffe |
web, docs: IssuerHierarchyPage + sysadmin runbook + connectors row (Rank 8 commit 5)
Final commit of the 5-commit Rank 8 chain. Operator-facing surface
on top of the service + handler layers shipped in commits 1-4.
Frontend (web/src):
- api/client.ts: 3 new functions + IntermediateCA interface
(listIntermediateCAs, getIntermediateCA, retireIntermediateCA).
- pages/IssuerHierarchyPage.tsx: recursive nested <ul> render of
the hierarchy tree at /issuers/:id/hierarchy. buildHierarchyTree
is a pure helper that walks the flat list and groups children
on parent_ca_id; the dendrogram view is parking-lot work tracked
in WORKSPACE-ROADMAP. Two-phase retire UX surfaces 'Retire…'
then 'Confirm retire (terminal)' when the row is in retiring
state. Admin gate is enforced at the API; the page renders the
backend's 403 as ErrorState for non-admin callers.
- main.tsx: register the new /issuers/:id/hierarchy route.
CI guard update:
- scripts/ci-guards/T-1-frontend-page-coverage.sh: add
IssuerHierarchyPage to the deferred-test allowlist with the
standard 'why deferred' comment. Admin-gate + recursive build
semantics are already pinned at the backend layer
(intermediate_ca_test.go service tests + intermediate_ca_test.go
handler triplet). Vitest test deferred until next feature
change touches the page.
Docs:
- docs/intermediate-ca-hierarchy.md: new operator runbook
covering:
Concepts (HierarchyMode 'single' vs 'tree', defense-in-depth
on key bytes never persisting on rows).
Lifecycle states + drain-first semantics
(active → retiring → retired with active-children gate).
Three deployment patterns: 4-level FedRAMP boundary CA,
3-level financial-services policy CA, 2-level internal
PKI.
RFC 5280 enforcement (§3.2 self-signed, §4.2.1.9 path-length
tightening, §4.2.1.10 NameConstraints subset).
Migration from single → tree using the load-bearing
TestLocal_HierarchyMode_SingleVsTree_ByteIdentical pin as
the canary.
API reference + observability (IntermediateCAMetrics
Prometheus exposure).
Known limitations + Rank-8 follow-on roadmap.
- docs/connectors.md: extend the Built-in Local CA section with
a 'Tree mode (Rank 8)' paragraph describing the new chain
assembly path + cross-link to docs/intermediate-ca-hierarchy.md.
Roadmap:
- WORKSPACE-ROADMAP.md: 5 follow-on items under a new
'Intermediate CA hierarchy extensions (Rank 8 V2 follow-ons)'
bullet block:
HSM-backed roots (PKCS#11 / cloud KMS drivers via existing
signer.Driver interface — no service-layer change needed).
Automated CA rotation (parallel-validity windows ahead of
expiry).
Intra-hierarchy CRL chaining (per-CA CRL endpoints stitched
at issue time).
NameConstraints policy templates (FedRAMP / financial /
internal PKI declarative templates instead of hand-rolled
JSON).
D3 dendrogram visualization (separate page so the existing
list view stays the default + the dep stays opt-in).
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
tsc --noEmit (web/): exit 0 (no TypeScript errors).
go test -short -count=1 ./internal/api/handler/... + service +
local: ok across all three packages, 4-5s each.
All 24 CI guards: clean
(T-1 frontend-page-coverage with the new
IssuerHierarchyPage allowlist entry; openapi-handler-parity,
M-008 admin-gate, every other guard untouched).
Rank 8 chain complete:
|
||
|
|
4d17ef9054 |
api, handler: 4 admin-gated CA hierarchy endpoints + OpenAPI (Rank 8 commit 4)
Rank 8 commit 4 of 5. The API + RBAC layer that operators drive
the new hierarchy management surface from.
Endpoints (all admin-gated via middleware.IsAdmin; non-admin Bearer
callers get 403):
POST /api/v1/issuers/{id}/intermediates
Discriminator on body shape:
empty parent_ca_id + root_cert_pem + key_driver_id
→ CreateRoot (registers operator-supplied root CA).
parent_ca_id non-empty
→ CreateChild (signs new sub-CA cert under parent).
Service-layer error → HTTP code mapping:
ErrCANotSelfSigned → 400
ErrCAKeyMismatch → 400
ErrPathLenExceeded → 400
ErrNameConstraintExceeded → 400
ErrInvalidCertPEM → 400
ErrParentCANotActive → 409
ErrIntermediateCANotFound → 404
(other) → 500
GET /api/v1/issuers/{id}/intermediates
Returns flat list ordered by created_at; caller renders the
tree from each row's parent_ca_id (nil = root).
GET /api/v1/intermediates/{id}
Single-row detail.
POST /api/v1/intermediates/{id}/retire
Two-phase: confirm=false → active→retiring; confirm=true →
retiring→retired with active-children check (drain-first
semantics; ErrCAStillHasActiveChildren → 409).
Files changed:
internal/api/handler/intermediate_ca.go — 4 handlers
+ handler-defined
service interface
(dependency
inversion).
internal/api/handler/intermediate_ca_test.go — 8 test variants
(M-008 admin-
gate triplet
complete).
internal/api/handler/m008_admin_gate_test.go — register the
new admin-gated
handler in
AdminGatedHandlers
so the M-008
coherence
scanner stays
green.
internal/api/router/router.go — 4 r.Register
calls + new
IntermediateCAs
field on
HandlerRegistry.
cmd/server/main.go — wire the
postgres repo +
service +
handler. Reuses
the same
signer.FileDriver
instance the
OCSP responder
bootstrap path
feeds.
api/openapi.yaml — 4 new
operationIds,
full body
schema + status-
code dispatch.
Tests (8 in this commit):
TestIntermediateCA_Handler_NonAdmin_Returns403 (admin gate
— table-driven across all 4 endpoints)
TestIntermediateCA_Handler_AdminExplicitFalse_Returns403
(defensive: AdminKey present but false ≠ AdminKey absent)
TestIntermediateCA_Handler_AdminPermitted_ForwardsActor
(admin actor forwarded to service for audit attribution)
TestIntermediateCA_HandlerCreate_RootDispatch
(body discriminator: empty parent_ca_id → CreateRoot)
TestIntermediateCA_HandlerCreate_ChildDispatch
(body discriminator: parent_ca_id present → CreateChild)
TestIntermediateCA_HandlerCreate_BadRequestOnMissingRootBundle
(validation: no parent + no root bundle → 400)
TestIntermediateCA_HandlerCreate_ServiceErrorMappings
(table-driven: 7 service errors → expected HTTP codes)
TestIntermediateCA_HandlerRetire_TwoPhaseConfirm
(confirm=false then confirm=true forwarded correctly)
TestIntermediateCA_HandlerRetire_StillHasActiveChildren_Returns409
(drain-first contract — 409 not 500)
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 ./internal/api/handler/...: ok 4.498s.
bash scripts/ci-guards/openapi-handler-parity.sh: clean
(router routes: 182, openapi operations: 148; the +4 new routes
have +4 new operationIds — parity preserved).
bash scripts/ci-guards/* (all 24 guards): clean.
Out of scope of THIS commit (commit 5):
- web/src/pages/IssuerHierarchyPage.tsx (recursive tree render).
- docs/intermediate-ca-hierarchy.md sysadmin runbook (FedRAMP /
financial-services / internal-PKI patterns).
- docs/connectors.md hierarchy_mode row.
- WORKSPACE-ROADMAP entries (HSM-backed roots, automated
rotation, CRL chaining, NameConstraints templates, D3
dendrogram).
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 4.
|
||
|
|
8ff5668eb1 |
local: tree-mode chain assembly + byte-equivalence pin (Rank 8 commit 3)
Rank 8 commit 3 of 5. Load-bearing connector rewrite that activates
the first-class CA hierarchy surface shipped by commits 1-2.
Local connector changes:
- New ChainAssembler interface (single-method seam) defined in the
connector package — *service.IntermediateCAService satisfies it
implicitly. Avoids the import cycle that would arise from
pulling internal/service into internal/connector/issuer/local.
- Three new optional fields on Connector: hierarchyMode,
chainAssembler, treeIssuingCAID. Default zero values keep the
pre-Rank-8 single-sub-CA flow byte-identical (no operator on
the historical path sees any change in wire bytes).
- Three new setters: SetHierarchyMode, SetChainAssembler,
SetTreeIssuingCAID. Wired in cmd/server/main.go in commit 4
when the issuer's HierarchyMode column is read at boot.
- resolveChainPEM helper centralizes the dispatch:
tree mode + ChainAssembler set + treeIssuingCAID set
→ call AssembleChain over intermediate_cas
otherwise (incl. tree mode with incomplete wiring)
→ fall back to historical c.caCertPEM
Defense in depth: a misconfigured operator gets a working
issuance, not a nil-deref panic.
- IssueCertificate + RenewCertificate both delegate ChainPEM
population to resolveChainPEM. The cert generation path
(generateCertificate) is untouched — same key, same template,
same signing.
Tests (internal/connector/issuer/local/local_hierarchy_test.go):
TestLocal_HierarchyMode_SingleVsTree_ByteIdentical ← LOAD-BEARING
THE refuse-to-ship pin. Two connectors against the same on-disk
CA cert+key:
- A: pre-Rank-8 single-sub-CA mode (HierarchyMode unset).
- B: tree mode wired against an in-memory ChainAssembler
whose 1-level chain matches A's caCertPEM byte-for-byte.
Asserts:
1. resA.ChainPEM == resB.ChainPEM (the byte-identical pin).
2. resA.ChainPEM == fixture root cert PEM (real fact about
the wire format, not internal consistency).
Operators on single mode keep getting byte-identical bytes.
Operators flipping to tree with a 1-level shim see no change.
Zero behavioral drift for unmigrated deployments.
TestLocal_HierarchyMode_Tree_LeafChainIncludesAllAncestors
Multi-level pin. 4-level synthetic chain (root → policy →
issuingA → issuingB-leaf-CA). Asserts:
- 4 CERTIFICATE blocks in ChainPEM.
- Leaf-first ordering (issuingB.CN, issuingA.CN, policy.CN,
root.CN at depths 0..3).
This is what tree mode buys operators in exchange for the
migration overhead.
TestLocal_HierarchyMode_FallsBackToSingleWhenWiringIncomplete
Defensive fallback pin. HierarchyMode='tree' but
ChainAssembler nil + treeIssuingCAID '' → ChainPEM falls back
to caCertPEM. No panic, no lying field.
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 -run TestLocal_HierarchyMode ./internal/connector/issuer/local/...
PASS (3/3, including the load-bearing byte-identical pin).
go test -short -count=1 ./internal/connector/issuer/local/...: ok 4.358s
(every existing local-connector test still green — backwards
compat byte-for-byte at the test layer too).
Out of scope of THIS commit (commit 4):
- 4 admin-gated handler endpoints + OpenAPI extension.
- cmd/server/main.go wiring that reads Issuer.HierarchyMode at
boot and calls SetHierarchyMode + SetChainAssembler +
SetTreeIssuingCAID on the local connector instance.
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 3.
|
||
|
|
5bf2f0cc87 |
service: 10 IntermediateCAService tests + in-memory fake repo (Rank 8 commit 2.5)
Service-layer pin for Rank 8. The fake IntermediateCARepository's
WalkAncestry mirrors the postgres recursive-CTE semantics
(leaf-first ordering, terminate at parent_ca_id IS NULL) so the
AssembleChain pin carries the same weight the production repo would.
Tests:
TestIntermediateCA_CreateRoot_RegistersOperatorSuppliedSelfSigned
Happy path. RFC 5280 §3.2 self-signed root + matching key gets
persisted with parent_ca_id=NULL, state=active, KeyDriverID=...
TestIntermediateCA_CreateRoot_RejectsNonSelfSigned
RFC 5280 §3.2 enforcement. Cert whose embedded public key
doesn't match the actual signer fails CheckSignatureFrom →
ErrCANotSelfSigned.
TestIntermediateCA_CreateRoot_RejectsKeyMismatch
Operator-boundary defense in depth. Cert is well-formed
self-signed but the supplied keyDriverID resolves to a
different key → ErrCAKeyMismatch.
TestIntermediateCA_CreateChild_PathLenTighteningEnforced
RFC 5280 §4.2.1.9 enforcement. Child whose path-len equals or
exceeds parent's → ErrPathLenExceeded. Strictly-tighter child
succeeds.
TestIntermediateCA_CreateChild_NameConstraintsSubset
RFC 5280 §4.2.1.10 enforcement. Widening rejected
("evil.com" outside parent's "example.com"); subdomain
narrowing succeeds ("internal.example.com").
TestIntermediateCA_AssembleChain_4DeepHierarchy ← LOAD-BEARING
The pin the local connector tree-mode delegates to. Builds
root → policy → issuing-A → issuing-B and asserts AssembleChain
returns 4 CERTIFICATE blocks in leaf-to-root order with
matching subject CommonNames at each depth.
TestIntermediateCA_Retire_RefusesIfActiveChildren
Drain-first semantics. retiring → retired with active children
refuses with ErrCAStillHasActiveChildren.
TestIntermediateCA_Retire_TwoPhaseConfirm
First call: active → retiring (no confirm). Second call without
confirm: surfaces "pass confirm=true". Second call with
confirm: retiring → retired.
TestIntermediateCA_MetricsRecordedPerOutcome
Snapshot pin. CreateRoot bumps create_root, CreateChild bumps
create_child, Retire(active) bumps retire_retiring, all
dimensioned by issuer_id.
TestIntermediateCA_LoadHierarchy_FlatList
Returns every CA for an issuer ordered by created_at; caller
renders the tree from parent_ca_id.
Test infrastructure:
fakeIntermediateCARepo — sync.Mutex-guarded map.
WalkAncestry walks
parent_ca_id from leafID
to root (or terminates on
cycle, defense-in-depth).
Compile-time interface
guard.
testCAFixture — mints a self-signed root
cert+key in process,
Adopt()s the key under
a stable ref so CreateRoot
can resolve it.
newTestService — wires IntermediateCAService
with fake repo +
signer.MemoryDriver +
mockAuditRepo (already
lives in testutil_test.go)
+ IntermediateCAMetrics.
Verified locally:
gofmt: clean.
go vet ./...: exit 0.
go test -short -count=1 -run TestIntermediateCA ./internal/service/...
PASS (10/10)
go test -short -count=1 ./internal/service/...: ok 3.844s
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 2.5.
|
||
|
|
05623594da |
service: IntermediateCAService + IntermediateCAMetrics + RFC 5280 enforcement
Rank 8 of the 2026-05-03 deep-research deliverable, commit 2 of 5.
Service-layer wiring for first-class N-level CA hierarchy management.
The connector rewrite that activates this surface lands in commit 3.
Files added:
internal/service/intermediate_ca.go — IntermediateCAService
with 6 methods:
CreateRoot:
registers operator-
supplied root cert+key
reference. Validates
RFC 5280 §3.2 self-
signed (subject ==
issuer + signature
verifies). Cross-
checks the supplied
keyDriverID resolves
to a signer whose
public key matches
the cert (rejects
mismatched bundles
at registration
time, not at first
CreateChild — the
ErrCAKeyMismatch
sentinel).
CreateChild:
generates child key
via signer.Driver,
signs the cert via
the parent's signer.
Enforces RFC 5280
§4.2.1.9 (path-len
tightening) +
§4.2.1.10
(NameConstraints
subset semantics) at
service layer fail-
closed. Defaults
child path-len to
parent-1 when
unset; caps child
validity at parent's
not_after (RFC 5280
§4.1.2.5).
Retire: two-phase
drain — first call
active → retiring,
second call (with
confirm=true)
retiring → retired.
Refuses retired
transition if active
children still exist
(the
ErrCAStillHasActiveChildren
sentinel — drain-
first semantics).
Get / LoadHierarchy:
thin repo wrappers.
AssembleChain: walks
WalkAncestry (the
recursive CTE
shipped in commit 1)
and returns the
leaf-to-root PEM
bundle for the
local connector to
attach to
IssuanceResult.
internal/service/intermediate_ca_metrics.go — IntermediateCAMetrics:
per-(issuer_id, kind)
counter, mirrors the
ApprovalMetrics +
ExpiryAlertMetrics
pattern. RecordCreate
(root/child) +
RecordRetire
(retiring/retired).
SnapshotIntermediateCA
for the Prometheus
exposer.
Defense in depth retained:
- NEVER persist CA private key bytes in the row. KeyDriverID is the
only key reference; signer.Driver.Load resolves it at signing time.
- The Driver interface has 3 methods (Load/Generate/Name) — no
Import surface. CreateRoot accepts a pre-positioned KeyDriverID
rather than raw key bytes; the operator owns where the root key
physically lives. Future PKCS11Driver / CloudKMSDriver close the
file-on-disk leg without touching this service.
Verified locally:
gofmt: clean.
go vet ./internal/service/...: exit 0.
go build ./internal/service/...: exit 0.
Deferred to commit 2.5 (or fold into commit 3, operator's call):
- 9 service-level tests including:
* TestIntermediateCA_CreateRoot_RegistersOperatorSuppliedSelfSigned
* TestIntermediateCA_CreateRoot_RejectsNonSelfSigned
* TestIntermediateCA_CreateRoot_RejectsKeyMismatch
* TestIntermediateCA_CreateChild_PathLenTighteningEnforced
* TestIntermediateCA_CreateChild_NameConstraintsSubset
* TestIntermediateCA_AssembleChain_4DeepHierarchy ← LOAD-BEARING
* TestIntermediateCA_Retire_RefusesIfActiveChildren
* TestIntermediateCA_Retire_TwoPhaseConfirm
* TestIntermediateCA_MetricsRecordedPerOutcome
Test setup needs: in-memory IntermediateCARepository fake +
signer.MemoryDriver (already exists) + helper to generate test root
cert+key. Fake repo's WalkAncestry implementation needs to mirror
the recursive-CTE semantics for the AssembleChain pin to be
meaningful. Total ~500 lines of test code; non-trivial setup.
Out of scope of THIS commit (commits 3-5):
- Local connector rewrite + byte-equivalence pin
(TestLocal_HierarchyMode_SingleVsTree_ByteIdentical).
- 4 admin-gated handler endpoints + OpenAPI extension.
- web/src/pages/IssuerHierarchyPage.tsx.
- docs/intermediate-ca-hierarchy.md sysadmin runbook.
- cmd/server/main.go wiring.
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md.
|
||
|
|
468b75c650 |
domain, migrations: IntermediateCA type + intermediate_cas + Issuer.HierarchyMode
Rank 8 of the 2026-05-03 deep-research deliverable, commit 1 of 5
(cowork/rank-8-intermediate-ca-hierarchy-prompt.md). Closes the multi-
level CA hierarchy gap for FedRAMP boundary-CA, financial-services
policy-CA, and OT network-CA deployments where regulator-mandated
certificate-policy separation requires multiple layers (root → policy
→ issuing).
This commit lands ONLY the foundation — schema, types, repository
interface, postgres implementation. No service / connector / handler
wiring yet. The 5-commit chain is bisectable: this commit can ship
with no operator-visible behavior change until commits 2-5 wire the
service layer + the local-connector tree-mode + admin API + GUI tree
view + operator runbook. The default value for issuers.hierarchy_mode
is 'single' so every existing operator's behavior is byte-identical
post-migration.
Existing scaffolding REUSED (not redefined):
- internal/crypto/signer.Driver seam — every IntermediateCA carries
a key_driver_id pointing at the signer.Driver instance that owns
its private key. Defense in depth: NEVER persist key bytes in a
row. FileDriver is the production default; future PKCS11Driver /
CloudKMSDriver close the disk-exposure leg via the same seam.
- issuers.id row — the new intermediate_cas FK references it.
Files added:
internal/domain/intermediate_ca.go — IntermediateCA type,
IntermediateCAState
closed enum (active /
retiring / retired),
IsValidIntermediateCAState
+ IsTerminal helpers,
NameConstraint struct
(RFC 5280 §4.2.1.10
permitted+excluded
subtree subset
semantics for service-
layer enforcement),
HierarchyModeSingle /
HierarchyModeTree
constants.
internal/repository/postgres/intermediate_ca.go — IntermediateCARepository
impl: Create (ica-<slug>
ID gen, JSONB +
nullable-column round-
trip, lib/pq 23505 →
ErrAlreadyExists),
Get, ListByIssuer,
ListChildren,
UpdateState,
GetActiveRoot,
WalkAncestry (recursive
CTE — single SQL
round-trip, O(depth)
rows, leaf-first
ordering).
migrations/000028_intermediate_ca_hierarchy.{up,down}.sql
— idempotent schema.
issuers.hierarchy_mode
VARCHAR(20) DEFAULT
'single'. New
intermediate_cas table
with FKs to
issuers / self
(parent_ca_id) +
CHECK constraints
(closed-enum state,
not_after >
not_before, no self-
parent) + 6 indexes
(partial-unique
active root per
issuer, partial-
unique name per
issuer, owning
issuer, parent,
state, expiring).
Files modified:
internal/domain/connector.go — adds Issuer.HierarchyMode field
with full doc comment + JSON tag.
Empty string ≡ single mode for
back-compat.
internal/repository/interfaces.go — adds IntermediateCARepository
interface (7 methods).
Verified locally:
gofmt: clean.
go vet ./internal/domain/... ./internal/repository/...: exit 0.
go build ./internal/domain/... ./internal/repository/...: exit 0.
Out of scope for this commit (lands in commits 2-5):
- service/intermediate_ca.go (CreateRoot / CreateChild / Retire /
LoadHierarchy / AssembleChain + RFC 5280 §4.2.1.9 path-len +
§4.2.1.10 NameConstraints subset enforcement + 9 service tests).
- local connector rewrite + byte-equivalence pin
(TestLocal_HierarchyMode_SingleVsTree_ByteIdentical — the load-
bearing backwards-compat refusal-to-ship test).
- 4 admin-gated handler endpoints + OpenAPI extension + handler tests.
- web/src/pages/IssuerHierarchyPage.tsx.
- docs/intermediate-ca-hierarchy.md sysadmin runbook + connectors.md
row + WORKSPACE-ROADMAP follow-ons.
Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md.
|
||
|
|
55f6a9a53e |
ci: fix Rank 7 lint + openapi-handler-parity drift on master
Two CI failures from the Rank 7 chain push (#438): Go Build & Test — staticcheck ST1021: internal/service/approval_metrics.go:97 comment for ApprovalDecisionEntry doesn't start with the type name internal/service/approval_metrics.go:130 comment for ApprovalPendingAgeSnapshot doesn't start with the type name Frontend Build — scripts/ci-guards/openapi-handler-parity.sh: 4 router routes have no OpenAPI operationId: GET /api/v1/approvals GET /api/v1/approvals/{id} POST /api/v1/approvals/{id}/approve POST /api/v1/approvals/{id}/reject The Rank 7 commit-3 spec deferred OpenAPI extension to commit 4 with a 'batched alongside the integration changes' note; commit 4 didn't actually add them. This commit closes that gap. Fixes: approval_metrics.go — split the doc comment that was attached to SnapshotApprovalDecisions (the function) but visually preceded ApprovalDecisionEntry (the type), so the type appeared to staticcheck as having a comment that named the function instead of the type. Same fix on ApprovalPendingAgeSnapshot. Now each exported type has its own type-name-leading comment per Go convention. api/openapi.yaml — added 4 new operationIds (listApprovalRequests, getApprovalRequest, approveApprovalRequest, rejectApprovalRequest) + new ApprovalRequest schema component under components/schemas. Inline 401 response (the Unauthorized component does not exist in this spec; the canonical pattern in the rest of the file is inline 'description: Authentication required'). The two-person integrity contract surface is documented in the description of the approve / reject endpoints so external readers see the RBAC contract from the spec alone. Verified locally: go vet ./internal/service/...: exit 0. scripts/ci-guards/openapi-handler-parity.sh: clean (140 ops vs 174 routes, 36 documented exceptions). Third CI failure (image-and-supply-chain) was a transient apt-fetch 'Connection reset by peer' from deb.debian.org while pulling libasan6_10.2.1-6_amd64.deb. Not a code issue; just re-run the workflow. No code change needed. |
||
|
|
62dd7e1463 |
docs(approval-workflow): drop Infisical reference from operator playbook
The operator-facing approval-workflow.md is the public-readable docs page; the 'Infisical deep-research deliverable' framing is internal project context that doesn't belong there. Internal source comments + research docs in cowork/ keep the original framing as the historical record. |
||
|
|
dcc28bf113 |
Revert "chore: drop 'Infisical' label from internal references"
This reverts commit
|
||
|
|
2886b58daf |
chore: drop 'Infisical' label from internal references
Strategic naming cleanup. Earlier doc-comments + commit messages framed Rank 4 / Rank 5 / Rank 7 work as 'Rank N of the 2026-05-03 Infisical deep-research deliverable' — the 'Infisical' qualifier was a holdover from the original deep-research framing where Infisical (a competing secrets-management platform) was the comparator. Keeping the comparator's name in our source adds noise without value; an external reader sees 'Infisical' and assumes a dependency or shared lineage rather than reading it as the competitive context it was. Mechanical sed across 34 files (32 source / docs + 2 follow-up Python passes to collapse 'deep-research deep-research' duplicates that emerged where the original phrase wrapped across lines): s|Infisical deep-research|deep-research|g s|infisical-deep-research-results|deep-research-results-2026-05-03|g s|infisical-deep-research-prompt|deep-research-prompt-2026-05-03|g s|infisical-deep-research|deep-research|g s|Infisical|deep-research|g s|deep-research deep-research|deep-research|g # collapse-pass Net diff: 63 insertions / 64 deletions across cmd/, docs/, internal/, migrations/. Pure text substitution; zero behavior change. Code path unchanged — go vet clean, tests for TestApproval pass on both internal/service and internal/api/handler packages. Workspace docs (cowork/) carry the same references and will be swept separately — they're not under certctl/ git control. The two filename references (cowork/infisical-deep-research-results.md + cowork/infisical-deep-research-prompt.md) get renamed alongside that sweep to deep-research-results-2026-05-03.md / deep-research-prompt-2026-05-03.md so cross-references in the certctl repo doc-comments resolve cleanly. |
||
|
|
72d00b8865 |
scheduler, certificate, renewal: gate issuance on profile-driven approval
Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended — every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Today's answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the renewal-loop creates the Job at
Status=AwaitingApproval; an authorized approver (different from the
requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."
This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
- 1 (
|
||
|
|
f53f9f9ca3 |
api, handler: 4 approval endpoints + handler RBAC integration tests
Rank 7 of the 2026-05-03 Infisical deep-research deliverable, commit 3 of 4.
Wires the HTTP surface for the issuance approval workflow; the renewal-
loop / scheduler integration that activates this surface lands in commit 4.
Files added:
internal/api/handler/approval.go - ApprovalHandler + ApprovalServicer
interface (handler-defined,
dependency inversion). 4
endpoints:
GET /api/v1/approvals
?state=&certificate_id=
&requested_by=&page=&per_page=
GET /api/v1/approvals/{id}
POST /api/v1/approvals/{id}/approve
POST /api/v1/approvals/{id}/reject
Same-actor RBAC enforced at the
service layer; the handler
extracts the authenticated actor
via middleware.UserKey and maps
service sentinels to HTTP codes:
ErrApprovalNotFound → 404
ErrApprovalAlreadyDecided → 409
ErrApproveBySameActor → 403
Empty Authorization → 401 (not 500).
Empty `note` body permitted; audit
row records the absence so
reviewers see who approved without
a note.
internal/api/handler/approval_test.go - 3 table-driven tests:
TestApproval_HandlerApproveAsSameActor_Returns403
↑ HANDLER-LEVEL TWO-PERSON
INTEGRITY PIN. Pairs with
the service-level
TestApproval_Approve_RejectsSameActor.
Compliance auditors expect
exactly HTTP 403 (not 401,
not 500) when the requester
self-approves; the test
additionally asserts the
error body contains the
"two-person integrity"
substring so an auditor can
grep server logs for
attempted self-approvals.
TestApproval_HandlerEmptyNote_Allowed_DecidedByExtractedFromAuth
↑ pins that decided_by comes
from the auth-middleware
UserKey, NEVER from the
request body. Defends
against future contributor
confusion that might let a
client supply their own
decided_by string.
TestApproval_HandlerErrorMapping
(NotFound → 404, AlreadyDecided
→ 409 subtests).
Files modified:
internal/api/router/router.go - Adds Approvals field to
HandlerRegistry struct + 4
r.Register lines for the
approval routes. Go 1.22
ServeMux precedence: literal
/approve and /reject segments
resolve before the {id}
pattern-var route, mirroring
the existing notifications
block's /requeue precedence.
Verified:
gofmt: clean.
go vet ./internal/api/... ./internal/service/...: exit 0.
go test -short -count=1 -run TestApproval
./internal/api/handler/...: ok 0.004s.
Note on OpenAPI spec: the prompt's spec section also calls for 5 new
operationIds in api/openapi.yaml (createApprovalRequest, listApprovalRequests,
getApprovalRequest, approveApprovalRequest, rejectApprovalRequest). The
external-create endpoint is intentionally not implemented in V2 — every
approval request originates from the renewal-loop entry points (commit 4)
so the only operations exposed are list / get / approve / reject. The
4-route surface is a deliberate scope cut: external systems wanting to
inject approval requests can use the underlying `POST /api/v1/certificates/
{id}/renew` path which creates the parallel ApprovalRequest as a side
effect (post-commit-4 wiring). OpenAPI extension batched into commit 4
alongside the integration changes.
Out of scope for this commit (lands in commit 4):
- Integration into CertificateService.TriggerRenewal +
RenewalService.CheckExpiringCertificates + Scheduler.ReapTimedOutJobs.
- cmd/server/main.go wiring.
- Config.Approval.BypassEnabled + CERTCTL_APPROVAL_BYPASS env var.
- api/openapi.yaml extensions.
- docs/connectors.md + docs/approval-workflow.md.
Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
|
||
|
|
df23294476 |
service: ApprovalService + ApprovalMetrics + 8 table-driven tests
Rank 7 of the 2026-05-03 Infisical deep-research deliverable, commit 2 of 4
(cowork/rank-7-approval-workflow-primitive-prompt.md). Builds on the
foundation in commit
|
||
|
|
b4d1ad1c97 |
domain, migrations: ApprovalRequest type + issuance_approval_requests + RequiresApproval
Rank 7 of the 2026-05-03 Infisical deep-research deliverable, commit 1 of 4
(cowork/rank-7-approval-workflow-primitive-prompt.md). The four-commit
chain ships the issuance approval-workflow primitive (request → human review
→ CA call) closing the two-person integrity / four-eyes principle
procurement gap for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2
Type II, and HIPAA-regulated PHI deployments.
This commit lands ONLY the foundation — schema, types, repository
interface, postgres implementation. No service / handler wiring yet.
The four-commit shape is bisectable: the schema can land in production
behind a flag (via the default RequiresApproval=false on every existing
profile) without any operator-visible behavior change until commits 2-4
wire the surrounding workflow.
Existing scaffolding REUSED (not redefined here):
- JobStatusAwaitingApproval enum value (internal/domain/job.go).
- JobRepository.ListTimedOutAwaitingJobs (postgres reaper query).
- Config.Scheduler.AwaitingApprovalTimeout (env-mapped via
CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT, default 168h = 7 days).
- Scheduler.SetAwaitingApprovalTimeout wiring.
Files added:
internal/domain/approval.go - ApprovalRequest type,
ApprovalState closed enum
(pending/approved/rejected/
expired), IsValidApprovalState +
IsTerminal helpers, outcome
const block + bypass-actor
sentinel.
internal/repository/postgres/approval.go - ApprovalRepository
implementation: Create
(ar-<slug> ID gen + JSONB
metadata round-trip + lib/pq
23505 → ErrAlreadyExists
translation), Get, GetByJobID,
List (paginated with state /
cert / requester filters),
UpdateState (pending→terminal
transitions only, with
already-terminal disambiguation),
ExpireStale (bulk reaper,
decided_by='system-reaper').
migrations/000027_approval_workflow.{up,down}.sql
- Idempotent IF NOT EXISTS /
IF EXISTS. Adds
certificate_profiles.requires_approval
BOOLEAN NOT NULL DEFAULT false,
issuance_approval_requests
table with FK to
managed_certificates / jobs /
certificate_profiles, four
indexes (state, certificate,
pending-age, partial-unique
pending-per-job), and the
approval_decision_consistency
CHECK constraint enforcing
decided_by/decided_at must be
non-null for terminal states.
Files modified:
internal/domain/profile.go - Adds CertificateProfile.RequiresApproval
bool field with full doc
comment + JSON tag. Defaults
to false (back-compat — every
existing profile keeps the
unattended renewal path).
internal/repository/interfaces.go - Adds ApprovalRepository
interface (6 methods) +
ApprovalFilter struct.
internal/repository/errors.go - Adds ErrAlreadyExists sentinel
for postgres SQLSTATE 23505
(unique-constraint violations
from the partial-unique
pending-per-job index, plus
the "already terminal" state-
transition signal). Mirrors
the existing ErrNotFound +
ErrForeignKeyConstraint shape.
Verified:
gofmt: clean.
go vet ./internal/domain/... ./internal/repository/...: exit 0.
go build ./internal/domain/... ./internal/repository/...: exit 0.
Out of scope for this commit (lands in commits 2-4):
- service/approval.go (RequestApproval / Approve / Reject / ListPending
/ ExpireStale + same-actor RBAC + bypass mode + audit + metrics).
- service/approval_metrics.go (decisions counter + pending-age histogram).
- 8 service-level table-driven tests including the load-bearing
TestApproval_Approve_RejectsSameActor two-person integrity pin.
- api/handler/approval.go (5 endpoints + RBAC integration).
- api/openapi.yaml (5 new operationIds).
- Integration into CertificateService.TriggerRenewal +
RenewalService.CheckExpiringCertificates + Scheduler.ReapTimedOutJobs.
- cmd/server/main.go wiring.
- Config.Approval.BypassEnabled + CERTCTL_APPROVAL_BYPASS env var.
- docs/connectors.md CertificateProfile config-table row.
- docs/approval-workflow.md operator playbook + compliance control mapping.
Reference: cowork/infisical-deep-research-results.md Part 5 Rank 7.
Acquisition prompt: cowork/rank-7-approval-workflow-primitive-prompt.md.
|
||
|
|
9c80894c3c |
ci(release): pin run-name + release title to tag (fix ugly auto-generated titles)
Two GitHub-Actions defaults were producing ugly titles on every tag: 1. The Actions-tab workflow run title was auto-generated as `<commit-subject> #<run-number>` because release.yml had no `run-name:`. The v2.0.69 push showed up as "chore: rename Go module path to github.com/certctl-io/certctl #73" instead of the obvious "Release v2.0.69". 2. The Releases-page title was auto-generated by softprops/action-gh-release@v2 because the action's `with:` block had no `name:` field — it falls back to the most recent commit subject in that case, producing the same noise on the Releases page. Fixes: - Add `run-name: Release ${{ github.ref_name }}` at the workflow top. `github.ref_name` resolves to the tag (e.g., `v2.0.69`) since the only trigger is `on: push: tags: ['v*']`. Actions tab now shows "Release v2.0.69". - Add `name: ${{ github.ref_name }}` to the softprops/action-gh-release@v2 step's `with:` block. Releases page now shows "v2.0.69" as the title instead of the commit subject. Affects v2.0.70+. The v2.0.69 workflow run + release page that's already in flight retain the bad titles (the workflow file is read at trigger time); the v2.0.69 Releases-page title can be manually edited via the GitHub UI ("Edit release" → set title to `v2.0.69` → Update release). The Actions-tab run name for #73 is immutable post-trigger. This same pattern likely affects ci.yml + the other workflows but the operator-facing surface is the Release workflow's titles, so leaving the CI workflows alone for now (they run continuously on master and nobody clicks individual run titles). |
||
|
|
5dc698307b |
chore: rename Go module path to github.com/certctl-io/certctl
Mechanical sed across the main go.mod's module declaration, the f5-mock-icontrol
sub-module's go.mod, every Go file's import path (361 files), and a rebuild of
the checked-in f5-mock-icontrol binary so its embedded build-info reflects the
new module path. No behavior change.
Choice B from cowork/transfer-certctl-to-org.md, executed 2026-05-04. Choice A
(keep module path declared as github.com/shankar0123/certctl regardless of
repo URL) shipped on the day of the org transfer (2026-05-03) since we had no
external Go consumers; this commit closes that deferral.
Backward-compat: GitHub HTTP redirects continue to forward
github.com/shankar0123/certctl → github.com/certctl-io/certctl at the URL
level, but Go's module proxy uses the path declared in go.mod as the
canonical name. Pre-fix, anyone trying `go get github.com/certctl-io/certctl/...`
hit a "module path mismatch" error because go.mod said
github.com/shankar0123/certctl and the URL they fetched it from said
certctl-io/certctl. Post-fix, the canonical name and the URL agree, so
go get / go install / external Go consumers / Go-tooling integrations
work cleanly via either the new path (preferred) or the old path (which
redirects and Go follows the redirect for source fetch).
Anyone still importing the old path inside their own code keeps working
provided they update their go.mod's `require` line to match — the module
path declared in their consumer's go.sum / go.mod is the authoritative
import name, so a mass sed across their import statements is the migration
on the consumer side. No external consumers exist today.
Diff shape:
361 *.go files — import path replacement only
2 go.mod — module declaration replacement only
1 binary — deploy/test/f5-mock-icontrol/f5-mock-icontrol rebuilt
so embedded build-info reflects the new path (8618965 vs
8618933 bytes; 32-byte diff is the build-info change)
Total: 364 files, 730 insertions / 730 deletions, net-zero size, pure
mechanical substitution.
Verification:
gofmt: 17 files needed re-alignment after sed (the new path is one char
shorter than the old, so column-aligned import groups drifted). Applied
`gofmt -w` to fix.
go mod tidy: clean exit on both modules.
go vet ./...: clean exit.
go build ./...: clean exit.
go test -short -count=1 on representative packages: all green
(internal/domain, internal/validation, internal/crypto, internal/crypto/signer,
cmd/agent). Test output now reads `ok github.com/certctl-io/certctl/...`
confirming the module path resolves correctly.
binary: f5-mock-icontrol rebuilt; `strings | grep shankar0123` returns
nothing; `strings | grep certctl-io/certctl` shows the new module path
embedded in build-info.
Files intentionally NOT touched in this commit:
README.md / CHANGELOG.md / docs/ / etc. — already swept to certctl-io
URLs in commit
|
||
|
|
a1fc2e7cb3 |
release: v2.0.68 — image registry path moved to ghcr.io/certctl-io
Image registry path changed. Starting this release, container images
publish to `ghcr.io/certctl-io/certctl-server` and
`ghcr.io/certctl-io/certctl-agent`. Existing pulls from
`ghcr.io/shankar0123/certctl-{server,agent}:<tag>` continue to work
for previously-published tags (the registry never deletes images),
but the `:latest` tag at the old path stops moving forward at this
release. Operators must update `docker pull` paths, `docker-compose.yml`
`image:` keys, or Helm `image.repository` values to receive future
updates. Old `git clone` / `git push` / install-script / API URLs
continue to redirect forever — only the container-registry path
changed.
This is the only operator-action-required change in v2.0.68. Other
changes since v2.0.67 are cosmetic URL refreshes after the GitHub
org transfer (shankar0123 → certctl-io, 2026-05-03) and a contextcheck
lint fix in the agent. The release.yml workflow's IMAGE_NAMESPACE env
var was swept to certctl-io as part of the URL refresh, so the next
release auto-pushes to the new ghcr.io path; verified via
`grep -n IMAGE_NAMESPACE .github/workflows/release.yml` showing
`IMAGE_NAMESPACE: certctl-io`.
Adds a top-of-file v2.0.68 entry to CHANGELOG.md as a one-time
migration callout. The existing "no hand-edited per-version changelog"
policy text is preserved below — that policy applies to per-version
entries; this is a one-time critical migration notice that needs to
be visible to operators doing diligence by reading CHANGELOG.md.
|
||
|
|
3be1c0c973 |
docs(README): drop V3 Pro + V4 sections — everything ships free under BSL
Strategic pivot. We are NOT building a V3 Pro paid tier or a V4 cloud / scale tier. Every certctl feature — current and future — ships free under the same BSL 1.1 source-available license. No gated features, no paid edition, no enterprise tier. Future revenue path is a managed-service hosting offering: operator runs the certctl-server control plane as a hosted service; customers self-install only the certctl-agent in their infrastructure. The self-hosted code stays free forever; the managed service sells operational convenience (no PostgreSQL to run, no upgrades, no backups, no SSO setup). BSL 1.1 was already structured around exactly this — the license expressly prevents competitors from running their own commercial certctl-as-a-service against the same source while leaving self-hosting unrestricted. Removed the old roadmap sections: - "### V3: certctl Pro" — Enterprise capabilities for larger deployments are available in the commercial tier. - "### V4+: Cloud & Scale" — Kubernetes cert-manager external issuer, cloud infrastructure targets, extended CA support, and platform-scale features. Replaced with a single "Forward-looking work — all free, all self-hostable" section that names the real engineering tracks (OIDC / SSO / RBAC, NATS / real-time, search / risk scoring, HSM / TPM / FIPS, deeper Vault auth, cloud-managed-target deep integrations, adapter hardening, credential lifecycle expansion) and points at the workspace-level WORKSPACE-ROADMAP.md for the unshipped backlog. The full feature surface lands in V2 over time — V3 / V4 are not real version targets, they were positioning artifacts. Diff: 2 insertions / 5 deletions. README's License section (BSL 1.1 licensing-inquiries footer) is unchanged. |
||
|
|
a26686844c |
fix(agent): thread ctx through createTargetConnector to satisfy contextcheck
CI run #428 (job 74148571711) failed on commit |
||
|
|
bc6039a79e |
chore: sweep github.com/shankar0123/certctl URL refs to certctl-io/certctl
Post-transfer cosmetic + release-critical URL refresh after moving the
repo from github.com/shankar0123/certctl to github.com/certctl-io/certctl
(2026-05-03). GitHub HTTP redirects continue to forward old URLs forever,
so existing operators are not broken — but aligns the canonical
references with the new owner so:
- procurement engineers / contributors browsing the docs see the right
URL on first read
- operators copying the agent install one-liner hit the new path
directly without going through a redirect
- the Helm chart's default image repository points at the canonical org
registry path
- the OnboardingWizard rendered to first-run UI users shows the new
URL in the install snippets and doc anchor links
- the GitHub Actions release workflow pushes container images to
ghcr.io/certctl-io/certctl-{server,agent} (was: shankar0123)
- the release-notes Markdown body in release.yml — which gets stamped
into every future release page — references the post-transfer
cert-identity (cosign keyless signing now uses the certctl-io
workflow URL) and the post-transfer SLSA provenance source-uri.
Without this, every cosign verify / slsa-verifier command on a
v2.1.0+ release would fail because the cert-identity-regexp would
not match the signing identity GitHub Actions OIDC issues post-
transfer. Old releases (v2.0.67 and earlier) keep their immutable
release-notes pointing at the shankar0123 path and remain
verifiable via their own published instructions.
Customer impact:
- Operators on ghcr.io/shankar0123/certctl-{server,agent}:latest
silently freeze on whatever tag was current at transfer time. They
get no errors; they just stop receiving updates. The next release
notes need a one-line callout (Phase 3.1 of cowork/transfer-
certctl-to-org.md) telling them to update their image path to
ghcr.io/certctl-io/certctl-{server,agent}.
- All other URLs (git clone, install one-liner, raw.githubusercontent
URLs, browser links, GitHub API) continue to resolve via permanent
HTTP redirects. The sweep is cosmetic for those.
Files swept (30 total):
.github/workflows/release.yml — IMAGE_NAMESPACE, source-uri,
cosign cert-identity-regexp, IMAGE= snippet (5 refs total).
CHANGELOG.md, README.md — anchor links, badges, install one-liner,
cosign verify snippets in operator-facing sections.
api/openapi.yaml — info / externalDocs URLs.
install-agent.sh — GITHUB_REPO const + systemd unit Documentation=
field.
deploy/ENVIRONMENTS.md, deploy/helm/{CHART_SUMMARY,INDEX,
INSTALLATION,README}.md, deploy/helm/certctl/{Chart.yaml,
README.md,values.yaml}, deploy/helm/examples/values-*.yaml —
chart docs + image repository defaults across dev / prod-ha
overrides.
docs/{certctl-for-cert-manager-users,connector-iis,connectors,
migrate-from-acmesh,migrate-from-certbot,quickstart,test-env,
why-certctl}.md — operator-facing doc URLs.
examples/{acme-nginx,acme-wildcard-dns01,multi-issuer,
private-ca-traefik,step-ca-haproxy}/docker-compose.yml +
examples/step-ca-haproxy/step-ca-haproxy.md — example image:
paths and accompanying narrative.
web/src/pages/OnboardingWizard.tsx — first-run-UI URL refs (curl
install one-liners, agent docker image path, doc anchor links).
Files intentionally NOT swept (Choice A from cowork/transfer-certctl-
to-org.md):
go.mod, go.sum — module declaration stays github.com/shankar0123/
certctl. Existing imports compile because Go uses the path
declared in go.mod, not the URL it was fetched from. Internal-
only project; no external Go consumers; rename will land as a
mechanical sed when one materializes.
~250 *.go files — every import remains github.com/shankar0123/
certctl/internal/...
deploy/test/f5-mock-icontrol/go.mod — separate test sub-module;
same Choice A logic; module path stays.
Files intentionally NOT swept (other reasons):
README.md lines 244-245 — Scarf-pixel docker-pull commands.
shankar0123.docker.scarf.sh/... is a Scarf-account hostname
(per-user, not per-repo) and the pixel keeps tracking pulls
against the operator's personal Scarf account. Migrating to a
certctl-io Scarf account is a separate decision (create org
Scarf account → re-create package → update README).
deploy/test/f5-mock-icontrol/f5-mock-icontrol — checked-in
compiled binary with shankar0123/certctl baked into Go build
info via the sub-module path. Out of scope for a URL sweep;
will refresh on the next `make test-integration` rebuild.
Verification:
gofmt: clean (no .go files touched).
go vet ./...: clean (verified at this SHA in 1.3 of the transfer
checklist; no .go changes since).
go build ./...: clean (same).
go test -short on representative packages: green (same).
Diff shape: 30 files, 74 insertions / 74 deletions, net-zero size,
pure URL substitution.
|
||
|
|
502823dfdf |
ci(go.mod): fix go mod tidy drift after Rank 5 cloud-target commits
CI failed at the "go mod tidy drift" gate on commit |
||
|
|
89b6d71b72 |
docs, seed: cloud-target operator runbook + AWS ACM / Azure KV demo seed rows
Wraps up Rank 5 of the 2026-05-03 Infisical deep-research deliverable (commits |
||
|
|
14fcc82cda |
target(azurekv): SDK-driven Azure Key Vault target connector
Closes Rank 5 (Azure half) of the 2026-05-03 Infisical deep-research deliverable (cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl had no path to deploy certs to Azure-managed TLS- termination endpoints (Application Gateway / Front Door / App Service / Container Apps) — operators terminating TLS at Azure had to use manual `az keyvault certificate import` invocations or external automation. This commit lands the SDK-driven Azure Key Vault target connector that closes the gap, mirroring the AWS ACM target shape shipped in commit |
||
|
|
54033aa3aa |
target(awsacm): SDK-driven AWS Certificate Manager target connector
Closes Rank 5 (AWS half) of the 2026-05-03 Infisical deep-research
deliverable (cowork/infisical-deep-research-results.md Part 5).
Pre-fix, certctl had no path to deploy certs to AWS-managed TLS-
termination endpoints (ALB / CloudFront / API Gateway / App Runner)
— operators terminating TLS at AWS had to use Infisical secret-sync,
manual aws-cli imports, or external automation. This commit lands
the SDK-driven AWS Certificate Manager target connector that closes
the gap end-to-end.
Architecture:
- internal/connector/target/awsacm/awsacm.go — Connector wraps
*acm.Client behind the ACMClient interface seam (mirrors
awsacmpca's ACMPCAClient pattern from the issuer side).
LoadDefaultConfig handles the standard AWS credential chain
(IRSA / EC2 instance profile / SSO / env vars); no embedded
creds in connector Config.
- Pre-deploy snapshot via DescribeCertificate + GetCertificate so
on-import-failure rollback restores the previous cert. Mirrors
the Bundle 5 IIS pattern + the Bundle 7/8 WinCertStore /
JavaKeystore patterns. Surfaces rollback success/failure via
the existing certctl_deploy_rollback_total Prometheus counter
label set.
- Provenance tags: certctl-managed-by=certctl + certctl-
certificate-id=<mc-id> set automatically on every import. ACM
strips tags on re-import, so the connector calls
AddTagsToCertificate post-import to keep the provenance pair
fresh. Operators looking up a cert ARN by managed-cert ID
(Terraform data source, CloudFormation output) match against
these tags.
- DeploymentRequest.KeyPEM held in agent memory only — never
written to disk. Aligns with the pull-only deployment model
documented in CLAUDE.md.
Tests:
- awsacm_test.go: 15-subtest happy-path + validation matrix
covering ValidateConfig (success / missing-region / malformed-
region / malformed-ARN / reserved-tag rejection),
DeployCertificate (fresh import / rotate-in-place / rollback-
on-serial-mismatch / rollback-also-fails / empty-key-rejected /
no-client-rejected), ValidateOnly (returns sentinel),
ValidateDeployment (serial match / mismatch / no-ARN-yet).
- awsacm_failure_test.go: 5 per-error-class contract tests
mirroring the awsacmpca_failure_test.go shape (commit
|
||
|
|
6af95ccf5f |
notifications: per-policy multi-channel expiry-alert routing
Closes Rank 4 of the 2026-05-03 Infisical deep-research deliverable
(see cowork/infisical-deep-research-results.md Part 5). Pre-fix,
RenewalService.CheckExpiringCertificates already ran daily,
RenewalPolicy.AlertThresholdsDays drove per-cert thresholds, and
NotificationService.SendThresholdAlert deduped per (cert, threshold)
— but the channel was hardcoded to Email
(internal/service/notification.go:118 pre-fix). Operators who
configured PagerDuty / Slack / Teams / OpsGenie via
CERTCTL_PAGERDUTY_ROUTING_KEY etc. got nothing at any threshold
unless SMTP was also wired. Their first signal of an expired cert
was a 3 AM outage.
This commit lands the routing matrix on top of the existing
infrastructure:
1. RenewalPolicy gains AlertChannels (per-tier channel list) +
AlertSeverityMap (per-threshold tier assignment) +
EffectiveAlertChannels / EffectiveAlertSeverity accessors.
Default*() helpers preserve the back-compat Email-only
behaviour for operators who haven't touched their policies
post-upgrade. Migration 000026 adds the JSONB columns
idempotently.
2. NotificationService.SendThresholdAlertOnChannel — the new
per-channel dispatch helper. Old SendThresholdAlert stays as
an Email-only alias so non-policy callers (admin "send test
alert" surfaces) keep working byte-for-byte.
3. NotificationService.HasThresholdNotificationOnChannel — per-
(cert, threshold, channel) deduplication so a transient
PagerDuty 5xx today does NOT suppress today's Slack alert and
tomorrow's PagerDuty retry will still fire.
4. RenewalService.sendThresholdAlerts walks the resolved channel
set per threshold tier, fans out to every configured channel,
handles per-channel failures independently, defensively drops
off-enum channels with an audit row trail, and records a per-
channel audit event with metadata.channel + metadata.severity_tier.
5. service.ExpiryAlertMetrics — atomic counter table mirrored on
the VaultRenewalMetrics shape from the 2026-05-03 audit fix #5
(commit
|
||
|
|
5fd1e71477 |
ci(googlecas): fix QF1002 staticcheck — tagged switch on r.URL.Path
CI failure on commit
|
||
|
|
8f89266c48 |
docs(openssl): operator playbook for shell-out threat model
Closes Top-10 fix #6 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the OpenSSL adapter's docs in docs/connectors.md explained usage but did NOT enumerate the threat model. The adapter exec's an arbitrary operator-supplied script — env-var inheritance, symlink attacks, sandbox-escape, multi-tenant process-isolation gaps. An acquirer's security reviewer reading this surface cold pattern-matches "highest-risk issuer surface with the lowest documented threat model." This commit lands a doc-side operator playbook in docs/connectors.md OpenSSL section (mirrors Bundle 8's "Operator playbook: keytool argv password exposure" subsection shape and the 2026-05-02 audit Top-10 fix #7 SSH InsecureIgnoreHostKey playbook). Six topics covered: 1. Why the adapter exists despite the risk (CLI-driven CAs without Go SDKs need an integration path). 2. Threat model the adapter accepts (trusted operator + trusted script + appropriate ownership + clear audit trail). 3. Threat model the adapter does NOT accept (operator-writable script paths, untrusted content, multi-tenant hosts). 4. Mitigations operators can layer (dedicated user, root-owned 0755 binary, audit rules, per-call timeout via CERTCTL_OPENSSL_TIMEOUT_SECONDS, env sanitisation, chroot/container, audit wrapper, per-call concurrency bound). 5. When NOT to use the adapter (compliance environments, multi-tenant servers, no-script-review environments). 6. V3-Pro forward path (hardened mode tracked in cowork/WORKSPACE-ROADMAP.md). Inline comment in internal/connector/issuer/openssl/openssl.go near the callSignScript exec call site forward-references the new doc subsection (no logic change). cowork/WORKSPACE-ROADMAP.md gains an "OpenSSL hardened mode" V3- Pro entry under "Adapter hardening" — sibling-folder doc, not in the certctl repo, so not reflected in this commit's diff. Same shape Bundle 8 used for the JavaKeystore playbook and the 2026-05-02 deployment-target audit Top-10 fix #7 used for the SSH InsecureIgnoreHostKey playbook. No code logic changes (only the explanatory comment near the exec call site). No test changes. Doc-only commit. Verified locally: - gofmt / go vet clean. - go test -short -count=1 ./internal/connector/issuer/openssl/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md Top-10 fix #6. |
||
|
|
ceca3647eb |
vault: add automatic token renewal at TTL/2 + Prometheus metric
Closes Top-10 fix #5 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the VaultPKI adapter authenticated with a static token and never called renew-self. Long-lived deploys hit token expiry; the first operator-visible signal was failed cert renewals on production targets. This commit: 1. Connector.Start(ctx) spawns a goroutine that calls POST /v1/auth/token/renew-self at TTL/2 cadence (computed from a one-shot lookup-self at startup). Honours ctx.Done() for graceful shutdown via a per-loop done channel + Stop(). 2. On `renewable: false` response (initial lookup OR any subsequent renewal), the loop emits a WARN, increments the not_renewable counter, and exits. The operator must rotate the token before Vault's Max TTL elapses. 3. New Prometheus counter certctl_vault_token_renewals_total with labels result={success,failure,not_renewable}. Registered alongside existing certctl_issuance_* counters in internal/api/handler/metrics.go. 4. ERROR-level logging on renewal failure with operator-actionable substring ("vault token renewal failed; rotate the token before TTL expires") so journalctl + grep find it. Loop keeps ticking after a failure — transient blips don't kill it. New optional issuer.Lifecycle interface: type Lifecycle interface { Start(ctx context.Context) error Stop() } Connectors that hold no background goroutines (almost all of them) do not implement this — IssuerRegistry.StartLifecycles / StopLifecycles feature-detect via type assertion. New lifecycle-bearing connectors plug in by implementing the interface; no further registry plumbing required. Wiring (cmd/server/main.go): - service.NewVaultRenewalMetrics() instance is shared between issuerRegistry.SetVaultRenewalMetrics (so Vault connectors built by Rebuild get a recorder) and metricsHandler.SetVaultRenewals (so the Prometheus exposer emits the new series). - issuerRegistry.StartLifecycles(ctx) is called after issuerService.BuildRegistry; defer issuerRegistry.StopLifecycles is paired so goroutines exit cleanly on signal. - IssuerConnectorAdapter.Underlying() exposes the wrapped issuer.Connector so registry-level machinery can reach the concrete connector behind the adapter without duplicating the wiring at every call site. Tests (internal/connector/issuer/vault/vault_renew_test.go): - TestVault_RenewLoop_TickAtHalfTTL — three ticks → three renewals, all "success". - TestVault_RenewLoop_StopsOnNotRenewable — second renewal returns renewable=false, loop exits, third tick fires no HTTP call. - TestVault_RenewLoop_FailureSurfacesViaMetric — first renewal 403 bumps "failure", second renewal succeeds → loop kept ticking. - TestVault_RenewLoop_CtxCancellation_StopsCleanly — Stop returns within 200ms after ctx cancel. - TestVault_RenewLoop_StartsNothingWhenNotRenewable — token already non-renewable at boot ⇒ no goroutine, "not_renewable" metric increments at startup so operators see it in Grafana. - TestVault_ComputeInterval — 4 cases pinning TTL/2 + minRenewInterval floor. - TestVault_RenewSelf_ParseFailure_NamesActionableInError — surfaced error contains "vault token renewal failed" + "rotate the token". Cadence is dynamic — every successful renewal re-derives TTL/2 from the renewed lease's lease_duration, so a short bootstrap token that gets renewed up to a longer Max TTL shifts to the longer cadence automatically (defends against degenerate fast ticking on a token whose Max TTL is far longer than its initial TTL). Documentation: - docs/connectors.md Vault PKI section gains "Token TTL + automatic renewal" subsection (operator-facing: cadence, metric, renewable=false rotation playbook). Out of scope (intentional, flagged in the audit follow-up): - AppRole / Kubernetes / AWS IAM auth methods (different renewal semantics). - Hot-reload of rotated token from disk (operator restarts today; future: GUI/MCP issuer-update path triggers Rebuild which Stops the old connector and Starts the new one). - Auto-re-auth after token death (operator playbook owns it). CHANGELOG.md is intentionally not hand-edited (per CHANGELOG.md itself: "no longer maintains a hand-edited per-version changelog; per-release notes are auto-generated from commit messages between consecutive tags"). Verified locally: - gofmt clean. - go vet ./internal/service/... ./internal/api/handler/... ./internal/connector/issuer/vault/... ./cmd/server/... clean. - go test -short -count=1 ./internal/connector/issuer/vault/... ./internal/service/... ./internal/api/handler/... green. - go test -race -count=10 -run 'TestVault_RenewLoop|TestVault_ComputeInterval' ./internal/connector/issuer/vault/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md Top-10 fix #5. |
||
|
|
60dce0bf10 |
googlecas, awsacmpca: add failure_test.go covering cloud-SDK error contracts
Closes Top-10 fix #4 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, both adapters had only happy-path test coverage with a single generic ServerError pair each. Cloud CAs are typically the first-deployed issuer in enterprise pilots; their diligence reviews dig hard into IAM-error / cloud-error coverage. This commit lands the contract tests. AWSACMPCA — 5 tests in awsacmpca_failure_test.go. Each injects a typed AWS SDK v2 error via the existing mockACMPCAClient seam and asserts (1) error non-nil, (2) errors.As against the SDK's typed value succeeds (so the wrap chain through fmt.Errorf("...%w", ...) is intact), and (3) operator-actionable substring is present. 1. Issue_AccessDenied — *smithy.GenericAPIError with Code="AccessDeniedException" (the SDK does NOT generate a typed *types.AccessDeniedException; AWS uses the smithy APIError shape for IAM denials). Asserts ErrorCode + "not authorized" + IAM resource path preserved through wrap. 2. Issue_ResourceNotFound — *types.ResourceNotFoundException names the missing CA ARN. 3. Issue_Throttling — *smithy.GenericAPIError with Code="ThrottlingException", Fault=FaultServer. Asserts the retryable class (FaultServer) is preserved through wrap so upstream retry logic can engage. 4. Issue_MalformedCSR — *types.MalformedCSRException is terminal (operator must fix the CSR, not retry); asserts the validation-issue substring survives. 5. Issue_RequestInProgress — *types.RequestInProgressException wraps cleanly; classification (retry vs reissue) is upstream's responsibility per the spec's "no new retry logic" rule. GoogleCAS — 5 tests in googlecas_failure_test.go. The adapter uses stdlib net/http directly (NO Google Cloud Go SDK dependency in googlecas.go), so SDK typed-error assertions don't translate. Each test runs an httptest.Server that returns the canonical Google API JSON error envelope: {"error":{"code":N,"message":"...","status":"<STATUS>"}} and asserts (1) error non-nil, (2) operator-actionable substring, and (3) the canonical status string ("PERMISSION_DENIED", "NOT_FOUND", "UNAVAILABLE") survives the wrap chain so upstream classification can branch on it. 1. Issue_PermissionDenied — 403 / PERMISSION_DENIED; surfaced error names the IAM resource path. 2. Issue_CAPoolNotFound — 404 / NOT_FOUND; surfaced error names the missing pool resource. 3. Issue_OAuth2TokenRefreshFailure — token endpoint returns 401 invalid_grant; surfaced error mentions "token" so an operator reading the log immediately distinguishes a credential failure (rotate SA key) from a CA-side error (fix IAM binding). Test also asserts the CAS endpoint is NOT reached when the token exchange fails. 4. Issue_RegionalAPIUnavailable — 503 / UNAVAILABLE; surfaced error preserves the retryable class markers (status code + UNAVAILABLE string) for upstream retry classification. 5. Revoke_PermissionDenied — adapter does NOT silently swallow the failure; pin the contract so the audit-row atomicity guarantee from Bundle G (which lives in the service-layer wrapper, not the adapter) continues to apply. Test also verifies the revoke endpoint was actually reached, guarding against a future regression that short-circuits before the HTTP call. Coverage delta: awsacmpca: 71.0% → 71.0% (failure tests reuse existing wrap code paths; behaviour-pin contract tests, not coverage tests). googlecas: 83.4% → 84.4% (+1.0pp). go.mod: smithy-go moved indirect → direct, since the new AWSACMPCA test file imports it. CI's go-mod-tidy-drift gate enforces this. Test-only commit. No production code changes. Verified locally: - gofmt clean. - go vet ./internal/connector/issuer/awsacmpca/... ./internal/connector/issuer/googlecas/... clean. - go test -short -count=1 ./internal/connector/issuer/... green. - go test -race -count=10 ./internal/connector/issuer/awsacmpca ./internal/connector/issuer/googlecas green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md Top-10 fix #4. |
||
|
|
c5db41d3f0 |
openssl: add failure_test.go covering 6 shell-out error modes
Closes Top-10 fix #3 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the OpenSSL adapter (497 LOC, certctl's highest-risk issuer surface) had openssl_test.go (8 happy-path funcs + 20 subtests) but no dedicated _failure_test.go. Compare to ACME, Vault, DigiCert, Sectigo, Entrust, GlobalSign, EJBCA — all peers have one. An acquirer's diligence team flags this as an immediate blocker on the highest-risk issuer surface. This commit adds 6 failure-mode tests: 1. TestOpenSSL_Issue_ScriptNotFound_OperatorActionableError — SignScript path doesn't exist; error wraps os.ErrNotExist (errors.Is); message contains 'no such file' / 'not found' so the operator's grep finds it in journalctl. 2. TestOpenSSL_Issue_PermissionDenied_OperatorActionableError — SignScript exists with mode 0o600 (non-executable); error wraps os.ErrPermission; message contains 'permission'. Skipped under root (uid 0 bypasses chmod gating). 3. TestOpenSSL_Issue_MalformedStdout_DistinguishedFromCSRReject — script exits 0 + writes garbage (no PEM markers) to the cert output file; error mentions PEM/certificate/parse so operators distinguish output-parsing failure from a script- side fault. 4. TestOpenSSL_Issue_NonZeroExit_DistinguishesCAReject_From_ ScriptError — script writes 'policy violation: …' to stderr and exits 2 (CA-side rejection convention); the script's stderr surfaces in the error message; errors.Unwrap returns non-nil (proving the underlying *exec.ExitError chain survives). 5. TestOpenSSL_Issue_TimeoutEnforced_ContextCancellationPropagates — script does 'exec sleep 30' (not 'sleep 30 ' as a child; exec replaces bash so SIGKILL goes directly to the sleeper, avoiding the orphan-pipes corner case where a killed bash leaves sleep holding stdout/stderr open and CombinedOutput blocks); ctx with 100ms deadline; call returns within ~5s wall-clock; either errors.Is(err, context.DeadlineExceeded) or the error message names 'killed' / 'signal'. 6. TestOpenSSL_Issue_SignalKilled_PartialOutputDiscarded — script writes a half-PEM ('-----BEGIN CERTIFICATE-----\nMII…') then 'kill -KILL $$'; assertion: result is nil OR CertPEM is empty (no half-cert leaks to caller); error names 'signal' / 'killed' OR 'PEM' / 'parse' (both are operator-actionable). Each test pins the operator-actionable error message contract: the message names the failure mode (so journalctl + grep find it) and proves no half-state was created (no partial cert returned). errors.Is / errors.Unwrap checks confirm the wrapping chain survives. The OpenSSL adapter has no commandRunner abstraction (production code uses exec.CommandContext directly); these tests use real operator-supplied scripts written to t.TempDir (matches the adapter's actual production code path; no os/exec mocking). The 'exec sleep 30' technique in Test 5 is the load-bearing fix for the bash-orphans-sleep-and-pipes-stay-open corner case that otherwise makes the test take 30s instead of 100ms. Coverage delta: - Before this commit: openssl_test.go + openssl_stubs_test.go covered 8 happy-path funcs. - After: 79.8% statement coverage of openssl.go (up from operator-pre-existing baseline; the 6 new tests exercise every error path through callSignScript + parseCertificate). Tests pass clean under '-race -count=10' (Test 5's deadline tolerance is the only timing-sensitive case; the 5s wall-clock budget vs the 100ms ctx deadline gives ample slack on slow CI without masking deadline-not-enforced bugs). Test-only commit; no production code changes. Hardening fixes (per-call concurrency semaphore, threat-model docs) are separate Top-10 entries. Verified locally: - gofmt clean across the repo. - go vet ./... clean across the repo. - go test -race -count=10 -short ./internal/connector/issuer/openssl/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/ RESULTS.md Top-10 fix #3. |
||
|
|
ece15cb457 |
vault, digicert: migrate Token / APIKey to *secret.Ref (Bundle I Phase 3)
Closes Top-10 fix #2 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, vault.Config.Token and digicert.Config.APIKey were plain string fields. Practical impact: 1. GET /api/v1/issuers responses marshalled the credential into the JSON body. An acquirer's procurement engineer running 'curl /api/v1/issuers | jq' saw the token / API key in plain text on screen. 2. DEBUG-level HTTP request logging printed the credential header verbatim. 3. A heap dump of the running server contained the credential as readable bytes for the lifetime of the process. Bundle I from the 2026-05-01 audit closed this for AWSACMPCA, EJBCA, GlobalSign, Sectigo (Phase 1+2). Vault and DigiCert were left out. This commit ports the same migration onto them. Mechanics: - Config.Token / Config.APIKey type changed from 'string' to '*secret.Ref'. UnmarshalJSON of a JSON string populates the Ref via NewRefFromString — operator config files are unchanged. - Every header-write call site routed through Ref.Use, with the byte buffer zeroed after the callback returns. Vault: 3 sites (IssueCertificate, RevokeCertificate, GetCACertPEM). DigiCert: 5 sites (ValidateConfig, IssueCertificate, RevokeCertificate, pollOrderOnce, downloadCertificate). - ValidateConfig nil-checks switch from 'cfg.Token == ""' to 'cfg.Token.IsEmpty()' (mirrors Sectigo's existing pattern). - Tests migrated: every Config{Token:"..."} → Config{Token: secret.NewRefFromString("...")}. The 'json.Marshal(config) → ValidateConfig(rawConfig)' round-trip pattern in DigiCert's ValidateConfig_Success test is now broken by the redact-on-marshal contract — switched that one to construct the rawConfig as a JSON literal (mirrors Sectigo's existing test pattern). - Two new tests pin the redact-on-marshal contract: - TestVault_Config_TokenMarshalsAsRedacted (vault_redact_test.go) - TestDigiCert_Config_APIKeyMarshalsAsRedacted (digicert_redact_test.go) Both assert the marshaled JSON contains '"[redacted]"' and does NOT contain the plaintext bytes. Operator-visible: GET /api/v1/issuers responses for type=vault and type=digicert now show the credential as '[redacted]'. Existing config files keep working — the Ref unmarshal accepts strings. CHANGELOG note: certctl/CHANGELOG.md is intentionally not hand-edited; release notes are auto-generated from commit messages between consecutive tags. This commit's message body is the release-note artifact. Verified locally: - gofmt clean across the repo. - go vet ./... clean across the repo. - go test -race -count=1 -short ./internal/connector/issuer/vault/... ./internal/connector/issuer/digicert/... ./internal/secret/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/ RESULTS.md Top-10 fix #2. |
||
|
|
88e7d0c17b |
ejbca: port mTLS keypair to mtlscache (close Bundle M for the last issuer)
Closes Top-10 fix #1 of the 2026-05-03 issuer-coverage audit (see cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, ejbca.go::New called tls.LoadX509KeyPair once at construction and configured the keypair into *http.Transport.TLSClientConfig with no mtime watch. mTLS rotation required a server restart — quarterly rotation per any reasonable security policy = quarterly deploy outage. Bundle M from the prior 2026-05-01 audit shipped the mtlscache helper at internal/connector/issuer/mtlscache/cache.go and wired it into Entrust + GlobalSign. EJBCA was missed in Bundle M's scope. This commit ports the same helper onto EJBCA's auth_mode=mtls path. The OAuth2 path is unchanged. Implementation: - New imports internal/connector/issuer/mtlscache. - Connector struct gains an mtls *mtlscache.Cache field (mirroring Entrust + GlobalSign). - New()'s case 'mtls': replaces tls.LoadX509KeyPair + manual *http.Transport with mtlscache.New(certPath, keyPath, Options{HTTPTimeout: 30s}). Cache build happens at construction so misconfigured operators fail fast (matches pre-fix behaviour). - New helper getHTTPClient() returns the cached client; on the mTLS path it calls RefreshIfStale before returning so the next request uses the new keypair if disk has rotated. On OAuth2 / test paths (c.mtls == nil), returns c.httpClient as-is. - All 3 c.httpClient.Do call sites (IssueCertificate enroll, RevokeCertificate revoke, GetOrderStatus cert lookup) replaced with c.getHTTPClient() + client.Do. - crypto/tls import removed (no longer used at this layer). Tests: - TestEJBCA_MTLSKeypairRotation_PicksUpNewCertWithoutRestart (new, ejbca_mtls_rotation_test.go): generates two CAs (caA, caB), signs leafA + leafB, spins up an httptest TLS server that trusts both CAs and records the issuer DN of every presented client cert, writes leafA, makes request 1, writes leafB + advances mtime by 2s, makes request 2. Asserts the server saw caA's DN on req 1 and caB's DN on req 2 — the cache picked up the rotation without ejbca.New re-running. - export_test.go: GetHTTPClientForTest helper exposes the private getHTTPClient so the rotation test drives the production code path. - All existing EJBCA tests still pass (TestNew_MTLSWiresClientCert, TestNew_MTLSCertLoadFailure, TestNew_OAuth2NoTransportTuning, TestNew_InvalidAuthMode). Verified locally: - gofmt clean across the repo. - go vet ./... clean across the repo. - go test -race -count=1 -short ./internal/connector/issuer/ejbca/... ./internal/connector/issuer/mtlscache/... green. Audit reference: cowork/issuer-coverage-audit-2026-05-03/ RESULTS.md Top-10 fix #1. |
||
|
|
340df70abd |
docs(acme-server): operator-facing reference + threat model + cert-manager walkthrough (Phase 6/7)
Doc-only commit closing the ACME-server work series. After this commit,
an outside reviewer (procurement engineer / Venafi diligence engineer /
Infisical-comparison-shopper) can read the docs cold, understand the
ACME server's surface, follow the cert-manager walkthrough, and reach
a deployment decision without escalating to certctl maintainers.
What ships:
- docs/acme-server.md final pass: Auth-mode decision tree (when to
use trust_authenticated vs challenge), RFC 8555 + RFC 9773
conformance statement (section-by-section table of implemented
plus procurement-honest 'not implemented' rows for EAB / multi-
level wildcards / RFC 8738 / cross-CA proxying), Troubleshooting
(5 failure modes — badNonce / unknownAuthority / HTTP-01
connection refused / DNS-01 NXDOMAIN / rejectedIdentifier with
canonical fix for each), Version pinning + tested clients table
(cert-manager 1.15.0, lego v4, kind v0.20+, Caddy 2.7.x, Traefik
3.0+), FAQ (5 entries — why two auth modes, vs cert-manager-
against-LE, can-I-use-from-outside-K8s, migration story, audit-
log catalog), See-also cross-link block.
- docs/acme-cert-manager-walkthrough.md: kind → cert-manager →
certctl → Certificate flow, with YAML blocks byte-equal to
deploy/test/acme-integration/{clusterissuer-trust-authenticated,
certificate-test}.yaml to prevent doc/test drift.
- docs/acme-caddy-walkthrough.md: Caddyfile acme_ca + tls.cas
options (OS trust store + Caddy pki.ca block).
- docs/acme-traefik-walkthrough.md: certificatesResolvers.<name>.acme
.caServer + serversTransport.rootCAs configuration.
- docs/acme-server-threat-model.md: Threat surface map + JWS forgery
resistance (alg-confusion / HS256 substitution / replayed nonce /
URL spoofing / multi-sig / kid-vs-jwk / kid round-trip mismatch),
Nonce store integrity rationale, HTTP-01 SSRF defense-in-depth
(pre-dial check + per-dial check + per-redirect check + body cap +
bounded redirects), DNS-01 cache-poisoning posture (default Google
Public DNS + operator-owns-private-resolver-posture), TLS-ALPN-01
chain-not-validated rationale (RFC 8737 §3 explicit), Rate-limit
tuning, Audit trail catalog, Out-of-scope threats list.
- docs/connectors.md: TOC renumbered 3→4 etc. to make room for new
top-level 'ACME Server (Built-in)' section between Issuer Connector
and Target Connector — distinguishes the consumer-side ACME
(existing) from the new server-side ACME via env-var-prefix
call-out (CERTCTL_ACME_* vs CERTCTL_ACME_SERVER_*).
DoD verification:
- All 5 docs files exist with the structure prescribed by the
Phase 6 prompt.
- Every CERTCTL_ACME_SERVER_* env var in docs/acme-server.md maps
to an actual lookup in internal/config/config.go (verified by
'grep -oE | sort -u | diff' returning empty).
- Every YAML snippet in docs/acme-cert-manager-walkthrough.md is
byte-equal to the corresponding file in deploy/test/acme-integration/
(verified with 'diff' against awk-extracted YAML blocks).
- docs/connectors.md has the cross-link subsection with all 4 new
docs referenced.
- cowork/CLAUDE.md Architecture Decisions has the new ACME-server
bullet documenting per-profile URL family + per-profile
acme_auth_mode + Phase 4-5-6 progression.
- cowork/WORKSPACE-CHANGELOG.md has the ACME-Server-6 entry plus
the ACME-Server rollup spanning Phases 1a-6.
- cowork/infisical-deep-research-results.md Rank 1 marked SHIPPED.
- 'gofmt -l .' clean (no Go changes); 'go vet ./...' clean.
Acquisition-readiness: every one of the 12 acquisition-grade criteria
from cowork/acme-server-endpoint-prompt.md is verified by the test
suite (Phases 1a-5) plus this doc walkthrough (Phase 6). The full
RFC 8555 + RFC 9773 surface is live; the operator can deploy
end-to-end by reading one walkthrough doc and one env-var table.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-6 (docs)'
+ ACME-Server rollup of all 6 phases.
|
||
|
|
876b937e47 |
acme-server: cert-manager integration test + production hardening (Phase 5/7)
Closes the production-readiness loop on the ACME surface. After this
commit, certctl ships per-account rate limits + a GC sweeper for
expired ACME state + a kind-driven cert-manager 1.15 integration test
+ a lego-driven RFC conformance harness + a k6 loadtest scenario for
the unauthenticated ACME path.
Architecture:
- Rate limits live in-memory + per-replica. Restart wipes the
counters; orders/hour caps are eventual-consistency anyway. A
3-replica certctl-server fleet behind an LB effectively has 3x
the configured throughput per account; persistent rate limiting
is a follow-up if production telemetry shows abuse patterns we
can't catch in a single restart cycle. Per-key + per-action
isolation: ActionNewOrder/acc-1, ActionKeyChange/acc-1, and
ActionChallengeRespond/<challenge-id> are independent buckets.
- GC loop follows the existing scheduler-loop pattern (atomic.Bool
+ sync.WaitGroup; see crlGenerationLoop for shape). Three
independent SQL sweeps per tick (DELETE expired nonces; UPDATE
pending authzs whose expires_at < now() to expired; UPDATE
pending/ready/processing orders whose expires_at < now() to
invalid). Each sweep is a single statement; failures are logged-
and-continued so a failing nonces sweep doesn't block authzs.
Per-sweep 1m timeout bounds a stuck Postgres.
- cert-manager integration test is gated on KIND_AVAILABLE so CI
skips it cleanly (kind is too heavy for per-PR). Operators run
locally via 'make acme-cert-manager-test'; the harness brings up
a fresh cluster each run + tears it down on Cleanup.
- lego conformance harness drives a real ACME client through
register → run → cert-PEM-landed against a hermetic certctl
stack. Catches RFC-shape regressions third-party clients would
hit before they ship.
- k6 ACME-flow scenario hammers the unauthenticated surface
(directory + new-nonce + ARI synthetic-id) at 100 VUs × 5m. JWS-
signed flows are out of scope for k6 (no JWS support); they're
covered by the lego harness above.
What ships:
- internal/api/acme/ratelimit.go (+ ratelimit_test.go: 7 cases —
disable-when-perHour-zero, capacity, per-key isolation, per-
action isolation, refill-over-time, RetryAfter, concurrent-access
with -race + 200 goroutines × 200 calls).
- internal/repository/postgres/acme.go: 4 new methods —
CountActiveOrdersByAccount + GCExpiredNonces + GCExpireAuthorizations
+ GCInvalidateExpiredOrders. Each a single SQL statement.
- internal/service/acme.go: SetRateLimiter + GarbageCollect +
rate-limit gates at 3 entry points (CreateOrder + RotateAccountKey
+ RespondToChallenge) + concurrent-orders gate at CreateOrder.
2 new sentinels (ErrACMERateLimited, ErrACMEConcurrentOrdersExceeded);
5 new GC metrics (gc_runs / gc_run_failures / gc_nonces_reaped /
gc_authzs_expired / gc_orders_invalidated).
- internal/scheduler/scheduler.go: ACMEGarbageCollector interface +
acmeGCRunning atomic.Bool + acmeGCInterval + 2 setters (SetACME-
GarbageCollector + SetACMEGCInterval) + acmeGCLoop following the
crlGenerationLoop shape.
- internal/api/handler/acme.go: writeServiceError gains rateLimited
(429 + RFC 8555 §6.7) + concurrent-orders-exceeded mappings.
- internal/config/config.go: 5 new env vars
(CERTCTL_ACME_SERVER_RATE_LIMIT_ORDERS_PER_HOUR=100,
CERTCTL_ACME_SERVER_RATE_LIMIT_CONCURRENT_ORDERS=5,
CERTCTL_ACME_SERVER_RATE_LIMIT_KEY_CHANGE_PER_HOUR=5,
CERTCTL_ACME_SERVER_RATE_LIMIT_CHALLENGE_RESPONDS_PER_HOUR=60,
CERTCTL_ACME_SERVER_GC_INTERVAL=1m).
- cmd/server/main.go: NewRateLimiter() + SetRateLimiter() at
startup; conditional SetACMEGarbageCollector(acmeService) +
SetACMEGCInterval(cfg.ACMEServer.GCInterval) when Enabled+
GCInterval > 0.
- deploy/test/acme-integration/: kind-config.yaml + cert-manager-
install.sh + clusterissuer-trust-authenticated.yaml +
clusterissuer-challenge.yaml + certificate-test.yaml + conformance-
lego.sh + certmanager_test.go (//go:build integration + KIND_AVAILABLE
gate).
- deploy/test/loadtest/k6/acme_flow.js + README ACME-flows section.
- Makefile: 2 new PHONY targets (acme-cert-manager-test +
acme-rfc-conformance-test).
- docs/acme-server.md: status flipped to Phase 5; Configuration
table grows 5 rows; new 'Phase 5 — operational guidance' section
explaining rate-limit math + GC sweeper semantics + cert-manager
integration + lego conformance + k6 baseline.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./internal/...' green across every
affected package (service / acme / handler / scheduler / repo /
config).
- 'go vet -tags=integration ./deploy/test/acme-integration/' clean
(the integration test compiles cleanly with the build tag).
- The kind/cert-manager harness is gated behind KIND_AVAILABLE so
CI skips by default; operators run locally via 'make acme-cert-
manager-test'.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-5'.
|
||
|
|
367101ef7e |
deps(web): upgrade vite ^8.0.0 → ^8.0.10 (3 Dependabot alerts)
Closes Dependabot alerts #12 (CVE — arbitrary file read via Vite dev server WebSocket), #13 (CVE-2026-39364 — server.fs.deny bypassed with ?raw / ?import&raw / ?import&url&inline query suffixes), and #14 (path traversal in optimized-deps .map handling). All three live in the vite DEV server only — vite build (production output) is unaffected. All three share the same advisory range '>= 8.0.0, <= 8.0.4' → fixed in 8.0.5; npm picked the latest 8.x patch (8.0.10). Real-world exposure for certctl was low: web/package.json's 'dev: vite' script has no --host flag, so the default binding is localhost (127.0.0.1). Devs who manually run 'vite --host' for cross-machine testing were exposed to the same-LAN attack vector; this closes it. Manifest change: bumped the constraint from '^8.0.0' to '^8.0.10' to document the security floor in package.json itself (the caret already permitted 8.0.10, but pinning the floor higher prevents an accidental downgrade if a future 'npm install' somehow re-resolves to a vulnerable 8.0.0-8.0.4). Lockfile change: 17 packages removed + 18 changed — mostly transitive vite-internal modules (rolldown, oxc-* etc.) that shifted around between 8.0.0 and 8.0.10. Verified locally: - 'npm install vite@^8.0.5 --save-dev' completed cleanly. - 'vite build' produces the same web/dist/ output (668 modules transformed, 35.30 kB CSS / 918.04 kB JS — same shape as pre- upgrade). - vitest run wasn't completed in the sandbox (test runner hung in the disk-pressure environment); CI will run it on push. Engineering history: this is a cross-cutting deps bump that lives outside the ACME-Server-N phase plan. |
||
|
|
397d5665b4 |
fix: collapse identical if/else branches in Account handler (CodeQL #25)
CodeQL alert #25 (go/duplicate-branches) on internal/api/handler/
acme.go::ACMEHandler.Account flagged that 'if readOnly { ... } else
{ ... }' had byte-identical bodies — both setting the same
Content-Type: application/json header. The 'readOnly' bool was
threaded through the function as a placeholder for differentiated
headers (Cache-Control etc. on the POST-as-GET path) that never
landed; both branches collapsed to the same value with no
follow-through.
Audit + fix:
- The alert is real (verified by re-reading the source); not a
false positive.
- The Copilot Autofix Anthropic surfaced was correct in spirit but
incomplete: it collapsed the if/else but left 'readOnly' as
dead code (declared at line 395, assigned at lines 400 and 436,
only read at the now-removed if). golangci-lint's 'unused'
linter would flag 'readOnly' next.
- Complete fix: collapse the if/else AND remove the now-unused
'readOnly' variable + its 2 assignments. Single unconditional
'w.Header().Set("Content-Type", "application/json")' covers
both paths (RFC 8555 §6.3 POST-as-GET + §7.3.2 / §7.3.6 update
+ deactivation all return the same account JSON shape — no spec
rationale for differentiating headers).
Verified locally: 'gofmt -l .' clean; 'go vet ./...' clean;
'go test -short -count=1 ./internal/api/handler/' green; 'grep
readOnly' on the file returns only the new explanatory comment
(no live references).
The alert was first detected in commit
|