mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 21:51:30 +00:00
f1d97710e1098ce352290e4a9db253ba0a054f72
5 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4dc8d3fa5b |
acme-server: key rollover + revocation + ARI (Phase 4/7)
Closes the RFC 8555 + RFC 9773 surface beyond the issuance happy-path:
- POST /acme/profile/<id>/key-change (RFC 8555 §7.3.5)
- POST /acme/profile/<id>/revoke-cert (RFC 8555 §7.6)
- GET /acme/profile/<id>/renewal-info/<cert-id> (RFC 9773 ARI)
After this commit, ACME clients can rotate account keys, revoke certs
through the ACME surface (rather than only via the certctl GUI/API),
and fetch ARI for proactive renewal scheduling.
Architecture:
- Key rollover: outer JWS verified against the registered account key
(existing kid path); the inner JWS — embedded as the outer's payload
— verified against the embedded NEW jwk in a new dedicated routine
(ParseAndVerifyKeyChangeInner) that enforces RFC 8555 §7.3.5
inner-only invariants: MUST use jwk + MUST NOT use kid, payload
.account == outer.kid, payload.oldKey thumbprint-equals registered.
A single WithinTx swaps the stored thumbprint+pem and writes the
audit row. Concurrent-rollover safety via SELECT…FOR UPDATE on the
conflicting account row in UpdateAccountJWKWithTx; the loser
observes the winner's new thumbprint and is told to retry (409).
- Revocation: two auth paths. kid → AccountOwnsCertificate single-
indexed COUNT lookup over acme_orders. jwk → constant-time RFC 7638
thumbprint compare against the cert's pubkey. Both paths route
through service.RevocationSvc.RevokeCertificateWithActor so the
existing CRL/OCSP refresh + audit + metrics pipeline applies. RFC
5280 §5.3.1 numeric reason codes clamp to certctl's
domain.ValidRevocationReasons; codes 8 (removeFromCRL) + 10
(aACompromise) clamp to 'unspecified' since they aren't in the set.
- ARI is GET-only and unauth per RFC 9773 §4. Cert-id wire shape is
base64url(AKI).base64url(serial); ParseARICertID strict-decodes,
SerialHex emits the canonical certctl-shape lowercase-no-leading-
zeros hex used in certificate_versions.serial_number.
ComputeRenewalWindow has 3 branches: bound RenewalPolicy →
[notAfter - days, notAfter - days/2]; no policy → last 33% of
validity; past expiry → [now, now + 1d] (renew immediately).
Retry-After honors CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL.
What ships:
- internal/api/acme/{keychange,ari}.go (+ phase4_test.go: 15 tests).
- internal/api/acme/order.go: RevokeCertRequest wire shape.
- internal/api/handler/acme.go: KeyChange, RevokeCert, RenewalInfo
+ 11 new writeServiceError mappings.
- internal/repository/postgres/acme.go: UpdateAccountJWKWithTx (FOR
UPDATE + expectedOldThumbprint precondition; ErrACMEAccountKey-
ConcurrentUpdate sentinel) + AccountOwnsCertificate.
- internal/service/acme.go: RotateAccountKey + RevokeCert +
RenewalInfo; CertificateRevoker + RenewalPolicyLookup interfaces;
SetRevocationDelegate + SetRenewalPolicyLookup wiring; 11 new
sentinels; 6 new metrics.
- internal/service/acme_phase4_test.go: service-layer tests for
RotateAccountKey (happy + duplicate-key) + RevokeCert (kid mismatch
+ jwk mismatch + jwk happy + already-revoked + reason-clamping) +
RenewalInfo (disabled + bad cert-id).
- internal/api/router/router.go: 6 new register calls (3 per-profile
+ 3 shorthand). Router parity exceptions extended in lockstep
(in-tree SpecParityExceptions + CI-only openapi-handler-exceptions
.yaml).
- cmd/server/main.go: SetRevocationDelegate(revocationSvc) +
SetRenewalPolicyLookup(renewalPolicyRepo) at startup.
- internal/config/config.go: CERTCTL_ACME_SERVER_ARI_ENABLED (default
true) + CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL (default 6h);
BuildDirectory's ariEnabled flag now flips on under
cfg.ARIEnabled.
- docs/acme-server.md: phase status flipped to Phase 4; endpoints
table grows 6 rows (3 per-profile + 3 shorthand); FAQ section
appended explaining how to rotate keys, revoke certs, and consume
ARI.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./...' green across every package.
- phase4_test.go covers: keychange happy-path + 5 negatives +
MapKeyChangeErrorToProblem coverage; ARI cert-id round-trip + 6
malformed cases + BuildARICertID from a generated cert; window-
math 3 branches.
- service-layer tests confirm: RotateAccountKey atomically swaps the
thumbprint (verifies persisted state) and rejects duplicate keys;
RevokeCert routes through the stub RevocationSvc with the right
actor string + reason on the jwk path, rejects mismatched keys,
rejects already-revoked certs, clamps reason codes correctly;
RenewalInfo respects ARIEnabled + cert-id format.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-4'.
|
||
|
|
9bc845304e |
acme-server: HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (Phase 3/7)
Wires up the actual challenge-validation machinery so profiles in
acme_auth_mode='challenge' resolve end-to-end. After this commit,
cert-manager 1.15+ with `solver: http01: ingress` against a
challenge-mode profile completes a real HTTP-01 flow and gets a cert.
DNS-01 + TLS-ALPN-01 share the same code path with the appropriate
validator selection.
Architecture (the load-bearing parts):
- 3 separate semaphore-bounded worker pools (one per challenge type),
so HTTP-01 and DNS-01 can't starve each other under load. Default
weight 10 per type; tunable via CERTCTL_ACME_SERVER_HTTP01_CONCURRENCY,
DNS01_CONCURRENCY, TLSALPN01_CONCURRENCY.
- 30s per-challenge timeout (configurable via PoolConfig.PerChallengeTimeout).
- HTTP-01 validator runs validation.IsReservedIPForDial (newly
exported wrapper preserving the existing private impl byte-for-byte
for the network scanner + ValidateSafeURL paths) on the resolved
IP — both at the initial dial and every redirect hop. SSRF probes
into private IP space are refused before the connect.
- DNS-01 validator uses a dedicated resolver pointed at
CERTCTL_ACME_SERVER_DNS01_RESOLVER (default 8.8.8.8:53) — does
NOT use the system resolver to keep behavior deterministic across
deployments. Wildcard handling: `*.example.com` queries
_acme-challenge.example.com.
- TLS-ALPN-01 validator (RFC 8737) connects with ALPN `acme-tls/1`,
inspects the id-pe-acmeIdentifier extension (OID 1.3.6.1.5.5.7.1.31),
asserts the ASN.1 OCTET STRING value equals SHA-256 of the key
authorization. Cert chain is intentionally NOT validated
(InsecureSkipVerify=true is correct per RFC 8737 — the proof is
in the extension, not the chain). Documented in docs/tls.md L-001
table + the //nolint:gosec comment carries the justification.
SSRF guard: same posture as HTTP-01.
- Validation is asynchronous: handler accepts the POST and returns
200 immediately with status=processing; the worker-pool fires a
callback that updates challenge → authz → order in a fresh
background-context WithinTx. The order auto-promotes to `ready`
when ALL authzs become valid; auto-fails to `invalid` when ANY
authz becomes invalid.
What ships:
- internal/api/acme/challenge.go: KeyAuthorization (RFC 8555 §8.1) +
DNS01TXTRecordValue (§8.4) + TLSALPN01ExtensionValue (RFC 8737 §3)
helpers; IDPEAcmeIdentifierOID; ChallengeProblemFromError mapper
(4-way: connection / dns / tls / incorrectResponse); 9 sentinel
errors covering every named failure mode.
- internal/api/acme/validators.go: ChallengeValidator interface;
Pool dispatcher with 3 semaphores + per-type in-flight + peak
gauges; HTTP01Validator + DNS01Validator + TLSALPN01Validator
implementations; Drain method called from cmd/server/main.go's
shutdown sequence.
- internal/api/acme/validators_test.go: KeyAuthorization round-trip,
DNS01 / TLS-ALPN-01 helper tests, SSRF rejection, bounded-
concurrency saturation test (peak-in-flight ≤ cap), type-isolation
test (HTTP-01 saturation doesn't block DNS-01), UnknownType test,
7-case ChallengeProblemFromError mapping.
- internal/repository/postgres/acme.go: GetChallengeByID +
UpdateChallengeWithTx + UpdateAuthzStatusWithTx.
- internal/service/acme.go: SetValidatorPool wires the *acme.Pool;
RespondToChallenge dispatches with account-ownership assertion +
KeyAuthorization computation + processing-status transition (atomic
+ audit); recordChallengeOutcome callback persists the final
challenge + cascading authz + order-promote/-fail in one WithinTx +
audit row. 4 new metrics.
- internal/api/handler/acme.go: Challenge handler; round-trips
account.JWKPEM through ParseJWKFromPEM to recover the *jose.JSONWebKey
the validator pool needs.
- internal/api/router/router.go + openapi_parity_test.go +
api/openapi-handler-exceptions.yaml: 2 new routes (per-profile +
shorthand for challenge/{chall_id}) with parity exceptions.
- cmd/server/main.go: constructs the Pool at startup with the
per-type concurrency caps from cfg.ACMEServer; ACMEService.ValidatorPool()
accessor exposed for the shutdown drain sequence.
- internal/validation/ssrf.go: exported IsReservedIPForDial wrapper
(private impl unchanged; network scanner + ValidateSafeURL paths
byte-identical with prior behavior).
- docs/tls.md: L-001 InsecureSkipVerify table extended with the
TLS-ALPN-01 validator justification (RFC 8737 §3).
- docs/acme-server.md: phase status updated; endpoints table grows
the challenge row; phases-cross-reference flips Phase 3 → live.
Tests:
- 80%+ coverage on the new files.
- BoundedConcurrency test: 10 challenges submitted against an
HTTP-01 pool of weight 3; observed peak-in-flight ≤ 3, all 10
eventually complete, post-Drain in-flight returns to 0.
- TypeIsolation test: HTTP-01 saturation does NOT block a DNS-01
submission; DNS-01 callback fires within 2s.
- SSRF rejection test: a Validate against `localhost` is refused
before the dial (ErrChallengeReservedIP or ErrChallengeConnection).
Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-3".
|
||
|
|
c351bba41a |
acme-server: orders + authorizations + finalize + cert download (Phase 2/7)
Closes the issuance loop in trust_authenticated mode (commits |
||
|
|
a05a7d3dad |
ci: fix Phase 1b post-push CI failures (3 guards)
Phase 1b push (commit
|
||
|
|
b7a3162028 |
ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
Bundle: ci-pipeline-cleanup, Phases 7-9 / frozen decisions 0.8 + 0.10 + 0.11.
NEW image-and-supply-chain job (Ubuntu, ~3 min). Three steps:
PHASE 7 — Digest validity
scripts/ci-guards/digest-validity.sh resolves every @sha256:<digest>
ref in deploy/**/*.{yml,Dockerfile*} against its registry. Closes the
H-001 lying-field gap that Bundle II hit (11 fabricated digests passed
H-001's regex-only check and failed docker pull in CI).
Sandbox verification: 16/16 digests in deploy/* + Dockerfiles all
return HTTP 200 from registry-1.docker.io / ghcr.io / mcr.microsoft.com.
PHASE 8 — Docker build smoke (all 4 Dockerfiles)
Per frozen decision 0.10: build Dockerfile, Dockerfile.agent,
deploy/test/f5-mock-icontrol/Dockerfile, deploy/test/libest/Dockerfile.
Catches syntax errors + COPY path drift before tag-time release.yml.
The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a
syntax error there silently breaks the e2e suite.
PHASE 9 — OpenAPI ↔ handler operationId parity
scripts/ci-guards/openapi-handler-parity.sh extracts router routes
(r.mux.Handle / r.Register "METHOD /path" syntax — Go 1.22+ ServeMux),
extracts OpenAPI operations (paths × HTTP methods), and fails if any
router route has no operationId AND is not documented in the new
api/openapi-handler-exceptions.yaml.
Verified gap at HEAD
|