mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 13:41:30 +00:00
acme-server: key rollover + revocation + ARI (Phase 4/7)
Closes the RFC 8555 + RFC 9773 surface beyond the issuance happy-path:
- POST /acme/profile/<id>/key-change (RFC 8555 §7.3.5)
- POST /acme/profile/<id>/revoke-cert (RFC 8555 §7.6)
- GET /acme/profile/<id>/renewal-info/<cert-id> (RFC 9773 ARI)
After this commit, ACME clients can rotate account keys, revoke certs
through the ACME surface (rather than only via the certctl GUI/API),
and fetch ARI for proactive renewal scheduling.
Architecture:
- Key rollover: outer JWS verified against the registered account key
(existing kid path); the inner JWS — embedded as the outer's payload
— verified against the embedded NEW jwk in a new dedicated routine
(ParseAndVerifyKeyChangeInner) that enforces RFC 8555 §7.3.5
inner-only invariants: MUST use jwk + MUST NOT use kid, payload
.account == outer.kid, payload.oldKey thumbprint-equals registered.
A single WithinTx swaps the stored thumbprint+pem and writes the
audit row. Concurrent-rollover safety via SELECT…FOR UPDATE on the
conflicting account row in UpdateAccountJWKWithTx; the loser
observes the winner's new thumbprint and is told to retry (409).
- Revocation: two auth paths. kid → AccountOwnsCertificate single-
indexed COUNT lookup over acme_orders. jwk → constant-time RFC 7638
thumbprint compare against the cert's pubkey. Both paths route
through service.RevocationSvc.RevokeCertificateWithActor so the
existing CRL/OCSP refresh + audit + metrics pipeline applies. RFC
5280 §5.3.1 numeric reason codes clamp to certctl's
domain.ValidRevocationReasons; codes 8 (removeFromCRL) + 10
(aACompromise) clamp to 'unspecified' since they aren't in the set.
- ARI is GET-only and unauth per RFC 9773 §4. Cert-id wire shape is
base64url(AKI).base64url(serial); ParseARICertID strict-decodes,
SerialHex emits the canonical certctl-shape lowercase-no-leading-
zeros hex used in certificate_versions.serial_number.
ComputeRenewalWindow has 3 branches: bound RenewalPolicy →
[notAfter - days, notAfter - days/2]; no policy → last 33% of
validity; past expiry → [now, now + 1d] (renew immediately).
Retry-After honors CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL.
What ships:
- internal/api/acme/{keychange,ari}.go (+ phase4_test.go: 15 tests).
- internal/api/acme/order.go: RevokeCertRequest wire shape.
- internal/api/handler/acme.go: KeyChange, RevokeCert, RenewalInfo
+ 11 new writeServiceError mappings.
- internal/repository/postgres/acme.go: UpdateAccountJWKWithTx (FOR
UPDATE + expectedOldThumbprint precondition; ErrACMEAccountKey-
ConcurrentUpdate sentinel) + AccountOwnsCertificate.
- internal/service/acme.go: RotateAccountKey + RevokeCert +
RenewalInfo; CertificateRevoker + RenewalPolicyLookup interfaces;
SetRevocationDelegate + SetRenewalPolicyLookup wiring; 11 new
sentinels; 6 new metrics.
- internal/service/acme_phase4_test.go: service-layer tests for
RotateAccountKey (happy + duplicate-key) + RevokeCert (kid mismatch
+ jwk mismatch + jwk happy + already-revoked + reason-clamping) +
RenewalInfo (disabled + bad cert-id).
- internal/api/router/router.go: 6 new register calls (3 per-profile
+ 3 shorthand). Router parity exceptions extended in lockstep
(in-tree SpecParityExceptions + CI-only openapi-handler-exceptions
.yaml).
- cmd/server/main.go: SetRevocationDelegate(revocationSvc) +
SetRenewalPolicyLookup(renewalPolicyRepo) at startup.
- internal/config/config.go: CERTCTL_ACME_SERVER_ARI_ENABLED (default
true) + CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL (default 6h);
BuildDirectory's ariEnabled flag now flips on under
cfg.ARIEnabled.
- docs/acme-server.md: phase status flipped to Phase 4; endpoints
table grows 6 rows (3 per-profile + 3 shorthand); FAQ section
appended explaining how to rotate keys, revoke certs, and consume
ARI.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./...' green across every package.
- phase4_test.go covers: keychange happy-path + 5 negatives +
MapKeyChangeErrorToProblem coverage; ARI cert-id round-trip + 6
malformed cases + BuildARICertID from a generated cert; window-
math 3 branches.
- service-layer tests confirm: RotateAccountKey atomically swaps the
thumbprint (verifies persisted state) and rejects duplicate keys;
RevokeCert routes through the stub RevocationSvc with the right
actor string + reason on the jwk path, rejects mismatched keys,
rejects already-revoked certs, clamps reason codes correctly;
RenewalInfo respects ARIEnabled + cert-id format.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-4'.
+86
-14
@@ -7,13 +7,12 @@ as an ACME issuer with no certctl-side modification — closing the
 "deploy a certctl agent on every K8s node" friction that costs deals to
 external PKI vendors today.

-> **Phase status (2026-05-03):** Phase 3 — Phase 2's surface plus
-> challenge validation: HTTP-01 (RFC 8555 §8.3), DNS-01 (§8.4), and
-> TLS-ALPN-01 (RFC 8737). Profiles in `challenge` mode now resolve
-> end-to-end: client POSTs to `/challenge/<id>`, the server dispatches
-> a bounded-concurrency worker pool to fetch the proof out-of-band,
-> the validator updates the challenge → authz → order status chain
-> on completion. Profiles in `trust_authenticated` mode are unchanged.
+> **Phase status (2026-05-03):** Phase 4 — closes the RFC 8555 surface
+> beyond the issuance happy-path: doubly-signed key rollover (§7.3.5),
+> revoke-cert via either account-key or cert-key (§7.6), and RFC 9773
+> ACME Renewal Information. ACME clients can now rotate their account
+> keys, revoke certs through the ACME surface (rather than only the
+> certctl GUI/API), and fetch ARI for proactive renewal scheduling.
 > Track shipped phases via `git log --grep='acme-server:'`.

 ## Configuration
@@ -39,6 +38,8 @@ issuer connector). The struct definition lives in
 | `CERTCTL_ACME_SERVER_DNS01_RESOLVER` | `8.8.8.8:53` | 3 | Reserved. |
 | `CERTCTL_ACME_SERVER_DNS01_CONCURRENCY` | `10` | 3 | Reserved. |
 | `CERTCTL_ACME_SERVER_TLSALPN01_CONCURRENCY` | `10` | 3 | Reserved. |
+| `CERTCTL_ACME_SERVER_ARI_ENABLED` | `true` | 4 | Toggles the RFC 9773 ARI surface — both the `renewalInfo` URL in the directory document and the GET `/renewal-info/<cert-id>` handler. Set to `false` to drop ARI from the directory; ACME clients fall back to static renewal scheduling. |
+| `CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL` | `6h` | 4 | Server-policy `Retry-After` value the ARI handler emits on a 200 response. RFC 9773 §4.2 leaves this server-policy. Tighten to `1h` for short-lived certs; loosen to `24h` for standard 90-day certs. |

 ## Per-profile auth mode

@@ -97,7 +98,7 @@ the `caBundle` requirement is flagged here in Phase 1a's docs because
 operators hit it the moment they try to point a real ACME client at
 certctl.

-## Endpoints (Phase 2)
+## Endpoints

 Routes registered in `internal/api/router/router.go::RegisterHandlers`:

@@ -114,6 +115,9 @@ Routes registered in `internal/api/router/router.go::RegisterHandlers`:
 | POST | `/acme/profile/{id}/authz/{authz_id}` | RFC 8555 §7.5 | JWS kid | POST-as-GET fetch of an authorization. |
 | POST | `/acme/profile/{id}/challenge/{chall_id}` | RFC 8555 §7.5.1 | JWS kid | Submit a challenge for validation. Dispatches to a bounded-concurrency worker pool; clients poll authz for the eventual result. |
 | POST | `/acme/profile/{id}/cert/{cert_id}` | RFC 8555 §7.4.2 | JWS kid | POST-as-GET cert chain download (PEM). |
+| POST | `/acme/profile/{id}/key-change` | RFC 8555 §7.3.5 | JWS kid (outer) + jwk (inner) | Doubly-signed account-key rollover. |
+| POST | `/acme/profile/{id}/revoke-cert` | RFC 8555 §7.6 | JWS kid OR jwk | Revoke a cert via the issuing account's key OR the cert's own private key. Routes through the certctl revocation pipeline. |
+| GET | `/acme/profile/{id}/renewal-info/{cert_id}` | RFC 9773 | unauth | Fetch the suggested renewal window for a cert (cert-id is `base64url(AKI).base64url(serial)` per RFC 9773 §4.1). Response carries `Retry-After`. |
 | GET | `/acme/directory` | RFC 8555 §7.1.1 | unauth | Shorthand path; mirrors per-profile when `CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID` is set. |
 | HEAD | `/acme/new-nonce` | RFC 8555 §7.2 | unauth | Shorthand. |
 | GET | `/acme/new-nonce` | RFC 8555 §7.2 | unauth | Shorthand. |
@@ -124,12 +128,13 @@ Routes registered in `internal/api/router/router.go::RegisterHandlers`:
 | POST | `/acme/order/{ord_id}/finalize` | RFC 8555 §7.4 | JWS kid | Shorthand. |
 | POST | `/acme/authz/{authz_id}` | RFC 8555 §7.5 | JWS kid | Shorthand. |
 | POST | `/acme/cert/{cert_id}` | RFC 8555 §7.4.2 | JWS kid | Shorthand. |
+| POST | `/acme/key-change` | RFC 8555 §7.3.5 | JWS kid (outer) + jwk (inner) | Shorthand. |
+| POST | `/acme/revoke-cert` | RFC 8555 §7.6 | JWS kid OR jwk | Shorthand. |
+| GET | `/acme/renewal-info/{cert_id}` | RFC 9773 | unauth | Shorthand. |

-The remaining RFC 8555 endpoints (`challenge/{id}`, `key-change`,
-`revoke-cert`, `renewal-info`) are advertised in the directory document
-but not yet served — clients hitting them get a 404 until subsequent
-phases land. The directory document includes their URLs because RFC 8555
-doesn't permit a partial directory.
+After Phase 4, the full RFC 8555 + RFC 9773 surface is live. RFC 8739
+(short-lived certs) and EAB enforcement remain follow-up work; cert-
+manager + boulder-tested clients work today against the surface above.

 ## Finalize routing through `CertificateService.Create` (Phase 2 architecture)

@@ -200,7 +205,7 @@ at `internal/service/certificate.go:131`).
 | 1b | live | new-account + account/{id} + JWS verifier (RFC 7515 + go-jose v4) |
 | 2 | live | orders + authzs + finalize + cert download (trust_authenticated mode end-to-end) |
 | 3 | live | HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (challenge mode end-to-end) |
-| 4 | not yet | key rollover + revocation + ARI (RFC 9773) |
+| 4 | live | key rollover (RFC 8555 §7.3.5) + revoke-cert (§7.6) + ARI (RFC 9773) |
 | 5 | not yet | cert-manager integration test + production hardening |
 | 6 | not yet | full operator-facing reference + walkthroughs + threat model |

@@ -234,3 +239,70 @@ Track shipped phases via `git log --grep='acme-server:' --oneline`.
 `s.tx.WithinTx(...)` + `auditService.RecordEventWithTx(...)` pattern
 so every account state mutation is paired with an `audit_events`
 row.
+
+## Phase 4 — key rollover, revocation, ARI
+
+### How do I rotate my ACME account key?
+
+RFC 8555 §7.3.5 defines a doubly-signed JWS for the rollover. The OUTER
+JWS is signed by the OLD account key (kid path); its payload IS the
+INNER JWS, which is signed by the NEW account key (jwk path). cert-
+manager and lego do this for you transparently — `lego renew --key-rotate`
+or the cert-manager `Issuer.spec.acme.privateKeySecretRef` rollover.
+
+Server-side validation:
+
+1. Outer JWS verifies against the registered account's current key.
+2. Inner JWS verifies against the embedded NEW jwk (proves possession).
+3. Inner payload `account` matches outer `kid`.
+4. Inner payload `oldKey` thumbprint-equals the registered key.
+5. Inner protected `url` equals outer protected `url`.
+6. New JWK thumbprint not already registered against the same profile.
+7. `SELECT … FOR UPDATE` on the account row serializes concurrent
+   rollovers; the loser sees the winner's new thumbprint and is told
+   to retry (409).
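
Reviewer note: the structural checks in steps 3 to 5, plus the inner-only "jwk, not kid" invariant, can be sketched as below. This is a minimal illustration, not certctl's `ParseAndVerifyKeyChangeInner`; the `innerHeader` and `innerPayload` types and the `checkKeyChangeInner` function are hypothetical names, signature verification (steps 1 and 2) is assumed to have happened already, and thumbprints are assumed to be pre-computed strings.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
)

// innerHeader is a hypothetical, simplified view of the inner JWS
// protected header. It is not certctl's type.
type innerHeader struct {
	KID    string // must be empty on the inner JWS
	HasJWK bool   // must be true on the inner JWS
	URL    string // must equal the outer JWS url
}

// innerPayload is a hypothetical view of the inner JWS payload.
type innerPayload struct {
	Account          string // account URL claimed by the inner payload
	OldKeyThumbprint string // RFC 7638 thumbprint of payload.oldKey
}

// checkKeyChangeInner applies the structural checks only. Both JWS
// signatures are assumed to have been verified before this is called.
func checkKeyChangeInner(h innerHeader, p innerPayload, outerKID, outerURL, registeredThumbprint string) error {
	switch {
	case !h.HasJWK:
		return errors.New("inner JWS must carry jwk")
	case h.KID != "":
		return errors.New("inner JWS must not carry kid")
	case h.URL != outerURL:
		return errors.New("inner url differs from outer url")
	case p.Account != outerKID:
		return errors.New("payload.account differs from outer kid")
	case subtle.ConstantTimeCompare([]byte(p.OldKeyThumbprint), []byte(registeredThumbprint)) != 1:
		return errors.New("payload.oldKey does not match the registered key")
	}
	return nil
}

func main() {
	h := innerHeader{HasJWK: true, URL: "https://example.test/acme/key-change"}
	p := innerPayload{Account: "https://example.test/acme/account/1", OldKeyThumbprint: "abc"}
	fmt.Println(checkKeyChangeInner(h, p, p.Account, h.URL, "abc"))
}
```

Steps 6 and 7 are database concerns (uniqueness and row locking) and are deliberately left out of this sketch.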
+
+### How do I revoke an ACME-issued cert?
+
+Two auth paths per RFC 8555 §7.6:
+
+- **kid path:** sign with your account key. The server checks the
+  account "owns" the cert via `acme_orders.certificate_id` lookup.
+- **jwk path:** sign with the cert's own private key. The server
+  extracts the cert's public key, computes the JWK, and asserts it
+  matches the embedded jwk thumbprint.
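
Reviewer note: for the jwk path, the thumbprint comparison could look roughly like the sketch below. It assumes a P-256 key only and uses hypothetical helper names (`thumbprintP256`, `sameThumbprint`, `newP256Key`); the real handler presumably supports more key types. The canonical JSON form (members in lexicographic order, no whitespace) follows RFC 7638.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
)

// thumbprintP256 computes the RFC 7638 SHA-256 thumbprint of a P-256
// public key. Other curves and key types are out of scope for this sketch.
func thumbprintP256(pub *ecdsa.PublicKey) (string, error) {
	if pub.Curve != elliptic.P256() {
		return "", errors.New("sketch only handles P-256")
	}
	k, err := pub.ECDH()
	if err != nil {
		return "", err
	}
	raw := k.Bytes() // uncompressed point: 0x04 || X (32 bytes) || Y (32 bytes)
	x := base64.RawURLEncoding.EncodeToString(raw[1:33])
	y := base64.RawURLEncoding.EncodeToString(raw[33:65])
	canonical := fmt.Sprintf(`{"crv":"P-256","kty":"EC","x":"%s","y":"%s"}`, x, y)
	sum := sha256.Sum256([]byte(canonical))
	return base64.RawURLEncoding.EncodeToString(sum[:]), nil
}

// sameThumbprint compares two thumbprints in constant time.
func sameThumbprint(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}

// newP256Key generates a throwaway key for the demonstration.
func newP256Key() *ecdsa.PrivateKey {
	k, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	return k
}

func main() {
	certKey := newP256Key() // stands in for the key inside the certificate
	fromCert, _ := thumbprintP256(&certKey.PublicKey)
	fromJWS, _ := thumbprintP256(&certKey.PublicKey) // same key presented in the JWS
	fmt.Println("match:", sameThumbprint(fromCert, fromJWS))
}
```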
+
+Either path routes through `service.RevocationSvc.RevokeCertificateWithActor`
+— the same pipeline the GUI revoke button, bulk-revocation, and the
+ACME-consumer issuer use. So the cert-row update + revocation row + audit
+row are all atomic in one `WithinTx`, the issuer is best-effort
+notified, and the OCSP response cache is invalidated.
+
+Reason codes follow RFC 5280 §5.3.1; codes 8 (removeFromCRL) and 10
+(aACompromise) are not in certctl's `domain.ValidRevocationReasons`
+set so they clamp to `unspecified`.
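
Reviewer note: the clamping rule might be sketched as follows. The numeric-to-name table comes from RFC 5280 §5.3.1, but the `accepted` set here is only a hypothetical stand-in for `domain.ValidRevocationReasons` (its real contents and string spellings are not shown in this diff), and treating unassigned codes as `unspecified` is an assumption of this sketch.

```go
package main

import "fmt"

// reasonNames maps RFC 5280 §5.3.1 CRLReason codes to their names.
// Code 7 is unassigned in the RFC.
var reasonNames = map[int]string{
	0:  "unspecified",
	1:  "keyCompromise",
	2:  "cACompromise",
	3:  "affiliationChanged",
	4:  "superseded",
	5:  "cessationOfOperation",
	6:  "certificateHold",
	8:  "removeFromCRL",
	9:  "privilegeWithdrawn",
	10: "aACompromise",
}

// accepted is a hypothetical stand-in for domain.ValidRevocationReasons:
// every RFC reason except removeFromCRL and aACompromise.
var accepted = map[string]bool{
	"unspecified":          true,
	"keyCompromise":        true,
	"cACompromise":         true,
	"affiliationChanged":   true,
	"superseded":           true,
	"cessationOfOperation": true,
	"certificateHold":      true,
	"privilegeWithdrawn":   true,
}

// clampReason returns the reason name for a numeric code, falling back to
// "unspecified" when the code is unknown or not in the accepted set.
func clampReason(code int) string {
	name, ok := reasonNames[code]
	if !ok || !accepted[name] {
		return "unspecified"
	}
	return name
}

func main() {
	fmt.Println(clampReason(1), clampReason(8), clampReason(10))
}
```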
+
+### What is ARI?
+
+RFC 9773 ACME Renewal Information. Clients GET
+`/acme/profile/<id>/renewal-info/<cert-id>` (unauthenticated) and
+receive a JSON document with `suggestedWindow.start` and `.end` —
+the server's recommendation for when to renew. The response also
+carries `Retry-After` (RFC 9773 §4.2) hinting at the next-poll cadence.
+
+Cert-id format is `base64url(authorityKeyIdentifier).base64url(serial)`
+per RFC 9773 §4.1.
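
Reviewer note: a minimal sketch of building and strictly parsing that cert-id shape. The function names (`buildCertID`, `parseCertID`, `serialHex`) and the helpers `mustHex` and `serialFromHex` are hypothetical and are not certctl's `BuildARICertID` or `ParseARICertID`. The sketch assumes a positive serial and encodes it as the DER INTEGER content bytes (with a leading `0x00` when the high bit is set), unpadded base64url on both halves.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"math/big"
	"strings"
)

// strictB64 rejects padding and non-canonical trailing bits.
var strictB64 = base64.RawURLEncoding.Strict()

// buildCertID joins base64url(AKI key identifier) and base64url(serial).
func buildCertID(aki []byte, serial *big.Int) string {
	b := serial.Bytes()
	if len(b) == 0 || b[0]&0x80 != 0 {
		b = append([]byte{0x00}, b...) // keep the DER sign byte
	}
	return strictB64.EncodeToString(aki) + "." + strictB64.EncodeToString(b)
}

// parseCertID strictly decodes a cert-id back into AKI bytes and serial.
func parseCertID(id string) ([]byte, *big.Int, error) {
	parts := strings.Split(id, ".")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return nil, nil, errors.New("cert-id must be <aki>.<serial>")
	}
	aki, err := strictB64.DecodeString(parts[0])
	if err != nil {
		return nil, nil, fmt.Errorf("bad AKI: %w", err)
	}
	ser, err := strictB64.DecodeString(parts[1])
	if err != nil {
		return nil, nil, fmt.Errorf("bad serial: %w", err)
	}
	return aki, new(big.Int).SetBytes(ser), nil
}

// serialHex renders the serial as lowercase hex without leading zeros.
func serialHex(serial *big.Int) string { return serial.Text(16) }

// mustHex and serialFromHex are small helpers for the demonstration.
func mustHex(s string) []byte {
	b, err := hex.DecodeString(s)
	if err != nil {
		panic(err)
	}
	return b
}

func serialFromHex(s string) *big.Int {
	n, ok := new(big.Int).SetString(s, 16)
	if !ok {
		panic("bad hex serial")
	}
	return n
}

func main() {
	aki := mustHex("69885b6b87464041e1b37b847ba0ae2cde01c8d4")
	id := buildCertID(aki, serialFromHex("87654321"))
	fmt.Println(id) // aYhba4dGQEHhs3uEe6CuLN4ByNQ.AIdlQyE
	_, serial, _ := parseCertID(id)
	fmt.Println(serialHex(serial)) // 87654321
}
```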
+
+Window math:
+
+- Cert with a bound renewal policy: window starts at
+  `notAfter - RenewalWindowDays`, ends at `notAfter - RenewalWindowDays/2`.
+  So a 30-day window cert with notAfter 2026-06-30 emits start=2026-05-31,
+  end=2026-06-15. Boulder-shape default that lets cert-manager schedule
+  inside our renewal window.
+- No policy: window is the last 33% of validity.
+- Past expiry: window is "now" → "now + 24h" (renew immediately).
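
Reviewer note: the three branches can be sketched as below. This is not certctl's `ComputeRenewalWindow`; the `renewalWindow` signature is hypothetical, `windowDays == 0` is assumed to mean "no policy bound", the past-expiry check is assumed to run first, and the no-policy window is assumed to end at `notAfter`.

```go
package main

import (
	"fmt"
	"time"
)

const dayDur = 24 * time.Hour

// renewalWindow returns the suggested renewal window for a certificate.
// windowDays == 0 stands for "no renewal policy bound" in this sketch.
func renewalWindow(notBefore, notAfter, now time.Time, windowDays int) (start, end time.Time) {
	// Past expiry: renew immediately.
	if !now.Before(notAfter) {
		return now, now.Add(dayDur)
	}
	// Bound policy: [notAfter - days, notAfter - days/2].
	if windowDays > 0 {
		window := time.Duration(windowDays) * dayDur
		return notAfter.Add(-window), notAfter.Add(-window / 2)
	}
	// No policy: last 33% of the validity period (end assumed to be notAfter).
	validity := notAfter.Sub(notBefore)
	return notAfter.Add(-(validity / 100 * 33)), notAfter
}

// day parses a YYYY-MM-DD date as midnight UTC; demonstration helper only.
func day(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	start, end := renewalWindow(day("2026-04-01"), day("2026-06-30"), day("2026-05-01"), 30)
	fmt.Println(start.Format("2006-01-02"), end.Format("2006-01-02")) // 2026-05-31 2026-06-15
}
```

The `main` call reproduces the 30-day example from the bullet above.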
+
+Disable ARI globally with `CERTCTL_ACME_SERVER_ARI_ENABLED=false`. The
+URL drops out of the directory; the route is still registered but
+returns 404 — clients fall back to static renewal scheduling.