mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:51:30 +00:00
acme-server: key rollover + revocation + ARI (Phase 4/7)
Closes the RFC 8555 + RFC 9773 surface beyond the issuance happy-path:
- POST /acme/profile/<id>/key-change (RFC 8555 §7.3.5)
- POST /acme/profile/<id>/revoke-cert (RFC 8555 §7.6)
- GET /acme/profile/<id>/renewal-info/<cert-id> (RFC 9773 ARI)
After this commit, ACME clients can rotate account keys, revoke certs
through the ACME surface (rather than only via the certctl GUI/API),
and fetch ARI for proactive renewal scheduling.
Architecture:
- Key rollover: outer JWS verified against the registered account key
(existing kid path); the inner JWS — embedded as the outer's payload
— verified against the embedded NEW jwk in a new dedicated routine
(ParseAndVerifyKeyChangeInner) that enforces RFC 8555 §7.3.5
inner-only invariants: MUST use jwk + MUST NOT use kid, payload
.account == outer.kid, payload.oldKey thumbprint-equals registered.
A single WithinTx swaps the stored thumbprint+pem and writes the
audit row. Concurrent-rollover safety via SELECT…FOR UPDATE on the
conflicting account row in UpdateAccountJWKWithTx; the loser
observes the winner's new thumbprint and is told to retry (409).
- Revocation: two auth paths. kid → AccountOwnsCertificate single-
indexed COUNT lookup over acme_orders. jwk → constant-time RFC 7638
thumbprint compare against the cert's pubkey. Both paths route
through service.RevocationSvc.RevokeCertificateWithActor so the
existing CRL/OCSP refresh + audit + metrics pipeline applies. RFC
5280 §5.3.1 numeric reason codes clamp to certctl's
domain.ValidRevocationReasons; codes 8 (removeFromCRL) + 10
(aACompromise) clamp to 'unspecified' since they aren't in the set.
- ARI is GET-only and unauth per RFC 9773 §4. Cert-id wire shape is
base64url(AKI).base64url(serial); ParseARICertID strict-decodes,
SerialHex emits the canonical certctl-shape lowercase-no-leading-
zeros hex used in certificate_versions.serial_number.
ComputeRenewalWindow has 3 branches: bound RenewalPolicy →
[notAfter - days, notAfter - days/2]; no policy → last 33% of
validity; past expiry → [now, now + 1d] (renew immediately).
Retry-After honors CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL.
What ships:
- internal/api/acme/{keychange,ari}.go (+ phase4_test.go: 15 tests).
- internal/api/acme/order.go: RevokeCertRequest wire shape.
- internal/api/handler/acme.go: KeyChange, RevokeCert, RenewalInfo
+ 11 new writeServiceError mappings.
- internal/repository/postgres/acme.go: UpdateAccountJWKWithTx (FOR
UPDATE + expectedOldThumbprint precondition; ErrACMEAccountKey-
ConcurrentUpdate sentinel) + AccountOwnsCertificate.
- internal/service/acme.go: RotateAccountKey + RevokeCert +
RenewalInfo; CertificateRevoker + RenewalPolicyLookup interfaces;
SetRevocationDelegate + SetRenewalPolicyLookup wiring; 11 new
sentinels; 6 new metrics.
- internal/service/acme_phase4_test.go: service-layer tests for
RotateAccountKey (happy + duplicate-key) + RevokeCert (kid mismatch
+ jwk mismatch + jwk happy + already-revoked + reason-clamping) +
RenewalInfo (disabled + bad cert-id).
- internal/api/router/router.go: 6 new register calls (3 per-profile
+ 3 shorthand). Router parity exceptions extended in lockstep
(in-tree SpecParityExceptions + CI-only openapi-handler-exceptions
.yaml).
- cmd/server/main.go: SetRevocationDelegate(revocationSvc) +
SetRenewalPolicyLookup(renewalPolicyRepo) at startup.
- internal/config/config.go: CERTCTL_ACME_SERVER_ARI_ENABLED (default
true) + CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL (default 6h);
BuildDirectory's ariEnabled flag now flips on under
cfg.ARIEnabled.
- docs/acme-server.md: phase status flipped to Phase 4; endpoints
table grows 6 rows (3 per-profile + 3 shorthand); FAQ section
appended explaining how to rotate keys, revoke certs, and consume
ARI.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./...' green across every package.
- phase4_test.go covers: keychange happy-path + 5 negatives +
MapKeyChangeErrorToProblem coverage; ARI cert-id round-trip + 6
malformed cases + BuildARICertID from a generated cert; window-
math 3 branches.
- service-layer tests confirm: RotateAccountKey atomically swaps the
thumbprint (verifies persisted state) and rejects duplicate keys;
RevokeCert routes through the stub RevocationSvc with the right
actor string + reason on the jwk path, rejects mismatched keys,
rejects already-revoked certs, clamps reason codes correctly;
RenewalInfo respects ARIEnabled + cert-id format.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-4'.
This commit is contained in:
@@ -652,6 +652,105 @@ func jsonUnmarshalACME(data []byte, v interface{}) error {
|
||||
return json.Unmarshal(data, v)
|
||||
}
|
||||
|
||||
// --- Phase 4 — key rollover + revocation auth helper ------------------
|
||||
|
||||
// ErrACMEAccountKeyConcurrentUpdate is returned by UpdateAccountJWKWithTx
|
||||
// when the row's stored thumbprint no longer matches expectedOldThumbprint.
|
||||
// This is the "concurrent rollover" race signal: two requests both
|
||||
// pre-validated their inner JWS against the SAME current key, but only
|
||||
// one of them got into the WithinTx first. The loser's expectedOldThumb-
|
||||
// print is now stale → the row's current thumbprint is the WINNER's NEW
|
||||
// thumbprint. Caller maps to RFC 8555 §7.3.5 "key already exists".
|
||||
var ErrACMEAccountKeyConcurrentUpdate = errors.New("acme: account key was rotated concurrently; re-fetch and retry")
|
||||
|
||||
// UpdateAccountJWKWithTx atomically swaps an account row's JWK
|
||||
// (thumbprint + PEM) using a SELECT … FOR UPDATE guard plus an
|
||||
// expectedOldThumbprint precondition.
|
||||
//
|
||||
// The row is locked for update; the precondition check then runs
|
||||
// against the locked row inside the same tx. If a concurrent rollover
|
||||
// committed first, this caller observes the new thumbprint and returns
|
||||
// ErrACMEAccountKeyConcurrentUpdate (caller maps to 409). Postgres's
|
||||
// row-level lock guarantees the two concurrent transactions serialize.
|
||||
//
|
||||
// Returns repository.ErrNotFound when the account row is missing.
|
||||
func (r *ACMERepository) UpdateAccountJWKWithTx(
|
||||
ctx context.Context,
|
||||
q repository.Querier,
|
||||
accountID string,
|
||||
expectedOldThumbprint, newThumbprint, newJWKPEM string,
|
||||
) error {
|
||||
// SELECT … FOR UPDATE serializes concurrent rollovers on the same
|
||||
// row. The lock is released when the surrounding tx commits / rolls
|
||||
// back; the second concurrent caller blocks here until the first
|
||||
// commits + then sees the new thumbprint on its read.
|
||||
var current string
|
||||
err := q.QueryRowContext(ctx, `
|
||||
SELECT jwk_thumbprint
|
||||
FROM acme_accounts
|
||||
WHERE account_id = $1
|
||||
FOR UPDATE
|
||||
`, accountID).Scan(¤t)
|
||||
if err != nil {
|
||||
if errors.Is(err, sql.ErrNoRows) {
|
||||
return fmt.Errorf("account not found: %w", repository.ErrNotFound)
|
||||
}
|
||||
return fmt.Errorf("acme: lock account for key rollover: %w", err)
|
||||
}
|
||||
if current != expectedOldThumbprint {
|
||||
return ErrACMEAccountKeyConcurrentUpdate
|
||||
}
|
||||
|
||||
res, err := q.ExecContext(ctx, `
|
||||
UPDATE acme_accounts
|
||||
SET jwk_thumbprint = $2, jwk_pem = $3, updated_at = NOW()
|
||||
WHERE account_id = $1 AND jwk_thumbprint = $4
|
||||
`, accountID, newThumbprint, newJWKPEM, expectedOldThumbprint)
|
||||
if err != nil {
|
||||
// Postgres SQLSTATE 23505 = unique_violation on
|
||||
// (profile_id, jwk_thumbprint). RFC 8555 §7.3.5 says the
|
||||
// server SHOULD return 409 Conflict in that case.
|
||||
var pqErr *pq.Error
|
||||
if errors.As(err, &pqErr) && pqErr.Code == "23505" {
|
||||
return ErrACMEAccountDuplicateThumbprint
|
||||
}
|
||||
return fmt.Errorf("acme: update account jwk: %w", err)
|
||||
}
|
||||
n, err := res.RowsAffected()
|
||||
if err != nil {
|
||||
return fmt.Errorf("acme: update account jwk rows affected: %w", err)
|
||||
}
|
||||
if n == 0 {
|
||||
// SELECT … FOR UPDATE found the row but the UPDATE WHERE
|
||||
// jwk_thumbprint=$4 did not. That means the thumbprint changed
|
||||
// between the SELECT result + the UPDATE — should be impossible
|
||||
// under the row lock. Treat defensively as a concurrency error.
|
||||
return ErrACMEAccountKeyConcurrentUpdate
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// AccountOwnsCertificate returns true when there is at least one
|
||||
// acme_orders row with (account_id = accountID, certificate_id =
|
||||
// certificateID). Used by ACMEService.RevokeCert (kid path) to confirm
|
||||
// the requesting account had an order that produced the cert. The
|
||||
// query is a single indexed lookup; we don't materialize the full row.
|
||||
func (r *ACMERepository) AccountOwnsCertificate(ctx context.Context, accountID, certificateID string) (bool, error) {
|
||||
if accountID == "" || certificateID == "" {
|
||||
return false, nil
|
||||
}
|
||||
var count int
|
||||
err := r.db.QueryRowContext(ctx, `
|
||||
SELECT COUNT(1)
|
||||
FROM acme_orders
|
||||
WHERE account_id = $1 AND certificate_id = $2
|
||||
`, accountID, certificateID).Scan(&count)
|
||||
if err != nil {
|
||||
return false, fmt.Errorf("acme: account-owns-certificate: %w", err)
|
||||
}
|
||||
return count > 0, nil
|
||||
}
|
||||
|
||||
// scanACMEAccount is the shared shape for the SELECT-by-X account
|
||||
// queries above. Returns sql.ErrNoRows-wrapped repository.ErrNotFound
|
||||
// on miss; any other scan failure surfaces verbatim.
|
||||
|
||||
Reference in New Issue
Block a user