diff --git a/docs/architecture.md b/docs/architecture.md
index a907c13..2371c58 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -981,7 +981,7 @@ Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST
 - **Additional filters**: `?agent_id=`, `?profile_id=` (in addition to existing status, environment, owner_id, team_id, issuer_id).
 - **Deployments**: `GET /api/v1/certificates/{id}/deployments` returns deployment targets for a certificate.
 
-Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`). The embedded OCSP responder serves signed responses unauthenticated at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`). Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
+Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Issuer notification is best-effort: the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`). The CRL is pre-generated by the scheduler-driven `crlGenerationLoop` and persisted in the `crl_cache` table (migration 000019), so HTTP fetches do not rebuild it per request. The embedded OCSP responder serves signed responses unauthenticated at both `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` and `POST /.well-known/pki/ocsp/{issuer_id}` (RFC 6960 Appendix A.1, `Content-Type: application/ocsp-response`). Responses are signed by a dedicated per-issuer OCSP responder cert (RFC 6960 §2.6, migration 000020) carrying the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1). The CA private key is never used directly for OCSP signing, which keeps it cold for the future PKCS#11/HSM driver path. The responder cert auto-rotates once it is within `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE` (default 7d) of expiry. Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP, since expiry is sufficient revocation. See [`crl-ocsp.md`](crl-ocsp.md) for the operator + relying-party guide (endpoint URLs, configuration knobs, responder cert lifecycle, cert-manager / Firefox / OpenSSL / Intune integration recipes, troubleshooting).
 
 Certificate export (M27): `GET /api/v1/certificates/{id}/export/pem` returns PEM-encoded certificate and chain, and `POST /api/v1/certificates/{id}/export/pkcs12` returns a PKCS#12 bundle (binary). Private keys are never exported — they remain on agents. All exports are audited with actor, timestamp, and format.
diff --git a/docs/crl-ocsp.md b/docs/crl-ocsp.md
new file mode 100644
index 0000000..624e17b
--- /dev/null
+++ b/docs/crl-ocsp.md
@@ -0,0 +1,329 @@
+# CRL & OCSP — Revocation Status for Relying Parties
+
+This guide is the operator + relying-party reference for certctl's revocation
+status surfaces. It covers the wire format, endpoint URLs, configuration knobs,
+the OCSP responder cert lifecycle, and how to point common consumers
+(cert-manager, Firefox, OpenSSL, Intune) at the endpoints.
+
+If you're looking for the higher-level architecture, see
+[`architecture.md` § Security Model](architecture.md#security-model). If you're
+looking for the revocation policy / reason codes the API accepts, see
+[`api/openapi.yaml` § /certificates/{id}/revoke](../api/openapi.yaml).
+
+---
+
+## Conceptual overview
+
+**Why two formats.** RFC 5280 §5 defines a Certificate Revocation List (CRL):
+a periodically published, signed list of every revoked certificate for an
+issuer. RFC 6960 defines the Online Certificate Status Protocol (OCSP): a
+request/response protocol that returns the status of a single certificate,
+identified by its issuer and serial number. CRLs are batch-friendly and
+cacheable; OCSP is point-query and fresh. Production PKI deployments serve
+both because different relying parties prefer different trade-offs:
+
+- Browsers (Firefox / Safari) prefer OCSP for freshness; some also honor OCSP
+  stapling and Must-Staple (RFC 7633).
+- cert-manager and most Linux TLS clients fall back to CRL when OCSP is
+  unreachable.
+- Microsoft Intune / corporate device-state validators do periodic CRL pulls.
+- OpenSSL `s_client -status` exercises OCSP via the `Certificate Status
+  Request` extension during the handshake: it asks the TLS server for a
+  stapled response rather than querying the responder itself.
+
+certctl's local issuer publishes both, with a pre-generation cache so a busy
+CA does not DoS itself rebuilding the CRL on every fetch.
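On the relying-party side, using a CRL comes down to three checks. The signature must chain to the issuer, `nextUpdate` must not have passed, and then the serial is either listed or not. The Go sketch below illustrates those checks (standard library only, Go 1.21+ for `x509.RevocationListEntry`); it is not certctl code. It mints a throwaway CA and a one-entry CRL in memory. A real client would instead use the DER body fetched from `/.well-known/pki/crl/{issuer_id}`. The helper names `newDemoCA`, `isRevoked`, and `demoCRLCheck` are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newDemoCA mints a throwaway self-signed CA whose key usage allows CRL signing.
func newDemoCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo CA"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// isRevoked parses a DER CRL, checks that issuer signed it and that it is not
// past nextUpdate, then reports whether serial appears in it.
func isRevoked(crlDER []byte, issuer *x509.Certificate, serial *big.Int, now time.Time) (bool, error) {
	crl, err := x509.ParseRevocationList(crlDER)
	if err != nil {
		return false, err
	}
	if err := crl.CheckSignatureFrom(issuer); err != nil {
		return false, fmt.Errorf("CRL signature: %w", err)
	}
	if !crl.NextUpdate.IsZero() && now.After(crl.NextUpdate) {
		return false, errors.New("CRL is past nextUpdate; fetch a fresh copy")
	}
	for _, entry := range crl.RevokedCertificateEntries {
		if entry.SerialNumber.Cmp(serial) == 0 {
			return true, nil
		}
	}
	return false, nil
}

// demoCRLCheck revokes serial 0xa1b2c3d4 (reason keyCompromise), then looks up
// that serial and an unrevoked one.
func demoCRLCheck() (revokedHit, otherHit bool, err error) {
	ca, key, err := newDemoCA()
	if err != nil {
		return false, false, err
	}
	now := time.Now()
	crlDER, err := x509.CreateRevocationList(rand.Reader, &x509.RevocationList{
		Number:     big.NewInt(42),
		ThisUpdate: now,
		NextUpdate: now.Add(time.Hour),
		RevokedCertificateEntries: []x509.RevocationListEntry{
			{SerialNumber: big.NewInt(0xa1b2c3d4), RevocationTime: now, ReasonCode: 1}, // 1 = keyCompromise
		},
	}, ca, key)
	if err != nil {
		return false, false, err
	}
	if revokedHit, err = isRevoked(crlDER, ca, big.NewInt(0xa1b2c3d4), now); err != nil {
		return false, false, err
	}
	otherHit, err = isRevoked(crlDER, ca, big.NewInt(0x1234), now)
	return revokedHit, otherHit, err
}

func main() {
	revoked, other, err := demoCRLCheck()
	fmt.Println(revoked, other, err)
}
```

The staleness check matters because a cached CRL keeps verifying long after it stops being current.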
+
+**Why a separate OCSP responder cert.** RFC 6960 §2.6 + §4.2.2.2 allow a CA
+to delegate OCSP signing to a dedicated "OCSP responder cert" that it issues,
+instead of signing responses with the CA private key directly; certctl always
+uses this delegated model. The responder cert carries the
+`id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1), so OCSP clients do not
+recursively check the responder cert's revocation status. This keeps the CA
+private key cold (an HSM operation per OCSP request would be prohibitive at
+scale). It also lets the responder key live on disk or on a separate HSM
+partition, or rotate frequently, while the CA key stays untouched.
+
+---
+
+## Endpoints
+
+All relying-party revocation endpoints live under `/.well-known/pki/` per
+RFC 8615 and run **unauthenticated**: relying parties without certctl API
+credentials must be able to validate revocation status. The HTTPS-only TLS 1.3
+control plane applies; there is no plaintext fallback.
+
+### CRL (Certificate Revocation List)
+
+```
+GET https:///.well-known/pki/crl/{issuer_id}
+```
+
+| Field | Value |
+| --- | --- |
+| Method | `GET` |
+| Auth | None (unauthenticated, RFC 5280 §5 distribution semantics) |
+| Response Content-Type | `application/pkix-crl` |
+| Response body | DER-encoded X.509 CRL signed by the issuer's CA |
+| Cache | Pre-generated by the scheduler (interval: `CERTCTL_CRL_GENERATION_INTERVAL`) |
+
+Example:
+
+```bash
+curl --cacert ca.crt \
+  -o crl.der \
+  https://localhost:8443/.well-known/pki/crl/iss-local
+
+openssl crl -inform DER -in crl.der -text -noout
+```
+
+### OCSP (Online Certificate Status Protocol)
+
+certctl serves two request forms. The GET form is a certctl convenience: a
+simple URL-path lookup by hex serial. (The GET form defined in RFC 6960
+Appendix A.1 instead carries the base64-encoded DER OCSPRequest in the URL.)
+The POST form is the standard one (RFC 6960 Appendix A.1), with a binary DER
+OCSPRequest body. Most production OCSP clients (Firefox, `openssl ocsp -url`,
+cert-manager, Intune) use POST. The GET form is kept for ops debugging with
+curl.
+
+#### GET form
+
+```
+GET https:///.well-known/pki/ocsp/{issuer_id}/{serial_hex}
+```
+
+| Field | Value |
+| --- | --- |
+| Method | `GET` |
+| Auth | None |
+| Response Content-Type | `application/ocsp-response` |
+| Response body | DER-encoded OCSPResponse signed by the **OCSP responder cert** (NOT the CA cert) |
+
+Example:
+
+```bash
+curl --cacert ca.crt \
+  -o response.der \
+  https://localhost:8443/.well-known/pki/ocsp/iss-local/a1b2c3d4
+
+openssl ocsp -respin response.der -text -CAfile ca.crt
+```
+
+#### POST form (the standard one)
+
+```
+POST https:///.well-known/pki/ocsp/{issuer_id}
+Content-Type: application/ocsp-request
+Body:
+```
+
+| Field | Value |
+| --- | --- |
+| Method | `POST` |
+| Auth | None |
+| Request Content-Type | `application/ocsp-request` |
+| Response Content-Type | `application/ocsp-response` |
+
+Example with OpenSSL building the request:
+
+```bash
+openssl ocsp -issuer ca.crt -cert leaf.crt -reqout request.der
+
+curl --cacert ca.crt \
+  -X POST \
+  -H "Content-Type: application/ocsp-request" \
+  --data-binary @request.der \
+  -o response.der \
+  https://localhost:8443/.well-known/pki/ocsp/iss-local
+
+openssl ocsp -respin response.der -text -CAfile ca.crt
+```
+
+The body-size limit applies (`http.MaxBytesReader` from middleware,
+default 1MB, configurable via `CERTCTL_MAX_BODY_SIZE`). A typical
+single-certificate OCSPRequest is ~200 bytes, so this is a generous cap.
+
+### Admin observability endpoint
+
+```
+GET https:///api/v1/admin/crl/cache
+Authorization: Bearer
+```
+
+Returns the per-issuer cache state for ops dashboards, GUI badges, or "is the
+scheduler keeping up?" diagnostics. Admin-gated (M-008 admin-gated handler
+allowlist; non-admin Bearer callers receive HTTP 403).
+
+Response shape:
+
+```json
+{
+  "cache_rows": [
+    {
+      "issuer_id": "iss-local",
+      "cache_present": true,
+      "crl_number": 42,
+      "this_update": "2026-04-29T10:00:00Z",
+      "next_update": "2026-04-29T11:00:00Z",
+      "generated_at": "2026-04-29T10:00:00Z",
+      "generation_duration_ms": 87,
+      "revoked_count": 13,
+      "is_stale": false,
+      "recent_events": [
+        {
+          "started_at": "2026-04-29T10:00:00Z",
+          "duration_ms": 87,
+          "succeeded": true,
+          "crl_number": 42,
+          "revoked_count": 13
+        }
+      ]
+    }
+  ],
+  "row_count": 1,
+  "generated_at": "2026-04-29T10:30:00Z"
+}
+```
+
+Issuers that have not yet had a CRL generated appear with `cache_present:
+false`, so the GUI can render a "Not yet generated" pill rather than a 404.
+
+---
+
+## Configuration
+
+| Env var | Default | Meaning |
+| --- | --- | --- |
+| `CERTCTL_CRL_GENERATION_INTERVAL` | `1h` | How often the scheduler walks every CRL-supporting issuer and rebuilds its CRL. The HTTP handler reads from the cache instead of rebuilding per request. |
+| `CERTCTL_OCSP_RESPONDER_KEY_DIR` | unset | **Operator MUST set in production.** Directory where the FileDriver persists each issuer's OCSP responder key (`ocsp-responder-.key`). When unset, the responder service uses a temporary directory that does NOT survive restarts: fine for dev, NEVER for prod. |
+| `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE` | `7d` | When the responder cert's `NotAfter` falls within this window, `EnsureResponder` rotates to a fresh cert+key on the next OCSP request or scheduler tick. |
+| `CERTCTL_OCSP_RESPONDER_VALIDITY` | `30d` | How long each newly issued responder cert is valid. Short by design: relying parties cache OCSP responses, not the responder cert chain, and `id-pkix-ocsp-nocheck` blocks recursive revocation checking on the responder itself. |
+
+The issuer-level CRL `nextUpdate` is derived from the generation timestamp
+plus the configured CRL validity. The validity is currently a build-time
+constant in the `CRLCacheService`; a configurable knob is deferred until an
+operator asks for one.
+
+---
+
+## OCSP responder cert lifecycle
+
+1. **First OCSP request for an issuer (or scheduler tick).** The local
+   issuer's `SignOCSPResponse` calls into `OCSPResponderService.EnsureResponder`.
+2. **Cache lookup.** `EnsureResponder` queries the `ocsp_responders` table for
+   a row keyed by `issuer_id`.
+3. **Disk lookup.** If a row exists, the FileDriver reads the persisted key
+   from `/ocsp-responder-.key`. **Self-healing:** if the
+   row exists but the file is missing (the operator pruned the keydir without
+   pruning the DB), the service treats this as "rotate now" rather than
+   crashing.
+4. **Rotation check.** If no usable responder exists yet, or
+   `cert.NotAfter < now + RotationGrace`, the service generates a fresh
+   ECDSA-P256 key, builds a `*x509.CertificateRequest`, and asks the local
+   issuer's existing `IssueCertificate` flow to sign it. The signing template
+   carries:
+   - `KeyUsage: x509.KeyUsageDigitalSignature` (signing OCSP responses)
+   - `ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageOCSPSigning}` (RFC 6960 §4.2.2.2)
+   - The `id-pkix-ocsp-nocheck` extension (OID `1.3.6.1.5.5.7.48.1.5`,
+     DER value `NULL`, RFC 6960 §4.2.2.2.1) wired through
+     `Certificate.ExtraExtensions`.
+5. **Persistence.** The new cert + key path are written to `ocsp_responders`
+   via an idempotent `INSERT … ON CONFLICT DO UPDATE`.
+6. **Response signing.** `ocsp.CreateResponse(caCert, responderCert,
+   template, responderSigner)` produces the response bytes. The responder
+   cert is included in the response, so relying parties can validate it
+   without a separate fetch.
+
+A related race exists on the CRL side, between a scheduler-driven cache
+refresh and an on-demand cache miss. It is collapsed by the
+`CRLCacheService`'s in-tree singleflight (a `sync.Map` of `*flightEntry`
+keyed by `issuer_id`). Concurrent generation requests for the same issuer
+wait on the in-flight result rather than each rebuilding from scratch.
+
+---
+
+## Pointing common consumers at the endpoints
+
+### cert-manager (Kubernetes)
+
+cert-manager's certificate-validation logic checks both the AIA OCSP URI
+embedded in the leaf and the CDP CRL URI. Both are populated automatically
+by the local issuer's certificate template, so relying parties should NOT
+need any additional configuration. To verify:
+
+```bash
+openssl x509 -in leaf.crt -text -noout | grep -A1 "Authority Information Access"
+openssl x509 -in leaf.crt -text -noout | grep -A2 "CRL Distribution Points"
+```
+
+If your cert-manager pods cannot reach `https://:8443/.well-known/pki/`,
+add a NetworkPolicy egress rule or expose the certctl service via the
+appropriate ingress class.
+
+### Firefox
+
+Firefox queries the AIA OCSP URI as long as OCSP checking is enabled. To
+confirm the setting (for example, when a cert revoked in dev still appears
+valid):
+
+```
+about:preferences#privacy → Certificates → Query OCSP responder servers
+```
+
+If Firefox reports `SEC_ERROR_OCSP_INVALID_SIGNING_CERT`, verify that Firefox
+trusts the issuing CA. Also verify that the responder cert was issued by that
+CA with the `id-kp-OCSPSigning` EKU (RFC 6960 §4.2.2.2). certctl sets that EKU
+and the `id-pkix-ocsp-nocheck` extension automatically on every responder
+cert it issues.
+
+### OpenSSL
+
+```bash
+# OCSP via a stand-alone request to the certctl responder
+openssl ocsp -issuer ca.crt -cert leaf.crt -url https://localhost:8443/.well-known/pki/ocsp/iss-local -CAfile ca.crt -text
+
+# Stapled OCSP via the TLS Certificate Status Request extension
+# (shows a response only if the TLS server staples one)
+openssl s_client -connect example.com:443 -status -CAfile ca.crt
+```
+
+### Intune (corporate device state)
+
+Intune device-compliance validators pull the CRL on a schedule (configured in
+the Intune admin console, default 24h). Set the CRL distribution point
+to `https://:8443/.well-known/pki/crl/` and Intune
+will pull on its own cadence.
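A relying party can also read the two URIs programmatically. The hedged Go sketch below is the Go equivalent of the `openssl x509 | grep` checks in the cert-manager section. It signs a leaf under a throwaway CA with `OCSPServer` (AIA) and `CRLDistributionPoints` (CDP) set, then reads both back from the parsed certificate. The host `certctl.example.internal:8443` and issuer ID `iss-local` are assumptions, as are the exact URI shapes, which are modeled on the endpoint paths in this guide. None of this is copied from certctl's issuer template, and `revocationURIs` / `demoLeaf` are hypothetical names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// revocationURIs builds AIA OCSP and CDP URIs for a hypothetical certctl host.
func revocationURIs(host, issuerID string) (ocspURL, crlURL string) {
	base := "https://" + host + "/.well-known/pki"
	return base + "/ocsp/" + issuerID, base + "/crl/" + issuerID
}

// demoLeaf signs a leaf carrying both URIs under a throwaway CA and parses it back.
func demoLeaf() (*x509.Certificate, error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	// The CA template is used directly as the signing parent; its Subject becomes the leaf's Issuer.
	ca := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo CA"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	ocspURL, crlURL := revocationURIs("certctl.example.internal:8443", "iss-local")
	leaf := &x509.Certificate{
		SerialNumber:          big.NewInt(0xa1b2c3d4),
		Subject:               pkix.Name{CommonName: "app.example.internal"},
		DNSNames:              []string{"app.example.internal"},
		NotBefore:             now.Add(-time.Hour),
		NotAfter:              now.Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		OCSPServer:            []string{ocspURL}, // Authority Information Access: OCSP
		CRLDistributionPoints: []string{crlURL},  // CRL Distribution Points
	}
	der, err := x509.CreateCertificate(rand.Reader, leaf, ca, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := demoLeaf()
	if err != nil {
		panic(err)
	}
	fmt.Println("OCSP:", cert.OCSPServer)
	fmt.Println("CRL: ", cert.CRLDistributionPoints)
}
```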
+
+---
+
+## What this release does NOT include (V3-Pro)
+
+The following are explicitly out of scope for the V2 (free) bundle and are
+tracked for the certctl Pro release:
+
+- **Delta CRLs (RFC 5280 §5.2.4).** Useful for very large CRLs (10k+
+  revoked certs). The data model already accommodates the Base CRL Number
+  reference, but the pipeline only emits full (base) CRLs in V2.
+- **OCSP rate-limiting per relying party.** A per-IP token bucket on the OCSP
+  endpoint, deferred to V3-Pro because it justifies per-seat pricing for
+  high-traffic responders.
+- **OCSP stapling.** Server-side: cache pre-fetched OCSP responses and serve
+  them in the TLS handshake. Client-side: a "stapling fetcher" agent for
+  non-stapling origins.
+
+The MaxBytesReader cap is the only request-level guard in V2; the
+unauthenticated-by-design relying-party endpoints are intentionally not
+rate-limited per IP.
+
+---
+
+## Troubleshooting
+
+**`pki/crl/` returns 404.** Either the issuer does not support
+CRL signing (Vault, EJBCA, and DigiCert serve their own CRL infrastructure;
+certctl's connectors return `nil` from `GenerateCRL` for these), or the
+issuer ID is wrong. Verify with `GET /api/v1/issuers`.
+
+**`pki/ocsp//` returns 200 but `openssl ocsp -text`
+shows "unauthorized".** Check that the serial in the URL is lowercase hex,
+with no `0x` prefix and with any leading zeros preserved. Mismatched serials
+return an OCSP response with status `unauthorized` per RFC 6960 §2.3.
+
+**Admin cache endpoint returns 403.** The Bearer key does not carry the
+admin flag. M-008 gates this endpoint server-side; the GUI also gates the
+fetch on `useAuth().admin`. Either escalate the key (`certctl admin
+keys promote `) or use a different identity.
+
+**Cache shows `is_stale: true` repeatedly.** The scheduler is not running,
+or is not running often enough. Check `CERTCTL_CRL_GENERATION_INTERVAL`,
+and confirm the scheduler started by searching the server's startup logs
+for `crlGenerationLoop`.
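The serial-format advice in the OCSP troubleshooting entry can be automated with a small normalizer. It converts what operators usually copy into the lowercase, prefix-free form the GET route expects:

- `serial=0A1B2C` from `openssl x509 -noout -serial`;
- `0xA1B2C3D4`;
- colon-separated hex from `openssl x509 -text`.

This Go sketch is hypothetical (`normalizeSerial` is not a certctl function). It deliberately leaves leading zeros alone, because the guide says they must not be stripped.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizeSerial lowercases a hex serial and drops common prefixes and colon
// separators, keeping any leading zeros intact.
func normalizeSerial(in string) (string, error) {
	s := strings.TrimSpace(in)
	s = strings.TrimPrefix(s, "serial=")                     // openssl x509 -noout -serial
	s = strings.TrimPrefix(strings.TrimPrefix(s, "0x"), "0X") // programming-style prefix
	s = strings.ToLower(strings.ReplaceAll(s, ":", ""))       // openssl x509 -text style
	if s == "" {
		return "", errors.New("empty serial")
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return "", fmt.Errorf("non-hex character %q in serial %q", c, in)
		}
	}
	return s, nil
}

func main() {
	for _, in := range []string{"serial=0A1B2C", "0xA1B2C3D4", "0a:1b:2c", "xyz"} {
		out, err := normalizeSerial(in)
		fmt.Printf("%q -> %q %v\n", in, out, err)
	}
}
```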