docs: CRL/OCSP user guide + architecture cross-reference — Phase 6

Audit of cowork/crl-ocsp-responder-prompt.md against repo HEAD found
two prompt deliverables still missing after the Phase 5 + Phase 6 code
landed: the docs/crl-ocsp.md operator+relying-party guide (Phase 6.2)
and the docs/architecture.md cross-reference. This commit closes both.

docs/crl-ocsp.md (329 lines) covers:
  * Conceptual overview — why both CRL and OCSP, why a separate
    responder cert (RFC 6960 §2.6 / §4.2.2.2.1) keeps the CA key cold
  * Endpoints — GET CRL, GET + POST OCSP, admin observability endpoint
    (M-008 admin-gated) with full request/response shape examples
  * Configuration — every CERTCTL_CRL_* / CERTCTL_OCSP_RESPONDER_*
    env var with default + meaning + 'MUST set in prod' callout for
    OCSP_RESPONDER_KEY_DIR
  * OCSP responder cert lifecycle — first-request bootstrap, disk
    self-healing when keydir is pruned out from under the DB row,
    rotation grace, ExtraExtensions wiring for id-pkix-ocsp-nocheck
  * Consumer integration recipes — cert-manager (AIA/CDP automatic),
    Firefox (about:preferences quirk), OpenSSL (ocsp + s_client -status),
    Intune (CRL pull cadence)
  * V3-Pro deferred (delta CRLs, OCSP rate-limiting, OCSP stapling)
  * Troubleshooting (404 on issuer that doesn't support CRL, hex
    serial format, admin-gated 403, scheduler not running)

docs/architecture.md: extended the existing 'Certificate revocation'
paragraph to explicitly call out the new pipeline (crl_cache table,
OCSP responder cert per RFC 6960 §2.6, POST + GET OCSP endpoints,
auto-rotation grace) and added the 'See docs/crl-ocsp.md for the
operator + relying-party guide' link so future readers can find the
deep dive.

Closes the prompt's Phase 6.2 + 6.3 exit criteria. Combined with
the Phase 5 GUI panel (0594631) + Phase 6 e2e helpers (fc3c7ad) +
Phase 5 admin endpoint (a4df1f8), this completes V2 for the bundle.
V3-Pro polish (delta CRLs, OCSP rate-limiting, OCSP stapling) remains
explicitly out of scope per the prompt's 'What this prompt is NOT'
section.
This commit is contained in:
shankar0123
2026-04-29 03:09:13 +00:00
parent fc3c7ad1e3
commit b4334edda1
2 changed files with 330 additions and 1 deletions
+1 -1
View File
@@ -981,7 +981,7 @@ Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST
- **Additional filters**: `?agent_id=`, `?profile_id=` (in addition to existing status, environment, owner_id, team_id, issuer_id).
- **Deployments**: `GET /api/v1/certificates/{id}/deployments` returns deployment targets for a certificate.
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`). The embedded OCSP responder serves signed responses unauthenticated at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`). Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation.
Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{"reason": "keyCompromise"}`. Supports RFC 5280 reason codes (unspecified, keyCompromise, caCompromise, affiliationChanged, superseded, cessationOfOperation, certificateHold, privilegeWithdrawn). Returns the updated certificate status. Best-effort issuer notification — the revocation succeeds even if the issuer connector is unavailable. The DER-encoded X.509 CRL signed by the issuing CA is served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`); the CRL is pre-generated by the scheduler-driven `crlGenerationLoop` and persisted in the `crl_cache` table (migration 000019) so HTTP fetches do not rebuild per request. The embedded OCSP responder serves signed responses unauthenticated at both `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` and `POST /.well-known/pki/ocsp/{issuer_id}` (RFC 6960 §A.1.1, `Content-Type: application/ocsp-response`); responses are signed by a per-issuer dedicated OCSP responder cert (RFC 6960 §2.6, migration 000020) carrying the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) — the CA private key is never used directly for OCSP signing, which keeps it cold for the future PKCS#11/HSM driver path. The responder cert auto-rotates within `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE` (default 7d) of expiry. Both endpoints are accessible to relying parties with no certctl API credentials, as RFC-compliant PKI consumers expect. Short-lived certificates (profile TTL < 1 hour) are exempt from CRL/OCSP — expiry is sufficient revocation. See [`crl-ocsp.md`](crl-ocsp.md) for the operator + relying-party guide (endpoint URLs, configuration knobs, responder cert lifecycle, cert-manager / Firefox / OpenSSL / Intune integration recipes, troubleshooting).
Certificate export (M27): `GET /api/v1/certificates/{id}/export/pem` returns PEM-encoded certificate and chain, and `POST /api/v1/certificates/{id}/export/pkcs12` returns a PKCS#12 bundle (binary). Private keys are never exported — they remain on agents. All exports are audited with actor, timestamp, and format.
+329
View File
@@ -0,0 +1,329 @@
# CRL & OCSP — Revocation Status for Relying Parties
This guide is the operator + relying-party reference for certctl's revocation
status surfaces. It covers the wire format, endpoint URLs, configuration knobs,
the OCSP responder cert lifecycle, and how to point common consumers
(cert-manager, Firefox, OpenSSL) at the endpoints.
If you're looking for the higher-level architecture, see
[`architecture.md` § Security Model](architecture.md#security-model). If you're
looking for the revocation policy / reason codes the API accepts, see
[`api/openapi.yaml` § /certificates/{id}/revoke](../api/openapi.yaml).
---
## Conceptual overview
**Why two formats.** RFC 5280 §5 defines a Certificate Revocation List (CRL)
— a periodically-published, signed list of every revoked certificate for an
issuer. RFC 6960 defines the Online Certificate Status Protocol (OCSP) — a
request/response protocol that returns the status of a single certificate by
serial number. CRLs are batch-friendly and cacheable; OCSP is point-query and
fresh. Production PKI deployments serve both because different relying parties
prefer different trade-offs:
- Browsers (Firefox / Safari) prefer OCSP for freshness; some pin OCSP
stapling.
- cert-manager and most Linux TLS clients fall back to CRL when OCSP is
unreachable.
- Microsoft Intune / corporate device-state validators do periodic CRL pulls.
- OpenSSL `s_client -status` exercises OCSP via the `Certificate Status
Request` extension during the handshake.
certctl's local issuer publishes both, with a pre-generation cache so a busy
CA does not DOS itself rebuilding the CRL on every fetch.
**Why a separate OCSP responder cert.** RFC 6960 §2.6 + §4.2.2.2 strongly
recommend that OCSP responses be signed by a delegated "OCSP responder cert"
issued by the CA, NOT by the CA private key directly. The responder cert
carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP
clients do not recursively check the responder cert's revocation status. This
keeps the CA private key cold (an HSM operation per OCSP request would be
prohibitive at scale) and lets the responder key live on disk, on a separate
HSM partition, or rotate frequently while the CA key stays untouched.
---
## Endpoints
All revocation endpoints live under `/.well-known/pki/` per RFC 8615 and run
**unauthenticated** — relying parties without certctl API credentials must be
able to validate revocation status. The HTTPS-only TLS 1.3 control plane
applies; there is no plaintext fallback.
### CRL — Certificate Revocation List
```
GET https://<host>/.well-known/pki/crl/{issuer_id}
```
| Field | Value |
| --- | --- |
| Method | `GET` |
| Auth | None (unauthenticated, RFC 5280 §5 distribution semantics) |
| Response Content-Type | `application/pkix-crl` |
| Response body | DER-encoded X.509 CRL signed by the issuer's CA |
| Cache | Pre-generated by the scheduler; configurable interval |
Example:
```bash
curl --cacert ca.crt \
-o crl.der \
https://localhost:8443/.well-known/pki/crl/iss-local
openssl crl -inform DER -in crl.der -text -noout
```
### OCSP — Online Certificate Status Protocol
certctl serves both the GET form (RFC 6960 §A.1.1, simple URL-path lookup)
and the POST form (RFC 6960 §A.1.1, binary OCSPRequest body). Most
production OCSP clients (Firefox, OpenSSL `s_client -status`, cert-manager,
Intune) use POST. The GET form is preserved for ops curl-debugging.
#### GET form
```
GET https://<host>/.well-known/pki/ocsp/{issuer_id}/{serial_hex}
```
| Field | Value |
| --- | --- |
| Method | `GET` |
| Auth | None |
| Response Content-Type | `application/ocsp-response` |
| Response body | DER-encoded OCSPResponse signed by the **OCSP responder cert** (NOT the CA cert) |
Example:
```bash
curl --cacert ca.crt \
-o response.der \
https://localhost:8443/.well-known/pki/ocsp/iss-local/a1b2c3d4
openssl ocsp -respin response.der -text -CAfile ca.crt
```
#### POST form (the standard one)
```
POST https://<host>/.well-known/pki/ocsp/{issuer_id}
Content-Type: application/ocsp-request
Body: <DER-encoded OCSPRequest>
```
| Field | Value |
| --- | --- |
| Method | `POST` |
| Auth | None |
| Request Content-Type | `application/ocsp-request` |
| Response Content-Type | `application/ocsp-response` |
Example with OpenSSL building the request:
```bash
openssl ocsp -issuer ca.crt -cert leaf.crt -reqout request.der
curl --cacert ca.crt \
-X POST \
-H "Content-Type: application/ocsp-request" \
--data-binary @request.der \
-o response.der \
https://localhost:8443/.well-known/pki/ocsp/iss-local
openssl ocsp -respin response.der -text -CAfile ca.crt
```
The body-size limit applies (`http.MaxBytesReader` from middleware,
default 1MB, configurable via `CERTCTL_MAX_BODY_SIZE`); a typical OCSPRequest
is ~200 bytes so this is a generous cap.
### Admin observability endpoint
```
GET https://<host>/api/v1/admin/crl/cache
Authorization: Bearer <token-with-admin-flag>
```
Returns the per-issuer cache state — for ops dashboards, GUI badges, or
"is the scheduler keeping up?" diagnostics. Admin-gated (M-008 admin-gated
handler allowlist; non-admin Bearer callers receive HTTP 403). Response shape:
```json
{
"cache_rows": [
{
"issuer_id": "iss-local",
"cache_present": true,
"crl_number": 42,
"this_update": "2026-04-29T10:00:00Z",
"next_update": "2026-04-29T11:00:00Z",
"generated_at": "2026-04-29T10:00:00Z",
"generation_duration_ms": 87,
"revoked_count": 13,
"is_stale": false,
"recent_events": [
{
"started_at": "2026-04-29T10:00:00Z",
"duration_ms": 87,
"succeeded": true,
"crl_number": 42,
"revoked_count": 13
}
]
}
],
"row_count": 1,
"generated_at": "2026-04-29T10:30:00Z"
}
```
Issuers that have not yet had a CRL generated appear with `cache_present:
false` so the GUI can render a "Not yet generated" pill rather than 404.
---
## Configuration
| Env var | Default | Meaning |
| --- | --- | --- |
| `CERTCTL_CRL_GENERATION_INTERVAL` | `1h` | How often the scheduler walks every CRL-supporting issuer and rebuilds. The HTTP handler reads from the cache, not from a per-request rebuild. |
| `CERTCTL_OCSP_RESPONDER_KEY_DIR` | unset | **Operator MUST set in production.** Directory where the FileDriver persists each issuer's OCSP responder key (`ocsp-responder-<issuer_id>.key`). When unset, the responder service uses a temporary directory that does NOT survive restarts — fine for dev, NEVER for prod. |
| `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE` | `7d` | When the responder cert's `NotAfter` falls within this window, `EnsureResponder` rotates to a fresh cert+key on the next OCSP request or scheduler tick. |
| `CERTCTL_OCSP_RESPONDER_VALIDITY` | `30d` | How long each newly-issued responder cert is valid for. Short by design — relying parties cache OCSP responses, not the responder cert chain, and `id-pkix-ocsp-nocheck` blocks recursive revocation checking on the responder itself. |
The issuer-level CRL `nextUpdate` is derived from the generation timestamp +
the configured CRL validity (currently a build-time constant in the
`CRLCacheService`; configurable knob deferred until an operator asks).
---
## OCSP responder cert lifecycle
1. **First OCSP request for an issuer (or scheduler tick).** The local
issuer's `SignOCSPResponse` calls into `OCSPResponderService.EnsureResponder`.
2. **Cache lookup.** `EnsureResponder` queries the `ocsp_responders` table for
a row keyed by `issuer_id`.
3. **Disk lookup.** If a row exists, the FileDriver reads the persisted key
from `<keydir>/ocsp-responder-<issuer_id>.key`. **Self-healing:** if the
row exists but the file is missing (operator pruned the keydir without
pruning the DB), the service treats this as "rotate now" rather than
crashing.
4. **Rotation check.** If `cert.NotAfter < now + RotationGrace`, the service
generates a fresh ECDSA-P256 key, builds a `*x509.CertificateRequest`,
and asks the local issuer's existing `IssueCertificate` flow to sign it.
The signing template carries:
- `KeyUsage: x509.KeyUsageDigitalSignature` (signing OCSP responses)
- `ExtKeyUsage: x509.ExtKeyUsageOCSPSigning` (RFC 6960 §4.2.2.2)
- The `id-pkix-ocsp-nocheck` extension (OID `1.3.6.1.5.5.7.48.1.5`,
DER value `NULL`, RFC 6960 §4.2.2.2.1) wired through
`Certificate.ExtraExtensions`.
5. **Persistence.** The new cert + key path are written to `ocsp_responders`
via an idempotent `INSERT … ON CONFLICT DO UPDATE`.
6. **Response signing.** `ocsp.CreateResponse(caCert, responderCert,
template, responderSigner)` produces the response bytes; the responder
cert is included in the response chain so relying parties can validate
without a separate fetch.
The race between scheduler-driven cache refresh and on-demand cache miss is
collapsed by the `CRLCacheService`'s in-tree singleflight (a `sync.Map` of
`*flightEntry` keyed by `issuer_id`). Concurrent generation requests for the
same issuer wait on the in-flight result rather than each rebuilding from
scratch.
---
## Pointing common consumers at the endpoints
### cert-manager (Kubernetes)
cert-manager's certificate-validation logic checks both the AIA OCSP URI
embedded in the leaf and the CDP CRL URI. Both are populated automatically
by the local issuer's certificate template — relying parties should NOT
need any additional configuration. To verify:
```bash
openssl x509 -in leaf.crt -text -noout | grep -A1 "Authority Information Access"
openssl x509 -in leaf.crt -text -noout | grep -A2 "CRL Distribution Points"
```
If your cert-manager pods cannot reach `https://<certctl-host>:8443/.well-known/pki/`,
add a NetworkPolicy egress rule or expose the certctl service via the
appropriate ingress class.
### Firefox
Firefox honors the AIA OCSP URI by default. To force-refresh the local
revocation cache after revoking a cert in dev:
```
about:preferences#privacy → Certificates → Query OCSP responder servers
```
If Firefox reports `SEC_ERROR_OCSP_INVALID_SIGNING_CERT`, verify that the
responder cert chain is reachable from the system trust store —
`id-pkix-ocsp-nocheck` is a Firefox-strict extension and is set automatically
on every responder cert certctl issues.
### OpenSSL
```bash
# OCSP via stand-alone request
openssl ocsp -issuer ca.crt -cert leaf.crt -url https://localhost:8443/.well-known/pki/ocsp/iss-local -CAfile ca.crt -text
# OCSP via TLS Certificate Status Request extension
openssl s_client -connect example.com:443 -status -CAfile ca.crt
```
### Intune (corporate device state)
Intune device-compliance validators pull the CRL on a schedule (configured in
the Intune admin console, default 24h). Configure the CRL distribution point
to `https://<certctl-host>:8443/.well-known/pki/crl/<issuer_id>` and Intune
will pull on its own cadence.
---
## What this release does NOT include (V3-Pro)
The following are explicitly out of scope for the V2 (free) bundle and are
tracked for the certctl Pro release:
- **Delta CRLs (RFC 5280 §5.2.4).** Useful for very large CRLs (10k+
revoked certs); the data model already accommodates the Base CRL Number
reference but the pipeline only emits Base CRLs in V2.
- **OCSP rate-limiting per relying party.** Per-IP token bucket on the OCSP
endpoint — V3-Pro because it justifies per-seat pricing for high-traffic
responders.
- **OCSP stapling.** Server-side: cache pre-fetched OCSP responses + serve
in TLS handshake. Client-side: a "stapling fetcher" agent for non-stapling
origins.
The MaxBytesReader cap is the only request-level guard in V2; the
unauthenticated-by-design relying-party endpoints are intentionally not
rate-limited per IP.
---
## Troubleshooting
**`pki/crl/<issuer_id>` returns 404.** The issuer either does not support
CRL signing (Vault, EJBCA, DigiCert serve their own CRL infrastructure;
certctl's connectors return `nil` from `GenerateCRL` for these) or the
issuer ID is wrong. Verify with `GET /api/v1/issuers`.
**`pki/ocsp/<issuer_id>/<serial>` returns 200 but `openssl ocsp -text`
shows "unauthorized".** Check that the serial in the URL is hex-encoded (no
`0x` prefix, no leading zeros stripped, lowercase). Mismatched serials
return an OCSP response with status `unauthorized` per RFC 6960 §2.3.
**Admin cache endpoint returns 403.** The Bearer key does not carry the
admin flag. M-008 gates this endpoint server-side; the GUI also gates the
fetch on `useAuth().admin`. Either escalate the key (`certctl admin
keys promote <key-id>`) or use a different identity.
**Cache shows `is_stale: true` repeatedly.** The scheduler is not running
(or not getting scheduled often enough). Check `CERTCTL_CRL_GENERATION_INTERVAL`
and confirm the scheduler started: `grep crlGenerationLoop` in the server
logs at startup.