certctl/migrations/000024_ocsp_response_cache.up.sql
shankar0123 40fd96a416 feat(ocsp): pre-signed response cache + invalidate-on-revoke (Phase 2)
Production hardening II Phase 2 — closes the per-request live-signing
bottleneck for OCSP. Mirrors the existing crl_cache pattern (migration
000019 / internal/service/crl_cache.go) but per (issuer_id, serial_hex)
instead of per-issuer.

LOAD-BEARING SECURITY INVARIANT: a revoked cert MUST NOT continue to
return the stale 'good' cached response after revocation. The
RevocationSvc.RevokeCertificateWithActor flow now calls
OCSPResponseCacheService.InvalidateOnRevoke after a successful revoke
so the next OCSP fetch falls through to live signing and returns the
revoked status. Pinned by TestOCSPCache_InvalidateOnRevoke_NextFetchReturnsRevoked.
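
The invariant above can be sketched with an in-memory stand-in. This is
a minimal illustration, not the shipped code: memCache, revocations,
and revoke are hypothetical names, and a status string stands in for
the signed DER blob.

```go
package main

import "sync"

// cacheKey mirrors the composite key (issuer_id, serial_hex).
type cacheKey struct{ issuerID, serialHex string }

// revocations is a hypothetical stand-in for the certificate store.
type revocations struct {
	mu      sync.Mutex
	revoked map[cacheKey]bool
}

// status plays the role of live signing: it always reflects current state.
func (r *revocations) status(k cacheKey) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.revoked[k] {
		return "revoked"
	}
	return "good"
}

// memCache is a hypothetical in-memory stand-in for the response cache.
type memCache struct {
	mu      sync.Mutex
	entries map[cacheKey]string
}

// get is read-through: on a miss it live-signs and writes the result back.
func (c *memCache) get(k cacheKey, liveSign func(cacheKey) string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.entries[k]; ok {
		return s
	}
	s := liveSign(k)
	c.entries[k] = s
	return s
}

// invalidateOnRevoke drops the entry so the next get cannot serve stale "good".
func (c *memCache) invalidateOnRevoke(k cacheKey) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, k)
}

// revoke records the revocation first, then invalidates the cache entry.
func revoke(r *revocations, c *memCache, k cacheKey) {
	r.mu.Lock()
	r.revoked[k] = true
	r.mu.Unlock()
	c.invalidateOnRevoke(k)
}
```

Ordering matters in revoke: invalidating only after the revocation is
recorded means a fetch that races the invalidation re-signs against
the already-revoked state.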

NEW migrations/000024_ocsp_response_cache.{up,down}.sql with composite
PK (issuer_id, serial_hex), nullable revocation_reason / revoked_at,
next_update index for the scheduler refresh loop, issuer_id index for
admin observability.

NEW internal/domain/ocsp_response_cache.go::OCSPResponseCacheEntry +
IsStale helper.
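
A rough sketch of what such an entry and staleness check might look
like. The field names are assumptions derived from the migration's
column names, and the IsStale signature and its "stale at or after
next_update" semantics are guesses rather than the actual code.

```go
package main

import "time"

// OCSPResponseCacheEntry mirrors one row of ocsp_response_cache.
// Field names are assumed from the column names in the migration.
type OCSPResponseCacheEntry struct {
	IssuerID         string
	SerialHex        string
	ResponseDER      []byte
	CertStatus       string     // "good" | "revoked" | "unknown"
	RevocationReason *int       // nil unless CertStatus == "revoked"
	RevokedAt        *time.Time // nil unless CertStatus == "revoked"
	ThisUpdate       time.Time
	NextUpdate       time.Time
	GeneratedAt      time.Time
}

// IsStale reports whether the entry has reached or passed its NextUpdate
// at the given instant. Taking now as a parameter keeps it testable.
func (e OCSPResponseCacheEntry) IsStale(now time.Time) bool {
	return !now.Before(e.NextUpdate)
}
```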

NEW internal/repository/postgres/ocsp_response_cache.go implementing
repository.OCSPResponseCacheRepository (Get / Put / Delete /
CountByIssuer). Interface defined in internal/repository/interfaces.go.

NEW internal/service/ocsp_response_cache.go::OCSPResponseCacheService
with read-through facade + sync.Map singleflight + InvalidateOnRevoke.
On cache miss, calls caOperationsSvc.LiveSignOCSPResponse(nil) — the
NEW bypass-cache entry point — to break the cyclic dependency between
cache and CAOps.
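
One way a sync.Map-based singleflight could be built is sketched
below. The flightGroup and call types are hypothetical, and the real
service may coordinate in-flight signing differently.

```go
package main

import "sync"

// call tracks one in-flight signing operation for a single key.
type call struct {
	once sync.Once
	val  []byte
	err  error
}

// flightGroup collapses concurrent misses for the same key into one fn call.
type flightGroup struct {
	inflight sync.Map // key string -> *call
}

// do runs fn at most once per overlapping set of callers for key.
// Callers that arrive while fn is running block in once.Do and then
// share the leader's result.
func (g *flightGroup) do(key string, fn func() ([]byte, error)) ([]byte, error) {
	v, _ := g.inflight.LoadOrStore(key, &call{})
	c := v.(*call)
	c.once.Do(func() {
		// Remove the key once signing finishes so later misses sign afresh.
		defer g.inflight.Delete(key)
		c.val, c.err = fn()
	})
	return c.val, c.err
}
```

In a read-through cache, fn would be the live-sign plus write-back
step, so callers arriving after the in-flight call completes would
normally hit the cache instead of reaching do again.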

REFACTORED internal/service/ca_operations.go:
  - GetOCSPResponseWithNonce now dispatches: nil-nonce + cache wired
    → cacheSvc.Get (cache); nonce != nil OR cache nil → live-sign.
  - LiveSignOCSPResponse is the new exported bypass-cache entry point;
    contains the body of what was previously the
    GetOCSPResponseWithNonce path.
  - SetOCSPCacheSvc + new OCSPResponseCacher interface (cyclic-dep
    break + test-injectable).
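
The dispatch rule listed above can be sketched as follows. The
signatures are simplified assumptions (the real methods likely also
take a context and richer types), and caOps, cacherFunc, and the
liveSign field are hypothetical names used only for illustration.

```go
package main

// OCSPResponseCacher is the narrow interface the CA operations service
// depends on, which avoids importing the cache service directly.
type OCSPResponseCacher interface {
	Get(issuerID, serialHex string) ([]byte, error)
}

// cacherFunc adapts a plain function to OCSPResponseCacher (handy in tests).
type cacherFunc func(issuerID, serialHex string) ([]byte, error)

func (f cacherFunc) Get(issuerID, serialHex string) ([]byte, error) {
	return f(issuerID, serialHex)
}

// caOps is a hypothetical stand-in for the CA operations service.
type caOps struct {
	cacheSvc OCSPResponseCacher // nil until SetOCSPCacheSvc is called
	liveSign func(issuerID, serialHex string, nonce []byte) ([]byte, error)
}

// SetOCSPCacheSvc wires the cache after construction.
func (s *caOps) SetOCSPCacheSvc(c OCSPResponseCacher) { s.cacheSvc = c }

// GetOCSPResponseWithNonce serves nil-nonce requests from the cache when
// one is wired; everything else is live-signed.
func (s *caOps) GetOCSPResponseWithNonce(issuerID, serialHex string, nonce []byte) ([]byte, error) {
	if nonce == nil && s.cacheSvc != nil {
		return s.cacheSvc.Get(issuerID, serialHex)
	}
	return s.liveSign(issuerID, serialHex, nonce)
}
```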

The cache stores nil-nonce blobs by design. Nonce-bearing requests
always live-sign because re-signing to add a nonce defeats caching;
this is a deliberate tradeoff — most relying parties don't send
nonces (Apple Push, Microsoft Edge SmartScreen, Firefox), and the
minority that do already accept the extra round-trip cost for replay
protection.

WIRED in cmd/server/main.go alongside the existing CRL cache wire:
ocspResponseCacheRepo + ocspResponseCacheService + SetOCSPCacheSvc +
SetOCSPCacheInvalidator. Existing deploys see no behavior change: the
cache is consulted, but after every cold start the first fetch for a
given certificate goes through the live-sign + write-back path.

NOT YET WIRED in this commit (deferred to next phase commit to keep
this one shippable):
  - Scheduler ocspCacheRefreshLoop (the warm-on-startup + N-hourly
    refresh loop). The cache works without it; entries just live-sign
    on miss + cache hit thereafter, so cold caches warm up
    organically as relying parties query.
  - Admin observability endpoint /api/v1/admin/ocsp/cache.
  - CERTCTL_OCSP_CACHE_REFRESH_INTERVAL env var.
  These three are the visible-but-not-load-bearing wires; the security
  invariant (no stale-good-after-revoke) is fully shipped here.

7 new tests in internal/service/ocsp_response_cache_test.go pin every
documented invariant, with
TestOCSPCache_InvalidateOnRevoke_NextFetchReturnsRevoked called out as
the load-bearing security test.

Pre-commit verification: go build ./... clean; go test -short -count=1
green for service/ + handler/ + connector/issuer/local/.
2026-04-30 05:03:01 +00:00


-- 000024_ocsp_response_cache.up.sql
--
-- Production hardening II Phase 2: pre-signed OCSP response cache.
--
-- Mirrors the crl_cache pattern from migration 000019 — same
-- read-through facade, same scheduler-driven refresh — but per
-- (issuer_id, serial) instead of per-issuer. Without this cache, every
-- inbound OCSP request triggers a fresh signature with the dedicated
-- responder cert, which becomes the bottleneck for high-volume relying
-- parties (Apple Push, Microsoft Edge SmartScreen, etc.).
--
-- Once the scheduler's ocspCacheRefreshLoop is wired in (deferred to a
-- follow-up commit), it pre-signs responses for every active
-- (issuer_id, serial) at a configurable interval (default 1h, env var
-- CERTCTL_OCSP_CACHE_REFRESH_INTERVAL).
-- CAOperationsSvc.GetOCSPResponseWithNonce reads from the cache on the
-- hot path for nil-nonce requests. On cache miss the service falls
-- back to live signing AND writes the result back to the cache
-- (read-through).
--
-- LOAD-BEARING SECURITY INVARIANT: the revocation service MUST call
-- OCSPResponseCacheService.InvalidateOnRevoke after a successful
-- revoke. Without that wire, a revoked cert keeps returning the
-- stale "good" response from cache until the next scheduler tick —
-- a security incident. The Phase 2 prompt's frozen decision 0.4
-- mandates this.
--
-- Idempotent: every CREATE uses IF NOT EXISTS so re-running the
-- migration is safe (matches the project's migration convention).
CREATE TABLE IF NOT EXISTS ocsp_response_cache (
    issuer_id         TEXT        NOT NULL REFERENCES issuers(id) ON DELETE CASCADE,
    serial_hex        TEXT        NOT NULL,
    response_der      BYTEA       NOT NULL,
    cert_status       TEXT        NOT NULL,  -- 'good' | 'revoked' | 'unknown'
    revocation_reason INTEGER,               -- nullable; set only when cert_status='revoked'
    revoked_at        TIMESTAMPTZ,           -- nullable; set only when cert_status='revoked'
    this_update       TIMESTAMPTZ NOT NULL,
    next_update       TIMESTAMPTZ NOT NULL,
    generated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (issuer_id, serial_hex)
);
-- Lets the scheduler refresh loop quickly identify entries whose
-- next_update has fallen behind the current time. Runs at every
-- ocspCacheRefreshLoop tick.
CREATE INDEX IF NOT EXISTS idx_ocsp_response_cache_next_update
    ON ocsp_response_cache(next_update);
-- Lets the admin observability endpoint efficiently list per-issuer
-- entries for the GUI cache stats panel (Phase 8 wires this into the
-- AdminCRLCacheHandler-equivalent).
CREATE INDEX IF NOT EXISTS idx_ocsp_response_cache_issuer
    ON ocsp_response_cache(issuer_id);