shankar0123 a41fc2d75c feat(ratelimit): Phase 13 Sprint 13.3 — wire backend selector + scheduler janitor + docs + helm (ARCH-M1 closure complete)
Phase 13 Sprint 13.3 — the completion half of the ARCH-M1
substantive close. Sprint 13.2 shipped the Postgres-backed
sliding-window limiter + multi-replica integration test; Sprint 13.3
wires the 6 call sites in cmd/server/main.go through the operator-
chosen backend selector, adds the rate_limit_buckets scheduler
janitor sweep, rewrites the observability doc, exposes the env-var
in the helm chart, and promotes the multi-replica integration test
to a required CI status check.

Signature ground-truth (sprint 13.2 + 13.3)
===========================================
Prompt-template signatures: `Allow(key string) error` and "5 call
sites." Actual repo: `Allow(key string, now time.Time) error` and 6
NewSlidingWindowLimiter call sites in cmd/server/main.go (the prompt
miscounted the second EST per-principal arm). Per CLAUDE.md "the repo
is truth," matched the live shape.

What changed
============

internal/config/server.go (+40 LOC):
  - Added `SlidingWindowBackend string` + `SlidingWindowJanitorInterval
    time.Duration` to RateLimitConfig with full operator-facing
    documentation of the two valid values (memory|postgres) +
    when-to-use-which decision tree.

internal/config/config.go (+27 LOC):
  - Load() reads CERTCTL_RATE_LIMIT_BACKEND (default "memory") +
    CERTCTL_RATE_LIMIT_JANITOR_INTERVAL (default 5m).
  - Validate() rejects anything other than ""/"memory"/"postgres"
    (empty = memory equivalence for test-built Configs that bypass
    Load()). Janitor interval must be ≥ 1 minute when set.
  - Failure modes return a clear ::error:: message naming the env var
    and the valid values, so an operator typo (e.g. "postgress", which
    would otherwise silently run the memory backend in a 3-replica
    cluster) fails fast at startup.
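
As a rough illustration of the validation rules above (not the repo's actual Validate() body), the check could look like the sketch below. The struct layout, the helper name validateRateLimit, and the error wording are assumptions; only the two field names, the env-var names, and the accepted values come from this commit.

```go
package main

import (
	"fmt"
	"time"
)

// RateLimitConfig mirrors only the two fields named in this commit.
// The surrounding struct layout is an assumption for illustration.
type RateLimitConfig struct {
	SlidingWindowBackend         string
	SlidingWindowJanitorInterval time.Duration
}

// validateRateLimit is a hypothetical helper name. It rejects unknown
// backends and, when an interval is set, anything shorter than one minute.
func validateRateLimit(c RateLimitConfig) error {
	switch c.SlidingWindowBackend {
	case "", "memory", "postgres":
		// "" is treated like "memory" for configs built without Load().
	default:
		return fmt.Errorf("CERTCTL_RATE_LIMIT_BACKEND=%q is invalid: valid values are memory, postgres",
			c.SlidingWindowBackend)
	}
	if d := c.SlidingWindowJanitorInterval; d != 0 && d < time.Minute {
		return fmt.Errorf("CERTCTL_RATE_LIMIT_JANITOR_INTERVAL=%s is invalid: must be at least 1m", d)
	}
	return nil
}

func main() {
	// A typo is rejected instead of silently selecting the memory backend.
	fmt.Println(validateRateLimit(RateLimitConfig{SlidingWindowBackend: "postgress"}))
	// A valid postgres configuration passes.
	fmt.Println(validateRateLimit(RateLimitConfig{
		SlidingWindowBackend:         "postgres",
		SlidingWindowJanitorInterval: 5 * time.Minute,
	}))
}
```

Returning an error here, rather than defaulting, is what turns a misspelled backend into a startup failure.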

internal/ratelimit/factory.go (NEW, 67 LOC):
  - NewLimiter(backend, db, maxN, window, mapCap) Limiter — single
    factory the 6 cmd/server/main.go call sites route through.
  - Drop-in signature: same maxN/window/mapCap as
    NewSlidingWindowLimiter (mapCap accepted + ignored for postgres
    — the rate_limit_buckets table grows until the janitor sweeps).
  - Defensive panic on unknown backend (config.Validate is SoT;
    this is belt-and-suspenders).
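
The factory shape described above can be sketched as follows. This is a minimal stand-in, not the contents of factory.go: stubLimiter replaces both real backends, and treating "" as memory inside the factory is an assumption carried over from the Validate() rule.

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// Limiter uses the Allow signature quoted in this commit message.
type Limiter interface {
	Allow(key string, now time.Time) error
}

// stubLimiter is a placeholder for the real in-memory and Postgres
// limiters, which this sketch does not reproduce.
type stubLimiter struct{ backend string }

func (stubLimiter) Allow(string, time.Time) error { return nil }

// NewLimiter picks a backend by name. The parameter list follows the
// commit message; the bodies of the cases are illustrative only.
func NewLimiter(backend string, db *sql.DB, maxN int, window time.Duration, mapCap int) Limiter {
	switch backend {
	case "", "memory":
		// Real code would build the in-memory sliding-window limiter here.
		return stubLimiter{backend: "memory"}
	case "postgres":
		// Real code would build the Postgres-backed limiter; mapCap is ignored.
		return stubLimiter{backend: "postgres"}
	default:
		// Config validation should have caught this already.
		panic(fmt.Sprintf("ratelimit: unknown backend %q", backend))
	}
}

func main() {
	for _, b := range []string{"memory", "postgres"} {
		l := NewLimiter(b, nil, 1000, time.Minute, 0)
		fmt.Printf("%s -> %+v\n", b, l)
	}
}
```

Keeping the same maxN/window/mapCap arguments is what lets each call site switch constructors without further changes.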

internal/ratelimit/postgres_gc.go (NEW, 73 LOC):
  - PostgresGC struct + NewPostgresGC + GarbageCollect.
  - Single-statement DELETE FROM rate_limit_buckets WHERE
    updated_at < NOW() - maxWindow. Idempotent.
  - maxWindow <= 0 is a no-op (operator opt-out).
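
A sketch of the sweep described above, under stated assumptions: the execer interface, the struct fields, the constructor arguments, and the make_interval form of the cutoff are all illustrative, and a small fake stands in for the database so the example runs without Postgres.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"
)

// execer is the one method the sweep needs; *sql.DB satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// PostgresGC field names and constructor arguments are assumptions.
type PostgresGC struct {
	db        execer
	maxWindow time.Duration
}

func NewPostgresGC(db execer, maxWindow time.Duration) *PostgresGC {
	return &PostgresGC{db: db, maxWindow: maxWindow}
}

// GarbageCollect deletes buckets idle for longer than maxWindow and
// returns the number of rows removed. Re-running it is harmless.
func (g *PostgresGC) GarbageCollect(ctx context.Context) (int64, error) {
	if g.maxWindow <= 0 {
		return 0, nil // operator opt-out: no statement is issued
	}
	res, err := g.db.ExecContext(ctx,
		`DELETE FROM rate_limit_buckets
		  WHERE updated_at < NOW() - make_interval(secs => $1)`,
		g.maxWindow.Seconds())
	if err != nil {
		return 0, fmt.Errorf("rate-limit GC sweep: %w", err)
	}
	return res.RowsAffected()
}

// fakeDB records calls so the sketch runs without a database.
type fakeDB struct {
	calls int
	rows  int64
}

func (f *fakeDB) ExecContext(context.Context, string, ...any) (sql.Result, error) {
	f.calls++
	return fakeResult(f.rows), nil
}

type fakeResult int64

func (r fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (r fakeResult) RowsAffected() (int64, error) { return int64(r), nil }

// sweepWithFake runs one sweep against the fake and reports the outcome.
func sweepWithFake(maxWindow time.Duration) (deleted int64, calls int, err error) {
	db := &fakeDB{rows: 3}
	deleted, err = NewPostgresGC(db, maxWindow).GarbageCollect(context.Background())
	return deleted, db.calls, err
}

func main() {
	fmt.Println(sweepWithFake(24 * time.Hour)) // one DELETE issued
	fmt.Println(sweepWithFake(0))              // opt-out: nothing issued
}
```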

internal/scheduler/scheduler.go (+90 LOC):
  - New RateLimitGarbageCollector interface (mirrors the
    ACMEGarbageCollector / SessionGarbageCollector contracts).
  - rateLimitGC field + rateLimitGCInterval + rateLimitGCRunning
    on Scheduler.
  - SetRateLimitGarbageCollector(gc) + SetRateLimitGCInterval(d)
    setters following the existing acmeGC/sessionGC pattern.
  - rateLimitGCLoop() — JitteredTicker + atomic.Bool guard +
    per-tick context.WithTimeout(1m). Logs row count at Debug.
  - Loop counted in the Start() WaitGroup only when the GC is
    non-nil; cmd/server/main.go skips SetRateLimitGarbageCollector
    when backend=memory so the loop never launches for that case.
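
The loop shape described above (overlap guard plus per-tick timeout) can be sketched like this. It is an approximation, not the scheduler's code: a plain time.Ticker replaces the repo's JitteredTicker, and the gcLoop type, the interface's return values, and the log messages are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"sync"
	"sync/atomic"
	"time"
)

// RateLimitGarbageCollector's method signature is assumed here.
type RateLimitGarbageCollector interface {
	GarbageCollect(ctx context.Context) (int64, error)
}

// gcLoop is a hypothetical holder for the three scheduler fields.
type gcLoop struct {
	gc       RateLimitGarbageCollector
	interval time.Duration
	running  atomic.Bool
}

// tick runs one sweep unless a previous sweep is still in flight.
// It reports whether a sweep was attempted.
func (l *gcLoop) tick(parent context.Context) bool {
	if !l.running.CompareAndSwap(false, true) {
		return false // previous sweep has not finished; skip this tick
	}
	defer l.running.Store(false)

	ctx, cancel := context.WithTimeout(parent, time.Minute)
	defer cancel()

	n, err := l.gc.GarbageCollect(ctx)
	if err != nil {
		slog.Warn("rate-limit GC sweep failed", "error", err)
		return true
	}
	slog.Debug("rate-limit GC sweep", "rows_deleted", n)
	return true
}

// run ticks until ctx is cancelled. The caller adds to wg before
// starting it, and only when a collector is configured.
func (l *gcLoop) run(ctx context.Context, wg *sync.WaitGroup) {
	defer wg.Done()
	t := time.NewTicker(l.interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			l.tick(ctx)
		}
	}
}

// countingGC is a test double that counts sweeps.
type countingGC struct{ sweeps atomic.Int64 }

func (c *countingGC) GarbageCollect(context.Context) (int64, error) {
	c.sweeps.Add(1)
	return 0, nil
}

// demo runs one normal tick, then one tick while a sweep is marked in flight.
func demo() (ran, skipped bool, sweeps int64) {
	gc := &countingGC{}
	l := &gcLoop{gc: gc, interval: 5 * time.Minute}
	ran = l.tick(context.Background())
	l.running.Store(true) // pretend an earlier sweep is still in flight
	skipped = !l.tick(context.Background())
	return ran, skipped, gc.sweeps.Load()
}

func main() {
	fmt.Println(demo())
}
```

The CompareAndSwap guard means a slow sweep causes later ticks to be skipped rather than stacked behind it.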

cmd/server/main.go (35 LOC diff):
  - All 6 ratelimit.NewSlidingWindowLimiter call sites now route
    through ratelimit.NewLimiter(cfg.RateLimit.SlidingWindowBackend,
    db, ...). Grep verification post-fix returns ZERO hits.
  - Six sites: breakglass loginLimiter (580), ocspLimiter (1003),
    exportLimiter (1068), EST failed-basic (1535), EST per-principal
    SCEP-mTLS arm (1591), EST per-principal SCEP arm (1613). The
    intune.NewPerDeviceRateLimiter site at line 1823 is left
    unchanged: its inner type-alias wrapper is outside the prompt's
    scope (cmd/server/*.go only).
  - Conditionally constructs PostgresGC + wires the scheduler janitor
    when backend=postgres; logs the wiring decision either way so
    operators see "rate-limit GC sweep enabled (postgres backend)"
    or "in-memory backend self-prunes" in the boot log.

internal/api/handler/{est,export,certificates,auth_breakglass}.go:
  - Replaced 5 *ratelimit.SlidingWindowLimiter field/Setter types
    with ratelimit.Limiter (the interface). Allow() satisfies the
    same call shape on both backends; the in-memory tests that
    construct *SlidingWindowLimiter still compile because the
    concrete type satisfies the interface (compile-time check in
    internal/ratelimit/limiter.go pins this).
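
To illustrate why the handler fields can move to the interface without breaking tests that build the concrete type, here is a simplified stand-in. The SlidingWindowLimiter body, ErrRateLimited, and exportHandler below are assumptions written for this example; only the Allow signature and the compile-time assertion idiom reflect what this commit describes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Limiter uses the Allow signature quoted in this commit message.
type Limiter interface {
	Allow(key string, now time.Time) error
}

// ErrRateLimited is a hypothetical sentinel for this sketch.
var ErrRateLimited = errors.New("rate limit exceeded")

// SlidingWindowLimiter is a simplified stand-in, not the repo's type.
type SlidingWindowLimiter struct {
	mu     sync.Mutex
	maxN   int
	window time.Duration
	hits   map[string][]time.Time
}

// Compile-time check: the build fails if the concrete type ever stops
// satisfying the interface.
var _ Limiter = (*SlidingWindowLimiter)(nil)

// NewSlidingWindowLimiter accepts mapCap for signature parity but
// ignores it in this sketch.
func NewSlidingWindowLimiter(maxN int, window time.Duration, mapCap int) *SlidingWindowLimiter {
	return &SlidingWindowLimiter{maxN: maxN, window: window, hits: make(map[string][]time.Time)}
}

// Allow admits a request for key at time now unless maxN requests are
// already recorded inside the trailing window.
func (l *SlidingWindowLimiter) Allow(key string, now time.Time) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	cutoff := now.Add(-l.window)
	kept := l.hits[key][:0]
	for _, t := range l.hits[key] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= l.maxN {
		l.hits[key] = kept
		return ErrRateLimited
	}
	l.hits[key] = append(kept, now)
	return nil
}

// exportHandler is a hypothetical handler whose field is the interface.
type exportHandler struct{ limiter Limiter }

func (h *exportHandler) SetRateLimiter(l Limiter) { h.limiter = l }

// demo sends three requests through a cap of two per minute.
func demo() []error {
	h := &exportHandler{}
	h.SetRateLimiter(NewSlidingWindowLimiter(2, time.Minute, 0))
	base := time.Unix(0, 0)
	var out []error
	for i := 0; i < 3; i++ {
		out = append(out, h.limiter.Allow("10.0.0.1", base.Add(time.Duration(i)*time.Second)))
	}
	return out
}

func main() {
	fmt.Println(demo())
}
```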

docs/operator/observability.md (176 LOC diff):
  - Replaced the "per-process, in-memory, reset-on-restart, not
    shared across replicas" paragraph with the new
    configurable-backend section: operator decision tree,
    backend internals (memory vs postgres), janitor description,
    falsifiable closure proof (the Sprint 13.2 integration test
    name + invocation), helm chart wiring example.
  - Updated inventory to reflect the actual handler file paths +
    actual cap configurations (the prior doc said "60s window" for
    several limiters that actually use 60m / 24h windows).
  - Doc smoke confirmed: grep -c 'per-process, in-memory,
    reset-on-restart' docs/operator/observability.md = 0.

deploy/helm/certctl/values.yaml + templates/server-configmap.yaml +
templates/server-deployment.yaml:
  - Exposed server.rateLimiting.backend (default "memory") +
    server.rateLimiting.janitorInterval (default "5m") under the
    existing rateLimiting block.
  - ConfigMap renders both as rate-limit-backend +
    rate-limit-janitor-interval keys.
  - Deployment wires CERTCTL_RATE_LIMIT_BACKEND +
    CERTCTL_RATE_LIMIT_JANITOR_INTERVAL env vars from the configmap.
  - Helm render: `helm template deploy/helm/certctl --set
    server.rateLimiting.backend=postgres` shows the env-var on the
    server-deployment.yaml output.

.github/workflows/ci.yml (+12 LOC):
  - Added a new step in the Go Build & Test job that runs the
    Sprint 13.2 multi-replica integration test
    (TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas) with
    -tags=integration -race -timeout=300s. Fails the CI status check
    if the cross-replica row lock ever stops arbitrating across
    replicas — the ARCH-M1 closure regression gate.

Verification (all green locally; postgres integration via CI)
============================================================

  $ grep -nE 'NewSlidingWindowLimiter' cmd/server/*.go
    (zero hits — Sprint 13.3 receipt)

  $ go test -short -count=1 \
      ./internal/config/... ./internal/ratelimit/... \
      ./internal/scheduler/... ./internal/api/handler/... \
      ./cmd/server/...
    ok  internal/config       1.177s
    ok  internal/ratelimit    0.007s
    ok  internal/scheduler    9.165s
    ok  internal/api/handler  6.245s
    ok  cmd/server            0.390s

  $ staticcheck ./internal/ratelimit/... ./internal/scheduler/... \
      ./internal/config/... ./internal/api/handler/... ./cmd/server/...
    (clean)

  $ gofmt -l internal/ cmd/server/
    (clean)

  $ grep -c 'per-process, in-memory, reset-on-restart' \
      docs/operator/observability.md
    0   (doc smoke — the audit's verbatim phrasing is gone)

  $ bash scripts/ci-guards/G-3-env-docs-drift.sh
    G-3 env-docs-drift: clean.

  $ bash scripts/ci-guards/complete-path-config-coverage.sh
    OK — every CERTCTL_* env var (197) has at least one non-config-
    package consumer.

Selector contract verified — config.Validate() rejects any value
other than ""/memory/postgres at startup with a clear error message.

Sprint 13.4 next (ARCH-H1 OpenAPI authoring batch 1) is on a
different axis; ARCH-M1 closure is complete with this commit
modulo the Sprint 13.7 audit-HTML flip + zero-floor pin.

Closes: ARCH-M1 substantive remediation. The cross-replica rate-
limit-cap-enforcement gap that the audit recommended deferring to
v3 is closed; operators with server.replicas > 1 flip
CERTCTL_RATE_LIMIT_BACKEND=postgres and get exactly-cap enforcement
across the cluster (proved by the multi-replica integration test now
gating CI).
2026-05-14 11:52:13 +00:00

// Copyright 2026 certctl LLC. All rights reserved.
// SPDX-License-Identifier: BUSL-1.1

package handler

import (
	"context"
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log/slog"
	"net"
	"net/http"
	"strconv"
	"strings"
	"time"

	"golang.org/x/crypto/ocsp"

	"github.com/certctl-io/certctl/internal/api/middleware"
	"github.com/certctl-io/certctl/internal/domain"
	"github.com/certctl-io/certctl/internal/ratelimit"
	"github.com/certctl-io/certctl/internal/repository"
	"github.com/certctl-io/certctl/internal/service"
)

// CertificateService defines the service interface for certificate operations.
type CertificateService interface {
	ListCertificates(ctx context.Context, status, environment, ownerID, teamID, issuerID string, page, perPage int) ([]domain.ManagedCertificate, int64, error)
	ListCertificatesWithFilter(ctx context.Context, filter *repository.CertificateFilter) ([]domain.ManagedCertificate, int, error)
	GetCertificate(ctx context.Context, id string) (*domain.ManagedCertificate, error)
	CreateCertificate(ctx context.Context, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
	UpdateCertificate(ctx context.Context, id string, cert domain.ManagedCertificate) (*domain.ManagedCertificate, error)
	ArchiveCertificate(ctx context.Context, id string) error
	GetCertificateVersions(ctx context.Context, certID string, page, perPage int) ([]domain.CertificateVersion, int64, error)
	TriggerRenewal(ctx context.Context, certID string, actor string, force bool) error
	TriggerDeployment(ctx context.Context, certID string, targetID string, actor string) error
	RevokeCertificate(ctx context.Context, certID string, reason string, actor string) error
	GetRevokedCertificates(ctx context.Context) ([]*domain.CertificateRevocation, error)
	GenerateDERCRL(ctx context.Context, issuerID string) ([]byte, error)
	GetOCSPResponse(ctx context.Context, issuerID string, serialHex string) ([]byte, error)
	// GetOCSPResponseWithNonce is the nonce-aware variant added in
	// production hardening II Phase 1. When nonce is non-nil, the
	// responder echoes it in the response per RFC 6960 §4.4.1. A nil
	// nonce produces a response without the nonce extension.
	GetOCSPResponseWithNonce(ctx context.Context, issuerID string, serialHex string, nonce []byte) ([]byte, error)
	GetCertificateDeployments(ctx context.Context, certID string) ([]domain.DeploymentTarget, error)
}

// CertificateHandler handles HTTP requests for certificate operations.
type CertificateHandler struct {
	svc         CertificateService
	ocspLimiter ratelimit.Limiter // production hardening II Phase 3: per-source-IP cap on OCSP
}

// NewCertificateHandler creates a new CertificateHandler with a service dependency.
func NewCertificateHandler(svc CertificateService) CertificateHandler {
	return CertificateHandler{svc: svc}
}

// SetOCSPRateLimiter wires the per-source-IP OCSP rate limiter.
// Production hardening II Phase 3. Default cap (when set in
// cmd/server/main.go): 1000 req/min/IP. Setting to nil disables the
// limit; the limiter's own NewSlidingWindowLimiter(maxN<=0, ...)
// also produces a no-op limiter, so the env-var-zero case is safe.
func (h *CertificateHandler) SetOCSPRateLimiter(l ratelimit.Limiter) {
	h.ocspLimiter = l
}

// ListCertificates lists certificates with optional filtering.
// GET /api/v1/certificates?status=Active&environment=prod&owner_id=...&team_id=...&issuer_id=...&agent_id=...&profile_id=...&expires_before=...&expires_after=...&created_after=...&updated_after=...&sort=notAfter&sort_desc=false&cursor=...&page=1&per_page=50&fields=id,commonName,status
func (h CertificateHandler) ListCertificates(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Parse query parameters
query := r.URL.Query()
// Basic filters
filter := &repository.CertificateFilter{
Status: query.Get("status"),
Environment: query.Get("environment"),
OwnerID: query.Get("owner_id"),
TeamID: query.Get("team_id"),
IssuerID: query.Get("issuer_id"),
AgentID: query.Get("agent_id"),
ProfileID: query.Get("profile_id"),
}
// Time-range filters
if eb := query.Get("expires_before"); eb != "" {
if t, err := time.Parse(time.RFC3339, eb); err == nil {
filter.ExpiresBefore = &t
}
}
if ea := query.Get("expires_after"); ea != "" {
if t, err := time.Parse(time.RFC3339, ea); err == nil {
filter.ExpiresAfter = &t
}
}
if ca := query.Get("created_after"); ca != "" {
if t, err := time.Parse(time.RFC3339, ca); err == nil {
filter.CreatedAfter = &t
}
}
if ua := query.Get("updated_after"); ua != "" {
if t, err := time.Parse(time.RFC3339, ua); err == nil {
filter.UpdatedAfter = &t
}
}
// Sorting
if sort := query.Get("sort"); sort != "" {
// Handle sort direction prefix
if strings.HasPrefix(sort, "-") {
filter.Sort = sort[1:]
filter.SortDesc = true
} else {
filter.Sort = sort
filter.SortDesc = query.Get("sort_desc") == "true"
}
}
// Cursor-based pagination
filter.Cursor = query.Get("cursor")
// Page-based pagination
page := 1
perPage := 50
if p := query.Get("page"); p != "" {
if parsed, err := strconv.Atoi(p); err == nil && parsed > 0 {
page = parsed
}
}
if pp := query.Get("per_page"); pp != "" {
if parsed, err := strconv.Atoi(pp); err == nil && parsed > 0 && parsed <= 500 {
perPage = parsed
}
}
if ps := query.Get("page_size"); ps != "" {
if parsed, err := strconv.Atoi(ps); err == nil && parsed > 0 && parsed <= 500 {
filter.PageSize = parsed
}
}
filter.Page = page
filter.PerPage = perPage
// Sparse fields
if fieldsStr := query.Get("fields"); fieldsStr != "" {
filter.Fields = strings.Split(fieldsStr, ",")
}
certs, total, err := h.svc.ListCertificatesWithFilter(r.Context(), filter)
if err != nil {
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to list certificates", requestID)
return
}
// Apply sparse field filtering if requested
var responseData interface{} = certs
if len(filter.Fields) > 0 {
responseData = filterFields(certs, filter.Fields)
}
// Return cursor-based or page-based response depending on which pagination is used
if filter.Cursor != "" {
// Compute next cursor from last result
nextCursor := ""
if len(certs) > 0 {
lastCert := certs[len(certs)-1]
nextCursor = encodeCursor(lastCert.CreatedAt, lastCert.ID)
}
pageSize := filter.PageSize
if pageSize == 0 {
pageSize = filter.PerPage
}
response := CursorPagedResponse{
Data: responseData,
Total: int64(total),
NextCursor: nextCursor,
PageSize: pageSize,
}
JSON(w, http.StatusOK, response)
} else {
response := PagedResponse{
Data: responseData,
Total: int64(total),
Page: page,
PerPage: perPage,
}
JSON(w, http.StatusOK, response)
}
}
// GetCertificate retrieves a single certificate by ID.
// GET /api/v1/certificates/{id}
func (h CertificateHandler) GetCertificate(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
id := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
if id == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
cert, err := h.svc.GetCertificate(r.Context(), id)
if err != nil {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
JSON(w, http.StatusOK, cert)
}
// CreateCertificate creates a new certificate.
// POST /api/v1/certificates
func (h CertificateHandler) CreateCertificate(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
var cert domain.ManagedCertificate
if err := json.NewDecoder(r.Body).Decode(&cert); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, "Invalid request body", requestID)
return
}
// Validate required fields
if err := ValidateRequired("common_name", cert.CommonName); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateCommonName(cert.CommonName); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateRequired("owner_id", cert.OwnerID); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateRequired("team_id", cert.TeamID); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateRequired("issuer_id", cert.IssuerID); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateRequired("name", cert.Name); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
if err := ValidateRequired("renewal_policy_id", cert.RenewalPolicyID); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
created, err := h.svc.CreateCertificate(r.Context(), cert)
if err != nil {
slog.Error("failed to create certificate", "error", err, "request_id", requestID, "common_name", cert.CommonName, "name", cert.Name)
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to create certificate", requestID)
return
}
JSON(w, http.StatusCreated, created)
}
// UpdateCertificate updates an existing certificate.
// PUT /api/v1/certificates/{id}
func (h CertificateHandler) UpdateCertificate(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPut {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
id := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
parts := strings.Split(id, "/")
if len(parts) == 0 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
id = parts[0]
var cert domain.ManagedCertificate
if err := json.NewDecoder(r.Body).Decode(&cert); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, "Invalid request body", requestID)
return
}
// Validate required fields (if provided)
if cert.CommonName != "" {
if err := ValidateCommonName(cert.CommonName); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
}
if cert.OwnerID != "" {
if err := ValidateStringLength("owner_id", cert.OwnerID, 255); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
}
if cert.TeamID != "" {
if err := ValidateStringLength("team_id", cert.TeamID, 255); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, err.Error(), requestID)
return
}
}
updated, err := h.svc.UpdateCertificate(r.Context(), id, cert)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
slog.Error("UpdateCertificate failed", "cert_id", id, "error", err.Error())
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to update certificate", requestID)
return
}
JSON(w, http.StatusOK, updated)
}
// ArchiveCertificate archives a certificate (soft delete).
// DELETE /api/v1/certificates/{id}
func (h CertificateHandler) ArchiveCertificate(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodDelete {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
id := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
if id == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
if err := h.svc.ArchiveCertificate(r.Context(), id); err != nil {
if errors.Is(err, repository.ErrNotFound) {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to archive certificate", requestID)
return
}
w.WriteHeader(http.StatusNoContent)
}
// GetCertificateVersions retrieves version history for a certificate.
// GET /api/v1/certificates/{id}/versions
func (h CertificateHandler) GetCertificateVersions(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Extract certificate ID from path /api/v1/certificates/{id}/versions
path := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
parts := strings.Split(path, "/")
if len(parts) < 2 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
certID := parts[0]
page := 1
perPage := 50
query := r.URL.Query()
if p := query.Get("page"); p != "" {
if parsed, err := strconv.Atoi(p); err == nil && parsed > 0 {
page = parsed
}
}
if pp := query.Get("per_page"); pp != "" {
if parsed, err := strconv.Atoi(pp); err == nil && parsed > 0 && parsed <= 500 {
perPage = parsed
}
}
versions, total, err := h.svc.GetCertificateVersions(r.Context(), certID, page, perPage)
if err != nil {
if errors.Is(err, repository.ErrNotFound) {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
slog.Error("GetCertificateVersions failed", "cert_id", certID, "error", err.Error())
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to get certificate versions", requestID)
return
}
response := PagedResponse{
Data: versions,
Total: total,
Page: page,
PerPage: perPage,
}
JSON(w, http.StatusOK, response)
}
// TriggerRenewal triggers manual renewal for a certificate.
// POST /api/v1/certificates/{id}/renew
func (h CertificateHandler) TriggerRenewal(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Extract certificate ID from path /api/v1/certificates/{id}/renew
path := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
parts := strings.Split(path, "/")
if len(parts) < 2 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
certID := parts[0]
actor := resolveActor(r.Context())
// 2026-05-05 parity-defaults-cleanup (P3-1): operators can opt into
// forcing a renewal when the cert is stuck in RenewalInProgress (a
// previous job hung without releasing the status flag). Accepted as
// either ?force=true query param OR {"force": true} JSON body so CLI
// + GUI clients can pick whichever flow fits their idiom.
force := false
if fv := r.URL.Query().Get("force"); fv == "true" || fv == "1" {
force = true
}
if !force && r.ContentLength > 0 && r.Header.Get("Content-Type") == "application/json" {
var body struct {
Force bool `json:"force,omitempty"`
}
if err := json.NewDecoder(r.Body).Decode(&body); err == nil {
force = body.Force
}
}
if err := h.svc.TriggerRenewal(r.Context(), certID, actor, force); err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
if strings.Contains(errMsg, "cannot renew") {
ErrorWithRequestID(w, http.StatusBadRequest, errMsg, requestID)
return
}
if strings.Contains(errMsg, "already in progress") {
ErrorWithRequestID(w, http.StatusConflict, errMsg, requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to trigger renewal", requestID)
return
}
response := map[string]string{
"status": "renewal_triggered",
}
JSON(w, http.StatusAccepted, response)
}
// TriggerDeployment triggers deployment of a certificate to targets.
// POST /api/v1/certificates/{id}/deploy
func (h CertificateHandler) TriggerDeployment(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		Error(w, http.StatusMethodNotAllowed, "Method not allowed")
		return
	}
	requestID := middleware.GetRequestID(r.Context())
	// Extract certificate ID from path /api/v1/certificates/{id}/deploy
	path := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
	parts := strings.Split(path, "/")
	if len(parts) < 2 || parts[0] == "" {
		ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
		return
	}
	certID := parts[0]
	// Optional: parse request body for specific target ID
	var req struct {
		TargetID string `json:"target_id,omitempty"`
	}
	if r.Header.Get("Content-Type") == "application/json" {
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			// target_id is optional, but a malformed JSON body is
			// rejected with 400 rather than silently ignored.
			ErrorWithRequestID(w, http.StatusBadRequest, "Invalid request body", requestID)
			return
		}
	}
	actor := resolveActor(r.Context())
	if err := h.svc.TriggerDeployment(r.Context(), certID, req.TargetID, actor); err != nil {
		ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to trigger deployment", requestID)
		return
	}
	response := map[string]string{
		"status": "deployment_triggered",
	}
	JSON(w, http.StatusAccepted, response)
}

// RevokeCertificate revokes a certificate with an optional reason code.
// POST /api/v1/certificates/{id}/revoke
func (h CertificateHandler) RevokeCertificate(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Extract certificate ID from path /api/v1/certificates/{id}/revoke
path := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
parts := strings.Split(path, "/")
if len(parts) < 2 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
certID := parts[0]
// Parse optional reason from request body
var req struct {
Reason string `json:"reason"`
}
if r.Body != nil && r.Header.Get("Content-Type") == "application/json" {
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, "Invalid request body", requestID)
return
}
}
actor := resolveActor(r.Context())
if err := h.svc.RevokeCertificate(r.Context(), certID, req.Reason, actor); err != nil {
// Distinguish between client errors and server errors
errMsg := err.Error()
if strings.Contains(errMsg, "already revoked") ||
strings.Contains(errMsg, "cannot revoke") ||
strings.Contains(errMsg, "invalid revocation reason") {
ErrorWithRequestID(w, http.StatusBadRequest, errMsg, requestID)
return
}
if strings.Contains(errMsg, "not found") || strings.Contains(errMsg, "failed to fetch") || strings.Contains(errMsg, "failed to get") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to revoke certificate", requestID)
return
}
JSON(w, http.StatusOK, map[string]string{"status": "revoked"})
}
// GetDERCRL returns a DER-encoded X.509 CRL signed by the specified issuer.
// GET /.well-known/pki/crl/{issuer_id}
//
// RFC 5280 § 5. Served unauthenticated under the /.well-known/pki/ namespace so
// relying parties (browsers, OpenSSL, OCSP stapling sidecars) can fetch the CRL
// without presenting certctl API credentials.
func (h CertificateHandler) GetDERCRL(w http.ResponseWriter, r *http.Request) {
	requestID, _ := r.Context().Value("request_id").(string)
	if r.Method != http.MethodGet {
		ErrorWithRequestID(w, http.StatusMethodNotAllowed, "Method not allowed", requestID)
		return
	}
	issuerID := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/crl/")
	if issuerID == "" {
		ErrorWithRequestID(w, http.StatusBadRequest, "Issuer ID is required", requestID)
		return
	}
	derBytes, err := h.svc.GenerateDERCRL(r.Context(), issuerID)
	if err != nil {
		errMsg := err.Error()
		if strings.Contains(errMsg, "not found") {
			ErrorWithRequestID(w, http.StatusNotFound, errMsg, requestID)
			return
		}
		if strings.Contains(errMsg, "do not support") || strings.Contains(errMsg, "does not support") {
			ErrorWithRequestID(w, http.StatusNotImplemented, errMsg, requestID)
			return
		}
		ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to generate CRL", requestID)
		return
	}
	// Production hardening II Phase 4: HTTP caching headers per RFC 7232.
	// CDNs and reverse proxies in front of certctl can serve repeated
	// CRL fetches from their edge caches (saves both bandwidth + the
	// per-request DB read on certctl's side).
	//
	// ETag is derived from the SHA-256 of the DER body and sent in weak
	// form (W/) per RFC 7232 §2.3. With the deterministic builder, two
	// generation runs over the same revocation set produce byte-identical
	// CRLs and therefore the same tag. The weak form only claims semantic
	// equivalence, which keeps the header honest if signature randomness
	// ever leaks into the bytes.
	etagBytes := sha256.Sum256(derBytes)
	etag := fmt.Sprintf("W/\"%x\"", etagBytes[:16]) // first 16 bytes of SHA-256: sufficient ID space
	w.Header().Set("ETag", etag)
	// If-None-Match short-circuits to 304 Not Modified. RFC 7232 §3.2.
	// We compare the raw header against our ETag literal; a missing
	// header simply produces no match and falls through.
	if match := r.Header.Get("If-None-Match"); match != "" && match == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	// Cache-Control max-age approximates the CRL's nextUpdate window.
	// We don't have the parsed CRL handy here (the service returns raw
	// DER), so use a fixed, conservative TTL that matches the default
	// scheduler regen interval; relying parties that respect max-age
	// won't re-fetch within that window.
	const crlCacheControlSeconds = 3600 // 1h matches default CRL regen cadence
	w.Header().Set("Content-Type", "application/pkix-crl")
	w.Header().Set("Cache-Control", fmt.Sprintf("public, max-age=%d, must-revalidate", crlCacheControlSeconds))
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write(derBytes)
}

// ocspSourceIP extracts the source IP from the request for the
// per-IP rate limiter. Production hardening II Phase 3.
//
// Strategy: net.SplitHostPort on RemoteAddr; on parse failure fall
// back to the bare RemoteAddr string. We deliberately do NOT honor
// X-Forwarded-For here because OCSP is publicly reachable and
// untrusted intermediaries could spoof the header to bypass the
// limit. Operators behind a trusted reverse proxy should configure
// the proxy to pass through the original IP via the standard
// transport (rewriting RemoteAddr at the proxy boundary).
func ocspSourceIP(r *http.Request) string {
	if r == nil {
		return ""
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}

// applyOCSPRateLimit enforces the per-source-IP cap. Returns true when
// the request was rejected (handler should stop). Returns false to
// continue processing. Production hardening II Phase 3.
func (h CertificateHandler) applyOCSPRateLimit(w http.ResponseWriter, r *http.Request) bool {
	if h.ocspLimiter == nil {
		return false
	}
	ip := ocspSourceIP(r)
	if err := h.ocspLimiter.Allow(ip, time.Now()); err != nil {
		// Rate-limited: respond with the canonical OCSP "tryLater"
		// status (status 3 per RFC 6960 §2.3) plus an HTTP-level
		// Retry-After hint. x/crypto/ocsp ships the pre-built DER
		// blob as ocsp.TryLaterErrorResponse, so no hand-rolled
		// encoding is needed on this rejection path.
		w.Header().Set("Content-Type", "application/ocsp-response")
		w.Header().Set("Retry-After", "60")
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write(ocsp.TryLaterErrorResponse)
		return true
	}
	return false
}

// HandleOCSP processes OCSP requests.
// GET /.well-known/pki/ocsp/{issuer_id}/{serial_hex}
//
// RFC 6960. Served unauthenticated under the /.well-known/pki/ namespace. For
// simplicity we accept GET with path params rather than the binary POST body
// form — the response is a valid DER-encoded OCSP response either way.
func (h CertificateHandler) HandleOCSP(w http.ResponseWriter, r *http.Request) {
requestID, _ := r.Context().Value("request_id").(string)
if r.Method != http.MethodGet {
ErrorWithRequestID(w, http.StatusMethodNotAllowed, "Method not allowed", requestID)
return
}
// Production hardening II Phase 3: per-source-IP rate limit.
// When the cap is tripped, applyOCSPRateLimit writes the
// rate-limited OCSP response and returns true — handler stops.
if h.applyOCSPRateLimit(w, r) {
return
}
// Extract issuer_id and serial from path: /.well-known/pki/ocsp/{issuer_id}/{serial_hex}
path := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/ocsp/")
parts := strings.SplitN(path, "/", 2)
if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Issuer ID and serial number are required", requestID)
return
}
issuerID := parts[0]
serialHex := parts[1]
derBytes, err := h.svc.GetOCSPResponse(r.Context(), issuerID, serialHex)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
ErrorWithRequestID(w, http.StatusNotFound, errMsg, requestID)
return
}
if strings.Contains(errMsg, "do not support") || strings.Contains(errMsg, "does not support") {
ErrorWithRequestID(w, http.StatusNotImplemented, errMsg, requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to generate OCSP response", requestID)
return
}
w.Header().Set("Content-Type", "application/ocsp-response")
w.Header().Set("Cache-Control", "max-age=3600")
w.WriteHeader(http.StatusOK)
_, _ = w.Write(derBytes)
}
// HandleOCSPPost processes RFC 6960 Appendix A.1 POST OCSP requests.
// POST /.well-known/pki/ocsp/{issuer_id}
//
// The body MUST be the binary DER-encoded OCSPRequest with content-type
// "application/ocsp-request". The response is the same DER-encoded
// OCSPResponse with content-type "application/ocsp-response" returned
// by the existing GET handler — only the input shape differs.
//
// POST is the standard transport for production OCSP clients (Firefox,
// OpenSSL `s_client -status`, cert-manager, Microsoft Intune device
// validators). The pre-existing GET form is kept for ad-hoc curl
// inspection + human-readable URL paths.
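//
// A minimal client sketch for this endpoint, not the project's own
// client. It assumes leaf and issuer are already-parsed
// *x509.Certificate values, that baseURL and issuerID are placeholders
// supplied by the caller, and that it runs inside a function returning
// error (imports: bytes, io, net/http, golang.org/x/crypto/ocsp):
//
//	reqDER, err := ocsp.CreateRequest(leaf, issuer, nil)
//	if err != nil {
//		return err
//	}
//	resp, err := http.Post(baseURL+"/.well-known/pki/ocsp/"+issuerID,
//		"application/ocsp-request", bytes.NewReader(reqDER))
//	if err != nil {
//		return err
//	}
//	defer resp.Body.Close()
//	respDER, err := io.ReadAll(resp.Body)
//	if err != nil {
//		return err
//	}
//	// ParseResponse returns an error for OCSP error statuses such as
//	// tryLater or unauthorized; on success parsed.Status is one of
//	// ocsp.Good, ocsp.Revoked, or ocsp.Unknown.
//	parsed, err := ocsp.ParseResponse(respDER, issuer)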
//
// Bundle CRL/OCSP-Responder Phase 4.
func (h CertificateHandler) HandleOCSPPost(w http.ResponseWriter, r *http.Request) {
requestID := middleware.GetRequestID(r.Context())
if r.Method != http.MethodPost {
ErrorWithRequestID(w, http.StatusMethodNotAllowed, "Method not allowed", requestID)
return
}
// Production hardening II Phase 3: per-source-IP rate limit.
if h.applyOCSPRateLimit(w, r) {
return
}
// Be tolerant about Content-Type: RFC 6960 Appendix A.1 specifies
// "application/ocsp-request", but real-world clients sometimes omit
// the header or send it with a charset suffix. A missing header is
// accepted; a present one only has to contain the substring
// "ocsp-request". The actual validation happens in ocsp.ParseRequest
// below; a malformed body fails there with a 400.
ct := r.Header.Get("Content-Type")
if ct != "" && !strings.Contains(strings.ToLower(ct), "ocsp-request") {
ErrorWithRequestID(w, http.StatusUnsupportedMediaType,
fmt.Sprintf("Content-Type must be application/ocsp-request, got %q", ct), requestID)
return
}
// Issuer ID from the path. The router pattern strips the leading
// /.well-known/pki/ocsp/ prefix; what remains is the bare issuer ID.
issuerID := strings.TrimPrefix(r.URL.Path, "/.well-known/pki/ocsp/")
issuerID = strings.TrimSuffix(issuerID, "/")
if issuerID == "" || strings.Contains(issuerID, "/") {
ErrorWithRequestID(w, http.StatusBadRequest, "Issuer ID is required", requestID)
return
}
// Body is already MaxBytesReader-capped by the body-size middleware.
// OCSPRequest bodies are tiny (~200 bytes for a single-cert query),
// so the default cap is comfortably above what any legitimate client
// will send.
body, err := io.ReadAll(r.Body)
if err != nil {
ErrorWithRequestID(w, http.StatusBadRequest, "Failed to read request body", requestID)
return
}
ocspReq, err := ocsp.ParseRequest(body)
if err != nil {
ErrorWithRequestID(w, http.StatusBadRequest,
fmt.Sprintf("Invalid OCSPRequest: %v", err), requestID)
return
}
// Production hardening II Phase 1: extract the optional RFC 6960
// §4.4.1 nonce extension from the request. golang.org/x/crypto/ocsp
// doesn't expose the request's extensions, so we walk the raw DER
// ourselves via service.ParseOCSPRequestNonce.
//
// Failure modes:
// - no nonce (most relying parties): nonce=nil, present=false,
// err=nil -> proceed without echoing.
// - well-formed nonce <= 32 bytes: nonce=bytes, present=true,
// err=nil -> plumb through GetOCSPResponseWithNonce.
// - malformed nonce (empty or > 32 bytes): err=ErrOCSPNonceMalformed
// -> respond with the OCSP "unauthorized" status (RFC 6960 §2.3
// status code 6) rather than echoing potentially-malicious bytes.
nonce, _, nonceErr := service.ParseOCSPRequestNonce(body)
if errors.Is(nonceErr, service.ErrOCSPNonceMalformed) {
w.Header().Set("Content-Type", "application/ocsp-response")
w.WriteHeader(http.StatusOK)
// ocsp.UnauthorizedErrorResponse is the canonical pre-built
// error response (status 6) per RFC 6960 §4.2.1.
_, _ = w.Write(ocsp.UnauthorizedErrorResponse)
return
}
// Reuse the existing service path. The serial extracted from the
// parsed OCSPRequest is converted to hex (the on-disk format for
// certctl serials matches certificate.SerialNumber.Text(16)).
serialHex := fmt.Sprintf("%x", ocspReq.SerialNumber)
derBytes, err := h.svc.GetOCSPResponseWithNonce(r.Context(), issuerID, serialHex, nonce)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
ErrorWithRequestID(w, http.StatusNotFound, errMsg, requestID)
return
}
if strings.Contains(errMsg, "do not support") || strings.Contains(errMsg, "does not support") {
ErrorWithRequestID(w, http.StatusNotImplemented, errMsg, requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to generate OCSP response", requestID)
return
}
w.Header().Set("Content-Type", "application/ocsp-response")
w.Header().Set("Cache-Control", "max-age=3600")
w.WriteHeader(http.StatusOK)
_, _ = w.Write(derBytes)
}
// GetCertificateDeployments retrieves all deployment targets for a certificate.
// GET /api/v1/certificates/{id}/deployments
func (h CertificateHandler) GetCertificateDeployments(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
return
}
requestID := middleware.GetRequestID(r.Context())
// Extract certificate ID from path /api/v1/certificates/{id}/deployments
path := strings.TrimPrefix(r.URL.Path, "/api/v1/certificates/")
parts := strings.Split(path, "/")
if len(parts) < 2 || parts[0] == "" {
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
return
}
certID := parts[0]
deployments, err := h.svc.GetCertificateDeployments(r.Context(), certID)
if err != nil {
errMsg := err.Error()
if strings.Contains(errMsg, "not found") {
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
return
}
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to get deployments", requestID)
return
}
JSON(w, http.StatusOK, map[string]interface{}{
"data": deployments,
"total": len(deployments),
})
}