mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:41:29 +00:00
a41fc2d75c
Phase 13 Sprint 13.3 — the completion half of the ARCH-M1
substantive close. Sprint 13.2 shipped the Postgres-backed
sliding-window limiter + multi-replica integration test; Sprint 13.3
wires the 6 call sites in cmd/server/main.go through the operator-
chosen backend selector, adds the rate_limit_buckets scheduler
janitor sweep, rewrites the observability doc, exposes the env-var
in the helm chart, and promotes the multi-replica integration test
to a required CI status check.
Signature ground-truth (sprint 13.2 + 13.3)
===========================================
Prompt-template signatures: `Allow(key string) error` and "5 call
sites." Actual repo: `Allow(key string, now time.Time) error` and 6
NewSlidingWindowLimiter call sites in cmd/server/main.go (the prompt
miscounted the second EST per-principal arm). Per CLAUDE.md "the repo
is truth," matched the live shape.
What changed
============
internal/config/server.go (+40 LOC):
- Added `SlidingWindowBackend string` + `SlidingWindowJanitorInterval
time.Duration` to RateLimitConfig with full operator-facing
documentation of the two valid values (memory|postgres) +
when-to-use-which decision tree.
internal/config/config.go (+27 LOC):
- Load() reads CERTCTL_RATE_LIMIT_BACKEND (default "memory") +
CERTCTL_RATE_LIMIT_JANITOR_INTERVAL (default 5m).
- Validate() rejects anything other than ""/"memory"/"postgres"
(empty = memory equivalence for test-built Configs that bypass
Load()). Janitor interval must be ≥ 1 minute when set.
- Failure modes return clear ::error:: with the env-var name + the
valid values, so an operator typo ("postgress" → memory in a
3-replica cluster) fails fast at startup.
internal/ratelimit/factory.go (NEW, 67 LOC):
- NewLimiter(backend, db, maxN, window, mapCap) Limiter — single
factory the 6 cmd/server/main.go call sites route through.
- Drop-in signature: same maxN/window/mapCap as
NewSlidingWindowLimiter (mapCap accepted + ignored for postgres
— the rate_limit_buckets table grows until the janitor sweeps).
- Defensive panic on unknown backend (config.Validate is SoT;
this is belt-and-suspenders).
internal/ratelimit/postgres_gc.go (NEW, 73 LOC):
- PostgresGC struct + NewPostgresGC + GarbageCollect.
- Single-statement DELETE FROM rate_limit_buckets WHERE
updated_at < NOW() - maxWindow. Idempotent.
- maxWindow <= 0 is a no-op (operator opt-out).
internal/scheduler/scheduler.go (+90 LOC):
- New RateLimitGarbageCollector interface (mirrors the
ACMEGarbageCollector / SessionGarbageCollector contracts).
- rateLimitGC field + rateLimitGCInterval + rateLimitGCRunning
on Scheduler.
- SetRateLimitGarbageCollector(gc) + SetRateLimitGCInterval(d)
Setters following the existing acmeGC/sessionGC pattern.
- rateLimitGCLoop() — JitteredTicker + atomic.Bool guard +
per-tick context.WithTimeout(1m). Logs row count at Debug.
- Loop counted in the Start() WaitGroup only when the GC is
non-nil; cmd/server/main.go skips SetRateLimitGarbageCollector
when backend=memory so the loop never launches for that case.
cmd/server/main.go (35 LOC diff):
- All 6 ratelimit.NewSlidingWindowLimiter call sites now route
through ratelimit.NewLimiter(cfg.RateLimit.SlidingWindowBackend,
db, ...). Grep verification post-fix returns ZERO hits.
- Six sites: breakglass loginLimiter (580), ocspLimiter (1003),
exportLimiter (1068), EST failed-basic (1535), EST per-principal
SCEP-mTLS arm (1591), EST per-principal SCEP arm (1613). The
intune.NewPerDeviceRateLimiter site at line 1823 stays unmoved
— its inner type-alias wrapper is the prompt's
out-of-scope (cmd/server/*.go only).
- Conditionally constructs PostgresGC + wires the scheduler janitor
when backend=postgres; logs the wiring decision either way so
operators see "rate-limit GC sweep enabled (postgres backend)"
or "in-memory backend self-prunes" in the boot log.
internal/api/handler/{est,export,certificates,auth_breakglass}.go:
- Replaced 5 *ratelimit.SlidingWindowLimiter field/Setter types
with ratelimit.Limiter (the interface). Allow() satisfies the
same call shape on both backends; the in-memory tests that
construct *SlidingWindowLimiter still compile because the
concrete type satisfies the interface (compile-time check in
internal/ratelimit/limiter.go pins this).
docs/operator/observability.md (176 LOC diff):
- Replaced the "per-process, in-memory, reset-on-restart, not
shared across replicas" paragraph with the new
configurable-backend section: operator decision tree,
backend internals (memory vs postgres), janitor description,
falsifiable closure proof (the Sprint 13.2 integration test
name + invocation), helm chart wiring example.
- Updated inventory to reflect the actual handler file paths +
actual cap configurations (the prior doc said "60s window" for
several limiters that actually use 60m / 24h windows).
- Doc smoke confirmed: grep -c 'per-process, in-memory,
reset-on-restart' docs/operator/observability.md = 0.
deploy/helm/certctl/values.yaml + templates/server-configmap.yaml +
templates/server-deployment.yaml:
- Exposed server.rateLimiting.backend (default "memory") +
server.rateLimiting.janitorInterval (default "5m") under the
existing rateLimiting block.
- ConfigMap renders both as rate-limit-backend +
rate-limit-janitor-interval keys.
- Deployment wires CERTCTL_RATE_LIMIT_BACKEND +
CERTCTL_RATE_LIMIT_JANITOR_INTERVAL env vars from the configmap.
- Helm render: `helm template deploy/helm/certctl --set
server.rateLimiting.backend=postgres` shows the env-var on the
server-deployment.yaml output.
.github/workflows/ci.yml (+12 LOC):
- Added a new step in the Go Build & Test job that runs the
Sprint 13.2 multi-replica integration test
(TestRateLimit_PostgresBackend_CapEnforcedAcrossReplicas) with
-tags=integration -race -timeout=300s. Fails the CI status check
if the cross-replica row lock ever stops arbitrating across
replicas — the ARCH-M1 closure regression gate.
Verification (all green locally; postgres integration via CI)
============================================================
$ grep -nE 'NewSlidingWindowLimiter' cmd/server/*.go
(zero hits — Sprint 13.3 receipt)
$ go test -short -count=1 \
./internal/config/... ./internal/ratelimit/... \
./internal/scheduler/... ./internal/api/handler/... \
./cmd/server/...
ok internal/config 1.177s
ok internal/ratelimit 0.007s
ok internal/scheduler 9.165s
ok internal/api/handler 6.245s
ok cmd/server 0.390s
$ staticcheck ./internal/ratelimit/... ./internal/scheduler/... \
./internal/config/... ./internal/api/handler/... ./cmd/server/...
(clean)
$ gofmt -l internal/ cmd/server/
(clean)
$ grep -c 'per-process, in-memory, reset-on-restart' \
docs/operator/observability.md
0 (doc smoke — the audit's verbatim phrasing is gone)
$ bash scripts/ci-guards/G-3-env-docs-drift.sh
G-3 env-docs-drift: clean.
$ bash scripts/ci-guards/complete-path-config-coverage.sh
OK — every CERTCTL_* env var (197) has at least one non-config-
package consumer.
Selector contract verified — config.Validate() rejects any value
other than ""/memory/postgres at startup with a clear error message.
Sprint 13.4 next (ARCH-H1 OpenAPI authoring batch 1) is on a
different axis; ARCH-M1 closure is complete with this commit
modulo the Sprint 13.7 audit-HTML flip + zero-floor pin.
Closes: ARCH-M1 substantive remediation. The cross-replica rate-
limit-cap-enforcement gap that the audit recommended deferring to
v3 is closed; operators with server.replicas > 1 flip
CERTCTL_RATE_LIMIT_BACKEND=postgres and get exactly-cap enforcement
across the cluster (proved by the multi-replica integration test now
gating CI).
194 lines
6.4 KiB
Go
194 lines
6.4 KiB
Go
// Copyright 2026 certctl LLC. All rights reserved.
|
|
// SPDX-License-Identifier: BUSL-1.1
|
|
|
|
package handler
|
|
|
|
import (
|
|
"context"
|
|
"encoding/json"
|
|
"errors"
|
|
"fmt"
|
|
"github.com/certctl-io/certctl/internal/repository"
|
|
"log/slog"
|
|
"net/http"
|
|
"strings"
|
|
"time"
|
|
|
|
"github.com/certctl-io/certctl/internal/api/middleware"
|
|
"github.com/certctl-io/certctl/internal/ratelimit"
|
|
"github.com/certctl-io/certctl/internal/service"
|
|
)
|
|
|
|
// ExportService defines the service interface for certificate export operations.
|
|
type ExportService interface {
|
|
ExportPEM(ctx context.Context, certID string) (*service.ExportPEMResult, error)
|
|
ExportPKCS12(ctx context.Context, certID string, password string) ([]byte, error)
|
|
}
|
|
|
|
// ExportHandler handles HTTP requests for certificate export operations.
|
|
type ExportHandler struct {
|
|
svc ExportService
|
|
exportLimiter ratelimit.Limiter // production hardening II Phase 3
|
|
}
|
|
|
|
// NewExportHandler creates a new ExportHandler with a service dependency.
|
|
func NewExportHandler(svc ExportService) ExportHandler {
|
|
return ExportHandler{svc: svc}
|
|
}
|
|
|
|
// SetExportRateLimiter wires the per-actor cert-export rate limiter.
|
|
// Production hardening II Phase 3. Default cap (when set in
|
|
// cmd/server/main.go): 50 exports/hr/operator. Setting to nil
|
|
// disables the limit.
|
|
func (h *ExportHandler) SetExportRateLimiter(l ratelimit.Limiter) {
|
|
h.exportLimiter = l
|
|
}
|
|
|
|
// applyExportRateLimit enforces the per-actor cap. Returns true when
|
|
// the request was rejected (handler should stop).
|
|
//
|
|
// On rejection: HTTP 429 + JSON body {"error":"rate_limit_exceeded",
|
|
// "retry_after_seconds":3600}. Production hardening II Phase 3.
|
|
func (h ExportHandler) applyExportRateLimit(w http.ResponseWriter, r *http.Request) bool {
|
|
if h.exportLimiter == nil {
|
|
return false
|
|
}
|
|
// Auth context populates an actor on the request; cert-export is
|
|
// always behind the API-key middleware so this is non-empty in
|
|
// production. Fall-back to RemoteAddr only if the auth pipeline
|
|
// somehow allowed an empty actor (defensive; shouldn't fire).
|
|
actor := r.Header.Get("X-Actor")
|
|
if actor == "" {
|
|
actor = r.RemoteAddr
|
|
}
|
|
if err := h.exportLimiter.Allow(actor, time.Now()); err != nil {
|
|
w.Header().Set("Content-Type", "application/json")
|
|
w.Header().Set("Retry-After", "3600")
|
|
w.WriteHeader(http.StatusTooManyRequests)
|
|
_, _ = fmt.Fprint(w, `{"error":"rate_limit_exceeded","retry_after_seconds":3600}`)
|
|
return true
|
|
}
|
|
return false
|
|
}
|
|
|
|
// ExportPEM exports a certificate and its chain in PEM format.
|
|
// GET /api/v1/certificates/{id}/export/pem
|
|
func (h ExportHandler) ExportPEM(w http.ResponseWriter, r *http.Request) {
|
|
if r.Method != http.MethodGet {
|
|
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
|
|
return
|
|
}
|
|
|
|
// Production hardening II Phase 3: per-actor cert-export rate limit.
|
|
if h.applyExportRateLimit(w, r) {
|
|
return
|
|
}
|
|
|
|
requestID := middleware.GetRequestID(r.Context())
|
|
|
|
// Extract certificate ID from path: /api/v1/certificates/{id}/export/pem
|
|
id := extractCertIDFromExportPath(r.URL.Path)
|
|
if id == "" {
|
|
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
|
|
return
|
|
}
|
|
|
|
result, err := h.svc.ExportPEM(r.Context(), id)
|
|
if err != nil {
|
|
if errors.Is(err, repository.ErrNotFound) {
|
|
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
|
|
return
|
|
}
|
|
slog.Error("ExportPEM failed", "cert_id", id, "error", err.Error())
|
|
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to export certificate", requestID)
|
|
return
|
|
}
|
|
|
|
// Check if client wants file download via Accept header or ?download=true query param
|
|
if r.URL.Query().Get("download") == "true" {
|
|
w.Header().Set("Content-Type", "application/x-pem-file")
|
|
w.Header().Set("Content-Disposition", "attachment; filename=\"certificate.pem\"")
|
|
w.WriteHeader(http.StatusOK)
|
|
w.Write([]byte(result.FullPEM))
|
|
return
|
|
}
|
|
|
|
JSON(w, http.StatusOK, result)
|
|
}
|
|
|
|
// ExportPKCS12 exports a certificate and chain in PKCS#12 format.
|
|
// POST /api/v1/certificates/{id}/export/pkcs12
|
|
// Body: { "password": "optional-password" }
|
|
func (h ExportHandler) ExportPKCS12(w http.ResponseWriter, r *http.Request) {
|
|
if r.Method != http.MethodPost {
|
|
Error(w, http.StatusMethodNotAllowed, "Method not allowed")
|
|
return
|
|
}
|
|
|
|
// Production hardening II Phase 3: per-actor cert-export rate limit.
|
|
if h.applyExportRateLimit(w, r) {
|
|
return
|
|
}
|
|
|
|
requestID := middleware.GetRequestID(r.Context())
|
|
|
|
// Extract certificate ID from path: /api/v1/certificates/{id}/export/pkcs12
|
|
id := extractCertIDFromExportPath(r.URL.Path)
|
|
if id == "" {
|
|
ErrorWithRequestID(w, http.StatusBadRequest, "Certificate ID is required", requestID)
|
|
return
|
|
}
|
|
|
|
// Parse optional password from request body (may be empty)
|
|
var req struct {
|
|
Password string `json:"password"`
|
|
}
|
|
// Body is optional — empty body means empty password
|
|
_ = parseJSONBody(r, &req)
|
|
|
|
pfxData, err := h.svc.ExportPKCS12(r.Context(), id, req.Password)
|
|
if err != nil {
|
|
if errors.Is(err, repository.ErrNotFound) {
|
|
ErrorWithRequestID(w, http.StatusNotFound, "Certificate not found", requestID)
|
|
return
|
|
}
|
|
if strings.Contains(err.Error(), "cannot be parsed") || strings.Contains(err.Error(), "no certificates found") {
|
|
ErrorWithRequestID(w, http.StatusUnprocessableEntity, "Certificate data cannot be parsed as X.509", requestID)
|
|
return
|
|
}
|
|
slog.Error("ExportPKCS12 failed", "cert_id", id, "error", err.Error())
|
|
ErrorWithRequestID(w, http.StatusInternalServerError, "Failed to export PKCS#12", requestID)
|
|
return
|
|
}
|
|
|
|
w.Header().Set("Content-Type", "application/x-pkcs12")
|
|
w.Header().Set("Content-Disposition", "attachment; filename=\"certificate.p12\"")
|
|
w.WriteHeader(http.StatusOK)
|
|
w.Write(pfxData)
|
|
}
|
|
|
|
// extractCertIDFromExportPath extracts the certificate ID from an export path.
|
|
// Path format: /api/v1/certificates/{id}/export/pem or /api/v1/certificates/{id}/export/pkcs12
|
|
func extractCertIDFromExportPath(path string) string {
|
|
prefix := "/api/v1/certificates/"
|
|
if !strings.HasPrefix(path, prefix) {
|
|
return ""
|
|
}
|
|
rest := strings.TrimPrefix(path, prefix)
|
|
// rest should be "{id}/export/pem" or "{id}/export/pkcs12"
|
|
parts := strings.Split(rest, "/")
|
|
if len(parts) < 3 || parts[1] != "export" {
|
|
return ""
|
|
}
|
|
return parts[0]
|
|
}
|
|
|
|
// parseJSONBody is a helper that decodes JSON from the request body.
|
|
// Returns an error if the body is malformed, nil if body is empty.
|
|
func parseJSONBody(r *http.Request, v interface{}) error {
|
|
if r.Body == nil {
|
|
return nil
|
|
}
|
|
return json.NewDecoder(r.Body).Decode(v)
|
|
}
|