mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-14 08:48:59 +00:00
vault: add automatic token renewal at TTL/2 + Prometheus metric

Closes Top-10 fix #5 of the 2026-05-03 issuer-coverage audit (see
cowork/issuer-coverage-audit-2026-05-03/RESULTS.md).

Pre-fix, the VaultPKI adapter authenticated with a static token and
never called renew-self. Long-lived deploys hit token expiry; the first
operator-visible signal was failed cert renewals on production targets.

This commit:

1. Connector.Start(ctx) spawns a goroutine that calls
   POST /v1/auth/token/renew-self at TTL/2 cadence (computed from a
   one-shot lookup-self at startup). Honours ctx.Done() for graceful
   shutdown via a per-loop done channel + Stop().
2. On a `renewable: false` response (initial lookup OR any subsequent
   renewal), the loop emits a WARN, increments the not_renewable
   counter, and exits. The operator must rotate the token before
   Vault's Max TTL elapses.
3. New Prometheus counter certctl_vault_token_renewals_total with
   labels result={success,failure,not_renewable}. Registered alongside
   the existing certctl_issuance_* counters in
   internal/api/handler/metrics.go.
4. ERROR-level logging on renewal failure with an operator-actionable
   substring ("vault token renewal failed; rotate the token before TTL
   expires") so journalctl + grep find it. The loop keeps ticking after
   a failure, so transient blips don't kill it.

New optional issuer.Lifecycle interface:

    type Lifecycle interface {
        Start(ctx context.Context) error
        Stop()
    }

Connectors that hold no background goroutines (almost all of them) do
not implement this; IssuerRegistry.StartLifecycles / StopLifecycles
feature-detect via type assertion. New lifecycle-bearing connectors plug
in by implementing the interface; no further registry plumbing required.

Wiring (cmd/server/main.go):

- service.NewVaultRenewalMetrics() instance is shared between
  issuerRegistry.SetVaultRenewalMetrics (so Vault connectors built by
  Rebuild get a recorder) and metricsHandler.SetVaultRenewals (so the
  Prometheus exposer emits the new series).
- issuerRegistry.StartLifecycles(ctx) is called after
  issuerService.BuildRegistry; defer issuerRegistry.StopLifecycles is
  paired so goroutines exit cleanly on signal.
- IssuerConnectorAdapter.Underlying() exposes the wrapped
  issuer.Connector so registry-level machinery can reach the concrete
  connector behind the adapter without duplicating the wiring at every
  call site.

Tests (internal/connector/issuer/vault/vault_renew_test.go):

- TestVault_RenewLoop_TickAtHalfTTL: three ticks → three renewals, all
  "success".
- TestVault_RenewLoop_StopsOnNotRenewable: second renewal returns
  renewable=false, loop exits, third tick fires no HTTP call.
- TestVault_RenewLoop_FailureSurfacesViaMetric: first renewal 403 bumps
  "failure", second renewal succeeds → loop kept ticking.
- TestVault_RenewLoop_CtxCancellation_StopsCleanly: Stop returns within
  200ms after ctx cancel.
- TestVault_RenewLoop_StartsNothingWhenNotRenewable: token already
  non-renewable at boot ⇒ no goroutine, "not_renewable" metric
  increments at startup so operators see it in Grafana.
- TestVault_ComputeInterval: 4 cases pinning TTL/2 + minRenewInterval
  floor.
- TestVault_RenewSelf_ParseFailure_NamesActionableInError: surfaced
  error contains "vault token renewal failed" + "rotate the token".

Cadence is dynamic: every successful renewal re-derives TTL/2 from the
renewed lease's lease_duration, so a short bootstrap token that gets
renewed up to a longer Max TTL shifts to the longer cadence
automatically (defends against degenerate fast ticking on a token whose
Max TTL is far longer than its initial TTL).

Documentation:

- docs/connectors.md Vault PKI section gains a "Token TTL + automatic
  renewal" subsection (operator-facing: cadence, metric,
  renewable=false rotation playbook).

Out of scope (intentional, flagged in the audit follow-up):

- AppRole / Kubernetes / AWS IAM auth methods (different renewal
  semantics).
- Hot-reload of rotated token from disk (operator restarts today;
  future: GUI/MCP issuer-update path triggers Rebuild which Stops the
  old connector and Starts the new one).
- Auto-re-auth after token death (operator playbook owns it).

CHANGELOG.md is intentionally not hand-edited (per CHANGELOG.md itself:
"no longer maintains a hand-edited per-version changelog; per-release
notes are auto-generated from commit messages between consecutive
tags").

Verified locally:

- gofmt clean.
- go vet ./internal/service/... ./internal/api/handler/...
  ./internal/connector/issuer/vault/... ./cmd/server/... clean.
- go test -short -count=1 ./internal/connector/issuer/vault/...
  ./internal/service/... ./internal/api/handler/... green.
- go test -race -count=10 -run
  'TestVault_RenewLoop|TestVault_ComputeInterval'
  ./internal/connector/issuer/vault/... green.

Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md
Top-10 fix #5.
@@ -29,6 +29,7 @@ import (
 	"log/slog"
 	"net/http"
 	"strings"
+	"sync"
 	"time"
 
 	"github.com/shankar0123/certctl/internal/connector/issuer"
@@ -72,6 +73,32 @@ type Connector struct {
 	config     *Config
 	logger     *slog.Logger
 	httpClient *http.Client
+
+	// Token-renewal loop fields. Top-10 fix #5 of the 2026-05-03
+	// issuer-coverage audit. Long-lived certctl-server deploys hit
+	// Vault token expiry; the loop calls /v1/auth/token/renew-self at
+	// TTL/2 cadence so the integration stays alive up to Vault's
+	// configured Max TTL. See vault_renew.go for Start / Stop /
+	// renewSelf / lookupSelf.
+	//
+	// renewMu guards renewStarted, renewCancel, and renewDone. The ticker
+	// runs in a goroutine that owns its own copy of these channels.
+	renewMu       sync.Mutex
+	renewStarted  bool            // true after Start spawned the goroutine
+	renewCancel   func()          // cancels the goroutine's ctx
+	renewDone     chan struct{}   // closed when goroutine exits
+	renewRecorder RenewalRecorder // optional metric sink (defaults to no-op)
+
+	// renewTickerFactory lets tests substitute a deterministic ticker
+	// implementation for cadence assertions. Production callers leave
+	// this nil and the loop uses time.NewTicker.
+	renewTickerFactory func(d time.Duration) renewTicker
+
+	// renewClient is the HTTP client used for renew-self / lookup-self.
+	// Defaults to httpClient; a separate seam lets tests inject an
+	// httptest.Server-bound client without disturbing the issuance
+	// path's client.
+	renewClient *http.Client
 }
 
 // New creates a new Vault PKI connector with the given configuration and logger.
@@ -85,13 +112,36 @@ func New(config *Config, logger *slog.Logger) *Connector {
 		}
 	}
 
-	return &Connector{
-		config: config,
-		logger: logger,
-		httpClient: &http.Client{
-			Timeout: 30 * time.Second,
-		},
+	httpClient := &http.Client{
+		Timeout: 30 * time.Second,
+	}
+	return &Connector{
+		config:        config,
+		logger:        logger,
+		httpClient:    httpClient,
+		renewClient:   httpClient,
+		renewRecorder: noopRenewalRecorder{},
 	}
 }
 
+// SetRenewalRecorder wires a metric sink for the renew-self loop. The
+// recorder's RecordRenewal(result string) is called with one of the
+// enum values "success", "failure", or "not_renewable" on every tick.
+// Pass nil to disable recording. Safe to call before Start; calling
+// after Start has no effect on already-emitted increments.
+//
+// The interface lives in this package (not internal/service) to avoid
+// an import cycle: vault is a connector package that the service-layer
+// IssuerRegistry imports. The service-layer concrete type
+// (*service.VaultRenewalMetrics) satisfies this interface and is wired
+// in cmd/server/main.go.
+func (c *Connector) SetRenewalRecorder(r RenewalRecorder) {
+	if r == nil {
+		r = noopRenewalRecorder{}
+	}
+	c.renewMu.Lock()
+	defer c.renewMu.Unlock()
+	c.renewRecorder = r
+}
+
 // vaultResponse is the standard Vault API response wrapper.
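The recorder seam described in the SetRenewalRecorder doc comment can be sketched as below. Only the `RecordRenewal(result string)` method and the `noopRenewalRecorder` name come from the diff; the exact declarations are assumed, and `countingRecorder` and `orNoop` are hypothetical helpers standing in for a real metric sink such as the service-layer type.

```go
package main

import (
	"fmt"
	"sync"
)

// RenewalRecorder has the single method the doc comment describes.
// The declaration itself is an assumption; the diff does not show it.
type RenewalRecorder interface {
	RecordRenewal(result string)
}

// noopRenewalRecorder discards every result, so callers never need a
// nil check before recording.
type noopRenewalRecorder struct{}

func (noopRenewalRecorder) RecordRenewal(string) {}

// countingRecorder is a hypothetical in-memory sink, useful in tests.
type countingRecorder struct {
	mu     sync.Mutex
	counts map[string]int
}

func (c *countingRecorder) RecordRenewal(result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[string]int)
	}
	c.counts[result]++
}

// Count returns how many times a result value was recorded.
func (c *countingRecorder) Count(result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[result]
}

// orNoop applies the same nil guard as SetRenewalRecorder: a nil
// interface value is replaced by the no-op recorder.
func orNoop(r RenewalRecorder) RenewalRecorder {
	if r == nil {
		return noopRenewalRecorder{}
	}
	return r
}

func main() {
	rec := &countingRecorder{}
	sink := orNoop(rec)
	sink.RecordRenewal("success")
	sink.RecordRenewal("failure")
	sink.RecordRenewal("success")
	orNoop(nil).RecordRenewal("not_renewable") // dropped without a panic
	fmt.Println(rec.Count("success"), rec.Count("failure"), rec.Count("not_renewable"))
}
```

Defaulting to a no-op value rather than a nil field keeps the renewal loop free of nil checks, which matches the `renewRecorder: noopRenewalRecorder{}` default set in New.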