Files
certctl/internal/service/issuer_adapter.go
T
shankar0123 ceca3647eb vault: add automatic token renewal at TTL/2 + Prometheus metric
Closes Top-10 fix #5 of the 2026-05-03 issuer-coverage audit (see
cowork/issuer-coverage-audit-2026-05-03/RESULTS.md). Pre-fix, the
VaultPKI adapter authenticated with a static token and never called
renew-self. Long-lived deploys hit token expiry; the first
operator-visible signal was failed cert renewals on production
targets.

This commit:

  1. Connector.Start(ctx) spawns a goroutine that calls
     POST /v1/auth/token/renew-self at TTL/2 cadence (computed from a
     one-shot lookup-self at startup). Honours ctx.Done() for
     graceful shutdown via a per-loop done channel + Stop().
  2. On `renewable: false` response (initial lookup OR any subsequent
     renewal), the loop emits a WARN, increments the not_renewable
     counter, and exits. The operator must rotate the token before
     Vault's Max TTL elapses.
  3. New Prometheus counter certctl_vault_token_renewals_total with
     labels result={success,failure,not_renewable}. Registered
     alongside existing certctl_issuance_* counters in
     internal/api/handler/metrics.go.
  4. ERROR-level logging on renewal failure with operator-actionable
     substring ("vault token renewal failed; rotate the token before
     TTL expires") so journalctl + grep find it. Loop keeps ticking
     after a failure — transient blips don't kill it.

New optional issuer.Lifecycle interface:

  type Lifecycle interface {
      Start(ctx context.Context) error
      Stop()
  }

Connectors that hold no background goroutines (almost all of them)
do not implement this — IssuerRegistry.StartLifecycles /
StopLifecycles feature-detect via type assertion. New
lifecycle-bearing connectors plug in by implementing the interface;
no further registry plumbing required.

Wiring (cmd/server/main.go):

  - service.NewVaultRenewalMetrics() instance is shared between
    issuerRegistry.SetVaultRenewalMetrics (so Vault connectors built
    by Rebuild get a recorder) and metricsHandler.SetVaultRenewals
    (so the Prometheus exposer emits the new series).
  - issuerRegistry.StartLifecycles(ctx) is called after
    issuerService.BuildRegistry; defer issuerRegistry.StopLifecycles
    is paired so goroutines exit cleanly on signal.
  - IssuerConnectorAdapter.Underlying() exposes the wrapped
    issuer.Connector so registry-level machinery can reach the
    concrete connector behind the adapter without duplicating the
    wiring at every call site.

Tests (internal/connector/issuer/vault/vault_renew_test.go):

  - TestVault_RenewLoop_TickAtHalfTTL — three ticks → three
    renewals, all "success".
  - TestVault_RenewLoop_StopsOnNotRenewable — second renewal returns
    renewable=false, loop exits, third tick fires no HTTP call.
  - TestVault_RenewLoop_FailureSurfacesViaMetric — first renewal 403
    bumps "failure", second renewal succeeds → loop kept ticking.
  - TestVault_RenewLoop_CtxCancellation_StopsCleanly — Stop returns
    within 200ms after ctx cancel.
  - TestVault_RenewLoop_StartsNothingWhenNotRenewable — token
    already non-renewable at boot ⇒ no goroutine, "not_renewable"
    metric increments at startup so operators see it in Grafana.
  - TestVault_ComputeInterval — 4 cases pinning TTL/2 +
    minRenewInterval floor.
  - TestVault_RenewSelf_ParseFailure_NamesActionableInError —
    surfaced error contains "vault token renewal failed" + "rotate
    the token".

Cadence is dynamic — every successful renewal re-derives TTL/2
from the renewed lease's lease_duration, so a short bootstrap
token that gets renewed up to a longer Max TTL shifts to the
longer cadence automatically (defends against degenerate fast
ticking on a token whose Max TTL is far longer than its initial
TTL).

Documentation:
  - docs/connectors.md Vault PKI section gains "Token TTL +
    automatic renewal" subsection (operator-facing: cadence, metric,
    renewable=false rotation playbook).

Out of scope (intentional, flagged in the audit follow-up):
  - AppRole / Kubernetes / AWS IAM auth methods (different renewal
    semantics).
  - Hot-reload of rotated token from disk (operator restarts
    today; future: GUI/MCP issuer-update path triggers Rebuild
    which Stops the old connector and Starts the new one).
  - Auto-re-auth after token death (operator playbook owns it).

CHANGELOG.md is intentionally not hand-edited (per CHANGELOG.md
itself: "no longer maintains a hand-edited per-version changelog;
per-release notes are auto-generated from commit messages between
consecutive tags").

Verified locally:
- gofmt clean.
- go vet ./internal/service/... ./internal/api/handler/...
  ./internal/connector/issuer/vault/... ./cmd/server/...  clean.
- go test -short -count=1 ./internal/connector/issuer/vault/...
  ./internal/service/... ./internal/api/handler/...  green.
- go test -race -count=10 -run 'TestVault_RenewLoop|TestVault_ComputeInterval'
  ./internal/connector/issuer/vault/...  green.

Audit reference: cowork/issuer-coverage-audit-2026-05-03/RESULTS.md
Top-10 fix #5.
2026-05-03 21:24:27 +00:00

192 lines
7.1 KiB
Go

package service
import (
"context"
"time"
"github.com/shankar0123/certctl/internal/connector/issuer"
)
// IssuerConnectorAdapter bridges the connector-layer issuer.Connector interface with the
// service-layer IssuerConnector interface. This maintains dependency inversion: the service
// layer defines the interface it needs, and this adapter wraps the concrete connector.
//
// Metrics: when issuerType + metrics are set via SetMetrics, the adapter
// records every IssueCertificate / RenewCertificate call into the
// IssuanceMetrics tables (audit fix #4). Untyped or unmetricked
// adapters (test path) skip the recording — nil-guard everywhere.
type IssuerConnectorAdapter struct {
connector issuer.Connector
issuerType string
metrics *IssuanceMetrics
}
// NewIssuerConnectorAdapter wraps an issuer.Connector to implement
// service.IssuerConnector. Existing call sites (28+) keep this
// signature; metrics are wired via SetMetrics post-construction by
// the production code path (issuer_registry.go) so test sites that
// don't care about metrics stay one-arg.
func NewIssuerConnectorAdapter(c issuer.Connector) IssuerConnector {
return &IssuerConnectorAdapter{connector: c}
}
// SetMetrics wires per-issuer-type issuance metrics. issuerType is the
// factory key (e.g. "local", "acme", "digicert") — must match one of
// the closed-enum values the metrics doc references. metrics may be
// nil to disable recording. Closes the #4 audit-readiness blocker
// (per-issuer-type metrics).
func (a *IssuerConnectorAdapter) SetMetrics(issuerType string, metrics *IssuanceMetrics) {
a.issuerType = issuerType
a.metrics = metrics
}
// Underlying returns the wrapped issuer.Connector so registry-level
// machinery (StartLifecycles / StopLifecycles, Bundle G audit-row
// pairing, future feature-detect interfaces) can reach the concrete
// connector behind the adapter without duplicating the wiring at
// every call site. Returns interface{} rather than issuer.Connector
// so callers do their own type assertion against optional extension
// interfaces (issuer.Lifecycle, etc.) without an import dependency
// fan-out from this package.
func (a *IssuerConnectorAdapter) Underlying() interface{} {
return a.connector
}
// recordIssuance is the metrics-recording side effect at the adapter
// boundary. Bumps the issuance counter (success/failure) and the
// duration histogram; on failure also bumps the failure-by-error-class
// counter via ClassifyError.
//
// nil-guarded: when metrics or issuerType are unset, it's a no-op.
func (a *IssuerConnectorAdapter) recordIssuance(start time.Time, err error) {
if a.metrics == nil || a.issuerType == "" {
return
}
duration := time.Since(start)
if err != nil {
a.metrics.RecordIssuance(a.issuerType, "failure", duration)
a.metrics.RecordFailure(a.issuerType, ClassifyError(err))
} else {
a.metrics.RecordIssuance(a.issuerType, "success", duration)
}
}
// IssueCertificate delegates to the underlying connector's IssueCertificate method,
// translating between service-layer and connector-layer types.
//
// SCEP RFC 8894 + Intune master bundle Phase 5.6 follow-up: mustStaple flows
// through to the IssuanceRequest.MustStaple field. Only the local issuer
// honors it (RFC 7633 id-pe-tlsfeature extension); upstream connectors
// silently ignore the field.
func (a *IssuerConnectorAdapter) IssueCertificate(ctx context.Context, commonName string, sans []string, csrPEM string, ekus []string, maxTTLSeconds int, mustStaple bool) (*IssuanceResult, error) {
start := time.Now()
result, err := a.connector.IssueCertificate(ctx, issuer.IssuanceRequest{
CommonName: commonName,
SANs: sans,
CSRPEM: csrPEM,
EKUs: ekus,
MaxTTLSeconds: maxTTLSeconds,
MustStaple: mustStaple,
})
a.recordIssuance(start, err)
if err != nil {
return nil, err
}
return &IssuanceResult{
CertPEM: result.CertPEM,
ChainPEM: result.ChainPEM,
Serial: result.Serial,
NotBefore: result.NotBefore,
NotAfter: result.NotAfter,
}, nil
}
// RenewCertificate delegates to the underlying connector's RenewCertificate method,
// translating between service-layer and connector-layer types. Metrics:
// renewal is recorded into the same certctl_issuance_* series as
// initial issuance — operationally, renewal IS issuance from the
// connector's perspective.
func (a *IssuerConnectorAdapter) RenewCertificate(ctx context.Context, commonName string, sans []string, csrPEM string, ekus []string, maxTTLSeconds int, mustStaple bool) (*IssuanceResult, error) {
start := time.Now()
result, err := a.connector.RenewCertificate(ctx, issuer.RenewalRequest{
CommonName: commonName,
SANs: sans,
CSRPEM: csrPEM,
EKUs: ekus,
MaxTTLSeconds: maxTTLSeconds,
MustStaple: mustStaple,
})
a.recordIssuance(start, err)
if err != nil {
return nil, err
}
return &IssuanceResult{
CertPEM: result.CertPEM,
ChainPEM: result.ChainPEM,
Serial: result.Serial,
NotBefore: result.NotBefore,
NotAfter: result.NotAfter,
}, nil
}
// RevokeCertificate delegates to the underlying connector's RevokeCertificate method.
func (a *IssuerConnectorAdapter) RevokeCertificate(ctx context.Context, serial string, reason string) error {
var reasonPtr *string
if reason != "" {
reasonPtr = &reason
}
return a.connector.RevokeCertificate(ctx, issuer.RevocationRequest{
Serial: serial,
Reason: reasonPtr,
})
}
// GenerateCRL delegates to the underlying connector.
func (a *IssuerConnectorAdapter) GenerateCRL(ctx context.Context, entries []CRLEntry) ([]byte, error) {
// Convert service-layer CRLEntry to connector-layer RevokedCertEntry
connEntries := make([]issuer.RevokedCertEntry, len(entries))
for i, e := range entries {
connEntries[i] = issuer.RevokedCertEntry{
SerialNumber: e.SerialNumber,
RevokedAt: e.RevokedAt,
ReasonCode: e.ReasonCode,
}
}
return a.connector.GenerateCRL(ctx, connEntries)
}
// SignOCSPResponse delegates to the underlying connector.
func (a *IssuerConnectorAdapter) SignOCSPResponse(ctx context.Context, req OCSPSignRequest) ([]byte, error) {
return a.connector.SignOCSPResponse(ctx, issuer.OCSPSignRequest{
CertSerial: req.CertSerial,
CertStatus: req.CertStatus,
RevokedAt: req.RevokedAt,
RevocationReason: req.RevocationReason,
ThisUpdate: req.ThisUpdate,
NextUpdate: req.NextUpdate,
Nonce: req.Nonce, // RFC 6960 §4.4.1 echo (production hardening II Phase 1)
})
}
// GetCACertPEM delegates to the underlying connector.
func (a *IssuerConnectorAdapter) GetCACertPEM(ctx context.Context) (string, error) {
return a.connector.GetCACertPEM(ctx)
}
// GetRenewalInfo delegates to the underlying connector, translating between service-layer and connector-layer types.
func (a *IssuerConnectorAdapter) GetRenewalInfo(ctx context.Context, certPEM string) (*RenewalInfoResult, error) {
result, err := a.connector.GetRenewalInfo(ctx, certPEM)
if err != nil {
return nil, err
}
if result == nil {
return nil, nil
}
return &RenewalInfoResult{
SuggestedWindowStart: result.SuggestedWindowStart,
SuggestedWindowEnd: result.SuggestedWindowEnd,
RetryAfter: result.RetryAfter,
ExplanationURL: result.ExplanationURL,
}, nil
}