feat(observability): DEPL-006 — OpenTelemetry seed (surface only; no spans yet)

Acquisition-audit DEPL-006 closure (Sprint 6 ACQ, 2026-05-16).

Pre-2026-05-16, go.mod listed go.opentelemetry.io/otel,
otel/metric, otel/trace, otelhttp, and auto/sdk all as indirect
deps (pulled transitively by AWS / Azure SDKs at v1.41.0). The
SDK was never initialized, so the global otel.GetTracerProvider()
returned the API's default no-op provider and certctl emitted
zero spans.

This commit stands up the surface so operators with an OTel
collector can opt in via CERTCTL_OTEL_ENABLED=true without code
changes. It does NOT add per-handler / per-query / per-connector
span instrumentation — that's a v2.3 roadmap follow-up. The
DEPL-006 audit finding is closed by the surface being present.
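
A minimal sketch of the opt-in gate, for illustration only. The
call shape getEnvBool(name, default) is taken from config.Load in
this commit; the body shown here and the parseBoolOr helper are
assumptions, not the repository's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseBoolOr is a hypothetical helper: it returns def when raw is empty
// or is not a value strconv.ParseBool accepts ("true", "1", "false", ...).
func parseBoolOr(raw string, def bool) bool {
	if raw == "" {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

// getEnvBool matches the call shape used in config.Load; the body is an
// assumed implementation for this sketch.
func getEnvBool(name string, def bool) bool {
	return parseBoolOr(os.Getenv(name), def)
}

func main() {
	// Unset or malformed values fall back to false, so the default
	// deployment never initializes the tracer provider.
	enabled := getEnvBool("CERTCTL_OTEL_ENABLED", false)
	fmt.Println("otel enabled:", enabled)
}
```

With this shape, only an explicit boolean such as
CERTCTL_OTEL_ENABLED=true flips the gate; anything else keeps the
pre-commit behavior.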

Transport choice: OTLP/HTTP (proto-binary over HTTPS), NOT
OTLP/gRPC. Both are valid OTel transports, and most downstream
collectors accept either. HTTP keeps certctl's dep surface narrow:
gRPC pulls in google.golang.org/grpc + the full genproto stack,
which would expand binary size + supply-chain attack surface for
a feature that today emits zero spans. Operators with gRPC-only
collectors can put an OTel Collector in front that receives
OTLP/HTTP and forwards over gRPC. Swapping to gRPC later is a
single-import change.

Files
=====
- internal/observability/otel.go: new Init function. Gated by
  CERTCTL_OTEL_ENABLED. Builds an OTLP/HTTP exporter, wraps it in
  a BatchSpanProcessor, installs the resulting SDK tracer provider
  as the otel global tracer provider, and returns its shutdown
  function. Disabled mode returns a no-op shutdown so callers can
  defer it unconditionally.
- internal/observability/otel_test.go: 3 tests — disabled-mode
  no-op (global tracer provider unchanged), enabled-mode
  registers an SDK tracer provider, OTEL_SERVICE_NAME flows
  through resource.WithFromEnv.
- internal/config/config.go: new ObservabilityConfig sub-config
  with a single OTelEnabled bool. Single env var
  (CERTCTL_OTEL_ENABLED); everything else flows through the
  standard OTEL_* env vars the OTel SDK honors directly via
  resource.WithFromEnv + otlptracehttp.New. Deliberately no
  CERTCTL_OTEL_SERVICE_NAME / CERTCTL_OTEL_ENDPOINT etc. —
  avoids the lying-field footgun where an env var exists in
  config but doesn't reach the consumer.
- cmd/server/main.go: wire observability.Init unconditionally
  near the existing demo / RFC1918 startup banners. The deferred
  shutdown gets a 5-second timeout so an unreachable collector
  doesn't hang process exit.
- go.mod: promote go.opentelemetry.io/otel + otel/sdk +
  otlptracehttp from indirect → direct (the other four
  pre-existing otel deps stay where go mod resolution puts them).
- go.sum: refreshed deps.

The genproto split (newer genproto/googleapis/{api,rpc} submodules
vs the old monolithic genproto module) needed an explicit
google.golang.org/genproto pin to a post-split pseudo-version to
resolve cleanly — included in this commit's go.mod.

Verified locally: gofmt clean, go vet clean, staticcheck clean
across internal/observability + internal/config + cmd/server;
go test -short -count=1 green on all three; `go build ./cmd/server`
produces a 30.9MB binary that boots; targeted tests
(TestInit_Disabled_NoOp / TestInit_Enabled_RegistersTracerProvider /
TestInit_Enabled_RespectsOTEL_SERVICE_NAME) all PASS.
shankar0123
2026-05-16 19:45:42 +00:00
parent 5c5bbedc7e
commit 35277c0f2c
6 changed files with 383 additions and 3 deletions
+46
@@ -118,6 +118,39 @@ type Config struct {
// only field is BlockRFC1918Outbound; future egress-policy knobs
// (per-host allowlists, max-dial-time overrides) go here.
Network NetworkConfig
// Observability holds the optional OpenTelemetry seed config.
// Acquisition-audit DEPL-006 closure (Sprint 6 ACQ, 2026-05-16).
// Default OTelEnabled=false; operators opt in via CERTCTL_OTEL_ENABLED=true.
Observability ObservabilityConfig
}
// ObservabilityConfig is the operator-facing config surface for the
// OTel seed. Acquisition-audit DEPL-006 closure (Sprint 6 ACQ,
// 2026-05-16). Plumbed through to internal/observability.Init at
// boot from cmd/server/main.go.
//
// The single gate is CERTCTL_OTEL_ENABLED. Everything else (endpoint,
// headers, service name, resource attributes) flows through the
// standard OTEL_* env vars that the OTel SDK's resource.WithFromEnv
// and otlptracehttp.New honor directly. The transport itself is
// fixed at OTLP/HTTP by the choice of exporter package. There is no
// certctl-specific re-implementation of those env vars, which avoids
// the "lying field" footgun where an env var exists in code but
// doesn't reach the consumer.
type ObservabilityConfig struct {
// OTelEnabled gates the optional OpenTelemetry tracer-provider
// initialization. Default false (zero behavior change for
// operators who don't opt in). When true, the boot path builds
// an OTLP/HTTP exporter and registers an SDK tracer provider
// backed by it as the otel global tracer provider. Set via
// CERTCTL_OTEL_ENABLED.
//
// Per-handler / per-query / per-connector span instrumentation
// is NOT added by Sprint 6. This commit stands up the surface
// only; instrumentation is a v2.3 follow-up. Operators who
// enable the toggle today get a tracer provider configured with
// process-level resource attributes, but certctl itself emits
// no domain spans until the v2.3 work lands.
OTelEnabled bool
}
// NetworkConfig is the outbound-egress policy surface for certctl.
@@ -797,6 +830,19 @@ func Load() (*Config, error) {
Network: NetworkConfig{
BlockRFC1918Outbound: getEnvBool("CERTCTL_BLOCK_RFC1918_OUTBOUND", false),
},
// Acquisition-audit DEPL-006 closure (Sprint 6 ACQ,
// 2026-05-16). Optional OpenTelemetry seed. Default OTelEnabled=false
// preserves zero-overhead behavior for operators who don't opt
// in; the boot path calls observability.Init unconditionally
// (observability.Init short-circuits to a no-op shutdown when
// disabled). Operators set CERTCTL_OTEL_ENABLED=true plus the
// standard OTEL_* env vars (OTEL_EXPORTER_OTLP_ENDPOINT, etc.)
// to wire spans to their collector. Per-handler / per-query
// instrumentation is a v2.3 roadmap follow-up; this sprint
// stands up the surface only.
Observability: ObservabilityConfig{
OTelEnabled: getEnvBool("CERTCTL_OTEL_ENABLED", false),
},
}
// Parse CERTCTL_API_KEYS_NAMED for named key authentication (M-002).