Mirror of https://github.com/shankar0123/certctl.git (synced 2026-06-07 13:51:36 +00:00)
feat(observability): DEPL-006 — OpenTelemetry seed (surface only; no spans yet)

Acquisition-audit DEPL-006 closure (Sprint 6 ACQ, 2026-05-16).

Before 2026-05-16, go.mod listed go.opentelemetry.io/otel,
otel/metric, otel/trace, otelhttp, and auto/sdk all as indirect
deps (pulled transitively by AWS / Azure SDKs at v1.41.0). The
SDK was never initialized: the global otel.GetTracerProvider()
returned a no-op provider, and certctl emitted zero spans.

This commit stands up the surface so operators with an OTel
collector can opt in via CERTCTL_OTEL_ENABLED=true without code
changes. It does NOT add per-handler / per-query / per-connector
span instrumentation; that is a v2.3 roadmap follow-up. The
DEPL-006 audit finding is closed by the surface being present.
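
As a rough illustration of the opt-in toggle described above, the sketch below reads CERTCTL_OTEL_ENABLED as a boolean. The helper name otelEnabled and the choice to treat unset or unparseable values as disabled are assumptions made for this example; certctl's actual config loader is not shown in this commit and may behave differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// otelEnabled reports whether the CERTCTL_OTEL_ENABLED toggle is on.
// Assumption: unset, empty, or unparseable values count as disabled.
// The getenv parameter is injected so the function is easy to test.
func otelEnabled(getenv func(string) string) bool {
	v, err := strconv.ParseBool(getenv("CERTCTL_OTEL_ENABLED"))
	return err == nil && v
}

func main() {
	fmt.Println("tracing enabled:", otelEnabled(os.Getenv))
}
```

Defaulting to disabled on a parse error keeps the toggle strictly opt-in, which matches the stated intent that nothing changes unless an operator sets the variable.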

Transport choice: OTLP/HTTP (binary protobuf over HTTPS), NOT
OTLP/gRPC. Both are valid OTel transports; downstream collectors
accept either. HTTP keeps certctl's dep surface narrow: gRPC
pulls in google.golang.org/grpc + the full genproto stack, which
would expand binary size + supply-chain attack surface for a
feature that today emits zero spans. Operators with gRPC-only
collectors can put an OTel Collector in front that receives
OTLP/HTTP and re-exports over gRPC. Swapping to gRPC later is a
single-import change.
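
For readers unfamiliar with how an OTLP/HTTP exporter finds its collector, here is a minimal sketch of the endpoint rule from the OpenTelemetry exporter specification: a signal-specific variable is used as-is, while the generic base endpoint gets /v1/traces appended. The function name tracesEndpoint is hypothetical, and this is not the otlptracehttp implementation; defaults and TLS handling are deliberately left out.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// tracesEndpoint resolves the OTLP/HTTP traces URL from the standard
// OTEL_* environment variables. It returns ok=false when neither
// variable is set (a real exporter would fall back to its default).
func tracesEndpoint(getenv func(string) string) (url string, ok bool) {
	// The signal-specific variable wins and is used without modification.
	if v := getenv("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"); v != "" {
		return v, true
	}
	// The generic endpoint is a base URL; the traces path is appended.
	if base := getenv("OTEL_EXPORTER_OTLP_ENDPOINT"); base != "" {
		return strings.TrimRight(base, "/") + "/v1/traces", true
	}
	return "", false
}

func main() {
	if url, ok := tracesEndpoint(os.Getenv); ok {
		fmt.Println("traces would be posted to:", url)
	} else {
		fmt.Println("no OTLP endpoint configured; exporter default applies")
	}
}
```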

Files
=====
- internal/observability/otel.go: new Init function. Gated by
  CERTCTL_OTEL_ENABLED. Builds an OTLP/HTTP exporter, wraps it in
  a BatchSpanProcessor, installs the result as the otel global
  tracer provider, and returns a shutdown function. Disabled mode
  returns a no-op shutdown so callers can defer it unconditionally.
- internal/observability/otel_test.go: 3 tests covering disabled
  mode as a no-op (global tracer provider unchanged), enabled mode
  registering an SDK tracer provider, and OTEL_SERVICE_NAME flowing
  through resource.WithFromEnv.
- internal/config/config.go: new ObservabilityConfig sub-config
  with a single OTelEnabled bool, driven by a single env var
  (CERTCTL_OTEL_ENABLED); everything else flows through the
  standard OTEL_* env vars the OTel SDK honors directly via
  resource.WithFromEnv + otlptracehttp.New. Deliberately no
  CERTCTL_OTEL_SERVICE_NAME / CERTCTL_OTEL_ENDPOINT etc., which
  avoids the lying-field footgun where an env var exists in
  config but doesn't reach the consumer.
- cmd/server/main.go: wire observability.Init unconditionally
  near the existing demo / RFC1918 startup banners. The deferred
  shutdown gets a 5-second timeout so an unreachable collector
  doesn't hang process exit.
- go.mod: promote go.opentelemetry.io/otel + otel/sdk +
  otlptracehttp from indirect → direct (the four pre-existing
  otel deps stay where go mod resolution puts them).
- go.sum: refreshed deps.

The genproto split (newer genproto/googleapis/{api,rpc} submodules
vs the old monolithic genproto module) needed an explicit
google.golang.org/genproto pin to a post-split pseudo-version to
resolve cleanly; the pin is included in this commit's go.mod.

Verified locally: gofmt clean, go vet clean, staticcheck clean
across internal/observability + internal/config + cmd/server;
go test -short -count=1 green on all three; `go build ./cmd/server`
produces a 30.9MB binary that boots; targeted tests
(TestInit_Disabled_NoOp / TestInit_Enabled_RegistersTracerProvider /
TestInit_Enabled_RespectsOTEL_SERVICE_NAME) all PASS.
@@ -41,6 +41,7 @@ import (
 	"github.com/certctl-io/certctl/internal/crypto/signer"
 	"github.com/certctl-io/certctl/internal/domain"
 	authdomainAlias "github.com/certctl-io/certctl/internal/domain/auth"
+	"github.com/certctl-io/certctl/internal/observability"
 	"github.com/certctl-io/certctl/internal/ratelimit"
 	"github.com/certctl-io/certctl/internal/repository/postgres"
 	"github.com/certctl-io/certctl/internal/scep/intune"
@@ -158,6 +159,36 @@ func main() {
 		logger.Info("RFC1918 outbound block ENABLED (CERTCTL_BLOCK_RFC1918_OUTBOUND=true) — 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 are reserved for outbound HTTP egress AND for the network scanner")
 	}
 
+	// Acquisition-audit DEPL-006 closure (Sprint 6 ACQ, 2026-05-16).
+	// Optional OpenTelemetry seed. Init returns a no-op shutdown when
+	// CERTCTL_OTEL_ENABLED is unset/false — defer'ing it
+	// unconditionally is safe. The OTLP/HTTP exporter connects lazily,
+	// so an unreachable collector surfaces as failed export attempts
+	// in the SDK's internal error log, NOT as a boot-time failure.
+	//
+	// Sprint 6 stands up the surface only — no per-handler /
+	// per-query / per-connector spans are emitted yet (v2.3 roadmap
+	// follow-up). Operators enabling the toggle today see process-
+	// level resource attributes and any spans the OTel SDK emits
+	// internally; no certctl-domain spans until v2.3.
+	otelShutdown, err := observability.Init(context.Background(), observability.Config{
+		Enabled: cfg.Observability.OTelEnabled,
+	})
+	if err != nil {
+		logger.Error("failed to initialize OpenTelemetry", "error", err)
+		os.Exit(1)
+	}
+	defer func() {
+		shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		defer cancel()
+		if err := otelShutdown(shutdownCtx); err != nil {
+			logger.Warn("OpenTelemetry shutdown returned error", "error", err)
+		}
+	}()
+	if cfg.Observability.OTelEnabled {
+		logger.Info("OpenTelemetry tracing ENABLED (CERTCTL_OTEL_ENABLED=true) — OTLP/HTTP exporter wired; honors OTEL_EXPORTER_OTLP_ENDPOINT + other OTEL_* env vars. Per-handler instrumentation is a v2.3 roadmap follow-up; this release stands up the surface only.")
+	}
+
 	// Phase 6 SCALE-M3 closure (2026-05-14): operator-overridable
 	// package-level default for the asyncpoll MaxWait fallback.
 	// Per-connector overrides (CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS,