mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:01:32 +00:00
docs(observability): DEPL-006 follow-up — document CERTCTL_OTEL_ENABLED (G-3 ci-guard)
Sprint 6 ACQ DEPL-006 closure follow-up. The G-3-env-docs-drift
ci-guard scans `internal/` + `cmd/` for every CERTCTL_*
env-var reference and cross-checks against README + docs/ +
deploy/helm/ + deploy/ENVIRONMENTS.md. The OTel-seed commit
(35277c0) introduced `CERTCTL_OTEL_ENABLED` in
`internal/config/config.go` + `cmd/server/main.go` but didn't
add the matching doc entry, so the guard caught the drift on
the next CI run with:
G-3 regression: env var(s) defined in Go source but never documented:
CERTCTL_OTEL_ENABLED
Replaces the existing "Tracing — explicitly not yet shipped"
subsection in docs/operator/observability.md with an honest
"Tracing — OTLP surface available, instrumentation pending"
section that:
- Documents the env var + the standard OTEL_* env vars the SDK
honors (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_SERVICE_NAME, etc.).
- Explains the OTLP/HTTP transport choice (vs gRPC) per the
rationale in internal/observability/otel.go's header.
- Pins what the current release DOES (surface + lazy connect +
graceful shutdown) vs DOES NOT (per-handler / per-DB /
per-connector spans).
- Notes the no-op-shutdown contract so operators can defer
unconditionally.
- Cross-references the existing request_id correlation + per-
issuer Prometheus histogram as the interim correlation surface.
- Repoints the "future work" tracker from the old "v3 item"
framing to WORKSPACE-ROADMAP.md §2 (Phase 4 in the path-b
build plan).
Verified locally: `bash scripts/ci-guards/G-3-env-docs-drift.sh`
exits 0 ("G-3 env-docs-drift: clean").
This commit is contained in:
@@ -74,22 +74,55 @@ metric surface meet our SLO needs today" — not "is the right library
|
||||
under the hood." If the answer to the first question is yes, the
|
||||
second is a refactor, not a feature gap.
|
||||
|
||||
## Tracing — explicitly not yet shipped
|
||||
## Tracing — OTLP surface available, instrumentation pending
|
||||
|
||||
certctl does **not** ship distributed tracing instrumentation today:
|
||||
Sprint 6 ACQ DEPL-006 closure (2026-05-16) stood up the OTel tracer-
|
||||
provider surface. Operators with an OTel collector can opt in via:
|
||||
|
||||
- No OpenTelemetry SDK setup in `cmd/server/main.go`.
|
||||
- No OTLP exporter wired into outbound calls (issuer connectors,
|
||||
agent enrollment, etc.).
|
||||
- The `go.opentelemetry.io/otel` packages that appear in
|
||||
[`go.mod`](../../go.mod) are indirect-only — they're transitive
|
||||
dependencies of `coreos/go-oidc` and similar.
|
||||
```
|
||||
CERTCTL_OTEL_ENABLED=true
|
||||
OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com:4318
|
||||
```
|
||||
|
||||
This is honest: there is no in-process tracing surface to monitor,
|
||||
correlate, or sample. If your environment requires end-to-end traces
|
||||
across the certctl control plane + agents + issuer backends, this is
|
||||
a gap you would close on the certctl side as part of a v3 work item.
|
||||
Until then:
|
||||
When `CERTCTL_OTEL_ENABLED` is true, `cmd/server/main.go` calls
|
||||
`internal/observability.Init` which:
|
||||
|
||||
- Constructs an OTLP/HTTP exporter (chosen over OTLP/gRPC to keep
|
||||
the dependency surface narrow — see `internal/observability/otel.go`
|
||||
header for the transport-choice rationale).
|
||||
- Registers a real `sdktrace.TracerProvider` as the otel global.
|
||||
- Honors the standard OTel env vars (`OTEL_EXPORTER_OTLP_ENDPOINT`,
|
||||
`OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_EXPORTER_OTLP_INSECURE`,
|
||||
`OTEL_SERVICE_NAME` overrides the default `certctl-server`, etc.).
|
||||
- Defers a graceful shutdown that flushes the in-flight batcher.
|
||||
|
||||
What this **does not** ship yet:
|
||||
|
||||
- No per-handler / per-DB / per-connector span instrumentation in
|
||||
the certctl code base. The OTel SDK emits the spans it generates
|
||||
internally (process resource attributes, eventual stdlib HTTP
|
||||
spans), but certctl-domain spans (issuance, renewal, deployment,
|
||||
agent enrollment) are a v2.3 roadmap follow-up.
|
||||
- No tracing-correlated metric exemplars in the Prometheus
|
||||
histograms above. Those still ship the per-issuer latency signal
|
||||
without per-request fan-out.
|
||||
- No backwards-compat shim — operators who never set
|
||||
`CERTCTL_OTEL_ENABLED` (the default) see zero behavior change.
|
||||
The init returns a no-op shutdown so the deferred call is safe
|
||||
to invoke unconditionally.
|
||||
|
||||
When this matters today:
|
||||
|
||||
- Operators wiring up a v3 instrumentation effort have the OTel
|
||||
surface in place; they only need to add `tracer.Start(ctx, "…")`
|
||||
call sites in the handler/service code.
|
||||
- Operators evaluating certctl for acquisition / due-diligence see
|
||||
an opt-in OTel surface in the current release rather than a "v3
|
||||
roadmap item" — a useful signal for buyer credibility per the
|
||||
acquisition-thesis framing in `WORKSPACE-ROADMAP.md` §3.
|
||||
|
||||
Existing correlation surfaces stay in place until span coverage
|
||||
ships:
|
||||
|
||||
- Structured logs include a `request_id` you can correlate across
|
||||
the server log stream. See
|
||||
@@ -99,8 +132,9 @@ Until then:
|
||||
same per-issuer latency signal a trace span would, just without
|
||||
the per-request fan-out.
|
||||
|
||||
OpenTelemetry instrumentation is tracked in
|
||||
[WORKSPACE-ROADMAP.md](../../WORKSPACE-ROADMAP.md) as a v3 item.
|
||||
Per-handler / per-query / per-connector span instrumentation is
|
||||
tracked in [WORKSPACE-ROADMAP.md](../../WORKSPACE-ROADMAP.md) under
|
||||
§2 (NHI / Agent Identity, Phase 4 in the path-b build plan).
|
||||
|
||||
## Logging
|
||||
|
||||
|
||||
Reference in New Issue
Block a user