Trivial whitespace fix: gofmt collapsed three trailing-comment columns
that I'd hand-aligned in the test file. Local sandbox missed this
because the per-file gofmt run earlier in the commit cycle was scoped
to the changed-files list and didn't include the test file at the
final write moment; CI's project-wide `gofmt -l .` caught it.
Behavior unchanged.
Closes the #4 acquisition-readiness blocker from the 2026-05-01 issuer
coverage audit. Before this commit, certctl's Prometheus exposition had
zero per-issuer-type signal — operators answering "is DigiCert slow?"
or "is Sectigo failing more than ACME?" had to grep logs by issuer
name. This commit adds three series labelled by issuer type:
certctl_issuance_total{issuer_type, outcome}
certctl_issuance_duration_seconds{issuer_type} (histogram)
certctl_issuance_failures_total{issuer_type, error_class}
The histogram covers 0.05–120 second buckets to span the local-issuer
fast path and async-CA slow path (DigiCert/Sectigo/Entrust polling
can take minutes). error_class is a closed enum of eight values
(timeout, auth, rate_limited, validation, upstream_5xx, upstream_4xx,
network, other) classified once in service.ClassifyError. Cardinality
budget is ~276 new series, well within Prometheus's comfortable range.
Implementation:
- service.IssuanceMetrics is the thread-safe counter + histogram
table. Three independent views (counters / failures / durations)
exposed via SnapshotCounters / SnapshotFailures / SnapshotDurations.
sync.RWMutex protects the map shape; per-key sync/atomic.Uint64
primitives keep the recording hot path lock-free under concurrent
service-layer goroutines.
- service.IssuanceCounterEntry / IssuanceFailureEntry /
IssuanceDurationEntry / IssuanceMetricsSnapshotter live in service
(not handler) to avoid an import cycle: handler already imports
service for admin_est.go etc., so service can't import handler back.
Handler's exposer takes the snapshotter via the service-defined
interface.
- service.ClassifyError pure function maps error → error_class.
context.DeadlineExceeded / context.Canceled → timeout; *net.OpError
→ network; substring matches against canonical AWS / DigiCert /
Sectigo error shapes for auth / rate_limited / validation /
upstream_5xx / upstream_4xx / network; unknown → other. Each branch
has at least one representative test case in
TestClassifyError.
- IssuerConnectorAdapter.SetMetrics wires per-adapter recording
(issuerType + metrics). Existing 28+ test call sites of
NewIssuerConnectorAdapter keep their one-arg signature; production
wiring goes through SetMetrics post-construction.
- IssuerRegistry.SetIssuanceMetrics + Rebuild type-asserts to
*IssuerConnectorAdapter and calls SetMetrics with the issuer type
string. nil-guarded — tests that hand-build adapters without
metrics get no-op recording.
- IssuerConnectorAdapter.IssueCertificate / RenewCertificate wrap the
underlying connector call with start := time.Now() and
recordIssuance(start, err). Renewal is recorded into the same
certctl_issuance_* series as initial issuance — operationally,
renewal IS issuance from the connector's perspective (matches the
audit prompt's guidance on series naming).
- handler/metrics.go GetPrometheusMetrics gains a new exposer block
emitting all three series in stable label order with correct
Prometheus format (_bucket / _sum / _count for the histogram, +Inf
bucket appended). Sorted via sort.Slice for stable output. nil-
guarded so deploys without the wire produce clean exposition.
- formatLE helper trims trailing zeros from histogram bucket labels
via strconv.FormatFloat(le, 'f', -1, 64) so the `le` labels match
Prometheus client conventions ("0.05", "30", "120", not "0.0500"
etc.).
- cmd/server/main.go wires a single IssuanceMetrics instance into
both the IssuerRegistry (recording) and the MetricsHandler (exposer)
using DefaultIssuanceBucketBoundaries.
Tests:
- TestIssuanceMetrics_RecordAndSnapshot — happy-path counter +
histogram + failure recording, BucketBoundaries returns a copy
(not shared storage).
- TestIssuanceMetrics_HistogramCumulative — pins the cumulative-buckets
contract. 100ms observation lands in 0.1 bucket and every larger
bucket; 750ms only in the 1.0 bucket. Off-by-one here would
corrupt every quantile query downstream.
- TestIssuanceMetrics_Concurrency — 100 goroutines × 1000 ops under
the race detector. Asserts atomic counter integrity across
contended writes.
- TestClassifyError — 17 cases covering every branch of the closed
enum plus the nil-error special case.
Implementation chooses the existing hand-rolled fmt.Fprintf
exposition pattern (no prometheus/client_golang dependency added)
to stay consistent with the OCSP / deploy counter blocks already in
the file.
Out of scope (separate follow-ups):
- Revocation metrics (certctl_revocation_*) — symmetric to issuance
but the audit didn't ask; explicit follow-up commit.
- Discovery / health-check duration histograms.
- prometheus/client_golang migration.
Verified locally:
- gofmt clean
- go vet ./... clean
- staticcheck ./... clean
- golangci-lint run --timeout 5m ./... → 0 issues
- go test -short -count=1 ./internal/service/ green
- go test -short -count=1 -race -run TestIssuanceMetrics ./internal/service/ green
- go test -short -count=1 ./internal/api/handler/ green
- go build ./... success
Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix#4 (Part 3, narrative section).