asyncpoll: refactor Sectigo / Entrust / GlobalSign to bounded polling (Phase 2)

Phase 2 of the #5 acquisition-readiness fix from the 2026-05-01 issuer
coverage audit. Phase 1 (commit 711265b) shipped the shared asyncpoll
package and refactored DigiCert as the reference. This commit applies
the same pattern to the remaining three async-CA connectors and adds
the operator-facing docs.

Per-connector refactors:

- Sectigo (sectigo.go): GetOrderStatus now wraps pollEnrollmentOnce in
  asyncpoll.Poll. The collectNotReady sentinel (cert approved by SCM
  but not yet retrievable from the collect endpoint) maps to
  StillPending and rides the backoff schedule rather than the prior
  "return pending immediately" branch. Added an isPermanentStatusError
  helper to distinguish transient HTTP errors (5xx / 429 / network)
  from permanent ones (4xx other than 429 / parse failure); the errors
  wrapped by checkStatus are triaged at the poll-closure boundary.

- Entrust (entrust.go): GetOrderStatus wraps pollEnrollmentOnce. The
  AWAITING_APPROVAL status maps to StillPending; operators using
  approval-pending workflows where humans approve enrollments should
  bump CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS to 86400 (24h) so a
  single scheduler tick can wait through the approval window. The
  default 10-minute deadline matches the other three connectors.

- GlobalSign (globalsign.go): GetOrderStatus wraps pollCertificateOnce.
  GlobalSign tracks orders by serial number rather than order ID, but
  the polling shape is identical to the other three. Status-code
  triage matches DigiCert: 4xx (not 429) is permanent, 5xx / 429 /
  network is transient.

Per-connector Config fields added:
- DigiCert.PollMaxWaitSeconds (env CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS)
- Sectigo.PollMaxWaitSeconds (env CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS)
- Entrust.PollMaxWaitSeconds (env CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS)
- GlobalSign.PollMaxWaitSeconds (env CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS)

internal/config/config.go env-var loaders updated for all four. Default
is 600 seconds (10 minutes); zero falls back to the asyncpoll package
default.

Test-helper updates: every existing test that exercises the pending
branch (collectNotReady, AWAITING_APPROVAL, status="pending", etc.)
now sets PollMaxWaitSeconds=1 in its Config so the test doesn't block
on the production-default 10-minute deadline. Tests that exercise
permanent-error branches (404, 401, malformed JSON, etc.) continue
to return immediately.

Test sites updated:
- buildSectigoConnector helper + GetOrderStatus_CollectNotReady test
- buildEntrustConnector helper + GetOrderStatus_Pending test
- buildGlobalsignConnector helper + GetOrderStatus_Pending test +
  the GetHTTPClient_NoMTLSCertPaths test (network failure now rides
  the backoff schedule rather than returning immediately)

Documentation:
- docs/async-polling.md: new operator reference covering the backoff
  schedule, status-code triage, the four env vars, failure modes, and
  where the implementation lives. Audit blocker citation included.
- docs/connectors.md: per-issuer sections for DigiCert, Sectigo,
  Entrust, GlobalSign each gain the PollMaxWaitSeconds env var row
  and a cross-link to async-polling.md.

Lint cleanup: simplified the isPermanentStatusError branch to satisfy
staticcheck S1008 (single-line return for a final boolean check).

Verified locally:
- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- golangci-lint run --timeout 5m ./... → 0 issues
- go test -short -count=1 across all 4 connector packages + config + asyncpoll: green

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix #5 — Phase 2.
shankar0123
2026-05-02 02:41:36 +00:00
parent 633a10aa4e
commit 0509790325
12 changed files with 523 additions and 122 deletions

docs/async-polling.md (new file, +118)

# Async-CA Polling — Operator Reference
Closes audit fix #5 from the 2026-05-01 issuer-coverage acquisition-readiness audit.
## What this is
Four issuer connectors talk to Certificate Authorities that issue
certificates **asynchronously**: `IssueCertificate` returns an order
ID immediately, and the caller (or scheduler) must later call
`GetOrderStatus` to retrieve the issued cert. The affected connectors are:
- **DigiCert** (CertCentral)
- **Sectigo** (Certificate Manager)
- **Entrust** (Certificate Services / CA Gateway)
- **GlobalSign** (Atlas HVCA)

Pre-fix, each connector's `GetOrderStatus` made one HTTP call per
invocation, with no exponential backoff, no retry cap, and no deadline.
Under a renewal sweep, certctl would hammer the upstream CA's
rate-limit budget. A 429 response was treated as a hard error, so the
scheduler retried on the next tick and re-issued the same request that
had just been rate-limited.

Post-fix, `GetOrderStatus` blocks for up to `PollMaxWait` (default
10 minutes) doing **bounded internal polling**:
```
attempt 1 → wait 5s → attempt 2 → wait 15s → attempt 3 → wait 45s →
attempt 4 → wait 2m → attempt 5 → wait 5m → ... (capped at 5m)
```
A ±20% jitter is applied to every wait so that multiple certctl
instances are unlikely to synchronize on the upstream CA's rate-limit
window. The `PollMaxWait` deadline is a hard cap: if the upstream still
hasn't completed by then, `GetOrderStatus` returns `StillPending` and
the scheduler can re-enqueue the job for a future tick.
## Status-code triage
Each connector classifies HTTP responses to drive polling decisions:

| Response | Meaning | Decision |
|---|---|---|
| 2xx + status="issued"/"completed" | Cert ready | Done: return the cert |
| 2xx + status="pending"/"processing" | Still working | StillPending: keep polling |
| 2xx + status="rejected"/"denied"/"failed" | Permanent | Done: return `OrderStatus{Status:"failed"}` |
| 2xx + parse failure | Body is broken | Failed: return error |
| 4xx other than 429 (e.g., 400/401/403/404) | Permanent client error | Failed: return error |
| 429 (rate limited) | Transient | StillPending: keep polling with backoff |
| 5xx | Transient | StillPending: keep polling with backoff |
| Network / TLS error | Transient | StillPending: keep polling with backoff |
## Operator tuning
Each connector exposes a `PollMaxWaitSeconds` config field and
matching env var:

| Connector | Env var | Default |
|---|---|---|
| DigiCert | `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| Sectigo | `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| Entrust | `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| GlobalSign | `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | 600 (10m) |

Tune up (e.g., `86400` = 24 hours) for **Entrust approval-pending
workflows**, where humans manually approve enrollments. Tune down (e.g.,
`60`) for high-throughput environments that would rather hand pending
orders back to the scheduler quickly than block a renewal goroutine for
minutes. Setting the field to 0 falls back to the package default in
`internal/connector/issuer/asyncpoll`; leaving the env var unset yields
the 600-second default applied by the config loader.
## Failure modes
**Upstream returns 429 forever.** The Poller follows the backoff schedule
(5s → 15s → 45s → 2m → 5m), so a sustained stream of 429s uses up the
full `PollMaxWait` budget in at most 7-8 attempts, instead of the ~600
attempts a 1-second poll loop would make. After `PollMaxWait` expires,
`GetOrderStatus` returns `StillPending` and the scheduler re-enqueues the
order for the next tick. Per pending order, request volume against the
upstream is bounded by `tick interval / minimum backoff` and is typically
1-2 requests per minute, even under heavy load.


**Sectigo `collectNotReady` sentinel.** When the SCM status endpoint
reports `Issued` but the cert collect endpoint isn't ready yet, the old
code branched into a special "pending" return. That branch now returns
`StillPending` from the poll closure, so cert collection rides the same
backoff schedule.


**Entrust approval-pending.** The `AWAITING_APPROVAL` status maps to
`StillPending`. With the default `PollMaxWait=10m`, the scheduler
re-enqueues the enrollment on every tick until approval happens; with
`PollMaxWait=24h`, a single renewal goroutine waits through the full
approval window. Pick the latter when you have many approval-pending
enrollments per tick.

## Where the implementation lives
- `internal/connector/issuer/asyncpoll/asyncpoll.go`: shared `Poller`
  with backoff math, jitter, deadline, and ctx-aware cancellation.
- `internal/connector/issuer/digicert/digicert.go`: `pollOrderOnce` +
  `GetOrderStatus` orchestrator.
- `internal/connector/issuer/sectigo/sectigo.go`: `pollEnrollmentOnce` +
  status-code permanence triage (`isPermanentStatusError`).
- `internal/connector/issuer/entrust/entrust.go`: `pollEnrollmentOnce` +
  approval-pending mapping.
- `internal/connector/issuer/globalsign/globalsign.go`:
  `pollCertificateOnce` (serial-number tracking).
- `internal/connector/issuer/asyncpoll/asyncpoll_test.go`: 11 unit
  tests covering the happy path, transient-then-success, Failed
  termination, MaxWait timeout, last-error wrap, ctx cancel,
  multiplicative backoff, jitter bounds, and defaults.
## Audit blocker reference
cowork/issuer-coverage-audit-2026-05-01/RESULTS.md, Top-10 fix #5
(Part 1.5 finding #4: "No polling backoff for async CAs").

docs/connectors.md (+6 -2)
@@ -436,8 +436,9 @@ The DigiCert connector integrates with DigiCert's CertCentral REST API for order
| `CERTCTL_DIGICERT_ORG_ID` | — | DigiCert organization ID |
| `CERTCTL_DIGICERT_PRODUCT_TYPE` | `ssl_basic` | Certificate product (e.g., `ssl_basic`, `ssl_plus`, `ssl_ev`) |
| `CERTCTL_DIGICERT_BASE_URL` | `https://www.digicert.com/services/v2` | DigiCert API base URL |
| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. See [docs/async-polling.md](async-polling.md). |

The connector submits certificate orders to DigiCert's `/order/certificate/create` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by DigiCert) and poll-based completion. `GetOrderStatus` checks order status via `/order/certificate/{order_id}` using bounded internal polling (5s/15s/45s/2m/5m capped, ±20% jitter, default 10-minute deadline); see [async-polling.md](async-polling.md).

**Authentication:** API key passed via `X-DC-DEVKEY` header, with organization ID in request body.
@@ -460,8 +461,9 @@ The Sectigo connector integrates with Sectigo Certificate Manager's REST API for
| `CERTCTL_SECTIGO_CERT_TYPE` | — | Certificate type ID (integer, from `/ssl/v1/types`) |
| `CERTCTL_SECTIGO_TERM` | `365` | Certificate validity in days |
| `CERTCTL_SECTIGO_BASE_URL` | `https://cert-manager.com/api` | Sectigo API base URL |
| `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. The `collectNotReady` sentinel (cert approved but not yet retrievable) rides the same backoff schedule. See [docs/async-polling.md](async-polling.md). |

The connector submits certificate enrollments to Sectigo's `/ssl/v1/enroll` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by Sectigo) and poll-based completion. `GetOrderStatus` checks enrollment status via `/ssl/v1/{sslId}` using bounded internal polling and downloads the PEM bundle via `/ssl/v1/collect/{sslId}/pem` once the certificate is issued; see [async-polling.md](async-polling.md).

**Authentication:** Three custom headers on every request — `customerUri`, `login`, and `password`.
@@ -566,6 +568,7 @@ Entrust CA Gateway REST API with mutual TLS (mTLS) client certificate authentica
| `CERTCTL_ENTRUST_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_ENTRUST_CA_ID` | Yes | — | Certificate Authority ID (from `GET /certificate-authorities`) |
| `CERTCTL_ENTRUST_PROFILE_ID` | No | — | Optional enrollment profile ID |
| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. Approval-pending workflows where humans approve enrollments should bump to `86400` (24h) so a single tick can wait through the approval window. See [docs/async-polling.md](async-polling.md). |

**Authentication:** Mutual TLS. The client certificate and key are loaded via `tls.LoadX509KeyPair()` and attached to the HTTP transport; no API key or token is required.
@@ -587,6 +590,7 @@ GlobalSign Atlas High Volume CA REST API with dual authentication: mTLS for the
| `CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
| `CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
| `CERTCTL_GLOBALSIGN_SERVER_CA_PATH` | No | system trust store | PEM bundle used to verify the Atlas API server certificate. Set this for private/lab Atlas deployments whose server TLS chain is not in the host's default trust bundle. |
| `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. GlobalSign tracks orders by serial number rather than order ID; the polling shape is identical. See [docs/async-polling.md](async-polling.md). |

**Authentication:** Dual: an mTLS client certificate for the TLS handshake plus `X-API-Key` and `X-API-Secret` headers on every request.