mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:51:30 +00:00
asyncpoll: refactor Sectigo / Entrust / GlobalSign to bounded polling (Phase 2)

Phase 2 of the #5 acquisition-readiness fix from the 2026-05-01 issuer
coverage audit. Phase 1 (commit 711265b) shipped the shared asyncpoll
package and refactored DigiCert as the reference. This commit applies
the same pattern to the remaining three async-CA connectors and adds
the operator-facing docs.

Per-connector refactors:

- Sectigo (sectigo.go): GetOrderStatus now wraps pollEnrollmentOnce in
  asyncpoll.Poll. The collectNotReady sentinel (cert approved by SCM
  but not yet retrievable from the collect endpoint) maps to
  StillPending and rides the backoff schedule rather than the prior
  "return pending immediately" branch. Added isPermanentStatusError
  helper to distinguish transient HTTP errors (5xx / 429 / network)
  from permanent ones (4xx / parse failure); the wrapped checkStatus
  errors get triaged at the poll closure boundary.
- Entrust (entrust.go): GetOrderStatus wraps pollEnrollmentOnce. The
  AWAITING_APPROVAL status maps to StillPending; operators using
  approval-pending workflows where humans approve enrollments should
  bump CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS to 86400 (24h) so a
  single scheduler tick can wait through the approval window. The
  default 10-minute deadline matches the other three connectors.
- GlobalSign (globalsign.go): GetOrderStatus wraps pollCertificateOnce.
  GlobalSign tracks orders by serial number rather than order ID, but
  the polling shape is identical to the other three. Status-code
  triage matches DigiCert: 4xx (not 429) is permanent, 5xx / 429 /
  network is transient.

Per-connector Config field added:

- DigiCert.PollMaxWaitSeconds (env CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS)
- Sectigo.PollMaxWaitSeconds (env CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS)
- Entrust.PollMaxWaitSeconds (env CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS)
- GlobalSign.PollMaxWaitSeconds (env CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS)

internal/config/config.go env-var loaders updated for all four. Default
is 600 seconds (10 minutes); zero falls back to the asyncpoll package
default.

Test-helper updates: every existing test that exercises the pending
branch (collectNotReady, AWAITING_APPROVAL, status="pending", etc.)
now sets PollMaxWaitSeconds=1 in its Config so the test doesn't block
on the production-default 10-minute deadline. Tests that exercise
permanent-error branches (404, 401, malformed JSON, etc.) continue
to return immediately.

Test sites updated:

- buildSectigoConnector helper + GetOrderStatus_CollectNotReady test
- buildEntrustConnector helper + GetOrderStatus_Pending test
- buildGlobalsignConnector helper + GetOrderStatus_Pending test +
  the GetHTTPClient_NoMTLSCertPaths test (network failure now rides
  the backoff schedule rather than returning immediately)

Documentation:

- docs/async-polling.md: new operator reference covering the backoff
  schedule, status-code triage, the four env vars, failure modes, and
  where the implementation lives. Audit blocker citation included.
- docs/connectors.md: per-issuer sections for DigiCert, Sectigo,
  Entrust, GlobalSign each gain the PollMaxWaitSeconds env var row
  and a cross-link to async-polling.md.

Lint cleanup: simplified the isPermanentStatusError branch to satisfy
staticcheck S1008 (single-line return for a final boolean check).

Verified locally:

- gofmt -l . clean
- go vet ./... clean
- staticcheck ./... clean
- golangci-lint run --timeout 5m ./... → 0 issues
- go test -short -count=1 across all 4 connector packages + config + asyncpoll: green

Audit reference: cowork/issuer-coverage-audit-2026-05-01/RESULTS.md
Top-10 fix #5, Phase 2.
@@ -0,0 +1,118 @@
# Async-CA Polling: Operator Reference

Closes audit fix #5 from the 2026-05-01 issuer-coverage acquisition-readiness audit.

## What this is

Four issuer connectors talk to Certificate Authorities that issue
certificates **asynchronously**: `IssueCertificate` returns an order
ID immediately, and the caller (or scheduler) must call
`GetOrderStatus` later to retrieve the issued cert:

- **DigiCert** (CertCentral)
- **Sectigo** (Certificate Manager)
- **Entrust** (Certificate Services / CA Gateway)
- **GlobalSign** (Atlas HVCA)

Before this fix, each connector's `GetOrderStatus` made one HTTP call per
invocation with no exponential backoff, no retry cap, and no deadline.
Under a renewal sweep, certctl would hammer the upstream CA's
rate-limit budget. A 429 response was treated as a hard error, so the
scheduler retried on the next tick and re-fanned out the same call
that had just been rate-limited.

Post-fix, `GetOrderStatus` blocks for up to `PollMaxWait` (default
10 minutes) doing **bounded internal polling**:

```
attempt 1 → wait 5s → attempt 2 → wait 15s → attempt 3 → wait 45s →
attempt 4 → wait 2m → attempt 5 → wait 5m → ... (capped at 5m)
```

A ±20% jitter is applied to every wait so that multiple certctl
instances are unlikely to synchronize on the upstream CA's rate-limit
window. The `PollMaxWait` deadline is a hard cap; if the upstream still
hasn't completed by then, `GetOrderStatus` returns `StillPending` and
the scheduler can re-enqueue the job for a future tick.

## Status-code triage

Each connector classifies HTTP responses to drive polling decisions:

| Response | Meaning | Decision |
|---|---|---|
| 2xx + status="issued"/"completed" | Cert ready | Done: return the cert |
| 2xx + status="pending"/"processing" | Still working | StillPending: keep polling |
| 2xx + status="rejected"/"denied"/"failed" | Permanent | Done: return `OrderStatus{Status:"failed"}` |
| 2xx + parse failure | Body is broken | Failed: return error |
| 4xx other than 429 (404/400/401/403) | Permanent client error | Failed: return error |
| 429 (rate limited) | Transient | StillPending: keep polling with backoff |
| 5xx | Transient | StillPending: keep polling with backoff |
| Network / TLS error | Transient | StillPending: keep polling with backoff |

## Operator tuning

Each connector exposes a `PollMaxWaitSeconds` config field and
matching env var:

| Connector | Env var | Default |
|---|---|---|
| DigiCert | `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| Sectigo | `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| Entrust | `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | 600 (10m) |
| GlobalSign | `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | 600 (10m) |

Tune up (e.g., `86400` = 24 hours) for **Entrust approval-pending
workflows** where humans manually approve enrollments. Tune down (e.g.,
`60`) for high-throughput environments that prefer to recycle the
scheduler tick rather than block one renewal goroutine for minutes.

A value of 0 (or unset) falls back to the package default in
`internal/connector/issuer/asyncpoll`.

## Failure modes

**Upstream returns 429 forever.** The Poller respects the backoff
(5s → 15s → 45s → 2m → 5m), so a sustained 429 stream burns through
the full `PollMaxWait` budget with at most 7-8 attempts (instead of
~600 attempts at 1/sec). After `PollMaxWait` expires, `GetOrderStatus`
returns `StillPending`; the scheduler re-enqueues for the next tick.
The total request volume against the upstream is bounded by `tick
interval / minimum backoff`, typically 1-2 requests per minute even
under heavy load.

**Sectigo `collectNotReady` sentinel.** When the SCM status endpoint
reports `Issued` but the cert collect endpoint isn't yet ready, the
old code branched into a special "pending" return. Now that branch
returns `StillPending` from the poll closure, so the cert collection
rides the same backoff schedule.

**Entrust approval-pending.** The `AWAITING_APPROVAL` status maps to
`StillPending`. With the default `PollMaxWait=10m`, the scheduler
will re-enqueue once per tick if approval hasn't happened yet; with
`PollMaxWait=24h` the same renewal goroutine waits the full approval
window. Pick the latter when you have many approval-pending
enrollments per tick.

## Where the implementation lives

- `internal/connector/issuer/asyncpoll/asyncpoll.go`: shared `Poller`
  with backoff math, jitter, deadline, and ctx-aware cancellation.
- `internal/connector/issuer/digicert/digicert.go`:
  `pollOrderOnce` + `GetOrderStatus` orchestrator.
- `internal/connector/issuer/sectigo/sectigo.go`:
  `pollEnrollmentOnce` + status-code permanence triage
  (`isPermanentStatusError`).
- `internal/connector/issuer/entrust/entrust.go`:
  `pollEnrollmentOnce` + approval-pending mapping.
- `internal/connector/issuer/globalsign/globalsign.go`:
  `pollCertificateOnce` (serial-number tracking).
- `internal/connector/issuer/asyncpoll/asyncpoll_test.go`: 11 unit
  tests covering happy path, transient-then-success, Failed
  termination, MaxWait timeout, last-error wrap, ctx cancel,
  multiplicative backoff, jitter bounds, defaults.

## Audit blocker reference

cowork/issuer-coverage-audit-2026-05-01/RESULTS.md, Top-10 fix #5
(Part 1.5 finding #4: "No polling backoff for async CAs").

+6
-2
@@ -436,8 +436,9 @@ The DigiCert connector integrates with DigiCert's CertCentral REST API for order
 | `CERTCTL_DIGICERT_ORG_ID` | — | DigiCert organization ID |
 | `CERTCTL_DIGICERT_PRODUCT_TYPE` | `ssl_basic` | Certificate product (e.g., `ssl_basic`, `ssl_plus`, `ssl_ev`) |
 | `CERTCTL_DIGICERT_BASE_URL` | `https://www.digicert.com/services/v2` | DigiCert API base URL |
+| `CERTCTL_DIGICERT_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. See [docs/async-polling.md](async-polling.md). |

-The connector submits certificate orders to DigiCert's `/order/certificate/create` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by DigiCert) and poll-based completion. The connector periodically checks order status via `/order/certificate/{order_id}` until the certificate is available.
+The connector submits certificate orders to DigiCert's `/order/certificate/create` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by DigiCert) and poll-based completion. `GetOrderStatus` runs bounded internal polling (5s/15s/45s/2m/5m capped, ±20% jitter, default 10-minute deadline) — see [async-polling.md](async-polling.md).

 **Authentication:** API key passed via `X-DC-DEVKEY` header, with organization ID in request body.

@@ -460,8 +461,9 @@ The Sectigo connector integrates with Sectigo Certificate Manager's REST API for
 | `CERTCTL_SECTIGO_CERT_TYPE` | — | Certificate type ID (integer, from `/ssl/v1/types`) |
 | `CERTCTL_SECTIGO_TERM` | `365` | Certificate validity in days |
 | `CERTCTL_SECTIGO_BASE_URL` | `https://cert-manager.com/api` | Sectigo API base URL |
+| `CERTCTL_SECTIGO_POLL_MAX_WAIT_SECONDS` | `600` | Bounded-polling deadline for `GetOrderStatus`. The `collectNotReady` sentinel (cert approved but not yet retrievable) rides the same backoff schedule. See [docs/async-polling.md](async-polling.md). |

-The connector submits certificate enrollments to Sectigo's `/ssl/v1/enroll` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by Sectigo) and poll-based completion. The connector periodically checks enrollment status via `/ssl/v1/{sslId}` and downloads the PEM bundle via `/ssl/v1/collect/{sslId}/pem` when issued.
+The connector submits certificate enrollments to Sectigo's `/ssl/v1/enroll` API. DV certificates may issue immediately; OV/EV certificates require validation (handled by Sectigo) and poll-based completion. `GetOrderStatus` runs bounded internal polling — see [async-polling.md](async-polling.md).

 **Authentication:** Three custom headers on every request — `customerUri`, `login`, and `password`.

@@ -566,6 +568,7 @@ Entrust CA Gateway REST API with mutual TLS (mTLS) client certificate authentica
 | `CERTCTL_ENTRUST_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
 | `CERTCTL_ENTRUST_CA_ID` | Yes | — | Certificate Authority ID (from `GET /certificate-authorities`) |
 | `CERTCTL_ENTRUST_PROFILE_ID` | No | — | Optional enrollment profile ID |
+| `CERTCTL_ENTRUST_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. Approval-pending workflows where humans approve enrollments should bump to `86400` (24h) so a single tick can wait through the approval window. See [docs/async-polling.md](async-polling.md). |

 **Authentication:** Mutual TLS — the client certificate and key are loaded via `tls.LoadX509KeyPair()` and attached to the HTTP transport. No API key or token required.

@@ -587,6 +590,7 @@ GlobalSign Atlas High Volume CA REST API with dual authentication: mTLS for the
 | `CERTCTL_GLOBALSIGN_CLIENT_CERT_PATH` | Yes | — | Path to mTLS client certificate PEM |
 | `CERTCTL_GLOBALSIGN_CLIENT_KEY_PATH` | Yes | — | Path to mTLS client private key PEM |
 | `CERTCTL_GLOBALSIGN_SERVER_CA_PATH` | No | system trust store | PEM bundle used to verify the Atlas API server certificate. Set this for private/lab Atlas deployments whose server TLS chain is not in the host's default trust bundle. |
+| `CERTCTL_GLOBALSIGN_POLL_MAX_WAIT_SECONDS` | No | `600` (10m) | Bounded-polling deadline for `GetOrderStatus`. GlobalSign tracks orders by serial number rather than order ID; the polling shape is identical. See [docs/async-polling.md](async-polling.md). |

 **Authentication:** Dual — mTLS client certificate for TLS handshake plus `X-API-Key` and `X-API-Secret` headers on every request.