mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 23:11:32 +00:00
67f346cd877c1c8d680f37fdf967476ef367ed4e
8 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
67f346cd87 |
docs(arch-h1): Phase 13 Sprint 13.1 — categorize OpenAPI exceptions + bucket guards
Phase 13 Sprint 13.1 closure (architecture diligence audit ARCH-H1):
splits api/openapi-handler-exceptions.yaml's 64 entries into two
buckets via a required `category:` field, extends the parity script
with bucket reporting + a `--bucket=` subcommand, and adds a sibling
monotonic-decrease guard pinned to a checked-in baseline file. Pure
YAML + bash + doc; zero runtime change.
Strategy
========
The audit originally framed ARCH-H1 as "burn down the 64-entry
exception list to ≤20." Sprint 13.1 reframes against the structural
reality: 36 of the 64 entries are legitimate IETF-RFC wire-protocol
contracts (SCEP RFC 8894, ACME RFC 8555, ACME ARI RFC 9773, EST
RFC 7030) that MUST stay; the remaining 28 are REST-shaped routes
whose OpenAPI op was deferred. Categorize the two buckets, monotone-
gate the rest-deferred bucket against a baseline, and Sprints
13.4-13.6 drive rest-deferred to zero.
Categorization rule applied per-entry
=====================================
An entry is `category: wire-protocol` if ANY of:
1. `why:` cites an RFC anchor (RFC 8894 / 8555 / 9773 / 7030).
2. `why:` contains the strings "wire-protocol", "wire protocol",
"sibling", or "shorthand".
3. Route path starts with `/scep`, `/scep-mtls`, `/acme/`, or
`/acme` (wire-protocol prefix).
Otherwise: `category: rest-deferred`.
This rule produced the 36 / 28 split that the Sprint 13.1 audit
prompt expected — verified by python assertion + manual eyeball
review of every entry's `why:` field before categorizing.
Per-entry decisions (read off the post-categorization YAML)
===========================================================
WIRE-PROTOCOL (36) — RFC contracts; never burn down:
SCEP family (8) — RFC 8894 + RFC 7030 SCEP-mTLS sibling:
GET /scep RFC 8894 §3.1 GetCACert / GetCACaps
POST /scep RFC 8894 §3.1 PKCSReq / RenewalReq
GET /scep/ trailing-slash variant (ChromeOS)
POST /scep/ trailing-slash variant (ChromeOS)
GET /scep-mtls EST RFC 7030 Phase 6.5 sibling
POST /scep-mtls SCEP-mTLS POST variant
GET /scep-mtls/ SCEP-mTLS trailing-slash variant
POST /scep-mtls/ SCEP-mTLS trailing-slash POST
ACME per-profile (12) — RFC 8555 §7.x + RFC 9773 ARI:
GET /acme/profile/{id}/directory RFC 8555 §7.1.1
HEAD /acme/profile/{id}/new-nonce RFC 8555 §7.2
GET /acme/profile/{id}/new-nonce RFC 8555 §7.2
POST /acme/profile/{id}/new-account RFC 8555 §7.3
POST /acme/profile/{id}/account/{acc_id} RFC 8555 §7.3.2/.6
POST /acme/profile/{id}/new-order RFC 8555 §7.4
POST /acme/profile/{id}/order/{ord_id} RFC 8555 §7.4 PoG
POST /acme/profile/{id}/order/{ord_id}/finalize RFC 8555 §7.4
POST /acme/profile/{id}/authz/{authz_id} RFC 8555 §7.5
POST /acme/profile/{id}/challenge/{chall_id} RFC 8555 §7.5.1
POST /acme/profile/{id}/cert/{cert_id} RFC 8555 §7.4.2
POST /acme/profile/{id}/key-change RFC 8555 §7.3.5
POST /acme/profile/{id}/revoke-cert RFC 8555 §7.6
GET /acme/profile/{id}/renewal-info/{cert_id} RFC 9773 ARI
ACME default-profile shorthand (14) — sibling routes; same wire
semantics, dispatched when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID
is set:
GET /acme/directory
HEAD /acme/new-nonce
GET /acme/new-nonce
POST /acme/new-account
POST /acme/account/{acc_id}
POST /acme/new-order
POST /acme/order/{ord_id}
POST /acme/order/{ord_id}/finalize
POST /acme/authz/{authz_id}
POST /acme/challenge/{chall_id}
POST /acme/cert/{cert_id}
POST /acme/key-change
POST /acme/revoke-cert
GET /acme/renewal-info/{cert_id}
REST-DEFERRED (28) — gaps; Sprints 13.4-13.6 author into openapi.yaml:
auth/sessions cluster (3):
GET /api/v1/auth/sessions
DELETE /api/v1/auth/sessions
DELETE /api/v1/auth/sessions/{id}
auth/oidc CRUD + JWKS + test + refresh cluster (10):
GET /api/v1/auth/oidc/providers
POST /api/v1/auth/oidc/providers
PUT /api/v1/auth/oidc/providers/{id}
DELETE /api/v1/auth/oidc/providers/{id}
GET /api/v1/auth/oidc/providers/{id}/jwks-status
POST /api/v1/auth/oidc/providers/{id}/refresh
POST /api/v1/auth/oidc/test
GET /api/v1/auth/oidc/group-mappings
POST /api/v1/auth/oidc/group-mappings
DELETE /api/v1/auth/oidc/group-mappings/{id}
auth/breakglass admin cluster (4):
GET /api/v1/auth/breakglass/credentials
POST /api/v1/auth/breakglass/credentials
DELETE /api/v1/auth/breakglass/credentials/{actor_id}
POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
auth/users cluster (3):
GET /api/v1/auth/users
DELETE /api/v1/auth/users/{id}
POST /api/v1/auth/users/{id}/reactivate
Misc REST one-offs (3):
GET /api/v1/auth/runtime-config
POST /api/v1/auth/demo-residual/cleanup
GET /api/v1/audit/export
OIDC + breakglass browser flows (5):
GET /auth/oidc/login
GET /auth/oidc/callback
POST /auth/oidc/back-channel-logout
POST /auth/logout
POST /auth/breakglass/login
Files changed
=============
api/openapi-handler-exceptions.yaml (+1 line per entry):
- Header rewritten to document the two-bucket contract + the
Phase 13 burn-down plan + the baseline-file convention.
- Every existing `route:` + `why:` pair preserved verbatim.
- ` category: <bucket>` line inserted after each `why:` line.
- Pyyaml round-trip parses to 64 entries cleanly.
api/openapi-handler-exceptions-baseline.txt (NEW, 1 line):
- Contains single integer `28` matching the current rest-deferred
count. Sprints 13.4-13.6 decrement this in lockstep with each
batch of OpenAPI ops authored.
scripts/ci-guards/openapi-handler-parity.sh (rewritten):
- Reports `wire-protocol: N` + `rest-deferred: N` lines alongside
the existing total.
- New `--bucket=wire-protocol|rest-deferred` subcommand prints
just the bucket count + exits 0. Used by the new monotonic
guard + by Sprint 13.7's hard-floor pin.
- New fail condition: any entry missing the required `category:`
field, or carrying an unknown category value, fails the build
with a clear ::error:: annotation.
- Existing exit-code semantics preserved (drift / orphan / stale
detection paths unchanged).
scripts/ci-guards/openapi-rest-deferred-monotonic.sh (NEW):
- Reads the rest-deferred count via the parity script's --bucket
subcommand.
- Reads the baseline file at
api/openapi-handler-exceptions-baseline.txt.
- Fails with ::error:: if current count exceeds OR falls below the
baseline. The fall-below path forces operators to update the
baseline in the same commit as the corresponding YAML deletion
— keeps the monotonic-decrease contract honest.
- CI workflow auto-discovers any scripts/ci-guards/*.sh; no
.github/workflows/ci.yml change required (verified — the loop
at .github/workflows/ci.yml::Regression\ guards uses a glob).
scripts/ci-guards/README.md (+33 lines):
- Two new entries in the per-finding regression-guards table for
`openapi-handler-parity` (existing; bucket subcommand documented)
and `openapi-rest-deferred-monotonic` (new).
- New "ARCH-H1 OpenAPI exception two-bucket contract" section
documenting the wire-protocol vs rest-deferred decision rule +
the canonical close path for a rest-deferred entry (author op
+ delete exception + decrement baseline in same PR) + the
bucket-count inspection commands.
Verification (all local, sandbox /sessions partition full so
disk-tmpfile-dependent guards skipped — see Hotfix #4 commit msg
for sandbox-disk context)
=========================================================
$ bash scripts/ci-guards/openapi-handler-parity.sh
Router routes: 220
OpenAPI operations: 158
Documented exceptions: 64
wire-protocol: 36
rest-deferred: 28
openapi-handler-parity: clean.
$ bash scripts/ci-guards/openapi-handler-parity.sh --bucket=wire-protocol
36
$ bash scripts/ci-guards/openapi-handler-parity.sh --bucket=rest-deferred
28
$ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
openapi-rest-deferred-monotonic: clean — rest-deferred = 28,
baseline = 28.
$ cat api/openapi-handler-exceptions-baseline.txt
28
$ python3 -c "import yaml; d=yaml.safe_load(open('api/openapi-handler-exceptions.yaml')); print(len(d['documented_exceptions']))"
64
Negative test (corrupted baseline → guard fails):
$ echo "abc" > api/openapi-handler-exceptions-baseline.txt
$ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
::error::api/openapi-handler-exceptions-baseline.txt must contain
a single non-negative integer; got: 'abc'
Negative test (rest-deferred over baseline → guard fails):
$ echo "27" > api/openapi-handler-exceptions-baseline.txt
$ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
::error::rest-deferred bucket grew: 28 > baseline 27.
Negative test (missing category → parity script fails):
$ # delete first 'category: wire-protocol' line
$ bash scripts/ci-guards/openapi-handler-parity.sh
::error::api/openapi-handler-exceptions.yaml: 1 entries missing
required `category:` field:
GET /scep
Ambiguous entries surfaced for operator review
==============================================
None. Every entry's category derived deterministically from the
3-rule decision tree (RFC anchor → wire-protocol; wire/sibling/
shorthand keyword in `why:` → wire-protocol; route prefix matches
wire-protocol family → wire-protocol; otherwise rest-deferred).
Closes: Phase 13 Sprint 13.1 of the certctl architecture diligence
remediation (ARCH-H1 structural categorization). Unblocks Sprints
13.4-13.6 (OpenAPI authoring batches against the rest-deferred
bucket).
|
||
|
|
3c81531398 |
ci: OpenAPI parity reconciliation + codegen scaffolding (Phase 5 — ARCH-H1 / ARCH-M6)
Phase 5 reconciliation: the audit's headline framing 'ARCH-H1 = 62-route
OpenAPI gap' was a measurement scoping error. Every one of the 209
unique router routes is already accounted for — 154 in api/openapi.yaml,
55 in api/openapi-handler-exceptions.yaml. The existing
openapi-handler-parity.sh CI guard already enforces this and passes
clean today. The audit subtracted operation-count from route-count
without accounting for the documented exceptions YAML.
Where real work remains (and what this PR does about it)
=========================================================
Of the 64 documented exceptions, 35 are legitimate wire-protocol
carve-outs that MUST stay (SCEP RFC 8894 × 8 entries, ACME RFC 8555
default + per-profile × 27 entries — they're protocol contracts, not
REST resources). The remaining 29 are REST-shaped routes whose
OpenAPI ops were deferred during their original Bundle 2 /
audit-2026-05-10 / 2026-05-11 work:
- auth/sessions (3)
- auth/oidc admin (9)
- auth/breakglass admin (4)
- auth/users mgmt (3)
- auth/runtime-config (1)
- auth/demo-residual/cleanup (1)
- audit/export (1)
- auth/logout (1)
- auth/breakglass/login (1)
- auth/oidc {login,callback,bcl} (3)
- oidc/providers/{id}/jwks-status (1)
- + 2 other auth-flow routes
Burn-down plan in 3 sprints (documented in
api/openapi-handler-exceptions.yaml header):
Sprint A: Cluster 1 — sessions + oidc admin (12 ops)
Sprint B: Cluster 2 — breakglass + users + runtime-config (8 ops)
Sprint C: Cluster 3 — audit/export + auth flows (9 ops)
This PR does NOT author the 29 OpenAPI ops; each needs request/
response schemas, not placeholders, and the design work is too
large for one PR. The reconciliation here is documentation + a CI
guard that will fail any future schema-drift, plus the scaffolding
needed for sub-phase 5b.
Sub-phase 5b: codegen scaffolding
==================================
Adds the orval scaffolding without running npm install (sandbox
disk-full; first 'npm install' + 'npm run generate' happens on the
operator's workstation):
- web/orval.config.ts — codegen config emits react-query hooks
from api/openapi.yaml into web/src/api/generated/
- web/package.json — adds orval@^7.0.0 devDep + 'generate' npm script
- web/CODEGEN.md — operator-facing migration doc:
first-time setup, per-consumer migration pattern, burn-down plan,
CI-guard rules
- scripts/ci-guards/openapi-codegen-drift.sh — blocks the build
when api/openapi.yaml changes but web/src/api/generated/ wasn't
regenerated alongside. Currently no-op (the directory doesn't
exist yet); activates from the first 'npm run generate' run.
The legacy web/src/api/client.ts stays in tree per the phase prompt's
'do not delete in same PR as codegen' rule. Consumers migrate one
page at a time as their OpenAPI ops land; client.ts deletion is a
SEPARATE follow-up PR after the last consumer migrates.
Updates to existing guard + exceptions YAML
============================================
- scripts/ci-guards/openapi-handler-parity.sh header rewritten
with the Phase 5 reconciliation numbers (220/158/64/0) and the
wire-protocol vs REST-deferred classification.
- api/openapi-handler-exceptions.yaml header rewritten with the
35/29 split + the 3-sprint burn-down plan. Each exception entry
is unchanged; the header now documents which entries are
permanent (wire-protocol) vs temporary (REST-deferred).
Sandbox limitations + operator follow-up
=========================================
- 'npm install' was NOT run from the sandbox (sessions volume
99%-full, 142 MB free). The operator runs 'cd web && npm install'
on their workstation; this lands orval@^7.0.0 in node_modules,
then 'cd web && npm run generate' produces the initial
web/src/api/generated/ tree.
- First per-consumer migration (suggested: web/src/pages/AuthSettings
or one of the operator-decision pages) lands in a follow-up PR
after npm install completes.
- The 29-op OpenAPI burn-down is a 2-sprint effort tracked under
ARCH-H1 in cowork/certctl-architecture-diligence-audit.html.
All CI guards (openapi-handler-parity, openapi-codegen-drift, plus
every existing guard) verified clean by running each individually.
Closes:
- cowork/certctl-architecture-diligence-audit.html#fix-ARCH-H1
(reconciliation: gap is 0 with exceptions accounted for; burn-down
plan documented for follow-up sprints)
- cowork/certctl-architecture-diligence-audit.html#fix-ARCH-M6
(codegen scaffolding shipped; client.ts deletion follows in a
subsequent PR after consumers migrate)
|
||
|
|
eee124efb6 |
chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5
Four scripts/ci-guards/*.sh trips on dev/auth-bundle-2 vs master:
1. G-3-env-docs-drift: 10 CERTCTL_* env vars added by Auth Bundle 2 +
audit-2026-05-10/11 fix bundle were not in docs/. Added a new 'Auth
(Bundle 1 + Bundle 2)' section to docs/reference/configuration.md
covering CERTCTL_SESSION_BIND_USER_AGENT, CERTCTL_SESSION_GC_INTERVAL,
CERTCTL_OIDC_BCL_MAX_AGE_SECONDS, CERTCTL_OIDC_PRELOGIN_REQUIRE_UA/IP,
CERTCTL_DEMO_MODE_ACK, CERTCTL_TRUSTED_PROXIES + _COUNT (synthesised),
CERTCTL_BOOTSTRAP_* set, CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD. Also
added CERTCTL_RATE_LIMIT_ to the bare-prefix allowlist (referenced
in docs/reference/auth-standards-implemented.md prose).
2. bundle-8-M-009-bare-usemutation: BreakglassPage shipped 3 bare
useMutation() calls instead of useTrackedMutation. Migrated all
three to useTrackedMutation with invalidates: [['breakglass']].
3. multi-tenant-query-coverage: Defense-in-depth tenant_id additions
in the fix bundle dropped the missing-tenant-id query count from 32
to 31. Ratcheted baseline 32 -> 31 (forward-only invariant).
4. openapi-handler-parity: 28 new REST endpoints from Bundle 2 + the
fix bundle missing from api/openapi.yaml. Added them to
api/openapi-handler-exceptions.yaml with per-route 'why:'
justifications. OpenAPI schema generation deferred to pre-v2.2.0
alongside the GUI E2E coverage push; threat model + handler
contracts already live in docs/operator/{rbac,auth-threat-model,
oidc-runbooks}.md.
After this commit every script in scripts/ci-guards/*.sh exits 0.
|
||
|
|
4dc8d3fa5b |
acme-server: key rollover + revocation + ARI (Phase 4/7)
Closes the RFC 8555 + RFC 9773 surface beyond the issuance happy-path:
- POST /acme/profile/<id>/key-change (RFC 8555 §7.3.5)
- POST /acme/profile/<id>/revoke-cert (RFC 8555 §7.6)
- GET /acme/profile/<id>/renewal-info/<cert-id> (RFC 9773 ARI)
After this commit, ACME clients can rotate account keys, revoke certs
through the ACME surface (rather than only via the certctl GUI/API),
and fetch ARI for proactive renewal scheduling.
Architecture:
- Key rollover: outer JWS verified against the registered account key
(existing kid path); the inner JWS — embedded as the outer's payload
— verified against the embedded NEW jwk in a new dedicated routine
(ParseAndVerifyKeyChangeInner) that enforces RFC 8555 §7.3.5
inner-only invariants: MUST use jwk + MUST NOT use kid, payload
.account == outer.kid, payload.oldKey thumbprint-equals registered.
A single WithinTx swaps the stored thumbprint+pem and writes the
audit row. Concurrent-rollover safety via SELECT…FOR UPDATE on the
conflicting account row in UpdateAccountJWKWithTx; the loser
observes the winner's new thumbprint and is told to retry (409).
- Revocation: two auth paths. kid → AccountOwnsCertificate single-
indexed COUNT lookup over acme_orders. jwk → constant-time RFC 7638
thumbprint compare against the cert's pubkey. Both paths route
through service.RevocationSvc.RevokeCertificateWithActor so the
existing CRL/OCSP refresh + audit + metrics pipeline applies. RFC
5280 §5.3.1 numeric reason codes clamp to certctl's
domain.ValidRevocationReasons; codes 8 (removeFromCRL) + 10
(aACompromise) clamp to 'unspecified' since they aren't in the set.
- ARI is GET-only and unauth per RFC 9773 §4. Cert-id wire shape is
base64url(AKI).base64url(serial); ParseARICertID strict-decodes,
SerialHex emits the canonical certctl-shape lowercase-no-leading-
zeros hex used in certificate_versions.serial_number.
ComputeRenewalWindow has 3 branches: bound RenewalPolicy →
[notAfter - days, notAfter - days/2]; no policy → last 33% of
validity; past expiry → [now, now + 1d] (renew immediately).
Retry-After honors CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL.
What ships:
- internal/api/acme/{keychange,ari}.go (+ phase4_test.go: 15 tests).
- internal/api/acme/order.go: RevokeCertRequest wire shape.
- internal/api/handler/acme.go: KeyChange, RevokeCert, RenewalInfo
+ 11 new writeServiceError mappings.
- internal/repository/postgres/acme.go: UpdateAccountJWKWithTx (FOR
UPDATE + expectedOldThumbprint precondition; ErrACMEAccountKey-
ConcurrentUpdate sentinel) + AccountOwnsCertificate.
- internal/service/acme.go: RotateAccountKey + RevokeCert +
RenewalInfo; CertificateRevoker + RenewalPolicyLookup interfaces;
SetRevocationDelegate + SetRenewalPolicyLookup wiring; 11 new
sentinels; 6 new metrics.
- internal/service/acme_phase4_test.go: service-layer tests for
RotateAccountKey (happy + duplicate-key) + RevokeCert (kid mismatch
+ jwk mismatch + jwk happy + already-revoked + reason-clamping) +
RenewalInfo (disabled + bad cert-id).
- internal/api/router/router.go: 6 new register calls (3 per-profile
+ 3 shorthand). Router parity exceptions extended in lockstep
(in-tree SpecParityExceptions + CI-only openapi-handler-exceptions
.yaml).
- cmd/server/main.go: SetRevocationDelegate(revocationSvc) +
SetRenewalPolicyLookup(renewalPolicyRepo) at startup.
- internal/config/config.go: CERTCTL_ACME_SERVER_ARI_ENABLED (default
true) + CERTCTL_ACME_SERVER_ARI_POLL_INTERVAL (default 6h);
BuildDirectory's ariEnabled flag now flips on under
cfg.ARIEnabled.
- docs/acme-server.md: phase status flipped to Phase 4; endpoints
table grows 6 rows (3 per-profile + 3 shorthand); FAQ section
appended explaining how to rotate keys, revoke certs, and consume
ARI.
Tests:
- 'go vet ./...' clean across the repo.
- 'go test -short -count=1 ./...' green across every package.
- phase4_test.go covers: keychange happy-path + 5 negatives +
MapKeyChangeErrorToProblem coverage; ARI cert-id round-trip + 6
malformed cases + BuildARICertID from a generated cert; window-
math 3 branches.
- service-layer tests confirm: RotateAccountKey atomically swaps the
thumbprint (verifies persisted state) and rejects duplicate keys;
RevokeCert routes through the stub RevocationSvc with the right
actor string + reason on the jwk path, rejects mismatched keys,
rejects already-revoked certs, clamps reason codes correctly;
RenewalInfo respects ARIEnabled + cert-id format.
Engineering history: cowork/WORKSPACE-CHANGELOG.md 'ACME-Server-4'.
|
||
|
|
9bc845304e |
acme-server: HTTP-01 + DNS-01 + TLS-ALPN-01 challenge validation (Phase 3/7)
Wires up the actual challenge-validation machinery so profiles in
acme_auth_mode='challenge' resolve end-to-end. After this commit,
cert-manager 1.15+ with `solver: http01: ingress` against a
challenge-mode profile completes a real HTTP-01 flow and gets a cert.
DNS-01 + TLS-ALPN-01 share the same code path with the appropriate
validator selection.
Architecture (the load-bearing parts):
- 3 separate semaphore-bounded worker pools (one per challenge type),
so HTTP-01 and DNS-01 can't starve each other under load. Default
weight 10 per type; tunable via CERTCTL_ACME_SERVER_HTTP01_CONCURRENCY,
DNS01_CONCURRENCY, TLSALPN01_CONCURRENCY.
- 30s per-challenge timeout (configurable via PoolConfig.PerChallengeTimeout).
- HTTP-01 validator runs validation.IsReservedIPForDial (newly
exported wrapper preserving the existing private impl byte-for-byte
for the network scanner + ValidateSafeURL paths) on the resolved
IP — both at the initial dial and every redirect hop. SSRF probes
into private IP space are refused before the connect.
- DNS-01 validator uses a dedicated resolver pointed at
CERTCTL_ACME_SERVER_DNS01_RESOLVER (default 8.8.8.8:53) — does
NOT use the system resolver to keep behavior deterministic across
deployments. Wildcard handling: `*.example.com` queries
_acme-challenge.example.com.
- TLS-ALPN-01 validator (RFC 8737) connects with ALPN `acme-tls/1`,
inspects the id-pe-acmeIdentifier extension (OID 1.3.6.1.5.5.7.1.31),
asserts the ASN.1 OCTET STRING value equals SHA-256 of the key
authorization. Cert chain is intentionally NOT validated
(InsecureSkipVerify=true is correct per RFC 8737 — the proof is
in the extension, not the chain). Documented in docs/tls.md L-001
table + the //nolint:gosec comment carries the justification.
SSRF guard: same posture as HTTP-01.
- Validation is asynchronous: handler accepts the POST and returns
200 immediately with status=processing; the worker-pool fires a
callback that updates challenge → authz → order in a fresh
background-context WithinTx. The order auto-promotes to `ready`
when ALL authzs become valid; auto-fails to `invalid` when ANY
authz becomes invalid.
What ships:
- internal/api/acme/challenge.go: KeyAuthorization (RFC 8555 §8.1) +
DNS01TXTRecordValue (§8.4) + TLSALPN01ExtensionValue (RFC 8737 §3)
helpers; IDPEAcmeIdentifierOID; ChallengeProblemFromError mapper
(4-way: connection / dns / tls / incorrectResponse); 9 sentinel
errors covering every named failure mode.
- internal/api/acme/validators.go: ChallengeValidator interface;
Pool dispatcher with 3 semaphores + per-type in-flight + peak
gauges; HTTP01Validator + DNS01Validator + TLSALPN01Validator
implementations; Drain method called from cmd/server/main.go's
shutdown sequence.
- internal/api/acme/validators_test.go: KeyAuthorization round-trip,
DNS01 / TLS-ALPN-01 helper tests, SSRF rejection, bounded-
concurrency saturation test (peak-in-flight ≤ cap), type-isolation
test (HTTP-01 saturation doesn't block DNS-01), UnknownType test,
7-case ChallengeProblemFromError mapping.
- internal/repository/postgres/acme.go: GetChallengeByID +
UpdateChallengeWithTx + UpdateAuthzStatusWithTx.
- internal/service/acme.go: SetValidatorPool wires the *acme.Pool;
RespondToChallenge dispatches with account-ownership assertion +
KeyAuthorization computation + processing-status transition (atomic
+ audit); recordChallengeOutcome callback persists the final
challenge + cascading authz + order-promote/-fail in one WithinTx +
audit row. 4 new metrics.
- internal/api/handler/acme.go: Challenge handler; round-trips
account.JWKPEM through ParseJWKFromPEM to recover the *jose.JSONWebKey
the validator pool needs.
- internal/api/router/router.go + openapi_parity_test.go +
api/openapi-handler-exceptions.yaml: 2 new routes (per-profile +
shorthand for challenge/{chall_id}) with parity exceptions.
- cmd/server/main.go: constructs the Pool at startup with the
per-type concurrency caps from cfg.ACMEServer; ACMEService.ValidatorPool()
accessor exposed for the shutdown drain sequence.
- internal/validation/ssrf.go: exported IsReservedIPForDial wrapper
(private impl unchanged; network scanner + ValidateSafeURL paths
byte-identical with prior behavior).
- docs/tls.md: L-001 InsecureSkipVerify table extended with the
TLS-ALPN-01 validator justification (RFC 8737 §3).
- docs/acme-server.md: phase status updated; endpoints table grows
the challenge row; phases-cross-reference flips Phase 3 → live.
Tests:
- 80%+ coverage on the new files.
- BoundedConcurrency test: 10 challenges submitted against an
HTTP-01 pool of weight 3; observed peak-in-flight ≤ 3, all 10
eventually complete, post-Drain in-flight returns to 0.
- TypeIsolation test: HTTP-01 saturation does NOT block a DNS-01
submission; DNS-01 callback fires within 2s.
- SSRF rejection test: a Validate against `localhost` is refused
before the dial (ErrChallengeReservedIP or ErrChallengeConnection).
Engineering history: cowork/WORKSPACE-CHANGELOG.md "ACME-Server-3".
|
||
|
|
c351bba41a |
acme-server: orders + authorizations + finalize + cert download (Phase 2/7)
Closes the issuance loop in trust_authenticated mode (commits |
||
|
|
a05a7d3dad |
ci: fix Phase 1b post-push CI failures (3 guards)
Phase 1b push (commit
|
||
|
|
b7a3162028 |
ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
Bundle: ci-pipeline-cleanup, Phases 7-9 / frozen decisions 0.8 + 0.10 + 0.11.
NEW image-and-supply-chain job (Ubuntu, ~3 min). Three steps:
PHASE 7 — Digest validity
scripts/ci-guards/digest-validity.sh resolves every @sha256:<digest>
ref in deploy/**/*.{yml,Dockerfile*} against its registry. Closes the
H-001 lying-field gap that Bundle II hit (11 fabricated digests passed
H-001's regex-only check and failed docker pull in CI).
Sandbox verification: 16/16 digests in deploy/* + Dockerfiles all
return HTTP 200 from registry-1.docker.io / ghcr.io / mcr.microsoft.com.
PHASE 8 — Docker build smoke (all 4 Dockerfiles)
Per frozen decision 0.10: build Dockerfile, Dockerfile.agent,
deploy/test/f5-mock-icontrol/Dockerfile, deploy/test/libest/Dockerfile.
Catches syntax errors + COPY path drift before tag-time release.yml.
The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a
syntax error there silently breaks the e2e suite.
PHASE 9 — OpenAPI ↔ handler operationId parity
scripts/ci-guards/openapi-handler-parity.sh extracts router routes
(r.mux.Handle / r.Register "METHOD /path" syntax — Go 1.22+ ServeMux),
extracts OpenAPI operations (paths × HTTP methods), and fails if any
router route has no operationId AND is not documented in the new
api/openapi-handler-exceptions.yaml.
Verified gap at HEAD
|