Commit Graph

3 Commits

Author SHA1 Message Date
shankar0123 67f346cd87 docs(arch-h1): Phase 13 Sprint 13.1 — categorize OpenAPI exceptions + bucket guards
Phase 13 Sprint 13.1 closure (architecture diligence audit ARCH-H1):
splits api/openapi-handler-exceptions.yaml's 64 entries into two
buckets via a required `category:` field, extends the parity script
with bucket reporting + a `--bucket=` subcommand, and adds a sibling
monotonic-decrease guard pinned to a checked-in baseline file. Pure
YAML + bash + doc; zero runtime change.

Strategy
========
The audit originally framed ARCH-H1 as "burn down the 64-entry
exception list to ≤20." Sprint 13.1 reframes against the structural
reality: 36 of the 64 entries are legitimate IETF-RFC wire-protocol
contracts (SCEP RFC 8894, ACME RFC 8555, ACME ARI RFC 9773, EST
RFC 7030) that MUST stay; the remaining 28 are REST-shaped routes
whose OpenAPI op was deferred. Categorize the two buckets, monotone-
gate the rest-deferred bucket against a baseline, and Sprints
13.4-13.6 drive rest-deferred to zero.

Categorization rule applied per-entry
=====================================
An entry is `category: wire-protocol` if ANY of:
  1. `why:` cites an RFC anchor (RFC 8894 / 8555 / 9773 / 7030).
  2. `why:` contains the strings "wire-protocol", "wire protocol",
     "sibling", or "shorthand".
  3. Route path starts with `/scep`, `/scep-mtls`, `/acme/`, or
     `/acme` (wire-protocol prefix).
Otherwise: `category: rest-deferred`.

This rule produced the 36 / 28 split that the Sprint 13.1 audit
prompt expected — verified by python assertion + manual eyeball
review of every entry's `why:` field before categorizing.

Per-entry decisions (read off the post-categorization YAML)
===========================================================

WIRE-PROTOCOL (36) — RFC contracts; never burn down:

  SCEP family (8) — RFC 8894 + RFC 7030 SCEP-mTLS sibling:
    GET    /scep                  RFC 8894 §3.1 GetCACert / GetCACaps
    POST   /scep                  RFC 8894 §3.1 PKCSReq / RenewalReq
    GET    /scep/                 trailing-slash variant (ChromeOS)
    POST   /scep/                 trailing-slash variant (ChromeOS)
    GET    /scep-mtls             EST RFC 7030 Phase 6.5 sibling
    POST   /scep-mtls             SCEP-mTLS POST variant
    GET    /scep-mtls/            SCEP-mTLS trailing-slash variant
    POST   /scep-mtls/            SCEP-mTLS trailing-slash POST

  ACME per-profile (12) — RFC 8555 §7.x + RFC 9773 ARI:
    GET    /acme/profile/{id}/directory             RFC 8555 §7.1.1
    HEAD   /acme/profile/{id}/new-nonce             RFC 8555 §7.2
    GET    /acme/profile/{id}/new-nonce             RFC 8555 §7.2
    POST   /acme/profile/{id}/new-account           RFC 8555 §7.3
    POST   /acme/profile/{id}/account/{acc_id}      RFC 8555 §7.3.2/.6
    POST   /acme/profile/{id}/new-order             RFC 8555 §7.4
    POST   /acme/profile/{id}/order/{ord_id}        RFC 8555 §7.4 PoG
    POST   /acme/profile/{id}/order/{ord_id}/finalize  RFC 8555 §7.4
    POST   /acme/profile/{id}/authz/{authz_id}      RFC 8555 §7.5
    POST   /acme/profile/{id}/challenge/{chall_id}  RFC 8555 §7.5.1
    POST   /acme/profile/{id}/cert/{cert_id}        RFC 8555 §7.4.2
    POST   /acme/profile/{id}/key-change            RFC 8555 §7.3.5
    POST   /acme/profile/{id}/revoke-cert           RFC 8555 §7.6
    GET    /acme/profile/{id}/renewal-info/{cert_id} RFC 9773 ARI

  ACME default-profile shorthand (14) — sibling routes; same wire
  semantics, dispatched when CERTCTL_ACME_SERVER_DEFAULT_PROFILE_ID
  is set:
    GET    /acme/directory
    HEAD   /acme/new-nonce
    GET    /acme/new-nonce
    POST   /acme/new-account
    POST   /acme/account/{acc_id}
    POST   /acme/new-order
    POST   /acme/order/{ord_id}
    POST   /acme/order/{ord_id}/finalize
    POST   /acme/authz/{authz_id}
    POST   /acme/challenge/{chall_id}
    POST   /acme/cert/{cert_id}
    POST   /acme/key-change
    POST   /acme/revoke-cert
    GET    /acme/renewal-info/{cert_id}

REST-DEFERRED (28) — gaps; Sprints 13.4-13.6 author into openapi.yaml:

  auth/sessions cluster (3):
    GET    /api/v1/auth/sessions
    DELETE /api/v1/auth/sessions
    DELETE /api/v1/auth/sessions/{id}

  auth/oidc CRUD + JWKS + test + refresh cluster (10):
    GET    /api/v1/auth/oidc/providers
    POST   /api/v1/auth/oidc/providers
    PUT    /api/v1/auth/oidc/providers/{id}
    DELETE /api/v1/auth/oidc/providers/{id}
    GET    /api/v1/auth/oidc/providers/{id}/jwks-status
    POST   /api/v1/auth/oidc/providers/{id}/refresh
    POST   /api/v1/auth/oidc/test
    GET    /api/v1/auth/oidc/group-mappings
    POST   /api/v1/auth/oidc/group-mappings
    DELETE /api/v1/auth/oidc/group-mappings/{id}

  auth/breakglass admin cluster (4):
    GET    /api/v1/auth/breakglass/credentials
    POST   /api/v1/auth/breakglass/credentials
    DELETE /api/v1/auth/breakglass/credentials/{actor_id}
    POST   /api/v1/auth/breakglass/credentials/{actor_id}/unlock

  auth/users cluster (3):
    GET    /api/v1/auth/users
    DELETE /api/v1/auth/users/{id}
    POST   /api/v1/auth/users/{id}/reactivate

  Misc REST one-offs (3):
    GET    /api/v1/auth/runtime-config
    POST   /api/v1/auth/demo-residual/cleanup
    GET    /api/v1/audit/export

  OIDC + breakglass browser flows (5):
    GET    /auth/oidc/login
    GET    /auth/oidc/callback
    POST   /auth/oidc/back-channel-logout
    POST   /auth/logout
    POST   /auth/breakglass/login

Files changed
=============

api/openapi-handler-exceptions.yaml (+1 line per entry):
  - Header rewritten to document the two-bucket contract + the
    Phase 13 burn-down plan + the baseline-file convention.
  - Every existing `route:` + `why:` pair preserved verbatim.
  - `    category: <bucket>` line inserted after each `why:` line.
  - Pyyaml round-trip parses to 64 entries cleanly.

api/openapi-handler-exceptions-baseline.txt (NEW, 1 line):
  - Contains single integer `28` matching the current rest-deferred
    count. Sprints 13.4-13.6 decrement this in lockstep with each
    batch of OpenAPI ops authored.

scripts/ci-guards/openapi-handler-parity.sh (rewritten):
  - Reports `wire-protocol: N` + `rest-deferred: N` lines alongside
    the existing total.
  - New `--bucket=wire-protocol|rest-deferred` subcommand prints
    just the bucket count + exits 0. Used by the new monotonic
    guard + by Sprint 13.7's hard-floor pin.
  - New fail condition: any entry missing the required `category:`
    field, or carrying an unknown category value, fails the build
    with a clear ::error:: annotation.
  - Existing exit-code semantics preserved (drift / orphan / stale
    detection paths unchanged).

scripts/ci-guards/openapi-rest-deferred-monotonic.sh (NEW):
  - Reads the rest-deferred count via the parity script's --bucket
    subcommand.
  - Reads the baseline file at
    api/openapi-handler-exceptions-baseline.txt.
  - Fails with ::error:: if current count exceeds OR falls below the
    baseline. The fall-below path forces operators to update the
    baseline in the same commit as the corresponding YAML deletion
    — keeps the monotonic-decrease contract honest.
  - CI workflow auto-discovers any scripts/ci-guards/*.sh; no
    .github/workflows/ci.yml change required (verified — the loop
    at .github/workflows/ci.yml::Regression\ guards uses a glob).

scripts/ci-guards/README.md (+33 lines):
  - Two new entries in the per-finding regression-guards table for
    `openapi-handler-parity` (existing; bucket subcommand documented)
    and `openapi-rest-deferred-monotonic` (new).
  - New "ARCH-H1 OpenAPI exception two-bucket contract" section
    documenting the wire-protocol vs rest-deferred decision rule +
    the canonical close path for a rest-deferred entry (author op
    + delete exception + decrement baseline in same PR) + the
    bucket-count inspection commands.

Verification (all local, sandbox /sessions partition full so
disk-tmpfile-dependent guards skipped — see Hotfix #4 commit msg
for sandbox-disk context)
=========================================================

  $ bash scripts/ci-guards/openapi-handler-parity.sh
    Router routes:                  220
    OpenAPI operations:             158
    Documented exceptions:          64
      wire-protocol:                36
      rest-deferred:                28
    openapi-handler-parity: clean.

  $ bash scripts/ci-guards/openapi-handler-parity.sh --bucket=wire-protocol
    36

  $ bash scripts/ci-guards/openapi-handler-parity.sh --bucket=rest-deferred
    28

  $ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
    openapi-rest-deferred-monotonic: clean — rest-deferred = 28,
    baseline = 28.

  $ cat api/openapi-handler-exceptions-baseline.txt
    28

  $ python3 -c "import yaml; d=yaml.safe_load(open('api/openapi-handler-exceptions.yaml')); print(len(d['documented_exceptions']))"
    64

Negative test (corrupted baseline → guard fails):
  $ echo "abc" > api/openapi-handler-exceptions-baseline.txt
  $ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
    ::error::api/openapi-handler-exceptions-baseline.txt must contain
    a single non-negative integer; got: 'abc'

Negative test (rest-deferred over baseline → guard fails):
  $ echo "27" > api/openapi-handler-exceptions-baseline.txt
  $ bash scripts/ci-guards/openapi-rest-deferred-monotonic.sh
    ::error::rest-deferred bucket grew: 28 > baseline 27.

Negative test (missing category → parity script fails):
  $ # delete first 'category: wire-protocol' line
  $ bash scripts/ci-guards/openapi-handler-parity.sh
    ::error::api/openapi-handler-exceptions.yaml: 1 entries missing
    required `category:` field:
      GET /scep

Ambiguous entries surfaced for operator review
==============================================
None. Every entry's category derived deterministically from the
3-rule decision tree (RFC anchor → wire-protocol; wire/sibling/
shorthand keyword in `why:` → wire-protocol; route prefix matches
wire-protocol family → wire-protocol; otherwise rest-deferred).

Closes: Phase 13 Sprint 13.1 of the certctl architecture diligence
remediation (ARCH-H1 structural categorization). Unblocks Sprints
13.4-13.6 (OpenAPI authoring batches against the rest-deferred
bucket).
2026-05-14 11:18:12 +00:00
shankar0123 3c81531398 ci: OpenAPI parity reconciliation + codegen scaffolding (Phase 5 — ARCH-H1 / ARCH-M6)
Phase 5 reconciliation: the audit's headline framing 'ARCH-H1 = 62-route
OpenAPI gap' was a measurement scoping error. Every one of the 209
unique router routes is already accounted for — 154 in api/openapi.yaml,
55 in api/openapi-handler-exceptions.yaml. The existing
openapi-handler-parity.sh CI guard already enforces this and passes
clean today. The audit subtracted operation-count from route-count
without accounting for the documented exceptions YAML.

Where real work remains (and what this PR does about it)
=========================================================

Of the 64 documented exceptions, 35 are legitimate wire-protocol
carve-outs that MUST stay (SCEP RFC 8894 × 8 entries, ACME RFC 8555
default + per-profile × 27 entries — they're protocol contracts, not
REST resources). The remaining 29 are REST-shaped routes whose
OpenAPI ops were deferred during their original Bundle 2 /
audit-2026-05-10 / 2026-05-11 work:

  - auth/sessions (3)
  - auth/oidc admin (9)
  - auth/breakglass admin (4)
  - auth/users mgmt (3)
  - auth/runtime-config (1)
  - auth/demo-residual/cleanup (1)
  - audit/export (1)
  - auth/logout (1)
  - auth/breakglass/login (1)
  - auth/oidc {login,callback,bcl} (3)
  - oidc/providers/{id}/jwks-status (1)
  - + 2 other auth-flow routes

Burn-down plan in 3 sprints (documented in
api/openapi-handler-exceptions.yaml header):
  Sprint A: Cluster 1 — sessions + oidc admin (12 ops)
  Sprint B: Cluster 2 — breakglass + users + runtime-config (8 ops)
  Sprint C: Cluster 3 — audit/export + auth flows (9 ops)

This PR does NOT author the 29 OpenAPI ops; each needs request/
response schemas, not placeholders, and the design work is too
large for one PR. The reconciliation here is documentation + a CI
guard that will fail any future schema-drift, plus the scaffolding
needed for sub-phase 5b.

Sub-phase 5b: codegen scaffolding
==================================

Adds the orval scaffolding without running npm install (sandbox
disk-full; first 'npm install' + 'npm run generate' happens on the
operator's workstation):

  - web/orval.config.ts — codegen config emits react-query hooks
    from api/openapi.yaml into web/src/api/generated/
  - web/package.json — adds orval@^7.0.0 devDep + 'generate' npm script
  - web/CODEGEN.md — operator-facing migration doc:
    first-time setup, per-consumer migration pattern, burn-down plan,
    CI-guard rules
  - scripts/ci-guards/openapi-codegen-drift.sh — blocks the build
    when api/openapi.yaml changes but web/src/api/generated/ wasn't
    regenerated alongside. Currently no-op (the directory doesn't
    exist yet); activates from the first 'npm run generate' run.

The legacy web/src/api/client.ts stays in tree per the phase prompt's
'do not delete in same PR as codegen' rule. Consumers migrate one
page at a time as their OpenAPI ops land; client.ts deletion is a
SEPARATE follow-up PR after the last consumer migrates.

Updates to existing guard + exceptions YAML
============================================

  - scripts/ci-guards/openapi-handler-parity.sh header rewritten
    with the Phase 5 reconciliation numbers (220/158/64/0) and the
    wire-protocol vs REST-deferred classification.
  - api/openapi-handler-exceptions.yaml header rewritten with the
    35/29 split + the 3-sprint burn-down plan. Each exception entry
    is unchanged; the header now documents which entries are
    permanent (wire-protocol) vs temporary (REST-deferred).

Sandbox limitations + operator follow-up
=========================================

  - 'npm install' was NOT run from the sandbox (sessions volume
    99%-full, 142 MB free). The operator runs 'cd web && npm install'
    on their workstation; this lands orval@^7.0.0 in node_modules,
    then 'cd web && npm run generate' produces the initial
    web/src/api/generated/ tree.
  - First per-consumer migration (suggested: web/src/pages/AuthSettings
    or one of the operator-decision pages) lands in a follow-up PR
    after npm install completes.
  - The 29-op OpenAPI burn-down is a 2-sprint effort tracked under
    ARCH-H1 in cowork/certctl-architecture-diligence-audit.html.

All CI guards (openapi-handler-parity, openapi-codegen-drift, plus
every existing guard) verified clean by running each individually.

Closes:
  - cowork/certctl-architecture-diligence-audit.html#fix-ARCH-H1
    (reconciliation: gap is 0 with exceptions accounted for; burn-down
    plan documented for follow-up sprints)
  - cowork/certctl-architecture-diligence-audit.html#fix-ARCH-M6
    (codegen scaffolding shipped; client.ts deletion follows in a
    subsequent PR after consumers migrate)
2026-05-13 20:24:20 +00:00
shankar0123 b7a3162028 ci-pipeline-cleanup Phases 7-9: image-and-supply-chain job
Bundle: ci-pipeline-cleanup, Phases 7-9 / frozen decisions 0.8 + 0.10 + 0.11.

NEW image-and-supply-chain job (Ubuntu, ~3 min). Three steps:

PHASE 7 — Digest validity
scripts/ci-guards/digest-validity.sh resolves every @sha256:<digest>
ref in deploy/**/*.{yml,Dockerfile*} against its registry. Closes the
H-001 lying-field gap that Bundle II hit (11 fabricated digests passed
H-001's regex-only check and failed docker pull in CI).
Sandbox verification: 16/16 digests in deploy/* + Dockerfiles all
return HTTP 200 from registry-1.docker.io / ghcr.io / mcr.microsoft.com.

PHASE 8 — Docker build smoke (all 4 Dockerfiles)
Per frozen decision 0.10: build Dockerfile, Dockerfile.agent,
deploy/test/f5-mock-icontrol/Dockerfile, deploy/test/libest/Dockerfile.
Catches syntax errors + COPY path drift before tag-time release.yml.
The test-sidecar Dockerfiles are load-bearing for vendor-e2e — a
syntax error there silently breaks the e2e suite.

PHASE 9 — OpenAPI ↔ handler operationId parity
scripts/ci-guards/openapi-handler-parity.sh extracts router routes
(r.mux.Handle / r.Register "METHOD /path" syntax — Go 1.22+ ServeMux),
extracts OpenAPI operations (paths × HTTP methods), and fails if any
router route has no operationId AND is not documented in the new
api/openapi-handler-exceptions.yaml.

Verified gap at HEAD c48a82c4 (root-caused):
  142 router routes, 136 OpenAPI operations
  6 router-only routes — all SCEP wire-protocol endpoints (RFC-shaped,
    not REST). Documented in api/openapi-handler-exceptions.yaml with
    one-line why: justifications.
  0 OpenAPI-only operations.

Going forward: any new gap fails the build unless documented.

Status checks per push: now 7 (was 8 after Phase 5+6 dropped windows;
this Phase adds 1 = +1 net). Final acceptance gate target.

ci.yml: 383 → 432 lines (+49 for the new job + steps).
2026-04-30 20:50:52 +00:00