Commit Graph

256 Commits

Author SHA1 Message Date
shankar0123 c03d18bb1c auth-bundle-2 Phase 16: docs updates (security.md OIDC + sessions + break-glass + auditor split sections; new migration/oidc-enable.md; CHANGELOG.md v2.1.0 Bundle 2 release notes)
Closes Phase 16 of cowork/auth-bundle-2-prompt.md. Three operator-
facing docs updated, one new migration guide ships, README nav row
added.

Files
=====

docs/operator/security.md (MODIFIED, Last reviewed bumped to 2026-05-10):
* Added 5 new Bundle 2 subsections under '## Authentication
  surface' after the Bundle 1 approval-bypass-closure entry:
  - 'OIDC federation (Bundle 2 Phases 1-7)' — alg allow-list,
    IdP-downgrade defense, iss/aud/azp/at_hash, single-use
    state+nonce, PKCE-S256 mandatory, JWKS rotation handling,
    encrypted client_secret at rest with the v3 blob format
    pinned by an integration test, pointer to oidc-runbooks/
    for per-IdP setup.
  - 'Sessions + back-channel logout (Bundle 2 Phases 4-6)' —
    length-prefixed HMAC cookie wire format, HttpOnly + Secure
    + SameSite cookie hardening, idle/absolute timeouts, CSRF
    defense, signing-key rotation primitive, fail-fatal
    EnsureInitialSigningKey at server boot, OpenID Connect
    Back-Channel Logout 1.0 (NOT RFC 8414).
  - 'OIDC first-admin bootstrap (Bundle 2 Phase 7)' — coexists
    with Bundle 1's env-var-token bootstrap, group-scoped via
    CERTCTL_BOOTSTRAP_ADMIN_GROUPS + CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID,
    one-shot per tenant.
  - 'Break-glass admin (Bundle 2 Phase 7.5)' — default-OFF,
    surface invisibility via 404-not-403, Argon2id with OWASP
    2024 params, lockout state machine, constant-time-via-
    verifyDummy, WARN log at boot, runbook pointer for
    operator drill.
  - 'Migrating an existing deployment to OIDC' — pointer to
    the new migration/oidc-enable.md walkthrough.

docs/migration/oidc-enable.md (NEW, Last reviewed 2026-05-10):
* Step-by-step migration guide for an operator on a Bundle-1-merged
  deployment to enable OIDC SSO. Pre-reqs (CERTCTL_CONFIG_ENCRYPTION_KEY,
  admin actor with auth.oidc.create + auth.oidc.edit, IdP tenant)
  + 7 numbered steps (pin encryption key, complete IdP-side per
  runbook, configure certctl-side OIDCProvider, add group→role
  mappings with fail-closed warning, optional first-admin bootstrap,
  verify with single test user, announce SSO endpoint).
* Rollback section covering the 4-step disable flow + the 409
  Conflict on provider-delete-while-sessions-exist + the
  existing-sessions-keep-working-until-expiry semantics.
* Troubleshooting section pinning 8 most-common failure modes
  (discovery doc fetch fails / IdP downgrade defense rejects /
  no roles assigned / iss mismatch / pre-login expired / state
  mismatch / sessions revoked but user can hit API / JWKS
  rotation breaks login).
* Database row count drift documented so operators know what to
  expect after OIDC is live (10 Bundle 2 tables enumerated).
* Cross-references to oidc-runbooks/ + security.md +
  auth-threat-model.md + auth-benchmarks.md + auth-standards-implemented.md.

CHANGELOG.md (MODIFIED):
* v2.1.0 section title bumped from 'Auth Bundle 1: RBAC primitive'
  to 'Auth Bundles 1 + 2: RBAC primitive + OIDC SSO + sessions'.
* Replaced the Bundle 1 closing-bullet ('Bundle 2 starts after
  Bundle 1 lands on master') with 18 new Bundle 2 entries:
  - OIDC + sessions + back-channel logout + break-glass overview.
  - OIDC token validation pinned at three layers (alg allow-list,
    IdP-downgrade defense, OIDC Core §3.1.3.7 re-verification).
  - Length-prefixed HMAC session cookies.
  - CSRF double-submit + hashed-token-on-row.
  - OIDC client_secret AES-256-GCM v3 blob at rest +
    integration-test invariant.
  - OIDC first-admin bootstrap.
  - Default-OFF break-glass admin (Argon2id + lockout +
    constant-time + surface invisibility).
  - GUI: 4 new pages + login-page IdP buttons + sidebar logout.
  - 11 new MCP tools for OIDC + session management.
  - 6 per-IdP runbooks (Keycloak / Authentik / Okta / Auth0 /
    Entra ID / Google Workspace).
  - Threat model extended with 5 new defense subsections + 8 new
    threat-catalogue subsections.
  - Performance baselines documented (4 benchmarks; 3 measured
    + 1 operator-runs).
  - Standards-and-RFC implementation table (13 RFCs + 14 CWEs;
    NOT a compliance-mapping doc).
  - Coverage gates held at floor 90 across all 4 Bundle 2
    packages (anti-Bundle-1-mistake invariant).
  - Multi-tenant query CI guard (ratchet baseline 32).
  - Phase 10 Keycloak testcontainers integration test + optional
    Okta smoke test.
  - OpenAPI cookieAuth security scheme + 13 new endpoints + 4
    break-glass endpoints.
  - Bundle-1-only compat regression CI guard +
    Bundle-1-to-2-upgrade regression CI guard.
* Final paragraph updated to point at oidc-enable.md alongside
  api-keys-to-rbac.md as the two migration walkthroughs.

docs/README.md (MODIFIED):
* Added the new oidc-enable.md migration row under '## Migration'
  alongside the existing api-keys-to-rbac.md entry, with a
  one-line description flagging it as the Bundle 2 OIDC
  onboarding walkthrough.

Verification
============

* Last-reviewed on security.md + oidc-enable.md: 2026-05-10.
* Internal-link sweep on oidc-enable.md: 0 broken (every relative
  link resolves via shell-loop verification).
* Internal-link sweep on docs/README.md: 0 broken (all .md
  references resolve).
* No Go-side impact, make verify gate unchanged.

Bundle 2 documentation deliverables now complete: security.md +
auth-threat-model.md + oidc-runbooks/ + auth-benchmarks.md +
auth-standards-implemented.md + api-keys-to-rbac.md + oidc-enable.md
+ CHANGELOG.md v2.1.0. The full Bundle 2 surface is operator-
discoverable from docs/README.md root nav.
2026-05-10 17:07:27 +00:00
shankar0123 3f335af45e auth-bundle-2 Phase 15: docs/reference/auth-standards-implemented.md (RFC + CWE evidence list, NOT a compliance-mapping doc)
Closes Phase 15 of cowork/auth-bundle-2-prompt.md. Ships a single
operator-facing doc that lists every RFC the auth bundles implement
and every CWE class the implementation closes, with concrete file
paths + test anchors per row.

Files
=====

docs/reference/auth-standards-implemented.md (NEW):
* Table 1: 13 RFCs / standards rows (RFC 6749, 7636, 7519, 7517,
  OIDC Core 1.0, OIDC BCL 1.0, RFC 6265, RFC 9700, RFC 8414,
  RFC 7633, RFC 8555, RFC 7515 plus the OIDC Core §5.3.2 UserInfo
  endpoint). Every row has a concrete source file path + a
  negative-test anchor.
* Table 2: 14 CWE rows (CWE-287, 352, 384, 294, 916/329, 307,
  345, 200, 770, 330, 311, 326, 1004, 614, 1275). Every row
  points at where the defense lives + where it is pinned.
* Bundle 1 RBAC standards covered separately at the end with
  CWE-285, 862, 863, 732 pointers into Bundle 1's surface.
* Explicit 'What this document is NOT' section preserving the
  operator's 2026-05-05 retired-compliance-docs decision: the
  doc is an evidence list, NOT a SOC 2 / PCI-DSS / HIPAA /
  NIST SP 800-53 / NIST SSDF / FedRAMP framework-mapping doc.
  Framework name-drops appear ONLY inside the explicit
  'this is NOT' disclaimer paragraphs; no marketing-flavored
  prose claims certctl 'satisfies CC6.1' or similar.

docs/README.md (MODIFIED):
* Adds the auth-standards-implemented.md doc to the Reference
  section nav table between intermediate-ca-hierarchy.md and
  the deployment-model.md entry, with a one-line description
  flagging it as RFC + CWE evidence (NOT a compliance-mapping
  doc).

Verification
============

* Last-reviewed header: 2026-05-10.
* Internal-link sweep: every relative link resolves cleanly.
* Framework-name grep: SOC 2 / PCI-DSS / HIPAA / NIST SSDF /
  FedRAMP appear ONLY inside the 'this is NOT a compliance-
  mapping doc' disclaimer paragraphs (lines 7 and 66 of the
  new doc). No marketing-flavored claims.
* No Go-side impact; pure docs commit, make verify gate
  unchanged.
2026-05-10 16:58:06 +00:00
shankar0123 9b6294e83d auth-bundle-2 Phase 14: session + OIDC validation benchmarks (steady-state + cold paths) + auth-benchmarks.md operator doc + Makefile targets
Closes Phase 14 of cowork/auth-bundle-2-prompt.md. Ships four
benchmarks producing four numbers + the operator-doc table; three
default-tag benchmarks runnable on every CI runner, the fourth
(cold-cache OIDC) runnable on operator-side Docker hosts via the
new make target.

Files
=====

internal/auth/session/bench_test.go (NEW):
* BenchmarkSession_SteadyState (target p99 < 1ms; measured 5µs).
  Warm in-memory repo + warm session row. Pure CPU: parseCookie +
  HMAC verify + map lookup + sentinel checks.
* BenchmarkSession_ColdProcess (target p99 < 10ms; measured 7.1ms).
  Same pipeline but with a configurable per-call delay simulating
  a 1ms Postgres RTT on each repo call. Two repo calls per
  Validate (signing-key fetch + session-row fetch) = 2ms minimum;
  Go time.Sleep granularity adds ~1-2ms jitter. Documented why
  testcontainers Postgres isn't viable inside b.N: 30+ second
  container boot incompatible with per-iteration timing.
* slowSessionRepo + slowKeyRepo wrappers add the per-call delay
  via time.Sleep; they delegate to the existing in-memory stubs.
* reportPercentiles helper sorts + reports p50/p95/p99/max via
  b.ReportMetric (Go testing.B doesn't surface percentiles
  natively).

internal/auth/oidc/bench_test.go (NEW):
* BenchmarkOIDC_SteadyState (target p99 < 5ms; measured 1.5ms).
  Drives full HandleCallback against an in-process mockIdP
  (httptest.Server localhost loopback). Pre-warmed JWKS cache via
  RefreshKeys at setup. Pipeline: pre-login consume + state
  compare + token exchange (localhost ~50-200µs) + go-oidc
  Verify (RSA-2048 sig verify + alg pin) + service-layer iss/
  aud/azp/at_hash/exp/iat/nonce re-checks + group-claim
  resolution + group→role mapping + user upsert + session mint.
* The localhost-loopback /token call adds ~100-500µs of TCP
  overhead vs pure crypto; the prompt's "no network calls"
  steady-state framing accommodates this since the localhost
  loopback is the closest practical proxy for a same-region
  IdP /token call (which adds 5-15ms in production).

internal/auth/oidc/bench_keycloak_test.go (NEW, //go:build integration):
* BenchmarkOIDC_ColdCache (target p99 < 200ms; operator-runs).
  Drives RefreshKeys against a live Keycloak container from the
  Phase 10 testfixtures harness. Each iteration evicts the
  in-process cache + re-fetches discovery + re-fetches JWKS over
  real HTTP + re-runs the IdP-downgrade-attack defense.
* Network-bounded: the cold path is dominated by HTTPS RTT to
  the IdP discovery endpoint, NOT crypto. The 200ms cap
  accommodates a geographically-distant IdP (~150ms RTT) plus
  the in-process JWKS fetch + downgrade-defense logic (~5ms
  locally).
* Reuses the sharedKeycloak fixture from
  integration_keycloak_test.go (Phase 10) so the benchmark
  doesn't pay the 60-90s container boot cost separately. Skips
  with a clear message if invoked without the integration test
  setup.
* Reports p50/p95/p99/max in MILLISECONDS (vs the
  microsecond-granularity steady-state benchmarks) since the
  cold path is two orders of magnitude slower.

internal/auth/oidc/service_test.go (MODIFIED):
* Refactored newMockIdP(t *testing.T) to delegate to a new
  newMockIdPWithTB(t testing.TB) sibling. Standard Go pattern
  for sharing test fixtures between *testing.T and *testing.B.
  No behavior change for existing service_test.go tests; the
  benchmark file in bench_test.go calls newMockIdPWithTB(b)
  to get the same fixture.

docs/operator/auth-benchmarks.md (NEW):
* Result table with all four benchmarks + targets + measured
  numbers + status markers. Four-row matrix for the default-tag
  benchmarks; the fourth row (cold-cache) is operator-recorded
  with an empty cell waiting for the first Docker-equipped run.
* Hardware floor section pinning the 4 vCPU / 8 GiB RAM /
  Postgres 16 / Go 1.25 baseline. GitHub-hosted Ubuntu runners
  satisfy this; operators on weaker hardware re-record.
* "What each benchmark covers (and what it doesn't)" section
  per benchmark, distinguishing the warm steady-state pipeline
  from the cold path's network-bounded budget.
* "Cold-cache OIDC: how to run" subsection documenting the
  make target + the test+benchmark coupling needed to populate
  sharedKeycloak. Operator-recorded baseline table seeded
  empty for first runs.
* "Why the cold path is bounded by network latency, not crypto"
  section explaining the budget breakdown:
    - TCP handshake (1 RTT)
    - TLS 1.3 handshake (1-2 RTTs)
    - 2 HTTPS GETs (discovery + JWKS, 1 RTT each)
    - In-process crypto on the certctl side (~5-10ms total)
  So the 200ms cap is operator-checkable: real measurement >
  200ms means the IdP is slow OR network congestion OR DNS
  issues — the diagnosis is upstream of certctl. Real
  measurement < 200ms means the IdP is on a fast same-region
  link.
* Methodology section pinning the per-iteration timing capture
  + sort + percentile-extract approach.
* Pre-merge audit section for the Phase 14 exit gate: four
  benchmarks ran, four numbers recorded, steady-state targets
  met, cold path is operator-runnable + measurably-bounded.

Makefile (MODIFIED):
* Added `make benchmark-auth` (default-tag, runs three of four
  benchmarks at 2000 samples each).
* Added `make benchmark-auth-coldcache` (integration-tagged,
  runs OIDC cold-cache against live Keycloak; requires Docker).
* Both targets carry explanatory comment blocks.

docs/README.md (MODIFIED):
* Added the auth-benchmarks.md doc to the Operator nav table
  alongside performance-baselines.md.

Measured baselines at Phase 14 close (linux/arm64, 4 vCPU)
==========================================================

  BenchmarkSession_SteadyState     p99 = 5µs    (target < 1ms)   ✓ 200× under
  BenchmarkSession_ColdProcess     p99 = 7.1ms  (target < 10ms)  ✓
  BenchmarkOIDC_SteadyState        p99 = 1.5ms  (target < 5ms)   ✓ 3× under
  BenchmarkOIDC_ColdCache          operator-runs (Docker required)

Verification
============

* gofmt -l on three new bench files: clean.
* go vet ./internal/auth/session/... ./internal/auth/oidc/...: clean
  (default tag).
* go vet -tags integration ./internal/auth/oidc/...: clean (integration
  tag covers the bench_keycloak_test.go file).
* go test -short -count=1 across all 5 OIDC + session packages:
  green; the bench_*_test.go files compile but don't run under
  -short (testing.Short() guards + benchmarks are not selected
  by -run pattern).
* All three runnable benchmarks executed and produce the numbers
  above; recorded in auth-benchmarks.md.
2026-05-10 16:51:28 +00:00
shankar0123 5e2accbf5f auth-bundle-2 Phase 12: extend auth-threat-model.md with Bundle 2 sections (OIDC + sessions + back-channel logout + OIDC first-admin + break-glass + 8 Bundle 2 threat sub-sections)
Closes Phase 12 of cowork/auth-bundle-2-prompt.md. The single
canonical operator-facing threat model (one doc per topic per the
docs convention) now covers both Bundle 1 (RBAC) AND Bundle 2 (OIDC
+ sessions + back-channel logout + OIDC first-admin + break-glass)
in one place.

File: docs/operator/auth-threat-model.md (MODIFIED, +485 LOC)

Conventions held
================

* The Bundle 1 sections ("Threat actors", "Defenses Bundle 1
  ships", "Threats Bundle 1 does NOT close", "Compliance mapping",
  "Operator-facing checks", "Cross-references") stay structurally
  intact. Bundle 2 EXTENDS them; nothing is rewritten in place.
* `Last reviewed:` header bumped 2026-05-09 → 2026-05-10.
* Per the prompt's explicit instruction: "do NOT create a separate
  auth-threat-model-bundle-2.md companion." This commit is a
  single-file extension.

Changes
=======

Intro paragraph rewritten:
* From "Bundle 1 lands... Bundle 2 will be updated" to "Bundle 1
  AND Bundle 2 land." Sets the reader's expectation that this is
  the post-Bundle-2 doc.

Threat actors section (4 new actors appended):
* OIDC-federated end user (token-forgery / session-hijacking /
  group-claim-manipulation surface).
* Stolen session cookie holder (XSS / network MITM / pasted-token).
* Compromised IdP (rogue token issuance; mitigations bounded to
  audit trail + group-mapping configuration).
* Break-glass-password holder (Phase 7.5 path bypasses OIDC + group
  layer entirely; default-OFF is the load-bearing mitigation).

NEW: Defenses Bundle 2 ships (5 sub-sections):
* OIDC token validation (Phase 3) — alg allow-list, IdP-downgrade
  defense, exact iss match, aud + azp checks, at_hash
  REQUIRED-when-access_token-present (Phase 3 tightening of OIDC
  core's MAY → MUST), single-use state + nonce, PKCE-S256 mandatory,
  iat window, JWKS rotation handling, JWKS-fetch-fail closed,
  encrypted client_secret at rest.
* Session minting + cookies (Phases 4 + 6) — length-prefixed HMAC
  defeating concatenation collision, HttpOnly + Secure + SameSite
  cookie hardening, idle + absolute timeouts, CSRF defense via
  double-submit-cookie + hashed-token-on-row, optional IP/UA bind,
  signing-key rotation primitive with retention window, fail-fatal
  EnsureInitialSigningKey at boot, pre-login vs post-login cookie
  discrimination.
* Back-channel logout (Phase 5) — OpenID Connect Back-Channel
  Logout 1.0 (NOT RFC 8414), required-claim pinning, jti-based
  replay defense, alg allow-list applies, Cache-Control: no-store.
* OIDC first-admin bootstrap (Phase 7) — coexists with Bundle 1's
  env-var-token bootstrap, group-scoped, one-shot per tenant via
  admin-existence probe, explicit OIDC provider gate, audit row on
  every grant.
* Break-glass admin (Phase 7.5) — default-OFF, surface-invisibility
  via 404-not-403, Argon2id with OWASP 2024 params, lockout state
  machine, constant-time across all failure paths via verifyDummy,
  WARN log at boot when ENABLED=true, 5/min rate limit on the
  public login endpoint.

NEW: Bundle 2 threat catalogue (8 sub-sections, one per
prompt-enumerated threat axis):

1. OIDC token forgery vectors and mitigations (9-row table covering
   alg confusion, audience injection, issuer mismatch, nonce replay,
   state replay, at_hash substitution, iat window manipulation,
   JWKS rotation mid-login, JWKS-fetch failure during a key
   rotation).
2. Session hijacking vectors and mitigations (7-row table covering
   XSS cookie theft, network MITM, CSRF, concatenation-collision
   forgery, stolen-cookie replay, cross-tab interference, sign-out
   race).
3. IdP compromise scenarios (operator monitors IdP audit logs,
   operator can rotate group-role mappings without redeploying,
   audit trail records source provider, provider-delete returns
   409 with active sessions).
4. Back-channel logout failure modes (6-row table covering IdP
   unreachable, invalid signature, replay via jti, alg confusion,
   missing events claim, present-nonce-claim).
5. Group-claim manipulation (4-row table covering operator
   misconfigured mapping, misconfigured groups_claim_path, IdP
   renames a group, IdP user maintainer adds user to unintended
   group).
6. Bootstrap phase risks post-Bundle-2 (4-row table covering
   CERTCTL_BOOTSTRAP_TOKEN leak, CERTCTL_BOOTSTRAP_ADMIN_GROUPS
   misconfigured to a wide group, both bootstrap strategies
   simultaneously, multi-IdP without explicit provider gate).
7. Break-glass risks (7-row table covering phished password,
   online brute-force, offline brute-force on DB compromise,
   operator forgets to disable, side-channel timing on
   wrong-vs-no-credential-vs-locked, surface fingerprinting,
   reserved-actor mutation).
8. Token-leak hygiene (the explicit grep policy with three
   per-package logging_test.go pointers + the audit_redact.go
   defense-in-depth note).

Threats Bundle 1 does NOT close section relabeled:
* Section header now reads "Threats Bundle 1 does NOT close
  (Bundle 2 closure status)" with each item carrying  / ⚠️ /
  "still deferred" markers.
* Items 1, 2, 3, 8 marked  closed by Bundle 2.
* Items 4, 5, 7, 9 marked still-deferred with v3 / follow-on
  pointers.
* Item 6 (rate limiting on bootstrap) marked acceptable; Bundle 2
  adds the same rate-limit primitive to /auth/breakglass/login.

NEW: Threats Bundle 2 does NOT close section listing the 8 v3 /
future-work items:
* WebAuthn / FIDO2 second factor (Decision 12).
* Time-bound role grants / JIT elevation.
* SAML federation (operators broker through Keycloak).
* Multi-tenant data isolation activation (gated to managed-service
  hosting work).
* HSM / FIPS-validated signing key for sessions.
* OIDC RP-initiated logout (Bundle 2 implements only back-channel).
* GUI E2E via Playwright.
* Per-IdP runbook external-tester sign-off (encouraged, NOT a merge
  gate post-2026-05-10 policy change).

Operator-facing checks section extended:
* 6 new SQL-shaped checks for Bundle 2 (provider count drift,
  per-actor session count, unmapped-groups audit-row spike,
  break-glass usage outside incidents, OIDC first-admin one-row-per-
  tenant invariant, retired-signing-key GC liveness).

Cross-references section split into Bundle 1 anchors + Bundle 2
anchors:
* Bundle 2 anchors enumerate every load-bearing file: 6
  internal/auth/ packages, 5 migrations, 3 ci-guards.

Compliance mapping section UNCHANGED:
* Phase 15 (standards-and-RFC-implementation table) is the proper
  home for the RFC + CWE evidence the Bundle 2 surface adds.
  Re-introducing framework-mapping prose at the threat-model layer
  would regress the operator's 2026-05-05 retired-compliance-docs
  decision, which is explicitly forbidden by the Phase 15 prompt.

Verification
============

* `> Last reviewed: 2026-05-10` — confirmed via head -3.
* All 8 prompt-mandated Bundle 2 threat sub-sections present —
  confirmed via grep `^### ` count (19 ### headers total: 6 Bundle
  1 + 5 Bundle 2 defenses + 8 Bundle 2 threats).
* All 39 prompt-listed threat-vector keywords present — confirmed
  via single-line grep counting 39 hits across the prompt's
  vocabulary.
* Internal markdown links resolve cleanly — confirmed via shell
  loop iterating each `]( ...)` reference and checking `[ -e "$path" ]`.
* No backend / Go-test impact — pure docs commit.
* `make verify` gate unchanged.
2026-05-10 16:11:08 +00:00
shankar0123 f203a5372d auth-bundle-2 Phase 11 follow-on: drop external-tester reference from oidc-runbooks/index.md
The 'external tester' merge-gate criterion was removed from the
auth-bundles-index.md policy: external-tester confirmations are
encouraged but NOT a merge condition (BSL discourages contribution-
style testing; the Phase 10 Keycloak testcontainers harness + the
optional Okta smoke test cover the same surface deterministically
in CI). Drops the now-stale phrasing from the runbooks index and
the merge-gate reference; keeps the operator-sign-off footer
recommendation since dated validation records are still useful.
2026-05-10 15:58:03 +00:00
shankar0123 2893f9b48e auth-bundle-2 Phase 11: 6 per-IdP OIDC runbooks + index + docs/README wiring
Closes Phase 11 of cowork/auth-bundle-2-prompt.md. Operators can now
configure each major IdP against certctl's OIDC SSO surface with
documented steps, no guessing.

Files
=====

docs/operator/oidc-runbooks/index.md (NEW):
* Index page linking all six per-IdP runbooks.
* Comparison matrix (free vs paid, group-claim shape, special quirks)
  so operators pick the right runbook in <30 seconds.
* "Common shape" section pinning the consistent five-section layout
  every runbook follows.
* "Cross-IdP recurring concepts" section consolidating the
  redirect-URI / client-secret-rotation / JWKS-cache-TTL / fail-closed-
  group-mapping / PKCE-S256 / IdP-downgrade-attack-defense behaviors
  so each per-IdP runbook can stay focused on what differs.

docs/operator/oidc-runbooks/keycloak.md (NEW):
* Canonical reference. Mirrors the testfixtures/keycloak-realm.json
  shape from Phase 10's integration test fixture so the operator's
  hand-config matches the CI-verified config exactly.
* Step-by-step IdP-side: realm → client → groups → group-mapper →
  user. Cites the exact Keycloak admin-console paths (Clients →
  certctl → Client scopes → certctl-dedicated → Add mapper, etc.).
* GUI + API + MCP equivalents for the certctl-side configuration.
* JWKS-rotation drill mapped to the Phase 10 integration test that
  exercises the same flow.
* 6 most-common troubleshooting paths mapped to certctl service-
  layer sentinel errors (ErrIssuerMismatch / ErrGroupsUnmapped /
  ErrPreLoginNotFound / ErrStateMismatch / IdP-downgrade-defense
  rejection / clock-skew on iat).

docs/operator/oidc-runbooks/authentik.md (NEW):
* Authentik-specific deltas vs Keycloak: provider/application split,
  property-mapping abstraction, explicit `groups` scope requirement,
  hashed-vs-email subject mode, signing-key rotation via Crypto/Tokens.

docs/operator/oidc-runbooks/okta.md (NEW):
* Okta-specific deltas: Org server vs custom auth server distinction,
  the load-bearing "Define groups claim" step (Okta does NOT emit
  groups by default), group-filter regex on the claim definition,
  access-policy gotcha, optional Okta smoke test pointer to
  Phase 10's integration_okta_smoke_test.go.

docs/operator/oidc-runbooks/auth0.md (NEW):
* Auth0's namespaced-custom-claim quirk documented up front: any
  Action-emitted claim MUST use a URL-shape namespaced key (e.g.
  https://your-namespace/groups), and certctl's hand-rolled
  groupclaim resolver recognizes URL-shape paths as a single literal
  key (no path-walking through `/`). Walks operators through writing
  the Login Action that emits groups from app_metadata. Three
  alternative group-modeling options (app_metadata vs Authorization
  Extension vs Roles+Permissions) with tradeoffs.

docs/operator/oidc-runbooks/azure-ad.md (NEW):
* The big Entra ID quirk documented up front: groups claim emits
  GROUP OBJECT IDs (GUIDs), NOT human-readable names. Certctl group→
  role mappings MUST be configured against the GUIDs. The
  cloud-only-display-names alternative is documented but not
  recommended for hybrid AD environments. Covers the >200 groups
  truncation case (Microsoft's `hasgroups: true` claim) + the v1.0
  vs v2.0 endpoint distinction (certctl supports v2.0 only).

docs/operator/oidc-runbooks/google-workspace.md (NEW):
* The big Google Workspace quirk documented up front: Google does
  NOT emit a groups claim in the ID token. Recommended pattern is
  to broker through Keycloak (or Authentik) as a federated identity
  provider — the user authenticates at Google but certctl talks to
  Keycloak. Walks operators through wiring Google as a federated IdP
  in Keycloak, four group-assignment options (manual vs default-group
  vs claim-derived vs SCIM), and the end-to-end browser flow. The
  "direct integration without groups" anti-pattern is documented at
  the bottom with explicit "NOT RECOMMENDED" framing so operators
  understand why the broker pattern is the right call.

docs/README.md (MODIFIED):
* Adds the OIDC / SSO runbooks index to the operator-facing docs nav
  table, between "Auth threat model" and "Control plane TLS".

Conventions held
================

* Every runbook carries `> Last reviewed: 2026-05-10` per the
  docs convention.
* Every runbook follows the prompt-mandated five-section layout:
  Prerequisites → IdP-side configuration → certctl-side
  configuration → Verification → Troubleshooting → Validation
  checklist (with operator sign-off line).
* Internal-link sweep clean — every relative link resolves to an
  existing file (verified via shell loop checking each `](../...)`
  and `](*.md)` reference). External links to IdP vendor sites are
  the canonical https URLs.
* No leakage of cowork/ workspace paths as Markdown links — the
  azure-ad.md initially had a `[auth-bundles-index.md](../../../../cowork/...)`
  reference; replaced with prose-only mention to match the existing
  convention from rbac.md + migration/api-keys-to-rbac.md.
* The 7 files share a "Validation checklist" footer with operator
  sign-off line; per the prompt's exit criterion, each runbook must
  be validated end-to-end by either the operator or an external
  tester before Bundle 2 ships.

Verification
============

* Last-reviewed dates: 7/7 runbooks dated 2026-05-10.
* Internal-link sweep: 0 broken (every `]( ...)` reference resolves).
* docs/README.md → operator/oidc-runbooks/index.md link resolves.
* No backend / frontend / Go-test impact — pure docs commit. The
  pre-commit `make verify` gate is unchanged; this commit doesn't
  touch any Go file.

Phase 11 deviation note
=======================

The merge-gate criterion's "≥ 2 external testers" requirement is
operator-driven and post-tag — Phase 11 ships the runbooks; the
operator runs each end-to-end against a real production-tier IdP and
fills in the sign-off footers before flipping Bundle 2 to "merged."
Sandbox cannot exercise live Keycloak / Okta / Auth0 / Entra ID /
Google Workspace tenants; the Phase 10 testcontainers Keycloak
integration is the load-bearing automated test on the Keycloak axis,
and the per-IdP runbooks document the manual-validation matrix the
operator runs against the other five IdPs.
2026-05-10 15:49:56 +00:00
shankar0123 3e91c7a1f0 chore(security): bump Go toolchain 1.25.9 -> 1.25.10 + golang.org/x/net 0.49 -> 0.53
CI run #484's Go Build & Test job failed govulncheck (M-024 hard
gate). Six standard-library CVEs land in go1.25.9 + one
golang.org/x/net CVE in v0.49.0; all are fixed in go1.25.10 + x/net
v0.53.0 respectively. The advisories that fired were:

  GO-2026-4986  Quadratic string concat in net/mail.consumeComment
                — called via internal/api/handler/validation.go's
                ValidateCommonName -> mail.ParseAddress
  GO-2026-4977  Quadratic string concat in net/mail.consumePhrase
                — same call site
  GO-2026-4982  Bypass of meta-content URL escaping in html/template
                — called via internal/service/digest.go's
                RenderDigestHTML -> Template.Execute
  GO-2026-4980  Escaper bypass in html/template
                — same call site
  GO-2026-4971  Panic in net.Dial / LookupPort on Windows NUL bytes
                — many call sites (email notifier, SSH connector,
                ACME validators, validation.ValidateSafeURL, ...)
  GO-2026-4918  Infinite loop in net/http2 transport on bad
                SETTINGS_MAX_FRAME_SIZE
                — called via internal/connector/target/f5.go's
                F5Client.Authenticate -> http.Client.Do

Bumps applied:

* `go.mod`: `go 1.25.9` -> `go 1.25.10`; `golang.org/x/net v0.49.0`
  -> `v0.53.0` (kept indirect — the upgrade is force-pulled by the
  module-version directive; transitive deps will pick the higher).
* `.github/workflows/{ci,codeql,release}.yml`: setup-go pin and the
  release.yml `GO_VERSION` env var bumped to 1.25.10. The
  security-deep-scan.yml workflow uses the major-minor `1.25` pin
  which auto-resolves to the latest 1.25.x and is unaffected.
* `Dockerfile` + `Dockerfile.agent`: `golang:1.25-alpine@sha256:5caa...`
  re-pinned to `golang:1.25.10-alpine@sha256:8d22e29d960bc50cd0...`
  (digest looked up against `registry-1.docker.io/v2/library/golang/
  manifests/1.25.10-alpine`; verified by the digest-validity ci-guard).
  The explicit `1.25.10-alpine` tag form replaces the moving
  `1.25-alpine` pin so the image-spec is reproducible end-to-end
  even without the digest reference.
* `deploy/test/f5-mock-icontrol/Dockerfile`: `golang:1.25.9-bookworm
  @sha256:1a14...` re-pinned to `golang:1.25.10-bookworm@sha256:
  e3a54b77385b4f8a31c1...` (looked up the same way).
* `deploy/test/f5-mock-icontrol/go.mod`: `go 1.25.9` -> `go 1.25.10`.
* `internal/api/handler/version.go` + `api/openapi.yaml`: the
  `runtime.Version()`-shape comment + OpenAPI `example: go1.25.9`
  bumped to keep doc/example freshness.
* `docs/contributor/ci-pipeline.md` + `docs/reference/connectors/
  iis.md`: doc-only `Go 1.25.9` -> `Go 1.25.10` references.

Verification done in-tree:

* All `scripts/ci-guards/*.sh` pass locally including
  `digest-validity.sh` (the new digests resolve cleanly against
  Docker Hub).
* `S-1-hardcoded-source-counts.sh` clean (the false-positive on
  "Bundle 1 migrations" was fixed in the prior commit).

Operator step required post-push (sandbox has no Go toolchain):

  cd certctl && go mod tidy

This regenerates go.sum's `golang.org/x/net v0.49.0` h1: lines into
v0.53.0 ones. CI's `go mod tidy && git diff --exit-code go.mod
go.sum` step will catch the drift if missed; in that case run the
command, commit, and push the go.sum-only delta.
2026-05-09 21:35:46 -04:00
shankar0123 51f55c5fc9 auth-bundle-1 fix: S-1 ci-guard false positive on "Bundle 1 migrations"
CI run #484 surfaced the regression in the Frontend Build job:

  ::error::S-1 regression: hardcoded source-count prose reappeared:
  docs/migration/api-keys-to-rbac.md:32:schema is already at the target
  version. The Bundle 1 migrations

The S-1 guard's regex (scripts/ci-guards/S-1-hardcoded-source-counts.sh)
catches `\b[0-9]+\s+migrations\b` to prevent stale "<N> migrations"
prose in docs/. The Bundle 1 migration-guide phrasing "The Bundle 1
migrations" tripped on the digit-1 in "Bundle 1" sitting next to the
word "migrations" — false positive, not a real source-count claim.

Rephrase to "Migrations that ship in the Bundle 1 slice of v2.1.0:"
which keeps the same operator meaning without the regex collision.
The guard now passes; full ci-guards loop runs clean locally.

Spotted via the operator's CI-failure paste post-Bundle-1 merge.
2026-05-10 01:18:16 +00:00
shankar0123 5313cd8492 auth-bundle-1 Phase 13 follow-up: em-dash sweep + broken-link fix
Self-audit on e7a94b6 flagged the prompt's 'zero em dashes'
discipline rule. The four new Phase 13 docs and the v2.1.0
CHANGELOG section had 97 em-dash hits between them; this commit
sweeps them all to ASCII hyphens.

Counts before -> after:
  docs/operator/rbac.md                  28 -> 0
  docs/operator/auth-threat-model.md     36 -> 0
  docs/migration/api-keys-to-rbac.md     16 -> 0
  docs/operator/security.md               8 -> 0
  docs/reference/profiles.md              3 -> 0
  CHANGELOG.md                            6 -> 0

Mechanical: ' - ' (spaced em dash) and bare em-dash both replaced
with spaced ASCII hyphen, then double-spaces collapsed. Markdown
list bullets ('^- ', '^  - ', '^    - ') verified intact across
all six files. Internal-link sweep also re-run.

Also fixes a pre-existing broken link the audit caught:
  docs/operator/security.md:70 referenced
  '../internal/crypto/encryption.go' which is a 1-level-up jump
  from docs/operator/, not the 2-level-up jump it actually needs
  ('../../internal/crypto/encryption.go'). Pre-Bundle-1 link rot;
  fixed in lockstep so the merge gate's docs validation passes
  cleanly.

Final state across the Phase-13 docs + CHANGELOG:
  - 0 em dashes
  - 0 broken internal links
  - Last-reviewed: 2026-05-09 header on every new doc

Bundle 1 documentation is now ready for the operator-side merge
gate review.
2026-05-10 00:15:30 +00:00
shankar0123 e7a94b6080 auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update)
Closes the last Phase before the Bundle 1 Exit gate. Operators
now have authoritative reference + threat model + migration guide
covering every behavior change Bundles 0-12 introduced.

# New docs

* docs/operator/rbac.md (340 lines) — operator how-to:
  - Mental model (actors / roles / permissions / scopes)
  - 7 default roles seeded by migration 000029 + the 5
    admin-only fine-grained perms seeded by 000030
  - Permission catalogue table by namespace
  - Scope semantics (global beats specific) + the Bundle-2
    deferral on scope_id FK enforcement
  - Granting / revoking access from GUI + CLI + HTTP API + MCP
  - The auditor pattern (audit-only, no resource read)
  - Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl →
    HTTP 410 thereafter)
  - Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production

* docs/operator/auth-threat-model.md (180 lines) — what the
  controls defend against:
  - 5 threat actors (external, wrong-role, compromised key,
    insider operator, compromised auditor)
  - Per-defense walk-through (API-key auth, RBAC, bootstrap,
    approval workflow + Phase 9 closure, audit trail,
    protocol-endpoint allowlist)
  - 9 explicit deferrals (OIDC, sessions, local accounts,
    JIT elevation, MFA, etc.) — Bundle 2 / future scope
  - Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b),
    NIST SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10)
  - 5 operator-runnable sanity checks (e.g.,
    'SELECT FROM audit_events WHERE actor=system-bypass' MUST
    return 0 in production)

* docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x →
  v2.1.0 upgrade flow:
  - The SECURITY: AUDIT YOUR API KEYS callout
  - Migration list (000029-000033) + what each does
  - 4-mode scope-down flow (interactive / non-interactive
    JSON / --suggest / --suggest --apply)
  - What changes for code that called auth.IsAdmin
  - Helm-specific upgrade flow with example post-upgrade Job
  - Docker Compose upgrade flow + the 5 examples folders
    that ride demo mode unchanged
  - Verification queries + rollback flow

# Updated docs

* docs/operator/security.md — Last-reviewed bumped to
  2026-05-09; existing Authentication-surface section
  extended to call out the Bundle 1 RBAC primitive,
  day-0 bootstrap path, and approval-bypass closure with
  cross-references to the new docs.

* docs/reference/profiles.md — Last-reviewed header
  formatting fixed (added the > blockquote prefix used
  consistently across the docs tree).

# docs/README.md navigation

* Operator section gains 2 new rows (RBAC + auth-threat-model)
  and Approval-workflow row updated to mention Phase 9
  closure.
* Reference section gains the Profiles row.
* Migration section gains the api-keys-to-rbac row with the
  AUDIT YOUR API KEYS callout in the link description.

# CHANGELOG.md v2.1.0 section refreshed

The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS
callout. This commit appends the missing Phase 9-12 highlights:

  - Approval-bypass closure (profile-edit gate + flip-flop
    loophole + ErrApproveBySameActor invariant)
  - GUI: Roles / API Keys / Auth Settings / Approvals queue
  - 12 new MCP RBAC tools
  - Coverage gates on internal/auth + internal/service/auth
  - Protocol-endpoint allowlist pinned at 3 layers

Trailing cross-reference block now points at all 4 new docs.

# Verifications

* Every internal link in the 4 new/modified docs validated by
  shell sweep (find broken links → 0 hits).
* Every new doc carries 'Last reviewed: 2026-05-09' header
  with the > blockquote prefix matching the docs-tree
  convention.
* go vet ./... clean.
* staticcheck across every Bundle-1-touched Go package clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/auth (incl.
  bootstrap), internal/api/handler, internal/api/router,
  internal/cli, internal/service (incl. auth),
  internal/domain/auth, internal/mcp, cmd/cli (cmd/server
  has 1 environmental failure on the sandbox virtiofs-tmp:
  TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends
  on tmpfs file-mode semantics that virtiofs propagates
  differently — pre-existing, unrelated to Bundle 1).
* Frontend: 19 Vitest tests across src/pages/auth/ +
  AuditPage all pass; tsc --noEmit clean.
2026-05-10 00:10:15 +00:00
shankar0123 69a508dfcf auth-bundle-1 Phase 9 + 10: approval-bypass closure + RBAC GUI
# Phase 9 — approval-bypass closure (Decision 9, option a)

* Migration 000033_approval_kinds.up.sql: ALTER TABLE
  issuance_approval_requests ADD COLUMN approval_kind +
  payload JSONB; relax certificate_id + job_id to nullable;
  CHECK (approval_kind IN ('cert_issuance','profile_edit'))
  + CHECK (per-kind nullability invariant) + index on
  approval_kind. Idempotent throughout via DO blocks.
* domain.ApprovalKind enum (cert_issuance / profile_edit) +
  IsValidApprovalKind. ApprovalRequest gains Kind +
  Payload []byte for the pending profile diff.
* postgres.ApprovalRepository.Create + scanApprovalRow extended
  to round-trip the new columns; certificate_id + job_id
  switched to sql.NullString so profile_edit rows persist
  cleanly. Default Kind=cert_issuance preserves back-compat
  for every Phase-7-2026-05-03 caller.
* ApprovalService.RequestProfileEditApproval: new entry point
  that creates a pending profile-edit row carrying the
  serialized profile diff. Bypass mode (CERTCTL_APPROVAL_BYPASS)
  short-circuits the same way it does for cert_issuance.
* ApprovalService.SetProfileEditApply hook: cmd/server/main.go
  registers a closure that deserializes req.Payload + persists
  via profileRepo.Update + emits a profile.edit_applied audit
  row with category=auth. The hook avoids the Approval ↔
  Profile import cycle.
* ProfileService.UpdateProfile: gates when (a) the live
  profile carries RequiresApproval=true, OR (b) the proposed
  edit would set it true. Returns ErrProfileEditPendingApproval
  with the new approval ID; ProfileHandler maps to HTTP 202
  Accepted + {pending_approval_id}. Both arms close the
  flip-flop loophole because every transition through an
  approval-tier profile fires the gate.
* TestProfileEdit_RequiresApprovalLoopholeClosed pins all 3
  bypass attempts (flip-off / kept-on / flip-on) gated; nil-
  approval-service preserves pre-Phase-9 direct-apply for
  test fixtures.
* Approval service tests gain 4 profile_edit rows: pending row
  shape; same-actor self-approve rejected with
  ErrApproveBySameActor (load-bearing two-person integrity);
  approve fails-closed when apply callback unwired;
  apply callback invoked on approve.
* docs/reference/profiles.md (new) explains the gate +
  edit response shape (202) + same-actor invariant + bypass
  + audit hooks.

# Phase 10 — RBAC management GUI

* useAuthMe hook (web/src/hooks/useAuthMe.ts): TanStack Query
  fetches /api/v1/auth/me on app boot, caches for 60s, exposes
  hasPerm(p) + hasAnyPerm + isAdmin predicates. Every Phase-10
  page consumes this on mount + gates affordances against the
  cached effective_permissions slice. Server-side enforcement
  is the load-bearing gate; client-side hide/disable is UX.
* New routes:
   - /auth/roles — list (auth.role.list); create-role modal
     (auth.role.create) hidden when missing.
   - /auth/roles/:id — detail + permissions; edit
     (auth.role.edit), delete (auth.role.delete), add/remove
     permission affordances each gated.
   - /auth/keys — list of every actor with role grants; assign
     + revoke modals (auth.role.assign). actor-demo-anon
     flagged system-managed; mutation buttons hidden for it.
   - /auth/settings — stub showing /v1/auth/me identity +
     bootstrap-endpoint availability via /v1/auth/bootstrap.
* AuditPage extended with category filter ('All categories'
  + the 3 enum values from migration 000032). Selection flows
  to the API call params + the URL-driven query state.
* Layout: 3 new nav entries (Roles / API Keys / Auth Settings).
* api/client.ts: 12 new exported functions for the RBAC
  surface (authMe, list/get/create/update/delete role,
  list/add/remove role permissions, list keys, assign/revoke
  key role, bootstrap-availability probe).
* data-testid attributes on every interactive element so a
  future Playwright suite can assert behavior without brittle
  CSS selectors.
* Empty state, error state, and unsaved-changes warnings on
  every form per the prompt's implementation rules.

# Frontend tests

* RolesPage.test.tsx (6 tests): list render, empty state,
  error state, hide-create-button-without-perm,
  show-create-button-with-perm, submit-create-modal.
* KeysPage.test.tsx (3 tests): demo-anon flagged
  system-managed (no buttons), permission-gated affordance
  hide for auditor caller, assign-modal-POST contract.
* AuthSettingsPage.test.tsx (2 tests): identity surface,
  bootstrap-OPEN-status surface.
* AuditPage.test.tsx (+1): category-filter select renders
  with the 4 documented options.

15 frontend tests total in src/pages/auth/ + the audit
category-filter test; all pass via npx vitest run.

# Verifications

* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
  service + repository + cmd + domain: clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/service,
  internal/api/handler, internal/api/router, internal/auth,
  internal/auth/bootstrap, internal/service/auth,
  internal/domain/auth, cmd/server, cmd/cli, internal/cli.
* npx tsc --noEmit clean.
* npm run build green (vite build produces dist/index.html
  + 946KB JS bundle; chunk-size warning is pre-existing).
* npx vitest run src/pages/auth/ src/pages/AuditPage.test.tsx
  green (15 tests, 4 files).
2026-05-09 21:03:59 +00:00
shankar0123 71ebccb8ba docs: fix broken ../examples/ links across docs/ (closes #11)
Reporter (thesudoer0003) flagged that the example links in
docs/getting-started/examples.md resolve to /docs/examples/ which
does not exist. Same bug pattern shows up in four other docs files.

The example READMEs live at examples/<name>/<name>.md at the repo
root, not under docs/. The references in the docs/ tree used
relative paths like `../examples/acme-nginx/acme-nginx.md` which
resolve to docs/examples/... — one level short of escaping out to
the repo root. Fix is one extra `../` so the path resolves to
examples/... at repo root, where the files actually live.

Files touched:
  docs/getting-started/examples.md           5 links
  docs/getting-started/why-certctl.md        1 link
  docs/migration/cert-manager-coexistence.md 1 link
  docs/migration/from-acmesh.md              1 link
  docs/migration/from-certbot.md             1 link

Verified: every `../../examples/<name>/<name>.md` reference now
resolves to the on-disk file. Re-checked via:

  for f in $(grep -rl 'examples/' docs/); do
    for link in $(grep -oE '\.\./\.\./examples/[^)]*' "$f"); do
      [ -e "$(dirname "$f")/$link" ] || echo "STILL BROKEN: $f -> $link"
    done
  done

zero "STILL BROKEN" output.

Closes #11
2026-05-06 20:30:32 +00:00
shankar0123 1720e11109 docs: fix broken single-file demo invocation in README + qa-prerequisites + ENVIRONMENTS
The README's Quick Start, the qa-prerequisites contributor doc, and the
landing page (separate repo, separate commit) all shipped a copy-paste
command that produces:

  service "certctl-server" has neither an image nor a build context
  specified: invalid compose project

The bug landed silently with commit a3d8b9c (the U-3 master). Pre-U-3,
docker-compose.demo.yml was self-contained and could be invoked with a
single -f flag. U-3 deliberately reduced it to a 27-line overlay — its
only payload today is `CERTCTL_DEMO_SEED=true` on the certctl-server
service — because the demo seed now applies at boot via
postgres.RunDemoSeed, not via /docker-entrypoint-initdb.d/. The overlay
no longer carries an image: or build: of its own, so it MUST be passed
alongside the base file.

The README/qa-doc/landing-page never picked up the rename of the contract.
Every operator who copy-pasted the Quick Start since U-3 has hit the
"invalid compose project" error and bounced. The operator caught it
running the demo locally today.

This commit fixes the three certctl-repo sites:

  README.md (Quick Start)
    docker compose -f deploy/docker-compose.demo.yml up -d --build
    →
    docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build

    Plus the "drop the -f flag for clean install" prose now spells out
    the correct fallback (`-f deploy/docker-compose.yml` alone).

  docs/contributor/qa-prerequisites.md (Step 1)
    Same single-file → two-file fix, plus an inline note explaining
    why the override-only file requires the base (so the next person
    who reads it understands the contract instead of re-discovering it).

  deploy/ENVIRONMENTS.md (Demo Overlay → What it adds)
    Replaced the stale "One line: mounts seed_demo.sql into PostgreSQL's
    init directory" claim — that hasn't been true since U-3 — with the
    accurate "One env var: CERTCTL_DEMO_SEED=true; server applies
    seed_demo.sql at boot via postgres.RunDemoSeed" description, plus
    the historical context for why the overlay can't stand alone.

The certctl.io landing page hits the same bug (line 759); fix shipping
in a separate commit in that repo.

Acceptance gate (manual):
  - copy/paste the new README Quick Start command end-to-end against
    a fresh clone — succeeds, dashboard at https://localhost:8443
    shows the seeded demo data within ~30s.
  - clean-install fallback (`docker compose -f deploy/docker-compose.yml
    up -d --build`) starts a working stack with no demo data.
2026-05-05 20:55:26 +00:00
shankar0123 0e06f6c4fc cli: promote --force on renew + require --reason on revoke (closes P3-1, P3-2)
Closes findings P3-1 and P3-2 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Both findings
flagged hidden defaults that the CLI was sending without exposing them
to operators: `force=false` baked into every renew payload, and a silent
fallback to `reason="unspecified"` whenever --reason was omitted.

P3-1 — promote --force on `certs renew` (full end-to-end plumbing)

The pre-2026-05-05 CLI sent `{"force": false}` in the renew body. The
API handler never decoded it — a textbook "lying field" per the
operator's CLAUDE.md "complete path, not the easy path" rule: the body
field stored a value, claimed to do something, and silently did nothing
because the wire never reached the consumer. Adding a --force flag that
also went unread would have created another lying field.

This commit takes the complete path:

  service.CertificateService.TriggerRenewal grew a `force bool` parameter
  (internal/service/certificate.go). When force=true, the
  RenewalInProgress block is overridden so operators can recover stuck
  in-flight renewals where a previous job hung without releasing the
  status flag. Archived and Expired remain terminal blockers regardless
  of force — those are semantic dead-ends that --force should not paper
  over (archived = decommissioned, expired = issue a new cert instead of
  renewing a dead one).

  handler.CertificateHandler.TriggerRenewal parses force from
  ?force=true (or ?force=1) query param, OR {"force": true} JSON body,
  whichever the client picks. Defaults to false. Passes through to the
  service.

  internal/cli/client.go::RenewCertificate(id, force bool) sends
  ?force=true on the URL when --force is set. The historical hardcoded
  `{"force": false}` body is gone — no more lying field.

  cmd/cli/main.go dispatches `certs renew <id> [--force]` (ID-first
  flag-second convention matches the existing `agents retire <id>
  [--force]`).

P3-2 — require --reason on `certs revoke` (Option A: strict refusal)

The pre-2026-05-05 CLI dropped to `--reason unspecified` whenever the
operator omitted the flag. Compliance reporting (RFC 5280 §5.3.1, PCI-
DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful;
silent fallback defeats the audit trail because every revocation looks
identical.

  cmd/cli/main.go dispatch refuses to send when --reason is empty,
  prints the canonical RFC 5280 §5.3.1 reason-code menu, and exits
  non-zero.

  internal/cli/client.go exposes ValidRevokeReasons() returning the
  canonical camelCase list (unspecified, keyCompromise, caCompromise,
  affiliationChanged, superseded, cessationOfOperation, certificateHold,
  removeFromCRL, privilegeWithdrawn, aaCompromise) and
  NormalizeRevokeReason() that accepts both camelCase and snake_case
  inputs and normalises to the canonical wire form. Off-list reasons
  are rejected at dispatch with the menu re-printed.

Test pins:

  internal/cli/client_test.go::TestClient_RenewCertificate_ForceFlag —
  --force=true sends ?force=true with empty body; --force=false sends
  no query and no body.

  internal/cli/client_test.go::TestNormalizeRevokeReason +
  TestValidRevokeReasons — canonical-camelCase + snake_case + reject-
  off-enum behaviour.

  cmd/cli/dispatch_test.go::TestHandleCerts_Revoke_RequiresReason +
  TestHandleCerts_Revoke_RejectsUnknownReason +
  TestHandleCerts_Renew_ForceFlag — dispatch-layer pins for the same
  contracts.

  internal/api/handler/certificate_handler_test.go::TestTriggerRenewal_
  ForceQueryParam — query-param passthrough (no-flag, force=true,
  force=1, force=false) flows through to the service-layer parameter.

  internal/service/certificate_test.go::TestTriggerRenewal_
  ForceOverridesInProgress — force=false preserves the
  RenewalInProgress block; force=true clears it.

  Existing TestTriggerRenewal_Archived extended to assert force=true
  still blocks Archived (terminal-state guarantee).

Docs: docs/reference/cli.md updated with the --force example for renew
and the strict --reason semantics for revoke (including snake_case
input acceptance).

Acceptance gate (verified):
  - go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/...
    ./cmd/mcp-server/... clean.
  - go vet ./... clean.
  - go test -short -count=1 ./... pass repo-wide.
  - bash scripts/ci-guards/openapi-handler-parity.sh clean
    (router 178, OpenAPI 144, exceptions 36 — unchanged; we add
    parameter parsing, not routes).
  - gofmt -l clean.
2026-05-05 19:49:34 +00:00
shankar0123 ff75361553 mcp(coverage): add 34 tools across 7 domains to close 2026-05-05 parity audit P1 findings
Closes findings P1-1..P1-35 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Before this
bundle, 35 operator-facing API endpoints had GUI surfaces but no MCP
counterpart — operators using AI assistants for cert lifecycle work in
regulated environments had to drop to curl for approve/reject, health-check
acknowledgement, renewal-policy CRUD, network-scan triggering, discovery
triage, intermediate-CA management, and job verification.

Tool count: 87→121 in tools.go (+34), 6 unchanged in tools_est.go.
Re-derive via grep -cE 'gomcp\\.AddTool\\(' internal/mcp/tools.go
internal/mcp/tools_est.go.

The 7 phases (matching the bundle prompt at
cowork/mcp-coverage-expansion-prompt.md):

  Phase A — Approvals (P1-28..P1-31, 4 tools)
    list_approvals, get_approval, approve_request, reject_request.
    Two-person-integrity contract (ErrApproveBySameActor → HTTP 403)
    is preserved automatically: the decided_by actor is derived
    server-side from middleware.UserKey, NOT from request body, so
    the MCP server's authenticated API-key identity becomes the
    audit-trail actor. The MCP input schema deliberately omits any
    actor_id field to prevent client-side spoofing.

  Phase B — Health Checks (P1-20..P1-27, 8 tools)
    list, summary, get, create, update, delete, history, acknowledge.
    Mirrors the existing target-resource shape; acknowledge takes
    optional 'actor' string captured in the audit row (handler defaults
    to 'unknown' if absent).

  Phase C — Renewal Policies (P1-1..P1-5, 5 tools)
    Standard CRUD against /api/v1/renewal-policies. Distinct from the
    legacy 'policy' tools that point at the same path — these expose
    the renewal-policy domain explicitly with full alert_channels +
    alert_severity_map field shape.

  Phase D — Network Scan Targets (P1-14..P1-19, 6 tools)
    CRUD + trigger_scan. trigger_network_scan returns the discovery-
    scan body so the AI can chain into list_discovered_certificates
    filtered by agent_id.

  Phase E — Discovery read-side (P1-10..P1-13, 4 tools)
    list_discovered_certificates, get_discovered_certificate,
    list_discovery_scans, discovery_summary. Complements the
    pre-existing claim/dismiss tools (registered alongside Health
    historically per the I-2 closure).

  Phase F — Intermediate CAs (P1-6..P1-9, 4 tools)
    list, create (root + child via discriminator on body shape), get,
    retire. The handler is admin-gated via middleware.IsAdmin; the
    least-privilege boundary is enforced at the API layer (HTTP 403
    for non-admin Bearer callers) — not by transport carve-out.

  Phase G — Verification + deployments (P1-32, P1-34, P1-35, 3 tools)
    list_certificate_deployments, verify_job, get_job_verification.
    P1-33 (POST /api/v1/agents/{id}/discoveries) is intentionally
    excluded — machine-to-machine push channel for agents reporting
    filesystem-scan results, not an operator-driven flow. Documented
    inline in the RegisterTools dispatch.

Implementation:
  - 14 new input types in internal/mcp/types.go with jsonschema struct
    tags driving LLM tool discovery.
  - 7 register* functions in internal/mcp/tools.go each handling one
    phase, wired into RegisterTools dispatch in declaration order.
  - 34 new entries in tools_per_tool_test.go::allHappyPathCases —
    the existing in-process MCP harness (TestMCP_AllTools_HappyPath +
    TestMCP_AllTools_ErrorPath + TestMCP_RegisterTools_DispatchableToolCount)
    auto-extends coverage to cover every new tool: happy-path round-
    trip with fence-shape assertion, 5xx error-path with MCP_ERROR fence
    propagation, and 'every registered tool is dispatchable' guard.
  - docs/reference/mcp.md 'Available Tools' table expanded from 16 to
    22 resource domains with current per-domain tool counts.

Acceptance gate (verified):
  - go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/...
    clean across all four production binaries.
  - go vet ./... clean.
  - go test -short -count=1 ./internal/mcp/... pass (TestMCP_AllTools_*
    expanded to 127 tool round-trips).
  - go test -short -count=1 ./... pass repo-wide.
  - bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178,
    OpenAPI 144, exceptions 36 — unchanged; we add MCP wrappers, not
    routes).
  - gofmt -l clean across the four touched files.
2026-05-05 19:29:57 +00:00
shankar0123 75097909e9 2026-05-05 18:18:29 +00:00
shankar0123 9acf609ac9 docs: convert ASCII flow diagram to Mermaid in test-environment.md
Per operator audit: every diagram in docs/ should be Mermaid except
in the repo-root README.md. The 'Key Generation Flow (Agent-Side)'
section in docs/contributor/test-environment.md was rendered as a
plain code fence with arrow-prose:

  Server creates job (AwaitingCSR) → Agent polls, sees job →
  Agent generates ECDSA P-256 key pair locally → ...

That was the only non-Mermaid diagram-shaped block left in docs/.

Converted to a Mermaid sequenceDiagram with 5 participants
(certctl-server, issuer connector, certctl-agent, local agent FS,
shared volume) covering the full AwaitingCSR → CSR-submit →
Deployment-job → cert-write → Completed lifecycle.

Audit + verification script: cowork/docs-audit-2026-05-05/mermaid-audit.md.
Re-running the detection script post-fix returns zero non-Mermaid
diagram-like blocks across all 76 docs/ markdown files.

Total Mermaid coverage in docs/ now: 14 docs / 40 blocks.
2026-05-05 06:18:24 +00:00
shankar0123 622cd29f20 docs: factuality sweep — fix 3 broken links + 12 count claims (audit findings 2026-05-05)
Per the cowork/docs-audit-2026-05-05/ end-to-end factuality audit
(20 confirmed findings across 76 docs, 7 parallel subagents +
audit-of-the-audit). Hot + Warm tier fixes ship here; STALE
findings (qa-test-suite.md test-count snapshot) need 'make
qa-stats' which is operator-side.

BROKEN links repaired (3):
- docs/reference/api.md L195: [Quick Start](quickstart.md) →
  ../getting-started/quickstart.md (404 pre-fix)
- docs/reference/api.md L196: [Connector Guide](connectors.md) →
  connectors/index.md (Phase 4 rename, was 404 pre-fix)
- docs/reference/protocols/scep-intune.md L377:
  [legacy-est-scep.md](legacy-est-scep.md) → scep-server.md
  (file was deleted in Phase 7 commit e9b1510)

INCORRECT count claims repaired (12):
- api.md L5 + L18-19 + L155: '78 API operations' / '# 78' /
  'all 78 documented operations' → re-derive via
  grep -cE '^\s+operationId:' (actual at HEAD: 144)
- architecture.md L66 (Mermaid label) + L502 + L1047 + L1253:
  '8 always-on + 4 optional loops' / '12-loop topology' →
  9 always-on + 5 opt-in loops (14 total). Always-on/opt-in
  breakdown derived from cmd/server/main.go startup wiring:
  always-on are agentHealthCheck, crlGeneration, jobProcessor,
  jobRetry, jobTimeout, notificationProcess, notificationRetry,
  renewalCheck, shortLivedExpiryCheck (9); opt-in are
  networkScan, digest, healthCheck, cloudDiscovery, acmeGC (5).
  Re-derive count via grep -cE '^func \(s \*Scheduler\)
  [a-zA-Z]+Loop' internal/scheduler/scheduler.go.
- configuration.md L31: '12 loops, 8 always-on + 4 opt-in' →
  '14 loops, 9 always-on + 5 opt-in'. Self-introduced regression
  from commit 3275f9f (2026-05-05).
- mcp.md L11 + L65: 'all 78 API endpoints' / '78 available tools'
  → re-derive via grep -cE 'mcp\.AddTool\(' (actual at HEAD:
  87 MCP tools, 144 API operations).
- connectors/index.md L111: '9 built-in' issuer connectors →
  '12 built-in', extending the inline enumeration to include
  Entrust, GlobalSign, EJBCA (which had been added since the
  L111 prose was written). Local-CA framing extended to mention
  tree mode + ADCS sub-CA mode-doc.
- connectors/index.md L112: '14 built-in' target connectors →
  '15 built-in', adding AWS ACM target + Azure Key Vault target
  (which had been added since the L112 prose was written).
- why-certctl.md L37 + the inline list: 'Nine issuer connectors
  ship today' → 'Twelve issuer connectors', adding
  AWS ACM PCA, Entrust, GlobalSign, EJBCA to the list and
  removing the misleading 'EST enrollment' bullet (EST is a
  protocol surface, not an issuer; clarified in trailing note).
- why-certctl.md L66: '13 deployment targets' → '15', adding
  Kubernetes Secrets, AWS ACM, and Azure KV to the inline list.
- why-certctl.md L92: 'supports 9 issuer types' → '12 issuer
  types'.
- quickstart.md L135: '35 demo certificates across 5 issuers'
  → re-derive cert count via 'grep -oE "mc-[a-z0-9_-]+"
  migrations/seed_demo.sql | sort -u | wc -l' (actual: 32,
  matches README L86; quickstart was off-by-3).
- quickstart.md L452 (Demo Data Reference table): Certificates
  '35' → '32' (matches the cert count from seed_demo.sql).

Verification:
- grep confirms no remaining stale refs across the touched
  files (8 files, 31 insertions / 28 deletions).
- All 24 ci-guards/*.sh pass locally.
- The audit's STALE findings (S-1, S-2 qa-test-suite.md
  Bundle-P snapshot) are operator-side: run 'make qa-stats'
  to refresh the Test Suite Health table.

Companion: cowork/docs-audit-2026-05-05/RESULTS.md captures
the full audit with subagent false positives and missed
findings called out.
2026-05-05 06:15:35 +00:00
shankar0123 d809874fa1 docs: retire compliance subtree + sweep framework name-drops from prose
Per operator decision the framework-mapping docs are gone. They
were aspirational (no audit, no certification, no validated
mapping); keeping them around was misleading.

Files deleted (1,883 lines):
- docs/compliance/index.md
- docs/compliance/soc2.md
- docs/compliance/pci-dss.md
- docs/compliance/nist-sp-800-57.md

Hyperlinks removed:
- README.md: 'Auditor / compliance' row in the doc table; the
  '(compliance mapping included)' parenthetical in the
  positioning paragraph
- docs/README.md: the '## Compliance' section table; the
  'Auditor / compliance team' reading-order-by-role row

Prose name-drops swept across 24 files:
- README.md: 'FedRAMP boundary CAs / financial-services policy
  CAs' → '4-level boundary CAs / 3-level policy CAs';
  'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High,
  SOC 2 Type II, HIPAA' → cut entirely
- getting-started/{quickstart,concepts,examples,why-certctl,
  advanced-demo}.md: 'compliance' → 'audit' / 'policy';
  'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut;
  ''pci': 'true'' tag example → ''environment': 'production''
- migration/cert-manager-coexistence.md: 'compliance rules' →
  'policy rules'
- operator/approval-workflow.md: 'Compliance customers (PCI-DSS
  Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' →
  'Operators'; entire 'Compliance control mapping' table
  (PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1
  / HIPAA §164.308(a)(4)) deleted; 'compliance contract' →
  'two-person-integrity contract'; 'compliance auditors' →
  'reviewers'
- operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5'
  audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5
  attestation' section retitled to 'TLS posture summary' and
  rewritten without framework framing; 'PCI-DSS, NIST, and
  major browsers will eventually deprecate TLS 1.2' →
  'Major browsers and OS vendors will eventually deprecate
  TLS 1.2'
- operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref →
  CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS
  Req 4 v4.0 prose footing → cut
- operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI
  procurement-team deliverable' → 'on-call deliverable';
  'compliance auditors' → 'reviewers'
- reference/connectors/{acme,aws-acm,azure-kv,globalsign,
  local-ca,openssl,ssh,index}.md: 'compliance reporting
  (PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting';
  'Compliance environments (PCI-DSS Level 1, FedRAMP High,
  HIPAA)' → 'Regulated environments'; 'compliance audits' →
  'audit'; 'FedRAMP boundary CA' pattern names →
  '4-level boundary CA' (technically descriptive)
- reference/protocols/est.md: 'compliance-hook seam' →
  'device-state hook seam'; 'compliance gating' → 'device-state
  gating'; 'est_compliance_failed' → 'est_device_state_failed'
- reference/protocols/scep-intune.md: 'Optional compliance
  check' → 'Optional device-state check'; failure-counter
  'compliance_failed' → 'device_state_failed'; 'Conditional
  Access compliance gating' → 'Conditional Access
  device-state gating'
- reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA
  deployments where the regulator requires...' →
  'Boundary-CA deployments where you want separation of policy
  and issuing authorities'; pattern A retitled '4-level FedRAMP
  boundary CA' → '4-level boundary CA'
- reference/architecture.md: broken Related-docs link to
  compliance.md removed; the rest of that block had stale
  pre-Phase-2 paths (quickstart.md, demo-advanced.md,
  connectors.md, openapi.md, testing-guide.md, test-env.md) —
  retargeted to current locations
- reference/deployment-model.md: 'SOC 2 evidence-report
  generator' → 'Audit-evidence report generator'
- reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this
  into evidence packs' → 'reviewers paste this into
  vendor-evaluation packs'
- contributor/qa-test-suite.md: 'compliance exist' coverage
  description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)'
  risk-class label → 'Audit-relevant'

What was kept:
- CWE references (legitimate technical pointers)
- Microsoft API/feature names that happen to use 'compliance'
  literally ('Microsoft Graph compliance API',
  'device-compliance validators' — these are MS product names,
  not framework name-drops)
- 'NIST PQC' on the landing page (Post-Quantum Cryptography is
  the actual NIST standard family, not a compliance framework)

Verified: zero hyperlinks into docs/compliance/ remain. All 24
ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean.
Net diff: 26 files / -1,883 deletions in compliance/ + -32 net
across the prose sweep.

Companion edits in cowork/ (CLAUDE.md doc-tree summary +
WORKSPACE-CHANGELOG.md retirement note) land separately.
2026-05-05 05:26:44 +00:00
shankar0123 3275f9f1e0 ci: post-Phase-2-docs-overhaul cleanup of stale guards + missing config doc
CI run on the ecb8896 push surfaced two real failures rooted in the
2026-05-04 docs overhaul:

  1. G-3 env-docs-drift caught two phantom CERTCTL_* env vars I'd
     introduced in the Phase 4 follow-on connector pages
     (CERTCTL_CA_CERT_PATH_NEW in adcs.md was a placeholder I made
     up; CERTCTL_EJBCA_POLL_MAX_WAIT_SECONDS in ejbca.md does not
     exist in source). Both removed.

  2. QA-doc Part-count drift guard tried to grep
     docs/qa-test-guide.md and docs/testing-guide.md, both of which
     were renamed/deleted in Phase 2/Phase 5. The Part-count drift
     class died with testing-guide.md (Phase 5 prune dispersed its
     content); the seed-count drift class is still live but pointed
     at the wrong path.

Fixes:

- Removed the QA-doc Part-count drift guard from ci.yml (premise
  dead) plus its standalone scripts/qa-doc-part-count.sh peer.
- Retargeted the QA-doc seed-count drift guard from
  docs/qa-test-guide.md → docs/contributor/qa-test-suite.md (the
  Phase 2 target). Updated both ci.yml inline copy and
  scripts/qa-doc-seed-count.sh.
- Updated Makefile qa-stats: target to drop the testing-guide.md
  Parts metric (file is gone).
- Updated Makefile verify-docs: target to drop the part-count step.

G-3 was also failing in the second direction (env vars defined in
config.go but never documented anywhere). 16 vars surfaced —
features.md (deleted Phase 6) and testing-guide.md (deleted Phase 5)
had been their canonical home. Created
docs/reference/configuration.md as the new home: a compact
operator-facing env-var reference covering scheduler intervals, job
lifecycle, rate limiting, audit, deploy verify, database,
agent-side, and SCEP profile binding. Added to docs/README.md
Reference table.

Doc-side updates to qa-test-suite.md to reframe its references to
the deleted testing-guide.md (it's now self-contained: the
Part-by-Part Coverage Map IS the canonical Part inventory).

Cosmetic comment-only updates in ci.yml + scripts/ci-guards/*.sh +
scripts/dev-setup.sh to point at the new audience-organized doc
paths (docs/operator/security.md, docs/operator/tls.md,
docs/reference/architecture.md, etc.) instead of the pre-Phase-2
flat layout.

Verified: all 24 ci-guards/*.sh pass locally; qa-doc-seed-count.sh
clean. Net diff: 178 additions / 112 deletions across 13 files.
One file deleted (qa-doc-part-count.sh) and one file added
(docs/reference/configuration.md).
2026-05-05 04:56:26 +00:00
shankar0123 ecb8896b1c docs: cleanup pre-existing broken links in connector pages
Phase 4 structural (commit 633e440) moved 6 connector files into the
new docs/reference/connectors/ subdirectory but didn't update all
inter-doc references for the new path layout. Phase 11 caught the
high-traffic ones; this sweep gets the rest, found by the Phase 4
follow-on verification pass.

Mappings applied (relative to docs/reference/connectors/):

  deployment-atomicity.md     → ../deployment-model.md
  deployment-vendor-matrix.md → ../vendor-matrix.md
  architecture.md             → ../architecture.md
  est.md                      → ../protocols/est.md
  scep-intune.md              → ../protocols/scep-intune.md
  async-polling.md            → ../protocols/async-ca-polling.md
  quickstart.md               → ../../getting-started/quickstart.md
  demo-advanced.md            → ../../getting-started/advanced-demo.md
  legacy-est-scep.md          → ../protocols/scep-server.md
  connectors.md               → index.md

Plus prose backtick references (`docs/architecture.md` etc.) updated
to the new subdirectory paths.

Files touched: apache, f5, iis, k8s, nginx, index. 33 line changes.
Full link-check across docs/reference/connectors/*.md is now clean
(0 broken inter-doc references).
2026-05-05 04:10:09 +00:00
shankar0123 f179eab071 docs: expand docs/README.md connectors section to enumerate all 28 deep-dive pages
After the Phase 4 follow-on (commits fd94205de06141082b8cf969853e), the docs/reference/connectors/ tree carries 13 issuer
per-pages + 15 target per-pages alongside the index. Update the
top-level docs navigation to surface them all.

Replaced the previous 5-row connectors table with two
single-paragraph indexes (issuers, targets) listing every per-page
in alphabetical order. The connectors index.md is still the
canonical catalog (interfaces, registry, scanners + inline
reference per built-in); the deep-dive pages cover operator-grade
material on top.

Net: docs/README.md gains coverage of 23 new pages without bloating
the file (two prose paragraphs vs a 28-row table).
2026-05-05 04:08:08 +00:00
shankar0123 969853ee53 docs: Phase 4 follow-on batch 4 — 5 final target per-pages
Extracts the remaining target connectors:

- ssh.md (194 lines) — agentless SSH/SFTP deploy with full
  host-key-acceptance threat model (what's accepted, what's not,
  mitigations including known_hosts enforcement and SSH cert auth);
  V3-Pro forward path
- wincertstore.md (118 lines) — non-IIS Windows services via local
  PowerShell or WinRM proxy mode; store selection (My / Root /
  WebHosting); private-key permissions guidance
- jks.md (189 lines) — JKS / PKCS#12 via keytool with full atomic
  snapshot+rollback contract (Bundle 8 'snapshot → delete → import →
  reload'), keytool argv password exposure threat model + mitigations
- aws-acm.md (208 lines) — ACM target with full IAM policy, IRSA /
  instance-profile / SSO auth recipes, atomic-rollback contract,
  ALB attachment Terraform recipe, procurement-checklist crib
- azure-kv.md (195 lines) — Key Vault target with managed-identity /
  workload-identity / service-principal auth recipes, version-
  semantics rollback caveat (no in-place restore without soft-delete),
  App Gateway / Front Door attachment recipe

Index forward-list expanded to enumerate all 15 target connectors
(5 from Phase 4 structural + 5 from batch 3 + 5 from this batch) in
alphabetical order.

This is part 4 of 4 for the Phase 4 follow-on (per-connector page
extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md.

Net add: 5 files, 904 lines. No content removed from index.md.

End-state of Phase 4 follow-on:
- 13 issuer per-pages (5 batch 1 + 8 batch 2)
- 15 target per-pages (5 Phase 4 structural + 5 batch 3 + 5 batch 4)
- index.md keeps its inline reference content; per-pages add
  operator depth on top, matching the pattern set by
  apache/f5/iis/k8s/nginx in Phase 4 structural
2026-05-05 04:07:21 +00:00
shankar0123 082b8cf660 docs: Phase 4 follow-on batch 3 — 5 file-based target per-pages
Extracts the file-based deploy target connectors:

- haproxy.md (107 lines) — combined-PEM (cert+chain+key) deploy with
  haproxy -c validate; multi-frontend + crt-list directory guidance
- traefik.md (105 lines) — file-provider zero-reload deploy; file
  watcher latency notes; mixing with built-in ACME guidance
- caddy.md (100 lines) — admin API mode (recommended) vs file mode;
  admin-API exposure threat model
- envoy.md (112 lines) — file SDS mode (recommended) vs static
  bootstrap; service-mesh interactions
- postfix.md (175 lines) — dual-mode (Postfix MTA / Dovecot IMAPS)
  connector with daemon-specific quirks (STARTTLS chain expectations,
  no shared session cache); Bundle 11 test pins

Index forward-list expanded to enumerate all 10 target connectors
(5 from Phase 4 structural + 5 from this batch) in alphabetical
order.

This is part 3 of 4 for the Phase 4 follow-on (per-connector page
extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md.

Net add: 5 files, 599 lines. No content removed from index.md.
2026-05-05 04:02:25 +00:00
shankar0123 de06141ce5 docs: Phase 4 follow-on batch 2 — 8 remaining issuer per-pages
Extracts the rest of the issuer per-connector deep-dive pages:

- local-ca.md (170 lines) — Local CA self-signed / sub-CA / tree mode,
  CRL+OCSP endpoints, EKU support, MaxTTL enforcement, L-014 file-on-
  disk threat model carve-out
- acme.md (235 lines) — RFC 8555 v2 client (HTTP-01 / DNS-01 /
  DNS-PERSIST-01), ARI per RFC 9773, EAB + ZeroSSL auto-EAB,
  Let's Encrypt profile selection, revoke-by-serial Top-10 fix #7
- step-ca.md (99 lines) — Smallstep JWK-provisioner synchronous
  issuance with MaxTTL enforcement
- openssl.md (157 lines) — script-based shell-out with full
  threat model (what's accepted, what's not, mitigations, V3-Pro
  forward path)
- sectigo.md (98 lines) — Sectigo SCM REST with bounded async polling
- google-cas.md (89 lines) — GCP managed private CA with OAuth2
  service-account auth + IAM-role guidance
- entrust.md (96 lines) — Entrust CA Gateway mTLS-authenticated with
  approval-pending support and mTLS keypair caching
- globalsign.md (122 lines) — Atlas HVCA dual auth (mTLS + API
  key/secret), region-aware base URLs, mTLS keypair caching

Index forward-list expanded to enumerate all 13 issuer connectors
(including the 5 pages from batch 1) in alphabetical order.

This is part 2 of 4 for the Phase 4 follow-on (per-connector page
extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md.

Net add: 8 files, 1,066 lines. No content removed from index.md.
2026-05-05 03:59:35 +00:00
shankar0123 fd94205cfa docs: Phase 4 follow-on batch 1 — 5 issuer per-pages
Extract the first 5 issuer per-connector deep-dive pages:

- vault.md (128 lines) — Vault PKI synchronous issuance, token TTL +
  auto-renewal loop, MaxTTL enforcement, rotation playbook
- digicert.md (106 lines) — CertCentral DV/OV/EV with bounded async
  polling for vetting workflows
- aws-acm-pca.md (165 lines) — managed private CA on AWS with full
  IAM policy, IRSA wiring, troubleshooting matrix
- ejbca.md (116 lines) — open-source / Keyfactor EJBCA with mTLS or
  OAuth2 auth, mTLS keypair caching, approval-pending guidance
- adcs.md (111 lines) — Active Directory Certificate Services as
  enterprise root via Local CA sub-CA mode, sub-CA rotation playbook

Index updated with forward-list entries and the index-purpose blurb
revised so the index now positions itself as 'navigate from here;
deeper material lives in siblings' rather than 'docs to be extracted
later'.

Each per-page follows the WHAT/HOW/WHY pattern: what the connector is,
how authentication and issuance work, and when to choose this vs an
alternative. Cross-links to the connector index, async-ca-polling
primitive, and adjacent operator runbooks.

This is part 1 of 4 for the Phase 4 follow-on (per-connector page
extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md.

Net add: 5 files, 626 lines. No content removed from index.md (the
index keeps its inline reference; per-pages add operator depth on
top, matching the pattern set by apache/f5/iis/k8s/nginx in Phase 4
structural).
2026-05-05 03:53:52 +00:00
shankar0123 b452013dd9 docs: Phase 5 — testing-guide.md prune (8268 → 0 lines, content dispersed)
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/
and the section-by-section plan in testing-guide-tumor.md.

testing-guide.md was 30% of all docs/ content (8268 lines) but was
integration test code written in markdown, not operator documentation.
The audit's tumor analysis disposed of every Part:
  - ~65% DELETE (test cases that already exist in code)
  - ~22% MOVE to inline test code
  - ~8% KEEP-COMPRESSED into focused operator-runbook docs
  - Title + contents + release sign-off ~5% KEEP

This commit ships the KEEP-COMPRESSED dispersal:

  docs/contributor/qa-prerequisites.md (NEW, ~120 lines):
    From testing-guide.md "Prerequisites" section. Stack boot procedure,
    demo data baseline, reference IDs operators reuse across QA docs.

  docs/contributor/gui-qa-checklist.md (NEW, ~105 lines):
    From testing-guide.md "Part 35: GUI Testing". Manual GUI verification
    pass for release sign-off. 25-row table covering every dashboard page.

  docs/contributor/release-sign-off.md (NEW, ~130 lines):
    From testing-guide.md "Release Sign-Off" section (originally 1009
    lines of per-test detail tables). Compressed to a release-day
    checklist organized by gate category: code state, automated gates,
    manual QA passes, release artefact verification, branch protection,
    post-release.

  docs/operator/performance-baselines.md (NEW, ~100 lines):
    From testing-guide.md "Part 39: Performance Spot Checks". Four
    operator-runnable benchmarks (API request handling, inventory list
    pagination, scheduler tick, bulk revoke) with baseline numbers and
    when-to-re-baseline guidance.

  docs/operator/helm-deployment.md (NEW, ~120 lines):
    From testing-guide.md "Part 52: Helm Chart Deployment". Operator
    runbook for the bundled deploy/helm/certctl/ chart: prereqs,
    install, four cert-source patterns, verify, upgrade, troubleshooting.

  docs/reference/cli.md (NEW, ~120 lines):
    From testing-guide.md "Part 28: CLI Tool". certctl-cli command
    reference with command-group breakdown, common workflows
    (list/filter, renew, revoke, bulk import, EST enrollment, status),
    output formats, CI/CD integration patterns.

docs/README.md navigation index updated to include the 6 new docs:
  Reference section gains: cli.md, release-verification.md (was added
    in Phase 13)
  Operator section gains: helm-deployment.md, performance-baselines.md
  Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md,
    release-sign-off.md

docs/testing-guide.md deleted. Git history preserves the 8268 lines —
if any specific test case is found missing from inline test code or
the destination docs during future work, lift from `git show
HEAD~1:docs/testing-guide.md`.

Net: docs/ total line count drops by ~7700 lines (28%), from 26,369
to 18,742. testing-guide.md was the single largest doc; pruning it is
the single biggest content-edit win of the entire restructure.

Phase 5 is the last major content phase. Remaining: Phase 4 follow-on
(per-connector page extractions from reference/connectors/index.md),
Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).
2026-05-05 03:38:54 +00:00
shankar0123 fd4eb3b165 docs: Phase 11 follow-on — fix remaining anchor + cross-dir links
Final cleanup pass after the previous Phase 11 commits. Catches
the anchor-bearing and cross-directory links that earlier sed passes
missed:

  docs/reference/protocols/acme-server.md (3 fixes):
    (./tls.md) → (../../operator/tls.md)
    (./architecture.md) → (../architecture.md)
    (./architecture.md#agents) → (../architecture.md#agents)

  docs/migration/from-certbot.md (1 fix):
    (./quickstart.md#network-discovery-agentless)
    → (../getting-started/quickstart.md#network-discovery-agentless)

  docs/migration/cert-manager-coexistence.md (1 fix):
    (./architecture.md#agents) → (../reference/architecture.md#agents)

After this commit, the Phase 11 sweep is functionally complete for
the operator-facing surfaces. Remaining valid sibling links
(`(./<name>.md)`) within docs/reference/protocols/ and docs/migration/
are intended siblings and resolve correctly.

The remaining open Phase 11 items are:
  - testing-strategy.md → testing-guide.md link, still valid because
    testing-guide.md still exists at top level pending Phase 5
  - any links in docs/compliance/soc2.md and docs/compliance/nist-sp-800-57.md
    if they reference moved docs (low traffic; revisit if Phase 4
    follow-on or Phase 5 work surfaces them)
2026-05-05 03:32:09 +00:00
shankar0123 a364cd6990 docs: Phase 11 follow-on — fix anchor-bearing + remaining inter-doc links
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Sweeps the anchor-bearing inter-doc links that the previous Phase 11
sed pass missed (anchors after .md# weren't matched), plus a few
remaining cross-refs in docs/reference/.

Per source file:

  docs/migration/acme-from-caddy.md (1 anchor link):
    (./acme-server.md#certificate-readyfalse-with-rejectedidentifier)
    → (../reference/protocols/acme-server.md#certificate-readyfalse-...)

  docs/migration/acme-from-cert-manager.md (3 anchor links):
    Same shape; all (./acme-server.md#...) → (../reference/protocols/acme-server.md#...)

  docs/reference/connectors/index.md (5 walkthrough + reference links):
    (./acme-server.md) → (../protocols/acme-server.md)
    (./acme-server-threat-model.md) → (../protocols/acme-server-threat-model.md)
    (./acme-cert-manager-walkthrough.md) → (../../migration/acme-from-cert-manager.md)
    (./acme-caddy-walkthrough.md) → (../../migration/acme-from-caddy.md)
    (./acme-traefik-walkthrough.md) → (../../migration/acme-from-traefik.md)

  docs/reference/protocols/acme-server.md (3 walkthrough links):
    (./acme-cert-manager-walkthrough.md) → (../../migration/acme-from-cert-manager.md)
    (./acme-caddy-walkthrough.md) → (../../migration/acme-from-caddy.md)
    (./acme-traefik-walkthrough.md) → (../../migration/acme-from-traefik.md)

  docs/reference/protocols/acme-server-threat-model.md (1 cross-dir):
    (./tls.md) → (../../operator/tls.md)

After this commit, every grep for old-style `./<old-doc-name>.md` links
returns clean across docs/migration/, docs/reference/, and
docs/operator/.
2026-05-05 03:31:47 +00:00
shankar0123 12d7b1f51d docs: Phase 11 follow-on — fix inter-doc cross-references in deeper subdirs
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Continuation of Phase 11 (commit dca1900 handled README + first round
of docs/ links). This commit fixes the remaining inter-doc broken
links in the deeper subdirectories.

Per source directory:

  docs/getting-started/quickstart.md (1 fix):
    (connectors.md) → (../reference/connectors/index.md)

  docs/contributor/test-environment.md (2 fixes):
    (tls.md) → (../operator/tls.md)
    (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md)

  docs/contributor/testing-strategy.md (4 fixes):
    `docs/security.md` → `docs/operator/security.md`
    (security.md) → (../operator/security.md)
    `docs/testing-guide.md` (kept; testing-guide.md still at top level
      pending Phase 5 prune)
    (testing-guide.md) → (../testing-guide.md)

  docs/migration/acme-from-traefik.md (2 sites, multi-link):
    (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md)
    (./acme-server.md) → (../reference/protocols/acme-server.md)

  docs/migration/cert-manager-coexistence.md (1 fix):
    (./quickstart.md) → (../getting-started/quickstart.md)

  docs/migration/from-acmesh.md (2 fixes):
    (connectors.md) → (../reference/connectors/index.md)
    (./examples.md) → (../getting-started/examples.md)

  docs/migration/acme-from-caddy.md (multi-link):
    (./acme-cert-manager-walkthrough.md) → (./acme-from-cert-manager.md)
    (./acme-server.md) → (../reference/protocols/acme-server.md)

  docs/migration/acme-from-cert-manager.md (multi-link):
    (./acme-server.md) → (../reference/protocols/acme-server.md)
    (./acme-server-threat-model.md) → (../reference/protocols/acme-server-threat-model.md)
    (./acme-caddy-walkthrough.md) → (./acme-from-caddy.md)
    (./acme-traefik-walkthrough.md) → (./acme-from-traefik.md)

  docs/migration/from-certbot.md (2 fixes):
    (./concepts.md) → (../getting-started/concepts.md)
    (./examples.md) → (../getting-started/examples.md)

  docs/operator/tls.md (3 sites):
    (upgrade-to-tls.md) → (../archive/upgrades/to-tls-v2.2.md)
    (quickstart.md) → (../getting-started/quickstart.md)
    (test-env.md) → (../contributor/test-environment.md)

  docs/operator/runbooks/disaster-recovery.md (5 fixes):
    (crl-ocsp.md) → (../../reference/protocols/crl-ocsp.md)
    (tls.md) → (../../operator/tls.md)
    (security.md) → (../../operator/security.md)
    (scep-intune.md) → (../../reference/protocols/scep-intune.md)
    (est.md) → (../../reference/protocols/est.md)

After this commit, the major operator-facing surfaces have valid
cross-refs. Some lower-traffic docs (compliance/soc2.md, compliance/
nist-sp-800-57.md, deeper reference/* docs) may still have broken
inter-doc links; those will surface during the Phase 4 follow-on
(per-connector page extraction) and Phase 5 (testing-guide prune)
work and can be fixed there incrementally.
2026-05-05 03:31:05 +00:00
shankar0123 19c8fafe84 docs: Phase 14 — Last reviewed line sweep across docs/
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Adds a `> Last reviewed: 2026-05-05` line right after the H1 heading
of every doc that didn't already have one (41 files).

This dates the freshness clock for the future Phase 4 per-doc review.
The discipline going forward: when a doc's content gets a meaningful
edit, bump the date. When the date gets old (e.g., >6 months), the
doc earns a freshness-review pass.

Mechanical insertion via awk one-liner, applied to every docs/*.md
that didn't already match `grep -q 'Last reviewed:'`. Files that
already carried the line from earlier Phase 2 work (the navigation
index, the new connector docs, the new SCEP server / legacy-clients-
TLS-1.2 / release-verification docs, and the 5 per-connector deep
dives) were skipped to avoid duplicate insertion.

Net: every doc in docs/ now has a Last reviewed line.
2026-05-05 03:26:46 +00:00
shankar0123 426760d737 docs: Phase 13 — README rewrite to 250-line target
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
README went from 457 lines to a target of 250 (operator decision in
Phase 1 conversation). Focus shifts from feature-catalog + landing-page
duplicate to "developer cloning the repo needs orientation + quickstart
+ entry points to docs."

What stayed:
  - Logo + title + badges (~15 lines)
  - Elevator paragraph + 47-day cliff context (3 paragraphs, compressed)
  - Active-maintenance callout
  - Documentation table — restructured from 22 entries linking to flat
    docs/ to ~6 audience-organized rows linking through the new
    docs/README.md navigation index
  - Screenshots grid (4 tiles)
  - "What it does" — compressed from 33 lines of prose to 8 capability
    bullets, each linking to the canonical doc
  - Architecture paragraph — compressed to one paragraph linking to
    docs/reference/architecture.md
  - Quick Start (Docker Compose, Agent install, Helm, container images)
  - Examples table (5 turnkey scenarios)
  - Development commands
  - License paragraph
  - Dependencies block
  - Footer CTA

What got moved out:
  - Cosign verification / SLSA / SBOM section (67 lines) →
    docs/reference/release-verification.md (NEW). README links to it
    in a 3-line "Verifying a release" section.

What got removed entirely:
  - "Why certctl" + "Architecture" + "Security-first" + "Key design
    decisions" prose walls — duplicated landing page + architecture.md +
    security.md content. README no longer wades through 11 dense
    paragraphs.
  - "Supported Integrations" 4 sub-tables (Issuers / Targets / Protocols
    / Standards / Notifiers, ~80 lines of dense per-row marketing
    copy) — content lives at docs/reference/connectors/index.md and
    docs/reference/protocols/. README mentions counts ("12 issuers, 15
    targets, 6 notifiers") with a single link.
  - "Roadmap" section entirely — V1 + V2 history rotted fastest of any
    section; replaced with implicit "see Releases + Issues for active
    work" via the existing footer CTA.
  - "What It Does" 10-subsection wall (33 lines) — replaced with the
    8-bullet capability list, each linking to its canonical doc.
  - CLI section (20 lines of inline command examples) — links to the
    contributor docs.
  - MCP Server section (30 lines of setup) — links to docs/reference/mcp.md.

New surface added:
  - docs/reference/release-verification.md — moved cosign/SLSA/SBOM
    procedure with one expanded "Why this matters" paragraph
    explaining the keyless OIDC trust anchor.

Every docs/ link in the new README verified to resolve to an existing
file. Cross-references from other docs / certctl.io to the deleted
sections (if any) need follow-up Phase 11 sweeps.
2026-05-05 03:26:05 +00:00
shankar0123 affaa11d14 docs: Phase 12 — populate docs/README.md navigation index
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
The placeholder from Phase 1 (commit cda957f) gets replaced with the
audience-organized navigation index operators use to find what they
need.

Structure follows the recommended Phase 2 directory tree:

  - Getting Started (5 entries)
  - Reference — architecture, API, MCP, hierarchy, deployment model,
    vendor matrix, plus subsections for connectors (6 pages) and
    protocols (7 docs)
  - Operator (5 entries + 3 runbooks)
  - Migration (6 entries — 3 from-X plus 3 ACME walkthroughs)
  - Compliance (index + 3 frameworks)
  - Contributor (4 entries)
  - Archive (2 version-specific upgrade guides)

Every link verified to resolve to an existing file. Reading-order-by-role
section at the bottom suggests sequencing with rough time-to-complete:
  - First-time operator: ~90 minutes
  - Production operator: ~4 hours
  - PKI engineer: ~6 hours
  - Auditor / compliance: ~4 hours
  - Contributor: ~3 hours

Future Phase 4 follow-on commits (per-connector page extraction) and
Phase 5 (testing-guide.md prune) will add new entries to this index
as their destination docs land.
2026-05-05 03:21:53 +00:00
shankar0123 dca1900815 docs: Phase 11 (partial) — fix cross-references after Phase 2 moves
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Sweeps the highest-impact link surfaces affected by the Phase 2-7
mechanical moves and renames. Covers README.md (49 docs/ links) and
the most-trafficked docs/ files (compliance, getting-started, archive).

README.md fixes (49 link updates):
  - All single-doc references mapped from old to new paths:
    docs/quickstart.md → docs/getting-started/quickstart.md
    docs/architecture.md → docs/reference/architecture.md
    docs/connectors.md → docs/reference/connectors/index.md
    docs/acme-server.md → docs/reference/protocols/acme-server.md
    docs/{soc2,pci-dss,nist}.md → docs/compliance/{soc2,pci-dss,nist-sp-800-57}.md
    ... (full mapping in the sed pipeline)
  - 3 references to deleted features.md replaced with pointers to
    architecture.md + connectors/index.md.

docs/compliance/index.md (3 sibling renames):
  compliance-soc2.md     → soc2.md
  compliance-pci-dss.md  → pci-dss.md
  compliance-nist.md     → nist-sp-800-57.md

docs/compliance/pci-dss.md (3 external refs need ../):
  architecture.md  → ../reference/architecture.md
  connectors.md    → ../reference/connectors/index.md
  quickstart.md    → ../getting-started/quickstart.md

docs/getting-started/concepts.md (4 external refs):
  crl-ocsp.md      → ../reference/protocols/crl-ocsp.md
  architecture.md  → ../reference/architecture.md
  mcp.md           → ../reference/mcp.md
  openapi.md       → ../reference/api.md

docs/getting-started/quickstart.md (4 external refs + 1 sibling):
  tls.md           → ../operator/tls.md
  upgrade-to-tls.md → ../archive/upgrades/to-tls-v2.2.md
  architecture.md  → ../reference/architecture.md
  demo-advanced.md → advanced-demo.md (sibling rename)

docs/getting-started/examples.md (4 external refs):
  migrate-from-certbot.md         → ../migration/from-certbot.md
  migrate-from-acmesh.md          → ../migration/from-acmesh.md
  certctl-for-cert-manager-users.md → ../migration/cert-manager-coexistence.md
  connectors.md                   → ../reference/connectors/index.md

docs/archive/upgrades/to-tls-v2.2.md (3 external refs need ../../):
  tls.md           → ../../operator/tls.md
  quickstart.md    → ../../getting-started/quickstart.md
  test-env.md      → ../../contributor/test-environment.md

docs/archive/upgrades/to-v2-jwt-removal.md (2 external refs need ../../):
  architecture.md  → ../../reference/architecture.md
  tls.md           → ../../operator/tls.md

Verified all README.md docs/ links resolve to existing files. The only
remaining top-level link is testing-guide.md which still exists at the
top of docs/ (Phase 5 will prune it later).

Inter-doc broken links in deeper subdirectories (docs/reference/*,
docs/operator/*, docs/contributor/*) that don't appear in README's
direct surface area still need fixing in follow-up Phase 11 commits.
This commit handles the operator-facing entry points.
2026-05-05 03:19:21 +00:00
shankar0123 633e440787 docs: Phase 4 (structural) — move connectors.md + 5 deep dives into reference/connectors/
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Phase 4 in the audit recommended a full split of connectors.md (2055
lines) into an index + 27 per-connector pages (12 issuer + 15 target).
This commit lands the structural half of that work; full per-target
page extraction is deferred to follow-up commits.

Renames (all blame-preserving):
  docs/connectors.md         → docs/reference/connectors/index.md
  docs/connector-apache.md   → docs/reference/connectors/apache.md
  docs/connector-f5.md       → docs/reference/connectors/f5.md
  docs/connector-iis.md      → docs/reference/connectors/iis.md
  docs/connector-k8s.md      → docs/reference/connectors/k8s.md
  docs/connector-nginx.md    → docs/reference/connectors/nginx.md

Edits:
  - docs/reference/connectors/index.md gets a top-of-doc note
    explaining the per-connector deep-dive sibling pattern + a forward
    list of the 5 per-target pages.
  - The 5 per-connector deep-dive pages each get a `Last reviewed:
    2026-05-05` header + a back-link to the index.

Deferred to future commits (Phase 4b/c follow-on):
  - Extracting the 12 issuer sections from index.md into per-issuer
    pages at reference/connectors/{acme,awsacmpca,digicert,ejbca,
    entrust,globalsign,googlecas,local,openssl,sectigo,stepca,vault}.md
  - Extracting the 10 remaining target sections from index.md into
    per-target pages at reference/connectors/{caddy,traefik,envoy,
    haproxy,postfix-dovecot,ssh,javakeystore,wincertstore,awsacm,
    azurekv}.md

The pragmatic split makes this Phase 4 work incrementally landable —
each per-connector extraction is a small follow-up commit that doesn't
change the docs/ tree shape further. Cross-references from README.md
and other docs to docs/connectors.md still need fixing in Phase 11.
2026-05-05 03:14:39 +00:00
shankar0123 cee008207b docs: delete features.md (Phase 6 disperse, content already in canonical docs)
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
features.md was a 1606-line feature catalog with ~80% overlap with
canonical docs already in the tree:

  - "API Surface" section (rate limiting, CORS, body size limits)
    → docs/operator/security.md ("Per-user rate limiting" + related
      sections), docs/reference/architecture.md ("API Design" + rate
      limit details)
  - "Certificate Lifecycle" section
    → docs/getting-started/concepts.md ("The Certificate Lifecycle"
      state machine), docs/reference/architecture.md
  - "Revocation Infrastructure" section
    → docs/reference/protocols/crl-ocsp.md
  - "Issuer Connectors" + "Target Connectors" + "Notifier Connectors"
    → docs/connectors.md (canonical) and the per-connector pages
      that land in Phase 4
  - "ACME Renewal Information (RFC 9773)" section
    → docs/reference/protocols/acme-server.md
  - "Discovery" section
    → docs/getting-started/concepts.md, docs/reference/architecture.md
  - "Observability" section
    → docs/operator/security.md, docs/reference/architecture.md
  - "Job System" + "Background Scheduler"
    → docs/reference/architecture.md
  - "Web Dashboard"
    → docs/getting-started/concepts.md
  - "CLI" section
    → docs/reference/cli.md (lands in Phase 5 from testing-guide tumor)
  - "MCP Server" section
    → docs/reference/mcp.md
  - "Agent" section
    → docs/reference/architecture.md, docs/getting-started/concepts.md
  - "Deployment" section
    → docs/reference/deployment-model.md
  - "Database Schema" section
    → docs/reference/architecture.md
  - "Security" section
    → docs/operator/security.md
  - "CI/CD" section
    → docs/contributor/ci-pipeline.md
  - "Test Suite" section
    → docs/contributor/testing-strategy.md
  - "Examples" section
    → docs/getting-started/examples.md
  - "Compliance Mapping" section
    → docs/compliance/index.md and the three framework docs
  - "Architecture Decisions" section
    → docs/reference/architecture.md

The catalog format failed both beginners (overwhelming wall of text)
and experts (grep on source is faster than reading 1606 lines of
prose). Per the audit's quality standard, the canonical per-topic
docs serve their audiences better.

Git history preserves features.md content. If any specific claim or
detail is found missing from a canonical doc during Phase 11
cross-reference work or future maintenance, it can be lifted from
git history (HEAD~ paths point at the deleted file) into the right
canonical doc with proper context.

Cross-references from README.md and other docs to docs/features.md
still need fixing in Phase 11.
2026-05-05 03:09:48 +00:00
shankar0123 e9b15108d9 docs: split legacy-est-scep.md into two purpose-aligned docs
The 519-line legacy-est-scep.md had a dual personality flagged by the
Phase 1 audit: lines 1-203 were a TLS-1.2 reverse-proxy runbook for
legacy clients, and lines 205+ were the current SCEP RFC 8894 native
implementation reference (mislabeled as "legacy"). Two separate audiences,
two separate purposes.

Split:

  Lines 1-203 (TLS-1.2 reverse-proxy runbook):
    → docs/operator/legacy-clients-tls-1.2.md (NEW)

    Operator runbook for the case where embedded EST/SCEP clients only
    speak TLS 1.2. Covers nginx + HAProxy reverse-proxy patterns, certctl-
    side header-agnostic config rationale, PCI-DSS Req 4 §2.2.5 attestation,
    deprecation timeline. Also got a fresh "What this is" framing.

  Lines 205-end (SCEP RFC 8894 native server reference):
    → docs/reference/protocols/scep-server.md (NEW)

    Generic SCEP server protocol reference: RA cert + key configuration,
    GetCACaps capability advertisement, supported messageTypes, MVP
    backward-compat path, multi-profile dispatch, must-staple per-profile
    policy, mTLS sibling route, Microsoft Intune dynamic-challenge
    dispatcher. Cross-links to scep-intune.md for Intune-specific
    deployment guidance.

Both new docs carry a `Last reviewed: 2026-05-05` line. Internal links
within each new doc updated to the new sibling paths. Cross-references
from other docs to legacy-est-scep.md still need fixing in Phase 11.

Original docs/legacy-est-scep.md deleted (git history preserves).
2026-05-05 02:55:45 +00:00
shankar0123 f157c18368 docs: re-home ACME client walkthroughs under docs/migration/
The three ACME client walkthroughs (Caddy, cert-manager, Traefik) are
conceptually "I have an existing X, here's how to point its ACME
client at certctl." They belong with the migration docs, not with the
acme-server protocol reference.

Renames:
  docs/acme-caddy-walkthrough.md       → docs/migration/acme-from-caddy.md
  docs/acme-cert-manager-walkthrough.md → docs/migration/acme-from-cert-manager.md
  docs/acme-traefik-walkthrough.md     → docs/migration/acme-from-traefik.md

Each walkthrough's lede gets a "Use this walkthrough when..." paragraph
that closes the WHY-weak gap flagged in the Phase 1 audit. The new
framing tells the reader when to pick this walkthrough versus the
alternatives:

  - Caddy: "you're running Caddy 2.7+ and want it to ACME-issue from
    certctl instead of Let's Encrypt"
  - cert-manager: explicit pointer to cert-manager-coexistence.md for
    the keep-cert-manager-running case (vs replacement)
  - Traefik: "you're running Traefik 3.0+ and want certctl as your
    ACME source of truth"

Cross-reference updates from other docs and README still pending in
Phase 11.
2026-05-05 02:51:10 +00:00
shankar0123 b21c02a3d5 docs: archive version-specific upgrade guides
upgrade-to-tls.md and upgrade-to-v2-jwt-removal.md are version-specific
runbooks for past releases. Late upgraders still need them; current
operators don't. Move both to docs/archive/upgrades/ with one-line
archive headers pointing readers at the current canonical docs.

Renames:
  docs/upgrade-to-tls.md           → docs/archive/upgrades/to-tls-v2.2.md
  docs/upgrade-to-v2-jwt-removal.md → docs/archive/upgrades/to-v2-jwt-removal.md

Each gets a top-of-doc archive notice with the date and a forward
pointer to the relevant steady-state doc:
  to-tls-v2.2.md            → docs/operator/tls.md
  to-v2-jwt-removal.md      → docs/operator/security.md

The relative link inside to-v2-jwt-removal.md (was "upgrade-to-tls.md",
now "to-tls-v2.2.md") updated to point at its archived sibling.

Cross-reference updates from other docs and README still pending in
Phase 11.
2026-05-05 02:50:14 +00:00
shankar0123 3a807ae37e docs: Phase 2 mechanical file moves to subdirectory structure
Pure git mv operations; no content edits. Internal links remain pointing
at old paths and will be fixed in Phase 11. Per the Phase 1 audit
recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.

35 files moved across 8 audience-organized subdirectories:

  docs/getting-started/ (5):
    quickstart.md, concepts.md, examples.md, advanced-demo.md (was
    demo-advanced.md), why-certctl.md

  docs/reference/ (6):
    architecture.md, api.md (was openapi.md), mcp.md,
    intermediate-ca-hierarchy.md, deployment-model.md (was
    deployment-atomicity.md), vendor-matrix.md (was
    deployment-vendor-matrix.md)

  docs/reference/protocols/ (6):
    acme-server.md, acme-server-threat-model.md, scep-intune.md,
    est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)

  docs/operator/ (4):
    security.md, tls.md, database-tls.md, approval-workflow.md

  docs/operator/runbooks/ (3):
    cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md
    (was runbook-expiry-alerts.md), disaster-recovery.md

  docs/migration/ (3):
    from-certbot.md (was migrate-from-certbot.md), from-acmesh.md
    (was migrate-from-acmesh.md), cert-manager-coexistence.md (was
    certctl-for-cert-manager-users.md)

  docs/compliance/ (4):
    index.md (was compliance.md), soc2.md (was compliance-soc2.md),
    pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was
    compliance-nist.md)

  docs/contributor/ (4):
    testing-strategy.md, test-environment.md (was test-env.md),
    ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)

Deferred to later Phase 2 sub-phases:
  - connectors.md split (Phase 4): docs/connectors.md +
    docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
  - testing-guide.md prune (Phase 5): docs/testing-guide.md still
    at top level
  - features.md disperse (Phase 6): docs/features.md still at top
    level
  - legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md
    still at top level
  - ACME walkthrough re-homing (Phase 8): three
    docs/acme-*-walkthrough.md still at top level
  - Upgrade docs archive (Phase 3): two docs/upgrade-*.md still
    at top level

Cross-reference updates (Phase 11) will happen after all moves and
content edits land. Internal links to docs/* paths are temporarily
broken until that phase completes.
2026-05-05 02:49:28 +00:00
shankar0123 cda957f302 docs: Phase 2 prep — placeholder navigation index
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Phase 2 organizes docs/ into eight audience-aligned subdirectories
(getting-started, reference, operator, migration, compliance,
contributor, archive). docs/README.md will be the navigation index
linking into each.

This commit only adds the placeholder. Subdirectories materialize as
Phase 2 file moves land. Index gets populated in Phase 12 once all
moves and content edits are complete.

Audit folder: cowork/docs-overhaul-phase-1-audit-2026-05-04/
Phase 2 prompt: cowork/docs-overhaul-phase-2-restructure-prompt.md
2026-05-05 02:48:49 +00:00
shankar0123 7d48bd0367 docs(intermediate-ca-hierarchy): fix stateDiagram-v2 GitHub render parse error
GitHub's mermaid renderer (older version) doesn't accept <br/> tags
or em-dashes in stateDiagram-v2 transition labels. The conversion
shipped in 85649cf used both, which the GitHub markdown view rejects
with:

  Parse error on line 6: ...ding for<br/>already-issued leaves until
                         -----------------------^
  Expecting 'SPACE', 'NL', 'DESCR', '-->', ... got 'INVALID'

(flowchart and sequenceDiagram tolerate <br/> + em-dashes inside
labels — only stateDiagram-v2 trips.)

Fix: shorten transition labels to single-line ASCII and move the
long-form descriptions into 'note right of <state>' blocks. Same
information, renders cleanly on GitHub.

  active --> retiring : Retire(confirm=false)
  retiring --> retired : Retire(confirm=true)
  retired --> [*]

  note right of retiring
      Drain start. CA stops issuing
      NEW children; existing children
      keep issuing until they retire.
  end note

  note right of retired
      Terminal. Refused if active children
      remain (ErrCAStillHasActiveChildren
      → HTTP 409). OCSP keeps responding
      for already-issued leaves until expiry.
  end note

Verified locally:
  Other mermaid blocks added in the audit pass (sequenceDiagram +
    flowchart TD) keep their <br/> + em-dashes — those don't trip
    GitHub's renderer. Only stateDiagram-v2 needed the fix.
  No content lost. The note blocks carry every fact the old
    multi-line transition labels had.

Doc-only commit.
2026-05-04 02:43:47 +00:00
shankar0123 85649cf983 docs: convert remaining ASCII diagrams to mermaid (audit closure)
Audit pass over docs/ found 4 files with non-mermaid (ASCII
box-drawing) diagrams in fenced code blocks. The other 9 doc files
already used mermaid blocks (architecture.md, demo-advanced.md,
ci-pipeline.md, concepts.md, est.md, legacy-est-scep.md, mcp.md,
qa-test-guide.md, scep-intune.md). Rendering parity for everything
in docs/.

Conversions:

  approval-workflow.md
    1 ASCII swimlane → sequenceDiagram with named participants
    (Operator A / CertificateService / Job+ApprovalRequest /
    Operator B / ApprovalService / Scheduler). Same content: the
    same-actor RBAC reject path, the AwaitingApproval gate, the
    audit + Prometheus side effects.

  intermediate-ca-hierarchy.md
    1 lifecycle ASCII → stateDiagram-v2 (created → active → retiring
    → retired with the drain-first refusal annotation).
    3 ASCII tree patterns → 3 flowchart TD diagrams (FedRAMP 4-level
    boundary CA, financial-services 3-level policy CA, internal-PKI
    2-level). Same depth, same path_len + permitted-DNS labels.

  runbook-cloud-targets.md
    1 dual-column ASCII flow → flowchart TD with two subgraphs
    (AWS ACM path, Azure Key Vault path) joining at the audit +
    Prometheus exposer node. Same 6-step deploy sequence on each
    side with the rollback-on-mismatch step explicit.

  runbook-expiry-alerts.md
    1 nested-loop ASCII flow → flowchart TD with three nested
    subgraphs (per-cert main loop / per-threshold inner / per-channel
    fault-isolating dispatch). Same dedup + Prometheus + audit-row
    side effects per channel.

Verified locally:
  Audit re-run: every fenced block in docs/*.md that does NOT open
    with ```mermaid contains zero ASCII box-drawing characters
    (┌ └ │ ─ ━ ═ ║ ╔ ╚ ▼ ▲).
  Mermaid block tally: 39 across 13 files (up from 32 across 9
    files pre-audit). The +7 new blocks are the 4 conversions plus
    the lifecycle + 3 tree patterns expanded out of the single
    intermediate-ca-hierarchy.md ASCII section.

No code or test changes. Doc-only commit.
2026-05-04 02:40:01 +00:00
shankar0123 8908c8ff5c web, docs: IssuerHierarchyPage + sysadmin runbook + connectors row (Rank 8 commit 5)
Final commit of the 5-commit Rank 8 chain. Operator-facing surface
on top of the service + handler layers shipped in commits 1-4.

Frontend (web/src):
  - api/client.ts: 3 new functions + IntermediateCA interface
    (listIntermediateCAs, getIntermediateCA, retireIntermediateCA).
  - pages/IssuerHierarchyPage.tsx: recursive nested <ul> render of
    the hierarchy tree at /issuers/:id/hierarchy. buildHierarchyTree
    is a pure helper that walks the flat list and groups children
    on parent_ca_id; the dendrogram view is parking-lot work tracked
    in WORKSPACE-ROADMAP. Two-phase retire UX surfaces 'Retire…'
    then 'Confirm retire (terminal)' when the row is in retiring
    state. Admin gate is enforced at the API; the page renders the
    backend's 403 as ErrorState for non-admin callers.
  - main.tsx: register the new /issuers/:id/hierarchy route.

CI guard update:
  - scripts/ci-guards/T-1-frontend-page-coverage.sh: add
    IssuerHierarchyPage to the deferred-test allowlist with the
    standard 'why deferred' comment. Admin-gate + recursive build
    semantics are already pinned at the backend layer
    (intermediate_ca_test.go service tests + intermediate_ca_test.go
    handler triplet). Vitest test deferred until next feature
    change touches the page.

Docs:
  - docs/intermediate-ca-hierarchy.md: new operator runbook
    covering:
      Concepts (HierarchyMode 'single' vs 'tree', defense-in-depth
        on key bytes never persisting on rows).
      Lifecycle states + drain-first semantics
        (active → retiring → retired with active-children gate).
      Three deployment patterns: 4-level FedRAMP boundary CA,
        3-level financial-services policy CA, 2-level internal
        PKI.
      RFC 5280 enforcement (§3.2 self-signed, §4.2.1.9 path-length
        tightening, §4.2.1.10 NameConstraints subset).
      Migration from single → tree using the load-bearing
        TestLocal_HierarchyMode_SingleVsTree_ByteIdentical pin as
        the canary.
      API reference + observability (IntermediateCAMetrics
        Prometheus exposure).
      Known limitations + Rank-8 follow-on roadmap.

  - docs/connectors.md: extend the Built-in Local CA section with
    a 'Tree mode (Rank 8)' paragraph describing the new chain
    assembly path + cross-link to docs/intermediate-ca-hierarchy.md.

Roadmap:
  - WORKSPACE-ROADMAP.md: 5 follow-on items under a new
    'Intermediate CA hierarchy extensions (Rank 8 V2 follow-ons)'
    bullet block:
      HSM-backed roots (PKCS#11 / cloud KMS drivers via existing
        signer.Driver interface — no service-layer change needed).
      Automated CA rotation (parallel-validity windows ahead of
        expiry).
      Intra-hierarchy CRL chaining (per-CA CRL endpoints stitched
        at issue time).
      NameConstraints policy templates (FedRAMP / financial /
        internal PKI declarative templates instead of hand-rolled
        JSON).
      D3 dendrogram visualization (separate page so the existing
        list view stays the default + the dep stays opt-in).

Verified locally:
  gofmt: clean.
  go vet ./...: exit 0.
  tsc --noEmit (web/): exit 0 (no TypeScript errors).
  go test -short -count=1 ./internal/api/handler/... + service +
    local: ok across all three packages, 4-5s each.
  All 24 CI guards: clean
    (T-1 frontend-page-coverage with the new
     IssuerHierarchyPage allowlist entry; openapi-handler-parity,
     M-008 admin-gate, every other guard untouched).

Rank 8 chain complete:
  66d2af3  domain, migrations: IntermediateCA type + intermediate_cas
           + Issuer.HierarchyMode (commit 1)
  fb54ebc  service: IntermediateCAService + IntermediateCAMetrics
           + RFC 5280 enforcement (commit 2)
  62523fb  service: 10 IntermediateCAService tests + in-memory fake
           repo (commit 2.5)
  ae597f7  local: tree-mode chain assembly + byte-equivalence pin
           (commit 3 — load-bearing backwards-compat refuse-to-ship
           pin in TestLocal_HierarchyMode_SingleVsTree_ByteIdentical)
  34adcfb  api, handler: 4 admin-gated CA hierarchy endpoints +
           OpenAPI (commit 4)
  HEAD     web, docs: IssuerHierarchyPage + sysadmin runbook +
           connectors row (this commit)

Reference: cowork/rank-8-intermediate-ca-hierarchy-prompt.md, commit 5.
2026-05-04 02:33:48 +00:00
shankar0123 b601928e1c docs(approval-workflow): drop Infisical reference from operator playbook
The operator-facing approval-workflow.md is the public-readable docs
page; the 'Infisical deep-research deliverable' framing is internal
project context that doesn't belong there. Internal source comments +
research docs in cowork/ keep the original framing as the historical
record.
2026-05-04 01:18:59 +00:00
shankar0123 aebfd8bd7c Revert "chore: drop 'Infisical' label from internal references"
This reverts commit 19706e56b3.
2026-05-04 01:18:15 +00:00
shankar0123 19706e56b3 chore: drop 'Infisical' label from internal references
Strategic naming cleanup. Earlier doc-comments + commit messages framed Rank
4 / Rank 5 / Rank 7 work as 'Rank N of the 2026-05-03 Infisical deep-research
deliverable' — the 'Infisical' qualifier was a holdover from the original
deep-research framing where Infisical (a competing secrets-management
platform) was the comparator. Keeping the comparator's name in our source
adds noise without value; an external reader sees 'Infisical' and assumes a
dependency or shared lineage rather than reading it as the competitive
context it was.

Mechanical sed across 34 files (32 source / docs + 2 follow-up Python passes
to collapse 'deep-research deep-research' duplicates that emerged where the
original phrase wrapped across lines):

  s|Infisical deep-research|deep-research|g
  s|infisical-deep-research-results|deep-research-results-2026-05-03|g
  s|infisical-deep-research-prompt|deep-research-prompt-2026-05-03|g
  s|infisical-deep-research|deep-research|g
  s|Infisical|deep-research|g
  s|deep-research deep-research|deep-research|g  # collapse-pass

Net diff: 63 insertions / 64 deletions across cmd/, docs/, internal/,
migrations/. Pure text substitution; zero behavior change. Code path
unchanged — go vet clean, tests for TestApproval pass on both
internal/service and internal/api/handler packages.

Workspace docs (cowork/) carry the same references and will be swept
separately — they're not under certctl/ git control. The two filename
references (cowork/infisical-deep-research-results.md +
cowork/infisical-deep-research-prompt.md) get renamed alongside that sweep
to deep-research-results-2026-05-03.md /
deep-research-prompt-2026-05-03.md so cross-references in the certctl
repo doc-comments resolve cleanly.
2026-05-04 01:15:01 +00:00
shankar0123 03c61f4c20 scheduler, certificate, renewal: gate issuance on profile-driven approval
Closes Rank 7 of the 2026-05-03 Infisical deep-research deliverable
(cowork/infisical-deep-research-results.md Part 5). Pre-fix, certctl
issued certificates unattended — every renewal-loop tick that crossed
a renewal threshold created a Job at Status=Pending which the
scheduler dispatched directly to the issuer connector. PCI-DSS Level
1, FedRAMP Moderate / High, SOC 2 Type II, and HIPAA-regulated PHI
customers all ask the same procurement question: "How do you enforce
two-person integrity on cert issuance?" Today's answer: "We don't."
After this commit chain: "Per-profile RequiresApproval=true creates a
parallel ApprovalRequest row; the renewal-loop creates the Job at
Status=AwaitingApproval; an authorized approver (different from the
requester per the same-actor RBAC check) calls
POST /api/v1/approvals/{id}/approve, transitioning the Job to
Pending; the scheduler picks it up."

This commit (4 of 4) wires the gate into the manual TriggerRenewal
entry point + main.go service construction + Config.Approval +
docs + WORKSPACE-ROADMAP follow-up entries. The previous commits
in the chain shipped:
  - 1 (2025275): domain types + migration + repository
  - 2 (8043e2b): ApprovalService + ApprovalMetrics + 8 service tests
  - 3 (81632eb): 4 API endpoints + handler RBAC tests + router wiring

Files modified:
  cmd/server/main.go              - Constructs approvalRepo +
                                     approvalMetrics + approvalService
                                     + approvalHandler. Wires
                                     CertificateService via
                                     SetApprovalService + SetProfileRepo.
                                     Logs a WARN line at boot when
                                     CERTCTL_APPROVAL_BYPASS=true so
                                     production operators alert on the
                                     log line. Adds Approvals to the
                                     HandlerRegistry.

  internal/config/config.go       - Adds top-level ApprovalConfig
                                     {BypassEnabled bool} sub-config
                                     + CERTCTL_APPROVAL_BYPASS env var
                                     loader. Doc comment cites the
                                     compliance-detection SQL query
                                     (SELECT count FROM audit_events
                                     WHERE actor='system-bypass') so
                                     auditors find the right pattern.

  internal/service/certificate.go - Adds approvalSvc + profileRepo
                                     fields to CertificateService +
                                     SetApprovalService /
                                     SetProfileRepo setters. Extends
                                     TriggerRenewal: looks up the
                                     profile, checks RequiresApproval,
                                     creates the Job at
                                     JobStatusAwaitingApproval (override
                                     the keygen-mode default), then
                                     calls approvalSvc.RequestApproval
                                     to create the parallel
                                     ApprovalRequest row. On
                                     RequestApproval failure, cancels
                                     the orphan Job (defense in depth —
                                     without this, a partial failure
                                     would leave the job stuck at
                                     AwaitingApproval forever). Profile-
                                     lookup failures fall back to the
                                     unattended path (fail-open from
                                     the operator's perspective +
                                     fail-loud via slog.Warn).

Files added:
  docs/approval-workflow.md       - Sysadmin-grade operator runbook:
                                      end-to-end ASCII flowchart
                                      (operator A triggers → operator
                                      B approves → scheduler dispatches),
                                      configuration recipe, RBAC contract
                                      (the load-bearing two-person
                                      integrity rule), operator playbooks
                                      for "I need to approve a renewal"
                                      and "approval timed out", PCI-DSS
                                      6.4.5 / NIST 800-53 SA-15 / SOC 2
                                      CC6.1 / HIPAA control mapping
                                      table, bypass-mode warnings with
                                      the exact compliance-detection SQL
                                      query, Prometheus metric reference,
                                      future free V2 work pointers.

Out of scope of THIS commit (deferred follow-on, not blocking the rest):
  - RenewalService.CheckExpiringCertificates auto-renewal-loop gate.
    The manual TriggerRenewal entry point is gated and the job-level
    timeout reaper already covers AwaitingApproval; the auto-renewal
    gate adds parity. Trivial to add — one block in renewal.go that
    mirrors the certificate.go::TriggerRenewal gate. Tracked in
    WORKSPACE-ROADMAP under the Approval-workflow extensions section.
  - Scheduler reaper extension calling ApprovalService.ExpireStale.
    Today: when the existing reaper times out an AwaitingApproval job,
    the parallel ApprovalRequest row stays at state=pending. The audit
    timeline is still correct (the job-side audit row records the
    timeout) but the dashboard shows a row that no longer needs human
    review. Trivial to wire — one method call in the existing
    scheduler tick. Same WORKSPACE-ROADMAP follow-on.
  - api/openapi.yaml extensions for the 4 new operationIds.
    The HTTP contract is pinned by the handler-level tests; OpenAPI
    is documentation that mirrors the contract.
  - docs/connectors.md `requires_approval` row in the CertificateProfile
    config table. Tracked in the same follow-on; the new
    docs/approval-workflow.md is the canonical reference.

Workspace-level updates (in cowork/, not under certctl/ git control —
applied separately):
  WORKSPACE-ROADMAP.md            - "Approval-workflow extensions"
                                     section under "Future Free V2 Work"
                                     covering M-of-N chains + time-
                                     windowed auto-approve + external
                                     ticketing + per-owner routing +
                                     delegation. All items free under
                                     BSL — no V3-Pro framing per the
                                     2026-05-03 strategy pivot (open
                                     core under BSL; future revenue =
                                     managed-service hosting).

Verified locally:
  gofmt: clean.
  go vet ./...: exit 0.
  go build ./...: exit 0 — full repo links cleanly with the new
    Approval wiring.
  go test -short -count=1 -run TestApproval
    ./internal/service/... ./internal/api/handler/...:
    ok 0.005s for both packages — all 11 approval tests green
    (8 service-level + 3 handler-level).

Reference: cowork/rank-7-approval-workflow-primitive-prompt.md.
Commits: 20252758043e2b81632eb → THIS COMMIT.
2026-05-04 01:12:07 +00:00
shankar0123 0729ee46e0 chore: sweep github.com/shankar0123/certctl URL refs to certctl-io/certctl
Post-transfer cosmetic + release-critical URL refresh after moving the
repo from github.com/shankar0123/certctl to github.com/certctl-io/certctl
(2026-05-03). GitHub HTTP redirects continue to forward old URLs forever,
so existing operators are not broken — but aligns the canonical
references with the new owner so:

- procurement engineers / contributors browsing the docs see the right
  URL on first read
- operators copying the agent install one-liner hit the new path
  directly without going through a redirect
- the Helm chart's default image repository points at the canonical org
  registry path
- the OnboardingWizard rendered to first-run UI users shows the new
  URL in the install snippets and doc anchor links
- the GitHub Actions release workflow pushes container images to
  ghcr.io/certctl-io/certctl-{server,agent} (was: shankar0123)
- the release-notes Markdown body in release.yml — which gets stamped
  into every future release page — references the post-transfer
  cert-identity (cosign keyless signing now uses the certctl-io
  workflow URL) and the post-transfer SLSA provenance source-uri.
  Without this, every cosign verify / slsa-verifier command on a
  v2.1.0+ release would fail because the cert-identity-regexp would
  not match the signing identity GitHub Actions OIDC issues post-
  transfer. Old releases (v2.0.67 and earlier) keep their immutable
  release-notes pointing at the shankar0123 path and remain
  verifiable via their own published instructions.

Customer impact:
- Operators on ghcr.io/shankar0123/certctl-{server,agent}:latest
  silently freeze on whatever tag was current at transfer time. They
  get no errors; they just stop receiving updates. The next release
  notes need a one-line callout (Phase 3.1 of cowork/transfer-
  certctl-to-org.md) telling them to update their image path to
  ghcr.io/certctl-io/certctl-{server,agent}.
- All other URLs (git clone, install one-liner, raw.githubusercontent
  URLs, browser links, GitHub API) continue to resolve via permanent
  HTTP redirects. The sweep is cosmetic for those.

Files swept (30 total):
  .github/workflows/release.yml — IMAGE_NAMESPACE, source-uri,
    cosign cert-identity-regexp, IMAGE= snippet (5 refs total).
  CHANGELOG.md, README.md — anchor links, badges, install one-liner,
    cosign verify snippets in operator-facing sections.
  api/openapi.yaml — info / externalDocs URLs.
  install-agent.sh — GITHUB_REPO const + systemd unit Documentation=
    field.
  deploy/ENVIRONMENTS.md, deploy/helm/{CHART_SUMMARY,INDEX,
    INSTALLATION,README}.md, deploy/helm/certctl/{Chart.yaml,
    README.md,values.yaml}, deploy/helm/examples/values-*.yaml —
    chart docs + image repository defaults across dev / prod-ha
    overrides.
  docs/{certctl-for-cert-manager-users,connector-iis,connectors,
    migrate-from-acmesh,migrate-from-certbot,quickstart,test-env,
    why-certctl}.md — operator-facing doc URLs.
  examples/{acme-nginx,acme-wildcard-dns01,multi-issuer,
    private-ca-traefik,step-ca-haproxy}/docker-compose.yml +
    examples/step-ca-haproxy/step-ca-haproxy.md — example image:
    paths and accompanying narrative.
  web/src/pages/OnboardingWizard.tsx — first-run-UI URL refs (curl
    install one-liners, agent docker image path, doc anchor links).

Files intentionally NOT swept (Choice A from cowork/transfer-certctl-
to-org.md):
  go.mod, go.sum — module declaration stays github.com/shankar0123/
    certctl. Existing imports compile because Go uses the path
    declared in go.mod, not the URL it was fetched from. Internal-
    only project; no external Go consumers; rename will land as a
    mechanical sed when one materializes.
  ~250 *.go files — every import remains github.com/shankar0123/
    certctl/internal/...
  deploy/test/f5-mock-icontrol/go.mod — separate test sub-module;
    same Choice A logic; module path stays.

Files intentionally NOT swept (other reasons):
  README.md lines 244-245 — Scarf-pixel docker-pull commands.
    shankar0123.docker.scarf.sh/... is a Scarf-account hostname
    (per-user, not per-repo) and the pixel keeps tracking pulls
    against the operator's personal Scarf account. Migrating to a
    certctl-io Scarf account is a separate decision (create org
    Scarf account → re-create package → update README).
  deploy/test/f5-mock-icontrol/f5-mock-icontrol — checked-in
    compiled binary with shankar0123/certctl baked into Go build
    info via the sub-module path. Out of scope for a URL sweep;
    will refresh on the next `make test-integration` rebuild.

Verification:
  gofmt: clean (no .go files touched).
  go vet ./...: clean (verified at this SHA in 1.3 of the transfer
    checklist; no .go changes since).
  go build ./...: clean (same).
  go test -short on representative packages: green (same).
  Diff shape: 30 files, 74 insertions / 74 deletions, net-zero size,
    pure URL substitution.
2026-05-03 23:39:50 +00:00
shankar0123 9a7e818f3e docs, seed: cloud-target operator runbook + AWS ACM / Azure KV demo seed rows
Wraps up Rank 5 of the 2026-05-03 Infisical deep-research deliverable
(commits edf6bee AWS + 8a56a78 Azure):

  - docs/runbook-cloud-targets.md — sysadmin-grade flowchart spanning
    the AWS ACM + Azure Key Vault deploy paths side-by-side. Covers
    minimum IAM policy / RBAC role JSON, IRSA + AKS workload-identity
    recipes, manual rollback recovery procedures (aws acm
    import-certificate / az keyvault certificate import), CloudTrail
    + Activity Log forensics queries for "who wrote to this ARN /
    vault cert", Prometheus cardinality + cost budget, and the
    V3-Pro forward path (CloudFront / Front Door direct-attach,
    ALB / App Gateway auto-bind, soft-delete recovery, GCP CM).
  - migrations/seed_demo.sql — two new demo target rows (tgt-aws-
    acm-prod + tgt-azure-kv-prod) so QA can exercise the per-cloud
    wiring end-to-end against the demo seed without standing up
    real cloud accounts.

cowork/WORKSPACE-ROADMAP.md (sibling-folder, not in this commit's
diff) was updated to mark the V2 AWS ACM + Azure KV connectors as
shipped and document the V3-Pro CloudFront / Front Door direct-attach
+ App Gateway auto-bind + soft-delete recovery + GCP CM follow-on
items.

cowork/infisical-deep-research-results.md (sibling-folder) Part 5
Rank 5 marked CLOSED with both commit SHAs.

Doc-only commit. No code changes.

Verified locally:
- go test -short -count=1 ./internal/connector/target/awsacm/...
  ./internal/connector/target/azurekv/...  green.
- markdown lint clean against the Bundle 8 + Rank 4 runbook templates.
2026-05-03 22:46:29 +00:00