certctl

gsadmin/certctl


mirror of https://github.com/shankar0123/certctl.git synced 2026-06-07 20:21:29 +00:00

Commit Graph

Author SHA1 Message Date


shankar0123

c4ed3da30b

fix(ci): Sprint 6 CI follow-up — staticcheck ST1021 + tenant-query baseline + skip inventory

Sprint 6 push (commits 43836ac + 663b14b) tripped three CI guards.
Fixing all three in this single follow-up — each is a small,
mechanical correction that doesn't change behavior:

1. staticcheck ST1021: AuditChainSnapshot doc comment was on the
   wrong type.

   internal/service/audit_chain_metric.go:91 had:
     // Snapshot returns the current counter state for the Prometheus
     // exposer. Reads use atomic loads — no mutex.
     type AuditChainSnapshot struct { ... }

   The comment described Snapshot() (the method on AuditChainCounter)
   but sat directly above the AuditChainSnapshot struct. staticcheck
   ST1021 requires exported-type comments to start with the type's
   name + optional leading article. Rewrote to lead with
   "AuditChainSnapshot is the point-in-time view ...".

2. multi-tenant-query-coverage: the count drifted 31 → 32, one above
   the baseline, because Sprint 6 COMP-002-RETENTION added
   UserRepository.ListDeactivatedBefore at
   internal/repository/postgres/user.go:191, which is legitimately
   tenant-spanning by design.

   The retention policy is control-plane-wide (one
   CERTCTL_USER_RETENTION_WINDOW for the whole deployment, not
   per-tenant). The scheduler's userRetentionLoop walks every
   tenant's deactivated users on the same tick. A per-tenant
   tenant_id filter would require the scheduler to iterate every
   tenant — more code for equivalent semantics.

   Per the guard's own documentation (option b), legitimately
   tenant-spanning queries get an inline rationale comment + a
   baseline lift. Both delivered:
     - Inline comment block on the SELECT in user.go::ListDeactivatedBefore.
     - BASELINE_COUNT 31 → 32 in
       scripts/ci-guards/multi-tenant-query-coverage.sh, with the
       Sprint 6 rebase entry added to the rebase-history comment.

3. skip-inventory-drift: docs/testing/skip-inventory.md was stale.
   COMP-001-HASH added three new t.Skip sites in
   internal/repository/postgres/audit_chain_test.go (the three
   testing.Short() gates on the testcontainers integration tests).
   Re-ran ./scripts/skip-inventory.sh to regenerate the doc —
   totals went from 144 → 147 sites + 78 → 82 short-mode guards.
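
As an illustration of fix 1, here is a minimal sketch of a doc-comment layout that staticcheck ST1021 accepts. Only the type names, the Snapshot method name, and the opening words of the rewritten comment come from the message above; the counter and snapshot fields are hypothetical stand-ins, not the real contents of audit_chain_metric.go.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// AuditChainCounter accumulates audit-chain results.
// The two fields are hypothetical; the real struct may differ.
type AuditChainCounter struct {
	verified atomic.Uint64
	broken   atomic.Uint64
}

// AuditChainSnapshot is the point-in-time view of an AuditChainCounter.
// The comment leads with the type's name, which is what ST1021 checks.
type AuditChainSnapshot struct {
	Verified uint64
	Broken   uint64
}

// Snapshot returns the current counter state. Reads use atomic loads,
// so no mutex is taken. This comment now sits on the method it describes.
func (c *AuditChainCounter) Snapshot() AuditChainSnapshot {
	return AuditChainSnapshot{
		Verified: c.verified.Load(),
		Broken:   c.broken.Load(),
	}
}

func main() {
	var c AuditChainCounter
	c.verified.Add(2)
	c.broken.Add(1)
	fmt.Printf("%+v\n", c.Snapshot())
}
```

The point of the sketch is only the placement: the type comment names the type, and the behavioural description moves down to the method.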

Verified locally:
  bash scripts/ci-guards/multi-tenant-query-coverage.sh      (clean)
  bash scripts/ci-guards/skip-inventory-drift.sh              (clean)
  go vet ./...                                                (clean)
  staticcheck ./internal/service/...                          (clean)

Closes the three Sprint 6 CI failures. The next CI run should
go green.

2026-05-16 06:24:09 +00:00

shankar0123

eee124efb6

chore(ci-guards): close 4 CI-guard regressions surfaced by v2.1.0 release-gate Phase 5

Four scripts/ci-guards/*.sh trips on dev/auth-bundle-2 vs master:

1. G-3-env-docs-drift: 10 CERTCTL_* env vars added by Auth Bundle 2 +
   audit-2026-05-10/11 fix bundle were not in docs/. Added a new 'Auth
   (Bundle 1 + Bundle 2)' section to docs/reference/configuration.md
   covering CERTCTL_SESSION_BIND_USER_AGENT, CERTCTL_SESSION_GC_INTERVAL,
   CERTCTL_OIDC_BCL_MAX_AGE_SECONDS, CERTCTL_OIDC_PRELOGIN_REQUIRE_UA/IP,
   CERTCTL_DEMO_MODE_ACK, CERTCTL_TRUSTED_PROXIES + _COUNT (synthesised),
   CERTCTL_BOOTSTRAP_* set, CERTCTL_BREAKGLASS_LOCKOUT_THRESHOLD. Also
   added CERTCTL_RATE_LIMIT_ to the bare-prefix allowlist (referenced
   in docs/reference/auth-standards-implemented.md prose).

2. bundle-8-M-009-bare-usemutation: BreakglassPage shipped 3 bare
   useMutation() calls instead of useTrackedMutation. Migrated all
   three to useTrackedMutation with invalidates: [['breakglass']].

3. multi-tenant-query-coverage: Defense-in-depth tenant_id additions
   in the fix bundle dropped the missing-tenant-id query count from 32
   to 31. Ratcheted baseline 32 -> 31 (forward-only invariant).

4. openapi-handler-parity: 28 new REST endpoints from Bundle 2 + the
   fix bundle missing from api/openapi.yaml. Added them to
   api/openapi-handler-exceptions.yaml with per-route 'why:'
   justifications. OpenAPI schema generation deferred to pre-v2.2.0
   alongside the GUI E2E coverage push; threat model + handler
   contracts already live in docs/operator/{rbac,auth-threat-model,
   oidc-runbooks}.md.
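
The parity rule in item 4 can be read as a set difference: a handler route passes if it is in the OpenAPI spec or listed as an exception with a non-empty justification. The sketch below assumes that reading; the function name, the route strings, and the in-memory representation are hypothetical and do not reflect how the real guard parses api/openapi.yaml or the exceptions file.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRoutes returns every handler route that is neither documented in
// the spec nor covered by an exception carrying a non-empty "why" text.
func missingRoutes(handlers []string, spec map[string]bool, exceptions map[string]string) []string {
	var missing []string
	for _, route := range handlers {
		if spec[route] {
			continue // documented in the OpenAPI spec
		}
		if why, ok := exceptions[route]; ok && why != "" {
			continue // explicitly excepted with a justification
		}
		missing = append(missing, route)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// All routes below are made up for the example.
	handlers := []string{"GET /a", "POST /b", "DELETE /c"}
	spec := map[string]bool{"GET /a": true}
	exceptions := map[string]string{"POST /b": "schema generation deferred"}

	fmt.Println(missingRoutes(handlers, spec, exceptions))
}
```

Under this model an exception without a justification does not count, which mirrors the per-route 'why:' requirement described above.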

After this commit every script in scripts/ci-guards/*.sh exits 0.

2026-05-11 14:19:35 +00:00

shankar0123

130a65f3b6

auth-bundle-2 Phase 13: negative-test backfill (OIDC PreLoginAdapter) + OIDC client_secret encryption invariant + multi-tenant query CI guard + coverage floors held at 90 across 4 Bundle-2 packages + E2E coverage map

Closes Phase 13 of cowork/auth-bundle-2-prompt.md. Ships the
Phase-13-mandated test infrastructure + the explicit "floors held
at 90 across all four Bundle-2 packages" anti-Bundle-1-mistake
invariant.

Files
=====

internal/auth/oidc/prelogin_test.go (NEW, +375 LOC):
* PreLoginAdapter coverage backfill. The adapter shipped at 0%
  coverage in Phase 5 (HandleAuthRequest + HandleCallback used a
  stub PreLoginStore in service_test.go); this file lifts the
  package's coverage from 78.8% to 93.7%.
* 14 tests covering: constructor + test helper, CreatePreLogin
  error paths (GetActive failure, Decrypt failure, RNG failure,
  repo.Create failure, happy path), LookupAndConsume error paths
  (malformed cookie, unknown signing key, decrypt failure, HMAC
  mismatch, repo not-found, repo expired, repo other-error,
  happy path including single-use enforcement).
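
To make the LookupAndConsume failure modes concrete, here is a rough sketch of three of them: a malformed cookie, an HMAC mismatch, and single-use enforcement. It is not the PreLoginAdapter itself. The cookie layout (id, a dot, then a hex HMAC-SHA256), the in-memory store, and every identifier are assumptions made for illustration, and the encryption and signing-key lookup steps are left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
	"sync"
)

var (
	errMalformed = errors.New("malformed cookie")
	errBadMAC    = errors.New("HMAC mismatch")
	errNotFound  = errors.New("pre-login session not found")
)

// memStore is a hypothetical in-memory stand-in for the repository.
type memStore struct {
	mu       sync.Mutex
	sessions map[string]string // session id -> stored state
}

// mac computes HMAC-SHA256 over the session id.
func mac(key []byte, id string) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(id))
	return m.Sum(nil)
}

// sign builds the assumed cookie format: id + "." + hex(HMAC).
func sign(key []byte, id string) string {
	return id + "." + hex.EncodeToString(mac(key, id))
}

// lookupAndConsume verifies the cookie, then deletes the session so that a
// second presentation of the same cookie fails.
func (s *memStore) lookupAndConsume(key []byte, cookie string) (string, error) {
	id, macHex, ok := strings.Cut(cookie, ".")
	if !ok || id == "" {
		return "", errMalformed
	}
	got, err := hex.DecodeString(macHex)
	if err != nil {
		return "", errMalformed
	}
	if !hmac.Equal(got, mac(key, id)) { // constant-time comparison
		return "", errBadMAC
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	state, found := s.sessions[id]
	if !found {
		return "", errNotFound
	}
	delete(s.sessions, id) // single-use enforcement
	return state, nil
}

func main() {
	key := []byte("example-signing-key")
	store := &memStore{sessions: map[string]string{"abc": "state-1"}}
	cookie := sign(key, "abc")

	fmt.Println(store.lookupAndConsume(key, cookie)) // first use
	fmt.Println(store.lookupAndConsume(key, cookie)) // replay
}
```

Each error value corresponds to one negative test case, so a table-driven test can assert on it with errors.Is.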

internal/repository/postgres/oidc_encryption_invariant_test.go (NEW,
+208 LOC, integration test gated by testing.Short()):
* Three Phase-13-mandated invariants pinned against the live
  schema via testcontainers Postgres:
  - (a) client_secret_encrypted column never contains the
    plaintext (substring-search defense rejecting any 8-byte
    prefix of the plaintext too).
  - (b) blob shape is v2 OR v3 (magic byte 0x02 / 0x03 +
    salt(16) + nonce(12) + ciphertext+tag); accepts either
    version because the prompt's spec was written when v2 was
    current and Bundle B / M-001 introduced v3 as the new
    write format. Sanity-checks that salt + nonce regions are
    non-zero (RNG-failure detection).
  - (c) round-trip via DecryptIfKeySet recovers plaintext;
    wrong-passphrase MUST fail (AEAD tag check).
* Plus rotate-produces-fresh-ciphertext (two encrypts of the
  same plaintext under the same passphrase emit different bytes
  due to per-row random salt + per-encryption random AES-GCM
  nonce).
* Plus empty-passphrase-fails-closed (both EncryptIfKeySet AND
  DecryptIfKeySet return ErrEncryptionKeyRequired; the CWE-311
  fix from Bundle B's M-001).
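
Invariants (a) and (b) can be approximated with two small checks, sketched below under stated assumptions. The field order follows the layout listed above (magic byte, salt, nonce, ciphertext plus tag), the 16-byte tag is the standard AES-GCM tag size, and "8-byte prefix" is read as the first 8 bytes of the plaintext. The function names are hypothetical and the real test may check more than this.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

const (
	saltLen   = 16
	nonceLen  = 12
	tagLen    = 16 // standard AES-GCM authentication tag size
	headerLen = 1 + saltLen + nonceLen
)

// allZero reports whether every byte in b is zero.
func allZero(b []byte) bool {
	for _, x := range b {
		if x != 0 {
			return false
		}
	}
	return true
}

// checkBlobShape validates the assumed layout:
// magic(1) | salt(16) | nonce(12) | ciphertext+tag.
func checkBlobShape(blob []byte) error {
	if len(blob) < headerLen+tagLen {
		return errors.New("blob too short")
	}
	if blob[0] != 0x02 && blob[0] != 0x03 {
		return fmt.Errorf("unexpected magic byte 0x%02x", blob[0])
	}
	salt := blob[1 : 1+saltLen]
	nonce := blob[1+saltLen : headerLen]
	if allZero(salt) || allZero(nonce) {
		return errors.New("salt or nonce is all zero (possible RNG failure)")
	}
	return nil
}

// leaksPlaintext reports whether the stored blob contains the plaintext or
// its first 8 bytes.
func leaksPlaintext(blob, plaintext []byte) bool {
	if len(plaintext) == 0 {
		return false
	}
	if bytes.Contains(blob, plaintext) {
		return true
	}
	return len(plaintext) >= 8 && bytes.Contains(blob, plaintext[:8])
}

func main() {
	// Hand-built blob for the example; a real one comes from the database.
	blob := make([]byte, headerLen+tagLen+4)
	blob[0] = 0x03
	for i := 1; i < headerLen; i++ {
		blob[i] = byte(i)
	}
	fmt.Println(checkBlobShape(blob), leaksPlaintext(blob, []byte("hunter2-secret")))
}
```

Invariant (c), the round trip and the wrong-passphrase failure, needs the project's own EncryptIfKeySet and DecryptIfKeySet and is deliberately not reproduced here.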

scripts/ci-guards/multi-tenant-query-coverage.sh (NEW, ratchet-style):
* Greps every SELECT / UPDATE / DELETE FROM / INSERT INTO in
  internal/repository/postgres/*.go (excluding *_test.go) that
  targets a tenant-aware table. Counts queries that lack
  tenant_id in the surrounding 7-line window.
* Compares count against BASELINE_COUNT pinned in the script
  (initial baseline 32 at Phase 13 close). Regression (count >
  baseline) → FAIL with line-by-line violation list. Improvement
  (count < baseline) → also FAIL until the script's BASELINE_COUNT
  is ratcheted down (forces the win to be made visible).
* Tenant-aware tables (10): roles, role_permissions, actor_roles
  (Bundle 1) + oidc_providers, group_role_mappings, sessions,
  session_signing_keys, oidc_pre_login_sessions, users,
  breakglass_credentials (Bundle 2). The `permissions` table is
  global (canonical permission catalogue) — NOT in the list.
* Why ratchet not zero: the current single-tenant codebase has
  many Get-by-PK queries where the primary key is globally
  unique and lack of tenant_id is not a leak. Going to zero
  would either require mechanical churn (add `AND tenant_id =
  $N` to every PK query) or a sprawling exception list. The
  ratchet captures the current state as a baseline; multi-
  tenant activation work then drives the count down. New code
  that ADDS to the count without operator review is what we
  catch.
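
The guard itself is a shell script; the following Go sketch only models its logic as described above. It assumes the "7-line window" means three lines on either side of the matching line, uses naive token matching rather than real SQL parsing, and lists only three of the ten tenant-aware tables. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tenantTables is a subset of the tenant-aware tables listed above.
var tenantTables = map[string]bool{"users": true, "sessions": true, "roles": true}

// sqlVerbs are the statement keywords the guard searches for.
var sqlVerbs = []string{"SELECT", "UPDATE", "DELETE FROM", "INSERT INTO"}

// targetsTenantTable reports whether a line holds a SQL verb and names a
// tenant-aware table as a standalone token.
func targetsTenantTable(line string) bool {
	hasVerb := false
	for _, v := range sqlVerbs {
		if strings.Contains(line, v) {
			hasVerb = true
			break
		}
	}
	if !hasVerb {
		return false
	}
	for _, tok := range strings.Fields(line) {
		if tenantTables[strings.Trim(tok, "`\"();,")] {
			return true
		}
	}
	return false
}

// countMissing counts matching lines with no tenant_id within radius lines.
func countMissing(lines []string, radius int) int {
	count := 0
	for i, line := range lines {
		if !targetsTenantTable(line) {
			continue
		}
		lo, hi := i-radius, i+radius
		if lo < 0 {
			lo = 0
		}
		if hi > len(lines)-1 {
			hi = len(lines) - 1
		}
		window := strings.Join(lines[lo:hi+1], "\n")
		if !strings.Contains(window, "tenant_id") {
			count++
		}
	}
	return count
}

// checkRatchet fails on regression and also on an unrecorded improvement.
func checkRatchet(count, baseline int) error {
	switch {
	case count > baseline:
		return fmt.Errorf("regression: %d queries lack tenant_id, baseline is %d", count, baseline)
	case count < baseline:
		return fmt.Errorf("improvement: lower BASELINE_COUNT from %d to %d", baseline, count)
	}
	return nil
}

func main() {
	src := []string{
		"SELECT id FROM users",
		"WHERE id = $1",
		"", "", "", "",
		"SELECT id FROM sessions",
		"WHERE tenant_id = $1",
	}
	n := countMissing(src, 3)
	fmt.Println(n, checkRatchet(n, 1))
}
```

Failing on improvement as well as regression is what makes the baseline a ratchet: the pinned number can only move when someone edits it deliberately.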

.github/coverage-thresholds.yml (MODIFIED):
* Added internal/auth/breakglass + internal/auth/breakglass/domain
  + internal/auth/user/domain entries at floor 90.
* Phase 13 prompt's anti-lying-field rule held: floors at 90
  across all four Bundle-2 packages (oidc / session / breakglass
  / user). NO held-low-with-rationale entry.
* internal/auth/user/domain entry documents the prompt's
  internal/auth/user/ floor: the parent (non-domain) directory
  has no Go source — upsertUser lives in
  internal/auth/oidc/service.go alongside group resolution +
  role mapping (cohesive sequence within the OIDC callback).
  Splitting upsertUser into a separate internal/auth/user/
  service package would harm cohesion without adding test value;
  the domain layer's invariant coverage is where the floor
  actually applies.

web/src/__tests__/e2e/README.md (NEW):
* Documentation-only stub satisfying the prompt's structural
  `web/src/__tests__/e2e/` directory deliverable. Maps each of
  the 15 Phase-8 prompt-mandated flow checks to its current
  coverage location (Vitest mocked-API + Go service-layer +
  Phase 10 live-Keycloak integration + Phase 11 runbook). Pins
  the explicit deferral of a Playwright/Cypress suite with the
  rationale (to date, no customer-reported bug has escaped the
  existing layered coverage; ~3 days of effort plus ongoing flake
  triage cost is not justified pre-v2.1.0).

Coverage results
================

  internal/auth/oidc/                93.7% ≥ 90  ✓ (was 78.8%, lifted by prelogin_test.go)
  internal/auth/oidc/domain/         96.2% ≥ 90  ✓
  internal/auth/oidc/groupclaim/    100.0% ≥ 95  ✓
  internal/auth/session/             94.9% ≥ 90  ✓
  internal/auth/session/domain/     100.0% ≥ 90  ✓
  internal/auth/breakglass/          91.5% ≥ 90  ✓
  internal/auth/breakglass/domain/  100.0% ≥ 90  ✓
  internal/auth/user/domain/         96.4% ≥ 90  ✓
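
The floor check behind this table amounts to a per-package comparison. The sketch below replays it with the figures reported above; the struct and function names are hypothetical, and the real enforcement reads .github/coverage-thresholds.yml, which is not modelled here.

```go
package main

import "fmt"

// pkgCoverage pairs a package's measured coverage with its required floor.
type pkgCoverage struct {
	pkg     string
	percent float64
	floor   float64
}

// belowFloor returns the packages whose coverage is under their floor.
func belowFloor(results []pkgCoverage) []string {
	var failing []string
	for _, r := range results {
		if r.percent < r.floor {
			failing = append(failing, r.pkg)
		}
	}
	return failing
}

func main() {
	// Percentages and floors copied from the table above.
	results := []pkgCoverage{
		{"internal/auth/oidc/", 93.7, 90},
		{"internal/auth/oidc/domain/", 96.2, 90},
		{"internal/auth/oidc/groupclaim/", 100.0, 95},
		{"internal/auth/session/", 94.9, 90},
		{"internal/auth/session/domain/", 100.0, 90},
		{"internal/auth/breakglass/", 91.5, 90},
		{"internal/auth/breakglass/domain/", 100.0, 90},
		{"internal/auth/user/domain/", 96.4, 90},
	}
	fmt.Println(len(belowFloor(results)))
}
```

Feeding in the pre-backfill figure of 78.8% for internal/auth/oidc/ would put that package on the failing list, which is the gap prelogin_test.go closed.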

PRE-MERGE-AUDIT STATEMENT (per Phase 13 prompt's anti-Bundle-1-
mistake invariant): floors held at 90 across all four Bundle-2
packages. No held-low-with-rationale entry. Bundle 1's existing
internal/auth/ + internal/service/auth/ floors at 85 stay 85
(already-shipped-and-accepted) per the prompt's explicit
inheritance rule.

Verification
============

* gofmt -l on the new test files: clean.
* go vet ./internal/auth/oidc/... ./internal/repository/postgres/...:
  clean.
* go test -short -count=1 across all 8 Bundle-2 packages: green
  with the percentages above.
* multi-tenant-query-coverage.sh: PASS (count 32 == baseline 32).

Phase 13 deviation notes
========================

* The encryption invariant test lives at
  internal/repository/postgres/oidc_encryption_invariant_test.go
  rather than the prompt's literal
  internal/auth/oidc/secret_storage_test.go. Reasoning: the
  test exercises the LIVE Postgres schema via testcontainers,
  and the package convention is that integration tests live in
  the postgres_test package alongside the schema-aware fixtures.
  Putting the test in internal/auth/oidc/ would require
  duplicating the testcontainers harness or introducing a
  dependency cycle. The semantic content is identical to the
  prompt's spec.
* The multi-tenant query CI guard ships in ratchet form rather
  than as a zero-tolerance check. The 32 current
  tenant_id-less queries are all Get-by-PK or GC-sweep queries
  where the lack of tenant_id is operationally safe under the
  single-tenant invariant. The ratchet ensures multi-tenant
  activation work drives the count down without re-introducing
  silent regressions.
* The full Playwright/Cypress E2E suite is deferred. The
  web/src/__tests__/e2e/README.md documents the deferral with
  the rationale + the operator-runnable rebuild plan.

2026-05-10 16:31:22 +00:00

3 Commits