mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 16:01:30 +00:00
v2.1.6
994 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
315e132981 |
auth-bundle-2 Phase 2a: SQL migrations (oidc_providers, sessions, users)
Three new idempotent transactional migrations that materialize the
Phase 1 domain types into Postgres tables. Repository implementations
+ integration tests land as Phase 2b in the next commit.
migrations/000034_oidc_providers.up.sql:
oidc_providers table with the full OIDCProvider field set
(issuer_url + client_id + client_secret_encrypted v2 blob +
redirect_uri + groups_claim_path + groups_claim_format +
fetch_userinfo + scopes[] + allowed_email_domains[] +
iat_window_seconds + jwks_cache_ttl_seconds + tenant_id).
group_role_mappings table linking provider+group_name to role_id.
Closed-enum CHECK on groups_claim_format ('string-array' or
'json-path').
Defense-in-depth bounds CHECKs on iat_window_seconds (1..600) and
jwks_cache_ttl_seconds (>= 60); app-layer Validate() also
enforces these.
ON DELETE CASCADE on group_role_mappings.provider_id so deleting a
provider cleans up its mappings.
ON DELETE RESTRICT on group_role_mappings.role_id so an in-use role
can't be silently dropped.
migrations/000035_sessions.up.sql:
session_signing_keys table with key_material_encrypted v2 blob +
retired_at nullable + the retired-after-created CHECK.
Partial index on (tenant_id, created_at DESC) WHERE retired_at IS
NULL backs the GetActive hot path.
sessions table covers BOTH the post-login row (1h-idle/8h-absolute
cookie lifecycle) AND the Phase 5 pre-login row (10-minute TTL,
is_pre_login=true). csrf_token_hash holds the SHA-256 of the
CSRF token plaintext (the plaintext lives in a separate
JS-readable cookie, hashed here so a DB-read leak can't replay).
Two CHECK constraints pin the expiry order (absolute > idle, idle >
created); these match the Phase 1 domain Validate() pre-write
invariants but enforce them at the DB layer too so direct SQL
inserts can't silently land malformed rows.
Partial indexes on actor_id (active sessions only), the active
session lookup, the pre-login GC sweep (created_at), and the
absolute-expired GC sweep (absolute_expires_at) cover the four
hot paths Phase 4's service consumes.
ON DELETE RESTRICT on sessions.signing_key_id so a signing key
referenced by an active session can't be dropped (the retention
window keeps retired keys valid; full purge waits until every
session signed under that key has expired).
migrations/000036_users.up.sql:
users table for federated-human identity (per-(provider, subject)
tuple via UNIQUE constraint, not global - identity is per-IdP by
design).
webauthn_credentials JSONB DEFAULT '[]' reserved for v3 (Decision
12); Bundle 2 always stores [].
Email index for the GUI's "find user by email" surface (not unique
because the same email can appear in multiple providers per the
per-IdP identity model).
ON DELETE RESTRICT on users.oidc_provider_id keeps Phase 3's "delete
provider only when no users authenticated via it" rule enforced
at the DB layer; the OIDCProviderRepository.Delete impl will
translate SQLSTATE 23503 into a 409 sentinel.
All three migrations:
Wrapped in BEGIN/COMMIT so partial-fail leaves no half-state.
IF NOT EXISTS / IF EXISTS / ON CONFLICT DO NOTHING for idempotency
(the certctl-server boot path applies every migration on every
start per CLAUDE.md "Idempotent migrations" architecture rule).
TIMESTAMPTZ for time columns (no TIMESTAMP WITHOUT TIME ZONE).
TEXT primary keys with prefixes per CLAUDE.md "Architecture
Decisions" (op- / grm- / sk- / ses- / u-).
Multi-tenant ready: tenant_id column with DEFAULT 't-default' on
every row, FK to tenants(id) ON DELETE CASCADE. Bundle 2 ships
single-tenant; managed-service activation adds tenants without a
schema migration.
Down migrations exist in lockstep, drop tables in FK-safe order
(group_role_mappings -> oidc_providers; sessions ->
session_signing_keys; users alone). Down-migrations are destructive;
docstrings call this out.
Verifications:
Migration count: ls migrations/*.up.sql | wc -l = 36 (33 from
Bundle 1 + 3 new).
BEGIN/COMMIT pair counts: each new migration is 1:1.
No Docker in this sandbox, so the migrations are not applied
end-to-end here; CI's testcontainers harness runs them via
postgres.RunMigrations on every push. Phase 2b's repository
integration tests will exercise the schema against Postgres 16
Alpine.
|
||
|
|
b0ac24fbf8 |
auth-bundle-2 Phase 1: OIDC + Session + User + Breakglass domain types
Phase 1 ships the persisted-shape types Bundle 2 needs end-to-end.
No DB migrations, no service layer, no HTTP handlers; Phase 2 ships
the SQL, Phase 3+ ship the consumers. Each type has a Validate()
method that enforces the on-disk invariants the schema will mirror,
and a focused _test.go that pins each invariant's failure mode.
Per-package summary:
internal/auth/oidc/domain/ (OIDCProvider + GroupRoleMapping):
* OIDCProvider carries the operator-configured IdP record. Fields
match the prompt's Phase 1 list plus IATWindowSeconds and
JWKSCacheTTLSeconds (Phase 3 references these by name; landing
them in Phase 1's domain type avoids the lying-field gap).
ClientSecretEncrypted is opaque from this layer; it is the v2 blob
produced by internal/crypto/encryption.go and is `json:"-"` so it
never wire-leaks.
* Validate() rejects: invalid id prefix, empty name, non-https
issuer_url (matches Phase 3's "JWKS endpoint MUST be HTTPS"),
empty client_id, empty client_secret_encrypted, non-https
redirect_uri, invalid groups_claim_format, scopes missing openid,
IAT window outside (0, 600], JWKS cache TTL below 60s. Defaults
applied in-place: GroupsClaimPath="groups", GroupsClaimFormat=
"string-array", Scopes=["openid","profile","email"],
IATWindowSeconds=300, JWKSCacheTTLSeconds=3600,
TenantID="t-default".
* GroupRoleMapping carries the operator-configured group-to-role
rule. Validate() pins prefix conventions ("grm-", "op-", "r-")
and non-empty group name.
* 18 tests across happy-path + every negative invariant.
internal/auth/session/domain/ (Session + SessionSigningKey):
* Session covers BOTH the post-login row (full 1h-idle/8h-absolute
cookie lifecycle) AND the Phase 5 pre-login row (10-minute TTL,
carries OIDC state+nonce+PKCE verifier across the IdP redirect).
IsPreLogin discriminates. CSRFTokenHash holds SHA-256 of the
CSRF token plaintext (the plaintext lives in a JS-readable
certctl_csrf cookie; storing only the hash on the row defends
against DB-read leaks per the Phase 4 CSRF contract).
* Validate() pins: id prefix "ses-", non-empty actor id/type,
signing key id prefix "sk-", AbsoluteExpiresAt strictly > Idle,
IdleExpiresAt strictly > CreatedAt, CSRFTokenHash exactly 64
lowercase hex chars when set.
* Cookie naming constants pinned by a separate test
(TestCookieNamingConstants) so a future rename can't silently
break the GUI's web/src/api/client.ts which reads these names by
string.
* SessionSigningKey stores the v2-encrypted HMAC key material; the
retired-before-created invariant catches malformed rows. 14
tests across both types.
internal/auth/user/domain/ (User):
* Federated-human identity for SSO logins. Distinct from Bundle 1's
free-form actor_id strings: actor_roles.actor_id = User.ID for
federated humans (per the prompt's note about how the two
identity systems intersect).
* WebAuthnCredentials JSONB column reserved for v3 (Decision 12);
defaults to "[]" on Validate() so Bundle 2 + v3 share the same
on-disk format from day one.
* Email validation is intentionally loose (basic shape: one @,
non-empty local + domain, no whitespace, dot in domain). RFC 5321
/ 5322 grammars are not enforced; the IdP issued the email and
we trust its shape, only rejecting gross corruption.
* 8 tests across happy-path + invalid-id + empty-email +
malformed-email + invalid-provider-id + tenant defaulting +
WebAuthn-credentials passthrough.
internal/auth/breakglass/domain/ (BreakglassCredential):
* Phase 7.5 type. Argon2id PHC-format password hash; Validate()
pins the Argon2id magic prefix so non-Argon2id formats (bcrypt,
pbkdf2, plaintext) are rejected at the persistence boundary.
* MinPasswordLengthBytes (12) + MaxPasswordLengthBytes (256)
constants pinned by a dedicated test so the operator-facing
password-strength contract can't drift silently.
* IsLocked(now) helper exposes the lockout state machine for the
Phase 7.5 service to consume; the lockout window default is
15min in the service layer.
* 9 tests across happy-path + per-invariant negative + lockout
state machine + tenant defaulting.
Cross-cutting:
* Every type has json:"-" on the encrypted-credential field
(ClientSecretEncrypted, KeyMaterialEncrypted, PasswordHash,
CSRFTokenHash) so even a misconfigured handler that marshals the
domain type directly into a response body cannot leak the
secret. Mirrors Bundle 1's pattern for issuer/target credentials.
* Every type carries TenantID with Validate() defaulting to
authdomain.DefaultTenantID. Forward-compat for the future
managed-service multi-tenant activation; Bundle 2 ships
single-tenant.
Verifications:
* gofmt -l clean across all 8 new files (one round-trip required to
satisfy Go 1.19+ doc-comment list-formatting rules in
session/domain/types.go).
* go vet clean on internal/auth/oidc/... + session/... + user/... +
breakglass/...
* go test -short -count=1 green on all four new domain packages
(49 test functions total).
* go test -short -count=1 still green on Bundle 1 packages
(internal/auth, internal/auth/bootstrap, internal/service/auth,
internal/config).
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.
Phase 1 exit criteria from cowork/auth-bundle-2-prompt.md:
* All types compile: yes.
* Validators have at least 5 test cases each: yes (smallest is
User with 8 tests; OIDCProvider has 13).
* make verify equivalent green: gofmt + vet + go test pass
(golangci-lint deferred to CI per the same operating-rule
pattern Phase 0 used).
|
||
|
|
2d9110b0c4 |
auth-bundle-2 Phase 0: dependency-add + oidc auth-type literal + runtime guard
Bundle 2 Phase 0 stages the dependencies + auth-type discriminator
literal that later phases consume. No handler chain wired yet; an
operator who sets CERTCTL_AUTH_TYPE=oidc on this commit gets a clear
refuse-to-start error rather than a silent fallback to api-key (the
G-1 failure mode that drove "jwt" out of the allowed set).
Deliverables:
* go.mod: github.com/coreos/go-oidc/v3 v3.18.0 added as a direct
require. Per the pre-bundle dependency audit (Apache-2.0, zero CVEs
ever per OSV.dev, 2,400+ stars, used by Hashicorp Vault + Dex +
Hydra + Authentik + every Kubernetes OIDC integration), this is the
ecosystem-standard Go OIDC client. Pinned to a specific minor
(v3.18.0) per the prompt's "no bare latest" rule.
* go.mod: golang.org/x/oauth2 promoted from // indirect to direct,
bumped from v0.34.0 to v0.36.0 by go mod tidy. Both versions are
OSV-clean. Maintained by the Go team.
* No JSON-path library added (forbidden by the dependency audit; the
group-claim resolver is hand-rolled in Phase 3).
* internal/config/config.go: AuthTypeOIDC constant added with a
load-bearing comment explaining (a) this is the AUTH-TYPE literal,
not a JWT alg literal, so the G-1 closure invariant is preserved
("jwt" stays out of ValidAuthTypes forever); (b) the runtime guard
in cmd/server/main.go intentionally refuses-to-start when oidc is
set pre-Phase-6 to avoid the silent-downgrade failure mode.
ValidAuthTypes() now returns {api-key, none, oidc}.
* internal/config/config_test.go: TestValidAuthTypesIsExactly_APIKey_None
renamed to TestValidAuthTypesIsExactly_APIKey_None_OIDC and now pins
the 3-entry set. TestValidAuthTypesDoesNotContainJWT (G-1 closure
test) still passes because "jwt" is never added back.
TestValidate_GenericInvalidAuthType's bad-types list updated:
"oidc" removed (now valid), "saml" added (correctly rejected per
Decision 5's SAML deferral).
* cmd/server/main.go: defense-in-depth runtime auth-type guard now
has an explicit AuthTypeOIDC case that exit(1)s with an actionable
message: "the OIDC auth chain is not yet wired in this build (Auth
Bundle 2 Phase 6 ships the session middleware that consumes this
auth-type literal)." This closes the lying-field gap the literal
would otherwise create. Phase 6 of Bundle 2 relaxes this case to
fall through alongside api-key + none.
* api/openapi.yaml: /v1/auth/info auth_type enum extended from
[api-key, none] to [api-key, none, oidc] with an in-line comment
explaining the Phase-0-vs-Phase-6 timing so an OpenAPI consumer
isn't surprised by "oidc" appearing here pre-Bundle-2-merge.
* deploy/helm/certctl/templates/_helpers.tpl::certctl.validateAuthType:
valid set extended to include "oidc". Chart-time validation now
passes for type=oidc; the binary's runtime guard takes over to
refuse the start. Once Bundle 2 ships, the runtime guard relaxes
and OIDC works end-to-end with no further chart edits.
* .env.example: CERTCTL_AUTH_TYPE comment block updated to document
the three valid values + the Phase-0-vs-Phase-6 timing.
* internal/auth/oidc/doc.go: new package directory with package doc
+ transitional blank imports for coreos/go-oidc/v3 + x/oauth2 so
go mod tidy keeps both deps as direct requires until Phase 3's
service.go replaces the blanks with real symbol use. Doc explains
the package layout (oidc/ + oidc/domain/ + oidc/groupclaim/ +
oidc/testfixtures/) so the post-Bundle-2 reader can navigate.
Verifications:
* gofmt clean on every changed file.
* go vet clean on internal/config + cmd/server + internal/auth/oidc.
* go test -short -count=1 green on internal/config (including the
G-1 closure + new validation tests), cmd/server, internal/auth (all
Bundle 1 packages), internal/service/auth.
* govulncheck ./... clean (M-024 hard CI gate).
* All 24 ci-guards pass locally.
Phase 0 exit criteria from cowork/auth-bundle-2-prompt.md:
* go.mod shows coreos/go-oidc/v3 as direct: yes.
* golang.org/x/oauth2 is direct (not indirect): yes.
* govulncheck ./... clean: yes.
* No JSON-path library in go.mod / go.sum deltas: confirmed (only
v3 of go-oidc + the x/oauth2 bump landed).
* make verify green: gofmt + vet + go test pass; full make verify
(which would invoke golangci-lint) deferred to CI since the
sandbox doesn't have golangci-lint installed; the operator runs
make verify locally before pushing per CLAUDE.md operating rule.
|
||
|
|
977cdbdf44 |
docs(README): surface Bundle 1 RBAC + signal Bundle 2 federation as roadmap
Pre-fix the README said nothing about role-based access control,
the auditor role, the day-0 bootstrap path, or the four-eyes
approval workflow — all shipped in Bundle 1 (commit
v2.0.72
|
||
|
|
5d79e53ad0 |
auth-bundle-1 follow-on: close coverage gaps to clear Phase 12 floors
CI run #486 (post-Bundle-1 merge + Go 1.25.10 bump) failed three coverage-threshold gates: internal/api/handler 74.7% < floor 75 (-0.3pp) internal/auth 66.3% < floor 85 (-18.7pp) internal/service/auth 51.1% < floor 85 (-33.9pp) The Phase 12 gate file's "85% with negative-test coverage" claim turned out to be aspirational — the read-side and Update-path methods on RoleService / PermissionService / ActorRoleService had zero unit-test coverage, and internal/auth's keystore + HasPermission helper had zero tests. This commit closes the gap without lowering the gate. Per-package CI-style averages after this commit (per scripts/check-coverage-thresholds.sh's per-function-mean): internal/api/handler 76.1% (+1.4pp, margin +1.1pp) internal/auth 90.5% (+24.2pp, margin +5.5pp) internal/service/auth 93.7% (+42.6pp, margin +8.7pp) Tests added: internal/service/auth/service_test.go (+18 tests, +518 LOC): PermissionService.List, PermissionService.GetByName, RoleService.Get (4 paths), RoleService.List (system caller), RoleService.Update (4 paths), RoleService.ListPermissions (3 paths), RoleService.AddPermission/RemovePermission round-trip + gate paths, RoleService.Delete (success + nil-caller + no-perm + audit), RoleService.Create (nil-caller), ActorRoleService.ListForActor (self-bypass + cross-actor + nil-caller + system + with-perm), ActorRoleService.Effective- Permissions (same shape), ActorRoleService.ListKeys (3 paths + system bypass), ActorRoleService.Revoke (4 paths), Authorizer edge cases (empty actorID short-circuit, empty tenantID default, scoped-grant-without-scope-id no-match invariant, repo-error wrap-and-return, HoldsAnyOf early-exit), recordAudit nil-arm short-circuits. internal/auth/keystore_test.go (NEW, +175 LOC): StaticKeyStore.Len, StaticKeyStore.LookupByHash hit + miss, MutableKeyStore seeded lookup + Len, Add registers new key, AddHashed registers from precomputed hash, AddHashed replaces on duplicate hash (idempotent boot-loader contract), HasPermission no-actor / default-actor-type / checker-error / scoped-check threading. internal/auth/bootstrap/service_test.go (+36 LOC): Service.Available nil-receiver/nil-strategy short-circuit, Service.Available delegates to Strategy when configured. internal/api/handler/auth_test.go (+208 LOC): GetRole returns role + permissions, GetRole 404 + 401, UpdateRole 200 + invalid-JSON-400 + 401, ListKeys returns actor list + 401, RemoveRolePermission 204 (global + scoped) + 401, rolePermToResponse scope encoding pin via GetRole. Verified: gofmt -l . clean (touched files only). go vet ./internal/auth/... ./internal/service/auth/... ./internal/api/handler/ rc=0. go test -count=1 -short on the four packages green. CI-style per-function averages computed via the live scripts/check-coverage-thresholds.sh arithmetic — all three gated packages clear their floors with margin. Per CLAUDE.md "complete path" + "do not lower the gate to make CI green": gate file unchanged. The 85/85/75 floors stand. |
||
|
|
3e91c7a1f0 |
chore(security): bump Go toolchain 1.25.9 -> 1.25.10 + golang.org/x/net 0.49 -> 0.53
CI run #484's Go Build & Test job failed govulncheck (M-024 hard gate). Six standard-library CVEs land in go1.25.9 + one golang.org/x/net CVE in v0.49.0; all are fixed in go1.25.10 + x/net v0.53.0 respectively. The advisories that fired were: GO-2026-4986 Quadratic string concat in net/mail.consumeComment — called via internal/api/handler/validation.go's ValidateCommonName -> mail.ParseAddress GO-2026-4977 Quadratic string concat in net/mail.consumePhrase — same call site GO-2026-4982 Bypass of meta-content URL escaping in html/template — called via internal/service/digest.go's RenderDigestHTML -> Template.Execute GO-2026-4980 Escaper bypass in html/template — same call site GO-2026-4971 Panic in net.Dial / LookupPort on Windows NUL bytes — many call sites (email notifier, SSH connector, ACME validators, validation.ValidateSafeURL, ...) GO-2026-4918 Infinite loop in net/http2 transport on bad SETTINGS_MAX_FRAME_SIZE — called via internal/connector/target/f5.go's F5Client.Authenticate -> http.Client.Do Bumps applied: * `go.mod`: `go 1.25.9` -> `go 1.25.10`; `golang.org/x/net v0.49.0` -> `v0.53.0` (kept indirect — the upgrade is force-pulled by the module-version directive; transitive deps will pick the higher). * `.github/workflows/{ci,codeql,release}.yml`: setup-go pin and the release.yml `GO_VERSION` env var bumped to 1.25.10. The security-deep-scan.yml workflow uses the major-minor `1.25` pin which auto-resolves to the latest 1.25.x and is unaffected. * `Dockerfile` + `Dockerfile.agent`: `golang:1.25-alpine@sha256:5caa...` re-pinned to `golang:1.25.10-alpine@sha256:8d22e29d960bc50cd0...` (digest looked up against `registry-1.docker.io/v2/library/golang/ manifests/1.25.10-alpine`; verified by the digest-validity ci-guard). The explicit `1.25.10-alpine` tag form replaces the moving `1.25-alpine` pin so the image-spec is reproducible end-to-end even without the digest reference. * `deploy/test/f5-mock-icontrol/Dockerfile`: `golang:1.25.9-bookworm @sha256:1a14...` re-pinned to `golang:1.25.10-bookworm@sha256: e3a54b77385b4f8a31c1...` (looked up the same way). * `deploy/test/f5-mock-icontrol/go.mod`: `go 1.25.9` -> `go 1.25.10`. * `internal/api/handler/version.go` + `api/openapi.yaml`: the `runtime.Version()`-shape comment + OpenAPI `example: go1.25.9` bumped to keep doc/example freshness. * `docs/contributor/ci-pipeline.md` + `docs/reference/connectors/ iis.md`: doc-only `Go 1.25.9` -> `Go 1.25.10` references. Verification done in-tree: * All `scripts/ci-guards/*.sh` pass locally including `digest-validity.sh` (the new digests resolve cleanly against Docker Hub). * `S-1-hardcoded-source-counts.sh` clean (the false-positive on "Bundle 1 migrations" was fixed in the prior commit). Operator step required post-push (sandbox has no Go toolchain): cd certctl && go mod tidy This regenerates go.sum's `golang.org/x/net v0.49.0` h1: lines into v0.53.0 ones. CI's `go mod tidy && git diff --exit-code go.mod go.sum` step will catch the drift if missed; in that case run the command, commit, and push the go.sum-only delta. |
||
|
|
51f55c5fc9 |
auth-bundle-1 fix: S-1 ci-guard false positive on "Bundle 1 migrations"
CI run #484 surfaced the regression in the Frontend Build job: ::error::S-1 regression: hardcoded source-count prose reappeared: docs/migration/api-keys-to-rbac.md:32:schema is already at the target version. The Bundle 1 migrations The S-1 guard's regex (scripts/ci-guards/S-1-hardcoded-source-counts.sh) catches `\b[0-9]+\s+migrations\b` to prevent stale "<N> migrations" prose in docs/. The Bundle 1 migration-guide phrasing "The Bundle 1 migrations" tripped on the digit-1 in "Bundle 1" sitting next to the word "migrations" — false positive, not a real source-count claim. Rephrase to "Migrations that ship in the Bundle 1 slice of v2.1.0:" which keeps the same operator meaning without the regex collision. The guard now passes; full ci-guards loop runs clean locally. Spotted via the operator's CI-failure paste post-Bundle-1 merge. |
||
|
|
22c4971012 |
Merge branch 'dev/auth-bundle-1' into master
Auth Bundle 1: RBAC primitive + day-0 bootstrap + auditor role +
API-key-to-role migration + approval-bypass closure.
17 commits across Phases 0-13 plus two follow-on bug fixes:
Phase 0: extract internal/auth/ package from middleware
Phase 1: RBAC schema + domain types + repository (000029_rbac)
Phase 2: RBAC service layer + Authorizer primitive
Phase 3: RequirePermission middleware + demo-mode synthetic actor
+ protocol-endpoint allowlist
Phase 3.5: handler IsAdmin -> router-wrapped RequirePermission
Phase 4-5: RBAC HTTP API + CLI surface (12 endpoints)
Phase 6: CERTCTL_BOOTSTRAP_TOKEN day-0 admin path (one-shot,
constant-time-compared, never logged)
Phase 7: certctl-cli auth keys scope-down (interactive / JSON /
--suggest with audit-event classifier)
Phase 8: audit_events.event_category column + auditor role split
(r-auditor holds only audit.read + audit.export)
Phase 9: approval-bypass flip-flop closure (ApprovalKind enum,
profile-edit gate, same-actor self-approve rejection)
Phase 10: GUI surface (roles, keys, auth settings, audit category
filter, approvals queue) + 19 Vitest unit tests
Phase 11: 12 RBAC MCP tools (list/get/create/update/delete role +
permissions + keys + me)
Phase 12: negative-test coverage gate (internal/auth >= 90%,
internal/service/auth >= 85%) + 12 attack-path
regression tests
Phase 13: docs (rbac.md + auth-threat-model.md +
api-keys-to-rbac.md + security.md update + README index)
Bug fixes shipped on the bundle branch:
|
||
|
|
efea4d0e03 |
auth-bundle-1 fix: bundled certctl-agent restart loop (latent since 2026-03-14)
The bundled `docker-compose.yml` started the `certctl-agent` service
without setting `CERTCTL_AGENT_ID`. `cmd/agent/main.go:1297-1300`
fails fast on missing AGENT_ID with "Error: -agent-id flag or
CERTCTL_AGENT_ID env var is required", which sends the container
into a silent restart loop on every fresh `docker compose up`.
Latent since commit
|
||
|
|
45122d7edb |
auth-bundle-1 fix: migration 000029 role_permissions NULL scope_id
Real bug an external tester (operator) hit on first docker compose up:
failed to execute migration 000029_rbac.up.sql: pq: null value in
column "scope_id" of relation "role_permissions" violates
not-null constraint
# Root cause
The role_permissions table declared scope_id TEXT (nullable) but
also declared
PRIMARY KEY (role_id, permission_id, scope_type, scope_id)
In Postgres, PRIMARY KEY columns are implicitly NOT NULL — the
PK constraint silently overrode the column-level nullability. So
every global-scope INSERT (which legitimately has scope_id=NULL
per the CHECK constraint that requires it) tripped the NOT NULL.
The schema was never reachable in the unit-test suite because
the in-memory fakes don't enforce Postgres semantics, and the
postgres integration tests skip on -short. First contact with a
real postgres:16-alpine boot caught it.
# Fix
Switch to a synthetic BIGSERIAL primary key + a UNIQUE NULLS NOT
DISTINCT constraint on the natural key
(role_id, permission_id, scope_type, scope_id):
- BIGSERIAL primary key satisfies Postgres's PK-implies-NOT-NULL.
- UNIQUE NULLS NOT DISTINCT (Postgres 15+; the project targets
postgres:16-alpine) treats two NULL scope_ids as colliding,
which is what the seed's ON CONFLICT (...) DO NOTHING relies
on to make re-running the migration idempotent.
- The CHECK (scope_type='global' AND scope_id IS NULL OR
scope_type IN ('profile','issuer') AND scope_id IS NOT NULL)
still enforces the per-row invariant.
The ON CONFLICT (col1, col2, ...) clauses in the seed and in
RoleRepository.AddPermission infer the unique index from the
column list and still resolve correctly against the renamed
constraint — no other changes needed.
# Verification
After this commit, docker compose up -d --build should boot
clean: postgres becomes healthy, certctl-tls-init exits 0,
certctl-server applies all 33 migrations including 000029,
backfills the 7 default roles + 33-permission catalogue + the
synthetic actor-demo-anon admin grant, and starts serving on
:8443.
docker compose -f deploy/docker-compose.yml \
-f deploy/docker-compose.demo.yml down -v
docker compose -f deploy/docker-compose.yml \
-f deploy/docker-compose.demo.yml up -d --build
sleep 15
curl -sk https://localhost:8443/api/v1/auth/me | jq
# Expect: actor_id=actor-demo-anon, admin=true, roles=[r-admin]
|
||
|
|
5313cd8492 |
auth-bundle-1 Phase 13 follow-up: em-dash sweep + broken-link fix
Self-audit on
|
||
|
|
e7a94b6080 |
auth-bundle-1 Phase 13: docs (rbac.md + threat model + migration guide + security.md update)
Closes the last Phase before the Bundle 1 Exit gate. Operators
now have authoritative reference + threat model + migration guide
covering every behavior change Bundles 0-12 introduced.
# New docs
* docs/operator/rbac.md (340 lines) — operator how-to:
- Mental model (actors / roles / permissions / scopes)
- 7 default roles seeded by migration 000029 + the 5
admin-only fine-grained perms seeded by 000030
- Permission catalogue table by namespace
- Scope semantics (global beats specific) + the Bundle-2
deferral on scope_id FK enforcement
- Granting / revoking access from GUI + CLI + HTTP API + MCP
- The auditor pattern (audit-only, no resource read)
- Day-0 bootstrap flow (CERTCTL_BOOTSTRAP_TOKEN → curl →
HTTP 410 thereafter)
- Demo-mode (CERTCTL_AUTH_TYPE=none) caveat for production
* docs/operator/auth-threat-model.md (180 lines) — what the
controls defend against:
- 5 threat actors (external, wrong-role, compromised key,
insider operator, compromised auditor)
- Per-defense walk-through (API-key auth, RBAC, bootstrap,
approval workflow + Phase 9 closure, audit trail,
protocol-endpoint allowlist)
- 9 explicit deferrals (OIDC, sessions, local accounts,
JIT elevation, MFA, etc.) — Bundle 2 / future scope
- Compliance mapping (SOC 2 CC6.1/CC6.3, HIPAA §164.312(b),
NIST SSDF PO.5.2, FedRAMP AU-9, PCI-DSS §10)
- 5 operator-runnable sanity checks (e.g.,
'SELECT FROM audit_events WHERE actor=system-bypass' MUST
return 0 in production)
* docs/migration/api-keys-to-rbac.md (200 lines) — v2.0.x →
v2.1.0 upgrade flow:
- The SECURITY: AUDIT YOUR API KEYS callout
- Migration list (000029-000033) + what each does
- 4-mode scope-down flow (interactive / non-interactive
JSON / --suggest / --suggest --apply)
- What changes for code that called auth.IsAdmin
- Helm-specific upgrade flow with example post-upgrade Job
- Docker Compose upgrade flow + the 5 examples folders
that ride demo mode unchanged
- Verification queries + rollback flow
# Updated docs
* docs/operator/security.md — Last-reviewed bumped to
2026-05-09; existing Authentication-surface section
extended to call out the Bundle 1 RBAC primitive,
day-0 bootstrap path, and approval-bypass closure with
cross-references to the new docs.
* docs/reference/profiles.md — Last-reviewed header
formatting fixed (added the > blockquote prefix used
consistently across the docs tree).
# docs/README.md navigation
* Operator section gains 2 new rows (RBAC + auth-threat-model)
and Approval-workflow row updated to mention Phase 9
closure.
* Reference section gains the Profiles row.
* Migration section gains the api-keys-to-rbac row with the
AUDIT YOUR API KEYS callout in the link description.
# CHANGELOG.md v2.1.0 section refreshed
The Phase 7 commit landed the SECURITY: AUDIT YOUR API KEYS
callout. This commit appends the missing Phase 9-12 highlights:
- Approval-bypass closure (profile-edit gate + flip-flop
loophole + ErrApproveBySameActor invariant)
- GUI: Roles / API Keys / Auth Settings / Approvals queue
- 12 new MCP RBAC tools
- Coverage gates on internal/auth + internal/service/auth
- Protocol-endpoint allowlist pinned at 3 layers
Trailing cross-reference block now points at all 4 new docs.
# Verifications
* Every internal link in the 4 new/modified docs validated by
shell sweep (find broken links → 0 hits).
* Every new doc carries 'Last reviewed: 2026-05-09' header
with the > blockquote prefix matching the docs-tree
convention.
* go vet ./... clean.
* staticcheck across every Bundle-1-touched Go package clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/auth (incl.
bootstrap), internal/api/handler, internal/api/router,
internal/cli, internal/service (incl. auth),
internal/domain/auth, internal/mcp, cmd/cli (cmd/server
has 1 environmental failure on the sandbox virtiofs-tmp:
TestPreflightSCEPRACertKey_KeyWorldReadable_Refuses depends
on tmpfs file-mode semantics that virtiofs propagates
differently — pre-existing, unrelated to Bundle 1).
* Frontend: 19 Vitest tests across src/pages/auth/ +
AuditPage all pass; tsc --noEmit clean.
|
||
|
|
06cea1ce0f |
auth-bundle-1 Phase 12 follow-up: in-tree TODO for path-12 deferral
Self-audit on
|
||
|
|
cbb47aaf5d |
auth-bundle-1 Phase 11 + 12: RBAC MCP tools + negative-test coverage gate
# Phase 11 — RBAC MCP tools
12 new tools in internal/mcp/tools_auth.go mirroring the Phase-4
+ Phase-7 HTTP surface so operators driving certctl from Claude
/ VS Code / any MCP client get the same management capability
the GUI + CLI already expose:
certctl_auth_me GET /v1/auth/me
certctl_auth_list_roles GET /v1/auth/roles
certctl_auth_get_role GET /v1/auth/roles/{id}
certctl_auth_create_role POST /v1/auth/roles
certctl_auth_update_role PUT /v1/auth/roles/{id}
certctl_auth_delete_role DELETE /v1/auth/roles/{id}
certctl_auth_list_permissions GET /v1/auth/permissions
certctl_auth_add_permission_to_role POST /v1/auth/roles/{id}/permissions
certctl_auth_remove_permission_from_role DELETE /v1/auth/roles/{id}/permissions/{perm}
certctl_auth_list_keys GET /v1/auth/keys
certctl_auth_assign_role_to_key POST /v1/auth/keys/{id}/roles
certctl_auth_revoke_role_from_key DELETE /v1/auth/keys/{id}/roles/{role_id}
Each tool routes through the existing HTTP client (no parallel
business logic), so permission gates fire server-side: a
non-admin caller's MCP tool invocation returns whatever 403 the
underlying HTTP handler emits, fenced via errorResult for LLM-
prompt-injection defense.
Input types in internal/mcp/types.go (AuthRoleIDInput,
AuthCreateRoleInput, AuthUpdateRoleInput,
AuthRolePermissionGrantInput, AuthRolePermissionRevokeInput,
AuthAssignKeyRoleInput, AuthRevokeKeyRoleInput) carry
jsonschema descriptions so the MCP consumer's tool catalogue
shows operator-friendly hints.
internal/mcp/tools_auth_test.go ships 14 tests:
- TestAuthMCP_AllToolsRegister (registration must not panic)
- TestAuthMCP_PathsAndMethods (table-driven, 12 rows pinning
each tool's HTTP method + URL)
- TestAuthMCP_ForbiddenSurfacesFencedError (12 tools × 403
mock → error surface)
internal/mcp/tools_per_tool_test.go's allHappyPathCases extended
with the 12 new rows so the in-memory dispatch coverage gate
(TestMCP_RegisterTools_DispatchableToolCount) stays green at the
new total of 139 registered tools.
Re-derived total via 'grep -cE "gomcp\.AddTool\(" internal/mcp/tools*.go':
133 (121 in tools.go + 12 in tools_auth.go).
# Phase 12 — negative-test coverage gate
Audit of the prompt's 12 negative-test paths against existing
coverage:
1. Missing actor → 401 ✓ TestRequirePermission_NoActorReturns401, TestRBACGate_NoActorReturns401
2. No roles → 403 ✓ TestRequirePermission_DeniedActorReturns403, TestRBACGate_AuditorRole_403sOnAdminRoutes
3. Role lacks specific perm → 403 ✓ same suite
4. Wrong scope → 403 ✓ TestAuthorizer_SpecificScopeMatchesExactID (wrongID arm)
5. Self-grant w/o auth.role.assign → 403 ✓ TestActorRoleService_GrantRequiresAuthRoleAssign
6. Bootstrap token wrong → 401 ✓ TestEnvTokenStrategy_WrongTokenReturnsInvalidToken, TestBootstrapHandler_Mint_WrongToken_401
7. Bootstrap used twice → 410 ✓ TestEnvTokenStrategy_OneShotConsumption, TestBootstrapHandler_Mint_TwiceReturns410
8. Bootstrap when admin exists → 410 ✓ TestEnvTokenStrategy_AdminExistsClosesPath, TestBootstrapHandler_Mint_AdminExists410
9. Role delete with assignees → 409 NEW: TestRoleService_DeleteWithActorsAssignedReturns409
10. Profile-edit loophole → gated ✓ TestProfileEdit_RequiresApprovalLoopholeClosed
11. Permission not in catalog → 400 ✓ TestRoleService_AddPermissionRejectsNonCanonical
12. Scope ID for nonexistent resource → 404 (validation deferred — no FK constraint between role_permissions.scope_id and the resource tables; documented for a future bundle)
Filled the gap at #9 with TestRoleService_DeleteWithActorsAssignedReturns409
which pins the repository sentinel pass-through (postgres FK
ON DELETE RESTRICT → repository.ErrAuthRoleInUse → service
returns the sentinel verbatim → handler maps to HTTP 409).
# Coverage gates
.github/coverage-thresholds.yml gains 2 entries:
- internal/auth: floor 85
- internal/service/auth: floor 85
.github/workflows/ci.yml's coverage test command extended with
./internal/auth/... and ./internal/api/router/... so the
threshold check has data to evaluate.
# Protocol-endpoint not-gated test (Category F)
internal/api/router/phase12_protocol_allowlist_test.go (new)
adds 3 router-level invariant tests:
- TestPhase12_ProtocolEndpointsNotGated: AST-walks router.go,
asserts no rbacGate(...) call references a path under any
protocol-endpoint prefix (/acme, /scep, /.well-known/est,
/.well-known/pki/ocsp, /.well-known/pki/crl).
- TestPhase12_IsProtocolEndpoint_CoversCanonicalPrefixes:
pins auth.IsProtocolEndpoint against the canonical prefix
set; if a future protocol lands without lockstep allowlist
update, this fails.
- TestPhase12_RBACGateRoutesAreUnderAPIv1: belt-and-braces —
every rbacGate-wrapped route MUST start with /api/v1/.
Catches accidental cross-prefix wraps.
Complements the existing TestRequirePermission_ProtocolEndpointBypassesGate
(middleware-level) + TestRouter_AuthExemptAllowlist_PinsActualRegistrations
(allowlist drift) so the Category F invariant is pinned at all
three layers (middleware + router + dispatch).
# Verifications
* gofmt clean repo-wide.
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
service + repository + cmd + domain + mcp: clean.
* go test -short -count=1 green across internal/auth (incl.
bootstrap), internal/api/handler, internal/api/router,
internal/cli, internal/service (incl. auth),
internal/domain/auth, internal/mcp, cmd/server, cmd/cli.
|
||
|
|
cfe76ad381 |
auth-bundle-1 Phase 10 follow-up: approvals queue GUI + transparent E2E deferral
Self-audit caught the missing GUI surface for Phase 9's flow #6 (profile edit gated → second admin approves → edit lands). The backend path is fully wired + tested in 69a508d; this commit adds the operator-facing UI so an approver can act without curl. # ApprovalsPage Lists every ApprovalRequest in the chosen state filter (default 'pending', toggleable to approved / rejected / expired). Renders both kinds: - cert_issuance — Rank-7 row with cert + job populated. - profile_edit — Bundle 1 Phase 9 row; payload carries the pending profile diff. Pill-rendered amber so an approver can distinguish at a glance. Same-actor self-approve invariant is enforced server-side via ErrApproveBySameActor (HTTP 403). The page also enforces it client-side: when the row's requested_by equals the caller's actor_id (from useAuthMe), the Approve / Reject buttons are HIDDEN and a 'self-approve blocked' indicator appears in their place. The operator literally cannot click the wrong button. Approve + Reject prompt for an optional note via window.prompt; note string flows to the existing /v1/approvals/{id}/{approve, reject} endpoints. Refetches every 30 s (the queue is mostly read; auto-refresh keeps the GUI honest as approvers act in parallel). # Wiring * /auth/approvals route in main.tsx. * Layout nav entry between API Keys and Auth Settings. * api/client.ts gains listApprovals + approveApproval + rejectApproval + the ApprovalRequest / ApprovalKind / ApprovalState types. # Tests ApprovalsPage.test.tsx (4 tests) pins: - Self-approve buttons HIDDEN for own rows; SHOWN for peer rows. - profile_edit kind renders with the amber pill. - Approve POSTs the right URL with the note. - Empty state. Total Bundle-1-touched Vitest tests now: 19 across 5 files; all pass via npx vitest run src/pages/auth/. # Transparent deferrals (called out for the record) The prompt's 9-flow Playwright E2E suite remains DEFERRED. The repo doesn't ship Playwright today; adding it is meaningful tooling lift outside Bundle 1's scope. Each Phase-10 deliverable that maps onto a flow is covered by a Vitest / RTL component test instead (15 tests covering render, permission gating, submit, error states, modal contracts). Full E2E coverage and the ≥75% src/pages/auth/ coverage metric are tracked as Phase 12 work; @vitest/coverage-v8 will land in the same commit that wires the coverage gate. # Verifications * npx tsc --noEmit clean. * npm run build green. * 19 Vitest tests pass. |
||
|
|
69a508dfcf |
auth-bundle-1 Phase 9 + 10: approval-bypass closure + RBAC GUI
# Phase 9 — approval-bypass closure (Decision 9, option a)
* Migration 000033_approval_kinds.up.sql: ALTER TABLE
issuance_approval_requests ADD COLUMN approval_kind +
payload JSONB; relax certificate_id + job_id to nullable;
CHECK (approval_kind IN ('cert_issuance','profile_edit'))
+ CHECK (per-kind nullability invariant) + index on
approval_kind. Idempotent throughout via DO blocks.
* domain.ApprovalKind enum (cert_issuance / profile_edit) +
IsValidApprovalKind. ApprovalRequest gains Kind +
Payload []byte for the pending profile diff.
* postgres.ApprovalRepository.Create + scanApprovalRow extended
to round-trip the new columns; certificate_id + job_id
switched to sql.NullString so profile_edit rows persist
cleanly. Default Kind=cert_issuance preserves back-compat
for every Phase-7-2026-05-03 caller.
* ApprovalService.RequestProfileEditApproval: new entry point
that creates a pending profile-edit row carrying the
serialized profile diff. Bypass mode (CERTCTL_APPROVAL_BYPASS)
short-circuits the same way it does for cert_issuance.
* ApprovalService.SetProfileEditApply hook: cmd/server/main.go
registers a closure that deserializes req.Payload + persists
via profileRepo.Update + emits a profile.edit_applied audit
row with category=auth. The hook avoids the Approval ↔
Profile import cycle.
* ProfileService.UpdateProfile: gates when (a) the live
profile carries RequiresApproval=true, OR (b) the proposed
edit would set it true. Returns ErrProfileEditPendingApproval
with the new approval ID; ProfileHandler maps to HTTP 202
Accepted + {pending_approval_id}. Both arms close the
flip-flop loophole because every transition through an
approval-tier profile fires the gate.
* TestProfileEdit_RequiresApprovalLoopholeClosed pins all 3
bypass attempts (flip-off / kept-on / flip-on) gated; nil-
approval-service preserves pre-Phase-9 direct-apply for
test fixtures.
* Approval service tests gain 4 profile_edit rows: pending row
shape; same-actor self-approve rejected with
ErrApproveBySameActor (load-bearing two-person integrity);
approve fails-closed when apply callback unwired;
apply callback invoked on approve.
* docs/reference/profiles.md (new) explains the gate +
edit response shape (202) + same-actor invariant + bypass
+ audit hooks.
# Phase 10 — RBAC management GUI
* useAuthMe hook (web/src/hooks/useAuthMe.ts): TanStack Query
fetches /api/v1/auth/me on app boot, caches for 60s, exposes
hasPerm(p) + hasAnyPerm + isAdmin predicates. Every Phase-10
page consumes this on mount + gates affordances against the
cached effective_permissions slice. Server-side enforcement
is the load-bearing gate; client-side hide/disable is UX.
* New routes:
- /auth/roles — list (auth.role.list); create-role modal
(auth.role.create) hidden when missing.
- /auth/roles/:id — detail + permissions; edit
(auth.role.edit), delete (auth.role.delete), add/remove
permission affordances each gated.
- /auth/keys — list of every actor with role grants; assign
+ revoke modals (auth.role.assign). actor-demo-anon
flagged system-managed; mutation buttons hidden for it.
- /auth/settings — stub showing /v1/auth/me identity +
bootstrap-endpoint availability via /v1/auth/bootstrap.
* AuditPage extended with category filter ('All categories'
+ the 3 enum values from migration 000032). Selection flows
to the API call params + the URL-driven query state.
* Layout: 3 new nav entries (Roles / API Keys / Auth Settings).
* api/client.ts: 12 new exported functions for the RBAC
surface (authMe, list/get/create/update/delete role,
list/add/remove role permissions, list keys, assign/revoke
key role, bootstrap-availability probe).
* data-testid attributes on every interactive element so a
future Playwright suite can assert behavior without brittle
CSS selectors.
* Empty state, error state, and unsaved-changes warnings on
every form per the prompt's implementation rules.
# Frontend tests
* RolesPage.test.tsx (6 tests): list render, empty state,
error state, hide-create-button-without-perm,
show-create-button-with-perm, submit-create-modal.
* KeysPage.test.tsx (3 tests): demo-anon flagged
system-managed (no buttons), permission-gated affordance
hide for auditor caller, assign-modal-POST contract.
* AuthSettingsPage.test.tsx (2 tests): identity surface,
bootstrap-OPEN-status surface.
* AuditPage.test.tsx (+1): category-filter select renders
with the 4 documented options.
15 frontend tests total in src/pages/auth/ + the audit
category-filter test; all pass via npx vitest run.
# Verifications
* go vet ./... clean.
* staticcheck across internal/auth + handler + router + cli +
service + repository + cmd + domain: clean.
* gofmt -l clean repo-wide.
* go test -short -count=1 green across internal/service,
internal/api/handler, internal/api/router, internal/auth,
internal/auth/bootstrap, internal/service/auth,
internal/domain/auth, cmd/server, cmd/cli, internal/cli.
* npx tsc --noEmit clean.
* npm run build green (vite build produces dist/index.html
+ 946KB JS bundle; chunk-size warning is pre-existing).
* npx vitest run src/pages/auth/ src/pages/AuditPage.test.tsx
green (15 tests, 4 files).
|
||
|
|
af4fa12724 |
auth-bundle-1 Phase 8 follow-up: classify issuer/target audit rows + auditor end-to-end tests + gofmt drift
Self-audit caught five real gaps in 3ef45e2; this commit closes them. # Phase 8 — issuer/target audit rows now classified as 'config' The Phase 8 prompt explicitly required existing config-mutation calls (issuer config, target config, etc.) to write event_category=config. The |
||
|
|
3ef45e2ad4 |
auth-bundle-1 Phase 6-7-8: bootstrap path + scope-down CLI + auditor-role split
# Phase 6 — day-0 admin bootstrap * internal/auth/bootstrap/ (new package): Strategy interface + EnvTokenStrategy with constant-time compare, one-shot consumption via sync.Mutex, optional admin-existence probe. Bundle 2's OIDC- first-admin will plug in alongside as an alternate Strategy. * BootstrapService.ValidateAndMint: validates the operator's CERTCTL_BOOTSTRAP_TOKEN, mints a 32-byte (64-hex-char) random API key value, persists the SHA-256 hash to api_keys, grants r-admin via actor_roles, AddHashed's the runtime keystore so the just- minted key authenticates the next request without restart, and records bootstrap.consume to the audit trail with category=auth. * internal/auth/keystore.go (new): KeyStore interface + StaticKeyStore (immutable env-var-only path) + MutableKeyStore (env-var keys + DB-loaded api_keys + runtime AddHashed). The auth middleware now consumes a KeyStore so the bootstrap path can extend the lookup table at runtime. * migrations/000031_api_keys.up/down.sql: api_keys table with (id, name UNIQUE, key_hash UNIQUE, tenant_id, admin, created_by, created_at, expires_at, last_used_at). Idempotent. * /v1/auth/bootstrap GET (probe) + POST (mint) — auth-exempt. Both routes documented in api/openapi.yaml + AuthExemptRouterRoutes allowlist updated. The token never leaves internal/auth/bootstrap; the minted plaintext key flows only into the HTTP response body. * Startup warning emitted when CERTCTL_BOOTSTRAP_TOKEN is set AND admin actors already exist (config drift signal). * Tests: 4 strategy invariants (empty token born disabled, wrong token=ErrInvalidToken without consumption, one-shot consumption, admin-exists closes path), 5 service tests (happy path + actor- name validation + propagation of strategy errors + nil-deps guard + 32-byte entropy budget), 8 HTTP-handler tests (status 201/410/401/400 mapping + token-leak hygiene scan of slog + audit details + Location header). Token-leak test redirects slog.Default to a buffer for the test scope. # Phase 7 — API-key migration + scope-down CLI * GET /v1/auth/keys handler + service method ListKeys backed by ActorRoleRepository.ListDistinctActors. Returns one row per (actor_id, actor_type) pair with the slice of role IDs they hold. Permission: auth.role.list. * internal/cli/auth_scope_down.go: AuthListKeys, AuthScopeDown (interactive), AuthScopeDownNonInteractive (JSON config), AuthScopeDownSuggest (--suggest with optional --apply). The synthetic actor-demo-anon is filtered out of every interactive / bulk path; non-interactive flow logs and skips it explicitly. * SuggestRoleFromAuditEvents (pure function): walks 30 days of audit events per actor and returns the narrowest matching role (admin / mcp / viewer / agent / operator) plus a one-line reason. Classification: any admin-shaped action wins; otherwise all-MCP → mcp; all-read-only → viewer; all-agent-shaped → agent; otherwise operator. Test table pins all six classifications. * CLI subcommand tree extended: 'auth keys list' + 'auth keys scope-down [--non-interactive <cfg>] [--suggest [--apply]]'. * CHANGELOG.md leads v2.1.0 with the SECURITY: AUDIT YOUR API KEYS call-out + four flow examples. # Phase 8 — auditor role + event_category column * migrations/000032_audit_category.up/down.sql: ALTER TABLE audit_events ADD COLUMN event_category TEXT NOT NULL DEFAULT 'cert_lifecycle' + CHECK constraint (cert_lifecycle/auth/config) + (event_category) and (event_category, timestamp DESC) indexes for the auditor-filter query path. WORM trigger from migration 000018 continues to enforce append-only at the DB layer (DDL is not blocked). * domain.AuditEvent gains EventCategory string (omitempty); domain.EventCategoryCertLifecycle / Auth / Config constants. * AuditService.RecordEventWithCategory sibling of RecordEvent; legacy callers stay on RecordEvent (defaults to cert_lifecycle). Auth callers (RoleService, ActorRoleService, BootstrapService) switched to RecordEventWithCategory(..., 'auth', ...). * GET /v1/audit?category=<cat>: handler accepts the optional query param, validates against the enum (400 on invalid value), dispatches through ListAuditEventsByCategory. OpenAPI updated with the new query param + AuditEvent.event_category schema. * Postgres AuditRepository.Create now writes event_category; AuditRepository.List filters on it; AuditFilter.EventCategory gates the WHERE clause. * Tests: 5 audit-category-filter HTTP tests (dispatch routing, back-compat fallback, 400 for invalid values, all 3 enum values accepted, page+category combine, JSON output surfaces the field). 3 auditor-role invariants (auditor holds exactly audit.read+audit.export, no mutating perms, disjoint from viewer except audit.read). # Cross-phase wiring * HandlerRegistry.Bootstrap field added; cmd/server/main.go wires the bootstrap service ahead of RegisterHandlers (extracted assembleNamedAPIKeys helper into auth_backfill.go, moved the keystore + bootstrap construction up alongside the auth repos). * AuthCheckResolver / AuthActorRoleService extended with ListKeys to satisfy the Phase 7 surface; existing fakes updated. * fakeAudit + mockAuditService stubs in tests gain RecordEventWithCategory + ListAuditEventsByCategory; existing tests untouched. # Verifications * gofmt -l: clean across every modified file. * go vet ./...: clean. * staticcheck across internal/auth + handler + router + cli + service + repository + cmd + domain: clean. * go test -short -count=1: green across every Bundle-1-touched package — internal/auth (incl. bootstrap), internal/api/handler, internal/api/router, internal/cli, internal/service/auth, internal/service, internal/domain/auth, internal/repository/postgres, cmd/server, cmd/cli, plus internal/scheduler, internal/api/middleware, cmd/agent, internal/mcp. |
||
|
|
60a589ab96 |
auth-bundle-1 Phase 0-5 closure: demo-mode wire, named-key backfill, AuthCheck enrichment, OpenAPI schema, intermediate-ca comment refresh
Closes the 5 gaps the post-Phase-5 audit flagged on dev/auth-bundle-1.
C1: cmd/server/main.go now selects auth.NewDemoModeAuth() when
CERTCTL_AUTH_TYPE=none and falls back to auth.NewAuthWithNamedKeys
otherwise. Pre-closure, the no-op pass-through that
NewAuthWithNamedKeys returns for empty keys would have left
ActorIDKey / ActorTypeKey / TenantIDKey unpopulated and 401'd
every Phase-3.5 rbacGate-wrapped admin route + every Phase-4
RBAC handler in demo deployments. NewDemoModeAuth injects the
synthetic 'actor-demo-anon' actor seeded by migration 000029,
which holds r-admin at global scope.
C2: backfillNamedKeyActorRoles startup hook (cmd/server/auth_backfill.go)
iterates CERTCTL_API_KEYS_NAMED entries (and legacy
CERTCTL_AUTH_SECRET synthesized fallbacks) and grants r-admin
or r-viewer to each via authActorRoleRepo.Grant before the
HTTP server starts accepting requests. Idempotent via
ON CONFLICT DO NOTHING in the repo. Failures log a warning but
are non-fatal — the server still starts and the operator can
fix grants via /v1/auth/keys. Helper extracted from main.go so
the role-mapping invariant is pinned by 4 focused unit tests
(admin->r-admin, non-admin->r-viewer, empty no-op,
grant-error non-fatal, nil-logger safe).
M1: HealthHandler.AuthCheck now returns actor_id, actor_type,
tenant_id, roles, effective_permissions, and admin_via_role
when the optional AuthCheckResolver is wired (production path:
authCheckResolverAdapter wraps the postgres ActorRoleRepository
in main.go). Nil resolver preserves the legacy {status, user,
admin} contract for back-compat with pre-Bundle-1 GUIs and
test fixtures. Adds 2 regression tests + 1 fake resolver shim.
M2: refreshes the stale 'Admin gate: every method calls
auth.IsAdmin first' comment on IntermediateCAHandler — the gate
moved to router.go::rbacGate via auth.RequirePermission
middleware in Phase 3.5; the new comment block points readers
there.
M4: 11 RBAC routes (auth/me, auth/permissions, 5 role lifecycle,
2 role-permission grant/revoke, 2 actor-role grant/revoke) added
to api/openapi.yaml under the [Auth] tag with operationIds and
shared AuthRole / AuthRolePermission schemas. AuthCheck path
extended with the Bundle-1 enrichment fields. The 11 entries
removed from openapi_parity_test.go::SpecParityExceptions.
Tests: go vet + staticcheck + go test -short -count=1 green
across cmd/server/, internal/auth/, internal/api/router/, and
internal/api/handler/. New tests: 4 backfill unit tests,
2 AuthCheck M1 enrichment tests, 1 demo-mode + rbacGate chain
integration test (TestRBACGate_DemoModeChainReachesHandler).
Branch SECURITY.md (cowork/auth-bundle-1-SECURITY.md, not part
of this commit) captures the full posture of dev/auth-bundle-1
as of this closure for the operator's pre-merge review.
|
||
|
|
7ff2e2de08 |
auth-bundle-1 Phase 3.5: handler IsAdmin -> router-wrapped RequirePermission
Phase 3.5 atomic conversion. The five legacy admin-gated handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) had their in-body auth.IsAdmin checks removed; the gate moved to router.go via auth.RequirePermission middleware wrapping each route. Non-admin operators with the right scoped permission can now reach these endpoints; legacy in-body admin checks no longer block them.
Migration 000030_rbac_admin_perms.up.sql ships five admin-only fine-grained permissions: cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage. All five are seeded into r-admin only; operator/viewer/agent/mcp/cli/auditor do not receive them by default. Operators can grant any of these to a custom role via the Phase 4 RBAC API. Idempotent + transaction-wrapped.
internal/domain/auth/validate.go::CanonicalPermissions extended with the five new entries so RoleService.AddPermission accepts them.
internal/api/router/router.go: HandlerRegistry gains a Checker field (auth.PermissionChecker). New rbacGate(checker, perm, handler) helper wraps a handler with auth.RequirePermission middleware; nil-checker fall-through preserves test/demo deployments without the RBAC stack. 12 admin routes wrapped: cert.bulk_revoke (POST /api/v1/certificates/bulk-revoke + POST /api/v1/est/certificates/bulk-revoke), crl.admin (GET /api/v1/admin/crl/cache), scep.admin (GET /api/v1/admin/scep/profiles + GET /api/v1/admin/scep/intune/stats + POST /api/v1/admin/scep/intune/reload-trust), est.admin (GET /api/v1/admin/est/profiles + POST /api/v1/admin/est/reload-trust), ca.hierarchy.manage (POST /api/v1/issuers/{id}/intermediates + GET /api/v1/issuers/{id}/intermediates + POST /api/v1/intermediates/{id}/retire + GET /api/v1/intermediates/{id}).
cmd/server/main.go: HandlerRegistry.Checker wired with the same authPermissionCheckerAdapter shim Phase 4 introduced for AuthHandler. Same adapter; one source of truth.
Handler bodies: removed eight in-body auth.IsAdmin checks across the 5 files. bulk_revocation.go's BulkRevoke + BulkRevokeEST, admin_crl_cache.go::ListCache, admin_scep_intune.go's three methods, admin_est.go's two methods, intermediate_ca.go's four methods. Replaced each with a comment naming the new gate location. Unused 'github.com/certctl-io/certctl/internal/auth' imports removed.
Test triplet rewrite: deleted obsolete _NonAdmin_Returns403 and _AdminExplicitFalse_Returns403 tests across 6 test files (5 handler tests + bulk_revocation_est_test.go) — they tested the now-removed in-body gate. _AdminPermitted_ForwardsActor tests stay intact: they pin the actor-passthrough invariant which is still relevant. Added internal/api/router/rbac_gate_integration_test.go with four router-level integration tests pinning the new gate: deny → 403 + handler not reached, permit → 200 + handler reached, nil-checker → fall-through, no-actor → 401.
M-008 admin-gate registry: AdminGatedHandlers map now empty (Phase 3.5 invariant: zero in-handler auth.IsAdmin call sites; only health.go's informational caller remains). m008_admin_gate_test.go retains the scan to enforce the invariant going forward; new admin-gated routes must wrap at router.go::rbacGate, not gate in-handler. Updated error message to direct future contributors to the new pattern.
Verifications: gofmt clean across all touched files; go vet ./... clean; go test -short across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, cmd/server all green.
Branch: dev/auth-bundle-1. Commit chain:
|
||
|
|
b169f258de |
auth-bundle-1 Phase 4 + 5: RBAC HTTP API + CLI surface
Phase 4 (HTTP API): * internal/api/handler/auth.go: AuthHandler with 12 endpoints under /api/v1/auth/* — ListRoles, GetRole, CreateRole, UpdateRole, DeleteRole, ListPermissions, AddRolePermission, RemoveRolePermission, AssignRoleToKey, RevokeRoleFromKey, Me. callerFromRequest builds an authsvc.Caller from the Phase 3 ActorIDKey/ActorTypeKey/TenantIDKey context values. writeAuthError translates service + repository sentinels into HTTP status codes (401/403/404/409/400/500). 14 handler tests with in-memory fakes pin the HTTP shape + error mapping. * internal/api/router/router.go: HandlerRegistry gains an Auth field; 11 new routes registered. openapi_parity_test SpecParityExceptions extended with the new auth routes (OpenAPI YAML schema land in a Phase 4 follow-up commit so the schema review is its own atomic change; the route shape is fully documented inline via the Go type definitions until then). * cmd/server/main.go: wires the postgres auth repos (RoleRepository, PermissionRepository, ActorRoleRepository) + the Authorizer + RoleService/PermissionService/ActorRoleService into the new AuthHandler. Adds authPermissionCheckerAdapter to bridge the typed-string Authorizer signature to the auth.PermissionChecker interface (avoids an internal/auth → internal/service/auth import cycle). Phase 5 (CLI): * cmd/cli/main.go: adds 'auth' command dispatch with subcommands roles/permissions/keys/me. * internal/cli/auth.go: AuthMe, AuthListRoles, AuthGetRole, AuthListPermissions, AuthAssignRoleToKey, AuthRevokeRoleFromKey methods on Client. Mirrors the Phase 4 HTTP surface. Phase 3.5 (handler IsAdmin → middleware-wrapped RequirePermission) DEFERRED. Honest reasoning: (1) The 5 admin handlers (bulk_revocation, admin_crl_cache, admin_scep_intune, admin_est, intermediate_ca) currently gate via auth.IsAdmin checks INSIDE the handler bodies. Converting cleanly requires moving the gate to the router (auth.RequirePermission middleware wrap) AND removing the in-handler check AND rewriting the existing 3-test triplets per handler (M-008 pinned: _NonAdmin_Returns403 / _AdminExplicitFalse_Returns403 / _AdminPermitted_ForwardsActor) because the existing tests call the handler function directly, bypassing middleware. After conversion, those tests would pass without 403'ing because the gate moved away — the test invariants need to flow through a router-level integration setup instead. (2) Picking the right permission per handler is a security-review-worthy decision. Using existing operator-class perms (cert.revoke, issuer.edit) widens access from admin-only to operator-class; adding new admin-only perms (cert.bulk_revoke, crl.admin, scep.admin, est.admin, ca.hierarchy.manage) requires a migration 000030 plus a coordinated catalogue update in internal/domain/auth/validate.go. Both options are defensible but warrant a focused commit, not a 5-handler sweep mixed in with the API + CLI work. (3) The conversion can be done now without functional regressions IF we leave the in-handler IsAdmin checks in place AND add middleware wraps as defense-in-depth — but that's the worst of both worlds (legacy gate still blocks non-admin operators, defeating the point of RBAC; new gate adds runtime cost with no semantic change). A clean conversion needs the in-handler check removed. Concrete plan for Phase 3.5 (separate commit, next session): (a) add new admin-only perms via migration 000030 OR document the widening to operator-class; (b) wrap each of the 5 admin routes with auth.RequirePermission(checker, perm, nil) in router.go; (c) remove auth.IsAdmin checks from the 5 handler bodies; (d) move the M-008 _NonAdmin/_AdminExplicitFalse tests to router-level integration tests, keep _AdminPermitted as a direct handler test for actor-passthrough; (e) update m008_admin_gate_test.go registry to track auth.RequirePermission middleware wraps in router.go instead of auth.IsAdmin call sites in handler files. Verifications: go vet ./... clean; gofmt clean across all touched files; go test -short -count=1 across internal/auth, internal/service/auth, internal/api/handler, internal/api/router, internal/cli, cmd/server, cmd/cli all green (one transient too-many-open-files retry on internal/cli + internal/api/router; second run clean). Branch: dev/auth-bundle-1. Commit chain: |
||
|
|
d473398aba |
auth-bundle-1 Phase 3 (primitive): RequirePermission middleware + demo-mode + protocol allowlist
Bundle 1 / Phase 3 (primitive ship): the load-bearing RBAC middleware factory plus its dependencies. Handler conversion sweep (5 admin files: bulk_revocation.go, admin_crl_cache.go, admin_scep_intune.go, admin_est.go, intermediate_ca.go) + m008_admin_gate_test.go registry update is Phase 3.5 follow-on; this commit ships the primitive so 3.5 is mechanical. New context keys (internal/auth/context.go): ActorIDKey, ActorTypeKey, TenantIDKey alongside the legacy UserKey + AdminKey. New helpers GetActorID / GetActorType / GetTenantID with safe fallbacks (UserKey for actor id, ActorTypeAPIKey for missing type, DefaultTenantID for missing tenant). Constants DemoAnonActorID + ActorTypeAPIKey + ActorTypeAnonymous mirror internal/domain/auth without an import cycle. RequirePermission factory (internal/auth/require_permission.go): wraps a handler and gates it behind a named permission. 401 when no actor, 403 when actor lacks permission, 500 on repository error. Skips the gate entirely for protocol endpoints (ACME / SCEP / EST / OCSP / CRL) per the audit's Category F do-not-gate allowlist. PermissionChecker is an interface so internal/auth doesn't depend on internal/service/auth (cmd/server wires the concrete Authorizer at startup). HasPermission is the imperative variant for handlers that branch behaviour rather than 403'ing. ScopeFunc closure extracts the scope type + id from the request for per-resource gating. Protocol-endpoint allowlist (internal/auth/protocol_endpoints.go): IsProtocolEndpoint matches /acme, /scep, /.well-known/est, /.well-known/pki/ocsp, /.well-known/pki/crl prefixes. Adding a new protocol endpoint MUST update this list and add a parallel test. Demo-mode synthetic admin (internal/auth/middleware.go::NewDemoModeAuth): when CERTCTL_AUTH_TYPE=none is configured, this middleware injects ActorID=actor-demo-anon, ActorType=Anonymous, TenantID=t-default, plus the legacy UserKey + AdminKey for back-compat with existing handlers. The synthetic actor's admin-role grant is seeded by migration 000029 so RequirePermission resolves through the JOIN like any other actor. cmd/server startup wires this middleware only when none-mode is configured. API-key middleware extension: NewAuthWithNamedKeys now populates the new keys (ActorIDKey, ActorTypeKey=APIKey, TenantIDKey=t-default) alongside UserKey + AdminKey on every successful Bearer match. Existing handlers continue to read UserKey / IsAdmin until the Phase 3.5 sweep converts them to RequirePermission. Test coverage: TestRequirePermission_NoActorReturns401, TestRequirePermission_GrantedActorReaches200, TestRequirePermission_DeniedActorReturns403, TestRequirePermission_CheckerErrorReturns500, TestRequirePermission_ProtocolEndpointBypassesGate (covers all 5 prefixes), TestRequirePermission_ScopeFnExtractsResourceID, TestIsProtocolEndpoint_PrefixesOnly, TestNewDemoModeAuth_InjectsSyntheticActor, TestNewAuthWithNamedKeys_PopulatesPhase3ContextKeys. fakeChecker pins the contract without a database. Phase 3.5 follow-on (NOT in this commit): convert each of the 5 admin handlers from auth.IsAdmin checks to auth.RequirePermission middleware in router.go; update internal/api/handler/m008_admin_gate_test.go to track auth.RequirePermission call sites instead of (or alongside) auth.IsAdmin; pick the right permission per handler (cert.revoke for bulk_revocation, etc.). Each handler conversion needs the 3-test triplet (_NonAdmin_Returns403 / _AdminExplicitFalse_Returns403 / _AdminPermitted_ForwardsActor) per M-008. Branch: dev/auth-bundle-1. Phase 2 was prior commit (service layer). Phase 3.5 (handler conversion) + Phase 4 (HTTP API) on the next session. |
||
|
|
bd54d5f7fa |
auth-bundle-1 Phase 2: RBAC service layer + Authorizer primitive
Bundle 1 / Phase 2: ships PermissionService, RoleService, ActorRoleService, and the Authorizer primitive that Phase 3 RequirePermission middleware calls on every gated request.
Authorizer.CheckPermission semantics: a grant matches when (a) the permission name equals the requested permission AND (b) the grant is global-scoped OR the grant scope_type+scope_id exactly match the request. Global beats specific; per-resource grants widen the effective set rather than shadowing global. Hot-path query is one ActorRoleRepository.EffectivePermissions JOIN call (already shipped in Phase 1) plus an in-memory walk; Phase 12 will add benchmarks + caching if the JOIN cost shows up at scale.
Privilege-escalation guard: ActorRoleService.Grant and Revoke require the caller to hold auth.role.assign globally. Without it, ErrSelfRoleAssignment. System callers (AsSystemCaller()) bypass the check; bootstrap, migrations, scheduler-initiated grants use this path. Reserved actor actor-demo-anon is rejected on Grant + Revoke so the demo path stays alive even after a misclick (ErrAuthReservedActor).
Caller abstraction: every service entry point takes *Caller (ActorID, ActorType, TenantID, IsSystem). CallerFromContext is a stub returning ErrUnauthenticated; Phase 3 wires the middleware-context bridge that fills the Caller from request context. The contract is pinned by TestCallerFromContext_Phase2ReturnsUnauthenticated so the Phase 3 upgrade is observable.
Audit recording: every mutating service operation calls AuditService.RecordEvent. Bundle 1 Phase 8 adds the event_category column + parameter and back-fills 'auth' for these calls; until then the rows go in with the default category.
Test coverage: in-memory fakeRoleRepo / fakePermissionRepo / fakeActorRoleRepo / fakeAudit pin the privilege-escalation invariants (ErrUnauthenticated for nil caller, ErrForbidden for missing perm, ErrInvalidPermission for non-canonical permission name, ErrSelfRoleAssignment for Grant without auth.role.assign, ErrAuthReservedActor for actor-demo-anon mutations, system-caller bypass) without requiring testcontainers. Phase 12 will add live-Postgres integration coverage.
Branch: dev/auth-bundle-1. Phase 1 was
|
||
|
|
19497eef87 |
auth-bundle-1 Phase 1: RBAC schema + domain types + repository layer
Bundle 1 / Phase 1: ships the RBAC primitive as schema + domain types + repo layer. Service-layer wiring lands in Phase 2; middleware integration in Phase 3.
Schema (migrations/000029_rbac.up.sql, 272 lines, idempotent, transaction-wrapped):
tenants, roles, permissions, role_permissions, actor_roles. TEXT primary keys with prefixes (t-, r-, p-, ar-) per CLAUDE.md Architecture Decisions. TIMESTAMPTZ time columns. FK cascade explicit (tenant CASCADE, role RESTRICT, actor CASCADE). Three-value scope_type CHECK ('global', 'profile', 'issuer') matched 1:1 with internal/domain/auth.ScopeType. UNIQUE(tenant_id, name) on roles, UNIQUE(name) on permissions, UNIQUE(actor_id, actor_type, role_id, tenant_id) on actor_roles.
Seeds: t-default tenant, 7 default roles (admin, operator, viewer, agent, mcp, cli, auditor), 33-permission canonical catalogue (cert.* / profile.* / issuer.* / target.* / agent.* / audit.* / auth.role.* / auth.key.* / auth.bootstrap.use), full role->permission grant matrix at global scope. Demo-mode preservation: actor-demo-anon seeded with admin role unconditionally; Phase 3 wires the auth middleware to inject this actor into the context when CERTCTL_AUTH_TYPE=none. Reserved system actor; Phase 4 API rejects mutations / deletions targeting it with 409 Conflict.
Domain types (internal/domain/auth/{types,validate,validate_test}.go):
Tenant, Role, Permission, RolePermission, ActorRole structs with JSON tags. ScopeType enum (global/profile/issuer). ActorTypeValue mirrors internal/domain.ActorType to avoid an import cycle. CanonicalPermissions slice + DefaultRoles map are the single source of truth referenced by the migration; validate_test.go pins (a) no duplicate permissions, (b) every default-role permission is canonical, (c) admin holds the full catalogue, (d) seeded IDs carry the prefix convention, (e) ScopeType enum has exactly 3 values matching the CHECK constraint.
Extended internal/domain/audit.go: added ActorTypeAPIKey + ActorTypeAnonymous to the existing User/System/Agent enum so the audit trail can distinguish API-key requests from federated humans (Bundle 2 OIDC) and demo-mode (CERTCTL_AUTH_TYPE=none). Existing code that records actor_type=User keeps working; new APIKey value used by Bundle 1 Phase 3 middleware.
Repository layer (internal/repository/auth.go + internal/repository/postgres/auth.go):
TenantRepository (Get, List, EnsureDefault). RoleRepository (Get, GetByName, List, Create, Update, Delete with ErrAuthRoleInUse on FK RESTRICT, ListPermissions, AddPermission idempotent, RemovePermission). PermissionRepository (List, GetByName, IsCanonical for fail-fast catalog check). ActorRoleRepository (ListByActor, ListByRole, Grant idempotent, Revoke, EffectivePermissions which is the JOIN that auth.RequirePermission will use in Phase 3 — returns deduplicated (permission, scope) triples honouring the not-yet-expired predicate so future time-bound grants work without code change). Sentinel errors ErrAuthNotFound, ErrAuthDuplicateName, ErrAuthRoleInUse, ErrAuthReservedActor, ErrAuthUnknownPermission for handler-layer 404/409/400 mapping.
Verification: gofmt clean, go vet ./... clean, go test -short ./internal/domain/auth ./internal/repository/postgres pass. Integration tests against a live Postgres are gated by testing.Short() per repo convention; Phase 12 wires the testcontainers harness for full e2e coverage.
Branch: dev/auth-bundle-1. Phase 0 was
|
||
|
|
99a012e3be |
auth-bundle-1 Phase 0: extract internal/auth/ from middleware package
Bundle 1 / Phase 0: pure refactor splitting auth surface out of internal/api/middleware so Bundle 2 (OIDC + sessions) and the broader RBAC primitive (roles, permissions, scoped grants) have a clean home. Moved to internal/auth/: NamedAPIKey, HashAPIKey, AuthConfig, NewAuthWithNamedKeys, NewAuth, UserKey, AdminKey, GetUser, IsAdmin. Added testfixtures.go (WithActor / WithAdmin / WithActorAdmin) so handler tests don't construct context manually. Stayed in internal/api/middleware/: RequestID, Logging, NewLogging, Recovery, RateLimitConfig, NewRateLimiter (now imports auth.GetUser for per-user keying per audit Category C), CORSConfig, NewCORS, ContentType, CORS, GetRequestID, responseWriter, Chain, audit middleware (now imports auth.GetUser). Updated 22 caller files across cmd/, internal/api/handler/, internal/api/middleware/, internal/mcp/. Existing m008_admin_gate_test.go now scans for auth.IsAdmin( substring; Phase 3 will further evolve to track auth.RequirePermission. Behavior unchanged: all handler / middleware / service / connector / cmd / mcp tests pass with no test-logic edits, only import-path renames. Phase 0 exit criteria: internal/auth/ exists with 6 files; middleware.go went 575 -> 422 lines (auth-related ~150 lines moved out); grep -rE 'middleware\.(GetUser|IsAdmin|UserKey|AdminKey|NamedAPIKey|HashAPIKey|NewAuth)' returns 0 hits; context.WithValue(.*middleware.UserKey/AdminKey) returns 0 hits; go vet ./... clean; go test -short ./... green across all packages tested. Branch: dev/auth-bundle-1. Per cowork/auth-bundle-1-prompt.md, do not merge to master without (1) make verify green, (2) >= 2 external testers confirm, (3) >= 90% coverage on internal/auth/ in .github/coverage-thresholds.yml. |
||
|
|
71ebccb8ba |
docs: fix broken ../examples/ links across docs/ (closes #11)
Reporter (thesudoer0003) flagged that the example links in
docs/getting-started/examples.md resolve to /docs/examples/ which
does not exist. Same bug pattern shows up in four other docs files.
The example READMEs live at examples/<name>/<name>.md at the repo
root, not under docs/. The references in the docs/ tree used
relative paths like `../examples/acme-nginx/acme-nginx.md` which
resolve to docs/examples/... — one level short of escaping out to
the repo root. Fix is one extra `../` so the path resolves to
examples/... at repo root, where the files actually live.
Files touched:
docs/getting-started/examples.md 5 links
docs/getting-started/why-certctl.md 1 link
docs/migration/cert-manager-coexistence.md 1 link
docs/migration/from-acmesh.md 1 link
docs/migration/from-certbot.md 1 link
Verified: every `../../examples/<name>/<name>.md` reference now
resolves to the on-disk file. Re-checked via:
for f in $(grep -rl 'examples/' docs/); do
for link in $(grep -oE '\.\./\.\./examples/[^)]*' "$f"); do
[ -e "$(dirname "$f")/$link" ] || echo "STILL BROKEN: $f -> $link"
done
done
zero "STILL BROKEN" output.
Closes #11
|
||
|
|
ff6bf8f203 |
docs(README): add Status: Early-access disclosure block
Reddit posts and operator-facing copy describe certctl as alpha for production, but the README's marketing-paragraph framing implied a more polished maturity. Dual-positioning erodes credibility because evaluators read both surfaces. Adds a dedicated "Status: Early-access" blockquote between the SC-081v3 paragraph and the existing "Actively maintained, shipping weekly" callout. Calls out the production-quality core (Local CA, ACME, agent deployment, CRUD, audit) versus the still-maturing broader surface (intermediate CA hierarchy, ACME/SCEP/EST servers, network appliances). Encourages lab/dev deployments and welcomes production deployments with the customer-scale caveat. The two consecutive blockquotes (Status + Actively maintained) read as paired signals: the project is early-access AND actively shipping, which is the honest joint position. |
||
|
|
7a9ae3157f |
fix(seed): repair deployment_targets FK violation crashing fresh demo boot
The Rank 5 cloud-target seed rows in `seed_demo.sql` referenced a non-existent `ag-server` agent_id. On every fresh-clone `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up` the server crash-looped at the demo-seed step: pq: insert or update on table "deployment_targets" violates foreign key constraint "deployment_targets_agent_id_fkey" Origin: commitv2.0.71 |
||
|
|
1720e11109 |
docs: fix broken single-file demo invocation in README + qa-prerequisites + ENVIRONMENTS
The README's Quick Start, the qa-prerequisites contributor doc, and the
landing page (separate repo, separate commit) all shipped a copy-paste
command that produces:
service "certctl-server" has neither an image nor a build context
specified: invalid compose project
The bug landed silently with commit
|
||
|
|
f40e975439 |
gui(certificates): surface profile contract in create-cert form (closes P3-3, P3-4, P3-5)
Closes findings P3-3, P3-4, P3-5 from the 2026-05-05 CLI/API/MCP↔GUI
parity audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). The
audit flagged three "hidden defaults" in the create-certificate form:
environment='production', shortLived=false, selectedEkus=['serverAuth'].
Re-grounding against the live source:
P3-3 was a false positive. The form already exposes an environment
selector with three options (Production / Staging / Development) and
defaults to Production. No change needed — covered by new test pin.
P3-4 + P3-5 misread the architecture. allow_short_lived and
allowed_ekus are NOT per-cert form-state fields; they are properties
of the CertificateProfile that the operator binds via the existing
Profile dropdown. Adding form-level toggles for them would contradict
the profile-as-primitive design (the profile carries the policy
contract — TTL, EKUs, key-algo allow-list, short-lived eligibility —
so the cert can inherit a coherent set rather than letting operators
hand-mix invalid combinations).
The genuine UX gap was opacity: operators picked a profile without
seeing what allow_short_lived / allowed_ekus the profile carried.
This commit closes the spirit of the finding by surfacing the selected
profile's load-bearing properties in a read-only "Profile contract"
panel that appears below the Profile dropdown once a profile is
selected. The panel shows:
- allowed_ekus list (so operators see whether a profile is
serverAuth, emailProtection, codeSigning, or a mix)
- allow_short_lived flag (highlighted when true so operators know
they're picking a profile that allows TTL < 1h CRL/OCSP-exempt
certs per the M15b regime)
- explanatory text that EKUs and short-lived eligibility are
profile-level (not per-cert), guiding operators to edit the
profile or pick a different one
Test pins (web/src/pages/CertificatesPage.test.tsx):
- environment selector renders with 3 options, defaults to production
- environment selector toggles to staging / development on change
- Profile contract panel is hidden until a profile is selected
- Profile contract panel surfaces allowed_ekus when a TLS-server
profile is picked
- Profile contract panel surfaces emailProtection EKU when an S/MIME
profile is picked (closes the "S/MIME flows can't be initiated
from the GUI" sub-finding — they can, by picking an emailProtection
profile)
- Profile contract panel flags allow_short_lived=true when an IoT
short-lived profile is picked (closes the "operators can't issue
short-lived certs through the GUI" sub-finding — they can, by
picking an allow_short_lived profile)
Implementation notes:
- data-testid='cert-form-environment' + 'cert-form-profile' +
'cert-form-profile-detail' added to make the test selectors stable
across DOM-restructuring refactors. No production behaviour change
from the test IDs.
- No new dependencies; no form-library introduction (per the prompt's
out-of-scope list); uses the existing bare React state pattern.
- No API changes — Certificate.allowed_ekus / allow_short_lived
already exist on the CertificateProfile type in web/src/api/types.ts.
Acceptance gate (verified):
- npm test on src/pages/CertificatesPage.test.tsx: 12/12 pass
(6 pre-existing T-1 tests + 6 new P3-3..P3-5 pins).
- All sibling page tests (AuditPage, TargetDetailPage, ShortLivedPage,
etc.) still pass.
|
||
|
|
0e06f6c4fc |
cli: promote --force on renew + require --reason on revoke (closes P3-1, P3-2)
Closes findings P3-1 and P3-2 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Both findings
flagged hidden defaults that the CLI was sending without exposing them
to operators: `force=false` baked into every renew payload, and a silent
fallback to `reason="unspecified"` whenever --reason was omitted.
P3-1 — promote --force on `certs renew` (full end-to-end plumbing)
The pre-2026-05-05 CLI sent `{"force": false}` in the renew body. The
API handler never decoded it — a textbook "lying field" per the
operator's CLAUDE.md "complete path, not the easy path" rule: the body
field stored a value, claimed to do something, and silently did nothing
because the wire never reached the consumer. Adding a --force flag that
also went unread would have created another lying field.
This commit takes the complete path:
service.CertificateService.TriggerRenewal grew a `force bool` parameter
(internal/service/certificate.go). When force=true, the
RenewalInProgress block is overridden so operators can recover stuck
in-flight renewals where a previous job hung without releasing the
status flag. Archived and Expired remain terminal blockers regardless
of force — those are semantic dead-ends that --force should not paper
over (archived = decommissioned, expired = issue a new cert instead of
renewing a dead one).
handler.CertificateHandler.TriggerRenewal parses force from
?force=true (or ?force=1) query param, OR {"force": true} JSON body,
whichever the client picks. Defaults to false. Passes through to the
service.
internal/cli/client.go::RenewCertificate(id, force bool) sends
?force=true on the URL when --force is set. The historical hardcoded
`{"force": false}` body is gone — no more lying field.
cmd/cli/main.go dispatches `certs renew <id> [--force]` (ID-first
flag-second convention matches the existing `agents retire <id>
[--force]`).
P3-2 — require --reason on `certs revoke` (Option A: strict refusal)
The pre-2026-05-05 CLI dropped to `--reason unspecified` whenever the
operator omitted the flag. Compliance reporting (RFC 5280 §5.3.1, PCI-
DSS §3.6, HIPAA §164.312) relies on the reason code being meaningful;
silent fallback defeats the audit trail because every revocation looks
identical.
cmd/cli/main.go dispatch refuses to send when --reason is empty,
prints the canonical RFC 5280 §5.3.1 reason-code menu, and exits
non-zero.
internal/cli/client.go exposes ValidRevokeReasons() returning the
canonical camelCase list (unspecified, keyCompromise, caCompromise,
affiliationChanged, superseded, cessationOfOperation, certificateHold,
removeFromCRL, privilegeWithdrawn, aaCompromise) and
NormalizeRevokeReason() that accepts both camelCase and snake_case
inputs and normalises to the canonical wire form. Off-list reasons
are rejected at dispatch with the menu re-printed.
Test pins:
internal/cli/client_test.go::TestClient_RenewCertificate_ForceFlag —
--force=true sends ?force=true with empty body; --force=false sends
no query and no body.
internal/cli/client_test.go::TestNormalizeRevokeReason +
TestValidRevokeReasons — canonical-camelCase + snake_case + reject-
off-enum behaviour.
cmd/cli/dispatch_test.go::TestHandleCerts_Revoke_RequiresReason +
TestHandleCerts_Revoke_RejectsUnknownReason +
TestHandleCerts_Renew_ForceFlag — dispatch-layer pins for the same
contracts.
internal/api/handler/certificate_handler_test.go::TestTriggerRenewal_
ForceQueryParam — query-param passthrough (no-flag, force=true,
force=1, force=false) flows through to the service-layer parameter.
internal/service/certificate_test.go::TestTriggerRenewal_
ForceOverridesInProgress — force=false preserves the
RenewalInProgress block; force=true clears it.
Existing TestTriggerRenewal_Archived extended to assert force=true
still blocks Archived (terminal-state guarantee).
Docs: docs/reference/cli.md updated with the --force example for renew
and the strict --reason semantics for revoke (including snake_case
input acceptance).
Acceptance gate (verified):
- go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/...
./cmd/mcp-server/... clean.
- go vet ./... clean.
- go test -short -count=1 ./... pass repo-wide.
- bash scripts/ci-guards/openapi-handler-parity.sh clean
(router 178, OpenAPI 144, exceptions 36 — unchanged; we add
parameter parsing, not routes).
- gofmt -l clean.
|
||
|
|
ff75361553 |
mcp(coverage): add 34 tools across 7 domains to close 2026-05-05 parity audit P1 findings
Closes findings P1-1..P1-35 from the 2026-05-05 CLI/API/MCP↔GUI parity
audit (cowork/cli-gui-parity-audit-2026-05-05/RESULTS.md). Before this
bundle, 35 operator-facing API endpoints had GUI surfaces but no MCP
counterpart — operators using AI assistants for cert lifecycle work in
regulated environments had to drop to curl for approve/reject, health-check
acknowledgement, renewal-policy CRUD, network-scan triggering, discovery
triage, intermediate-CA management, and job verification.
Tool count: 87→121 in tools.go (+34), 6 unchanged in tools_est.go.
Re-derive via grep -cE 'gomcp\\.AddTool\\(' internal/mcp/tools.go
internal/mcp/tools_est.go.
The 7 phases (matching the bundle prompt at
cowork/mcp-coverage-expansion-prompt.md):
Phase A — Approvals (P1-28..P1-31, 4 tools)
list_approvals, get_approval, approve_request, reject_request.
Two-person-integrity contract (ErrApproveBySameActor → HTTP 403)
is preserved automatically: the decided_by actor is derived
server-side from middleware.UserKey, NOT from request body, so
the MCP server's authenticated API-key identity becomes the
audit-trail actor. The MCP input schema deliberately omits any
actor_id field to prevent client-side spoofing.
Phase B — Health Checks (P1-20..P1-27, 8 tools)
list, summary, get, create, update, delete, history, acknowledge.
Mirrors the existing target-resource shape; acknowledge takes
optional 'actor' string captured in the audit row (handler defaults
to 'unknown' if absent).
Phase C — Renewal Policies (P1-1..P1-5, 5 tools)
Standard CRUD against /api/v1/renewal-policies. Distinct from the
legacy 'policy' tools that point at the same path — these expose
the renewal-policy domain explicitly with full alert_channels +
alert_severity_map field shape.
Phase D — Network Scan Targets (P1-14..P1-19, 6 tools)
CRUD + trigger_scan. trigger_network_scan returns the discovery-
scan body so the AI can chain into list_discovered_certificates
filtered by agent_id.
Phase E — Discovery read-side (P1-10..P1-13, 4 tools)
list_discovered_certificates, get_discovered_certificate,
list_discovery_scans, discovery_summary. Complements the
pre-existing claim/dismiss tools (registered alongside Health
historically per the I-2 closure).
Phase F — Intermediate CAs (P1-6..P1-9, 4 tools)
list, create (root + child via discriminator on body shape), get,
retire. The handler is admin-gated via middleware.IsAdmin; the
least-privilege boundary is enforced at the API layer (HTTP 403
for non-admin Bearer callers) — not by transport carve-out.
Phase G — Verification + deployments (P1-32, P1-34, P1-35, 3 tools)
list_certificate_deployments, verify_job, get_job_verification.
P1-33 (POST /api/v1/agents/{id}/discoveries) is intentionally
excluded — machine-to-machine push channel for agents reporting
filesystem-scan results, not an operator-driven flow. Documented
inline in the RegisterTools dispatch.
Implementation:
- 14 new input types in internal/mcp/types.go with jsonschema struct
tags driving LLM tool discovery.
- 7 register* functions in internal/mcp/tools.go each handling one
phase, wired into RegisterTools dispatch in declaration order.
- 34 new entries in tools_per_tool_test.go::allHappyPathCases —
the existing in-process MCP harness (TestMCP_AllTools_HappyPath +
TestMCP_AllTools_ErrorPath + TestMCP_RegisterTools_DispatchableToolCount)
auto-extends coverage to cover every new tool: happy-path round-
trip with fence-shape assertion, 5xx error-path with MCP_ERROR fence
propagation, and 'every registered tool is dispatchable' guard.
- docs/reference/mcp.md 'Available Tools' table expanded from 16 to
22 resource domains with current per-domain tool counts.
Acceptance gate (verified):
- go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... ./cmd/mcp-server/...
clean across all four production binaries.
- go vet ./... clean.
- go test -short -count=1 ./internal/mcp/... pass (TestMCP_AllTools_*
expanded to 127 tool round-trips).
- go test -short -count=1 ./... pass repo-wide.
- bash scripts/ci-guards/openapi-handler-parity.sh clean (router 178,
OpenAPI 144, exceptions 36 — unchanged; we add MCP wrappers, not
routes).
- gofmt -l clean across the four touched files.
|
||
|
|
e0aaa967c9 |
docs(README): add MCP server bullet to capabilities list
The README's 'What it does' section enumerated 11 capability bullets (issuers / targets / ACME server / SCEP server / EST server / hierarchy / approvals / discovery / revocation / alerts) but had zero mention of the MCP server. The 2026-05-05 CLI/API/MCP ↔ GUI parity audit confirmed 93 MCP tools shipped today (87 in internal/mcp/tools.go + 6 in internal/mcp/tools_est.go) covering the full API surface. That's a real differentiator hidden from anyone landing on the README. Adds a 12th bullet positioning the MCP server with concrete example queries operators can ask their AI client (expiring certs, revoke with key-compromise reason, agent offline check). Frames the architectural facts: separate binary at cmd/mcp-server/, stateless stdio transport, no extra auth surface beyond the existing API key, no extra attack surface. Links to docs/reference/mcp.md for setup details. |
||
|
|
17455d2ea2 |
deps(web): pin picomatch to >=4.0.4 via npm override; clears 4 dependabot alerts
Dependabot flagged four picomatch vulnerabilities in web/package-lock.json: #8 GHSA-?, ReDoS via extglob quantifiers #9 GHSA-?, ReDoS via extglob quantifiers (related to #8) #10 CVE-2026-33672 / GHSA-3v7f-55p6-f55p, method injection via POSIX character classes (related; affecting < 2.3.2) #11 CVE-2026-33672 / GHSA-3v7f-55p6-f55p, method injection via POSIX character classes — same advisory as #10, separate Dependabot row because it surfaces against a second copy of picomatch in the dep tree All four close on the same fix: every resolved picomatch instance must be >= 4.0.4 (or >= 3.0.2, or >= 2.3.2 — the patch shipped on all three release lines). Pre-fix the lockfile carried at least two vulnerable copies: node_modules/picomatch v2.3.1 (vuln) node_modules/vitest/node_modules/picomatch v4.0.3 (vuln for #11) node_modules/vite/node_modules/picomatch v4.0.4 (ok) node_modules/tinyglobby/node_modules/picomatch v4.0.4 (ok) Reachability check before fixing: - picomatch is a build-time glob-matching tool (used by tailwindcss → readdirp/anymatch/micromatch chain, plus by vite + vitest internals). - All instances in our tree are dev=true. None are bundled into the React production output (web/dist/assets/*.js) — that's just the React SPA, no node_modules at runtime. - The CVE only affects code that processes UNTRUSTED glob patterns. Our build pipeline only globs operator-controlled file patterns (TSX source files, Tailwind 'content' globs). Not network-reachable. So the CVE was not reachable from any shipped certctl artefact. Fix anyway because the alerts are noise. Fix mechanism: add an npm 'overrides' entry pinning picomatch to ^4.0.4 across all consumers. npm collapses every transitive picomatch resolution to the override, so the lockfile shrinks from 4 picomatch entries to 1, all on v4.0.4 (patched). Verification: npm install --package-lock-only → up to date, 0 vuln npm audit → found 0 vulnerabilities Diff: 2 files, 7 insertions / 43 deletions (net negative — the override de-duplicates the picomatch tree). Closes: GHSA-3v7f-55p6-f55p, CVE-2026-33672 (alerts #10, #11) + the two related ReDoS picomatch alerts (#8, #9) |
||
|
|
f2c77ba3fb |
deps: bump testcontainers-go v0.35.0 → v0.42.0; drops docker/docker dep entirely (clears CVE-2026-34040)
Dependabot flagged GHSA-x744-4wpc-v9h2 / CVE-2026-34040 (Moby AuthZ
plugin bypass on oversized request bodies, incomplete fix for
CVE-2024-41110) on the transitive github.com/docker/docker
v27.1.1+incompatible pulled in via testcontainers-go v0.35.0.
Reachability check before fixing:
- certctl does not run dockerd or configure AuthZ plugins.
- go list -deps ./cmd/{server,agent,cli,mcp-server}/... finds zero
docker/docker references in any production binary's transitive
set.
- testcontainers is consumed only by *_test.go files under
internal/repository/postgres/ + deploy/test/ for ephemeral
Postgres containers.
So the CVE was not reachable from any shipped certctl artefact.
Bump anyway because Dependabot noise is noise; the upgrade is
mechanical.
Bumping testcontainers-go v0.35.0 → v0.42.0 (latest, 2026-04-09)
removes the direct docker/docker dependency entirely — testcontainers
v0.42.0 reorganized away from the Moby SDK. After 'go mod tidy',
docker/docker is GONE from both go.mod and go.sum, not merely
bumped. The Dependabot alert closes automatically on push.
Co-bumped transitives (cascading from testcontainers' new dep tree):
go.opentelemetry.io/otel v1.24.0 → v1.41.0
go.opentelemetry.io/otel/{metric,trace} v1.24.0 → v1.41.0
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
v0.49.0 → v0.60.0
go.opentelemetry.io/auto/sdk added @ v1.2.1
golang.org/x/crypto v0.45.0 → v0.48.0
golang.org/x/net v0.47.0 → v0.49.0
golang.org/x/sync v0.18.0 → v0.19.0
golang.org/x/sys v0.40.0 → v0.42.0
golang.org/x/text v0.31.0 → v0.34.0
Verification (all green):
go build ./cmd/server/... ./cmd/agent/... ./cmd/cli/... \
./cmd/mcp-server/... → exit 0
go test -run=NONE -count=1 ./internal/repository/postgres/ → ok
go test -tags=integration -run=NONE -count=1 ./deploy/test/ → ok
go vet ./internal/repository/postgres/... → clean
go list -deps ./cmd/{server,agent,cli,mcp-server}/... |
grep docker → zero hits
Diff: 2 files (go.mod, go.sum), 129 insertions / 144 deletions.
Closes: GHSA-x744-4wpc-v9h2, CVE-2026-34040
|
||
|
|
d2b62880ce | |||
|
|
75097909e9 | |||
|
|
7c5cc57d75 | |||
|
|
9acf609ac9 |
docs: convert ASCII flow diagram to Mermaid in test-environment.md
Per operator audit: every diagram in docs/ should be Mermaid except in the repo-root README.md. The 'Key Generation Flow (Agent-Side)' section in docs/contributor/test-environment.md was rendered as a plain code fence with arrow-prose: Server creates job (AwaitingCSR) → Agent polls, sees job → Agent generates ECDSA P-256 key pair locally → ... That was the only non-Mermaid diagram-shaped block left in docs/. Converted to a Mermaid sequenceDiagram with 5 participants (certctl-server, issuer connector, certctl-agent, local agent FS, shared volume) covering the full AwaitingCSR → CSR-submit → Deployment-job → cert-write → Completed lifecycle. Audit + verification script: cowork/docs-audit-2026-05-05/mermaid-audit.md. Re-running the detection script post-fix returns zero non-Mermaid diagram-like blocks across all 76 docs/ markdown files. Total Mermaid coverage in docs/ now: 14 docs / 40 blocks. |
||
|
|
622cd29f20 |
docs: factuality sweep — fix 3 broken links + 12 count claims (audit findings 2026-05-05)
Per the cowork/docs-audit-2026-05-05/ end-to-end factuality audit (20 confirmed findings across 76 docs, 7 parallel subagents + audit-of-the-audit). Hot + Warm tier fixes ship here; STALE findings (qa-test-suite.md test-count snapshot) need 'make qa-stats' which is operator-side. BROKEN links repaired (3): - docs/reference/api.md L195: [Quick Start](quickstart.md) → ../getting-started/quickstart.md (404 pre-fix) - docs/reference/api.md L196: [Connector Guide](connectors.md) → connectors/index.md (Phase 4 rename, was 404 pre-fix) - docs/reference/protocols/scep-intune.md L377: [legacy-est-scep.md](legacy-est-scep.md) → scep-server.md (file was deleted in Phase 7 commit |
||
|
|
d809874fa1 |
docs: retire compliance subtree + sweep framework name-drops from prose
Per operator decision the framework-mapping docs are gone. They
were aspirational (no audit, no certification, no validated
mapping); keeping them around was misleading.
Files deleted (1,883 lines):
- docs/compliance/index.md
- docs/compliance/soc2.md
- docs/compliance/pci-dss.md
- docs/compliance/nist-sp-800-57.md
Hyperlinks removed:
- README.md: 'Auditor / compliance' row in the doc table; the
'(compliance mapping included)' parenthetical in the
positioning paragraph
- docs/README.md: the '## Compliance' section table; the
'Auditor / compliance team' reading-order-by-role row
Prose name-drops swept across 24 files:
- README.md: 'FedRAMP boundary CAs / financial-services policy
CAs' → '4-level boundary CAs / 3-level policy CAs';
'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High,
SOC 2 Type II, HIPAA' → cut entirely
- getting-started/{quickstart,concepts,examples,why-certctl,
advanced-demo}.md: 'compliance' → 'audit' / 'policy';
'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut;
''pci': 'true'' tag example → ''environment': 'production''
- migration/cert-manager-coexistence.md: 'compliance rules' →
'policy rules'
- operator/approval-workflow.md: 'Compliance customers (PCI-DSS
Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' →
'Operators'; entire 'Compliance control mapping' table
(PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1
/ HIPAA §164.308(a)(4)) deleted; 'compliance contract' →
'two-person-integrity contract'; 'compliance auditors' →
'reviewers'
- operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5'
audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5
attestation' section retitled to 'TLS posture summary' and
rewritten without framework framing; 'PCI-DSS, NIST, and
major browsers will eventually deprecate TLS 1.2' →
'Major browsers and OS vendors will eventually deprecate
TLS 1.2'
- operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref →
CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS
Req 4 v4.0 prose footing → cut
- operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI
procurement-team deliverable' → 'on-call deliverable';
'compliance auditors' → 'reviewers'
- reference/connectors/{acme,aws-acm,azure-kv,globalsign,
local-ca,openssl,ssh,index}.md: 'compliance reporting
(PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting';
'Compliance environments (PCI-DSS Level 1, FedRAMP High,
HIPAA)' → 'Regulated environments'; 'compliance audits' →
'audit'; 'FedRAMP boundary CA' pattern names →
'4-level boundary CA' (technically descriptive)
- reference/protocols/est.md: 'compliance-hook seam' →
'device-state hook seam'; 'compliance gating' → 'device-state
gating'; 'est_compliance_failed' → 'est_device_state_failed'
- reference/protocols/scep-intune.md: 'Optional compliance
check' → 'Optional device-state check'; failure-counter
'compliance_failed' → 'device_state_failed'; 'Conditional
Access compliance gating' → 'Conditional Access
device-state gating'
- reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA
deployments where the regulator requires...' →
'Boundary-CA deployments where you want separation of policy
and issuing authorities'; pattern A retitled '4-level FedRAMP
boundary CA' → '4-level boundary CA'
- reference/architecture.md: broken Related-docs link to
compliance.md removed; the rest of that block had stale
pre-Phase-2 paths (quickstart.md, demo-advanced.md,
connectors.md, openapi.md, testing-guide.md, test-env.md) —
retargeted to current locations
- reference/deployment-model.md: 'SOC 2 evidence-report
generator' → 'Audit-evidence report generator'
- reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this
into evidence packs' → 'reviewers paste this into
vendor-evaluation packs'
- contributor/qa-test-suite.md: 'compliance exist' coverage
description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)'
risk-class label → 'Audit-relevant'
What was kept:
- CWE references (legitimate technical pointers)
- Microsoft API/feature names that happen to use 'compliance'
literally ('Microsoft Graph compliance API',
'device-compliance validators' — these are MS product names,
not framework name-drops)
- 'NIST PQC' on the landing page (Post-Quantum Cryptography is
the actual NIST standard family, not a compliance framework)
Verified: zero hyperlinks into docs/compliance/ remain. All 24
ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean.
Net diff: 26 files / -1,883 deletions in compliance/ + -32 net
across the prose sweep.
Companion edits in cowork/ (CLAUDE.md doc-tree summary +
WORKSPACE-CHANGELOG.md retirement note) land separately.
|
||
|
|
5ea8fb48eb |
ci: restore +x bit on scripts/ci-guards/*.sh (sandbox stripped exec bit)
Pure mode-change commit. The previous
|
||
|
|
3275f9f1e0 |
ci: post-Phase-2-docs-overhaul cleanup of stale guards + missing config doc
CI run on the
|
||
|
|
ecb8896b1c |
docs: cleanup pre-existing broken links in connector pages
Phase 4 structural (commit
|
||
|
|
f179eab071 |
docs: expand docs/README.md connectors section to enumerate all 28 deep-dive pages
After the Phase 4 follow-on (commits |
||
|
|
969853ee53 |
docs: Phase 4 follow-on batch 4 — 5 final target per-pages
Extracts the remaining target connectors: - ssh.md (194 lines) — agentless SSH/SFTP deploy with full host-key-acceptance threat model (what's accepted, what's not, mitigations including known_hosts enforcement and SSH cert auth); V3-Pro forward path - wincertstore.md (118 lines) — non-IIS Windows services via local PowerShell or WinRM proxy mode; store selection (My / Root / WebHosting); private-key permissions guidance - jks.md (189 lines) — JKS / PKCS#12 via keytool with full atomic snapshot+rollback contract (Bundle 8 'snapshot → delete → import → reload'), keytool argv password exposure threat model + mitigations - aws-acm.md (208 lines) — ACM target with full IAM policy, IRSA / instance-profile / SSO auth recipes, atomic-rollback contract, ALB attachment Terraform recipe, procurement-checklist crib - azure-kv.md (195 lines) — Key Vault target with managed-identity / workload-identity / service-principal auth recipes, version- semantics rollback caveat (no in-place restore without soft-delete), App Gateway / Front Door attachment recipe Index forward-list expanded to enumerate all 15 target connectors (5 from Phase 4 structural + 5 from batch 3 + 5 from this batch) in alphabetical order. This is part 4 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 904 lines. No content removed from index.md. End-state of Phase 4 follow-on: - 13 issuer per-pages (5 batch 1 + 8 batch 2) - 15 target per-pages (5 Phase 4 structural + 5 batch 3 + 5 batch 4) - index.md keeps its inline reference content; per-pages add operator depth on top, matching the pattern set by apache/f5/iis/k8s/nginx in Phase 4 structural |
||
|
|
082b8cf660 |
docs: Phase 4 follow-on batch 3 — 5 file-based target per-pages
Extracts the file-based deploy target connectors: - haproxy.md (107 lines) — combined-PEM (cert+chain+key) deploy with haproxy -c validate; multi-frontend + crt-list directory guidance - traefik.md (105 lines) — file-provider zero-reload deploy; file watcher latency notes; mixing with built-in ACME guidance - caddy.md (100 lines) — admin API mode (recommended) vs file mode; admin-API exposure threat model - envoy.md (112 lines) — file SDS mode (recommended) vs static bootstrap; service-mesh interactions - postfix.md (175 lines) — dual-mode (Postfix MTA / Dovecot IMAPS) connector with daemon-specific quirks (STARTTLS chain expectations, no shared session cache); Bundle 11 test pins Index forward-list expanded to enumerate all 10 target connectors (5 from Phase 4 structural + 5 from this batch) in alphabetical order. This is part 3 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 599 lines. No content removed from index.md. |
||
|
|
de06141ce5 |
docs: Phase 4 follow-on batch 2 — 8 remaining issuer per-pages
Extracts the rest of the issuer per-connector deep-dive pages: - local-ca.md (170 lines) — Local CA self-signed / sub-CA / tree mode, CRL+OCSP endpoints, EKU support, MaxTTL enforcement, L-014 file-on- disk threat model carve-out - acme.md (235 lines) — RFC 8555 v2 client (HTTP-01 / DNS-01 / DNS-PERSIST-01), ARI per RFC 9773, EAB + ZeroSSL auto-EAB, Let's Encrypt profile selection, revoke-by-serial Top-10 fix #7 - step-ca.md (99 lines) — Smallstep JWK-provisioner synchronous issuance with MaxTTL enforcement - openssl.md (157 lines) — script-based shell-out with full threat model (what's accepted, what's not, mitigations, V3-Pro forward path) - sectigo.md (98 lines) — Sectigo SCM REST with bounded async polling - google-cas.md (89 lines) — GCP managed private CA with OAuth2 service-account auth + IAM-role guidance - entrust.md (96 lines) — Entrust CA Gateway mTLS-authenticated with approval-pending support and mTLS keypair caching - globalsign.md (122 lines) — Atlas HVCA dual auth (mTLS + API key/secret), region-aware base URLs, mTLS keypair caching Index forward-list expanded to enumerate all 13 issuer connectors (including the 5 pages from batch 1) in alphabetical order. This is part 2 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 8 files, 1,066 lines. No content removed from index.md. |
||
|
|
fd94205cfa |
docs: Phase 4 follow-on batch 1 — 5 issuer per-pages
Extract the first 5 issuer per-connector deep-dive pages: - vault.md (128 lines) — Vault PKI synchronous issuance, token TTL + auto-renewal loop, MaxTTL enforcement, rotation playbook - digicert.md (106 lines) — CertCentral DV/OV/EV with bounded async polling for vetting workflows - aws-acm-pca.md (165 lines) — managed private CA on AWS with full IAM policy, IRSA wiring, troubleshooting matrix - ejbca.md (116 lines) — open-source / Keyfactor EJBCA with mTLS or OAuth2 auth, mTLS keypair caching, approval-pending guidance - adcs.md (111 lines) — Active Directory Certificate Services as enterprise root via Local CA sub-CA mode, sub-CA rotation playbook Index updated with forward-list entries and the index-purpose blurb revised so the index now positions itself as 'navigate from here; deeper material lives in siblings' rather than 'docs to be extracted later'. Each per-page follows the WHAT/HOW/WHY pattern: what the connector is, how authentication and issuance work, and when to choose this vs an alternative. Cross-links to the connector index, async-ca-polling primitive, and adjacent operator runbooks. This is part 1 of 4 for the Phase 4 follow-on (per-connector page extraction) tracked in cowork/docs-overhaul-phase-2-restructure-2026-05-04/log.md. Net add: 5 files, 626 lines. No content removed from index.md (the index keeps its inline reference; per-pages add operator depth on top, matching the pattern set by apache/f5/iis/k8s/nginx in Phase 4 structural). |
||
|
|
b452013dd9 |
docs: Phase 5 — testing-guide.md prune (8268 → 0 lines, content dispersed)
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/
and the section-by-section plan in testing-guide-tumor.md.
testing-guide.md was 30% of all docs/ content (8268 lines) but was
integration test code written in markdown, not operator documentation.
The audit's tumor analysis disposed of every Part:
- ~65% DELETE (test cases that already exist in code)
- ~22% MOVE to inline test code
- ~8% KEEP-COMPRESSED into focused operator-runbook docs
- Title + contents + release sign-off ~5% KEEP
This commit ships the KEEP-COMPRESSED dispersal:
docs/contributor/qa-prerequisites.md (NEW, ~120 lines):
From testing-guide.md "Prerequisites" section. Stack boot procedure,
demo data baseline, reference IDs operators reuse across QA docs.
docs/contributor/gui-qa-checklist.md (NEW, ~105 lines):
From testing-guide.md "Part 35: GUI Testing". Manual GUI verification
pass for release sign-off. 25-row table covering every dashboard page.
docs/contributor/release-sign-off.md (NEW, ~130 lines):
From testing-guide.md "Release Sign-Off" section (originally 1009
lines of per-test detail tables). Compressed to a release-day
checklist organized by gate category: code state, automated gates,
manual QA passes, release artefact verification, branch protection,
post-release.
docs/operator/performance-baselines.md (NEW, ~100 lines):
From testing-guide.md "Part 39: Performance Spot Checks". Four
operator-runnable benchmarks (API request handling, inventory list
pagination, scheduler tick, bulk revoke) with baseline numbers and
when-to-re-baseline guidance.
docs/operator/helm-deployment.md (NEW, ~120 lines):
From testing-guide.md "Part 52: Helm Chart Deployment". Operator
runbook for the bundled deploy/helm/certctl/ chart: prereqs,
install, four cert-source patterns, verify, upgrade, troubleshooting.
docs/reference/cli.md (NEW, ~120 lines):
From testing-guide.md "Part 28: CLI Tool". certctl-cli command
reference with command-group breakdown, common workflows
(list/filter, renew, revoke, bulk import, EST enrollment, status),
output formats, CI/CD integration patterns.
docs/README.md navigation index updated to include the 6 new docs:
Reference section gains: cli.md, release-verification.md (was added
in Phase 13)
Operator section gains: helm-deployment.md, performance-baselines.md
Contributor section gains: qa-prerequisites.md, gui-qa-checklist.md,
release-sign-off.md
docs/testing-guide.md deleted. Git history preserves the 8268 lines —
if any specific test case is found missing from inline test code or
the destination docs during future work, lift from `git show
HEAD~1:docs/testing-guide.md`.
Net: docs/ total line count drops by ~7700 lines (28%), from 26,369
to 18,742. testing-guide.md was the single largest doc; pruning it is
the single biggest content-edit win of the entire restructure.
Phase 5 is the last major content phase. Remaining: Phase 4 follow-on
(per-connector page extractions from reference/connectors/index.md),
Phase 15 (WHAT/HOW/WHY remediation), Phase 16 (final acceptance gate).
|