mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:41:41 +00:00
17b30c1f7f
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
  - Service.Create(actorID, actorType, ip, ua) -> *CreateResult
    Mints a session row; signs the cookie value with the active signing
    key; returns the cookie payload AND the CSRF token plaintext for
    the handler to set on the response.
  - Service.Validate(ValidateInput) -> *Session
    Parses the cookie, looks up the signing key (incl. retired-but-in-
    retention), recomputes HMAC-SHA256, loads the session row, enforces
    revocation + absolute + idle expiry + optional IP/UA bind. Maps to
    one of 9 sentinel errors; the handler uniformly returns 401 to the
    wire (specific reason in the audit row).
  - Service.ValidateCSRF(headerValue, *Session) error
    Constant-time compares SHA-256(header) against the stored hash on
    the session row.
  - Service.UpdateLastSeen / Revoke / RevokeAllForActor
  - Service.RotateCSRFToken: mints fresh token, persists hash, returns
    plaintext; called on login completion, logout, role-change against
    actor, explicit operator rotate.
  - Service.RotateSigningKey: mints new active key, retires previous;
    retired keys stay valid for cfg.SigningKeyRetention so existing
    cookies don't immediately fail.
  - Service.EnsureInitialSigningKey: idempotent; mints first key on
    fresh deploys; emits auth.session_signing_key_bootstrap audit row
    with event_category=auth. Wired into cmd/server/main.go AFTER
    migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
    is FATAL (logger.Error + os.Exit(1)) per the prompt: the server
    refuses to boot rather than serve session-less.
  - Service.GarbageCollect: sweeps expired post-login sessions +
    pre-login rows >10min + retired-past-retention signing keys. Wired
    into the new internal/scheduler/scheduler.go::sessionGCLoop on a
    CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
    v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
    len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger slide one byte across the boundary: `<a, bc>` and
`<ab, c>` both concatenate to `abc` and so produce identical HMAC
inputs. The length prefix encodes the boundary in the input itself,
so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
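A rough sketch of the hash-then-compare model above. The helper names and the 32-byte token size are assumptions made for illustration; only the SHA-256 hashing and the constant-time comparison come from the description.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the lowercase hex SHA-256 of the token plaintext.
func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// mintCSRFToken returns a random plaintext (for the JS-readable cookie)
// and its hash (for the session row). Token size is an assumption.
func mintCSRFToken() (plaintext, hashHex string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, hashToken(plaintext), nil
}

// validateCSRF hashes the X-CSRF-Token header value and compares it to
// the stored hash in constant time. An empty header never validates.
func validateCSRF(headerValue, storedHashHex string) bool {
	if headerValue == "" {
		return false
	}
	got := hashToken(headerValue)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHashHex)) == 1
}

func main() {
	plain, stored, err := mintCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("match:", validateCSRF(plain, stored))
	fmt.Println("mismatch:", validateCSRF(plain+"x", stored))
	fmt.Println("missing:", validateCSRF("", stored))
}
```

Storing only the hash means a read of the session table does not yield a usable token.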
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable lifetimes (env vars wired in internal/config/config.go):
  CERTCTL_SESSION_IDLE_TIMEOUT            1h
  CERTCTL_SESSION_ABSOLUTE_TIMEOUT        8h
  CERTCTL_SESSION_SIGNING_KEY_RETENTION   24h
  CERTCTL_SESSION_GC_INTERVAL             1h
  CERTCTL_SESSION_SAMESITE                Lax
  CERTCTL_SESSION_BIND_IP                 false
  CERTCTL_SESSION_BIND_USER_AGENT         false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC byte flipped near segment start where all
6 bits are real; for a 32-byte HMAC the last base64url-no-pad char
ends in 2 unused bits, so a tail-flip is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
  internal/auth/session          94.5% (floor 90)
  internal/auth/session/domain   96+%  (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
186 lines
7.7 KiB
YAML
# Coverage floors per gated package.
#
# Each entry: floor: <integer percentage>, why: <load-bearing context>.
# Adding a new gated package: one entry here; CI's `Check Coverage Thresholds`
# step auto-picks up. Lowering a floor REQUIRES corresponding code-side test
# work; never lower the gate to make CI green.
#
# Per ci-pipeline-cleanup bundle Phase 2 / frozen decision 0.3.

internal/service:
  floor: 70
  why: |
    Bundle R-CI-extended raise (post-Bundle-N.C-extended): service
    55 → 70. HEAD 73.4% (3pp margin). Prescribed Bundle R target
    was 80; held lower to avoid false-positives on single low-
    coverage files dragging the global per-file-average down.

internal/api/handler:
  floor: 75
  why: |
    Bundle R-CI-extended raise: handler 60 → 75. HEAD 79.8% (4pp
    margin). Prescribed Bundle R target was 80; held lower for
    same reason as service layer.

internal/domain:
  floor: 40
  why: |
    Domain layer is mostly type definitions + validators; 40% is
    the load-bearing-paths floor.

internal/api/middleware:
  floor: 30
  why: |
    Middleware coverage is per-handler-test-driven. 30% is the
    floor that catches the wired-up middleware paths; the
    unwired paths (alternative auth providers not currently
    enabled) sit below.

internal/crypto:
  floor: 88
  why: |
    Bundle R closure CI checkpoint #3: crypto floor lifted 85 → 88.
    Post-Bundle-Q package-scoped coverage at HEAD: 88.2%. The
    remaining ~12% gap is platform-failure branches (rand.Reader /
    aes.NewCipher) that require interface seams the production
    code doesn't use; closing them is tracked as R-CI-extended,
    not Bundle R scope.

internal/connector/issuer/local:
  floor: 86
  why: |
    Bundle R closure CI checkpoint #3: local-issuer floor lifted
    85 → 86. Post-Bundle-Q package-scoped coverage at HEAD: 86.7%.
    The prescribed Bundle R target was 92, but reaching it
    requires interface seams for crypto/x509 signing-error
    branches, tracked as R-CI-extended.

internal/connector/issuer/acme:
  floor: 80
  why: |
    Bundle R-CI-extended threshold raise (post-Bundle-J-extended):
    ACME 50 → 80. The Pebble-style mock + per-CA failure tests
    lift package-scoped ACME to 85.4%; gate at 80 with 5pp margin
    to absorb the global-run per-file-average dip.

internal/connector/issuer/stepca:
  floor: 80
  why: |
    Bundle L.B / Coverage-Audit C-005: StepCA failure-mode + JWE
    round-trip tests lift package from 52.1% to 90.4% (per-package
    run). Floor at 80 with margin.

internal/mcp:
  floor: 85
  why: |
    Bundle K / Coverage-Audit C-002: MCP per-tool dispatch via
    in-memory transport lifts package from 28.0% to 93.1% (per-
    package run). Floor at 85.

internal/auth:
  floor: 85
  why: |
    Bundle 1 Phase 12: RBAC primitive coverage gate.
    internal/auth ships keystore + middleware + RequirePermission +
    bootstrap + the Phase-3 context keys + the protocol-endpoint
    allowlist. Negative-test coverage (no actor → 401, no role →
    403, wrong scope → 403, bootstrap-token-wrong → 401, bootstrap-
    used-twice → 410, admin-already-exists → 410, zero-length token
    rejection) is now in place. Prescribed Bundle 1 target was 90;
    held at 85 to absorb the per-file-average dip from the
    middleware shim files (testfixtures.go) which CI runs but only
    test fixtures exercise. Sub-package internal/auth/bootstrap
    inherits this floor.

internal/service/auth:
  floor: 85
  why: |
    Bundle 1 Phase 12: RBAC service-layer coverage gate.
    PermissionService + RoleService + ActorRoleService + Authorizer
    each have positive + negative tests covering the
    privilege-escalation guard (auth.role.assign required for
    Grant/Revoke), the reserved-actor invariant (actor-demo-anon
    cannot be mutated), the canonical-permission validation, the
    role-in-use guard on Delete, and every sentinel-error path
    (ErrUnauthenticated / ErrForbidden / ErrSelfRoleAssignment /
    ErrAuthReservedActor / ErrAuthUnknownPermission /
    ErrAuthRoleInUse).

internal/auth/oidc:
  floor: 90
  why: |
    Bundle 2 Phase 3: OIDC service coverage gate. Phase 3 spec
    pins the floor at 90 explicitly because every fail-closed
    branch is load-bearing for the security posture: alg pinning
    (deny-list HS*/none + allow-list RS*/ES*/EdDSA), audience
    re-check, azp enforcement on multi-aud tokens, at_hash
    REQUIRED-when-access-token-present (Phase 3 lifts the OIDC
    core "MAY" to a service-level "MUST"), iat-window,
    nonce constant-time-compare, single-use state replay defense,
    PKCE-S256 mandatory, IdP downgrade-attack defense at
    provider-load + RefreshKeys time, JWKS-fail-closed semantics,
    group-claim resolution + userinfo-fallback fail-closed
    semantics, token-leak hygiene. A regression in any one of
    these branches is a security incident; the floor catches it
    before the commit lands. The mock-IdP fixture in
    service_test.go is the load-bearing harness.

internal/auth/oidc/groupclaim:
  floor: 95
  why: |
    Bundle 2 Phase 3: group-claim resolver. Hand-rolled (no
    JSON-path dep per Decision 10); ~150 LOC, every branch
    exercised by 19 unit tests covering the documented IdP shapes
    (Okta string array, Keycloak realm_access.roles, Auth0
    namespaced URL claim, single-string normalization,
    deeply-nested 3-segment walks) plus every fail-closed branch
    (empty path, missing key, missing nested key, non-object
    intermediate, bool/number/object/nil values, array with
    non-string element, URL-shape with dots-in-path treated as
    literal). Resolver should be at 100%; floor at 95 leaves a
    1-statement margin for future error-message refactors.

internal/auth/oidc/domain:
  floor: 90
  why: |
    Bundle 2 Phase 1: OIDCProvider + GroupRoleMapping domain.
    Validation-heavy package; constructors + Validate methods
    cover all canonical IdP shapes (Okta / Azure AD / Google
    Workspace / Keycloak / Authentik / Auth0). Floor at 90 to
    catch any future field that ships without a validator.

internal/auth/session:
  floor: 90
  why: |
    Bundle 2 Phase 4: session lifecycle service. Phase 4 spec
    pins the floor at 90 because every fail-closed branch carries
    a security invariant: HMAC-SHA256 cookie signing with a
    LENGTH-PREFIXED canonical input (defeats the
    `<a, bc>`-vs-`<ab, c>` concatenation collision attack on the
    bare-concat form), v1. version-prefix lock, idle expiry,
    absolute expiry, revocation, retired-but-in-retention key
    success path, retired-past-retention failure path, CSRF
    constant-time compare against the SHA-256-hashed copy on the
    session row, optional IP/UA-bind defense-in-depth gates,
    fail-fatal initial-key bootstrap. A regression in any one of
    these branches is a security incident; the floor catches it
    before the commit lands. The 15-case negative-test matrix in
    service_test.go is the load-bearing harness; the in-memory
    stubs of SessionRepo + SigningKeyRepo + AuditRecorder let the
    state machine be exercised without the postgres testcontainer
    overhead (which Phase 2's integration tests already cover).

internal/auth/session/domain:
  floor: 90
  why: |
    Bundle 2 Phase 1: Session + SessionSigningKey domain. Both
    types ship Validate() with full invariant coverage: ID prefix
    enforcement (ses-/sk-), expiry-order CHECK (absolute > idle >
    created), CSRFTokenHash format pin (64 lowercase hex chars),
    KeyMaterialEncrypted non-empty, retired-before-created
    rejection, TenantID defaulting. Cookie naming constants are
    pinned by TestCookieNamingConstants because the GUI's
    web/src/api/client.ts will read `certctl_csrf` by string.
    Floor at 90 to catch any future field that ships without a
    validator.