mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:21:31 +00:00
auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
- Service.Create(actorID, actorType, ip, ua) -> *CreateResult
Mints a session row; signs the cookie value with the active signing
key; returns the cookie payload AND the CSRF token plaintext for
the handler to set on the response.
- Service.Validate(ValidateInput) -> *Session
Parses the cookie, looks up the signing key (incl. retired-but-in-
retention), recomputes HMAC-SHA256, loads the session row, enforces
revocation + absolute + idle expiry + optional IP/UA bind. Maps to
one of 9 sentinel errors; the handler uniformly returns 401 to the
wire (specific reason in the audit row).
- Service.ValidateCSRF(headerValue, *Session) error
Constant-time compares SHA-256(header) against the stored hash on
the session row.
- Service.UpdateLastSeen / Revoke / RevokeAllForActor
- Service.RotateCSRFToken — mints fresh token, persists hash, returns
plaintext; called on login completion, logout, role-change against
actor, explicit operator rotate.
- Service.RotateSigningKey — mints new active key, retires previous;
retired keys stay valid for cfg.SigningKeyRetention so existing
cookies don't immediately fail.
- Service.EnsureInitialSigningKey — idempotent; mints first key on
fresh deploys; emits auth.session_signing_key_bootstrap audit row
with event_category=auth. Wired into cmd/server/main.go AFTER
migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
to boot rather than serve session-less.
- Service.GarbageCollect — sweeps expired post-login sessions +
pre-login rows >10min + retired-past-retention signing keys. Wired
into the new internal/scheduler/scheduler.go::sessionGCLoop on a
CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger swap one byte across the boundary — `<a, bc>` and
`<ab, c>` produce identical HMAC inputs. The length prefix moves the
boundary into the input itself so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable lifetimes (env vars wired in internal/config/config.go):
CERTCTL_SESSION_IDLE_TIMEOUT 1h
CERTCTL_SESSION_ABSOLUTE_TIMEOUT 8h
CERTCTL_SESSION_SIGNING_KEY_RETENTION 24h
CERTCTL_SESSION_GC_INTERVAL 1h
CERTCTL_SESSION_SAMESITE Lax
CERTCTL_SESSION_BIND_IP false
CERTCTL_SESSION_BIND_USER_AGENT false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC byte flipped near segment start where all
6 bits are real — base64url-no-pad's last char carries only 2
bits so a tail-flip is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
internal/auth/session 94.5% (floor 90)
internal/auth/session/domain 96+% (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
This commit is contained in:
@@ -24,6 +24,7 @@ import (
|
||||
"github.com/certctl-io/certctl/internal/api/router"
|
||||
"github.com/certctl-io/certctl/internal/auth"
|
||||
"github.com/certctl-io/certctl/internal/auth/bootstrap"
|
||||
"github.com/certctl-io/certctl/internal/auth/session"
|
||||
"github.com/certctl-io/certctl/internal/config"
|
||||
discoveryawssm "github.com/certctl-io/certctl/internal/connector/discovery/awssm"
|
||||
discoveryazurekv "github.com/certctl-io/certctl/internal/connector/discovery/azurekv"
|
||||
@@ -341,6 +342,47 @@ func main() {
|
||||
}
|
||||
}
|
||||
bootstrapHandler := handler.NewBootstrapHandler(bootstrapService)
|
||||
|
||||
// =========================================================================
|
||||
// Auth Bundle 2 Phase 4 — session service.
|
||||
//
|
||||
// Wired AFTER migrations + RBAC backfill, BEFORE the HTTP listener
|
||||
// binds (per the prompt's "fail-fatal on bootstrap key mint failure"
|
||||
// requirement). EnsureInitialSigningKey is idempotent: if a non-
|
||||
// retired signing key already exists for the tenant the call is a
|
||||
// no-op; otherwise it mints a fresh 32-byte HMAC key, persists it,
|
||||
// and emits an auth.session_signing_key_bootstrap audit row with
|
||||
// event_category=auth.
|
||||
//
|
||||
// Failure here is fatal — the server refuses to boot rather than
|
||||
// serve session-less.
|
||||
//
|
||||
// The session service is wired into the scheduler below (sessionGCLoop)
|
||||
// so the GC sweep runs every CERTCTL_SESSION_GC_INTERVAL tick. The
|
||||
// HTTP middleware that consumes ValidateInput / ValidateCSRF lands
|
||||
// in Phase 5; pre-Phase-5 deployments boot the service so the GC
|
||||
// sweep can keep the sessions + signing-keys tables tidy.
|
||||
sessionRepo := postgres.NewSessionRepository(db)
|
||||
sessionKeyRepo := postgres.NewSessionSigningKeyRepository(db)
|
||||
sessionService := session.NewService(
|
||||
sessionRepo,
|
||||
sessionKeyRepo,
|
||||
auditService,
|
||||
authdomainAlias.DefaultTenantID,
|
||||
session.Config{
|
||||
IdleTimeout: cfg.Auth.Session.IdleTimeout,
|
||||
AbsoluteTimeout: cfg.Auth.Session.AbsoluteTimeout,
|
||||
SigningKeyRetention: cfg.Auth.Session.SigningKeyRetention,
|
||||
BindIP: cfg.Auth.Session.BindIP,
|
||||
BindUserAgent: cfg.Auth.Session.BindUserAgent,
|
||||
},
|
||||
cfg.Encryption.ConfigEncryptionKey,
|
||||
)
|
||||
if err := sessionService.EnsureInitialSigningKey(bootCtx); err != nil {
|
||||
logger.Error("FATAL: session signing key bootstrap failed; refusing to boot", "err", err)
|
||||
os.Exit(1)
|
||||
}
|
||||
|
||||
policyService := service.NewPolicyService(policyRepo, auditService)
|
||||
policyService.SetCertRepo(certificateRepo) // D-008: CertificateLifetime arm needs CertificateVersion.NotBefore/NotAfter
|
||||
// G-1: RenewalPolicyService — distinct from PolicyService (compliance rules).
|
||||
@@ -937,6 +979,18 @@ func main() {
|
||||
sched.SetJobTimeoutInterval(cfg.Scheduler.JobTimeoutInterval)
|
||||
sched.SetAwaitingCSRTimeout(cfg.Scheduler.AwaitingCSRTimeout)
|
||||
sched.SetAwaitingApprovalTimeout(cfg.Scheduler.AwaitingApprovalTimeout)
|
||||
|
||||
// Auth Bundle 2 Phase 4 — wire the session-GC sweep. The service
|
||||
// itself was constructed (with the EnsureInitialSigningKey fail-
|
||||
// fatal call) above the policy/cert-service block; here we just
|
||||
// register it with the scheduler so the loop fires every
|
||||
// CERTCTL_SESSION_GC_INTERVAL.
|
||||
sched.SetSessionGarbageCollector(sessionService)
|
||||
sched.SetSessionGCInterval(cfg.Auth.Session.GCInterval)
|
||||
logger.Info("session GC sweep enabled",
|
||||
"interval", cfg.Auth.Session.GCInterval.String(),
|
||||
"absolute_timeout", cfg.Auth.Session.AbsoluteTimeout.String(),
|
||||
"signing_key_retention", cfg.Auth.Session.SigningKeyRetention.String())
|
||||
logger.Info("job timeout reaper enabled",
|
||||
"interval", cfg.Scheduler.JobTimeoutInterval.String(),
|
||||
"csr_timeout", cfg.Scheduler.AwaitingCSRTimeout.String(),
|
||||
|
||||
Reference in New Issue
Block a user