Mirror of https://github.com/shankar0123/certctl.git (synced 2026-06-07 18:01:37 +00:00), commit 17b30c1f7f
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
- Service.Create(actorID, actorType, ip, ua) -> *CreateResult
Mints a session row; signs the cookie value with the active signing
key; returns the cookie payload AND the CSRF token plaintext for
the handler to set on the response.
- Service.Validate(ValidateInput) -> *Session
Parses the cookie, looks up the signing key (incl. retired-but-in-
retention), recomputes HMAC-SHA256, loads the session row, enforces
revocation + absolute + idle expiry + optional IP/UA bind. Maps to
one of 9 sentinel errors; the handler uniformly returns 401 to the
wire (specific reason in the audit row).
- Service.ValidateCSRF(headerValue, *Session) error
Constant-time compares SHA-256(header) against the stored hash on
the session row.
- Service.UpdateLastSeen / Revoke / RevokeAllForActor
- Service.RotateCSRFToken — mints fresh token, persists hash, returns
plaintext; called on login completion, logout, role-change against
actor, explicit operator rotate.
- Service.RotateSigningKey — mints new active key, retires previous;
retired keys stay valid for cfg.SigningKeyRetention so existing
cookies don't immediately fail.
- Service.EnsureInitialSigningKey — idempotent; mints first key on
fresh deploys; emits auth.session_signing_key_bootstrap audit row
with event_category=auth. Wired into cmd/server/main.go AFTER
migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
to boot rather than serve session-less.
- Service.GarbageCollect — sweeps expired post-login sessions +
pre-login rows >10min + retired-past-retention signing keys. Wired
into the new internal/scheduler/scheduler.go::sessionGCLoop on a
CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger swap one byte across the boundary — `<a, bc>` and
`<ab, c>` produce identical HMAC inputs. The length prefix moves the
boundary into the input itself so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable settings (env vars wired in internal/config/config.go; defaults shown):
CERTCTL_SESSION_IDLE_TIMEOUT            1h
CERTCTL_SESSION_ABSOLUTE_TIMEOUT        8h
CERTCTL_SESSION_SIGNING_KEY_RETENTION   24h
CERTCTL_SESSION_GC_INTERVAL             1h
CERTCTL_SESSION_SAMESITE                Lax
CERTCTL_SESSION_BIND_IP                 false
CERTCTL_SESSION_BIND_USER_AGENT         false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC byte flipped near the segment start, where every
character carries 6 real bits; the last base64url-no-pad character of a
32-byte MAC carries only 4 real bits plus 2 padding bits, so a tail flip
is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
internal/auth/session 94.5% (floor 90)
internal/auth/session/domain 96+% (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). Each entry's
`why:` paragraph explains why its fail-closed branches are load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
131 lines
5.9 KiB
Go
package repository

import (
	"context"
	"errors"

	sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
)

// Sentinel errors for the session repositories.
var (
	// ErrSessionNotFound: Get returned no row. Phase 4 maps to 401
	// (the cookie either expired or was forged with a known-good key
	// id but stale session id).
	ErrSessionNotFound = errors.New("session: not found")

	// ErrSessionRevoked: Get found a row but RevokedAt is set. Phase 4
	// maps to 401.
	ErrSessionRevoked = errors.New("session: revoked")

	// ErrSessionExpired: Get found a row but the absolute expiry has
	// passed (Phase 4 also enforces idle expiry but that's a service-
	// level check against last_seen_at, not a repository sentinel).
	ErrSessionExpired = errors.New("session: expired")

	// ErrSessionSigningKeyNotFound: GetActive returned no row. Phase 4
	// EnsureInitialSigningKey treats this as "boot-time provisioning
	// needed" and mints the first key.
	ErrSessionSigningKeyNotFound = errors.New("session: signing key not found")

	// ErrSessionSigningKeyInUse: Delete (full purge, not Retire) failed
	// because at least one sessions row still references the key. Phase
	// 4's GarbageCollect waits for sessions to expire before purging.
	ErrSessionSigningKeyInUse = errors.New("session: signing key still referenced by active sessions")
)

// SessionRepository wraps the sessions table. Two cookie shapes share
// the rows: post-login sessions (1h-idle/8h-absolute) and pre-login
// sessions (10-minute TTL, IsPreLogin=true; carry OIDC state + nonce
// + PKCE verifier across the IdP redirect).
type SessionRepository interface {
	// Create persists a session row. Caller MUST have called
	// s.Validate(). Returns ErrAuthDuplicateName-shape on the
	// extremely-unlikely id collision (the id is a 32-byte random;
	// callers SHOULD generate fresh ids on the second attempt).
	Create(ctx context.Context, s *sessiondomain.Session) error

	// Get returns a session by id. ErrSessionNotFound on miss.
	// Returns the row even if revoked / expired so the service layer
	// can produce the right 401 reason code (revoked vs expired vs
	// not-found are all 401 to the wire but distinguishable in audit).
	Get(ctx context.Context, id string) (*sessiondomain.Session, error)

	// ListByActor returns every active (non-revoked, non-expired,
	// non-pre-login) session for an actor. Used by the GUI's
	// /v1/auth/sessions surface so users can revoke their old laptops.
	ListByActor(ctx context.Context, actorID, actorType, tenantID string) ([]*sessiondomain.Session, error)

	// UpdateLastSeen sets last_seen_at = NOW() for the named session.
	// Phase 4's middleware calls this on every request to keep the
	// idle-expiry sliding window fresh.
	UpdateLastSeen(ctx context.Context, id string) error

	// UpdateCSRFTokenHash replaces the csrf_token_hash on the session
	// row. Phase 4's RotateCSRFToken consumes this on login completion,
	// logout, and any actor-role mutation against this actor. The hash
	// is the SHA-256 hex of the operator-facing CSRF token plaintext.
	UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error

	// Revoke sets revoked_at = NOW() for the named session. Subsequent
	// Get returns the row with RevokedAt set; Phase 4's Validate maps
	// to 401.
	Revoke(ctx context.Context, id string) error

	// RevokeAllForActor sets revoked_at = NOW() on every active session
	// for an actor. Used on role change, fired-employee scenarios, and
	// the back-channel logout endpoint (Phase 5).
	RevokeAllForActor(ctx context.Context, actorID, actorType, tenantID string) error

	// GarbageCollectExpired deletes sessions whose absolute expiry
	// has passed AND whose revoked_at is older than the configurable
	// retention window (default 24h). Pre-login rows older than the
	// 10-minute TTL are also deleted. Returns the number of rows
	// deleted.
	GarbageCollectExpired(ctx context.Context) (int, error)

	// Delete unconditionally removes a session row. Used for the
	// admin-only "purge a specific session" surface (rarely needed;
	// Revoke is the normal path).
	Delete(ctx context.Context, id string) error
}

// SessionSigningKeyRepository wraps the session_signing_keys table.
// Phase 4's Service.RotateSigningKey + EnsureInitialSigningKey + the
// scheduler-driven retention sweep consume this.
type SessionSigningKeyRepository interface {
	// List returns every signing key in the tenant (including
	// retired). Order: created_at DESC.
	List(ctx context.Context, tenantID string) ([]*sessiondomain.SessionSigningKey, error)

	// GetActive returns the most-recently-created non-retired key.
	// ErrSessionSigningKeyNotFound when no non-retired key exists
	// (Phase 4's EnsureInitialSigningKey treats this as "mint first
	// key").
	GetActive(ctx context.Context, tenantID string) (*sessiondomain.SessionSigningKey, error)

	// Get returns one key by id (including retired keys; Phase 4's
	// Validate consults this for cookies signed under retired-but-
	// in-retention keys).
	Get(ctx context.Context, id string) (*sessiondomain.SessionSigningKey, error)

	// Add persists a new signing key. Caller MUST have called
	// k.Validate() and encrypted the key_material via
	// internal/crypto/encryption.go. CreatedAt defaults to NOW() if
	// zero.
	Add(ctx context.Context, k *sessiondomain.SessionSigningKey) error

	// Retire marks an active key as retired (sets retired_at = NOW()).
	// The key stays in the table for verification of cookies signed
	// under it; the scheduler's retention sweep purges it after the
	// configurable retention window (default 24h beyond retired_at).
	Retire(ctx context.Context, id string) error

	// Delete unconditionally removes a signing key row. Returns
	// ErrSessionSigningKeyInUse if any sessions row still references
	// the key (FK ON DELETE RESTRICT). Phase 4's GarbageCollect calls
	// this only after RetentionWindow has passed AND no sessions
	// reference the key.
	Delete(ctx context.Context, id string) error
}