certctl/internal/repository/session.go
shankar0123 17b30c1f7f auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap

Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.

Service surface (internal/auth/session/service.go, ~880 LOC):

  - Service.Create(actorID, actorType, ip, ua) -> *CreateResult
    Mints a session row; signs the cookie value with the active signing
    key; returns the cookie payload AND the CSRF token plaintext for
    the handler to set on the response.
  - Service.Validate(ValidateInput) -> *Session
    Parses the cookie, looks up the signing key (incl. retired-but-in-
    retention), recomputes HMAC-SHA256, loads the session row, enforces
    revocation + absolute + idle expiry + optional IP/UA bind. Maps to
    one of 9 sentinel errors; the handler uniformly returns 401 to the
    wire (specific reason in the audit row).
  - Service.ValidateCSRF(headerValue, *Session) error
    Constant-time compares SHA-256(header) against the stored hash on
    the session row.
  - Service.UpdateLastSeen / Revoke / RevokeAllForActor
  - Service.RotateCSRFToken — mints fresh token, persists hash, returns
    plaintext; called on login completion, logout, role-change against
    actor, explicit operator rotate.
  - Service.RotateSigningKey — mints new active key, retires previous;
    retired keys stay valid for cfg.SigningKeyRetention so existing
    cookies don't immediately fail.
  - Service.EnsureInitialSigningKey — idempotent; mints first key on
    fresh deploys; emits auth.session_signing_key_bootstrap audit row
    with event_category=auth. Wired into cmd/server/main.go AFTER
    migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
    is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
    to boot rather than serve session-less.
  - Service.GarbageCollect — sweeps expired post-login sessions +
    pre-login rows >10min + retired-past-retention signing keys. Wired
    into the new internal/scheduler/scheduler.go::sessionGCLoop on a
    CERTCTL_SESSION_GC_INTERVAL tick.

Cookie wire format (load-bearing):

  v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>

The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:

  len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id

where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger slide the boundary between the two fields: `<a, bc>`
and `<ab, c>` both concatenate to `abc` and so produce identical HMAC
inputs. The length prefix encodes the boundary in the input itself, so
the two cases can never collide.
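
The length-prefixed construction can be sketched as follows. This is an illustration under assumptions, not the code in service.go: `hmacInput` and `signCookie` are hypothetical helper names, and the key is a demo value.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strconv"
)

// hmacInput builds the length-prefixed message described above:
// len(session_id) ":" session_id ":" len(signing_key_id) ":" signing_key_id
func hmacInput(sessionID, signingKeyID string) string {
	return strconv.Itoa(len(sessionID)) + ":" + sessionID + ":" +
		strconv.Itoa(len(signingKeyID)) + ":" + signingKeyID
}

// signCookie (hypothetical helper) assembles the v1 wire format with a
// base64url-no-pad HMAC-SHA256 over the length-prefixed input.
func signCookie(key []byte, sessionID, signingKeyID string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(hmacInput(sessionID, signingKeyID)))
	sig := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return "v1." + sessionID + "." + signingKeyID + "." + sig
}

func main() {
	// Bare concatenation: both pairs flatten to "abc".
	fmt.Println("a"+"bc" == "ab"+"c") // true
	// Length-prefixed: the boundary is part of the input.
	fmt.Println(hmacInput("a", "bc"))                         // 1:a:2:bc
	fmt.Println(hmacInput("ab", "c"))                         // 2:ab:1:c
	fmt.Println(hmacInput("a", "bc") == hmacInput("ab", "c")) // false
	fmt.Println(signCookie([]byte("demo-key"), "ab", "c"))
}
```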

The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).

CSRF token model:

  - Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
    intentional; the GUI must read it to echo into X-CSRF-Token header).
  - SHA-256 hash of the plaintext lives on the session row.
  - Validation: SHA-256(X-CSRF-Token) constant-time-compared.
  - Rotated by Service.RotateCSRFToken on login / logout / role-change /
    explicit admin-trigger.
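
The hash-and-compare model above can be sketched like this. The helper names and the 32-byte token size are assumptions for illustration; only the SHA-256 hashing, the hex storage form, and the constant-time comparison come from the description.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the SHA-256 hex digest that would be stored on the
// session row in place of the plaintext token.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintCSRFToken (hypothetical) returns a fresh plaintext token for the
// cookie and the hash to persist. The 32-byte size is an assumption.
func mintCSRFToken() (string, string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext := base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, hashToken(plaintext), nil
}

// csrfMatches hashes the X-CSRF-Token header value and compares it to the
// stored hash in constant time. An empty header never matches.
func csrfMatches(headerValue, storedHashHex string) bool {
	if headerValue == "" {
		return false
	}
	got := hashToken(headerValue)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHashHex)) == 1
}

func main() {
	plaintext, stored, err := mintCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(csrfMatches(plaintext, stored))     // true
	fmt.Println(csrfMatches(plaintext+"x", stored)) // false
	fmt.Println(csrfMatches("", stored))            // false
}
```

Storing only the hash means a read of the sessions table does not by itself yield a usable CSRF token.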

Optional defense-in-depth (default OFF):

  - CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
    recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
    (user may have legitimate IP change). Mobile + corporate-NAT
    environments leave this off.
  - CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.

Configurable lifetimes (env vars wired in internal/config/config.go):

  CERTCTL_SESSION_IDLE_TIMEOUT             1h
  CERTCTL_SESSION_ABSOLUTE_TIMEOUT         8h
  CERTCTL_SESSION_SIGNING_KEY_RETENTION    24h
  CERTCTL_SESSION_GC_INTERVAL              1h
  CERTCTL_SESSION_SAMESITE                 Lax
  CERTCTL_SESSION_BIND_IP                  false
  CERTCTL_SESSION_BIND_USER_AGENT          false

Test surface (internal/auth/session/service_test.go, ~860 LOC):

  All 15 prompt-mandated negative cases:

    1.  Tampered cookie (HMAC byte flipped near segment start, where all
        6 bits of each character are real; the last character of the
        43-character base64url-no-pad HMAC carries only 4 data bits plus
        2 unused bits, so a tail-flip is unreliable).
    1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
    2.  Cookie missing v1. prefix.
    3.  Cookie with unknown version prefix (v99).
    4.  Idle expiry — back-dated last_seen_at + idle_expires_at.
    5.  Absolute expiry — back-dated absolute_expires_at.
    6.  Revoked session.
    7.  Wrong signing key id (no row matches).
    8.  Cookie signed under retired-but-in-retention key SUCCEEDS.
    9.  Cookie signed under retired-past-retention key FAILS.
    10. Concatenation collision — direct evidence that
        computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
        a forged-boundary-slide cookie is rejected.
    11. CSRF token missing.
    12. CSRF token mismatch (constant-time compare).
    13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
    14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
    15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
        wrap (cmd/server/main.go treats as fatal).

  Plus coverage-lift batch covering: every error wrap on every repo
  collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
  Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
  RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
  the cookie parser's full negative matrix (empty, wrong segment count,
  missing prefixes, bad base64, wrong HMAC length), and a real-encryption
  round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
  the v3-blob path is exercised end-to-end at the session-cookie level.

Coverage:

  internal/auth/session              94.5%  (floor 90)
  internal/auth/session/domain       96+%   (floor 90, Phase 1)

.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.

Repository extensions:

  internal/repository/session.go gains UpdateCSRFTokenHash on the
  SessionRepository interface; internal/repository/postgres/session.go
  ships the implementation. RotateCSRFToken consumes it.

Scheduler extensions:

  internal/scheduler/scheduler.go gains SessionGarbageCollector
  interface + sessionGC field + sessionGCInterval +
  SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
  Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
  concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
  per-tick context.WithTimeout(1m) bounds a stuck Postgres.

Server wiring:

  cmd/server/main.go constructs sessionService AFTER the bootstrap
  block (post-RBAC backfill) and BEFORE the policy-service block.
  EnsureInitialSigningKey runs immediately; failure is fatal via
  os.Exit(1). The scheduler section wires SetSessionGarbageCollector
  + SetSessionGCInterval alongside the other interval setters and
  emits an Info log so operators can confirm the loop is enabled.

Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.

Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
2026-05-10 05:31:24 +00:00

package repository

import (
	"context"
	"errors"

	sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
)

// Sentinel errors for the session repositories.
var (
	// ErrSessionNotFound: Get returned no row. Phase 4 maps to 401
	// (the cookie either expired or was forged with a known-good key
	// id but stale session id).
	ErrSessionNotFound = errors.New("session: not found")

	// ErrSessionRevoked: Get found a row but RevokedAt is set. Phase 4
	// maps to 401.
	ErrSessionRevoked = errors.New("session: revoked")

	// ErrSessionExpired: Get found a row but the absolute expiry has
	// passed (Phase 4 also enforces idle expiry but that's a service-
	// level check against last_seen_at, not a repository sentinel).
	ErrSessionExpired = errors.New("session: expired")

	// ErrSessionSigningKeyNotFound: GetActive returned no row. Phase 4
	// EnsureInitialSigningKey treats this as "boot-time provisioning
	// needed" and mints the first key.
	ErrSessionSigningKeyNotFound = errors.New("session: signing key not found")

	// ErrSessionSigningKeyInUse: Delete (full purge, not Retire) failed
	// because at least one sessions row still references the key. Phase
	// 4's GarbageCollect waits for sessions to expire before purging.
	ErrSessionSigningKeyInUse = errors.New("session: signing key still referenced by active sessions")
)

// SessionRepository wraps the sessions table. Two cookie shapes share
// the rows: post-login sessions (1h-idle/8h-absolute) and pre-login
// sessions (10-minute TTL, IsPreLogin=true; carry OIDC state + nonce
// + PKCE verifier across the IdP redirect).
type SessionRepository interface {
	// Create persists a session row. Caller MUST have called
	// s.Validate(). Returns ErrAuthDuplicateName-shape on the
	// extremely-unlikely id collision (the id is a 32-byte random;
	// callers SHOULD generate fresh ids on the second attempt).
	Create(ctx context.Context, s *sessiondomain.Session) error

	// Get returns a session by id. ErrSessionNotFound on miss.
	// Returns the row even if revoked / expired so the service layer
	// can produce the right 401 reason code (revoked vs expired vs
	// not-found are all 401 to the wire but distinguishable in audit).
	Get(ctx context.Context, id string) (*sessiondomain.Session, error)

	// ListByActor returns every active (non-revoked, non-expired,
	// non-pre-login) session for an actor. Used by the GUI's
	// /v1/auth/sessions surface so users can revoke their old laptops.
	ListByActor(ctx context.Context, actorID, actorType, tenantID string) ([]*sessiondomain.Session, error)

	// UpdateLastSeen sets last_seen_at = NOW() for the named session.
	// Phase 4's middleware calls this on every request to keep the
	// idle-expiry sliding window fresh.
	UpdateLastSeen(ctx context.Context, id string) error

	// UpdateCSRFTokenHash replaces the csrf_token_hash on the session
	// row. Phase 4's RotateCSRFToken consumes this on login completion,
	// logout, and any actor-role mutation against this actor. The hash
	// is the SHA-256 hex of the operator-facing CSRF token plaintext.
	UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error

	// Revoke sets revoked_at = NOW() for the named session. Subsequent
	// Get returns the row with RevokedAt set; Phase 4's Validate maps
	// to 401.
	Revoke(ctx context.Context, id string) error

	// RevokeAllForActor sets revoked_at = NOW() on every active session
	// for an actor. Used on role change, fired-employee scenarios, and
	// the back-channel logout endpoint (Phase 5).
	RevokeAllForActor(ctx context.Context, actorID, actorType, tenantID string) error

	// GarbageCollectExpired deletes sessions whose absolute expiry
	// has passed AND whose revoked_at is older than the configurable
	// retention window (default 24h). Pre-login rows older than the
	// 10-minute TTL are also deleted. Returns the number of rows
	// deleted.
	GarbageCollectExpired(ctx context.Context) (int, error)

	// Delete unconditionally removes a session row. Used for the
	// admin-only "purge a specific session" surface (rarely needed;
	// Revoke is the normal path).
	Delete(ctx context.Context, id string) error
}

// SessionSigningKeyRepository wraps the session_signing_keys table.
// Phase 4's Service.RotateSigningKey + EnsureInitialSigningKey + the
// scheduler-driven retention sweep consume this.
type SessionSigningKeyRepository interface {
	// List returns every signing key in the tenant (including
	// retired). Order: created_at DESC.
	List(ctx context.Context, tenantID string) ([]*sessiondomain.SessionSigningKey, error)

	// GetActive returns the most-recently-created non-retired key.
	// ErrSessionSigningKeyNotFound when no non-retired key exists
	// (Phase 4's EnsureInitialSigningKey treats this as "mint first
	// key").
	GetActive(ctx context.Context, tenantID string) (*sessiondomain.SessionSigningKey, error)

	// Get returns one key by id (including retired keys; Phase 4's
	// Validate consults this for cookies signed under retired-but-
	// in-retention keys).
	Get(ctx context.Context, id string) (*sessiondomain.SessionSigningKey, error)

	// Add persists a new signing key. Caller MUST have called
	// k.Validate() and encrypted the key_material via
	// internal/crypto/encryption.go. CreatedAt defaults to NOW() if
	// zero.
	Add(ctx context.Context, k *sessiondomain.SessionSigningKey) error

	// Retire marks an active key as retired (sets retired_at = NOW()).
	// The key stays in the table for verification of cookies signed
	// under it; the scheduler's retention sweep purges it after the
	// configurable retention window (default 24h beyond retired_at).
	Retire(ctx context.Context, id string) error

	// Delete unconditionally removes a signing key row. Returns
	// ErrSessionSigningKeyInUse if any sessions row still references
	// the key (FK ON DELETE RESTRICT). Phase 4's GarbageCollect calls
	// this only after RetentionWindow has passed AND no sessions
	// reference the key.
	Delete(ctx context.Context, id string) error
}