certctl/internal/repository/postgres/session.go
shankar0123 17b30c1f7f auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap

Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.

Service surface (internal/auth/session/service.go, ~880 LOC):

  - Service.Create(actorID, actorType, ip, ua) -> *CreateResult
    Mints a session row; signs the cookie value with the active signing
    key; returns the cookie payload AND the CSRF token plaintext for
    the handler to set on the response.
  - Service.Validate(ValidateInput) -> *Session
    Parses the cookie, looks up the signing key (incl. retired-but-in-
    retention), recomputes HMAC-SHA256, loads the session row, enforces
    revocation + absolute + idle expiry + optional IP/UA bind. Each
    failure maps to one of 9 sentinel errors; the handler returns a
    uniform 401 on the wire, with the specific reason recorded in the
    audit row.
  - Service.ValidateCSRF(headerValue, *Session) error
    Constant-time compares SHA-256(header) against the stored hash on
    the session row.
  - Service.UpdateLastSeen / Revoke / RevokeAllForActor
  - Service.RotateCSRFToken — mints fresh token, persists hash, returns
    plaintext; called on login completion, logout, role-change against
    actor, explicit operator rotate.
  - Service.RotateSigningKey — mints new active key, retires previous;
    retired keys stay valid for cfg.SigningKeyRetention so existing
    cookies don't immediately fail.
  - Service.EnsureInitialSigningKey — idempotent; mints first key on
    fresh deploys; emits auth.session_signing_key_bootstrap audit row
    with event_category=auth. Wired into cmd/server/main.go AFTER
    migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
    is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
    to boot rather than serve session-less.
  - Service.GarbageCollect — sweeps expired post-login sessions +
    pre-login rows >10min + retired-past-retention signing keys. Wired
    into the new internal/scheduler/scheduler.go::sessionGCLoop on a
    CERTCTL_SESSION_GC_INTERVAL tick.

Cookie wire format (load-bearing):

  v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>

The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:

  len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id

where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger shift a byte across the boundary: `<a, bc>` and
`<ab, c>` both concatenate to `abc` and so produce identical HMAC
inputs. The length prefix moves the boundary into the input itself so
the two cases can never collide.
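
A small sketch of the length-prefixed construction, assuming the layout
given above. macInput and this computeHMAC signature (with an explicit key
argument) are illustrative names, not the service's actual helpers, and the
key is a placeholder.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strconv"
)

// macInput builds len(sid) ":" sid ":" len(kid) ":" kid, where len is
// the ASCII decimal byte-length of the field that follows it.
func macInput(sessionID, signingKeyID string) []byte {
	return []byte(strconv.Itoa(len(sessionID)) + ":" + sessionID + ":" +
		strconv.Itoa(len(signingKeyID)) + ":" + signingKeyID)
}

// computeHMAC returns base64url-no-pad(HMAC-SHA256(key, macInput(...))).
func computeHMAC(key []byte, sessionID, signingKeyID string) string {
	m := hmac.New(sha256.New, key)
	m.Write(macInput(sessionID, signingKeyID))
	return base64.RawURLEncoding.EncodeToString(m.Sum(nil))
}

func main() {
	key := []byte("demo-key-not-for-production")

	fmt.Println(string(macInput("abc", "de"))) // 3:abc:2:de
	fmt.Println(string(macInput("ab", "cde"))) // 2:ab:3:cde

	// Bare concatenation would give "abcde" for both pairs; the
	// length-prefixed inputs differ, so the MACs differ.
	fmt.Println(computeHMAC(key, "abc", "de") == computeHMAC(key, "ab", "cde")) // false
}
```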

The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).

CSRF token model:

  - Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
    intentional; the GUI must read it to echo into X-CSRF-Token header).
  - SHA-256 hash of the plaintext lives on the session row.
  - Validation: SHA-256(X-CSRF-Token) constant-time-compared.
  - Rotated by Service.RotateCSRFToken on login / logout / role-change /
    explicit admin-trigger.
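
The token model above can be sketched in a few lines. The helper names, the
32-byte token size, and the hex encoding of the stored hash are assumptions
made for this example; only the SHA-256 plus constant-time-compare shape
comes from the description.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the hex SHA-256 of the plaintext token. This is the
// value that would live on the session row.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintCSRFToken returns a fresh random plaintext (for the JS-readable
// cookie) and its hash (for the session row).
func mintCSRFToken() (plaintext, hash string, err error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(buf)
	return plaintext, hashToken(plaintext), nil
}

// csrfMatches hashes the X-CSRF-Token header value and compares it to
// the stored hash in constant time. An empty header never matches.
func csrfMatches(headerValue, storedHash string) bool {
	if headerValue == "" {
		return false
	}
	got := hashToken(headerValue)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	plaintext, stored, err := mintCSRFToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(csrfMatches(plaintext, stored))     // true
	fmt.Println(csrfMatches(plaintext+"x", stored)) // false
	fmt.Println(csrfMatches("", stored))            // false
}
```

Storing only the hash means a read of the sessions table does not yield a
usable token.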

Optional defense-in-depth (default OFF):

  - CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
    recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
    (user may have legitimate IP change). Mobile + corporate-NAT
    environments leave this off.
  - CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
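
A minimal sketch of the opt-in binding check, assuming plain string
equality on the recorded values. bindConfig, checkBinding and the two local
error values are stand-ins for this example, not the service's types.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errIPMismatch = errors.New("session ip mismatch")
	errUAMismatch = errors.New("session user-agent mismatch")
)

// bindConfig mirrors the two opt-in switches; both default to false.
type bindConfig struct {
	BindIP        bool
	BindUserAgent bool
}

// checkBinding compares the request's IP and User-Agent with the values
// recorded on the session row, but only for the checks that are enabled.
// It reports the mismatch and leaves revocation to the caller.
func checkBinding(cfg bindConfig, rowIP, rowUA, reqIP, reqUA string) error {
	if cfg.BindIP && rowIP != reqIP {
		return errIPMismatch
	}
	if cfg.BindUserAgent && rowUA != reqUA {
		return errUAMismatch
	}
	return nil
}

func main() {
	off := bindConfig{}
	on := bindConfig{BindIP: true, BindUserAgent: true}

	fmt.Println(checkBinding(off, "10.0.0.1", "ua-a", "10.0.0.2", "ua-b")) // <nil>
	fmt.Println(checkBinding(on, "10.0.0.1", "ua-a", "10.0.0.2", "ua-a"))  // session ip mismatch
	fmt.Println(checkBinding(on, "10.0.0.1", "ua-a", "10.0.0.1", "ua-b"))  // session user-agent mismatch
}
```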

Configurable lifetimes (env vars wired in internal/config/config.go):

  CERTCTL_SESSION_IDLE_TIMEOUT             1h
  CERTCTL_SESSION_ABSOLUTE_TIMEOUT         8h
  CERTCTL_SESSION_SIGNING_KEY_RETENTION    24h
  CERTCTL_SESSION_GC_INTERVAL              1h
  CERTCTL_SESSION_SAMESITE                 Lax
  CERTCTL_SESSION_BIND_IP                  false
  CERTCTL_SESSION_BIND_USER_AGENT          false

Test surface (internal/auth/session/service_test.go, ~860 LOC):

  All 15 prompt-mandated negative cases:

    1.  Tampered cookie (HMAC byte flipped near segment start where all
        6 bits are real; for a 32-byte MAC, base64url-no-pad's last char
        carries only 4 data bits plus 2 unused bits, so a tail-flip is
        unreliable).
    1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
    2.  Cookie missing v1. prefix.
    3.  Cookie with unknown version prefix (v99).
    4.  Idle expiry — back-dated last_seen_at + idle_expires_at.
    5.  Absolute expiry — back-dated absolute_expires_at.
    6.  Revoked session.
    7.  Wrong signing key id (no row matches).
    8.  Cookie signed under retired-but-in-retention key SUCCEEDS.
    9.  Cookie signed under retired-past-retention key FAILS.
    10. Concatenation collision — direct evidence that
        computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
        a forged-boundary-slide cookie is rejected.
    11. CSRF token missing.
    12. CSRF token mismatch (constant-time compare).
    13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
    14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
    15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
        wrap (cmd/server/main.go treats as fatal).

  Plus coverage-lift batch covering: every error wrap on every repo
  collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
  Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
  RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
  the cookie parser's full negative matrix (empty, wrong segment count,
  missing prefixes, bad base64, wrong HMAC length), and a real-encryption
  round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
  the v3-blob path is exercised end-to-end at the session-cookie level.

Coverage:

  internal/auth/session              94.5%  (floor 90)
  internal/auth/session/domain       96+%   (floor 90, Phase 1)

.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). Each entry's
why: paragraph explains why its fail-closed branches are load-bearing.

Repository extensions:

  internal/repository/session.go gains UpdateCSRFTokenHash on the
  SessionRepository interface; internal/repository/postgres/session.go
  ships the implementation. RotateCSRFToken consumes it.

Scheduler extensions:

  internal/scheduler/scheduler.go gains SessionGarbageCollector
  interface + sessionGC field + sessionGCInterval +
  SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
  Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
  concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
  per-tick context.WithTimeout(1m) bounds a stuck Postgres.

Server wiring:

  cmd/server/main.go constructs sessionService AFTER the bootstrap
  block (post-RBAC backfill) and BEFORE the policy-service block.
  EnsureInitialSigningKey runs immediately; failure is fatal via
  os.Exit(1). The scheduler section wires SetSessionGarbageCollector
  + SetSessionGCInterval alongside the other interval setters and
  emits an Info log so operators can confirm the loop is enabled.

Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.

Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
2026-05-10 05:31:24 +00:00

package postgres

import (
	"context"
	"database/sql"
	"errors"
	"fmt"

	"github.com/lib/pq"

	sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
	"github.com/certctl-io/certctl/internal/repository"
)

// =============================================================================
// SessionRepository (Auth Bundle 2 Phase 2)
// =============================================================================

// SessionRepository is the postgres implementation of
// repository.SessionRepository.
type SessionRepository struct {
	db *sql.DB
}

// NewSessionRepository constructs a SessionRepository.
func NewSessionRepository(db *sql.DB) *SessionRepository {
	return &SessionRepository{db: db}
}

const sessionColumns = `id, tenant_id, actor_id, actor_type,
	signing_key_id, is_pre_login, csrf_token_hash,
	idle_expires_at, absolute_expires_at, created_at, last_seen_at,
	ip_address, user_agent, revoked_at`

func scanSession(row interface{ Scan(...interface{}) error }) (*sessiondomain.Session, error) {
	var s sessiondomain.Session
	var revokedAt sql.NullTime
	if err := row.Scan(
		&s.ID, &s.TenantID, &s.ActorID, &s.ActorType,
		&s.SigningKeyID, &s.IsPreLogin, &s.CSRFTokenHash,
		&s.IdleExpiresAt, &s.AbsoluteExpiresAt, &s.CreatedAt, &s.LastSeenAt,
		&s.IPAddress, &s.UserAgent, &revokedAt,
	); err != nil {
		return nil, err
	}
	if revokedAt.Valid {
		s.RevokedAt = &revokedAt.Time
	}
	return &s, nil
}

// Create persists a session row. Caller MUST have called s.Validate().
func (r *SessionRepository) Create(ctx context.Context, s *sessiondomain.Session) error {
	_, err := r.db.ExecContext(ctx, `
		INSERT INTO sessions (
			id, tenant_id, actor_id, actor_type, signing_key_id,
			is_pre_login, csrf_token_hash, idle_expires_at,
			absolute_expires_at, created_at, last_seen_at,
			ip_address, user_agent
		) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13)`,
		s.ID, s.TenantID, s.ActorID, s.ActorType, s.SigningKeyID,
		s.IsPreLogin, s.CSRFTokenHash, s.IdleExpiresAt,
		s.AbsoluteExpiresAt, s.CreatedAt, s.LastSeenAt,
		s.IPAddress, s.UserAgent)
	if err != nil {
		var pqErr *pq.Error
		if errors.As(err, &pqErr) && pqErr.Code == "23505" {
			return repository.ErrAuthDuplicateName
		}
		return fmt.Errorf("sessions create: %w", err)
	}
	return nil
}

// Get returns a session by id. Returns the row even if revoked /
// expired; the service layer handles the disposition.
func (r *SessionRepository) Get(ctx context.Context, id string) (*sessiondomain.Session, error) {
	row := r.db.QueryRowContext(ctx, `SELECT `+sessionColumns+` FROM sessions WHERE id = $1`, id)
	s, err := scanSession(row)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, repository.ErrSessionNotFound
		}
		return nil, fmt.Errorf("sessions get: %w", err)
	}
	return s, nil
}

// ListByActor returns active (non-revoked, non-pre-login, not past
// absolute expiry) sessions for an actor, newest first. Idle expiry is
// not filtered by this query.
func (r *SessionRepository) ListByActor(ctx context.Context, actorID, actorType, tenantID string) ([]*sessiondomain.Session, error) {
	rows, err := r.db.QueryContext(ctx, `
		SELECT `+sessionColumns+`
		FROM sessions
		WHERE actor_id = $1
		  AND actor_type = $2
		  AND tenant_id = $3
		  AND revoked_at IS NULL
		  AND is_pre_login = FALSE
		  AND absolute_expires_at > NOW()
		ORDER BY created_at DESC`,
		actorID, actorType, tenantID)
	if err != nil {
		return nil, fmt.Errorf("sessions list_by_actor: %w", err)
	}
	defer rows.Close()
	var out []*sessiondomain.Session
	for rows.Next() {
		s, err := scanSession(rows)
		if err != nil {
			return nil, fmt.Errorf("sessions scan: %w", err)
		}
		out = append(out, s)
	}
	return out, rows.Err()
}

// UpdateLastSeen sets last_seen_at = NOW() for the named session.
func (r *SessionRepository) UpdateLastSeen(ctx context.Context, id string) error {
	res, err := r.db.ExecContext(ctx, `UPDATE sessions SET last_seen_at = NOW() WHERE id = $1`, id)
	if err != nil {
		return fmt.Errorf("sessions update_last_seen: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		return repository.ErrSessionNotFound
	}
	return nil
}

// UpdateCSRFTokenHash replaces csrf_token_hash on the named session.
// Phase 4's RotateCSRFToken consumes this on login completion, logout,
// and any actor-role mutation against this actor.
func (r *SessionRepository) UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error {
	res, err := r.db.ExecContext(ctx, `UPDATE sessions SET csrf_token_hash = $2 WHERE id = $1`, id, csrfTokenHash)
	if err != nil {
		return fmt.Errorf("sessions update_csrf_token_hash: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		return repository.ErrSessionNotFound
	}
	return nil
}

// Revoke sets revoked_at = NOW() for the named session. Idempotent:
// re-revoking an already-revoked session is a no-op (returns nil).
func (r *SessionRepository) Revoke(ctx context.Context, id string) error {
	res, err := r.db.ExecContext(ctx, `UPDATE sessions SET revoked_at = NOW() WHERE id = $1 AND revoked_at IS NULL`, id)
	if err != nil {
		return fmt.Errorf("sessions revoke: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		// Distinguish "not found" from "already revoked" by re-querying.
		row := r.db.QueryRowContext(ctx, `SELECT 1 FROM sessions WHERE id = $1`, id)
		var x int
		if err := row.Scan(&x); err != nil {
			if errors.Is(err, sql.ErrNoRows) {
				return repository.ErrSessionNotFound
			}
			return fmt.Errorf("sessions revoke probe: %w", err)
		}
		// Row exists but already revoked: idempotent success.
	}
	return nil
}

// RevokeAllForActor sets revoked_at = NOW() on every active session
// for an actor. Returns nil on zero matches (idempotent).
func (r *SessionRepository) RevokeAllForActor(ctx context.Context, actorID, actorType, tenantID string) error {
	_, err := r.db.ExecContext(ctx, `
		UPDATE sessions SET revoked_at = NOW()
		WHERE actor_id = $1 AND actor_type = $2 AND tenant_id = $3 AND revoked_at IS NULL`,
		actorID, actorType, tenantID)
	if err != nil {
		return fmt.Errorf("sessions revoke_all_for_actor: %w", err)
	}
	return nil
}

// GarbageCollectExpired deletes:
//   - Sessions whose absolute_expires_at < NOW() (post-login expired).
//   - Pre-login sessions older than 10 minutes.
//
// Returns the number of rows deleted across both classes.
func (r *SessionRepository) GarbageCollectExpired(ctx context.Context) (int, error) {
	res, err := r.db.ExecContext(ctx, `
		DELETE FROM sessions
		WHERE absolute_expires_at < NOW()
		   OR (is_pre_login = TRUE AND created_at < NOW() - INTERVAL '10 minutes')`)
	if err != nil {
		return 0, fmt.Errorf("sessions garbage_collect: %w", err)
	}
	n, _ := res.RowsAffected()
	return int(n), nil
}

// Delete unconditionally removes a session row.
func (r *SessionRepository) Delete(ctx context.Context, id string) error {
	res, err := r.db.ExecContext(ctx, `DELETE FROM sessions WHERE id = $1`, id)
	if err != nil {
		return fmt.Errorf("sessions delete: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		return repository.ErrSessionNotFound
	}
	return nil
}

// =============================================================================
// SessionSigningKeyRepository (Auth Bundle 2 Phase 2)
// =============================================================================

// SessionSigningKeyRepository is the postgres implementation of
// repository.SessionSigningKeyRepository.
type SessionSigningKeyRepository struct {
	db *sql.DB
}

// NewSessionSigningKeyRepository constructs a SessionSigningKeyRepository.
func NewSessionSigningKeyRepository(db *sql.DB) *SessionSigningKeyRepository {
	return &SessionSigningKeyRepository{db: db}
}

const sessionSigningKeyColumns = `id, tenant_id, key_material_encrypted, created_at, retired_at`

func scanSessionSigningKey(row interface{ Scan(...interface{}) error }) (*sessiondomain.SessionSigningKey, error) {
	var k sessiondomain.SessionSigningKey
	var retiredAt sql.NullTime
	if err := row.Scan(&k.ID, &k.TenantID, &k.KeyMaterialEncrypted, &k.CreatedAt, &retiredAt); err != nil {
		return nil, err
	}
	if retiredAt.Valid {
		k.RetiredAt = &retiredAt.Time
	}
	return &k, nil
}

// List returns every signing key in the tenant, including retired ones.
func (r *SessionSigningKeyRepository) List(ctx context.Context, tenantID string) ([]*sessiondomain.SessionSigningKey, error) {
	rows, err := r.db.QueryContext(ctx,
		`SELECT `+sessionSigningKeyColumns+` FROM session_signing_keys WHERE tenant_id = $1 ORDER BY created_at DESC`,
		tenantID)
	if err != nil {
		return nil, fmt.Errorf("session_signing_keys list: %w", err)
	}
	defer rows.Close()
	var out []*sessiondomain.SessionSigningKey
	for rows.Next() {
		k, err := scanSessionSigningKey(rows)
		if err != nil {
			return nil, fmt.Errorf("session_signing_keys scan: %w", err)
		}
		out = append(out, k)
	}
	return out, rows.Err()
}

// GetActive returns the most-recently-created non-retired key. Returns
// ErrSessionSigningKeyNotFound when no non-retired key exists.
func (r *SessionSigningKeyRepository) GetActive(ctx context.Context, tenantID string) (*sessiondomain.SessionSigningKey, error) {
	row := r.db.QueryRowContext(ctx, `
		SELECT `+sessionSigningKeyColumns+`
		FROM session_signing_keys
		WHERE tenant_id = $1 AND retired_at IS NULL
		ORDER BY created_at DESC
		LIMIT 1`, tenantID)
	k, err := scanSessionSigningKey(row)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, repository.ErrSessionSigningKeyNotFound
		}
		return nil, fmt.Errorf("session_signing_keys get_active: %w", err)
	}
	return k, nil
}

// Get returns a key by id (including retired keys; Phase 4's Validate
// consults this for cookies signed under retired-but-in-retention keys).
func (r *SessionSigningKeyRepository) Get(ctx context.Context, id string) (*sessiondomain.SessionSigningKey, error) {
	row := r.db.QueryRowContext(ctx,
		`SELECT `+sessionSigningKeyColumns+` FROM session_signing_keys WHERE id = $1`, id)
	k, err := scanSessionSigningKey(row)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			return nil, repository.ErrSessionSigningKeyNotFound
		}
		return nil, fmt.Errorf("session_signing_keys get: %w", err)
	}
	return k, nil
}

// Add persists a new signing key. Caller MUST have called k.Validate().
func (r *SessionSigningKeyRepository) Add(ctx context.Context, k *sessiondomain.SessionSigningKey) error {
	if k.CreatedAt.IsZero() {
		_, err := r.db.ExecContext(ctx, `
			INSERT INTO session_signing_keys (id, tenant_id, key_material_encrypted)
			VALUES ($1, $2, $3)`,
			k.ID, k.TenantID, k.KeyMaterialEncrypted)
		if err != nil {
			return fmt.Errorf("session_signing_keys add: %w", err)
		}
		// Read the row back to populate CreatedAt.
		row := r.db.QueryRowContext(ctx, `SELECT created_at FROM session_signing_keys WHERE id = $1`, k.ID)
		if err := row.Scan(&k.CreatedAt); err != nil {
			return fmt.Errorf("session_signing_keys add (read created_at): %w", err)
		}
		return nil
	}
	_, err := r.db.ExecContext(ctx, `
		INSERT INTO session_signing_keys (id, tenant_id, key_material_encrypted, created_at)
		VALUES ($1, $2, $3, $4)`,
		k.ID, k.TenantID, k.KeyMaterialEncrypted, k.CreatedAt)
	if err != nil {
		return fmt.Errorf("session_signing_keys add: %w", err)
	}
	return nil
}

// Retire marks an active key as retired (sets retired_at = NOW()).
// Idempotent: re-retiring an already-retired key is a no-op.
func (r *SessionSigningKeyRepository) Retire(ctx context.Context, id string) error {
	res, err := r.db.ExecContext(ctx,
		`UPDATE session_signing_keys SET retired_at = NOW() WHERE id = $1 AND retired_at IS NULL`, id)
	if err != nil {
		return fmt.Errorf("session_signing_keys retire: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		// Distinguish not-found vs already-retired.
		row := r.db.QueryRowContext(ctx, `SELECT 1 FROM session_signing_keys WHERE id = $1`, id)
		var x int
		if err := row.Scan(&x); err != nil {
			if errors.Is(err, sql.ErrNoRows) {
				return repository.ErrSessionSigningKeyNotFound
			}
			return fmt.Errorf("session_signing_keys retire probe: %w", err)
		}
		// Row exists but already retired: idempotent success.
	}
	return nil
}

// Delete unconditionally removes a signing key. Returns
// ErrSessionSigningKeyInUse on SQLSTATE 23503 (FK ON DELETE RESTRICT
// from sessions.signing_key_id).
func (r *SessionSigningKeyRepository) Delete(ctx context.Context, id string) error {
	res, err := r.db.ExecContext(ctx, `DELETE FROM session_signing_keys WHERE id = $1`, id)
	if err != nil {
		var pqErr *pq.Error
		if errors.As(err, &pqErr) && pqErr.Code == "23503" {
			return repository.ErrSessionSigningKeyInUse
		}
		return fmt.Errorf("session_signing_keys delete: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		return repository.ErrSessionSigningKeyNotFound
	}
	return nil
}