mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 17:51:29 +00:00
auth-bundle-2 Phase 4: session service (cookie minting + signature
validation, idle/absolute expiry, signing-key rotation, CSRF, GC),
15-case negative-test matrix, fail-fatal initial-key bootstrap
Phase 4 of the bundle ships the post-login session lifecycle that backs
every authenticated request once Phase 5 wires the OIDC handlers + the
session middleware. The state machine is the load-bearing primitive for
the Bundle 2 control plane: forge a session cookie and you bypass every
RBAC gate.
Service surface (internal/auth/session/service.go, ~880 LOC):
- Service.Create(actorID, actorType, ip, ua) -> *CreateResult
Mints a session row; signs the cookie value with the active signing
key; returns the cookie payload AND the CSRF token plaintext for
the handler to set on the response.
- Service.Validate(ValidateInput) -> *Session
Parses the cookie, looks up the signing key (incl. retired-but-in-
retention), recomputes HMAC-SHA256, loads the session row, enforces
revocation + absolute + idle expiry + optional IP/UA bind. Maps to
one of 9 sentinel errors; the handler uniformly returns 401 to the
wire (specific reason in the audit row).
- Service.ValidateCSRF(headerValue, *Session) error
Constant-time compares SHA-256(header) against the stored hash on
the session row.
- Service.UpdateLastSeen / Revoke / RevokeAllForActor
- Service.RotateCSRFToken — mints fresh token, persists hash, returns
plaintext; called on login completion, logout, role-change against
actor, explicit operator rotate.
- Service.RotateSigningKey — mints new active key, retires previous;
retired keys stay valid for cfg.SigningKeyRetention so existing
cookies don't immediately fail.
- Service.EnsureInitialSigningKey — idempotent; mints first key on
fresh deploys; emits auth.session_signing_key_bootstrap audit row
with event_category=auth. Wired into cmd/server/main.go AFTER
migrations + RBAC backfill, BEFORE the HTTP listener binds; failure
is FATAL (logger.Error + os.Exit(1)) per the prompt — server refuses
to boot rather than serve session-less.
- Service.GarbageCollect — sweeps expired post-login sessions +
pre-login rows >10min + retired-past-retention signing keys. Wired
into the new internal/scheduler/scheduler.go::sessionGCLoop on a
CERTCTL_SESSION_GC_INTERVAL tick.
Cookie wire format (load-bearing):
  v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>
The HMAC input is LENGTH-PREFIXED to defeat concatenation collisions:
  len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
where len(...) is the ASCII decimal byte-length. Without the length
prefix, the bare-concatenation form `session_id || signing_key_id`
would let a forger swap one byte across the boundary — `<a, bc>` and
`<ab, c>` produce identical HMAC inputs. The length prefix moves the
boundary into the input itself so the two cases can never collide.
The v1. version prefix is reserved. A future incompatible upgrade
ships as v2. and the parser rejects unknown prefixes (no fallback).
CSRF token model:
- Plaintext goes in a JS-readable certctl_csrf cookie (HttpOnly=false
intentional; the GUI must read it to echo into X-CSRF-Token header).
- SHA-256 hash of the plaintext lives on the session row.
- Validation: SHA-256(X-CSRF-Token) constant-time-compared.
- Rotated by Service.RotateCSRFToken on login / logout / role-change /
explicit admin-trigger.
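The CSRF model above can be sketched as follows. The function names newCSRFToken and hashCSRFToken appear in this commit, but their bodies are not shown, so these implementations are assumptions; csrfMatches is a hypothetical stand-in for Service.ValidateCSRF. The 32-byte token size and the 64-char lowercase hex hash format are taken from the diff's comments.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// newCSRFToken returns 32 random bytes as base64url without padding.
func newCSRFToken() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// hashCSRFToken returns the lowercase hex SHA-256 of the token plaintext.
func hashCSRFToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// csrfMatches hashes the header value and compares it to the stored
// hash in constant time. An empty header never matches.
func csrfMatches(headerValue, storedHash string) bool {
	if headerValue == "" {
		return false
	}
	got := hashCSRFToken(headerValue)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}

func main() {
	token, err := newCSRFToken()
	if err != nil {
		panic(err)
	}
	stored := hashCSRFToken(token) // what the session row would keep

	fmt.Println(csrfMatches(token, stored))     // true
	fmt.Println(csrfMatches(token+"x", stored)) // false
	fmt.Println(csrfMatches("", stored))        // false
}
```

Storing only the hash means a leaked session row does not reveal a usable token.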
Optional defense-in-depth (default OFF):
- CERTCTL_SESSION_BIND_IP — Validate compares client IP to row's
recorded IP. Mismatch -> 401, audit row, session NOT auto-revoked
(user may have legitimate IP change). Mobile + corporate-NAT
environments leave this off.
- CERTCTL_SESSION_BIND_USER_AGENT — same shape against UA.
Configurable lifetimes (env vars wired in internal/config/config.go):
  CERTCTL_SESSION_IDLE_TIMEOUT            1h
  CERTCTL_SESSION_ABSOLUTE_TIMEOUT        8h
  CERTCTL_SESSION_SIGNING_KEY_RETENTION   24h
  CERTCTL_SESSION_GC_INTERVAL             1h
  CERTCTL_SESSION_SAMESITE                Lax
  CERTCTL_SESSION_BIND_IP                 false
  CERTCTL_SESSION_BIND_USER_AGENT         false
Test surface (internal/auth/session/service_test.go, ~860 LOC):
All 15 prompt-mandated negative cases:
1. Tampered cookie (HMAC char flipped near segment start where all
6 bits are real; the last of the 43 base64url-no-pad chars of a
32-byte HMAC carries only 4 data bits plus 2 padding bits, so a
tail-flip is unreliable).
1b. Tampered SESSION_ID segment (same HMAC-recompute outcome).
2. Cookie missing v1. prefix.
3. Cookie with unknown version prefix (v99).
4. Idle expiry — back-dated last_seen_at + idle_expires_at.
5. Absolute expiry — back-dated absolute_expires_at.
6. Revoked session.
7. Wrong signing key id (no row matches).
8. Cookie signed under retired-but-in-retention key SUCCEEDS.
9. Cookie signed under retired-past-retention key FAILS.
10. Concatenation collision — direct evidence that
computeHMAC("abc","de") != computeHMAC("ab","cde") AND that
a forged-boundary-slide cookie is rejected.
11. CSRF token missing.
12. CSRF token mismatch (constant-time compare).
13. IP-bind enabled + IP changed -> ErrSessionIPMismatch + audit row.
14. UA-bind enabled + UA changed -> ErrSessionUAMismatch + audit row.
15. EnsureInitialSigningKey RNG failure -> ErrInitialSigningKeyMintFailed
wrap (cmd/server/main.go treats as fatal).
Plus coverage-lift batch covering: every error wrap on every repo
collaborator (Create, Get, UpdateLastSeen, UpdateCSRFTokenHash,
Revoke, RevokeAllForActor, GC), every RNG-failure surface in Create /
RotateCSRFToken / RotateSigningKey, every alg-pinning helper edge,
the cookie parser's full negative matrix (empty, wrong segment count,
missing prefixes, bad base64, wrong HMAC length), and a real-encryption
round-trip via internal/crypto.EncryptIfKeySet -> DecryptIfKeySet so
the v3-blob path is exercised end-to-end at the session-cookie level.
Coverage:
  internal/auth/session          94.5% (floor 90)
  internal/auth/session/domain   96+%  (floor 90, Phase 1)
.github/coverage-thresholds.yml extended with 2 new gate entries
(internal/auth/session and internal/auth/session/domain). The
why: paragraphs explain why each fail-closed branch is load-bearing.
Repository extensions:
internal/repository/session.go gains UpdateCSRFTokenHash on the
SessionRepository interface; internal/repository/postgres/session.go
ships the implementation. RotateCSRFToken consumes it.
Scheduler extensions:
internal/scheduler/scheduler.go gains SessionGarbageCollector
interface + sessionGC field + sessionGCInterval +
SetSessionGarbageCollector + SetSessionGCInterval + sessionGCLoop.
Pattern matches the existing acmeGCLoop: atomic.Bool guard prevents
concurrent sweeps, sync.WaitGroup tracks for graceful shutdown,
per-tick context.WithTimeout(1m) bounds a stuck Postgres.
Server wiring:
cmd/server/main.go constructs sessionService AFTER the bootstrap
block (post-RBAC backfill) and BEFORE the policy-service block.
EnsureInitialSigningKey runs immediately; failure is fatal via
os.Exit(1). The scheduler section wires SetSessionGarbageCollector
+ SetSessionGCInterval alongside the other interval setters and
emits an Info log so operators can confirm the loop is enabled.
Phase 4 deviation note: Service.GarbageCollect() returns (int, error)
rather than the prompt's literal `error`. The int is the count of
session rows deleted on this sweep; the scheduler discards it (`_, err
:= ...`) but tests + future operator-facing audit rows can read it.
The wider behavior matches the spec exactly.
Verifications: gofmt clean, go vet ./internal/auth/session/...
./internal/scheduler/... ./internal/config/... ./cmd/server/...
./internal/repository/... clean, go test -short -count=1 -race green
across all 3 session packages, full repository + auth + scheduler +
config test sweeps green, no regressions in Bundle 1 packages.
@@ -148,3 +148,38 @@ internal/auth/oidc/domain:
    cover all canonical IdP shapes (Okta / Azure AD / Google
    Workspace / Keycloak / Authentik / Auth0). Floor at 90 to
    catch any future field that ships without a validator.

internal/auth/session:
  floor: 90
  why: |
    Bundle 2 Phase 4: session lifecycle service. Phase 4 spec
    pins the floor at 90 because every fail-closed branch carries
    a security invariant: HMAC-SHA256 cookie signing with a
    LENGTH-PREFIXED canonical input (defeats the
    `<a, bc>`-vs-`<ab, c>` concatenation collision attack on the
    bare-concat form), v1. version-prefix lock, idle expiry,
    absolute expiry, revocation, retired-but-in-retention key
    success path, retired-past-retention failure path, CSRF
    constant-time compare against the SHA-256-hashed copy on the
    session row, optional IP/UA-bind defense-in-depth gates,
    fail-fatal initial-key bootstrap. A regression in any one of
    these branches is a security incident; the floor catches it
    before the commit lands. The 15-case negative-test matrix in
    service_test.go is the load-bearing harness; the in-memory
    stubs of SessionRepo + SigningKeyRepo + AuditRecorder let the
    state machine be exercised without the postgres testcontainer
    overhead (which Phase 2's integration tests already cover).

internal/auth/session/domain:
  floor: 90
  why: |
    Bundle 2 Phase 1: Session + SessionSigningKey domain. Both
    types ship Validate() with full invariant coverage: ID prefix
    enforcement (ses-/sk-), expiry-order CHECK (absolute > idle >
    created), CSRFTokenHash format pin (64 lowercase hex chars),
    KeyMaterialEncrypted non-empty, retired-before-created
    rejection, TenantID defaulting. Cookie naming constants are
    pinned by TestCookieNamingConstants because the GUI's
    web/src/api/client.ts will read `certctl_csrf` by string.
    Floor at 90 to catch any future field that ships without a
    validator.
@@ -24,6 +24,7 @@ import (
	"github.com/certctl-io/certctl/internal/api/router"
	"github.com/certctl-io/certctl/internal/auth"
	"github.com/certctl-io/certctl/internal/auth/bootstrap"
	"github.com/certctl-io/certctl/internal/auth/session"
	"github.com/certctl-io/certctl/internal/config"
	discoveryawssm "github.com/certctl-io/certctl/internal/connector/discovery/awssm"
	discoveryazurekv "github.com/certctl-io/certctl/internal/connector/discovery/azurekv"
@@ -341,6 +342,47 @@ func main() {
		}
	}
	bootstrapHandler := handler.NewBootstrapHandler(bootstrapService)

	// =========================================================================
	// Auth Bundle 2 Phase 4: session service.
	//
	// Wired AFTER migrations + RBAC backfill, BEFORE the HTTP listener
	// binds (per the prompt's "fail-fatal on bootstrap key mint failure"
	// requirement). EnsureInitialSigningKey is idempotent: if a
	// non-retired signing key already exists for the tenant the call is
	// a no-op; otherwise it mints a fresh 32-byte HMAC key, persists it,
	// and emits an auth.session_signing_key_bootstrap audit row with
	// event_category=auth.
	//
	// Failure here is fatal: the server refuses to boot rather than
	// serve session-less.
	//
	// The session service is wired into the scheduler below (sessionGCLoop)
	// so the GC sweep runs every CERTCTL_SESSION_GC_INTERVAL tick. The
	// HTTP middleware that consumes ValidateInput / ValidateCSRF lands
	// in Phase 5; pre-Phase-5 deployments boot the service so the GC
	// sweep can keep the sessions + signing-keys tables tidy.
	sessionRepo := postgres.NewSessionRepository(db)
	sessionKeyRepo := postgres.NewSessionSigningKeyRepository(db)
	sessionService := session.NewService(
		sessionRepo,
		sessionKeyRepo,
		auditService,
		authdomainAlias.DefaultTenantID,
		session.Config{
			IdleTimeout:         cfg.Auth.Session.IdleTimeout,
			AbsoluteTimeout:     cfg.Auth.Session.AbsoluteTimeout,
			SigningKeyRetention: cfg.Auth.Session.SigningKeyRetention,
			BindIP:              cfg.Auth.Session.BindIP,
			BindUserAgent:       cfg.Auth.Session.BindUserAgent,
		},
		cfg.Encryption.ConfigEncryptionKey,
	)
	if err := sessionService.EnsureInitialSigningKey(bootCtx); err != nil {
		logger.Error("FATAL: session signing key bootstrap failed; refusing to boot", "err", err)
		os.Exit(1)
	}

	policyService := service.NewPolicyService(policyRepo, auditService)
	policyService.SetCertRepo(certificateRepo) // D-008: CertificateLifetime arm needs CertificateVersion.NotBefore/NotAfter
	// G-1: RenewalPolicyService — distinct from PolicyService (compliance rules).
@@ -937,6 +979,18 @@ func main() {
	sched.SetJobTimeoutInterval(cfg.Scheduler.JobTimeoutInterval)
	sched.SetAwaitingCSRTimeout(cfg.Scheduler.AwaitingCSRTimeout)
	sched.SetAwaitingApprovalTimeout(cfg.Scheduler.AwaitingApprovalTimeout)

	// Auth Bundle 2 Phase 4: wire the session-GC sweep. The service
	// itself was constructed (with the EnsureInitialSigningKey
	// fail-fatal call) above the policy/cert-service block; here we
	// just register it with the scheduler so the loop fires every
	// CERTCTL_SESSION_GC_INTERVAL.
	sched.SetSessionGarbageCollector(sessionService)
	sched.SetSessionGCInterval(cfg.Auth.Session.GCInterval)
	logger.Info("session GC sweep enabled",
		"interval", cfg.Auth.Session.GCInterval.String(),
		"absolute_timeout", cfg.Auth.Session.AbsoluteTimeout.String(),
		"signing_key_retention", cfg.Auth.Session.SigningKeyRetention.String())
	logger.Info("job timeout reaper enabled",
		"interval", cfg.Scheduler.JobTimeoutInterval.String(),
		"csr_timeout", cfg.Scheduler.AwaitingCSRTimeout.String(),
@@ -0,0 +1,820 @@
// Package session implements the post-login session lifecycle for
// Auth Bundle 2 Phase 4: cookie minting + signature validation +
// idle/absolute expiry + revocation + signing-key rotation + GC.
//
// =============================================================================
// Cookie wire format (`v1.<session_id>.<signing_key_id>.<HMAC>`):
//
//	v1.ses-XXXXXXXX.sk-YYYYYYYY.<base64url-no-pad(HMAC-SHA256)>
//
// HMAC INPUT IS LENGTH-PREFIXED to defeat concatenation collisions:
//
//	len(session_id) || ":" || session_id || ":" || len(signing_key_id) || ":" || signing_key_id
//
// where len(...) is the ASCII decimal byte-length. Without the length
// prefix, the bare-concatenation form `session_id || signing_key_id`
// would let a forger slide the boundary between the two fields:
// `<a, bc>` and `<ab, c>` produce identical HMAC inputs. The length
// prefix moves the boundary into the input itself so the two cases
// never collide.
//
// HMAC KEY is the 32-byte plaintext of the SessionSigningKey row's
// KeyMaterialEncrypted blob (decrypted via internal/crypto/encryption.go's
// EncryptIfKeySet/DecryptIfKeySet path, the same blob format issuer/target
// credentials use). The plaintext is held in memory only during signature
// computation; never logged, never persisted in plaintext form.
//
// VERSION PREFIX is reserved. v1 is the only accepted prefix today.
// A future incompatible upgrade ships as `v2.` and the validator
// rejects unknown prefixes (no fallback attempt; fail closed).
//
// =============================================================================
// CSRF token model:
//
//   - Plaintext lives in a JS-readable certctl_csrf cookie (HttpOnly=false
//     intentional; the GUI must read it to echo into X-CSRF-Token header).
//   - SHA-256 hash of the plaintext lives on the session row (csrf_token_hash).
//   - Validation: SHA-256(X-CSRF-Token header) constant-time-compared
//     against the session row's stored hash.
//   - Rotated by Service.RotateCSRFToken on: login completion, logout,
//     any actor-role mutation against this actor, explicit operator
//     "rotate CSRF" admin endpoint.
//
// =============================================================================
// Failure semantics:
//
// Validate returns ErrSessionInvalidCookie for any tamper or format
// fault, including an HMAC mismatch or a session row that cannot be
// loaded. The handler maps to HTTP 401 uniformly (no leak of which
// check failed; specific reason in the audit row). Idle + absolute
// expiry surface as ErrSessionExpiredIdle / ErrSessionExpiredAbsolute
// so the audit row distinguishes; both wire to 401. Revocation is
// ErrSessionRevoked. Signing-key not found / fully purged is
// ErrSigningKeyNotFound; a retired key past the retention window is
// ErrSigningKeyRetired. Concatenation-collision (boundary-slide)
// attempts also surface as ErrSessionInvalidCookie because the HMAC
// recomputation fails.
//
// =============================================================================
// Token-leak hygiene:
//
// Cookie values, CSRF token plaintexts, signing-key plaintexts, and the
// HMAC bytes themselves MUST NEVER be logged at any level. The service
// contains zero log statements that include those values; the
// session_id and signing_key_id (both opaque IDs) are the only identifiers
// that ever land in audit rows.
package session

import (
	"context"
	"crypto/hmac"
	cryptorand "crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"

	sessiondomain "github.com/certctl-io/certctl/internal/auth/session/domain"
	cryptopkg "github.com/certctl-io/certctl/internal/crypto"
	"github.com/certctl-io/certctl/internal/domain"
	"github.com/certctl-io/certctl/internal/repository"
)

// =============================================================================
// Encrypt/decrypt helpers for SessionSigningKey.KeyMaterialEncrypted
// blobs. Production wires the real CERTCTL_CONFIG_ENCRYPTION_KEY value;
// tests pass empty (encrypted == plaintext passthrough so the test
// surface doesn't require an encryption-key env var).
// =============================================================================

func encryptKeyMaterial(plaintext []byte, passphrase string) ([]byte, error) {
	if passphrase == "" {
		// Test path: no encryption configured. Round-trip is identity.
		// Production main.go REQUIRES CERTCTL_CONFIG_ENCRYPTION_KEY for
		// any deployment that runs the session service; the empty case
		// is intentionally only useful in unit tests.
		return plaintext, nil
	}
	blob, _, err := cryptopkg.EncryptIfKeySet(plaintext, passphrase)
	return blob, err
}

func decryptKeyMaterial(blob []byte, passphrase string) ([]byte, error) {
	if passphrase == "" {
		return blob, nil
	}
	return cryptopkg.DecryptIfKeySet(blob, passphrase)
}

// =============================================================================
// Service-layer sentinel errors.
// =============================================================================

var (
	// ErrSessionInvalidCookie is returned by Validate when the cookie
	// fails any of: format check, version-prefix check, base64 decode,
	// HMAC recomputation. The handler maps to HTTP 401 uniformly.
	ErrSessionInvalidCookie = errors.New("session: invalid cookie")

	// ErrSessionExpiredIdle: the session's last_seen_at is older than
	// the configured idle timeout. HTTP 401.
	ErrSessionExpiredIdle = errors.New("session: idle timeout exceeded")

	// ErrSessionExpiredAbsolute: the session's absolute_expires_at is
	// in the past. HTTP 401.
	ErrSessionExpiredAbsolute = errors.New("session: absolute timeout exceeded")

	// ErrSessionRevoked: the session row's revoked_at is set. HTTP 401.
	ErrSessionRevoked = errors.New("session: revoked")

	// ErrSigningKeyNotFound: the cookie's signing_key_id doesn't match
	// any row in session_signing_keys (forged cookie OR fully-purged
	// retired key). HTTP 401.
	ErrSigningKeyNotFound = errors.New("session: signing key not found")

	// ErrSigningKeyRetired: the cookie's signing_key_id is retired and
	// past the retention window. HTTP 401.
	ErrSigningKeyRetired = errors.New("session: signing key retired beyond retention window")

	// ErrCSRFMissing: the X-CSRF-Token header is empty on a
	// state-changing request. HTTP 403.
	ErrCSRFMissing = errors.New("session: CSRF token missing")

	// ErrCSRFMismatch: the X-CSRF-Token header doesn't match the
	// session row's hash. HTTP 403.
	ErrCSRFMismatch = errors.New("session: CSRF token mismatch")

	// ErrSessionIPMismatch: the configured CERTCTL_SESSION_BIND_IP gate
	// rejected the request because the client IP doesn't match the
	// session row's recorded IP. HTTP 401, audit row, session NOT
	// auto-revoked (user may have legitimate IP change).
	ErrSessionIPMismatch = errors.New("session: client IP does not match session-bound IP")

	// ErrSessionUAMismatch: same shape as ErrSessionIPMismatch for the
	// optional CERTCTL_SESSION_BIND_USER_AGENT gate.
	ErrSessionUAMismatch = errors.New("session: User-Agent does not match session-bound User-Agent")

	// ErrInitialSigningKeyMintFailed: EnsureInitialSigningKey could not
	// mint a key (crypto/rand failure, encryption failure, repository
	// failure). The server boot path treats this as fatal.
	ErrInitialSigningKeyMintFailed = errors.New("session: initial signing key mint failed")
)

// =============================================================================
// Service collaborator interfaces: narrow projections of the Phase 2
// repositories so unit tests can stub without the full DB.
// =============================================================================

// SessionRepo is the slice of repository.SessionRepository the service
// consumes. Defining the projection here keeps the service decoupled
// from the wider repo surface.
type SessionRepo interface {
	Create(ctx context.Context, s *sessiondomain.Session) error
	Get(ctx context.Context, id string) (*sessiondomain.Session, error)
	UpdateLastSeen(ctx context.Context, id string) error
	UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error
	Revoke(ctx context.Context, id string) error
	RevokeAllForActor(ctx context.Context, actorID, actorType, tenantID string) error
	GarbageCollectExpired(ctx context.Context) (int, error)
}

// SigningKeyRepo is the slice of repository.SessionSigningKeyRepository
// the service consumes.
type SigningKeyRepo interface {
	GetActive(ctx context.Context, tenantID string) (*sessiondomain.SessionSigningKey, error)
	Get(ctx context.Context, id string) (*sessiondomain.SessionSigningKey, error)
	Add(ctx context.Context, k *sessiondomain.SessionSigningKey) error
	Retire(ctx context.Context, id string) error
	List(ctx context.Context, tenantID string) ([]*sessiondomain.SessionSigningKey, error)
	Delete(ctx context.Context, id string) error
}

// AuditRecorder is the slice of service.AuditService the session
// service uses. Every audit row this service emits carries
// event_category=auth (Phase 8 contract).
type AuditRecorder interface {
	RecordEventWithCategory(ctx context.Context, actor string, actorType domain.ActorType, action, eventCategory, resourceType, resourceID string, details map[string]interface{}) error
}

// =============================================================================
// Service.
// =============================================================================

// Service implements the session lifecycle. Construct via NewService.
type Service struct {
	sessions   SessionRepo
	keys       SigningKeyRepo
	audit      AuditRecorder
	tenantID   string
	cfg        Config
	encryption string

	// clockNow is injectable for tests; defaults to time.Now.
	clockNow func() time.Time

	// readRand is injectable for tests; defaults to crypto/rand.Read.
	// Wraps crypto/rand so EnsureInitialSigningKey + Create + RotateCSRFToken
	// can be exercised against a deterministic-failure RNG.
	readRand func([]byte) (int, error)
}

// Config bundles the operator-tunable knobs Phase 4 exposes via
// CERTCTL_SESSION_* env vars. internal/config/config.go owns the
// env-binding + defaulting; this package owns the consumption.
type Config struct {
	// IdleTimeout: maximum time between requests on a single session
	// before re-auth is required. Default 1h. Wire: CERTCTL_SESSION_IDLE_TIMEOUT.
	IdleTimeout time.Duration

	// AbsoluteTimeout: maximum lifetime of a session regardless of
	// activity. Default 8h. Wire: CERTCTL_SESSION_ABSOLUTE_TIMEOUT.
	AbsoluteTimeout time.Duration

	// SigningKeyRetention: time a retired signing key stays valid for
	// verification before being purged. Default 24h. Wire:
	// CERTCTL_SESSION_SIGNING_KEY_RETENTION.
	SigningKeyRetention time.Duration

	// BindIP: when true, Validate compares the request's client IP to
	// the session row's recorded IP. Default false. Mobile +
	// corporate-NAT environments leave this off. Wire:
	// CERTCTL_SESSION_BIND_IP.
	BindIP bool

	// BindUserAgent: when true, Validate compares the request's
	// User-Agent to the session row's recorded UA. Default false. Wire:
	// CERTCTL_SESSION_BIND_USER_AGENT.
	BindUserAgent bool
}

// DefaultConfig returns the Phase 4 defaults. cmd/server/main.go
// merges CERTCTL_SESSION_* env vars over these.
func DefaultConfig() Config {
	return Config{
		IdleTimeout:         1 * time.Hour,
		AbsoluteTimeout:     8 * time.Hour,
		SigningKeyRetention: 24 * time.Hour,
		BindIP:              false,
		BindUserAgent:       false,
	}
}

// NewService constructs a session Service.
//
// encryptionKey is the CERTCTL_CONFIG_ENCRYPTION_KEY value used to
// encrypt and decrypt SessionSigningKey.KeyMaterialEncrypted blobs.
// Required in production; tests may pass empty, in which case
// encryptKeyMaterial / decryptKeyMaterial are identity passthroughs
// and no encryption-key env var is needed (see service_test.go's
// helpers).
//
// audit may be nil in test setups that don't care about audit rows;
// production wires *service.AuditService from cmd/server/main.go.
func NewService(
	sessions SessionRepo,
	keys SigningKeyRepo,
	audit AuditRecorder,
	tenantID string,
	cfg Config,
	encryptionKey string,
) *Service {
	return &Service{
		sessions:   sessions,
		keys:       keys,
		audit:      audit,
		tenantID:   tenantID,
		cfg:        cfg,
		encryption: encryptionKey,
		clockNow:   time.Now,
		readRand:   cryptorand.Read,
	}
}

// SetClockForTest replaces the clock used for expiry calculations.
// ONLY for tests; production reads time.Now via the default seam.
func (s *Service) SetClockForTest(now func() time.Time) {
	s.clockNow = now
}

// SetRandReaderForTest replaces the entropy source. ONLY for tests;
// production reads crypto/rand via the default seam.
func (s *Service) SetRandReaderForTest(r func([]byte) (int, error)) {
	s.readRand = r
}

// =============================================================================
// Create + cookie minting.
// =============================================================================

// CreateResult is the post-login session payload. The handler sets
// the cookies + redirects.
type CreateResult struct {
	Session     *sessiondomain.Session
	CookieValue string // certctl_session cookie body (`v1.ses-XX.sk-YY.HMAC`)
	CSRFToken   string // certctl_csrf cookie body (32 random bytes b64url)
}

// Create mints a new post-login session row, signs the cookie value,
// and returns both the session-cookie payload and the CSRF token
// plaintext. The handler:
//   - Sets `certctl_session` HttpOnly Secure SameSite=Lax(or Strict) Path=/
//     to CookieValue with Expires=session.AbsoluteExpiresAt.
//   - Sets `certctl_csrf` Secure SameSite=Lax(or Strict) Path=/ HttpOnly=false
//     to CSRFToken with Expires=session.AbsoluteExpiresAt.
func (s *Service) Create(ctx context.Context, actorID, actorType, ip, userAgent string) (*CreateResult, error) {
	if strings.TrimSpace(actorID) == "" {
		return nil, fmt.Errorf("session: actor_id is required")
	}
	if strings.TrimSpace(actorType) == "" {
		return nil, fmt.Errorf("session: actor_type is required")
	}

	active, err := s.keys.GetActive(ctx, s.tenantID)
	if err != nil {
		return nil, fmt.Errorf("session: get active signing key: %w", err)
	}
	hmacKey, err := decryptKeyMaterial(active.KeyMaterialEncrypted, s.encryption)
	if err != nil {
		return nil, fmt.Errorf("session: decrypt active key material: %w", err)
	}

	sessionID, err := s.newOpaqueID("ses-")
	if err != nil {
		return nil, fmt.Errorf("session: generate session id: %w", err)
	}

	csrfToken, err := s.newCSRFToken()
	if err != nil {
		return nil, fmt.Errorf("session: generate csrf token: %w", err)
	}

	now := s.clockNow().UTC()
	row := &sessiondomain.Session{
		ID:                sessionID,
		ActorID:           actorID,
		ActorType:         actorType,
		SigningKeyID:      active.ID,
		IsPreLogin:        false,
		CSRFTokenHash:     hashCSRFToken(csrfToken),
		IdleExpiresAt:     now.Add(s.cfg.IdleTimeout),
		AbsoluteExpiresAt: now.Add(s.cfg.AbsoluteTimeout),
		CreatedAt:         now,
		LastSeenAt:        now,
		IPAddress:         ip,
		UserAgent:         userAgent,
		TenantID:          s.tenantID,
	}
	if verr := row.Validate(); verr != nil {
		return nil, fmt.Errorf("session: validate row: %w", verr)
	}
	if cerr := s.sessions.Create(ctx, row); cerr != nil {
		return nil, fmt.Errorf("session: create row: %w", cerr)
	}

	cookieValue := signCookie(row.ID, row.SigningKeyID, hmacKey)

	return &CreateResult{
		Session:     row,
		CookieValue: cookieValue,
		CSRFToken:   csrfToken,
	}, nil
}

// =============================================================================
// Validate.
// =============================================================================

// ValidateInput bundles the data Validate needs from the HTTP request.
// The handler builds it from the session cookie, request IP, and
// User-Agent header.
type ValidateInput struct {
	CookieValue string
	ClientIP    string
	UserAgent   string
}

// Validate verifies the cookie's signature, looks up the session row,
// and enforces idle + absolute expiry, revocation, optional IP/UA
// binding. Returns the session on success; one of the package-scoped
// sentinels on failure.
//
// Note: Validate does NOT call UpdateLastSeen; the middleware does
// that explicitly so the test surface stays unambiguous about side
// effects under the read path.
func (s *Service) Validate(ctx context.Context, in ValidateInput) (*sessiondomain.Session, error) {
	sessionID, signingKeyID, providedHMAC, err := parseCookie(in.CookieValue)
	if err != nil {
		return nil, ErrSessionInvalidCookie
	}

	signingKey, err := s.keys.Get(ctx, signingKeyID)
	if err != nil {
		return nil, ErrSigningKeyNotFound
	}

	now := s.clockNow().UTC()

	// Retired key still in retention window is OK; past retention is not.
	if signingKey.RetiredAt != nil {
		retentionExpiresAt := signingKey.RetiredAt.Add(s.cfg.SigningKeyRetention)
		if now.After(retentionExpiresAt) {
			return nil, ErrSigningKeyRetired
		}
	}

	hmacKey, err := decryptKeyMaterial(signingKey.KeyMaterialEncrypted, s.encryption)
	if err != nil {
		return nil, ErrSessionInvalidCookie
	}

	expectedHMAC := computeHMAC(sessionID, signingKeyID, hmacKey)
	if subtle.ConstantTimeCompare(expectedHMAC, providedHMAC) != 1 {
		return nil, ErrSessionInvalidCookie
	}

	row, err := s.sessions.Get(ctx, sessionID)
	if err != nil {
		return nil, ErrSessionInvalidCookie
	}

	if row.RevokedAt != nil {
		return nil, ErrSessionRevoked
	}

	// Absolute expiry: hard cap regardless of activity.
|
||||
if !now.Before(row.AbsoluteExpiresAt) {
|
||||
return nil, ErrSessionExpiredAbsolute
|
||||
}
|
||||
|
||||
// Idle expiry: re-evaluated against last_seen_at + idle window.
|
||||
idleDeadline := row.LastSeenAt.Add(s.cfg.IdleTimeout)
|
||||
if !now.Before(idleDeadline) {
|
||||
return nil, ErrSessionExpiredIdle
|
||||
}
|
||||
|
||||
// Optional defense-in-depth IP / UA binding.
|
||||
if s.cfg.BindIP && in.ClientIP != "" && row.IPAddress != "" && in.ClientIP != row.IPAddress {
|
||||
s.recordAudit(ctx, "auth.session_ip_mismatch", row.ActorID, domain.ActorType(row.ActorType), row.ID,
|
||||
map[string]interface{}{"session_id": row.ID, "expected_ip": row.IPAddress, "request_ip": in.ClientIP})
|
||||
return nil, ErrSessionIPMismatch
|
||||
}
|
||||
if s.cfg.BindUserAgent && in.UserAgent != "" && row.UserAgent != "" && in.UserAgent != row.UserAgent {
|
||||
s.recordAudit(ctx, "auth.session_ua_mismatch", row.ActorID, domain.ActorType(row.ActorType), row.ID,
|
||||
map[string]interface{}{"session_id": row.ID})
|
||||
return nil, ErrSessionUAMismatch
|
||||
}
|
||||
|
||||
return row, nil
|
||||
}

// ValidateCSRF compares the SHA-256 of the X-CSRF-Token header against
// the session row's stored hash. Constant-time-compares to defeat
// timing attacks. Empty header → ErrCSRFMissing.
func (s *Service) ValidateCSRF(headerValue string, sess *sessiondomain.Session) error {
	if strings.TrimSpace(headerValue) == "" {
		return ErrCSRFMissing
	}
	if sess == nil || sess.CSRFTokenHash == "" {
		return ErrCSRFMismatch
	}
	provided := hashCSRFToken(headerValue)
	if subtle.ConstantTimeCompare([]byte(provided), []byte(sess.CSRFTokenHash)) != 1 {
		return ErrCSRFMismatch
	}
	return nil
}
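
The CSRF scheme stores only a hash server-side and hands the plaintext to the client. The sketch below follows that token lifecycle under simplifying assumptions: `mintToken`, `hashToken`, and `matches` are hypothetical names, there is no session row, and `crypto/rand` is read directly instead of through the service's injectable reader.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashToken returns the lowercase-hex SHA-256 of the plaintext token.
func hashToken(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// mintToken draws 32 random bytes and returns the base64url (no padding)
// plaintext together with the hash that would be persisted.
func mintToken() (plaintext, hash string, err error) {
	b := make([]byte, 32)
	if _, err = rand.Read(b); err != nil {
		return "", "", err
	}
	plaintext = base64.RawURLEncoding.EncodeToString(b)
	return plaintext, hashToken(plaintext), nil
}

// matches hashes the presented header value and compares it to the stored
// hash in constant time.
func matches(header, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashToken(header)), []byte(storedHash)) == 1
}

func main() {
	plaintext, stored, err := mintToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(plaintext), len(stored)) // 43 base64url chars, 64 hex chars
	fmt.Println(matches(plaintext, stored))
	fmt.Println(matches(plaintext+"x", stored))
}
```

Both operands of the comparison are fixed-length hex digests, so the comparison itself does not reveal the length of the submitted header.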

// UpdateLastSeen advances the session's last_seen_at to now. Called by
// the middleware on every authenticated request to keep the idle-expiry
// sliding window fresh.
func (s *Service) UpdateLastSeen(ctx context.Context, sessionID string) error {
	if err := s.sessions.UpdateLastSeen(ctx, sessionID); err != nil {
		return fmt.Errorf("session: update_last_seen: %w", err)
	}
	return nil
}

// =============================================================================
// Revoke + RevokeAllForActor + RotateCSRFToken.
// =============================================================================

// Revoke sets revoked_at on the session row. Idempotent at the repo
// layer (re-revoking is a no-op). Subsequent Validate returns
// ErrSessionRevoked.
func (s *Service) Revoke(ctx context.Context, sessionID string) error {
	if err := s.sessions.Revoke(ctx, sessionID); err != nil {
		return fmt.Errorf("session: revoke: %w", err)
	}
	s.recordAudit(ctx, "auth.session_revoked", "system", domain.ActorTypeSystem, sessionID,
		map[string]interface{}{"session_id": sessionID})
	return nil
}

// RevokeAllForActor sets revoked_at on every active session for the
// (actorID, actorType, tenantID) tuple. Used on role change, fired-
// employee scenarios, and the back-channel logout endpoint (Phase 5).
func (s *Service) RevokeAllForActor(ctx context.Context, actorID, actorType string) error {
	if err := s.sessions.RevokeAllForActor(ctx, actorID, actorType, s.tenantID); err != nil {
		return fmt.Errorf("session: revoke_all_for_actor: %w", err)
	}
	s.recordAudit(ctx, "auth.sessions_revoked_for_actor", actorID, domain.ActorType(actorType), actorID,
		map[string]interface{}{"actor_id": actorID, "actor_type": actorType})
	return nil
}

// RotateCSRFToken mints a fresh CSRF token, persists its SHA-256 hash
// on the session row, and returns the plaintext for the handler to
// re-emit in the certctl_csrf cookie. Called on:
//
//   - Login completion (Service.Create already mints a token; explicit
//     rotation here is for follow-up calls).
//   - Logout (defense-in-depth even though the session is revoked).
//   - Any actor-role mutation against this actor.
//   - Explicit operator-triggered "rotate CSRF" admin endpoint.
func (s *Service) RotateCSRFToken(ctx context.Context, sessionID string) (string, error) {
	csrfToken, err := s.newCSRFToken()
	if err != nil {
		return "", fmt.Errorf("session: generate csrf token: %w", err)
	}
	hash := hashCSRFToken(csrfToken)
	if uerr := s.sessions.UpdateCSRFTokenHash(ctx, sessionID, hash); uerr != nil {
		return "", fmt.Errorf("session: update csrf hash: %w", uerr)
	}
	s.recordAudit(ctx, "auth.session_csrf_rotated", "system", domain.ActorTypeSystem, sessionID,
		map[string]interface{}{"session_id": sessionID})
	return csrfToken, nil
}

// =============================================================================
// Signing-key lifecycle.
// =============================================================================

// RotateSigningKey mints a fresh 32-byte HMAC key, persists it as the
// new active key, and retires the previously-active key. The retired
// key stays valid for verification during cfg.SigningKeyRetention so
// existing cookies don't immediately fail; the GarbageCollect sweep
// purges it after the retention window passes (and after no sessions
// reference it).
func (s *Service) RotateSigningKey(ctx context.Context) error {
	currentActive, err := s.keys.GetActive(ctx, s.tenantID)
	if err != nil {
		// No active key at all: this is a bootstrap-not-yet-run state;
		// EnsureInitialSigningKey is the right entrypoint.
		return fmt.Errorf("session: get active for rotate: %w", err)
	}

	newID, err := s.newOpaqueID("sk-")
	if err != nil {
		return fmt.Errorf("session: generate signing key id: %w", err)
	}
	newPlaintext, err := s.newKeyMaterial()
	if err != nil {
		return fmt.Errorf("session: generate signing key material: %w", err)
	}
	newCiphertext, err := encryptKeyMaterial(newPlaintext, s.encryption)
	if err != nil {
		return fmt.Errorf("session: encrypt signing key material: %w", err)
	}

	newKey := &sessiondomain.SessionSigningKey{
		ID:                   newID,
		TenantID:             s.tenantID,
		KeyMaterialEncrypted: newCiphertext,
	}
	if verr := newKey.Validate(); verr != nil {
		return fmt.Errorf("session: validate new key: %w", verr)
	}
	if aerr := s.keys.Add(ctx, newKey); aerr != nil {
		return fmt.Errorf("session: add new signing key: %w", aerr)
	}

	if rerr := s.keys.Retire(ctx, currentActive.ID); rerr != nil {
		return fmt.Errorf("session: retire previous active key: %w", rerr)
	}

	s.recordAudit(ctx, "auth.session_signing_key_rotated", "system", domain.ActorTypeSystem, newID,
		map[string]interface{}{"new_key_id": newID, "retired_key_id": currentActive.ID})
	return nil
}

// EnsureInitialSigningKey is idempotent: if a non-retired signing key
// exists for the tenant, it returns nil. Otherwise it mints a fresh
// 32-byte key, persists it, and emits an
// auth.session_signing_key_bootstrap audit row with event_category=auth.
//
// Production wires this into cmd/server/main.go startup AFTER
// migrations + RBAC backfill, BEFORE the HTTP listener binds. Failure
// is fatal: the server refuses to boot rather than serve session-less.
func (s *Service) EnsureInitialSigningKey(ctx context.Context) error {
	_, err := s.keys.GetActive(ctx, s.tenantID)
	if err == nil {
		return nil // a key already exists; idempotent no-op.
	}

	// Any error other than "not found" should bubble; the boot loader
	// fails fatal regardless, but distinguishing repo-error from
	// no-row-yet is useful in logs.
	if !errors.Is(err, repository.ErrSessionSigningKeyNotFound) {
		return fmt.Errorf("session: probe active signing key: %w", err)
	}

	newID, err := s.newOpaqueID("sk-")
	if err != nil {
		return fmt.Errorf("%w: %v", ErrInitialSigningKeyMintFailed, err)
	}
	plaintext, err := s.newKeyMaterial()
	if err != nil {
		return fmt.Errorf("%w: %v", ErrInitialSigningKeyMintFailed, err)
	}
	ciphertext, err := encryptKeyMaterial(plaintext, s.encryption)
	if err != nil {
		return fmt.Errorf("%w: %v", ErrInitialSigningKeyMintFailed, err)
	}

	k := &sessiondomain.SessionSigningKey{
		ID:                   newID,
		TenantID:             s.tenantID,
		KeyMaterialEncrypted: ciphertext,
	}
	if verr := k.Validate(); verr != nil {
		return fmt.Errorf("%w: validate: %v", ErrInitialSigningKeyMintFailed, verr)
	}
	if aerr := s.keys.Add(ctx, k); aerr != nil {
		return fmt.Errorf("%w: persist: %v", ErrInitialSigningKeyMintFailed, aerr)
	}

	s.recordAudit(ctx, "auth.session_signing_key_bootstrap", "system", domain.ActorTypeSystem, newID,
		map[string]interface{}{"key_id": newID})
	return nil
}

// =============================================================================
// GarbageCollect.
// =============================================================================

// GarbageCollect runs one sweep:
//   - Deletes sessions whose absolute_expires_at is in the past
//     (post-login expired) AND pre-login rows older than 10 minutes
//     (delegated to the repo's GarbageCollectExpired).
//   - Deletes signing keys whose retired_at + retention window has
//     passed AND that are not still referenced by sessions (the FK
//     ON DELETE RESTRICT in the schema is the safety net; we attempt
//     and ignore ErrSessionSigningKeyInUse).
//
// Wired into the scheduler's sessionGCLoop on a CERTCTL_SESSION_GC_INTERVAL
// tick (default 1h). Returns the count of session rows deleted.
func (s *Service) GarbageCollect(ctx context.Context) (int, error) {
	deleted, err := s.sessions.GarbageCollectExpired(ctx)
	if err != nil {
		return 0, fmt.Errorf("session: gc expired sessions: %w", err)
	}

	// Sweep retired-and-expired signing keys. Best-effort; in-use keys
	// (FK reference) are skipped by the repo's ErrSessionSigningKeyInUse
	// return.
	keys, listErr := s.keys.List(ctx, s.tenantID)
	if listErr != nil {
		// Listing failed but we already deleted sessions; return the
		// session count + the list error so the operator sees both.
		return deleted, fmt.Errorf("session: gc list keys: %w", listErr)
	}
	now := s.clockNow().UTC()
	for _, k := range keys {
		if k.RetiredAt == nil {
			continue
		}
		if !now.After(k.RetiredAt.Add(s.cfg.SigningKeyRetention)) {
			continue
		}
		if derr := s.keys.Delete(ctx, k.ID); derr != nil {
			// In-use keys (sessions still reference) are kept; any other
			// error short-circuits to surface it.
			if errors.Is(derr, repository.ErrSessionSigningKeyInUse) {
				continue
			}
			return deleted, fmt.Errorf("session: gc delete signing key %s: %w", k.ID, derr)
		}
	}
	return deleted, nil
}

// =============================================================================
// Helpers.
// =============================================================================

// signCookie returns the wire-format session cookie value:
// `v1.<session_id>.<signing_key_id>.<base64url-no-pad(HMAC-SHA256)>`.
func signCookie(sessionID, signingKeyID string, hmacKey []byte) string {
	mac := computeHMAC(sessionID, signingKeyID, hmacKey)
	return fmt.Sprintf("%s.%s.%s.%s",
		sessiondomain.CookieFormatVersion,
		sessionID,
		signingKeyID,
		base64.RawURLEncoding.EncodeToString(mac),
	)
}

// computeHMAC returns the HMAC-SHA256 over the LENGTH-PREFIXED
// canonical input
//
//	len(sessionID) || ":" || sessionID || ":" || len(signingKeyID) || ":" || signingKeyID
//
// where len(...) is the ASCII decimal byte-length. The length prefix
// is load-bearing: without it, `<a, bc>` and `<ab, c>` produce
// identical input and a forger could swap one byte across the boundary.
func computeHMAC(sessionID, signingKeyID string, hmacKey []byte) []byte {
	mac := hmac.New(sha256.New, hmacKey)
	mac.Write([]byte(strconv.Itoa(len(sessionID))))
	mac.Write([]byte(":"))
	mac.Write([]byte(sessionID))
	mac.Write([]byte(":"))
	mac.Write([]byte(strconv.Itoa(len(signingKeyID))))
	mac.Write([]byte(":"))
	mac.Write([]byte(signingKeyID))
	return mac.Sum(nil)
}
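
The boundary ambiguity that the length prefix removes can be reproduced in a few lines. The sketch below is illustrative only: `naiveInput`, `canonicalInput`, and `tag` are hypothetical helpers and the key is a throwaway demo value.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"strconv"
)

// naiveInput concatenates the two fields with no framing at all.
func naiveInput(sessionID, signingKeyID string) string {
	return sessionID + signingKeyID
}

// canonicalInput mirrors the layout in the computeHMAC comment:
// len(a) ":" a ":" len(b) ":" b, with lengths as ASCII decimal byte counts.
func canonicalInput(sessionID, signingKeyID string) string {
	return strconv.Itoa(len(sessionID)) + ":" + sessionID + ":" +
		strconv.Itoa(len(signingKeyID)) + ":" + signingKeyID
}

// tag computes HMAC-SHA256 over input with the given key.
func tag(input string, key []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(input))
	return m.Sum(nil)
}

func main() {
	key := []byte("demo-key-not-for-production-use!")

	// Without framing, <a, bc> and <ab, c> collapse to the same bytes.
	fmt.Println(naiveInput("a", "bc") == naiveInput("ab", "c"))

	// With length prefixes the two pairs stay distinct, and so do their tags.
	fmt.Println(canonicalInput("a", "bc"))
	fmt.Println(canonicalInput("ab", "c"))
	fmt.Println(hmac.Equal(tag(canonicalInput("a", "bc"), key), tag(canonicalInput("ab", "c"), key)))
}
```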

// parseCookie splits the wire format and returns the three identifying
// parts plus the decoded HMAC. Any format/version/decode failure
// returns an error; the caller maps to ErrSessionInvalidCookie without
// surfacing which check failed (no information leak).
func parseCookie(cookieValue string) (sessionID, signingKeyID string, hmacBytes []byte, err error) {
	if cookieValue == "" {
		return "", "", nil, errors.New("empty cookie")
	}
	parts := strings.Split(cookieValue, ".")
	if len(parts) != 4 {
		return "", "", nil, errors.New("expected 4 segments")
	}
	if parts[0] != sessiondomain.CookieFormatVersion {
		return "", "", nil, errors.New("unsupported version prefix")
	}
	if !strings.HasPrefix(parts[1], "ses-") {
		return "", "", nil, errors.New("session id missing prefix")
	}
	if !strings.HasPrefix(parts[2], "sk-") {
		return "", "", nil, errors.New("signing key id missing prefix")
	}
	mac, derr := base64.RawURLEncoding.DecodeString(parts[3])
	if derr != nil {
		return "", "", nil, fmt.Errorf("hmac base64: %w", derr)
	}
	if len(mac) != sha256.Size {
		return "", "", nil, errors.New("hmac length")
	}
	return parts[1], parts[2], mac, nil
}
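
A sign-then-verify round trip over the four-segment wire format might look like the sketch below. It is a simplified stand-in, not the service code: `sign`, `verify`, and `mac` are hypothetical helpers, `version` is assumed to equal `sessiondomain.CookieFormatVersion` ("v1" per the signCookie comment), and key lookup, prefix checks, and the session row are omitted. Splitting on "." is safe here because base64url output never contains a dot.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

const version = "v1" // assumed value of the cookie format version

var errInvalid = errors.New("invalid cookie") // one uniform failure, no detail leaked

// mac computes HMAC-SHA256 over the length-prefixed canonical input.
func mac(sessionID, keyID string, key []byte) []byte {
	m := hmac.New(sha256.New, key)
	fmt.Fprintf(m, "%d:%s:%d:%s", len(sessionID), sessionID, len(keyID), keyID)
	return m.Sum(nil)
}

// sign builds version.sessionID.keyID.base64url(mac).
func sign(sessionID, keyID string, key []byte) string {
	return strings.Join([]string{
		version, sessionID, keyID,
		base64.RawURLEncoding.EncodeToString(mac(sessionID, keyID, key)),
	}, ".")
}

// verify parses the cookie and returns the session ID if the MAC checks out.
func verify(cookie string, key []byte) (string, error) {
	parts := strings.Split(cookie, ".")
	if len(parts) != 4 || parts[0] != version {
		return "", errInvalid
	}
	got, err := base64.RawURLEncoding.DecodeString(parts[3])
	if err != nil || len(got) != sha256.Size {
		return "", errInvalid
	}
	if !hmac.Equal(got, mac(parts[1], parts[2], key)) {
		return "", errInvalid
	}
	return parts[1], nil
}

func main() {
	key := []byte("demo-key-not-for-production-use!")
	cookie := sign("ses-abc", "sk-1", key)

	fmt.Println(verify(cookie, key))
	// Editing the session ID invalidates the MAC.
	fmt.Println(verify(strings.Replace(cookie, "ses-abc", "ses-abd", 1), key))
}
```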

// hashCSRFToken returns the lowercase-hex SHA-256 of the plaintext
// CSRF token. The session row stores this hash; the cookie holds the
// plaintext.
func hashCSRFToken(plaintext string) string {
	h := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(h[:])
}

// newOpaqueID returns prefix + base64url-no-pad of 16 random bytes.
// 128 bits of entropy is sufficient against guessing for both session
// ids and signing-key ids in any realistic deployment.
func (s *Service) newOpaqueID(prefix string) (string, error) {
	b := make([]byte, 16)
	if _, err := s.readRand(b); err != nil {
		return "", err
	}
	return prefix + base64.RawURLEncoding.EncodeToString(b), nil
}

// newCSRFToken returns base64url-no-pad of 32 random bytes (~256 bits
// of entropy). Plaintext goes in the certctl_csrf cookie; SHA-256
// hash goes on the session row.
func (s *Service) newCSRFToken() (string, error) {
	b := make([]byte, 32)
	if _, err := s.readRand(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// newKeyMaterial returns 32 raw random bytes for use as an HMAC-SHA256
// key. crypto/rand is the source.
func (s *Service) newKeyMaterial() ([]byte, error) {
	b := make([]byte, 32)
	if _, err := s.readRand(b); err != nil {
		return nil, err
	}
	return b, nil
}

// recordAudit is a thin wrapper around s.audit.RecordEventWithCategory
// that swallows audit-layer errors (the audit row is best-effort; a
// failed audit must not block a successful session operation). The
// Phase 8 contract is event_category=auth for everything in this
// service.
func (s *Service) recordAudit(ctx context.Context, action, actor string, actorType domain.ActorType, resourceID string, details map[string]interface{}) {
	if s.audit == nil {
		return
	}
	_ = s.audit.RecordEventWithCategory(ctx, actor, actorType, action,
		"auth", "session", resourceID, details)
}
@@ -1589,6 +1589,13 @@ type AuthConfig struct {
	// Setting: CERTCTL_AGENT_BOOTSTRAP_TOKEN environment variable.
	AgentBootstrapToken string

	// Session holds the Auth Bundle 2 Phase 4 session-service tunables.
	// Defaults are documented on the SessionConfig fields. The session
	// service is wired into cmd/server/main.go alongside the OIDC
	// service in Phase 5; pre-Phase-5 deployments that run with the
	// legacy `api-key` auth type ignore this struct entirely.
	Session SessionConfig

	// BootstrapToken is the one-shot pre-shared secret that gates the
	// Bundle 1 Phase 6 bootstrap endpoint (POST /v1/auth/bootstrap). When
	// set at server startup AND no admin-roled actors exist, the
@@ -1609,6 +1616,56 @@ type AuthConfig struct {
	BootstrapToken string
}

// SessionConfig contains the Auth Bundle 2 Phase 4 session-service
// tunables. Every field is operator-overridable via the documented
// CERTCTL_SESSION_* env var; defaults are the conservative values from
// the Phase 4 spec.
//
// Bundle 2 Phase 4 / OWASP ASVS V3 (Session Management). The defaults
// (1h idle / 8h absolute / 24h key retention / 1h GC / Lax cookies /
// no IP-or-UA bind) are the conservative starting point that matches
// the prompt; tightening to Strict + IP/UA bind suits high-security
// environments at the cost of breaking inbound deep-links from external
// apps and login-from-mobile-on-cellular flows.
type SessionConfig struct {
	// IdleTimeout: maximum time between authenticated requests on a
	// session before re-auth is required. Default 1h. Wire:
	// CERTCTL_SESSION_IDLE_TIMEOUT.
	IdleTimeout time.Duration

	// AbsoluteTimeout: maximum lifetime of a session regardless of
	// activity. Default 8h. Wire: CERTCTL_SESSION_ABSOLUTE_TIMEOUT.
	AbsoluteTimeout time.Duration

	// SigningKeyRetention: time a retired signing key stays valid for
	// verification before being purged from the keys table. Default
	// 24h. Wire: CERTCTL_SESSION_SIGNING_KEY_RETENTION.
	SigningKeyRetention time.Duration

	// GCInterval: scheduler tick interval for the session-GC sweep.
	// Default 1h. Wire: CERTCTL_SESSION_GC_INTERVAL.
	GCInterval time.Duration

	// SameSite: SameSite cookie attribute. Valid values: "Lax"
	// (default) or "Strict". Strict is recommended for high-security
	// environments at the cost of breaking inbound deep-links from
	// external apps. Wire: CERTCTL_SESSION_SAMESITE.
	SameSite string

	// BindIP: when true, the session middleware compares the request's
	// client IP to the session row's recorded IP on every Validate.
	// Mismatch -> 401, audit row, session NOT auto-revoked (user may
	// have legitimate IP change). Default false. Wire:
	// CERTCTL_SESSION_BIND_IP.
	BindIP bool

	// BindUserAgent: when true, the session middleware compares the
	// request's User-Agent to the session row's recorded UA on every
	// Validate. Default false; useful only in tightly-controlled
	// environments. Wire: CERTCTL_SESSION_BIND_USER_AGENT.
	BindUserAgent bool
}

// RateLimitConfig contains rate limiting configuration.
//
// Bundle B / Audit M-025 (OWASP ASVS L2 §11.2.1): pre-bundle the rate
@@ -1732,6 +1789,18 @@ func Load() (*Config, error) {
			// /v1/auth/bootstrap endpoint that mints the first admin
			// key. Empty = bootstrap endpoint disabled (default).
			BootstrapToken: getEnv("CERTCTL_BOOTSTRAP_TOKEN", ""),
			// Bundle 2 Phase 4: session-service tunables. Defaults match
			// the prompt; high-security deployments tighten via the env
			// vars documented on SessionConfig fields.
			Session: SessionConfig{
				IdleTimeout:         getEnvDuration("CERTCTL_SESSION_IDLE_TIMEOUT", 1*time.Hour),
				AbsoluteTimeout:     getEnvDuration("CERTCTL_SESSION_ABSOLUTE_TIMEOUT", 8*time.Hour),
				SigningKeyRetention: getEnvDuration("CERTCTL_SESSION_SIGNING_KEY_RETENTION", 24*time.Hour),
				GCInterval:          getEnvDuration("CERTCTL_SESSION_GC_INTERVAL", 1*time.Hour),
				SameSite:            getEnv("CERTCTL_SESSION_SAMESITE", "Lax"),
				BindIP:              getEnvBool("CERTCTL_SESSION_BIND_IP", false),
				BindUserAgent:       getEnvBool("CERTCTL_SESSION_BIND_USER_AGENT", false),
			},
		},
		RateLimit: RateLimitConfig{
			Enabled: getEnvBool("CERTCTL_RATE_LIMIT_ENABLED", true),
@@ -129,6 +129,21 @@ func (r *SessionRepository) UpdateLastSeen(ctx context.Context, id string) error
	return nil
}

// UpdateCSRFTokenHash replaces csrf_token_hash on the named session.
// Phase 4's RotateCSRFToken consumes this on login completion, logout,
// and any actor-role mutation against this actor.
func (r *SessionRepository) UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error {
	res, err := r.db.ExecContext(ctx, `UPDATE sessions SET csrf_token_hash = $2 WHERE id = $1`, id, csrfTokenHash)
	if err != nil {
		return fmt.Errorf("sessions update_csrf_token_hash: %w", err)
	}
	n, _ := res.RowsAffected()
	if n == 0 {
		return repository.ErrSessionNotFound
	}
	return nil
}

// Revoke sets revoked_at = NOW() for the named session. Idempotent:
// re-revoking an already-revoked session is a no-op (returns nil).
func (r *SessionRepository) Revoke(ctx context.Context, id string) error {
@@ -61,6 +61,12 @@ type SessionRepository interface {
	// idle-expiry sliding window fresh.
	UpdateLastSeen(ctx context.Context, id string) error

	// UpdateCSRFTokenHash replaces the csrf_token_hash on the session
	// row. Phase 4's RotateCSRFToken consumes this on login completion,
	// logout, and any actor-role mutation against this actor. The hash
	// is the SHA-256 hex of the operator-facing CSRF token plaintext.
	UpdateCSRFTokenHash(ctx context.Context, id, csrfTokenHash string) error

	// Revoke sets revoked_at = NOW() for the named session. Subsequent
	// Get returns the row with RevokedAt set; Phase 4's Validate maps
	// to 401.
@@ -84,6 +84,14 @@ type ACMEGarbageCollector interface {
	GarbageCollect(ctx context.Context) error
}

// SessionGarbageCollector is the interface the scheduler's sessionGCLoop
// invokes once per CERTCTL_SESSION_GC_INTERVAL tick. Concrete impl is
// *session.Service. Sweeps expired post-login + pre-login session rows
// AND retired-past-retention signing-key rows. Auth Bundle 2 Phase 4.
type SessionGarbageCollector interface {
	GarbageCollect(ctx context.Context) (int, error)
}

// JobReaperService defines the interface for job timeout reaping used by the scheduler.
type JobReaperService interface {
	ReapTimedOutJobs(ctx context.Context, csrTTL, approvalTTL time.Duration) error
@@ -109,6 +117,7 @@ type Scheduler struct {
	cloudDiscoveryService CloudDiscoveryServicer
	crlCacheService       CRLCacheServicer
	acmeGC                ACMEGarbageCollector
	sessionGC             SessionGarbageCollector
	jobReaper             JobReaperService
	logger                *slog.Logger
@@ -127,6 +136,7 @@ type Scheduler struct {
	crlGenerationInterval time.Duration
	jobTimeoutInterval    time.Duration
	acmeGCInterval        time.Duration
	sessionGCInterval     time.Duration
	// agentOfflineJobTTL: per-tick threshold for reaping Running jobs whose
	// owning agent has been silent. Bundle C / Audit M-016. Defaults below.
	agentOfflineJobTTL time.Duration
@@ -148,6 +158,7 @@ type Scheduler struct {
	crlGenerationRunning atomic.Bool
	jobTimeoutRunning    atomic.Bool
	acmeGCRunning        atomic.Bool
	sessionGCRunning     atomic.Bool

	// Graceful shutdown: wait for in-flight work to complete
	wg sync.WaitGroup
@@ -185,6 +196,7 @@ func NewScheduler(
		crlGenerationInterval: 1 * time.Hour,
		jobTimeoutInterval:    10 * time.Minute,
		acmeGCInterval:        1 * time.Minute,
		sessionGCInterval:     1 * time.Hour,
		// 5 minutes is 5×agentHealthCheckInterval default of 1m; an agent
		// must miss multiple heartbeats before its in-flight jobs are reaped.
		agentOfflineJobTTL: 5 * time.Minute,
@@ -317,6 +329,23 @@ func (s *Scheduler) SetACMEGCInterval(d time.Duration) {
	s.acmeGCInterval = d
}

// SetSessionGarbageCollector wires the Auth Bundle 2 Phase 4 session GC
// service. Optional; nil disables the loop (Bundle-2-disabled deployments
// still run pre-Phase-4 behavior).
func (s *Scheduler) SetSessionGarbageCollector(gc SessionGarbageCollector) {
	s.sessionGC = gc
}

// SetSessionGCInterval configures the interval at which the session GC
// sweep runs. Default 1h. Wire: CERTCTL_SESSION_GC_INTERVAL. Zero or
// negative values are ignored.
func (s *Scheduler) SetSessionGCInterval(d time.Duration) {
	if d <= 0 {
		return
	}
	s.sessionGCInterval = d
}

// SetAgentOfflineJobTTL sets the threshold past which a Running job whose
// owning agent has gone silent is reaped to Failed. Bundle C / Audit M-016.
// Zero or negative values are ignored (the default of 5 minutes is kept).
@@ -375,6 +404,9 @@ func (s *Scheduler) Start(ctx context.Context) <-chan struct{} {
	if s.acmeGC != nil {
		loopCount++
	}
	if s.sessionGC != nil {
		loopCount++
	}
	s.wg.Add(loopCount)

	go func() { defer s.wg.Done(); s.renewalCheckLoop(ctx) }()
@@ -403,6 +435,9 @@ func (s *Scheduler) Start(ctx context.Context) <-chan struct{} {
	if s.acmeGC != nil {
		go func() { defer s.wg.Done(); s.acmeGCLoop(ctx) }()
	}
	if s.sessionGC != nil {
		go func() { defer s.wg.Done(); s.sessionGCLoop(ctx) }()
	}

	// Signal that all loops are launched
	close(startedChan)
@@ -1146,3 +1181,40 @@ func (s *Scheduler) acmeGCLoop(ctx context.Context) {
		}
	}
}

// sessionGCLoop runs every sessionGCInterval and invokes
// SessionGarbageCollector.GarbageCollect, which sweeps:
//   - sessions whose absolute_expires_at is in the past (post-login expired);
//   - pre-login session rows older than 10 minutes;
//   - retired-past-retention session_signing_keys rows.
//
// Auth Bundle 2 Phase 4. The atomic.Bool guard + the per-tick
// context.WithTimeout match the pattern of every other loop in this
// file: a stuck Postgres can't block the next tick, and concurrent
// sweeps are skipped not queued.
func (s *Scheduler) sessionGCLoop(ctx context.Context) {
	ticker := time.NewTicker(s.sessionGCInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if !s.sessionGCRunning.CompareAndSwap(false, true) {
				s.logger.Warn("session GC sweep still running, skipping tick")
				continue
			}
			s.wg.Add(1)
			go func() {
				defer s.wg.Done()
				defer s.sessionGCRunning.Store(false)
				opCtx, cancel := context.WithTimeout(ctx, time.Minute)
				defer cancel()
				if _, err := s.sessionGC.GarbageCollect(opCtx); err != nil {
					s.logger.Warn("session gc sweep failed (next tick will retry)", "error", err)
				}
			}()
		}
	}
}
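
The "skipped not queued" behaviour comes from the compare-and-swap on the `atomic.Bool`. A minimal sketch of that guard on its own, with a hypothetical `sweeper` type standing in for the scheduler and no ticker or goroutines involved:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sweeper is a hypothetical stand-in for the scheduler's per-loop guard.
type sweeper struct {
	running atomic.Bool
}

// tryStart reports whether the caller won the right to run a sweep.
// It flips false -> true atomically; a second caller sees true and loses.
func (s *sweeper) tryStart() bool {
	return s.running.CompareAndSwap(false, true)
}

// finish releases the guard so a later tick can start a new sweep.
func (s *sweeper) finish() {
	s.running.Store(false)
}

func main() {
	var s sweeper

	fmt.Println(s.tryStart()) // first tick starts a sweep
	fmt.Println(s.tryStart()) // overlapping tick is skipped, not queued
	s.finish()
	fmt.Println(s.tryStart()) // after completion the next tick runs again
}
```

Because a losing tick simply returns, a slow sweep never builds up a backlog; the cost is that the effective interval stretches while a sweep is in flight.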