auth-bundle-1 fix: migration 000029 role_permissions NULL scope_id

An external tester (operator) hit this bug on their first docker compose up:

  failed to execute migration 000029_rbac.up.sql: pq: null value in
  column "scope_id" of relation "role_permissions" violates
  not-null constraint

# Root cause

The role_permissions table declared scope_id TEXT (nullable) but
also declared

  PRIMARY KEY (role_id, permission_id, scope_type, scope_id)

In Postgres, PRIMARY KEY columns are implicitly NOT NULL, so the
PK constraint silently overrode the column-level nullability.
Every global-scope INSERT (which legitimately has scope_id=NULL,
as the CHECK constraint requires) therefore violated the implicit
NOT NULL.
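
The behaviour can be reproduced in isolation. A minimal sketch,
assuming a scratch Postgres database; pk_null_demo and its columns
are hypothetical names and are not part of migration 000029:

```sql
-- Hypothetical scratch table, not part of the real schema.
CREATE TABLE pk_null_demo (
    a TEXT NOT NULL,
    b TEXT,                -- declared nullable...
    PRIMARY KEY (a, b)     -- ...but PK membership forces NOT NULL on b
);

-- Rejected with:
--   null value in column "b" of relation "pk_null_demo"
--   violates not-null constraint
INSERT INTO pk_null_demo (a, b) VALUES ('x', NULL);
```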

The unit-test suite never exercised the real schema because the
in-memory fakes don't enforce Postgres semantics, and the
Postgres integration tests are skipped under -short. The first
boot against a real postgres:16-alpine caught it.

# Fix

Switch to a synthetic BIGSERIAL primary key + a UNIQUE NULLS NOT
DISTINCT constraint on the natural key
(role_id, permission_id, scope_type, scope_id):

  - BIGSERIAL primary key satisfies Postgres's PK-implies-NOT-NULL.
  - UNIQUE NULLS NOT DISTINCT (Postgres 15+; the project targets
    postgres:16-alpine) treats two NULL scope_ids as colliding,
    which is what the seed's ON CONFLICT (...) DO NOTHING relies
    on to make re-running the migration idempotent.
  - The CHECK ((scope_type='global' AND scope_id IS NULL) OR
    (scope_type IN ('profile','issuer') AND scope_id IS NOT NULL))
    still enforces the per-row invariant.

The ON CONFLICT (col1, col2, ...) clauses in the seed and in
RoleRepository.AddPermission infer the unique index from the
column list, so they still resolve correctly against the new
role_permissions_unique constraint that replaces the composite
primary key. No other changes are needed.
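
How NULLS NOT DISTINCT and ON CONFLICT interact can be checked on
a reduced table. This is a sketch under stated assumptions, not the
migration itself: nnd_demo and nnd_demo_unique are hypothetical
names, the natural key is cut down to two columns, and it needs
Postgres 15 or later:

```sql
-- Hypothetical scratch table mirroring the shape of the fix.
CREATE TABLE nnd_demo (
    id       BIGSERIAL PRIMARY KEY,   -- synthetic key, never NULL
    role_id  TEXT NOT NULL,
    scope_id TEXT,                    -- NULL stands for global scope
    CONSTRAINT nnd_demo_unique
        UNIQUE NULLS NOT DISTINCT (role_id, scope_id)
);

-- First run: the row is inserted.
INSERT INTO nnd_demo (role_id, scope_id) VALUES ('r-admin', NULL)
ON CONFLICT (role_id, scope_id) DO NOTHING;

-- Second run: the two NULL scope_ids collide, so this is a no-op.
INSERT INTO nnd_demo (role_id, scope_id) VALUES ('r-admin', NULL)
ON CONFLICT (role_id, scope_id) DO NOTHING;

SELECT count(*) FROM nnd_demo;  -- 1
```

With a plain UNIQUE constraint (the default, NULLS DISTINCT), the
second insert would succeed and the count would be 2, so re-running
the seed would duplicate every global-scope grant.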

# Verification

After this commit, docker compose up -d --build should boot
clean: postgres becomes healthy, certctl-tls-init exits 0,
certctl-server applies all 33 migrations including 000029,
backfills the 7 default roles + 33-permission catalogue + the
synthetic actor-demo-anon admin grant, and starts serving on
:8443.

  docker compose -f deploy/docker-compose.yml \
    -f deploy/docker-compose.demo.yml down -v
  docker compose -f deploy/docker-compose.yml \
    -f deploy/docker-compose.demo.yml up -d --build
  sleep 15
  curl -sk https://localhost:8443/api/v1/auth/me | jq
  # Expect: actor_id=actor-demo-anon, admin=true, roles=[r-admin]
shankar0123
2026-05-10 00:25:28 +00:00
parent 5313cd8492
commit 45122d7edb
+12 -1
@@ -65,13 +65,24 @@ CREATE TABLE IF NOT EXISTS permissions (
 -- 'global', 'profile', 'issuer'; ScopeID is NULL when global, otherwise
 -- references the resource id (managed at the application layer because
 -- profiles + issuers live in different tables; we don't FK on scope_id).
+-- Bundle 1 fix: PRIMARY KEY columns are implicitly NOT NULL in Postgres,
+-- but global-scope grants legitimately have scope_id=NULL by design
+-- (the CHECK constraint enforces it). The earlier composite PK over
+-- (role_id, permission_id, scope_type, scope_id) tripped on every
+-- global-scope insert because scope_id was NULL. Fix: use a synthetic
+-- BIGSERIAL primary key + UNIQUE NULLS NOT DISTINCT on the natural
+-- key (Postgres 15+; the project's compose targets postgres:16-alpine).
+-- NULLS NOT DISTINCT means two NULL scope_ids collide for uniqueness,
+-- which is what ON CONFLICT (...) DO NOTHING below relies on.
 CREATE TABLE IF NOT EXISTS role_permissions (
+    id BIGSERIAL PRIMARY KEY,
     role_id TEXT NOT NULL REFERENCES roles(id) ON DELETE CASCADE,
     permission_id TEXT NOT NULL REFERENCES permissions(id) ON DELETE RESTRICT,
     scope_type TEXT NOT NULL DEFAULT 'global',
     scope_id TEXT, -- NULL for global
-    PRIMARY KEY (role_id, permission_id, scope_type, scope_id),
+    CONSTRAINT role_permissions_unique
+        UNIQUE NULLS NOT DISTINCT (role_id, permission_id, scope_type, scope_id),
     CONSTRAINT role_permission_scope_check CHECK (
         scope_type IN ('global', 'profile', 'issuer')
     ),