Files
certctl/internal
shankar0123 e772ff0d4d fix(repo/job): split UNION ALL + FOR UPDATE into two queries (Postgres-correctness)
Phase-9 docker compose smoke surfaced a latent production-breaking
bug introduced by commit 0a75a30 (H-6 atomic pending-job claim). The
ClaimPendingByAgentID query in internal/repository/postgres/job.go
combined UNION ALL with FOR UPDATE SKIP LOCKED in a single statement.
Postgres rejects this with:

  ERROR: FOR UPDATE is not allowed with UNION/INTERSECT/EXCEPT

Every agent work-poll returns HTTP 500 in any real deployment where
an agent is actually polling. From the compose log:

  request_id=6da47015-... GET /api/v1/agents/agent-demo-1/work
  status=500 duration_ms=2

The schema-per-test unit harness in internal/repository/postgres/
*_test.go never inserted jobs and polled, so the SQL execution path
was never exercised. The bug has been latent in master since 0a75a30
landed.

Fix: split the UNION ALL into two separate FOR UPDATE SKIP LOCKED
queries within the existing transaction. The H-6 atomicity invariant
(concurrent pollers never see the same Pending row) is preserved
because:

  1. The two queries run inside the same transaction (tx).
  2. Each query independently locks its result rows with
     FOR UPDATE SKIP LOCKED.
  3. The subsequent UPDATE that flips Pending -> Running runs in
     the same transaction, so the rows stay invisible to concurrent
     callers from initial SELECT through final COMMIT.
  4. The transaction is the unit of consistency, not the single
     SQL statement.

Two queries:
  - Branch 1 (direct): jobs.agent_id =  + status='Pending' +
    type='Deployment'. ORDER BY created_at ASC, FOR UPDATE SKIP LOCKED.
  - Branch 2 (fallback): jobs.agent_id IS NULL + INNER JOIN
    deployment_targets dt ON jobs.target_id = dt.id WHERE
    dt.agent_id = . ORDER BY j.created_at ASC, FOR UPDATE OF j
    SKIP LOCKED (FOR UPDATE OF needed because the join brings in dt).

Branch 3 (AwaitingCSR) is unchanged — already a single SELECT,
not affected by the UNION restriction.

Inline comment explains the fix's load-bearing-ness so a future
refactor doesn't merge them back into one UNION query.

Verify (sandbox): go vet clean; go test -short -count=1 PASS on
internal/repository/postgres/. Workstation re-runs 'docker compose
up' to confirm the agent's GET /work returns 200 with the next
pending-deployment claim.

Note: this is NOT a regression introduced by Auth Bundle 2 or the
2026-05-11 audit fixes; it's a pre-existing latent defect from H-6.
Including in v2.1.0 because shipping with a broken agent work-poll
would block the demo path on day one of release.
2026-05-11 16:11:33 +00:00
..
2026-05-05 18:18:29 +00:00
2026-05-05 18:18:29 +00:00
2026-05-05 18:18:29 +00:00