fix(policies): close the D-006 loop — TitleCase seed canonicals + severity-aware, config-consuming rule engine (D-008)

D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:

  (a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
      types ('ownership', 'environment', 'lifetime', 'renewal_window').
      The handler validator rejects these on Create/Update, but raw
      SQL INSERTs bypass the validator entirely. At runtime,
      evaluateRule's switch fell through to the default "unknown
      policy rule type" error branch for every demo rule × every
      cert × every cycle, flooding logs while emitting zero
      violations.

  (b) migrations/seed_demo.sql persisted lowercase severity values
      ('critical', 'error', 'warning') on policy_violations rows.
      INSERT succeeded because that column had no CHECK, but any
      frontend comparing against the canonical PolicySeverity enum
      mis-categorized every seeded violation.

  (c) evaluateRule hardcoded Severity: PolicySeverityWarning on
      every emitted violation and ignored rule.Config entirely, so
      the D-006 per-rule severity column (000013) and every per-arm
      Config JSON ({allowed_issuer_ids, allowed_domains,
      required_keys, allowed, lead_time_days, max_days}) were dead
      data below the evaluation layer.
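
The failure mode in (a) can be sketched as follows. This is a minimal
illustration, not the engine's code: the evaluate function and its
return shape are hypothetical, and only the case labels and the error
text come from this message.

```go
package main

import "fmt"

// evaluate is a hypothetical reduction of a type switch that only
// recognizes TitleCase canonicals. Anything else hits the default arm.
func evaluate(ruleType string) (violations int, err error) {
	switch ruleType {
	case "RequiredMetadata", "AllowedEnvironments", "RenewalLeadTime":
		return 1, nil // pretend the rule matched and emitted one violation
	default:
		return 0, fmt.Errorf("unknown policy rule type %q", ruleType)
	}
}

func main() {
	// The pre-D-008 seeded types: each one yields an error and no violation.
	for _, t := range []string{"ownership", "environment", "lifetime", "renewal_window"} {
		n, err := evaluate(t)
		fmt.Println(n, err)
	}
}
```

Because the error path is not a violation, the rules looked configured
but enforced nothing.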

This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.

## Changes

Domain (internal/domain/policy.go):
  * Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
    Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
    arm — routing it through RenewalLeadTime would conflate "how
    close to expiry before we renew" with "how long can the cert
    possibly be", two distinct semantics. The new type accepts
    config {"max_days": int} and flags certs whose
    NotAfter - NotBefore exceeds the cap.

Handler validator (internal/api/handler/validation.go):
  * ValidatePolicyType allowlist grown to 6 canonicals
    (AllowedIssuers, AllowedDomains, RequiredMetadata,
    AllowedEnvironments, RenewalLeadTime, CertificateLifetime).
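
A case-sensitive allowlist of this kind could look like the sketch
below. validatePolicyType is a hypothetical stand-in for
ValidatePolicyType, whose real signature and error text are not shown
in this commit.

```go
package main

import "fmt"

// canonicalPolicyTypes holds the six TitleCase canonicals listed above.
var canonicalPolicyTypes = map[string]struct{}{
	"AllowedIssuers":      {},
	"AllowedDomains":      {},
	"RequiredMetadata":    {},
	"AllowedEnvironments": {},
	"RenewalLeadTime":     {},
	"CertificateLifetime": {},
}

// validatePolicyType is a hypothetical stand-in for ValidatePolicyType.
// The lookup is deliberately case-sensitive so casing drift is rejected.
func validatePolicyType(t string) error {
	if _, ok := canonicalPolicyTypes[t]; !ok {
		return fmt.Errorf("invalid policy type %q", t)
	}
	return nil
}

func main() {
	fmt.Println(validatePolicyType("CertificateLifetime")) // <nil>
	fmt.Println(validatePolicyType("lifetime"))            // invalid policy type "lifetime"
}
```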

OpenAPI (api/openapi.yaml):
  * PolicyType enum grown to match domain.

Frontend (web/src/api/types.ts, types.test.ts):
  * POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
    all 6 canonicals and rejects casing drift.

Migration 000014 (policy_violations severity CHECK):
  * Named CHECK constraint (policy_violations_severity_check)
    mirroring 000013's allowlist, defense-in-depth at the DB layer
    against future drift from bypassed writes (migrations, psql
    sessions, future callers). Symmetric down migration drops by
    name.

Seed data:
  * migrations/seed.sql rewritten to emit TitleCase canonicals with
    per-arm config JSON that actually exercises the config-consuming
    paths (not the missing-field backstops):
      - pr-require-owner         → RequiredMetadata     {"required_keys":["owner"]}                        Warning
      - pr-allowed-environments  → AllowedEnvironments  {"allowed":["production","staging","development"]} Error
      - pr-max-certificate-lifetime → CertificateLifetime {"max_days":90}                                   Critical
      - pr-min-renewal-window    → RenewalLeadTime      {"lead_time_days":14}                              Warning
    Severities are now differentiated per rule (D-006 intent).
  * migrations/seed_demo.sql violation rows flipped to TitleCase
    severity ('Critical', 'Error', 'Warning') so migration 000014
    applies cleanly on upgrade paths.

Engine rewrite (internal/service/policy.go):
  * evaluateRule rewritten. All six arms now:
      1. Parse rule.Config into the per-arm typed struct.
      2. Bad JSON → log at the ValidateCertificate boundary and
         skip only this rule, so one malformed config cannot
         suppress the other rules in the same batch.
      3. Empty/null Config → emit the pre-D-008 missing-field
         violation (backwards compat invariant — operators who
         haven't reconfigured still see the same output).
      4. Violations emitted carry rule.Severity (no more hardcoded
         Warning); D-006 column is now load-bearing.
  * CertificateLifetime arm reads NotBefore/NotAfter from the
    certificate's latest version via CertRepo. Injected via
    PolicyService.SetCertRepo() setter — avoids churning ~36
    NewPolicyService call sites while keeping the lifetime arm
    optional (degrades to a log+skip if the setter is not wired).

Server wiring (cmd/server/main.go):
  * policyService.SetCertRepo(certRepo) wired after construction.

Tests (internal/service/policy_test.go):
  * 25 new subtests across 5 groups:
      - TestEvaluateRule_SeverityPassThrough (6): every rule type
        emits violations carrying rule.Severity, not hardcoded.
      - TestEvaluateRule_ConfigConsumed (12): every per-arm Config
        path exercised positive + negative.
      - TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
        Config still emits pre-D-008 missing-field violations.
      - TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
        and skips cleanly without poisoning neighbors.
      - TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
        ok when repo wired, log+skip when not, handles missing
        NotBefore/NotAfter edges.

Provenance: D-008 surfaced during D-005/D-006 remediation review
in 7a0ea35. That commit added persistence and CI pins for the
severity field but did not re-verify that the evaluation layer
consumed it; this finding and fix close that audit-process gap.
Shankar
2026-04-18 14:55:56 +00:00
parent 7a0ea35b97
commit dfa9faa426
12 changed files with 823 additions and 75 deletions
@@ -0,0 +1,9 @@
-- Rollback migration 000014: drop the policy_violations severity CHECK.
--
-- Drops the named CHECK constraint added by the up migration. The severity
-- column itself stays (it predates this migration — see 000001 line 183),
-- so any application code that reads/writes the column continues to work.
-- Only the DB-level enforcement of the TitleCase allowlist is removed.
ALTER TABLE policy_violations
DROP CONSTRAINT IF EXISTS policy_violations_severity_check;
@@ -0,0 +1,29 @@
-- Migration 000014: CHECK constraint on policy_violations.severity
--
-- Sibling to migration 000013, which added severity + CHECK to policy_rules.
-- policy_violations has carried a severity column since the initial schema
-- (000001, line 183) but without any CHECK. The engine used to hardcode
-- `Warning` on every violation regardless of the triggering rule's severity
-- (see pre-D-008 internal/service/policy.go:evaluateRule), so the column
-- value was uniform by accident of implementation, not by constraint.
--
-- D-008 rewrites evaluateRule to copy rule.Severity into the violation. The
-- engine now writes values drawn from the application-layer PolicySeverity
-- allowlist, but nothing at the DB level prevents a future caller — or a
-- bypassed write from a migration or psql session — from inserting casing
-- drift ('warning', 'ERROR', etc.) and re-opening the same class of bug
-- that D-005 and D-006 closed. This constraint is the defense-in-depth
-- complement to the handler validator.
--
-- Pre-existing seed_demo.sql rows use lowercase severity values. D-008
-- updates those in the same commit so this migration can apply cleanly
-- against both a fresh install and an upgraded install that has already
-- seeded the demo data.
--
-- Named constraint (policy_violations_severity_check) so the down migration
-- can DROP it by name without ambiguity; un-named CHECK constraints use
-- a synthesized PostgreSQL name that varies by environment.
ALTER TABLE policy_violations
ADD CONSTRAINT policy_violations_severity_check
CHECK (severity IN ('Warning', 'Error', 'Critical'));
+20 -14
@@ -12,19 +12,25 @@ VALUES (
'[30, 14, 7, 0]'::jsonb
) ON CONFLICT (id) DO NOTHING;
- -- Policy rules: Require owner assignment
- -- Severity differentiated per rule to demonstrate the field means something
- -- (D-006). The backend CHECK constraint (migration 000013) enforces the
- -- TitleCase allowlist Warning/Error/Critical. Type-value drift
- -- (ownership/environment/lifetime/renewal_window vs. the engine's TitleCase
- -- canonicals) is tracked separately in d/D-008 and intentionally left
- -- unchanged in this commit.
+ -- Policy rules: Require owner assignment, bound environments, cap lifetime,
+ -- and enforce a renewal lead-time.
+ --
+ -- Severity is differentiated per rule (D-006) and the types are now the
+ -- TitleCase canonicals the engine actually recognizes (D-008). Pre-D-008 the
+ -- types were lowercase strings (`ownership`, `environment`, `lifetime`,
+ -- `renewal_window`) that the engine silently dropped through to its
+ -- default-case error path — the rules looked alive in the GUI but did not
+ -- enforce anything. The backend CHECK constraint (migration 000013) enforces
+ -- the TitleCase severity allowlist Warning/Error/Critical. Configs are also
+ -- reshaped to match the D-008 per-arm schemas so the rules actually exercise
+ -- the config-consuming paths instead of falling back to the missing-field
+ -- placeholders.
INSERT INTO policy_rules (id, name, type, config, enabled, severity)
VALUES (
'pr-require-owner',
'require-owner',
- 'ownership',
- '{"requirement": "owner_id must be set"}'::jsonb,
+ 'RequiredMetadata',
+ '{"required_keys": ["owner"]}'::jsonb,
true,
'Warning'
) ON CONFLICT (id) DO NOTHING;
@@ -34,7 +40,7 @@ INSERT INTO policy_rules (id, name, type, config, enabled, severity)
VALUES (
'pr-allowed-environments',
'allowed-environments',
- 'environment',
+ 'AllowedEnvironments',
'{"allowed": ["production", "staging", "development"]}'::jsonb,
true,
'Error'
@@ -45,19 +51,19 @@ INSERT INTO policy_rules (id, name, type, config, enabled, severity)
VALUES (
'pr-max-certificate-lifetime',
'max-certificate-lifetime',
- 'lifetime',
+ 'CertificateLifetime',
'{"max_days": 90}'::jsonb,
true,
'Critical'
) ON CONFLICT (id) DO NOTHING;
- -- Policy rules: Minimum renewal window
+ -- Policy rules: Minimum renewal window (renew at least 14 days before expiry)
INSERT INTO policy_rules (id, name, type, config, enabled, severity)
VALUES (
'pr-min-renewal-window',
'min-renewal-window',
- 'renewal_window',
- '{"min_days": 14}'::jsonb,
+ 'RenewalLeadTime',
+ '{"lead_time_days": 14}'::jsonb,
true,
'Warning'
) ON CONFLICT (id) DO NOTHING;
+13 -6
@@ -478,13 +478,20 @@ ON CONFLICT (id) DO NOTHING;
-- ============================================================
-- 13. Policy Violations
-- ============================================================
+ -- D-008: severity values rewritten to TitleCase canonicals (Warning/Error/Critical).
+ -- Pre-D-008 these rows used lowercase strings ('critical', 'error', 'warning'). Those
+ -- values were silently tolerated by the pre-D-008 engine, which hardcoded 'Warning'
+ -- on every new violation regardless of the triggering rule's severity. D-008 rewires
+ -- evaluateRule to copy rule.Severity into the violation AND migration 000014 adds a
+ -- CHECK constraint enforcing the TitleCase allowlist at the DB level. Both paths now
+ -- round-trip correctly against these demo rows.
INSERT INTO policy_violations (id, certificate_id, rule_id, message, severity, created_at) VALUES
- ('pv-001', 'mc-legacy-prod', 'pr-max-certificate-lifetime', 'Certificate has expired and exceeds maximum lifetime policy', 'critical', NOW() - INTERVAL '3 days'),
- ('pv-002', 'mc-old-api', 'pr-max-certificate-lifetime', 'Certificate expired 15 days ago', 'critical', NOW() - INTERVAL '15 days'),
- ('pv-003', 'mc-vpn-prod', 'pr-min-renewal-window', 'Renewal failed within minimum renewal window', 'error', NOW() - INTERVAL '3 days'),
- ('pv-004', 'mc-mail-prod', 'pr-min-renewal-window', 'Certificate expiring in 5 days, below 14-day minimum window','warning', NOW() - INTERVAL '20 minutes'),
- ('pv-005', 'mc-wiki-prod', 'pr-max-certificate-lifetime', 'Certificate expired 7 days ago', 'critical', NOW() - INTERVAL '7 days'),
- ('pv-006', 'mc-compromised', 'pr-min-renewal-window', 'Certificate revoked due to key compromise', 'critical', NOW() - INTERVAL '14 days')
+ ('pv-001', 'mc-legacy-prod', 'pr-max-certificate-lifetime', 'Certificate has expired and exceeds maximum lifetime policy', 'Critical', NOW() - INTERVAL '3 days'),
+ ('pv-002', 'mc-old-api', 'pr-max-certificate-lifetime', 'Certificate expired 15 days ago', 'Critical', NOW() - INTERVAL '15 days'),
+ ('pv-003', 'mc-vpn-prod', 'pr-min-renewal-window', 'Renewal failed within minimum renewal window', 'Error', NOW() - INTERVAL '3 days'),
+ ('pv-004', 'mc-mail-prod', 'pr-min-renewal-window', 'Certificate expiring in 5 days, below 14-day minimum window','Warning', NOW() - INTERVAL '20 minutes'),
+ ('pv-005', 'mc-wiki-prod', 'pr-max-certificate-lifetime', 'Certificate expired 7 days ago', 'Critical', NOW() - INTERVAL '7 days'),
+ ('pv-006', 'mc-compromised', 'pr-min-renewal-window', 'Certificate revoked due to key compromise', 'Critical', NOW() - INTERVAL '14 days')
ON CONFLICT (id) DO NOTHING;
-- ============================================================