mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-11 19:28:51 +00:00
fix(policies): close the D-006 loop — TitleCase seed canonicals + severity-aware, config-consuming rule engine (D-008)
D-008 was a three-part drift in the policy engine that made the
D-005/D-006 remediation cosmetic below the DB layer:
(a) migrations/seed.sql INSERTed rules with pre-D-005 lowercase
types ('ownership', 'environment', 'lifetime', 'renewal_window')
that the handler validator rejects on Create/Update but that
raw SQL INSERTs bypassed entirely. At runtime evaluateRule's
switch fell through to the default "unknown policy rule type"
error branch on every demo rule × every cert × every cycle,
flooding logs while emitting zero violations.
(b) migrations/seed_demo.sql persisted lowercase severity values
('critical', 'error', 'warning') on policy_violations rows.
INSERT succeeded because that column had no CHECK, but any
frontend comparing against the canonical PolicySeverity enum
mis-categorized every seeded violation.
(c) evaluateRule hardcoded Severity: PolicySeverityWarning on
every emitted violation and ignored rule.Config entirely —
so the D-006 per-rule severity column (000013) and every
per-arm Config JSON ({allowed_issuer_ids, allowed_domains,
required_keys, allowed, lead_time_days, max_days}) was dead
data below the evaluation layer.
This commit lands (a)+(b)+(c) atomically. Shipping any subset
leaves the feature half-working.
## Changes
Domain (internal/domain/policy.go):
* Add PolicyTypeCertificateLifetime as the 6th TitleCase canonical.
Pre-D-008 the seeded "max-certificate-lifetime" rule had no engine
arm — routing it through RenewalLeadTime would conflate "how
close to expiry before we renew" with "how long can the cert
possibly be", two distinct semantics. The new type accepts
config {"max_days": int} and flags certs whose
NotAfter - NotBefore exceeds the cap.
Handler validator (internal/api/handler/validation.go):
* ValidatePolicyType allowlist grown to 6 canonicals
(AllowedIssuers, AllowedDomains, RequiredMetadata,
AllowedEnvironments, RenewalLeadTime, CertificateLifetime).
OpenAPI (api/openapi.yaml):
* PolicyType enum grown to match domain.
Frontend (web/src/api/types.ts, types.test.ts):
* POLICY_TYPES tuple gains CertificateLifetime; pin test asserts
all 6 canonicals and rejects casing drift.
Migration 000014 (policy_violations severity CHECK):
* Named CHECK constraint (policy_violations_severity_check)
mirroring 000013's allowlist, defense-in-depth at the DB layer
against future drift from bypassed writes (migrations, psql
sessions, future callers). Symmetric down migration drops by
name.
Seed data:
* migrations/seed.sql rewritten to emit TitleCase canonicals with
per-arm config JSON that actually exercises the config-consuming
paths (not the missing-field backstops):
- pr-require-owner → RequiredMetadata {"required_keys":["owner"]} Warning
- pr-allowed-environments → AllowedEnvironments {"allowed":["production","staging","development"]} Error
- pr-max-certificate-lifetime → CertificateLifetime {"max_days":90} Critical
- pr-min-renewal-window → RenewalLeadTime {"lead_time_days":14} Warning
Severities are now differentiated per rule (D-006 intent).
* migrations/seed_demo.sql violation rows flipped to TitleCase
severity ('Critical', 'Error', 'Warning') so migration 000014
applies cleanly on upgrade paths.
Engine rewrite (internal/service/policy.go):
* evaluateRule rewritten. All six arms now:
1. Parse rule.Config into the per-arm typed struct.
2. Bad JSON → log at ValidateCertificate boundary and skip
this rule (no co-located poisoning of other rules in the
same batch).
3. Empty/null Config → emit the pre-D-008 missing-field
violation (backwards compat invariant — operators who
haven't reconfigured still see the same output).
4. Violations emitted carry rule.Severity (no more hardcoded
Warning); D-006 column is now load-bearing.
* CertificateLifetime arm reads NotBefore/NotAfter from the
certificate's latest version via CertRepo. Injected via
PolicyService.SetCertRepo() setter — avoids churning ~36
NewPolicyService call sites while keeping the lifetime arm
optional (degrades to a log+skip if the setter is not wired).
Server wiring (cmd/server/main.go):
* policyService.SetCertRepo(certRepo) wired after construction.
Tests (internal/service/policy_test.go):
* 25 new subtests across 5 groups:
- TestEvaluateRule_SeverityPassThrough (6): every rule type
emits violations carrying rule.Severity, not hardcoded.
- TestEvaluateRule_ConfigConsumed (12): every per-arm Config
path exercised positive + negative.
- TestEvaluateRule_EmptyConfig_BackCompat (3): empty/null
Config still emits pre-D-008 missing-field violations.
- TestEvaluateRule_BadConfig_SkipsRule: malformed JSON logs
and skips cleanly without poisoning neighbors.
- TestEvaluateRule_CertificateLifetime_RepoScenarios (3):
ok when repo wired, log+skip when not, handles missing
NotBefore/NotAfter edges.
Provenance: D-008 surfaced during D-005/D-006 remediation review
in 7a0ea35. That commit added persistence and CI pins for the
severity field but did not re-verify the evaluation layer
consumed it; this finding and fix close the audit-process gap.
This commit is contained in:
+206
-49
@@ -2,8 +2,10 @@ package service
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log/slog"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/shankar0123/certctl/internal/domain"
|
||||
@@ -14,6 +16,11 @@ import (
|
||||
type PolicyService struct {
|
||||
policyRepo repository.PolicyRepository
|
||||
auditService *AuditService
|
||||
// certRepo is optional and only required by the CertificateLifetime rule
|
||||
// arm, which must read NotBefore/NotAfter from the latest CertificateVersion.
|
||||
// Wire via SetCertRepo after construction; rules other than
|
||||
// CertificateLifetime operate without it.
|
||||
certRepo repository.CertificateRepository
|
||||
}
|
||||
|
||||
// NewPolicyService creates a new policy service.
|
||||
@@ -27,6 +34,16 @@ func NewPolicyService(
|
||||
}
|
||||
}
|
||||
|
||||
// SetCertRepo wires the certificate repository needed for the CertificateLifetime
|
||||
// rule arm. Kept as a setter (not a constructor parameter) so the ~36 existing
|
||||
// NewPolicyService call sites don't churn for a single new arm's dependency.
|
||||
// Safe to call before or after construction; evaluateRule checks for nil and
|
||||
// returns an error if a CertificateLifetime rule fires without a wired repo
|
||||
// (the caller at ValidateCertificate logs and continues).
|
||||
func (s *PolicyService) SetCertRepo(r repository.CertificateRepository) {
|
||||
s.certRepo = r
|
||||
}
|
||||
|
||||
// ValidateCertificate runs all enabled policy rules against a certificate.
|
||||
func (s *PolicyService) ValidateCertificate(ctx context.Context, cert *domain.ManagedCertificate) ([]*domain.PolicyViolation, error) {
|
||||
rules, err := s.policyRepo.ListRules(ctx)
|
||||
@@ -43,7 +60,7 @@ func (s *PolicyService) ValidateCertificate(ctx context.Context, cert *domain.Ma
|
||||
}
|
||||
|
||||
// Evaluate rule against certificate
|
||||
v, err := s.evaluateRule(rule, cert)
|
||||
v, err := s.evaluateRule(ctx, rule, cert)
|
||||
if err != nil {
|
||||
slog.Error("failed to evaluate rule", "rule_id", rule.ID, "error", err)
|
||||
continue
|
||||
@@ -58,73 +75,163 @@ func (s *PolicyService) ValidateCertificate(ctx context.Context, cert *domain.Ma
|
||||
}
|
||||
|
||||
// evaluateRule checks if a certificate violates a single policy rule.
|
||||
func (s *PolicyService) evaluateRule(rule *domain.PolicyRule, cert *domain.ManagedCertificate) (*domain.PolicyViolation, error) {
|
||||
//
|
||||
// D-008 closes the engine loop by:
|
||||
// 1. Consuming rule.Severity on every violation (the pre-D-008 engine
|
||||
// hardcoded PolicySeverityWarning, which silently defeated the D-006
|
||||
// per-rule severity column).
|
||||
// 2. Parsing rule.Config per-arm so rules carry real thresholds / allowlists
|
||||
// instead of the pre-D-008 "metadata absent" placeholders. Empty/null
|
||||
// Config preserves the pre-D-008 missing-field behavior as a
|
||||
// backward-compat invariant — a rule without config still fires on the
|
||||
// absent-field shape but using its configured severity.
|
||||
// 3. Adding the CertificateLifetime arm, which reads NotBefore/NotAfter from
|
||||
// the latest CertificateVersion (injected via SetCertRepo). Required
|
||||
// because ManagedCertificate tracks ExpiresAt but not issuance date.
|
||||
//
|
||||
// Bad-config failure mode: json.Unmarshal error returns (nil, error) shaped
|
||||
// as `invalid config for rule <id> (type=<type>): <err>`; the caller at
|
||||
// ValidateCertificate logs and continues so one malformed rule doesn't fail
|
||||
// the entire pass.
|
||||
func (s *PolicyService) evaluateRule(ctx context.Context, rule *domain.PolicyRule, cert *domain.ManagedCertificate) (*domain.PolicyViolation, error) {
|
||||
switch rule.Type {
|
||||
case domain.PolicyTypeAllowedIssuers:
|
||||
// Restrict to specific issuers
|
||||
// Note: In a production implementation, we would parse rule.Config to extract parameters
|
||||
// Config: {"allowed_issuer_ids": ["iss-a", "iss-b"]}
|
||||
// Empty config = fire only on absent IssuerID (backward-compat).
|
||||
var cfg struct {
|
||||
AllowedIssuerIDs []string `json:"allowed_issuer_ids"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
if cert.IssuerID == "" {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: domain.PolicySeverityWarning,
|
||||
Message: "certificate has no issuer assigned",
|
||||
CreatedAt: time.Now(),
|
||||
}, nil
|
||||
return s.violation(rule, cert, "certificate has no issuer assigned"), nil
|
||||
}
|
||||
if len(cfg.AllowedIssuerIDs) > 0 && !containsString(cfg.AllowedIssuerIDs, cert.IssuerID) {
|
||||
return s.violation(rule, cert, fmt.Sprintf("issuer %q is not in the allowed list", cert.IssuerID)), nil
|
||||
}
|
||||
|
||||
case domain.PolicyTypeAllowedDomains:
|
||||
// Ensure certificate domains are in allowed list
|
||||
// Config: {"allowed_domains": ["example.com", "*.internal.example.com"]}
|
||||
// Wildcards are literal prefix matches (*.foo matches anything ending
|
||||
// in .foo). Empty config = fire only on zero SANs (backward-compat).
|
||||
var cfg struct {
|
||||
AllowedDomains []string `json:"allowed_domains"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
if len(cert.SANs) == 0 {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: domain.PolicySeverityWarning,
|
||||
Message: "certificate has no subject alternative names",
|
||||
CreatedAt: time.Now(),
|
||||
}, nil
|
||||
return s.violation(rule, cert, "certificate has no subject alternative names"), nil
|
||||
}
|
||||
if len(cfg.AllowedDomains) > 0 {
|
||||
for _, san := range cert.SANs {
|
||||
if !domainAllowed(san, cfg.AllowedDomains) {
|
||||
return s.violation(rule, cert, fmt.Sprintf("SAN %q is not in the allowed domain list", san)), nil
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
case domain.PolicyTypeRequiredMetadata:
|
||||
// Ensure certificate has required metadata/tags
|
||||
// Config: {"required_keys": ["owner", "cost-center"]}
|
||||
// Empty config = fire only on zero tags (backward-compat).
|
||||
var cfg struct {
|
||||
RequiredKeys []string `json:"required_keys"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
if len(cert.Tags) == 0 {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: domain.PolicySeverityWarning,
|
||||
Message: "certificate has no tags or metadata",
|
||||
CreatedAt: time.Now(),
|
||||
}, nil
|
||||
return s.violation(rule, cert, "certificate has no tags or metadata"), nil
|
||||
}
|
||||
for _, key := range cfg.RequiredKeys {
|
||||
if _, ok := cert.Tags[key]; !ok {
|
||||
return s.violation(rule, cert, fmt.Sprintf("certificate is missing required metadata key %q", key)), nil
|
||||
}
|
||||
}
|
||||
|
||||
case domain.PolicyTypeAllowedEnvironments:
|
||||
// Restrict to specific environments
|
||||
// Config: {"allowed": ["prod", "staging"]}
|
||||
// Empty config = fire only on empty Environment (backward-compat).
|
||||
var cfg struct {
|
||||
Allowed []string `json:"allowed"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
if cert.Environment == "" {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: domain.PolicySeverityWarning,
|
||||
Message: "certificate has no environment assigned",
|
||||
CreatedAt: time.Now(),
|
||||
}, nil
|
||||
return s.violation(rule, cert, "certificate has no environment assigned"), nil
|
||||
}
|
||||
if len(cfg.Allowed) > 0 && !containsString(cfg.Allowed, cert.Environment) {
|
||||
return s.violation(rule, cert, fmt.Sprintf("environment %q is not in the allowed list", cert.Environment)), nil
|
||||
}
|
||||
|
||||
case domain.PolicyTypeRenewalLeadTime:
|
||||
// Ensure renewal begins before certificate expires
|
||||
// Config: {"lead_time_days": 30}
|
||||
// Fires when remaining validity drops below lead_time_days and the
|
||||
// cert is not already expired. Empty/zero config falls back to the
|
||||
// pre-D-008 hardcoded 30-day threshold for backward compatibility.
|
||||
var cfg struct {
|
||||
LeadTimeDays int `json:"lead_time_days"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
leadDays := cfg.LeadTimeDays
|
||||
if leadDays <= 0 {
|
||||
leadDays = 30
|
||||
}
|
||||
daysUntilExpiry := time.Until(cert.ExpiresAt).Hours() / 24
|
||||
if daysUntilExpiry < 30 && daysUntilExpiry > 0 {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: domain.PolicySeverityWarning,
|
||||
Message: fmt.Sprintf("certificate expires in %.1f days, plan renewal soon", daysUntilExpiry),
|
||||
CreatedAt: time.Now(),
|
||||
}, nil
|
||||
if daysUntilExpiry < float64(leadDays) && daysUntilExpiry > 0 {
|
||||
return s.violation(rule, cert, fmt.Sprintf("certificate expires in %.1f days, plan renewal soon (policy lead time: %d days)", daysUntilExpiry, leadDays)), nil
|
||||
}
|
||||
|
||||
case domain.PolicyTypeCertificateLifetime:
|
||||
// Config: {"max_days": 397}
|
||||
// Reads NotBefore/NotAfter from the latest CertificateVersion via the
|
||||
// injected certRepo. ManagedCertificate exposes ExpiresAt but not the
|
||||
// issuance date, so lifetime math requires the version record.
|
||||
//
|
||||
// If certRepo wasn't wired (test misconfiguration / early boot),
|
||||
// returns an error so the caller logs it — better a loud failure
|
||||
// than silently ignoring the rule. If GetLatestVersion errors (e.g.,
|
||||
// the cert hasn't been issued yet), we skip the check — a cert with
|
||||
// no version has no lifetime to measure, matching the missing-field
|
||||
// backward-compat pattern used by the other arms.
|
||||
if s.certRepo == nil {
|
||||
return nil, fmt.Errorf("CertificateLifetime rule %s requires cert repository (not wired via SetCertRepo)", rule.ID)
|
||||
}
|
||||
var cfg struct {
|
||||
MaxDays int `json:"max_days"`
|
||||
}
|
||||
if len(rule.Config) > 0 {
|
||||
if err := json.Unmarshal(rule.Config, &cfg); err != nil {
|
||||
return nil, fmt.Errorf("invalid config for rule %s (type=%s): %w", rule.ID, rule.Type, err)
|
||||
}
|
||||
}
|
||||
if cfg.MaxDays <= 0 {
|
||||
// No threshold configured — nothing meaningful to enforce.
|
||||
return nil, nil
|
||||
}
|
||||
version, err := s.certRepo.GetLatestVersion(ctx, cert.ID)
|
||||
if err != nil {
|
||||
// No version yet — nothing to measure. Not an engine error;
|
||||
// the cert simply hasn't been issued.
|
||||
return nil, nil
|
||||
}
|
||||
lifetimeDays := version.NotAfter.Sub(version.NotBefore).Hours() / 24
|
||||
if lifetimeDays > float64(cfg.MaxDays) {
|
||||
return s.violation(rule, cert, fmt.Sprintf("certificate lifetime is %.1f days, exceeds policy max of %d days", lifetimeDays, cfg.MaxDays)), nil
|
||||
}
|
||||
|
||||
default:
|
||||
@@ -134,6 +241,56 @@ func (s *PolicyService) evaluateRule(rule *domain.PolicyRule, cert *domain.Manag
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
// violation constructs a PolicyViolation carrying the rule's configured
|
||||
// severity. Centralizing the build eliminates the pre-D-008 bug where each
|
||||
// arm independently stamped PolicySeverityWarning on its violation.
|
||||
func (s *PolicyService) violation(rule *domain.PolicyRule, cert *domain.ManagedCertificate, message string) *domain.PolicyViolation {
|
||||
return &domain.PolicyViolation{
|
||||
ID: generateID("violation"),
|
||||
RuleID: rule.ID,
|
||||
CertificateID: cert.ID,
|
||||
Severity: rule.Severity,
|
||||
Message: message,
|
||||
CreatedAt: time.Now(),
|
||||
}
|
||||
}
|
||||
|
||||
// containsString reports whether needle is present in haystack.
|
||||
func containsString(haystack []string, needle string) bool {
|
||||
for _, s := range haystack {
|
||||
if s == needle {
|
||||
return true
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// domainAllowed reports whether a SAN (hostname) matches any of the allowed
|
||||
// domain patterns. Patterns may be exact matches or `*.example.com` wildcards
|
||||
// (the wildcard consumes a single label: `*.foo.com` matches `bar.foo.com`
|
||||
// but not `baz.bar.foo.com`, mirroring X.509 SAN wildcard semantics).
|
||||
func domainAllowed(san string, allowed []string) bool {
|
||||
san = strings.ToLower(strings.TrimSpace(san))
|
||||
for _, pattern := range allowed {
|
||||
pattern = strings.ToLower(strings.TrimSpace(pattern))
|
||||
if pattern == san {
|
||||
return true
|
||||
}
|
||||
if strings.HasPrefix(pattern, "*.") {
|
||||
suffix := pattern[1:] // ".foo.com"
|
||||
if strings.HasSuffix(san, suffix) {
|
||||
// Ensure wildcard consumes exactly one label — reject
|
||||
// sub-subdomains.
|
||||
head := strings.TrimSuffix(san, suffix)
|
||||
if head != "" && !strings.Contains(head, ".") {
|
||||
return true
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
// CreateRule stores a new policy rule.
|
||||
func (s *PolicyService) CreateRule(ctx context.Context, rule *domain.PolicyRule, actor string) error {
|
||||
if rule.ID == "" {
|
||||
|
||||
Reference in New Issue
Block a user