Closes#7. The issuer create/update handlers swallowed all service errors
as generic 500s. Now differentiates: 409 for UNIQUE constraint violations,
400 for unsupported issuer type, 404 for not-found on update, 500 for
unknown errors. Adds structured error logging via slog.
OnboardingWizard now pre-populates config field defaults when a type is
selected (matching IssuersPage behavior), preventing empty required fields
from causing silent failures.
install-agent.sh hardened for curl|bash usage: --agent-id flag, =value
syntax, /dev/tty stdin reopening, proper stderr routing in download_binary,
non-interactive install examples in help text, and updated wizard commands.
Adds adversarial security tests for EST, path traversal, and query
injection handlers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SA1029: use typed context key instead of string in main_test.go
S1039: remove unnecessary fmt.Sprintf in validation_test.go
SA4023: fix unreachable nil check on concrete error type
SA4006: fix unused variable assignments in stepca_test.go (4 occurrences)
SA4000: fix duplicate expression in ssh_test.go (BEGIN vs END CERTIFICATE)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.
All tests pass: go test -short, go vet, go test -race clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Full audit of all ~100 backend API endpoints against frontend client functions
and TypeScript interfaces. Fixes field name mismatches, missing client functions,
phantom interface fields, type coercion for Go bool/int config fields, and
issuer type ID alignment with backend domain constants.
Backend:
- issuer.go/target.go: GUI-created entities default enabled=true (Go bool
zero value was overriding DB DEFAULT)
Frontend types (types.ts):
- Certificate: fingerprint→fingerprint_sha256, phantom fields made optional
- CertificateVersion: fingerprint→fingerprint_sha256, chain_pem→pem_chain,
removed phantom version/cert_pem fields
- Job: error_message→last_error (matches Go json tag)
Frontend client (client.ts):
- Added getNotification(id) and getAuditEvent(id) for existing backend routes
Frontend pages:
- CertificateDetailPage: derives serial/fingerprint/issuedAt from latest
CertificateVersion instead of empty Certificate fields
- JobsPage/JobDetailPage: error_message→last_error
- TargetsPage: reload_cmd→reload_command, validate_cmd→validate_command,
added missing config fields per backend structs (validate_command for
NGINX/Apache, hostname/winrm_timeout for IIS, private_key/passphrase/
cert_mode/key_mode for SSH, winrm_https/winrm_insecure for WinCertStore,
create_keystore for JavaKeystore, mode for Dovecot), type coercion via
buildConfigPayload() with BOOL_FIELDS/INT_FIELDS sets, IIS WinRM nesting
- TargetDetailPage: added passphrase to sensitiveKeys redaction
- issuerTypes.ts: type IDs aligned to backend constants (acme→ACME,
local→GenericCA, stepca→StepCA, openssl→OpenSSL), backward compat aliases
preserved, step-ca config fields updated to match backend struct
Utilities (utils.ts):
- formatDate/formatDateTime accept string|undefined|null
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unnecessary fmt.Sprintf wrapping a string literal (staticcheck S1039),
remove unused tempFileForPFX function, and clean up unused os import.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
golangci-lint flagged jwkThumbprint as unused. Removed it and the dead
var _ compile-time checks. Moved verifyJWSSignature (test-only helper)
from profile.go to profile_test.go where it belongs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three related ACME ecosystem changes shipped as a single milestone:
1. ACME Certificate Profile Selection: Custom JWS-signed newOrder POST with
`profile` field (e.g., `tlsserver`, `shortlived` for 6-day certs) bypassing
acme.Client.AuthorizeOrder() since golang.org/x/crypto lacks profile support.
ES256 JWS signing with kid mode, nonce management, directory discovery.
Empty profile delegates to standard library path (zero behavior change).
Configurable via CERTCTL_ACME_PROFILE env var. GUI: profile dropdown on
ACME issuer config.
2. ARI RFC 9702 → 9773 Renumber: All 25+ references updated across Go source,
docs, README, and examples. Zero remaining occurrences of RFC 9702.
3. 45-Day / Short-Lived Certificate Positioning: 5 domain tests validating
renewal thresholds against SC-081v3 validity reduction timeline (200→100→47
days) and Let's Encrypt 45-day/6-day profiles. ARI (RFC 9773) is the
expected renewal path for 6-day shortlived certs.
New tests: 13 profile + 5 domain threshold + 1 frontend = 19 new tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.
Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard
Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror M34's dynamic issuer config pattern for deployment targets: AES-256-GCM
encrypted config storage, sensitive field redaction in API responses, agent
heartbeat-based test connection endpoint, and full frontend updates including
test status indicators, source badges, and removal of stale hostname/status
fields from the Target interface.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace static env-var-based issuer wiring with GUI-driven dynamic
configuration stored encrypted in PostgreSQL. Operators can now
configure, test, enable/disable, and manage issuers from the dashboard
without restarting the server.
Key changes:
- AES-256-GCM encryption for sensitive issuer config at rest (PBKDF2
key derivation with 100k iterations)
- Dynamic IssuerRegistry with sync.RWMutex replacing static map
- Connector factory pattern (issuerfactory.NewFromConfig) replacing
140 lines of static wiring in main.go
- Migration 000009: encrypted_config, last_tested_at, test_status,
source columns on issuers table
- Env var seeding on first boot with ON CONFLICT DO NOTHING
- Registry Rebuild() for atomic map swap after CRUD operations
- Issuer type validation against domain constants on Create
- Audit trail for test connection results
- Conditional seeding for step-ca/OpenSSL (only when env vars set)
- GUI: source badge, connection test status on issuer detail page
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Google Cloud Certificate Authority Service integration via REST API
with OAuth2 service account auth (JWT→access token). Synchronous
issuance model, CA pool selection, mutex-guarded token caching,
revocation with RFC 5280 reason mapping. No Google SDK dependency —
all stdlib. 19 tests with httptest mock OAuth2 + CAS API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert 3 untagged switch statements to tagged `switch r.URL.Path {}`
form to satisfy staticcheck QF1002. No behavioral change.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dual-mode TLS connector for mail servers — single package with mode
field selecting Postfix or Dovecot defaults. File-based cert/key
deployment with correct permissions (cert 0644, key 0600), optional
chain append, shell injection prevention, and configurable
reload/validate commands. 18 tests covering config validation,
deployment, and security. GUI wizard fields and OpenAPI enum updated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
File-based deployment for Envoy service mesh — writes cert/key/chain
to watched directory with optional SDS JSON config for xDS bootstrap.
Path traversal prevention, configurable filenames, 15 tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete the IIS target connector with dual-mode deployment:
- WinRM proxy agent mode via masterzen/winrm for remote Windows servers
- Base64 PFX transfer with try/finally cleanup on remote host
- GUI wizard updated with 13 IIS config fields including WinRM settings
- TargetDetailPage sensitive field redaction (password/secret/token/key)
- OpenAPI TargetType enum updated (added Traefik, Caddy)
- connectors.md fully documented with WinRM proxy config example
- 38 total IIS tests (10 new WinRM tests), all passing with race detection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement full IIS target connector with PEM-to-PFX conversion via
go-pkcs12, PowerShell-based deployment (Import-PfxCertificate, IIS
binding management), SHA-1 thumbprint computation, and SNI support.
Injectable PowerShellExecutor interface enables cross-platform testing.
Regex-validated config fields prevent PowerShell injection. 28 tests.
Restructure README from 563 to 313 lines: outcome-focused feature
descriptions, "Who Is This For" persona section, examples promoted
above the fold, configuration/API/security reference moved to docs.
All numbers verified against repo (25 GUI pages, 97 OpenAPI ops,
CI thresholds service 55%/handler 60%/domain 40%/middleware 30%).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add S/MIME (emailProtection EKU) end-to-end test coverage:
- ValidateCommonName() now accepts email addresses for S/MIME certs
- S/MIME test profile (prof-test-smime) in seed data
- Phase 11 test: issuance, EKU, KeyUsage, email SAN verification
- EST config enabled in test Docker Compose
- Portable KeyUsage parsing (awk, works on BSD/GNU)
- Full test environment documentation (docs/test-env.md)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove signJWT (replaced by signJWTWithKID) and ecdsaPublicKeyToJWK
(dead code from JWE implementation) to pass CI lint checks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes 12 production bugs preventing the full issuance→deployment flow
from working with ACME (Pebble/Let's Encrypt) and step-ca issuers:
ACME connector (acme.go):
- Save orderURI before WaitOrder overwrites it (Go crypto/acme bug)
- Add CreateOrderCert fallback via WaitOrder+FetchCert
- Remove defer-reset in ValidateConfig that caused nil pointer panic
- Add Insecure TLS option for self-signed ACME servers (Pebble)
step-ca connector (stepca.go, jwe.go):
- Real JWE provisioner key loading + decryption (was using ephemeral keys)
- Fix JWT audience (/1.0/sign), sha claim (key fingerprint), kid header
- Custom root CA trust via RootCertPath config
- Remove hardcoded 90-day validity default (let step-ca decide)
NGINX target connector (nginx.go):
- Use sh -c for validate/reload commands (shell interpretation)
- Use filepath.Dir instead of fragile string slicing
- Add private key file writing (agent-mode keys were never deployed)
- Make chain_path write conditional
Server/service layer:
- TriggerRenewalWithActor now creates actual Job records (was no-op)
- createDeploymentJobs falls back to DB query when cert.TargetIDs empty
- ProcessPendingJobs skips agent-routed deployment jobs
- Agent cert pickup path parsing: len(parts)<4 → len(parts)<3
- Health/ready/auth-info endpoints bypass auth middleware
- Write timeout 15s→120s for ACME issuance
- Cert fingerprint computed on CSR submission
Integration test environment (deploy/test/):
- 10-phase test script covering Local CA, ACME, step-ca, revocation,
discovery, renewal, and API spot checks
- Docker Compose with 7 containers (server, agent, postgres, nginx,
pebble, challtestsrv, step-ca) on isolated network
- TLS verification checks SAN (not just Subject CN) for modern CA compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issuer Catalog (M33):
- Shared issuer type config (issuerTypes.ts) with 6 supported + 2 coming-soon types
- Composable wizard components (TypeSelector, ConfigForm, ConfigDetailModal)
- Catalog card layout with Connected/Available/Coming Soon badges
- VaultPKI and DigiCert added to create wizard with full config fields
- ACME EAB fields (eab_kid, eab_hmac with sensitive flag)
- Issuer type filter dropdown on configured issuers table
- Config detail modal replacing 60-char truncation
- IssuerDetailPage uses shared typeLabels/redactConfig, Edit button, enabled/disabled status
- StatusBadge extended with Enabled/Disabled styles
- 2 new frontend tests (VaultPKI + DigiCert create payload verification)
Bug fixes:
- CertificateService.CreateCertificate now defaults Status to Pending and Tags to
empty map when not set (DB column DEFAULTs only apply when columns are omitted
from INSERT, but our repo always includes all columns)
- CreateCertificate handler now logs actual error via slog.Error before returning
generic 500, enabling root cause debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Convert `switch { case r.URL.Path == ... }` to `switch r.URL.Path { ... }`
in Vault and DigiCert connector tests to pass golangci-lint CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deployment jobs now set agent_id from target→agent relationship at
creation time. GetPendingWork() uses ListPendingByAgentID() with a
3-way UNION query (direct match, legacy NULL fallback via target JOIN,
AwaitingCSR via cert→target→agent chain) so each agent only receives
its own jobs.
- Added AgentID *string to Job domain struct
- Added agent_id to all job SQL queries (5 SELECTs, INSERT, UPDATE, scanJob)
- New ListPendingByAgentID() repository method
- Rewrote GetPendingWork() from ~25 lines to single scoped query
- 4 new Go tests (3 agent routing + 1 deployment agent_id)
- Frontend: agent_id/target_id on Job type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CheckExpiringCertificates() now queries each issuer's ARI endpoint
before creating renewal jobs. If the CA says "not yet" (suggested
window hasn't opened), renewal is deferred. ARI errors fall back
gracefully to threshold-based logic. Audit trail records
renewal_trigger=ari when ARI drives the decision.
4 new unit tests: ShouldRenewNow, NotYet, NilFallback, ErrorFallback.
3 new smoke tests in testing-guide.md Part 35.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: certificate_versions.csr_pem is nullable in the schema but
Go code scanned it into a plain string. Used sql.NullString in
ListVersions and GetLatestVersion to handle NULL values correctly.
Also includes: partial update fetch-merge-update pattern to prevent FK
violations, nil directory guard in discovery service, diagnostic slog
logging in handlers, export handler 422 for unparseable PEM, OpenAPI
spec corrections, MCP tool description improvements, and test fixes.
Rewrites the Release Sign-Off section in testing-guide.md to individual
test-level granularity (320 rows) with smoke test results audited and
checked off (121 pass, 5 skip, 194 manual remaining).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use type conversion DigestStatusCount(c) instead of struct literal
- Replace with...else-if (invalid in Go templates) with if...else-if chain
- Add *.bak and cmd/agent/*.key/*.pem to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add certificate export in PEM (JSON or file download) and PKCS#12 formats.
Private keys are never included — they stay on agents. Add EKU-aware
issuance threading profile EKUs (serverAuth, clientAuth, codeSigning,
emailProtection, timeStamping) through the full issuance pipeline. Fix
agent CSR SAN splitting for email addresses, adaptive KeyUsage flags for
S/MIME vs TLS, and a pre-existing generateID collision bug in deployment
job creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Root cause: WaitForCompletion only waited for work goroutines (wg),
but the 5-6 loop goroutines (renewalCheckLoop, jobProcessorLoop, etc.)
were not tracked. After cancel() + WaitForCompletion(), loop goroutines
could still be alive accessing scheduler/mock fields when the next test
started, triggering the race detector.
Fix:
- Start() now adds loop goroutines to wg, so WaitForCompletion blocks
until both work items AND loops have fully exited
- Removed untracked 100ms timer goroutine for startedChan — now closed
immediately after launching loops
- Timeout test updated: uses blockCh (ignores context) instead of
slowDelay (respects context) so it reliably triggers the timeout path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The "run immediately on start" goroutines in 5 scheduler loops did not
set the idempotency guard (atomic.Bool), allowing the first ticker tick
to spawn a concurrent execution. The race detector caught overlapping
goroutines calling the same service method simultaneously.
Fix: set the Running flag before spawning the initial goroutine and
clear it in the defer, same pattern as ticker-triggered goroutines.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit replaced `certs, total, err :=` with `_, total, err :=`
but certs was used on a subsequent line. Keep the declaration and suppress
the SA4006 warning with a blank assignment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SA5011: use t.Fatal instead of t.Error before nil pointer access in
verification handler tests (stops test execution on nil)
- SA4006: replace unused lvalues with _ in repo_test.go and team_test.go
- ST1020: fix comment format on ListViolations to match method name
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add context.Context to handler test mocks (agent, agent_group)
- Refactor scheduler to use local interfaces instead of concrete service types
- Wire RevocationSvc/CAOperationsSvc sub-services in integration tests
- Add context.Background() to service test calls (agent, agent_group)
- Fix repo integration tests: add FK prerequisite records (team, owner,
issuer, renewal_policy) before creating certificates
- Set MaxOpenConns(1) on test DB to preserve SET search_path across queries
- Fix Apache/HAProxy tests: replace "echo ok"/"echo reload" with "true"
binary to avoid macOS exec.Command PATH resolution failure
- Fix validation tests: correct error expectations for regex-first checks,
replace null byte strings with strings.Repeat for length tests
- Fix scheduler timeout test flakiness with t.Skip fallback
- Remove unused imports (context in ca_operations_test, service in scheduler)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 3 deferred security tickets (TICKET-003, TICKET-007, TICKET-010)
and performs comprehensive documentation audit to eliminate drift between
code and docs.
Code changes:
- TICKET-003: Repository integration tests with testcontainers-go (50+ subtests)
- TICKET-007: CertificateService decomposition into RevocationSvc + CAOperationsSvc
- TICKET-010: Request body size limits via http.MaxBytesReader middleware
- Fix missing slog import in certificate.go after service decomposition
Documentation updates:
- README: Fix endpoint count (97→93), expand env var reference (15→39 vars)
- CLAUDE.md: Fix OpenAPI operation count (85→93), update file locations
- architecture.md: Add body size limits section, middleware chain ordering
- CONTRIBUTING.md: New contributor guide with architecture conventions,
test patterns, middleware ordering, CI thresholds
- SECURITY_REMEDIATION.md: Removed from repo (moved to cowork, gitignored)
- Test files: Add doc comments to all new test files
Documentation that should exist but doesn't yet:
- Architecture diagrams (C4 model or similar)
- Threat model document
- Testing philosophy guide
- Disaster recovery runbook
- Upgrade guide (migration between versions)
- API versioning strategy document
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added Go native fuzz tests (testing/fuzz) for security-critical input validation:
1. FuzzValidateShellCommand in internal/validation/command_fuzz_test.go
- Tests shell command validation with injection payloads (;, |, &, $, `, etc.)
- Seed corpus includes valid commands and dangerous metacharacters
- Ensures function never panics under fuzzing
2. FuzzValidateDomainName in internal/validation/command_fuzz_test.go
- Tests RFC 1123 domain validation with wildcard support
- Seed corpus includes SQL injection, path traversal, and malformed domains
- Ensures function never panics under fuzzing
3. FuzzValidateACMEToken in internal/validation/command_fuzz_test.go
- Tests base64url token validation
- Seed corpus includes injection payloads and special characters
- Ensures function never panics under fuzzing
4. FuzzIsValidRevocationReason in internal/domain/revocation_fuzz_test.go
- Tests RFC 5280 revocation reason validation
- Seed corpus includes case variations, injection attempts, and null bytes
- Ensures function never panics and returns only valid booleans
5. FuzzCRLReasonCode in internal/domain/revocation_fuzz_test.go
- Tests CRL reason code mapping
- Validates return codes are within 0-9 range
- Ensures invalid reasons default to 0 (unspecified)
All fuzz tests follow Go 1.18+ testing/fuzz conventions with seed corpus
for faster discovery of edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The audit middleware records events asynchronously via goroutines. Tests previously
used time.Sleep(50ms) to wait for audit recording, which is unreliable.
Implemented waitableAuditRecorder wrapper that:
- Wraps mockAuditRecorder to intercept RecordAPICall invocations
- Signals via buffered channel when recording completes
- Provides Wait(timeout) method for tests to synchronously wait
- Returns true on successful wait, false on timeout
Replaced all 7 time.Sleep(50ms) calls with recorder.Wait(1*time.Second) calls,
improving test reliability and reducing flakiness.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the 18-parameter RegisterHandlers function signature with a cleaner
HandlerRegistry struct that groups all API handler dependencies. This eliminates
the signature explosion that made the function difficult to read and maintain.
Changes:
- Added HandlerRegistry struct with 18 fields grouping all handler types
- Updated RegisterHandlers to accept a single HandlerRegistry parameter
- Updated all internal handler references to use reg.FieldName syntax
- Updated call sites in cmd/server/main.go and integration tests
- No functional changes, purely structural refactoring
Resolves TICKET-006: RegisterHandlers Signature Explosion
- Updated AgentService interface to accept context.Context parameter in all methods
- Replaced context.Background() calls with proper ctx parameter in agent.go
- Updated AgentGroupService interface to accept context.Context parameter
- Replaced context.Background() calls with proper ctx parameter in agent_group.go
- Updated handler methods to pass r.Context() to service methods
- Context now properly propagates through request lifecycle for timeout/cancellation
- Improved request tracing and cancellation behavior
## Summary
Fixes two critical scheduler reliability issues in certctl:
### TICKET-002 (CRITICAL): Scheduler job idempotency
- Added atomic.Bool guards to all 6 scheduler loops (renewal, job processor, agent health, notifications, short-lived expiry, network scan)
- Uses CompareAndSwap pattern to prevent duplicate execution if previous job is still running
- Logs warning when a tick is skipped due to in-flight work
- Prevents runaway scheduler duplicates and resource exhaustion
### TICKET-011 (MEDIUM): Graceful shutdown
- Added sync.WaitGroup to track in-flight scheduler work
- Each job is wrapped in wg.Add(1)/wg.Done() for lifecycle tracking
- New WaitForCompletion(timeout) method waits for all in-flight work to complete
- Integrates into main.go: after context cancellation, waits up to 30s for jobs to finish before closing DB
- Graceful shutdown ensures no work is lost during server restart/termination
## Changes
**internal/scheduler/scheduler.go:**
- Imports: added "errors", "sync", "sync/atomic"
- Scheduler struct: added 6 atomic.Bool fields (one per loop) + sync.WaitGroup
- All 6 loop functions: spawn goroutines with wg.Add/Done, check atomic guard on each tick, skip tick if already running
- New WaitForCompletion(timeout) method with timeout support
- New ErrSchedulerShutdownTimeout error type
**cmd/server/main.go:**
- After context cancellation and before HTTP shutdown, call sched.WaitForCompletion(30 * time.Second)
- Logs "waiting for scheduler to complete in-flight work" and any errors
**internal/scheduler/scheduler_test.go (new file):**
- Mock services for testing (renewal, job, agent, notification, network scan)
- TestSchedulerIdempotencyGuard: verifies slow job doesn't cause duplicate execution
- TestWaitForCompletionSuccess: verifies graceful shutdown with adequate timeout
- TestWaitForCompletionTimeout: verifies timeout is respected
- TestSchedulerMultipleLoopsIdempotency: verifies all 6 loops respect idempotency
- TestSchedulerGracefulShutdown: end-to-end graceful shutdown flow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>