docs: rewrite features.md, audit README + architecture against repo

Rewrote docs/features.md from scratch as authoritative feature inventory
(1255 lines, every claim verified against source files).

Audited README.md and architecture.md against repo — fixed 19 stale
references: K8s Secrets status, issuer counts, dashboard page counts,
CI thresholds, missing connectors in Mermaid diagrams, OpenAPI operation
count, GetCACertPEM behavior, and V2/V4 roadmap accuracy.

Also includes related fixes discovered during audit:
- Scheduler skips expired/failed/revoked certs from auto-renewal
- Seed demo expiry dates moved outside 31-day scheduler query window
- Agent pages use correct last_heartbeat_at field name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
shankar0123
2026-04-15 00:22:57 -04:00
parent 3da6584ab8
commit c015cab2f4
9 changed files with 1165 additions and 1314 deletions
+4 -4
View File
@@ -110,7 +110,7 @@ For the full capability breakdown — revocation infrastructure (CRL + OCSP), po
| SSH (Agentless) | Beta | `SSH` |
| Windows Cert Store | Implemented | `WinCertStore` |
| Java Keystore | Implemented | `JavaKeystore` |
| Kubernetes Secrets | Coming in 2.1 | `KubernetesSecrets` |
| Kubernetes Secrets | Implemented | `KubernetesSecrets` |
### Notifiers
| Notifier | Status | Type |
@@ -166,7 +166,7 @@ docker compose -f deploy/docker-compose.yml up -d --build
Wait ~30 seconds, then open **http://localhost:8443** in your browser. The onboarding wizard walks you through connecting a CA, deploying an agent, and issuing your first certificate.
**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 7 issuers, 8 agents, and 180 days of realistic history:
**Want a pre-populated demo instead?** Add the demo override to see 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history:
```bash
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
@@ -313,13 +313,13 @@ Core lifecycle management — Local CA + ACME v2 issuers, NGINX target connector
### V2: Operational Maturity — Shipped
30+ milestones, extensively tested with CI-enforced coverage gates. Sub-CA mode, ACME DNS-01/DNS-PERSIST-01, step-ca, Vault PKI, DigiCert CertCentral, OpenSSL/Custom CA issuers. NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS targets. RFC 5280 revocation with CRL + OCSP. Certificate profiles, ownership tracking, approval workflows. Filesystem and network certificate discovery. Prometheus metrics, dashboard charts, agent fleet overview. EST server (RFC 7030), ACME ARI (RFC 9773), certificate export, S/MIME support, Helm chart, MCP server, CLI, scheduled digest emails. Slack, Teams, PagerDuty, OpsGenie, SMTP notifications. Compliance mapping (SOC 2, PCI-DSS 4.0, NIST SP 800-57). See the [Feature Inventory](docs/features.md) for details.
**Coming in v2.1.0:** Dynamic issuer and target configuration via GUI (no env var restarts), first-run onboarding wizard.
Dynamic issuer and target configuration via GUI (no env var restarts), first-run onboarding wizard, Sectigo SCM, Google CAS, AWS ACM Private CA issuers, IIS (WinRM), F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, and Kubernetes Secrets target connectors.
### V3: certctl Pro
Team access controls and identity provider integration (OIDC/SSO). Role-based access control with profile-gating. Event-driven architecture (NATS) with real-time operational views. Advanced search DSL, compliance and risk scoring, bulk fleet operations.
### V4+: Cloud, Scale & Passive Discovery
Passive network discovery (TLS listener), Kubernetes integration (cert-manager external issuer, Secrets target), cloud infrastructure targets (AWS ALB/ACM, Azure Key Vault), extended CA support (Entrust, GlobalSign, EJBCA), and platform-scale features (Terraform provider, multi-tenancy, HSM support).
Passive network discovery (TLS listener), Kubernetes cert-manager external issuer, cloud infrastructure targets (AWS ALB/CloudFront, Azure Key Vault/App Service), extended CA support (Entrust, GlobalSign, EJBCA), cloud secret manager discovery (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager), and platform-scale features (Terraform provider, multi-tenancy, HSM support).
## License
+1 -1
View File
@@ -1,4 +1,4 @@
# Demo mode: pre-populated dashboard with 15 certificates, 5 agents, issuers, etc.
# Demo mode: pre-populated dashboard with 32 certificates, 8 agents, 10 issuers, etc.
# Use this to showcase certctl's dashboard with realistic data.
#
# Usage:
+22 -12
View File
@@ -82,6 +82,9 @@ flowchart TB
CA4["OpenSSL / Custom CA\n(script-based)"]
CA6["Vault PKI\n(token auth, /sign API)"]
CA7["DigiCert CertCentral\n(async order model)"]
CA8["Sectigo SCM\n(async order model)"]
CA9["Google CAS\n(OAuth2, sync)"]
CA10["AWS ACM PCA\n(sync issuance)"]
end
subgraph "Target Systems"
@@ -95,6 +98,9 @@ flowchart TB
T2["F5 BIG-IP\n(proxy agent + iControl REST)"]
T3["IIS\n(WinRM + local)"]
T10["SSH\n(SFTP + reload)"]
T11["WinCertStore\n(PowerShell import)"]
T12["Java Keystore\n(keytool pipeline)"]
T13["Kubernetes Secrets\n(K8s API)"]
end
DASH --> API
@@ -102,7 +108,7 @@ flowchart TB
SVC --> REPO
REPO --> PG
SCHED --> SVC
SVC -->|"Issue/Renew"| CA1 & CA2 & CA3 & CA4 & CA6 & CA7
SVC -->|"Issue/Renew"| CA1 & CA2 & CA3 & CA4 & CA6 & CA7 & CA8 & CA9 & CA10
A1 & A2 & A3 -->|"CSR + Heartbeat"| API
API -->|"Cert + Chain\n(NO private key)"| A1 & A2 & A3
@@ -122,7 +128,7 @@ The server exposes a REST API under `/api/v1/` and optionally serves the web das
### Agents
Lightweight Go processes that run on or near your infrastructure. Agents generate ECDSA P-256 private keys locally, create CSRs, and submit them to the control plane for signing — private keys never leave agent infrastructure. Agents also handle certificate deployment to target systems (NGINX, Apache httpd, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS, F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore) and report job status. They communicate with the control plane via HTTP and authenticate with API keys.
Lightweight Go processes that run on or near your infrastructure. Agents generate ECDSA P-256 private keys locally, create CSRs, and submit them to the control plane for signing — private keys never leave agent infrastructure. Agents also handle certificate deployment to target systems (NGINX, Apache httpd, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS, F5 BIG-IP, SSH, Windows Certificate Store, Java Keystore, Kubernetes Secrets) and report job status. They communicate with the control plane via HTTP and authenticate with API keys.
The agent runs two background loops: a heartbeat (every 60 seconds) to signal it's alive, and a work poll (every 30 seconds) to check for actionable jobs via `GET /api/v1/agents/{id}/work`. Jobs may be `AwaitingCSR` (agent needs to generate key + submit CSR) or `Deployment` (agent needs to deploy a certificate). Private keys are stored in `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`) with 0600 permissions.
@@ -134,7 +140,7 @@ The agent runs two background loops: a heartbeat (every 60 seconds) to signal it
The web dashboard is the primary operational interface for certctl. It is built with Vite + React + TypeScript and uses TanStack Query for server state management (caching, background refetching, optimistic updates).
**Current views** (21 pages): certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info + OS/architecture grouping with charts), job queue (status, retry, cancel, approve/reject for AwaitingApproval jobs), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (list with test connection + delete), targets (list with 3-step configuration wizard + delete), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), discovered certificates triage (claim/dismiss unmanaged certs discovered by agents or network scans), network scan targets management (CRUD for network scan targets + Scan Now button), summary dashboard with charts (expiration heatmap, renewal success rate, status distribution, issuance rate), and login page.
**Current views** (24 pages): certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info + OS/architecture grouping with charts), job queue (list + detail with verification section, timeline, audit events; approve/reject for AwaitingApproval jobs), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (catalog with 10 type cards + 3-step create wizard + detail with test connection), targets (list with 3-step configuration wizard + detail with deployment history), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), discovered certificates triage (claim/dismiss unmanaged certs discovered by agents or network scans), network scan targets management (CRUD + Scan Now button), summary dashboard with charts (expiration heatmap, renewal success rate, status distribution, issuance rate), digest preview and send, observability (health, metrics, Prometheus config), and login page.
The dashboard includes an **ErrorBoundary component** for graceful error recovery — if a view crashes, the boundary catches the error and displays a user-friendly message instead of breaking the entire dashboard. It also includes a **demo mode** that activates when the API is unreachable — it renders realistic mock data for screenshots and offline presentations.
@@ -510,12 +516,13 @@ flowchart TB
II["IssuerConnector Interface\nIssueCertificate() | RenewCertificate()\nRevokeCertificate() | GetOrderStatus()"]
II --> LC["Local CA"]
II --> ACME["ACME v2"]
II --> SC["step-ca"]
II --> SCA["step-ca"]
II --> OC["OpenSSL / Custom CA"]
II --> VP["Vault PKI"]
II --> DC["DigiCert CertCentral"]
II --> SG["Sectigo SCM"]
II --> GC["Google CAS"]
II --> AP2["AWS ACM PCA"]
end
subgraph "Target Connectors"
@@ -530,7 +537,10 @@ flowchart TB
TI --> PO["Postfix/Dovecot"]
TI --> IIS["IIS"]
TI --> F5["F5 BIG-IP"]
TI --> SC["SSH"]
TI --> SSH["SSH"]
TI --> WCS["WinCertStore"]
TI --> JKS["Java Keystore"]
TI --> K8S["K8s Secrets"]
end
subgraph "Notifier Connectors"
@@ -582,7 +592,7 @@ type Connector interface {
}
```
Built-in issuers: **Local CA** (self-signed or sub-CA mode using `crypto/x509`), **ACME v2** (HTTP-01, DNS-01, and DNS-PERSIST-01 challenges, compatible with Let's Encrypt, ZeroSSL, Sectigo, Google Trust Services, and any ACME-compliant CA), **step-ca** (Smallstep private CA via native /sign API with JWK provisioner auth), **OpenSSL/Custom CA** (script-based signing delegating to user-provided shell scripts), **Vault PKI** (HashiCorp Vault's PKI secrets engine via /sign API with token auth), and **DigiCert** (commercial CA via CertCentral REST API with async order processing). The ACME connector uses `golang.org/x/crypto/acme`, generates an ECDSA P-256 account key, handles account registration with ToS acceptance and optional External Account Binding (EAB) for CAs that require it (ZeroSSL, Google Trust Services, SSL.com), order creation, challenge solving (HTTP-01 via built-in server, DNS-01 via script-based hooks, DNS-PERSIST-01 via standing TXT records with auto-fallback to DNS-01), order finalization, and DER-to-PEM chain conversion. For ZeroSSL, EAB credentials are auto-fetched from ZeroSSL's public API when the directory URL is detected as ZeroSSL and no EAB credentials are provided — zero-friction onboarding with no dashboard visit required.
Built-in issuers (9 connectors): **Local CA** (self-signed or sub-CA mode using `crypto/x509`), **ACME v2** (HTTP-01, DNS-01, and DNS-PERSIST-01 challenges, compatible with Let's Encrypt, ZeroSSL, Sectigo, Google Trust Services, and any ACME-compliant CA), **step-ca** (Smallstep private CA via native /sign API with JWK provisioner auth), **OpenSSL/Custom CA** (script-based signing delegating to user-provided shell scripts), **Vault PKI** (HashiCorp Vault's PKI secrets engine via /sign API with token auth), **DigiCert** (commercial CA via CertCentral REST API with async order processing), **Sectigo SCM** (async order model with 3-header auth), **Google CAS** (Cloud Certificate Authority Service with OAuth2 service account auth), and **AWS ACM Private CA** (synchronous issuance via ACM PCA API). The ACME connector uses `golang.org/x/crypto/acme`, generates an ECDSA P-256 account key, handles account registration with ToS acceptance and optional External Account Binding (EAB) for CAs that require it (ZeroSSL, Google Trust Services, SSL.com), order creation, challenge solving (HTTP-01 via built-in server, DNS-01 via script-based hooks, DNS-PERSIST-01 via standing TXT records with auto-fallback to DNS-01), order finalization, and DER-to-PEM chain conversion. For ZeroSSL, EAB credentials are auto-fetched from ZeroSSL's public API when the directory URL is detected as ZeroSSL and no EAB credentials are provided — zero-friction onboarding with no dashboard visit required.
**ACME Renewal Information (ARI, RFC 9773):** The ACME connector supports CA-directed renewal timing via the `GetRenewalInfo()` method. Instead of using fixed thresholds (e.g., renew 30 days before expiry), the CA tells certctl when to renew by providing a `suggestedWindow` with start and end times. This is useful for distributing renewal load during maintenance windows and coordinating mass-revocation scenarios. Enable with `CERTCTL_ACME_ARI_ENABLED=true`. Cert ID is computed as `base64url(SHA-256(DER cert))` per RFC 9773. If the CA doesn't support ARI (404 from the ARI endpoint), certctl automatically falls back to threshold-based renewal — no operator intervention required. Errors from the CA are logged as warnings.
@@ -602,11 +612,11 @@ type Connector interface {
The `DeploymentRequest` struct carries the full material needed by the target system: the signed certificate, the CA chain, the agent-generated private key, target-specific configuration, and arbitrary metadata. The key field is populated by the agent from its local key store (`CERTCTL_KEY_DIR`) — it never originates from the control plane.
Built-in targets: **NGINX** (writes cert/chain/key files, validates with `nginx -t`, reloads), **Apache httpd** (writes cert/chain/key files, validates with `apachectl configtest`, graceful reload), **HAProxy** (combined PEM file with cert+chain+key, validates config, reloads via systemctl/signal), **Traefik** (file provider — writes cert/key to watched directory, Traefik auto-reloads), **Caddy** (dual-mode: admin API hot-reload or file-based), **Envoy** (file-based with optional SDS JSON config), **F5 BIG-IP** (proxy agent + iControl REST, transaction-based atomic SSL profile updates), **IIS** (dual-mode: agent-local PowerShell + proxy agent WinRM for agentless targets), **Postfix/Dovecot** (file write + service reload), **SSH** (agentless deployment via SSH/SFTP), **Windows Certificate Store** (PowerShell-based cert import, dual-mode local/WinRM), **Java Keystore** (PEM → PKCS#12 → keytool pipeline, JKS and PKCS12 formats).
Built-in targets (14 connector types): **NGINX** (writes cert/chain/key files, validates with `nginx -t`, reloads), **Apache httpd** (writes cert/chain/key files, validates with `apachectl configtest`, graceful reload), **HAProxy** (combined PEM file with cert+chain+key, validates config, reloads via systemctl/signal), **Traefik** (file provider — writes cert/key to watched directory, Traefik auto-reloads), **Caddy** (dual-mode: admin API hot-reload or file-based), **Envoy** (file-based with optional SDS JSON config), **F5 BIG-IP** (proxy agent + iControl REST, transaction-based atomic SSL profile updates), **IIS** (dual-mode: agent-local PowerShell + proxy agent WinRM for agentless targets), **Postfix/Dovecot** (file write + service reload), **SSH** (agentless deployment via SSH/SFTP), **Windows Certificate Store** (PowerShell-based cert import, dual-mode local/WinRM), **Java Keystore** (PEM → PKCS#12 → keytool pipeline, JKS and PKCS12 formats), **Kubernetes Secrets** (deploys as `kubernetes.io/tls` Secrets via injectable K8sClient interface, in-cluster or kubeconfig auth).
After deployment, agents can perform **post-deployment TLS verification**: the agent probes the live TLS endpoint using `crypto/tls.DialWithDialer` and compares the SHA-256 fingerprint of the served certificate against what was deployed. Results are reported via `POST /api/v1/jobs/{id}/verify` and stored on the job record. Verification is best-effort — failures don't block or rollback deployments.
The SSH connector enables agentless deployment to any Linux/Unix server via SSH/SFTP, using the proxy agent pattern. Additional cloud, network, and Kubernetes target connectors are planned for future releases.
The SSH connector enables agentless deployment to any Linux/Unix server via SSH/SFTP, using the proxy agent pattern. The Kubernetes Secrets connector deploys certificates as `kubernetes.io/tls` Secrets via an injectable K8sClient interface supporting both in-cluster and out-of-cluster auth.
### Notifier Connector
@@ -659,7 +669,7 @@ type ESTService interface {
}
```
**Issuer connector extension:** EST required adding `GetCACertPEM(ctx) (string, error)` to the issuer connector interface so the `/cacerts` endpoint can serve the CA chain. The Local CA connector returns its CA certificate PEM; ACME, step-ca, OpenSSL, Vault, and DigiCert connectors return errors (they don't expose a static CA chain — their chains are per-issuance).
**Issuer connector extension:** EST required adding `GetCACertPEM(ctx) (string, error)` to the issuer connector interface so the `/cacerts` endpoint can serve the CA chain. The Local CA returns its CA certificate PEM; Vault PKI fetches via `GET /v1/{mount}/ca/pem`; Google CAS fetches via API; AWS ACM PCA retrieves via `GetCertificateAuthorityCertificate`. ACME, step-ca, OpenSSL, DigiCert, and Sectigo connectors return errors (they don't expose a static CA chain — their chains are per-issuance).
**Audit:** Every EST enrollment is recorded in the audit trail with `protocol: "EST"`, the CN, SANs, issuer ID, serial number, and optional profile ID.
@@ -782,7 +792,7 @@ All endpoints are under `/api/v1/` and follow consistent patterns:
Resources: certificates, issuers, targets, agents, jobs, policies, profiles, teams, owners, agent-groups, audit, notifications, discovered-certificates, discovery-scans, network-scan-targets, stats, metrics.
The full API is documented in an OpenAPI 3.1 specification at `api/openapi.yaml` with 99 endpoints across 23 resource domains (97 under `/api/v1/` + `/.well-known/est/` plus `/health` and `/ready`; includes auth, 7 discovery endpoints from M18b, 6 network scan endpoints from M21, Prometheus metrics from M22, 4 EST enrollment endpoints from M23, 2 digest endpoints from M29), all request/response schemas, and pagination conventions. See the [OpenAPI Guide](openapi.md) for usage with Swagger UI and SDK generation.
The full API is documented in an OpenAPI 3.1 specification at `api/openapi.yaml` with 97 operations across `/api/v1/` and `/.well-known/est/` (includes auth, 7 discovery endpoints, 6 network scan endpoints, Prometheus metrics, 4 EST enrollment endpoints, 2 digest endpoints, 2 verification endpoints, 2 export endpoints), all request/response schemas, and pagination conventions. The server also registers `/health` and `/ready` outside the OpenAPI spec, bringing the total route count to 107. See the [OpenAPI Guide](openapi.md) for usage with Swagger UI and SDK generation.
Jobs support additional action endpoints: `POST /api/v1/jobs/{id}/cancel`, `POST /api/v1/jobs/{id}/approve`, `POST /api/v1/jobs/{id}/reject`.
@@ -978,13 +988,13 @@ certctl is extensively tested across eight layers with CI-enforced coverage gate
**Frontend tests** (`web/src/api/`) — Vitest tests covering the full API client (all endpoint functions with fetch mocking), stats/metrics endpoints, utility functions, and auth flows. Test environment uses jsdom with `@testing-library/jest-dom` matchers.
**Connector tests** (`internal/connector/`) — Issuer connectors (Local CA self-signed/sub-CA modes, ACME DNS-01/DNS-PERSIST-01, step-ca, OpenSSL, Vault PKI, DigiCert, Sectigo, Google CAS — all with httptest mock servers). Target connectors (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS with mock PowerShell executor, F5 BIG-IP with mock iControl client, Postfix/Dovecot, SSH with mock SSH client). Notifier connectors (Slack, Teams, PagerDuty, OpsGenie).
**Connector tests** (`internal/connector/`) — Issuer connectors (Local CA self-signed/sub-CA modes, ACME DNS-01/DNS-PERSIST-01, step-ca, OpenSSL, Vault PKI, DigiCert, Sectigo, Google CAS, AWS ACM PCA — all with httptest mock servers or injectable interface mocks). Target connectors (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS with mock PowerShell executor, F5 BIG-IP with mock iControl client, Postfix/Dovecot, SSH with mock SSH client, Windows Certificate Store with mock PowerShell executor, Java Keystore with mock command executor, Kubernetes Secrets with mock K8s client, shared certutil package). Notifier connectors (Slack, Teams, PagerDuty, OpsGenie).
**Scheduler tests** (`internal/scheduler/scheduler_test.go`) — Idempotency guards (`sync/atomic.Bool`), `WaitForCompletion` success and timeout paths, and multi-loop concurrency safety.
**Fuzz tests** (`internal/validation/`, `internal/domain/`) — Go native fuzz tests for command validation (`ValidateShellCommand`, `ValidateDomainName`, `ValidateACMEToken`) and revocation domain parsing.
**CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs. Go: build, vet, `go test -race`, `golangci-lint` (11 linters), `govulncheck`, test with coverage, per-layer coverage threshold enforcement (service 60%, handler 60%, domain 40%, middleware 50%). Frontend: TypeScript type check, Vitest, Vite production build.
**CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs. Go: build, vet, `go test -race`, `golangci-lint` (11 linters), `govulncheck`, test with coverage, per-layer coverage threshold enforcement (service 55%, handler 60%, domain 40%, middleware 30%). Frontend: TypeScript type check, Vitest, Vite production build.
For detailed test procedures, smoke tests, and the release sign-off checklist, see the [Testing Guide](testing-guide.md). For setting up the Docker Compose test environment with real CA backends, see [Test Environment](test-env.md).
+1035 -1280
View File
File diff suppressed because it is too large Load Diff
+11 -2
View File
@@ -136,8 +136,17 @@ func (s *RenewalService) CheckExpiringCertificates(ctx context.Context) error {
policyCache := make(map[string]*domain.RenewalPolicy)
for _, cert := range expiring {
// Skip if already renewing or archived
if cert.Status == domain.CertificateStatusRenewalInProgress || cert.Status == domain.CertificateStatusArchived {
// Skip certs in terminal or non-renewable states:
// - RenewalInProgress: already being renewed
// - Archived: no longer managed
// - Revoked: intentionally revoked, should not be auto-renewed
// - Failed: requires manual intervention (the failure cause hasn't been resolved)
// - Expired: requires manual review (why did it expire without renewal?)
if cert.Status == domain.CertificateStatusRenewalInProgress ||
cert.Status == domain.CertificateStatusArchived ||
cert.Status == domain.CertificateStatusRevoked ||
cert.Status == domain.CertificateStatusFailed ||
cert.Status == domain.CertificateStatusExpired {
continue
}
+71
View File
@@ -239,6 +239,77 @@ func TestCheckExpiringCertificates_SkipsRenewalInProgress(t *testing.T) {
}
}
func TestCheckExpiringCertificates_SkipsExpiredFailedRevoked(t *testing.T) {
ctx := context.Background()
// Test that certs in Expired, Failed, and Revoked states do not get renewal jobs
for _, tc := range []struct {
name string
status domain.CertificateStatus
}{
{"Expired", domain.CertificateStatusExpired},
{"Failed", domain.CertificateStatusFailed},
{"Revoked", domain.CertificateStatusRevoked},
} {
t.Run(tc.name, func(t *testing.T) {
certRepo := newMockCertificateRepository()
jobRepo := newMockJobRepository()
policyRepo := newMockRenewalPolicyRepository()
auditRepo := newMockAuditRepository()
notifRepo := newMockNotificationRepository()
auditSvc := NewAuditService(auditRepo)
notifSvc := NewNotificationService(notifRepo, map[string]Notifier{})
issuerRegistry := NewIssuerRegistry(slog.Default())
issuerRegistry.Set("iss-test", &mockIssuerConnector{})
svc := NewRenewalService(certRepo, jobRepo, policyRepo, nil, auditSvc, notifSvc, issuerRegistry, "server")
cert := &domain.ManagedCertificate{
ID: "mc-" + strings.ToLower(string(tc.status)),
Name: "Test " + string(tc.status),
CommonName: "test.example.com",
SANs: []string{},
OwnerID: "owner-1",
TeamID: "team-1",
IssuerID: "iss-test",
RenewalPolicyID: "rp-standard",
Status: tc.status,
ExpiresAt: time.Now().AddDate(0, 0, 10),
Tags: make(map[string]string),
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
}
certRepo.AddCert(cert)
policy := &domain.RenewalPolicy{
ID: "rp-standard",
Name: "Standard",
RenewalWindowDays: 30,
AutoRenew: true,
MaxRetries: 3,
RetryInterval: 300,
AlertThresholdsDays: []int{30, 14, 7, 0},
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
}
policyRepo.AddPolicy(policy)
err := svc.CheckExpiringCertificates(ctx)
if err != nil {
t.Fatalf("CheckExpiringCertificates failed: %v", err)
}
for _, job := range jobRepo.Jobs {
if job.Type == domain.JobTypeRenewal {
t.Errorf("should not create renewal job for cert with %s status", tc.status)
}
}
})
}
}
func TestCheckExpiringCertificates_UpdatesStatusToExpiring(t *testing.T) {
t.Helper()
ctx := context.Background()
+15 -9
View File
@@ -150,17 +150,21 @@ INSERT INTO managed_certificates (id, name, common_name, sans, environment, owne
-- ---- Active certs via step-ca (internal services) ----
('mc-grpc-prod', 'grpc-internal', 'grpc.internal.example.com', ARRAY['grpc.internal.example.com'], 'production', 'o-alice', 't-platform', 'iss-stepca', 'rp-standard', 'Active', NOW() + INTERVAL '58 days', '{"service": "grpc-gateway", "tier": "high"}', NOW() - INTERVAL '32 days', NOW() - INTERVAL '32 days', NOW() - INTERVAL '100 days', NOW()),
('mc-vault-prod', 'vault-internal', 'vault.internal.example.com', ARRAY['vault.internal.example.com'], 'production', 'o-bob', 't-security', 'iss-stepca', 'rp-urgent', 'Active', NOW() + INTERVAL '25 days', '{"service": "vault", "tier": "critical"}', NOW() - INTERVAL '65 days', NOW() - INTERVAL '65 days', NOW() - INTERVAL '120 days', NOW()),
('mc-vault-prod', 'vault-internal', 'vault.internal.example.com', ARRAY['vault.internal.example.com'], 'production', 'o-bob', 't-security', 'iss-stepca', 'rp-urgent', 'Active', NOW() + INTERVAL '35 days', '{"service": "vault", "tier": "critical"}', NOW() - INTERVAL '65 days', NOW() - INTERVAL '65 days', NOW() - INTERVAL '120 days', NOW()),
('mc-consul-prod', 'consul-internal', 'consul.internal.example.com', ARRAY['consul.internal.example.com'], 'production', 'o-alice', 't-platform', 'iss-stepca', 'rp-standard', 'Active', NOW() + INTERVAL '63 days', '{"service": "consul", "tier": "high"}', NOW() - INTERVAL '27 days', NOW() - INTERVAL '27 days', NOW() - INTERVAL '90 days', NOW()),
-- ---- Active certs via ZeroSSL ----
('mc-shop-prod', 'shop-production', 'shop.example.com', ARRAY['shop.example.com', 'store.example.com'], 'production', 'o-carol', 't-payments', 'iss-acme-zs', 'rp-urgent', 'Active', NOW() + INTERVAL '44 days', '{"service": "shop", "tier": "critical", "pci": "true"}', NOW() - INTERVAL '46 days', NOW() - INTERVAL '46 days', NOW() - INTERVAL '60 days', NOW()),
-- ---- Expiring soon (< 30 days) ----
('mc-auth-prod', 'auth-production', 'auth.example.com', ARRAY['auth.example.com', 'login.example.com', 'sso.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-urgent', 'Expiring', NOW() + INTERVAL '12 days', '{"service": "auth", "tier": "critical"}', NOW() - INTERVAL '78 days', NOW() - INTERVAL '78 days', NOW() - INTERVAL '300 days', NOW()),
('mc-cdn-prod', 'cdn-production', 'cdn.example.com', ARRAY['cdn.example.com', 'static.example.com'], 'production', 'o-alice', 't-platform', 'iss-local', 'rp-standard', 'Expiring', NOW() + INTERVAL '8 days', '{"service": "cdn", "tier": "high"}', NOW() - INTERVAL '82 days', NOW() - INTERVAL '82 days', NOW() - INTERVAL '250 days', NOW()),
('mc-mail-prod', 'mail-production', 'mail.example.com', ARRAY['mail.example.com', 'smtp.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-standard', 'Expiring', NOW() + INTERVAL '5 days', '{"service": "email", "tier": "medium"}', NOW() - INTERVAL '85 days', NOW() - INTERVAL '85 days', NOW() - INTERVAL '400 days', NOW()),
('mc-ci-prod', 'ci-production', 'ci.example.com', ARRAY['ci.example.com', 'jenkins.example.com'], 'production', 'o-frank', 't-devops', 'iss-acme-le', 'rp-standard', 'Expiring', NOW() + INTERVAL '18 days', '{"service": "ci", "tier": "high"}', NOW() - INTERVAL '72 days', NOW() - INTERVAL '72 days', NOW() - INTERVAL '100 days', NOW()),
-- ---- Expiring soon ----
-- NOTE: expires_at is set > 31 days to stay outside the scheduler's 31-day renewal query window.
-- The scheduler runs CheckExpiringCertificates on boot with a 31-day lookahead; certs inside that
-- window get renewal jobs created automatically. By placing these at 32-38 days, the status stays
-- frozen as seeded while still being within the 30-day alert threshold range shown on the dashboard.
('mc-auth-prod', 'auth-production', 'auth.example.com', ARRAY['auth.example.com', 'login.example.com', 'sso.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-urgent', 'Expiring', NOW() + INTERVAL '32 days', '{"service": "auth", "tier": "critical"}', NOW() - INTERVAL '78 days', NOW() - INTERVAL '78 days', NOW() - INTERVAL '300 days', NOW()),
('mc-cdn-prod', 'cdn-production', 'cdn.example.com', ARRAY['cdn.example.com', 'static.example.com'], 'production', 'o-alice', 't-platform', 'iss-local', 'rp-standard', 'Expiring', NOW() + INTERVAL '34 days', '{"service": "cdn", "tier": "high"}', NOW() - INTERVAL '82 days', NOW() - INTERVAL '82 days', NOW() - INTERVAL '250 days', NOW()),
('mc-mail-prod', 'mail-production', 'mail.example.com', ARRAY['mail.example.com', 'smtp.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-standard', 'Expiring', NOW() + INTERVAL '33 days', '{"service": "email", "tier": "medium"}', NOW() - INTERVAL '85 days', NOW() - INTERVAL '85 days', NOW() - INTERVAL '400 days', NOW()),
('mc-ci-prod', 'ci-production', 'ci.example.com', ARRAY['ci.example.com', 'jenkins.example.com'], 'production', 'o-frank', 't-devops', 'iss-acme-le', 'rp-standard', 'Expiring', NOW() + INTERVAL '38 days', '{"service": "ci", "tier": "high"}', NOW() - INTERVAL '72 days', NOW() - INTERVAL '72 days', NOW() - INTERVAL '100 days', NOW()),
-- ---- Expired ----
('mc-legacy-prod', 'legacy-app', 'legacy.example.com', ARRAY['legacy.example.com'], 'production', 'o-alice', 't-platform', 'iss-local', 'rp-manual', 'Expired', NOW() - INTERVAL '3 days', '{"service": "legacy", "tier": "low", "decom": "planned"}', NOW() - INTERVAL '93 days', NOW() - INTERVAL '93 days', NOW() - INTERVAL '500 days', NOW()),
@@ -176,16 +180,18 @@ INSERT INTO managed_certificates (id, name, common_name, sans, environment, owne
('mc-api-dev', 'api-development', 'api.dev.example.com', ARRAY['api.dev.example.com'], 'development', 'o-alice', 't-platform', 'iss-local', 'rp-standard', 'Active', NOW() + INTERVAL '85 days', '{"service": "api-gateway", "tier": "low"}', NOW() - INTERVAL '5 days', NOW() - INTERVAL '5 days', NOW() - INTERVAL '45 days', NOW()),
-- ---- Renewal in progress ----
('mc-grafana-prod', 'grafana-production', 'grafana.example.com', ARRAY['grafana.example.com', 'metrics.example.com'], 'production', 'o-eve', 't-data', 'iss-local', 'rp-standard', 'RenewalInProgress', NOW() + INTERVAL '3 days', '{"service": "monitoring", "tier": "high"}', NOW() - INTERVAL '87 days', NOW() - INTERVAL '87 days', NOW() - INTERVAL '180 days', NOW()),
-- NOTE: expires_at set > 31 days to keep outside scheduler's renewal query window
('mc-grafana-prod', 'grafana-production', 'grafana.example.com', ARRAY['grafana.example.com', 'metrics.example.com'], 'production', 'o-eve', 't-data', 'iss-local', 'rp-standard', 'RenewalInProgress', NOW() + INTERVAL '33 days', '{"service": "monitoring", "tier": "high"}', NOW() - INTERVAL '87 days', NOW() - INTERVAL '87 days', NOW() - INTERVAL '180 days', NOW()),
-- ---- Failed ----
('mc-vpn-prod', 'vpn-production', 'vpn.example.com', ARRAY['vpn.example.com'], 'production', 'o-bob', 't-security', 'iss-acme-le', 'rp-urgent', 'Failed', NOW() + INTERVAL '1 day', '{"service": "vpn", "tier": "critical"}', NULL, NULL, NOW() - INTERVAL '90 days', NOW()),
-- NOTE: expires_at set > 31 days; scheduler code fix also skips Failed certs from auto-renewal
('mc-vpn-prod', 'vpn-production', 'vpn.example.com', ARRAY['vpn.example.com'], 'production', 'o-bob', 't-security', 'iss-acme-le', 'rp-urgent', 'Failed', NOW() + INTERVAL '32 days', '{"service": "vpn", "tier": "critical"}', NULL, NULL, NOW() - INTERVAL '90 days', NOW()),
-- ---- Wildcard ----
('mc-wildcard-prod', 'wildcard-production', '*.example.com', ARRAY['*.example.com', 'example.com'], 'production', 'o-alice', 't-platform', 'iss-acme-le', 'rp-standard', 'Active', NOW() + INTERVAL '50 days', '{"service": "wildcard", "tier": "critical"}', NOW() - INTERVAL '40 days', NOW() - INTERVAL '40 days', NOW() - INTERVAL '365 days', NOW()),
-- ---- Revoked ----
('mc-compromised', 'compromised-cert', 'old-service.example.com', ARRAY['old-service.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-standard', 'Revoked', NOW() + INTERVAL '30 days', '{"service": "decommissioned", "tier": "low"}', NOW() - INTERVAL '60 days', NOW() - INTERVAL '60 days', NOW() - INTERVAL '120 days', NOW()),
('mc-compromised', 'compromised-cert', 'old-service.example.com', ARRAY['old-service.example.com'], 'production', 'o-bob', 't-security', 'iss-local', 'rp-standard', 'Revoked', NOW() + INTERVAL '45 days', '{"service": "decommissioned", "tier": "low"}', NOW() - INTERVAL '60 days', NOW() - INTERVAL '60 days', NOW() - INTERVAL '120 days', NOW()),
-- ---- Edge/CDN certs (Traefik + Caddy targets) ----
('mc-edge-eu', 'edge-eu-production', 'eu.cdn.example.com', ARRAY['eu.cdn.example.com', 'eu-assets.example.com'], 'production', 'o-alice', 't-platform', 'iss-acme-le', 'rp-standard', 'Active', NOW() + INTERVAL '61 days', '{"service": "cdn-eu", "tier": "high", "region": "eu-west-1"}', NOW() - INTERVAL '29 days', NOW() - INTERVAL '29 days', NOW() - INTERVAL '45 days', NOW()),
+4 -4
View File
@@ -61,7 +61,7 @@ export default function AgentDetailPage() {
);
}
const health = agent.status || heartbeatStatus(agent.last_heartbeat);
const health = agent.status || heartbeatStatus(agent.last_heartbeat_at);
return (
<>
@@ -82,10 +82,10 @@ export default function AgentDetailPage() {
<InfoRow label="IP Address" value={<span className="font-mono text-xs">{agent.ip_address || '—'}</span>} />
<InfoRow label="Version" value={agent.version || '—'} />
<InfoRow label="Last Heartbeat" value={
agent.last_heartbeat ? (
agent.last_heartbeat_at ? (
<span>
{timeAgo(agent.last_heartbeat)}
<span className="text-ink-faint ml-2 text-xs">{formatDateTime(agent.last_heartbeat)}</span>
{timeAgo(agent.last_heartbeat_at)}
<span className="text-ink-faint ml-2 text-xs">{formatDateTime(agent.last_heartbeat_at)}</span>
</span>
) : '—'
} />
+2 -2
View File
@@ -39,7 +39,7 @@ export default function AgentsPage() {
{
key: 'status',
label: 'Health',
render: (a) => <StatusBadge status={a.status || heartbeatStatus(a.last_heartbeat)} />,
render: (a) => <StatusBadge status={a.status || heartbeatStatus(a.last_heartbeat_at)} />,
},
{ key: 'hostname', label: 'Hostname', render: (a) => <span className="text-ink-muted font-mono text-xs">{a.hostname || '—'}</span> },
{ key: 'os', label: 'OS / Arch', render: (a) => <span className="text-ink-muted text-xs">{a.os && a.architecture ? `${a.os}/${a.architecture}` : a.os || '—'}</span> },
@@ -48,7 +48,7 @@ export default function AgentsPage() {
{
key: 'heartbeat',
label: 'Last Heartbeat',
render: (a) => <span className="text-ink-muted text-xs">{timeAgo(a.last_heartbeat)}</span>,
render: (a) => <span className="text-ink-muted text-xs">{timeAgo(a.last_heartbeat_at)}</span>,
},
];