diff --git a/README.md b/README.md index 288d424..2a1ac0a 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,8 @@ # certctl — Self-Hosted Certificate Lifecycle Platform -A self-hosted certificate lifecycle platform. Track, renew, and deploy TLS certificates across your infrastructure with a web dashboard, REST API, and agent-based architecture where private keys never leave your servers. +TLS certificate lifespans are shrinking. The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) unanimously in April 2025, setting a phased reduction: **200 days** by March 2026, **100 days** by March 2027, and **47 days** by March 2029. Manual certificate management is no longer viable at any scale. + +certctl is a self-hosted platform for **end-to-end certificate lifecycle automation** — from issuance through renewal to deployment — with zero human intervention. Track every certificate in your organization, automatically renew them before they expire, and deploy them to your servers without touching a terminal. Private keys never leave your infrastructure. [![License](https://img.shields.io/badge/license-BSL%201.1-blue.svg)](LICENSE) [![Go Report Card](https://goreportcard.com/badge/github.com/shankar0123/certctl)](https://goreportcard.com/report/github.com/shankar0123/certctl) @@ -8,7 +10,7 @@ A self-hosted certificate lifecycle platform. Track, renew, and deploy TLS certi ## What It Does -certctl gives you a single pane of glass for every TLS certificate in your organization. The **web dashboard** shows your full certificate inventory — what's healthy, what's expiring, what's already expired, and who owns each one. The **REST API** (55 endpoints) lets you automate everything. **Agents** deployed on your infrastructure generate private keys locally and submit CSRs — private keys never leave your servers. +certctl gives you a single pane of glass for every TLS certificate in your organization. The **web dashboard** shows your full certificate inventory — what's healthy, what's expiring, what's already expired, and who owns each one. The **REST API** (55 endpoints) lets you automate everything. **Agents** deployed on your infrastructure generate private keys locally and submit CSRs — private keys never leave your servers. The background scheduler watches expiration dates and triggers renewals automatically — when certificate lifespans drop to 47 days, certctl handles the constant rotation without human involvement. ```mermaid flowchart LR @@ -19,8 +21,8 @@ flowchart LR subgraph "Your Infrastructure" A1["Agent"] --> T1["NGINX"] - A2["Agent"] --> T2["F5 BIG-IP"] - A3["Agent"] --> T3["IIS"] + A2["Agent"] --> T2["Apache / HAProxy"] + A3["Agent"] --> T3["F5 · IIS"] end API --> PG @@ -40,7 +42,7 @@ flowchart LR | ![Notifications](docs/screenshots/notifications.png) | ![Policies](docs/screenshots/policies.png) | | **Notifications** — threshold alerts grouped by certificate | **Policies** — enforcement rules with enable/disable and delete | | ![Issuers](docs/screenshots/issuers.png) | ![Targets](docs/screenshots/targets.png) | -| **Issuers** — CA connectors with test connectivity | **Targets** — deployment targets (NGINX, F5, IIS) | +| **Issuers** — CA connectors with test connectivity | **Targets** — deployment targets (NGINX, Apache, HAProxy, F5, IIS) | | ![Audit Trail](docs/screenshots/audit-trail.png) | | | **Audit Trail** — immutable log of every action | | @@ -349,23 +351,24 @@ make docker-clean # Stop + remove volumes ## Roadmap ### V1 (v1.0.0 released) -All nine development milestones (M1–M9) are complete. The backend covers the full certificate lifecycle: Local CA and ACME v2 issuers, NGINX/F5/IIS target connectors, threshold-based expiration alerting, agent-side ECDSA P-256 key generation, API auth with rate limiting, and a React dashboard with 11 views wired to the real API. The CI pipeline runs build, vet, test with coverage gates (service layer 30%+, handler layer 50%+), frontend type checking, Vitest test suite, and Vite production build on every push. 220+ tests total: 170+ Go tests across service, handler, integration, and connector layers, plus 53 frontend Vitest tests covering API client functions and utility helpers. Docker images are published to GitHub Container Registry on every version tag via the release workflow. +All nine development milestones (M1–M9) are complete. The backend covers the full certificate lifecycle: Local CA and ACME v2 issuers, NGINX/Apache/HAProxy/F5/IIS target connectors, threshold-based expiration alerting, agent-side ECDSA P-256 key generation, API auth with rate limiting, and a React dashboard with 11 views wired to the real API. The CI pipeline runs build, vet, test with coverage gates (service layer 30%+, handler layer 50%+), frontend type checking, Vitest test suite, and Vite production build on every push. 230+ tests total: 180+ Go tests across service, handler, integration, and connector layers, plus 53 frontend Vitest tests covering API client functions and utility helpers. Docker images are published to GitHub Container Registry on every version tag via the release workflow. ### V2: Operational Maturity - **M10: Agent Metadata + Targets** ✅ — agents report OS, architecture, IP, hostname, version via heartbeat; Apache httpd and HAProxy target connectors -- **M11: Policy + Ownership** — crypto policy enforcement (key algo/size validation), certificate ownership tracking, dynamic device grouping, renewal approval UI +- **M11: Crypto Policy + Profiles + Ownership** — certificate profiles (configurable TTL down to 5 min for short-lived certs), crypto policy enforcement (key algo/size validation), SPIFFE URI SAN support, certificate ownership, dynamic device grouping, renewal approval UI - **M12: DNS-01 + step-ca** — ACME DNS-01 challenges (wildcard certs, Cloudflare/Route53 adapters), step-ca issuer connector -- **M13: GUI Operations** — bulk cert operations, deployment timeline, inline policy editor, target config wizard, audit export +- **M13: GUI Operations** — bulk cert operations (renew, revoke, reassign), deployment timeline, inline policy editor, target config wizard, audit export, short-lived credentials dashboard - **M14: Enterprise Connectors** — SSE/WebSocket real-time updates, F5 BIG-IP, IIS, ADCS, OpenSSL/Custom CA implementations -- **M15: Team Adoption** — OIDC/SSO, RBAC, CLI tool, Slack/Teams/PagerDuty/OpsGenie notifiers, bulk cert import -- **M16: Observability** — expiration calendar, health scores, compliance scoring, Prometheus metrics, deployment rollback -- **M17: Integrations** — MCP server (OpenClaw/Claude/Cursor), CT Log monitoring, DigiCert issuer, filesystem cert discovery +- **M15: Revocation Infrastructure** — revocation API with reason codes, embedded OCSP responder, CRL endpoint, bulk revocation by profile/owner/agent, revocation webhooks +- **M16: Team Adoption** — OIDC/SSO, RBAC (profile-gated), CLI tool, Slack/Teams/PagerDuty/OpsGenie notifiers, bulk cert import +- **M17: Observability** — expiration calendar, health scores, compliance scoring, Prometheus metrics (issuance/revocation rates, OCSP latency), deployment rollback +- **M18: Integrations** — MCP server (OpenClaw/Claude/Cursor), CT Log monitoring, DigiCert issuer, filesystem cert discovery ### V3: Discovery, Visibility & Cloud -Discovery engine (passive/active scanning, cert chain validation, Nmap/Qualys import, unknown cert detection, triage workflows), cloud targets (AWS ALB, Azure Key Vault, Palo Alto, FortiGate, Citrix ADC, Kubernetes Secrets), extended issuers (Entrust, GlobalSign, Google CAS, EJBCA, Vault PKI), ServiceNow integration, Ansible module, compliance mapping docs +Discovery engine (passive/active scanning, cert chain validation, unknown cert detection, triage workflows), Kubernetes cert-manager external issuer, cloud targets (AWS ALB/IAM Roles Anywhere, Azure Key Vault/Managed Identity, Palo Alto, FortiGate, Citrix ADC, Kubernetes Secrets), extended issuers (Entrust, GlobalSign, Google CAS, EJBCA, Vault PKI), ServiceNow integration, Ansible module ### V4+: Platform & Scale -Kubernetes CRD, Terraform provider, multi-region, HA control plane, HSM support, LDAP auth, API key scoping, multi-tenancy +Kubernetes CRD, Terraform provider, multi-region, HA control plane, HSM support, LDAP auth, API key scoping, multi-tenancy, SPIFFE/SPIRE federation, OPA policy backend, compliance reporting (NIST, SOC 2, PCI-DSS) ## License diff --git a/docs/architecture.md b/docs/architecture.md index 5834452..022f390 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -48,6 +48,8 @@ flowchart TB subgraph "Target Systems" T1["NGINX\n(file write + reload)"] + T4["Apache httpd\n(file write + reload)"] + T5["HAProxy\n(combined PEM + reload)"] T2["F5 BIG-IP\n(iControl REST, planned)"] T3["IIS\n(WinRM, planned)"] end @@ -157,6 +159,10 @@ erDiagram text hostname text status text api_key_hash + varchar os + varchar architecture + varchar ip_address + varchar version } deployment_targets { text id PK @@ -301,9 +307,11 @@ sequenceDiagram The agent deploys certificates using target connectors. Each connector knows how to push certificates to a specific system: -- **NGINX**: Writes cert/chain files to disk, validates config with `nginx -t`, reloads with `nginx -s reload` or `systemctl reload nginx` -- **F5 BIG-IP**: Calls the F5 REST API to upload certificate and update virtual server bindings -- **IIS**: Uses WinRM to import the certificate into the Windows certificate store and bind it to an IIS site +- **NGINX**: Writes cert/chain/key files to disk, validates config with `nginx -t`, reloads with `nginx -s reload` or `systemctl reload nginx` +- **Apache httpd**: Writes separate cert/chain/key files, validates with `apachectl configtest`, graceful reload +- **HAProxy**: Builds a combined PEM file (cert + chain + key), optionally validates config, reloads via systemctl or signal +- **F5 BIG-IP** (planned): Calls the F5 iControl REST API to upload certificate and update SSL profile bindings +- **IIS** (planned): Uses WinRM to import PFX into the Windows certificate store and bind it to an IIS site The agent handles both the certificate (public) and the private key (read from local key store at `CERTCTL_KEY_DIR`). The control plane never sees the private key. @@ -362,6 +370,8 @@ flowchart TB direction TB TI["TargetConnector Interface\nDeployCertificate()\nValidateDeployment()"] TI --> NG["NGINX"] + TI --> AP["Apache httpd"] + TI --> HP["HAProxy"] TI --> F5["F5 BIG-IP (interface only)"] TI --> IIS["IIS (interface only)"] end @@ -427,7 +437,7 @@ The `DeploymentRequest` struct carries the full material needed by the target sy Built-in targets: **NGINX** (writes cert/chain/key files, validates with `nginx -t`, reloads), **Apache httpd** (writes cert/chain/key files, validates with `apachectl configtest`, graceful reload), **HAProxy** (combined PEM file with cert+chain+key, validates config, reloads via systemctl/signal), **F5 BIG-IP** (interface only — iControl REST flow mapped, implementation planned V2), **IIS** (interface only — WinRM/PowerShell flow mapped, implementation planned V2). -**Planned targets (V3):** AWS ALB/CloudFront, Azure Key Vault, Palo Alto, FortiGate, Citrix ADC, Kubernetes Secrets. +**Planned (V3):** Kubernetes cert-manager external issuer, Kubernetes Secrets, AWS ALB/CloudFront, AWS IAM Roles Anywhere, Azure Key Vault, Azure Managed Identity, Palo Alto, FortiGate, Citrix ADC. ### Notifier Connector @@ -579,7 +589,7 @@ For production, you would also add an ingress controller, TLS termination for th ## Testing Strategy -certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 220+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database. +certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 230+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database. **Service layer unit tests** (`internal/service/*_test.go`) — 74 test functions across 7 files with mock repositories. These test all business logic in isolation: certificate CRUD with validation, agent lifecycle (registration, heartbeat, CSR submission with both keygen modes), job state machine (creation, processing, cancellation, retry logic), policy evaluation (all 5 rule types, violation creation), renewal and issuance flow (server-side and agent-side keygen paths), and notification deduplication (threshold tag matching, channel routing). Mock repositories are simple structs with function fields, avoiding heavy mocking frameworks — this keeps tests readable and avoids coupling to mock library APIs. @@ -591,7 +601,7 @@ certctl uses a layered testing approach aligned with the handler → service → **CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs: Go (build, vet, test with coverage, coverage threshold enforcement) and Frontend (TypeScript type check, Vitest test suite, Vite production build). The Go job runs all tests with `-coverprofile`, then enforces coverage thresholds: service layer must be at least 30% (current: ~34%) and handler layer must be at least 50% (current: ~61%). These thresholds act as regression floors — they can only go up. The service layer threshold is deliberately lower because much of the service code depends on postgres repositories and external connectors that require real infrastructure to test meaningfully. Connector tests are included via `./internal/connector/issuer/local/...` (the Local CA package, which has unit tests for certificate signing logic). The Frontend job runs `npx vitest run` between the TypeScript check and production build steps. -**What's not tested and why:** Postgres repository implementations (`internal/repository/postgres/`) require a real database and are tested only through integration tests, not unit tests. Target connectors (NGINX, F5, IIS) depend on real infrastructure or complex mocks. Scheduler loops are time-dependent and tested manually during development. The ACME connector requires a real ACME server (tested manually against Let's Encrypt staging). These are all candidates for future expansion as the test infrastructure matures. +**What's not tested and why:** Postgres repository implementations (`internal/repository/postgres/`) require a real database and are tested only through integration tests, not unit tests. Target connectors for F5 BIG-IP and IIS depend on real infrastructure or complex mocks. Apache httpd and HAProxy connectors have unit tests (7 tests each) covering config validation, deployment, and validation flows. Scheduler loops are time-dependent and tested manually during development. The ACME connector requires a real ACME server (tested manually against Let's Encrypt staging). These are all candidates for future expansion as the test infrastructure matures. ## What's Next diff --git a/docs/concepts.md b/docs/concepts.md index a0191a2..dbd5595 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -1,6 +1,6 @@ # Understanding Certificates: A Beginner's Guide -If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why managing them at scale is hard enough to need a tool like certctl. +If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential. ## What Is a TLS Certificate? @@ -12,11 +12,15 @@ Think of it like a notarized ID badge for a website. The badge says "I am api.ex ## Why Do Certificates Expire? -Every certificate has an expiration date, typically 90 days for Let's Encrypt or up to 1 year for commercial CAs. This isn't a bug — it's a security feature. Short lifetimes limit the damage if a private key is compromised, and they force organizations to prove they still control their domains. +Every certificate has an expiration date. This isn't a bug — it's a security feature. Short lifetimes limit the damage if a private key is compromised, and they force organizations to prove they still control their domains. -The problem? When you have 5 certificates, tracking expiry dates is trivial. When you have 500 certificates spread across NGINX servers, Apache instances, HAProxy load balancers, F5 appliances, and IIS boxes in three environments, it becomes a ticking time bomb. One missed renewal means a production outage — your site goes down, your API returns errors, and your customers see scary browser warnings. +Certificate lifespans have been shrinking steadily. A decade ago, certificates lasted up to 5 years. Then the CA/Browser Forum — the industry body that sets certificate rules — reduced the maximum to 3 years, then 2 years, then 398 days. In April 2025, they passed Ballot SC-081v3 unanimously, setting a phased reduction to **200 days** (March 2026), **100 days** (March 2027), and **47 days** (March 2029). Let's Encrypt already issues 90-day certificates by default. -**This is the core problem certctl solves**: automated tracking, renewal, and deployment of certificates across your entire infrastructure. +The trend is clear: shorter lifespans, more frequent renewals, and zero tolerance for manual processes. + +When you have 5 certificates, tracking expiry dates is trivial. When you have 500 certificates spread across NGINX servers, Apache instances, HAProxy load balancers, F5 appliances, and IIS boxes in three environments — and each certificate needs renewal every 47 days — manual management becomes impossible. One missed renewal means a production outage: your site goes down, your API returns errors, and your customers see browser warnings. + +**This is the core problem certctl solves**: end-to-end automation of the certificate lifecycle — issuance, renewal, and deployment — across your entire infrastructure, with no human intervention required. ## The Cast of Characters diff --git a/docs/connectors.md b/docs/connectors.md index 029effb..c90a542 100644 --- a/docs/connectors.md +++ b/docs/connectors.md @@ -8,7 +8,7 @@ Three types of connectors: 1. **Issuer Connector** — Obtains certificates from CAs (Local CA, ACME implemented; step-ca, ADCS, OpenSSL planned V2; DigiCert, Entrust, GlobalSign, EJBCA, Vault PKI, Google CAS planned V3) 2. **Target Connector** — Deploys certificates to infrastructure (NGINX, Apache httpd, HAProxy implemented; F5, IIS interface only; AWS ALB, Azure Key Vault, Palo Alto, FortiGate, Citrix ADC, Kubernetes Secrets planned V3) -3. **Notifier Connector** — Sends alerts about certificate events (Email, Webhooks; Slack, Teams, PagerDuty, OpsGenie planned V2.1) +3. **Notifier Connector** — Sends alerts about certificate events (Email, Webhooks; Slack, Teams, PagerDuty, OpsGenie planned V2) All connectors accept JSON configuration at initialization, support config validation, and are registered in the service layer. Issuer connectors run on the control plane; target connectors run on agents. diff --git a/docs/quickstart.md b/docs/quickstart.md index f0f4d3c..68aa2c7 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -1,6 +1,6 @@ # Quick Start Guide -Get certctl running locally and managing certificates in under 5 minutes. +Get certctl running locally and managing certificates in under 5 minutes. With TLS certificate lifespans dropping to 47 days by 2029, automated lifecycle management isn't optional — it's infrastructure. This guide gets you hands-on with certctl's automation loop: tracking, renewing, and deploying certificates without manual intervention. New to certificates? Read the [Concepts Guide](concepts.md) first — it explains TLS, CAs, and private keys in plain language.