mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 19:51:33 +00:00
docs: Phase 2 mechanical file moves to subdirectory structure
Pure git mv operations; no content edits. Internal links remain pointing
at old paths and will be fixed in Phase 11. Per the Phase 1 audit
recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
35 files moved across 8 audience-organized subdirectories:
docs/getting-started/ (5):
quickstart.md, concepts.md, examples.md, advanced-demo.md (was
demo-advanced.md), why-certctl.md
docs/reference/ (6):
architecture.md, api.md (was openapi.md), mcp.md,
intermediate-ca-hierarchy.md, deployment-model.md (was
deployment-atomicity.md), vendor-matrix.md (was
deployment-vendor-matrix.md)
docs/reference/protocols/ (6):
acme-server.md, acme-server-threat-model.md, scep-intune.md,
est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)
docs/operator/ (4):
security.md, tls.md, database-tls.md, approval-workflow.md
docs/operator/runbooks/ (3):
cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md
(was runbook-expiry-alerts.md), disaster-recovery.md
docs/migration/ (3):
from-certbot.md (was migrate-from-certbot.md), from-acmesh.md
(was migrate-from-acmesh.md), cert-manager-coexistence.md (was
certctl-for-cert-manager-users.md)
docs/compliance/ (4):
index.md (was compliance.md), soc2.md (was compliance-soc2.md),
pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was
compliance-nist.md)
docs/contributor/ (4):
testing-strategy.md, test-environment.md (was test-env.md),
ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)
Deferred to later Phase 2 sub-phases:
- connectors.md split (Phase 4): docs/connectors.md +
docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
- testing-guide.md prune (Phase 5): docs/testing-guide.md still
at top level
- features.md disperse (Phase 6): docs/features.md still at top
level
- legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md
still at top level
- ACME walkthrough re-homing (Phase 8): three
docs/acme-*-walkthrough.md still at top level
- Upgrade docs archive (Phase 3): two docs/upgrade-*.md still
at top level
Cross-reference updates (Phase 11) will happen after all moves and
content edits land. Internal links to docs/* paths are temporarily
broken until that phase completes.
@@ -0,0 +1,294 @@
# Understanding Certificates: A Beginner's Guide
If you've never worked with TLS certificates before, this guide will get you up to speed. By the end, you'll understand what certificates are, why they matter, and why the industry's move toward shorter certificate lifespans — down to 47 days by 2029 — makes automated lifecycle management essential.
## Contents

1. [What Is a TLS Certificate?](#what-is-a-tls-certificate)
2. [Why Do Certificates Expire?](#why-do-certificates-expire)
3. [The Cast of Characters](#the-cast-of-characters)
   - [Certificate Authority (CA)](#certificate-authority-ca)
   - [ACME Protocol](#acme-protocol)
   - [EST Protocol (Enrollment over Secure Transport)](#est-protocol-enrollment-over-secure-transport)
   - [Private Key](#private-key)
   - [Subject Alternative Names (SANs)](#subject-alternative-names-sans)
   - [Certificate Chain](#certificate-chain)
4. [How certctl Works](#how-certctl-works)
   - [The Control Plane (Server)](#the-control-plane-server)
   - [Agents](#agents)
   - [Deployment Targets](#deployment-targets)
5. [The Certificate Lifecycle](#the-certificate-lifecycle)
6. [Why Not Just Use Certbot?](#why-not-just-use-certbot)
7. [Key Concepts in certctl](#key-concepts-in-certctl)
   - [Teams and Owners](#teams-and-owners)
   - [Agent Groups](#agent-groups)
   - [Certificate Profiles](#certificate-profiles)
   - [Interactive Renewal Approval](#interactive-renewal-approval)
   - [Renewal Timing: Thresholds vs. ARI (RFC 9773)](#renewal-timing-thresholds-vs-ari-rfc-9773)
   - [Shorter Certificate Validity (45-Day and 6-Day Certs)](#shorter-certificate-validity-45-day-and-6-day-certs)
   - [Certificate Revocation](#certificate-revocation)
   - [Short-Lived Certificates](#short-lived-certificates)
   - [Policies](#policies)
   - [Jobs](#jobs)
   - [Audit Trail](#audit-trail)
   - [Notifications](#notifications)
   - [CLI](#cli)
   - [MCP Server (AI Integration)](#mcp-server-ai-integration)
   - [EST Enrollment (Device Certificates)](#est-enrollment-device-certificates)
   - [Certificate Discovery](#certificate-discovery)
   - [Observability](#observability)
8. [What's Next](#whats-next)

## What Is a TLS Certificate?
When you visit `https://yourbank.com`, your browser checks a digital document called a **TLS certificate** before sending any data. That certificate proves two things: (1) you're really talking to yourbank.com and not an imposter, and (2) everything sent between you and the server is encrypted.
A TLS certificate is just a file — a small chunk of structured data that contains a **public key**, the **domain name** it belongs to, who **issued** it (the Certificate Authority), and when it **expires**. It's signed by a trusted third party so that browsers and clients can verify it's legitimate.
Think of it like a notarized ID badge for a website. The badge says "I am api.example.com," the notary (Certificate Authority) vouches for it, and anyone can check the notary's signature to confirm the badge is real.
## Why Do Certificates Expire?
Every certificate has an expiration date. This isn't a bug — it's a security feature. Short lifetimes limit the damage if a private key is compromised, and they force organizations to prove they still control their domains.
Certificate lifespans have been shrinking steadily. A decade ago, certificates could last up to 5 years. The CA/Browser Forum (the industry body that sets certificate rules) then cut the maximum to about 3 years (39 months), then about 2 years (825 days), then 398 days. In April 2025, it passed Ballot SC-081v3 with zero opposition (25 CAs in favor, 5 abstentions, all 4 browser vendors in favor), setting a phased reduction to **200 days** (March 2026), **100 days** (March 2027), and **47 days** (March 2029). Let's Encrypt already issues 90-day certificates by default.
The trend is clear: shorter lifespans, more frequent renewals, and zero tolerance for manual processes.
When you have 5 certificates, tracking expiry dates is trivial. When you have 500 certificates spread across NGINX servers, Apache instances, HAProxy load balancers, F5 appliances, and IIS boxes in three environments — and each certificate needs renewal every 47 days — manual management becomes impossible. One missed renewal means a production outage: your site goes down, your API returns errors, and your customers see browser warnings.
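
A quick back-of-the-envelope calculation shows how the workload grows as lifetimes shrink. This is an illustrative sketch, not certctl code: the function name is made up, and it assumes each certificate is replaced only once its full lifetime is used up, so real renewal counts (which happen earlier) would be higher.

```python
import math


def issuances_per_year(cert_count: int, lifetime_days: int) -> int:
    """Certificates that must be issued to keep every name covered for 365 days."""
    return cert_count * math.ceil(365 / lifetime_days)


# Fleet size from the paragraph above; lifetimes from the CA/Browser Forum schedule.
for lifetime in (398, 200, 100, 47):
    print(f"{lifetime:>3}-day lifetime: {issuances_per_year(500, lifetime)} issuances/year")
```

Even under this optimistic assumption, the same 500-certificate fleet goes from hundreds of issuances a year to thousands.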
**This is the core problem certctl solves**: end-to-end automation of the certificate lifecycle — issuance, renewal, and deployment — across your entire infrastructure, with no human intervention required.
## The Cast of Characters
### Certificate Authority (CA)
A CA is the trusted third party that signs your certificates. When a CA signs a cert, they're saying "we've verified that whoever asked for this certificate actually controls this domain." Browsers ship with a built-in list of CAs they trust.
Common CAs include Let's Encrypt (free, automated), DigiCert, Sectigo, and your organization's internal/private CA. Each issues certificates through different protocols and APIs.
certctl includes a built-in **Local CA** that can operate in two modes: self-signed (default, for development and demos) or as a **subordinate CA** under an enterprise root like Active Directory Certificate Services (ADCS). In sub-CA mode, you load a CA certificate and key signed by your enterprise root, and all certificates certctl issues automatically chain to the enterprise trust hierarchy — no manual trust configuration needed on clients that already trust your enterprise root. certctl also integrates with **step-ca** (Smallstep's private CA) via its native /sign API, providing a lightweight alternative to ACME for internal PKI.
### ACME Protocol
ACME (Automatic Certificate Management Environment) is the protocol Let's Encrypt created for automated certificate issuance, standardized as RFC 8555. Instead of filling out forms and waiting for emails, ACME lets software request, validate, and receive certificates programmatically. The requester proves control of the domain by responding to challenges: placing a specific file on the web server (HTTP-01), creating a DNS record (DNS-01), or maintaining a standing DNS record that persists across renewals (DNS-PERSIST-01).
certctl speaks ACME natively with HTTP-01, DNS-01, and DNS-PERSIST-01 challenges, so it can request certificates — including wildcard certificates — from Let's Encrypt or any ACME-compatible CA without manual intervention. HTTP-01 uses a built-in temporary HTTP server for domain validation; DNS-01 uses pluggable script-based hooks to create TXT records with any DNS provider (Cloudflare, Route53, Azure DNS, etc.); DNS-PERSIST-01 creates a standing `_validation-persist` TXT record once (containing the CA domain and account URI) that the CA revalidates on every renewal — no per-renewal DNS updates needed. If the CA doesn't yet support DNS-PERSIST-01, certctl automatically falls back to DNS-01.
### EST Protocol (Enrollment over Secure Transport)
EST (RFC 7030) is a standard protocol for devices to request certificates from a CA. While ACME was designed for web servers proving domain ownership, EST was designed for devices that need certificates without domain validation — think WiFi access points, corporate laptops connecting to 802.1X networks, IoT devices, and mobile devices managed by MDM platforms.
The workflow is straightforward: a device generates a key pair and a Certificate Signing Request (CSR), sends the CSR to the EST server, and gets back a signed certificate. The EST server also distributes its CA certificate chain so devices can build a complete trust path.
certctl includes a built-in EST server at `/.well-known/est/` with four operations: distributing the CA certificate chain (`/cacerts`), enrolling new devices (`/simpleenroll`), renewing existing certificates (`/simplereenroll`), and advertising CSR requirements (`/csrattrs`). EST enrollment uses the same issuer connectors as the REST API — so a certificate issued via EST and a certificate issued via the dashboard go through the same CA, appear in the same inventory, and follow the same policies.
### Private Key
Every certificate has a corresponding private key. The certificate is public: anyone can see it. The private key is secret: it's what lets your server prove, during the TLS handshake, that it is the legitimate holder of the certificate. If someone gets your private key, they can impersonate your server.
**This is why certctl's architecture is built around a critical rule: private keys never leave the server they were generated on.** The control plane orchestrates certificate issuance and tracks state, but it never sees or stores private keys. Keys are generated locally by agents running on your infrastructure.
### Subject Alternative Names (SANs)
A single certificate can cover multiple domain names. The primary domain is traditionally placed in the Common Name (CN), and every covered domain is listed as a Subject Alternative Name; modern clients match hostnames against the SAN list only, so the primary domain must appear there too. For example, one cert might cover `example.com`, `www.example.com`, and `api.example.com`. This reduces the number of certificates you need to manage.
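
To make the matching rule concrete, here is a minimal sketch of how a client might check a hostname against a certificate's SAN list. It is a simplification for illustration (the function name is hypothetical, and real TLS libraries handle many more edge cases such as IP addresses and internationalized names); the only wildcard rule modeled is that `*` stands for exactly one left-most label.

```python
def matches_san(hostname: str, san_entries: list[str]) -> bool:
    """Return True if hostname is covered by any DNS name in san_entries."""
    host = hostname.lower().rstrip(".")
    for entry in san_entries:
        pattern = entry.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            suffix = pattern[1:]  # e.g. ".example.com"
            label = host[: -len(suffix)] if host.endswith(suffix) else ""
            # The wildcard must stand for one non-empty label with no dots in it.
            if label and "." not in label:
                return True
    return False


sans = ["example.com", "www.example.com", "api.example.com"]
print(matches_san("api.example.com", sans))
print(matches_san("mail.example.com", sans))
```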
### Certificate Chain
When a CA signs your certificate, the CA itself has a certificate, which was signed by a higher-level CA, all the way up to a **root CA** that browsers trust directly. This chain of trust — your cert, signed by an intermediate CA, signed by a root CA — is called the certificate chain. Servers need to present the full chain so clients can verify the entire trust path.
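
The walk from a leaf certificate up to a trusted root can be sketched in a few lines. This is a toy model under stated assumptions: the `Cert` type and `build_chain` function are hypothetical, certificates are reduced to subject and issuer names, and no signatures are verified (real validation checks the signature, validity dates, and constraints at every step).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def build_chain(leaf: Cert, pool: list[Cert], trusted_roots: set[str]) -> list[Cert]:
    """Return [leaf, intermediates..., root], or raise ValueError if no path exists."""
    by_subject = {c.subject: c for c in pool}
    chain = [leaf]
    current = leaf
    while current.subject not in trusted_roots:
        parent = by_subject.get(current.issuer)
        # A missing issuer or a loop means the chain cannot reach a trusted root.
        if parent is None or parent in chain:
            raise ValueError(f"no path to a trusted root from {current.subject!r}")
        chain.append(parent)
        current = parent
    return chain


root = Cert("Root CA", "Root CA")
intermediate = Cert("Intermediate CA", "Root CA")
leaf = Cert("api.example.com", "Intermediate CA")
print([c.subject for c in build_chain(leaf, [intermediate, root], {"Root CA"})])
```

If the server omits the intermediate from what it presents, the lookup for the leaf's issuer fails, which is why servers need to send the full chain.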
## How certctl Works
certctl has three main components that work together:
### The Control Plane (Server)
This is the brain. It's a REST API server backed by PostgreSQL that tracks every certificate in your organization: what domain it covers, when it expires, who owns it, which servers it's deployed to, and its full audit history. It runs a scheduler that automatically checks for expiring certificates and triggers renewal jobs.
The control plane never touches private keys. It coordinates the certificate lifecycle — "this cert needs renewal," "deploy this cert to these targets" — but the actual cryptographic operations happen elsewhere.
### Agents
Agents are lightweight processes that run on or near your infrastructure. They do the actual work: generating private keys, creating Certificate Signing Requests (CSRs), receiving signed certificates, and deploying them to target systems. An agent typically runs on the same machine as the target (e.g., your NGINX or IIS server), deploying certificates locally. For network appliances where you can't install an agent, a proxy agent in the same network zone handles deployment via the appliance's API.
The flow looks like this:
1. The scheduler on the control plane decides a certificate needs renewal
2. The control plane creates a renewal job
3. An agent picks up the job, generates a new private key locally, and sends a CSR (which contains only the public key) to the control plane
4. The control plane submits the CSR to the CA and receives the signed certificate
5. The control plane sends the signed certificate (public material only) back to the agent
6. The agent deploys the certificate and private key to the target server
7. The agent reports success back to the control plane
At no point does the private key leave the agent. This is a fundamental security property.
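
The flow above can be modeled as a toy simulation to show what crosses the wire. Everything here is hypothetical and simplified: the `Agent` and `ControlPlane` classes are not certctl's types, the "key pair" is a random secret with a hash standing in for the public key, and "signing" is a placeholder. The point is only that the control plane receives the CSR and nothing else.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    _private_key: bytes = field(default=b"", repr=False)

    def create_csr(self, common_name: str) -> dict:
        # Toy key pair: the secret stays on the agent, only the derived value leaves.
        self._private_key = secrets.token_bytes(32)
        public_key = hashlib.sha256(self._private_key).hexdigest()
        return {"common_name": common_name, "public_key": public_key}

    def deploy(self, certificate: dict) -> dict:
        if not self._private_key:
            raise RuntimeError("no key has been generated on this agent")
        # The certificate is paired with the locally held key on the target.
        return {"certificate": certificate, "key_installed_locally": True}


class ControlPlane:
    def __init__(self) -> None:
        self.received = []  # everything the control plane ever saw

    def issue(self, csr: dict) -> dict:
        self.received.append(csr)
        # Placeholder for submitting the CSR to a CA and getting a cert back.
        return {"subject": csr["common_name"], "public_key": csr["public_key"], "signed": True}


agent, plane = Agent("web-01"), ControlPlane()
cert = plane.issue(agent.create_csr("api.example.com"))
print(agent.deploy(cert)["key_installed_locally"], plane.received)
```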
Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.
**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
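
The retirement rules described above can be summarized as a small decision function. This is a sketch of the logic only, not the server's implementation: the `AgentRecord` fields and function names are hypothetical, and the 200, 403, and 409 status codes are assumptions chosen for illustration. Only `410 Gone` for heartbeats from retired agents comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AgentRecord:
    agent_id: str
    is_sentinel: bool = False
    active_targets: int = 0
    deployed_certs: int = 0
    pending_jobs: int = 0
    retired_at: Optional[datetime] = None
    retired_reason: Optional[str] = None


def retire(agent: AgentRecord, force: bool = False, reason: str = "") -> tuple[int, dict]:
    """Soft-delete an agent; returns an (HTTP-style status, body) pair."""
    if agent.is_sentinel:
        return 403, {"error": "sentinel agents cannot be retired"}  # assumed code
    counts = {
        "active_targets": agent.active_targets,
        "deployed_certs": agent.deployed_certs,
        "pending_jobs": agent.pending_jobs,
    }
    if any(counts.values()) and not force:
        return 409, {"error": "agent has dependencies", "counts": counts}  # assumed code
    # Stamp the row instead of deleting it, so history and foreign keys survive.
    agent.retired_at = datetime.now(timezone.utc)
    agent.retired_reason = reason
    return 200, {"retired": True, "cascaded": counts if force else {}}


def heartbeat(agent: AgentRecord) -> int:
    # Retired agents are told to shut down with 410 Gone.
    return 410 if agent.retired_at else 200
```

A blocked call returns the dependency counts, so the operator can either clean up each dependency or retry with `force=True` and a reason.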
### Deployment Targets
Targets are the systems where certificates actually get installed — NGINX web servers, Apache httpd servers, HAProxy load balancers, Traefik reverse proxies, Caddy servers, Envoy gateways, Postfix/Dovecot mail servers, Microsoft IIS servers, and network appliances. Each target type has a **connector** that knows how to deploy certificates to that specific system (e.g., writing files and reloading NGINX or Apache config, building a combined PEM for HAProxy).
For targets where an agent runs directly on the machine (NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, Postfix, Dovecot, IIS), the agent deploys certificates locally — no remote access needed. For network appliances where you can't install an agent (F5 BIG-IP, Palo Alto, etc.), a **proxy agent** in the same network zone picks up the deployment job and calls the appliance's API. The server never initiates outbound connections to any target.
## The Certificate Lifecycle
Every managed certificate in certctl goes through these states:
```mermaid
stateDiagram-v2
    [*] --> Pending: Certificate created
    Pending --> Active: Issuance succeeds
    Pending --> Failed: Issuance fails
    Active --> Expiring: Within renewal window
    Expiring --> RenewalInProgress: Auto-renewal triggered
    RenewalInProgress --> Active: Renewal succeeds
    RenewalInProgress --> Failed: Renewal fails
    Expiring --> Expired: Renewal not attempted / all retries exhausted
    Active --> Archived: Decommissioned
    Failed --> Pending: Retry requested
```
- **Pending**: Certificate record created, awaiting initial issuance
- **Active**: Certificate is valid and deployed, everything is healthy
- **Expiring**: Certificate is within the renewal window (e.g., 30 days before expiry); renewal will be triggered automatically
- **Expired**: Certificate passed its expiration date without successful renewal; this is a problem
- **Failed**: Something went wrong during issuance or renewal; needs investigation
- **RenewalInProgress**: A renewal job is currently running
- **Archived**: Certificate was decommissioned and soft-deleted
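
The diagram translates directly into a transition table. The sketch below encodes exactly the arrows shown above and rejects anything else; the `TRANSITIONS` name and `transition` helper are illustrative, not certctl's internal API.

```python
# Allowed moves, copied from the state diagram.
TRANSITIONS = {
    "Pending": {"Active", "Failed"},
    "Active": {"Expiring", "Archived"},
    "Expiring": {"RenewalInProgress", "Expired"},
    "RenewalInProgress": {"Active", "Failed"},
    "Failed": {"Pending"},
    "Expired": set(),
    "Archived": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise ValueError for a move the diagram does not allow."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


# A healthy renewal cycle: issue, enter the renewal window, renew, back to Active.
state = "Pending"
for step in ("Active", "Expiring", "RenewalInProgress", "Active"):
    state = transition(state, step)
print(state)
```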
## Why Not Just Use Certbot?
Certbot is great for a single server. It runs on one machine, gets one certificate, and installs it locally. But it doesn't solve the organizational problem: who owns which certificates? When do they expire across the fleet? Which servers need updating? Did the deployment succeed everywhere? Who changed what, and when?
certctl is for organizations that need visibility, automation, and accountability across their certificate infrastructure. It's the difference between a spreadsheet and a database — both store data, but one scales.
## Key Concepts in certctl
### Teams and Owners
Every certificate belongs to a **team** and has an **owner**. This answers the question "whose problem is it when this cert expires?" In a large organization, the platform team might own infrastructure certs while the payments team owns payment gateway certs. Notifications are routed to the owner's email address automatically.
### Agent Groups
Agent groups let you organize agents by criteria — OS, architecture, IP subnet, or version — for dynamic policy scoping. For example, you can create a group matching all Linux agents and scope a renewal policy to that group. Groups can use dynamic matching criteria (agents automatically join when they match) or manual membership (explicitly include/exclude specific agents). Agent groups are managed via the GUI and API.
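
Dynamic matching can be pictured as comparing an agent's reported metadata against a set of criteria. The sketch below is an assumption-laden illustration: the field names (`os`, `arch`, `ip_address`, `version`) and the `subnet` criterion key are hypothetical, not certctl's schema.

```python
import ipaddress


def matches_group(agent: dict, criteria: dict) -> bool:
    """Return True only if the agent satisfies every criterion in the group."""
    for key, wanted in criteria.items():
        if key == "subnet":
            # Subnet membership is a containment test, not an equality test.
            if ipaddress.ip_address(agent["ip_address"]) not in ipaddress.ip_network(wanted):
                return False
        elif agent.get(key) != wanted:
            return False
    return True


agent = {"os": "linux", "arch": "arm64", "ip_address": "10.0.1.15", "version": "1.0.0"}
print(matches_group(agent, {"os": "linux", "subnet": "10.0.1.0/24"}))
```

Because the check runs against heartbeat metadata, an agent would join or leave a dynamic group as soon as its reported values change.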
### Certificate Profiles
Certificate profiles define the cryptographic and lifecycle constraints for a class of certificates. A profile specifies which key types are allowed (e.g., RSA-2048, ECDSA P-256), the maximum validity period, and other enrollment rules. When a certificate is assigned to a profile, certctl enforces these constraints during issuance — if an agent submits a CSR with a disallowed key type, issuance is rejected.
Profiles answer the question "what kind of certificate is this?" while policies answer "is this certificate compliant?" A production TLS profile might allow only ECDSA P-256 with a 90-day max TTL, while a development profile might allow RSA-2048 with a 365-day TTL. Short-lived profiles (TTL under 1 hour) enable machine-to-machine authentication patterns where certificates are issued frequently and expire quickly — these are exempt from CRL/OCSP since expiry itself is sufficient revocation.
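
As a rough illustration of profile enforcement, the sketch below checks a request against the production profile from the example above. The `Profile` type, its field names, and the key-type strings are hypothetical stand-ins rather than certctl's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    allowed_key_types: frozenset
    max_validity_days: int


def check_request(profile: Profile, key_type: str, requested_days: int) -> list[str]:
    """Return the reasons a request violates the profile (empty list means accepted)."""
    problems = []
    if key_type not in profile.allowed_key_types:
        problems.append(f"key type {key_type} is not allowed by profile {profile.name}")
    if requested_days > profile.max_validity_days:
        problems.append(f"validity {requested_days}d exceeds {profile.max_validity_days}d")
    return problems


production = Profile("production-tls", frozenset({"ECDSA-P256"}), 90)
print(check_request(production, "RSA-2048", 365))
print(check_request(production, "ECDSA-P256", 90))
```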
Profiles are managed via the API (`/api/v1/profiles`) and the GUI, and can be assigned to certificates during creation or updated later.
### Interactive Renewal Approval
For policies with `auto_renew` disabled, renewal jobs enter an **AwaitingApproval** state instead of processing immediately. An operator must explicitly approve or reject the renewal via the API or GUI. Approved jobs transition to Pending and are picked up by the scheduler. Rejected jobs are cancelled with an optional reason. This is useful for high-value certificates where you want human oversight before renewal.
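
The approval gate amounts to one extra state in front of the normal job queue. The following sketch models just that gate; the dictionary shape and function names are hypothetical, while the state names come from the paragraph above.

```python
from typing import Optional


def new_renewal_job(auto_renew: bool) -> dict:
    """Jobs for auto-renew policies go straight to Pending; others wait for a human."""
    state = "Pending" if auto_renew else "AwaitingApproval"
    return {"state": state, "reason": None}


def decide(job: dict, approve: bool, reason: Optional[str] = None) -> dict:
    """Apply an operator's decision to a job that is awaiting approval."""
    if job["state"] != "AwaitingApproval":
        raise ValueError(f"job is {job['state']}, not awaiting approval")
    job["state"] = "Pending" if approve else "Cancelled"
    job["reason"] = None if approve else reason
    return job


print(decide(new_renewal_job(auto_renew=False), approve=True))
print(decide(new_renewal_job(auto_renew=False), approve=False, reason="change freeze"))
```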
### Renewal Timing: Thresholds vs. ARI (RFC 9773)
**Traditional approach (thresholds):** By default, certctl uses static renewal thresholds: it renews a certificate a fixed number of days before expiry (default: 30 days). This simple, predictable model works for most use cases: it avoids renewing earlier than necessary while leaving a known window in which to catch and fix failures before the certificate expires.
**Advanced approach (ACME ARI):** Some Certificate Authorities support ACME Renewal Information (RFC 9773), which allows the CA to tell certctl the optimal time to renew. Instead of guessing "renew 30 days before expiry," the CA responds with a precise `suggestedWindow` containing start and end times. This is useful when:
- The CA is performing maintenance and wants to batch renewals in a specific window
- The CA is coordinating a mass revocation (e.g., due to a compromise) and needs to control renewal timing
- You want to avoid thundering herd renewal spikes by accepting the CA's suggested timing
**How it works:** Enable with `CERTCTL_ACME_ARI_ENABLED=true` on your ACME issuer. When a certificate approaches expiry, certctl queries the ARI endpoint with the certificate's DER encoding. The CA responds with a suggested renewal window. If the current time is at or past the window's start, certctl renews immediately. Otherwise, it waits until the window opens.
**Graceful degradation:** If your CA doesn't support ARI (returns 404 from the ARI endpoint), certctl automatically falls back to the traditional threshold-based renewal. No configuration change needed — the fallback is transparent. Errors from the CA are logged as warnings and don't block the renewal process.
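
Put together, the two approaches reduce to a small decision: use the CA's suggested window when one is available, otherwise fall back to the static threshold. The sketch below is a hypothetical model of that decision, not certctl's scheduler code; `ari_window` is `None` when the CA offers no ARI data (for example, after a 404).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_renew(
    now: datetime,
    not_after: datetime,
    ari_window: Optional[tuple[datetime, datetime]],
    threshold_days: int = 30,
) -> bool:
    """Decide whether to renew now, preferring the CA's window over the threshold."""
    if ari_window is not None:
        start, _end = ari_window
        # Renew as soon as the suggested window has opened (or is already past).
        return now >= start
    # Fallback: renew once we are within threshold_days of expiry.
    return now >= not_after - timedelta(days=threshold_days)


utc = timezone.utc
expiry = datetime(2026, 6, 30, tzinfo=utc)
now = datetime(2026, 6, 1, tzinfo=utc)
window = (datetime(2026, 6, 10, tzinfo=utc), datetime(2026, 6, 12, tzinfo=utc))
print(should_renew(now, expiry, None), should_renew(now, expiry, window))
```

In the usage lines, the threshold alone would renew on 1 June, while the CA's window asks the client to wait until 10 June.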
### Shorter Certificate Validity (45-Day and 6-Day Certs)
The industry is moving toward shorter certificate lifetimes. The CA/Browser Forum's SC-081v3 ballot mandates a phased reduction: 200-day max (March 2026), 100-day max (March 2027), and 47-day max (March 2029). Let's Encrypt has already begun reducing default validity to 45 days, and offers 6-day "shortlived" certificates via ACME profile selection.
certctl handles shorter-lived certificates correctly out of the box:
- **45-day certs** with the default 30-day renewal window trigger renewal at day 15, one third of the way through the cert's lifetime.
- **6-day "shortlived" certs** are always within the renewal window. ARI (RFC 9773) is the expected renewal path for these, with the CA directing timing. The CRL/OCSP exemption is a separate matter: it applies only to certs whose profile TTL is under 1 hour, where expiry is sufficient revocation.
- **ACME profile selection** lets you request specific certificate profiles from your CA. Set `CERTCTL_ACME_PROFILE=shortlived` to get 6-day certificates from Let's Encrypt, or `CERTCTL_ACME_PROFILE=tlsserver` for standard TLS certificates.
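
The renewal-day arithmetic is easy to check by hand. In this sketch the function name is made up and the window length is passed explicitly, so you can plug in whichever threshold your deployment uses.

```python
def first_renewal_day(lifetime_days: int, window_days: int) -> int:
    """Day of the cert's life on which it enters the renewal window (0 = at issuance)."""
    return max(0, lifetime_days - window_days)


for lifetime in (90, 45, 6):
    day = first_renewal_day(lifetime, window_days=30)
    print(f"{lifetime}-day cert, 30-day window: renewal due from day {day}")
```

A result of 0 means the certificate is inside the window from the moment it is issued, which is why a static threshold is a poor fit for 6-day certificates and CA-directed timing is preferred.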
### Certificate Revocation
When a private key is compromised, a certificate is superseded, or a service is decommissioned, you need to revoke the certificate immediately — not wait for it to expire. Revocation tells clients "stop trusting this certificate right now."
certctl implements revocation using four complementary mechanisms:
**Revocation API**: `POST /api/v1/certificates/{id}/revoke` marks a certificate as revoked in the inventory, records the revocation in a dedicated `certificate_revocations` table, notifies the issuing CA (best-effort — the revocation succeeds even if the CA is unreachable), creates an audit trail entry, and sends notifications. You can specify an RFC 5280 reason code (keyCompromise, superseded, cessationOfOperation, etc.) or let it default to "unspecified."
**Bulk Revocation** (Fleet-Level Incident Response): For large-scale incidents like CA compromise or team infrastructure decommissioning, `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria in a single operation. Filter by profile, owner, team, agent group, or issuer to target the affected certificate set. This is essential for incident response — instead of revoking certificates one-by-one, operators can revoke an entire fleet in minutes. Bulk revocation creates individual revocation jobs that reuse the existing revocation pipeline, ensuring every certificate is audited and notifications are sent.
**Certificate Revocation List (CRL)**: certctl serves DER-encoded X.509 CRLs per issuer at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5 wire format, RFC 8615 well-known namespace). The endpoint is unauthenticated so any relying party (browser, TLS client, hardware appliance) can fetch it without a certctl API key. The CRL is signed by the issuing CA's key and has 24-hour validity; clients can download it periodically to check revocation status offline. The response carries `Content-Type: application/pkix-crl`. The CRL is **pre-generated** by a scheduler-driven loop (`crlGenerationLoop`, default interval 1 hour, configurable via `CERTCTL_CRL_GENERATION_INTERVAL`) and persisted in the `crl_cache` table. HTTP fetches read from the cache rather than rebuilding per request, so a busy CA does not DoS itself under load. Concurrent regeneration requests for the same issuer are coalesced via an in-tree singleflight gate.
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder serving both a GET and a POST form over HTTP (see RFC 6960 Appendix A.1): `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (URL-path lookup, useful for ops curl-debugging) and `POST /.well-known/pki/ocsp/{issuer_id}` with a binary `application/ocsp-request` body (the form most production clients use: Firefox, OpenSSL `s_client -status`, cert-manager, Intune device-state validators). Both forms are unauthenticated and return signed OCSP responses (good, revoked, or unknown) with `Content-Type: application/ocsp-response`. OCSP responses are signed by a **dedicated per-issuer OCSP responder cert** (RFC 6960 §2.6 / §4.2.2.2), NOT by the CA private key directly. The responder cert carries the `id-pkix-ocsp-nocheck` extension (RFC 6960 §4.2.2.2.1) so OCSP clients do not recursively check the responder cert's own revocation status. The responder cert auto-rotates within 7 days of expiry (configurable via `CERTCTL_OCSP_RESPONDER_ROTATION_GRACE`), letting the responder key live on disk or rotate frequently while the CA key stays cold. See [`crl-ocsp.md`](crl-ocsp.md) for endpoint examples (curl, OpenSSL, Firefox, Intune) and the responder cert lifecycle.
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
### Short-Lived Certificates
Short-lived certificates are certificates with a TTL under 1 hour, typically used for service-to-service authentication in microservice architectures. Instead of revoking these certificates when something goes wrong, you simply stop issuing new ones — the existing certificates expire within minutes.
certctl provides a dedicated dashboard view for short-lived credentials that shows active certificates with live TTL countdowns, auto-refreshes every 10 seconds, and filters by profile. This gives ops teams real-time visibility into ephemeral credential activity without cluttering the main certificate inventory.
Short-lived certificates are defined by their profile — assign a certificate to a profile with `max_validity_days` that translates to under 1 hour, and certctl automatically treats it as short-lived: no CRL/OCSP entries, no revocation overhead, just rapid issuance and natural expiry.
### Policies
Policies are guardrails. You can enforce rules like "production certificates must use specific issuers," "all certificates must have an owner," or "certificate lifetime cannot exceed 90 days." When a certificate violates a policy, certctl flags it with a policy violation so you can take action.
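
The three example rules above could be evaluated along these lines. This is a sketch for intuition only: the certificate field names (`environment`, `issuer`, `owner`, `lifetime_days`) and the function name are assumptions, not certctl's policy engine.

```python
def find_violations(cert: dict, allowed_prod_issuers: set, max_lifetime_days: int = 90) -> list[str]:
    """Return a human-readable violation for every rule the certificate breaks."""
    violations = []
    if cert.get("environment") == "production" and cert.get("issuer") not in allowed_prod_issuers:
        violations.append("issuer not allowed in production")
    if not cert.get("owner"):
        violations.append("certificate has no owner")
    if cert.get("lifetime_days", 0) > max_lifetime_days:
        violations.append(f"lifetime exceeds {max_lifetime_days} days")
    return violations


cert = {"environment": "production", "issuer": "self-signed", "owner": "", "lifetime_days": 365}
for violation in find_violations(cert, allowed_prod_issuers={"letsencrypt"}):
    print(violation)
```

Note that the function reports violations rather than blocking anything, mirroring the flag-and-review behavior described above.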
### Jobs
Every action in certctl — issuing a certificate, renewing one, deploying to a target — is tracked as a **job**. Jobs have states (Pending, AwaitingCSR, AwaitingApproval, Running, Completed, Failed, Cancelled), retry logic, and a full audit trail. AwaitingCSR means the job is waiting for an agent to generate a key and submit a CSR. AwaitingApproval means the job requires human approval before proceeding (used with non-auto-renew policies). If a deployment fails, you can see exactly what happened and when.
### Audit Trail
Every action is logged: who did it, what changed, when, and why. This is essential for compliance (SOC 2, PCI-DSS, ISO 27001) and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
### Notifications
certctl can alert you when certificates are expiring, when renewals fail, when deployments succeed, or when policy violations are detected. Notifications are delivered via six channels: Email, Webhook, Slack, Microsoft Teams, PagerDuty, and OpsGenie. Each notifier is configured independently via environment variables and can be enabled or disabled as needed.
### CLI
certctl ships with a command-line tool (`certctl-cli`) for operators who prefer terminal workflows or need to integrate certctl into shell scripts and CI/CD pipelines. The CLI wraps the REST API with 12 subcommands organized by resource: `certs list`, `certs get`, `certs renew`, `certs revoke`, `agents list`, `agents get`, `jobs list`, `jobs get`, `jobs cancel`, `import` (bulk PEM import), `status` (health + summary stats), and `version`.
The CLI supports both table and JSON output formats (`--format table` or `--format json`), connects to the server via `CERTCTL_SERVER_URL` and authenticates with `CERTCTL_API_KEY`. It's built with Go's standard library only — no external dependencies.
### MCP Server (AI Integration)
certctl includes an MCP (Model Context Protocol) server that exposes the entire REST API as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.
|
||||
|
||||
### EST Enrollment (Device Certificates)
|
||||
|
||||
certctl's EST server enables device certificate enrollment for use cases that don't fit the traditional "ops team requests a cert via API" model. When a RADIUS server is configured to use certctl for 802.1X WiFi authentication, or an MDM platform enrolls corporate devices, they use the EST protocol at `/.well-known/est/`. The EST server validates the CSR, issues a certificate via the configured issuer connector, and returns it in PKCS#7 format — the standard wire format that every EST client understands. Each enrollment is recorded in the audit trail with the protocol, common name, SANs, issuer, and serial number.
|
||||
|
||||
Enable it with `CERTCTL_EST_ENABLED=true`. Optionally bind enrollments to a specific issuer (`CERTCTL_EST_ISSUER_ID`) or certificate profile (`CERTCTL_EST_PROFILE_ID`) to constrain what EST clients can request.
|
||||
|
||||
### Certificate Discovery
|
||||
|
||||
Certificate discovery is the process of automatically finding existing certificates in your infrastructure — certificates you didn't issue through certctl, possibly issued by other CAs or tools. This is essential for building a complete inventory before you can manage everything.
|
||||
|
||||
**How it works:** There are two discovery modes. In *filesystem discovery*, agents scan the directories listed in `CERTCTL_DISCOVERY_DIRS` for certificate files. On startup and every 6 hours, the agent walks those directories recursively, parses PEM and DER files, extracts metadata, and reports findings to the control plane. In *network discovery*, the control plane itself probes TLS endpoints across configured CIDR ranges and ports (enabled via `CERTCTL_NETWORK_SCAN_ENABLED=true`). It connects to each endpoint, extracts certificates from the TLS handshake, and feeds results into the same discovery pipeline, which finds certificates on services where you have no agent installed. In both cases, the server deduplicates by fingerprint and stores each discovered cert with a status: **Unmanaged** (discovered but not yet managed), **Managed** (linked to a control plane cert), or **Dismissed** (operator decided not to manage it).
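
The fingerprint deduplication can be pictured with a small shell sketch. It assumes a hypothetical report format of one `FINGERPRINT SOURCE` pair per line; the real format exchanged between agents and the server is not described here.

```bash
# dedupe_by_fingerprint: read "FINGERPRINT SOURCE" lines on stdin and keep only
# the first line seen for each fingerprint (hypothetical report format).
dedupe_by_fingerprint() {
  awk '!seen[$1]++'
}

# The same certificate found by an agent and by a network scan is kept once:
# the second AA11 line is dropped.
printf '%s\n' \
  'AA11 agent-1:/etc/ssl/a.pem' \
  'BB22 scan:10.0.0.5:443' \
  'AA11 scan:10.0.0.9:443' | dedupe_by_fingerprint
```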
|
||||
|
||||
This gives you a three-step triage workflow:
|
||||
1. **Discover**: Agents scan filesystems and the server probes network endpoints to find all existing certs
2. **Triage**: Operators review discoveries in the **Discovery** dashboard page and decide: claim it (link to a managed certificate) or dismiss it (not worth managing). The dashboard shows a summary stats bar (Unmanaged/Managed/Dismissed counts), filters by status and agent, and provides one-click claim and dismiss actions.
3. **Baseline**: Once triaged, you have a complete baseline of what's deployed, what you're managing, and what's unmanaged
|
||||
|
||||
Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.
|
||||
|
||||
This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
|
||||
|
||||
### Observability
|
||||
|
||||
certctl exposes metrics in two formats: a JSON endpoint at `GET /api/v1/metrics` and a Prometheus exposition format at `GET /api/v1/metrics/prometheus` (compatible with Prometheus, Grafana Agent, Datadog Agent, and Victoria Metrics). Both provide gauges (certificate totals by status, agent counts, pending jobs), counters (completed/failed jobs), and uptime. Five stats endpoints power the dashboard charts: summary statistics, certificates by status, expiration timeline, job trends, and issuance rate.
|
||||
|
||||
The agent fleet overview page groups agents by OS, architecture, and version, showing distribution charts that help ops teams track fleet health and identify outdated agents. All API requests are logged via structured `slog` middleware with request IDs for correlation.
|
||||
|
||||
## What's Next
|
||||
|
||||
Now that you understand the concepts, head to the [Quick Start Guide](quickstart.md) to get certctl running locally in under 5 minutes. You'll see a pre-loaded dashboard with demo certificates, explore the API, and understand how everything fits together.
|
||||
|
||||
For a deeper look at the system design, see the [Architecture Guide](architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the [MCP Server Guide](mcp.md). For the full API reference, see the [OpenAPI Spec Guide](openapi.md).
|
||||
@@ -0,0 +1,120 @@
|
||||
# Deployment Examples
|
||||
|
||||
Five turnkey docker-compose scenarios, each runnable in under 5 minutes. Pick the one closest to your setup.
|
||||
|
||||
## Which Example Should I Use?
|
||||
|
||||
| I need to... | Example | Issuer | Target |
|--------------|---------|--------|--------|
| Get Let's Encrypt certs for NGINX on a public server | [ACME + NGINX](#acme--nginx) | ACME (HTTP-01) | NGINX |
| Issue wildcard certs without opening port 80 | [Wildcard DNS-01](#wildcard-dns-01) | ACME (DNS-01) | Any |
| Run an internal CA for services behind a firewall | [Private CA + Traefik](#private-ca--traefik) | Local CA | Traefik |
| Use Smallstep step-ca as my PKI backend | [step-ca + HAProxy](#step-ca--haproxy) | step-ca | HAProxy |
| Manage both public and internal certs from one dashboard | [Multi-Issuer](#multi-issuer) | ACME + Local CA | Mixed |
|
||||
|
||||
**Already using another tool?** See the migration sections below each example for Certbot, acme.sh, and cert-manager users.
|
||||
|
||||
---
|
||||
|
||||
## ACME + NGINX
|
||||
|
||||
**Scenario:** You have one or more public-facing domains, NGINX as the reverse proxy, and want automated Let's Encrypt certificates with HTTP-01 challenges.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + NGINX, all on one Docker network. The agent generates keys locally (ECDSA P-256), submits CSRs to the server, receives signed certs from Let's Encrypt, and deploys them to NGINX with automatic reload.
|
||||
|
||||
**Prerequisites:** A domain pointing to your server, ports 80 and 443 open, and Docker Engine 20.10+ with Docker Compose.
|
||||
|
||||
```bash
cd examples/acme-nginx
cp .env.example .env # Edit with your domain and email
docker compose up -d
```
|
||||
|
||||
The full walkthrough — including how HTTP-01 challenges work, adding multiple domains, switching to staging for testing, and a production checklist — is in the [example README](../examples/acme-nginx/acme-nginx.md).
|
||||
|
||||
**Migrating from Certbot?** certctl discovers your existing `/etc/letsencrypt/live/` certificates automatically. You keep your ACME account, disable the Certbot cron, and certctl takes over renewal with centralized visibility and deployment verification. The step-by-step process is in [Migrating from Certbot](migrate-from-certbot.md).
|
||||
|
||||
---
|
||||
|
||||
## Wildcard DNS-01
|
||||
|
||||
**Scenario:** You need wildcard certificates (`*.example.com`) or your servers aren't reachable from the internet (no port 80). DNS-01 validates ownership by creating a TXT record at your DNS provider.
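
For reference, the record name that DNS-01 checks follows a fixed convention from RFC 8555: `_acme-challenge.` is prepended to the domain, with any leading `*.` removed. A small sketch, using a hypothetical helper name, that a DNS hook script could use to work out which record to create:

```bash
# acme_challenge_name DOMAIN
# Print the TXT record name that DNS-01 validates for DOMAIN (RFC 8555, section 8.4).
# A leading "*." is stripped: a wildcard and its base name share one record name.
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

acme_challenge_name '*.example.com'   # prints _acme-challenge.example.com
```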
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent. Includes a Cloudflare DNS hook script as a working reference — swap in your own DNS provider (Route53, Azure DNS, Google Cloud DNS, or any provider with an API).
|
||||
|
||||
**Prerequisites:** A domain, API credentials for your DNS provider, Docker Compose.
|
||||
|
||||
```bash
cd examples/acme-wildcard-dns01
cp .env.example .env # Edit with domain, email, DNS provider credentials
docker compose up -d
```
|
||||
|
||||
The full walkthrough — including DNS-PERSIST-01 (set a TXT record once, never touch DNS again on renewals), adapting scripts for other providers, and propagation troubleshooting — is in the [example README](../examples/acme-wildcard-dns01/acme-wildcard-dns01.md).
|
||||
|
||||
**Migrating from acme.sh?** Your existing `dns_*` hook scripts are compatible with certctl's DNS-01 — they use the same pattern (shell scripts creating TXT records). The migration guide covers script adaptation, discovery of existing acme.sh certificates, and phasing out the acme.sh cron. See [Migrating from acme.sh](migrate-from-acmesh.md).
|
||||
|
||||
---
|
||||
|
||||
## Private CA + Traefik
|
||||
|
||||
**Scenario:** Internal services that don't need public CA validation. You run your own certificate authority — either a self-signed root for development, or a subordinate CA chained to your enterprise root (e.g., Active Directory Certificate Services).
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + Traefik. The Local CA issuer signs certificates directly. Traefik watches a cert directory and auto-reloads when new files appear.
|
||||
|
||||
**Prerequisites:** Docker Compose. For sub-CA mode, you'll need a CA certificate and key signed by your enterprise root.
|
||||
|
||||
```bash
cd examples/private-ca-traefik
docker compose up -d # Self-signed mode (no .env needed for demo)
```
|
||||
|
||||
The full walkthrough — including sub-CA setup with `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH`, creating certificates via the API, monitoring deployments, and production hardening — is in the [example README](../examples/private-ca-traefik/private-ca-traefik.md).
|
||||
|
||||
---
|
||||
|
||||
## step-ca + HAProxy
|
||||
|
||||
**Scenario:** You use Smallstep's step-ca as your private PKI and want automated lifecycle management for certificates deployed to HAProxy load balancers.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent + step-ca (with JWK provisioner) + HAProxy. certctl issues certs via step-ca's native `/sign` API, combines them into HAProxy's expected PEM format (cert + chain + key in one file), and reloads HAProxy.
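
The combined file layout can be sketched in shell. This only illustrates the order described above (certificate, then chain, then key); the helper name and file paths are hypothetical, and in the example certctl performs this step for you.

```bash
# haproxy_bundle CERT CHAIN KEY OUT
# Concatenate the leaf certificate, the chain, and the private key, in that
# order, into OUT. The umask keeps a newly created OUT readable by its owner
# only, since the file contains the private key.
haproxy_bundle() {
  ( umask 077 && cat "$1" "$2" "$3" > "$4" )
}

# Usage (hypothetical paths):
#   haproxy_bundle cert.pem chain.pem key.pem /etc/haproxy/certs/site.pem
```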
|
||||
|
||||
**Prerequisites:** Docker Compose.
|
||||
|
||||
```bash
cd examples/step-ca-haproxy
docker compose up -d
```
|
||||
|
||||
The full walkthrough — including step-ca provisioner configuration, integrating with an existing step-ca instance, HAProxy PEM format details, and advanced features (approval workflows, policy-based renewal, multi-instance HAProxy) — is in the [example README](../examples/step-ca-haproxy/step-ca-haproxy.md).
|
||||
|
||||
---
|
||||
|
||||
## Multi-Issuer
|
||||
|
||||
**Scenario:** You manage both public-facing services (needing Let's Encrypt or another public CA) and internal services (using a private CA) and want a single dashboard for everything.
|
||||
|
||||
**What it deploys:** certctl server + PostgreSQL + certctl agent configured with both an ACME issuer and a Local CA issuer. Demonstrates issuer assignment via profiles — public services get ACME certs, internal services get Local CA certs, all visible in one inventory.
|
||||
|
||||
**Prerequisites:** Docker Compose. For real ACME certs, a public domain and port 80 access.
|
||||
|
||||
```bash
cd examples/multi-issuer
docker compose up -d
```
|
||||
|
||||
The full walkthrough — including profile-based issuer assignment, testing with ACME staging, Local CA enterprise sub-CA mode, and scaling beyond Docker Compose — is in the [example README](../examples/multi-issuer/multi-issuer.md).
|
||||
|
||||
**Using cert-manager for Kubernetes?** certctl complements cert-manager — cert-manager handles in-cluster certs, certctl handles everything outside: VMs, bare metal, network appliances, Windows servers. They can share the same CA (ACME, step-ca, Vault PKI). See [certctl for cert-manager Users](certctl-for-cert-manager-users.md).
|
||||
|
||||
---
|
||||
|
||||
## Beyond These Examples
|
||||
|
||||
These 5 scenarios cover the most common deployment patterns, but certctl supports a broader set of issuer and target backends — see `docs/features.md`'s Issuer Connectors and Target Connectors sections for the live catalogs (rebuild via `ls -d internal/connector/issuer/*/ | wc -l` and `ls -d internal/connector/target/*/ | wc -l`). Once you have the basics running, you can mix and match:
|
||||
|
||||
**Issuers:** ACME (Let's Encrypt, ZeroSSL, Buypass, Google Trust Services), Local CA (self-signed or sub-CA), step-ca, Vault PKI, DigiCert CertCentral, OpenSSL/Custom CA script, Sectigo (coming soon).
|
||||
|
||||
**Targets:** NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell or WinRM proxy), Postfix, Dovecot, F5 BIG-IP (coming soon).
|
||||
|
||||
See [Connector Reference](connectors.md) for configuration details on every issuer and target.
|
||||
@@ -0,0 +1,502 @@
|
||||
# Quick Start Guide
|
||||
|
||||
Certificate lifespans are dropping to **47 days by 2029**. At that cadence, a team managing 100 certificates processes roughly 15 renewals per week (100 × 7 ÷ 47), every week, indefinitely. Manual processes break. certctl automates the entire lifecycle (issuance, renewal, deployment, revocation, and audit) with zero human intervention.
|
||||
|
||||
This guide gets you running in 5 minutes and walks you through everything certctl does.
|
||||
|
||||
New to certificates? Read the [Concepts Guide](concepts.md) first — it explains TLS, CAs, and private keys in plain language.
|
||||
|
||||
## Contents
|
||||
|
||||
1. [Prerequisites](#prerequisites)
2. [Start Everything](#start-everything)
3. [Open the Dashboard](#open-the-dashboard)
4. [Explore the API](#explore-the-api)
   - [Core operations](#core-operations)
   - [Sorting, filtering, and pagination](#sorting-filtering-and-pagination)
   - [Stats and metrics](#stats-and-metrics)
5. [Create Your First Certificate](#create-your-first-certificate)
   - [Revoke a certificate](#revoke-a-certificate)
   - [Interactive approval workflow](#interactive-approval-workflow)
6. [Certificate Discovery](#certificate-discovery)
   - [Filesystem discovery (agent-based)](#filesystem-discovery-agent-based)
   - [Network discovery (agentless)](#network-discovery-agentless)
   - [Triage discovered certificates](#triage-discovered-certificates)
7. [CLI Tool](#cli-tool)
8. [MCP Server (AI Integration)](#mcp-server-ai-integration)
9. [Demo Data Reference](#demo-data-reference)
10. [Dashboard Demo Mode](#dashboard-demo-mode)
11. [Presenting to Stakeholders](#presenting-to-stakeholders)
12. [Tear Down](#tear-down)
13. [What's Next](#whats-next)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
You need **Docker** and **Docker Compose** installed. That's it.
|
||||
|
||||
On macOS:
|
||||
```bash
brew install --cask docker
```
|
||||
|
||||
On Linux, follow the official Docker install guide for your distribution.
|
||||
|
||||
## Start Everything
|
||||
|
||||
### Docker Compose (Quick Start)
|
||||
|
||||
```bash
git clone https://github.com/certctl-io/certctl.git
cd certctl
docker compose -f deploy/docker-compose.yml up -d --build
```
|
||||
|
||||
The `--build` flag builds the server image including the React frontend. Without it, Docker may use a stale cached image.
|
||||
|
||||
**For production deployments**, copy `deploy/.env.example` to `deploy/.env` and customize the credentials:
|
||||
```bash
cp deploy/.env.example deploy/.env
# Edit deploy/.env to set secure POSTGRES_PASSWORD and CERTCTL_API_KEY values
docker compose -f deploy/docker-compose.yml up -d --build
```
|
||||
|
||||
> **Warning:** Edit `POSTGRES_PASSWORD` *before* the very first `docker compose up`. Postgres seeds the password into its data directory only on first boot of an empty volume — after that, the password is baked into `pg_authid` and the env var is ignored. If you boot once with the default and later change `POSTGRES_PASSWORD` in `.env`, the certctl-server container picks up the new value but postgres still authenticates against the old one, and the server logs `pq: password authentication failed for user "certctl"` (SQLSTATE 28P01). Two ways out: tear down the volume with `docker compose -f deploy/docker-compose.yml down -v` (this **deletes all data**) and bring up fresh, or rotate non-destructively with `docker compose -f deploy/docker-compose.yml exec postgres psql -U certctl -c "ALTER ROLE certctl PASSWORD '<new>';"` and then restart certctl-server with the matching `POSTGRES_PASSWORD`.
|
||||
|
||||
### Docker Compose Environments
|
||||
|
||||
The `deploy/` directory contains four compose files for different use cases:
|
||||
|
||||
| File | Purpose | How to run |
|------|---------|------------|
| `docker-compose.yml` | **Base platform.** PostgreSQL + certctl server + agent. Clean dashboard with onboarding wizard; use this for production or first-time setup. | `docker compose -f deploy/docker-compose.yml up --build` |
| `docker-compose.demo.yml` | **Demo data override.** Layers 180 days of realistic seed data (15 certs, 5 agents, multiple issuers) onto the base. Dashboard charts and tables look populated on first boot. | `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up --build` |
| `docker-compose.dev.yml` | **Development override.** Adds PgAdmin (port 5050), debug-level logging, and a Delve debugger port (40000) for the server. | `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.dev.yml up --build` |
| `docker-compose.test.yml` | **Integration test environment.** 7 containers on a static-IP subnet: PostgreSQL, certctl server+agent, step-ca, Pebble ACME server, challenge test server, and NGINX. Runs the full issuance→deployment→verification flow against real CA backends. Standalone: does not combine with the base file. | `docker compose -f deploy/docker-compose.test.yml up --build` |
|
||||
|
||||
Override files are layered onto the base with multiple `-f` flags. The test environment is self-contained and runs independently. To reset an environment's data, run the same command with `down -v` in place of `up --build`; this removes the volumes.
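
To avoid retyping the `-f` flags, the layering can be wrapped in a small helper. This is a sketch with a hypothetical function name; it covers only the `demo` and `dev` overrides, since the test environment is standalone.

```bash
# compose_files [OVERRIDE...]
# Print the -f flags for the base file followed by each named override.
compose_files() {
  printf '%s' '-f deploy/docker-compose.yml'
  for name in "$@"; do
    printf ' -f deploy/docker-compose.%s.yml' "$name"
  done
  printf '\n'
}

# Usage (word splitting of the helper's output is intentional):
#   docker compose $(compose_files demo) up -d --build
#   docker compose $(compose_files demo) down -v   # also removes the volumes
```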
|
||||
|
||||
For a deep dive into every service, environment variable, and networking decision, see the [Docker Compose Environments Guide](../deploy/ENVIRONMENTS.md).
|
||||
|
||||
### Kubernetes with Helm
|
||||
|
||||
For production deployments on Kubernetes, use the Helm chart:
|
||||
|
||||
```bash
# Indexed keys are quoted so shells such as zsh do not treat [0] as a glob pattern.
helm install certctl deploy/helm/certctl/ \
  --create-namespace --namespace certctl \
  --set server.auth.apiKey="your-secure-api-key" \
  --set postgresql.auth.password="your-db-password" \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0].host=certctl.example.com' \
  --set 'ingress.hosts[0].tls=true'
```
|
||||
|
||||
The chart includes: server Deployment (with configurable replicas, health probes, security context), PostgreSQL StatefulSet with persistent volumes, agent DaemonSet (one agent per infrastructure node), optional Ingress with TLS, and ServiceAccount with RBAC. All certctl configuration options are exposed in `values.yaml` — customize issuer settings, target connectors, scheduler intervals, and notifier credentials there.
|
||||
|
||||
If you started with Docker Compose, wait about 30 seconds for PostgreSQL to initialize, then verify:
|
||||
|
||||
```bash
docker compose -f deploy/docker-compose.yml ps
```
|
||||
|
||||
You should see:
|
||||
```
NAME               STATUS
certctl-postgres   Up (healthy)
certctl-server     Up (healthy)
certctl-agent      Up
```
|
||||
|
||||
The control plane is HTTPS-only as of v2.2. The `certctl-tls-init` init container in the shipped `deploy/docker-compose.yml` self-signs a cert on first boot and drops it into a named volume. Extract the CA bundle once and reuse it for every API call in this guide:
|
||||
|
||||
```bash
export CA=/tmp/certctl-ca.crt
docker compose -f deploy/docker-compose.yml exec -T certctl-server \
  cat /etc/certctl/tls/ca.crt > "$CA"

curl --cacert "$CA" https://localhost:8443/health
```
|
||||
```json
{"status":"healthy"}
```
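
In scripts, polling the health endpoint is usually more reliable than a fixed wait. A minimal sketch, assuming the `$CA` file has been extracted as shown above; the `retry` helper is hypothetical, and `curl -f` checks only the HTTP status, not the JSON body.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage: poll up to 30 times, 2 seconds apart.
#   retry 30 2 curl --cacert "$CA" -fsS -o /dev/null https://localhost:8443/health
```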
|
||||
|
||||
If you're bringing your own cert (internal CA, cert-manager, operator-supplied Secret), see [`docs/tls.md`](tls.md) for the full provisioning matrix. If you're cutting over an existing install, see [`docs/upgrade-to-tls.md`](upgrade-to-tls.md) for the failure modes (out-of-date `http://…` agents fail at the TLS handshake) and the one-step procedure.
|
||||
|
||||
## Open the Dashboard
|
||||
|
||||
Open **https://localhost:8443** in your browser. Your browser will warn about the self-signed cert — that's expected for the demo bootstrap. Trust the CA bundle you just exported, or click through the warning.
|
||||
|
||||
> **Note:** The Docker Compose demo runs with authentication disabled (`CERTCTL_AUTH_TYPE=none`) so you can explore immediately. For production, set `CERTCTL_AUTH_TYPE=api-key` and `CERTCTL_AUTH_SECRET=<your-secret>` in your environment, then pass `Authorization: Bearer <your-secret>` on all API requests. The dashboard will prompt for your API key on first load.
|
||||
>
|
||||
> **Key rotation:** `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `CERTCTL_AUTH_SECRET=new-key,old-key`). Both keys are valid simultaneously, enabling zero-downtime rotation: add the new key, roll clients over, then remove the old key.
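
The rotation semantics can be illustrated with a few lines of shell. This sketch only models the "any listed key is valid" rule described above; it is not the server's implementation, and the function name is hypothetical.

```bash
# key_is_accepted PRESENTED_KEY SECRET_LIST
# Returns 0 if PRESENTED_KEY exactly equals one entry of the comma-separated
# SECRET_LIST (the format of CERTCTL_AUTH_SECRET), 1 otherwise.
key_is_accepted() {
  [ -n "$1" ] || return 1
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# During rotation both keys pass; once the old key is removed, only the new one does.
#   key_is_accepted old-key "new-key,old-key"   # accepted
#   key_is_accepted old-key "new-key"           # rejected
```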
|
||||
|
||||
The dashboard comes pre-loaded with 35 demo certificates across 5 issuers, 8 agents, and 90 days of job history — expiring certs, expired certs, active certs, failed renewals, revocations, discovery scans, and approval workflows. A realistic snapshot of what certificate management looks like in a real organization.
|
||||
|
||||
### What you're looking at
|
||||
|
||||
The main dashboard shows total certificates, how many are expiring soon, how many have expired, the renewal success rate, and four charts: an **expiration heatmap** (90-day weekly buckets), **renewal success rate trends** (30-day line chart), **certificate status distribution** (donut chart), and **issuance rate** (30-day bar chart).
|
||||
|
||||
Explore the sidebar: Certificates, Agents, Policies, Jobs, Audit Trail, Notifications, Profiles, Teams, Owners, Agent Groups, Fleet Overview, Short-Lived Credentials, Discovery, and Network Scans.
|
||||
|
||||
### Scenarios to walk through
|
||||
|
||||
**"We're about to have an outage"** — Filter certificates by status → Expiring. You'll see `auth-production` (12 days), `cdn-production` (8 days), and `mail-production` (5 days). At 47-day lifespans, this is every other week. certctl catches these automatically and triggers renewal before they expire.
|
||||
|
||||
**"A renewal failed"** — Look at `vpn-production` — status: Failed. Click it to see the audit trail showing the ACME challenge failure after 3 retry attempts. The system sent a webhook notification to the ops channel. No one had to notice manually.
|
||||
|
||||
**"Who owns this cert?"** — Click any certificate. Owner, team, environment, tags. Clear accountability. Notifications route to the owner's email automatically.
|
||||
|
||||
**"Can I revoke a compromised cert?"** Click any active certificate, then "Revoke." A modal offers RFC 5280 reason codes (Key Compromise, Superseded, Cessation of Operation). After revocation, the certificate appears in the CRL and in OCSP responses automatically, so clients that check revocation stop trusting it as soon as they fetch fresh status.
|
||||
|
||||
**"What about certificates already in production?"** — Click "Discovery" in the sidebar. The demo comes pre-loaded with 9 discovered certificates — some found by agents scanning filesystems, some found by the server probing TLS endpoints on the network. You'll see Unmanaged certs waiting for triage (including an expired printer cert and an expiring switch management cert), certs already linked to managed inventory, and one that was dismissed. Claim unmanaged certs to bring them under automation, or dismiss them. Click "Network Scans" to see the 3 configured scan targets with recent scan results.
|
||||
|
||||
**"I need to approve a renewal before it proceeds"** — Click "Jobs" in the sidebar. You'll see an amber banner: "2 jobs awaiting approval." These are renewal jobs for `auth-production` and `payments-production` that require human sign-off before proceeding. Click Approve or Reject with a reason — the decision is recorded in the audit trail.
|
||||
|
||||
**"Show me the agent fleet"** Click "Agents." Eight agents across Linux, macOS, and Windows platforms, most of them online, showing OS, architecture, IP, and version metadata. A ninth entry (server-scanner) is the sentinel agent used for network certificate discovery. Click "Fleet Overview" for OS/architecture grouping, version distribution, and per-platform listing. Agents generate ECDSA P-256 keys locally, so private keys never leave your infrastructure.
|
||||
|
||||
**"What about bulk operations?"** — On the Certificates page, select multiple certificates with checkboxes. A bulk action bar appears: trigger renewal, revoke with reason codes, or reassign ownership — all with progress tracking. At 47-day lifespans with hundreds of certs, bulk operations aren't optional.
|
||||
|
||||
**"Short-lived credentials?"** — Click "Short-Lived" in the sidebar. Live countdown timers for certificates with TTL under 1 hour. Auto-refresh every 10 seconds. These are for service-to-service auth where rapid expiry replaces revocation.
|
||||
|
||||
## Explore the API
|
||||
|
||||
Everything you see in the dashboard is backed by the REST API. All endpoints live under `/api/v1/` and return JSON.
|
||||
|
||||
### Core operations
|
||||
|
||||
Every request below uses `--cacert "$CA"` to pin the self-signed CA bundle extracted above. In production, point `$CA` at your internal CA root or the bundle you distributed to the fleet.
|
||||
|
||||
```bash
# List all certificates
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates | jq .

# Filter by status
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?status=Expiring" | jq .

# Filter by environment
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?environment=production" | jq .

# Get a specific certificate
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/mc-api-prod | jq .

# Get deployment targets for a certificate
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/mc-api-prod/deployments | jq .

# List agents
curl --cacert "$CA" -s https://localhost:8443/api/v1/agents | jq .

# Check agent pending work
curl --cacert "$CA" -s https://localhost:8443/api/v1/agents/ag-web-prod/work | jq .

# View audit trail
curl --cacert "$CA" -s https://localhost:8443/api/v1/audit | jq .

# View policies and violations
curl --cacert "$CA" -s https://localhost:8443/api/v1/policies | jq .
curl --cacert "$CA" -s https://localhost:8443/api/v1/policies/pr-require-owner/violations | jq .

# Notifications
curl --cacert "$CA" -s https://localhost:8443/api/v1/notifications | jq .

# Profiles and agent groups
curl --cacert "$CA" -s https://localhost:8443/api/v1/profiles | jq .
curl --cacert "$CA" -s https://localhost:8443/api/v1/agent-groups | jq .
```
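
If you run many of these calls, a small wrapper keeps the CA pinning and base URL in one place. This is a convenience sketch, not part of certctl: the function names are hypothetical, and reusing `CERTCTL_SERVER_URL` and `CERTCTL_API_KEY` as shell variables here is an assumption.

```bash
# api_url PATH: print the full URL for an API path such as "certificates".
# Falls back to the guide's https://localhost:8443 when CERTCTL_SERVER_URL is unset.
api_url() {
  base=${CERTCTL_SERVER_URL:-https://localhost:8443}
  printf '%s/api/v1/%s\n' "${base%/}" "${1#/}"
}

# api PATH [CURL_ARGS...]: call the API with the pinned CA bundle in $CA.
api() {
  path=$1; shift
  curl --cacert "$CA" -s "$@" "$(api_url "$path")"
}

# Usage:
#   api certificates | jq .
#   api "certificates?status=Expiring" | jq .
#   api certificates -H "Authorization: Bearer $CERTCTL_API_KEY" | jq .   # with auth enabled
```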
|
||||
|
||||
### Sorting, filtering, and pagination
|
||||
|
||||
```bash
# Sort by expiration date (ascending)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?sort=notAfter" | jq .

# Sort descending (prefix with -)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?sort=-createdAt" | jq .

# Time-range filters (RFC3339)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?expires_before=2026-05-01T00:00:00Z" | jq .
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?created_after=2026-03-01T00:00:00Z" | jq .

# Sparse fields: request only what you need
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?fields=id,common_name,status,expires_at" | jq .

# Cursor pagination: efficient for large inventories
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?page_size=5" | jq '{next_cursor: .next_cursor, count: (.data | length)}'
curl --cacert "$CA" -s "https://localhost:8443/api/v1/certificates?cursor=<next_cursor_value>&page_size=5" | jq .
```
|
||||
|
||||
Supported sort fields: `notAfter`, `expiresAt`, `createdAt`, `updatedAt`, `commonName`, `name`, `status`, `environment`.
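
The two pagination calls above can be turned into a loop that follows `next_cursor` until the last page. A sketch under stated assumptions: it assumes `.next_cursor` is null, absent, or empty on the final page (the guide does not say how the last page is signaled), and the function name is hypothetical.

```bash
# list_all_certificates [BASE_URL]
# Walk every page of /api/v1/certificates and print one certificate ID per line.
# Requires curl and jq, plus the CA bundle path in $CA.
list_all_certificates() {
  base=${1:-https://localhost:8443}
  cursor=""
  while :; do
    # Build the query; --data-urlencode with -G appends encoded query parameters.
    set -- --data-urlencode "page_size=5"
    if [ -n "$cursor" ]; then
      set -- "$@" --data-urlencode "cursor=$cursor"
    fi
    page=$(curl --cacert "$CA" -s -G "$@" "$base/api/v1/certificates") || return 1
    printf '%s\n' "$page" | jq -r '.data[].id'
    # "// empty" turns a null or missing cursor into an empty string.
    cursor=$(printf '%s\n' "$page" | jq -r '.next_cursor // empty')
    [ -n "$cursor" ] || break
  done
}
```

Using `--data-urlencode` rather than pasting the cursor into the URL avoids problems if the cursor contains characters that need escaping.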
|
||||
|
||||
### Stats and metrics
|
||||
|
||||
```bash
# Dashboard summary
curl --cacert "$CA" -s https://localhost:8443/api/v1/stats/summary | jq .

# Certificates by status
curl --cacert "$CA" -s https://localhost:8443/api/v1/stats/certificates-by-status | jq .

# Expiration timeline (next 90 days)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/stats/expiration-timeline?days=90" | jq .

# Job trends (last 30 days)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/stats/job-trends?days=30" | jq .

# JSON metrics
curl --cacert "$CA" -s https://localhost:8443/api/v1/metrics | jq .

# Prometheus format (for Prometheus, Grafana Agent, Datadog)
curl --cacert "$CA" -s https://localhost:8443/api/v1/metrics/prometheus
```
|
||||
|
||||
## Create Your First Certificate
|
||||
|
||||
Create a certificate record that certctl will track, renew, and deploy automatically.
|
||||
|
||||
```bash
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My First Certificate",
    "common_name": "myapp.example.com",
    "sans": ["myapp.example.com", "www.myapp.example.com"],
    "environment": "staging",
    "owner_id": "o-alice",
    "team_id": "t-platform",
    "issuer_id": "iss-local",
    "renewal_policy_id": "rp-default",
    "status": "Pending",
    "tags": {"purpose": "quickstart-demo"}
  }' | jq .
```
|
||||
|
||||
Save the certificate ID (or provide your own `id` in the request body, e.g. `"id": "mc-my-first"`):
|
||||
```bash
CERT_ID="<paste the id from the response>"
```
|
||||
|
||||
Trigger renewal:
|
||||
```bash
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates/$CERT_ID/renew | jq .
```
|
||||
|
||||
Check the result:
|
||||
```bash
curl --cacert "$CA" -s https://localhost:8443/api/v1/certificates/$CERT_ID | jq .
```
|
||||
|
||||
Refresh the dashboard at https://localhost:8443 — your new certificate appears in the inventory.
|
||||
|
||||
### Revoke a certificate
|
||||
|
||||
When a private key is compromised or a service is decommissioned:
|
||||
|
||||
```bash
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/certificates/$CERT_ID/revoke \
  -H "Content-Type: application/json" \
  -d '{"reason": "superseded"}' | jq .
```
|
||||
|
||||
Supported RFC 5280 reason codes: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn`.
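
A script that revokes certificates can check the reason before calling the API. A minimal sketch with a hypothetical helper name; it assumes the codes are matched case-sensitively, exactly as spelled in the list above.

```bash
# is_revocation_reason REASON
# Returns 0 if REASON is one of the reason codes listed above, 1 otherwise.
is_revocation_reason() {
  case "$1" in
    unspecified|keyCompromise|caCompromise|affiliationChanged|superseded|cessationOfOperation|certificateHold|privilegeWithdrawn)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage:
#   is_revocation_reason "$REASON" || { echo "unsupported reason: $REASON" >&2; exit 1; }
```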
|
||||
|
||||
Confirm via the unauthenticated DER CRL (RFC 5280 §5, RFC 8615):
|
||||
```bash
# Fetch the CRL without any API key; relying parties shouldn't need one.
# The CRL path is unauthenticated, but it's still served over TLS.
curl --cacert "$CA" -s https://localhost:8443/.well-known/pki/crl/iss-local -o /tmp/crl.der
openssl crl -inform der -in /tmp/crl.der -noout -text | head -40
```
|
||||
|
||||
### Interactive approval workflow
|
||||
|
||||
Use this for high-value certificates where you want human oversight. The demo includes 2 pre-seeded jobs in `AwaitingApproval` status (for `auth-production` and `payments-production`). Open **Jobs** in the sidebar and you'll see the amber "Pending Approval" banner immediately.
|
||||
|
||||
```bash
# List jobs awaiting approval (demo includes 2)
curl --cacert "$CA" -s "https://localhost:8443/api/v1/jobs?status=AwaitingApproval" | jq '.data[] | {id, certificate_id, status}'

# Approve a pending job
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve \
  -H "Content-Type: application/json" \
  -d '{"reason": "Approved for production deployment"}' | jq .

# Reject a pending job
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
  -H "Content-Type: application/json" \
  -d '{"reason": "Key type does not meet compliance requirements"}' | jq .
```
|
||||
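
The `JOB_ID` placeholder in the approve and reject calls can be filled from the list response. This is a minimal sketch, assuming the list endpoint returns jobs under `.data[]` with an `id` field, as the `jq` filter above suggests; `first_pending_job_id` is a hypothetical helper name.

```bash
# Hypothetical helper: read a job-list response on stdin and print the ID
# of the first job, or nothing when the list is empty or missing.
first_pending_job_id() {
  jq -r '.data[0].id // empty'
}
```

For example, `JOB_ID=$(curl --cacert "$CA" -s "https://localhost:8443/api/v1/jobs?status=AwaitingApproval" | first_pending_job_id)` sets the variable, after which `$JOB_ID` can replace the literal `JOB_ID` in the approve or reject URL.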
|
||||
## Certificate Discovery
|
||||
|
||||
Find certificates already running in your infrastructure — ones you didn't issue through certctl.
|
||||
|
||||
The demo environment comes pre-loaded with 9 discovered certificates (from agent filesystem scans and server-side network scans), 3 network scan targets, and recent scan history. Open **Discovery** and **Network Scans** in the sidebar to see the triage workflow immediately.
|
||||
|
||||
### Filesystem discovery (agent-based)
|
||||
|
||||
```bash
# Configure agent to scan directories
export CERTCTL_DISCOVERY_DIRS="/etc/nginx/certs,/etc/ssl/certs,/var/lib/certs"
# Agent scans on startup + every 6 hours
```
|
||||
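
To preview what a scan of those directories might pick up, a rough local approximation is to walk the same comma-separated list with `find`. This is only a sketch: `list_cert_candidates` is a hypothetical helper, the file extensions are an assumption, and the agent's real scan logic may differ.

```bash
# Hypothetical helper: list files that look like certificates in a
# comma-separated directory list (same format as CERTCTL_DISCOVERY_DIRS).
# The extension list is an assumption, not the agent's actual filter.
list_cert_candidates() {
  local dirs="$1" dir
  local IFS=','
  for dir in $dirs; do
    # Skip directories that do not exist on this host.
    [ -d "$dir" ] || continue
    find "$dir" -type f \( -name '*.pem' -o -name '*.crt' -o -name '*.cer' -o -name '*.der' \)
  done
}

# Example: preview the directories from the export above.
list_cert_candidates "/etc/nginx/certs,/etc/ssl/certs,/var/lib/certs"
```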
|
||||
### Network discovery (agentless)
|
||||
|
||||
```bash
# Enable network scanning
export CERTCTL_NETWORK_SCAN_ENABLED=true

# Create a scan target
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/network-scan-targets \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Internal Network",
    "cidrs": ["10.0.1.0/24"],
    "ports": [443, 8443],
    "enabled": true,
    "scan_interval_hours": 6,
    "timeout_ms": 5000
  }' | jq .

# Trigger an immediate scan
curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/network-scan-targets/nst-internal-network/scan | jq .
```
|
||||
|
||||
### Triage discovered certificates
|
||||
|
||||
```bash
# List discovered certs
curl --cacert "$CA" -s "https://localhost:8443/api/v1/discovered-certificates?agent_id=agent-nginx-prod" | jq .

# Summary counts
curl --cacert "$CA" -s https://localhost:8443/api/v1/discovery-summary | jq .

# Claim a discovered cert (bring under management)
curl --cacert "$CA" -s -X POST "https://localhost:8443/api/v1/discovered-certificates/DISCOVERY_ID/claim" \
  -H "Content-Type: application/json" \
  -d '{"managed_certificate_id": "mc-api-prod"}' | jq .
```
|
||||
|
||||
## CLI Tool
|
||||
|
||||
```bash
cd cmd/cli && go build -o certctl-cli .

export CERTCTL_SERVER_URL="https://localhost:8443"
export CERTCTL_API_KEY="test-key-123"
export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # or pass --ca-bundle; --insecure for dev self-signed

./certctl-cli certs list                    # List certificates
./certctl-cli certs get mc-api-prod         # Certificate details
./certctl-cli certs renew mc-api-prod       # Trigger renewal
./certctl-cli certs revoke mc-api-prod --reason keyCompromise
./certctl-cli agents list                   # List agents
./certctl-cli jobs list                     # List jobs
./certctl-cli import /path/to/certs.pem     # Bulk import
./certctl-cli status                        # Health + stats
```
|
||||
|
||||
## Scheduled Certificate Digest Emails
|
||||
|
||||
Enable automatic HTML digest emails with certificate stats, expiration timeline, and job health:
|
||||
|
||||
```bash
# Set SMTP configuration
export CERTCTL_SMTP_HOST=smtp.gmail.com
export CERTCTL_SMTP_PORT=587
export CERTCTL_SMTP_USERNAME=admin@example.com
export CERTCTL_SMTP_PASSWORD=your-app-password
export CERTCTL_SMTP_FROM_ADDRESS=certctl@example.com
export CERTCTL_SMTP_USE_TLS=true

# Enable digest and set recipients
export CERTCTL_DIGEST_ENABLED=true
export CERTCTL_DIGEST_INTERVAL=24h
export CERTCTL_DIGEST_RECIPIENTS=ops@example.com,security@example.com
```
|
||||
|
||||
Preview the digest HTML before enabling scheduled delivery:
|
||||
```bash
# -s hides curl's progress meter; -r makes jq print the raw HTML string
curl --cacert "$CA" -s https://localhost:8443/api/v1/digest/preview | jq -r '.html' | grep -o '<html>' # Shows HTML is ready

# Trigger a digest send immediately (outside of schedule)
curl --cacert "$CA" -X POST https://localhost:8443/api/v1/digest/send
```
|
||||
|
||||
If no recipients are configured (`CERTCTL_DIGEST_RECIPIENTS` empty), the digest falls back to certificate owner emails. Digests include total certificates, expiring soon, expired, active agents, completed/failed jobs (30-day summary), and a table of expiring certs color-coded by urgency (7/14/30 days).
|
||||
|
||||
## MCP Server (AI Integration)
|
||||
|
||||
```bash
cd cmd/mcp-server && go build -o mcp-server .

export CERTCTL_SERVER_URL="https://localhost:8443"
export CERTCTL_API_KEY="test-key-123"
export CERTCTL_SERVER_CA_BUNDLE_PATH="$CA" # MCP is env-vars-only; no CLI flags

./mcp-server
```
|
||||
|
||||
Exposes the full REST API via MCP over stdio transport. Ask Claude: "What certificates are expiring in the next 30 days?", "Revoke the payments cert due to key compromise", "Show me the audit trail."
|
||||
|
||||
## Demo Data Reference
|
||||
|
||||
| Resource | Count | Examples |
|----------|-------|---------|
| Teams | 6 | Platform, Security, Payments, Frontend, Data, DevOps |
| Owners | 6 | Alice, Bob, Carol, Dave, Eve, Frank |
| Issuers | 5 | Local Dev CA, Let's Encrypt Staging, step-ca Internal, ZeroSSL (EAB), Custom OpenSSL CA |
| Agents | 9 | 8 real agents (linux/darwin/windows, amd64/arm64) + server-scanner (network discovery) |
| Targets | 8 | NGINX prod, NGINX staging, NGINX data, HAProxy, Apache, IIS, Traefik, Caddy |
| Certificates | 35 | Active, Expiring, Expired, Failed, Revoked, RenewalInProgress, Wildcard, S/MIME |
| Jobs | 50+ | 90 days of issuance, renewal, deployment jobs + 2 AwaitingApproval |
| Discovered Certs | 12 | Unmanaged (filesystem + network), Managed (linked), Dismissed |
| Discovery Scans | 8 | Historical + recent agent filesystem scans + network TLS scans |
| Network Scan Targets | 4 | DC1 Web Servers, DC2 Application Tier, DMZ Public Endpoints, Edge Locations |
| Audit Events | 55+ | 90 days of lifecycle events (issuance, renewal, deployment, revocation, discovery) |
| Policies | 4 | Required owner, allowed environments, max lifetime, min renewal window |
| Profiles | 5 | Standard TLS, Internal mTLS, Short-Lived, High Security, S/MIME Email |
| Agent Groups | 5 | Linux agents, ARM agents, Production subnet, etc. |
|
||||
|
||||
## Dashboard Demo Mode
|
||||
|
||||
The dashboard works without a backend for screenshots and presentations:
|
||||
|
||||
```bash
cd web && npm install && npm run dev
# Dashboard at http://localhost:5173
```
|
||||
|
||||
When the API is unreachable, the dashboard loads realistic mock data with a "Demo Mode" badge.
|
||||
|
||||
## Presenting to Stakeholders
|
||||
|
||||
A suggested 5-minute flow:
|
||||
|
||||
1. **Dashboard** — "Certificate inventory at a glance. Real-time charts show expiration trends and renewal health."
|
||||
2. **Expiring certs** — "These three would have caused outages. At 47-day lifespans, this happens every other week."
|
||||
3. **Certificate detail** — "Full lifecycle: who owns it, where it's deployed, deployment timeline, version history with rollback."
|
||||
4. **Revocation** — "One click revokes with an RFC 5280 reason code. CRL and OCSP served automatically."
|
||||
5. **Failed renewal** — "System tried 3 times, then alerted the team via Slack, Teams, PagerDuty, or OpsGenie."
|
||||
6. **Agent fleet** — "Agents handle key generation locally (ECDSA P-256). Private keys never leave your infrastructure."
|
||||
7. **Discovery** — "Agents scan filesystems, server probes TLS endpoints. We find what you're not managing yet."
|
||||
8. **Bulk operations** — "Select multiple certs, renew or revoke in bulk. At 47-day lifespans with hundreds of certs, this is essential."
|
||||
9. **Audit trail** — "Every action recorded. Export to CSV/JSON for compliance."
|
||||
10. **CLI + MCP** — "Terminal users get `certctl-cli`. AI assistants get MCP integration. Everything is API-first."
|
||||
|
||||
## Tear Down
|
||||
|
||||
```bash
docker compose -f deploy/docker-compose.yml down -v
```
|
||||
|
||||
The `-v` flag removes the PostgreSQL data volume for a clean slate.
|
||||
|
||||
## What's Next
|
||||
|
||||
**Ready to deploy with your stack?** The [Deployment Examples](examples.md) page has 5 turnkey docker-compose scenarios — pick the one closest to your setup and have it running in minutes. It also covers migration paths from Certbot, acme.sh, and cert-manager.
|
||||
|
||||
- **[Deployment Examples](examples.md)** — ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer
|
||||
- **[Advanced Demo](demo-advanced.md)** — Issue a real certificate via the Local CA end-to-end
|
||||
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Reference](connectors.md)** — Configuration for all 7 issuers and 10 targets
|
||||
- **[Concepts Guide](concepts.md)** — TLS certificates, CAs, and private keys explained from scratch
|
||||
@@ -0,0 +1,119 @@
|
||||
# Why certctl?
|
||||
|
||||
Certificate management is broken at every scale between "one domain on Let's Encrypt" and "Fortune 500 budget for Venafi." certctl fills that gap: a self-hosted platform that automates the entire certificate lifecycle, works with any CA, deploys to any server, and keeps private keys on your infrastructure. It's free, source-available, and you own everything.
|
||||
|
||||
## The Math That Forces the Decision
|
||||
|
||||
The CA/Browser Forum passed [Ballot SC-081v3](https://cabforum.org/2025/04/11/ballot-sc081v3-introduce-schedule-of-reducing-validity-and-data-reuse-periods/) in April 2025, mandating a phased reduction in TLS certificate lifetimes: **200 days** as of March 2026, **100 days** by March 2027, and **47 days** by March 2029.
|
||||
|
||||
At 47-day lifespans, a team managing 100 certificates is processing **about 15 renewals per week** (100 ÷ 47 ≈ 2.1 per day), every week, forever. At 200 certificates, it's more than four per day. Manual processes, calendar reminders, and certbot cron jobs don't scale to this: a single missed renewal becomes a production outage at 3 AM. Certificate lifecycle automation is no longer optional; the only question is what tool runs it.
|
||||
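
The renewal load follows from simple division: each certificate must be renewed at least once per lifetime, so a fleet of N certificates with a lifetime of D days needs at least N / D renewals per day. The sketch below works that out for the three lifetimes in the ballot. `renewals_per_week` is an illustrative helper, and renewing ahead of expiry (as most setups do) would push the numbers higher.

```bash
# Lower bound on weekly renewals: certs / lifetime_days * 7.
renewals_per_week() {
  awk -v certs="$1" -v days="$2" 'BEGIN { printf "%.1f\n", certs / days * 7 }'
}

# Compare the three lifetime milestones for a 100-certificate fleet.
for lifetime in 200 100 47; do
  echo "100 certs at ${lifetime}-day lifetime: $(renewals_per_week 100 "$lifetime") renewals/week"
done
```

For 100 certificates this prints 3.5, 7.0, and 14.9 renewals per week, so the move from 200-day to 47-day lifetimes roughly quadruples the workload.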
|
||||
## The Landscape Today
|
||||
|
||||
If you're evaluating your options, here's what you'll find:
|
||||
|
||||
**ACME clients** (certbot, lego, acme.sh) handle issuance and renewal for Let's Encrypt and similar CAs, but they don't deploy to target servers, don't track inventory, don't support private CAs, and give you no audit trail or policy enforcement. You end up writing glue scripts and hoping they don't break.
|
||||
|
||||
**Kubernetes-native tools** (cert-manager) work well inside the cluster, but most organizations run mixed infrastructure — NGINX on VMs, HAProxy at the edge, IIS on Windows, maybe an F5. You need a separate solution for everything outside Kubernetes.
|
||||
|
||||
**Commercial SaaS platforms** handle more of the lifecycle but are proprietary, cloud-dependent, and priced per certificate. At 100 certs and 20 agents, SaaS pricing runs $3,000-5,000/year and scales linearly. You're paying rent on your own infrastructure's security.
|
||||
|
||||
**Enterprise platforms** (Venafi, Keyfactor, AppViewX) are comprehensive but start at $75K/year and require dedicated teams to operate. If you have a 50-server environment, the licensing costs more than the servers.
|
||||
|
||||
## What certctl Does Differently
|
||||
|
||||
certctl handles issuance, renewal, deployment, revocation, discovery, and monitoring — with three design decisions that no other tool at any price point combines:
|
||||
|
||||
### 1. Private Keys Never Leave Your Infrastructure
|
||||
|
||||
certctl agents generate ECDSA P-256 private keys locally. The agent creates a CSR and submits it to the control plane. The signed certificate comes back. The private key stays on the agent's filesystem with 0600 permissions — it never crosses the network.
|
||||
|
||||
This isn't a premium feature. It's the default behavior, free. Most alternatives either generate keys on the server (creating a single point of compromise) or gate key isolation behind paid tiers.
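
The flow described above can be reproduced by hand with OpenSSL to see what stays local and what gets sent. This is a sketch of the idea, not the agent's implementation: the file names, the temporary directory, and the `myapp.example.com` subject are placeholders.

```bash
# Files created from here on get 0600 permissions (owner read/write only).
umask 077
workdir=$(mktemp -d)
key="$workdir/agent.key"
csr="$workdir/agent.csr"

# Generate an ECDSA P-256 private key (OpenSSL names the curve prime256v1).
openssl ecparam -name prime256v1 -genkey -noout -out "$key"

# Build a CSR from the key. Only this file would be submitted for signing;
# the private key never leaves the local filesystem.
openssl req -new -key "$key" -subj "/CN=myapp.example.com" -out "$csr"

# Show the CSR subject and check that its self-signature verifies.
openssl req -in "$csr" -noout -verify -subject
```

The CSR contains the public key and the requested subject but no private material, which is why submitting it over the network does not expose the key.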
|
||||
|
||||
### 2. CA-Agnostic Issuer Architecture
|
||||
|
||||
certctl works with any certificate authority, not just ACME providers. Nine issuer connectors ship today, all free:
|
||||
|
||||
- **ACME v2** (Let's Encrypt, ZeroSSL, Google Trust Services, Buypass) — HTTP-01, DNS-01, DNS-PERSIST-01 challenges, External Account Binding, ACME Renewal Information (RFC 9773), certificate profile selection
|
||||
- **HashiCorp Vault PKI** — `/v1/{mount}/sign/{role}` API, token auth
|
||||
- **DigiCert CertCentral** — async order model, OV/EV support
|
||||
- **Sectigo SCM** — async order model, DV/OV/EV support, 3-header auth
|
||||
- **Google Cloud CAS** — Certificate Authority Service, OAuth2 service account auth, CA pool selection
|
||||
- **step-ca** (Smallstep) — native /sign API with JWK provisioner auth
|
||||
- **Local CA** — self-signed or sub-CA mode (chain to ADCS or any enterprise root)
|
||||
- **OpenSSL / Custom CA** — delegate signing to any shell script
|
||||
- **EST enrollment** (RFC 7030) — device certs for WiFi/802.1X, MDM, IoT
|
||||
|
||||
Every connector implements the same interface. Running multiple CAs in parallel — Let's Encrypt for public certs, Vault for internal services, your enterprise CA for legacy systems — is configuration, not code.
|
||||
|
||||
### 3. Post-Deployment Verification
|
||||
|
||||
Every other tool in this space stops at "the deployment command succeeded." certctl goes further: after deploying a certificate, the agent connects back to the live TLS endpoint and compares the SHA-256 fingerprint of the served certificate against what was deployed.
|
||||
|
||||
A reload command can exit 0 while the certificate doesn't take effect — wrong virtual host, stale cache, config that validates but doesn't apply. certctl catches this automatically.
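
The same check can be approximated manually with OpenSSL, which is useful when debugging a deployment. The helpers below are hypothetical and only illustrate the comparison; the agent performs its own verification and does not call these functions.

```bash
# SHA-256 fingerprint of a PEM certificate file, as hex without colons.
file_fingerprint() {
  openssl x509 -in "$1" -noout -fingerprint -sha256 | cut -d= -f2 | tr -d ':'
}

# SHA-256 fingerprint of the leaf certificate served at host:port.
# -servername sends SNI so the right virtual host answers.
served_fingerprint() {
  local host="$1" port="$2"
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256 | cut -d= -f2 | tr -d ':'
}

# Succeed only when both fingerprints are non-empty and identical.
fingerprints_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}
```

A manual verification would then look like `fingerprints_match "$(file_fingerprint deployed.pem)" "$(served_fingerprint myapp.example.com 443)"`, where `deployed.pem` and the host are placeholders. A non-zero exit means the endpoint is still serving a different certificate, or no certificate could be fetched.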
|
||||
|
||||
## What Else Ships Free
|
||||
|
||||
The three differentiators above get the headlines, but the feature surface is wider than most paid platforms:
|
||||
|
||||
**13 deployment targets** — NGINX, Apache, HAProxy, Traefik, Caddy, Envoy, IIS (local PowerShell + remote WinRM), F5 BIG-IP (proxy agent + iControl REST), Postfix, Dovecot, SSH (agentless), Windows Certificate Store, and Java Keystore. All use a pluggable connector model. The control plane never initiates outbound connections — agents poll for work, meaning certctl works behind firewalls, across network zones, and in air-gapped environments.
|
||||
|
||||
**Network certificate discovery** — active TLS scanning of CIDR ranges finds certificates you didn't know existed. Agents also scan local filesystems for PEM/DER files. Everything feeds into a triage workflow where you claim, dismiss, or import discovered certs into management.
|
||||
|
||||
**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete. Mapped to SOC 2, PCI-DSS 4.0, and NIST SP 800-57 compliance frameworks with published evidence guides.
|
||||
|
||||
**Policy engine** — 5 rule types (allowed issuers, allowed domains, required metadata, allowed environments, renewal lead time) with violation tracking and severity levels.
|
||||
|
||||
**PKI compliance** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
|
||||
|
||||
**Prometheus metrics** — `/api/v1/metrics/prometheus` in standard exposition format. Works with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics.
|
||||
|
||||
**MCP server** — the entire REST API is exposed via MCP for AI-assisted certificate management via Claude, Cursor, or any MCP-compatible client. No other certificate platform offers this.
|
||||
|
||||
**Full REST API** — OpenAPI 3.1-documented operations covering the entire platform. CLI tool with 10 subcommands. Helm chart for Kubernetes deployment. Scheduled certificate digest emails. Certificate export in PEM and PKCS#12. S/MIME support with EKU-aware issuance.
|
||||
|
||||
**Extensively tested** — Go backend with race detection, static analysis (golangci-lint), and vulnerability scanning (govulncheck) on every commit. CI-enforced per-layer coverage thresholds. Frontend test suite. Every push is gated.
|
||||
|
||||
## How certctl Compares
|
||||
|
||||
### vs. ACME Clients
|
||||
|
||||
ACME clients solve one slice of the problem — issuance and renewal from ACME CAs. certctl replaces the ACME client, adds 6 more CA integrations, deploys the cert to the right server, verifies it's live, tracks it in an inventory, alerts on expiry, logs everything to an audit trail, and enforces policy. If you're currently running certbot behind a cron job and a prayer, certctl replaces all of it.
|
||||
|
||||
### vs. Agent-Based SaaS
|
||||
|
||||
The closest architectural competitors use the same agent model — local key generation, CSR submission, push-based deployment. Where certctl differs: it supports 9 issuer types (not just ACME), provides CRL/OCSP/revocation infrastructure (not just issuance), includes a policy engine and network discovery, and is source-available with no certificate limit. SaaS alternatives are typically proprietary, priced per certificate ($2+/cert/month), and cap their free tiers at 3-5 certificates. certctl is free for any number of certificates, forever.
|
||||
|
||||
### vs. Commercial PKI Platforms
|
||||
|
||||
On-prem or hosted commercial platforms offer broader cert type coverage (VPN certs, device auth, SCEP) and deeper CA integrations. The trade-off: no free tier, opaque pricing (often €13K+/year for 1,500 certs), proprietary codebases, and no public API documentation. certctl trades breadth of exotic cert types for full transparency — source-available code, fully documented OpenAPI spec, and a free community edition with no artificial limits.
|
||||
|
||||
### vs. Enterprise Platforms
|
||||
|
||||
Venafi and Keyfactor offer decades of features at $75K-$250K+/year. certctl targets organizations that need 80% of those capabilities at a fraction of the cost. What certctl doesn't have yet: SSO/RBAC (coming in certctl Pro), vendor SLA-backed support. What certctl does have that enterprise platforms don't: an MCP server for AI-assisted management, ACME ARI (RFC 9773) for CA-directed renewal timing, and a deployment model that works in 5 minutes instead of 5 months.
|
||||
|
||||
## Who Should Look Elsewhere
|
||||
|
||||
certctl isn't the right tool for everyone:
|
||||
|
||||
- **Single-domain sites** — if you have one certificate on one server, certbot is fine. certctl is designed for managing tens to hundreds of certificates across multiple servers and CAs.
|
||||
- **Pure Kubernetes environments** — if every workload runs in-cluster and you're happy with cert-manager, there's no reason to add another tool. certctl shines when your infrastructure extends beyond Kubernetes.
|
||||
- **Organizations that need a vendor SLA today** — certctl is source-available software maintained by a small team. If you need contractual uptime guarantees and a support hotline, an enterprise platform is the right choice (for now).
|
||||
|
||||
## See It Running
|
||||
|
||||
The demo seeds certificates across multiple issuers, agents, and deployment targets with 180 days of realistic history — jobs, audit events, discovery scans, approval workflows — so you can explore every feature immediately.
|
||||
|
||||
```bash
git clone https://github.com/certctl-io/certctl.git
cd certctl/deploy && docker compose up -d
# Dashboard at https://localhost:8443 (self-signed cert; pin deploy/test/certs/ca.crt)
```
|
||||
|
||||
See the [Quickstart Guide](quickstart.md) for a full walkthrough, or explore the [5 turnkey examples](../examples/) for specific scenarios (ACME+NGINX, wildcard DNS-01, private CA+Traefik, step-ca+HAProxy, multi-issuer).
|
||||
|
||||
## License
|
||||
|
||||
certctl is source-available under the [Business Source License 1.1](../LICENSE). Free for any use except offering a competing managed service.
|
||||
|
||||
You own your data, your keys, and your deployment.
|
||||