mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:01:36 +00:00
docs: comprehensive V2 documentation update across all guides
Add missing V2 concepts (Certificate Profiles, Revocation with CRL/OCSP, Short-Lived Certificates, CLI, MCP Server, Observability) to concepts guide. Update quickstart with revocation examples, sorting/filtering, cursor pagination, sparse fields, stats/metrics, and approval workflows. Align 5-minute demo guide and advanced demo to full V2 feature set including revocation workflows, bulk ops, fleet overview, and dashboard charts. Update architecture with MCP server section, 5th scheduler loop, API audit log, and 860+ test count. Add revocation-across-issuers section to connectors guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
+43
-7
@@ -25,7 +25,7 @@ flowchart TB
|
||||
API["REST API\n(Go net/http, :8443)"]
|
||||
SVC["Service Layer"]
|
||||
REPO["Repository Layer\n(database/sql + lib/pq)"]
|
||||
SCHED["Background Scheduler\n4 loops"]
|
||||
SCHED["Background Scheduler\n5 loops"]
|
||||
DASH["Web Dashboard\n(React SPA)"]
|
||||
end
|
||||
|
||||
@@ -92,7 +92,7 @@ The agent runs two background loops: a heartbeat (every 60 seconds) to signal it
|
||||
|
||||
The web dashboard is the primary operational interface for certctl. It is built with Vite + React + TypeScript and uses TanStack Query for server state management (caching, background refetching, optimistic updates).
|
||||
|
||||
**Current views (17 pages)**: certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info), job queue (status, retry, cancel, approve/reject), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (list with test connection + delete), targets (list with 3-step configuration wizard + delete), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), summary dashboard, and login page.
|
||||
**Current views (19 pages)**: certificate inventory (list with multi-select bulk operations + "New Certificate" creation modal + detail with deployment status timeline, inline policy/profile editor, version history, deploy, revoke, archive, and trigger renewal actions), agent fleet (list + detail with system info + OS/architecture grouping with charts), job queue (status, retry, cancel, approve/reject), notification inbox (threshold alert grouping, mark-as-read), audit trail (time range, actor, action filters + CSV/JSON export), policy management (rules with enable/disable toggle + delete + violations), issuers (list with test connection + delete), targets (list with 3-step configuration wizard + delete), owners (list with team resolution + delete), teams (list with delete), agent groups (list with dynamic match criteria badges + enable/disable + delete), certificate profiles (list with crypto constraints), short-lived credentials dashboard (TTL countdown, profile filtering, auto-refresh), summary dashboard with charts (expiration heatmap, renewal success rate, status distribution, issuance rate), and login page.
|
||||
|
||||
The dashboard includes an **ErrorBoundary component** for graceful error recovery — if a view crashes, the boundary catches the error and displays a user-friendly message instead of breaking the entire dashboard. It also includes a **demo mode** that activates when the API is unreachable — it renders realistic mock data for screenshots and offline presentations.
|
||||
|
||||
@@ -342,7 +342,7 @@ The agent handles both the certificate (public) and the private key (read from l
|
||||
|
||||
### 4. Automatic Renewal
|
||||
|
||||
The control plane runs a scheduler with four background loops:
|
||||
The control plane runs a scheduler with five background loops:
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
@@ -351,12 +351,14 @@ flowchart LR
|
||||
J["Job Processor\n⏱ every 30s"]
|
||||
H["Agent Health\n⏱ every 2m"]
|
||||
N["Notification Processor\n⏱ every 1m"]
|
||||
SL["Short-Lived Expiry\n⏱ every 30s"]
|
||||
end
|
||||
|
||||
R -->|"Find expiring certs\nCreate renewal jobs"| DB[("PostgreSQL")]
|
||||
J -->|"Process pending jobs\nCoordinate issuance"| DB
|
||||
H -->|"Check heartbeat staleness\nMark agents offline"| DB
|
||||
N -->|"Send pending notifications\nEmail / Webhook"| DB
|
||||
N -->|"Send pending notifications\nEmail / Webhook / Slack"| DB
|
||||
SL -->|"Expire short-lived certs\nMark as Expired"| DB
|
||||
```
|
||||
|
||||
| Loop | Interval | Timeout | Purpose |
|
||||
@@ -365,6 +367,7 @@ flowchart LR
|
||||
| Job processor | 30 seconds | 2 minutes | Processes pending jobs (issuance, renewal, deployment) |
|
||||
| Agent health check | 2 minutes | 1 minute | Marks agents as offline if heartbeat is stale |
|
||||
| Notification processor | 1 minute | 1 minute | Sends pending notifications via configured channels |
|
||||
| Short-lived expiry | 30 seconds | 30 seconds | Marks expired short-lived certificates (profile TTL < 1 hour) |
|
||||
|
||||
Each operation has a context timeout to prevent indefinite hangs if external services become unresponsive.
|
||||
|
||||
@@ -478,7 +481,7 @@ type Connector interface {
|
||||
}
|
||||
```
|
||||
|
||||
Built-in notifiers: **Email** (SMTP) and **Webhook** (HTTP POST).
|
||||
Built-in notifiers: **Email** (SMTP), **Webhook** (HTTP POST), **Slack** (incoming webhook), **Microsoft Teams** (MessageCard), **PagerDuty** (Events API v2), and **OpsGenie** (Alert API v2). Each is enabled by setting its configuration environment variable.
|
||||
|
||||
See the [Connector Development Guide](connectors.md) for details on building custom connectors.
|
||||
|
||||
@@ -547,6 +550,12 @@ Every action is recorded as an immutable audit event:
|
||||
|
||||
Audit events cannot be modified or deleted. They support filtering by actor, action, resource type, resource ID, and time range. All audit operations are logged via structured `slog` logging; if an audit event fails to persist, the error is logged immediately to ensure no gaps in the audit trail go unnoticed.
|
||||
|
||||
### API Audit Log
|
||||
|
||||
In addition to application-level audit events, certctl records every HTTP API call via middleware. The audit middleware captures method, path, actor (extracted from auth context), SHA-256 request body hash (truncated to 16 characters), response status code, and request latency. Health and readiness probes are excluded to avoid noise.
|
||||
|
||||
Audit recording is async (via goroutine) so it never blocks the HTTP response. If audit persistence fails, the error is logged immediately — the API call still succeeds. The middleware sits after the auth middleware in the stack so the actor identity is available from context.
|
||||
|
||||
### Logging
|
||||
|
||||
All logging throughout the service layer uses Go's `log/slog` package for structured, queryable logs. This replaces ad-hoc `fmt.Printf` statements with consistent key-value logging that includes request context, operation names, and error details. Agents also implement exponential backoff on network failures to gracefully handle temporary connectivity issues with the control plane.
|
||||
@@ -570,6 +579,31 @@ Certificate revocation: `POST /api/v1/certificates/{id}/revoke` with optional `{
|
||||
|
||||
Health checks live outside the API prefix: `GET /health` and `GET /ready`.
|
||||
|
||||
## MCP Server
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server as a separate binary (`cmd/mcp-server/`) that enables AI assistants to interact with the certificate platform. The MCP server uses the official MCP Go SDK (`modelcontextprotocol/go-sdk`) with stdio transport for integration with Claude, Cursor, and other MCP-compatible tools.
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
AI["AI Assistant\n(Claude, Cursor)"] -->|"stdio"| MCP["MCP Server\ncmd/mcp-server/"]
|
||||
MCP -->|"HTTP + Bearer token"| API["certctl REST API\n:8443"]
|
||||
|
||||
subgraph "76 MCP Tools"
|
||||
T1["Certificate CRUD"]
|
||||
T2["Agent Management"]
|
||||
T3["Job Operations"]
|
||||
T4["Policy/Profile Queries"]
|
||||
T5["Audit Trail Access"]
|
||||
T6["Stats & Metrics"]
|
||||
end
|
||||
|
||||
MCP --> T1 & T2 & T3 & T4 & T5 & T6
|
||||
```
|
||||
|
||||
The MCP server is a stateless HTTP proxy — every MCP tool call translates to an HTTP request to the certctl REST API. It adds no new state, no new dependencies, and no new attack surface beyond what the API already exposes. Configuration is minimal: `CERTCTL_SERVER_URL` and `CERTCTL_API_KEY` environment variables.
|
||||
|
||||
The 76 tools are organized across 16 resource domains with typed input structs and `jsonschema` struct tags for automatic LLM-friendly schema generation. Binary response support handles DER CRL and OCSP endpoints.
|
||||
|
||||
## Deployment Topologies
|
||||
|
||||
### Docker Compose (Development / Small Deployments)
|
||||
@@ -620,7 +654,7 @@ For production, you would also add an ingress controller, TLS termination for th
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 630+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database.
|
||||
certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 860+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database.
|
||||
|
||||
**Service layer unit tests** (`internal/service/*_test.go`) — ~238 test functions across 15 files with mock repositories. These test all business logic in isolation: certificate CRUD with validation, certificate revocation (success, already-revoked, archived, invalid reason, all RFC 5280 reason codes, issuer notification, notification service integration, OCSP/CRL generation), agent lifecycle (registration, heartbeat, CSR submission with both keygen modes), job state machine (creation, processing, cancellation, retry logic), policy evaluation (all 5 rule types, violation creation), renewal and issuance flow (server-side and agent-side keygen paths), notification deduplication (threshold tag matching, channel routing), team/owner/agent group CRUD with pagination and audit recording, issuer service CRUD with connection testing, and the issuer connector adapter (type translation between connector and service layers including revocation). Mock repositories are simple structs with function fields, avoiding heavy mocking frameworks — this keeps tests readable and avoids coupling to mock library APIs.
|
||||
|
||||
@@ -628,7 +662,9 @@ certctl uses a layered testing approach aligned with the handler → service →
|
||||
|
||||
**Integration tests** (`internal/integration/`) — Two test files exercising the full stack from HTTP request through router, handler, service, and postgres repository layers. `lifecycle_test.go` has 11 subtests covering the complete certificate lifecycle: team/owner creation, certificate creation, issuer verification, renewal trigger, job verification, agent registration, CSR submission, deployment, and status reporting. `negative_test.go` has 14 subtests covering error paths, 19 M11b endpoint tests, and 8 revocation endpoint tests (M15a+M15b): nonexistent resource lookups (404s), invalid request bodies (malformed JSON, missing required fields), invalid CSR submission, heartbeat for nonexistent agents, wrong HTTP methods on list endpoints, empty list responses, renewal on nonexistent certificates, expired certificate lifecycle, team/owner/agent group CRUD validation, revocation success, already-revoked rejection, not-found revocation, JSON CRL retrieval, DER CRL retrieval, OCSP response retrieval, and short-lived cert exemption. Both use a shared `setupTestServer()` that builds a fully-wired server with real postgres repositories and the Local CA issuer connector.
|
||||
|
||||
**Frontend tests** (`web/src/api/client.test.ts`, `web/src/api/utils.test.ts`) — 53 Vitest tests covering the API client and utility functions. The API client tests mock `globalThis.fetch` and verify all endpoint functions (certificates, agents, jobs, policies, issuers, targets, notifications, audit, health) send correct HTTP methods, URLs, headers, and request bodies. They also test API key management (store/retrieve/clear), auth header propagation, 401 event dispatching, and error handling (server messages, error fields, status text fallback). The utility tests use `vi.useFakeTimers()` for deterministic date testing and cover `formatDate`, `formatDateTime`, `timeAgo`, `daysUntil`, and `expiryColor`. The test environment uses jsdom with `@testing-library/jest-dom` matchers.
|
||||
**Frontend tests** (`web/src/api/client.test.ts`, `web/src/api/utils.test.ts`) — 86 Vitest tests covering the API client, stats/metrics endpoints, and utility functions. The API client tests mock `globalThis.fetch` and verify all endpoint functions (certificates, agents, jobs, policies, issuers, targets, notifications, audit, stats, metrics, health) send correct HTTP methods, URLs, headers, and request bodies. They also test API key management (store/retrieve/clear), auth header propagation, 401 event dispatching, and error handling (server messages, error fields, status text fallback). The stats/metrics endpoint tests verify correct query parameter handling and response shape validation. The utility tests use `vi.useFakeTimers()` for deterministic date testing and cover `formatDate`, `formatDateTime`, `timeAgo`, `daysUntil`, and `expiryColor`. The test environment uses jsdom with `@testing-library/jest-dom` matchers.
|
||||
|
||||
**CLI tests** (`internal/cli/client_test.go`) — 14 tests covering all 10 CLI subcommands with httptest mock servers, PEM parsing for bulk import, auth header verification, and JSON/table output formatting.
|
||||
|
||||
**CI pipeline** (`.github/workflows/ci.yml`) — Two parallel jobs: Go (build, vet, test with coverage, coverage threshold enforcement) and Frontend (TypeScript type check, Vitest test suite, Vite production build). The Go job runs all tests with `-coverprofile`, then enforces coverage thresholds: service layer must be at least 30% (current: ~35%) and handler layer must be at least 50% (current: ~63%). These thresholds act as regression floors — they can only go up. The service layer threshold is deliberately lower because much of the service code depends on postgres repositories and external connectors that require real infrastructure to test meaningfully. Connector tests are included via `./internal/connector/issuer/...` and `./internal/connector/target/...` (covers Local CA, ACME, step-ca, NGINX, Apache, and HAProxy packages with unit tests for certificate signing logic, DNS solver, issuer validation, and deployment flows). The Frontend job runs `npx vitest run` between the TypeScript check and production build steps.
|
||||
|
||||
|
||||
+50
-2
@@ -128,10 +128,40 @@ Every certificate belongs to a **team** and has an **owner**. This answers the q
|
||||
|
||||
Agent groups let you organize agents by criteria — OS, architecture, IP subnet, or version — for dynamic policy scoping. For example, you can create a group matching all Linux agents and scope a renewal policy to that group. Groups can use dynamic matching criteria (agents automatically join when they match) or manual membership (explicitly include/exclude specific agents). Agent groups are managed via the GUI and API.
|
||||
|
||||
### Certificate Profiles
|
||||
|
||||
Certificate profiles define the cryptographic and lifecycle constraints for a class of certificates. A profile specifies which key types are allowed (e.g., RSA-2048, ECDSA P-256), the maximum validity period, and other enrollment rules. When a certificate is assigned to a profile, certctl enforces these constraints during issuance — if an agent submits a CSR with a disallowed key type, issuance is rejected.
|
||||
|
||||
Profiles answer the question "what kind of certificate is this?" while policies answer "is this certificate compliant?" A production TLS profile might allow only ECDSA P-256 with a 90-day max TTL, while a development profile might allow RSA-2048 with a 365-day TTL. Short-lived profiles (TTL under 1 hour) enable machine-to-machine authentication patterns where certificates are issued frequently and expire quickly — these are exempt from CRL/OCSP since expiry itself is sufficient revocation.
|
||||
|
||||
Profiles are managed via the API (`/api/v1/profiles`) and the GUI, and can be assigned to certificates during creation or updated later.
|
||||
|
||||
### Interactive Renewal Approval
|
||||
|
||||
For policies with `auto_renew` disabled, renewal jobs enter an **AwaitingApproval** state instead of processing immediately. An operator must explicitly approve or reject the renewal via the API or GUI. Approved jobs transition to Pending and are picked up by the scheduler. Rejected jobs are cancelled with an optional reason. This is useful for high-value certificates where you want human oversight before renewal.
|
||||
|
||||
### Certificate Revocation
|
||||
|
||||
When a private key is compromised, a certificate is superseded, or a service is decommissioned, you need to revoke the certificate immediately — not wait for it to expire. Revocation tells clients "stop trusting this certificate right now."
|
||||
|
||||
certctl implements revocation using three complementary mechanisms:
|
||||
|
||||
**Revocation API**: `POST /api/v1/certificates/{id}/revoke` marks a certificate as revoked in the inventory, records the revocation in a dedicated `certificate_revocations` table, notifies the issuing CA (best-effort — the revocation succeeds even if the CA is unreachable), creates an audit trail entry, and sends notifications. You can specify an RFC 5280 reason code (keyCompromise, superseded, cessationOfOperation, etc.) or let it default to "unspecified."
|
||||
|
||||
**Certificate Revocation List (CRL)**: certctl serves both a JSON-formatted CRL at `GET /api/v1/crl` and DER-encoded X.509 CRLs per issuer at `GET /api/v1/crl/{issuer_id}`. The DER CRL is signed by the issuing CA's key and has 24-hour validity — clients can download it periodically to check revocation status offline.
|
||||
|
||||
**OCSP Responder**: For real-time revocation checking, certctl includes an embedded OCSP responder at `GET /api/v1/ocsp/{issuer_id}/{serial}`. It returns signed OCSP responses (good, revoked, or unknown) so clients can verify certificate status without downloading the full CRL.
|
||||
|
||||
Short-lived certificates (those assigned to profiles with TTL under 1 hour) are exempt from CRL and OCSP — their rapid expiry is considered sufficient revocation. This is a deliberate design choice to reduce infrastructure overhead for ephemeral machine-to-machine credentials.
|
||||
|
||||
### Short-Lived Certificates
|
||||
|
||||
Short-lived certificates are certificates with a TTL under 1 hour, typically used for service-to-service authentication in microservice architectures. Instead of revoking these certificates when something goes wrong, you simply stop issuing new ones — the existing certificates expire within minutes.
|
||||
|
||||
certctl provides a dedicated dashboard view for short-lived credentials that shows active certificates with live TTL countdowns, auto-refreshes every 10 seconds, and filters by profile. This gives ops teams real-time visibility into ephemeral credential activity without cluttering the main certificate inventory.
|
||||
|
||||
Short-lived certificates are defined by their profile — assign a certificate to a profile with `max_validity_days` that translates to under 1 hour, and certctl automatically treats it as short-lived: no CRL/OCSP entries, no revocation overhead, just rapid issuance and natural expiry.
|
||||
|
||||
### Policies
|
||||
|
||||
Policies are guardrails. You can enforce rules like "production certificates must use specific issuers," "all certificates must have an owner," or "certificate lifetime cannot exceed 90 days." When a certificate violates a policy, certctl flags it with a policy violation so you can take action.
|
||||
@@ -146,10 +176,28 @@ Every action is logged: who did it, what changed, when, and why. This is essenti
|
||||
|
||||
### Notifications
|
||||
|
||||
certctl can alert you when certificates are expiring, when renewals fail, when deployments succeed, or when policy violations are detected. Notifications go out via email or webhooks, with Slack support planned.
|
||||
certctl can alert you when certificates are expiring, when renewals fail, when deployments succeed, or when policy violations are detected. Notifications are delivered via six channels: Email, Webhook, Slack, Microsoft Teams, PagerDuty, and OpsGenie. Each notifier is configured independently via environment variables and can be enabled or disabled as needed.
|
||||
|
||||
### CLI
|
||||
|
||||
certctl ships with a command-line tool (`certctl-cli`) for operators who prefer terminal workflows or need to integrate certctl into shell scripts and CI/CD pipelines. The CLI wraps the REST API with 10 subcommands: `list-certs`, `get-cert`, `renew-cert`, `revoke-cert`, `list-agents`, `list-jobs`, `health`, `metrics`, and `import` (for bulk PEM import).
|
||||
|
||||
The CLI supports both table and JSON output formats (`--format table` or `--format json`), connects to the server via `CERTCTL_SERVER_URL` and authenticates with `CERTCTL_API_KEY`. It's built with Go's standard library only — no external dependencies.
|
||||
|
||||
### MCP Server (AI Integration)
|
||||
|
||||
certctl includes an MCP (Model Context Protocol) server that exposes all 76 API endpoints as MCP tools. This enables AI assistants like Claude, Cursor, and other MCP-compatible tools to interact with your certificate infrastructure using natural language — "show me all expiring certificates," "revoke the VPN cert," or "what agents are offline?"
|
||||
|
||||
The MCP server is a separate binary (`cmd/mcp-server/`) that communicates via stdio transport and acts as a stateless HTTP proxy to the certctl REST API. It requires no additional infrastructure — just point it at your certctl server URL and API key.
|
||||
|
||||
### Observability
|
||||
|
||||
certctl exposes a JSON metrics endpoint at `GET /api/v1/metrics` with gauges (certificate totals by status, agent counts, pending jobs), counters (completed/failed jobs), and uptime. Five stats endpoints power the dashboard charts: summary statistics, certificates by status, expiration timeline, job trends, and issuance rate.
|
||||
|
||||
The agent fleet overview page groups agents by OS, architecture, and version, showing distribution charts that help ops teams track fleet health and identify outdated agents. All API requests are logged via structured `slog` middleware with request IDs for correlation.
|
||||
|
||||
## What's Next
|
||||
|
||||
Now that you understand the concepts, head to the [Quick Start Guide](quickstart.md) to get certctl running locally in under 5 minutes. You'll see a pre-loaded dashboard with demo certificates, explore the API, and understand how everything fits together.
|
||||
|
||||
For a deeper look at the system design, see the [Architecture Guide](architecture.md).
|
||||
For a deeper look at the system design, see the [Architecture Guide](architecture.md). For terminal-based workflows, check out the CLI Guide (docs coming soon). For AI-native integration, see the MCP Server Guide (docs coming soon).
|
||||
|
||||
@@ -187,6 +187,17 @@ Script-based issuer connector for organizations with existing CA tooling. Delega
|
||||
|
||||
The sign script receives the CSR PEM on stdin and should output the signed certificate PEM on stdout. The connector parses the certificate to extract serial number, validity dates, and chain information.
|
||||
|
||||
### Revocation Across Issuers
|
||||
|
||||
All issuer connectors implement `RevokeCertificate(ctx, serial, reason)`. When a certificate is revoked via `POST /api/v1/certificates/{id}/revoke`, certctl notifies the issuing CA on a best-effort basis — the revocation succeeds in certctl's inventory even if the CA notification fails (e.g., CA is temporarily unreachable). This ensures revocation is never blocked by external dependencies.
|
||||
|
||||
Each issuer handles revocation differently:
|
||||
|
||||
- **Local CA**: Updates the in-memory revocation list. DER-encoded CRLs and OCSP responses are generated from this list.
|
||||
- **ACME**: ACME v2 has limited revocation support — certctl records the revocation locally and serves it via CRL/OCSP.
|
||||
- **step-ca**: Calls step-ca's `/revoke` API endpoint. Clients should check step-ca's own CRL/OCSP for authoritative status.
|
||||
- **OpenSSL/Custom CA**: Invokes the configured revoke script (`CERTCTL_OPENSSL_REVOKE_SCRIPT`) with the serial number as an argument.
|
||||
|
||||
### Planned Issuers
|
||||
|
||||
The following issuer connectors are planned for future milestones:
|
||||
|
||||
+101
-8
@@ -33,7 +33,9 @@ flowchart LR
|
||||
B --> C[Create\nCertificate]
|
||||
C --> D[Trigger\nRenewal]
|
||||
D --> E[Trigger\nDeployment]
|
||||
E --> F[Inspect Audit\n& Notifications]
|
||||
E --> F[Revoke a\nCertificate]
|
||||
F --> G[Check Stats\n& Metrics]
|
||||
G --> H[Inspect Audit\n& Notifications]
|
||||
```
|
||||
|
||||
Each step corresponds to a real operation that certctl would perform in production. The difference here is that we're driving each step manually via curl instead of letting the scheduler and agents handle it automatically.
|
||||
@@ -130,7 +132,7 @@ flowchart TD
|
||||
A --> E["Local CA\n(self-signed or sub-CA)"]
|
||||
A --> F["ACME\n(Let's Encrypt)"]
|
||||
A --> G["step-ca\n(implemented)"]
|
||||
A --> H["OpenSSL / Custom CA\n(planned)"]
|
||||
A --> H["OpenSSL / Custom CA\n(script-based)"]
|
||||
A --> J["DigiCert API\n(planned)"]
|
||||
A --> K["Vault PKI\n(planned)"]
|
||||
A --> L["Entrust / GlobalSign\n(planned)"]
|
||||
@@ -447,6 +449,50 @@ curl -s -X POST $API/api/v1/certificates \
|
||||
|
||||
---
|
||||
|
||||
## Part 8.5: Revoke a Certificate
|
||||
|
||||
Let's revoke the payments gateway certificate — simulating a key compromise scenario:
|
||||
|
||||
```bash
|
||||
curl -s -X POST $API/api/v1/certificates/mc-demo-payments/revoke \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "keyCompromise"}' | jq .
|
||||
```
|
||||
|
||||
**How it works:** The `RevokeCertificateWithActor` service method executes a 7-step process:
|
||||
|
||||
1. Validates the certificate is eligible (not already revoked, not archived)
|
||||
2. Retrieves the latest certificate version to get the serial number
|
||||
3. Updates the certificate status to "Revoked" with a timestamp and reason
|
||||
4. Records the revocation in the `certificate_revocations` table (idempotent via ON CONFLICT)
|
||||
5. Notifies the issuing CA (best-effort — revocation succeeds even if the CA is unreachable)
|
||||
6. Creates an audit trail entry
|
||||
7. Sends revocation notifications via configured channels
|
||||
|
||||
Check the CRL (Certificate Revocation List):
|
||||
|
||||
```bash
|
||||
# JSON-formatted CRL
|
||||
curl -s $API/api/v1/crl | jq .
|
||||
|
||||
# DER-encoded X.509 CRL for the local CA (binary — pipe to openssl for inspection)
|
||||
curl -s $API/api/v1/crl/iss-local -o /tmp/crl.der
|
||||
openssl crl -inform DER -in /tmp/crl.der -text -noout
|
||||
```
|
||||
|
||||
Check OCSP status:
|
||||
|
||||
```bash
|
||||
# Replace SERIAL with the actual serial number from the certificate version
|
||||
curl -s $API/api/v1/ocsp/iss-local/SERIAL | jq .
|
||||
```
|
||||
|
||||
**Why RFC 5280 reason codes:** The reason code isn't just metadata — it tells clients *why* the certificate was revoked. A `keyCompromise` revocation means the private key was exposed and the certificate should be distrusted immediately. A `superseded` revocation means a newer certificate replaced it — less urgent. CRLs and OCSP responses include the reason code so client software can make informed trust decisions.
|
||||
|
||||
**Check the dashboard.** Click the payments certificate — you'll see a revocation banner with the reason code and timestamp.
|
||||
|
||||
---
|
||||
|
||||
## Part 9: Policy Violations
|
||||
|
||||
Let's see what happens when a certificate doesn't meet policy requirements. Check existing policy rules:
|
||||
@@ -478,6 +524,36 @@ curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq .
|
||||
|
||||
---
|
||||
|
||||
## Part 9.5: Dashboard Stats and Metrics
|
||||
|
||||
certctl exposes operational metrics so you can monitor the health of your certificate infrastructure:
|
||||
|
||||
```bash
|
||||
# Dashboard summary — total certs, expiring, expired, active
|
||||
curl -s $API/api/v1/stats/summary | jq .
|
||||
|
||||
# Certificates grouped by status
|
||||
curl -s $API/api/v1/stats/certificates-by-status | jq .
|
||||
|
||||
# Expiration timeline — how many certs expire in the next 90 days
|
||||
curl -s "$API/api/v1/stats/expiration-timeline?days=90" | jq .
|
||||
|
||||
# Job trends — completed vs failed jobs over 30 days
|
||||
curl -s "$API/api/v1/stats/job-trends?days=30" | jq .
|
||||
|
||||
# Issuance rate — new certificates per day over 30 days
|
||||
curl -s "$API/api/v1/stats/issuance-rate?days=30" | jq .
|
||||
|
||||
# System metrics — gauges, counters, uptime
|
||||
curl -s $API/api/v1/metrics | jq .
|
||||
```
|
||||
|
||||
**How it works:** The `StatsService` computes aggregations in Go from existing repository List methods — no additional SQL queries or materialized views. This keeps the database schema simple while providing real-time dashboard data. The metrics endpoint returns gauges (cert totals by status, agent counts, pending jobs), counters (completed/failed jobs), and server uptime.
|
||||
|
||||
**In the dashboard**, these stats power four interactive charts: an expiration heatmap, renewal success rate trends, certificate status distribution, and issuance rate. The agent fleet overview page uses agent metadata to group by OS, architecture, and version.
|
||||
|
||||
---
|
||||
|
||||
## End-to-End Architecture Summary
|
||||
|
||||
Here's what we just walked through, mapped to the system architecture:
|
||||
@@ -615,7 +691,20 @@ echo -e "${YELLOW}Step 9: Recent audit events...${NC}"
|
||||
curl -s $API/api/v1/audit | jq '.data[0:3] | .[] | {action, resource_type, resource_id, timestamp}'
|
||||
echo ""
|
||||
|
||||
# Step 10: Summary
|
||||
# Step 10: Revoke the certificate
|
||||
echo -e "${YELLOW}Step 10: Revoking certificate...${NC}"
|
||||
curl -s -X POST $API/api/v1/certificates/$CERT_ID/revoke \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "superseded"}' | jq .
|
||||
echo -e "${GREEN}Certificate revoked${NC}"
|
||||
echo ""
|
||||
|
||||
# Step 11: Check stats
|
||||
echo -e "${YELLOW}Step 11: Dashboard summary...${NC}"
|
||||
curl -s $API/api/v1/stats/summary | jq .
|
||||
echo ""
|
||||
|
||||
# Step 12: Summary
|
||||
echo -e "${BLUE}=== Demo Complete ===${NC}"
|
||||
echo ""
|
||||
echo "What happened:"
|
||||
@@ -623,7 +712,9 @@ echo " 1. Created a team and owner for accountability"
|
||||
echo " 2. Created a managed certificate tracked by certctl"
|
||||
echo " 3. Triggered renewal (would contact the Local CA in production flow)"
|
||||
echo " 4. Triggered deployment (would push to NGINX/F5/IIS targets)"
|
||||
echo " 5. All actions recorded in the audit trail"
|
||||
echo " 5. Revoked the certificate with RFC 5280 reason codes"
|
||||
echo " 6. Checked dashboard stats and metrics"
|
||||
echo " 7. All actions recorded in the audit trail"
|
||||
echo ""
|
||||
echo -e "Open ${GREEN}http://localhost:8443${NC} to see everything in the dashboard."
|
||||
echo "Look for certificate: $CERT_ID"
|
||||
@@ -645,10 +736,12 @@ If you're using this demo to present certctl to decision-makers, here's the narr
|
||||
1. **Start with the dashboard** — "This is your certificate inventory. Every TLS certificate across your infrastructure, in one place."
|
||||
2. **Point to expiring certs** — "These certificates would have caused outages. Certctl catches them automatically."
|
||||
3. **Show the cert you just created** — "I just created this via the API. It's already tracked, assigned to a team, and will be renewed automatically."
|
||||
4. **Show the audit trail** — "Complete traceability. Every action, every change, every deployment — timestamped and attributed."
|
||||
5. **Show policies** — "Guardrails. We enforce that every certificate has an owner, uses approved CAs, and stays within allowed environments."
|
||||
6. **Show agents** — "Private keys never touch the control plane. Agents handle cryptographic operations locally on your infrastructure."
|
||||
7. **Show the API** — "Everything is API-first. The dashboard is just one consumer. You can integrate with CI/CD, Terraform, or custom tooling."
|
||||
4. **Show revocation** — "If a key is compromised, one-click revocation with RFC 5280 reason codes. CRL and OCSP endpoints are served automatically."
|
||||
5. **Show the audit trail** — "Complete traceability. Every action, every change, every deployment — timestamped and attributed."
|
||||
6. **Show policies** — "Guardrails. We enforce that every certificate has an owner, uses approved CAs, and stays within allowed environments."
|
||||
7. **Show agents** — "Private keys never touch the control plane. Agents handle cryptographic operations locally on your infrastructure."
|
||||
8. **Show dashboard stats** — "Real-time metrics: expiration trends, job success rates, certificate distribution. Everything you need to operate with confidence."
|
||||
9. **Show the CLI and MCP server** — "Terminal users get a CLI tool. AI assistants get MCP integration. Everything is API-first."
|
||||
|
||||
## Teardown
|
||||
|
||||
|
||||
+41
-9
@@ -1,6 +1,6 @@
|
||||
# certctl Demo Guide
|
||||
|
||||
A 5-7 minute guided walkthrough of certctl's dashboard and API. Perfect for stakeholder presentations and team demos.
|
||||
A 5-10 minute guided walkthrough of certctl's dashboard and API. Perfect for stakeholder presentations and team demos.
|
||||
|
||||
New to certificates? Read the [Concepts Guide](concepts.md) first. Want a hands-on demo where you issue certificates yourself? See the [Advanced Demo](demo-advanced.md).
|
||||
|
||||
@@ -28,7 +28,7 @@ The main dashboard shows at a glance:
|
||||
- **Active** — healthy certificates with time remaining
|
||||
- **Renewal success rate** — percentage of automated renewals that succeeded
|
||||
|
||||
Below the stats, you'll see an **expiry timeline** showing how many certs expire in each time bucket (7/14/30/60/90 days), and a **recent activity feed** with the latest audit events.
|
||||
Below the stats, interactive charts provide deeper visibility: an **expiration heatmap** (90-day weekly buckets), **renewal success rate trends** (30-day line chart), **certificate status distribution** (donut chart), and **issuance rate** (30-day bar chart).
|
||||
|
||||
### Certificates View
|
||||
Click "Certificates" in the sidebar to see the full inventory:
|
||||
@@ -57,6 +57,18 @@ Click "Agents" in the sidebar. Four agents are online, one (`iis-prod-agent`) we
|
||||
**6. "What policies are enforced?"**
|
||||
Click "Policies" to see the active rules: required owner metadata, allowed environments, max certificate lifetime, minimum renewal window. Check the violations list to see which certs are non-compliant.
|
||||
|
||||
**7. "Can I revoke a compromised cert?"**
|
||||
Click any active certificate, then click the "Revoke" button. A modal appears with RFC 5280 reason codes (Key Compromise, Superseded, Cessation of Operation, etc.). After revocation, the cert shows a revocation banner with the reason and timestamp.
|
||||
|
||||
**8. "Show me short-lived credentials"**
|
||||
Click "Short-Lived" in the sidebar. This view shows certificates with TTL under 1 hour — live countdown timers, auto-refresh every 10 seconds, and profile-based filtering. These are for service-to-service auth where rapid expiry replaces revocation.
|
||||
|
||||
**9. "What about bulk operations?"**
|
||||
On the Certificates page, select multiple certificates using the checkboxes. A bulk action bar appears with options to trigger renewal, revoke (with reason codes), or reassign ownership — all with progress tracking.
|
||||
|
||||
**10. "How do I see the deployment history?"**
|
||||
Click any certificate, then scroll to the deployment timeline. A visual 4-step timeline shows the lifecycle: Requested → Issued → Deploying → Active. Previous versions show a rollback button.
|
||||
|
||||
## API Walkthrough
|
||||
|
||||
The dashboard is backed by a real REST API. Try these while the demo is running:
|
||||
@@ -82,6 +94,23 @@ curl -s http://localhost:8443/api/v1/policies/pr-require-owner/violations | jq .
|
||||
|
||||
# Check system health
|
||||
curl -s http://localhost:8443/health | jq .
|
||||
|
||||
# Dashboard stats
|
||||
curl -s http://localhost:8443/api/v1/stats/summary | jq .
|
||||
|
||||
# System metrics (cert totals, agent counts, job stats)
|
||||
curl -s http://localhost:8443/api/v1/metrics | jq .
|
||||
|
||||
# Certificate profiles
|
||||
curl -s http://localhost:8443/api/v1/profiles | jq .
|
||||
|
||||
# Agent groups
|
||||
curl -s http://localhost:8443/api/v1/agent-groups | jq .
|
||||
|
||||
# Revoke a certificate
|
||||
curl -s -X POST http://localhost:8443/api/v1/certificates/mc-api-prod/revoke \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "superseded"}' | jq .
|
||||
```
|
||||
|
||||
## Demo Without Docker
|
||||
@@ -109,15 +138,18 @@ The `-v` flag removes the PostgreSQL data volume so you get a clean slate next t
|
||||
|
||||
If you're demoing to a team or customer, here's a suggested flow:
|
||||
|
||||
1. **Start with the dashboard** — "This is your certificate inventory at a glance"
|
||||
1. **Start with the dashboard** — "This is your certificate inventory at a glance, with real-time charts showing expiration trends and renewal health"
|
||||
2. **Show the expiring certs** — "These three would have caused outages without this platform"
|
||||
3. **Click into auth-production** — "Here's the full lifecycle: who owns it, where it's deployed, when it was last renewed"
|
||||
4. **Show the failed VPN cert** — "The system tried 3 times, then alerted the team via webhook"
|
||||
5. **Show agents** — "Agents run on your infrastructure, handle key generation locally, and report back"
|
||||
6. **Show policies** — "Guardrails prevent teams from going outside approved scope"
|
||||
7. **Show the API** — "Everything you see here is API-first, so you can automate on top of it"
|
||||
3. **Click into auth-production** — "Here's the full lifecycle: who owns it, where it's deployed, deployment timeline, when it was last renewed"
|
||||
4. **Show revocation** — "If a key is compromised, one click revokes the cert with an RFC 5280 reason code. CRL and OCSP are served automatically"
|
||||
5. **Show the failed VPN cert** — "The system tried 3 times, then alerted the team via Slack, Teams, PagerDuty, or OpsGenie"
|
||||
6. **Show agents and fleet overview** — "Agents run on your infrastructure, handle key generation locally. Fleet view shows OS, architecture, and version distribution"
|
||||
7. **Show profiles** — "Certificate profiles enforce crypto constraints — key types, max TTL, compliance requirements"
|
||||
8. **Show policies** — "Guardrails prevent teams from going outside approved scope"
|
||||
9. **Show bulk operations** — "Select multiple certs, trigger renewal or revoke in bulk with progress tracking"
|
||||
10. **Show the API** — "Everything you see here is API-first. We also have a CLI tool and an MCP server for AI assistant integration"
|
||||
|
||||
The whole walkthrough takes 5-7 minutes.
|
||||
The whole walkthrough takes 5-10 minutes.
|
||||
|
||||
## Next Steps
|
||||
|
||||
|
||||
@@ -206,6 +206,24 @@ curl -s http://localhost:8443/api/v1/audit | jq '.data[0:3]'
|
||||
|
||||
Refresh the dashboard at http://localhost:8443 — your new certificate appears in the inventory.
|
||||
|
||||
### Step 5: Revoke a certificate
|
||||
|
||||
If a certificate's private key is compromised or the service is decommissioned, revoke it:
|
||||
|
||||
```bash
|
||||
curl -s -X POST http://localhost:8443/api/v1/certificates/$CERT_ID/revoke \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "superseded"}' | jq .
|
||||
```
|
||||
|
||||
Supported RFC 5280 reason codes: `unspecified`, `keyCompromise`, `caCompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn`. If you omit the reason, it defaults to `unspecified`.
|
||||
|
||||
Check the CRL to confirm:
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:8443/api/v1/crl | jq .
|
||||
```
|
||||
|
||||
## Understanding the Demo Data
|
||||
|
||||
The demo comes pre-loaded with realistic data so you can explore certctl's features immediately:
|
||||
@@ -219,9 +237,99 @@ The demo comes pre-loaded with realistic data so you can explore certctl's featu
|
||||
| Targets | 5 | NGINX (prod/staging/data), F5 LB, IIS |
|
||||
| Certificates | 15 | Various statuses: Active, Expiring, Expired, Failed, Wildcard |
|
||||
| Policies | 4 | Required owner, allowed environments, max lifetime, min renewal window |
|
||||
| Profiles | 3 | Default TLS, Short-Lived, High-Security |
|
||||
| Agent Groups | 5 | Linux agents, ARM agents, Production subnet, etc. |
|
||||
|
||||
Certificates have varied statuses so you can see what each state looks like in the dashboard: healthy certs with 45+ days remaining, certs about to expire (5-12 days), certs that already expired, and a failed renewal.
|
||||
|
||||
## Advanced API Features
|
||||
|
||||
### Sorting and filtering
|
||||
|
||||
```bash
|
||||
# Sort certificates by expiration date (ascending)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?sort=notAfter" | jq .
|
||||
|
||||
# Sort descending (prefix with -)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?sort=-createdAt" | jq .
|
||||
|
||||
# Time-range filters (RFC3339 format)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?expires_before=2026-05-01T00:00:00Z" | jq .
|
||||
curl -s "http://localhost:8443/api/v1/certificates?created_after=2026-03-01T00:00:00Z" | jq .
|
||||
```
|
||||
|
||||
Supported sort fields: `notAfter`, `expiresAt`, `createdAt`, `updatedAt`, `commonName`, `name`, `status`, `environment`.
|
||||
|
||||
### Sparse field selection
|
||||
|
||||
Request only the fields you need to reduce response size:
|
||||
|
||||
```bash
|
||||
curl -s "http://localhost:8443/api/v1/certificates?fields=id,common_name,status,expires_at" | jq .
|
||||
```
|
||||
|
||||
### Cursor-based pagination
|
||||
|
||||
For large datasets, cursor pagination is more efficient than page-based:
|
||||
|
||||
```bash
|
||||
# First page
|
||||
curl -s "http://localhost:8443/api/v1/certificates?page_size=5" | jq '{next_cursor: .next_cursor, count: (.data | length)}'
|
||||
|
||||
# Next page (use the next_cursor from the previous response)
|
||||
curl -s "http://localhost:8443/api/v1/certificates?cursor=<next_cursor_value>&page_size=5" | jq .
|
||||
```
|
||||
|
||||
### Stats and metrics
|
||||
|
||||
```bash
|
||||
# Dashboard summary
|
||||
curl -s http://localhost:8443/api/v1/stats/summary | jq .
|
||||
|
||||
# Certificates by status
|
||||
curl -s http://localhost:8443/api/v1/stats/certificates-by-status | jq .
|
||||
|
||||
# Expiration timeline (next 90 days)
|
||||
curl -s "http://localhost:8443/api/v1/stats/expiration-timeline?days=90" | jq .
|
||||
|
||||
# Job trends (last 30 days)
|
||||
curl -s "http://localhost:8443/api/v1/stats/job-trends?days=30" | jq .
|
||||
|
||||
# System metrics
|
||||
curl -s http://localhost:8443/api/v1/metrics | jq .
|
||||
```
|
||||
|
||||
### Certificate profiles
|
||||
|
||||
```bash
|
||||
# List all profiles
|
||||
curl -s http://localhost:8443/api/v1/profiles | jq .
|
||||
|
||||
# Get a specific profile
|
||||
curl -s http://localhost:8443/api/v1/profiles/prof-default | jq .
|
||||
```
|
||||
|
||||
### Certificate deployments
|
||||
|
||||
```bash
|
||||
# View deployment targets for a certificate
|
||||
curl -s http://localhost:8443/api/v1/certificates/mc-api-prod/deployments | jq .
|
||||
```
|
||||
|
||||
### Interactive approval workflow
|
||||
|
||||
```bash
|
||||
# Approve a pending job
|
||||
curl -s -X POST http://localhost:8443/api/v1/jobs/JOB_ID/approve \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Approved for production deployment"}' | jq .
|
||||
|
||||
# Reject a pending job
|
||||
curl -s -X POST http://localhost:8443/api/v1/jobs/JOB_ID/reject \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"reason": "Key type does not meet compliance requirements"}' | jq .
|
||||
```
|
||||
|
||||
## Tear Down
|
||||
|
||||
```bash
|
||||
@@ -236,3 +344,4 @@ The `-v` flag removes the PostgreSQL data volume so you get a clean slate next t
|
||||
- **[Demo Walkthrough](demo-guide.md)** — Guided 5-minute stakeholder presentation
|
||||
- **[Architecture](architecture.md)** — How the control plane, agents, and connectors work together
|
||||
- **[Connector Guide](connectors.md)** — Build custom connectors for your infrastructure
|
||||
- **[CLI Reference](cli.md)** — Manage certificates from your terminal
|
||||
|
||||
Reference in New Issue
Block a user