From 8e173849834fcff4b9579bdd330dd951740387f7 Mon Sep 17 00:00:00 2001
From: shankar0123
Date: Sat, 14 Mar 2026 21:53:34 -0400
Subject: [PATCH] Add technical explanations to advanced demo and convert all diagrams to Mermaid
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add how/why technical breakdowns to every step in demo-advanced.md:
  handler→service→repository code paths, SQL details, security reasoning,
  field-by-field explanations, and architectural design decisions
- Convert all ASCII box diagrams to Mermaid across docs:
  architecture.md (9 diagrams), demo-advanced.md (6), concepts.md (1)
- Diagram types: flowcharts, sequence diagrams, ER diagram, state machine
- Remove placeholder Support & Community section from README
- Zero ASCII box-drawing characters remaining in docs

Co-Authored-By: Claude Opus 4.6
---
 README.md             |  10 --
 docs/architecture.md  | 390 ++++++++++++++++++++++++++++++++----------
 docs/concepts.md      |  16 +-
 docs/demo-advanced.md | 295 ++++++++++++++++++++++++++++++--
 4 files changed, 590 insertions(+), 121 deletions(-)

diff --git a/README.md b/README.md
index 266fc70..5bcbd60 100644
--- a/README.md
+++ b/README.md
@@ -381,13 +381,3 @@ make docker-logs-agent

 Certctl is licensed under the [Apache License 2.0](LICENSE). See LICENSE file for details.

-## Support & Community
-
-- **Issues**: [GitHub Issues](https://github.com/shankar0123/certctl/issues)
-- **Discussions**: [GitHub Discussions](https://github.com/shankar0123/certctl/discussions)
-- **Documentation**: [Full Docs](docs/)
-- **Security**: security@example.com
-
----
-
-**Built with ❤️ for infrastructure teams managing certificates at scale.**
diff --git a/docs/architecture.md b/docs/architecture.md
index aa6caa7..d569f72 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -16,6 +16,53 @@ New to certificates? Read the [Concepts Guide](concepts.md) first.

 ## System Components

+```mermaid
+flowchart TB
+    subgraph "Control Plane"
+        API["REST API\n(Go net/http, :8443)"]
+        SVC["Service Layer"]
+        REPO["Repository Layer\n(database/sql + lib/pq)"]
+        SCHED["Background Scheduler\n4 loops"]
+        DASH["Web Dashboard\n(React SPA)"]
+    end
+
+    subgraph "Data Store"
+        PG[("PostgreSQL 16\n14 tables\nTEXT primary keys")]
+    end
+
+    subgraph "Agent Fleet"
+        A1["Agent: nginx-prod\n(heartbeat + work poll)"]
+        A2["Agent: f5-prod"]
+        A3["Agent: iis-prod"]
+    end
+
+    subgraph "Issuer Backends"
+        CA1["Local CA\n(crypto/x509)"]
+        CA2["ACME\n(Let's Encrypt)"]
+        CA3["Vault PKI\n(future)"]
+    end
+
+    subgraph "Target Systems"
+        T1["NGINX\n(SSH + reload)"]
+        T2["F5 BIG-IP\n(REST API)"]
+        T3["IIS\n(WinRM)"]
+    end
+
+    DASH --> API
+    API --> SVC
+    SVC --> REPO
+    REPO --> PG
+    SCHED --> SVC
+    SVC -->|"Issue/Renew"| CA1 & CA2 & CA3
+
+    A1 & A2 & A3 -->|"CSR + Heartbeat"| API
+    API -->|"Cert + Chain\n(NO private key)"| A1 & A2 & A3
+
+    A1 -->|"Deploy"| T1
+    A2 -->|"Deploy"| T2
+    A3 -->|"Deploy"| T3
+```
+
 ### Control Plane (Server)

 The control plane is a Go HTTP server backed by PostgreSQL. It manages state (certificates, agents, targets, issuers, policies), orchestrates issuance by coordinating with CAs through issuer connectors, tracks jobs for certificate issuance/renewal/deployment workflows, maintains an immutable audit trail, and dispatches work via a background scheduler.
@@ -40,36 +87,116 @@ The dashboard includes a **demo mode** that activates when the API is unreachabl

 All state is stored in PostgreSQL 16. The schema uses TEXT primary keys (not UUIDs) with human-readable prefixed IDs like `mc-api-prod`, `t-platform`, `o-alice`.

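The prefixed TEXT IDs make a row's resource type readable from the ID alone. The sketch below is a hypothetical illustration of that idea, not code from certctl. The `idPrefixes` table and `resourceType` helper are invented for this example. The prefixes come from seed IDs quoted in these docs (`mc-`, `t-`, `o-`, `iss-`, `rp-`).

```go
package main

import (
	"fmt"
	"strings"
)

// idPrefixes is a hypothetical lookup table built from seed IDs in these docs.
var idPrefixes = map[string]string{
	"mc":  "managed_certificate",
	"t":   "team",
	"o":   "owner",
	"iss": "issuer",
	"rp":  "renewal_policy",
}

// resourceType returns the resource type encoded in a prefixed TEXT ID.
// The prefix is the text before the first hyphen; unknown prefixes yield ok=false.
func resourceType(id string) (typ string, ok bool) {
	prefix, _, found := strings.Cut(id, "-")
	if !found {
		return "", false
	}
	typ, ok = idPrefixes[prefix]
	return typ, ok
}

func main() {
	for _, id := range []string{"mc-api-prod", "t-platform", "o-alice", "iss-local", "rp-default"} {
		typ, _ := resourceType(id)
		fmt.Printf("%-12s %s\n", id, typ)
	}
}
```

Only the leading segment is treated as the prefix, so multi-hyphen IDs such as `mc-api-prod` still resolve correctly.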
-Database tables:
+```mermaid
+erDiagram
+    teams ||--o{ owners : "has members"
+    teams ||--o{ managed_certificates : "owns"
+    owners ||--o{ managed_certificates : "responsible for"
+    issuers ||--o{ managed_certificates : "signs"
+    renewal_policies ||--o{ managed_certificates : "governs"
+    managed_certificates ||--o{ certificate_versions : "has versions"
+    managed_certificates ||--o{ certificate_target_mappings : "deployed to"
+    deployment_targets ||--o{ certificate_target_mappings : "receives"
+    agents ||--o{ deployment_targets : "manages"
+    managed_certificates ||--o{ jobs : "triggers"
+    policy_rules ||--o{ policy_violations : "produces"
+    managed_certificates ||--o{ policy_violations : "violates"
+    managed_certificates ||--o{ audit_events : "logged in"
+    managed_certificates ||--o{ notification_events : "generates"

-```
-Teams & Ownership
-  ├── teams
-  └── owners
-
-Certificate Management
-  ├── managed_certificates
-  ├── certificate_versions
-  └── renewal_policies
-
-Infrastructure
-  ├── agents
-  └── deployment_targets
-
-Issuance
-  ├── issuers
-  └── jobs
-
-Policy Engine
-  ├── policy_rules
-  └── policy_violations
-
-Certificate-Target Mapping
-  └── certificate_target_mappings
-
-Monitoring & Audit
-  ├── audit_events
-  └── notification_events
+    teams {
+        text id PK
+        text name
+        text description
+    }
+    owners {
+        text id PK
+        text name
+        text email
+        text team_id FK
+    }
+    managed_certificates {
+        text id PK
+        text name
+        text common_name
+        text[] sans
+        text environment
+        text owner_id FK
+        text team_id FK
+        text issuer_id FK
+        text renewal_policy_id FK
+        text status
+        timestamp expires_at
+        jsonb tags
+    }
+    certificate_versions {
+        text id PK
+        text certificate_id FK
+        text serial_number
+        text fingerprint_sha256
+        text pem_chain
+        text csr_pem
+    }
+    agents {
+        text id PK
+        text name
+        text hostname
+        text status
+        text api_key_hash
+    }
+    deployment_targets {
+        text id PK
+        text name
+        text type
+        text agent_id FK
+        jsonb config
+    }
+    issuers {
+        text id PK
+        text name
+        text type
+        jsonb config
+        boolean enabled
+    }
+    jobs {
+        text id PK
+        text type
+        text certificate_id FK
+        text target_id FK
+        text status
+        int attempts
+    }
+    policy_rules {
+        text id PK
+        text name
+        text type
+        jsonb config
+        boolean enabled
+    }
+    policy_violations {
+        text id PK
+        text certificate_id FK
+        text rule_id FK
+        text message
+        text severity
+    }
+    audit_events {
+        text id PK
+        text actor
+        text actor_type
+        text action
+        text resource_type
+        text resource_id
+        jsonb details
+    }
+    notification_events {
+        text id PK
+        text type
+        text certificate_id FK
+        text channel
+        text recipient
+        text status
+    }
 ```

 Migrations are idempotent (`IF NOT EXISTS` on all CREATE statements, `ON CONFLICT (id) DO NOTHING` on all seed data) so they're safe to run multiple times — important for Docker Compose where both initdb and the server may run the same SQL.
@@ -78,57 +205,50 @@ Migrations are idempotent (`IF NOT EXISTS` on all CREATE statements, `ON CONFLIC

 ### 1. Create Managed Certificate

-```
-User / API Client
-    │
-    ├─→ POST /api/v1/certificates
-    │   {
-    │     "name": "API Production",
-    │     "common_name": "api.example.com",
-    │     "sans": ["api.example.com"],
-    │     "environment": "production",
-    │     "owner_id": "o-alice",
-    │     "team_id": "t-platform",
-    │     "issuer_id": "iss-local",
-    │     "renewal_policy_id": "rp-default",
-    │     "status": "Pending"
-    │   }
-    │
-    └─→ Control Plane
-        ├─ Validates input and policy rules
-        ├─ Inserts record into managed_certificates
-        ├─ Logs audit event (certificate_created)
-        └─ Returns certificate with ID
+```mermaid
+sequenceDiagram
+    participant U as User / API Client
+    participant API as REST API
+    participant SVC as CertificateService
+    participant DB as PostgreSQL
+    participant AUD as AuditService
+
+    U->>API: POST /api/v1/certificates<br/>{name, common_name, sans, ...}
+    API->>SVC: Create(ctx, certificate)
+    SVC->>SVC: Validate required fields
+    SVC->>DB: INSERT INTO managed_certificates
+    SVC->>AUD: Create(audit_event: certificate_created)
+    AUD->>DB: INSERT INTO audit_events
+    SVC-->>API: ManagedCertificate
+    API-->>U: 201 Created + JSON body
 ```

 ### 2. Agent Requests Certificate (CSR → Issuance)

-```
-Agent                             Control Plane             Issuer (Local CA / ACME)
-  │                                     │                               │
-  ├─ POST /api/v1/agents/{id}/csr       │                               │
-  │   { "csr_pem": "-----BEGIN..." }    │                               │
-  │                                     ├─ Validate CSR                 │
-  │                                     │                               │
-  │                                     ├─ Submit CSR to issuer         │
-  │                                     ├──────────────────────────────→│
-  │                                     │                               │
-  │                                     │← Signed certificate + chain   │
-  │                                     │←──────────────────────────────│
-  │                                     │                               │
-  │                                     ├─ Store certificate version    │
-  │                                     ├─ Update cert status → Active  │
-  │                                     ├─ Log audit event              │
-  │                                     │                               │
-  │← Certificate + chain (PEM)          │                               │
-  │   (NO private key)                  │                               │
-  │                                     │                               │
-  ├─ Store locally:                     │                               │
-  │   cert.pem + chain.pem              │                               │
-  │   key.pem (generated locally,       │                               │
-  │   never sent anywhere)              │                               │
-  │                                     │                               │
-  └─ Deploy to target system            │                               │
+```mermaid
+sequenceDiagram
+    participant A as Agent
+    participant API as Control Plane API
+    participant ISS as Issuer Connector
+    participant DB as PostgreSQL
+
+    A->>A: Generate RSA-2048 key pair
+    A->>A: Create CSR (CN + SANs, public key only)
+    A->>API: POST /api/v1/agents/{id}/csr<br/>{csr_pem: "-----BEGIN..."}
+
+    API->>API: Validate CSR format
+    API->>ISS: IssueCertificate(IssuanceRequest{CSR})
+    ISS-->>API: IssuanceResult{cert_pem, chain_pem, serial, not_after}
+
+    API->>DB: INSERT INTO certificate_versions
+    API->>DB: UPDATE managed_certificates SET status='Active'
+    API->>DB: INSERT INTO audit_events
+
+    API-->>A: {certificate_pem, chain_pem}<br/>(NO private key in response)
+
+    A->>A: Store cert.pem + chain.pem locally
+    Note over A: key.pem stays on agent<br/>Never transmitted anywhere
+    A->>A: Deploy to target system
 ```

 ### 3. Deploy Certificate to Target
@@ -145,6 +265,21 @@ The agent handles both the certificate (public) and the private key (local only)

 The control plane runs a scheduler with four background loops:

+```mermaid
+flowchart LR
+    subgraph "Scheduler (Background Goroutines)"
+        R["Renewal Checker\n⏱ every 1h"]
+        J["Job Processor\n⏱ every 30s"]
+        H["Agent Health\n⏱ every 2m"]
+        N["Notification Processor\n⏱ every 1m"]
+    end
+
+    R -->|"Find expiring certs\nCreate renewal jobs"| DB[("PostgreSQL")]
+    J -->|"Process pending jobs\nCoordinate issuance"| DB
+    H -->|"Check heartbeat staleness\nMark agents offline"| DB
+    N -->|"Send pending notifications\nEmail / Webhook"| DB
+```
+
 | Loop | Interval | Purpose |
 |------|----------|---------|
 | Renewal checker | 1 hour | Finds certificates approaching expiry, creates renewal jobs |
@@ -158,6 +293,33 @@ When the renewal checker finds a certificate within its renewal window (e.g., 30

 Certctl uses connector interfaces for extensibility. Each connector type has a standard interface that implementations must satisfy.

+```mermaid
+flowchart TB
+    subgraph "Issuer Connectors"
+        direction TB
+        II["IssuerConnector Interface\nIssueCertificate() | RenewCertificate()\nRevokeCertificate() | GetOrderStatus()"]
+        II --> LC["Local CA"]
+        II --> ACME["ACME v2"]
+        II --> VP["Vault PKI (future)"]
+    end
+
+    subgraph "Target Connectors"
+        direction TB
+        TI["TargetConnector Interface\nDeployCertificate()\nValidateDeployment()"]
+        TI --> NG["NGINX"]
+        TI --> F5["F5 BIG-IP"]
+        TI --> IIS["IIS"]
+    end
+
+    subgraph "Notifier Connectors"
+        direction TB
+        NI["NotifierConnector Interface\nSendAlert() | SendEvent()"]
+        NI --> EM["Email (SMTP)"]
+        NI --> WH["Webhook (HTTP)"]
+        NI --> SL["Slack (future)"]
+    end
+```
+
 ### Issuer Connector

 Handles certificate issuance from CAs.
@@ -208,6 +370,30 @@ See the [Connector Development Guide](connectors.md) for details on building cus ### Private Key Management +```mermaid +flowchart LR + subgraph "Agent (Your Infrastructure)" + GEN["1. GENERATE\ncrypto/rsa 2048-bit"] + STORE["2. STORE\nFile perms 0600"] + USE["3. USE\nCSR gen + deployment"] + ROT["4. ROTATE\nDelete old after renewal"] + end + + subgraph "Control Plane (certctl-server)" + CP["Only sees:\n• Certificates (public)\n• Chains (public)\n• CSRs (public key only)"] + end + + GEN --> STORE --> USE --> ROT + USE -.->|"CSR (public key only)"| CP + CP -.->|"Signed cert + chain"| USE + + style CP fill:#fee,stroke:#c33 + style GEN fill:#efe,stroke:#3c3 + style STORE fill:#efe,stroke:#3c3 + style USE fill:#efe,stroke:#3c3 + style ROT fill:#efe,stroke:#3c3 +``` + Private keys follow a strict lifecycle: 1. **Generated on the agent** — never sent to the control plane @@ -262,29 +448,43 @@ Health checks live outside the API prefix: `GET /health` and `GET /ready`. ### Docker Compose (Development / Small Deployments) -``` -┌─────────────────────────────────┐ -│ Docker Network │ -│ ├─ certctl-server (:8443) │ -│ │ └─ Serves API + dashboard │ -│ ├─ postgres (:5432) │ -│ │ └─ Schema + seed data │ -│ └─ certctl-agent │ -│ └─ Heartbeat + work polling │ -└─────────────────────────────────┘ +```mermaid +flowchart TB + subgraph "Docker Network (certctl-network)" + SERVER["certctl-server\n:8443\nAPI + Dashboard"] + PG[("PostgreSQL\n:5432\nSchema + Seed Data")] + AGENT["certctl-agent\nHeartbeat + Work Poll"] + end + + USER["Browser / curl"] -->|"HTTP :8443"| SERVER + SERVER -->|"SQL"| PG + AGENT -->|"HTTP (internal)"| SERVER ``` ### Production (Kubernetes) -``` -┌──────────────────────────────────────────────┐ -│ Kubernetes Cluster │ -│ ├─ Deployment: certctl-server (replicas=2+) │ -│ ├─ DaemonSet: certctl-agent (infra nodes) │ -│ ├─ StatefulSet: PostgreSQL (primary+replica) │ -│ ├─ ConfigMap: issuer/target configurations │ -│ └─ Secret: API keys, ACME 
credentials │ -└──────────────────────────────────────────────┘ +```mermaid +flowchart TB + subgraph "Kubernetes Cluster" + subgraph "Control Plane" + DEP["Deployment\ncertctl-server\nreplicas: 2+"] + CM["ConfigMap\nIssuer/target configs"] + SEC["Secret\nAPI keys, ACME creds"] + end + + subgraph "Data" + SS[("StatefulSet\nPostgreSQL\nprimary + replica")] + end + + subgraph "Agent Fleet" + DS["DaemonSet\ncertctl-agent\n(infra nodes)"] + end + end + + ING["Ingress\n+ TLS termination"] --> DEP + DEP --> SS + DEP --> CM & SEC + DS --> DEP ``` For production, you would also add an ingress controller, TLS termination for the certctl API itself, and external PostgreSQL (RDS, Cloud SQL, etc.). diff --git a/docs/concepts.md b/docs/concepts.md index 706cbb8..dc55476 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -80,10 +80,18 @@ Targets are the systems where certificates actually get installed — NGINX web Every managed certificate in certctl goes through these states: -``` -Pending → Active → Expiring → (auto-renewal) → Active → ... - → Expired (if renewal fails) - → Failed (if issuance fails) +```mermaid +stateDiagram-v2 + [*] --> Pending: Certificate created + Pending --> Active: Issuance succeeds + Pending --> Failed: Issuance fails + Active --> Expiring: Within renewal window + Expiring --> RenewalInProgress: Auto-renewal triggered + RenewalInProgress --> Active: Renewal succeeds + RenewalInProgress --> Failed: Renewal fails + Expiring --> Expired: Renewal not attempted / all retries exhausted + Active --> Archived: Decommissioned + Failed --> Pending: Retry requested ``` - **Pending**: Certificate record created, awaiting initial issuance diff --git a/docs/demo-advanced.md b/docs/demo-advanced.md index 48910df..ccc28c2 100644 --- a/docs/demo-advanced.md +++ b/docs/demo-advanced.md @@ -1,8 +1,8 @@ # Advanced Demo: Certificate Lifecycle End-to-End -This demo goes beyond browsing pre-loaded data. 
You'll create a team, register an owner, set up an issuer, create a certificate, trigger renewal, and watch everything appear in the dashboard in real time. By the end, you'll understand the full certificate lifecycle as certctl manages it. +This demo goes beyond browsing pre-loaded data. You'll create a team, register an owner, set up an issuer, create a certificate, trigger renewal, and watch everything appear in the dashboard in real time. Each step includes a technical explanation of what's happening inside certctl and why the system is designed that way. -**Time**: 10-15 minutes +**Time**: 15-20 minutes **Prerequisites**: certctl running via Docker Compose (see [Quick Start](quickstart.md)) ## Setup @@ -23,6 +23,23 @@ Set up a base variable for convenience: API="http://localhost:8443" ``` +## How the pieces fit together + +Before we start, here's the high-level flow of what we're about to do: + +```mermaid +flowchart LR + A[Create Team\n& Owner] --> B[Verify Issuer] + B --> C[Create\nCertificate] + C --> D[Trigger\nRenewal] + D --> E[Trigger\nDeployment] + E --> F[Inspect Audit\n& Notifications] +``` + +Each step corresponds to a real operation that certctl would perform in production. The difference here is that we're driving each step manually via curl instead of letting the scheduler and agents handle it automatically. + +--- + ## Part 1: Build the Organization Structure ### Create a new team @@ -37,6 +54,10 @@ curl -s -X POST $API/api/v1/teams \ }' | jq . ``` +**How it works:** This `POST` hits the `/api/v1/teams` endpoint, which routes through Go 1.22's `net/http` pattern-based mux to the `TeamsHandler.CreateTeam` method. The handler deserializes the JSON body into a `domain.Team` struct, calls the `TeamService.Create()` method, which delegates to the `TeamRepository.Create()` postgres implementation — executing an `INSERT INTO teams (id, name, description, created_at, updated_at) VALUES (...)`. 
The server returns the full team object with server-generated timestamps. + +**Why teams exist:** Certificate ownership is a core design decision. In organizations with hundreds of certificates, outages happen when nobody knows who's responsible for a specific cert. Teams create accountability boundaries — when a cert expires, certctl knows exactly which team to alert. This maps to how enterprises actually operate: the platform team owns infrastructure certs, the payments team owns PCI-scoped certs, etc. + ### Register an owner ```bash @@ -50,6 +71,10 @@ curl -s -X POST $API/api/v1/owners \ }' | jq . ``` +**How it works:** Same handler → service → repository flow. The owner is inserted into the `owners` table with a foreign key reference to the team via `team_id`. The `team_id` field isn't enforced at the database FK level in V1 (to keep migrations simple), but the service layer validates the reference. + +**Why owners matter:** Owners are the individual humans accountable for certificates. When certctl sends an expiration warning notification, it needs a recipient. The owner's email becomes the notification target. This also feeds the audit trail — every action is attributed to an actor, and owners provide the human identity layer. + Verify both exist: ```bash @@ -57,7 +82,11 @@ curl -s $API/api/v1/teams/t-demo | jq . curl -s $API/api/v1/owners/o-demo-user | jq . ``` -## Part 2: Configure the Issuer +**How it works:** These `GET` requests use path parameters (`/api/v1/teams/{id}`) which Go 1.22's router extracts via `r.PathValue("id")`. The handler calls `service.Get(ctx, id)` which issues `SELECT * FROM teams WHERE id = $1`. If the row doesn't exist, the repository returns `nil` and the handler responds with HTTP 404. + +--- + +## Part 2: Verify the Issuer The demo ships with a Local CA issuer (`iss-local`) that can sign certificates immediately — no external CA needed. 
Let's verify it's available: @@ -75,7 +104,36 @@ You should see: } ``` -This Local CA generates real X.509 certificates using Go's `crypto/x509` library. The certificates are self-signed (not trusted by browsers in production), but structurally identical to production certificates — they have serial numbers, validity periods, SANs, key usage extensions, and a proper certificate chain. +**How it works:** The issuer record was inserted during database seeding (`migrations/seed_demo.sql`). The `type` field (`GenericCA`) maps to a connector implementation. When the server starts, it registers connector instances in an `issuerRegistry` map keyed by issuer ID. When a certificate needs issuance, the service layer looks up the issuer ID in this registry to find the right connector. + +**How the Local CA works internally:** The Local CA connector (`internal/connector/issuer/local/local.go`) generates a self-signed root CA certificate on first use using Go's `crypto/x509` and `crypto/rsa` packages. The CA key pair lives in memory only — it's regenerated each time the server restarts. When it receives an `IssuanceRequest` containing a CSR (Certificate Signing Request), it: + +1. Parses the CSR using `x509.ParseCertificateRequest()` +2. Generates a random serial number via `crypto/rand` +3. Creates an `x509.Certificate` template with the CN, SANs, validity period, key usage extensions (Digital Signature, Key Encipherment), and extended key usage (TLS Server Auth) +4. Signs it with the CA's private key using `x509.CreateCertificate()` +5. Returns the PEM-encoded certificate and chain + +The result is a structurally valid X.509 certificate — browsers won't trust it (no root CA in their trust store), but it exercises the exact same code paths that a production ACME or Vault issuer would. + +**Why pluggable issuers:** Different organizations use different CAs. Some use Let's Encrypt (ACME protocol), some use internal PKI (Vault, ADCS), some use commercial CAs (DigiCert, Sectigo). 
The connector interface means certctl doesn't care — it calls `IssueCertificate()` and gets back a signed cert regardless of the backend. + +```mermaid +flowchart TD + subgraph "Issuer Connector Interface" + A["IssueCertificate(CSR)"] + B["RenewCertificate(CSR)"] + C["RevokeCertificate(serial)"] + D["GetOrderStatus(orderID)"] + end + + A --> E["Local CA\n(crypto/x509)"] + A --> F["ACME\n(Let's Encrypt)"] + A --> G["Vault PKI\n(future)"] + A --> H["DigiCert API\n(future)"] +``` + +--- ## Part 3: Create a Managed Certificate @@ -103,7 +161,28 @@ curl -s -X POST $API/api/v1/certificates \ }' | jq . ``` -**Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details: owner, team, environment, tags, and timeline. +**How it works:** The `CertificatesHandler.CreateCertificate` handler deserializes the JSON into a `domain.ManagedCertificate` struct and calls `CertificateService.Create()`. The service layer: + +1. Validates required fields (`common_name`, `issuer_id`, `renewal_policy_id`) +2. Stores `sans` as a PostgreSQL `TEXT[]` array and `tags` as a `JSONB` column +3. Inserts into the `managed_certificates` table +4. Logs an audit event via `AuditService.Create()` — recording the actor, action (`certificate_created`), resource type, and resource ID +5. Returns the full certificate record with `created_at` and `updated_at` timestamps + +**Why each field matters:** + +| Field | Purpose | +|-------|---------| +| `id` | Human-readable TEXT primary key (not UUID). Prefixed with `mc-` by convention so you can identify resource types at a glance in logs and queries. | +| `common_name` | The primary domain this certificate covers. Maps to the CN field in the X.509 certificate. | +| `sans` | Subject Alternative Names — additional domains covered by the same certificate. 
Modern browsers actually check SANs, not CN, for domain validation. | +| `environment` | Organizational tag (`production`, `staging`, `development`). Used for dashboard filtering and policy enforcement (e.g., "staging certs can only use the Local CA"). | +| `issuer_id` | Links to the issuer connector that will sign this certificate. Determines which CA backend is used. | +| `renewal_policy_id` | Links to a `renewal_policies` row that defines: how many days before expiry to renew (`renewal_window_days`), whether auto-renewal is enabled (`auto_renew`), max retries, and retry interval. The default policy (`rp-default`) renews 30 days before expiry. | +| `status` | Set to `Pending` because the certificate hasn't been issued yet. The scheduler will pick it up, or you can trigger renewal manually. | +| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"pci": "true"` for compliance scoping). | + +**Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details. ### Verify via API @@ -111,9 +190,11 @@ curl -s -X POST $API/api/v1/certificates \ curl -s $API/api/v1/certificates/mc-demo-api | jq '{id, name, common_name, status, environment, owner_id, team_id}' ``` +--- + ## Part 4: Trigger Certificate Renewal -In production, the scheduler automatically triggers renewal when certificates approach expiry. For this demo, we'll trigger it manually: +In production, the scheduler automatically triggers renewal when certificates approach expiry. The scheduler's renewal loop runs every hour, queries `SELECT * FROM managed_certificates WHERE status IN ('Active', 'Expiring') AND expires_at < NOW() + interval '30 days'`, and creates renewal jobs for each match. 
For this demo, we'll trigger it manually: ```bash curl -s -X POST $API/api/v1/certificates/mc-demo-api/renew | jq . @@ -126,7 +207,51 @@ Expected response: } ``` -This creates a renewal job. Check the jobs list: +**How it works:** The `TriggerRenewal` handler extracts the certificate ID from the URL path, calls `CertificateService.TriggerRenewal(ctx, id)`, which: + +1. Fetches the certificate from the database to verify it exists +2. Creates a new `Job` record in the `jobs` table with `type: "Renewal"`, `status: "Pending"`, `certificate_id: "mc-demo-api"`, and `scheduled_at: now()` +3. The response returns `202 Accepted` immediately — the actual renewal happens asynchronously + +The `202 Accepted` status code is deliberate. Certificate issuance can take seconds (Local CA) to minutes (ACME DNS challenges). The API doesn't block the caller — it creates a job and returns. The job processor loop (runs every 30 seconds) picks up pending jobs and executes them. + +**What happens during a real renewal (production flow):** + +```mermaid +sequenceDiagram + participant S as Scheduler + participant DB as PostgreSQL + participant SVC as CertificateService + participant ISS as IssuerConnector + participant A as Agent + + S->>DB: Query expiring certificates + DB-->>S: [mc-demo-api: expires in 25 days] + S->>DB: INSERT job (type=Renewal, status=Pending) + + Note over S: Job processor loop (every 30s) + S->>DB: SELECT pending jobs + DB-->>S: [job-123: Renewal for mc-demo-api] + + S->>A: Notify: generate CSR for demo-api.internal.example.com + A->>A: Generate RSA-2048 key pair locally + A->>A: Create CSR with CN + SANs + A->>SVC: POST /api/v1/agents/{id}/csr {csr_pem: "..."} + + SVC->>ISS: IssueCertificate(CSR) + ISS-->>SVC: {cert_pem, chain_pem, serial, not_after} + + SVC->>DB: INSERT certificate_version + SVC->>DB: UPDATE managed_certificates SET status='Active' + SVC->>DB: INSERT audit_event (certificate_renewed) + + SVC-->>A: {certificate_pem, chain_pem} + A->>A: Store cert + 
chain locally (key never leaves) +``` + +The critical security property: the private key is generated by the agent in step 3 and never transmitted. The CSR contains only the public key. The control plane forwards the CSR to the issuer and returns the signed certificate — it never has access to the private key material. + +Check the jobs list: ```bash curl -s "$API/api/v1/jobs" | jq '.data[] | select(.certificate_id == "mc-demo-api") | {id, type, status, certificate_id}' @@ -134,6 +259,8 @@ curl -s "$API/api/v1/jobs" | jq '.data[] | select(.certificate_id == "mc-demo-ap **Check the dashboard.** Go to the "Jobs" view — you'll see the renewal job for your certificate. +--- + ## Part 5: Deploy the Certificate Trigger deployment to see the deployment workflow: @@ -149,12 +276,54 @@ Expected response: } ``` +**How it works:** The `TriggerDeployment` handler optionally accepts a `target_id` in the request body. If no target is specified, it creates deployment jobs for all targets mapped to this certificate (via the `certificate_target_mappings` table). Each deployment job is independent — if NGINX succeeds but F5 fails, the NGINX deployment isn't rolled back. + +The handler: +1. Looks up the certificate +2. Finds all deployment targets for this certificate (or uses the specific `target_id` if provided) +3. Creates a `Job` record for each target with `type: "Deployment"`, `target_id`, and `certificate_id` +4. 
Returns `202 Accepted` + +**What the agent does during deployment:** + +```mermaid +sequenceDiagram + participant A as Agent + participant TC as TargetConnector + participant T as Target System + + A->>A: Load cert.pem + key.pem from local storage + A->>TC: DeployCertificate(cert_pem, chain_pem, config) + + alt NGINX Target + TC->>T: Write cert.pem to /etc/nginx/certs/ + TC->>T: Write chain.pem to /etc/nginx/certs/ + TC->>T: Run: nginx -t (validate config) + TC->>T: Run: systemctl reload nginx + TC-->>A: {success: true, deployed_at: "..."} + else F5 Target + TC->>T: POST /mgmt/tm/sys/crypto/cert (upload cert) + TC->>T: PUT /mgmt/tm/ltm/virtual (bind to virtual server) + TC-->>A: {success: true, deployed_at: "..."} + else IIS Target + TC->>T: WinRM: Import-PfxCertificate + TC->>T: WinRM: Set-WebBinding -SslFlags + TC-->>A: {success: true, deployed_at: "..."} + end + + A->>A: Report deployment status to control plane +``` + +Notice the `DeploymentRequest` struct intentionally omits the private key field. The agent loads the key from its own local storage and combines it with the certificate from the control plane. This is the architectural boundary that ensures zero private key exposure — the target connector interface physically cannot receive keys from the control plane because the data structure doesn't carry them. + Check for deployment jobs: ```bash curl -s "$API/api/v1/jobs" | jq '.data[] | select(.certificate_id == "mc-demo-api")' ``` +--- + ## Part 6: View the Audit Trail Every action you've taken has been recorded. Check the audit trail: @@ -163,10 +332,24 @@ Every action you've taken has been recorded. Check the audit trail: curl -s $API/api/v1/audit | jq '.data[0:5]' ``` -You'll see events for certificate creation, renewal trigger, and deployment trigger — each with actor, action, resource type, and timestamp. +**How it works:** The `audit_events` table is append-only — there is no `UPDATE` or `DELETE` in the `AuditRepository` interface. 
This is a deliberate design decision for compliance. Every service method that mutates state calls `AuditService.Create()` with: + +| Field | Source | Example | +|-------|--------|---------| +| `actor` | The authenticated user or system component | `"o-demo-user"`, `"system"`, `"agent-prod-01"` | +| `actor_type` | Category of the actor | `"User"`, `"System"`, `"Agent"` | +| `action` | What happened | `"certificate_created"`, `"renewal_triggered"`, `"deployment_completed"` | +| `resource_type` | What was affected | `"certificate"`, `"team"`, `"agent"` | +| `resource_id` | Specific resource | `"mc-demo-api"` | +| `details` | Arbitrary JSON context | `{"environment": "staging", "issuer": "iss-local"}` | +| `timestamp` | When it happened (server clock) | `"2026-03-14T10:30:00Z"` | + +**Why immutable audit:** Compliance frameworks (SOC 2 Type II, PCI-DSS, ISO 27001) require tamper-evident audit logs. By making the repository interface append-only, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection. **Check the dashboard.** The "Audit" view shows the full timeline of all actions across the system. +--- + ## Part 7: Check Notifications Certctl sends notifications for certificate lifecycle events. Check what notifications were generated: @@ -175,7 +358,26 @@ Certctl sends notifications for certificate lifecycle events. Check what notific curl -s $API/api/v1/notifications | jq '.data[0:5]' ``` -In demo mode, notifications are marked as "sent" even without a real email/webhook backend. In production, these would go out via SMTP or HTTP webhooks. +**How it works:** The `NotificationService` generates notification records in the `notification_events` table whenever significant events occur — certificate creation, expiration warnings, renewal success/failure, deployment results, policy violations. 
Each notification has a `channel` (Email, Webhook) and a `recipient`. + +The notification processor loop runs every 60 seconds and processes pending notifications: + +```mermaid +flowchart TD + A[Notification Processor\nevery 60s] --> B{Pending\nnotifications?} + B -->|Yes| C[Look up channel\nin notifierRegistry] + C --> D{Notifier\nregistered?} + D -->|Yes| E[Call Notifier.Send\nrecipient, subject, body] + D -->|No| F[Mark as 'sent'\nDemo mode graceful skip] + E --> G{Delivery\nsucceeded?} + G -->|Yes| H[Update status → 'sent'\nRecord sent_at timestamp] + G -->|No| I[Update status → 'failed'\nRecord error message] + B -->|No| J[Sleep until next tick] +``` + +**Why graceful notifier fallback:** In demo mode, no SMTP server or webhook endpoint is configured. Rather than logging a "notifier not found" error every 60 seconds, as earlier versions did, the service marks notifications as "sent" when no notifier is registered for the channel. In demo mode, "sent" therefore means the record was processed, not delivered. This keeps the notification records visible in the dashboard without requiring external infrastructure. + +--- ## Part 8: Create a Second Certificate and Compare @@ -204,9 +406,15 @@ curl -s -X POST $API/api/v1/certificates \ }' | jq . ``` -This certificate expires in about 18 days from the demo date, so it should show up as "Expiring" in the dashboard when the scheduler runs. **Refresh the dashboard** — you'll see it in the certificate list. +**How it works:** This certificate is created with status `Active` and an explicit `expires_at` 18 days from now. On its next run, the scheduler's renewal checker flags this certificate because `expires_at - now() < 30 days` (the default renewal window in `rp-default`), transitions its status to `Expiring`, and creates a renewal job. -Now filter the dashboard by environment or status to see how the filtering works with your new certificates mixed in with the demo data.
+**Why `environment` matters:** The environment field isn't just metadata — it feeds the policy engine. A policy rule with type `AllowedEnvironments` can restrict which environments are valid. If someone tries to create a certificate with `environment: "yolo"`, the policy engine flags a violation. In a mature deployment, you'd enforce policies strictly: production certificates must use a trusted CA (not Local CA), staging certificates can use Let's Encrypt staging, and development certificates can use the Local CA. + +**Why `pci: true` in tags:** Tags are free-form key/value metadata, which makes them useful for filtering and compliance scoping. A security team could find every PCI-scoped certificate with a query like `GET /api/v1/certificates?tags.pci=true` and verify each one meets compliance requirements. That filter is not implemented yet, but because tags are stored in a JSONB column, the data model already supports it. + +**Refresh the dashboard** — you'll see the new payment gateway certificate. Try filtering by environment or status to see how both certificates appear alongside the demo data. + +--- + ## Part 9: Policy Violations @@ -216,14 +424,75 @@ Let's see what happens when a certificate doesn't meet policy requirements. Chec curl -s $API/api/v1/policies | jq '.data[] | {id, name, type, enabled}' ``` -The demo includes rules for required owner metadata, allowed environments, maximum certificate lifetime, and minimum renewal windows. Check existing violations: +**How it works:** Policy rules are stored in the `policy_rules` table with a `type` field that determines the enforcement logic and a `config` JSONB column with rule-specific parameters.
The demo ships with four rules: + +| Rule | Type | What it enforces | +|------|------|-----------------| +| `pr-require-owner` | `RequiredMetadata` | Every certificate must have an `owner_id` | +| `pr-allowed-environments` | `AllowedEnvironments` | Only `production`, `staging`, `development` are valid | +| `pr-max-certificate-lifetime` | `RenewalLeadTime` | Certificates can't exceed a maximum lifetime | +| `pr-min-renewal-window` | `RenewalLeadTime` | Certificates must be renewed at least N days before expiry | + +When a certificate is created or updated, the policy service evaluates it against all enabled rules. Violations are recorded in the `policy_violations` table with a severity (`Warning`, `Error`, `Critical`) and a human-readable message. + +Check existing violations: ```bash curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq . ``` +**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific non-compliance. + **In the dashboard**, click "Policies" in the sidebar to see all active rules and which certificates are violating them. 
+--- + +## End-to-End Architecture Summary + +Here's what we just walked through, mapped to the system architecture: + +```mermaid +flowchart TB + subgraph "What You Did (API Calls)" + U1["POST /teams"] --> U2["POST /owners"] + U2 --> U3["POST /certificates"] + U3 --> U4["POST /certificates/{id}/renew"] + U4 --> U5["POST /certificates/{id}/deploy"] + U5 --> U6["GET /audit"] + end + + subgraph "Control Plane (certctl-server)" + API["REST API\nGo net/http"] + SVC["Service Layer\nBusiness Logic"] + REPO["Repository Layer\ndatabase/sql + lib/pq"] + SCHED["Scheduler\n4 background loops"] + CONN["Connector Registry\nIssuer + Target + Notifier"] + end + + subgraph "Data Store" + PG["PostgreSQL 16\n14 tables, TEXT PKs"] + end + + subgraph "Agent (certctl-agent)" + AGENT["Agent Process\nHeartbeat + Work Poll"] + KEYS["Local Key Storage\nPrivate keys (0600)"] + TC["Target Connectors\nNGINX / F5 / IIS"] + end + + U1 & U2 & U3 & U4 & U5 & U6 --> API + API --> SVC + SVC --> REPO + REPO --> PG + SVC --> CONN + SCHED --> SVC + AGENT -->|"CSR + Heartbeat"| API + API -->|"Cert + Chain (no key)"| AGENT + AGENT --> KEYS + AGENT --> TC +``` + +--- + ## Full Automated Script Here's a single script that runs the entire demo end-to-end. Save it as `demo.sh` and run it: @@ -336,6 +605,8 @@ chmod +x demo.sh ./demo.sh ``` +--- + ## What to Show Stakeholders If you're using this demo to present certctl to decision-makers, here's the narrative: