mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 12:41:30 +00:00
docs: retire compliance subtree + sweep framework name-drops from prose
Per operator decision the framework-mapping docs are gone. They
were aspirational (no audit, no certification, no validated
mapping); keeping them around was misleading.
Files deleted (1,883 lines):
- docs/compliance/index.md
- docs/compliance/soc2.md
- docs/compliance/pci-dss.md
- docs/compliance/nist-sp-800-57.md
Hyperlinks removed:
- README.md: 'Auditor / compliance' row in the doc table; the
'(compliance mapping included)' parenthetical in the
positioning paragraph
- docs/README.md: the '## Compliance' section table; the
'Auditor / compliance team' reading-order-by-role row
Prose name-drops swept across 24 files:
- README.md: 'FedRAMP boundary CAs / financial-services policy
CAs' → '4-level boundary CAs / 3-level policy CAs';
'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High,
SOC 2 Type II, HIPAA' → cut entirely
- getting-started/{quickstart,concepts,examples,why-certctl,
advanced-demo}.md: 'compliance' → 'audit' / 'policy';
'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut;
''pci': 'true'' tag example → ''environment': 'production''
- migration/cert-manager-coexistence.md: 'compliance rules' →
'policy rules'
- operator/approval-workflow.md: 'Compliance customers (PCI-DSS
Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' →
'Operators'; entire 'Compliance control mapping' table
(PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1
/ HIPAA §164.308(a)(4)) deleted; 'compliance contract' →
'two-person-integrity contract'; 'compliance auditors' →
'reviewers'
- operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5'
audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5
attestation' section retitled to 'TLS posture summary' and
rewritten without framework framing; 'PCI-DSS, NIST, and
major browsers will eventually deprecate TLS 1.2' →
'Major browsers and OS vendors will eventually deprecate
TLS 1.2'
- operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref →
CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS
Req 4 v4.0 prose footing → cut
- operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI
procurement-team deliverable' → 'on-call deliverable';
'compliance auditors' → 'reviewers'
- reference/connectors/{acme,aws-acm,azure-kv,globalsign,
local-ca,openssl,ssh,index}.md: 'compliance reporting
(PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting';
'Compliance environments (PCI-DSS Level 1, FedRAMP High,
HIPAA)' → 'Regulated environments'; 'compliance audits' →
'audit'; 'FedRAMP boundary CA' pattern names →
'4-level boundary CA' (technically descriptive)
- reference/protocols/est.md: 'compliance-hook seam' →
'device-state hook seam'; 'compliance gating' → 'device-state
gating'; 'est_compliance_failed' → 'est_device_state_failed'
- reference/protocols/scep-intune.md: 'Optional compliance
check' → 'Optional device-state check'; failure-counter
'compliance_failed' → 'device_state_failed'; 'Conditional
Access compliance gating' → 'Conditional Access
device-state gating'
- reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA
deployments where the regulator requires...' →
'Boundary-CA deployments where you want separation of policy
and issuing authorities'; pattern A retitled '4-level FedRAMP
boundary CA' → '4-level boundary CA'
- reference/architecture.md: broken Related-docs link to
compliance.md removed; the rest of that block had stale
pre-Phase-2 paths (quickstart.md, demo-advanced.md,
connectors.md, openapi.md, testing-guide.md, test-env.md) —
retargeted to current locations
- reference/deployment-model.md: 'SOC 2 evidence-report
generator' → 'Audit-evidence report generator'
- reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this
into evidence packs' → 'reviewers paste this into
vendor-evaluation packs'
- contributor/qa-test-suite.md: 'compliance exist' coverage
description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)'
risk-class label → 'Audit-relevant'
What was kept:
- CWE references (legitimate technical pointers)
- Microsoft API/feature names that happen to use 'compliance'
literally ('Microsoft Graph compliance API',
'device-compliance validators' — these are MS product names,
not framework name-drops)
- 'NIST PQC' on the landing page (Post-Quantum Cryptography is
the actual NIST standard family, not a compliance framework)
Verified: zero hyperlinks into docs/compliance/ remain. All 24
ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean.
Net diff: 26 files / -1,883 deletions in compliance/ + -32 net
across the prose sweep.
Companion edits in cowork/ (CLAUDE.md doc-tree summary +
WORKSPACE-CHANGELOG.md retirement note) land separately.
@@ -26,7 +26,6 @@ The full audience-organized index lives at [`docs/README.md`](docs/README.md). T
| New to certctl | [Concepts](docs/getting-started/concepts.md) → [Quickstart](docs/getting-started/quickstart.md) → [Examples](docs/getting-started/examples.md) |
| Production operator | [Architecture](docs/reference/architecture.md) → [Security posture](docs/operator/security.md) → [Disaster recovery runbook](docs/operator/runbooks/disaster-recovery.md) |
| PKI engineer | [ACME server](docs/reference/protocols/acme-server.md) → [SCEP server](docs/reference/protocols/scep-server.md) → [EST server](docs/reference/protocols/est.md) → [CA hierarchy](docs/reference/intermediate-ca-hierarchy.md) |
| Auditor / compliance | [Compliance overview](docs/compliance/index.md) → [SOC 2](docs/compliance/soc2.md) / [PCI-DSS](docs/compliance/pci-dss.md) / [NIST SP 800-57](docs/compliance/nist-sp-800-57.md) |
| Migrating from another tool | [from certbot](docs/migration/from-certbot.md) / [from acme.sh](docs/migration/from-acmesh.md) / [cert-manager coexistence](docs/migration/cert-manager-coexistence.md) |
| Contributor | [Architecture](docs/reference/architecture.md) → [Testing strategy](docs/contributor/testing-strategy.md) → [CI pipeline](docs/contributor/ci-pipeline.md) |

@@ -51,7 +50,7 @@ For the connector reference (12 issuers, 15 targets, 6 notifiers) see [`docs/ref

Certificate lifecycle tooling has historically split into two camps. Enterprise platforms (Venafi, Keyfactor, AppViewX, the CyberArk/Venafi merged stack) charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools (certbot, cert-manager, acme.sh) handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together certbot cron jobs across a fleet, manually renewing certs, or writing custom Adaptable scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.

Built for **platform engineering and DevOps teams** managing 10 to 500+ certificates, **security and compliance teams** who need audit trails and policy enforcement for SOC 2, PCI-DSS 4.0, or NIST SP 800-57 ([compliance mapping included](docs/compliance/index.md)), and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see [Why certctl?](docs/getting-started/why-certctl.md).

Built for **platform engineering and DevOps teams** managing 10 to 500+ certificates, **security teams** who need audit trails and policy enforcement, and **small teams without enterprise budgets** who need Venafi-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see [Why certctl?](docs/getting-started/why-certctl.md).

## What it does

@@ -62,8 +61,8 @@ certctl handles the full certificate lifecycle in one self-hosted control plane:
- **Run as an ACME server** so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See [`docs/reference/protocols/acme-server.md`](docs/reference/protocols/acme-server.md).
- **Run as a SCEP server** for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See [`docs/reference/protocols/scep-server.md`](docs/reference/protocols/scep-server.md).
- **Run as an EST server** for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See [`docs/reference/protocols/est.md`](docs/reference/protocols/est.md).
- **Manage multi-level CA hierarchies** with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for FedRAMP boundary CAs (4-level), financial-services policy CAs (3-level with per-BU `PermittedDNSDomains`), internal PKI (2-level). See [`docs/reference/intermediate-ca-hierarchy.md`](docs/reference/intermediate-ca-hierarchy.md).
- **Gate high-stakes issuance** behind two-person-integrity approval. Flag a profile as `RequiresApproval`, the request lands in a queue, a non-requester approves, the scheduler dispatches. Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA. See [`docs/operator/approval-workflow.md`](docs/operator/approval-workflow.md).
- **Manage multi-level CA hierarchies** with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU `PermittedDNSDomains`, and 2-level internal PKI. See [`docs/reference/intermediate-ca-hierarchy.md`](docs/reference/intermediate-ca-hierarchy.md).
- **Gate high-stakes issuance** behind two-person-integrity approval. Flag a profile as `RequiresApproval`, the request lands in a queue, a non-requester approves, the scheduler dispatches. See [`docs/operator/approval-workflow.md`](docs/operator/approval-workflow.md).
- **Discover** existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
- **Revoke** with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See [`docs/reference/protocols/crl-ocsp.md`](docs/reference/protocols/crl-ocsp.md).
- **Alert** via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See [`docs/operator/runbooks/expiry-alerts.md`](docs/operator/runbooks/expiry-alerts.md).

@@ -91,17 +91,6 @@ You're moving from another cert-management tool to certctl, or running both in p
| cert-manager ACME (point cert-manager at certctl) | [migration/acme-from-cert-manager.md](migration/acme-from-cert-manager.md) |
| Traefik ACME (point Traefik at certctl) | [migration/acme-from-traefik.md](migration/acme-from-traefik.md) |

## Compliance

You're working through a SOC 2, PCI, or NIST audit and need to map certctl's capabilities to control objectives.

| Doc | What it covers |
|---|---|
| [Compliance overview](compliance/index.md) | What these guides cover and what they don't |
| [SOC 2 Type II](compliance/soc2.md) | Trust Service Criteria mapping (CC6, CC7, CC8, A1) |
| [PCI-DSS 4.0](compliance/pci-dss.md) | Requirements 3, 4, 6, 7, 8, 10 |
| [NIST SP 800-57](compliance/nist-sp-800-57.md) | Key management alignment with NIST guidance |

## Contributor

You're contributing to certctl, running tests locally, or trying to understand the CI pipeline.

@@ -135,6 +124,4 @@ Historical docs preserved for reference. Most operators don't need these.

**PKI engineer:** [ACME server](reference/protocols/acme-server.md) → [SCEP server](reference/protocols/scep-server.md) → [EST server](reference/protocols/est.md) → [Intermediate CA hierarchy](reference/intermediate-ca-hierarchy.md). About 6 hours end to end.

**Auditor / compliance team:** [Compliance overview](compliance/index.md) → applicable framework doc → [Disaster recovery runbook](operator/runbooks/disaster-recovery.md) → [Approval workflow](operator/approval-workflow.md) → [ACME server threat model](reference/protocols/acme-server-threat-model.md). About 4 hours end to end.

**Contributor:** [Architecture](reference/architecture.md) → [Testing strategy](contributor/testing-strategy.md) → [Test environment](contributor/test-environment.md) → [CI pipeline](contributor/ci-pipeline.md). About 3 hours end to end.

@@ -1,124 +0,0 @@
# Compliance Mapping Guides

> Last reviewed: 2026-05-05

certctl is a certificate lifecycle management tool, not a compliance product. It doesn't make you compliant — your organization, policies, and processes do that. What certctl provides is tooling that supports the technical controls auditors and evaluators look for when assessing certificate and key management practices.

These guides map certctl's features to three widely referenced compliance frameworks. They're designed for security engineers, IT auditors, and procurement teams evaluating certctl for environments with regulatory requirements.

## What's Covered

**[SOC 2 Type II](soc2.md)** — Maps certctl features to AICPA Trust Service Criteria. Covers logical access controls (CC6), system operations and monitoring (CC7), change management (CC8), and availability (A1). Most relevant for organizations undergoing SOC 2 audits where certificate management is in scope.

**[PCI-DSS 4.0](pci-dss.md)** — Maps certctl features to PCI Data Security Standard version 4.0 requirements. Covers data-in-transit protection (Req 4), cryptographic key management (Req 3), authentication (Req 8), audit logging (Req 10), secure development (Req 6), and access control (Req 7). Most relevant for organizations handling cardholder data where TLS certificates protect transmission channels.

**[NIST SP 800-57](nist-sp-800-57.md)** — Maps certctl's key management practices to NIST Special Publication 800-57 Part 1 Rev 5 (2020). Covers key generation, storage, cryptoperiods, key state lifecycle, algorithm selection, key transport, and revocation. Most relevant for organizations aligning with US federal cryptographic guidance or using NIST as a key management baseline.

## What These Guides Are Not

These are mapping guides, not certification claims. certctl is not SOC 2 certified, PCI-DSS validated, or NIST-assessed. The guides document how certctl's technical implementation supports the controls these frameworks require — they do not replace your auditor's assessment, your organization's policies, or your security team's judgment.

The guides also clearly identify gaps where certctl's current implementation doesn't fully align with a framework's recommendations, features planned for future versions, and areas where operator action is required regardless of what certctl provides.

## How to Use These Guides

If you're evaluating certctl for a regulated environment, start with the framework your auditor cares about. Each guide includes an evidence summary table mapping specific compliance criteria to certctl features, API endpoints, and configuration — the kind of specifics your auditor will ask for.

If you're preparing for an audit and certctl is already deployed, use the "Operator Responsibilities" section of each guide to identify what your organization must manage beyond what certctl provides.

## Quick Reference

| Framework | Primary Concern | Key certctl Features |
|---|---|---|
| SOC 2 Type II | Trust service criteria for SaaS/infrastructure | API audit trail, auth controls, monitoring, change management |
| PCI-DSS 4.0 | Cardholder data protection | TLS lifecycle, key management, immutable logging, access control |
| NIST SP 800-57 | Cryptographic key management | Agent-side keygen, key isolation, algorithm selection, revocation |

## Audit-Trail Integrity & Privacy (Bundle 6)

Two complementary controls protect the `audit_events` table against tampering and minimize PII exposure. Both apply automatically — no operator action is required at install time, but operators must understand the contract before responding to a legal-hold or retention request.

### Append-Only Enforcement (HIPAA §164.312(b))

<!-- Source: migrations/000018_audit_events_worm.up.sql -->

`audit_events` rows cannot be modified or deleted by the application role. Two layers:

| Layer | Mechanism | Surface |
|---|---|---|
| **DB trigger** | `audit_events_block_modification()` raises `check_violation` on `BEFORE UPDATE OR DELETE` | Catches any UPDATE / DELETE — including direct `psql` from the app role |
| **App-role grant** | `REVOKE UPDATE, DELETE ON audit_events FROM certctl` | Defence-in-depth; the app role can't even attempt the modification |

**Verification.** From a `psql` session connected as the `certctl` app role:

```sql
UPDATE audit_events SET actor = 'tampered' WHERE id = 'audit-001';
-- ERROR: audit_events is append-only (Bundle-6 / M-017 / HIPAA §164.312(b))
-- HINT: Use a compliance superuser role for legitimate retention operations.
```

**Compliance superuser pattern.** Legitimate retention work (legal hold, GDPR right-to-be-forgotten, statutory purges) requires a separate PostgreSQL role provisioned out-of-band that bypasses the trigger. certctl does NOT auto-create this role — operators provision it per their compliance policy. Suggested shape:

```sql
-- One-time setup by a DBA. Stored procedure pattern keeps the
-- compliance superuser auditable too: every invocation should
-- itself land in audit_events.
CREATE ROLE certctl_compliance LOGIN PASSWORD '<strong-secret>';
GRANT UPDATE, DELETE ON audit_events TO certctl_compliance;
-- Note: privileges alone do not skip a BEFORE UPDATE OR DELETE trigger.
-- Either the trigger function must exempt this role, or the session must
-- be allowed to bypass triggers (e.g. SET session_replication_role,
-- which normally requires superuser).
-- (optional) provision SECURITY DEFINER stored procedures that
-- (a) record the retention reason in audit_events as the FIRST step
-- (b) then perform the UPDATE/DELETE
-- (c) all under the certctl_compliance role's grants.
```

### Body Redaction (GDPR Art. 32, CWE-532)

<!-- Source: internal/service/audit_redact.go -->

`AuditService.RecordEvent` routes every `details` map through `RedactDetailsForAudit` BEFORE marshaling to the JSONB column. Two deny-lists:

| Category | Match | Replacement | Examples |
|---|---|---|---|
| **Credentials** | case-insensitive key match | `"[REDACTED:CREDENTIAL]"` | `api_key`, `password`, `token`, `*_pem`, `eab_secret`, `acme_account_key`, `signature` |
| **PII** | case-insensitive key match | `"[REDACTED:PII]"` | `email`, `phone`, `ssn`, `dob`, `name`, `address`, `postal_code`, `ip_address` |

Nested maps and arrays are walked recursively — sensitive keys at any depth get scrubbed. The redactor is mutation-free (the caller's original map is unchanged) so service-layer code that reuses the map elsewhere is safe.

**Operator visibility — `redacted_keys` array.** The redacted map includes a `redacted_keys` array listing every dotted path that was scrubbed. This surfaces the redaction footprint to compliance auditors without exposing values. Example before/after:

```jsonc
// Caller's input map (e.g., from a service handler):
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "abc123secret",
    "contact": { "email": "ops@example.com", "role": "admin" }
  }
}

// Persisted in audit_events.details:
{
  "action": "create_issuer",
  "issuer_id": "iss-acme-prod",
  "config": {
    "endpoint": "https://acme.example.com",
    "eab_secret": "[REDACTED:CREDENTIAL]",
    "contact": { "email": "[REDACTED:PII]", "role": "admin" }
  },
  "redacted_keys": ["config.eab_secret", "config.contact.email"]
}
```
**Maintenance.** When introducing a new credential-bearing field anywhere in the codebase, add the key name to `credentialKeys` (or `piiKeys`) in `internal/service/audit_redact.go`. The unit test suite in `audit_redact_test.go` exercises every entry and proves case-insensitivity + JSON round-trip safety.
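
The redaction contract described above can be sketched in Go. It covers two case-insensitive deny-lists, a recursive walk, a copy that leaves the input untouched, and the `redacted_keys` footprint. This is a minimal illustration, not the code in `internal/service/audit_redact.go`. `redactDetails` and its helpers are hypothetical names, the deny-lists are abbreviated, and both the `[i]` path notation for array elements and the sorted order of `redacted_keys` are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Abbreviated, hypothetical deny-lists; the real credentialKeys / piiKeys
// lists in internal/service/audit_redact.go are longer.
var credentialKeys = map[string]bool{
	"api_key": true, "password": true, "token": true,
	"eab_secret": true, "acme_account_key": true, "signature": true,
}

var piiKeys = map[string]bool{
	"email": true, "phone": true, "ssn": true, "dob": true,
	"name": true, "address": true, "postal_code": true, "ip_address": true,
}

// classify returns the replacement marker for a sensitive key (case-insensitive).
func classify(key string) (string, bool) {
	k := strings.ToLower(key)
	switch {
	case credentialKeys[k] || strings.HasSuffix(k, "_pem"):
		return "[REDACTED:CREDENTIAL]", true
	case piiKeys[k]:
		return "[REDACTED:PII]", true
	}
	return "", false
}

// redactDetails returns a scrubbed deep copy of details; the input is never mutated.
// When anything was scrubbed, the copy carries a sorted redacted_keys list of paths.
func redactDetails(details map[string]any) map[string]any {
	var paths []string
	out := walkMap(details, "", &paths)
	if len(paths) > 0 {
		sort.Strings(paths)
		out["redacted_keys"] = paths
	}
	return out
}

// walkMap copies one map level, replacing sensitive values and recording their paths.
func walkMap(in map[string]any, prefix string, paths *[]string) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		if marker, ok := classify(k); ok {
			out[k] = marker
			*paths = append(*paths, path)
			continue
		}
		out[k] = walkValue(v, path, paths)
	}
	return out
}

// walkValue recurses into nested maps and arrays at any depth.
func walkValue(v any, path string, paths *[]string) any {
	switch t := v.(type) {
	case map[string]any:
		return walkMap(t, path, paths)
	case []any:
		cp := make([]any, len(t))
		for i, e := range t {
			cp[i] = walkValue(e, fmt.Sprintf("%s[%d]", path, i), paths)
		}
		return cp
	default:
		return v
	}
}

func main() {
	in := map[string]any{
		"action":    "create_issuer",
		"issuer_id": "iss-acme-prod",
		"config": map[string]any{
			"endpoint":   "https://acme.example.com",
			"eab_secret": "abc123secret",
			"contact":    map[string]any{"email": "ops@example.com", "role": "admin"},
		},
	}
	b, err := json.MarshalIndent(redactDetails(in), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Sorting the collected paths keeps the output deterministic despite Go's randomized map iteration. For the example input above, this sketch therefore emits `["config.contact.email", "config.eab_secret"]`, which is a different order from the persisted example.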

## certctl Pro (V3) Enhancements

Several compliance-relevant features are planned for certctl Pro:

- **OIDC/SSO** — Enterprise identity provider integration (SOC 2 CC6.1, PCI-DSS 8.3)
- **RBAC** — Role-based access control with admin/operator/viewer roles (SOC 2 CC6.3, PCI-DSS 7.2)
- **NATS Audit Streaming** — Real-time audit event streaming to SIEM systems (SOC 2 CC7.2, PCI-DSS 10.2)
- **Bulk Revocation** — Fleet-wide incident response capability (NIST SP 800-57 Section 5.4)
- **Health/Compliance Scoring** — Automated compliance posture assessment per certificate

@@ -1,343 +0,0 @@
# NIST SP 800-57 Key Management Alignment

> Last reviewed: 2026-05-05

NIST SP 800-57 Part 1 Rev 5 (May 2020) is the authoritative US government guidance on cryptographic key management. This document maps certctl's implementation to its recommendations. certctl follows NIST guidance where applicable; this guide documents the alignment and identifies gaps for future roadmap planning.

## Contents

1. [Key Generation (Section 6.1)](#key-generation-section-61)
2. [Key Storage and Protection (Sections 6.3, 6.4)](#key-storage-and-protection-sections-63-64)
3. [Cryptoperiods (Section 5.3, Table 1)](#cryptoperiods-section-53-table-1)
4. [Key States and Transitions (Section 5.2)](#key-states-and-transitions-section-52)
5. [Algorithm Recommendations (Section 5.1, SP 800-131A)](#algorithm-recommendations-section-51-sp-800-131a)
6. [Key Distribution and Transport (Section 6.2)](#key-distribution-and-transport-section-62)
7. [Revocation and Compromise (NIST SP 800-57 Part 3)](#revocation-and-compromise-nist-sp-800-57-part-3)
8. [Alignment Summary Table](#alignment-summary-table)
9. [Gaps and Remediation Roadmap](#gaps-and-remediation-roadmap)
   - [V2 (Current)](#v2-current)
   - [V3 (Planned: 2026)](#v3-planned-2026)
   - [V5 (Planned: 2027+)](#v5-planned-2027)
   - [Post-Quantum (2027+)](#post-quantum-2027)
10. [References](#references)
11. [Questions or Corrections?](#questions-or-corrections)

## Key Generation (Section 6.1)

certctl generates certificate keys on agent infrastructure using Go's `crypto/rand` for entropy, backed by `/dev/urandom` on Linux and `CryptGenRandom` on Windows. Key generation happens as follows:

**Agent-Side Key Generation (Production Default)**
- Agents generate ECDSA P-256 key pairs per certificate using `crypto/ecdsa` + `crypto/elliptic` (Go stdlib)
- Key generation triggered by `AwaitingCSR` job state in renewal/issuance workflows
- Agent creates a Certificate Signing Request (CSR) with `x509.CreateCertificateRequest`, signed with the newly generated private key
- Only the CSR crosses the network to the control plane; private key material never leaves the agent
- Configuration: `CERTCTL_KEYGEN_MODE=agent` (default, production)

**Server-Side Key Generation (Demo Only)**
- Available for development and testing via `CERTCTL_KEYGEN_MODE=server`
- Explicitly logged as a warning at startup: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only"
- Docker Compose demo uses server mode for backward compatibility
- Not recommended for production; agent mode is the secure default

**Entropy Source**
- `crypto/rand` provides cryptographically secure random bytes
- On Linux: backed by `/dev/urandom` via `getrandom()` syscall
- On Windows: backed by `CryptGenRandom()` (now `BCryptGenRandom()`)
- Relies on the operating system's CSPRNG; any NIST SP 800-90B conformance of the entropy source comes from the OS, not from certctl

## Key Storage and Protection (Sections 6.3, 6.4)

certctl implements tiered key storage with different protection profiles based on key purpose.

**Agent Private Keys**
- Stored on agent filesystem at `CERTCTL_KEY_DIR` (default: `/var/lib/certctl/keys`)
- File permissions: 0600 (read/write for the owning user only, no group or world access)
- One PEM file per certificate, organized by certificate ID
- Readable only by the OS user the agent runs as; file permissions do not isolate keys from other processes running under that same user
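- For container deployments: mount the key directory as a volume (e.g., `-v /var/lib/certctl/keys:/var/lib/certctl/keys`) and restrict permissions on the host directory (e.g., `chmod 700`); Docker's `-v` flag does not set file modes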

**Issuing CA Keys (Local CA Connector)**
- Loaded from disk at server startup via `CERTCTL_CA_CERT_PATH` and `CERTCTL_CA_KEY_PATH` env vars
- Supports RSA (PKCS#1, PKCS#8) and ECDSA (SEC1, PKCS#8) key formats
- Validates certificate constraints before use:
  - `IsCA=true` flag present
  - `KeyUsageCertSign` bit set in the key usage extension
  - Valid certificate chain (for sub-CA mode)
- Keys held in memory during server runtime (no on-disk caching after load)
- Cleared from memory only on server shutdown

**Sub-CA Mode (Enterprise Integration)**
- CA certificate signed by an upstream enterprise root (e.g., Active Directory Certificate Services)
- certctl acts as a subordinate CA; its CA certificate's issuer DN is the upstream CA's subject DN
- All issued certificates chain to enterprise trust anchor
- The upstream root's key protection does not extend to the sub-CA key itself, which certctl loads from `CERTCTL_CA_KEY_PATH` like any other issuing CA key
- Configured via: `CERTCTL_CA_CERT_PATH=/path/to/ca.crt` and `CERTCTL_CA_KEY_PATH=/path/to/ca.key`

**NIST Gap: HSM Storage**
NIST SP 800-57 Part 1 recommends Hardware Security Module (HSM) storage for high-value keys (CA signing keys). certctl V2 uses filesystem storage on the server. HSM support is planned for certctl Pro (V3), enabling integration with:
- AWS CloudHSM
- Azure Dedicated HSM
- Thales Luna, Gemalto SafeNet, YubiHSM (on-premises)
- PKCS#11-compatible devices

## Cryptoperiods (Section 5.3, Table 1)

NIST recommends cryptoperiods (key validity durations) based on key type and security requirements. certctl enforces cryptoperiods through certificate profiles and renewal policies.

**Certificate Profile Enforcement**
- Certificate profiles (M11a) define `max_ttl` constraint per enrollment profile
- No certificate issued through a profile can exceed the profile's `max_ttl`
- Profile configuration example:
```jsonc
{
  "id": "prof-web-prod",
  "name": "Production Web Certs",
  "max_ttl_seconds": 31536000, // 1 year max
  "allowed_key_algorithms": ["ECDSA_P256"],
  "required_sans": ["example.com"]
}
```

**Renewal Thresholds**
- Renewal policies with configurable `alert_thresholds_days`: `[30, 14, 7, 0]` (days before expiry)
- Background scheduler checks renewal eligibility every hour
- Certificates transition to `Expiring` status 30 days before expiry and to `Expired` at 0 days
- Renewal workflow can be triggered manually or automatically

**NIST Cryptoperiod Recommendations vs certctl Implementation**

| Key Type | NIST Recommendation | certctl Implementation |
|----------|---------------------|------------------------|
| CA signing key | 3–10 years | Configured via CA certificate not-after date; inheritable from upstream CA in sub-CA mode |
| End-entity web server cert | 1–3 years (trending shorter) | Profile `max_ttl` configurable; ACME issuer typically 90 days; SC-081v3 mandating 47 days by 2029 |
| Code signing cert | 2–8 years | Profile enforcement via `max_ttl`; not primary certctl use case |
| Short-lived credentials | < 1 hour recommended | Profile TTL < 1 hour; exempt from CRL/OCSP (expiry is sufficient revocation); auto-expiry on scheduler tick |
| OCSP signing key | 1–2 years | Embedded OCSP responder uses issuing CA key (same period as issuer) or delegated signing cert |
| TLS/SSL interoperability cert | 1–2 years | Trending 1 year or less; certctl's ACME/sub-CA/step-ca issuers all support short periods |

## Key States and Transitions (Section 5.2)

NIST defines lifecycle states for keys: pre-activation, active, suspended, deactivated, compromised, and destroyed. certctl maps these to certificate and job states:

| NIST Key State | certctl Equivalent | Implementation |
|---|---|---|
| **Pre-activation** | `Pending` job state / `AwaitingCSR` | Job created but key not yet generated; awaiting agent CSR submission (agent-mode) or server keygen (demo mode) |
| **Active** | Certificate status `Active` | Cert deployed to targets and in use; within validity period (not before < now < not after) |
| **Suspended** | Job state `AwaitingApproval` | Interactive approval holds deployment job pending human review; resumes on approval or cancels on rejection |
| **Deactivated** | Certificate status `Expired` | Past not-after date; auto-transitioned by scheduler every 2 minutes; renewal eligible |
| **Compromised** | Certificate status `Revoked` | Set via `POST /api/v1/certificates/{id}/revoke` with RFC 5280 revocation reason |
| **Destroyed** | Archived (implementation detail) | Operator responsibility; certctl retains all certs in audit trail for compliance; no destructive deletion API |

**State Transition Audit Trail**
All transitions are logged to the immutable `audit_events` table with:
- Event type (e.g., `certificate_revoked`, `renewal_job_completed`)
- Actor (authenticated user or agent ID)
- Timestamp (RFC3339)
- Resource (certificate ID)
- Reason (revocation reason code, approval reason, etc.)
- HTTP method, path, status (for API calls)

Example audit entry for revocation:
```json
{
  "id": "ae-2024-0615",
  "event_type": "certificate_revoked",
  "actor": "ops-alice@example.com",
  "timestamp": "2024-06-15T14:23:00Z",
  "resource_id": "cert-web-prod-2024",
  "resource_type": "certificate",
  "description": "Revoked: reason=keyCompromise",
  "body_hash": "sha256:a1b2c3d..."
}
```
|
||||
|
||||
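To consume exported events programmatically, an entry like the one above can be decoded into a Go struct. This is a sketch under assumptions: the field set is copied from the example, `parseAuditEvent` and `bodyHash` are hypothetical names, and the `body_hash` format (`sha256:` followed by the lowercase hex digest of the request body) is inferred from the truncated example value rather than confirmed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"time"
)

// AuditEvent mirrors the fields of the example audit entry above.
type AuditEvent struct {
	ID           string    `json:"id"`
	EventType    string    `json:"event_type"`
	Actor        string    `json:"actor"`
	Timestamp    time.Time `json:"timestamp"` // RFC 3339 strings decode directly into time.Time
	ResourceID   string    `json:"resource_id"`
	ResourceType string    `json:"resource_type"`
	Description  string    `json:"description"`
	BodyHash     string    `json:"body_hash"`
}

// parseAuditEvent decodes one exported audit entry.
func parseAuditEvent(data []byte) (AuditEvent, error) {
	var ev AuditEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}

// bodyHash formats a request-body digest the way body_hash appears to be
// formatted in the example (assumption: "sha256:" + lowercase hex).
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return "sha256:" + hex.EncodeToString(sum[:])
}
```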
## Algorithm Recommendations (Section 5.1, SP 800-131A)

NIST SP 800-131A Rev 2 (March 2019) classifies cryptographic algorithms and key lengths as acceptable, deprecated, disallowed, or allowed for legacy use only. certctl implements only NIST-approved algorithms:

| Algorithm | NIST Status | certctl Support | Notes |
|-----------|-------------|-----------------|-------|
| **ECDSA P-256** | Approved (128-bit security strength) | Default for agent-side keygen | Meets NIST curve requirements (FIPS 186-4) |
| **ECDSA P-384** | Approved (192-bit security strength) | Supported via profile configuration | Higher security margin; slower than P-256 |
| **ECDSA P-521** | Approved (256-bit security strength) | Supported via profile configuration | Rarely needed; overkill for TLS |
| **RSA 2048** | Approved minimum (112-bit security, transitioning) | Supported via all issuers | Deprecated path; migrate to 3072+ by 2030 per NIST |
| **RSA 3072** | Approved (128-bit security) | Supported via all issuers | Recommended minimum for long-term security |
| **RSA 4096** | Approved (above 128-bit, below 192-bit; 192-bit requires RSA 7680) | Supported via all issuers | Supported but slower; overkill for most TLS |
| **SHA-256** | Approved | Used throughout | CSR signing, certificate fingerprints, audit body hashing, CRL/OCSP signing |
| **SHA-384** | Approved (192-bit) | Supported where algorithm selection available | Used in some CA signing scenarios |
| **SHA-512** | Approved (256-bit) | Supported where algorithm selection available | Rarely needed; SHA-256 suffices for most use cases |
| **SHA-1** | Disallowed for digital signature generation | Not used in certctl | Browsers reject SHA-1 certs; certctl never generates them |

**Algorithm Enforcement via Profiles**

Certificate profiles enforce allowed key algorithms:
```json
{
  "id": "prof-web-prod",
  "allowed_key_algorithms": ["ECDSA_P256", "ECDSA_P384", "RSA3072"]
}
```

**Post-Quantum Cryptography (Tracking)**

NIST finalized its first PQC standards (FIPS 203, FIPS 204, FIPS 205) in August 2024:
- **ML-KEM** (Kyber, FIPS 203): Approved key encapsulation mechanism
- **ML-DSA** (Dilithium, FIPS 204): Approved digital signature algorithm
- **SLH-DSA** (SPHINCS+, FIPS 205): Approved stateless hash-based signature scheme

certctl will track NIST's PQC roadmap and plan integration once hybrid PQC + classical certificate formats are supported by browsers and infrastructure. Pure PQC certificates are not yet widely interoperable.

## Key Distribution and Transport (Section 6.2)

NIST SP 800-57 Part 1 Section 6.2 addresses secure key distribution to minimize exposure during transit. In agent keygen mode, certctl follows a zero-transmission model for private keys:

**Private Key Distribution**
- Agent-side keygen model: private keys never leave agent infrastructure
- CSR transmitted over HTTPS (TLS 1.2+), with mutual TLS optional
- API key authentication via `Authorization: Bearer <api-key>` header
- All API calls logged to immutable audit trail

**Signed Certificate Distribution**
- Certificates (public component) distributed via `GET /agents/{id}/work` over HTTPS
- Work endpoint enriches deployment jobs with certificate PEM and metadata
- Certificate PEM is stable: the same certificate always returns the same bytes

**Target Deployment**
- Deployment to targets via local filesystem write (NGINX, Apache, HAProxy)
- No network transmission of private keys to targets
- Agents read the local private key from `CERTCTL_KEY_DIR` on deployment
- For appliances without agents (F5 BIG-IP, IIS), proxy agent pattern:
  - Proxy agent runs in same trust zone as appliance
  - Proxy agent holds target API credentials (iControl, WinRM)
  - Control plane never communicates with appliance directly
  - Deployment request includes certificate and proxy agent ID
  - Proxy agent executes deployment via appliance API

**Revocation Distribution**
- Certificate Revocation List (CRL) via `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615)
  - Returns DER-encoded X.509 CRL signed by issuing CA (`Content-Type: application/pkix-crl`)
  - 24-hour validity period
  - Includes all revoked serials, reasons, and revocation timestamps
  - Served unauthenticated so relying parties without certctl API credentials can fetch it
  - May be cached by HTTP intermediaries until the next update; OCSP is preferred when near-real-time status matters
- OCSP via `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960)
  - Returns DER-encoded OCSP response (OCSPResponse ASN.1 structure, `Content-Type: application/ocsp-response`)
  - Signed by issuing CA (or delegated OCSP signing cert)
  - Responds with good/revoked/unknown status
  - Served unauthenticated, because the RFC 6960 relying-party model does not assume API credentials
  - Real-time, more bandwidth-efficient than CRL polling

## Revocation and Compromise (NIST SP 800-57 Part 3)

NIST SP 800-57 Part 3 covers revocation (Section 2.5) when keys are suspected compromised or no longer needed. certctl provides the following revocation infrastructure:

**Revocation API**
- Endpoint: `POST /api/v1/certificates/{id}/revoke`
- Request body:
  ```json
  {
    "reason": "keyCompromise",
    "reason_text": "Private key exposed in log file"
  }
  ```
- Supports eight RFC 5280 revocation reason codes (RFC 5280 also defines `removeFromCRL` and `aACompromise`, which are not listed here):
  - `unspecified` — no specific reason provided
  - `keyCompromise` — private key suspected compromised
  - `caCompromise` — issuing CA key compromised
  - `affiliationChanged` — subject org/affiliation changed
  - `superseded` — cert superseded by newer cert
  - `cessationOfOperation` — key no longer in use
  - `certificateHold` — temporary hold (rarely used)
  - `privilegeWithdrawn` — subject authorization withdrawn

**Revocation Recording**
- Certificate status updated to `Revoked`
- Entry recorded in `certificate_revocations` table with:
  - Certificate serial number
  - Revocation timestamp
  - Revocation reason code
  - Issuer ID
- Idempotent (revoking an already-revoked cert is safe; returns 200 OK)

**Issuer Notification (Best-Effort)**
- Control plane calls `issuer.RevokeCertificate(ctx, serial, reason)` on issuing connector
- Failure does not block the revocation (async, logged, retried)
- Supported issuers:
  - Local CA: generates new CRL immediately
  - ACME: submits revocation to ACME server (RFC 8555 Section 7.6)
  - step-ca: calls `/revoke` API
  - OpenSSL: executes user-provided revocation script

**Revocation Notifications**
- Notifiers triggered after revocation recorded: Slack, Teams, PagerDuty, OpsGenie, email, webhook
- Message includes certificate common name, issuer, reason, actor, timestamp
- Delivery is asynchronous and retried on failure

**CRL and OCSP Distribution**
- CRL updated on every revocation (or scheduled refresh for non-issued revocations)
- OCSP responder queries revocation table in real time
- Short-lived certificate exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)

**Bulk Revocation for Large-Scale Compromise Response** (V2.2) — NIST SP 800-57 Part 3 emphasizes rapid revocation when keys are compromised. `POST /api/v1/certificates/bulk-revoke` revokes all certificates matching filter criteria (profile, owner, agent, issuer) in a single operation. This enables operators to execute fleet-wide revocation for key compromise events affecting multiple certificates. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring every certificate is recorded in the audit trail with the incident reason.

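The fan-out described above can be sketched as a pure selection step that turns one filter into many single-certificate jobs. In this Go sketch the `Cert`, `BulkFilter`, and `RevocationJob` types are hypothetical, the field names mirror the filter criteria listed in the revocation audit trail, empty filter fields are assumed to match anything, and an entirely empty filter is refused so that a malformed request cannot revoke the whole inventory.

```go
package main

import "errors"

// Cert, BulkFilter and RevocationJob are hypothetical, simplified types.
type Cert struct {
	ID, ProfileID, OwnerID, AgentID, IssuerID string
}

type BulkFilter struct {
	ProfileID, OwnerID, AgentID, IssuerID string
}

type RevocationJob struct {
	CertID, Reason string
}

// matches treats an empty filter field as a wildcard.
func matches(want, got string) bool { return want == "" || want == got }

// planBulkRevocation returns one job per matching certificate, so each one
// flows through the single-certificate pipeline and gets its own audit event.
func planBulkRevocation(certs []Cert, f BulkFilter, reason string) ([]RevocationJob, error) {
	if f == (BulkFilter{}) {
		return nil, errors.New("refusing bulk revocation with an empty filter")
	}
	var jobs []RevocationJob
	for _, c := range certs {
		if matches(f.ProfileID, c.ProfileID) && matches(f.OwnerID, c.OwnerID) &&
			matches(f.AgentID, c.AgentID) && matches(f.IssuerID, c.IssuerID) {
			jobs = append(jobs, RevocationJob{CertID: c.ID, Reason: reason})
		}
	}
	return jobs, nil
}
```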
**Revocation Audit Trail**

All revocation events are logged with:
- Event type: `certificate_revoked` or `bulk_revocation_initiated` (for fleet operations)
- Actor: authenticated user or service
- Reason code: RFC 5280 enum (or incident justification for bulk operations)
- Timestamp: RFC3339
- Issuer notification status: success or error reason
- Filter criteria: profile_id, owner_id, agent_id, issuer_id (for bulk revocation)

## Alignment Summary Table

| NIST SP 800-57 Area | Status | Coverage | Notes |
|---|---|---|---|
| **Key Generation** | ✅ Aligned | 100% | Agent-side ECDSA P-256 using crypto/rand; server mode flagged as demo-only |
| **Key Storage** | ⚠️ Partially Aligned | 80% | Filesystem with 0600 perms; HSM support planned for V3 Pro |
| **Cryptoperiods** | ✅ Aligned | 100% | Profile-enforced max_ttl; threshold-based renewal alerting |
| **Key States** | ✅ Aligned | 100% | Full lifecycle tracking with immutable audit trail |
| **Algorithms** | ✅ Aligned | 100% | NIST-approved algorithms only; post-quantum tracking in progress |
| **Key Distribution** | ✅ Aligned | 100% | Private keys never transmitted (agent keygen mode); CSR/cert over TLS; agent-local deployment |
| **Revocation** | ✅ Aligned | 100% | CRL, OCSP, RFC 5280 reason codes; real-time updates |

## Gaps and Remediation Roadmap

### V2 (Current)
- [x] Agent-side key generation
- [x] Profile-enforced cryptoperiods
- [x] CRL and OCSP distribution
- [x] RFC 5280 revocation support
- [x] Immutable audit trail

### V2.2 (Planned: 2026)
- Bulk revocation by profile/owner/agent/issuer (fleet-level revocation for incident response)

### V3 (Planned: 2026)
- Role-based access control (limit revocation/approval to authorized operators)

### V3 Pro (Planned)
- HSM support for CA key storage and agent key storage (TPM 2.0, PKCS#11)
- FIPS 140-2/3 validated crypto module (BoringCrypto build or external FIPS library)
- Key destruction API (explicit secure erasure of agent keys)
- Key escrow / recovery mechanism (backup encrypted private keys for disaster recovery)

### Post-Quantum (2027+)
- ML-KEM and ML-DSA support when browser/TLS ecosystem supports hybrid certificates
- Migration path documentation (how to transition existing RSA certs to PQC)

## References

- NIST SP 800-57 Part 1 Rev 5 (May 2020): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r5.pdf
- NIST SP 800-131A Rev 2 (March 2019): https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-131Ar2.pdf
- FIPS 186-4 (Digital Signature Standard): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf
- RFC 5280 (X.509 PKI Certificate and CRL Profile): https://tools.ietf.org/html/rfc5280
- RFC 8555 (Automatic Certificate Management Environment): https://tools.ietf.org/html/rfc8555
- NIST FIPS 203 (ML-KEM): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf
- NIST FIPS 204 (ML-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.204.pdf
- NIST FIPS 205 (SLH-DSA): https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.205.pdf

## Questions or Corrections?

This document reflects certctl's implementation as of March 2026. For the latest code, refer to:
- Key generation: `cmd/agent/main.go` (agent keygen) and `internal/service/renewal.go` (server keygen)
- Key storage: `internal/config/config.go` (CERTCTL_KEY_DIR, CERTCTL_CA_CERT_PATH)
- Revocation: `internal/service/revocation.go` and `internal/api/handler/certificates.go`
- Audit trail: `internal/api/middleware/audit.go`

# PCI-DSS 4.0 Compliance Mapping

> Last reviewed: 2026-05-05

This guide maps certctl's existing capabilities to PCI-DSS 4.0 requirements relevant to TLS certificate and cryptographic key management. It is **not a compliance attestation**: a qualified security assessor (QSA) must evaluate your organization's complete control environment. Rather, this document helps you understand which PCI-DSS control objectives certctl supports and where operator responsibility lies.

Organizations subject to PCI-DSS typically need to demonstrate control over certificate issuance, renewal, rotation, revocation, and key management. certctl automates the technical controls for the certificate lifecycle; compliance depends on how you deploy, monitor, and audit it.

## Contents

1. [How to Use This Guide](#how-to-use-this-guide)
2. [Requirement 4: Protect Data in Transit](#requirement-4-protect-data-in-transit)
   - [4.2.1 — Strong Cryptography for Transmission](#421--strong-cryptography-for-transmission)
   - [4.2.2 — Certificate Inventory and Validation](#422--certificate-inventory-and-validation)
3. [Requirement 3: Protect Stored Cardholder Data (Key Management)](#requirement-3-protect-stored-cardholder-data-key-management)
   - [3.6 — Cryptographic Key Documentation](#36--cryptographic-key-documentation)
   - [3.7 — Key Lifecycle Procedures](#37--key-lifecycle-procedures)
4. [Requirement 8: Identify and Authenticate](#requirement-8-identify-and-authenticate)
   - [8.3 — Strong Authentication](#83--strong-authentication)
   - [8.6 — Application Account Management](#86--application-account-management)
5. [Requirement 10: Log and Monitor](#requirement-10-log-and-monitor)
   - [10.2 — Implement Automated Audit Logging](#102--implement-automated-audit-logging)
   - [10.3 — Protect Audit Trail](#103--protect-audit-trail)
   - [10.4 — Promptly Review and Address Audit Trail Exceptions](#104--promptly-review-and-address-audit-trail-exceptions)
   - [10.7 — Retain and Protect Audit Trail History](#107--retain-and-protect-audit-trail-history)
6. [Requirement 6: Develop and Maintain Secure Systems and Applications](#requirement-6-develop-and-maintain-secure-systems-and-applications)
   - [6.3.1 — Security Coding Practices](#631--security-coding-practices)
   - [6.5.10 — Broken Authentication and Cryptography Prevention](#6510--broken-authentication-and-cryptography-prevention)
7. [Requirement 7: Restrict Access by Business Need-to-Know](#requirement-7-restrict-access-by-business-need-to-know)
   - [7.2 — Implement Access Control](#72--implement-access-control)
8. [Evidence Summary Table](#evidence-summary-table)
9. [Operator Responsibilities](#operator-responsibilities)
10. [V3 Enhancements for PCI-DSS](#v3-enhancements-for-pci-dss)
11. [Next Steps for Compliance](#next-steps-for-compliance)
12. [Questions?](#questions)

## How to Use This Guide

Your QSA will request evidence that your certificate and key management systems meet specific PCI-DSS 4.0 requirements. For each applicable requirement, this guide identifies:

1. **Which certctl features support the control** — API endpoints, database tables, background processes
2. **What evidence you can produce** — audit logs, dashboard metrics, API queries, deployment configs
3. **Operator responsibilities** — what you must do outside certctl (policy, monitoring, access control)
4. **Status** — Available (v1.0 shipped), Planned (future release), or Operator Responsibility (outside scope)

---

## Requirement 4: Protect Data in Transit

**Objective**: Ensure strong cryptography is used to protect sensitive data during transmission.

### 4.2.1 — Strong Cryptography for Transmission

**Requirement**: Use appropriate and current cryptographic algorithms for all TLS and SSH connections protecting card data in transit.

**certctl Support**:
- **Automated TLS certificate lifecycle** — certctl issues TLS certificates and deploys them to NGINX, Apache, and HAProxy targets via `POST /api/v1/deployments`. Certificates use RSA 2048-bit or ECDSA P-256 keys (configurable per profile, M11a).
- **Control plane TLS enforcement** — All REST API endpoints are served exclusively over HTTPS. Agent-to-server heartbeat and work polling use TLS. No plaintext protocol options.
- **Issuer connector key negotiation** — ACME v2 CAs (Let's Encrypt, ZeroSSL) apply their own key-type policy to submitted CSRs. The Local CA enforces RSA/ECDSA constraints. The step-ca integration inherits Smallstep's provisioner policy.
- **Certificate profiles** (M11a) document allowed key types and minimum key sizes per environment (development, production, cardholder-network).

**Evidence You Can Provide**:
- Exported certificate inventory via `GET /api/v1/certificates` (JSON), including key algorithm and size.
- Issued certificate details showing RSA 2048+ or ECDSA P-256 for all deployed certificates.
- Audit trail (`GET /api/v1/audit`) showing issuer connector selection and certificate profile assignment per certificate.
- Target deployment logs showing TLS certificate installation on NGINX/Apache/HAProxy.

**Operator Responsibility**:
- Configure certificate profiles for your environments with approved key algorithms.
- Audit cipher suite configuration on deployed targets (certctl deploys certs; you verify target TLS settings).
- Periodically review `CERTCTL_KEYGEN_MODE` — must be `agent` in production (never `server`).
- Monitor issuer connector configuration to ensure issuers meet your cryptography standards.

**Status**: **Available** (v1.0 shipped)

---

### 4.2.2 — Certificate Inventory and Validation

**Requirement**: Ensure all TLS/SSL certificates used for data transmission are valid, current, and meet required cryptographic standards.

**certctl Support**:

- **Managed Certificate Inventory** — Full CRUD API (`/api/v1/certificates`) with sortable, filterable list. Fields: common name, SANs, subject, issuer, serial number, key type/size, not-before/after dates, issuer ID, profile ID, owner, team, status (Active/Expiring/Expired/Revoked).

- **Filesystem Certificate Discovery** (M18b) — Agents scan configured directories (`CERTCTL_DISCOVERY_DIRS` env var) for existing PEM/DER certificates every 6 hours and on startup. Control plane deduplicates by SHA-256 fingerprint. Three triage statuses: Unmanaged (not managed by certctl), Managed (linked to a managed certificate), Dismissed (operator-marked as out-of-scope).
  - API endpoints:
    - `GET /api/v1/discovered-certificates?status=Unmanaged` — find orphaned certs
    - `GET /api/v1/discovery-summary` — aggregate counts by status
    - `POST /api/v1/discovered-certificates/{id}/claim` — link to managed certificate
    - `POST /api/v1/discovered-certificates/{id}/dismiss` — mark out-of-scope

- **Expiration Threshold Alerting** — Renewal policies support `alert_thresholds_days` (default 30, 14, 7, 0). Background scheduler evaluates daily; certificates transition to Expiring/Expired status automatically. Notifications sent to owners via email/webhook/Slack/Teams/PagerDuty.

- **Certificate Status Tracking** — Four statuses: Active (deployed, not yet expired), Expiring (within threshold, awaiting renewal), Expired (past not-after date), Revoked (revoked via RFC 5280 revocation API). Dashboard charts show status distribution.

- **Revocation Infrastructure** (M15a, M15b, M-006):
  - Revocation API: `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes
  - CRL endpoint: `GET /.well-known/pki/crl/{issuer_id}` — DER X.509 CRL, 24h validity, signed by issuing CA, served unauthenticated (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`)
  - OCSP responder: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` — DER-encoded OCSP response (good/revoked/unknown), served unauthenticated (RFC 6960, `Content-Type: application/ocsp-response`)
  - Bulk revocation (V2.2): `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) for fleet-wide incident response
  - Short-lived cert exemption: certs with TTL < 1 hour skip CRL/OCSP (expiry is sufficient revocation)

- **Stats API** (M14) — Real-time visibility:
  - `GET /api/v1/stats/summary` — total certs, by status, by issuer
  - `GET /api/v1/stats/expiration-timeline?days=90` — expiration distribution (weekly buckets)
  - `GET /api/v1/stats/job-trends?days=30` — renewal/issuance job success rates
  - `GET /api/v1/certificates` with `?sort=-notAfter&fields=id,commonName,notAfter,status` — sparse, sorted inventory

**Evidence You Can Provide**:
- Discovered certificate report: `GET /api/v1/discovered-certificates` JSON export showing all certs on systems, fingerprints, and status.
- Managed certificate inventory: `GET /api/v1/certificates` with filters (`?status=Expiring` for upcoming renewals).
- Expiration alert configuration: policy JSON showing `alert_thresholds_days` for each environment.
- CRL/OCSP availability proof: unauthenticated HTTP GET requests to `/.well-known/pki/crl/{issuer_id}` (DER, `application/pkix-crl`) and `/.well-known/pki/ocsp/{issuer_id}/{serial}` (DER, `application/ocsp-response`) with signed responses.
- Audit trail for certificate creation/renewal/revocation: `GET /api/v1/audit?type=certificate_issued,certificate_renewed,certificate_revoked`.
- Dashboard charts showing expiration timeline, renewal success trends, status distribution.

**Operator Responsibility**:
- Configure `CERTCTL_DISCOVERY_DIRS` on agents to scan all certificate storage locations (e.g., `/etc/nginx/certs`, `/etc/apache2/certs`, `/usr/local/share/ca-certificates`).
- Regularly triage discovered certificates: `GET /api/v1/discovered-certificates?status=Unmanaged`, claim or dismiss each.
- Set renewal policies for all certificate profiles with appropriate `alert_thresholds_days` (recommendation: 30, 14, 7, 0).
- Monitor expiration dashboard and respond to Expiring alerts before certificates expire.
- Verify that issued certificates meet your organization's cryptography standards (key type, key size, SANs).
- Test CRL/OCSP endpoints periodically to confirm they are reachable and signed correctly.

**Status**: **Available** (v1.0 shipped, discovery M18b, revocation M15a/M15b)

---

## Requirement 3: Protect Stored Cardholder Data (Key Management)

**Objective**: Render cardholder data unreadable anywhere it is stored; protect cryptographic keys used to encrypt data.

### 3.6 — Cryptographic Key Documentation

**Requirement**: Document and implement all key management processes and procedures covering generation, storage, archival, destruction, and change; protect cryptographic keys; and restrict access to keys to the minimum required.

**certctl Support**:

- **Certificate Profile Documentation** (M11a) — Named profiles define allowed key types, maximum TTL, and allowed EKUs per use case. Each profile is a documented policy:

  ```json
  {
    "id": "p-web-tls",
    "name": "Web TLS Production",
    "allowed_key_types": ["RSA_2048", "ECDSA_P256"],
    "max_ttl_seconds": 31536000,
    "require_sans": true,
    "description": "Production TLS certs for external web services"
  }
  ```

- **Owner and Team Tracking** (M11b) — Every certificate is assigned an owner (person + email) and optionally a team. This documents key responsibility and escalation paths.

- **Issuer Connector Specification** — Configuration and API endpoints document which CA and protocol issues each certificate:
  - `GET /api/v1/issuers/{id}` returns issuer type (local-ca, acme, step-ca, openssl), CA endpoint, authentication method, constraints
  - Each issuer type has documented key handling (e.g., the Local CA loads its CA certificate from `CERTCTL_CA_CERT_PATH` and its key from `CERTCTL_CA_KEY_PATH`; step-ca authenticates via a JWK provisioner)

- **Immutable Audit Trail** (M19) — Every certificate lifecycle event recorded in append-only `audit_events` table:
  - `certificate_issued` — when certificate created, by whom, issuer type, profile
  - `certificate_renewed` — when renewed, by whom, issuer
  - `certificate_revoked` — when revoked, by whom, RFC 5280 reason code
  - `certificate_deployed` — when deployed to target, by agent, target type
  - Query: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}`

**Evidence You Can Provide**:
- Exported certificate profiles: `GET /api/v1/profiles` showing documented key types, max TTLs, constraints per environment.
- Certificate-to-owner mapping: `GET /api/v1/certificates` with owner/team fields.
- Issuer configuration audit: `GET /api/v1/issuers` showing CA endpoints, key storage paths, auth methods.
- Audit trail for a certificate: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing complete lifecycle.

**Operator Responsibility**:
- Define and document certificate profiles for each environment and use case.
- Assign owner and team to each certificate via API or dashboard.
- Document issuer connector configuration (CA endpoint, auth method, key storage location).
- Maintain baseline audit trail exports for compliance evidence.
- Establish a certificate retirement policy (how long to retain audit records after certificate expiry/revocation).

**Status**: **Available** (v1.0 shipped)

---

### 3.7 — Key Lifecycle Procedures

**Requirement**: Generate, store, protect, access, and destroy cryptographic keys used to encrypt data in transit or at rest.

This requirement covers key generation, storage, rotation, and destruction. certctl addresses the certificate/TLS key portion; symmetric encryption keys used for cardholder data at rest are outside its scope.

#### 3.7.1 — Key Generation

**Requirement**: Generate new keys using strong cryptography.

**certctl Support**:

- **Agent-Side Key Generation** (M8) — Production mode (default `CERTCTL_KEYGEN_MODE=agent`):
  - Agents generate ECDSA P-256 key pairs using `crypto/ecdsa` + `crypto/elliptic.P256()` + `crypto/rand` (cryptographically secure random).
  - Key generation happens **only on the agent**, never on the control plane.
  - Agent submits a Certificate Signing Request (CSR) with the public key to the control plane via `POST /api/v1/agents/{id}/csr`.
  - Issued certificate is returned; private key remains on agent at `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`).

- **Server-Side Fallback** (demo/development only) — `CERTCTL_KEYGEN_MODE=server`:
  - Control plane generates RSA 2048-bit or ECDSA P-256 keys using `crypto/rand` with `crypto/rsa` or `crypto/ecdsa`.
  - Server signs the CSR and stores the private key in the certificate version record for agent deployment. **Security note:** In server keygen mode, the control plane holds private keys; this is why agent keygen mode is the recommended default for production.
  - **Must not be used in production.** Explicit warning logged: `server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only`

- **Issuer-Specific Key Negotiation**:
  - **ACME (Let's Encrypt, ZeroSSL)**: certctl generates the key and CSR; the ACME CA only decides which key types it will sign. certctl requests ECDSA P-256 by default.
  - **Local CA**: Supports RSA 2048+, ECDSA (P-256, P-384), PKCS#8 format. Key algorithm inherited from CA cert or specified via profile.
  - **step-ca**: Smallstep's provisioner defines key type; certctl respects server constraints.
  - **OpenSSL / Custom CA**: User-provided signing script; key type depends on CA backend.

**Evidence You Can Provide**:
- Deployment configuration: `CERTCTL_KEYGEN_MODE=agent` in production (verify in `docker-compose.yml`, Kubernetes manifests, or systemd units).
- Agent log excerpts showing key generation (ECDSA P-256 via Go's `ecdsa.GenerateKey`) together with the CSR submission timestamp.
- Certificate CSR audit: `GET /api/v1/audit?type=certificate_issued` showing CSR fingerprint (SHA-256 hash of CSR PEM).
- Renewal job logs showing an agent-submitted CSR, not a server-generated key.

**Operator Responsibility**:
- **Enforce `CERTCTL_KEYGEN_MODE=agent` in all production deployments.** Never use `server` mode outside demos.
- Verify agent hosts are adequately isolated and have a healthy entropy source (`crypto/rand` reads from the operating system's CSPRNG, e.g. `getrandom(2)` or `/dev/urandom` on Linux).
- Monitor `CERTCTL_KEY_DIR` on agents for unauthorized file access (use OS-level file audit if available).
- Back up the agent key directory (`/var/lib/certctl/keys`) as part of the disaster recovery procedure.

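Since the keygen mode decides where private keys live, a deployment wrapper can refuse to start in `server` mode outside demos. A Go sketch of such a guard; treating an unset variable as `agent` follows the documented default, while `checkKeygenMode`, the `allowDemo` switch, and failing closed on unknown values are assumptions. In real use `getenv` would be `os.Getenv`.

```go
package main

import (
	"errors"
	"fmt"
)

// checkKeygenMode reads CERTCTL_KEYGEN_MODE via getenv and fails closed:
// only "agent" (or unset, the documented default) passes in production.
func checkKeygenMode(getenv func(string) string, allowDemo bool) (string, error) {
	switch mode := getenv("CERTCTL_KEYGEN_MODE"); mode {
	case "", "agent":
		return "agent", nil
	case "server":
		if allowDemo {
			return "server", nil
		}
		return "", errors.New("CERTCTL_KEYGEN_MODE=server keeps private keys on the control plane; refusing to start outside demo mode")
	default:
		return "", fmt.Errorf("unrecognized CERTCTL_KEYGEN_MODE %q", mode)
	}
}
```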
**Status**: **Available** (v1.0 shipped)

#### 3.7.2 — Key Storage and Access Control

**Requirement**: Restrict cryptographic key access to the minimum required and protect keys from unauthorized access.

**certctl Support**:

- **Agent-Side Key Storage** (M8) — Private keys written to `CERTCTL_KEY_DIR` (default `/var/lib/certctl/keys`):
  - File permissions: `0600` (readable/writable by agent process owner only).
  - Filename convention: one file per certificate (e.g., `web-tls-prod.key`, `api-service.key`).
  - No key data passed over the network between agent and control plane (CSR only).
  - Keys are used only on the agent host (by the local TLS targets the agent deploys to) and are never transmitted to the control plane or other systems.

- **Control Plane Key Storage** — Sensitive credentials managed via environment variables or `.env` files:
  - CA certificate and private key paths: `CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH` (for Local CA sub-CA mode).
  - ACME account key: embedded in ACME issuer config (not stored separately; the ACME library holds it in memory).
  - step-ca provisioner key: `CERTCTL_STEPCA_KEY_PATH` env var (path to JWK private key file, loaded into memory at runtime).
  - API keys: `CERTCTL_API_KEY` (SHA-256 hashed in database, plaintext never stored).
  - Database credentials: `CERTCTL_DATABASE_URL` in `.env` file, not in source code.

- **Docker Compose Credential Management** — `.env` file (git-ignored) holds all secrets:

  ```bash
  CERTCTL_API_KEY=sk-test-...
  CERTCTL_DATABASE_URL=postgres://user:pass@db:5432/certctl
  CERTCTL_CA_KEY_PATH=/run/secrets/ca.key
  ```

  Credentials never appear in `docker-compose.yml` or the Dockerfile.

- **Kubernetes Secrets** (operator responsibility) — Deploy control plane with:

  ```yaml
  env:
    - name: CERTCTL_DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: database-url
    - name: CERTCTL_API_KEY
      valueFrom:
        secretKeyRef:
          name: certctl-secrets
          key: api-key
  ```

**Evidence You Can Provide**:
|
||||
- Agent key directory listing (without keys): `ls -la /var/lib/certctl/keys` (shows file count, permissions, timestamps).
|
||||
- Deployment manifest (`docker-compose.yml` or Kubernetes YAML) showing secrets via env var or Secret object (not inline).
|
||||
- `.env` file (do not share contents, only confirm existence and git-ignore status).
|
||||
- API key hash verification: `GET /api/v1/auth/check` with API key, verifying hash matching without plaintext exposure.

**Operator Responsibility**:
- **Store `.env` and credential files outside version control.** Verify `.gitignore` includes `.env`, `*.key`, `ca.key`, etc.
- **Restrict file system access to `/var/lib/certctl/keys` on agents** via OS-level permissions (Linux: `chmod 0700`, owned by the agent user).
- **Limit CA key file read access** — the file at `CERTCTL_CA_KEY_PATH` should be readable only by the certctl server process (OS permissions).
- **Rotate API keys periodically** (recommendation: annually, or after personnel changes). certctl keeps no audit trail of API key rotation (key management is outside certctl's scope).
- **Back up private key stores** (agent key dirs, CA key file) as part of disaster recovery. Encrypt backups at rest.
- **Monitor access logs** for `/var/lib/certctl/keys` and the CA key file location (use OS auditing or file integrity monitoring).

**Status**: **Available** (v1.0 shipped)

#### 3.7.3 — Key Rotation

**Requirement**: Rotate cryptographic keys upon expiration or compromise.

**certctl Support**:

- **Automated Certificate Renewal** — Renewal policies trigger certificate renewal automatically:
  - A background scheduler checks every 60 minutes (configurable via `CERTCTL_SCHEDULER_RENEWAL_CHECK_INTERVAL`).
  - For each policy, it evaluates all managed certificates and triggers renewal when `(not-after - now) <= policy.renewal_threshold_days`, with the threshold converted from days to a duration.
  - A renewal job is created in the AwaitingCSR state. The agent receives the work, generates a new key pair, and submits a new CSR.
  - The issuer connector issues a new certificate from the new CSR (which the agent signed with its new private key). The agent discards the old key after the new certificate is installed.
  - The new certificate is deployed to its target via a deployment job.

- **Expiration-Based Rotation** — Certificate profiles (M11a) define `max_ttl_seconds` (e.g., 31536000 for 1 year, 3600 for short-lived certs):
  - Short-lived certificates (TTL of 1 hour or less) rotate every deployment cycle. This provides defense in depth: they expire before a revocation would propagate, so RFC 5280 revocation is not needed.
  - Longer-lived certs (90/180/365 days) are rotated via renewal policy thresholds (30/14/7-day alerts).

- **Renewal Audit Trail** — Every renewal is recorded:
  - `GET /api/v1/audit?type=certificate_renewed&resource_id={cert_id}` shows each renewal with old serial, new serial, issuer, and actor.

**Evidence You Can Provide**:
- Renewal policy configuration: `GET /api/v1/policies` showing `renewal_threshold_days` and `alert_thresholds_days`.
- Renewal job history: `GET /api/v1/jobs?type=Renewal&status=Completed` with timestamps and before/after serial numbers.
- Certificate version history: `GET /api/v1/certificates/{id}/versions` showing all issued versions, dates, and issuers.
- Audit trail: `GET /api/v1/audit?type=certificate_renewed` for trending and compliance reporting.

**Operator Responsibility**:
- **Define renewal policies for all certificate profiles** with appropriate thresholds (typically 30 days before expiration for certs valid 90+ days, more aggressive for shorter-lived certs).
- **Monitor renewal job success** via the dashboard (M14 charts show renewal success trends) and alerts.
- **Investigate renewal failures** promptly (jobs stuck in AwaitingCSR, issuer connectivity, deployment errors) to avoid expired certificates.
- **Test the renewal workflow in a staging environment** before rolling out to production.
- **Document your key rotation schedule** (renewal policy thresholds, and approval workflows if AwaitingApproval is used).

**Status**: **Available** (v1.0 shipped)

#### 3.7.4 — Key Destruction

**Requirement**: Render cryptographic keys unreadable and unusable when they reach the end of their cryptographic lifetime.

**certctl Support**:

- **Certificate Revocation API** (M15a) — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:
  - `unspecified` — general revocation
  - `keyCompromise` — suspected key compromise
  - `caCompromise` — CA compromise
  - `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `privilegeWithdrawn` — lifecycle management
  - The revocation is recorded in the `certificate_revocations` table with timestamp and reason.
  - The issuer is notified on a best-effort basis. Support varies by issuer connector, and the Local CA skips the issuer step.
  - Revocation notifications are sent to the owner via email/webhook/Slack/Teams/PagerDuty.

- **CRL and OCSP Publication** (M15b, M-006) — Revoked certificates are published via:
  - CRL: `GET /.well-known/pki/crl/{issuer_id}` (DER X.509 signed by the CA, 24h validity, RFC 5280 §5 + RFC 8615, `Content-Type: application/pkix-crl`)
  - OCSP: `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (returns revocation status for clients validating the certificate chain, RFC 6960, `Content-Type: application/ocsp-response`)
  - Both endpoints are served unauthenticated, so relying parties (browsers, TLS appliances) without certctl API keys can verify revocation. This is the RFC-compliant PKI model.
  - Clients checking certificate status via OCSP or CRL see the revoked status within 24 hours (at most one CRL validity period).

- **Bulk Revocation for Incident Response** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. This supports rapid response to transmission-security incidents: operators can revoke an entire certificate set (e.g., all certs used by a compromised team or endpoint) in minutes rather than hours.

- **Private Key Destruction on Agent** — When a certificate is renewed or revoked:
  - The agent removes the old private key file from `CERTCTL_KEY_DIR` once the new certificate is deployed.
  - Job status tracking confirms the old key is no longer needed.
  - There is no audit trail of key deletion, because private keys never pass through the control plane.
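
The reason names accepted by the revoke endpoint correspond to the integer `CRLReason` values defined in RFC 5280 §5.3.1, which are the values carried in a CRL entry. The sketch below maps the names listed above to those codes and rejects anything else. `reasonCode` is an illustrative helper; certctl's own mapping may live elsewhere or differ. RFC 5280 spells the CA case `cACompromise`.

```go
package main

import "fmt"

// crlReasonCodes maps the revocation reason names used in this document to
// RFC 5280 §5.3.1 CRLReason values. Value 7 is unused in the RFC;
// removeFromCRL (8) and aACompromise (10) are omitted because they are not
// listed as accepted reasons here.
var crlReasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2,
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode validates a reason name and returns its CRLReason integer.
func reasonCode(name string) (int, error) {
	code, ok := crlReasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unsupported revocation reason %q", name)
	}
	return code, nil
}

func main() {
	for _, name := range []string{"keyCompromise", "cessationOfOperation", "bogus"} {
		code, err := reasonCode(name)
		fmt.Println(name, code, err)
	}
}
```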

**Evidence You Can Provide**:
- Revocation requests: `GET /api/v1/audit?type=certificate_revoked` with RFC 5280 reason codes.
- CRL publication: an HTTP GET to `/.well-known/pki/crl/{issuer_id}` (unauthenticated) returns a DER X.509 CRL. Parse it with `openssl crl -inform der -noout -text` to show revoked serial numbers, reasons, and timestamps.
- OCSP responder validation: query `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated) for a known-revoked cert. The response includes `revoked` status; save it to a file and inspect it with `openssl ocsp -respin response.der -resp_text -noverify`.
- Audit trail: certificate status transitions (Active → Revoked) recorded in `audit_events`.

**Operator Responsibility**:
- **Revoke certificates immediately upon suspected key compromise** using reason code `keyCompromise`.
- **Revoke certificates at end of lifecycle** (host decommissioning, service sunset) using reason code `cessationOfOperation`.
- **Monitor CRL/OCSP availability** — ensure clients can check revocation status (test with TLS validator tools).
- **Establish a certificate revocation procedure** (who can revoke, approval workflow if required, documentation).
- **Physically destroy backup private keys** (if offline backups are kept) when the certificate is revoked or after the archival period expires.
- **Test the revocation workflow in staging** — issue a test cert, revoke it, and verify that OCSP/CRL reflects the revocation within your SLA.

**Status**: **Available** (v1.0 shipped)

---

## Requirement 8: Identify and Authenticate

**Objective**: Limit access to system components and cardholder data by business need-to-know, and authenticate and manage all access.

### 8.3 — Strong Authentication

**Requirement**: Authentication mechanisms must use strong cryptography and render authentication credentials (passwords, passphrases, keys) unreadable during transmission and storage.

**certctl Support**:

- **API Key Authentication** — All REST API endpoints require authentication (default):
  - Bearer token format: `Authorization: Bearer sk-...`
  - Key stored as SHA-256 hash in database (plaintext never persisted).
  - Comparison uses `crypto/subtle.ConstantTimeCompare` to prevent timing attacks.
  - Configuration: `CERTCTL_AUTH_TYPE=api-key` (enforced by default; opting out requires an explicit env var).

- **GUI Authentication Context** — Web dashboard login flow:
  - The login page (`/login`) accepts API key entry.
  - The AuthProvider context stores the API key in the browser (localStorage) and sends it in the Authorization header on every API call.
  - 401 Unauthorized responses trigger an automatic redirect to login.
  - The logout button clears the stored key.
  - There is no server-side session (stateless API).

- **Credential Transmission** — All API traffic runs over TLS:
  - HTTPS enforced at the server level (no plaintext HTTP).
  - API key transmitted in the Authorization header (not as a URL parameter or cookie).
  - Browser to server: TLS.
  - Agent to server: TLS.
  - No credential logging (audit records the per-key actor `Name`, never the Bearer token; logs redact the `Authorization` header).
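
Here is a sketch of the constant-time comparison described above. It illustrates the technique and is not certctl's middleware. The `NamedAPIKey` fields mirror the `Name`/`Key`/`Admin` struct this document attributes to certctl. `matchBearer` and `requireAPIKey` are hypothetical. Hashing both sides first keeps the inputs to `subtle.ConstantTimeCompare` the same length. Checking every configured key lets an old and a new key coexist during rotation.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"net/http"
	"strings"
)

// NamedAPIKey mirrors the Name/Key/Admin fields described in this section.
type NamedAPIKey struct {
	Name  string
	Key   string
	Admin bool
}

// matchBearer extracts the token from an "Authorization: Bearer <token>"
// header and compares it with each configured key. Both sides are hashed so
// ConstantTimeCompare always sees 32-byte inputs and does not leak the
// configured key's length. The loop visits every key instead of stopping
// at the first match.
func matchBearer(header string, keys []NamedAPIKey) (NamedAPIKey, bool) {
	token, ok := strings.CutPrefix(header, "Bearer ")
	if !ok || token == "" {
		return NamedAPIKey{}, false
	}
	presented := sha256.Sum256([]byte(token))
	var match NamedAPIKey
	found := false
	for _, k := range keys {
		candidate := sha256.Sum256([]byte(k.Key))
		if subtle.ConstantTimeCompare(presented[:], candidate[:]) == 1 && !found {
			match, found = k, true
		}
	}
	return match, found
}

// requireAPIKey wraps a handler and answers 401 when no configured key matches.
func requireAPIKey(keys []NamedAPIKey, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, ok := matchBearer(r.Header.Get("Authorization"), keys); !ok {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// Two keys for one client illustrate the overlap window during rotation.
	keys := []NamedAPIKey{
		{Name: "deploy-bot", Key: "sk-new-example"},
		{Name: "deploy-bot-previous", Key: "sk-old-example"},
	}
	k, ok := matchBearer("Bearer sk-old-example", keys)
	fmt.Println(k.Name, ok)

	_ = requireAPIKey(keys, http.NotFoundHandler()) // wire into a mux in real use
}
```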

**Evidence You Can Provide**:
- API configuration: `CERTCTL_AUTH_TYPE=api-key` in the deployment manifest.
- Key inventory: `CERTCTL_API_KEYS_NAMED` env var (format `name:key:admin,...`) — seeds the in-memory `NamedAPIKey{Name, Key, Admin}` struct at `internal/api/middleware/middleware.go:29`. Keys are compared against the Bearer token in constant time (`subtle.ConstantTimeCompare`). No database table stores them, so protect the env var contents at rest via a secrets manager (Vault / AWS Secrets Manager / Kubernetes Secrets / Docker Secrets).
- API audit log: `GET /api/v1/audit?action=api_call` showing per-key actor names (the `Name` field of the matched `NamedAPIKey`) on every call, with zero plaintext or hashed key material recorded.
- TLS certificate on the control plane: `openssl s_client -connect {server}:8443` showing a valid certificate, TLS 1.2+, and a strong cipher.
- GUI login flow: browser network tab showing the Authorization header (token value redacted in the compliance report).
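
The `name:key:admin,...` format is compact, so it is worth validating strictly at startup. The parser below is a hypothetical sketch, not certctl's implementation. It assumes the following rules, none of which this document confirms:

- Entries are comma-separated.
- The third field is optional and, if present, must be the literal `admin`.
- Names are unique.
- Neither names nor keys contain `:` or `,`.

Error messages identify entries by position so that key material is never echoed.

```go
package main

import (
	"fmt"
	"strings"
)

type NamedAPIKey struct {
	Name  string
	Key   string
	Admin bool
}

// parseNamedKeys parses a value such as "ops:sk-ops-123:admin,ci:sk-ci-456"
// under the assumptions stated in the lead-in.
func parseNamedKeys(raw string) ([]NamedAPIKey, error) {
	var keys []NamedAPIKey
	seen := make(map[string]bool)
	for i, entry := range strings.Split(raw, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		parts := strings.Split(entry, ":")
		if len(parts) < 2 || len(parts) > 3 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("entry %d: expected name:key[:admin]", i+1)
		}
		k := NamedAPIKey{Name: parts[0], Key: parts[1]}
		if len(parts) == 3 {
			if parts[2] != "admin" {
				return nil, fmt.Errorf("entry %d (%s): unknown flag %q", i+1, k.Name, parts[2])
			}
			k.Admin = true
		}
		if seen[k.Name] {
			return nil, fmt.Errorf("entry %d: duplicate key name %q", i+1, k.Name)
		}
		seen[k.Name] = true
		keys = append(keys, k)
	}
	return keys, nil
}

func main() {
	keys, err := parseNamedKeys("ops:sk-ops-123:admin,ci:sk-ci-456")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, k := range keys {
		// Print names and roles only; the key itself is never logged.
		fmt.Printf("%s admin=%v\n", k.Name, k.Admin)
	}
}
```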

**Operator Responsibility**:
- **Issue API keys to users/systems** requiring API access (outside certctl; you maintain the key registry).
- **Rotate API keys using zero-downtime rotation** — `CERTCTL_AUTH_SECRET` supports comma-separated keys (e.g., `new-key,old-key`). Add the new key, migrate clients, then remove the old key. Recommendation: rotate at least annually, or immediately after personnel changes.
- **Revoke API keys immediately** when a user leaves or a token is compromised. A per-key `enabled=false` switch is not yet implemented in v1, so remove the key from the configured key list and track revocations manually.
- **Enforce strong TLS** on the control plane: TLS 1.2+, modern ciphers (configure on a reverse proxy, or via `CERTCTL_TLS_*` env vars if operator-controlled).
- **Protect `.env` and credential files** where API keys are defined (restrict file system access, keep out of version control).
- **Monitor the API audit trail** for suspicious access patterns (many 401 errors, access from unexpected IPs, etc.).

**Status**: **Available** (v1.0 shipped)

### 8.6 — Application Account Management

**Requirement**: Users' system access must be restricted to the minimum level of application functions or data needed to perform their duties. Application (non-human) accounts must use strong authentication.

**certctl Support**:

- **No Application Account Management in v1** — certctl does not manage user accounts (no user directory, LDAP, or OIDC).
  - All authentication is via API key (service-to-service, or a human user holding an API key).
  - No per-user roles or permissions (planned as the V3 RBAC feature).
  - Either a single API key shared across a team or one key per automation script (the operator's responsibility to manage).

- **Credentials Not in Source Code** — Security hardening:
  - API keys via the `CERTCTL_API_KEY` env var (not in `main.go`, the Dockerfile, or `docker-compose.yml`).
  - Database credentials via `CERTCTL_DATABASE_URL` in `.env` (git-ignored).
  - CA certificate and key paths via `CERTCTL_CA_CERT_PATH`/`CERTCTL_CA_KEY_PATH` (not inline).

- **Service Account Isolation** (planned for V3) — Future RBAC will support:
  - Automation script API keys with scoped permissions (e.g., read-only, renew-only, deploy-only).
  - OIDC/SSO for human users with fine-grained role assignment (admin, operator, viewer).
  - An audit trail showing which account/role performed each action.

**Evidence You Can Provide**:
- Deployment manifest (Dockerfile, docker-compose.yml) showing no hardcoded API keys, database credentials, or CA key paths.
- `.env` file existence (confirm via CI or a compliance check, without sharing its contents).
- `.gitignore` configuration showing `.env`, `*.key`, and other secrets excluded.
- Code review: grep `main.go` and `config.go` for `CERTCTL_API_KEY`. You should see only env var references, never hardcoded values.

**Operator Responsibility**:
- **Manage API keys externally** (issue, rotate, revoke).
- **Document who/what has API key access** (automation scripts, team members, third-party integrations).
- **Rotate application credentials** (API keys, database passwords) according to your organization's policy.
- **Segregate credentials** — one API key per automation script where possible, or use V3 RBAC scoping once available.
- **Monitor application account usage** via the audit trail — `GET /api/v1/audit` filtered by action/actor.

**Status**: **Available in part** (v1.0: credentials out of source code). **Planned V3**: scoped API keys and RBAC.

---

## Requirement 10: Log and Monitor

**Objective**: Log and monitor access to network resources and cardholder data.

### 10.2 — Implement Automated Audit Logging

**Requirement**: Automatically log and monitor all access to system components and records containing cardholder data.

**certctl Support**:

- **Immutable API Audit Log** (M19) — Middleware captures every API call:
  - `audit_events` table (append-only, no UPDATE/DELETE):
    - `method`: HTTP method (GET, POST, PUT, DELETE)
    - `path`: API endpoint path only, excluding query parameters (e.g., `/api/v1/certificates`). Query strings are intentionally omitted to prevent sensitive data from persisting in the append-only audit trail.
    - `actor`: authenticated user/service (extracted from the API key or context)
    - `body_hash`: SHA-256 hash of the request body (truncated to 16 chars; the first 8 chars are shown in logs)
    - `status_code`: HTTP response status (200, 201, 400, 401, 404, 500, etc.)
    - `latency_ms`: request duration in milliseconds
    - `timestamp`: RFC 3339 timestamp

- **Certificate Lifecycle Events** — Higher-level events are logged separately:
  - `certificate_issued` — new certificate created, issuer, profile, profile ID
  - `certificate_renewed` — certificate renewed, old/new serial, renewal policy
  - `certificate_revoked` — certificate revoked, RFC 5280 reason code
  - `certificate_deployed` — certificate deployed to target, agent, target type
  - `certificate_validated` — validation job result (success/failure reason)

- **Job Lifecycle Events** — Job status transitions:
  - `job_created` — renewal/issuance/deployment/validation job created
  - `job_status_updated` — job state change (Pending → AwaitingCSR → Running → Completed/Failed)

- **Policy and Configuration Events** — Administrative changes:
  - `policy_created`, `policy_updated`, `policy_deleted` — renewal policy changes
  - `profile_created`, `profile_updated`, `profile_deleted` — certificate profile changes
  - `issuer_created`, `issuer_deleted` — CA connector registration changes

- **Excluded Paths** — Health/readiness probes are not logged, to reduce noise:
  - `GET /health` (excluded by default)
  - `GET /ready` (excluded by default)
  - Configurable via the `CERTCTL_AUDIT_EXCLUDE_PATHS` env var
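
Two of the audit fields above involve a small transformation before storage: the query string is dropped from `path`, and the request body is reduced to a truncated SHA-256. The sketch below illustrates both transformations. Hex encoding of the hash and the helper names `auditPath` and `bodyHash` are assumptions for illustration, not certctl's middleware.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// bodyHash returns the hex-encoded SHA-256 of a request body, truncated to
// 16 characters. The body itself is never stored.
func bodyHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])[:16]
}

// auditPath keeps only the path component of a request URI, dropping the
// query string so filter values never land in the append-only table.
func auditPath(requestURI string) (string, error) {
	u, err := url.ParseRequestURI(requestURI)
	if err != nil {
		return "", err
	}
	return u.Path, nil
}

func main() {
	path, err := auditPath("/api/v1/certificates?owner=alice&status=Active")
	if err != nil {
		panic(err)
	}
	h := bodyHash([]byte(`{"common_name":"example.com"}`))
	fmt.Println(path) // /api/v1/certificates
	fmt.Println(h, h[:8]) // stored value, then the 8-character prefix shown in logs
}
```

A truncated hash still lets a reviewer confirm that two requests carried the same body, without the audit table holding the body.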

**Evidence You Can Provide**:
- Audit trail export: `GET /api/v1/audit` or a direct database query, showing sample events with timestamp, actor, action, and resource.
- API call audit log: query the `audit_events` table for method, path, actor, and status code over the last 24-48 hours.
- Configuration changes: `GET /api/v1/audit?type=policy_created,policy_updated,issuer_created` showing who changed what and when.
- Certificate lifecycle: `GET /api/v1/audit?resource_type=certificate&resource_id={cert_id}` showing the complete issuance → deployment → renewal/revocation history.

**Operator Responsibility**:
- **Keep audit logging enabled** — it is on by default; verify that `CERTCTL_AUDIT_EXCLUDE_PATHS` does not exclude certificate-related paths.
- **Monitor audit log growth** — the `audit_events` table grows with every API call. Plan database maintenance (log rotation policy, archival after 90 days, etc.).
- **Export and archive audit logs** — periodically run `SELECT * FROM audit_events WHERE timestamp > {date}` and export the results to secure storage (S3, syslog, SIEM).
- **Establish an audit review procedure** — your QSA may request a sample of logs, so document the export process.
- **Test audit logging** — make an API call and verify that the event appears in the audit trail within seconds.

**Status**: **Available** (M19 shipped)

### 10.3 — Protect Audit Trail

**Requirement**: Promptly protect audit trail files from unauthorized modifications.

**certctl Support**:

- **Append-Only Database Design** — `audit_events` is written insert-only. Modification is prevented by application design and database grants, not by triggers:
  - The `audit_events` table has no `UPDATE` or `DELETE` triggers.
  - Application code never executes UPDATE/DELETE on `audit_events`.
  - The primary key is `id` (serial); new events are always INSERTed.

- **Read-Only API Access** — Audit events are accessible only via read (`GET /api/v1/audit`):
  - No `POST /api/v1/audit` endpoint (no creation via the API).
  - No `PUT /api/v1/audit/{id}` endpoint (no modification).
  - No `DELETE /api/v1/audit/{id}` endpoint (no deletion).
  - Only the control plane can record events (via the internal service layer, not an exposed API).

- **Database Access Control** (operator responsibility) — PostgreSQL user permissions:
  - `certctl` application user: INSERT, SELECT on `audit_events`.
  - `certctl_read_only` user (for the compliance/audit team): SELECT only on `audit_events`.
  - `postgres` superuser: restricted to DBA operations, logged separately by PostgreSQL.

**Evidence You Can Provide**:
- Database schema: `\d audit_events` showing columns, primary key, and no UPDATE/DELETE triggers.
- Application code review: `internal/service/audit.go` showing `RecordEvent(...)` as the only INSERT operation.
- API endpoint audit: grep `internal/api/handler/audit*.go` or `internal/api/router/router.go` to show there are no PUT/DELETE routes for events.
- PostgreSQL permissions: `psql -d certctl -c "\dp audit_events"` showing INSERT/SELECT grants only.

**Operator Responsibility**:
- **Restrict database access** — issue a read-only PostgreSQL user to the compliance/audit team (no write privileges).
- **Enable PostgreSQL query logging** — log all database connections and operations for a DBA audit trail (e.g., `log_connections = on` and `log_statement = 'mod'` in `postgresql.conf`).
- **Back up audit logs** — regularly export `audit_events` to offsite storage (S3, archive tape, syslog aggregator) for long-term retention.
- **Monitor database modifications** — alert if any UPDATE/DELETE is attempted on `audit_events`. Use log-based alerting, or add a row-level `BEFORE UPDATE OR DELETE` trigger that raises an error. PostgreSQL event triggers fire only on DDL, not on row changes.
- **Encrypt audit exports** — if archiving to external storage, encrypt backups at rest.

**Status**: **Available** (v1.0 shipped)

### 10.4 — Promptly Review and Address Audit Trail Exceptions

**Requirement**: Promptly review audit logs and investigate exceptions/anomalies.

**certctl Support**:

- **Dashboard Charts** (M14) — Real-time observability:
  - **Renewal Success Trends** (30-day line chart) — shows job success rate; spikes in failures warrant investigation.
  - **Certificate Status Distribution** (donut chart) — shows Expiring/Expired counts; a high Expired count indicates missed renewals.
  - **Expiration Timeline** (90-day weekly heatmap) — shows upcoming expirations; bunching means the renewal policy needs tuning.
  - **Issuance Rate** (30-day bar chart) — shows certificate creation/renewal activity; anomalies (e.g., zero issuances for weeks) indicate stalled automation.

- **Stats API** (M14) — Machine-readable trends:
  - `GET /api/v1/stats/job-trends?days=30` — renewal/issuance/deployment success/failure counts per day.
  - `GET /api/v1/stats/summary` — total certs and counts by status.
  - `GET /api/v1/stats/expiration-timeline?days=90` — expiration buckets for forecasting.

- **Agent Fleet Overview** (M14) — Agent health visibility:
  - Pie chart: agent status distribution (healthy, offline, error).
  - Version breakdown: agent versions in use (identify outdated agents).
  - Per-agent detail: last heartbeat timestamp, OS/architecture, IP address, recent jobs.

- **Alert Notifications** (M3, M16a) — Configurable escalation:
  - Email alerts: certificate approaching expiration, renewal failure, revocation notification.
  - Webhook: custom HTTP POST to your monitoring system (Slack, Teams, PagerDuty, OpsGenie, custom webhook).
  - **Retry & Dead-Letter Queue** (I-005) — Transient notifier failures (SMTP timeout, webhook 5xx) are retried with exponential backoff (`2^n` minutes capped at 1h, 5-attempt budget) before landing in the terminal `dead` status. Operators monitor DLQ depth via the `certctl_notification_dead_total` Prometheus counter and requeue from the Dead letter tab on the Notifications page once the underlying outage is resolved. This closes the pre-I-005 silent-drop gap, where a single 5xx could lose a compliance-relevant alert without evidence.
  - Deduplication: one alert per threshold per certificate per day (avoids alert fatigue).

- **Audit Trail Filtering and Export** (M13) — Compliance reporting:
  - `GET /api/v1/audit?actor={user}&timestamp_after={date}` — filter the audit log by actor, timestamp, and type.
  - Export CSV/JSON via the dashboard: audit page → select filters → "Export CSV" or "Export JSON".
  - The full audit trail can be exported for QSA review.

**Evidence You Can Provide**:
- Dashboard screenshots: expiration timeline, renewal success trends, status distribution.
- Job trend report: `GET /api/v1/stats/job-trends?days=90` showing success/failure rates.
- Agent fleet health: `GET /api/v1/agents` showing heartbeat status and agent version distribution.
- Audit log sample: `GET /api/v1/audit?limit=100` showing certificate issuance/renewal/revocation activity.
- Alert configuration: screenshot of the renewal policy `alert_thresholds_days` (30, 14, 7, 0) and notifier settings (email, Slack, etc.).

**Operator Responsibility**:
- **Review dashboard charts weekly** — look for anomalies (high Expired count, failure spikes, stalled renewals).
- **Respond to alerts promptly** — an expiration alert means investigating the renewal (check job logs, issuer connectivity, agent heartbeat).
- **Set alert thresholds appropriately** — the default 30/14/7/0 days is a starting point; adjust it to your SLA and staffing.
- **Maintain the alert distribution list** — ensure alerts reach the right on-call engineer/team.
- **Archive and review audit logs** — export monthly/quarterly for compliance trending (e.g., "all certificate changes last quarter").
- **Test alert delivery** — trigger a test renewal failure or manual revocation and verify that the alert is sent.

**Status**: **Available** (v1.0 shipped, M14 observability charts, M19 audit log)

### 10.7 — Retain and Protect Audit Trail History

**Requirement**: Retain audit trail history for at least one year and ensure it can be retrieved.

**certctl Support**:

- **Immutable Audit Trail** (M19) — The `audit_events` table stores all API calls and certificate lifecycle events with timestamps.
- **No Automatic Purge** — certctl does not delete audit events; they remain in PostgreSQL indefinitely.
- **Queryable History** — All events are accessible via `GET /api/v1/audit` with time range, actor, and resource filters.

**Evidence You Can Provide**:
- Database retention policy: confirm the `audit_events` table has no DELETE triggers or maintenance jobs that purge events.
- Sample audit queries: `SELECT MIN(timestamp) FROM audit_events` shows that the oldest retained event is more than a year old (for deployments older than a year). `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '365 days'` shows the volume retained for the past year.
- Export procedure: a documented process for exporting audit logs to cold storage (S3, archive tape, syslog).

**Operator Responsibility**:
- **Configure PostgreSQL backup/retention** — certctl relies on database backups for audit trail protection.
  - Back up the `audit_events` table daily or per your RPO/RTO.
  - Retain backups for at least 1 year (configure the retention policy on your backup system).
  - Test the restore procedure annually.

- **Export and archive audit logs** — periodically export `SELECT * FROM audit_events WHERE timestamp > {start_date}` to offsite storage.
  - Recommendation: monthly exports to S3 with versioning enabled.
  - Encrypt exports at rest.
  - Retain archives for at least 3 years (adjust per your compliance requirements).

- **Monitor audit log growth** — the `audit_events` table grows roughly 1-5 MB/day depending on API call volume.
  - Estimate: 10,000 API calls/day ≈ 50 MB/month.
  - Plan PostgreSQL storage and backup capacity accordingly.
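
The growth figures above imply a per-event size. The calculation is 50 MB/month ÷ (10,000 calls/day × 30 days), which is about 167 bytes per audit row, and that rate can be projected forward for capacity planning. This sketch reuses only the section's own numbers. Measure your real table size (for example, `SELECT pg_size_pretty(pg_total_relation_size('audit_events'));`) and substitute your own per-row figure before relying on it.

```go
package main

import "fmt"

// bytesPerEvent is derived from this section's estimate:
// 50 MB/month / (10,000 calls/day * 30 days) ≈ 167 bytes per audit row.
const bytesPerEvent = 50_000_000.0 / (10_000 * 30)

// projectMB returns projected audit_events growth in decimal megabytes.
func projectMB(callsPerDay, days int) float64 {
	return float64(callsPerDay) * float64(days) * bytesPerEvent / 1_000_000
}

func main() {
	fmt.Printf("%.0f MB/month\n", projectMB(10_000, 30)) // 50 MB/month
	fmt.Printf("%.0f MB/year\n", projectMB(10_000, 365)) // 608 MB/year
}
```

At that rate, meeting the one-year retention requirement for 10,000 calls/day needs roughly 0.6 GB for the table alone, before indexes and backup copies.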

**Status**: **Available** (v1.0 shipped)

---

## Requirement 6: Develop and Maintain Secure Systems and Applications

**Objective**: Develop and maintain secure systems and applications.

### 6.3.1 — Security Coding Practices

**Requirement**: Develop all custom application code in accordance with secure coding practices, including authentication, access control, input validation, and error handling.

**certctl Support**:

- **Input Validation** — Centralized validators enforce strict input constraints:
  - Common name: max 253 chars, DNS-safe characters only, no leading/trailing hyphens.
  - CSR PEM: must be in valid PEM format (regex validation).
  - Policy type: allow-list enum (Issuance, Renewal, Revocation, etc.).
  - API key: alphanumeric characters and hyphens only.
  - Implemented in `internal/domain/validation.go` and called on all handler-layer inputs.

- **Error Handling** — No sensitive data leakage in error responses:
  - HTTP 500 errors return a generic "Internal Server Error" message, not a stack trace.
  - Database errors are logged internally (structured slog), not exposed to the client.
  - 404 errors do not reveal whether a resource exists (a consistent "Not Found" response whether the cause is authorization or a missing resource).

- **No Hardcoded Credentials** — All secrets via environment variables:
  - `CERTCTL_API_KEY`, `CERTCTL_DATABASE_URL`, `CERTCTL_CA_KEY_PATH` — env vars only.
  - Credentials are not in `main.go`, the Dockerfile, `docker-compose.yml`, or Git history.
  - The `.env` file is git-ignored and excluded from version control.

- **Dependency Management** — Go module pinning (`go.mod`):
  - All external dependencies pinned to specific versions.
  - No wildcard versions or `latest` tags.
  - CI runs `go mod verify` to detect tampering.

**Evidence You Can Provide**:
- Code review: `internal/domain/validation.go` showing input validation functions (common name length, CSR PEM, policy type, etc.).
- Error handling audit: `internal/api/handler/certificates.go` showing HTTP error responses (no stack traces).
- Hardcoded-credential check: `grep -r "CERTCTL_API_KEY\|DATABASE_URL\|CA_KEY" cmd/ internal/ | grep -v ".env"` (should show only env var references, not values).
- `go.mod` review: no wildcard versions; all dependencies pinned.
- CI workflow: `.github/workflows/ci.yml` showing the `go mod verify` step.

**Operator Responsibility**:
- **Review dependency updates** — keep the Go version current and update certctl dependencies regularly (security patches).
- **Scan container images** — use Trivy, Clair, or similar to scan Docker images for known vulnerabilities.
- **Maintain secure coding practices** in any custom issuer/target connectors you deploy (OpenSSL scripts, Bash/PowerShell for IIS/F5).

**Status**: **Available** (v1.0 shipped)

### 6.5.10 — Broken Authentication and Cryptography Prevention

**Requirement**: Prevent broken authentication and cryptography weaknesses.

**certctl Support**:

- **Authentication** — API key with SHA-256 hashing and constant-time comparison (`crypto/subtle.ConstantTimeCompare`).
- **Cryptography** — Go's `crypto/*` standard library (no weak ciphers). ECDSA P-256, RSA 2048+.
- **TLS** — HTTPS enforced (no plaintext HTTP endpoints).
- **No Sessions** — Stateless API (no session cookies, no session fixation risk).

**Status**: **Available** (v1.0 shipped)

---

## Requirement 7: Restrict Access by Business Need-to-Know

**Objective**: Limit access to system components and cardholder data by business need-to-know, and ensure users are authenticated and authorized.

### 7.2 — Implement Access Control

**Requirement**: Ensure proper user identity management and implement access controls based on business need-to-know.

**certctl v1 Support** (limited):
- **Certificate Ownership** (M11b) — Each certificate is assigned an owner (person + email) and an optional team. Ownership is metadata; access control is not enforced at the API level.
- **Agent Groups** (M11b) — Renewal policies target specific agent groups (OS, architecture, CIDR, version). Groups are used for policy targeting, not user access control.
- **Interactive Approval** (M11b) — The `AwaitingApproval` job state allows manual approval/rejection of renewals (enforcing business workflows, not user access control).

**certctl v3 Support** (planned):
- **OIDC/SSO** — Okta, Azure AD, and Google integration. Users log in via their identity provider.
- **Role-Based Access Control (RBAC)** — Three roles: admin (all operations), operator (issue/renew/deploy), viewer (read-only). Roles are assigned via OIDC claims or group membership.
- **Profile/Owner Gating** — Operators can renew only certificates assigned to their team; viewers cannot modify anything.
- **Audit Trail Attribution** — Every action shows which user/role performed it.

**Evidence You Can Provide** (v1):
- Certificate ownership mapping: `GET /api/v1/certificates` showing owner and team fields (metadata only; access is not controlled).
- Agent group targeting: `GET /api/v1/policies` showing the `agent_group_id` field.
- Interactive approval workflow: job detail showing the `AwaitingApproval` state, plus the approve/reject endpoints in the API docs.

**Operator Responsibility** (v1):
- **Manage API key distribution** externally — only issue API keys to authorized users/systems.
- **Implement reverse proxy auth** (Nginx, Apache, Okta proxy) in front of certctl to enforce OIDC/LDAP (outside certctl).
- **Plan for V3 RBAC** — budget for the upgrade when finer-grained access control is needed.

**Planned** (V3):
- Upgrade to certctl Pro with OIDC/RBAC and a per-role audit trail.

**Status**: **Available in part** (v1.0: ownership metadata, agent group targeting). **Planned V3**: OIDC/RBAC enforcement.
|
||||
|
||||
---

## Evidence Summary Table

| PCI-DSS Requirement | certctl Feature | API/UI Evidence | Database/Config | Audit Trail | Status |
|---|---|---|---|---|---|
| **4.2.1** Strong Crypto | TLS cert issuance, ACME/step-ca/Local CA, RSA 2048+/ECDSA P-256 | `GET /api/v1/certificates` (key_type, key_size) | Certificate profiles | `GET /api/v1/audit?type=certificate_issued` | Available |
| **4.2.2** Cert Inventory & Validation | Managed cert CRUD, discovery (M18b), expiration alerting, CRL/OCSP | `GET /api/v1/certificates`, `GET /api/v1/discovered-certificates`, `GET /.well-known/pki/crl/{issuer_id}`, `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (CRL and OCSP are unauthenticated; RFC 5280 / RFC 6960) | `managed_certificates`, `discovered_certificates` tables | `GET /api/v1/audit?type=certificate_*` | Available |
| **3.6** Key Documentation | Profiles, owner/team tracking, issuer config, audit trail | `GET /api/v1/profiles`, `GET /api/v1/issuers`, certificate detail with owner/team | Profiles, certificate owner/team fields, issuer config | `GET /api/v1/audit?resource_type=certificate` | Available |
| **3.7.1** Key Generation | Agent-side ECDSA P-256, server keygen (demo only) | Agent logs, renewal job detail, CSR audit | `CERTCTL_KEYGEN_MODE=agent` (config), job state `AwaitingCSR` | `GET /api/v1/audit?type=certificate_issued` with CSR hash | Available |
| **3.7.2** Key Storage | Agent `/var/lib/certctl/keys` (0600), env var secrets, `.env` excluded | Deployment manifest (env var refs), agent key dir listing | `.env` file (git-ignored), `CERTCTL_KEY_DIR`, `CERTCTL_CA_KEY_PATH` | No API audit (keys off-platform) | Available |
| **3.7.3** Key Rotation | Auto renewal, expiration thresholds, renewal jobs | Dashboard renewal trends, `GET /api/v1/jobs?type=Renewal`, certificate versions | Renewal policies, certificate version history | `GET /api/v1/audit?type=certificate_renewed` | Available |
| **3.7.4** Key Destruction | Revocation API (RFC 5280), CRL/OCSP, private key cleanup | `POST /api/v1/certificates/{id}/revoke`, unauthenticated `GET /.well-known/pki/crl/{issuer_id}` and `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` | `certificate_revocations` table, CRL publication | `GET /api/v1/audit?type=certificate_revoked` | Available |
| **8.3** Strong Authentication | API key (SHA-256 hash, TLS), GUI login, 401 redirect | GUI login screenshot, API key auth header, TLS cert | API key hash in database | `GET /api/v1/audit` showing API calls | Available |
| **8.6** Account Management | Credentials kept out of source, `.env` excluded, env var config | Code review (no hardcoded secrets), `.gitignore` check | Deployment manifests showing env var refs only | No account lifecycle audit (outside scope) | Available in part |
| **10.2** Audit Logging | API audit middleware (M19), certificate lifecycle events | `GET /api/v1/audit` with filter/pagination | `audit_events` table (every API call) | Real-time via API | Available |
| **10.3** Audit Protection | Append-only table design, read-only API, DB permissions | API endpoint audit (no PUT/DELETE on events), DB schema | `audit_events` table, PostgreSQL GRANT SELECT | Immutable by design | Available |
| **10.4** Review & Alert | Dashboard charts, stats API, notifier integrations | Dashboard (renewal trends, status pie, expiration heatmap), `GET /api/v1/stats/*` | Job results, alert config in policies | `GET /api/v1/audit?type=job_*` | Available |
| **10.7** Retention | 1+ year in PostgreSQL, export/archive procedures | Database query `SELECT COUNT(*) FROM audit_events WHERE timestamp > NOW() - INTERVAL '1 year'` | `audit_events` table retention (no auto-delete) | Manual export/archival (operator) | Available |
| **6.3.1** Secure Coding | Input validation, error handling, no hardcoded secrets, dependency pinning | Code review (validation.go, handlers), error responses | `go.mod` with pinned versions, `.gitignore` | GitHub Actions CI with `go mod verify` | Available |
| **7.2** Access Control | Ownership metadata, agent groups, interactive approval | `GET /api/v1/certificates` (owner/team), `GET /api/v1/agent-groups` | Certificate owner/team fields, agent group criteria | User identity from auth context | Available in part (V3: RBAC) |

---

## Operator Responsibilities

The following control objectives are **outside certctl's scope** and must be managed by your organization:

| Control Objective | Responsibility | Example Actions |
|---|---|---|
| **Network Segmentation** | Isolate certctl control plane from cardholder network | Place certctl on separate VLAN, firewall rules |
| **Physical Security** | Restrict access to servers/databases | Data center access controls, logging |
| **Personnel Screening** | Background checks for staff with access | HR/employment verification |
| **Access Control Enforcement** | User authentication & authorization outside API | Implement reverse proxy with OIDC (V3: use certctl Pro RBAC) |
| **Incident Response** | Procedures for certificate compromise or breach | Document key revocation process, alert escalation |
| **Disaster Recovery** | Backup and restore procedures | Database backup schedule, offsite replication |
| **Change Management** | Approval process for config/cert changes | CAB meetings, documented procedures |
| **Vulnerability Scanning** | ASV scanning, penetration testing, code review | Annual PCI-DSS penetration test |
| **Key Backup & Escrow** | Secure offline storage of CA private keys (if required) | Hardware security module (HSM) or encrypted vault |
| **Audit Log Retention** | Long-term archival and protection of audit logs | Export to S3/syslog, retain 3+ years |
| **QSA Engagement** | Schedule and coordination of compliance assessment | Annual audit with qualified security assessor |

---

## V3 Enhancements for PCI-DSS

certctl v3 (Pro) adds paid features that strengthen your PCI-DSS compliance posture:

| Feature | PCI-DSS Benefit |
|---|---|
| **OIDC/SSO Authentication** | Centralized identity management, audit integration with corporate directory |
| **Role-Based Access Control (RBAC)** | Least-privilege enforcement: admin, operator, viewer roles with profile/team gating |
| **Bulk Revocation by Profile/Owner/Agent** | Rapid incident response (revoke all certificates in the cardholder network in minutes) |
| **NATS Event Bus with JetStream Audit Streaming** | Real-time event streaming to SIEM (Splunk, ELK, Datadog) for centralized audit trail |
| **Certificate Health Scores** | Proactive risk identification (composite scoring: expiration proximity, rotation age, key strength) |
| **Advanced Search DSL** | Complex audit queries (`POST /search` with nested AND/OR, regex, field projection) for compliance reporting |
| **CT Log Monitoring** | Detect unauthorized certificate issuance |
| **DigiCert Issuer Connector** | Enterprise CA integration for compliance audits |

---

## Next Steps for Compliance

1. **Review this mapping with your QSA** — Confirm which requirements apply to your cardholder data environment.

2. **Configure certctl for your environment**:
   - Set `CERTCTL_KEYGEN_MODE=agent` in production.
   - Define certificate profiles with approved key types.
   - Configure renewal policies with appropriate thresholds (e.g., 30 days for 90-day certs).
   - Enable notifier integrations (email, Slack, PagerDuty) for alerts.
   - Set `CERTCTL_DISCOVERY_DIRS` on agents so discovery scans every certificate location.

3. **Implement operator controls**:
   - Document certificate management procedures (issuance, renewal, revocation, archival).
   - Establish an API key rotation schedule.
   - Set up audit log export and archival (monthly to S3, retain 1+ year).
   - Configure PostgreSQL backups (daily, 1+ year retention).
   - Plan incident response (who revokes certs, escalation process, timeline).

4. **Test compliance readiness**:
   - Trigger a test renewal and verify CRL/OCSP publication.
   - Export the audit trail and verify it shows the expected events.
   - Test the revocation workflow and confirm OCSP reflects the new status within 24 hours.
   - Run a discovery scan and verify unknown certs are detected and triaged.

5. **Prepare evidence for QSA**:
   - API endpoint documentation (OpenAPI spec: `api/openapi.yaml`).
   - Audit log sample (last 90 days of events).
   - Configuration export (profiles, policies, issuer/target definitions).
   - Deployment manifest (showing env var config, no hardcoded secrets).
   - Test certificates and CRL/OCSP query results.

6. **Plan for V3** (if RBAC/centralized audit required):
   - Evaluate certctl Pro for OIDC/SSO and NATS audit streaming.
   - Assess integration with existing identity provider (Okta, Azure AD, etc.).

---

## Questions?

For additional guidance on certctl features and PCI-DSS mapping:
- Review the [Architecture Guide](../reference/architecture.md) for system design.
- Check [Connectors Documentation](../reference/connectors/index.md) for issuer/target/notifier capabilities.
- Run the [Quick Start Guide](../getting-started/quickstart.md) to see features in action.
- Consult your QSA for final compliance determination.

**Last Updated**: March 24, 2026 (certctl v1.0 with M18b discovery and M19 audit logging)
@@ -1,589 +0,0 @@
# SOC 2 Type II Compliance Mapping

> Last reviewed: 2026-05-05

This guide maps certctl's implemented features to AICPA SOC 2 Trust Service Criteria (TSC). It is **not a SOC 2 certification claim**; it helps security engineers, auditors, and evaluators understand how certctl supports your organization's SOC 2 compliance posture. Use it as evidence input for your own control assessment during SOC 2 audits.

## How to Use This Guide

SOC 2 audits require evidence that your infrastructure meets specific Trust Service Criteria. Auditors ask: "Does your certificate management tooling support CC6.1 logical access controls?" This guide answers by mapping certctl's features to specific criteria and pointing to evidence (API endpoints, configuration, audit trail).

Each section includes:

- **The TSC requirement** — what the auditor is looking for
- **certctl's implementation** — which features address it
- **Evidence location** — where to find proof (API endpoint, config variable, source code, audit events)
- **V2 vs V3 status** — whether the feature is in the free Community Edition (V2) or the paid Pro edition (V3)
- **Operator responsibility** — aspects your organization must handle outside of certctl

## Contents

1. [How to Use This Guide](#how-to-use-this-guide)
2. [CC6: Logical and Physical Access Controls](#cc6-logical-and-physical-access-controls)
   - [CC6.1 — Logical Access Security](#cc61--logical-access-security)
   - [CC6.2 — Prior to Issuing System Credentials](#cc62--prior-to-issuing-system-credentials)
   - [CC6.3 — Authentication Policies](#cc63--authentication-policies)
   - [CC6.7 — Information Transmission Protection](#cc67--information-transmission-protection)
3. [CC7: System Operations](#cc7-system-operations)
   - [CC7.1 — System Monitoring](#cc71--system-monitoring)
   - [CC7.2 — Anomaly Detection](#cc72--anomaly-detection)
   - [CC7.3 — Incident Response](#cc73--incident-response)
   - [CC7.4 — Identify and Develop Risk Mitigation Activities](#cc74--identify-and-develop-risk-mitigation-activities)
4. [A1: Availability](#a1-availability)
   - [A1.1/A1.2 — Availability and Recovery](#a11a12--availability-and-recovery)
5. [CC8: Change Management](#cc8-change-management)
   - [CC8.1 — Change Control](#cc81--change-control)
6. [Evidence Summary Table](#evidence-summary-table)
7. [What Requires Operator Action](#what-requires-operator-action)
8. [V3 Enhancements](#v3-enhancements)
9. [Conclusion](#conclusion)

## CC6: Logical and Physical Access Controls

### CC6.1 — Logical Access Security

**Requirement**: The entity restricts logical access to digital and information assets and related facilities by applying user identity authentication, registration, access rights, and usage policies.

**certctl Implementation** (V2 — Community Edition):

- **API Key Authentication** — All `/api/v1/*` calls, except `GET /api/v1/auth/info` (served without auth so the GUI can detect the auth mode), require a Bearer token (hashed with SHA-256, stored securely, validated with constant-time comparison) or are rejected with 401 Unauthorized. Configured via `CERTCTL_AUTH_TYPE` (default `api-key`; `none` requires explicit opt-in and logs a warning).
- **Standards-based enrollment and PKI distribution endpoints** — EST (`/.well-known/est/*`, RFC 7030), SCEP (`/scep`, `/scep/*`, RFC 8894), and CRL/OCSP (`/.well-known/pki/crl/{issuer_id}`, `/.well-known/pki/ocsp/{issuer_id}/{serial}`, RFC 5280 §5 / RFC 6960 / RFC 8615) are served unauthenticated at the HTTP layer because these protocols cannot present certctl Bearer tokens. Authentication is enforced in-protocol instead:
  - **EST** relies on CSR signature verification plus profile policy (RFC 7030 §3.2.3 says EST auth is deployment-specific; §4.1.1 makes `/cacerts` explicitly anonymous).
  - **SCEP** requires a shared `challengePassword` in the PKCS#10 CSR attributes (OID 1.2.840.113549.1.9.7, RFC 8894 §3.2), validated with `crypto/subtle.ConstantTimeCompare`. CWE-306 (missing authentication for a critical function) is closed for SCEP by `preflightSCEPChallengePassword` in `cmd/server/main.go`, which refuses to start the control plane when `CERTCTL_SCEP_ENABLED=true` is set without `CERTCTL_SCEP_CHALLENGE_PASSWORD`.
  - **CRL and OCSP** are intentionally anonymous so relying parties can fetch them without credentials.
  - **HTTP dispatch** is implemented in `cmd/server/main.go:buildFinalHandler`, which routes these prefixes through `noAuthHandler` (RequestID + structuredLogger + Recovery only, no auth or rate-limit middleware). The routing is pinned by the 27-subtest regression harness at `cmd/server/finalhandler_test.go`.
- **GUI Authentication** — The web dashboard includes a login screen that requires API key entry. A 401 response redirects to the login screen. Auth context persists across page navigation, and logout clears it.
- **Configurable CORS** — The API restricts cross-origin requests via a `CERTCTL_CORS_ORIGINS` allowlist or wildcard. Preflight responses are cached, which cuts down on repeated browser OPTIONS requests.
- **Token Bucket Rate Limiting** — Per-IP rate limiting (configurable via `CERTCTL_RATE_LIMIT_RPS` / `CERTCTL_RATE_LIMIT_BURST`) returns 429 Too Many Requests with a Retry-After header. This mitigates credential stuffing and brute-force attacks.
- **No Password Storage** — certctl does not store user passwords. API keys are the sole authentication mechanism. Your API key generation, distribution, and rotation policies are your responsibility (see "Operator Responsibility" below).
- **Zero-Downtime Key Rotation** — `CERTCTL_AUTH_SECRET` accepts comma-separated keys (e.g., `new-key,old-key`). All listed keys are validated with constant-time comparison. Operators can add a new key, migrate clients, then remove the old key — no service restart required for the client migration phase. A single-key warning is logged at startup to encourage rotation configuration.

**Evidence Locations**:

- API auth implementation: `internal/api/middleware/auth.go`
- Auth check endpoint: `GET /api/v1/auth/check` (validates credentials)
- Auth info endpoint: `GET /api/v1/auth/info` (returns current auth mode, served without auth so GUI detects mode)
- Rate limiting middleware: `internal/api/middleware/rate_limit.go`
- CORS configuration: `cmd/server/main.go`, search for `CERTCTL_CORS_ORIGINS`
- Final handler dispatch (authenticated vs. unauthenticated routing): `cmd/server/main.go:buildFinalHandler`
- SCEP preflight gate (CWE-306 closure): `cmd/server/main.go:preflightSCEPChallengePassword`
- SCEP service-layer defense-in-depth (rejects enrollment on empty challenge password, `crypto/subtle.ConstantTimeCompare`): `internal/service/scep.go`
- Final handler dispatch regression harness (27 subtests): `cmd/server/finalhandler_test.go`
- OpenAPI spec `security: []` overrides on unauthenticated paths: `api/openapi.yaml` (EST `/cacerts`, `/simpleenroll`, `/simplereenroll`, `/csrattrs`; SCEP `/scep` GET+POST; PKI `/crl/{issuer_id}`, `/ocsp/{issuer_id}/{serial}`)

**V3 Enhancement**:

- **OIDC / SSO Integration** — Optional OIDC providers (Okta, Azure AD, Google) with multi-tenant support. API key fallback for service accounts.
- **API Key Scoping** — Per-resource or per-action permissions (e.g., "read certificates from production only" or "issue certs, no revoke")

**Operator Responsibility**:

- Generate and securely distribute API keys to authorized users and systems
- Rotate API keys regularly (recommended: quarterly)
- Revoke API keys immediately upon employee departure
- Do not commit API keys to version control (use `.env` or secrets management)
- Implement your own IP allowlisting at the firewall if needed (certctl enforces CORS at the HTTP layer, not at the network layer)

---

### CC6.2 — Prior to Issuing System Credentials

**Requirement**: The entity provisions, modifies, disables, and removes user identities and rights based on an authorization process that considers user responsibility level and changes in those responsibilities.

**certctl Implementation** (V2):

- **Ownership Attribution** — Certificates can be assigned to an owner (email + name). Owner information is stored and audited (see CC7.2). Ownership is tracked through the lifecycle (issuance, renewal, deployment, revocation). Ownership reassignment is audited via the immutable audit trail.
- **Team Assignment** — Owners can be organized into teams. Certificate policies can route notifications to team email addresses.
- **Audit Trail Attribution** — Every API call records the actor (extracted from the API key or auth context). The audit trail is immutable — no retroactive modification of who did what.

**Evidence Locations**:

- Ownership domain model: `internal/domain/certificate.go` (OwnerID field)
- Owner CRUD API: `GET /api/v1/owners`, `POST /api/v1/owners`, `DELETE /api/v1/owners/{id}`
- Team CRUD API: `GET /api/v1/teams`, `POST /api/v1/teams`, `DELETE /api/v1/teams/{id}`
- Audit trail API: `GET /api/v1/audit` (actor field in every record)

**V3 Enhancement**:

- **RBAC (Role-Based Access Control)** — Predefined roles (Admin, Operator, Viewer) with profile-gated permissions. Administrators manage role assignments.

**Operator Responsibility**:

- Map certctl's ownership model to your organizational structure (departments, teams, on-call rotations)
- Establish a formal access request and approval process
- Remove ownership access when team members depart
- Document your access review process (audit trail shows *who* made changes, but you must justify *why*)

---

### CC6.3 — Authentication Policies

**Requirement**: The entity determines, documents, communicates, and enforces authentication policies that support the identification and authentication of authorized internal and external users and the transmission of user credentials.

**certctl Implementation** (V2):

- **API Key Policy** — All `/api/v1/*` access requires an API key or explicit opt-out. Opt-out (`CERTCTL_AUTH_TYPE=none`) logs a warning: "WARNING: Auth disabled (CERTCTL_AUTH_TYPE=none) — this is insecure and only for development". The chosen auth mode is logged at startup. The standards-based enrollment and PKI distribution endpoints (EST, SCEP, CRL, OCSP) are served unauthenticated at the HTTP layer per their respective RFCs; see CC6.1 for the full authentication contract and CWE-306 closure via `preflightSCEPChallengePassword`.
- **Agent Authentication** — Agents authenticate to the server via API keys (same mechanism as users). Agent credentials are separate from user API keys.
- **Private Key Policy** — Agent-side key generation is the default (`CERTCTL_KEYGEN_MODE=agent`). Server-side keygen (`CERTCTL_KEYGEN_MODE=server`) requires explicit configuration and logs a warning: "server-side key generation enabled (CERTCTL_KEYGEN_MODE=server) — private keys touch control plane, demo only".
- **Password Policy** — Not applicable; certctl uses API keys exclusively. Password management is delegated to your organization's IAM system if you integrate OIDC/SSO (V3).

**Evidence Locations**:

- Auth type configuration: `internal/config/config.go`, `CERTCTL_AUTH_TYPE` env var
- Startup logging: `cmd/server/main.go` (logs auth mode at server startup)
- Keygen mode configuration: `internal/config/config.go`, `CERTCTL_KEYGEN_MODE` env var
- Keygen mode warning: `cmd/server/main.go` and `cmd/agent/main.go`

**V3 Enhancement**:

- **OIDC Policy** — Mandatory MFA when OIDC is enabled
- **API Key Expiration** — Key expiration policies (e.g., 90-day expiration for user keys, no expiration for long-lived service account keys)

**Operator Responsibility**:

- Document your API key generation and distribution policy
- Establish a formal change control process for auth configuration changes
- Test authentication failures (e.g., expired keys, malformed tokens) in a non-production environment
- Integrate certctl authentication into your organization's IAM audit reports (who holds API keys, when they were issued, and when they were revoked)

---

### CC6.7 — Information Transmission Protection

**Requirement**: The entity restricts the transmission, movement, and removal of information in a manner that prevents unauthorized disclosure, whether through digital or non-digital means.

**certctl Implementation** (V2):

- **TLS for Control Plane** — All API communication occurs over HTTPS (TLS 1.2+). Server uses `tls.Dial()` for outbound connections to issuers and targets. Configuration: `CERTCTL_SERVER_HOST` (default `127.0.0.1`) + `CERTCTL_SERVER_PORT` (default `8080`; Docker Compose maps to `8443`).
- **Agent-to-Server Communication** — Agents submit CSRs and heartbeats over HTTPS to the server using the same TLS stack.
- **Private Key Isolation** — Agents generate ECDSA P-256 private keys locally (`crypto/ecdsa` + `crypto/elliptic`). Private keys are never transmitted to the server — agents submit CSRs only. Private keys are stored on agent filesystem (`CERTCTL_KEY_DIR`, default `/var/lib/certctl/keys`) with 0600 (owner read/write only) permissions. Server-side keygen mode logs a development warning; production must use agent-side keygen.
- **Certificate Storage** — Signed certificates are stored in PostgreSQL as PEM text (along with metadata). Certificates are not secrets and may be transmitted in plaintext. Private keys are never stored on the control plane in production (agent-side keygen mode).
- **Deployment via Target Connectors** — Target connectors write certificates and keys to local filesystem or network appliance APIs. For NGINX/Apache httpd, files are written with restrictive permissions (0600 for keys). For F5/IIS (V3+), credentials are scoped to a proxy agent in the same network zone — the server never holds network appliance credentials.

**Evidence Locations**:

- TLS configuration: deploy certctl behind a TLS-terminating reverse proxy (NGINX, HAProxy, or cloud load balancer) or use a TLS sidecar
- Agent keygen mode: `cmd/agent/main.go` (ECDSA key generation, filesystem storage with 0600)
- Private key handling: `internal/connector/target/nginx/nginx.go` and similar (cert/key file write)
- Server-side keygen deprecation: `internal/service/renewal.go` (log warning when enabled)

**V3 Enhancement**:

- **Hardware Security Module (HSM) Support** — Optional HSM backend for CA key storage (SubCA and Local CA modes)
- **Secrets Rotation** — Encrypted key rotation without server restart

**Operator Responsibility**:

- Enable TLS on the control plane in production (deploy behind a TLS-terminating reverse proxy or load balancer with valid certificates)
- Enforce TLS on agent-to-server communication via firewall rules (no cleartext HTTP)
- Protect agent filesystem key storage with:
  - File-level permissions (already 0600)
  - Encrypted filesystems (LUKS, BitLocker, or cloud provider equivalents)
  - Backup encryption (keys backed up to vault or HSM, never in cleartext backups)
- Restrict PostgreSQL access to authorized services only (network isolation, authentication)
- For target systems, ensure network traffic from agents to targets is encrypted (TLS, IPsec, or VPN)

---

## CC7: System Operations

### CC7.1 — System Monitoring

**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.

**certctl Implementation** (V2):

- **Health Endpoint** — `GET /health` returns 200 OK with service status. Consumed by Docker health checks and Kubernetes probes.
- **Readiness Endpoint** — `GET /ready` returns 200 OK when the database is connected and migrations are applied.
- **Background Scheduler Monitoring** — 12 background loops (8 always-on + 4 opt-in) each run on a fixed interval. Each loop includes `atomic.Bool` idempotency guards, error handling, and structured `slog` failure logs. The authoritative topology is in `docs/reference/architecture.md`:
  - Renewal loop (always-on, 1 hour): scans for certificates approaching renewal threshold
  - Job processor loop (always-on, 30 seconds): picks up pending/waiting jobs and advances their state
  - Job retry loop (always-on, 5 minutes, `CERTCTL_SCHEDULER_RETRY_INTERVAL`): retries Failed jobs (I-001)
  - Job timeout reaper loop (always-on, 10 minutes, `CERTCTL_JOB_TIMEOUT_INTERVAL`): fails AwaitingCSR/AwaitingApproval jobs past timeout (I-003)
  - Agent health check loop (always-on, 2 minutes): pings agents to detect downtime
  - Notification dispatcher loop (always-on, 1 minute): sends queued alerts
  - Notification retry loop (always-on, 2 minutes, `CERTCTL_NOTIFICATION_RETRY_INTERVAL`): exponential backoff retry for failed notifications; promote to dead-letter after 5 attempts (I-005)
  - Short-lived cert expiry loop (always-on, 30 seconds): marks expired short-lived credentials
  - Network scanner loop (opt-in, 6 hours, `CERTCTL_NETWORK_SCAN_ENABLED`): scans enabled TLS endpoints for certificate discovery
  - Digest emailer loop (opt-in, 24 hours, `CERTCTL_DIGEST_INTERVAL`): sends scheduled certificate digest email to configured recipients
  - Endpoint health loop (opt-in, 60 seconds, `CERTCTL_HEALTH_CHECK_INTERVAL`): continuous TLS health probes (M48)
  - Cloud discovery loop (opt-in, 6 hours, `CERTCTL_CLOUD_DISCOVERY_INTERVAL`): cloud secret manager certificate discovery (M50)
- **Metrics Endpoints** — Two formats for monitoring integration. All values are point-in-time snapshots computed from database tables:
  - `GET /api/v1/metrics` — JSON object with gauges, counters, and uptime for custom dashboards
  - `GET /api/v1/metrics/prometheus` — Prometheus exposition format (`text/plain; version=0.0.4`) for native scraping by Prometheus, Grafana Agent, Datadog, and other OpenMetrics-compatible collectors
  - **Gauges** — `certctl_certificate_total`, `certctl_certificate_active`, `certctl_certificate_expiring`, `certctl_certificate_expired`, `certctl_certificate_revoked`, `certctl_agent_total`, `certctl_agent_active`, `certctl_job_pending`
  - **Counters** — `certctl_job_completed_total`, `certctl_job_failed_total`
  - **Uptime** — `certctl_uptime_seconds` (seconds since server start)
- **Structured Logging** — All scheduler operations, API calls, and connector actions log via `slog` (Go's structured logger). Logs include timestamp, level (DEBUG/INFO/WARN/ERROR), structured fields (e.g., `actor`, `resource_id`, `latency_ms`), and request IDs for tracing.
- **Request ID Propagation** — Each HTTP request gets a unique ID (`X-Request-ID` header). The ID is included in all correlated logs, making it easy to trace a single request through multiple service layers.

**Evidence Locations**:

- Health/readiness endpoints: `internal/api/handler/health.go`
- Background scheduler: `internal/scheduler/scheduler.go` (Start method)
- Metrics endpoint: `internal/api/handler/metrics.go`
- Stats API endpoints (for detailed time-series): `internal/api/handler/stats.go`
  - `GET /api/v1/stats/summary` — dashboard KPIs
  - `GET /api/v1/stats/certificates-by-status` — cert counts by status
  - `GET /api/v1/stats/expiration-timeline?days=N` — cert expiry distribution
  - `GET /api/v1/stats/job-trends?days=N` — job completion/failure rates
  - `GET /api/v1/stats/issuance-rate?days=N` — cert issuance volume
- Structured logging middleware: `internal/api/middleware/middleware.go`

**Operator Responsibility**:

- Configure log aggregation (e.g., ELK, Datadog, Splunk) to centralize certctl logs
- Set up alerting on scheduler loop failures (e.g., "renewal loop failed to complete within 2h")
- Configure health check monitoring (e.g., HTTP probes of `/health` and `/ready`, plus a Prometheus scrape of `/api/v1/metrics/prometheus`)
- Establish thresholds for metrics (e.g., alert if `certctl_job_pending > 50` or `certctl_agent_active < certctl_agent_total`)
- Document your log retention policy (audit requirements often mandate 1+ years)
- Integrate certctl metrics into your broader observability stack (Grafana dashboards, SLO tracking)
|
||||
|
||||
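The example thresholds above can be expressed as a small check over a metrics snapshot. This sketch assumes you have already fetched `GET /api/v1/metrics` and flattened the result into a `map[string]float64` keyed by the documented metric names; the actual JSON shape of that endpoint is not described here, and `checkThresholds` and its limits are illustrative.

```go
package main

import "fmt"

// checkThresholds applies two example alert rules to a metrics snapshot keyed
// by metric name: too many pending jobs, or fewer active agents than registered.
// Missing metrics are skipped rather than treated as zero.
func checkThresholds(m map[string]float64, maxPending float64) []string {
	var alerts []string
	if pending, ok := m["certctl_job_pending"]; ok && pending > maxPending {
		alerts = append(alerts, fmt.Sprintf("job backlog: %.0f pending (limit %.0f)", pending, maxPending))
	}
	active, okActive := m["certctl_agent_active"]
	total, okTotal := m["certctl_agent_total"]
	if okActive && okTotal && active < total {
		alerts = append(alerts, fmt.Sprintf("agents: %.0f of %.0f active", active, total))
	}
	return alerts
}

func main() {
	// Example snapshot; in practice these values would come from the metrics endpoint.
	snapshot := map[string]float64{
		"certctl_job_pending":  72,
		"certctl_agent_active": 9,
		"certctl_agent_total":  10,
	}
	for _, a := range checkThresholds(snapshot, 50) {
		fmt.Println(a)
	}
}
```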
---

### CC7.2 — Anomaly Detection

**Requirement**: The entity monitors system components and the operation of those components for anomalies that are indicative of malfunction, including the implementation of monitoring tools, the reporting of results of those monitoring activities, and the identification, documentation, analysis, and resolution of system anomalies.

(This criterion overlaps CC7.1 and extends it to specific anomaly response mechanisms.)

**certctl Implementation** (V2):

- **Immutable API Audit Trail** (M19) — Every API call is recorded in the `audit_events` table (append-only, no update/delete). Each record captures the HTTP method, the URL path (query parameters intentionally excluded; see the security note below), the actor (user/agent ID), a SHA-256 hash of the request body (truncated to 16 characters), the response status code, latency in milliseconds, and a timestamp. Excluded paths (health, ready) are configurable. Audit records are written asynchronously, so recording does not block the request. **Security: Query parameters are excluded from the audit path** because they may contain cursor tokens, API keys, or sensitive filter values; since the audit trail is append-only with no deletion, any sensitive data recorded would persist permanently.

- **Audit Trail API** — `GET /api/v1/audit?actor=...&action=...&resource_id=...&created_after=...&created_before=...` allows searching for anomalous patterns (e.g., "who accessed certificate XYZ and when?", "did anyone revoke certs at 2 AM?").

- **Expiration Threshold Alerting** — Certificate renewal policies define alert thresholds (days before expiry): default `[30, 14, 7, 0]`. When a certificate reaches a threshold, a notification is enqueued. Deduplication prevents duplicate alerts for the same cert at the same threshold. Auto status transition: cert moves to `Expiring` status at 30 days, `Expired` at 0 days.

- **Certificate Status Auto-Transitions** — When a cert is issued, it's `Active`. As expiry approaches, status auto-transitions to `Expiring` (at 30d threshold). At expiry, status becomes `Expired`. Revoked certs move to `Revoked`. These transitions are recorded in the audit trail.

- **Notification Routing** — Alerts are sent via configured notifiers (Email, Slack, Teams, PagerDuty, OpsGenie). Notifications for a certificate are routed to its owner's email address (or the team email if there is no individual owner). This allows on-call teams to react to anomalies (e.g., "your production cert will expire in 7 days, request renewal now").

- **Deployment Rollback** — If a deployment fails or an older certificate needs to be reactivated, operators can trigger a "rollback" via the GUI. This redeploys a previous certificate version to the target. Rollback actions are audited.

**Evidence Locations**:

- Audit middleware: `internal/api/middleware/audit.go`

- Audit trail API: `internal/api/handler/audit.go`, `GET /api/v1/audit`

- Expiration alerting: `internal/service/renewal.go` (CheckRenewal method)

- Notification dispatcher: `internal/scheduler/scheduler.go` (notificationTicker)

- Status transitions: `internal/service/certificate.go` (auto status update logic)

- Audit trail CLI export: `certctl-cli audit export --format csv` / `--format json`

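The threshold, deduplication, and status rules described for expiration alerting can be sketched as pure functions. This is a hypothetical illustration, not the logic in `internal/service/renewal.go`: it assumes thresholds are whole days, that a threshold fires once the remaining days drop to or below it, and that already-notified thresholds are tracked per certificate in a set.

```go
package main

import "fmt"

// defaultThresholds mirrors the documented default alert thresholds (days before expiry).
var defaultThresholds = []int{30, 14, 7, 0}

// dueThresholds returns thresholds that have been reached (daysLeft <= t) but
// not yet notified; the sent set provides per-certificate deduplication.
func dueThresholds(daysLeft int, thresholds []int, sent map[int]bool) []int {
	var due []int
	for _, t := range thresholds {
		if daysLeft <= t && !sent[t] {
			due = append(due, t)
		}
	}
	return due
}

// statusFor maps remaining days to the documented automatic status.
func statusFor(daysLeft int) string {
	switch {
	case daysLeft <= 0:
		return "Expired"
	case daysLeft <= 30:
		return "Expiring"
	default:
		return "Active"
	}
}

func main() {
	sent := map[int]bool{30: true} // the 30-day alert already went out
	fmt.Println(dueThresholds(10, defaultThresholds, sent), statusFor(10)) // [14] Expiring
}
```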
**V3 Enhancement**:

- **SIEM Export** — Real-time audit event streaming to SIEM systems (via NATS event bus with JetStream sink)

- **Anomaly Rules Engine** — Configurable rules (e.g., "alert if certificate revoked by non-admin", "alert if >10 certs issued in < 1 hour")

**Operator Responsibility**:

- Integrate the audit trail into your SIEM / log analysis platform

- Define alerting rules and thresholds for anomalies (e.g., "revocation of critical cert", "mass issuance")

- Establish a formal incident response workflow (audit trail shows *what* happened; you must decide *what to do* about it)

- Regularly review audit logs (e.g., monthly compliance audit of who accessed what)

- Configure email/Slack/Teams integration so on-call teams are notified of cert expirations immediately

- Encrypt audit trail backups (ACID guarantees don't prevent theft of database backups)

---

### CC7.3 — Incident Response

**Requirement**: The entity detects, investigates, and responds to incidents by executing a defined incident response and management process that includes preparation, detection and analysis, containment, eradication, recovery, and post-incident activities.

**certctl Implementation** (V2):

- **Revocation API** — `POST /api/v1/certificates/{id}/revoke` with RFC 5280 reason codes:

  - `unspecified` — catch-all

  - `keyCompromise` — private key was exposed

  - `caCompromise` — CA itself was compromised (rare)

  - `affiliationChanged` — certificate no longer applies to the organization

  - `superseded` — newer cert is in use

  - `cessationOfOperation` — service is shutting down

  - `certificateHold` — temporary revocation (the hold can later be lifted by reissuing)

  - `privilegeWithdrawn` — access rights revoked

  Revocation is **immediate** (no approval workflow). The certificate is marked `Revoked` in inventory, an audit event is logged, and the optional issuer notification is best-effort. All revoked certs are excluded from active deployments.

- **CRL Endpoint** — `GET /.well-known/pki/crl/{issuer_id}` returns a DER-encoded X.509 CRL signed by the issuing CA (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`), served unauthenticated for relying parties that don't hold certctl API credentials.

- **OCSP Responder** — `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` returns a signed OCSP response indicating whether a cert is good, revoked, or unknown (RFC 6960, `Content-Type: application/ocsp-response`). Also unauthenticated. Clients (browsers, TLS libraries) query this endpoint to check a certificate's revocation status in real time.

- **Revocation Notifications** — When a cert is revoked, notifications are sent to:

  - Certificate owner (email)

  - Configured webhooks (if you have a SIEM that subscribes)

  - Slack/Teams channels (if notifiers are configured)

- **Bulk Revocation for Fleet-Wide Incidents** (V2.2) — `POST /api/v1/certificates/bulk-revoke` with filter criteria (profile, owner, agent, issuer) revokes all matching certificates in a single operation. Essential for incident response: key compromise affecting multiple certs, CA distrust events, decommissioning a team's infrastructure. Each bulk revocation creates individual jobs reusing the existing revocation pipeline, ensuring audit trail and notifications for every certificate.

- **Short-Lived Cert Exemption** — Certificates with TTL < 1 hour (configured in profile) skip CRL/OCSP publication. Expiry is the revocation mechanism for short-lived certs (e.g., Kubernetes pod certs, session tokens).

- **Deployment Rollback** — If a revoked cert is still deployed (shouldn't happen, but race conditions exist), operators can manually redeploy a previous version via the GUI. Rollback is audited.

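The reason names above correspond to the `CRLReason` integer codes defined in RFC 5280 §5.3.1, which is what ends up in a DER-encoded CRL entry. The mapping below is an illustrative sketch; certctl's actual `RevocationReason` type in `internal/domain/revocation.go` may be structured differently. RFC 5280 also defines `removeFromCRL` (8) and `aACompromise` (10), which do not appear in the list above, and value 7 is unassigned.

```go
package main

import "fmt"

// reasonCodes maps the reason names used by the revocation API to the
// CRLReason values from RFC 5280 section 5.3.1.
var reasonCodes = map[string]int{
	"unspecified":          0,
	"keyCompromise":        1,
	"caCompromise":         2, // spelled cACompromise in the RFC's ASN.1
	"affiliationChanged":   3,
	"superseded":           4,
	"cessationOfOperation": 5,
	"certificateHold":      6,
	"privilegeWithdrawn":   9,
}

// reasonCode validates a reason name and returns its RFC 5280 code.
func reasonCode(name string) (int, error) {
	code, ok := reasonCodes[name]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", name)
	}
	return code, nil
}

func main() {
	code, err := reasonCode("keyCompromise")
	fmt.Println(code, err) // 1 <nil>
}
```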
**Evidence Locations**:

- Revocation API: `internal/api/handler/certificates.go`, `POST /api/v1/certificates/{id}/revoke`

- Revocation domain model: `internal/domain/revocation.go` (RevocationReason type with RFC 5280 mapping)

- CRL generation: `internal/service/certificate.go` (GenerateDERCRL method)

- OCSP signing: `internal/service/certificate.go` (GetOCSPResponse method)

- Revocation notifications: `internal/service/notification.go` (SendRevocationNotification)

- Short-lived exemption: `internal/domain/revocation.go` (IsShortLivedCert check)

**V3 Enhancement**:

- **Revocation Automation** — Trigger revocation based on external events (e.g., employee termination, security breach alert from CT Log monitoring)

**Operator Responsibility**:

- Establish an incident response policy (e.g., "keyCompromise → immediately deploy a replacement cert + notify CISO")

- Ensure CRL/OCSP are accessible to all systems using the certs (e.g., CDN or highly available endpoints if you host on-premises)

- Test revocation workflow in staging (verify that revoked certs are actually blocked by clients)

- Document justification for revocation (audit trail records *that* a cert was revoked, but not *why* — you must document it separately)

- Integrate revocation notifications into your on-call rotation (don't let revocation alerts get lost)

---

### CC7.4 — Identify and Develop Risk Mitigation Activities

**Requirement**: The entity identifies, develops, and implements risk mitigation activities for risks arising from potential business disruptions.

**certctl Implementation** (V2):

- **Renewal Job Tracking** — Renewal jobs track the certificate, target agents, and issuance outcome. Failed renewals are retried (configurable backoff). Job states: Pending → Running → Completed (or Failed). Failed jobs trigger notifications.

- **Agent Health Monitoring** — Agents send periodic heartbeats to the control plane, and a health check loop (every 2m) evaluates them. If an agent misses 3 consecutive heartbeats, it's marked as `Unhealthy`. Unhealthy agents are excluded from new deployments.

- **Job Cancellation** — Operators can cancel pending jobs via `POST /api/v1/jobs/{id}/cancel`. Useful when a renewal is already in progress elsewhere (multi-instance deployments) or when a certificate is being phased out.

- **Interactive Approval** — Renewal/issuance jobs can be put in `AwaitingApproval` status. An authorized operator reviews the pending cert and approves or rejects it. Rejection records a reason in the audit trail. This provides separation of duties between requester and approver.

- **Scheduled Scanning** — Agents scan configured directories for existing certs (M18b discovery). Operators triage discovered certs (claim = "we manage this now", dismiss = "this is unmanaged and we're OK with that"). Triage decisions are audited.

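The job lifecycle described above, including the approval and cancellation paths, can be modeled as an explicit transition table. This is an illustrative sketch only: the state names follow this document, but which transitions `internal/domain/job.go` actually permits (for example, whether a failed job returns to `Pending` on retry, how a rejection is represented, or whether a `Cancelled` state exists by that name) is an assumption.

```go
package main

import "fmt"

// JobStatus values follow the names used in this document.
type JobStatus string

const (
	Pending          JobStatus = "Pending"
	AwaitingApproval JobStatus = "AwaitingApproval"
	Running          JobStatus = "Running"
	Completed        JobStatus = "Completed"
	Failed           JobStatus = "Failed"
	Cancelled        JobStatus = "Cancelled"
)

// allowed is an assumed transition table, not certctl's authoritative one.
// States with no entry (Completed, Cancelled) are terminal.
var allowed = map[JobStatus][]JobStatus{
	AwaitingApproval: {Pending, Cancelled}, // approve -> Pending; reject modeled as Cancelled
	Pending:          {Running, Cancelled}, // operators may cancel before execution starts
	Running:          {Completed, Failed},
	Failed:           {Pending}, // a retry re-queues the job
}

// transition returns nil if moving from -> to is permitted, else an error.
func transition(from, to JobStatus) error {
	for _, next := range allowed[from] {
		if next == to {
			return nil
		}
	}
	return fmt.Errorf("illegal job transition %s -> %s", from, to)
}

func main() {
	fmt.Println(transition(Pending, Running))   // <nil>
	fmt.Println(transition(Completed, Running)) // illegal job transition Completed -> Running
}
```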
**Evidence Locations**:

- Job state machine: `internal/domain/job.go` (JobStatus enum)

- Job retry logic: `internal/scheduler/scheduler.go` (jobProcessorTicker)

- Agent health check: `internal/scheduler/scheduler.go` (healthCheckTicker)

- Job cancellation: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/cancel`

- Approval workflow: `internal/api/handler/jobs.go`, `POST /api/v1/jobs/{id}/approve` / `reject`

- Discovery scan results: `internal/api/handler/discovery.go`, `GET /api/v1/discovered-certificates`

**Operator Responsibility**:

- Monitor renewal job success rate (are certs being renewed before expiry?)

- Set up alerts for unhealthy agents (missing 3+ heartbeats = broken agent, take action)

- Establish a formal approval policy (who can approve certs? do they need to involve the CISO?)

- Test job cancellation and recovery flows in staging

- Review discovered certs regularly (are there unmanaged certs that should be managed?)

- Document your disaster recovery process (what if the control plane database is corrupted?)

---

## A1: Availability

### A1.1/A1.2 — Availability and Recovery

**Requirement**: The entity maintains, monitors, and evaluates current processing capacity and use of system components to manage capacity demand (A1.1), and authorizes, designs, develops or acquires, implements, operates, approves, maintains, and monitors environmental protections, software, data backup processes, and recovery infrastructure to meet its objectives (A1.2).

**certctl Implementation** (V2):

- **Health Probes** — `/health` and `/ready` endpoints support container orchestration (Docker Compose, Kubernetes, etc.). Docker Compose defines health checks for the server and database. Kubernetes would use liveness/readiness probes pointing to these endpoints.

- **Database Migrations (Idempotent)** — PostgreSQL migrations use `IF NOT EXISTS` and `ON CONFLICT ... DO NOTHING` patterns. Migrations can be safely reapplied without doubling data or dropping tables mid-migration.

- **Agent Panic Recovery** — Agent binary includes panic recovery in job execution loops. If an agent crashes during a deployment, the control plane marks the job as failed and can retry on a healthy agent.

- **Exponential Backoff** — Agent-to-server communication uses exponential backoff (starting at 1s, capped at 5m) to handle transient network failures. This spaces out retries so agents do not hammer the control plane while it is temporarily down.

- **Docker Compose Deployment** — Includes health checks for server and database. Services auto-restart on failure.

- **PostgreSQL Connection Pooling** — Server uses `database/sql` with configurable connection limits (`SetMaxOpenConns` / `SetMaxIdleConns`, default 25/5). Prevents connection exhaustion.

**Evidence Locations**:

- Health endpoints: `internal/api/handler/health.go`

- Database migrations: `migrations/` directory (all use `IF NOT EXISTS`, idempotent patterns)

- Agent panic recovery: `cmd/agent/main.go` (defer recover() in job execution)

- Exponential backoff: `cmd/agent/main.go` (heartbeat and work poll backoff logic)

- Connection pooling: `cmd/server/main.go` (SetMaxOpenConns, SetMaxIdleConns)

**V3 Enhancement**:

- **Multi-Region HA** — Control plane federation with etcd consensus (operator can run N replicas)

- **PostgreSQL HA** — Replication standby with automatic failover (operator responsibility to configure)

**Operator Responsibility**:

- Configure PostgreSQL backups (e.g., WAL archiving, daily full backups). Certctl stores not only certificates but also renewal policies, the audit trail, and deployment history.

- Test the backup/restore process in staging (otherwise broken backups tend to be discovered during incidents)

- Monitor disk usage (PostgreSQL will fail if the volume holding its data directory fills up)

- Plan capacity (how many certs, agents, jobs can your PostgreSQL handle? Certctl is tested with 10k+ certs, 100+ agents, but your infra may differ)

- Set up high-availability PostgreSQL if you need zero-downtime upgrades

- Implement network segmentation (only authorized services can reach certctl API and database)

---

## CC8: Change Management

### CC8.1 — Change Control

**Requirement**: The entity authorizes, designs, develops or acquires, configures, documents, tests, approves, and implements changes to infrastructure, data, software, and procedures to meet its objectives.

**certctl Implementation** (V2):

- **Certificate Profiles** — Named profiles define allowed key types, max TTL, required SANs, and permitted EKUs. Changes to profiles are common (e.g., "increase max TTL from 1 year to 3 years"). All profile changes are audited (who changed what, when). Profile updates are versioned.

- **Policy Engine** — Renewal policies define alert thresholds and approval workflows. Policy changes (e.g., "lower alert threshold from 30 days to 14 days") are audited. Policies have violation rules (e.g., "flag certs longer than 3 years"); violations are recorded in the audit trail.

- **Target Configuration** — When a new target (NGINX server, HAProxy load balancer) is added, it's registered with a name and configuration (JSON). Target deletions require confirmation (to prevent accidental removal). All target changes are audited.

- **Immutable Audit Trail** — Every change (profile, policy, target, cert, agent, owner, team, approval, revocation, deployment) is recorded in `audit_events`. Audit records are append-only; no retroactive modification is possible. Encrypting the audit trail at rest is the operator's responsibility.

- **GitHub Actions CI** — Pull requests must pass:

  - Go unit tests (`go test ./...`) with coverage gates (service layer ≥30%, handler layer ≥50%)

  - Go vet (static analysis)

  - Frontend TypeScript type checking (`tsc`)

  - Frontend Vitest unit tests

  - Frontend Vite build (ensures no broken imports)

  Only after all checks pass can the PR be merged and deployed.

**Evidence Locations**:

- Profile CRUD: `internal/api/handler/profiles.go`, `GET /api/v1/profiles` / `POST` / `PUT` / `DELETE`

- Policy CRUD: `internal/api/handler/policies.go`

- Target CRUD: `internal/api/handler/targets.go`

- Audit trail: `internal/api/handler/audit.go`, `GET /api/v1/audit` (records action, actor, resource_id, timestamp)

- CI configuration: `.github/workflows/ci.yml` (test, vet, coverage gates, build checks)

**V3 Enhancement**:

- **Change Approval Workflow** — Optional approval gate before profile/policy changes go live

- **Feature Flags** — Enable/disable new features without redeployment (backward compatibility during rolling upgrades)

**Operator Responsibility**:

- Implement formal change control (ticket system, approval, peer review)

- Document the business justification for profile/policy changes

- Test changes in a non-production environment before deploying to production

- Have a rollback plan (can you revert a profile change instantly if it breaks issuance?)

- Include certctl configuration changes in your change log (for audits and incident investigations)

- Version control your certctl configuration (Docker Compose file, environment variables) so you can track changes

---

## Evidence Summary Table

| SOC 2 Criterion | certctl Feature | Evidence Location | V2 (Free) | V3 (Pro) | Operator Responsibility |
|---|---|---|---|---|---|
| **CC6.1** Logical Access Security | API Key Authentication (SHA-256 hashed, constant-time comparison) | `internal/api/middleware/auth.go` | ✅ | Enhanced | API key generation, distribution, rotation |
| | GUI Login with API Key | `web/src/pages/LoginPage.tsx` | ✅ | Enhanced (OIDC) | NA |
| | CORS Allowlist | `CERTCTL_CORS_ORIGINS` env var | ✅ | ✅ | Configure appropriately |
| | Token Bucket Rate Limiting | `internal/api/middleware/rate_limit.go` | ✅ | ✅ | Monitor for brute-force attempts |
| **CC6.2** Prior to Issuing System Credentials | Ownership Attribution | `GET /api/v1/owners`, audit trail records owner assignment | ✅ | Enhanced (RBAC) | Map to org structure, remove on departure |
| | Team Assignment | `GET /api/v1/teams` | ✅ | ✅ | NA |
| | Actor Attribution in Audit Trail | `GET /api/v1/audit` (actor field) | ✅ | ✅ | Justify all changes via separate documentation |
| **CC6.3** Authentication Policies | API Key Enforcement | `CERTCTL_AUTH_TYPE=api-key` (default) | ✅ | Enhanced (OIDC, MFA) | Document policy, test failures, integrate into IAM audit |
| | Agent Authentication | Separate API keys for agents | ✅ | ✅ | Rotate agent keys, monitor compromise |
| | Agent-Side Key Generation | `CERTCTL_KEYGEN_MODE=agent` (default) | ✅ | ✅ | Protect agent filesystem keys via encryption/backup |
| | Private Key Policy | Server-side keygen logs warning, disabled in production | ✅ | ✅ | Never use server-side keygen in production |
| **CC6.7** Information Transmission Protection | TLS for Control Plane | Deploy behind TLS-terminating reverse proxy | ✅ | ✅ | Enable TLS in production via reverse proxy |
| | Agent-to-Server HTTPS | Agents use HTTPS for all API calls | ✅ | ✅ | Enforce TLS via firewall rules |
| | Private Key Isolation | Agent-side keygen (ECDSA P-256), keys stored 0600 on agent FS | ✅ | ✅ | Encrypt agent filesystems, backup securely |
| | Pull-Only Deployment | Server never initiates outbound to agents/targets | ✅ | Enhanced (HSM, proxy agents) | Encrypt agent↔target comms, isolate proxy agents |
| **CC7.1** System Monitoring | Health Endpoint | `GET /health`, `GET /ready` | ✅ | ✅ | Integrate into monitoring (Prometheus, Datadog) |
| | Metrics JSON Endpoint | `GET /api/v1/metrics` (gauges, counters, uptime) | ✅ | ✅ | Set thresholds, configure alerting |
| | Stats API (time-series) | `GET /api/v1/stats/*` (summary, status, expiration, jobs, issuance) | ✅ | ✅ | Integrate into dashboards, SLO tracking |
| | Structured Logging | `slog` middleware with request IDs | ✅ | ✅ | Aggregate logs to SIEM, define retention policy |
| | Background Scheduler | 12 loops (8 always-on: renewal 1h, jobs 30s, job retry 5m I-001, job timeout 10m I-003, health 2m, notifications 1m, notif retry 2m I-005, short-lived 30s; 4 opt-in: network scan 6h, digest 24h, endpoint health 60s M48, cloud discovery 6h M50) | ✅ | ✅ | Alert on scheduler loop failures |
| **CC7.2** Anomaly Detection | Immutable API Audit Trail | `internal/api/middleware/audit.go`, `GET /api/v1/audit` | ✅ | Enhanced (SIEM export) | Integrate into SIEM, search for anomalies, archive long-term |
| | Expiration Threshold Alerting | Configurable per-policy (default 30/14/7/0 days) | ✅ | ✅ | Configure thresholds, integrate notifications |
| | Status Auto-Transitions | Active → Expiring (30d) → Expired (0d) | ✅ | ✅ | Monitor status changes in audit trail |
| | Notification Routing | Email, Slack, Teams, PagerDuty, OpsGenie | ✅ | ✅ | Configure notifiers, on-call integration |
| | Deployment Rollback | Redeploy previous cert version via GUI | ✅ | ✅ | Audit rollback decisions |
| **CC7.3** Incident Response | Revocation API (RFC 5280 reasons) | `POST /api/v1/certificates/{id}/revoke` | ✅ | Enhanced (bulk revocation) | Establish incident response policy |
| | CRL Endpoint (DER, RFC 5280 §5) | `GET /.well-known/pki/crl/{issuer_id}` (unauthenticated, `application/pkix-crl`) | ✅ | ✅ | Ensure CRL/OCSP accessible to all clients without API keys |
| | OCSP Responder (RFC 6960) | `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (unauthenticated, `application/ocsp-response`) | ✅ | ✅ | Test revocation in staging |
| | Revocation Notifications | Email, webhook, Slack/Teams on revocation | ✅ | ✅ | Integrate into on-call, document justification separately |
| | Short-Lived Cert Exemption | TTL < 1h skip CRL/OCSP | ✅ | ✅ | Configure profiles appropriately |
| **CC7.4** Risk Mitigation | Renewal Job Tracking | Job state machine (Pending → Running → Completed/Failed) | ✅ | ✅ | Monitor renewal success rate |
| | Agent Health Monitoring | Health check loop (evaluates agent heartbeats every 2m, marks unhealthy after 3 misses) | ✅ | ✅ | Alert on unhealthy agents, investigate |
| | Job Cancellation | `POST /api/v1/jobs/{id}/cancel` | ✅ | ✅ | Test in staging |
| | Interactive Approval | AwaitingApproval state, `POST /api/v1/jobs/{id}/approve\|reject` | ✅ | ✅ | Define approval policy, audit decisions |
| | Certificate Discovery | Agents scan directories, triage (claim/dismiss) | ✅ | ✅ | Review discovered certs regularly |
| **A1.1/A1.2** Availability and Recovery | Health Probes (Docker, Kubernetes) | `/health` and `/ready` endpoints | ✅ | ✅ | Use in container orchestration |
| | Idempotent Migrations | `IF NOT EXISTS`, `ON CONFLICT ... DO NOTHING` | ✅ | ✅ | Test migration replay in staging |
| | Agent Panic Recovery | Panic recovery in job loops | ✅ | ✅ | Monitor agent crashes in logs |
| | Exponential Backoff | Agent heartbeat/work poll backoff (1s → 5m) | ✅ | ✅ | Monitor for control plane downtime |
| | PostgreSQL Connection Pooling | MaxOpenConns=25, MaxIdleConns=5 (configurable) | ✅ | ✅ | Monitor connection usage |
| **CC8.1** Change Control | Certificate Profiles | CRUD API + GUI, profile changes audited | ✅ | ✅ | Formal change control, test in staging |
| | Policy Engine + Violations | CRUD API + GUI, policy changes audited | ✅ | ✅ | Document justification, implement approval workflow |
| | Target Registration | CRUD API + GUI, changes audited | ✅ | ✅ | Confirm deletions, version control config |
| | Immutable Audit Trail | Append-only `audit_events` table | ✅ | ✅ | Encrypt at rest, archive long-term, no manual edits |
| | GitHub Actions CI | Unit tests, vet, coverage gates, build checks | ✅ | ✅ | Review PRs before merge, maintain test quality |

---

## What Requires Operator Action

**certctl is a tool, not a complete compliance solution.** Your organization must handle:

1. **Physical Security** — Protect the infrastructure (servers, network) running certctl. Certctl can't control who has physical access to your datacenter.

2. **Personnel Background Checks** — Before granting anyone API key access, conduct background checks per your policy. Certctl records *who* accessed *what*, but doesn't verify that people are trustworthy.

3. **Formal Incident Response Plan** — Certctl provides incident detection (anomalies visible in the audit trail) and tools for response (revocation, rollback), but you must define *when* to use them and *who* decides.

4. **Access Review and Removal** — Certctl stores ownership, teams, and API keys. You must:

   - Regularly review who has access (quarterly or semi-annually)

   - Immediately revoke API keys for departing employees

   - Audit that removed access is actually removed (test that old keys fail)

5. **Log Retention and Archival** — Certctl logs to stdout (Docker) and stores audit events in PostgreSQL. You must:

   - Ship logs to a long-term archive (SIEM, S3, or equivalent)

   - Define a retention policy (often 1-7 years per industry regulation)

   - Encrypt archived logs

   - Test that you can retrieve logs from archive (restoration drills)

6. **Encryption at Rest** — PostgreSQL data (including audit trail) is stored on disk. You must:

   - Enable disk or volume encryption on your database host (community PostgreSQL does not provide built-in transparent data encryption)

   - Encrypt container persistent volumes (if using Kubernetes)

   - Encrypt database backups

7. **Network Segmentation** — Certctl API and database must be protected by network access controls. You must:

   - Firewall the control plane (only authorized services can connect)

   - Use VPN or private networks for agent-to-server communication

   - Place proxy agents (for F5, IIS, etc.) in the same network zone as their targets

8. **Capacity Planning** — Certctl's performance scales with your PostgreSQL. You must:

   - Estimate certificate inventory size (10k, 100k, 1M certs?)

   - Test certctl with your expected scale in staging

   - Monitor disk usage, CPU, memory

   - Plan for growth (add PostgreSQL replicas, increase connection pool, etc.)

9. **Disaster Recovery** — Certctl data lives in PostgreSQL. You must:

   - Back up PostgreSQL regularly (daily or hourly, depending on RPO)

   - Test the restore process in staging (otherwise broken backups are discovered during incidents)

   - Have a runbook for failover to replica or recovery from backup

   - Document RTO/RPO targets (how long can cert management be down? how much data can you afford to lose?)

10. **Integration with Your IAM** — If using OIDC/SSO (V3), you must:

    - Configure your OIDC provider (Okta, Azure AD, Google)

    - Map user groups to certctl roles (Admin, Operator, Viewer)

    - Manage MFA policy (enforce MFA if required)

    - Audit user provisioning/deprovisioning

11. **Documentation and Runbooks** — Certctl documents *what it does* (this guide), but you must document:

    - Your organization's certificate lifecycle policy (who requests, who approves, who deploys)

    - How to respond to specific incidents (cert compromise, CA compromise, agent down, renewal failed)

    - How to operate certctl (day-to-day tasks, escalation procedures)

    - Contact info for on-call teams

---

## V3 Enhancements

**certctl Pro (V3, paid edition) adds features that significantly strengthen SOC 2 evidence:**

- **OIDC / SSO Integration** — Integrate with Okta, Azure AD, Google to replace API keys with federated identity. Enables MFA enforcement and centralized access management. Auditors generally favor federated identity because access can be removed at the source.

- **Role-Based Access Control (RBAC)** — Predefined roles (Admin: full access; Operator: issue/renew/revoke, no policy changes; Viewer: read-only) with profile-gated enforcement. Allows separation of duties (e.g., a junior operator can't change global policy).

- **NATS Event Bus** — Real-time audit streaming to your SIEM. Hybrid model: HTTP for synchronous APIs, NATS for async events (cert.issued, cert.expiring, agent.heartbeat, job.completed). JetStream persistence for replay and durability.

- **SIEM Export** — Automated export of audit trail to Splunk, ELK, Datadog, etc. (webhooks, syslog, or pull-based APIs). Makes it easy for security teams to hunt for anomalies.

- **Advanced Search DSL** — `POST /api/v1/search` with tree-based filters (nested AND/OR, regex, field projection). Enables complex compliance queries (e.g., "all certs issued in the last 30 days by team X with validity longer than 1 year").

- **Bulk Revocation** — Revoke all certs issued by a profile, owner, or agent in one operation. Critical for large-scale incidents (e.g., "a team's CA key was compromised, revoke all their certs").

- **Certificate Health Scores** — Composite risk scoring (e.g., "this cert has no short-lived TTL enforcement, extends past your policy max, and hasn't been renewed in 2 years" → health=30%). Helps prioritize remediation.

- **Compliance Scoring** — Audit readiness reporting per certificate (e.g., "compliance=95% — missing only a 3-year max-TTL constraint"). Exportable compliance report.

- **DigiCert Issuer Connector** — OV/EV certificate issuance for public-facing services (web servers, CDNs). Complements Local CA for internal use.

- **CT Log Monitoring** — Passive detection of unauthorized cert issuance. Monitors public CT logs for certs matching your domains and alerts if unexpected certs appear (e.g., an attacker obtained a cert for your domain).

- **F5 BIG-IP Implementation** — Full target connector with iControl REST API. Agents can deploy certs to F5 load balancers.

- **IIS Implementation** — Dual-mode: agent-local PowerShell (default) for servers with agents, or proxy agent WinRM (agentless targets). Full Windows Server integration.

---

## Conclusion

certctl provides a strong foundation for SOC 2 compliance with API key authentication, immutable audit logging, automated alerting, and revocation capabilities. However, SOC 2 audits require evidence across your entire infrastructure — certctl is one piece. Use this guide to map certctl features to your audit questionnaire, then work with your auditors to identify gaps that must be filled by your own organizational policies and controls.

For a deeper SOC 2 discussion or a mock audit against this guide, contact your certctl Pro support team.

@@ -164,7 +164,7 @@ This table shows what each Part tests and what's left for manual verification.
 | 36–37 | Issuer Catalog, Frontend Audit | SKIP | — | Requires browser |
 | 38 | Error Handling | 5 | Malformed JSON, missing required field, method not allowed, UTF-8 CN, empty body | Stack trace suppression, error response format |
 | 39 | Performance | 5 | List certs < 200ms, stats < 500ms, metrics < 200ms, Prometheus < 300ms, audit < 500ms | Load testing, concurrent request handling |
-| 40 | Documentation | 8 | README, quickstart, architecture, connectors, compliance exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
+| 40 | Documentation | 8 | README, quickstart, architecture, connectors exist; migration guides exist; 8 issuer types in docs; 11 target types in docs | Content accuracy, link validity |
 | 41 | Regression | 3 | DELETE 204, per_page max fallback, network scan target seed count | `errors.Is(errors.New())` anti-pattern source scan |
 | 42 | Envoy Target | 5 | Domain type, connector file, test file, OpenAPI, agent dispatch | Envoy deployment test, SDS config |
 | 43 | Postfix/Dovecot | 3 | Domain types (Postfix + Dovecot), connector file, OpenAPI | Mail server deployment test |
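Part 41's source scan targets the `errors.Is(errors.New())` anti-pattern. A minimal Go sketch of why that pattern is a bug: `errors.New` returns a distinct value on every call, so `errors.Is` can never match a freshly built error, while a package-level sentinel wrapped with `%w` does match. `ErrNotFound`, `find`, and the two checker functions are hypothetical names, not code from the certctl tree.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical package-level sentinel error.
var ErrNotFound = errors.New("not found")

// find wraps the sentinel with %w so callers can still match it.
func find(m map[string]string, key string) (string, error) {
	v, ok := m[key]
	if !ok {
		return "", fmt.Errorf("find %q: %w", key, ErrNotFound)
	}
	return v, nil
}

// isNotFoundBroken is the anti-pattern: errors.New allocates a new value
// on every call, so errors.Is can never match it.
func isNotFoundBroken(err error) bool {
	return errors.Is(err, errors.New("not found"))
}

// isNotFoundFixed compares against the shared sentinel instead.
func isNotFoundFixed(err error) bool {
	return errors.Is(err, ErrNotFound)
}

func main() {
	_, err := find(map[string]string{}, "mc-foo")
	fmt.Println(isNotFoundBroken(err)) // false
	fmt.Println(isNotFoundFixed(err))  // true
}
```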
@@ -198,7 +198,7 @@ A buyer's QA lead reading this doc wants "where are the existential bugs caught?
 | **Medium** (Operational pain or silent data drift) | Targets, notifiers, observability, error handling, performance, regression | 14, 15-17, 30, 31, 38, 39, 40, 41, 42, 43, 44, 45, 46 | 14/14 automated (15-17 indirect via Parts 42–46) |
 | **Low** (Hygiene) | Documentation, docs verification | 40 (Documentation), 50 (Onboarding) | 2/2 automated |
 | **Frontend** (XSS, render correctness, mutation contracts) | GUI testing | 35, 36-37 | 0/3 automated in this suite (Vitest covers separately under `web/`); this doc punts to manual + Vitest |
-| **Compliance** (PCI / SOC2 / HIPAA-relevant) | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |
+| **Audit-relevant** | Audit trail, body-size limits, request limits, Helm chart deploy posture | 27, 32, 51, 52 | 4/4 automated |

 This is the table acquisition reviewers screenshot for their report. When a new Part_* subtest lands in `qa_test.go`, classify it here.

@@ -365,7 +365,7 @@ curl -s -X POST $API/api/v1/certificates \
 | `issuer_id` | Links to the issuer connector that will sign this certificate. Determines which CA backend is used. |
 | `renewal_policy_id` | Links to a `renewal_policies` row that defines: how many days before expiry to renew (`renewal_window_days`), whether auto-renewal is enabled (`auto_renew`), max retries, and retry interval. The default policy (`rp-default`) renews 30 days before expiry. |
 | `status` | Set to `Pending` because the certificate hasn't been issued yet. The scheduler will pick it up, or you can trigger renewal manually. |
-| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"pci": "true"` for compliance scoping). |
+| `tags` | Arbitrary key-value metadata stored as JSONB. Useful for filtering, reporting, and integration with external systems (e.g., `"environment": "production"` for fleet scoping). |

 **Check the dashboard now.** Click "Certificates" in the sidebar. You'll see your new "Demo API Certificate" with status "Pending" alongside the pre-loaded demo certificates. Click on it to see the full details.

@@ -605,7 +605,7 @@ curl -s "$API/api/v1/audit?created_after=2026-03-24T09:00:00Z" | jq '.data | len

 The audit middleware (M19) records every HTTP request: method, path, status code, actor, request body SHA-256 hash, and latency. This creates a complete API audit trail without blocking responses (logging happens asynchronously).

-**Why immutable audit:** Compliance frameworks (SOC 2 Type II, PCI-DSS, ISO 27001) require tamper-evident audit logs. By making the repository interface append-only and recording API calls, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.
+**Why immutable audit:** tamper-evident audit logs are a hard requirement when an attacker has compromised the API server. By making the repository interface append-only and recording API calls, even a compromised API server can't retroactively delete or modify audit records. In a production deployment, you'd also stream these to an external SIEM (Splunk, Datadog) for additional protection.

 **Check the dashboard.** The "Audit" view shows the full timeline of all actions across the system with filtering and CSV/JSON export.

@@ -703,7 +703,7 @@ curl -s -X POST $API/api/v1/certificates \

 **Why `environment` matters:** The environment field isn't just metadata — it feeds the policy engine. A policy rule with type `AllowedEnvironments` can restrict which environments are valid. If someone tries to create a certificate with `environment: "yolo"`, the policy engine flags a violation. In a mature deployment, you'd enforce policies strictly: production certificates must use a trusted CA (not Local CA), staging certificates can use Let's Encrypt staging, and development certificates can use the Local CA.

-**Why `pci: true` in tags:** Tags are free-form, but they enable powerful filtering and compliance scoping. A security team could query `GET /api/v1/certificates?tags.pci=true` (not implemented yet, but the JSONB column supports it) to find all PCI-scoped certificates and verify they meet compliance requirements.
+**Why arbitrary tags in metadata:** Tags are free-form, but they enable powerful filtering and fleet scoping. A security team could query `GET /api/v1/certificates?tags.regulated=true` (not implemented yet, but the JSONB column supports it) to find all certificates marked regulated and verify they meet whatever requirements that label maps to.

 **Refresh the dashboard** — you'll see the new payment gateway certificate. Try filtering by environment or status to see how both certificates appear alongside the demo data.

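To make the `AllowedEnvironments` rule and tag scoping concrete, here is a minimal Go sketch. It assumes the rule is a plain allowlist match and, because the `tags.<key>=<value>` query is described as not implemented yet, filters tags client-side. `Violation`, `checkAllowedEnvironments`, `filterByTag`, and the certificate IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Violation is a hypothetical policy-violation record.
type Violation struct {
	RuleType string
	Message  string
}

// checkAllowedEnvironments mimics an AllowedEnvironments rule: any
// environment outside the allowlist produces a violation.
func checkAllowedEnvironments(allowed []string, env string) *Violation {
	for _, a := range allowed {
		if a == env {
			return nil
		}
	}
	return &Violation{
		RuleType: "AllowedEnvironments",
		Message:  fmt.Sprintf("environment %q is not allowed", env),
	}
}

// filterByTag is a client-side stand-in for the not-yet-implemented
// tags.<key>=<value> query; it returns matching certificate IDs, sorted.
func filterByTag(tagsByCert map[string]map[string]string, key, value string) []string {
	var ids []string
	for id, tags := range tagsByCert {
		if tags[key] == value {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	allowed := []string{"production", "staging", "development"}
	if v := checkAllowedEnvironments(allowed, "yolo"); v != nil {
		fmt.Println(v.RuleType + ": " + v.Message)
	}

	certs := map[string]map[string]string{
		"mc-api":     {"environment": "production", "regulated": "true"},
		"mc-payment": {"environment": "production", "regulated": "true"},
		"mc-dev":     {"environment": "development"},
	}
	fmt.Println(filterByTag(certs, "regulated", "true")) // [mc-api mc-payment]
}
```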
@@ -780,7 +780,7 @@ Check existing violations:
 curl -s "$API/api/v1/policies/pr-max-certificate-lifetime/violations" | jq .
 ```

-**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific non-compliance.
+**How it works:** This hits `GET /api/v1/policies/{id}/violations`, which queries `SELECT * FROM policy_violations WHERE rule_id = $1`. Each violation references the offending certificate and the rule it violated, creating a traceable link between the policy definition and the specific violation.

 **In the dashboard**, click "Policies" in the sidebar to see all active rules and which certificates are violating them.

@@ -846,7 +846,7 @@ curl -s -X POST $API/api/v1/profiles \

 **How it works:** Certificate profiles are stored in the `certificate_profiles` table with a `allowed_key_algorithms` JSONB column that defines which key types and minimum sizes are acceptable. When a certificate is assigned to a profile, the profile constraints are enforced during CSR validation. The `max_validity_days` field controls the maximum certificate lifetime — profiles with values translating to under 1 hour enable short-lived certificate mode, where certs are exempt from CRL/OCSP.

-**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce compliance requirements across the fleet.
+**Why profiles matter:** Without profiles, any agent can submit a CSR with any key type and any validity period. Profiles create crypto policy guardrails — "production TLS certs must use ECDSA P-256 with 90-day max TTL" — that prevent configuration drift and enforce policy across the fleet.

 **In the dashboard**, click "Profiles" in the sidebar to see and manage certificate profiles.

@@ -896,17 +896,17 @@ Approve or reject them:
 # Approve a job
 curl -s -X POST $API/api/v1/jobs/JOB_ID/approve \
 -H "Content-Type: application/json" \
--d '{"reason": "Verified key type meets compliance requirements"}' | jq .
+-d '{"reason": "Verified key type meets policy"}' | jq .

 # Reject a job
 curl -s -X POST $API/api/v1/jobs/JOB_ID/reject \
 -H "Content-Type: application/json" \
--d '{"reason": "Key type does not meet PCI requirements"}' | jq .
+-d '{"reason": "Key type does not meet policy"}' | jq .
 ```

 **How it works:** When a renewal policy has `auto_renew` set to false, renewal jobs enter the `AwaitingApproval` state instead of being processed immediately. An operator must explicitly approve or reject the job via the API or the GUI. Approved jobs transition to `Pending` and are picked up by the job processor. Rejected jobs move to `Cancelled` with the provided reason recorded in the audit trail.

-**Why interactive approval:** Not every certificate renewal should be automatic. PCI-scoped certificates, certs with specific compliance requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.
+**Why interactive approval:** Not every certificate renewal should be automatic. High-value certificates, certs with specific policy requirements, or certificates being migrated between issuers benefit from a human checkpoint. The AwaitingApproval state creates that checkpoint without blocking the entire job pipeline.

 **In the dashboard:** Click "Jobs" in the sidebar, filter by status "AwaitingApproval", and you'll see a list of renewal jobs waiting for approval. Each job shows the certificate, issuer, and requested validity period. Click a job to open its detail view and see the Approve / Reject buttons with a reason text field. After approval or rejection, the job status updates in real-time and the audit trail records the decision.

@@ -125,7 +125,7 @@ At no point does the private key leave the agent. This is a fundamental security

 Agents also report **metadata** about themselves — their operating system, CPU architecture, IP address, hostname, and version — with every heartbeat. This gives ops teams fleet-wide visibility (e.g., "how many agents are running on ARM?", "which agents are still on v1.0.0?") and powers **agent groups** — dynamic device grouping where policies can be scoped to specific agent criteria like OS type, architecture, or network subnet.

-**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for compliance reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.
+**Retiring an agent.** When you decommission a server, the certctl record for its agent needs to be retired, not deleted. certctl uses a **soft-delete** model: `DELETE /api/v1/agents/{id}` stamps the row with a retired-at timestamp and a reason, instead of removing it. This is deliberate — an audit trail of "who owned this certificate, on which host, for which team" stays intact forever, and the downstream deployment_targets, certificates, and jobs keep valid foreign keys. Retired agents are filtered out of default list views and the dashboard's agent counter, but remain visible through a separate retired-agents view for audit reconciliation. If the agent still has active deployment targets, deployed certificates, or pending jobs, retirement is blocked by default so you don't silently orphan those rows; the API responds with the exact counts so you can retire or reassign each dependency explicitly. A force-retire escape hatch (`?force=true&reason=...`) is available for true decommission scenarios — it transactionally retires the downstream targets, cancels pending jobs, and records the cascade in the audit trail with the reason you provided. Four internal sentinel agents that back the network scanner and the cloud secret-manager discovery sources cannot be retired at all, even with force, because retiring them would orphan their subsystems. Once retired, an agent that still attempts to heartbeat receives `410 Gone` — the agent process reads that as "you've been retired, shut down" and exits cleanly.

 ### Deployment Targets

@@ -244,7 +244,7 @@ Every action in certctl — issuing a certificate, renewing one, deploying to a

 ### Audit Trail

-Every action is logged: who did it, what changed, when, and why. This is essential for compliance (SOC 2, PCI-DSS, ISO 27001) and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.
+Every action is logged: who did it, what changed, when, and why. This is essential for audit and for debugging. You can trace a certificate's entire history from creation through every renewal and deployment.

 ### Notifications

@@ -281,7 +281,7 @@ This gives you a three-step triage workflow:

 Network scan targets are managed from the **Network Scans** dashboard page — create CIDR ranges and ports to probe, enable/disable targets, trigger on-demand scans, and view results. Discovered certificates from network scans appear in the same Discovery triage page alongside filesystem discoveries.

-This is a prerequisite for multi-CA migration, compliance audits, and building confidence that you've found all the certificates that matter.
+This is a prerequisite for multi-CA migration, audit reviews, and building confidence that you've found all the certificates that matter.

 ### Observability

@@ -324,7 +324,7 @@ curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/approve
 # Reject a pending job
 curl --cacert "$CA" -s -X POST https://localhost:8443/api/v1/jobs/JOB_ID/reject \
 -H "Content-Type: application/json" \
--d '{"reason": "Key type does not meet compliance requirements"}' | jq .
+-d '{"reason": "Key type does not meet policy requirements"}' | jq .
 ```

 ## Certificate Discovery
@@ -482,7 +482,7 @@ A suggested 5-minute flow:
 6. **Agent fleet** — "Agents handle key generation locally (ECDSA P-256). Private keys never leave your infrastructure."
 7. **Discovery** — "Agents scan filesystems, server probes TLS endpoints. We find what you're not managing yet."
 8. **Bulk operations** — "Select multiple certs, renew or revoke in bulk. At 47-day lifespans with hundreds of certs, this is essential."
-9. **Audit trail** — "Every action recorded. Export to CSV/JSON for compliance."
+9. **Audit trail** — "Every action recorded. Export to CSV/JSON for review."
 10. **CLI + MCP** — "Terminal users get `certctl-cli`. AI assistants get MCP integration. Everything is API-first."

 ## Tear Down
@@ -62,11 +62,11 @@ The three differentiators above get the headlines, but the feature surface is wi

 **Network certificate discovery** — active TLS scanning of CIDR ranges finds certificates you didn't know existed. Agents also scan local filesystems for PEM/DER files. Everything feeds into a triage workflow where you claim, dismiss, or import discovered certs into management.

-**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete. Mapped to SOC 2, PCI-DSS 4.0, and NIST SP 800-57 compliance frameworks with published evidence guides.
+**Immutable audit trail** — every API call recorded (method, path, actor, body hash, status, latency). Every certificate lifecycle event tracked. Append-only, no update or delete.

 **Policy engine** — 5 rule types (allowed issuers, allowed domains, required metadata, allowed environments, renewal lead time) with violation tracking and severity levels.

-**PKI compliance** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.
+**Revocation infrastructure** — DER-encoded X.509 CRL signed by issuing CA, embedded OCSP responder, RFC 5280 revocation with all reason codes, short-lived certificate exemption.

 **Prometheus metrics** — `/api/v1/metrics/prometheus` in standard exposition format. Works with Prometheus, Grafana Agent, Datadog Agent, Victoria Metrics.

@@ -98,7 +98,7 @@ Go to **Policies** → **+ New Policy** to create enforcement rules:
 - **Severity:** `high`
 - **Config:** set your enforcement parameters

-Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other compliance rules across your fleet.
+Certificates are linked to issuers and profiles when created or claimed from discovery. Policies add guardrails — enforcing key algorithm requirements, expiration windows, and other policy rules across your fleet.

 ### 6. View Unified Inventory

@@ -2,7 +2,7 @@

 > Last reviewed: 2026-05-05

-certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Compliance customers (PCI-DSS Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA) configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.
+certctl can gate certificate issuance + renewal on a per-profile, two-person-integrity check. Operators configure this on production-tier `CertificateProfile` rows so every renewal-loop tick or manual `POST /api/v1/certificates/{id}/renew` blocks at `JobStatusAwaitingApproval` until a different actor approves.

 Closes the procurement-checklist question "How do you enforce two-person integrity on cert issuance?" — without this surface the answer is "we don't"; with `requires_approval=true` on the profile, the answer is "here's the RBAC contract + here's the audit query that proves bypass mode is off in production."

@@ -50,7 +50,7 @@ Every certificate bound to that profile is now gated. The default is `requires_a

 The actor that triggers a renewal **cannot** be the actor that approves it. The check happens at the service layer and surfaces as **HTTP 403** at the handler. The error message contains the substring `two-person integrity` so server-log greps detect attempted self-approvals.

-This is the load-bearing compliance contract. Pinned by:
+This is the load-bearing two-person-integrity contract. Pinned by:

 - `internal/service/approval_test.go::TestApproval_Approve_RejectsSameActor` — service-level pin.
 - `internal/api/handler/approval_test.go::TestApproval_HandlerApproveAsSameActor_Returns403` — handler-level pin (HTTP 403 + body contains "two-person integrity").
@@ -97,20 +97,11 @@ curl -X POST "https://certctl/api/v1/certificates/mc-foo/renew" \

 Tighten the timeout for short-window deployments via the env var, e.g. `CERTCTL_JOB_AWAITING_APPROVAL_TIMEOUT=24h`.

-## Compliance control mapping
-
-| Standard | Control | What this surface satisfies |
-|---|---|---|
-| PCI-DSS 4.0 | **§6.4.5** (Separation of duties for production change-management) | Same-actor RBAC pin; audit row carries both `requested_by` and `decided_by` so reviewers see two distinct identities per change. |
-| NIST SP 800-53 | **SA-15** (Development process; two-person review for security-relevant changes) | Service-layer `ErrApproveBySameActor` + `TestApproval_Approve_RejectsSameActor` pin the contract. Bypass-mode emits a typed audit row (`action=approval_bypassed`) so compliance reviewers detect dev-mode misuse via `SELECT count(*) FROM audit_events WHERE actor='system-bypass'` returning > 0. |
-| SOC 2 Type II | **CC6.1** (Logical access — restrict, monitor, terminate) | Per-decision audit row + `certctl_approval_decisions_total{outcome,profile_id}` Prometheus counter. Operators alert on sustained `outcome="rejected"` or `outcome="expired"` bursts. |
-| HIPAA | **§164.308(a)(4)** (Information access management) | Same surface — the per-policy gating + audit trail is the access-management control. |
-
 ## Bypass mode (dev / CI ONLY)

 Setting `CERTCTL_APPROVAL_BYPASS=true` short-circuits the workflow: every `RequestApproval` call auto-approves with `decided_by=system-bypass` and `actorType=System`. Used by dev / CI to keep renewal-scheduler tests fast without standing up an approver.

-**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so compliance auditors detect misuse via:
+**Production deploys MUST leave this unset.** The bypass emits a typed audit event (`action=approval_bypassed`) so reviewers detect misuse via:

 ```sql
 SELECT count(*) FROM audit_events WHERE actor = 'system-bypass';
@@ -130,7 +121,7 @@ certctl_approval_pending_age_seconds histogram

 `outcome` is one of `approved`, `rejected`, `expired`, `bypassed`. `profile_id` is the `CertificateProfile.ID` that triggered the gate (cardinality-bounded — operators have <100 profiles in production).

-The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — compliance customers usually have a same-day decision deadline.
+The pending-age histogram observes seconds-since-creation at the moment of decision. Alert when p99 hits hours/days — production deployments usually have a same-day decision deadline.

 ## Future free V2 work

@@ -2,7 +2,7 @@

 > Last reviewed: 2026-05-05

-**Audit reference:** Bundle B / M-018. PCI-DSS v4.0 Req 4 §2.2.5; CWE-319.
+**Audit reference:** Bundle B / M-018. CWE-319 (Cleartext transmission of sensitive information).

 certctl talks to Postgres over a single connection-string URL controlled by the
 `CERTCTL_DATABASE_URL` env var. The `sslmode` query parameter on that URL
@@ -15,16 +15,16 @@ explicit opt-in / opt-out paths for the four real-world deployment shapes.

 | Deployment shape | Default `sslmode` | When to change |
 |------------------------------------------------|--------------------|----------------|
-| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload is in PCI-DSS scope. |
+| Helm chart, bundled Postgres, in-cluster | `disable` | When the cluster does not provide pod-network encryption (CNI without WireGuard / IPSec) and the workload handles sensitive data. |
 | Helm chart, external Postgres (RDS / Cloud SQL / Azure DB) | not auto-set | **Always** set to `verify-full` and provide the cloud provider's server CA bundle. |
 | docker-compose, bundled Postgres on docker bridge | `disable` | Demo/dev only; not a deployment shape we expect operators to harden. |
 | docker-compose / k8s with external Postgres | not auto-set | **Always** set `CERTCTL_DATABASE_URL` to a connection string with `sslmode=verify-full`. |

 `sslmode` values come from `lib/pq` (the underlying driver). The full set is:
-`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`. PCI-DSS
-Req 4 v4.0 §2.2.5 considers `verify-ca` the floor for sensitive-data transport;
-`verify-full` is the floor for systems exposed to spoofing risk (it adds
-hostname validation against the server cert's CN/SAN).
+`disable`, `allow`, `prefer`, `require`, `verify-ca`, `verify-full`.
+`verify-ca` is the floor for sensitive-data transport; `verify-full`
+is the floor for systems exposed to spoofing risk (it adds hostname
+validation against the server cert's CN/SAN).

 ## Helm chart (Bundle B)

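A minimal Go sketch of the `sslmode` floor from the paragraph above, applied to a `CERTCTL_DATABASE_URL` value with `net/url`. The `spoofingRisk` switch is an assumption standing in for the deployment-shape judgment in the table, and treating an unset `sslmode` as an error is a deliberately strict choice of this sketch, not documented certctl behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// checkSSLMode applies the verify-ca / verify-full floor to a database URL:
// verify-full always passes, verify-ca passes only without spoofing risk,
// and anything weaker (or unset) is rejected.
func checkSSLMode(databaseURL string, spoofingRisk bool) error {
	u, err := url.Parse(databaseURL)
	if err != nil {
		return fmt.Errorf("parse database URL: %w", err)
	}
	switch mode := u.Query().Get("sslmode"); mode {
	case "verify-full":
		return nil
	case "verify-ca":
		if spoofingRisk {
			return fmt.Errorf("sslmode=verify-ca skips hostname validation; use verify-full")
		}
		return nil
	case "":
		return fmt.Errorf("sslmode is not set; set it explicitly")
	default:
		return fmt.Errorf("sslmode=%s is below the verify-ca floor", mode)
	}
}

func main() {
	dsn := os.Getenv("CERTCTL_DATABASE_URL")
	if dsn == "" {
		dsn = "postgres://certctl:secret@db.example.com:5432/certctl?sslmode=require"
	}
	if err := checkSSLMode(dsn, true); err != nil {
		fmt.Println("refusing database URL:", err)
		return
	}
	fmt.Println("sslmode OK")
}
```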
@@ -2,7 +2,7 @@

 > Last reviewed: 2026-05-05

-**Audit reference:** Bundle F / M-023. PCI-DSS v4.0 Req 4 §2.2.5; CWE-326.
+**Audit reference:** Bundle F / M-023. CWE-326 (Inadequate encryption strength).

 ## What this is

@@ -15,13 +15,12 @@ proxy and pass the request through to certctl over TLS 1.3.

 ## Why TLS 1.3 minimum

-certctl's audit posture, the SOC 2 / PCI-DSS / NIST SP 800-57 compliance
-mappings, and the M-001 PBKDF2 work factor all assume modern transport
-crypto. TLS 1.2 with the cipher suites still in the wild has known
-attack surface (BEAST, POODLE, ROBOT, raccoon — all CVE-categorized);
-allowing TLS 1.2 directly on the certctl listener would invalidate the
-guarantee that the server-side encryption chain is the strongest the
-ecosystem currently supports.
+certctl's audit posture and the M-001 PBKDF2 work factor both assume
+modern transport crypto. TLS 1.2 with the cipher suites still in the
+wild has known attack surface (BEAST, POODLE, ROBOT, raccoon — all
+CVE-categorized); allowing TLS 1.2 directly on the certctl listener
+would invalidate the guarantee that the server-side encryption chain
+is the strongest the ecosystem currently supports.

 ## When this runbook applies

@@ -71,8 +70,8 @@ server {
 server_name est.example.com;

 # Public-facing legacy listener. ssl_protocols includes TLSv1.2 explicitly.
-# Keep ssl_ciphers conservative — only the strong AEAD suites that
-# PCI-DSS Req 4 §2.2.5 still allows under TLS 1.2.
+# Keep ssl_ciphers conservative — only strong AEAD suites with forward
+# secrecy.
 ssl_certificate /etc/nginx/certs/est.example.com.fullchain.pem;
 ssl_certificate_key /etc/nginx/certs/est.example.com.key;
 ssl_protocols TLSv1.2 TLSv1.3;
@@ -168,21 +167,19 @@ only one would fail loud at startup. Until that work ships, the
 header-agnostic default described above is the only supported
 configuration.

-## PCI-DSS Req 4 §2.2.5 attestation
+## TLS posture summary

-PCI-DSS v4.0 §2.2.5 ("strong cryptography for authentication/transmission
-of cardholder data") considers TLS 1.2 with strong cipher suites
-acceptable for the foreseeable future, with the explicit caveat that NIST
-or the PCI Council may shorten the deprecation window if a TLS 1.2
-weakness is published. The configuration above:
+The configuration above:

 - Pins TLS 1.2 + TLS 1.3 only (no SSLv3, TLS 1.0, TLS 1.1).
 - Uses only AEAD cipher suites with forward secrecy (ECDHE-* with GCM or
 ChaCha20-Poly1305).
-- Re-encrypts to TLS 1.3 on the proxy-to-certctl hop.
+- Re-encrypts to TLS 1.3 on the proxy-to-certctl hop so the certctl
+listener never speaks anything below 1.3.

-This is PCI-DSS Req 4 v4.0 compliant. Auditors looking for the
-attestation should be pointed at this section + the proxy's TLS config.
+That is the strongest posture currently achievable while still allowing
+the legacy clients to enroll. Reviewers looking for the attestation
+should be pointed at this section + the proxy's TLS config.

 ## What this runbook does NOT cover

@@ -197,14 +194,11 @@ attestation should be pointed at this section + the proxy's TLS config.

 ## When TLS 1.2 itself sunsets

-PCI-DSS, NIST, and major browsers will eventually deprecate TLS 1.2.
-When that happens, this runbook becomes obsolete; the only path forward
-will be to replace the legacy clients. Subscribe to RSS feeds at the
-following sources to catch the deprecation announcement before it
-becomes a compliance failure:
-
-- https://www.pcisecuritystandards.org/news_events/
-- https://nvlpubs.nist.gov/nistpubs/SpecialPublications/ (SP 800-52 revisions)
+Major browsers and OS vendors will eventually deprecate TLS 1.2. When
+that happens, this runbook becomes obsolete; the only path forward
+will be to replace the legacy clients. Watch the IETF TLS working
+group, the major browser vendors' announcement channels, and your
+own embedded-device vendors for deprecation notices.

 ## Related docs

@@ -9,10 +9,10 @@
 > if a procedure here doesn't work as documented, that's a bug in
 > docs (file an issue).

-This runbook is the SOC 2 / PCI procurement-team deliverable: it tells
-auditors and on-call operators what to do when a piece of certctl's
-state corrupts, when a CA key needs rotation, or when Postgres needs
-a point-in-time restore. Read it once when you set up certctl; print
+This runbook is the on-call deliverable: it tells reviewers and
+on-call operators what to do when a piece of certctl's state
+corrupts, when a CA key needs rotation, or when Postgres needs a
+point-in-time restore. Read it once when you set up certctl; print
 the [DR checklist](#dr-checklist) and pin it near your on-call rotation.

 ## Contents
@@ -57,7 +57,7 @@ without operator action. The fail-safes in the codebase:
 These fail-safes mean most of this runbook is "delete the corrupt
 row + wait for the next tick" rather than "restore from backup +
 manually re-issue." The runbook documents the full procedures
-anyway because compliance auditors need to see them written down.
+anyway because reviewers need to see them written down.

 ## CRL cache recovery

@@ -288,7 +288,7 @@ backups. Without them, a restored DB is unusable.
|
||||
## Trust-bundle reload semantics
|
||||
|
||||
This section codifies the fail-safe behavior that's already in code,
|
||||
for compliance auditors who need to see the procedure documented.
|
||||
for reviewers who need to see the procedure documented.

**Pattern:** every trust-bundle holder (`internal/trustanchor.Holder`,
used by SCEP/Intune dispatcher + EST mTLS sibling route) implements

@@ -495,7 +495,7 @@ Short-lived certificates (those with profile TTL < 1 hour) return "good" from OC

#### Bulk Revocation

For compliance events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.
For incident-response events requiring fleet-wide revocation (key compromise, CA distrust, mass decommission), certctl supports bulk revocation by filter criteria. The `POST /api/v1/certificates/bulk-revoke` endpoint accepts filter parameters (profile_id, owner_id, agent_id, issuer_id) and creates individual revocation jobs for each matching certificate. Bulk revocation reuses the same 7-step single-cert flow for each certificate — no new issuer notification or audit mechanics. The operation is idempotent: revoking an already-revoked certificate is a no-op. Partial failures are tolerated — if one certificate fails to revoke (e.g., issuer unavailable), the operation continues for remaining certs and returns a summary. A single `bulk_revocation_initiated` audit event logs the operation with filter criteria, operator actor, and summary (total requested, succeeded, failed counts). Audit events for individual certificate revocations record the operator identity separately. The GUI bulk revoke button on the certificates list filters by visible selections and displays an affected-cert count modal before confirmation.

### 4. Automatic Renewal

@@ -1264,7 +1264,7 @@ flowchart TB
- **Claims it** via `POST /discovered-certificates/{id}/claim` — links to existing managed cert or creates new enrollment
- **Dismisses it** via `POST /discovered-certificates/{id}/dismiss` — removes from triage, marked as "Dismissed"
9. **Status tracking** — `discovery_cert_claimed` and `discovery_cert_dismissed` events audit the operator's decision
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for compliance reporting)
10. **Summary** — `GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for inventory reporting)

This data flow is pull-based and non-blocking. Agents discover at their own pace; the server stores results for later review. There's no pressure to claim or dismiss; operators can leave certificates in "Unmanaged" status indefinitely.

@@ -1328,11 +1328,10 @@ Captured baseline numbers are committed in `deploy/test/loadtest/README.md` once

## What's Next

- [Quick Start](quickstart.md) — Get certctl running locally
- [Advanced Demo](demo-advanced.md) — Issue a certificate end-to-end
- [Connector Guide](connectors.md) — Build custom connectors
- [Compliance Mapping](compliance.md) — SOC 2, PCI-DSS 4.0, and NIST SP 800-57 alignment
- [Quick Start](../getting-started/quickstart.md) — Get certctl running locally
- [Advanced Demo](../getting-started/advanced-demo.md) — Issue a certificate end-to-end
- [Connector Guide](connectors/index.md) — Build custom connectors
- [MCP Server Guide](mcp.md) — AI-native access to the API
- [OpenAPI Spec](openapi.md) — Full API reference and SDK generation
- [Testing Guide](testing-guide.md) — Test procedures and release sign-off
- [Test Environment](test-env.md) — Docker Compose test environment setup
- [API Reference](api.md) — OpenAPI 3.1 spec and SDK generation
- [QA Test Suite](../contributor/qa-test-suite.md) — Test procedures and release sign-off
- [Test Environment](../contributor/test-environment.md) — Docker Compose test environment setup

@@ -225,7 +225,7 @@ camelCase form (`keyCompromise`, `cACompromise`,
`aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE
variants. An unknown reason returns an error rather than silently
demoting to `unspecified` — operators rely on the reason for
compliance reporting (PCI-DSS §3.6, HIPAA §164.312).
audit reporting.

## Related docs

@@ -190,7 +190,7 @@ Paste into security review:
previous cert; both outcomes are surfaced via Prometheus.
- The minimum IAM policy is 5 actions on
`arn:aws:acm:*:*:certificate/*`; CloudTrail captures every
API call for compliance audits.
API call for audit.

## ValidateOnly contract

@@ -179,7 +179,7 @@ Paste into security review:
snapshotted previous version's bytes; both outcomes are
surfaced via Prometheus.
- The minimum RBAC role is 3 data-plane actions; Activity Log
captures every API call for compliance audits.
captures every API call for audit.

## ValidateOnly contract

@@ -23,7 +23,7 @@ Use the GlobalSign Atlas HVCA connector when:

- You're a GlobalSign Atlas customer issuing high volumes of
publicly trusted certificates (the "HV" in HVCA).
- You want region-pinned issuance for compliance or latency
- You want region-pinned issuance for regulatory or latency
reasons (EMEA / APAC / Americas regional endpoints).
- You're prepared to manage both mTLS client certs AND
API key/secret credentials in tandem.

@@ -202,7 +202,7 @@ The Local CA issuer signs certificates using Go's `crypto/x509` library. It supp

**Sub-CA mode:** Loads a CA certificate and private key from disk (`CERTCTL_CA_CERT_PATH` + `CERTCTL_CA_KEY_PATH`). The CA cert is signed by an upstream CA (e.g., ADCS), so all issued certificates chain to the enterprise root trust hierarchy. Clients that already trust the enterprise root automatically trust certctl-issued certs. Supports RSA, ECDSA, and PKCS#8 key formats. If the paths are not set, falls back to self-signed mode. The loaded certificate must have `IsCA=true` and `KeyUsageCertSign`.

**Tree mode (Rank 8 — multi-level CA hierarchy):** When `Issuer.HierarchyMode = "tree"` is set on the issuer row, the local connector reads the active CA hierarchy from the `intermediate_cas` table and assembles `IssuanceResult.ChainPEM` by walking the `parent_ca_id` ancestry from the issuing leaf CA up to the root. Tree mode is operator-managed via the admin-gated `/api/v1/issuers/{id}/intermediates` and `/api/v1/intermediates/{id}` endpoints (`POST` to create / sign children, `GET` to list / inspect, `POST .../retire` to two-phase retire). The signing path is shared with single-mode (cert is signed via `c.caCert` + `c.caSigner` from the on-disk issuing CA cert+key); only the chain bytes differ. RFC 5280 §3.2 (self-signed root validation), §4.2.1.9 (path-length tightening), and §4.2.1.10 (NameConstraints subset semantics) are enforced at the service layer fail-closed. The default is `single`, byte-identical to the pre-Rank-8 historical flow. See `docs/intermediate-ca-hierarchy.md` for the operator runbook covering 4-level FedRAMP boundary CA, 3-level financial-services policy CA, 2-level internal-PKI patterns + the migration runbook for flipping a single-mode issuer to tree.
**Tree mode (Rank 8 — multi-level CA hierarchy):** When `Issuer.HierarchyMode = "tree"` is set on the issuer row, the local connector reads the active CA hierarchy from the `intermediate_cas` table and assembles `IssuanceResult.ChainPEM` by walking the `parent_ca_id` ancestry from the issuing leaf CA up to the root. Tree mode is operator-managed via the admin-gated `/api/v1/issuers/{id}/intermediates` and `/api/v1/intermediates/{id}` endpoints (`POST` to create / sign children, `GET` to list / inspect, `POST .../retire` to two-phase retire). The signing path is shared with single-mode (cert is signed via `c.caCert` + `c.caSigner` from the on-disk issuing CA cert+key); only the chain bytes differ. RFC 5280 §3.2 (self-signed root validation), §4.2.1.9 (path-length tightening), and §4.2.1.10 (NameConstraints subset semantics) are enforced at the service layer fail-closed. The default is `single`, byte-identical to the pre-Rank-8 historical flow. See `docs/reference/intermediate-ca-hierarchy.md` for the operator runbook covering common 4-level boundary, 3-level policy, and 2-level internal-PKI patterns + the migration runbook for flipping a single-mode issuer to tree.
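
The ancestry walk that produces `ChainPEM` can be sketched as follows. `IntermediateCA` and `buildChainPEM` are hypothetical stand-ins for `intermediate_cas` rows and the connector's chain assembly, the PEM strings are placeholders, and including the root at the end of the chain is an assumption of this sketch. The walk fails closed on a missing parent or a parent cycle instead of returning a partial chain.

```go
package main

import (
	"fmt"
	"strings"
)

// IntermediateCA is a hypothetical in-memory view of one intermediate_cas row.
type IntermediateCA struct {
	ID         string
	ParentCAID string // empty for the root
	CertPEM    string
}

// buildChainPEM follows parent links from the issuing CA to the root and
// concatenates each certificate's PEM, issuing CA first and root last.
func buildChainPEM(byID map[string]IntermediateCA, issuingID string) (string, error) {
	var b strings.Builder
	seen := map[string]bool{}
	for id := issuingID; id != ""; {
		if seen[id] {
			return "", fmt.Errorf("cycle in CA hierarchy at %q", id)
		}
		seen[id] = true
		ca, ok := byID[id]
		if !ok {
			return "", fmt.Errorf("CA %q not found", id)
		}
		b.WriteString(ca.CertPEM)
		id = ca.ParentCAID
	}
	return b.String(), nil
}

func main() {
	hier := map[string]IntermediateCA{
		"root":    {ID: "root", CertPEM: "ROOT\n"},
		"policy":  {ID: "policy", ParentCAID: "root", CertPEM: "POLICY\n"},
		"issuing": {ID: "issuing", ParentCAID: "policy", CertPEM: "ISSUING\n"},
	}
	chain, err := buildChainPEM(hier, "issuing")
	if err != nil {
		panic(err)
	}
	fmt.Print(chain)
}
```

Running it prints `ISSUING`, `POLICY`, and `ROOT` on separate lines, issuing CA first.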

**CRL and OCSP support (M15b):** The Local CA supports DER-encoded X.509 CRL generation served unauthenticated at `GET /.well-known/pki/crl/{issuer_id}` (RFC 5280 §5, RFC 8615, `Content-Type: application/pkix-crl`) with 24-hour validity. An embedded OCSP responder at `GET /.well-known/pki/ocsp/{issuer_id}/{serial}` (RFC 6960, `Content-Type: application/ocsp-response`) returns signed OCSP responses for issued certificates (good/revoked/unknown status). Both endpoints are reachable by relying parties with no certctl API credentials, which is how standard TLS clients, browsers, and hardware appliances consume these resources. Certificates with profile TTL < 1 hour automatically skip CRL/OCSP — expiry is treated as sufficient revocation for short-lived credentials.

@@ -314,7 +314,7 @@ The connector is registered in the issuer registry under `iss-acme-staging` and

The cert version must exist in the local store: this means the cert was issued through certctl, not imported. If `GetVersionBySerial` returns `sql.ErrNoRows`, the connector returns an actionable error pointing at the local-store requirement. Revoke-by-serial is therefore only available for ACME certs that certctl issued.

Reason codes follow RFC 5280 §5.3.1: nil reason maps to `unspecified` (0), and the connector accepts the canonical camelCase form (`keyCompromise`, `cACompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE variants. An unknown reason returns an error rather than silently demoting to `unspecified` — operators rely on the reason for compliance reporting (PCI-DSS §3.6, HIPAA §164.312).
Reason codes follow RFC 5280 §5.3.1: nil reason maps to `unspecified` (0), and the connector accepts the canonical camelCase form (`keyCompromise`, `cACompromise`, `affiliationChanged`, `superseded`, `cessationOfOperation`, `certificateHold`, `removeFromCRL`, `privilegeWithdrawn`, `aACompromise`) plus underscore_lower and ALL_CAPS_UNDERSCORE variants. An unknown reason returns an error rather than silently demoting to `unspecified` — operators rely on the reason for audit reporting.
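
One way to implement the lenient-but-strict parsing described above is to fold case and strip underscores before a table lookup. `parseReason` is a hypothetical helper, not the connector's API, and its normalization is slightly more permissive than the three listed spellings (it also accepts, for example, `Key_Compromise`). Codes follow RFC 5280 §5.3.1, where value 7 is unused; accepting the literal string `unspecified` is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// RFC 5280 §5.3.1 CRLReason values, keyed by lowercase name without underscores.
// Value 7 is unused in the RFC.
var reasonCodes = map[string]int{
	"unspecified":          0, // accepting the literal string is an assumption
	"keycompromise":        1,
	"cacompromise":         2,
	"affiliationchanged":   3,
	"superseded":           4,
	"cessationofoperation": 5,
	"certificatehold":      6,
	"removefromcrl":        8,
	"privilegewithdrawn":   9,
	"aacompromise":         10,
}

// parseReason folds camelCase, underscore_lower, and ALL_CAPS_UNDERSCORE
// spellings onto one key. A nil reason means unspecified (0); an unknown
// reason is an error, never a silent fallback.
func parseReason(reason *string) (int, error) {
	if reason == nil {
		return 0, nil
	}
	key := strings.ToLower(strings.ReplaceAll(*reason, "_", ""))
	code, ok := reasonCodes[key]
	if !ok {
		return 0, fmt.Errorf("unknown revocation reason %q", *reason)
	}
	return code, nil
}

func main() {
	for _, r := range []string{"keyCompromise", "key_compromise", "KEY_COMPROMISE", "aACompromise", "stolen"} {
		code, err := parseReason(&r)
		fmt.Println(r, code, err)
	}
}
```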

Audit reference: `cowork/issuer-coverage-audit-2026-05-01/RESULTS.md` Top-10 fix #7.

@@ -398,7 +398,7 @@ certctl's OpenSSL adapter `exec`s an operator-supplied script for every certific

**When you should NOT use the OpenSSL adapter:**

- Compliance environments (PCI-DSS Level 1, FedRAMP High, HIPAA-regulated PHI handling) where shell-out attack surfaces are formally disallowed by your security policy.
- Regulated environments where shell-out attack surfaces are formally disallowed by your security policy.
- Multi-tenant certctl-server deployments where tenant-A's script can affect tenant-B's certificates.
- Environments without operator review of every script line — trust-on-first-use is the wrong posture for a shell-out.
- For these cases, switch to a Go-native issuer adapter (Vault, DigiCert, Sectigo, ACME, AWSACMPCA, GoogleCAS, EJBCA, Entrust, GlobalSign, step-ca) or commission a custom Go-native adapter for your CA (the issuer connector interface in `internal/connector/issuer/interface.go` is small — `IssueCertificate` + `RevokeCertificate` + `GetCACertPEM` + a few stubs).
@@ -478,7 +478,7 @@ See [`legacy-est-scep.md`](../protocols/scep-server.md#scep-mtls-sibling-route-p

#### Microsoft Intune Certificate Connector dispatcher

When a profile has `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true`, certctl validates the Microsoft Intune Certificate Connector's signed-challenge JWS natively as a drop-in NDES replacement (the Intune Connector documents itself as RFC 8894-compliant and works against any RFC 8894 SCEP server). The dispatcher walks parse → JWS signature verify (RS256 + ES256, alg=none rejected) → version dispatch → time bounds with ±tolerance → audience pin → CSR ↔ claim binding → replay cache → per-device rate limit → optional V3-Pro compliance hook. The trust anchor file is reloaded on `SIGHUP` (operator rotates the on-disk PEM, then `kill -HUP <certctl-pid>`); a parse failure during reload keeps the OLD pool so a half-rotation doesn't take Intune down.
When a profile has `CERTCTL_SCEP_PROFILE_<NAME>_INTUNE_ENABLED=true`, certctl validates the Microsoft Intune Certificate Connector's signed-challenge JWS natively as a drop-in NDES replacement (the Intune Connector documents itself as RFC 8894-conformant and works against any RFC 8894 SCEP server). The dispatcher walks parse → JWS signature verify (RS256 + ES256, alg=none rejected) → version dispatch → time bounds with ±tolerance → audience pin → CSR ↔ claim binding → replay cache → per-device rate limit → optional V3-Pro device-state hook. The trust anchor file is reloaded on `SIGHUP` (operator rotates the on-disk PEM, then `kill -HUP <certctl-pid>`); a parse failure during reload keeps the OLD pool so a half-rotation doesn't take Intune down.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
@@ -493,7 +493,7 @@ See [`scep-intune.md`](../protocols/scep-intune.md) for the full deployment guid

#### SCEP probe in network scanner

The Network Scans GUI surface includes a one-click "Probe SCEP" form that runs a capability + posture check against any reachable SCEP server URL — `GetCACaps` + `GetCACert` (NEVER `PKCSReq`) so the probe is read-only and safe to run against production endpoints. Result fields surface advertised caps (POSTPKIOperation, SHA-256, SHA-512, AES, SCEPStandard, Renewal), CA cert subject + issuer + algorithm + days-to-expiry + chain length, and a probe duration. Results persist to `scep_probe_results` (migration `000021`) and the probe history is paginated under `GET /api/v1/network-scan/scep-probes`. Useful for pre-migration assessment ("what does the existing NDES advertise?") and compliance-posture audits.
The Network Scans GUI surface includes a one-click "Probe SCEP" form that runs a capability + posture check against any reachable SCEP server URL — `GetCACaps` + `GetCACert` (NEVER `PKCSReq`) so the probe is read-only and safe to run against production endpoints. Result fields surface advertised caps (POSTPKIOperation, SHA-256, SHA-512, AES, SCEPStandard, Renewal), CA cert subject + issuer + algorithm + days-to-expiry + chain length, and a probe duration. Results persist to `scep_probe_results` (migration `000021`) and the probe history is paginated under `GET /api/v1/network-scan/scep-probes`. Useful for pre-migration assessment ("what does the existing NDES advertise?") and posture review.
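
The read-only `GetCACaps` half of the probe can be sketched with the Go standard library. RFC 8894 defines `GetCACaps` as an HTTP GET carrying `operation=GetCACaps`, answered with a plain-text list of capability keywords, one per line. `probeCACaps`, `parseCACaps`, the 64 KiB read cap, and the sample body are assumptions, and the `GetCACert` half is omitted. Pass an `*http.Client` with a timeout so an unresponsive endpoint cannot stall the probe.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// parseCACaps splits a GetCACaps body into capability keywords, accepting
// LF or CRLF line endings and skipping blank lines.
func parseCACaps(body string) []string {
	var caps []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			caps = append(caps, line)
		}
	}
	return caps
}

// probeCACaps issues the read-only GetCACaps request against a SCEP endpoint.
func probeCACaps(client *http.Client, scepURL string) ([]string, error) {
	u, err := url.Parse(scepURL)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("operation", "GetCACaps")
	u.RawQuery = q.Encode()

	resp, err := client.Get(u.String())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GetCACaps: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 64<<10)) // caps lists are tiny
	if err != nil {
		return nil, err
	}
	return parseCACaps(string(body)), nil
}

func main() {
	// Canned body shaped like a GetCACaps response; call probeCACaps for a live endpoint.
	sample := "POSTPKIOperation\r\nSHA-256\r\nAES\r\nSCEPStandard\r\n"
	fmt.Println(parseCACaps(sample))
}
```

Running it prints `[POSTPKIOperation SHA-256 AES SCEPStandard]`.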

| Endpoint | Auth | Description |
|----------|------|-------------|
@@ -1325,7 +1325,7 @@ certctl's SSH connector dials each target with `HostKeyCallback: ssh.InsecureIgn
**When you should NOT use the SSH connector:**

- Deploying to **unknown / dynamic / multi-tenant** hosts where the IP-to-hostname binding isn't operator-controlled.
- Environments with strict **regulatory MITM-resistance** requirements (PCI-DSS Level 1, FedRAMP High, etc.) — the inline-comment "out of scope" framing doesn't satisfy compliance auditors who want documented host-key verification at the connector level.
- Environments with strict **regulatory MITM-resistance** requirements where the inline-comment "out of scope" framing doesn't satisfy reviewers who want documented host-key verification at the connector level.
- For these cases, switch to a different connector (Kubernetes Secrets, WinCertStore, F5 with iControl REST under operator-managed cert pinning) **OR** layer a custom `SSHClient` with full `known_hosts` validation per the mitigations above.

**V3-Pro forward path:**
@@ -1546,7 +1546,7 @@ The ARN updates in place across renewals (ACM `ImportCertificate` is upsert-styl
- The cert key is held only in agent memory during the import call; never written to disk.
- Every imported ACM cert is tagged with `certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>` for forensic traceability.
- Failed imports trigger automatic rollback to the snapshotted previous cert; both outcomes are surfaced via Prometheus.
- The minimum IAM policy is 5 actions on `arn:aws:acm:*:*:certificate/*`; CloudTrail captures every API call for compliance audits.
- The minimum IAM policy is 5 actions on `arn:aws:acm:*:*:certificate/*`; CloudTrail captures every API call for audit.

**ValidateOnly contract.** ACM has no dry-run API for `ImportCertificate`; `ValidateOnly` returns `target.ErrValidateOnlyNotSupported` per the deploy-hardening I Phase 3 sentinel contract. Operators preview deploys via `ValidateConfig` + `aws acm describe-certificate --certificate-arn <arn>` against the current ARN.

@@ -1628,7 +1628,7 @@ Application Gateway / Front Door reference the cert by KID URI; certctl rotates
- The cert key is held only in agent memory during the PFX wrap + import call; never written to disk.
- Every imported Key Vault cert is tagged with `certctl-managed-by=certctl` + `certctl-certificate-id=<mc-id>` for forensic traceability.
- Failed imports trigger automatic rollback by re-importing the snapshotted previous version's bytes; both outcomes are surfaced via Prometheus.
- The minimum RBAC role is 3 data-plane actions; Activity Log captures every API call for compliance audits.
- The minimum RBAC role is 3 data-plane actions; Activity Log captures every API call for audit.

**ValidateOnly contract.** Key Vault has no dry-run API; `ValidateOnly` returns `target.ErrValidateOnlyNotSupported`. Operators preview deploys via `ValidateConfig` + `az keyvault certificate show --vault-name <name> --name <cert>`.

@@ -83,10 +83,9 @@ enforced at the service layer fail-closed. The default is
`single`, byte-identical to the pre-Rank-8 historical flow.

See [intermediate-ca-hierarchy.md](../intermediate-ca-hierarchy.md)
for the operator runbook covering 4-level FedRAMP boundary CA,
3-level financial-services policy CA, 2-level internal-PKI
patterns, and the migration runbook for flipping a single-mode
issuer to tree.
for the operator runbook covering 4-level boundary, 3-level policy,
and 2-level internal-PKI patterns, and the migration runbook for
flipping a single-mode issuer to tree.

## Configuration

@@ -40,9 +40,8 @@ Look elsewhere when:
Sectigo, ACME, AWS ACM PCA, Google CAS, EJBCA, Entrust,
GlobalSign, step-ca). Use the native adapter — narrower attack
surface, no shell-out exposure.
- You're in a compliance environment (PCI-DSS Level 1, FedRAMP
High, HIPAA-regulated PHI handling) where shell-out attack
surfaces are formally disallowed.
- You're in a regulated environment where shell-out attack
surfaces are formally disallowed by your security policy.
- You're running multi-tenant certctl-server where tenant-A's
script can affect tenant-B's certificates.

@@ -39,10 +39,9 @@ Look elsewhere when:
(`InsecureIgnoreHostKey`); MITM resistance requires the
mitigations below.
- Your environment has strict regulatory MITM-resistance
requirements (PCI-DSS Level 1, FedRAMP High). The inline-comment
"out of scope" framing on host-key acceptance doesn't satisfy
auditors who want documented host-key verification at the
connector level.
requirements. The inline-comment "out of scope" framing on
host-key acceptance doesn't satisfy reviewers who want
documented host-key verification at the connector level.

## Configuration

@@ -298,8 +298,8 @@ Out of scope for the V2-free deploy-hardening I bundle:
- **Multi-region deployment coordination** — orchestration of N
data-center deploys with operator approval gates per stage.
- **Cert-pinning verification against mobile-app pin manifests**.
- **SOC 2 evidence-report generator** — auto-export of the
deploy audit trail in the format SOC 2 auditors expect.
- **Audit-evidence report generator** — auto-export of the
deploy audit trail in a reviewer-friendly format.
- **Customer-paid validation matrices** — vendor-version certified
quirks (e.g. "tested on F5 v15.1 + v17.0 + v17.5"). See
`cowork/deploy-hardening-ii-prompt.md` for the per-vendor

@@ -10,10 +10,10 @@ The default `single`-mode flow (one operator-supplied sub-CA loaded
from disk at boot) is unchanged and will keep working byte-for-byte
forever. This page is for operators who need a real CA tree:

- FedRAMP boundary-CA deployments where the regulator requires
separation of policy and issuing authorities.
- Financial-services policy-CA deployments (one root, one policy CA
per business unit, one issuing CA per environment).
- Boundary-CA deployments where you want separation of policy and
issuing authorities.
- Policy-CA deployments (one root, one policy CA per business unit,
one issuing CA per environment).
- OT / industrial control networks where the air-gapped root signs
online sub-CAs that go in and out of service on a rotation.

@@ -74,12 +74,12 @@ the children first.

## Common deployment patterns

### Pattern A — 4-level FedRAMP boundary CA
### Pattern A — 4-level boundary CA

```mermaid
flowchart TD
Root["Acme Root CA<br/>path_len=3<br/>offline air-gapped"]
Policy["Acme Policy CA<br/>path_len=2<br/>FedRAMP-Moderate boundary"]
Policy["Acme Policy CA<br/>path_len=2<br/>boundary"]
IssA["Acme Issuing A<br/>path_len=0<br/>prod workload leaves"]
IssB["Acme Issuing B<br/>path_len=0<br/>ephemeral pod identity"]
Root --> Policy --> IssA --> IssB

@@ -9,7 +9,7 @@
> `internal/service/est*_test.go`, and (for the libest interop layer)
> `deploy/test/est_e2e_test.go` under `//go:build integration`. The
> bundle is **V2-free**; per-tenant CA isolation, Conditional-Access
> compliance gating, and EST cert-bound usage analytics are documented
> device-state gating, and EST cert-bound usage analytics are documented
> as V3-Pro deferrals in [V3-Pro deferrals](#v3-pro-deferrals).

## Contents
@@ -710,10 +710,10 @@ These capabilities are deferred to V3-Pro (paid tier). They're not
oversights — they're the natural follow-on bundles after v2.X.0 GA:

- **Conditional Access / device-posture gating.** The per-profile
ESTService exposes a nil-default compliance-hook seam (mirrors the
SCEP/Intune `ComplianceCheck` pattern). V3-Pro plugs in a
ESTService exposes a nil-default device-state hook seam (mirrors
the SCEP/Intune `DeviceStateCheck` pattern). V3-Pro plugs in a
Microsoft Graph or other posture-check callback before issuance;
non-compliant devices fail with a typed `est_compliance_failed`
failing devices fail with a typed `est_device_state_failed`
reason.
- **Multi-tenant CA isolation.** V2 has one trust anchor pool per
EST profile and one issuer binding. V3-Pro ships per-tenant root

@@ -5,10 +5,10 @@
> **Status (this document):** Phase 11 of the SCEP RFC 8894 + Intune master
> bundle. The behavior described here is shipped on `master` and exercised
> end-to-end by `internal/api/handler/scep_intune_e2e_test.go`. The
> bundle is V2-free (community edition) — Conditional-Access compliance
> gating, native Microsoft Graph integration, and per-tenant trust
> anchors are documented under [Limitations](#limitations) as V3-Pro
> features.
> bundle is V2-free (community edition) — Conditional-Access
> device-state gating, native Microsoft Graph integration, and
> per-tenant trust anchors are documented under
> [Limitations](#limitations) as V3-Pro features.

## TL;DR

@@ -101,9 +101,10 @@ PKIMessage with the documented `pkiStatus`/`failInfo` codes (per RFC
issuing many DIFFERENT valid challenges for the same device. Default
3 enrollments per 24h covers legitimate first-cert + recovery +
post-wipe.
9. **Optional compliance check** — V3-Pro plug-in seam (nil-default
no-op). When set, the gate calls Microsoft Graph's compliance API
and short-circuits non-compliant devices with FAILURE+BadRequest.
9. **Optional device-state check** — V3-Pro plug-in seam
(nil-default no-op). When set, the gate calls Microsoft Graph's
device-compliance API and short-circuits failing devices with
FAILURE+BadRequest.

A request that passes all nine gates flows to
`processEnrollment`, which builds the issuance request, calls the
@@ -245,7 +246,7 @@ common root cause and the operator action.
| `rate_limited` | A specific device hitting `429`-equivalent failures | The device exceeded `INTUNE_PER_DEVICE_RATE_LIMIT_24H` (default 3). If legitimate (post-wipe + recovery + first-cert all in 24h), bump the cap. If suspicious, this is the limiter doing its job — investigate the device. |
| `unknown_version` | Sudden onset of failures across the entire fleet | Microsoft shipped a new Connector version with a `version` claim certctl doesn't understand. Open an issue on the certctl repo with the failing claim payload (anonymized); the parser dispatcher accepts new versions in ~30 LoC. |
| `malformed` | Sporadic, low-volume | Malformed challenge bytes — almost always a network proxy mangling the request body, or the Connector logging itself out mid-handshake. Capture a packet trace; the Connector should re-emit on the next device retry. |
| `compliance_failed` | V3-Pro only | The pluggable compliance check returned non-compliant. The audit-log details carries the reason string from Microsoft Graph. V2 deployments never see this counter tick. |
| `device_state_failed` | V3-Pro only | The pluggable device-state check rejected the device. The audit-log details carries the reason string from Microsoft Graph. V2 deployments never see this counter tick. |

## Operational monitoring (SCEP Administration → Intune Monitoring tab)

@@ -327,10 +328,10 @@ V3-Pro:
directly — the Connector already did that. V3-Pro could ship a
Graph client that pulls device-compliance state in addition to
the challenge claim.
- **Conditional Access compliance gating.** The dispatcher exposes a
nil-default `ComplianceCheck` hook. V3-Pro plugs in a Microsoft
Graph compliance lookup before issuance; non-compliant devices
fail with a typed `compliance_failed` failInfo.
- **Conditional Access device-state gating.** The dispatcher exposes
a nil-default `DeviceStateCheck` hook. V3-Pro plugs in a Microsoft
Graph device-compliance lookup before issuance; failing devices
exit with a typed `device_state_failed` failInfo.
- **Per-tenant trust anchors.** V2 has one trust anchor pool per
SCEP profile; V3-Pro could support per-AAD-tenant anchor scoping
for MSPs running shared certctl deployments across customers.

@@ -3,7 +3,7 @@
> Last reviewed: 2026-05-05

> Deploy-hardening II master bundle deliverable. The procurement-team
> headline doc — SOC 2 / PCI auditors paste this into evidence packs.
> headline doc — reviewers paste this into vendor-evaluation packs.
> Per frozen decision 0.14: a (connector × vendor-version) cell is
> "verified" only when ALL apply: ≥1 happy-path e2e passes against
> the real sidecar; ≥1 specific-quirk test for that version passes;