mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-10 18:40:21 +00:00
docs: Phase 2 mechanical file moves to subdirectory structure
Pure git mv operations; no content edits. Internal links remain pointing
at old paths and will be fixed in Phase 11. Per the Phase 1 audit
recommendations at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
35 files moved across 8 audience-organized subdirectories:
docs/getting-started/ (5):
quickstart.md, concepts.md, examples.md, advanced-demo.md (was
demo-advanced.md), why-certctl.md
docs/reference/ (6):
architecture.md, api.md (was openapi.md), mcp.md,
intermediate-ca-hierarchy.md, deployment-model.md (was
deployment-atomicity.md), vendor-matrix.md (was
deployment-vendor-matrix.md)
docs/reference/protocols/ (6):
acme-server.md, acme-server-threat-model.md, scep-intune.md,
est.md, crl-ocsp.md, async-ca-polling.md (was async-polling.md)
docs/operator/ (4):
security.md, tls.md, database-tls.md, approval-workflow.md
docs/operator/runbooks/ (3):
cloud-targets.md (was runbook-cloud-targets.md), expiry-alerts.md
(was runbook-expiry-alerts.md), disaster-recovery.md
docs/migration/ (3):
from-certbot.md (was migrate-from-certbot.md), from-acmesh.md
(was migrate-from-acmesh.md), cert-manager-coexistence.md (was
certctl-for-cert-manager-users.md)
docs/compliance/ (4):
index.md (was compliance.md), soc2.md (was compliance-soc2.md),
pci-dss.md (was compliance-pci-dss.md), nist-sp-800-57.md (was
compliance-nist.md)
docs/contributor/ (4):
testing-strategy.md, test-environment.md (was test-env.md),
ci-pipeline.md, qa-test-suite.md (was qa-test-guide.md)
Deferred to later Phase 2 sub-phases:
- connectors.md split (Phase 4): docs/connectors.md +
docs/connector-{apache,f5,iis,k8s,nginx}.md still at top level
- testing-guide.md prune (Phase 5): docs/testing-guide.md still
at top level
- features.md disperse (Phase 6): docs/features.md still at top
level
- legacy-est-scep.md split (Phase 7): docs/legacy-est-scep.md
still at top level
- ACME walkthrough re-homing (Phase 8): three
docs/acme-*-walkthrough.md still at top level
- Upgrade docs archive (Phase 3): two docs/upgrade-*.md still
at top level
Cross-reference updates (Phase 11) will happen after all moves and
content edits land. Internal links to docs/* paths are temporarily
broken until that phase completes.
This commit is contained in:
@@ -0,0 +1,231 @@
|
||||
# Intermediate CA hierarchy — operator runbook
|
||||
|
||||
Rank 8 of the 2026-05-03 deep-research deliverable. This page is the
|
||||
canonical reference for operators running certctl as a multi-level
|
||||
internal PKI.
|
||||
|
||||
The default `single`-mode flow (one operator-supplied sub-CA loaded
|
||||
from disk at boot) is unchanged and will keep working byte-for-byte
|
||||
forever. This page is for operators who need a real CA tree:
|
||||
|
||||
- FedRAMP boundary-CA deployments where the regulator requires
|
||||
separation of policy and issuing authorities.
|
||||
- Financial-services policy-CA deployments (one root, one policy CA
|
||||
per business unit, one issuing CA per environment).
|
||||
- OT / industrial control networks where the air-gapped root signs
|
||||
online sub-CAs that go in and out of service on a rotation.
|
||||
|
||||
## Concepts
|
||||
|
||||
`Issuer.HierarchyMode` is a per-issuer column on the `issuers` table.
|
||||
Two values are valid (the database default is `"single"` — back-compat
|
||||
byte-identical for unmigrated rows):
|
||||
|
||||
- `single` — pre-Rank-8 historical flow. The local connector loads a
|
||||
pre-signed CA cert+key from disk via `local.Config.CACertPath` /
|
||||
`local.Config.CAKeyPath`. Existing operators upgrade with no
|
||||
behavior change.
|
||||
- `tree` — the issuer's CAs are managed via the `intermediate_cas`
|
||||
table. Chain assembly walks the `parent_ca_id` foreign key from the
|
||||
issuing leaf CA up to the root and attaches the assembled chain to
|
||||
every `IssuanceResult`.
|
||||
|
||||
Each row in `intermediate_cas` is one CA cert (root, policy, issuing).
|
||||
The lifecycle is `created` → `active` → `retiring` → `retired`. The
|
||||
state column is a closed enum and validates at the service layer; the
|
||||
postgres CHECK constraint enforces it at the database layer too.
|
||||
|
||||
A CA's private key bytes are NEVER persisted on the row. The
|
||||
`key_driver_id` column is a reference (filesystem path / KMS key ID /
|
||||
HSM slot) that the `signer.Driver` resolves at sign time. A SQL
|
||||
injection or a row-leak surface MUST NEVER expose key bytes; only the
|
||||
reference can leak.
|
||||
|
||||
## Lifecycle states
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> created : CreateRoot / CreateChild
|
||||
created --> active : registration completes
|
||||
active --> retiring : Retire(confirm=false)
|
||||
retiring --> retired : Retire(confirm=true)
|
||||
retired --> [*]
|
||||
|
||||
note right of retiring
|
||||
Drain start. CA stops issuing
|
||||
NEW children; existing children
|
||||
keep issuing until they retire.
|
||||
end note
|
||||
|
||||
note right of retired
|
||||
Terminal. Refused if active children
|
||||
remain (ErrCAStillHasActiveChildren
|
||||
→ HTTP 409). OCSP keeps responding
|
||||
for already-issued leaves until expiry.
|
||||
end note
|
||||
```
|
||||
|
||||
Drain-first semantics: a CA in `retiring` state cannot terminalize to
|
||||
`retired` while it still has active children. The service layer
|
||||
returns `ErrCAStillHasActiveChildren`; the API surfaces HTTP 409. Drain
|
||||
the children first.
|
||||
|
||||
## Common deployment patterns
|
||||
|
||||
### Pattern A — 4-level FedRAMP boundary CA
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Root["Acme Root CA<br/>path_len=3<br/>offline air-gapped"]
|
||||
Policy["Acme Policy CA<br/>path_len=2<br/>FedRAMP-Moderate boundary"]
|
||||
IssA["Acme Issuing A<br/>path_len=0<br/>prod workload leaves"]
|
||||
IssB["Acme Issuing B<br/>path_len=0<br/>ephemeral pod identity"]
|
||||
Root --> Policy --> IssA --> IssB
|
||||
```
|
||||
|
||||
Operator workflow:
|
||||
|
||||
1. Mint the root cert+key on the offline workstation. Move the cert
|
||||
PEM (no key) to the online operator workstation.
|
||||
2. `POST /api/v1/issuers/{id}/intermediates` with the empty
|
||||
`parent_ca_id` and `root_cert_pem` + `key_driver_id` populated
|
||||
(the operator pre-positions the root key file at the path the
|
||||
`key_driver_id` points to). The service validates RFC 5280 §3.2
|
||||
self-signed semantics + cross-checks the operator-supplied key
|
||||
matches the cert (rejects mismatched bundles at registration time
|
||||
with `ErrCAKeyMismatch`).
|
||||
3. `POST /api/v1/issuers/{id}/intermediates` with `parent_ca_id`
|
||||
pointing at the root for the Policy CA. The service generates the
|
||||
child key via `signer.Driver.Generate`, signs the child cert via
|
||||
the parent's signer (loaded from the parent's `key_driver_id`),
|
||||
and persists the new row with the next `path_len` value (parent's
|
||||
- 1 if unset). Repeat for each lower level.
|
||||
4. Set `Issuer.HierarchyMode = "tree"` on the issuer row + set the
|
||||
`treeIssuingCAID` connector field to point at the deepest CA
|
||||
(Acme Issuing B in the example above) — issued leaves chain via
|
||||
`AssembleChain` from B up to the root.
|
||||
|
||||
### Pattern B — 3-level financial-services policy CA
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Root["FinCo Root CA<br/>path_len=2"]
|
||||
Pol["FinCo Trading Policy CA<br/>path_len=1<br/>permitted DNS = trading.finco.example"]
|
||||
Iss["FinCo Trading Issuing CA<br/>path_len=0"]
|
||||
Root --> Pol --> Iss
|
||||
```
|
||||
|
||||
Per business-unit name constraints: each policy CA carries a
|
||||
`PermittedDNSDomains` list scoped to the business unit (RFC 5280
|
||||
§4.2.1.10). The service enforces subset semantics — a child policy CA
|
||||
cannot widen the parent's permitted set, and cannot remove an
|
||||
excluded subtree. Operators submit `name_constraints` on the
|
||||
`POST /api/v1/issuers/{id}/intermediates` body.
|
||||
|
||||
### Pattern C — 2-level internal PKI
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
Root["Internal Root CA<br/>path_len=0"]
|
||||
Iss["Internal Issuing CA<br/>path_len=0<br/>issues leaves directly"]
|
||||
Root --> Iss
|
||||
```
|
||||
|
||||
The simplest tree-mode deployment. Roughly equivalent to single mode
|
||||
in terms of operator overhead, but provides one extra layer of
|
||||
indirection so the root key can stay offline while only the issuing
|
||||
CA's key sits on the certctl host.
|
||||
|
||||
## RFC 5280 enforcement
|
||||
|
||||
All enforcement happens at the service layer. The local connector
|
||||
trusts the service's contract; the API layer translates errors to
|
||||
HTTP codes.
|
||||
|
||||
- §3.2 self-signed root validation: `cert.CheckSignatureFrom(cert)` +
|
||||
subject == issuer DN. Rejected with `ErrCANotSelfSigned` →
|
||||
HTTP 400.
|
||||
- §4.2.1.9 path-length tightening: child's `PathLenConstraint` must
|
||||
be strictly less than parent's. Default to `parent - 1` when unset.
|
||||
Rejected with `ErrPathLenExceeded` → HTTP 400.
|
||||
- §4.2.1.10 NameConstraints subset: child's `Permitted` set must be a
|
||||
subset of parent's; child's `Excluded` set must be a superset of
|
||||
parent's. Rejected with `ErrNameConstraintExceeded` → HTTP 400.
|
||||
- §4.1.2.5 validity capping: child's `notAfter` capped to parent's
|
||||
`notAfter` automatically (chain breaks at parent's expiry
|
||||
regardless).
|
||||
|
||||
## Migrating a single-mode issuer to tree mode
|
||||
|
||||
Pre-flight: the load-bearing pin
|
||||
`TestLocal_HierarchyMode_SingleVsTree_ByteIdentical` guarantees that
|
||||
a 1-level tree wired around the same on-disk root cert+key produces
|
||||
byte-identical issuance bundles to single mode. Migration is therefore
|
||||
a no-downtime operation if done carefully:
|
||||
|
||||
1. Register the existing single-mode CA cert as an `intermediate_cas`
|
||||
row via `CreateRoot` (with the existing on-disk key referenced as
|
||||
`key_driver_id`).
|
||||
2. Update the issuer row's `hierarchy_mode` to `"tree"` and set the
|
||||
connector's `SetTreeIssuingCAID` to the new row's ID. Restart the
|
||||
server (no new code path activates until the connector reads the
|
||||
updated mode at boot).
|
||||
3. Issue a test cert. The byte-equivalence pin guarantees the wire
|
||||
bytes match the pre-migration output for a 1-level tree.
|
||||
4. Build out the child CAs via `CreateChild` calls. Update
|
||||
`treeIssuingCAID` to the new leaf CA. Test, then ramp.
|
||||
|
||||
If the pin breaks during migration, abort: roll back the
|
||||
`hierarchy_mode` flip and investigate. The byte-equivalence pin is
|
||||
the canary — if it goes red, deeper bugs lurk.
|
||||
|
||||
## API reference
|
||||
|
||||
All endpoints under `/api/v1/issuers/{id}/intermediates` and
|
||||
`/api/v1/intermediates/{id}` are admin-gated. Non-admin Bearer callers
|
||||
get HTTP 403.
|
||||
|
||||
| Method | Path | Purpose |
|
||||
|--------|------|---------|
|
||||
| POST | `/api/v1/issuers/{id}/intermediates` | Register root OR sign child (body discriminator) |
|
||||
| GET | `/api/v1/issuers/{id}/intermediates` | List flat hierarchy for issuer |
|
||||
| GET | `/api/v1/intermediates/{id}` | Single-row detail |
|
||||
| POST | `/api/v1/intermediates/{id}/retire` | Two-phase retirement |
|
||||
|
||||
See `api/openapi.yaml` for full request/response schemas.
|
||||
|
||||
## Observability
|
||||
|
||||
`IntermediateCAMetrics` ships counters dimensioned by `(issuer_id,
|
||||
kind)`:
|
||||
|
||||
- `create_root` — successful CreateRoot calls.
|
||||
- `create_child` — successful CreateChild calls.
|
||||
- `retire_retiring` — `active → retiring` transitions.
|
||||
- `retire_retired` — `retiring → retired` transitions.
|
||||
|
||||
The Prometheus exposer reads the snapshot via
|
||||
`SnapshotIntermediateCA()` from a single instance constructed in
|
||||
`cmd/server/main.go` (the snapshotter is the single source of truth
|
||||
between the service-side recording path and the metrics-side exposing
|
||||
path).
|
||||
|
||||
The audit table receives one row per CreateRoot / CreateChild /
|
||||
Retire transition, scoped to the actor extracted from the API
|
||||
request's auth context.
|
||||
|
||||
## Known limitations
|
||||
|
||||
The following are tracked in `WORKSPACE-ROADMAP.md` as Rank-8 follow-on
|
||||
work — none are required for the v2.1.0 acquisition gate:
|
||||
|
||||
- HSM-backed roots beyond `signer.FileDriver` (PKCS#11 / cloud KMS
|
||||
drivers).
|
||||
- Automated rotation: scheduled re-issuance of sub-CAs ahead of
|
||||
expiry with parallel-validity windows.
|
||||
- Intra-hierarchy CRL chaining: each non-leaf CA publishes a CRL
|
||||
covering its direct children's revocations.
|
||||
- NameConstraints policy templates: declarative templates an operator
|
||||
can pick from instead of hand-rolling the JSON.
|
||||
- D3 dendrogram visualization on the GUI page (today's render is a
|
||||
recursive `<ul>` nested list).
|
||||
Reference in New Issue
Block a user