mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 14:11:31 +00:00
a485e31f63e81b1f20c053ab7e0fdaa2afee0937
171 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1daae5d709 |
docs(readme): fix demo path command — point at deploy/demo-up.sh wrapper
Operator reproduction (verbatim log captured 2026-05-14):
$ docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
... build succeeds, containers come up ...
dependency failed to start: container certctl-server is unhealthy
$ docker compose ... logs certctl-server | tail -1
certctl-server | Failed to load configuration: phase-2 SEC-H3
fail-closed guard (missing TS): CERTCTL_DEMO_MODE_ACK=true requires
CERTCTL_DEMO_MODE_ACK_TS=<unix-epoch> set within the last 24h —
refuse to start.
Root cause
==========
README.md L95 documented a bare `docker compose ... up` command that
ignores the Phase 2 SEC-H3 fail-closed guard added in
internal/config/config.go::Validate (commit 2026-05-13). The guard
pairs CERTCTL_DEMO_MODE_ACK=true with a required
CERTCTL_DEMO_MODE_ACK_TS=<unix-epoch> that must be within the last
24h, so a forgotten demo deploy doesn't accidentally end up serving
production traffic with auth-type=none.
The demo overlay (deploy/docker-compose.demo.yml) passes the
timestamp through from the shell via
`CERTCTL_DEMO_MODE_ACK_TS: "${CERTCTL_DEMO_MODE_ACK_TS:-}"`. The
README command never exported it, so the server saw an empty value,
the guard refused to boot, the healthcheck never passed, and the
dependent certctl-agent container refused to start.
The deploy/demo-up.sh wrapper (which already exists; it's used by
CI cold-DB smoke and was added in the same SEC-H3 commit chain)
mints `CERTCTL_DEMO_MODE_ACK_TS="$(date +%s)"` before exec'ing
`docker compose` with the same -f flags. Drop-in replacement for
the bare compose invocation.
Fix
===
README.md "Demo path" code block now points at the wrapper script:
./deploy/demo-up.sh -d --build
Plus a one-paragraph explanation of why the wrapper is the supported
entry point and what the SEC-H3 timestamp gate is defending against.
The bare `docker compose ... up` form is documented as failing-closed
so a future operator who tries it understands the error message they
see.
Affected paths
==============
- README.md (the Quick Start "Demo path" block; lines 92-100 before,
93-103 after this change)
Out of scope (tracked separately if needed)
============================================
- The `WARN[0000] ... defaulting to a blank string` lines on docker
compose stdout (POSTGRES_PASSWORD, CERTCTL_API_KEY, etc.) are red
herrings — they fire on the BASE compose's env interpolation but
the demo overlay immediately overrides those with hardcoded
demo-safe values. They're noise; not a footgun. Leaving them
alone — silencing the WARN would require either an .env shim or
setting empty defaults at the base layer, both of which are
worse than the current warn-but-correct behaviour.
- The bare `docker compose -f base.yml up` production path
(README L108) is unchanged. That path requires a real .env and
will fail closed on placeholders — which is the correct
behaviour. The README already documents .env setup for that
path.
|
||
|
|
de8fac24a3 |
docs(readme): fix quickstart $EDITOR portability bug
The production-path quickstart at README.md:103-108 used `$EDITOR
deploy/.env` literally — assumes the operator has $EDITOR exported
in their shell. On a fresh macOS / zsh session (default install,
nothing in .zshrc), $EDITOR is unset and the shell expands the
command to ` deploy/.env` with a leading empty arg, which zsh tries
to execute as a binary:
shankar@macbookpro certctl % $EDITOR deploy/.env
zsh: permission denied: deploy/.env
The escalation reflex makes it worse — `sudo $EDITOR deploy/.env`
expands to `sudo deploy/.env` (sudo strips env by default), which
sudo dispatches as a command lookup against PATH:
sudo: deploy/.env: command not found
Net: a new-user quickstart that fails on the second command of the
production path with two opaque errors back-to-back.
Replace with the POSIX-portable default-fallback form:
"${EDITOR:-nano}" deploy/.env
`nano` is pre-installed on macOS (BSD nano) and every mainstream
Linux distro, so the fallback always resolves. The user's preferred
editor (vim/emacs/code) is still honored if they have $EDITOR set.
Added a parenthetical reminder so the operator who has a strong
editor preference knows they can substitute.
Verified no other phantom-EDITOR sites in README / docs/getting-started
/ docs/operator via:
grep -nE '\$EDITOR\b' README.md docs/getting-started/*.md docs/operator/*.md
|
||
|
|
0161bb201c |
docs: remove internal engineering docs; docs must be tool- or story-relevant
Operator policy: docs in the public repo must help (a) a user
deploying certctl or (b) the product story. Internal engineering
process documentation belongs in cowork/ scratchpads or in git
commit history, not docs/.
Removed (docs/contributor/, 8 files, 2,323 lines):
- release-sign-off.md — internal release-day checklist
- ci-pipeline.md — what runs in CI (internal)
- ci-guards.md — what the guards are (internal)
- testing-strategy.md — internal testing strategy
- qa-test-suite.md — internal QA reference (445 lines)
- qa-prerequisites.md — internal QA setup
- gui-qa-checklist.md — manual GUI QA checklist
- test-environment.md — 1,103-line redundant with
docs/getting-started/quickstart.md +
docs/getting-started/advanced-demo.md
Removed supporting script:
- scripts/qa-doc-seed-count.sh — CI guard for the deleted
qa-test-suite.md seed-data table
Cross-reference cleanup:
- README.md: dropped the Contributor audience row + footer
pointer to docs/contributor/.
- Makefile: dropped `verify-docs` target + qa-stats comment refs.
- .github/workflows/ci.yml: dropped the QA-doc seed-count drift
CI step + dead comment refs.
- docs/reference/cli.md: repointed qa-prerequisites.md → quickstart.md.
- docs/operator/performance-baselines.md: dropped ci-pipeline.md
cross-ref.
- scripts/ci-guards/README.md: dropped the 'Guards explicitly
NOT here' section that referenced the deleted QA-doc guards.
G-3 env-docs-drift guard improvements (a real consequence: deleting
the contributor docs surfaced that some env vars only had a home
there). Refit the guard to the new doc topology:
- Defined-scan widened from `config.go + cmd/*` to all of `cmd/ +
internal/` (production code), excluding `*_test.go` — catches
service-layer env vars like CERTCTL_STEPCA_ROOT_CERT and
CERTCTL_ZEROSSL_EAB_URL that were previously invisible to the
guard.
- Docs-scan widened to include deploy/ENVIRONMENTS.md (the
canonical env-var inventory table — should have been in scope
from day one). Kept narrow to README + docs/ + deploy/helm/ +
ENVIRONMENTS.md to avoid pulling in compose/test fixtures.
- ALLOWED filter now applies to both DOCS_ONLY and CONFIG_ONLY
directions, so dynamic per-profile dispatch surfaces
(CERTCTL_SCEP_PROFILE_<NAME>_*, CERTCTL_EST_PROFILE_<NAME>_*,
CERTCTL_QA_*) don't need static doc entries.
- Added CERTCTL_SCEP_PROFILE_[A-Z_]+ and CERTCTL_EST_PROFILE_[A-Z_]+
to ALLOWED for the same reason.
deploy/ENVIRONMENTS.md: added CERTCTL_ZEROSSL_EAB_URL row — real
operator override (overrides the ZeroSSL EAB-credentials endpoint;
read at internal/connector/issuer/acme/acme.go:372) that was
defined in Go source but never documented. G-3 caught it after the
defined-scan widened.
scripts/ci-guards/S-1-hardcoded-source-counts.sh: removed dead
WORKSPACE-CHANGELOG.md allowlist entry (the file was deleted in
the prior workspace cleanup).
Verified:
All 35 scripts/ci-guards/*.sh green (FAIL=0).
No remaining references to docs/contributor/ or qa-doc-seed-count
in tracked files.
|
||
|
|
47da13e7a1 |
fix(helm): close BUNDLE 3 — Helm chart hardening + enterprise deploy
Bundle 3 closure (2026-05-12 acquisition diligence audit). Closes the
"chart claims production-ready but lying-fields silently break it"
hazard cluster: README install command had wrong key, required secrets
weren't fail-fast, external Postgres rendered the bundled StatefulSet
hostname, container-only security hardening fields landed at pod scope
(silently dropped by K8s API), and three advertised template surfaces
(ServiceMonitor, PodDisruptionBudget, NetworkPolicy) didn't render at
all even when their values.yaml toggles were on.
Source findings closed:
C2 C3 D1 D2 D3 D5 D7 D11 D12 (repo audit)
OPS-L1 OPS-L2 (cowork audit)
Source findings explicitly deferred (tracked in WORKSPACE-ROADMAP.md):
D6 OPS-H1 (backup automation — operator must choose target storage)
D10 (digest pinning of latest `:latest` tags)
OPS-M1 (prometheus/client_golang migration)
OPS-M2 (distributed tracing instrumentation)
Chart truth table (rendered with helm 3.16.3):
-f values.yaml + tls.existingSecret + auth.apiKey + pg.auth.password
→ 12 resources (default mode, no monitoring/PDB/networkpolicy)
+ postgresql.enabled=false + externalDatabase.url=…
→ NO StatefulSet, NO postgres-secret, NO postgres-service (D2)
+ server.tls.certManager.enabled=true
→ +1 Certificate (cert-manager mode)
+ replicas=3 + monitoring.enabled=true + serviceMonitor.enabled=true
+ podDisruptionBudget.enabled=true + networkPolicy.enabled=true
→ +1 ServiceMonitor + 1 PodDisruptionBudget + 1 NetworkPolicy (D5+D11)
tls.existingSecret AND tls.certManager.enabled both set
→ REFUSED with "EXACTLY ONE TLS ownership path" error (D7)
Missing required secrets (apiKey / pg password / external URL)
→ REFUSED at template time with operator-actionable guidance (D1)
Closures by source ID:
C2 — README Helm install example fixed. Was `--set postgresql.password=…`
(does not exist); now `--set postgresql.auth.password=…` matching
the chart key. README install block also wires TLS, mentions
fail-fast at template time, and links the external-Postgres example.
C3 — Kubernetes Secrets connector annotated PREVIEW in values.yaml.
The chart still exposes `kubernetesSecrets.enabled` for the RBAC
preview wiring, but the values block now states clearly that the
production K8s client at internal/connector/target/k8ssecret/
k8ssecret.go::realK8sClient is a stub (verified — go.mod imports
zero k8s.io/client-go packages). Production landing tracked in
WORKSPACE-ROADMAP.md.
D1 — `certctl.requiredSecrets` template helper. Fail-fasts at render
time when (a) server.auth.type=api-key + apiKey empty, (b)
postgresql.enabled=true + pg.auth.password empty, (c)
postgresql.enabled=false + externalDatabase.url + legacy env
CERTCTL_DATABASE_URL all empty. Each branch emits an
operator-actionable diagnostic with the openssl rand command or
values override needed. postgres-secret template additionally
uses Helm's `required` builtin so it can't render with the empty
fallback that pre-Bundle-3 produced ("changeme" literal).
D2 — externalDatabase.url first-class. New top-level values block.
certctl.databaseURL helper now branches on postgresql.enabled:
bundled path uses the helper-emitted in-cluster URL; external
path uses externalDatabase.url verbatim. postgres-secret,
postgres-statefulset, and postgres-service ALL gate on
postgresql.enabled — external mode renders ZERO postgres-*
resources. POSTGRES_PASSWORD env in server-deployment also gates.
D3 — Container-vs-pod security context split. K8s API silently drops
readOnlyRootFilesystem / allowPrivilegeEscalation / capabilities /
privileged when they land at pod scope (`spec.securityContext`);
they only work at container scope (`spec.containers[].securityContext`).
Pre-Bundle-3 all fields sat at pod scope so the chart's documented
"read-only rootfs + drop-all caps" hardening was effectively
unenforced. New certctl.podSecurityContext + containerSecurityContext
helpers split the operator-facing securityContext map by field-name
whitelist so existing values keep working byte-for-byte while
fields render at the K8s-valid scope. Applied to both
server-deployment.yaml and agent-daemonset.yaml (DaemonSet + Deployment
branches).
D5 — Prometheus ServiceMonitor template. New
templates/servicemonitor.yaml. Renders when monitoring.enabled AND
monitoring.serviceMonitor.enabled. Scrapes /api/v1/metrics/prometheus
(rbac-gated on metrics.read — needs bearerTokenSecret with an API
key holding that perm). values.yaml block extended with bearerTokenSecret,
tlsConfig, and relabelings knobs and the operator-facing comment
documenting the auth requirement.
D7 — TLS both-set rejection. certctl.tls.required helper extended.
Pre-Bundle-3 only the NEITHER-set case was caught; setting BOTH
rendered a dangling cert-manager Certificate alongside an
existing-Secret mount, two conflicting TLS sources of truth.
Now refuses with "EXACTLY ONE TLS ownership path" + remediation
steps for both possible operator intents.
D11 — PodDisruptionBudget + NetworkPolicy templates. New
templates/pdb.yaml (renders when podDisruptionBudget.enabled +
server.replicas > 1) + templates/networkpolicy.yaml (renders when
networkPolicy.enabled). PDB uses minAvailable / maxUnavailable
exclusivity per K8s spec. NetworkPolicy default-allows in-namespace
agent → server traffic, kube-DNS egress, and bundled-postgres
egress (when postgresql.enabled), with operator-extensible
extraIngress / extraEgress for CA / OIDC / SMTP egress. Both
default off so existing deploys don't lose network reach
unannounced.
D12 — Database max-conn config wired. Pre-Bundle-3
internal/repository/postgres/db.go::NewDB hard-coded
SetMaxOpenConns(25). config.go loaded CERTCTL_DATABASE_MAX_CONNS,
Validate() enforced the >= 1 floor, values.yaml documented it,
and docs/reference/configuration.md surfaced it — but the pool
ignored every operator setting. New NewDBWithMaxConns threads
the operator value into the pool with maxIdle = maxOpen / 5
(≥ 1) so the historical ratio carries forward. cmd/server/main.go
calls the new constructor; NewDB stays for compat at the default 25.
OPS-L1 — Chart version 0.1.0 → 1.0.0. Chart has shipped through 8 audit
closures since 2026-02 (M-018, U-1, U-2, U-3, H-1, G-1, B1, B2);
pre-1.0 version was implying instability the chart no longer has.
OPS-L2 — External-Postgres path is now properly documented in values.yaml
(externalDatabase block with mode-2 example), README install command
links the existing examples/values-external-db.yaml, and the chart
truth table above proves the external mode renders cleanly.
Receipts:
helm lint deploy/helm/certctl/ # clean
helm template c deploy/helm/certctl/ \
--set server.tls.existingSecret=ci \
--set postgresql.auth.password=p \
--set server.auth.apiKey=k # 12 kinds, default
helm template c deploy/helm/certctl/ \
--set server.tls.existingSecret=ci \
--set postgresql.enabled=false \
--set externalDatabase.url='postgres://u:p@h:5432/db?sslmode=require' \
--set server.auth.apiKey=k # 9 kinds, no postgres-*
helm template c deploy/helm/certctl/ \
--set server.tls.certManager.enabled=true \
--set server.tls.certManager.issuerRef.name=letsencrypt \
--set postgresql.auth.password=p --set server.auth.apiKey=k
# +1 Certificate (cert-manager)
helm template c deploy/helm/certctl/ \
--set server.tls.existingSecret=ci \
--set postgresql.auth.password=p --set server.auth.apiKey=k \
--set server.replicas=3 \
--set monitoring.enabled=true \
--set monitoring.serviceMonitor.enabled=true \
--set podDisruptionBudget.enabled=true \
--set networkPolicy.enabled=true # +ServiceMonitor +PDB +NetworkPolicy
(TLS both-set + missing apiKey + missing pg password + missing extDb URL all REFUSED.)
gofmt -l # clean
go vet ./internal/repository/postgres ./cmd/server # clean
go build ./cmd/server # clean
bash scripts/ci-guards/B3-helm-chart-coherence.sh # clean
Remaining operator warnings (deferred, tracked in WORKSPACE-ROADMAP.md):
- Backup CronJob + restore script (D6 + OPS-H1): operator chooses
target (S3, GCS, Azure Blob, NFS). Sample CronJob yaml may ship
in deploy/helm/examples/ once an operator workstation has run
one full backup-restore cycle.
- Distributed tracing (OPS-M2): otel/* are go.mod indirect deps,
not actively instrumented. Adding spans is a v3 work item.
- Prometheus client_golang migration (OPS-M1): the hand-rolled
/metrics/prometheus exposition format works today; client_golang
migration unlocks histograms + exemplars + native label sets.
Audit-Closes: BUNDLE-3 C2 C3 D1 D2 D3 D5 D7 D11 D12 OPS-L1 OPS-L2
Audit-Defers: D6 D10 OPS-H1 OPS-M1 OPS-M2
|
||
|
|
a849c8b8cf |
fix(security): close BUNDLE 2 — safe first run, demo mode, agent bootstrap
Bundle 2 closure (2026-05-12 acquisition diligence audit). Closes the
"docker compose up == accidental production" hazard: pre-Bundle-2 the
base deploy/docker-compose.yml WAS the demo path (AUTH_TYPE=none +
DEMO_MODE_ACK=true + KEYGEN_MODE=server + DEMO_SEED=true + literal
change-me-... placeholder creds), the README claimed "drop the demo
overlay for a clean install", and ENVIRONMENTS.md table documented
auth-type default as api-key — three contradictory stories layered on
the same compose file.
Source findings closed:
R2 R3 C1 D9 finding-2 S9 (repo audit)
SEC-H2 SEC-M1 SEC-M3 OPS-M3 LOW-5 HIGH-6 (cowork audit)
Compose split (deploy/docker-compose.yml + deploy/docker-compose.demo.yml):
The base now ships production-shaped — no AUTH_TYPE override, no
KEYGEN_MODE override, no DEMO_MODE_ACK, no DEMO_SEED, no literal
placeholder fallbacks. POSTGRES_PASSWORD / CERTCTL_AUTH_SECRET /
CERTCTL_CONFIG_ENCRYPTION_KEY / CERTCTL_API_KEY / CERTCTL_AGENT_ID
must come from deploy/.env (sample template in deploy/.env.example +
root .env.example). The demo overlay carries the full demo posture
(every env var + every placeholder credential) so the
`-f docker-compose.demo.yml` one-flag flip remains a zero-config
populated-dashboard path.
Fail-closed startup guards (internal/config/config.go::Validate):
Three new gates layered on the existing HIGH-12 demo-mode listen-bind
guard. All three exempt CERTCTL_DEMO_MODE_ACK=true so the demo overlay
keeps working:
• HIGH-6: AUTH_SECRET = "change-me-in-production" → refuse
• HIGH-6: CONFIG_ENCRYPTION_KEY = "change-me-32-char..." → refuse
• LOW-5: CORS_ORIGINS contains "*" (CWE-942 + CWE-352) → refuse
Visible DEMO MODE banner (cmd/server/main.go): every boot under
DEMO_MODE_ACK=true now emits a prominent WARN line with a 6-step
production-promotion checklist. The 2026-04-19 incident (a screenshot
run that kept running for three days) drove this; the per-startup
banner makes the posture unmissable in any log scraper.
Agent enrollment doc alignment:
• docs/reference/configuration.md L83: corrected the non-existent
URL `POST /api/v1/agents/register` to the real route
`POST /api/v1/agents`; added the bootstrap-token note and the
install-agent.sh handoff sequence.
• docs/reference/architecture.md L154: replaced "agents register
themselves at first heartbeat" (false — cmd/agent/main.go fail-
fasts when CERTCTL_AGENT_ID is unset) with the actual two-step
operator-driven flow (REST or GUI registration first, returned ID
fed to install-agent.sh second).
Tests + CI guard:
• 9 new TestValidate_Bundle2_* cases in internal/config/config_test.go
covering: placeholder-secret refused + demo-ack exempt; placeholder
encryption-key refused + demo-ack exempt; real key not mistaken for
placeholder; wildcard CORS refused + demo-ack exempt; wildcard mixed
into a concrete allowlist still refused; concrete allowlist accepted.
• scripts/ci-guards/B2-compose-base-no-demo-env.sh: greps the base
compose for any of the demo-mode env vars + placeholder credentials.
Comments stripped before checking so the narrative header in the
base file can still reference the overlay's posture in prose.
Cold-DB CI smoke (.github/workflows/ci.yml::cold-db-compose-smoke):
Switched to layering -f docker-compose.demo.yml on top of the base —
the new production base requires real env vars the smoke doesn't have,
and the smoke's purpose (catch migration-on-cold-DB regressions + the
bootstrap-token mint path) is orthogonal to which auth posture the
boot lands in.
Receipts:
• Current first-run truth table
compose flag → posture
-f docker-compose.yml (production)
→ requires .env;
fail-fasts on
missing AUTH_SECRET
/ CONFIG_ENCRYPTION
_KEY / POSTGRES
_PASSWORD; agent
fail-fasts on
missing AGENT_ID
-f docker-compose.yml -f docker-compose.demo.yml (demo)
→ zero-config;
AUTH_TYPE=none +
DEMO_MODE_ACK=true
+ KEYGEN=server +
DEMO_SEED=true;
boot banner WARN
-f docker-compose.yml -f docker-compose.dev.yml (dev)
→ base + PgAdmin
+ debug logging
-f docker-compose.test.yml (test, standalone)
→ production-shape
posture, real CA
backends
• Verification (PATH=/tmp/go/bin export GO* paths to /tmp):
gofmt -l # clean (no diffs)
go vet ./internal/config ./cmd/server # clean
go test -short -count=1 ./internal/config/... # PASS (cumulative +
all 9 new Bundle 2
cases green)
go test -short -count=1 # PASS (no regression
./internal/connector/target/configcheck in the Bundle 1 -
closure tests)
go build ./cmd/server ./cmd/agent # clean
./cmd/cli ./cmd/mcp-server
bash scripts/ci-guards/B2-compose-base-no-demo-env.sh # clean
bash scripts/ci-guards/H-1-encryption-key-min-length.sh # clean
bash scripts/ci-guards/G-3-env-docs-drift.sh # clean
Remaining operator warnings (not blocking; tracked in CLAUDE.md
"Open decisions"):
• The first `docker compose -f docker-compose.yml up -d` against a
pre-Bundle-2 .env (placeholder values still in place) will now
fail-fast. This is the intended posture but operators upgrading
from v2.0.x via .env-from-old-master need to rotate before
upgrading. The CHANGELOG note for the v2.1.0 release should
call this out alongside Auth Bundle 2's other breaking changes.
Audit-Closes: BUNDLE-2 R2 R3 C1 D9 S9 SEC-H2 SEC-M1 SEC-M3 OPS-M3 LOW-5 HIGH-6
|
||
|
|
d60a0ac297 |
fix(security): close BUNDLE 1 — server+agent connector config validation chain
Bundle 1 closure (2026-05-12 acquisition diligence audit). Closes the
acquisition-blocker chain: target.edit (default r-operator grant per
migrations/000029_rbac.up.sql:196) → arbitrary reload_command stored
without validation → agent createTargetConnector json.Unmarshal-only
→ sh -c on agent host. README's 'shell injection prevention on all
connector scripts' claim is now true at the chain level.
Server-side: new internal/connector/target/configcheck package + a
configcheck.Validate call in target.go::Create + ::Update +
::CreateTarget + ::UpdateTarget (all 4 entry points). Rejects shell
metacharacters in reload_command / validate_command / restart_command
for nginx, apache, haproxy, postfix/dovecot, javakeystore, ssh. Sentinel
errors.Is(err, service.ErrInvalidConnectorConfig) available for handler
400 mapping. Non-shell connector types (F5, IIS, Caddy, Traefik, Envoy,
cloud targets, K8s) are no-ops by design.
Agent-side: defense-in-depth connector.ValidateConfig(ctx, configJSON)
call in cmd/agent/main.go inserted between createTargetConnector and
DeployCertificate. This catches (a) configs pre-dating the server gate,
(b) encrypted-blob tampering, (c) per-connector filesystem invariants
that the server can't check.
F5 (S2 finding): proven docs-vs-code drift, not a security bug. The
applyDefaults function never set Insecure=true; runtime default has
always been Go zero-value (false → TLS verified). Three lying 'default
true' comments in f5/f5.go (lines 30, 45-47, 126) rewritten to match
actual code behavior.
Docs (C4 + C9): README L12 + L68 narrowed — 'any CA / any server' →
'Twelve native CA connectors plus an OpenSSL adapter; fifteen native
deployment-target connectors plus a proxy-agent pattern.' 'Every deploy
goes through atomic-write + ...' narrowed to file-based connectors with
inline link to per-target guarantee matrix. New deployment-model.md §1.6
ships a 15-target × 8-property guarantee table covering atomic write /
owner-perms / SHA-256 idempotency / pre-deploy snapshot / on-failure
rollback / post-deploy TLS verify / Prometheus counters / shell-injection
validation — including the K8s preview honesty marker (CLAIM-H4).
Tests: internal/connector/target/configcheck/configcheck_test.go covers
14 shell-injection payloads (semicolon, pipe, backtick, dollar-paren,
redirect, and-chain, newline, double-quote, escape, dollar-var) × 7
shell-using connectors + benign-command acceptance + non-shell no-op
behavior + empty config + malformed JSON. All pass.
Verification (run from /sessions/gifted-blissful-pasteur/mnt/cowork/certctl):
go fmt ./... # clean (no diffs)
go vet ./... # clean (no findings)
go test -short -count=1 ./internal/... ./cmd/...
# 60+ packages all ok, zero FAIL
Audit-Closes: BUNDLE-1 RT-C1 SEC-M4 CLAIM-M2 CLAIM-L3
Audit-Verifies-False: S2 (F5 'default insecure' was a comment lie, code was always secure)
|
||
|
|
7b3a57dfdf | docs(readme): revert Status block to 4-paragraph form (over-split was too choppy) | ||
|
|
a103ccfe5c | docs(readme): one sentence per blockquote in Status block — full breathing room | ||
|
|
c029875196 |
docs(readme): Status block rewrite — design-partner CTA, paragraph cadence
Earlier versions were either link-soup or so tight they read as
boilerplate. This pass aims for CMO-grade copy:
- Paragraph 1: lede that combines the early-access label with the
design-partner ask — sets the tone in one line.
- Paragraph 2: what's production-quality today, with the RBAC + OIDC
doc links inline (no bold, no link-soup). Names the v2.1.0 layer
on top.
- Paragraph 3: the ask — production deployments wanted, framed
explicitly as 'we can't manufacture this exposure in CI'. Honest
about the federated-identity surface being where the new exposure
lives. Mutual-value framing.
- Paragraph 4: the actionable bit — file issues liberally, with the
why ('how the platform earns the right to drop early-access').
Three inline doc links (RBAC, OIDC runbook index, file-issues).
Same factual content, warmer voice, paragraph cadence with
breathing room between.
|
||
|
|
ed833e80f6 | docs(readme): space out the Status block — three separate blockquotes | ||
|
|
0eb3d0310c |
docs(readme): tighten Status block; add RBAC + OIDC runbook links
Quieter version of the Status block — single blockquote, three short sentences, three inline links (RBAC, OIDC, file-issues). Drops: - The Local-CA / ACME / agent-deployment / CRUD / audit feature pile (those live in the doc table immediately below) - The 6-IdP enumeration (Keycloak / Authentik / Okta / Auth0 / Entra ID / Google Workspace) — operators find that in the OIDC runbook index, now linked inline - The double 'in early-access' phrasing - 'HMAC-signed server-side sessions with __Host- cookies and CSRF rotation; OIDC Back-Channel Logout; Argon2id break-glass admin' — the spec details belong in the auth-threat-model + security docs, not the front-page status Same early-access framing, same issue-link CTA, far more readable. |
||
|
|
46769fc7fa |
docs(readme): audit pass — fix 7 stale/inaccurate claims
Each claim ground-truthed against the live repo, not memory. Numeric drift (claims rotted since they were written): - Screenshot caption 'Catalog with 10 CA types' → 12 (matches internal/connector/issuerfactory/factory.go enumeration). - '33-permission canonical catalogue' → dropped the number. 33 was the base in migration 000029; across all 45 migrations 82 unique perms are seeded (+5 admin / +7 OIDC / +2 break-glass / +33 audit-CRIT-1 / +2 user). 'Fine-grained permission catalogue' is monotonic prose. - 'PostgreSQL 16 backend (35+ tables, idempotent migrations)' → '…backend with idempotent migrations'. Actual table count is 49 across 45 migrations; bare 'idempotent migrations' is drift-proof. - Demo overlay seeds '32 certificates across 10 issuers, 8 agents, 180 days' → '180 days of realistic history across 13 issuers, 8 agents, managed + discovered certs, jobs, deploys, audit, and notification events'. seed_demo.sql actually seeds 14 managed certs + 16 cert versions + 12 discovered, 13 issuers (not 10), 8 agents ✓, 23 INTERVAL '180 days' refs ✓. - 'golangci-lint (11 linters)' → '(govet + staticcheck + contextcheck + unused)'. .golangci.yml lists exactly 4 active linters; 6 others are commented-out 'temporarily disabled' so neither 4 nor 10 explains 11. Broken Helm one-liner (silently no-ops because --set against a nonexistent path doesn't error): - '--set server.apiKey=…' → 'server.auth.apiKey' (deploy/helm/certctl/values.yaml:147 + templates/server- secret.yaml:16). - '--set postgres.password=…' → 'postgresql.password' (top-level key is 'postgresql', not 'postgres'; password sits at postgresql.password per values.yaml:315). Verified accurate (no change): - 12 issuers / 15 targets / 6 notifiers (factory + dir listings). - 7 default roles seeded in migration 000029. - Coverage thresholds (service 70 / handler 75 / crypto 88 / auth packages 85-95) against .github/coverage-thresholds.yml. - All 6 OIDC runbooks present (auth0 / authentik / azure-ad / google-workspace / keycloak / okta). - 4 referenced screenshots all exist on disk. - 8 agents in demo seed, 180 days of history. - RFC 9700 §4.7.1 / 9207 / 8555 / 9773 / 8894 / 9266 / 5280 / 6960 citations match source. - ChromeOS in SCEP description matches source. - install-agent.sh uses uname for OS / arch detection + systemd (Linux) / launchd (macOS). |
||
|
|
12705efe36 | docs(readme): split Status block into two blockquotes for breathing room | ||
|
|
de53847f51 |
docs(readme): quiet the Status block
The previous version crammed 5 bold-emphasized inline links plus inline code into a single paragraph — visually loud and hard to scan. Rewrite as two short paragraphs: - First paragraph: what's production-quality + what's still maturing. No links, em-dash cadence for breathing room. - Second paragraph: v2.1.0 OIDC + sessions + break-glass slice with a single issue-link tail. Drops the bold-link sandwich in favor of plain prose; the doc-nav table directly below handles per-doc routing. Same content, same early-access framing, far less visual noise. |
||
|
|
56e2ea1ad7 |
docs: v2.1.0 release polish — strip internal bundle/phase tags, update status for OIDC ship
README:
- Rewrite Status block: drop the stale 'federated identity not yet
shipped' line; flag v2.1.0 OIDC + sessions + back-channel logout
+ break-glass as early-access; encourage GitHub issues for IdP
rough edges. (A1 framing — keep early-access umbrella, no
SAML/WebAuthn/JIT roadmap teaser.)
- Add OIDC SSO bullet to 'What it does' covering per-IdP runbooks,
group-claim → role mapping, AES-256-GCM client_secret encryption,
JWKS auto-refresh, PKCE-S256, RFC 9700 §4.7.1 pre-login binding,
RFC 9207 iss check, __Host- cookies, CSRF rotation, idle+absolute
expiry, BCL, break-glass admin.
- Update Security paragraph: three auth paths (API keys / OIDC /
break-glass), HMAC-signed sessions, CSRF rotation, RFC OIDC BCL.
- Correct CI coverage thresholds against
.github/coverage-thresholds.yml (service 70%, handler 75%,
crypto 88%, auth packages 85-95%); 'static analysis' replaces
the inflated '11 linters' claim (actual count is 4 active).
Docs B3 sweep — strip operator-facing 'Bundle N' / 'Phase N' tags:
- docs/operator/auth-threat-model.md — rewrite intro; rename 5 H2
sections (API-key + RBAC defenses / OIDC + sessions + break-glass
defenses / OIDC + sessions threat catalogue / Closed federated-
identity threats / Future-work threats); clean ~12 H3/prose hits.
- docs/operator/rbac.md — strip Bundle 1 framing from intro,
scope_id deferral note, MCP tools section, day-0 bootstrap, and
'Where to look next'.
- docs/operator/auth-benchmarks.md — drop 'Phase 14' framing from
title intro, hardware floor caption, result table caption,
methodology, and pre-merge audit section.
- docs/operator/security.md — already cleaned earlier this session
(RBAC / day-0 / approval-bypass / OIDC federation / sessions /
OIDC first-admin / break-glass H3s).
- docs/operator/oidc-runbooks/{index,keycloak,authentik,okta,
azure-ad}.md — strip Auth Bundle 2 framing + Phase 10/3/4
references; replace with feature-name prose.
- docs/operator/legacy-clients-tls-1.2.md — drop Bundle F / M-023
audit-reference framing; keep CWE-326.
- docs/operator/database-tls.md — drop Bundle B / M-018 framing
from intro + Helm section.
- docs/operator/runbooks/disaster-recovery.md — drop 'Production
hardening II Phase 10' status callout.
- docs/migration/oidc-enable.md — retitle 'Enable OIDC SSO';
strip Bundle 1/2 framing from prereqs, troubleshooting, related
docs; update __Host- cookie callout from 'audit MED-14' to
v2.1.0-BREAKING.
- docs/migration/api-keys-to-rbac.md — strip Bundle 1 framing from
intro, migration table, IsAdmin section, and cross-references.
- docs/migration/acme-from-cert-manager.md — strip residual
'Phase 5' tags from cert-manager integration test references.
- docs/reference/configuration.md — retitle Auth section.
- docs/reference/profiles.md — strip Bundle 1 Phase 9 framing
from RequiresApproval section + Related list.
- docs/reference/auth-standards-implemented.md — rewrite intro
(API-key + RBAC + OIDC + sessions + back-channel logout +
break-glass); rename 'Bundle 1 (RBAC) standards covered
separately' H2; clean per-row Phase references.
- docs/README.md — rewrite nav-table entries to drop Bundle 1/2
parentheticals; retitle 'Enable OIDC SSO' migration entry.
No code or test changes; pure operator-facing prose polish for
the v2.1.0 tag.
|
||
|
|
977cdbdf44 |
docs(README): surface Bundle 1 RBAC + signal Bundle 2 federation as roadmap
Pre-fix the README said nothing about role-based access control,
the auditor role, the day-0 bootstrap path, or the four-eyes
approval workflow — all shipped in Bundle 1 (commit
|
||
|
|
ff6bf8f203 |
docs(README): add Status: Early-access disclosure block
Reddit posts and operator-facing copy describe certctl as alpha for production, but the README's marketing-paragraph framing implied a more polished maturity. Dual-positioning erodes credibility because evaluators read both surfaces. Adds a dedicated "Status: Early-access" blockquote between the SC-081v3 paragraph and the existing "Actively maintained, shipping weekly" callout. Calls out the production-quality core (Local CA, ACME, agent deployment, CRUD, audit) versus the still-maturing broader surface (intermediate CA hierarchy, ACME/SCEP/EST servers, network appliances). Encourages lab/dev deployments and welcomes production deployments with the customer-scale caveat. The two consecutive blockquotes (Status + Actively maintained) read as paired signals: the project is early-access AND actively shipping, which is the honest joint position. |
||
|
|
1720e11109 |
docs: fix broken single-file demo invocation in README + qa-prerequisites + ENVIRONMENTS
The README's Quick Start, the qa-prerequisites contributor doc, and the
landing page (separate repo, separate commit) all shipped a copy-paste
command that produces:
service "certctl-server" has neither an image nor a build context
specified: invalid compose project
The bug landed silently with commit
|
||
|
|
e0aaa967c9 |
docs(README): add MCP server bullet to capabilities list
The README's 'What it does' section enumerated 11 capability bullets (issuers / targets / ACME server / SCEP server / EST server / hierarchy / approvals / discovery / revocation / alerts) but had zero mention of the MCP server. The 2026-05-05 CLI/API/MCP ↔ GUI parity audit confirmed 93 MCP tools shipped today (87 in internal/mcp/tools.go + 6 in internal/mcp/tools_est.go) covering the full API surface. That's a real differentiator hidden from anyone landing on the README. Adds a 12th bullet positioning the MCP server with concrete example queries operators can ask their AI client (expiring certs, revoke with key-compromise reason, agent offline check). Frames the architectural facts: separate binary at cmd/mcp-server/, stateless stdio transport, no extra auth surface beyond the existing API key, no extra attack surface. Links to docs/reference/mcp.md for setup details. |
||
|
|
7c5cc57d75 | |||
|
|
d809874fa1 |
docs: retire compliance subtree + sweep framework name-drops from prose
Per operator decision the framework-mapping docs are gone. They
were aspirational (no audit, no certification, no validated
mapping); keeping them around was misleading.
Files deleted (1,883 lines):
- docs/compliance/index.md
- docs/compliance/soc2.md
- docs/compliance/pci-dss.md
- docs/compliance/nist-sp-800-57.md
Hyperlinks removed:
- README.md: 'Auditor / compliance' row in the doc table; the
'(compliance mapping included)' parenthetical in the
positioning paragraph
- docs/README.md: the '## Compliance' section table; the
'Auditor / compliance team' reading-order-by-role row
Prose name-drops swept across 24 files:
- README.md: 'FedRAMP boundary CAs / financial-services policy
CAs' → '4-level boundary CAs / 3-level policy CAs';
'Compliance-grade for PCI-DSS Level 1, FedRAMP Moderate / High,
SOC 2 Type II, HIPAA' → cut entirely
- getting-started/{quickstart,concepts,examples,why-certctl,
advanced-demo}.md: 'compliance' → 'audit' / 'policy';
'PCI-DSS / SOC 2 / NIST SP 800-57' framework lists cut;
''pci': 'true'' tag example → ''environment': 'production''
- migration/cert-manager-coexistence.md: 'compliance rules' →
'policy rules'
- operator/approval-workflow.md: 'Compliance customers (PCI-DSS
Level 1, FedRAMP Moderate / High, SOC 2 Type II, HIPAA)' →
'Operators'; entire 'Compliance control mapping' table
(PCI-DSS §6.4.5 / NIST SP 800-53 SA-15 / SOC 2 Type II CC6.1
/ HIPAA §164.308(a)(4)) deleted; 'compliance contract' →
'two-person-integrity contract'; 'compliance auditors' →
'reviewers'
- operator/legacy-clients-tls-1.2.md: 'PCI-DSS v4.0 Req 4 §2.2.5'
audit-reference → CWE-326 (kept); 'PCI-DSS Req 4 §2.2.5
attestation' section retitled to 'TLS posture summary' and
rewritten without framework framing; 'PCI-DSS, NIST, and
major browsers will eventually deprecate TLS 1.2' →
'Major browsers and OS vendors will eventually deprecate
TLS 1.2'
- operator/database-tls.md: PCI-DSS Req 4 §2.2.5 audit-ref →
CWE-319 only; 'PCI-DSS scope' → 'sensitive data'; PCI-DSS
Req 4 v4.0 prose footing → cut
- operator/runbooks/disaster-recovery.md: 'SOC 2 / PCI
procurement-team deliverable' → 'on-call deliverable';
'compliance auditors' → 'reviewers'
- reference/connectors/{acme,aws-acm,azure-kv,globalsign,
local-ca,openssl,ssh,index}.md: 'compliance reporting
(PCI-DSS §3.6, HIPAA §164.312)' → 'audit reporting';
'Compliance environments (PCI-DSS Level 1, FedRAMP High,
HIPAA)' → 'Regulated environments'; 'compliance audits' →
'audit'; 'FedRAMP boundary CA' pattern names →
'4-level boundary CA' (technically descriptive)
- reference/protocols/est.md: 'compliance-hook seam' →
'device-state hook seam'; 'compliance gating' → 'device-state
gating'; 'est_compliance_failed' → 'est_device_state_failed'
- reference/protocols/scep-intune.md: 'Optional compliance
check' → 'Optional device-state check'; failure-counter
'compliance_failed' → 'device_state_failed'; 'Conditional
Access compliance gating' → 'Conditional Access
device-state gating'
- reference/intermediate-ca-hierarchy.md: 'FedRAMP boundary-CA
deployments where the regulator requires...' →
'Boundary-CA deployments where you want separation of policy
and issuing authorities'; pattern A retitled '4-level FedRAMP
boundary CA' → '4-level boundary CA'
- reference/architecture.md: broken Related-docs link to
compliance.md removed; the rest of that block had stale
pre-Phase-2 paths (quickstart.md, demo-advanced.md,
connectors.md, openapi.md, testing-guide.md, test-env.md) —
retargeted to current locations
- reference/deployment-model.md: 'SOC 2 evidence-report
generator' → 'Audit-evidence report generator'
- reference/vendor-matrix.md: 'SOC 2 / PCI auditors paste this
into evidence packs' → 'reviewers paste this into
vendor-evaluation packs'
- contributor/qa-test-suite.md: 'compliance exist' coverage
description cut; 'Compliance (PCI / SOC2 / HIPAA-relevant)'
risk-class label → 'Audit-relevant'
What was kept:
- CWE references (legitimate technical pointers)
- Microsoft API/feature names that happen to use 'compliance'
literally ('Microsoft Graph compliance API',
'device-compliance validators' — these are MS product names,
not framework name-drops)
- 'NIST PQC' on the landing page (Post-Quantum Cryptography is
the actual NIST standard family, not a compliance framework)
Verified: zero hyperlinks into docs/compliance/ remain. All 24
ci-guards/*.sh pass locally. qa-doc-seed-count.sh clean.
Net diff: 26 files / -1,883 deletions in compliance/ + -32 net
across the prose sweep.
Companion edits in cowork/ (CLAUDE.md doc-tree summary +
WORKSPACE-CHANGELOG.md retirement note) land separately.
|
||
|
|
426760d737 |
docs: Phase 13 — README rewrite to 250-line target
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
README went from 457 lines to a target of 250 (operator decision in
Phase 1 conversation). Focus shifts from feature-catalog + landing-page
duplicate to "developer cloning the repo needs orientation + quickstart
+ entry points to docs."
What stayed:
- Logo + title + badges (~15 lines)
- Elevator paragraph + 47-day cliff context (3 paragraphs, compressed)
- Active-maintenance callout
- Documentation table — restructured from 22 entries linking to flat
docs/ to ~6 audience-organized rows linking through the new
docs/README.md navigation index
- Screenshots grid (4 tiles)
- "What it does" — compressed from 33 lines of prose to 8 capability
bullets, each linking to the canonical doc
- Architecture paragraph — compressed to one paragraph linking to
docs/reference/architecture.md
- Quick Start (Docker Compose, Agent install, Helm, container images)
- Examples table (5 turnkey scenarios)
- Development commands
- License paragraph
- Dependencies block
- Footer CTA
What got moved out:
- Cosign verification / SLSA / SBOM section (67 lines) →
docs/reference/release-verification.md (NEW). README links to it
in a 3-line "Verifying a release" section.
What got removed entirely:
- "Why certctl" + "Architecture" + "Security-first" + "Key design
decisions" prose walls — duplicated landing page + architecture.md +
security.md content. README no longer wades through 11 dense
paragraphs.
- "Supported Integrations" 4 sub-tables (Issuers / Targets / Protocols
/ Standards / Notifiers, ~80 lines of dense per-row marketing
copy) — content lives at docs/reference/connectors/index.md and
docs/reference/protocols/. README mentions counts ("12 issuers, 15
targets, 6 notifiers") with a single link.
- "Roadmap" section entirely — V1 + V2 history rotted fastest of any
section; replaced with implicit "see Releases + Issues for active
work" via the existing footer CTA.
- "What It Does" 10-subsection wall (33 lines) — replaced with the
8-bullet capability list, each linking to its canonical doc.
- CLI section (20 lines of inline command examples) — links to the
contributor docs.
- MCP Server section (30 lines of setup) — links to docs/reference/mcp.md.
New surface added:
- docs/reference/release-verification.md — moved cosign/SLSA/SBOM
procedure with one expanded "Why this matters" paragraph
explaining the keyless OIDC trust anchor.
Every docs/ link in the new README verified to resolve to an existing
file. Cross-references from other docs / certctl.io to the deleted
sections (if any) need follow-up Phase 11 sweeps.
|
||
|
|
dca1900815 |
docs: Phase 11 (partial) — fix cross-references after Phase 2 moves
Per Phase 1 audit at cowork/docs-overhaul-phase-1-audit-2026-05-04/.
Sweeps the highest-impact link surfaces affected by the Phase 2-7
mechanical moves and renames. Covers README.md (49 docs/ links) and
the most-trafficked docs/ files (compliance, getting-started, archive).
README.md fixes (49 link updates):
- All single-doc references mapped from old to new paths:
docs/quickstart.md → docs/getting-started/quickstart.md
docs/architecture.md → docs/reference/architecture.md
docs/connectors.md → docs/reference/connectors/index.md
docs/acme-server.md → docs/reference/protocols/acme-server.md
docs/{soc2,pci-dss,nist}.md → docs/compliance/{soc2,pci-dss,nist-sp-800-57}.md
... (full mapping in the sed pipeline)
- 3 references to deleted features.md replaced with pointers to
architecture.md + connectors/index.md.
docs/compliance/index.md (3 sibling renames):
compliance-soc2.md → soc2.md
compliance-pci-dss.md → pci-dss.md
compliance-nist.md → nist-sp-800-57.md
docs/compliance/pci-dss.md (3 external refs need ../):
architecture.md → ../reference/architecture.md
connectors.md → ../reference/connectors/index.md
quickstart.md → ../getting-started/quickstart.md
docs/getting-started/concepts.md (4 external refs):
crl-ocsp.md → ../reference/protocols/crl-ocsp.md
architecture.md → ../reference/architecture.md
mcp.md → ../reference/mcp.md
openapi.md → ../reference/api.md
docs/getting-started/quickstart.md (4 external refs + 1 sibling):
tls.md → ../operator/tls.md
upgrade-to-tls.md → ../archive/upgrades/to-tls-v2.2.md
architecture.md → ../reference/architecture.md
demo-advanced.md → advanced-demo.md (sibling rename)
docs/getting-started/examples.md (4 external refs):
migrate-from-certbot.md → ../migration/from-certbot.md
migrate-from-acmesh.md → ../migration/from-acmesh.md
certctl-for-cert-manager-users.md → ../migration/cert-manager-coexistence.md
connectors.md → ../reference/connectors/index.md
docs/archive/upgrades/to-tls-v2.2.md (3 external refs need ../../):
tls.md → ../../operator/tls.md
quickstart.md → ../../getting-started/quickstart.md
test-env.md → ../../contributor/test-environment.md
docs/archive/upgrades/to-v2-jwt-removal.md (2 external refs need ../../):
architecture.md → ../../reference/architecture.md
tls.md → ../../operator/tls.md
Verified all README.md docs/ links resolve to existing files. The only
remaining top-level link is testing-guide.md which still exists at the
top of docs/ (Phase 5 will prune it later).
Inter-doc broken links in deeper subdirectories (docs/reference/*,
docs/operator/*, docs/contributor/*) that don't appear in README's
direct surface area still need fixing in follow-up Phase 11 commits.
This commit handles the operator-facing entry points.
|
||
|
|
e50ba168ac |
docs(README): strategic refresh — surface Rank 4/5/7/8 + ACME server + cloud targets
README audit found six classes of drift between the README and the
shipped repo. Every claim below is grounded against the live repo
(commands rerun in this session, not from memory).
Stale numeric claims fixed:
'111 routes' → '180+ routes'
(live: grep -cE 'r\.Register' router.go = 184)
'80 tools' → '85+ tools'
(live: grep -cE 'mcp\.AddTool' tools.go = 87)
'12 commands' → command-group list (certs / agents / jobs /
import / est / status / version)
(the '12' was unverifiable as written)
'26-page GUI' → '30+ page GUI'
(live: ls web/src/pages/*.tsx | grep -v test = 31)
'21 tables' → '35+ tables'
(live: distinct CREATE TABLE in migrations = 35)
Connectors added to tables (these shipped commits ago without
README mentions):
Deployment Targets:
AWS Certificate Manager (AWSACM) — commit
|
||
|
|
cabe1aee45 |
docs(README): drop V3 Pro + V4 sections — everything ships free under BSL
Strategic pivot. We are NOT building a V3 Pro paid tier or a V4 cloud / scale tier. Every certctl feature — current and future — ships free under the same BSL 1.1 source-available license. No gated features, no paid edition, no enterprise tier. Future revenue path is a managed-service hosting offering: operator runs the certctl-server control plane as a hosted service; customers self-install only the certctl-agent in their infrastructure. The self-hosted code stays free forever; the managed service sells operational convenience (no PostgreSQL to run, no upgrades, no backups, no SSO setup). BSL 1.1 was already structured around exactly this — the license expressly prevents competitors from running their own commercial certctl-as-a-service against the same source while leaving self-hosting unrestricted. Removed the old roadmap sections: - "### V3: certctl Pro" — Enterprise capabilities for larger deployments are available in the commercial tier. - "### V4+: Cloud & Scale" — Kubernetes cert-manager external issuer, cloud infrastructure targets, extended CA support, and platform-scale features. Replaced with a single "Forward-looking work — all free, all self-hostable" section that names the real engineering tracks (OIDC / SSO / RBAC, NATS / real-time, search / risk scoring, HSM / TPM / FIPS, deeper Vault auth, cloud-managed-target deep integrations, adapter hardening, credential lifecycle expansion) and points at the workspace-level WORKSPACE-ROADMAP.md for the unshipped backlog. The full feature surface lands in V2 over time — V3 / V4 are not real version targets, they were positioning artifacts. Diff: 2 insertions / 5 deletions. README's License section (BSL 1.1 licensing-inquiries footer) is unchanged. |
||
|
|
0729ee46e0 |
chore: sweep github.com/shankar0123/certctl URL refs to certctl-io/certctl
Post-transfer cosmetic + release-critical URL refresh after moving the
repo from github.com/shankar0123/certctl to github.com/certctl-io/certctl
(2026-05-03). GitHub HTTP redirects continue to forward old URLs forever,
so existing operators are not broken — but aligns the canonical
references with the new owner so:
- procurement engineers / contributors browsing the docs see the right
URL on first read
- operators copying the agent install one-liner hit the new path
directly without going through a redirect
- the Helm chart's default image repository points at the canonical org
registry path
- the OnboardingWizard rendered to first-run UI users shows the new
URL in the install snippets and doc anchor links
- the GitHub Actions release workflow pushes container images to
ghcr.io/certctl-io/certctl-{server,agent} (was: shankar0123)
- the release-notes Markdown body in release.yml — which gets stamped
into every future release page — references the post-transfer
cert-identity (cosign keyless signing now uses the certctl-io
workflow URL) and the post-transfer SLSA provenance source-uri.
Without this, every cosign verify / slsa-verifier command on a
v2.1.0+ release would fail because the cert-identity-regexp would
not match the signing identity GitHub Actions OIDC issues post-
transfer. Old releases (v2.0.67 and earlier) keep their immutable
release-notes pointing at the shankar0123 path and remain
verifiable via their own published instructions.
Customer impact:
- Operators on ghcr.io/shankar0123/certctl-{server,agent}:latest
silently freeze on whatever tag was current at transfer time. They
get no errors; they just stop receiving updates. The next release
notes need a one-line callout (Phase 3.1 of cowork/transfer-
certctl-to-org.md) telling them to update their image path to
ghcr.io/certctl-io/certctl-{server,agent}.
- All other URLs (git clone, install one-liner, raw.githubusercontent
URLs, browser links, GitHub API) continue to resolve via permanent
HTTP redirects. The sweep is cosmetic for those.
Files swept (30 total):
.github/workflows/release.yml — IMAGE_NAMESPACE, source-uri,
cosign cert-identity-regexp, IMAGE= snippet (5 refs total).
CHANGELOG.md, README.md — anchor links, badges, install one-liner,
cosign verify snippets in operator-facing sections.
api/openapi.yaml — info / externalDocs URLs.
install-agent.sh — GITHUB_REPO const + systemd unit Documentation=
field.
deploy/ENVIRONMENTS.md, deploy/helm/{CHART_SUMMARY,INDEX,
INSTALLATION,README}.md, deploy/helm/certctl/{Chart.yaml,
README.md,values.yaml}, deploy/helm/examples/values-*.yaml —
chart docs + image repository defaults across dev / prod-ha
overrides.
docs/{certctl-for-cert-manager-users,connector-iis,connectors,
migrate-from-acmesh,migrate-from-certbot,quickstart,test-env,
why-certctl}.md — operator-facing doc URLs.
examples/{acme-nginx,acme-wildcard-dns01,multi-issuer,
private-ca-traefik,step-ca-haproxy}/docker-compose.yml +
examples/step-ca-haproxy/step-ca-haproxy.md — example image:
paths and accompanying narrative.
web/src/pages/OnboardingWizard.tsx — first-run-UI URL refs (curl
install one-liners, agent docker image path, doc anchor links).
Files intentionally NOT swept (Choice A from cowork/transfer-certctl-
to-org.md):
go.mod, go.sum — module declaration stays github.com/shankar0123/
certctl. Existing imports compile because Go uses the path
declared in go.mod, not the URL it was fetched from. Internal-
only project; no external Go consumers; rename will land as a
mechanical sed when one materializes.
~250 *.go files — every import remains github.com/shankar0123/
certctl/internal/...
deploy/test/f5-mock-icontrol/go.mod — separate test sub-module;
same Choice A logic; module path stays.
Files intentionally NOT swept (other reasons):
README.md lines 244-245 — Scarf-pixel docker-pull commands.
shankar0123.docker.scarf.sh/... is a Scarf-account hostname
(per-user, not per-repo) and the pixel keeps tracking pulls
against the operator's personal Scarf account. Migrating to a
certctl-io Scarf account is a separate decision (create org
Scarf account → re-create package → update README).
deploy/test/f5-mock-icontrol/f5-mock-icontrol — checked-in
compiled binary with shankar0123/certctl baked into Go build
info via the sub-module path. Out of scope for a URL sweep;
will refresh on the next `make test-integration` rebuild.
Verification:
gofmt: clean (no .go files touched).
go vet ./...: clean (verified at this SHA in 1.3 of the transfer
checklist; no .go changes since).
go build ./...: clean (same).
go test -short on representative packages: green (same).
Diff shape: 30 files, 74 insertions / 74 deletions, net-zero size,
pure URL substitution.
|
||
|
|
b3aad02232 |
chore(README): remove the second Scarf pixel — analytics consolidated to certctl.io
The README has carried two Scarf pixels for some time:
- 89db181e-76e0-45cc-b9c0-790c3dfdfc73 (kept earlier as 'GitHub
traffic complement to GitHub Insights')
- b9379aff-9e5c-4d01-8f2d-9e4ffa09d126 (moved to the certctl.io
landing page in commit
|
||
|
|
6a5cfb3d01 |
chore(README): remove duplicative Scarf pixel — moved to certctl.io
The README had two Scarf pixels (89db181e and b9379aff). For README visit tracking, GitHub's built-in Insights → Traffic dashboard already provides views, uniques, clones, AND referring sites with click counts (Reddit, HN, Twitter, search, etc.) at higher granularity than a Scarf pixel can extract — Scarf can only see 'github.com' as the referrer because that's where the README HTML is served from, while GitHub Insights knows the actual external referrer that landed the visitor on the README. Removing pixel b9379aff-9e5c-4d01-8f2d-9e4ffa09d126 from the README and reusing it on the certctl.io landing page (sibling commit on certctl-io/certctl.io), where Scarf is the only analytics source and the referrer header actually carries useful attribution. Pixel 89db181e-76e0-45cc-b9c0-790c3dfdfc73 stays in the README as a backup signal alongside GitHub Insights — keeps continuity for the longer-running Scarf project counter. No data loss: GitHub Insights covers what 89db181e was double- counting, and b9379aff now serves a distinct surface (certctl.io) where it actually adds new attribution data. |
||
|
|
b95a548f65 |
docs: deploy-hardening I — atomic deploy + post-verify operator guide + connectors / README updates
Phase 12 of the deploy-hardening I master bundle. NEW docs/deployment-atomicity.md (12 sections, ~280 lines): 1. Overview — the three procurement-checklist gaps closed 2. The atomic-write primitive (Plan / File / Apply algorithm) 3. Per-connector atomic contract table (all 13 connectors) 4. Post-deploy TLS verification (handshake + SHA-256 + retries) 5. Rollback semantics (3 triggers + escalation path) 6. ValidateOnly dry-run mode (per-connector matrix) 7. File ownership + mode preservation (precedence + per-distro defaults) 8. Per-target deploy mutex (Phase 2) 9. Idempotency via SHA-256 (defends against retry storms) 10. Troubleshooting matrix (one row per failure mode) 11. V3-Pro deferrals (multi-region, pin manifests, SOC 2 export) 12. Per-connector quick reference (paste-able config snippets) UPDATE README.md::Deployment Targets — every connector row now notes the atomic + verify + rollback semantics that landed in deploy-hardening I. Added a closing paragraph linking to the new docs/deployment-atomicity.md. UPDATE docs/features.md — two new env-var rows: - CERTCTL_DEPLOY_BACKUP_RETENTION (default 3, -1 disables) - CERTCTL_K8S_DEPLOY_KUBELET_SYNC_TIMEOUT (default 60s) The G-3 docs-drift CI guard is satisfied: every new CERTCTL_DEPLOY_* env var documented here also appears in source (internal/deploy/types.go for BACKUP_RETENTION, k8ssecret config for KUBELET_SYNC_TIMEOUT). S-1 stale-counts guard: no literal-number current-state counts in the new doc — the per-connector tests are referenced via the file:line pattern (internal/connector/target/<name>/<name>_atomic_test.go) so the operator can grep for the actual count. Phase 13 next: pre-commit verification (full matrix + CI guard reproductions). |
||
|
|
db4a9b7e69 |
docs(README): expand Standards & Revocation table with production hardening II surfaces
Surfaces the eight items shipped in the post-2026-04-30 production
hardening II bundle on the README's Supported Integrations →
Standards & Revocation table so procurement teams comparing
checklists see them without diving into docs/.
Updates to the existing rows:
- DER-encoded X.509 CRL: now also calls out RFC 7232 caching
headers (ETag + If-None-Match 304 short-circuit)
- Embedded OCSP responder: now also calls out RFC 6960 §4.4.1
nonce echo + the empty/oversized rejection
- S/MIME: spelled out the adaptive KeyUsage delta vs TLS default
- Certificate export: spelled out the cipher (AES-256-CBC PBE2
SHA-256 KDF) + V2 cert-only design rationale
NEW rows:
- CRL DistributionPoints auto-injection (RFC 5280 §4.2.1.13)
- OCSP pre-signed response cache (with the load-bearing
InvalidateOnRevoke wire called out)
- Per-endpoint rate limits (OCSP + cert-export)
- Cert-export typed audit (with cipher pin)
- Prometheus per-area metrics (certctl_ocsp_counter_total)
- Disaster-recovery runbook (docs/disaster-recovery.md, the SOC 2
/ PCI procurement deliverable)
G-3 docs-drift CI guard reproduced clean (every CERTCTL_* env var
mention maps back to internal/config/config.go). S-1 stale-counts
prose guard clean (no literal-number prose for current-state
counts; the rate-limit defaults are config-default values, not
source-derived counts that drift).
|
||
|
|
c98d83f596 |
fix(README): drop hardcoded source-counts from EST row to satisfy S-1 guard
CI's 'Forbidden hardcoded source-count prose regression guard (S-1)'
fired on the new EST row in README.md:109. The trip was on the literal
'6 MCP tools' phrase — that matches the regex pattern
\b[0-9]+\s+MCP tools\b which the S-1 guard rejects per the CLAUDE.md
rule 'Numeric claims about current state rot.'
Same rule covers the '13 typed audit-action codes' literal earlier on
the same line — the regex doesn't catch that one specifically (no
'audit-action codes' alternation in the guard pattern), but the spirit
of the rule applies, so I removed it preemptively to avoid the next
operator-reads-the-doc-then-edits-the-code-then-the-count-is-wrong
drift cycle.
Replacements:
'13 typed audit-action codes (...)' →
'Typed audit-action codes per failure dimension (... — full set in
internal/service/est_audit_actions.go)'
'CLI + 6 MCP tools' →
'CLI + matching MCP tool family (rebuild count via
grep -cE '"est_' internal/mcp/tools_est.go)'
The rebuild-command form follows the convention CLAUDE.md::Current-state
commands established + the existing docs/features.md row
'MCP tools | rebuild via grep -cE 'gomcp\.AddTool\(' ...'
Verified locally with the exact CI guard regex against README.md +
docs/ — 'S-1 stale-counts guardrail: clean.'
The 'All six RFC 7030 endpoints' phrasing earlier on the same line
is NOT a current-state count — six is fixed by RFC 7030 (cacerts +
simpleenroll + simplereenroll + csrattrs + serverkeygen + fullcmc),
not derived from source. The S-1 regex requires \b[0-9]+ literal
digits, so 'six' as a word doesn't match anyway.
|
||
|
|
6622883989 |
docs(est): EST RFC 7030 operator guide + WiFi/802.1X recipe + IoT bootstrap recipe + FreeRADIUS integration + architecture + README
EST RFC 7030 hardening master bundle Phase 12 — comprehensive operator- facing documentation for the Phases 1-11 backend work that shipped on 2026-04-29. NEW docs/est.md (19 sections, ~810 lines): Concepts (host vs user enrollment, profile-driven policy, multi-profile dispatch); 5-minute single-profile Quick start with curl + openssl recipes; Multi-profile dispatch (CERTCTL_EST_PROFILES=corp,iot,wifi setup with PathID rules enforced at boot); Authentication modes (mTLS / Basic / both / empty with cross-check semantics); RFC 9266 channel binding (failure-mode HTTP mapping table — ErrChannelBindingMissing/Mismatch/NotTLS13 → 400/409/426); WiFi/802.1X recipe with end-to-end FreeRADIUS integration (EAP-TLS supplicant config, mods-available/eap tls-common block, CRL distribution endpoint cross-ref, troubleshooting playbook); IoT bootstrap recipe (factory provisioning, first boot, steady-state renewal, compromise/decommission via bulk-revoke, recommended cert lifetimes per master prompt §7.7); serverkeygen for resource-constrained devices (CMS EnvelopedData wrap, RSA-only at this revision, zeroize discipline, Phase-1 cross-check refusing _SERVERKEYGEN_ENABLED=true with empty _PROFILE_ID); HSM-backed CA signing for EST cross-ref (signer interface seam); Operator GUI tabbed surface tour (/est: Profiles / Recent Activity / Trust Bundle); CLI + 6 MCP tools; Renewal device-driven model (RFC 7030 §4.2.2 mandate, renewal-trigger ratios for laptops/IoT, operator-push via webhook); Troubleshooting matrix (one row per typed audit-action constant in internal/service/est_audit_actions.go); TLS 1.2 reverse-proxy runbook cross-ref (channel-binding caveat explained); Threat model (load-bearing properties: trust-anchor reload fail-safety, per-profile counter isolation, mTLS cross-profile bleed defense, source-IP limiter process-locality, server-keygen heap residency, HTTP Basic in-process-only, legacy-anonymous-default back-compat carve-out); V3-Pro deferrals; Appendix A (libest sidecar reproducer + 5 integration test names); Appendix B (Cisco IOS 15.x + 16.x + Apple MDM + OpenWRT + libest <v3.0 wire-format quirks tested in internal/api/handler/cisco_ios_quirks_test.go). UPDATED docs/architecture.md: new "EST Server (RFC 7030) — Production Deployment" section under the existing baseline EST section. Mermaid diagram of multi-profile dispatch + mTLS sibling route + per-profile gate ordering + audit + GUI + SIGHUP-equivalent reload. Existing authentication paragraph updated with forward-ref to the hardening section. Audit paragraph updated to enumerate the 13 typed est_* action codes operators grep on. Trust-anchor reload semantics + libest interop tested in CI both called out. UPDATED README.md::Enrollment Protocols: replaced the one-line EST row with the full production-grade surface description matching the SCEP analog. Cross-references docs/est.md. UPDATED docs/connectors.md::EST/SCEP Integration: extended the EST-or-SCEP shared paragraph to point at the per-profile env-var form for both protocols + linked the new architecture.md section. NEW "Multi-profile EST dispatch + production hardening" subsection mirrors the SCEP equivalent: 9-row env-var table, cross-ref to docs/est.md. G-3 docs-drift CI guard reproduced locally clean — every CERTCTL_EST_* mention in docs maps back to internal/config/config.go, and every defined env var is documented. The `<NAME>` placeholder convention matches the SCEP idiom so the docs grep doesn't extract per-deploy profile names as phantom env vars. No new env vars introduced — this is a pure docs commit. |
||
|
|
0be889ff1d |
refactor(scep-gui): rebrand SCEP admin surface to per-profile tabbed interface (Profiles + Intune + Recent Activity)
Phase 9 follow-up to the SCEP RFC 8894 + Intune master bundle. The
Phase 9.4 GUI shipped 'SCEP Intune Monitoring' at /scep/intune, which
made the per-profile observability surface look Intune-only — operators
running EJBCA + Jamf would never click that nav link expecting per-
profile RA cert + mTLS observability. The page is per-profile keyed
under the hood; this commit rebrands + restructures so the surface
matches what operators actually need.
Spec: cowork/scep-gui-restructure-prompt.md.
User-visible change:
- Nav link renamed: 'SCEP Intune' → 'SCEP Admin'.
- Route: /scep is the new canonical path; /scep/intune kept as a
backward-compat alias that lands directly on the Intune tab.
- Page header: 'SCEP Administration'.
- Three tabs:
* Profiles (default) — per-profile lean cards with RA cert
expiry countdown, mTLS sibling-route status badge, Intune
enabled/disabled badge, challenge-password-set indicator.
'View Intune details →' link on Intune-enabled cards
deep-links into the Intune tab.
* Intune Monitoring — the existing Phase 9.4 deep-dive
(per-status counters, trust anchor expiry, recent failures
table, reload-trust button + confirmation modal).
* Recent Activity — full SCEP audit log filter merging all
four action codes (scep_pkcsreq + scep_renewalreq +
scep_pkcsreq_intune + scep_renewalreq_intune); chip filters
for All / Initial / Renewal / Intune / Static.
Backend:
* internal/service/scep.go — new SCEPProfileStatsSnapshot type +
IntuneSection sub-block + ProfileStats(now) accessor. Adds
raCertSubject/raCertNotBefore/raCertNotAfter + mtlsEnabled +
mtlsTrustBundlePath fields with SetRACert + SetMTLSConfig setters.
Existing IntuneStatsSnapshot + IntuneStats(now) preserved
UNCHANGED for /admin/scep/intune/stats backward compat (the
JSON shape stays byte-stable for external consumers — the
aliasing approach the prompt initially suggested doesn't work
because the new shape nests Intune while the old one is flat).
ChallengePasswordSet is derived from challengePassword != ''
(the secret value itself is never surfaced).
* internal/api/handler/admin_scep_intune.go — new Profiles handler
method on AdminSCEPIntuneHandler with the same M-008 admin gate.
AdminSCEPIntuneServiceImpl extended (in place; same
map[string]*service.SCEPService) to satisfy the new
AdminSCEPProfileService interface. Single handler file gets the
third method so the M-008 pin entry count stays steady (no new
file, no new triplet of admin-gate test files — just three new
Profiles tests inside the existing test file).
* internal/api/router/router.go — one new route
'GET /api/v1/admin/scep/profiles' registered to
reg.AdminSCEPIntune.Profiles. HandlerRegistry unchanged.
* api/openapi.yaml — new operation 'listSCEPProfiles' documenting
the request body / response shape / error mapping. Existing
Intune entries unchanged.
* cmd/server/main.go — per-profile loop now calls
scepService.SetMTLSConfig(profile.MTLSEnabled,
profile.MTLSClientCATrustBundlePath) right after SetPathID, and
scepService.SetRACert(raCert) right after loadSCEPRAPair returns
the leaf cert. Both setters are nil-safe.
* internal/api/handler/m008_admin_gate_test.go — extended the
existing admin_scep_intune.go entry's justification to mention
the third endpoint. No new map entry needed (file already
listed).
Backend tests (8 new):
* TestAdminSCEPProfiles_NonAdmin_Returns403
* TestAdminSCEPProfiles_AdminExplicitFalse_Returns403
* TestAdminSCEPProfiles_AdminPermitted_ForwardsActor — also pins
that Intune-enabled profiles emit an 'intune' sub-block while
Intune-disabled profiles OMIT it.
* TestAdminSCEPProfiles_RejectsNonGetMethod
* TestAdminSCEPProfiles_PropagatesServiceError
* TestAdminSCEPProfilesServiceImpl_NilMapReturnsEmpty
* (existing 16 Phase 9 admin tests still pass — backward-compat
preserved)
Frontend:
* web/src/api/types.ts — new SCEPProfileStatsSnapshot +
IntuneSection + SCEPProfilesResponse types. Existing
IntuneStatsSnapshot et al unchanged.
* web/src/api/client.ts — new getAdminSCEPProfiles helper.
* web/src/pages/SCEPAdminPage.tsx — full rewrite as the tabbed
surface. Reuses the existing ConfirmReloadModal and Intune
deep-dive card components verbatim; adds ProfileSummaryCard
(lean card for the Profiles tab) and ActivityTab. URL state
sync via useSearchParams so deep links survive reloads + browser
back/forward. The legacy /scep/intune route alias defaults the
activeTab to 'intune' on mount.
* web/src/main.tsx — new <Route path='scep' /> + preserved
<Route path='scep/intune' /> alias. Both render SCEPAdminPage.
* web/src/components/Layout.tsx — nav link rebranded:
label 'SCEP Intune' → 'SCEP Admin', to '/scep/intune' → '/scep'.
Frontend tests (20 — full rebuild):
* Admin gate (non-admin sees gated banner + zero admin API calls)
* Profiles tab default + Intune tab tabswitch + ?tab=intune deep
link + legacy /scep/intune alias all land on Intune
* Profiles tab status badges (Intune + mTLS + challenge-set)
reflect each profile's flags
* RA cert expiry tone bands (good ≥30d / warn 7-30d / bad <7d /
EXPIRED) verified across three fixture profiles
* 'View Intune details →' only renders for Intune-enabled
profiles AND switches tabs on click
* Empty-state banner when no profiles configured
* Intune tab counters render with the existing Phase 9 deep-dive
shape; reload modal Open/Confirm/Cancel/Error paths all pinned
* Recent Activity tab merges all four SCEP audit actions across
four parallel useQuery calls; filter chips
(all/initial/renewal/intune/static) narrow correctly
* Error path surfaces ErrorState on the active tab
Docs:
* docs/scep-intune.md — Operational monitoring section heading
expanded to '(SCEP Administration → Intune Monitoring tab)'.
Page-surface description rewritten for the tabbed shape;
admin-endpoints list extended with the new /admin/scep/profiles
entry.
* docs/architecture.md — Microsoft Intune Connector trust anchor
subsection updated to reference the Intune Monitoring tab inside
the SCEP Administration page + lists all three admin endpoints.
* docs/legacy-est-scep.md — forward-ref expanded with a parallel
sentence for the per-profile observability surface (independent
of Intune).
* README.md — Enrollment Protocols bullet for Intune updated to
'admin GUI SCEP Administration page at /scep' with the three
tabs called out.
Verification:
* gofmt clean on touched files
* go vet ./... clean
* staticcheck on intune+service+handler+router+cmd-server clean
* go test -short across intune+service+handler+router+cmd-server:
all green (existing Phase 9 tests + new Profiles tests)
* Frontend tsc --noEmit clean
* Vitest: 20/20 SCEPAdminPage tests + 3/3 sibling AuditPage tests
pass
* G-3 docs-drift CI guard reproduced locally: clean (no new env
vars; existing CERTCTL_SCEP_ allowlist prefix covers everything)
* M-009 hard-zero useMutation guard reproduced locally: clean
(the existing reload mutation already used useTrackedMutation
from the Phase 9 follow-up commit
|
||
|
|
5d080c86fd |
docs(scep-intune): deployment guide + troubleshooting + Microsoft support statement
Phase 11 of the SCEP RFC 8894 + Intune master bundle.
Phase 11.1 — docs/scep-intune.md (new, ~340 lines):
* TL;DR — drop-in NDES replacement framing; what an operator gets
over NDES (per-profile endpoints, audit-log forensics, SIGHUP
reload, GUI monitoring, per-device rate limit).
* Architecture diagram — Intune cloud → Connector → certctl SCEP
→ issuer connector. Explicit 'certctl replaces NDES, NOT the
Connector' framing; nine-gate dispatcher walk (shape pre-check,
JWS sig, version dispatch, time bounds, audience pin, CSR binding,
replay, per-device rate limit, optional compliance).
* Migration playbook (NDES + EJBCA / NDES + ADCS) — 9-step run-book:
install alongside, configure per-profile endpoint, extract trust
anchor, configure CONNECTOR_CERT_PATH + AUDIENCE, configure
issuer connector, migrate one profile, verify enrollment, roll
out fleet, decommission NDES.
* Intune SCEP profile field mapping table — every Intune admin
center field mapped to certctl's behavior (cert type, subject
name format, SAN, validity, key storage provider, key usage,
EKU, hash algorithm, SCEP server URL).
* Trust anchor extraction recipe — step-by-step certlm.msc export
of the 'CN=Microsoft Intune Certificate Connector' cert, PEM
rename, env-var configuration, HA Connector concatenation, SIGHUP
rotation flow.
* Troubleshooting matrix — 10 failure modes mapped to root causes
and operator actions: signature_invalid (trust anchor stale),
claim_mismatch (Intune profile SAN config), expired (clock skew /
Connector cert past NotAfter), not_yet_valid (reverse skew),
wrong_audience (URL mismatch), replay (retry-window collision),
rate_limited (limiter doing its job), unknown_version (Microsoft
shipped new format), malformed (proxy mangling body),
compliance_failed (V3-Pro hook returned non-compliant).
* Operational monitoring — admin GUI surface description, expiry
badge tone bands (≥30d green / 7-30d amber / <7d red / EXPIRED),
per-status counter polling cadence, audit log filter, recommended
Prometheus alert thresholds.
* Limitations — explicit V3-Pro deferrals: native Microsoft Graph
integration, Conditional Access compliance gating, per-tenant
trust anchors (MSP scoping), OCSP stapling at SCEP-response time,
auto-discovery of Connector signing cert.
* Microsoft support statement — three Microsoft Learn URLs (verified
live with HTTP 200): Connector overview, SCEP profile setup,
Connector install validation. Microsoft documents the Connector
as RFC-8894-compliant and supports its use against any RFC 8894
SCEP server.
Phase 11.2 — Cross-references:
* docs/legacy-est-scep.md — the previous forward-ref pointed at
'the Phase 11 doc this bundle ships'; updated to a richer pointer
that lists what scep-intune.md covers (architecture, migration,
profile mapping, extraction, troubleshooting, monitoring,
limitations, Microsoft support).
* README.md — new bullet under Enrollment Protocols table:
'Microsoft Intune SCEP fleet (drop-in NDES replacement)' with
the per-profile dispatcher feature list + link to scep-intune.md.
Procurement teams scanning the README see the Intune story
alongside ChromeOS / Jamf in the same table row.
* docs/architecture.md — new 'Microsoft Intune Connector trust
anchor (per-profile, opt-in)' subsection in the Security Model
section. ASCII diagram showing the dispatcher walk; calls out
the SIGHUP reload + admin-gated GUI surface; forward-link to
scep-intune.md.
Verification:
* All linked anchors inside scep-intune.md resolve to existing
headings: #limitations, #microsoft-support-statement,
#operational-monitoring, #trust-anchor-extraction.
* All linked doc paths resolve: legacy-est-scep.md, architecture.md,
features.md, tls.md.
* All three Microsoft Learn URLs return HTTP 200 (verified via curl).
* G-3 docs-drift CI guard reproduced locally and clean — the
migration playbook uses the <NAME> placeholder convention
consistently (matching features.md style) so the docs scanner
doesn't extract literal env-var names that aren't in config.go.
* Backend tests across intune+handler+service+router still green.
Refs: cowork/scep-rfc8894-intune-master-prompt.md::Phase 11
cowork/scep-rfc8894-intune/progress.md
|
||
|
|
23603f5174 |
docs(scep): RFC 8894 hardening — README + architecture + connectors
SCEP RFC 8894 + Intune master bundle — Phase 6 of 14.
Closes Half 1 of the bundle (Phases 0-6). The certctl SCEP server now
ships full RFC 8894 wire format (EnvelopedData decrypt + signerInfo POPO
verify + CertRep PKIMessage builder), tested against ChromeOS-shape
hermetic E2E requests, with multi-profile dispatch and must-staple
per-profile policy. Half 2 (Phases 7-12) adds the Microsoft Intune
dynamic-challenge layer; Phase 6.5 (mTLS sibling route) is independently
shippable as an opt-in enterprise-procurement feature.
README.md
* Standards & Revocation table SCEP row updated to mention full RFC
8894 wire format (EnvelopedData decryption, signerInfo POPO
verification, CertRep PKIMessage builder), PKCSReq + RenewalReq +
GetCertInitial messageType dispatch, multi-profile dispatch
(/scep/<pathID>), per-profile RA cert + key, MVP fall-through for
lightweight clients.
* Enrollment protocols paragraph extended with the same scope, plus
a link to docs/legacy-est-scep.md for the operator + device-
integration guide.
docs/architecture.md
* SCEP wire format paragraph rewritten to describe the two paths
(RFC 8894 first, MVP fall-through), the messageType dispatch
table, the EnvelopedData decrypt (constant-time PKCS#7 unpad
closing the padding-oracle leg), the SET-OF Attribute
re-serialisation quirk per RFC 5652 §5.4, and the CertRep
PKIMessage shape (cert chain encrypted to req.SignerCert, NOT
the RA cert).
* SCEP service interface updated to show the three new
*WithEnvelope variants alongside the legacy PKCSReq method.
* Added 'Capabilities advertised', 'Multi-profile dispatch', and
'Must-staple per profile' subsections covering the RFC 7633
extension policy.
docs/connectors.md
* EST/SCEP Integration section extended with the per-profile
issuer-binding env-var form (CERTCTL_SCEP_PROFILE_<NAME>_ISSUER_ID).
* New SCEP RA cert + key paragraph pointing operators at the
legacy-est-scep.md openssl recipe + ChromeOS Admin Console
pointer + must-staple per-profile policy.
cowork/CLAUDE.md::Active Focus
* 2026-04-29 SCEP RFC 8894 + Intune master bundle status updated
to 'HALF 1 COMPLETE (Phases 0-5 of 14 SHIPPED)' with the full
chain of commit SHAs (
|
||
|
|
2519da85f0 |
docs: README + concepts + features reflect CRL/OCSP responder bundle
Audit pass against cowork/crl-ocsp-responder-prompt.md found three
operator-facing docs still describing the pre-bundle CRL/OCSP surface
(GET-only OCSP, CA-key-direct signing, no scheduler-driven cache). Each
claim updated below was ground-truthed against repo HEAD before edit.
README.md
* Standards & Revocation table — CRL row now mentions
scheduler-pre-generated cache (CERTCTL_CRL_GENERATION_INTERVAL,
crl_cache table); OCSP row mentions GET + POST forms, dedicated
responder cert per RFC 6960 §2.6, id-pkix-ocsp-nocheck per
§4.2.2.2.1, 7d auto-rotation grace.
* Revocation paragraph — corrected the 'Embedded OCSP responder'
one-liner to call out the dedicated-responder-cert design (the CA
private key is never used directly for OCSP signing, which is the
load-bearing security property for the future PKCS#11/HSM driver
path) and added the link to the relying-party guide.
docs/concepts.md
* CRL paragraph — added the scheduler pre-generation + singleflight
coalescing detail. Kept the existing 24h validity claim (verified
against internal/connector/issuer/local/local.go:956 — 'NextUpdate:
now.Add(24 * time.Hour)').
* OCSP paragraph — corrected the description so it covers both GET
and POST forms (POST per RFC 6960 §A.1.1 is what production
clients use: Firefox, OpenSSL s_client -status, cert-manager,
Intune); added the dedicated-responder-cert + nocheck-extension +
auto-rotation explanation; cross-link to docs/crl-ocsp.md.
docs/features.md
* Revocation Infrastructure section — CRL Endpoint, OCSP Responder,
new Admin Cache Observability subsection, new GUI Revocation
Endpoints Panel subsection. Corrected the previously-wrong 'Signs
with the issuing CA key' OCSP claim — the bundle's load-bearing
security improvement is exactly that the CA key is NOT used
directly. Cross-link to crl-ocsp.md.
* Local CA env vars table — added all four new
CERTCTL_CRL_GENERATION_INTERVAL / CERTCTL_OCSP_RESPONDER_KEY_DIR
(with the prod 'MUST set' callout) / _ROTATION_GRACE / _VALIDITY
rows. Closes the G-3 'env var defined in Go but never documented'
drift that broke CI on commit
|
||
|
|
e720474fb7 |
Bundle D: Documentation & transparency sweep — 8 findings closed
Closes H-009 + L-001 + L-007 + L-008 + L-016 + L-017 + L-018 + M-027
from comprehensive-audit-2026-04-25.
H-009 — README JWT verified-already-clean
README has zero JWT mentions at audit time. docs/architecture.md
correctly documents JWT/OIDC integration via authenticating-gateway
pattern (line 905-912).
.github/workflows/ci.yml: new step
'Forbidden README JWT advertising regression guard (H-009)'
greps README for JWT-as-supported phrasing; passes verbatim
(gateway / pre-G-1) but fails build on net-new advertising.
L-001 (CWE-295) — InsecureSkipVerify per-site justification
Audit count was 8; recon found 13 production sites.
docs/tls.md: new 'InsecureSkipVerify justifications' table
enumerates each site by file:line with per-site rationale.
cmd/agent/verify.go:78, internal/tlsprobe/probe.go:54,
internal/service/network_scan.go:460: each previously-bare
InsecureSkipVerify: true now carries //nolint:gosec.
.github/workflows/ci.yml: new step
'Forbidden bare InsecureSkipVerify regression guard (L-001)'
fails build if any net-new ISV lands in non-test .go without
nolint:gosec on the same or preceding line.
L-007 — README dependency-audit commands
README.md: new Dependencies section with go list -m all | wc -l,
go mod why, govulncheck ./.... Honors operating-rules invariant.
L-008 — Release-time govulncheck gate
.github/workflows/release.yml: new 'Install govulncheck' +
'Run govulncheck (release gate)' steps in the matrix job.
Pinned to same install path as ci.yml. Default exit code
semantics (fail on called-vuln only, deferred-call advisories
tracked on master via L-021) keeps the gate appropriate.
L-016 — architecture.md drift fixes
docs/architecture.md: system-components diagram's '21 tables'
annotation removed (current 23; replaced with TEXT-keys
descriptor); connector-architecture '9 connectors' prose
replaced with grep ref + current 12-issuer list (added
Entrust/GlobalSign/EJBCA which were missing); API-design
'97 operations / 107 total' replaced with grep commands.
Connector subgraphs verified-current at 12/13/6.
L-017 — workspace CLAUDE.md verified-already-clean
Bundle B's pre-commit-gate refactor already converted current-
state numeric claims to grep commands. Phase 0 recon confirmed
zero remaining hardcoded counts.
L-018 — Defect age table
cowork/comprehensive-audit-2026-04-25/defect-age.md (NEW):
Tabulates all 9 High findings with first-mentioned commit,
closing bundle, days-open. Methodology snippet for re-running.
Key finding: 8 of 9 closed within 24h of audit publication.
M-027 — OpenAPI parity verified-already-clean
Audit's 'router 121 vs OpenAPI 125 — 4-op gap' was wrong
methodology. The 4-op 'gap' was exactly the 4 routes registered
via r.mux.Handle (auth-exempt allowlist) instead of r.Register.
When you count both dispatch shapes the totals match exactly.
internal/api/router/openapi_parity_test.go (NEW):
TestRouter_OpenAPIParity AST-walks router.go for both
Register and mux.Handle calls + walks api/openapi.yaml's
path/method nesting + asserts the sets match. Adding a route
without updating the spec fails CI permanently.
Audit deliverables:
audit-report.md: score 38/55 -> 46/55 closed
(High 7/9 -> 8/9; Medium 20/27 -> 21/27; Low 8/19 -> 14/19)
findings.yaml: 8 status flips open -> closed
defect-age.md: new file
certctl/CHANGELOG.md: Bundle D section
Verification:
TestRouter_OpenAPIParity PASS
L-001 grep guard self-test (after //nolint:gosec adds) PASS
H-009 grep guard self-test PASS
go test -count=1 -short on changed packages green
|
||
|
|
45ba27693b | Update LICENSE metadata | ||
|
|
52248be717 |
v2.0.47: HTTPS Everywhere — TLS-only control plane, agents/CLI/MCP
Breaking change release. Plaintext HTTP listener removed. The certctl control plane now terminates TLS 1.3 on :8443 via http.Server.ListenAndServeTLS. No CERTCTL_TLS_ENABLED=false escape hatch. No dual-listener mode. One-step cutover per docs/upgrade-to-tls.md. Server - cmd/server/tls.go: certHolder with SIGHUP hot-reload + atomic cert swap, buildServerTLSConfig (TLS 1.3 min, GetCertificate callback), preflightServerTLS validation - cmd/server/main.go: ListenAndServeTLS in place of ListenAndServe, watchSIGHUP wiring, cert/key path config threading - tls_test.go: 418-line regression coverage of reload, preflight, callback behavior, SAN validation Config - CERTCTL_TLS_CERT_PATH / CERTCTL_TLS_KEY_PATH (required) - Plaintext rejection: agents/CLI/MCP pre-flight-fail on http:// URLs with a pointer to docs/upgrade-to-tls.md Agents, CLI, MCP - All three pre-flight-reject http:// URLs with fail-loud diagnostic - CERTCTL_SERVER_CA_BUNDLE_PATH for private-CA trust - CERTCTL_SERVER_TLS_INSECURE_SKIP_VERIFY for dev-only bypass (loud warning on startup) - install-agent.sh emits both vars as commented template lines docker-compose - certctl-tls-init sidecar generates SAN-valid self-signed cert into deploy/test/certs/ on first boot - All demo-stack curls pin against ca.crt with --cacert Helm chart - Three TLS provisioning modes, exactly one required: - server.tls.existingSecret (operator-supplied) - server.tls.certManager.enabled (cert-manager integration) - server.tls.selfSigned.enabled (eval only — not for production) - server-certificate.yaml template for cert-manager mode - helm install without a TLS source fails at template render with a pointer to docs/tls.md CI - .github/workflows/ci.yml Helm Chart Validation step renders the chart in both existingSecret and cert-manager modes, plus an inverse guard-regression test that asserts helm template MUST refuse to render when no TLS source is configured. Previously the single `helm template` invocation hit the certctl.tls.required fail-loud guard and exit-1'd CI. Four invocations now: lint (existingSecret), template (existingSecret), template (cert-manager), template (no args — must fail). Integration tests - deploy/test/integration_test.go stands up the Compose stack over HTTPS, extracts the CA bundle, and exercises every certctl API over https://localhost:8443 - All 34 integration subtests green (per Phase 8 local CI-parity) Documentation - New: docs/tls.md (provisioning patterns, rotation, SIGHUP reload) - New: docs/upgrade-to-tls.md (one-step cutover, no-downgrade warnings, fleet-roll sequencing) - CHANGELOG.md: v2.2.0 "HTTPS Everywhere — The Irony" entry (file heading unchanged; release tag is v2.0.47) - All curls in docs/, examples/, deploy/helm/ guides use https://localhost:8443 --cacert Verification - grep -rn "ListenAndServe[^T]" cmd/ internal/ → 0 hits - grep -rn "\"http://" cmd/ internal/ → 2 benign hits (Caddy admin API default, SSRF doc comment) — zero certctl endpoints - Tasks #197–#206 (Phases 0–8) all closed in the tracker Files: 65 changed, 3489 insertions, 372 deletions (pre-CI-fix). |
||
|
|
cb308bb4c7 |
ci(release): migrate cosign sign-blob to --bundle (cosign v3.0)
Cosign v3.0 (shipped by default with sigstore/cosign-installer@cad07c2e, release v3.0.5) removed --output-signature and --output-certificate from the sign-blob subcommand. The replacement is a single --bundle flag that emits a unified Sigstore bundle (.sigstore.json) containing the signature, certificate chain, and Rekor inclusion proof in one file. This change migrates both sign-blob invocations in .github/workflows/ release.yml (per-binary matrix signing and aggregate checksums.txt signing), updates the artefact upload paths, the artefact aggregation case filter, the GitHub Release asset list, and the release-notes body verify-blob example. The README cosign verification snippet and sidecar description are also updated to the --bundle / .sigstore.json shape. No cosign version pinning. No legacy fallback. OCI image signing (cosign sign on image digest) is unchanged — only sign-blob flags changed in v3.0. See M-11 in certctl-audit-report.md. Verification gates: - YAML parse: OK - go vet ./...: exit 0 - go build ./...: exit 0 - grep 'cosign sign-blob' release.yml: 2 (expected: 2) - grep '.sigstore.json' release.yml: 9 (expected: >=5) - grep '.sig/.pem' release.yml non-comment: 0 (expected: 0) - README legacy cosign refs: 0 (expected: 0) - docs/ legacy cosign refs: 0 (expected: 0) Coverage: unchanged (CI workflow edit + README — zero Go code touched). |
||
|
|
b1df6dab27 | ci(release): add CLI/MCP binaries, checksums, SBOM, Cosign, SLSA provenance (M-3) | ||
|
|
e9947dc0fe |
docs: redact V3 feature specifics from README (fixes H-7)
Problem
-------
H-7 (CWE-200 / information disclosure, strategic-policy class): the
public README's V3 section enumerated the paid-tier feature set --
"Role-based access control with profile-gating", "Event-driven
architecture with real-time operational views", "Advanced search",
"compliance scoring", "HSM/TPM integration" -- violating the
CLAUDE.md directive "Keep V3+ deliberately vague -- one-liner
descriptions only. Don't telegraph the paid feature set." The prior
wording also carried factual drift: `compliance scoring` was pulled
forward to V2.2 per the V2.2 Roadmap, so pairing it with V3 in the
README misrepresented the open-core line.
Fix
---
Replace the two-sentence enumeration at README.md:322-323 with a
single deliberately-vague sentence:
Enterprise capabilities for larger deployments are available in
the commercial tier.
No named features. No SKU enumeration. Matches the policy one-liner
shape used in neighboring V1 / V2 / V4+ sections. Net -1 line of
prose.
Files
-----
README.md 1 -, 1 +
Wire-format invariants preserved
--------------------------------
This is a docs-only change. All protocol surfaces are byte-identical:
- RFC 7030 EST handler (internal/api/handler/est.go) -- untouched
- RFC 8894 SCEP handler (internal/api/handler/scep.go) -- untouched
- Shared internal/pkcs7/ package -- untouched
- H-1 revocation composite key (migration 000012) -- untouched
- H-2 SCEP challenge-password preflight + PKCSReq guard -- untouched
- C-2 AES-256-GCM config encryption contract -- untouched
- CRL DER bytes, OCSP response bytes -- untouched
Verification
------------
git diff
|
||
|
|
1c7d085f16 |
docs: move maintenance notice and quick start link above Documentation section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
13cd4d98ba |
feat(V2.2): bulk revocation — filter-based fleet-wide certificate revocation
Add POST /api/v1/certificates/bulk-revoke with filter criteria (profile_id, owner_id, agent_id, issuer_id, team_id, certificate_ids), partial-failure tolerance, and audit trail. Includes MCP tool, CLI command (certs bulk-revoke), server-side bulk modal in GUI replacing client-side sequential loop, OpenAPI spec, compliance mapping updates, and 21 new tests (12 service, 7 handler, 1 CLI, 1 frontend). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e1bcde4cf1 |
feat(M50): cloud secret manager discovery — AWS SM, Azure KV, GCP SM
Extend certificate discovery from filesystem + network to cloud secret managers. Three pluggable DiscoverySource connectors feed into the existing discovery pipeline via sentinel agent pattern, with a 9th scheduler loop for periodic cloud scanning. - AWS Secrets Manager: aws-sdk-go-v2, tag/prefix filtering, 10 tests - Azure Key Vault: stdlib HTTP + OAuth2, base64 DER/PEM, 16 tests - GCP Secret Manager: stdlib HTTP + JWT OAuth2, label filter, 14 tests - CloudDiscoveryService orchestrator with 9 tests - 9th scheduler loop (6h default, atomic.Bool idempotency) - Discovery page: color-coded source type badges - 14 new env vars across CloudDiscoveryConfig structs - Docs: connectors.md, architecture.md, features.md, README updated 49 new tests. All CI checks pass (go vet, race, lint, coverage). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3f619bcaac |
feat(M49): Entrust, GlobalSign & EJBCA issuer connectors
Add three new issuer connectors completing commercial and open-source CA coverage. Entrust uses mTLS client certificate auth with sync/async issuance. GlobalSign Atlas uses mTLS + API key/secret dual auth with serial-based tracking. EJBCA supports dual auth (mTLS or OAuth2) for self-hosted Keyfactor CAs. Each connector implements the full issuer.Connector interface (9 methods), includes httptest-based unit tests (~14 each), and follows established patterns (injectable HTTP clients, RFC 5280 revocation reason mapping, CRL/OCSP delegated to CA). Also includes: issuer factory cases, env var seeding, config structs, domain types, seed data (3 rows, all disabled), OpenAPI enum updates, frontend issuer catalog entries with config fields, and full docs (connectors.md, architecture.md, features.md, README). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
596d86a206 |
feat(M48): continuous TLS health monitoring — endpoint state machine, shared tlsprobe, 8 API endpoints, GUI
Adds continuous TLS endpoint health monitoring that closes the deploy→verify→monitor loop. After M25 verifies a deployment succeeded once, M48 continuously confirms it stays healthy. Key components: - Shared `internal/tlsprobe/` package extracted from network scanner for reuse - Health status state machine: healthy → degraded (2 failures) → down (5 failures), plus cert_mismatch when served fingerprint differs from expected - 8th scheduler loop (60s tick, per-endpoint configurable intervals) - PostgreSQL migration 000011: endpoint_health_checks + endpoint_health_history tables - 8 REST API endpoints (CRUD, history, acknowledge, summary) - Health Monitor GUI page with summary bar, status table, create modal, auto-refresh - 38 new tests (5 tlsprobe + 11 domain + 10 service + 8 handler + 4 frontend) - All coverage thresholds maintained (service 68%, handler 83%, domain 87%, middleware 63%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f2e60b93a3 |
feat(M11c): crypto policy enforcement — CSR validation, MaxTTL caps, key metadata
Enforce certificate profile crypto constraints across all 5 issuance paths (renewal, agent CSR, EST, SCEP). ValidateCSRAgainstProfile() rejects CSRs with key algorithm/size that don't match profile rules. MaxTTL enforcement caps certificate validity per issuer connector (Local CA, Vault, step-ca enforce directly; ACME/DigiCert/Sectigo pass through). Key algorithm and size are now persisted in certificate_versions for audit compliance. 16 new tests (12 service-layer + 4 Local CA connector). Removes hardcoded version number from GUI sidebar. Documentation updated across architecture, features, connectors, and README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
f16a9c767a |
docs: consolidate README — merge architecture, security, design decisions into Why certctl
Fold Architecture, Key Design Decisions, and Security sections into the Why certctl section as bold-header paragraphs. Removes three standalone sections, tightening the README structure: Documentation → Integrations → Why certctl (with architecture, security, design decisions) → What It Does → Quick Start → Examples → CLI → MCP → Development → Roadmap → License. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3a27c87b3f |
docs: move Supported Integrations under Documentation links in README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |