break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)
Phase 7 — OIDC first-admin bootstrap (Decision 3):
- Optional AdminBootstrapHook closure on *oidc.Service. When wired,
HandleCallback consults the hook AFTER group resolution + user
upsert and BEFORE the empty-mapping fail-closed check. Hook
receives (providerID, groups, userID); returns grantAdmin=true
when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
admin exists yet in the tenant.
- cmd/server/main.go wires the hook as a closure that:
* Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
* Probes AdminExists via authActorRoleRepo (admin-already-exists
silently returns false; bootstrap mode is one-shot per tenant).
* Walks group intersection.
* On match: grants r-admin via authActorRoleRepo.Grant + emits
the bootstrap.oidc_first_admin audit row with
event_category=auth + INFO log.
- Coexists with the Bundle 1 env-var-token bootstrap. Both paths
can be configured; first match wins (admin-existence probe
short-circuits the second).
- HandleCallback's empty-mapping fail-closed check moved AFTER the
hook so a fresh deployment with zero group_role_mappings can
still mint the first admin.
- 5 tests in service_test.go: hook grants admin on match, hook
returns false preserves empty-mapping fail-closed, admin-already-
exists silently falls through to normal mapping, hook-error wraps
+ bubbles, idempotent when admin is already in the mapped role set.
Phase 7.5 — Break-glass admin (Decision 4, default-OFF):
Migration 000038 ships:
- breakglass_credentials table — at-most-one-credential-per-actor
(UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
state machine (failure_count, locked_until, last_failure_at).
FK CASCADE on users(id) so deleting a user atomically removes
their credential.
- Two new permissions seeded into r-admin only:
auth.breakglass.admin — set/rotate/unlock/remove credentials.
auth.breakglass.login — actor uses break-glass to log in.
CanonicalPermissions extended in lockstep.
internal/auth/breakglass/service.go (~580 LOC):
- Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
- SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
salt=16 random bytes, output=32 bytes); per-password random salt;
PHC-format hash output. Min 12 / max 256 byte input.
- Authenticate: constant-time-compare via subtle.ConstantTimeCompare
on every code path. Identical 401 + identical timing across the
wrong-password / locked-account / non-existent-actor paths so an
attacker cannot probe whether a given actor has break-glass
configured. Non-existent-actor + locked-account paths run a
verifyDummy() Argon2id pass for timing parity. Lockout state
machine: failure_count++ on every wrong attempt; threshold (default
5) trips locked_until = NOW() + duration (default 15m). Successful
Authenticate resets the counter. Reset-window: failures aged out
after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h)
auto-reset on next attempt.
- Unlock + RemoveCredential: admin-only (auth.breakglass.admin
gated at the router via rbacGate). Audit rows on every operation.
- All public methods refuse to act when Enabled()==false (returns
ErrDisabled; the handler maps to HTTP 404 — surface invisibility).
internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.
internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:
- POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
source IP via the existing rate limiter; returns 404 when
disabled). On success sets the post-login session cookie + CSRF
cookie via SessionService.Create + 204. On any failure:
uniform 401 + identical timing (the service has already audited
the specific failure category).
- POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
- POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
(auth.breakglass.admin)
- DELETE /api/v1/auth/breakglass/credentials/{actor_id}
(auth.breakglass.admin)
Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.
Tests (internal/auth/breakglass/service_test.go):
All 8 Phase 7.5 spec-mandated negative cases:
1. Service.Enabled()==false → all ops return ErrDisabled.
2. Wrong password → ErrInvalidCredentials, failure_count++,
audit row with event_category=auth.
3. Failure_count exceeds threshold → locked, subsequent attempts
(including with the CORRECT password) return identical-shape
401 while the lockout window holds.
4. Lockout window expires → next attempt with correct password
succeeds + resets the counter.
5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
6. Password leak hygiene — the service has zero slog calls; the
audit-row map literal never includes the password plaintext.
7. Argon2id hash never appears in logs OR API responses — pinned
by `json:"-"` tag on BreakglassCredential.PasswordHash + a
belt-and-braces json.Marshal probe asserting the hash bytes
never appear in the marshaled output.
8. Constant-time-compare verified via timing-statistical test —
wrong-password vs no-credential paths take statistically
indistinguishable time (within 5x ratio). The verifyDummy()
hash compute on the no-credential + locked paths is what
keeps timing parity; absent that, an attacker could side-
channel "actor doesn't have a credential" via timing.
Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.
Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).
cmd/server/main.go wiring:
- Constructs breakglassRepo + breakglassService + breakglassHandler
after the OIDC service block.
- breakglassSessionMinterAdapter shim bridges *session.Service.Create
to the breakglass.SessionMinter port.
- Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
visibility for the deliberate SSO-bypass).
internal/config/config.go gains:
- AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
- AuthConfig.Breakglass nested struct with 4 env vars
(CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
+ LOCKOUT_RESET_INTERVAL).
Router wiring:
- 4 new breakglass routes registered when reg.AuthBreakglass != nil;
public login route via direct r.mux.Handle (auth-exempt), 3 admin
routes via r.Register + rbacGate(auth.breakglass.admin).
- POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
allowlist with Phase 7.5 justification.
- SpecParityExceptions extended with 4 new entries documenting
the Phase 7.5 deferral of full per-endpoint OpenAPI rows
(handler doc-block at the top of auth_breakglass.go is the
operator-facing reference).
Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):
- Break-glass is a deliberate bypass of the SSO security boundary.
An attacker who phishes the password OR finds it in a compromised
password manager bypasses MFA, OIDC, and every group-claim gate.
- Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
state. Enable only during SSO-broken incidents. Disable after
recovery.
- WebAuthn pairing (v3 per Decision 12) is the load-bearing second
factor. Without it, break-glass is best treated as an emergency-
only path.
- Audit trail surfaces every break-glass action under
event_category=auth; the auditor role can monitor for unexpected
break-glass logins.
Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
certctl — Self-Hosted Certificate Lifecycle Platform
certctl is a self-hosted platform that automates the entire TLS certificate lifecycle, from issuance through renewal to deployment, with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. Free, source-available under BSL 1.1, covers the same lifecycle that enterprise platforms charge $100K+/year for.
The CA/Browser Forum's Ballot SC-081v3 caps public TLS certificates at 200 days by March 2026, 100 days by 2027, and 47 days by 2029. At 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever. Manual workflows stop being a choice.
Status: Early-access. Production-quality core (Local CA, ACME, agent deployment, CRUD, audit, role-based authz with auditor split + day-0 bootstrap + four-eyes approval) with broader feature surface (intermediate CA hierarchy, ACME/SCEP/EST servers, network appliances) still maturing. Federated identity (OIDC/SAML/WebAuthn, server-side sessions, break-glass accounts, JIT elevation) is the next slice on the roadmap, not yet shipped. Lab and dev deployments encouraged; production deployments welcome with the understanding that customer-scale battle-testing is in progress. File GitHub issues for any rough edges.
Actively maintained, shipping weekly. Open an issue if something breaks. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.
Ready to try it? Jump to the Quick Start. For the marketing site, see certctl.io.
Documentation
The full audience-organized index lives at docs/README.md. Top-level entry points:
| Audience | Start here |
|---|---|
| New to certctl | Concepts → Quickstart → Examples |
| Production operator | Architecture → Security posture → Disaster recovery runbook |
| PKI engineer | ACME server → SCEP server → EST server → CA hierarchy |
| Migrating from another tool | from certbot / from acme.sh / cert-manager coexistence |
| Contributor | Architecture → Testing strategy → CI pipeline |
For the connector reference (12 issuers, 15 targets, 6 notifiers) see docs/reference/connectors/index.md.
Screenshots
Why certctl
Certificate lifecycle tooling has historically split into two camps. Enterprise platforms charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together cron jobs across a fleet, manually renewing certs, or writing custom integration scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.
Built for platform engineering and DevOps teams managing 10 to 500+ certificates, security teams who need audit trails and policy enforcement, and small teams without enterprise budgets who need enterprise-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see Why certctl?.
What it does
certctl handles the full certificate lifecycle in one self-hosted control plane:
- Issue and renew from any CA. Let's Encrypt and any ACME provider, an embedded ACME server you can point cert-manager / certbot / lego at directly, a built-in local CA with sub-CA mode (chains under your enterprise root like ADCS), step-ca, Vault PKI, EJBCA, AWS ACM PCA, Google CAS, DigiCert, Sectigo, GlobalSign, Entrust, plus an OpenSSL / shell-script adapter for anything custom. Twelve native issuer connectors. See the connector reference.
- Deploy automatically to NGINX, Apache, HAProxy, Caddy, Traefik, Envoy, IIS, Windows Cert Store, Java keystore, Kubernetes Secrets, AWS ACM, Azure Key Vault, SSH known-hosts, Postfix + Dovecot, F5 BIG-IP. Fifteen native target connectors. Every deploy goes through atomic-write + ownership-preservation + SHA-256 idempotency + per-target Prometheus counters + pre-deploy snapshot + on-failure rollback. See
docs/reference/deployment-model.md. - Run as an ACME server so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See
docs/reference/protocols/acme-server.md. - Run as a SCEP server for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See
docs/reference/protocols/scep-server.md. - Run as an EST server for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See
docs/reference/protocols/est.md. - Manage multi-level CA hierarchies with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU
PermittedDNSDomains, and 2-level internal PKI. Seedocs/reference/intermediate-ca-hierarchy.md. - Gate high-stakes issuance behind two-person-integrity approval. Flag a profile as
RequiresApproval, the request lands in a queue, a non-requester approves, the scheduler dispatches. Profile-edit changes on approval-tier profiles route through the same gate so the flip-flop bypass is closed. Seedocs/operator/approval-workflow.md. - Authorize with role-based access control. Seven default roles (admin, operator, viewer, agent, mcp, cli, auditor) over a 33-permission canonical catalogue with global / per-profile / per-issuer scope. Auditor role is read-only on the audit trail (
audit.read+audit.export, nothing else) so a regulator's key cannot read certificates or mutate config. Day-0 admin via a one-shotCERTCTL_BOOTSTRAP_TOKENendpoint that closes itself the moment any admin lands. Privilege-escalation guard requiresauth.role.assignto grant or revoke a role. Seedocs/operator/rbac.md,docs/operator/auth-threat-model.md, and the v2.0.x → v2.1.0 migration guide. - Discover existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
- Revoke with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See
docs/reference/protocols/crl-ocsp.md. - Alert via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See
docs/operator/runbooks/expiry-alerts.md. - Drive the platform from natural language via the bundled MCP (Model Context Protocol) server. The full REST API is exposed as MCP tools — ask your AI client "show me all expiring certificates", "revoke the VPN cert, key compromised", or "what agents are offline?" and it translates to API calls. Stateless stdio-transport binary at
cmd/mcp-server/; same auth as the REST API; no extra attack surface. Seedocs/reference/mcp.md.
Architecture and security
Go 1.25 control plane with handler → service → repository layering. PostgreSQL 16 backend (35+ tables, idempotent migrations). Pull-only deployment model — the server never initiates outbound connections. Agents poll for work and generate ECDSA P-256 keys locally so private keys never touch the control plane. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). See the Architecture Guide for full system diagrams.
Security: API-key authentication with SHA-256 hashing + constant-time comparison, then role-based authorization on every gated handler with global / per-profile / per-issuer scope. Auditor split keeps regulator-class actors strictly read-only on the audit trail. Day-0 admin via a one-shot bootstrap token; granting or revoking roles requires the dedicated auth.role.assign permission. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Issuer and target credentials encrypted at rest with AES-256-GCM. HTTPS-only control plane with TLS 1.3 pinned and a fail-closed startup gate that refuses to boot if the TLS bundle is unusable. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit. See docs/operator/security.md for the full posture and docs/operator/auth-threat-model.md for what's defended vs deferred.
Quick Start
Docker Compose (recommended)
git clone https://github.com/certctl-io/certctl.git
cd certctl
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build
Wait ~30 seconds, then open https://localhost:8443 in your browser. The shipped demo overlay seeds 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history. The certctl-tls-init init container self-signs an ECDSA-P256 cert on first boot — accept the browser warning for the demo, or feed the generated ca.crt to your client.
For a clean install without demo data, drop the -f deploy/docker-compose.demo.yml flag and run docker compose -f deploy/docker-compose.yml up -d --build. The four compose files (docker-compose.yml base, docker-compose.demo.yml overlay, docker-compose.dev.yml for PgAdmin + debug logging, docker-compose.test.yml for integration tests) are documented at deploy/ENVIRONMENTS.md.
curl --cacert $(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
# {"status":"healthy"}
The control plane is HTTPS-only with TLS 1.3 pinned. See docs/operator/tls.md for cert provisioning patterns.
Agent install (one-liner)
curl -sSL https://raw.githubusercontent.com/certctl-io/certctl/master/install-agent.sh | bash
Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See install-agent.sh.
Helm chart (Kubernetes)
helm install certctl deploy/helm/certctl/ \
--set server.apiKey=your-api-key \
--set postgres.password=your-db-password
Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See values.yaml.
Container images
docker pull ghcr.io/certctl-io/certctl-server:latest
docker pull ghcr.io/certctl-io/certctl-agent:latest
Examples
Pick the scenario closest to your setup and have it running in 2 minutes:
| Example | Scenario |
|---|---|
examples/acme-nginx/ |
Let's Encrypt + NGINX, HTTP-01 challenges |
examples/acme-wildcard-dns01/ |
Wildcard certs via DNS-01 (Cloudflare hook included) |
examples/private-ca-traefik/ |
Local CA (self-signed or sub-CA) + Traefik file provider |
examples/step-ca-haproxy/ |
Smallstep step-ca + HAProxy combined PEM |
examples/multi-issuer/ |
ACME for public + Local CA for internal, one dashboard |
Each directory contains a docker-compose.yml and a README.md explaining the scenario, prerequisites, and customization.
Verifying a release
Every v* tag publishes signed, attested artefacts (Cosign keyless OIDC + SLSA Level 3 provenance + SPDX-JSON SBOMs). For the verification procedure, see docs/reference/release-verification.md.
Development
make build # Build server + agent binaries
make test # Run tests
make lint # golangci-lint (11 linters)
govulncheck ./... # Vulnerability scan
make docker-up # Start Docker Compose stack
CI runs go vet, go test -race, golangci-lint, govulncheck, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%) on every push. Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build.
For the full contributor guide see docs/contributor/ — testing strategy, test environment, CI pipeline, QA prerequisites.
License
Licensed under the Business Source License 1.1. The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial certificate-management offering to third parties. See the LICENSE file for the full Additional Use Grant.
For licensing inquiries: certctl@proton.me
Dependencies
go list -m all | wc -l # total module count (direct + transitive)
go mod why <path> # explain why a module is pulled in
govulncheck ./... # vulnerability scan (CI runs this on every commit)
The release-time SBOM is published as an SPDX-JSON file alongside each release artifact.
If certctl solves a problem you have, star the repo to help others find it. Questions, bugs, or feature requests: open an issue.




