shankar0123 5204f1b5fd auth-bundle-2 Phase 7 + Phase 7.5: OIDC first-admin bootstrap +
break-glass admin (Argon2id, lockout, default-OFF, surface-invisibility)

Phase 7 — OIDC first-admin bootstrap (Decision 3):

  - Optional AdminBootstrapHook closure on *oidc.Service. When wired,
    HandleCallback consults the hook AFTER group resolution + user
    upsert and BEFORE the empty-mapping fail-closed check. Hook
    receives (providerID, groups, userID); returns grantAdmin=true
    when the user matches CERTCTL_BOOTSTRAP_ADMIN_GROUPS AND no
    admin exists yet in the tenant.
  - cmd/server/main.go wires the hook as a closure that:
      * Filters by CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID (if configured).
      * Probes AdminExists via authActorRoleRepo (admin-already-exists
        silently returns false; bootstrap mode is one-shot per tenant).
      * Walks group intersection.
      * On match: grants r-admin via authActorRoleRepo.Grant + emits
        the bootstrap.oidc_first_admin audit row with
        event_category=auth + INFO log.
  - Coexists with the Bundle 1 env-var-token bootstrap. Both paths
    can be configured; first match wins (admin-existence probe
    short-circuits the second).
  - HandleCallback's empty-mapping fail-closed check moved AFTER the
    hook so a fresh deployment with zero group_role_mappings can
    still mint the first admin.
  - 5 tests in service_test.go: hook grants admin on match, hook
    returns false preserves empty-mapping fail-closed, admin-already-
    exists silently falls through to normal mapping, hook-error wraps
    + bubbles, idempotent when admin is already in the mapped role set.

Phase 7.5 — Break-glass admin (Decision 4, default-OFF):

Migration 000038 ships:

  - breakglass_credentials table — at-most-one-credential-per-actor
    (UNIQUE(actor_id)), Argon2id PHC-format password_hash, lockout
    state machine (failure_count, locked_until, last_failure_at).
    FK CASCADE on users(id) so deleting a user atomically removes
    their credential.
  - Two new permissions seeded into r-admin only:
      auth.breakglass.admin — set/rotate/unlock/remove credentials.
      auth.breakglass.login — actor uses break-glass to log in.
    CanonicalPermissions extended in lockstep.

internal/auth/breakglass/service.go (~580 LOC):

  - Service.Enabled() reflects CERTCTL_BREAKGLASS_ENABLED.
  - SetPassword: Argon2id with OWASP 2024 params (m=64MiB, t=3, p=4,
    salt=16 random bytes, output=32 bytes); per-password random salt;
    PHC-format hash output. Min 12 / max 256 byte input.
  - Authenticate: constant-time-compare via subtle.ConstantTimeCompare
    on every code path. Identical 401 + identical timing across the
    wrong-password / locked-account / non-existent-actor paths so an
    attacker cannot probe whether a given actor has break-glass
    configured. Non-existent-actor + locked-account paths run a
    verifyDummy() Argon2id pass for timing parity. Lockout state
    machine: failure_count++ on every wrong attempt; threshold (default
    5) trips locked_until = NOW() + duration (default 15m). Successful
    Authenticate resets the counter. Reset-window: failures aged out
    after CERTCTL_BREAKGLASS_LOCKOUT_RESET_INTERVAL (default 1h)
    auto-reset on next attempt.
  - Unlock + RemoveCredential: admin-only (auth.breakglass.admin
    gated at the router via rbacGate). Audit rows on every operation.
  - All public methods refuse to act when Enabled()==false (returns
    ErrDisabled; the handler maps to HTTP 404 — surface invisibility).

internal/repository/postgres/breakglass.go ships the 5-method
postgres impl with atomic single-statement IncrementFailure (so
concurrent racing wrong-password attempts can't observe an
intermediate state and slip past the threshold) and idempotent
ResetFailureCount.

internal/api/handler/auth_breakglass.go ships the 4-endpoint HTTP
surface:

  - POST /auth/breakglass/login (auth-exempt; 5/min rate-limited per
    source IP via the existing rate limiter; returns 404 when
    disabled). On success sets the post-login session cookie + CSRF
    cookie via SessionService.Create + 204. On any failure:
    uniform 401 + identical timing (the service has already audited
    the specific failure category).
  - POST /api/v1/auth/breakglass/credentials (auth.breakglass.admin)
  - POST /api/v1/auth/breakglass/credentials/{actor_id}/unlock
    (auth.breakglass.admin)
  - DELETE /api/v1/auth/breakglass/credentials/{actor_id}
    (auth.breakglass.admin)

Admin endpoints share the surface-invisibility property: when
CERTCTL_BREAKGLASS_ENABLED=false, every admin endpoint also returns
404 (not 403) so probing via the admin surface gets the same signal
as probing the login endpoint.

Tests (internal/auth/breakglass/service_test.go):

All 8 Phase 7.5 spec-mandated negative cases:

  1. Service.Enabled()==false → all ops return ErrDisabled.
  2. Wrong password → ErrInvalidCredentials, failure_count++,
     audit row with event_category=auth.
  3. Failure_count exceeds threshold → locked, subsequent attempts
     (including with the CORRECT password) return identical-shape
     401 while the lockout window holds.
  4. Lockout window expires → next attempt with correct password
     succeeds + resets the counter.
  5. Password < 12 bytes (or > 256 bytes) → ErrWeakPassword.
  6. Password leak hygiene — the service has zero slog calls; the
     audit-row map literal never includes the password plaintext.
  7. Argon2id hash never appears in logs OR API responses — pinned
     by `json:"-"` tag on BreakglassCredential.PasswordHash + a
     belt-and-braces json.Marshal probe asserting the hash bytes
     never appear in the marshaled output.
  8. Constant-time-compare verified via timing-statistical test —
     wrong-password vs no-credential paths take statistically
     indistinguishable time (within 5x ratio). The verifyDummy()
     hash compute on the no-credential + locked paths is what
     keeps timing parity; absent that, an attacker could side-
     channel "actor doesn't have a credential" via timing.

Plus coverage-lift batch covering: SetPassword first-time vs rotate,
no-caller-id rejection, no-target-id rejection, RNG failure surface,
Authenticate happy-path mints session, no-credential audit row,
session-mint-failure surface, FailureResetInterval recycle, Unlock
+ RemoveCredential happy paths, hash-format unit tests (round-trip,
mismatch, malformed/wrong-version/bad-base64 formats), nil-audit +
nil-session pass-through.

Coverage on internal/auth/breakglass/ at 91.5% per-statement (above
the Phase 7.5 spec ≥ 90% floor).

cmd/server/main.go wiring:

  - Constructs breakglassRepo + breakglassService + breakglassHandler
    after the OIDC service block.
  - breakglassSessionMinterAdapter shim bridges *session.Service.Create
    to the breakglass.SessionMinter port.
  - Logs WARN at boot when CERTCTL_BREAKGLASS_ENABLED=true (operator
    visibility for the deliberate SSO-bypass).

internal/config/config.go gains:

  - AuthConfig.BootstrapAdminGroups + BootstrapOIDCProviderID for
    Phase 7 (CERTCTL_BOOTSTRAP_ADMIN_GROUPS comma-list +
    CERTCTL_BOOTSTRAP_OIDC_PROVIDER_ID).
  - AuthConfig.Breakglass nested struct with 4 env vars
    (CERTCTL_BREAKGLASS_ENABLED + LOCKOUT_THRESHOLD + LOCKOUT_DURATION
    + LOCKOUT_RESET_INTERVAL).

Router wiring:

  - 4 new breakglass routes registered when reg.AuthBreakglass != nil;
    public login route via direct r.mux.Handle (auth-exempt), 3 admin
    routes via r.Register + rbacGate(auth.breakglass.admin).
  - POST /auth/breakglass/login pinned in AuthExemptRouterRoutes
    allowlist with Phase 7.5 justification.
  - SpecParityExceptions extended with 4 new entries documenting
    the Phase 7.5 deferral of full per-endpoint OpenAPI rows
    (handler doc-block at the top of auth_breakglass.go is the
    operator-facing reference).

Threat model (encoded in service.go + auth_breakglass.go doc-blocks
+ migration 000038 docstrings, to be promoted to docs/operator/auth-
threat-model.md in Phase 12):

  - Break-glass is a deliberate bypass of the SSO security boundary.
    An attacker who phishes the password OR finds it in a compromised
    password manager bypasses MFA, OIDC, and every group-claim gate.
  - Recommendation: keep CERTCTL_BREAKGLASS_ENABLED=false in steady-
    state. Enable only during SSO-broken incidents. Disable after
    recovery.
  - WebAuthn pairing (v3 per Decision 12) is the load-bearing second
    factor. Without it, break-glass is best treated as an emergency-
    only path.
  - Audit trail surfaces every break-glass action under
    event_category=auth; the auditor role can monitor for unexpected
    break-glass logins.

Verifications: gofmt clean, go vet clean across all touched packages,
go test -short -count=1 green across internal/auth/oidc (3.0s; new
Phase 7 hook tests integrated alongside the 21+ Phase 3 negatives),
internal/auth/breakglass (3.6s; 8 spec-mandated negatives + coverage
batch passing), internal/config + internal/domain/auth + internal/api/
router + internal/api/handler all green, no regressions in Bundle 1
packages.
2026-05-10 06:51:41 +00:00

certctl logo

certctl — Self-Hosted Certificate Lifecycle Platform

License Go Report Card GitHub Release GitHub Stars

certctl is a self-hosted platform that automates the entire TLS certificate lifecycle, from issuance through renewal to deployment, with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong. Free, source-available under BSL 1.1, covers the same lifecycle that enterprise platforms charge $100K+/year for.

The CA/Browser Forum's Ballot SC-081v3 caps public TLS certificates at 200 days by March 2026, 100 days by 2027, and 47 days by 2029. At 47-day lifespans, a team managing 100 certificates is processing 7+ renewals per week, every week, forever. Manual workflows stop being a choice.

Status: Early-access. Production-quality core (Local CA, ACME, agent deployment, CRUD, audit, role-based authz with auditor split + day-0 bootstrap + four-eyes approval) with broader feature surface (intermediate CA hierarchy, ACME/SCEP/EST servers, network appliances) still maturing. Federated identity (OIDC/SAML/WebAuthn, server-side sessions, break-glass accounts, JIT elevation) is the next slice on the roadmap, not yet shipped. Lab and dev deployments encouraged; production deployments welcome with the understanding that customer-scale battle-testing is in progress. File GitHub issues for any rough edges.

Actively maintained, shipping weekly. Open an issue if something breaks. CI runs the full test suite with race detection, static analysis, and vulnerability scanning on every commit.

Ready to try it? Jump to the Quick Start. For the marketing site, see certctl.io.

Documentation

The full audience-organized index lives at docs/README.md. Top-level entry points:

Audience Start here
New to certctl ConceptsQuickstartExamples
Production operator ArchitectureSecurity postureDisaster recovery runbook
PKI engineer ACME serverSCEP serverEST serverCA hierarchy
Migrating from another tool from certbot / from acme.sh / cert-manager coexistence
Contributor ArchitectureTesting strategyCI pipeline

For the connector reference (12 issuers, 15 targets, 6 notifiers) see docs/reference/connectors/index.md.

Screenshots

Dashboard
Dashboard
Stats, expiration heatmap, renewal trends, issuance rate
Certificates
Certificates
Inventory with bulk ops, status filters, owner/team columns
Issuers
Issuers
Catalog with 10 CA types, GUI config, test connection
Jobs
Jobs
Issuance, renewal, deployment queue with approval workflow

See all screenshots →

Why certctl

Certificate lifecycle tooling has historically split into two camps. Enterprise platforms charge six-figure annual licenses, take months to deploy, and bill professional-services hours at $250 to $400 per hour to write integration code that should ship with the product. Single-purpose tools handle one slice of the problem and leave the operator to glue the rest together. certctl fills the gap — full lifecycle automation, self-hosted, free, CA-agnostic, target-agnostic. If you're stitching together cron jobs across a fleet, manually renewing certs, or writing custom integration scripts to bridge a commercial CLM platform to your actual infrastructure, certctl replaces all of that.

Built for platform engineering and DevOps teams managing 10 to 500+ certificates, security teams who need audit trails and policy enforcement, and small teams without enterprise budgets who need enterprise-grade automation for a 50-server environment. For the detailed positioning argument and when not to use certctl, see Why certctl?.

What it does

certctl handles the full certificate lifecycle in one self-hosted control plane:

  • Issue and renew from any CA. Let's Encrypt and any ACME provider, an embedded ACME server you can point cert-manager / certbot / lego at directly, a built-in local CA with sub-CA mode (chains under your enterprise root like ADCS), step-ca, Vault PKI, EJBCA, AWS ACM PCA, Google CAS, DigiCert, Sectigo, GlobalSign, Entrust, plus an OpenSSL / shell-script adapter for anything custom. Twelve native issuer connectors. See the connector reference.
  • Deploy automatically to NGINX, Apache, HAProxy, Caddy, Traefik, Envoy, IIS, Windows Cert Store, Java keystore, Kubernetes Secrets, AWS ACM, Azure Key Vault, SSH known-hosts, Postfix + Dovecot, F5 BIG-IP. Fifteen native target connectors. Every deploy goes through atomic-write + ownership-preservation + SHA-256 idempotency + per-target Prometheus counters + pre-deploy snapshot + on-failure rollback. See docs/reference/deployment-model.md.
  • Run as an ACME server so existing client tooling plugs in directly. RFC 8555 + RFC 9773 ARI, two per-profile auth modes (public-trust-style validation or trust_authenticated for internal PKI), doubly-signed key rollover, revoke-cert on both kid path and jwk path, per-account rate limiting. Cert-manager / certbot / lego all work pointed at it. See docs/reference/protocols/acme-server.md.
  • Run as a SCEP server for Microsoft Intune-managed phones, ChromeOS devices, network appliances. RFC 8894 native with full PKIMessage wire format, native Intune challenge dispatch with replay protection, per-profile dispatch with separate RA cert per profile. See docs/reference/protocols/scep-server.md.
  • Run as an EST server for HTTPS-based PKCS#10 enrollment. 802.1X / Wi-Fi authentication, IoT device enrollment, RFC 9266 channel binding. See docs/reference/protocols/est.md.
  • Manage multi-level CA hierarchies with name constraints, path-length enforcement, and end-to-end RFC 5280 path validation. Root → intermediate → issuing chains, admin-gated CRUD, drain-first retirement. Patterns documented for 4-level boundary CAs, 3-level policy CAs with per-BU PermittedDNSDomains, and 2-level internal PKI. See docs/reference/intermediate-ca-hierarchy.md.
  • Gate high-stakes issuance behind two-person-integrity approval. Flag a profile as RequiresApproval, the request lands in a queue, a non-requester approves, the scheduler dispatches. Profile-edit changes on approval-tier profiles route through the same gate so the flip-flop bypass is closed. See docs/operator/approval-workflow.md.
  • Authorize with role-based access control. Seven default roles (admin, operator, viewer, agent, mcp, cli, auditor) over a 33-permission canonical catalogue with global / per-profile / per-issuer scope. Auditor role is read-only on the audit trail (audit.read + audit.export, nothing else) so a regulator's key cannot read certificates or mutate config. Day-0 admin via a one-shot CERTCTL_BOOTSTRAP_TOKEN endpoint that closes itself the moment any admin lands. Privilege-escalation guard requires auth.role.assign to grant or revoke a role. See docs/operator/rbac.md, docs/operator/auth-threat-model.md, and the v2.0.x → v2.1.0 migration guide.
  • Discover existing certs across your fleet via filesystem scanning on agents, network TLS probing across CIDR ranges, and cloud secret manager imports (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager). Triage workflow for claim / dismiss / investigate.
  • Revoke with full RFC 5280 reason codes, DER CRL generation per issuer (scheduler-pre-generated and ETag-cached), and an embedded RFC 6960 OCSP responder with dedicated per-issuer responder certs. Single + bulk revocation. See docs/reference/protocols/crl-ocsp.md.
  • Alert via Slack, Microsoft Teams, PagerDuty, OpsGenie, email, webhooks. Per-policy multi-channel routing matrix with severity tiers and fault-isolating per-channel dispatch. See docs/operator/runbooks/expiry-alerts.md.
  • Drive the platform from natural language via the bundled MCP (Model Context Protocol) server. The full REST API is exposed as MCP tools — ask your AI client "show me all expiring certificates", "revoke the VPN cert, key compromised", or "what agents are offline?" and it translates to API calls. Stateless stdio-transport binary at cmd/mcp-server/; same auth as the REST API; no extra attack surface. See docs/reference/mcp.md.

Architecture and security

Go 1.25 control plane with handler → service → repository layering. PostgreSQL 16 backend (35+ tables, idempotent migrations). Pull-only deployment model — the server never initiates outbound connections. Agents poll for work and generate ECDSA P-256 keys locally so private keys never touch the control plane. For network appliances and agentless servers, a proxy agent in the same network zone handles deployment via the target's API (WinRM, iControl REST, SSH/SFTP). See the Architecture Guide for full system diagrams.

Security: API-key authentication with SHA-256 hashing + constant-time comparison, then role-based authorization on every gated handler with global / per-profile / per-issuer scope. Auditor split keeps regulator-class actors strictly read-only on the audit trail. Day-0 admin via a one-shot bootstrap token; granting or revoking roles requires the dedicated auth.role.assign permission. CORS deny-by-default. Shell injection prevention on all connector scripts. SSRF protection (reserved IP filtering) on the network scanner. Issuer and target credentials encrypted at rest with AES-256-GCM. HTTPS-only control plane with TLS 1.3 pinned and a fail-closed startup gate that refuses to boot if the TLS bundle is unusable. Every API call recorded to an immutable audit trail with actor attribution, body hash, and latency tracking. CI runs race detection, 11 linters, and vulnerability scanning on every commit. See docs/operator/security.md for the full posture and docs/operator/auth-threat-model.md for what's defended vs deferred.

Quick Start

git clone https://github.com/certctl-io/certctl.git
cd certctl
docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.demo.yml up -d --build

Wait ~30 seconds, then open https://localhost:8443 in your browser. The shipped demo overlay seeds 32 certificates across 10 issuers, 8 agents, and 180 days of realistic history. The certctl-tls-init init container self-signs an ECDSA-P256 cert on first boot — accept the browser warning for the demo, or feed the generated ca.crt to your client.

For a clean install without demo data, drop the -f deploy/docker-compose.demo.yml flag and run docker compose -f deploy/docker-compose.yml up -d --build. The four compose files (docker-compose.yml base, docker-compose.demo.yml overlay, docker-compose.dev.yml for PgAdmin + debug logging, docker-compose.test.yml for integration tests) are documented at deploy/ENVIRONMENTS.md.

curl --cacert $(docker compose -f deploy/docker-compose.yml exec -T certctl-server cat /etc/certctl/tls/ca.crt) https://localhost:8443/health
# {"status":"healthy"}

The control plane is HTTPS-only with TLS 1.3 pinned. See docs/operator/tls.md for cert provisioning patterns.

Agent install (one-liner)

curl -sSL https://raw.githubusercontent.com/certctl-io/certctl/master/install-agent.sh | bash

Detects your OS and architecture, downloads the binary, configures systemd (Linux) or launchd (macOS), and starts the agent. See install-agent.sh.

Helm chart (Kubernetes)

helm install certctl deploy/helm/certctl/ \
  --set server.apiKey=your-api-key \
  --set postgres.password=your-db-password

Production-ready chart with Server Deployment, PostgreSQL StatefulSet, Agent DaemonSet, health probes, security contexts (non-root, read-only rootfs), and optional Ingress. See values.yaml.

Container images

docker pull ghcr.io/certctl-io/certctl-server:latest
docker pull ghcr.io/certctl-io/certctl-agent:latest

Examples

Pick the scenario closest to your setup and have it running in 2 minutes:

Example Scenario
examples/acme-nginx/ Let's Encrypt + NGINX, HTTP-01 challenges
examples/acme-wildcard-dns01/ Wildcard certs via DNS-01 (Cloudflare hook included)
examples/private-ca-traefik/ Local CA (self-signed or sub-CA) + Traefik file provider
examples/step-ca-haproxy/ Smallstep step-ca + HAProxy combined PEM
examples/multi-issuer/ ACME for public + Local CA for internal, one dashboard

Each directory contains a docker-compose.yml and a README.md explaining the scenario, prerequisites, and customization.

Verifying a release

Every v* tag publishes signed, attested artefacts (Cosign keyless OIDC + SLSA Level 3 provenance + SPDX-JSON SBOMs). For the verification procedure, see docs/reference/release-verification.md.

Development

make build              # Build server + agent binaries
make test               # Run tests
make lint               # golangci-lint (11 linters)
govulncheck ./...       # Vulnerability scan
make docker-up          # Start Docker Compose stack

CI runs go vet, go test -race, golangci-lint, govulncheck, and per-layer coverage thresholds (service 55%, handler 60%, domain 40%, middleware 30%) on every push. Frontend CI runs TypeScript type checking, Vitest tests, and Vite production build.

For the full contributor guide see docs/contributor/ — testing strategy, test environment, CI pipeline, QA prerequisites.

License

Licensed under the Business Source License 1.1. The source code is publicly available and free to use, modify, and self-host. The one restriction: you may not use certctl's certificate management functionality as part of a commercial certificate-management offering to third parties. See the LICENSE file for the full Additional Use Grant.

For licensing inquiries: certctl@proton.me

Dependencies

go list -m all | wc -l   # total module count (direct + transitive)
go mod why <path>        # explain why a module is pulled in
govulncheck ./...        # vulnerability scan (CI runs this on every commit)

The release-time SBOM is published as an SPDX-JSON file alongside each release artifact.


If certctl solves a problem you have, star the repo to help others find it. Questions, bugs, or feature requests: open an issue.

S
Description
certctl is a self-hosted platform that automates the entire certificate lifecycle — from issuance through renewal to deployment — with zero human intervention. It works with any certificate authority, deploys to any server, and keeps private keys on your infrastructure where they belong.
Readme 47 MiB
Languages
Go 77.2%
TypeScript 19.2%
Shell 2.2%
PLpgSQL 0.7%
JavaScript 0.3%
Other 0.3%