mirror of
https://github.com/shankar0123/certctl.git
synced 2026-06-07 15:11:29 +00:00
87213128cc
Pre-G-2 internal/domain/connector.go::Agent::APIKeyHash was tagged
`json:"api_key_hash"` and shipped on every wire surface that returned
domain.Agent — GET /api/v1/agents (PagedResponse{Data: agents}),
GET /api/v1/agents/{id}, GET /api/v1/agents/retired, and the
POST /api/v1/agents registration response. Every authenticated client
(browser, CLI --json, MCP tool calls) received the SHA-256-of-the-API-key
string. The browser silently dropped it because web/src/api/types.ts
omits the field, but CLI and MCP consumers print full JSON so the hash
was visible there. Even though the value is a hash and not the plaintext
key, shipping it gives an attacker an offline brute-force target if the
API-key entropy is low (certctl doesn't enforce a minimum on operator-
supplied keys), and there's no business reason for any client to ever
receive it — the value is server-internal, used only for the lookup at
internal/repository/postgres/agent.go::GetByAPIKey. (Audit:
cat-s5-apikey_leak in coverage-gap-audit-2026-04-24-v5/unified-audit.md.)
We chose the audit's recommended fix (json:"-") plus a defense-in-depth
MarshalJSON plus a CI guardrail. Three layers because struct-tag
redaction alone is one rebase away from being silently reverted, the
custom MarshalJSON catches the case where a parent struct embeds Agent
under a different tag, and the CI grep blocks reintroduction at the spec
or frontend boundary even without a code review catching it.
Files changed:
Phase 1 — Domain redaction:
- internal/domain/connector.go: APIKeyHash tag flipped from
`json:"api_key_hash"` to `json:"-"`. New Agent.MarshalJSON
with value receiver + type-alias-recursion-break that explicitly
zeroes APIKeyHash on the marshal-time copy. Long-form docblock
explaining the G-2 closure rationale + cross-references to
service.RegisterAgent (populator), repository.AgentRepository::
GetByAPIKey (consumer), docs/architecture.md (DB-shape vs
API-shape distinction), and the audit finding.
Phase 2 — Domain tests (5 test functions):
- internal/domain/connector_test.go: TestAgent_MarshalJSON_RedactsAPIKeyHash
pins the marshal-boundary contract on a value receiver. ...RedactsViaPointer
pins the *Agent path. ...RedactsInSlice pins the []Agent path that the
ListAgents handler actually emits via PagedResponse. ...DoesNotMutateReceiver
pins the by-value-receiver contract so a future refactor that switches
to pointer-receiver gets caught. ...RoundTrip pins the wire-shape
guarantee that APIKeyHash is dropped on encode and cannot reappear on
decode. Single sentinel value ("sha256:LEAKED-CREDENTIAL-DERIVATIVE-
SENTINEL") flows through every fixture for grep-ability on regression.
Phase 3 — Handler tests (4 test functions):
- internal/api/handler/agent_handler_test.go: TestListAgents_DoesNotLeakAPIKeyHash,
TestGetAgent_DoesNotLeakAPIKeyHash, TestRegisterAgent_DoesNotLeakAPIKeyHash,
TestListRetiredAgents_DoesNotLeakAPIKeyHash. Each asserts (a) the
literal substring "api_key_hash" is absent from the httptest-captured
body, (b) the leak sentinel value is absent, (c) the non-leaked fields
ARE present (sanity that the handler is serving real data, not just
empty payloads). Shared sentinel "sha256:LEAKED-CREDENTIAL-DERIVATIVE-
HANDLER-SENTINEL" so a single grep over a failing test's output
identifies the leak surface immediately.
Phase 4 — Spec / docs:
- api/openapi.yaml: api_key_hash property REMOVED from Agent schema
(was at line 3690). Inline G-2 comment naming the closure + the
database-vs-API-shape distinction so a future spec edit doesn't
silently re-introduce the field.
- docs/architecture.md: ER-diagram block already documents the agents
table including api_key_hash (DB shape — correct). Added a sibling
note paragraph immediately below the diagram explaining that several
columns are intentionally server-internal (api_key_hash redaction
+ issuers.config / deployment_targets.config encrypted shadow), with
cross-references to the redaction enforcement site, the OpenAPI
schema, the frontend interface, and the CI guardrail.
- web/src/api/types.ts: Agent interface unchanged in shape (already
omitted the field) but added a leading comment block explaining
WHY the omission is intentional — stops a future frontend dev from
"completing" the interface from the OpenAPI spec or the Go struct.
Phase 5 — CI guardrail:
- .github/workflows/ci.yml: new "Forbidden api_key_hash JSON-shape
regression guard (G-2)" step. Scoped patterns catch the actual
regression shapes — Go struct tag (json:"api_key_hash"), frontend
interface declaration, OpenAPI schema property, YAML enum/array
membership. Repository / migration / seed / service / integration /
unit-test / comment lines exempt. Verified locally on the real tree
(passes) and against 4 synthetic regression patterns (each fires
the guardrail). Mirrors the G-1 pattern from .github/workflows/
ci.yml lines 47-108.
Phase 5b — Sweep verification (no changes, results documented for the
next reader):
- internal/api/middleware/audit.go: doesn't serialize Agent struct;
records request body only. No leak.
- service.RegisterAgent audit-event payload: `map[string]interface{}{
"name": name, "hostname": hostname}` — name + hostname only,
no APIKeyHash. No leak.
- All 9 slog sites that mention agent: scalar attrs only ("agent_id",
"error", "agent_hostname"), never the full struct. No leak.
- internal/mcp, internal/cli, cmd/cli, cmd/mcp-server: zero matches
for APIKeyHash / api_key_hash. Both pass server JSON verbatim, so
the wire-side fix transitively closes them.
Verification (all gates pass):
- go build ./...
- go vet ./...
- go test -short ./... — every package green
- go test -short -race ./internal/domain/... ./internal/api/handler/... — clean
- govulncheck ./... — no vulnerabilities in our code
- helm lint deploy/helm/certctl/ — clean
- helm template smoke render — succeeds
- python3 yaml.safe_load on api/openapi.yaml — parses
- OpenAPI Agent schema scan: no api_key_hash property
- CI guardrail mirror: clean on real tree, fires on all 4 synthetic
regression patterns
- Domain pkg coverage: Agent.MarshalJSON 100%, connector.go total 87.5%
- Handler pkg coverage: 79.2%
Sample response body (httptest captured during verification, GET
/api/v1/agents/{id} via the new handler test):
{"id":"agent-demo","name":"demo-agent","hostname":"demo.host",
"status":"Online","last_heartbeat_at":"2026-04-24T11:59:30Z",
"registered_at":"2026-04-24T12:00:00Z","os":"linux",
"architecture":"amd64","ip_address":"10.0.0.42",
"version":"v2.0.49"}
Note the absence of any api_key_hash key, even though the in-memory
struct passed to the handler had APIKeyHash set to a sentinel.
Out of scope (intentionally untouched):
- internal/repository/postgres/agent.go SELECT/INSERT/UPDATE/scan
paths and GetByAPIKey lookup — DB column stays, repo still
populates the struct, auth lookup still works. The redaction is a
marshal-boundary concern.
- migrations/000001_initial_schema.up.sql + migrations/seed_*.sql —
DB schema and seed data unchanged.
- internal/service/agent.go::RegisterAgent — service-side hashing
and persistence unchanged.
- Other domain types with potential credential-derivative fields
(Issuer.Config, DeploymentTarget.Config, notifier configs). Not
flagged by the audit; some are already protected (e.g.,
DeploymentTarget.EncryptedConfig []byte `json:"-"`). File a
separate audit pass if recon surfaces additional leaks.
- Per-resource DTO layer across every handler. Single audit
finding, single domain type.
- A separate possible follow-up: the v2 RegisterAgent endpoint
doesn't return the plaintext API key to the agent, which may
mean self-bootstrap via POST /api/v1/agents is broken. Verified
during recon; out of scope for G-2; should be its own ticket.
Refs: coverage-gap-audit-2026-04-24-v5/unified-audit.md
§2 P1 cluster, cat-s5-apikey_leak
Audit recommendation: 'json:"-" or API-response DTO
excluding APIKeyHash' — went with the json:"-" + MarshalJSON
defense-in-depth pair plus CI guardrail and structural docs.
302 lines
13 KiB
YAML
302 lines
13 KiB
YAML
name: CI
|
|
|
|
on:
|
|
push:
|
|
branches:
|
|
- master
|
|
- v2-dev
|
|
pull_request:
|
|
branches:
|
|
- master
|
|
|
|
jobs:
|
|
go-build-and-test:
|
|
name: Go Build & Test
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
|
|
- name: Set up Go
|
|
uses: actions/setup-go@v5
|
|
with:
|
|
go-version: '1.25.9'
|
|
|
|
- name: Go Build
|
|
run: |
|
|
go build ./cmd/server/...
|
|
go build ./cmd/agent/...
|
|
go build ./cmd/mcp-server/...
|
|
go build ./cmd/cli/...
|
|
|
|
- name: Go Vet
|
|
run: go vet ./...
|
|
|
|
- name: Install golangci-lint
|
|
run: |
|
|
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.11.4
|
|
|
|
- name: Run golangci-lint
|
|
run: golangci-lint run ./... --timeout 5m
|
|
|
|
- name: Install govulncheck
|
|
run: go install golang.org/x/vuln/cmd/govulncheck@latest
|
|
|
|
- name: Run govulncheck
|
|
run: govulncheck ./...
|
|
|
|
- name: Forbidden auth-type literal regression guard (G-1)
|
|
# G-1 closed the JWT silent auth downgrade by removing "jwt" from the
|
|
# accepted CERTCTL_AUTH_TYPE values. This step grep-fails the build
|
|
# if "jwt" reappears in any of the *additive* auth-type surfaces:
|
|
# the validAuthTypes / ValidAuthTypes() set, the OpenAPI enum, the
|
|
# helm chart's allowed-types list, or the .env.example default.
|
|
# Comment lines and the dedicated rejection branch in config.go
|
|
# (`c.Auth.Type == "jwt"`) are intentionally exempt — those are the
|
|
# G-1 fix itself, not a regression.
|
|
#
|
|
# Connector packages (internal/connector/) are exempt because the
|
|
# Google OAuth2 service-account JWT and step-ca provisioner one-
|
|
# time-token JWT are external-protocol uses, unrelated to certctl's
|
|
# own auth shape. Test files (_test.go) are exempt so negative
|
|
# tests can pass the literal.
|
|
#
|
|
# See docs/upgrade-to-v2-jwt-removal.md for the closure rationale,
|
|
# or internal/config/config.go::ValidAuthTypes for the allowed set.
|
|
run: |
|
|
set -e
|
|
|
|
# Scoped patterns that indicate "jwt" being added back to an
|
|
# allowed-set surface. Each catches a regression shape we've
|
|
# actually seen in pre-G-1 code:
|
|
# - Go map/slice literal: "jwt": true or "jwt",
|
|
# - Go switch case: case "jwt"
|
|
# - YAML enum: enum: [..., jwt, ...] or - jwt
|
|
# - .env conditional: AUTH_TYPE.*"jwt"|=jwt$
|
|
BAD=$(grep -rnEH \
|
|
-e '"jwt"\s*:\s*true' \
|
|
-e '"jwt"\s*,' \
|
|
-e 'case\s+"jwt"' \
|
|
-e 'enum:.*\bjwt\b' \
|
|
-e '^\s*-\s*jwt\s*$' \
|
|
-e 'AUTH_TYPE\s*=\s*jwt\s*$' \
|
|
-e 'AUTH_TYPE\s*=\s*jwt\s*#' \
|
|
-e 'auth\.type\s*=\s*jwt\s*$' \
|
|
-e 'AuthType\("jwt"\)' \
|
|
internal/config/ \
|
|
internal/api/ \
|
|
cmd/ \
|
|
api/openapi.yaml \
|
|
.env.example \
|
|
deploy/.env.example \
|
|
deploy/helm/certctl/values.yaml \
|
|
deploy/helm/certctl/templates/ \
|
|
2>/dev/null \
|
|
| grep -v '_test.go' \
|
|
| grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
|
|
| grep -v 'is no longer accepted' \
|
|
|| true)
|
|
if [ -n "$BAD" ]; then
|
|
echo "G-1 regression: \"jwt\" reappeared in an allowed-set surface:"
|
|
echo "$BAD"
|
|
echo ""
|
|
echo "Allowed surface for 'jwt' literals: comment lines, the"
|
|
echo "dedicated rejection branch in internal/config/config.go,"
|
|
echo "and connector packages (Google OAuth2, step-ca)."
|
|
echo "See docs/upgrade-to-v2-jwt-removal.md and"
|
|
echo "internal/config/config.go::ValidAuthTypes()."
|
|
exit 1
|
|
fi
|
|
|
|
- name: Forbidden api_key_hash JSON-shape regression guard (G-2)
|
|
# G-2 closed cat-s5-apikey_leak by tagging Agent.APIKeyHash
|
|
# `json:"-"` and adding a defense-in-depth Agent.MarshalJSON that
|
|
# zeroes the field on the marshal-time copy. This step grep-fails
|
|
# the build if `api_key_hash` reappears in any of the *additive*
|
|
# JSON-emitting surfaces: a Go struct json tag in internal/domain/,
|
|
# an OpenAPI Agent schema property, a TypeScript field declaration
|
|
# in web/src/, or an enum-list / discriminator in handler
|
|
# production code.
|
|
#
|
|
# Repository, migration, seed, service, integration-test, and
|
|
# unit-test files are exempt — those are server-internal use
|
|
# sites (the DB column stays, the in-memory struct field stays,
|
|
# the auth-lookup path stays). Comment lines are exempt so the
|
|
# G-2 closure rationale can stay in the source.
|
|
#
|
|
# See coverage-gap-audit-2026-04-24-v5/unified-audit.md
|
|
# cat-s5-apikey_leak for the closure rationale, or
|
|
# internal/domain/connector.go::Agent::MarshalJSON for the
|
|
# redaction enforcement.
|
|
run: |
|
|
set -e
|
|
|
|
# Scoped patterns that indicate api_key_hash being added back
|
|
# to a JSON-emitting surface. Each catches a regression shape
|
|
# that pre-G-2 actually shipped or that a future refactor
|
|
# could plausibly introduce:
|
|
# - Go struct tag: `json:"api_key_hash"`
|
|
# - Frontend interface: api_key_hash[?]: string
|
|
# - OpenAPI schema property: api_key_hash: (column-aligned)
|
|
# - YAML enum / array: - api_key_hash
|
|
BAD=$(grep -rnEH \
|
|
-e 'json:"api_key_hash[",]' \
|
|
-e '^\s*api_key_hash\??\s*:' \
|
|
-e '^\s*-\s*api_key_hash\s*$' \
|
|
internal/domain/ \
|
|
internal/api/ \
|
|
cmd/ \
|
|
api/openapi.yaml \
|
|
web/src/ \
|
|
2>/dev/null \
|
|
| grep -v '_test.go' \
|
|
| grep -vE '^\s*[^:]+:[0-9]+:\s*(//|#)' \
|
|
|| true)
|
|
if [ -n "$BAD" ]; then
|
|
echo "G-2 regression: api_key_hash reappeared in a JSON-emitting surface:"
|
|
echo "$BAD"
|
|
echo ""
|
|
echo "Allowed surface for api_key_hash literals: comment lines,"
|
|
echo "the database column (migrations/), the in-memory struct"
|
|
echo "field tagged \`json:\"-\"\`, and the repository / service"
|
|
echo "use sites. See internal/domain/connector.go::Agent and"
|
|
echo "coverage-gap-audit-2026-04-24-v5/unified-audit.md"
|
|
echo "cat-s5-apikey_leak for the closure rationale."
|
|
exit 1
|
|
fi
|
|
|
|
- name: Race Detection
|
|
run: go test -race ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/scheduler/... ./internal/connector/... ./internal/crypto/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -timeout 300s
|
|
|
|
- name: Go Test with Coverage
|
|
run: |
|
|
go test ./internal/service/... ./internal/api/handler/... ./internal/api/middleware/... ./internal/integration/... ./internal/connector/issuer/... ./internal/connector/target/... ./internal/connector/notifier/... ./internal/connector/discovery/... ./internal/crypto/... ./internal/mcp/... ./internal/cli/... ./internal/domain/... ./internal/validation/... ./internal/tlsprobe/... -count=1 -cover -coverprofile=coverage.out
|
|
|
|
- name: Check Coverage Thresholds
|
|
run: |
|
|
# Extract per-package coverage from test output
|
|
echo "=== Coverage Report ==="
|
|
go tool cover -func=coverage.out | tail -1
|
|
|
|
# Check service layer coverage (target: 60%+)
|
|
SERVICE_COV=$(go tool cover -func=coverage.out | grep 'internal/service' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
|
echo "Service layer coverage: ${SERVICE_COV}%"
|
|
|
|
# Check handler layer coverage (target: 60%+)
|
|
HANDLER_COV=$(go tool cover -func=coverage.out | grep 'internal/api/handler' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
|
echo "Handler layer coverage: ${HANDLER_COV}%"
|
|
|
|
# Check domain layer coverage (target: 40%+)
|
|
DOMAIN_COV=$(go tool cover -func=coverage.out | grep 'internal/domain' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
|
echo "Domain layer coverage: ${DOMAIN_COV}%"
|
|
|
|
# Check middleware layer coverage (target: 50%+)
|
|
MIDDLEWARE_COV=$(go tool cover -func=coverage.out | grep 'internal/api/middleware' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
|
echo "Middleware layer coverage: ${MIDDLEWARE_COV}%"
|
|
|
|
# Check crypto package coverage (target: 85%+)
|
|
# M-8 rationale: encryption primitives are a security-critical gate.
|
|
# v2 format, key-derivation, fallback, and fail-closed sentinel paths
|
|
# all need exhaustive coverage to avoid silent regressions (CWE-916 / CWE-329).
|
|
CRYPTO_COV=$(go tool cover -func=coverage.out | grep 'internal/crypto' | awk '{print $NF}' | sed 's/%//' | awk '{sum+=$1; n++} END {if(n>0) printf "%.1f", sum/n; else print "0"}')
|
|
echo "Crypto package coverage: ${CRYPTO_COV}%"
|
|
|
|
# Fail if thresholds not met
|
|
if [ "$(echo "$SERVICE_COV < 55" | bc -l)" -eq 1 ]; then
|
|
echo "::error::Service layer coverage ${SERVICE_COV}% is below 55% threshold"
|
|
exit 1
|
|
fi
|
|
if [ "$(echo "$HANDLER_COV < 60" | bc -l)" -eq 1 ]; then
|
|
echo "::error::Handler layer coverage ${HANDLER_COV}% is below 60% threshold"
|
|
exit 1
|
|
fi
|
|
if [ "$(echo "$DOMAIN_COV < 40" | bc -l)" -eq 1 ]; then
|
|
echo "::error::Domain layer coverage ${DOMAIN_COV}% is below 40% threshold"
|
|
exit 1
|
|
fi
|
|
if [ "$(echo "$MIDDLEWARE_COV < 30" | bc -l)" -eq 1 ]; then
|
|
echo "::error::Middleware layer coverage ${MIDDLEWARE_COV}% is below 30% threshold"
|
|
exit 1
|
|
fi
|
|
if [ "$(echo "$CRYPTO_COV < 85" | bc -l)" -eq 1 ]; then
|
|
echo "::error::Crypto package coverage ${CRYPTO_COV}% is below 85% threshold"
|
|
exit 1
|
|
fi
|
|
echo "Coverage thresholds passed!"
|
|
|
|
- name: Upload Coverage Report
|
|
uses: actions/upload-artifact@v4
|
|
with:
|
|
name: go-coverage
|
|
path: coverage.out
|
|
retention-days: 30
|
|
|
|
frontend-build:
|
|
name: Frontend Build
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
|
|
- name: Set up Node.js
|
|
uses: actions/setup-node@v4
|
|
with:
|
|
node-version: '22'
|
|
|
|
- name: Install Dependencies
|
|
working-directory: web
|
|
run: npm ci
|
|
|
|
- name: TypeScript Check
|
|
working-directory: web
|
|
run: npx tsc --noEmit
|
|
|
|
- name: Run Frontend Tests
|
|
working-directory: web
|
|
run: npx vitest run
|
|
|
|
- name: Build Frontend
|
|
working-directory: web
|
|
run: npx vite build
|
|
|
|
helm-lint:
|
|
name: Helm Chart Validation
|
|
runs-on: ubuntu-latest
|
|
steps:
|
|
- uses: actions/checkout@v4
|
|
|
|
- name: Install Helm
|
|
uses: azure/setup-helm@v4
|
|
with:
|
|
version: '3.13.0'
|
|
|
|
# HTTPS-Everywhere (v2.0.47): the chart fails render when no TLS source is
|
|
# configured. Every lint/template invocation below must pick exactly one
|
|
# provisioning mode — see deploy/helm/certctl/templates/_helpers.tpl
|
|
# (certctl.tls.required) and docs/tls.md.
|
|
- name: Lint Helm Chart
|
|
run: |
|
|
helm lint deploy/helm/certctl/ \
|
|
--set server.tls.existingSecret=certctl-tls-ci
|
|
|
|
- name: Template Helm Chart (existingSecret mode)
|
|
run: |
|
|
helm template certctl deploy/helm/certctl/ \
|
|
--set server.tls.existingSecret=certctl-tls-ci \
|
|
> /dev/null
|
|
|
|
- name: Template Helm Chart (cert-manager mode)
|
|
run: |
|
|
helm template certctl deploy/helm/certctl/ \
|
|
--set server.tls.certManager.enabled=true \
|
|
--set server.tls.certManager.issuerRef.name=letsencrypt-prod \
|
|
> /dev/null
|
|
|
|
- name: Template Helm Chart (guard fails without TLS)
|
|
run: |
|
|
# Inverse test: the chart MUST refuse to render when no TLS source is
|
|
# configured. If this ever renders successfully, the fail-loud guard
|
|
# in certctl.tls.required has regressed.
|
|
if helm template certctl deploy/helm/certctl/ > /dev/null 2>&1; then
|
|
echo "::error::Helm chart rendered without a TLS source — fail-loud guard regressed"
|
|
exit 1
|
|
fi
|