feat: M18b Filesystem Certificate Discovery — agent scanning, server dedup, triage API

Agent-side:
- Filesystem scanner walks configured directories (CERTCTL_DISCOVERY_DIRS)
- Parses PEM (.pem, .crt, .cer, .cert) and DER (.der) certificate files
- Extracts CN, SANs, serial, issuer/subject DN, validity, key info, SHA-256 fingerprint
- Reports discoveries to control plane on startup + every 6 hours
- Skips files >1MB and private key files

Server-side:
- Migration 000006: discovered_certificates + discovery_scans tables
- Domain model: DiscoveredCertificate, DiscoveryScan, DiscoveryReport
- Three triage states: Unmanaged, Managed (claimed), Dismissed
- Repository with upsert dedup (fingerprint + agent + path)
- Service layer: process reports, claim, dismiss, list, summary
- 7 new API endpoints (84 total):
  POST /agents/{id}/discoveries, GET /discovered-certificates,
  GET /discovered-certificates/{id}, POST .../claim, POST .../dismiss,
  GET /discovery-scans, GET /discovery-summary
- Audit trail: scan_completed, cert_claimed, cert_dismissed events

Tests: 28 new test functions (domain, handler, service layers)
Docs: README, quickstart, demo-guide, demo-advanced, architecture,
      concepts, connectors, features.md all updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
shankar0123
2026-03-24 00:25:00 -04:00
parent 8768a7b3ef
commit 667a30870d
23 changed files with 2916 additions and 24 deletions
+55
View File
@@ -703,6 +703,61 @@ flowchart TB
For production, you would also add an ingress controller, TLS termination for the certctl API itself, and external PostgreSQL (RDS, Cloud SQL, etc.).
## Discovery Data Flow (M18b)
Certificate discovery enables operators to build a complete inventory of existing certificates before managing them with certctl. Here's how data flows through the system:
```mermaid
flowchart TB
AGENT["certctl-agent\n(on infrastructure)"]
SCAN["Filesystem Scanner\n(CERTCTL_DISCOVERY_DIRS)"]
EXTRACT["Extract Metadata\n(CN, SANs, serial, issuer, expiry, fingerprint)"]
REPORT["POST /api/v1/agents/{id}/discoveries\n(submit scan results)"]
HANDLER["Discovery Handler\n(parse request)"]
SERVICE["Discovery Service\n(ProcessDiscoveryReport)"]
REPO["Discovery Repository\n(upsert with fingerprint dedup)"]
DB["PostgreSQL\ndiscovered_certificates\ndiscovery_scans tables"]
AUDIT["Audit Service\n(RecordDiscoveryScanCompleted)"]
API_LIST["GET /api/v1/discovered-certificates\n(list for triage)"]
API_CLAIM["POST /discovered-certificates/{id}/claim\n(operator claims cert)"]
API_DISMISS["POST /discovered-certificates/{id}/dismiss\n(operator dismisses)"]
UPDATE_STATUS["Update Status\n(Unmanaged → Managed/Dismissed)"]
AGENT -->|"Scan loop\n(startup + 6h)"| SCAN
SCAN --> EXTRACT
EXTRACT --> REPORT
REPORT --> HANDLER
HANDLER --> SERVICE
SERVICE --> REPO
REPO -->|"Dedup by fingerprint\n+ agent + path"| DB
SERVICE --> AUDIT
AUDIT -->|"discovery_scan_completed"| DB
DB -->|"query unmanaged"| API_LIST
API_LIST -->|"operator reviews"| API_CLAIM
API_LIST -->|"operator reviews"| API_DISMISS
API_CLAIM --> UPDATE_STATUS
API_DISMISS --> UPDATE_STATUS
UPDATE_STATUS -->|"RecordDiscoveryCertClaimed\nRecordDiscoveryCertDismissed"| AUDIT
AUDIT --> DB
```
**Key steps:**
1. **Agent-side discovery** — Agent scans `CERTCTL_DISCOVERY_DIRS` on startup and every 6 hours, walking directories recursively and parsing PEM/DER files
2. **Metadata extraction** — For each certificate found, extract: common name, SANs, serial number, issuer DN, subject DN, expiration date, key algorithm, key size, is_ca flag, SHA-256 fingerprint (used as dedup key)
3. **Server submission** — Agent POSTs scan results as `DiscoveryReport` to `POST /api/v1/agents/{id}/discoveries`
4. **Deduplication** — Server uses fingerprint + agent ID + filesystem path as unique key; prevents duplicate records of the same cert on the same agent
5. **Storage** — Records stored in `discovered_certificates` table with status = "Unmanaged"
6. **Audit**`discovery_scan_completed` event logged with agent ID, cert count, scan timestamp
7. **Operator triage** — Operator queries `GET /api/v1/discovered-certificates?status=Unmanaged` to see new findings
8. **Claim or dismiss** — For each unmanaged cert, operator either:
- **Claims it** via `POST /discovered-certificates/{id}/claim` — links to existing managed cert or creates new enrollment
- **Dismisses it** via `POST /discovered-certificates/{id}/dismiss` — removes from triage, marked as "Dismissed"
9. **Status tracking**`discovery_cert_claimed` and `discovery_cert_dismissed` events audit the operator's decision
10. **Summary**`GET /api/v1/discovery-summary` returns count of Unmanaged, Managed, and Dismissed certs (useful for compliance reporting)
This data flow is pull-based and non-blocking. Agents discover at their own pace; the server stores results for later review. There's no pressure to claim or dismiss; operators can leave certificates in "Unmanaged" status indefinitely.
## Testing Strategy
certctl uses a layered testing approach aligned with the handler → service → repository architecture, with 860+ tests across five layers (service, handler, integration, connector, and frontend). The goal is high-confidence regression prevention at the service and handler layers, where the most complex business logic lives, combined with integration tests that exercise the full request path from HTTP to database.