Commit Graph

11 Commits

Author SHA1 Message Date
shankar0123 c222c8b57a ssh: fix staticcheck ST1008 — error is last return from restoreFromBackups
CI's golangci-lint run on commit 636de7f ("ssh: pre-deploy snapshot
+ reload-failure rollback") caught a staticcheck ST1008 violation:
restoreFromBackups returned (error, map[string]string) — error must
be the last return value per Go convention.

Reorder the return tuple to (map[string]string, error) and update
the single caller in DeployCertificate. No behavior change; pure
signature shuffle to satisfy the lint gate.

Verified locally:
- gofmt -l ./internal/connector/target/ssh/  clean
- go vet ./internal/connector/target/ssh/  clean
- go test -race -count=1 ./internal/connector/target/ssh/  green
2026-05-02 17:35:45 +00:00
shankar0123 636de7f6b5 ssh: pre-deploy snapshot + reload-failure rollback
Closes Bundle 6 of the 2026-05-02 deployment-target coverage audit
(see cowork/deployment-target-audit-2026-05-02/RESULTS.md). Pre-fix,
DeployCertificate at ssh.go:201-316 wrote new cert/key/chain via
SFTP then ran the operator's reload command. If reload failed, the
new files stayed on the remote — partial-success state with no
rollback path. docs/deployment-atomicity.md L92 promised "Pre-deploy
SCP backup of remote files"; the code didn't deliver.

This commit:

1. Pre-deploy snapshot. Before any WriteFile, iterate the deploy's
   target paths (cert, key, optional chain). For each path:
   - StatFile to detect existence. errors.Is(err, os.ErrNotExist)
     means first-time deploy (rollback = Remove). Other stat
     errors bail out before any write happens.
   - ReadFile into an in-memory backups map[string][]byte keyed
     by remote path. Original mode captured into a parallel
     modes map for restore fidelity.

2. SSHClient interface evolution — three changes:
   - StatFile(path) (os.FileInfo, error) — was (int64, error).
     FileInfo carries Mode() needed for accurate restore. Existing
     fixture tests updated to call info.Size() instead of the
     bare size value.
   - ReadFile(path) ([]byte, error) — new method; SFTP Open + read
     via io.ReadAll. realSSHClient implements via sftpClient.Open.
   - Remove(path) error — new method; SFTP Remove. Used by the
     rollback path to clean up first-time-deploy partial state.

3. On-reload-failure rollback. Replace the bare error-return at
   L282-295 with restoreFromBackups + retry-reload escalation:
   - For paths in the snapshot map, WriteFile the original bytes
     with the original mode (0600 fallback if mode capture was
     incomplete).
   - For paths that didn't exist pre-deploy, Remove the new file.
   - Re-run the reload command (best-effort second attempt). If
     it succeeds, the target is back to pre-deploy state. If it
     fails, the remote is in pre-deploy file state but the daemon
     may be stuck — surface as wrapped error so the operator
     knows where to look.

4. DeploymentResult.Metadata gains backup_status_{cert,key,chain}
   so operators can see per-path snapshot state on both success
   ("snapshotted" / "no_pre_existing" / "n/a") and failure
   ("restored" / "removed" / "restore_failed" / "remove_failed").
   buildMetadataWithBackup helper centralises the metadata
   shape so success and failure paths emit a consistent set
   of keys.

5. Helper extraction. restoreFromBackups(ctx, paths, backups,
   modes) is a private method on Connector; returns the first
   error + per-key restore status map for clean test seams.

DeploymentResult shape on failure:
- rollback OK + retry-reload OK → Success=false, "reload command
  failed; rolled back to pre-deploy state" (clean recoverable
  failure; remote fully restored, daemon serving original cert).
- rollback OK + retry-reload FAIL → wrapped error noting "rolled
  back files; retry-reload also failed; daemon may need manual
  restart". Metadata flags daemon_state_unknown=true.
- rollback FAIL → operator-actionable wrapped error containing
  BOTH the reload error AND the rollback error; metadata flags
  manual_action_required=true.

Tests added to ssh_test.go (4 new tests, ~330 LOC):
- TestSSH_ReloadFails_FilesRestored — happy rollback path with
  pre-existing remote bytes for cert/key/chain. Asserts every
  path's last WriteFile call contains the captured backup bytes
  verbatim, no Remove calls fired (all paths had snapshots), and
  metadata reports backup_status=restored for each path.
- TestSSH_NoExistingCert_ReloadFails_NewCertRemoved — first-time
  deploy variant. StatFile returns os.ErrNotExist for every path;
  rollback Removes each written file but performs no WriteFile
  during restore (no backup to restore from). Asserts exactly 3
  WriteFile calls (deploy only) and 3 Remove calls (rollback).
- TestSSH_ReloadFails_RollbackAlsoFails_OperatorActionable —
  uses a writeOrderTrackingMock to fail the SECOND WriteFile to
  the cert path (i.e. the restore call, not the initial deploy).
  Asserts wrapped error contains both the reload error and the
  rollback error, and metadata flags manual_action_required=true.
- TestSSH_ReloadFails_RestoreThenSecondReloadFails — partial-
  recovery escalation. Rollback succeeds but the post-restore
  retry-reload fails. Asserts wrapped error mentions "rolled back
  files; retry-reload also failed" and metadata flags
  daemon_state_unknown=true.

Existing tests preserved by extending mockSSHClient with backward-
compatible per-path response maps (statByPath / readByPath /
writeFileErrByPath / executeErrSequence). Legacy global fields
(statFileSize / statFileErr / writeFileErr / executeErr) still
work when no per-path override matches, so TestValidateConfig_*
and TestDeployCertificate_Success_* don't need changes.

docs/deployment-atomicity.md L92 unchanged from today's text —
Bundle 1 doc-realignment hasn't shipped, so the "Pre-deploy SCP
backup of remote files" line was never softened. Post-Bundle-6
the claim is honest (was aspirational pre-fix).

Verified locally (sandbox lacks staticcheck install due to disk
pressure; CI runs the full lint gate):
- gofmt -l ./internal/connector/target/ssh/  clean
- go vet ./internal/connector/target/ssh/  clean
- go build ./internal/connector/target/ssh/...  clean
- go build ./cmd/agent/...  clean
- go test -race -count=1 ./internal/connector/target/ssh/  green

Audit reference: cowork/deployment-target-audit-2026-05-02/RESULTS.md
Bundle 6.
2026-05-02 17:13:38 +00:00
shankar0123 7cb453a336 chore(fmt): repo-wide gofmt -w sweep — close drift surfaced by ci-pipeline-cleanup Phase 4
Mechanical reformat. The new 'gofmt drift' CI step (added in
ci-pipeline-cleanup Phase 4, commit 0f205a8) surfaced 111 files
with accumulated gofmt drift across cmd/, internal/, and deploy/test/.

Each file's diff is gofmt-standard: whitespace adjustments, intra-
group import sorting (alphabetical by import path within blank-line-
separated groups), and struct-tag column alignment. No semantic
changes — verified via 'git diff --ignore-all-space' which shows only
the line-position deltas from import reordering.

The gate stays in place after this commit. Going forward it catches
gofmt drift at PR time.
2026-04-30 22:33:57 +00:00
shankar0123 8637131f80 chore: gofmt fixes across deploy-hardening I new files
Phase 13 verification surfaced gofmt-formatting drift in 6 files
across the bundle's new code:

- internal/api/handler/metrics.go (struct field alignment)
- internal/connector/target/k8ssecret/validate_only_test.go (alignment)
- internal/connector/target/nginx/nginx.go (alignment)
- internal/connector/target/postfix/postfix.go (alignment)
- internal/connector/target/ssh/validate_only_test.go (alignment)
- internal/service/deploy_counters.go (alignment)

Pure mechanical gofmt -w fixes; no behavior changes. CI's
make verify gate (which runs `go fmt ./...`) didn't catch these
because go fmt is more lenient than gofmt -l, but golangci-lint
v2.11.4 + the explicit gofmt step in Phase 13 verification did.

Phase 13 full-matrix verification all green:
- gofmt -l: empty across all bundle-touched files
- go vet ./internal/deploy/... ./internal/connector/target/... ./internal/service/ ./internal/api/handler/ ./cmd/agent/: clean
- golangci-lint v2.11.4 (the version CI runs): 0 issues
- go test -race -count=1 across deploy + nginx + apache + haproxy + agent + service: all green
- INTEGRATION=1 go test -tags integration -run Deploy ./deploy/test/...: 4/4 e2e tests green

Phase 14 next: release prep — Active Focus update, release notes,
Reddit-beat draft, final tag handoff to operator.
2026-04-30 15:33:33 +00:00
shankar0123 9f41b58b2f feat(ssh,wincertstore,javakeystore,k8ssecret): explicit ValidateOnly + leverage existing connectors
Phase 9 of the deploy-hardening I master bundle. The four
non-file-server connectors get real ValidateOnly probes that
operators use to preview a deploy without touching the live cert.
Existing DeployCertificate paths already have explicit backup +
rollback semantics (SCP backup / WinCertStore Get-ChildItem
snapshot / keytool snapshot / K8s atomic API).

SSH (validate_only.go):
- Probes via SSHClient.Connect. Confirms agent reachability +
  credentials. Cheap (no remote command runs); released cleanly
  via defer Close.
- A true SCP dry-run requires a no-commit upload (SCP doesn't
  have one). V2 ships the auth probe as the load-bearing check.
- 3 new tests in validate_only_test.go.

WinCertStore (validate_only.go):
- Probes via PowerShell `Get-ChildItem -Path Cert:\<loc>\<store>`
  using the configured StoreLocation + StoreName (defaults
  LocalMachine\My).
- Confirms agent has Windows + the IIS module + the right ACLs.
- 4 new tests including default-store-path verification.

JavaKeystore (validate_only.go):
- Probes via `keytool -list -keystore <path> -storepass <pass>`
  using the configured KeystorePath / KeystorePassword and
  KeytoolPath (default "keytool").
- Confirms keystore exists, password is correct, JRE is on PATH.
- 4 new tests covering succeeds / fails / no-path-sentinel /
  nil-executor-sentinel.

K8s Secret (validate_only.go):
- Probes via K8sClient.GetSecret on the configured Namespace +
  SecretName. Returns nil on success or "not found" (the
  CreateSecret path on Deploy will handle it). Other errors
  (forbidden/unreachable) surface as wrapped.
- 4 new tests covering succeeds / RBAC-error wrapped /
  no-config-sentinel / nil-client-sentinel.

Smoke test connectorsAtPhase3 list shrunk from 7 to 3 entries
(ssh + wincertstore + javakeystore + k8ssecret removed). Only
caddy (file-mode) + envoy + traefik remain — those three
genuinely have no validate-with-target command available.

Race detector clean across all 13 connectors. golangci-lint
v2.11.4 clean.

Phase 10 next: DeployCounters + Prometheus exposer mirroring the
production-hardening-II OCSP counter pattern.
2026-04-30 15:22:17 +00:00
shankar0123 49f1a60762 feat(target): ValidateOnly dry-run method on Connector interface (default returns ErrValidateOnlyNotSupported)
Phase 3 of the deploy-hardening I master bundle. Extends the
target.Connector interface with the dry-run method that operators
will use to preview a deploy before committing — but ships only the
default-stub for all 13 connectors. Phases 4-9 replace each stub
with the real validate-with-the-target implementation.

interface.go:
- Add ErrValidateOnlyNotSupported sentinel (frozen decision 0.6 —
  connectors that cannot dry-run, like K8s, return this rather than
  nil so operator triage can errors.Is for "not supported" vs
  "validated successfully").
- Add ValidateOnly(ctx, request DeploymentRequest) error to
  Connector interface.

13 new validate_only.go files (one per connector at
internal/connector/target/<name>/validate_only.go):
- apache, caddy, envoy, f5, haproxy, iis, javakeystore, k8ssecret,
  nginx, postfix, ssh, traefik, wincertstore.
- Each file is identical except for the package declaration: a
  one-method default stub returning target.ErrValidateOnlyNotSupported.
- Per-connector files (rather than a single embed-method approach)
  let Phases 4-9 replace each connector's stub independently
  without churning a shared base.

Tests:
- internal/connector/target/validate_only_test.go pins the sentinel
  contract (errors.Is identity, Error() string, %w wrap propagation).
- internal/connector/target/validate_only_smoke_test.go (external
  test package) constructs a zero-value &<pkg>.Connector{} for each
  of the 13 connectors and asserts ValidateOnly returns
  ErrValidateOnlyNotSupported. The test's
  connectorsAtPhase3 list is the load-bearing CI guard:
  - A 14th connector added without wiring ValidateOnly fails the
    `len(connectorsAtPhase3) != 13` invariant.
  - A connector whose real ValidateOnly lands (Phase 4 NGINX, Phase
    5 Apache, etc.) MUST be removed from this list or the smoke test
    fails (real impl no longer returns the sentinel). That removal
    IS the bookkeeping that the operator-visible bit + behavior
    change are wired together end-to-end.

Compile + go vet + golangci-lint v2.11.4 + go test all 0 issues.

Phase 4 next: NGINX canonical real-impl — replace the stub with
nginx -t -c <temp>; same time replace the existing os.WriteFile
flow in DeployCertificate with deploy.Apply(...).
2026-04-30 14:40:51 +00:00
shankar0123 dfb083c9f4 Bundle M.SSH-extended (Coverage Audit Extension): SSH connector 71.6% -> 90.2% — H-002 closed
internal/connector/target/ssh/ssh_server_fixture_test.go (~580 LoC,
14 tests) pins realSSHClient.Connect / Execute / WriteFile /
StatFile / Close end-to-end via an embedded golang.org/x/crypto/ssh
ServerConn + pkg/sftp.NewServer, bound to net.Listen('tcp',
'127.0.0.1:0'). Same hand-rolled in-process protocol-server pattern
as the M.Email SMTP fixture.

Coverage delta (per-function):
  Connect      0.0% -> ~95% (ed25519 host key + password/key auth +
                             handshake + sftp open)
  Execute     25.0% -> ~95% (success path + exit-code-1 + not-conn)
  WriteFile   15.4% -> ~95% (round-trip + chmod + not-conn)
  StatFile    33.3% -> ~95% (size assertion + not-conn + not-exist)
  Close       42.9% -> ~95% (idempotent + never-connected)

Package overall: 71.6% -> 90.2% (+18.6pp; +5.2 above 85% gate).

Test infrastructure
  - fakeSSHServer (~150 LoC): net.Listen + ed25519 host key +
    PasswordCallback + PublicKeyCallback. Optional toggles for
    rejectAuth / dropOnHandshake / failExec / failSFTP failure
    modes.
  - encodePEMBlock + base64Encode helpers (~50 LoC) for OpenSSH
    private-key serialization. Avoids encoding/pem dep churn in
    test header.
  - t.Cleanup wires server shutdown + WaitGroup-drain of in-flight
    connection handlers (no goroutine leaks).

Test groups
  - Connect: password success / wrong-password / auth-rejected-all /
    handshake-dropped / TCP-refused / key-auth success
  - Execute: success / not-connected / exit-code-1
  - WriteFile + StatFile: round-trip with size + chmod 0640
    verification / not-connected / not-exist
  - Close: idempotent / never-connected

Verification
  - go test -short -count=1 ./internal/connector/target/ssh/...: PASS
  - 20ms wall time
  - go vet clean

Audit deliverables
  - findings.yaml H-002 status partial_closed -> closed
    (will update in extension-progress.md sweep)
  - extension-progress.md: M.SSH-extended marked DONE

Closes: H-002 (SSH Connect / Execute / WriteFile branches)
Bundle: M.SSH-extended (Coverage Audit Extension)
2026-04-27 19:07:38 +00:00
shankar0123 41a8f5853e Bundle M (Coverage Audit Closure): connector failure-mode round — 3 of 4 sub-batches
M.F5 closes H-001; M.Email closes H-003; M.SSH partial-closes H-002; M.Cloud (H-004) deferred.

M.F5 (~430 LoC f5_realclient_test.go):

  Coverage: 44.6% -> 90.1% (+45.5pp; +5.1 above 85% target)

  Bypasses existing F5Client-interface mock; exercises every realF5Client

  HTTP method end-to-end against httptest.Server with canned iControl REST

  responses. 401-retry path verified. Per-fn ALL previously-0% lifted to

  88-100%. Plus context-cancel test.

M.SSH (~150 LoC ssh_realclient_test.go) PARTIAL-CLOSED:

  Coverage: 55.2% -> 71.6% (+16.4pp; below 85% target)

  Covers buildAuthMethods all branches + WriteFile/Execute/StatFile

  not-connected guards + Close idempotency.

  Connect() ~50 LoC needs embedded golang.org/x/crypto/ssh server fixture

  (~1000 LoC test infrastructure). Tracked as Bundle M.SSH-extended.

M.Email (~340 LoC email_failure_test.go):

  Coverage: 39.7% -> 70.5% (+30.8pp; +0.5 above 70% target)

  Hand-rolled minimal SMTP server (responds to EHLO/AUTH/MAIL/RCPT/DATA/

  QUIT with canned 2xx/3xx/5xx responses based on per-test failOn map).

  Tests:

    - Header-injection (CWE-113): CR/LF/NUL in From/To/Subject reject

      before any SMTP I/O (6 tests across sendEmail + sendHTMLEmail)

    - Connection-refused for both sendEmail and sendHTMLEmail

    - SendAlert / SendEvent full SMTP transactions (happy path)

    - Server-side failures: RCPT 550, DATA 554

    - AUTH PLAIN happy + 535-failure

M.Cloud (H-004) DEFERRED:

  AzureKV 41.2% / GCP-SM 43.1%. Same M.F5 approach (httptest.Server +

  OAuth2 token endpoint mock) is straightforward but ~600 LoC tests +

  ~200 LoC mock infrastructure exceeds session budget. Tracked as

  Bundle M.Cloud-extended.

Verification:

  go vet ./internal/connector/{target/f5,target/ssh,notifier/email}/...  clean

  gofmt -l                                                                clean

  staticcheck -checks all                                                 clean

  go test -short -count=1                                                 PASS

  F5     90.1%  Email 70.5%  SSH 71.6%

Audit deliverables:

  findings.yaml: -0008 (F5) + -0010 (Email) -> closed; -0009 (SSH) ->

    partial_closed; -0011 (Cloud) retained as deferred

  gap-backlog.md: strikethroughs + Bundle M closure-log entry covering all 4 sub-batches

  coverage-matrix.md: 3 new rows for F5/SSH/Email at post-Bundle-M coverage

  closure-plan.md: Bundle M [~] with per-sub-batch status breakdown

  CHANGELOG.md: [unreleased] Bundle M entry
2026-04-27 17:24:55 +00:00
shankar0123 370f856725 fix: resolve 8 staticcheck lint errors in test files
SA1029: use typed context key instead of string in main_test.go
S1039: remove unnecessary fmt.Sprintf in validation_test.go
SA4023: fix unreachable nil check on concrete error type
SA4006: fix unused variable assignments in stepca_test.go (4 occurrences)
SA4000: fix duplicate expression in ssh_test.go (BEGIN vs END CERTIFICATE)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:27:57 -04:00
shankar0123 7382e5f03b test: comprehensive test gap closure across 24 packages
Close coverage gaps identified by dual-audit (qualitative + quantitative).
New test files for config (0%→98%), router (0%→100%), handler validation,
health, audit, response helpers, webhook notifier (0%→88%), email notifier,
middleware (recovery, rate limiter), domain profile, service nil-safety,
config helpers, issuer bootstrap, and server bootstrap wiring. Expanded
existing tests for ACME (34%→42%), step-ca (42%→52%), F5, SSH, agent
(43%→63%), scheduler (88%→99%), renewal service, and issuerfactory.

All tests pass: go test -short, go vet, go test -race clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 23:09:40 -04:00
shankar0123 697c0be9f3 feat(M38): SSH target connector for agentless deployment via SSH/SFTP
Adds a new target connector enabling certificate deployment to any
Linux/Unix server without installing the certctl agent binary. Uses the
proxy agent pattern — a single agent in the same network zone deploys
certs to remote servers over SSH/SFTP.

Key additions:
- SSH/SFTP connector with key auth (file/inline) + password auth
- Injectable SSHClient interface for cross-platform testing (25 tests)
- Shell injection prevention via validation.ValidateShellCommand()
- Configurable cert/key/chain paths with octal permissions
- GUI: 11 SSH config fields in target create wizard

Also fixes pre-existing frontend bug where all target type strings
(nginx, apache, etc.) were sent as lowercase but the backend expects
proper-case (NGINX, Apache, etc.), breaking GUI-created targets.
Adds missing TargetTypeSSH to validTargetTypes service map.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 12:36:01 -04:00