auth-bundle-2 Phase 11: 6 per-IdP OIDC runbooks + index + docs/README wiring

Closes Phase 11 of cowork/auth-bundle-2-prompt.md. Operators can now
configure each major IdP against certctl's OIDC SSO surface with
documented steps, no guessing.

Files
=====

docs/operator/oidc-runbooks/index.md (NEW):
* Index page linking all six per-IdP runbooks.
* Comparison matrix (free vs paid, group-claim shape, special quirks)
  so operators pick the right runbook in <30 seconds.
* "Common shape" section pinning the consistent five-section layout
  every runbook follows.
* "Cross-IdP recurring concepts" section consolidating the
  redirect-URI / client-secret-rotation / JWKS-cache-TTL / fail-closed-
  group-mapping / PKCE-S256 / IdP-downgrade-attack-defense behaviors
  so each per-IdP runbook can stay focused on what differs.

docs/operator/oidc-runbooks/keycloak.md (NEW):
* Canonical reference. Mirrors the testfixtures/keycloak-realm.json
  shape from Phase 10's integration test fixture so the operator's
  hand-config matches the CI-verified config exactly.
* Step-by-step IdP-side: realm → client → groups → group-mapper →
  user. Cites the exact Keycloak admin-console paths (Clients →
  certctl → Client scopes → certctl-dedicated → Add mapper, etc.).
* GUI + API + MCP equivalents for the certctl-side configuration.
* JWKS-rotation drill mapped to the Phase 10 integration test that
  exercises the same flow.
* 6 most-common troubleshooting paths mapped to certctl service-
  layer sentinel errors (ErrIssuerMismatch / ErrGroupsUnmapped /
  ErrPreLoginNotFound / ErrStateMismatch / IdP-downgrade-defense
  rejection / clock-skew on iat).

docs/operator/oidc-runbooks/authentik.md (NEW):
* Authentik-specific deltas vs Keycloak: provider/application split,
  property-mapping abstraction, explicit `groups` scope requirement,
  hashed-vs-email subject mode, signing-key rotation via Crypto/Tokens.

docs/operator/oidc-runbooks/okta.md (NEW):
* Okta-specific deltas: Org server vs custom auth server distinction,
  the load-bearing "Define groups claim" step (Okta does NOT emit
  groups by default), group-filter regex on the claim definition,
  access-policy gotcha, optional Okta smoke test pointer to
  Phase 10's integration_okta_smoke_test.go.

docs/operator/oidc-runbooks/auth0.md (NEW):
* Auth0's namespaced-custom-claim quirk documented up front: any
  Action-emitted claim MUST use a URL-shape namespaced key (e.g.
  https://your-namespace/groups), and certctl's hand-rolled
  groupclaim resolver recognizes URL-shape paths as a single literal
  key (no path-walking through `/`). Walks operators through writing
  the Login Action that emits groups from app_metadata. Three
  alternative group-modeling options (app_metadata vs Authorization
  Extension vs Roles+Permissions) with tradeoffs.

docs/operator/oidc-runbooks/azure-ad.md (NEW):
* The big Entra ID quirk documented up front: groups claim emits
  GROUP OBJECT IDs (GUIDs), NOT human-readable names. Certctl group→
  role mappings MUST be configured against the GUIDs. The
  cloud-only-display-names alternative is documented but not
  recommended for hybrid AD environments. Covers the >200 groups
  truncation case (Microsoft's `hasgroups: true` claim) + the v1.0
  vs v2.0 endpoint distinction (certctl supports v2.0 only).

docs/operator/oidc-runbooks/google-workspace.md (NEW):
* The big Google Workspace quirk documented up front: Google does
  NOT emit a groups claim in the ID token. Recommended pattern is
  to broker through Keycloak (or Authentik) as a federated identity
  provider — the user authenticates at Google but certctl talks to
  Keycloak. Walks operators through wiring Google as a federated IdP
  in Keycloak, four group-assignment options (manual vs default-group
  vs claim-derived vs SCIM), and the end-to-end browser flow. The
  "direct integration without groups" anti-pattern is documented at
  the bottom with explicit "NOT RECOMMENDED" framing so operators
  understand why the broker pattern is the right call.

docs/README.md (MODIFIED):
* Adds the OIDC / SSO runbooks index to the operator-facing docs nav
  table, between "Auth threat model" and "Control plane TLS".

Conventions held
================

* Every runbook carries `> Last reviewed: 2026-05-10` per the
  docs convention.
* Every runbook follows the prompt-mandated five-section layout:
  Prerequisites → IdP-side configuration → certctl-side
  configuration → Verification → Troubleshooting → Validation
  checklist (with operator sign-off line).
* Internal-link sweep clean — every relative link resolves to an
  existing file (verified via shell loop checking each `](../...)`
  and `](*.md)` reference). External links to IdP vendor sites are
  the canonical https URLs.
* No leakage of cowork/ workspace paths as Markdown links — the
  azure-ad.md initially had a `[auth-bundles-index.md](../../../../cowork/...)`
  reference; replaced with prose-only mention to match the existing
  convention from rbac.md + migration/api-keys-to-rbac.md.
* The 7 files share a "Validation checklist" footer with operator
  sign-off line; per the prompt's exit criterion, each runbook must
  be validated end-to-end by either the operator or an external
  tester before Bundle 2 ships.

Verification
============

* Last-reviewed dates: 7/7 runbooks dated 2026-05-10.
* Internal-link sweep: 0 broken (every `]( ...)` reference resolves).
* docs/README.md → operator/oidc-runbooks/index.md link resolves.
* No backend / frontend / Go-test impact — pure docs commit. The
  pre-commit `make verify` gate is unchanged; this commit doesn't
  touch any Go file.

Phase 11 deviation note
=======================

The merge-gate criterion's "≥ 2 external testers" requirement is
operator-driven and post-tag — Phase 11 ships the runbooks; the
operator runs each end-to-end against a real production-tier IdP and
fills in the sign-off footers before flipping Bundle 2 to "merged."
Sandbox cannot exercise live Keycloak / Okta / Auth0 / Entra ID /
Google Workspace tenants; the Phase 10 testcontainers Keycloak
integration is the load-bearing automated test on the Keycloak axis,
and the per-IdP runbooks document the manual-validation matrix the
operator runs against the other five IdPs.
This commit is contained in:
shankar0123
2026-05-10 15:49:56 +00:00
parent 8de28a74ba
commit 2893f9b48e
8 changed files with 1179 additions and 0 deletions
+207
View File
@@ -0,0 +1,207 @@
# Microsoft Entra ID (Azure AD) OIDC runbook
> Last reviewed: 2026-05-10
This runbook wires certctl's OIDC SSO surface against [Microsoft Entra ID](https://learn.microsoft.com/entra/), formerly Azure AD. Entra ID is Microsoft's commercial cloud IdP; it's the default IdP for any organization on Microsoft 365 / Azure.
For the canonical reference + mental model, read [keycloak.md](keycloak.md) first; this runbook only documents the Entra-ID-specific deltas.
## The big Entra ID quirk: groups claim emits OBJECT IDs, not names
Entra ID's `groups` claim emits a JSON array of **group object IDs (GUIDs)**, not human-readable names. A user in `Engineering Group` and `Cert Operators` will see something like:
```json
{
"groups": [
"8b9b1faa-4e83-471e-8b00-7d99c3e2a5f1",
"f00cf1e2-2db1-4cdf-a1ba-1234567890ab"
]
}
```
**You must configure your certctl group→role mappings against these GUIDs**, not against `Engineering Group` or `Cert Operators`. There are workarounds (cloud-only group display names + the optional claims path; see the alternative below) but the GUID-based approach is the only one that works reliably across all Entra ID configurations.
This is by design at Microsoft — group names are mutable and not globally unique within a tenant; object IDs are immutable and globally unique. Operators on Microsoft 365 / Azure deployments are accustomed to managing access by GUID.
## Prerequisites
**On the Entra ID side:**
- A Microsoft 365 tenant or standalone Azure AD tenant. Free Azure AD tier is sufficient; paid tiers (P1/P2) unlock conditional access + SCIM provisioning + risk-based auth, none of which are required for the basic OIDC integration.
- Application Administrator or Global Administrator role.
- Network reachability from certctl-server to `https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration`.
**On the certctl side:** same as Keycloak.
## IdP-side configuration
### 1. Register the application
In the [Entra ID admin center](https://entra.microsoft.com/):
**Applications → App registrations → New registration**:
- Name: `certctl`.
- Supported account types: **Accounts in this organizational directory only** (single-tenant; matches the typical operator use case).
- Redirect URI: **Web** + `https://<your-certctl-host>:8443/auth/oidc/callback`.
- Click **Register**.
On the saved app's **Overview** page, copy:
- **Application (client) ID** → certctl's `client_id`.
- **Directory (tenant) ID** → goes into the issuer URL.
### 2. Create a client secret
**App → Certificates & secrets → Client secrets → New client secret**:
- Description: `certctl-server`.
- Expires: 6 months / 12 months / 24 months — your choice. Set a calendar reminder; Entra ID does NOT auto-rotate secrets.
- Click **Add**.
Copy the **Value** column immediately — it's shown ONCE on creation. The certctl provider's `client_secret` field gets this value.
(Production hardening: prefer **Certificates** over secrets for client authentication; certctl currently supports `client_secret_post` only, but a follow-on bundle can add `private_key_jwt` for cert-based client auth. Track this if you have a hard requirement against shared secrets.)
### 3. Add the `groups` claim to the token
**App → Token configuration → Add groups claim**:
- Pick **Security groups** (covers most operators) OR **Groups assigned to the application** (more granular but requires Premium).
- Token type: **ID token** + **Access token** (both, so userinfo fallback works).
- Customize emit format for ID/access: leave as **Group ID** (default; this is the GUID-based path the runbook is structured around).
- Click **Save**.
If you instead want display names in the claim (only works for cloud-only groups; on-prem-synced groups continue to emit GUIDs regardless):
- Customize emit format → **Cloud-only group display names**.
- BUT — note this works only for groups created in Entra ID itself, not groups synced from on-prem AD. Hybrid environments will have inconsistent claims.
### 4. Add the optional `email` and `profile` claims
By default Entra ID's ID token does NOT include `email` — Microsoft considers email part of the "OIDC profile" but only emits it under specific conditions. To force emission:
**App → Token configuration → Add optional claim → ID token → email**.
You may also want `family_name`, `given_name`, `preferred_username` for richer User records on the certctl side.
### 5. Grant the API permissions
**App → API permissions**:
- Microsoft Graph → Delegated permissions → ensure these are granted (most are default):
- `openid`
- `profile`
- `email`
- `offline_access` (optional; for refresh tokens — certctl doesn't use them currently).
- Click **Grant admin consent** if your tenant requires it.
### 6. (Optional) Restrict who can sign in
By default any user in your tenant can attempt to sign in to the app. To restrict to specific users / groups:
**Enterprise applications → certctl → Properties → Assignment required: Yes**.
Then **Users and groups → Add user/group** and pick the `cert-engineers` / `cert-viewers` Entra ID groups.
## certctl-side configuration
```bash
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/providers \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"name": "Entra ID",
"issuer_url": "https://login.microsoftonline.com/<tenant-id>/v2.0",
"client_id": "<application-id>",
"client_secret": "<client-secret-value>",
"redirect_uri": "https://certctl.example.com:8443/auth/oidc/callback",
"groups_claim_path": "groups",
"groups_claim_format": "string-array",
"fetch_userinfo": false,
"scopes": ["openid", "profile", "email"],
"iat_window_seconds": 300,
"jwks_cache_ttl_seconds": 3600
}'
```
Notes:
- `issuer_url` MUST include `/v2.0` at the end for the v2.0 endpoint. The v1.0 endpoint emits tokens with a different `iss` shape and is NOT supported by certctl. The discovery doc at `https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration` confirms the right path.
- `<tenant-id>` is the Directory (tenant) ID GUID from step 1.
### Add the group→role mappings (GUID-keyed)
Get the GUIDs of your engineering / viewer groups:
**Entra ID → Groups → All groups → <group> → Overview → Object ID**.
Then in certctl:
```bash
# Engineering group → r-operator
curl -X POST https://<your-certctl-host>:8443/api/v1/auth/oidc/group-mappings \
-H "Authorization: Bearer ${CERTCTL_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"provider_id": "<provider-id>",
"group_name": "8b9b1faa-4e83-471e-8b00-7d99c3e2a5f1",
"role_id": "r-operator"
}'
```
Repeat for every group you want to map. **Document the GUID-to-name mapping in your operator runbook** — without it, the next operator looking at certctl's mappings page sees a wall of GUIDs with no way to know which is which. Consider naming the mapping descriptively if your group-mapping schema supports it (Bundle 2 doesn't yet — group-mapping descriptions are a parking-lot item for a follow-on bundle).
## Verification
End-to-end login + audit + Sessions checks are identical to Keycloak.
**Entra-ID-specific:** the audit row's `details.subject` will be Microsoft's `oid` claim (a GUID, the user's object ID), stable across UPN / email changes. The certctl `users` table's `oidc_subject` column holds this GUID.
**JWKS-rotation:** Microsoft auto-rotates signing keys on a documented schedule (every ~6 weeks). The discovery doc + JWKS endpoint always serve the union of active + recently-active keys, so in-flight logins continue to validate. No manual operator action needed in steady state. If you suspect a stuck cache after a Microsoft-side rotation, click "Refresh discovery cache" in the certctl GUI to evict.
## Troubleshooting
**Login completes; ID token contains a `hasgroups: true` claim instead of `groups`.**
Entra ID emits this when a user is in too many groups (>200 by default for ID tokens, >150 for access tokens) — Microsoft truncates the claim and tells the consumer to use Microsoft Graph to look up the full list. certctl does NOT currently support the Graph fallback path (it's a follow-on bundle item).
Workarounds:
- Reduce the user's group membership to <200 (rarely practical in large tenants).
- Restrict the `groups` claim to "Groups assigned to the application" (Token configuration step 3 above) instead of "Security groups". The "assigned" set is bounded by the app's user assignments and stays under the limit.
- Use Entra ID's optional `wids` (well-known IDs) claim if you only care about admin/non-admin distinction; certctl can be configured against `wids` by setting `groups_claim_path` accordingly.
**`groups` claim missing entirely.**
Step 3 wasn't completed — Entra ID does NOT emit `groups` by default. Add the claim via Token configuration before users will see it.
**`ErrIssuerMismatch` even though the `tid` in the token matches.**
The v2.0 endpoint emits `iss = https://login.microsoftonline.com/<tenant-id>/v2.0` (no trailing slash). The v1.0 endpoint emits `iss = https://sts.windows.net/<tenant-id>/`. Confirm certctl's `issuer_url` matches v2.0 exactly — no trailing slash, includes `/v2.0`.
**On-prem-synced groups emit GUIDs even when "Cloud-only display names" is selected.**
Expected behavior — Microsoft only emits display names for groups created in Entra ID itself (cloud-only). On-prem-synced groups always emit object IDs. The hybrid case is unfixable from the IdP side; either map against GUIDs (recommended) or migrate the relevant groups to cloud-only.
**The `email` claim is empty even though the user has a primary email.**
Entra ID's `email` claim only populates when:
1. The user has a "Primary email" set on their Entra ID profile (often blank for B2B guest users).
2. The optional claim was added in step 4.
For B2B guests, the `preferred_username` claim usually carries the email-shape login. You can configure certctl to use `preferred_username` as the user's display name fallback, but the `User.Email` column will remain blank — that's expected for guests.
**Conditional Access policies blocking the login.**
If your tenant has Conditional Access requiring MFA for new applications, certctl will see the user redirected through the MFA challenge. This works transparently — the certctl service doesn't care that MFA was performed; it only validates the resulting ID token. If MFA is failing for the user, debug at the Entra ID side (Sign-in logs).
## Validation checklist
Same as [keycloak.md](keycloak.md#validation-checklist), with these additions:
- [ ] The ID token's `groups` claim is a string-array of GUIDs (decode at jwt.io).
- [ ] Each certctl group-mapping uses the GUID, not a human-readable name.
- [ ] A user with >200 groups successfully logs in (or the operator has documented the limitation + workaround in their internal runbook).
- [ ] The Entra ID **Sign-in logs** view shows the certctl login event with status "Success".
Sign-off: _______________ (operator) on _______________ (date).