Identity Patterns for AI Applications
github.com/bhavink/applied-ai-governance


OBO, M2M, and Federation. Three token flows, one governance model.

The Challenge

You need an AI platform that serves two identity worlds

Business users need plain-English answers from live data. Knowledge workers need natural language search across documents and institutional memory. AI agents need to collaborate: retrieval, reasoning, and action, coordinated by a supervisor. A tool layer connects it all to your data and APIs.

Internal teams authenticate through your corporate IdP. But partners and customers authenticate through their own identity providers, and they need the same governed access without Databricks accounts.

Each identity pattern (OBO, M2M, Federation) solves a different piece of this puzzle. This deck shows you when to use which.

Foundations

Authentication vs Authorization

Two distinct questions, enforced at different layers.

Authentication (AuthN)

Who are you?

Proves identity. Handled by OAuth tokens (OBO, M2M, or Federation exchange). The token carries the caller's identity through every layer of the stack.

OBO: User's OAuth token via Apps proxy
M2M: SP client credentials grant
Federation: IDP JWT exchanged for SP token

Authorization (AuthZ)

What can you do?

Enforces permissions. Unity Catalog evaluates grants at the SQL engine level: row filters, column masks, USE CONNECTION, and EXECUTE on functions. Application code cannot bypass these controls.

Scopes: Limit what the token can do
UC Grants: Limit what the identity can access
Both: Enforce independently; either can deny

Identity Models

Three token patterns

OBO (U2M)

User-to-Machine. The platform acts on behalf of a known human.

Token: User's scoped OAuth via Apps proxy
Identity: current_user() = human email
UC fires: As the individual user
Audit: Full per-user attribution
Best for: Internal apps where users have Databricks accounts.

M2M

Machine-to-Machine. A service principal with its own credentials. No human in the loop.

Token: SP OAuth client credentials
Identity: current_user() = SP app ID
UC fires: As the service principal
Audit: SP-level only (no human context)
Best for: Background jobs, scheduled pipelines, CI/CD, health checks.

FEDERATION

External IDP token exchanged for a role-based SP token. Users have no Databricks accounts.

Token: IDP JWT exchanged via RFC 8693
Identity: Role-based SP (mapped from IDP)
UC fires: is_member() per SP group
Audit: SP-level + app-layer human audit
Best for: Partner portals, customer-facing apps, multi-tenant SaaS.

Pattern 1

On-Behalf-Of (OBO) Flow

User: Alice (alice@company.com)
Platform: Databricks Apps (authenticates + forwards)
Platform: OAuth Server (scoped user token, 1h)
Your Code: App / MCP (receives user token)
Platform: Unity Catalog (fires as Alice)

What the proxy does

Databricks Apps handles the entire OAuth flow before your code runs:

  1. Authenticates the user via their workspace IdP
  2. Creates a scoped token limited to configured permissions
  3. Forwards the request with the token in x-forwarded-access-token

Your app never handles login flows, passwords, or OAuth redirects.
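
A minimal sketch of how app code might pick up the forwarded token, assuming request headers arrive as a plain mapping. The header name is the one listed in step 3; the function name `get_user_token` and the `MissingUserToken` error are hypothetical.

```python
from typing import Mapping

FORWARDED_TOKEN_HEADER = "x-forwarded-access-token"


class MissingUserToken(Exception):
    """Raised when no user token was forwarded with the request."""


def get_user_token(headers: Mapping[str, str]) -> str:
    """Return the user's scoped OAuth token forwarded by the Apps proxy.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == FORWARDED_TOKEN_HEADER and value.strip():
            return value.strip()
    raise MissingUserToken(f"{FORWARDED_TOKEN_HEADER} header not present")
```

Failing closed when the header is absent keeps the app from silently falling back to some other identity.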

current_user() = human

Every query carries Alice's identity. UC row filters evaluate:

WHERE owner = current_user()
-- returns 'alice@company.com'

Column masks hide sensitive fields via is_member('group'). No app code changes needed. All access logged with Alice's email.

Pattern 2

Machine-to-Machine (Service Principal)

Trigger: Scheduled Job (cron / CI/CD / event)
Identity: Service Principal (sp-agent-prod)
Platform: OAuth Server (M2M token, 1h)
Platform: Unity Catalog (checks SP GRANTs)

When to use

  • Batch processing and scheduled workflows
  • CI/CD pipelines deploying or testing
  • Background audit writes (fire-and-forget)
  • Health checks and monitoring
  • Any automation with no human in the loop
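
For the automation cases above, the token request can be sketched as a standard OAuth client credentials grant. This is an illustration under assumptions: it reuses the /oidc/v1/token path shown in the federation flow, assumes HTTP Basic authentication with the SP's client ID and secret, and only builds the request without sending it. The function name and the host value in the usage note are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(host: str, client_id: str, client_secret: str,
                            scope: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    # HTTP Basic auth: base64("client_id:client_secret")
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Usage: `urllib.request.urlopen(build_m2m_token_request("my-workspace.example.com", cid, secret, "sql"))` would send the request; how the credentials are supplied to the job is outside this sketch.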

Audit implications

Every execution gets identical permissions. The SP's grants do not change between runs.

All actions logged as sp-agent-prod. No individual user attribution in UC audit logs. If you need per-user trails, use OBO or Federation instead.

Databricks handles token rotation automatically. No secrets to manage manually.

Pattern 3

Federation Token Exchange

Customer: External User (Auth0 / Okta / Entra)
Customer: Your App Server (verifies JWT, maps role)
Platform: /oidc/v1/token (validates federation policy)
Databricks App: MCP Server (role-scoped SP token)
Platform: UC Governance (fires per SP group)

Token exchange in one call

POST /oidc/v1/token
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
subject_token=<IDP_JWT>
subject_token_type=urn:ietf:params:oauth:token-type:jwt
scope=sql genie serving

Databricks validates the JWT's iss, aud, sub against the federation policy on the target SP. Mismatch = 401/403.
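
A sketch of assembling the form body for this exchange. `subject_token_type` is the RFC 8693 parameter that names the kind of token being presented. Sending a `client_id` to identify the target SP is an assumption of this sketch, as are the function and constant names.

```python
import urllib.parse

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_form(idp_jwt: str, sp_client_id: str,
                        scopes: list[str]) -> str:
    """Return the URL-encoded body for a token exchange request."""
    return urllib.parse.urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": idp_jwt,
        "subject_token_type": JWT_TOKEN_TYPE,
        # Assumption: the target SP is identified by its client ID.
        "client_id": sp_client_id,
        # OAuth scopes are space-delimited within a single parameter.
        "scope": " ".join(scopes),
    })
```

The body is posted with Content-Type application/x-www-form-urlencoded; the response handling (and the 401/403 cases) is left out here.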

Role-based service principals

JWT role: IDP injects role claim (sales-west, executive, etc.)
SP mapping: App server maps role to a Databricks SP
UC groups: Each SP belongs to workspace groups
Result: Row filters + column masks fire per SP group

IDP-agnostic. Swap Auth0, Okta, or Entra by changing the federation policy config.
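
The role-to-SP step can be sketched as a lookup that denies by default. The claim name `role`, the second table entry, and the names `ROLE_TO_SP`, `UnknownRole`, and `sp_for_role` are illustrative assumptions; the claims are assumed to come from a JWT whose signature has already been verified.

```python
from typing import Any, Mapping

# Hypothetical mapping from IDP role claim to service principal name.
ROLE_TO_SP = {
    "west_sales": "fed-exchange-sp-west-sales",
    "executive": "fed-exchange-sp-executive",  # illustrative name
}


class UnknownRole(Exception):
    """Raised when the JWT carries no role, or a role with no mapped SP."""


def sp_for_role(verified_claims: Mapping[str, Any]) -> str:
    """Pick the SP for an already-verified JWT; deny anything unmapped."""
    role = verified_claims.get("role")
    if not isinstance(role, str) or role not in ROLE_TO_SP:
        raise UnknownRole(f"no service principal mapped for role {role!r}")
    return ROLE_TO_SP[role]
```

Raising on an unmapped role, rather than falling back to a default SP, keeps a misconfigured IDP from granting access by accident.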

Federation Details

The trust boundary

Federation policy validation

Three fields are checked on every exchange:

iss: Must match the IDP's issuer URL exactly (trailing slash matters)
aud: Must match the workspace URL registered in the policy
sub: Must match the client ID registered on the SP

Databricks fetches the IDP's JWKS keys from iss/.well-known/jwks.json and verifies the RS256 signature. No proxy headers involved.
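
To make the three-field comparison concrete, here is a local illustration of the matching logic. It is not Databricks' implementation, it does not verify the signature, and the `FederationPolicy` type and function name are hypothetical. It assumes the claims were already decoded from a verified token.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class FederationPolicy:
    issuer: str
    audience: str
    subject: str


def claims_match_policy(claims: Mapping[str, Any],
                        policy: FederationPolicy) -> bool:
    """Exact-match iss, aud, and sub against the policy."""
    aud = claims.get("aud")
    # A JWT "aud" claim may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return (
        claims.get("iss") == policy.issuer  # exact string match
        and policy.audience in audiences
        and claims.get("sub") == policy.subject
    )
```

Because the issuer comparison is an exact string match, `https://idp.example.com` and `https://idp.example.com/` are different issuers, which is the trailing-slash pitfall noted above.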

Why the API server is the trust boundary

  • Verifies the JWT independently via JWKS (does not rely on a caller-supplied Databricks token for SQL)
  • Performs its own token exchange server-side, controlling which SP the token maps to
  • Mitigates token replay by binding the exchange to the verified JWT role
  • Prevents privilege escalation because the role claim is cryptographically verified

The API server is the only component that both verifies identity AND controls data access.

Token lifetime: 1 hour, not refreshable. Must re-exchange. Scopes: sql + genie + serving (configurable per use case, least privilege).
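
Since the exchanged token cannot be refreshed, the server has to re-exchange before it expires. A minimal sketch, assuming the caller supplies an `exchange` callable that performs the token exchange with a still-valid IDP JWT; the class name, the 60-second safety margin, and the injectable clock are illustrative choices.

```python
import time
from typing import Callable, Optional


class ExchangedTokenCache:
    """Caches an exchanged token and re-exchanges shortly before expiry."""

    def __init__(self, exchange: Callable[[], str],
                 lifetime_s: float = 3600.0, skew_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._exchange = exchange
        self._lifetime_s = lifetime_s   # 1 hour, per the slide
        self._skew_s = skew_s           # renew a little early
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew_s:
            # No refresh token exists, so run the full exchange again.
            self._token = self._exchange()
            self._expires_at = now + self._lifetime_s
        return self._token
```

One cache instance per role-scoped SP keeps tokens for different roles from being mixed up.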

Decision Guide

When to use which pattern

Internal Databricks users
OBO current_user() = human
External users with their own IDP
FEDERATION IDP JWT → SP token
No user present (automation)
M2M SP client credentials
Per-user audit trail required
OBO UC records human email
Role-based access (6 roles, 100+ users)
FEDERATION role → SP → group
No Databricks accounts for users
FEDERATION SPs act as proxies
Both internal + external users
OBO + FED same tools, same UC

Key insight: All three patterns share the same UC governance, MCP tools, and audit infrastructure. The only difference is how the token reaches your app.
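
The guide reduces to two questions: is a human present, and does that human have a Databricks account? A sketch of that decision, with hypothetical names:

```python
from enum import Enum


class Pattern(Enum):
    OBO = "OBO"
    M2M = "M2M"
    FEDERATION = "FEDERATION"


def choose_pattern(human_present: bool,
                   has_databricks_account: bool) -> Pattern:
    """Map the decision guide's two questions to a token pattern."""
    if not human_present:
        return Pattern.M2M          # automation: SP client credentials
    if has_databricks_account:
        return Pattern.OBO          # current_user() = human
    return Pattern.FEDERATION       # IDP JWT exchanged for SP token
```

An app serving both internal and external users would call this per request and end up using OBO and Federation side by side.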

Least Privilege

OAuth scopes: the capability ceiling

Scopes limit what the token can do. UC grants limit what the identity can access. Both enforce independently.

Resource | OAuth Scope | UC Grant | Effect
SQL Warehouse | sql | CAN USE | Token + UC must both allow
Genie Space | genie + dashboards.genie | Space access + tables | Azure needs both scope values
Model Serving | serving | CAN QUERY on endpoint | Per-endpoint access control
Vector Search | sql | SELECT on VS index | Accessed via SQL interface
UC Connection | sql | USE CONNECTION | External API access gate
UC Function | sql | EXECUTE on function | Agent tool access control

Scopes are the ceiling. A token with only sql scope cannot call Model Serving, even if the identity has CAN QUERY grants. Revoking either layer blocks access.
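
The two-layer rule can be modeled as a simple conjunction. This is a toy model for reasoning about outcomes, not how the platform evaluates access; representing grants as plain strings is an assumption of the sketch.

```python
def is_allowed(token_scopes: set[str], identity_grants: set[str],
               required_scope: str, required_grant: str) -> bool:
    """Access requires BOTH the OAuth scope and the UC grant."""
    return (required_scope in token_scopes
            and required_grant in identity_grants)


# A token with only the sql scope cannot reach Model Serving,
# even though the identity holds CAN QUERY on the endpoint.
sql_only_token = {"sql"}
grants = {"CAN QUERY"}
blocked = not is_allowed(sql_only_token, grants, "serving", "CAN QUERY")
```

Either layer can deny on its own: removing the scope or revoking the grant both make `is_allowed` return False.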

Configure only what your app needs. Our demo uses sql + genie + serving. Additional scopes (files.files, apps) are available as Databricks expands capabilities.

Service Principals

Account vs workspace, two identifiers

Account-level SP

Created at the Databricks account level. Can be assigned to multiple workspaces. Has a globally unique application_id (UUID).

application_id: UUID, globally unique, used in current_user()
display_name: Human-readable (e.g., fed-exchange-sp-west-sales)
Scope: Account-wide. Assigned to workspaces.

Workspace-level identity

When assigned to a workspace, the SP gets a workspace-specific principal_id (integer). This is used for workspace-level permissions.

principal_id: Integer, workspace-specific, for ACLs
Groups: Added to workspace groups for UC governance
Grants: UC grants reference the application_id

Common confusion: application_id (UUID) is what current_user() returns and what you use in UC GRANTs. principal_id (integer) is for workspace-level permissions like CAN USE on SQL warehouses. They are not interchangeable.

Federation pattern: Each role maps to a dedicated SP. west_sales role → fed-exchange-sp-west-sales SP → west_sales UC group. The JWT role claim drives which SP receives the exchanged token.

UC Governance

One governance model, all patterns

Unity Catalog enforcement is identical. Only the identity source differs.

Row Filters

Federation: is_member('group')

OBO: current_user()

West sales rep sees only WEST region deals. Same table, different data.

Column Masks

margin_pct masked for non-privileged roles

Finance/exec/admin see real margins. Sales reps see NULL. Enforced at SQL engine.

USE CONNECTION

Controls external API access

One SQL statement to GRANT or REVOKE access to external services per role.

EXECUTE on UC Functions

Controls which identities can invoke UC functions as agent tools. Per-function, per-principal.

CAN QUERY on Serving

Controls access to Model Serving endpoints (supervisor agent, FMAPI). Per-endpoint access control.

All enforced by the SQL engine, not application code. Cannot be bypassed.

Access Control

9 Tools × 6 Roles

Federation uses RBAC matrix + UC governance. OBO uses UC governance only (no matrix needed).

Tool | West Sales | East Sales | Managers | Executive | Finance | Admin
check_identity
get_region_summary | ✓ WEST | ✓ EAST | ✓ ALL | ✓ ALL | ✓ ALL | ✓ ALL
query_sales_data | ✓ masked | ✓ masked | ✓ masked | ✓ real | ✓ real | ✓ real
genie_query
query_knowledge_base
query_audit_log
ask_supervisor
github_search
external_api_call
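
An app-layer matrix like this could be sketched as plain dictionaries. Only the two rows with values shown above (get_region_summary and query_sales_data) are encoded; the role key spellings and the names `TOOL_ACCESS`, `REGION_SCOPE`, and `can_call` are assumptions. Masking itself is still enforced by Unity Catalog, so this only gates which tools a role may invoke.

```python
# Region visibility per role, from the get_region_summary row.
REGION_SCOPE = {
    "west_sales": "WEST",
    "east_sales": "EAST",
    "managers": "ALL",
    "executive": "ALL",
    "finance": "ALL",
    "admin": "ALL",
}

# Tool -> roles allowed to call it. Tools not listed are denied.
TOOL_ACCESS = {
    "get_region_summary": set(REGION_SCOPE),
    "query_sales_data": set(REGION_SCOPE),
}


def can_call(role: str, tool: str) -> bool:
    """Deny by default: unknown tools and unknown roles get False."""
    return role in TOOL_ACCESS.get(tool, set())
```

Under OBO this check is skipped, since UC governance alone decides what the user sees.
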
Production Ready

Security Checklist

No secrets in env vars
UC connections hold external credentials. http_request() injects tokens at runtime.
OAuth scopes enforce least privilege
sql + genie + serving. Never all-apis.
UC governance at SQL engine level
Row filters, column masks, USE CONNECTION, EXECUTE.
Tool-level RBAC (defense in depth)
TOOL_ACCESS matrix on top of UC. Any single layer is sufficient to deny.
Async audit (never blocks response)
Fire-and-forget background thread. M2M SP credentials for writes.
Tokens never logged
Bearer tokens excluded from logs and error messages.
Federation JWT verified server-side
JWKS signature check. Never trusted from headers alone.
Retry with exponential backoff + jitter
Prevents thundering herd on 429/503. Configurable limits.
Structured logging + MLflow traces
JSON with request_id correlation. 100% tool coverage.
Input validation + HTTPS-only
All tool params validated. External calls require HTTPS.
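
The retry item can be illustrated with a delay function using exponential backoff and full jitter. The base delay, cap, and function names are illustrative defaults, not values from the demo.

```python
import random
from typing import Optional

RETRYABLE_STATUS = {429, 503}


def should_retry(status_code: int) -> bool:
    """Retry only on throttling (429) and unavailability (503)."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt up to cap_s; the actual delay is
    drawn uniformly from [0, ceiling] so clients do not retry in lockstep.
    """
    source = rng if rng is not None else random
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return source.uniform(0.0, ceiling)
```

A caller would loop up to a configured attempt limit, sleeping for `backoff_delay(attempt)` whenever `should_retry(status)` is true.
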
Summary

Three Patterns, One Governance Model

OBO: Internal users. current_user() = human. Per-user row filters and audit trails.
M2M: Automation. SP credentials. Fixed permissions. Background jobs and pipelines.
FED: External users. IDP JWT exchanged for role-scoped SP token. IDP-agnostic.

Same UC governance. Same MCP tools. Same audit table.

Auth0. Okta. Entra. Any OIDC provider. Swap the IDP config, keep the architecture.

github.com/bhavink/applied-ai-governance