Identity & Governance for AI on Databricks

Two patterns. One governance model. Securing AI applications at scale.

The Challenge

The business wants an AI platform. The requirements are not simple.

Business users need to ask questions in plain English and get governed answers from live data. Knowledge workers need to search across documents, policies, and institutional memory using natural language. Multiple AI agents need to collaborate: one retrieves data, another reasons over it, a third takes action, all coordinated by a supervisor. And a tool layer must connect these agents to your data, APIs, and business logic.

Internal teams will use it daily. But the harder ask is already on the table: "Can our partners, customers, and regulated subsidiaries access the same AI capabilities through their own portals, authenticated by their own identity providers, without ever getting accounts on our platform?"

Same data. Same AI tools. Same governance guarantees. Two completely different identity worlds.

To build this on Databricks, you will use Genie for natural language analytics, Vector Search for knowledge retrieval, Agent Bricks for multi-agent orchestration, MCP servers for tool governance, Unity Catalog for access control, and AI Gateway for model traffic management. This series shows you how to govern all of it.

Context

Databricks already supports IdPs natively

Authentication to Databricks workspaces uses your cloud IdP by default.

Azure

Entra ID (customer tenant)

Default. Users authenticate via Entra. SSO, MFA, Conditional Access all work.

GCP

Cloud Identity

Google Workspace or Cloud Identity federation. Native SSO.

AWS

Bring Your Own IdP

Okta, Entra, Ping, any SAML/OIDC provider via account-level SSO config.

Apps ON Databricks → OBO

Users are already authenticated via the workspace IdP. Databricks Apps proxy injects the user's OBO token. current_user() = human email. No extra IdP integration needed.

Apps OUTSIDE Databricks → Federation

An external app (partner portal, customer SaaS, internal tool on separate infra) authenticates users with its own IdP. These users don't have Databricks accounts. Federation Exchange bridges the external IdP to Databricks via token exchange (RFC 8693).

The Two Worlds

Same AI tools, different identity boundaries

World 1: Internal

Sales reps, analysts, executives — they have Databricks accounts via their workspace IdP. They use Streamlit apps, agent tools, dashboards. Identity is known and verified.

current_user() = human email

World 2: External

Partners, customers, vendors — they authenticate via their own IdP (Auth0, Okta, Entra as a separate tenant). No Databricks accounts. Identity must be mapped to role-based service principals.

IDP JWT → role-based SP → UC groups

Both need the same thing: governed access to SQL, Genie, Vector Search, Serving endpoints, external APIs — with row-level security, column masking, audit trails, and least-privilege scopes. Same MCP tools, same UC governance, same audit table.

Identity Models

Three token patterns: OBO, M2M, and Federation

OBO (U2M)

User-to-Machine. The platform acts on behalf of a known human.

Token: User's scoped OAuth token via Apps proxy
Identity: current_user() = human email
UC fires as: The individual user
Audit trail: Full per-user attribution
Best for: Internal apps where users have Databricks accounts. Streamlit dashboards, agent tools, interactive analytics.
DB Integrations: Databricks Apps (auto-proxy), SQL Warehouses, Genie Spaces, Model Serving, Agent Bricks Supervisor, Vector Search, UC Functions

M2M

Machine-to-Machine. A service principal authenticates with its own credentials. No human in the loop.

Token: SP OAuth client credentials grant
Identity: current_user() = SP application ID
UC fires as: The service principal
Audit trail: SP-level only (no human attribution)
Best for: Background jobs, scheduled pipelines, CI/CD, async audit writes, health checks. Anything where no user is present.
DB Integrations: Databricks SDK, Jobs/Workflows, DLT Pipelines, DBSQL Connectors, REST APIs, UC Connections (bearer token), Repos/Git integration

FEDERATION

External IDP token exchanged for a role-based SP token. Users don't have Databricks accounts.

Token: IDP JWT exchanged via RFC 8693
Identity: Role-based SP (mapped from IDP group)
UC fires as: is_member() per SP group
Audit trail: SP-level + app-layer human attribution
Best for: Partner portals, customer-facing apps, multi-tenant SaaS. Users authenticate via their own IDP.
DB Integrations: Token Exchange API (RFC 8693), Federation Policies, Databricks Apps (as MCP backend), SQL Warehouses, Genie Spaces, Model Serving

Key distinction: OBO propagates the human's identity through every layer. M2M uses service credentials with no human context. Federation bridges an external identity to a role-based SP. Most production apps combine OBO + M2M (user-facing calls use OBO, background tasks use M2M).

Two Patterns

Same governance, different identity models

OBO

On-Behalf-Of

Identity: Databricks user (human email)
Token: x-forwarded-access-token via Apps proxy
UC fires as: current_user() = individual
Scopes: sql, genie, serving
Best for: Internal apps, Streamlit dashboards, agent tools

FEDERATION

Federation Exchange

Identity: External IDP user → role-based SP
Token: Token Exchange (RFC 8693) → SP token
UC fires as: is_member() per SP group
Scopes: sql, genie, serving
Best for: Partner portals, customer-facing apps, multi-tenant

Decision Guide

When to use which pattern

Internal Databricks users → OBO (current_user() = human)
External users with existing IDP → FEDERATION (IDP JWT → SP token)
Per-user audit trail in UC → OBO (UC records human email)
Role-based access (6 roles, 100+ users) → FEDERATION (role → SP → group)
No Databricks accounts for users → FEDERATION (SPs act as proxies)
Both internal + external users → OBO + FEDERATION (same MCP, same UC)

Key insight: Both patterns share the same UC governance, MCP tools, and audit infrastructure. The only difference is how the token gets to the MCP server.

Architecture

OBO Pattern — Identity Flow

User: Databricks user (browser OAuth)
→ Platform: Databricks Apps (authenticates & forwards)
→ Your Code: MCP Server (receives user token)
→ Platform: UC Governance (fires as human)

What does "Databricks Apps" do here?

When you deploy an app on Databricks, the platform handles authentication for you automatically. Before any request reaches your code, Databricks:

  1. Authenticates the user via their workspace IdP (Entra, Cloud Identity, or your SSO)
  2. Creates a scoped token limited to the permissions you configured (sql, genie, serving)
  3. Forwards the request to your app with the user's identity in HTTP headers

Your app never handles login flows, passwords, or OAuth redirects. It just reads the token from x-forwarded-access-token and uses it. The platform does the rest.

What your app code does

  • Reads x-forwarded-access-token header
  • Uses that token for SQL, Genie, Serving calls
  • current_user() = the human's email
  • UC row filters + column masks fire automatically
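The steps above can be sketched in a few lines. This is a minimal, illustrative sketch, not the demo's actual code: the header name x-forwarded-access-token is the one Databricks Apps injects, while the helper function names are assumptions.

```python
# Sketch of OBO token handling inside a Databricks App. The header name
# comes from the text above; get_user_token/sql_headers are illustrative.

def get_user_token(headers: dict) -> str:
    """Read the scoped OBO token the Apps proxy forwarded."""
    token = headers.get("x-forwarded-access-token")
    if not token:
        # Outside the Apps proxy (e.g. local dev) the header is absent.
        raise PermissionError("missing x-forwarded-access-token header")
    return token

def sql_headers(headers: dict) -> dict:
    """Build auth headers for a downstream SQL / Genie / Serving call."""
    return {"Authorization": f"Bearer {get_user_token(headers)}"}
```

Because the token carries the user's identity, every downstream call made with these headers fires UC governance as that individual.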

Scopes (configured in App UI)

sql: SQL warehouse access
genie: Genie Conversation API
dashboards.genie: Genie (Azure requires both)
serving: Model Serving endpoints
files.files: Files & directories
apps: Call other Databricks Apps
Our demo uses sql + genie + serving. You configure only what your app needs — the list of available scopes grows as Databricks adds capabilities.

Architecture

Federation Pattern — Identity Flow

Customer: External user (Auth0 / Okta / Entra)
→ Customer: Your app server (verifies identity)
→ Platform: Token Exchange (IDP JWT → SP token)
→ Databricks App: MCP Server (your tools)
→ Platform: UC Governance (fires per SP group)

When is this needed?

When you have an app outside Databricks (a partner portal, a customer-facing SaaS, an internal tool on separate infrastructure) that needs to call Databricks services. The users authenticate with their own IdP — not the workspace IdP. They don't have Databricks accounts.

Your app server verifies the user's identity via their IdP JWT, then exchanges that JWT for a Databricks service principal token. The SP token is scoped and short-lived. Databricks never sees or trusts the external IdP directly — the token exchange is the bridge.

Your app server's job

  • Verify the user's IdP JWT (signature, expiry, claims)
  • Map user role → Databricks SP via Federation Policy
  • Exchange: IdP JWT → scoped Databricks SP token
  • Forward SP token + caller metadata to MCP server

Token Exchange (one API call)

POST /oidc/v1/token
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
subject_token=<IDP JWT>
scope=sql genie serving

Scopes are configurable per use case. Request only what the app needs.
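As a sketch, the request can be assembled like this. The endpoint path and scope list come from the slide; the subject_token_type value follows RFC 8693, and the exact payload Databricks expects should be confirmed against the Federation documentation. The function name and workspace URL are illustrative.

```python
# Sketch of building the RFC 8693 token exchange request. The
# subject_token_type value is the RFC's JWT type URN (an assumption
# about the Databricks endpoint; verify against the Federation docs).
from urllib.parse import urlencode

def build_exchange_request(workspace_url: str, idp_jwt: str,
                           scopes: list) -> tuple:
    """Return (url, form-encoded body) for the token exchange POST."""
    body = urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": idp_jwt,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "scope": " ".join(scopes),
    })
    return f"{workspace_url}/oidc/v1/token", body
```

The returned body is then POSTed as application/x-www-form-urlencoded; the response carries the short-lived, scoped SP token.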

UC Governance

One governance model, both patterns

Unity Catalog enforcement is identical — only the identity source differs.

Row Filters

Federation: is_member('group')

OBO: current_user()

West sales rep sees only WEST deals. Same table, different data.

Column Masks

margin_pct masked for non-privileged roles

Finance/exec/admin see real margins. Sales reps see NULL.

USE CONNECTION

Controls external API access

GRANT/REVOKE on UC connection = GRANT/REVOKE GitHub access per role.

EXECUTE on UC Functions

Controls which identities can invoke UC functions as agent tools.

CAN QUERY on Serving

Controls access to Model Serving endpoints (supervisor agent, FMAPI).

All enforced by the SQL engine, not application code.

Least Privilege

Scope-Based Access Model

Resource | OAuth Scope | UC Grant | Enforcement
SQL Warehouse | sql | CAN USE | Token + UC
Genie Space | genie + dashboards.genie | Space access + underlying tables | Token + UC
Model Serving | serving | CAN QUERY on endpoint | Token + UC
Vector Search | sql | SELECT on VS index | Token + UC
UC Connection | sql | USE CONNECTION | Token + UC
UC Function | sql | EXECUTE on function | Token + UC
Audit Table | sql | SELECT on table | Token + UC

Key insight: Scopes limit what the token can do. Grants limit what the identity can access. Both enforce independently. Revoking either blocks access.

This table shows scopes used in our demo. Databricks supports additional scopes (files.files, apps, and more as capabilities expand). You configure only what your app needs — principle of least privilege.

Custom MCP

MCP as the Governance Gateway

Both patterns use identical MCP servers — 9 tools, same production patterns.

9 Tools

check_identity: Auth chain + current_user()
get_region_summary: Sales by region (row-filtered)
query_sales_data: Top deals (masked margins)
query_audit_log: Audit trail (restricted roles)
genie_query: Natural language SQL
query_knowledge_base: Vector Search
ask_supervisor: Multi-agent orchestrator
github_search: GitHub via UC Connection (github_bearer_token)
external_api_call: External API via UC Connection (github_bearer_token)

Production Patterns

  • Connection pooling — httpx (standard 35s + long-running 120s)
  • Retry — exponential backoff + jitter (429/500/502/503/504)
  • Rate limiting — Genie QPM limit (currently 5, subject to change)
  • Async audit — fire-and-forget, never blocks response
  • Structured logging — JSON, request_id correlation
  • MLflow tracing — 100% tool coverage
  • UC connections — http_request() SQL fn for external APIs
  • Input validation — all tool parameters checked
  • Health check — /health validates all dependencies
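The retry bullet above can be sketched as exponential backoff with full jitter on the listed statuses. Names and limits here are illustrative, not the demo's actual implementation:

```python
# Sketch of retry with exponential backoff + jitter for 429/5xx.
# max_attempts and base delay are assumed values.
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}

def call_with_retry(send, max_attempts: int = 4, base: float = 0.5):
    """send() returns (status, payload); retry on retryable statuses."""
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE:
            return status, payload
        if attempt < max_attempts - 1:
            # Full jitter: sleep a random amount up to base * 2^attempt,
            # which spreads retries out and avoids a thundering herd.
            time.sleep(random.uniform(0, base * (2 ** attempt)))
    return status, payload
```

Passing base=0.0 makes the sketch testable without sleeping; in production the base delay stays non-zero.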
Genie Integration

Conversation Threading

Multi-Turn Flow

User: "total revenue by region"
Genie: SELECT region, SUM(amount) ... 3 rows
User: "now show top 5 by amount"
Genie: WITH revenue_by_region AS (...) contextual
User: "show me the thing"
Genie: "Please specify what you mean by 'the thing'..." CLARIFICATION

Implementation

conversation_id: Persisted per session. New → /start-conversation. Follow-up → /conversations/{id}/messages
Clarification: TEXT_ATTACHMENT → return genie_message + conversation_id for follow-up
Rate limit: Configurable sliding window (currently 5 QPM*). Returns retry_after_ms
Polling: Adaptive, 2s → 5s cap; 60s timeout (new), 40s (follow-up)
Results: Capped at 50 rows + truncated flag + total_rows

* Genie QPM limit is a Databricks platform constraint as of March 2026. This limit may increase as the service scales. The MCP server's rate limiter is configurable — update _GenieLimiter(max_qpm=N) when the platform limit changes.
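The configurable limiter the footnote mentions can be sketched as a sliding window. The class and method names here are illustrative stand-ins for the demo's _GenieLimiter, not its actual code:

```python
# Sketch of a sliding-window QPM limiter. acquire() admits the call
# (returns 0) or returns a suggested retry_after_ms, matching the
# behavior described above. Timestamps can be injected for testing.
import time
from collections import deque

class GenieLimiter:
    def __init__(self, max_qpm: int = 5, window_s: float = 60.0):
        self.max_qpm = max_qpm
        self.window_s = window_s
        self._stamps = deque()

    def acquire(self, now=None) -> int:
        """Return 0 if admitted, else suggested retry_after_ms."""
        now = time.monotonic() if now is None else now
        # Evict calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_qpm:
            self._stamps.append(now)
            return 0
        # Window full: wait until the oldest call ages out.
        return int((self._stamps[0] + self.window_s - now) * 1000)
```

When the platform limit changes, only max_qpm changes; the window logic stays the same.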

Observability

Three-Layer Observability Stack

MLflow Traces

Every tool call traced.

Tags: service_name, tool, caller.email, caller.role, request_id, latency_ms

Spans: auth, sql, external_api, audit

100% COVERAGE

Audit Table

Async fire-and-forget.

Columns: request_id, email, role, tool, status, error_code, latency_ms

Status: success, error, access_denied

NEVER BLOCKS

Lakeview Dashboard

4-page AI/BI dashboard.

Pages: Overview, Access Control, Performance, Identity & Audit

Queries audit table directly.

LIVE

All three layers are queryable, governed by UC, and work identically for both OBO and Federation patterns.
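The "never blocks" property of the audit layer comes from handing records to a background worker. This is a minimal sketch under assumptions: the sink callable stands in for the real M2M insert into the audit table, and the class name is illustrative.

```python
# Sketch of fire-and-forget audit writes: the request thread enqueues
# and returns immediately; a daemon worker drains the queue, so a slow
# or failing audit sink never blocks a tool response.
import queue
import threading

class AsyncAuditWriter:
    def __init__(self, sink):
        self._q = queue.Queue()
        self._sink = sink
        threading.Thread(target=self._drain, daemon=True).start()

    def log(self, record: dict) -> None:
        """Non-blocking: enqueue and return immediately."""
        self._q.put(record)

    def _drain(self) -> None:
        while True:
            record = self._q.get()
            try:
                self._sink(record)   # e.g. M2M SP INSERT into the audit table
            except Exception:
                pass                 # audit failures never surface to callers
            finally:
                self._q.task_done()
```

The worker authenticates as an M2M service principal, so audit writes succeed regardless of which identity pattern the request used.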

Access Control

9 Tools × 6 Roles

Tool | West Sales | East Sales | Managers | Executive | Finance | Admin
check_identity
get_region_summary | ✓ WEST | ✓ EAST | ✓ ALL | ✓ ALL | ✓ ALL | ✓ ALL
query_sales_data | ✓ masked | ✓ masked | ✓ masked | ✓ real | ✓ real | ✓ real
genie_query
query_knowledge_base
query_audit_log
ask_supervisor
github_search
external_api_call

Federation: tool-level RBAC (TOOL_ACCESS matrix) plus UC governance. OBO: UC governance only; no tool matrix is needed because UC enforces access per individual user.
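The Federation-side check can be sketched as a lookup in that matrix. The entries below are illustrative (the audit-log restriction is an assumption based on the "restricted roles" note earlier); denial at this layer is independent of UC, and either layer alone is sufficient to deny.

```python
# Sketch of a TOOL_ACCESS matrix check layered on top of UC governance.
# Role and tool entries are illustrative, not the demo's actual matrix.
TOOL_ACCESS = {
    # tool -> roles allowed to invoke it ("*" = every role)
    "check_identity": {"*"},
    "get_region_summary": {"*"},
    "query_audit_log": {"executive", "finance", "admin"},  # assumed restriction
}

def is_allowed(tool: str, role: str) -> bool:
    roles = TOOL_ACCESS.get(tool)
    if roles is None:
        return False                  # unknown tool: deny by default
    return "*" in roles or role in roles
```

Even when this check passes, the SP's UC grants still apply downstream, which is the defense-in-depth point above.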

Agent Bricks

Supervisor Agent Integration

Supervisor Agent (Model Serving endpoint)
→ UC Connection (bearer token auth)
→ Custom MCP (9 tools)

How it works

  • Register MCP via UC Connection (bearer token auth)
  • Add as "External MCP Server" sub-agent in Supervisor UI
  • Supervisor uses OBO — calling user's identity propagates
  • Per-user permission checks fire at runtime
  • Max 20 sub-agents per supervisor

Access control lever

-- Grant access
GRANT USE CONNECTION
  ON CONNECTION mcp_federation
  TO `role-sp-uuid`;

-- Revoke access
REVOKE USE CONNECTION
  ON CONNECTION mcp_federation
  FROM `role-sp-uuid`;

Revoking removes the MCP from the user's supervisor experience at runtime.

Production Ready

Security Checklist

No secrets in env vars
UC connections hold external credentials. http_request() injects tokens.
OAuth scopes enforce least privilege
sql + genie + serving. No all-apis.
UC governance at SQL engine level
Row filters, column masks, USE CONNECTION, EXECUTE.
Tool-level RBAC (defense in depth)
TOOL_ACCESS matrix on top of UC. Any layer sufficient to deny.
Rate limiting
Genie QPM sliding window (currently 5*). Connection pool limits. Configurable as limits evolve.
Async audit (never blocks)
Fire-and-forget background thread. M2M SP credentials.
Structured logging
JSON with request_id correlation. Every request traceable.
Input validation + HTTPS-only
All tool params validated. External calls require HTTPS.
Retry with jitter
Exponential backoff prevents thundering herd on 429/503.
Tokens never logged
Bearer tokens excluded from logs and error messages.
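One way to enforce that rule mechanically is a logging filter that redacts bearer tokens before records reach any handler. The regex and class name here are illustrative, a sketch rather than the demo's implementation:

```python
# Sketch of a logging.Filter that redacts bearer tokens from log
# messages. The token character class is an assumption broad enough
# to cover typical OAuth/JWT tokens.
import logging
import re

_BEARER = re.compile(r"(Bearer\s+)[A-Za-z0-9._\-]+")

class RedactTokens(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = _BEARER.sub(r"\1[REDACTED]", str(record.msg))
        return True   # keep the record, just with the token removed
```

Attached to the root logger (or each handler), the filter runs before formatting, so tokens never reach log output or error messages.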
Summary

What We Built

2 custom MCP servers (Federation + OBO), 9 tools each, production-ready
1 governance model (UC row filters, column masks, connections, scopes)
1 observability stack (MLflow traces + audit table + Lakeview dashboard)
Genie conversation threading with clarification handling

IDP-agnostic. Plug in any IDP, get governed AI tools.

Auth0. Okta. Entra. Same architecture — swap the IDP config.