Business users need to ask questions in plain English and get governed answers from live data. Knowledge workers need to search across documents, policies, and institutional memory using natural language. Multiple AI agents need to collaborate: one retrieves data, another reasons over it, a third takes action, all coordinated by a supervisor. And a tool layer must connect these agents to your data, APIs, and business logic.
Internal teams will use it daily. But the harder ask is already on the table: "Can our partners, customers, and regulated subsidiaries access the same AI capabilities through their own portals, authenticated by their own identity providers, without ever getting accounts on our platform?"
Same data. Same AI tools. Same governance guarantees. Two completely different identity worlds.
To build this on Databricks, you will use Genie for natural language analytics, Vector Search for knowledge retrieval, Agent Bricks for multi-agent orchestration, MCP servers for tool governance, Unity Catalog for access control, and AI Gateway for model traffic management. This series shows you how to govern all of it.
Authentication to Databricks workspaces uses your cloud IdP by default.
Entra ID (customer tenant)
Default. Users authenticate via Entra. SSO, MFA, Conditional Access all work.
Cloud Identity
Google Workspace or Cloud Identity federation. Native SSO.
Bring Your Own IdP
Okta, Entra, Ping, any SAML/OIDC provider via account-level SSO config.
Users are already authenticated via the workspace IdP. Databricks Apps proxy injects the user's OBO token. current_user() = human email. No extra IdP integration needed.
An external app (partner portal, customer SaaS, internal tool on separate infra) authenticates users with its own IdP. These users don't have Databricks accounts. Federation Exchange bridges the external IdP to Databricks via token exchange (RFC 8693).
Sales reps, analysts, executives — they have Databricks accounts via their workspace IdP. They use Streamlit apps, agent tools, dashboards. Identity is known and verified.
current_user() = human email
Partners, customers, vendors — they authenticate via their own IdP (Auth0, Okta, Entra as a separate tenant). No Databricks accounts. Identity must be mapped to role-based service principals.
IDP JWT → role-based SP → UC groups
Both need the same thing: governed access to SQL, Genie, Vector Search, Serving endpoints, external APIs — with row-level security, column masking, audit trails, and least-privilege scopes. Same MCP tools, same UC governance, same audit table.
User-to-Machine. The platform acts on behalf of a known human.
| Token | User's scoped OAuth token via Apps proxy |
| Identity | current_user() = human email |
| UC fires as | The individual user |
| Audit trail | Full per-user attribution |
Machine-to-Machine. A service principal authenticates with its own credentials. No human in the loop.
| Token | SP OAuth client credentials grant |
| Identity | current_user() = SP application ID |
| UC fires as | The service principal |
| Audit trail | SP-level only (no human attribution) |
External IDP token exchanged for a role-based SP token. Users don't have Databricks accounts.
| Token | IDP JWT exchanged via RFC 8693 |
| Identity | Role-based SP (mapped from IDP group) |
| UC fires as | is_member() per SP group |
| Audit trail | SP-level + app-layer human attribution |
Key distinction: OBO propagates the human's identity through every layer. M2M uses service credentials with no human context. Federation bridges an external identity to a role-based SP. Most production apps combine OBO + M2M (user-facing calls use OBO, background tasks use M2M).
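As a rough illustration of the Federation mapping step (IdP group to role-based SP), the sketch below resolves a role SP from already-verified JWT claims. The `groups` claim name, the group names, the SP identifiers, and the "first match in priority order wins" rule are all assumptions for illustration. Signature verification of the JWT is assumed to have happened earlier and is not shown.

```python
from typing import Any, Mapping, Optional, Sequence, Tuple

# Hypothetical mapping, ordered from most to least privileged.
# Neither the IdP group names nor the SP identifiers come from the deck.
ROLE_SP_MAPPING: Sequence[Tuple[str, str]] = (
    ("admins", "sp-role-admin"),
    ("executives", "sp-role-executive"),
    ("finance", "sp-role-finance"),
    ("west-sales", "sp-role-west-sales"),
    ("east-sales", "sp-role-east-sales"),
)


def resolve_role_sp(verified_claims: Mapping[str, Any]) -> Optional[str]:
    """Pick the role-based SP for a user from verified IdP claims.

    Returns None when the user belongs to no mapped group, in which case
    the caller should deny access rather than fall back to a default.
    """
    groups = verified_claims.get("groups") or []  # assumed claim name
    if isinstance(groups, str):
        groups = [groups]  # some IdPs emit a single string instead of a list
    member_of = set(groups)
    for idp_group, role_sp in ROLE_SP_MAPPING:
        if idp_group in member_of:
            return role_sp
    return None
```

Returning `None` for unmapped users keeps the failure mode closed: no role SP means no token exchange.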
| Identity | Databricks user (human email) |
| Token | x-forwarded-access-token via Apps proxy |
| UC fires as | current_user() = individual |
| Scopes | sql genie serving |
| Best for | Internal apps, Streamlit dashboards, agent tools |
| Identity | External IDP user → role-based SP |
| Token | Token Exchange (RFC 8693) → SP token |
| UC fires as | is_member() per SP group |
| Scopes | sql genie serving |
| Best for | Partner portals, customer-facing apps, multi-tenant |
Key insight: Both patterns share the same UC governance, MCP tools, and audit infrastructure. The only difference is how the token gets to the MCP server.
When you deploy an app on Databricks, the platform handles authentication for you automatically. Before any request reaches your code, Databricks:
Your app never handles login flows, passwords, or OAuth redirects. It just reads the token from x-forwarded-access-token and uses it. The platform does the rest.
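A minimal sketch of that "read the token and use it" step, assuming the request headers are available as a plain mapping (most Python web frameworks expose something equivalent). The helper names are hypothetical. Only the `x-forwarded-access-token` header name comes from the text above.

```python
from typing import Dict, Mapping, Optional

FORWARDED_TOKEN_HEADER = "x-forwarded-access-token"


def get_user_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token, or None if the header is absent or empty."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    token = normalised.get(FORWARDED_TOKEN_HEADER, "").strip()
    return token or None


def downstream_auth_header(headers: Mapping[str, str]) -> Dict[str, str]:
    """Build the Authorization header for calls made on the user's behalf."""
    token = get_user_token(headers)
    if token is None:
        # Fail closed: never fall back to app-level credentials silently.
        raise PermissionError("no forwarded user token on this request")
    return {"Authorization": f"Bearer {token}"}
```

Raising when the header is missing, rather than falling back to the app's own credentials, keeps per-user attribution intact.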
x-forwarded-access-token header
current_user() = the human's email
| sql | SQL warehouse access |
| genie | Genie Conversation API |
| dashboards.genie | Genie (Azure requires both) |
| serving | Model Serving endpoints |
| files.files | Files & directories |
| apps | Call other Databricks Apps |
Our demo uses sql + genie + serving. You configure only what your app needs — the list of available scopes grows as Databricks adds capabilities.
Use this pattern when you have an app outside Databricks (a partner portal, a customer-facing SaaS, an internal tool on separate infrastructure) that needs to call Databricks services. The users authenticate with their own IdP, not the workspace IdP, and they don't have Databricks accounts.
Your app server verifies the user's identity via their IdP JWT, then exchanges that JWT for a Databricks service principal token. The SP token is scoped and short-lived. Databricks never sees or trusts the external IdP directly — the token exchange is the bridge.
POST /oidc/v1/token
grant_type=urn:ietf:params:oauth:grant-type:token-exchange
subject_token=<IDP JWT>
subject_token_type=urn:ietf:params:oauth:token-type:jwt
scope=sql genie serving
Scopes are configurable per use case. Request only what the app needs.
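The exchange request above can be sketched as a form-encoded body. This only builds the body and sends nothing. The `subject_token_type` field is required by RFC 8693 for token exchange requests. The function name and the scope list argument are illustrative assumptions.

```python
from typing import Sequence
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_body(idp_jwt: str, scopes: Sequence[str]) -> str:
    """Return the x-www-form-urlencoded body for POST /oidc/v1/token."""
    if not idp_jwt:
        raise ValueError("idp_jwt must be a non-empty JWT string")
    if not scopes:
        # Least privilege: force the caller to name the scopes it needs.
        raise ValueError("request at least one scope explicitly")
    fields = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": idp_jwt,
        "subject_token_type": JWT_TOKEN_TYPE,
        "scope": " ".join(scopes),  # OAuth scopes are space-delimited
    }
    return urlencode(fields)
```

For example, `build_exchange_body(jwt, ["sql", "genie", "serving"])` yields the demo's scope set. An app that only runs SQL would pass `["sql"]`.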
Unity Catalog enforcement is identical — only the identity source differs.
Federation: is_member('group')
OBO: current_user()
West sales rep sees only WEST deals. Same table, different data.
margin_pct masked for non-privileged roles
Finance/exec/admin see real margins. Sales reps see NULL.
Controls external API access
GRANT/REVOKE on UC connection = GRANT/REVOKE GitHub access per role.
Controls which identities can invoke UC functions as agent tools.
Controls access to Model Serving endpoints (supervisor agent, FMAPI).
All enforced by the SQL engine, not application code.
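To make the row-filter and column-mask behaviour concrete, here is a plain-Python model of the semantics only. In the real system the SQL engine enforces this, not application code. The group names and the `<region>_sales` naming rule are assumptions chosen to match the roles in this deck.

```python
from typing import Any, Dict, List, Set

# Hypothetical group names; the deck names the roles but not the UC groups.
ALL_REGION_GROUPS = {"managers", "executive", "finance", "admin"}
UNMASKED_GROUPS = {"executive", "finance", "admin"}


def visible_deals(rows: List[Dict[str, Any]], caller_groups: Set[str]) -> List[Dict[str, Any]]:
    """Model what a caller would see after the row filter and column mask."""
    sees_all_regions = bool(caller_groups & ALL_REGION_GROUPS)
    sees_real_margin = bool(caller_groups & UNMASKED_GROUPS)
    result = []
    for row in rows:
        # Row filter: regional reps only match rows for their own region.
        region_group = row["region"].lower() + "_sales"
        if not (sees_all_regions or region_group in caller_groups):
            continue
        visible = dict(row)  # copy so the underlying table is untouched
        # Column mask: non-privileged roles get NULL for margin_pct.
        if not sees_real_margin:
            visible["margin_pct"] = None
        result.append(visible)
    return result
```

A caller in `west_sales` gets only WEST rows with `margin_pct` set to `None`. A caller in `finance` gets every row with real margins.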
| Resource | OAuth Scope | UC Grant | Enforcement |
|---|---|---|---|
| SQL Warehouse | sql | CAN USE | Token + UC |
| Genie Space | genie + dashboards.genie | Space access + underlying tables | Token + UC |
| Model Serving | serving | CAN QUERY on endpoint | Token + UC |
| Vector Search | sql | SELECT on VS index | Token + UC |
| UC Connection | sql | USE CONNECTION | Token + UC |
| UC Function | sql | EXECUTE on function | Token + UC |
| Audit Table | sql | SELECT on table | Token + UC |
Key insight: Scopes limit what the token can do. Grants limit what the identity can access. Both enforce independently. Revoking either blocks access.
This table shows scopes used in our demo. Databricks supports additional scopes (files.files, apps, and more as capabilities expand). You configure only what your app needs — principle of least privilege.
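The "both enforce independently" rule can be sketched as a two-part check. The resource keys and the grant representation (a set of resource/privilege pairs) are assumptions for illustration. The scope and privilege names are taken from the table above.

```python
from typing import Dict, Set, Tuple

# (required OAuth scope, required UC privilege) per resource, from the table above.
RESOURCE_REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "sql_warehouse": ("sql", "CAN USE"),
    "model_serving": ("serving", "CAN QUERY"),
    "vector_search": ("sql", "SELECT"),
    "uc_connection": ("sql", "USE CONNECTION"),
    "uc_function": ("sql", "EXECUTE"),
    "audit_table": ("sql", "SELECT"),
}


def is_allowed(resource: str, token_scopes: Set[str],
               identity_grants: Set[Tuple[str, str]]) -> bool:
    """Access requires BOTH the token scope and the identity's grant."""
    if resource not in RESOURCE_REQUIREMENTS:
        return False  # unknown resources are denied by default
    scope, privilege = RESOURCE_REQUIREMENTS[resource]
    has_scope = scope in token_scopes
    has_grant = (resource, privilege) in identity_grants
    return has_scope and has_grant
```

Removing either the scope from the token or the grant from the identity flips the result to `False`, which is the property the table describes.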
Both patterns use identical MCP servers — 9 tools, same production patterns.
check_identity | Auth chain + current_user() |
get_region_summary | Sales by region (row-filtered) |
query_sales_data | Top deals (masked margins) |
query_audit_log | Audit trail (restricted roles) |
genie_query | Natural language SQL |
query_knowledge_base | Vector Search |
ask_supervisor | Multi-agent orchestrator |
github_search | GitHub via UC Connection (github_bearer_token) |
external_api_call | External API via UC Connection (github_bearer_token) |
http_request() SQL fn for external APIs
/health validates all dependencies
| conversation_id | Persisted per session. New → /start-conversation. Follow-up → /conversations/{id}/messages |
| Clarification | TEXT_ATTACHMENT → return genie_message + conversation_id for follow-up |
| Rate limit | Configurable sliding window (currently 5 QPM*). Returns retry_after_ms |
| Polling | Adaptive: 2s → 5s cap, 60s timeout (new), 40s (follow-up) |
| Results | Capped at 50 rows + truncated flag + total_rows |
* Genie QPM limit is a Databricks platform constraint as of March 2026. This limit may increase as the service scales. The MCP server's rate limiter is configurable — update _GenieLimiter(max_qpm=N) when the platform limit changes.
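One way a configurable sliding-window limiter with `retry_after_ms` could look is sketched below. This is not the demo's `_GenieLimiter`. The class name, the injectable clock, and the lock are assumptions, and only the `max_qpm` parameter mirrors the note above.

```python
import math
import threading
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Allow at most max_qpm calls in any rolling window of window_s seconds."""

    def __init__(self, max_qpm: int = 5, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_qpm < 1:
            raise ValueError("max_qpm must be at least 1")
        self._max_qpm = max_qpm
        self._window_s = window_s
        self._clock = clock  # injectable so tests do not need to sleep
        self._calls: Deque[float] = deque()
        self._lock = threading.Lock()

    def try_acquire(self) -> Tuple[bool, int]:
        """Return (allowed, retry_after_ms); retry_after_ms is 0 when allowed."""
        with self._lock:
            now = self._clock()
            # Forget calls that have aged out of the rolling window.
            while self._calls and now - self._calls[0] >= self._window_s:
                self._calls.popleft()
            if len(self._calls) < self._max_qpm:
                self._calls.append(now)
                return True, 0
            # The oldest call in the window determines when a slot frees up.
            wait_s = self._window_s - (now - self._calls[0])
            return False, math.ceil(wait_s * 1000)
```

Raising the limit when the platform constraint changes is then a one-argument change, for example `SlidingWindowLimiter(max_qpm=10)`.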
Every tool call traced.
Tags: service_name, tool, caller.email, caller.role, request_id, latency_ms
Spans: auth, sql, external_api, audit
100% COVERAGE
Async fire-and-forget.
Columns: request_id, email, role, tool, status, error_code, latency_ms
Status: success, error, access_denied
NEVER BLOCKS
4-page AI/BI dashboard.
Pages: Overview, Access Control, Performance, Identity & Audit
Queries audit table directly.
LIVE
All three layers are queryable, governed by UC, and work identically for both OBO and Federation patterns.
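The "async fire-and-forget, never blocks" audit write can be sketched with a bounded queue and a background thread. The `AuditLogger` class and its `sink` callable are hypothetical. In practice the sink would insert into the audit table. Dropping rows when the queue is full is one possible trade-off, not necessarily the demo's.

```python
import queue
import threading
from typing import Any, Callable, Dict


class AuditLogger:
    """Queue audit rows and write them on a background thread."""

    def __init__(self, sink: Callable[[Dict[str, Any]], None], max_pending: int = 1000) -> None:
        self._sink = sink
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue(maxsize=max_pending)
        # Daemon thread: it must not keep the process alive at shutdown.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, **row: Any) -> bool:
        """Enqueue a row without blocking; return False if it was dropped."""
        try:
            self._queue.put_nowait(row)
            return True
        except queue.Full:
            return False  # drop rather than stall the tool call

    def flush(self) -> None:
        """Wait until every queued row has been handed to the sink."""
        self._queue.join()

    def _drain(self) -> None:
        while True:
            row = self._queue.get()
            try:
                self._sink(row)
            except Exception:
                pass  # an audit failure must never surface to the caller
            finally:
                self._queue.task_done()
```

A tool handler would call `record(request_id=..., email=..., role=..., tool=..., status=..., error_code=..., latency_ms=...)` and return immediately. `flush()` is mainly useful in tests and at shutdown.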
| Tool | West Sales | East Sales | Managers | Executive | Finance | Admin |
|---|---|---|---|---|---|---|
| check_identity | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| get_region_summary | ✓ WEST | ✓ EAST | ✓ ALL | ✓ ALL | ✓ ALL | ✓ ALL |
| query_sales_data | ✓ masked | ✓ masked | ✓ masked | ✓ real | ✓ real | ✓ real |
| genie_query | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| query_knowledge_base | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| query_audit_log | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ |
| ask_supervisor | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| github_search | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
| external_api_call | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ |
Federation: tool-level RBAC (TOOL_ACCESS matrix) + UC governance. OBO: UC governance only (no tool matrix needed, because UC enforces access per individual user).
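The matrix above could be encoded roughly as follows for the Federation server. `TOOL_ACCESS` is named in the text, but its shape here (tool name to a set of allowed roles) and the role identifiers are assumptions. Row filtering and masking stay with UC and are not modelled.

```python
from typing import Dict, FrozenSet

# Hypothetical role identifiers matching the table's column headers.
ALL_ROLES: FrozenSet[str] = frozenset(
    {"west_sales", "east_sales", "managers", "executive", "finance", "admin"}
)
AUDIT_ROLES: FrozenSet[str] = frozenset({"executive", "finance", "admin"})
EXEC_ADMIN: FrozenSet[str] = frozenset({"executive", "admin"})

TOOL_ACCESS: Dict[str, FrozenSet[str]] = {
    "check_identity": ALL_ROLES,
    "get_region_summary": ALL_ROLES,
    "query_sales_data": ALL_ROLES,
    "genie_query": ALL_ROLES,
    "query_knowledge_base": ALL_ROLES,
    "query_audit_log": AUDIT_ROLES,
    "ask_supervisor": EXEC_ADMIN,
    "github_search": EXEC_ADMIN,
    "external_api_call": EXEC_ADMIN,
}


def can_call(role: str, tool: str) -> bool:
    """Tool-level RBAC check; unknown tools and roles are denied."""
    return role in TOOL_ACCESS.get(tool, frozenset())
```

A denied call would be logged with status `access_denied` before any SQL runs. UC governance still applies to the calls that pass this check.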
-- Grant access
GRANT USE CONNECTION ON CONNECTION mcp_federation TO `role-sp-uuid`;
-- Revoke access
REVOKE USE CONNECTION ON CONNECTION mcp_federation FROM `role-sp-uuid`;
Revoking removes the MCP from the user's supervisor experience at runtime.
| 2 | Custom MCP servers (Federation + OBO), 9 tools each, production-ready |
| 1 | Governance model (UC row filters, column masks, connections, scopes) |
| 1 | Observability stack (MLflow traces + audit table + Lakeview dashboard) |
| ∞ | Genie conversation threading with clarification handling |
IDP-agnostic. Plug in any IDP, get governed AI tools.
Auth0. Okta. Entra. Same architecture — swap the IDP config.