// AI Governance on Databricks

Deck 01

AI Governance
on Databricks

The Complete Picture

Identity, Authorization, and Tool Governance for Enterprise AI

The Challenge

Every enterprise wants the same thing

Business users

Ask plain-English questions, get governed answers. Search docs. Build dashboards.

AI agents

Multiple agents collaborate on complex tasks. Partners and customers need the same capabilities through their own portals.

Same data. Same AI. Same governance.
Two identity worlds: the human's identity, and the application's identity.

Authentication

One Authorization Layer

Databricks has one authorization layer — Unity Catalog. Authentication can come from multiple sources: the workspace IdP for internal users, or external IdPs via token federation for partners and customers. All paths converge on UC, which enforces governance regardless of how the token was obtained.

Azure: Entra ID
AWS: BYO IdP (Okta, Ping, any SAML/OIDC)
GCP: Cloud Identity or BYO IdP

Multiple authentication paths (workspace IdP, federation, M2M credentials) — but one authorization engine. Unity Catalog is the single trust boundary.

Token Architecture

Three Token Paths
Not Three Auth Systems

U2M, OBO, and M2M are token acquisition paths, not different authentication systems.

Path   Who Authenticates       Identity in UC   Token Acquired By
U2M    The human (directly)    The human        Human's client
OBO    The human (via app)     The human        App, forwarding token
M2M    The service principal   The SP           The application itself
[Diagram: a single IdP feeding all three token paths: U2M, OBO, M2M]

ELI5

The Restaurant Analogy

One restaurant (Databricks). One ID-check at the door (your company IdP). Same bouncer for everyone.

U2M

You walk in yourself, show your badge at the door, sit down and order. The kitchen checks your allergy list, serves your food.

You're the one at the table.

OBO

You're in a meeting, so you send your assistant with your badge. Same door, same bouncer. Kitchen checks your allergy list.

Assistant carries the tray. Never shows their own badge.

M2M

Your company has a catering account. The catering bot shows the company badge. Standard menu, same meal for everyone.

Doesn't matter who placed the order.

The door (IdP) is always the same. The kitchen (UC) always checks the badge that was presented. The only difference is whose badge gets shown.

Decision Framework

Who should UC see as the identity?

THE ACTUAL HUMAN — Does the human call Databricks directly?
  Yes → U2M
  No, through an app → OBO
THE APPLICATION (SP) — Shared data access, background jobs, RAG pipelines → M2M
Start with the identity question. Everything else follows from the answer.
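The decision tree above can be sketched as a small helper. The function name and its string return values are illustrative only, not part of any Databricks API:

```python
# Sketch of the decision framework: start with who UC should see as the
# identity, and the token path follows.

def choose_token_path(human_is_caller: bool, human_calls_directly: bool = False) -> str:
    """Return the token acquisition path implied by the identity question."""
    if human_is_caller:
        # UC should see the actual human.
        return "U2M" if human_calls_directly else "OBO"
    # UC should see the application's service principal:
    # shared data access, background jobs, RAG pipelines.
    return "M2M"
```

For example, a user-facing app that forwards the human's token maps to `choose_token_path(True, False)`, i.e. OBO.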

Resource Model

The Resource Lens

Authorization differs by what you're accessing, not where the app runs.

1. Serving Endpoints (Agent Bricks)
2. Genie Spaces
3. UC Functions
4. Vector Search
5. UC HTTP Connections
6. Tables (Row Filters, Column Masks)
7. Lakebase (PG-native exception)

Authorization Matrix

Resource Auth Matrix

Resource              Recommended Path       Identity in UC       AuthZ Model
Serving Endpoints     OBO or M2M             User or SP           UC + OAuth scopes
Genie                 OBO                    Calling user         UC + genie scopes
UC Functions          OBO or M2M             User or SP           UC EXECUTE
Vector Search         M2M                    App SP               UC SELECT
UC HTTP Connections   M2M + per-user OAuth   SP + external user   USE CONNECTION
Tables                Any                    Depends on path      Row filters, column masks
Lakebase              M2M                    App SP (PG role)     PG-native (GRANT, RLS), NOT UC

Defense in Depth

The Six Enforcement Layers

6. Audit & Observability: system.access.audit + MLflow traces
5. Execution Boundary: Model Serving, Apps, serverless
4. Outbound Control: UC Connections, network policies
3. Data Governance: Row Filters, Column Masks, ABAC
2. Permission Model: UC privileges, least-privilege SP grants
1. Identity: Agent SP, User OBO, token federation
Never accept a design that relies on a single layer.

Common Pitfall

current_user() vs is_member()

The #1 source of auth bugs in AI apps.

Path                             current_user() Returns   is_member() Evaluates
U2M                              Human email              Human's groups
OBO (direct SQL)                 Human email              Human's groups
OBO (via Genie / Agent Bricks)   Human email              Execution service identity
M2M                              SP UUID                  SP's groups
Universal rule: Use current_user() for row filters. It works correctly in every path.
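A minimal sketch of the rule in practice: a row filter keyed on current_user(). The DDL is shown as Python strings so the shape is easy to inspect; the table and column names (sales.orders, owner_email) are hypothetical:

```python
# Row-filter DDL keyed on current_user(), which resolves to the human under
# U2M/OBO and to the SP under M2M — unlike is_member(), which can evaluate
# against the execution service identity in some OBO paths.

ROW_FILTER_DDL = """
CREATE OR REPLACE FUNCTION sales.owner_filter(owner_email STRING)
RETURN owner_email = current_user();
"""

APPLY_FILTER_DDL = """
ALTER TABLE sales.orders
SET ROW FILTER sales.owner_filter ON (owner_email);
"""
```

The filter function returns a boolean per row; UC evaluates it on every access, regardless of which token path produced the session.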

External Identity

The Two-Sided Identity Problem

U2M / OBO / M2M answer "who does Databricks see?" External connections add: "who does the external service see?"

Connection Auth Method   Databricks Sees   External Service Sees
Bearer Token             Caller            Shared identity
OAuth M2M                Caller            Shared identity
OAuth U2M Shared         Caller            Shared identity
OAuth U2M Per User       Caller            Per-user ✓
Managed OAuth            Caller            Per-user ✓

Federation

Token Federation

Any app with a trusted IdP JWT can exchange it for a Databricks token via federation. No secrets.

Account Token Federation

Users & SPs. Requires SCIM sync. 5 issuer limit per account.

Workload Identity Federation

CI/CD pipelines. Per-SP binding. Unlimited issuers. Completely secretless.
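Under the hood, federation is an OAuth token exchange (RFC 8693): the app presents a trusted IdP JWT and receives a Databricks token, with no stored secret. A hedged sketch of the request body, assuming the standard token-exchange parameters; the JWT value is a placeholder:

```python
# Build the form-encoded body of an RFC 8693 token-exchange request, the
# mechanism behind token federation. Only the request body is shown; the
# token endpoint URL and the IdP JWT come from your environment.
from urllib.parse import urlencode

def build_token_exchange_body(idp_jwt: str) -> str:
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": idp_jwt,                                # the trusted IdP JWT
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "scope": "all-apis",                                     # narrow this in practice
    })
```

The response is a short-lived Databricks access token; nothing long-lived is ever written to disk or an env var.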

Scopes

OAuth Scopes

Operation                Required Scope
Genie                    dashboards.genie + genie (both required)
Agent Bricks / Serving   model-serving
SQL                      sql
UC / External MCP        unity-catalog
Vector Search            vector-search
Refresh tokens           offline_access
Scopes enforce least privilege at the token level. Request only what you need.
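The scope table lends itself to a lookup: declare which operations an app performs and request only the union of their scopes. The helper and operation keys are illustrative, not a Databricks API:

```python
# Least-privilege scope selection: map declared operations to the scopes
# in the table above and request nothing else.

SCOPE_FOR_OPERATION = {
    "genie": ["dashboards.genie", "genie"],   # both required
    "serving": ["model-serving"],
    "sql": ["sql"],
    "unity_catalog": ["unity-catalog"],
    "vector_search": ["vector-search"],
    "refresh": ["offline_access"],
}

def minimal_scopes(operations):
    """Return the sorted union of scopes the given operations need."""
    scopes = set()
    for op in operations:
        scopes.update(SCOPE_FOR_OPERATION[op])
    return sorted(scopes)
```

An agent that only queries Genie and runs SQL would request `minimal_scopes(["genie", "sql"])` and nothing more.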

Exception

Lakebase — The Exception

Lakebase uses PG-native authorization (roles, GRANT, RLS), not UC.

AuthN

Databricks OAuth (token as PG password) or native PG roles

AuthZ

PG GRANT + RLS policies. Not Unity Catalog.

Roles

Instance owner (LOGIN, CREATEDB, CREATEROLE). App SP → auto-created PG role. System roles for sync/monitoring.

UC Registration + Lakehouse Sync

Register → read-only UC catalog for cross-source queries. Lakehouse Sync → continuous CDC to Delta (SCD2 history via wal2delta).

PG-native authZ for direct connections. But UC Registration + Lakehouse Sync bridge Lakebase data into UC governance.
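Because direct connections are plain Postgres protocol, a client authenticates with its PG role and a short-lived Databricks OAuth token in the password slot. A sketch of the DSN construction; the host, database, and role names are placeholders, and the token would come from your OAuth flow:

```python
# Build a Postgres DSN for a direct Lakebase connection: PG role as the user,
# OAuth token as the password. Authorization on this path is PG-native
# (GRANT / RLS), not Unity Catalog.
from urllib.parse import quote

def lakebase_dsn(host: str, database: str, pg_role: str, oauth_token: str) -> str:
    # Tokens can contain characters that need percent-encoding in a URL.
    return (
        f"postgresql://{quote(pg_role)}:{quote(oauth_token, safe='')}"
        f"@{host}:5432/{database}?sslmode=require"
    )
```

Any Postgres driver (psycopg, JDBC) accepts the resulting DSN; because the token is short-lived, connection pools need a refresh hook rather than a cached password.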

Agent Pattern

Agent Architecture Pattern

How agents are governed — mapped to the six enforcement layers.

  1. Identity (Layer 1) — Agent gets a dedicated Service Principal
  2. Permissions (Layer 2) — SP has explicit UC grants — zero default access
  3. Data Governance (Layer 3) — Row filters fire on every data access
  4. Outbound Control (Layer 4) — External calls via UC Connections with USE CONNECTION
  5. Execution Boundary (Layer 5) — Runs in Model Serving / Apps sandbox
  6. Audit (Layer 6) — All traced via MLflow + system.access.audit

Checklist

OBO Prerequisites

Checklist

M2M Prerequisites

Security

Security Checklist

  • No secrets in env vars (UC connections hold credentials)
  • OAuth scopes enforce least privilege
  • UC governance at SQL engine level
  • Tool-level RBAC (defense in depth)
  • Rate limiting (Genie 5 QPM, connection pools)
  • Async audit (never blocks request path)
  • Structured JSON logging with request_id
  • Input validation + HTTPS-only
  • Retry with jitter (429 / 503)
  • Tokens never logged
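One checklist item made concrete: retry with full jitter on 429/503 responses. The exception type and backoff constants are illustrative; `call` stands in for any HTTP call that raises on those statuses:

```python
# Retry with capped exponential backoff and full jitter, the standard
# pattern for 429 (rate limit) and 503 (overload) responses.
import random
import time

class RetryableError(Exception):
    """Raised for HTTP 429 / 503 responses (illustrative)."""

def retry_with_jitter(call, max_attempts=5, base=0.5, cap=8.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Full jitter: sleep a random amount up to the capped exponential.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Full jitter (random in [0, backoff]) spreads retries from many clients apart, which matters for shared limits like Genie's 5 QPM.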

Differentiation

What's Different About This Approach

Identity, not secrets

No PATs, no static credentials. Federation + OAuth everywhere.

Authorization at the resource

UC grants per resource type, not blanket access.

Audit is automatic

system.access.audit captures every API call. MLflow traces every agent step.

Implementation

Getting Started

  1. Configure account-level IdP (Entra ID / BYO SAML/OIDC)
  2. Enable SCIM sync for users and groups
  3. Create Service Principals per capability boundary
  4. Set UC grants with least privilege
  5. Configure UC Connections for external services
  6. Deploy with OBO for user-facing, M2M for background
  7. Monitor via system.access.audit + MLflow

Summary

One IdP.
Three Token Paths.
Six Enforcement Layers.

Same governance model — whether the user is internal or external, whether the app runs on Databricks or outside.
