Identity & Authorization by Resource Type


How auth works at each Databricks AI component
The Resource Lens

Authorization differs by what you're accessing

The same token — U2M, OBO, or M2M — hits different auth enforcement depending on the resource. Seven resources, seven auth stories.

  • Serving Endpoints: UC + scopes
  • 💬 Genie: UC + genie scopes
  • 𝑓 UC Functions: UC EXECUTE
  • 🔍 Vector Search: UC SELECT
  • 🔗 UC HTTP Connections: USE CONNECTION
  • 🗃 Tables: row filters & masks
  • 🐘 Lakebase: PG-native (NOT UC)
Overview

Resource Auth Summary

Resource | Recommended Path | Identity in UC | How Token Arrives | AuthZ Model
Serving Endpoints | OBO or M2M | User or SP | Authorization: Bearer | UC + scopes
Genie (via App) | OBO | Calling user | X-Forwarded-Access-Token → Genie API | UC + genie scopes
Genie (direct) | U2M | Calling user | User's own token | UC + genie scopes
UC Functions | OBO or M2M | User or SP | Via SQL execution context | UC EXECUTE
Vector Search | M2M | App SP | WorkspaceClient() no-args | UC SELECT
UC HTTP Connections | M2M + per-user OAuth | SP + ext user | USE CONNECTION + auth method | UC + external IdP
Tables | Any | Depends on path | Via SQL | Row filters, column masks
Lakebase | M2M | App SP (PG role) | OAuth token as PG password | PG-native (NOT UC)
Serving Endpoints

Agent Bricks and Model Serving

Identity Model

  • OBO via ModelServingUserCredentials()
  • M2M via WorkspaceClient()
  • Token arrives as Authorization: Bearer header
  • current_user() = the token's identity
  • Scope required: model-serving

Key Pattern

The Agent Bricks supervisor auto-forwards the OBO token to sub-agents (max 20 in a chain).

User → Supervisor → Sub-Agent 1 → … → Sub-Agent N

The same OBO token propagates through the entire chain. current_user() = the human at every hop.

Serving Endpoints

Serving Endpoints — Gotchas

Single App → Serving Endpoint

Works fine. Read X-Forwarded-Access-Token from proxy header, pass as Authorization: Bearer to endpoint.

current_user() inside the endpoint = the human.

Wrong API: ModelServingUserCredentials() only works inside Model Serving. Don't use it in Apps code — use the header instead.
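A minimal sketch of the single-app pattern (the header name is the documented proxy header; the endpoint URL and requests call in the usage comment are illustrative):

```python
def bearer_headers_from_proxy(inbound_headers: dict) -> dict:
    """Turn the App proxy's forwarded user token into an outgoing
    Authorization header for the serving endpoint."""
    token = inbound_headers.get("X-Forwarded-Access-Token")
    if not token:
        raise PermissionError("no forwarded user token; is the app behind the proxy?")
    return {"Authorization": f"Bearer {token}"}


# Illustrative usage inside a request handler:
# requests.post(
#     "https://<workspace>/serving-endpoints/<endpoint>/invocations",
#     headers=bearer_headers_from_proxy(request.headers),
#     json={"messages": [...]},
# )
```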

App A → App B (Two-Proxy)

User token is lost. App B's proxy strips App A's forwarded user token and replaces it with App A's SP identity.

Workaround: X-Forwarded-Email survives proxy hops (set by proxy from validated token, cannot be forged). Use email + M2M SQL.
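The workaround can be sketched as below (table and column names are hypothetical; the query runs under the app's own M2M identity while the WHERE clause pins it to the forwarded email):

```python
def user_scoped_query(inbound_headers: dict) -> tuple:
    """Two-proxy workaround: identify the user via X-Forwarded-Email,
    then query with the app's own (M2M) identity."""
    email = inbound_headers.get("X-Forwarded-Email")
    if not email:
        raise PermissionError("no forwarded email header")
    # Parameterized so the header value is never spliced into the SQL text
    sql = "SELECT * FROM prod.sales.deals WHERE owner_email = :email"
    return sql, {"email": email}
```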

Genie Spaces

Natural language to SQL, governed by UC

Identity Model

  • Identity: OBO — user's token forwarded to Genie API
  • Scopes: BOTH dashboards.genie AND genie required
  • current_user() = calling human
  • is_member() = execution service (NOT the human)

Rule: Use current_user() for row filters, never is_member().

Rate Limits

5 queries per minute (sliding window)

Thread Management

  • conversation_id for follow-up questions
  • New conversation for topic changes
  • Genie maintains SQL context within a thread
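Thread management reduces to one decision: reuse the conversation_id or start fresh. A sketch (the REST paths match the Genie Conversation API as currently documented, but treat them as assumptions):

```python
def genie_message_path(space_id: str, conversation_id) -> str:
    """Continue an existing Genie thread when a conversation_id is known,
    otherwise start a new conversation."""
    base = f"/api/2.0/genie/spaces/{space_id}"
    if conversation_id:
        # Follow-up question: Genie keeps SQL context within this thread
        return f"{base}/conversations/{conversation_id}/messages"
    # Topic change: start a fresh thread
    return f"{base}/start-conversation"
```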
Genie Spaces

Genie — Gotchas

The Scope Bug

UI only shows dashboards.genie when configuring app integration scopes.

But the API checks for both genie and dashboards.genie.

Missing genie scope →

403: "required scopes: genie"

Fix: Patch via API

PATCH /api/2.0/preview/accounts/{account_id}/oidc/custom-app-integration/{integration_id}

{
  "scopes": [
    "dashboards.genie",
    "genie",
    "sql"
  ]
}

Add genie scope via the account-level API since the UI won't show it.

UC Functions

Unity Catalog Functions

The safe way to expose write operations to agents

Identity Model

  • Identity: inherits from calling context (OBO or M2M)
  • Grant required: EXECUTE on the function
  • current_user() = whoever triggered the SQL

Key Pattern

Wrap INSERT/UPDATE in a UC function. Grant EXECUTE to agent SP.

Agent gets EXECUTE, not raw table writes.

Agent
UC Function
Table

Function enforces business logic + authorization internally. Agent never touches table directly.

UC Functions

UC Functions — Patterns

CREATE FUNCTION prod.tools.approve_deal(
  deal_id STRING,
  approver STRING
)
RETURNS STRING
LANGUAGE SQL
RETURN (
  UPDATE prod.sales.deals
  SET status = 'approved',
      approved_by = approver
  WHERE id = deal_id
    AND current_user() IN (
      SELECT email
      FROM prod.sales.approvers
    )
);

What This Achieves

Agent SP Has EXECUTE on function
Agent SP Does NOT have UPDATE on table
Function Enforces business logic internally
current_user() Checked inside function body

Principle: Functions are the authorization boundary for write operations. Grant EXECUTE, not table-level permissions.
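Invoking the function from an agent running under M2M can be sketched as a Statement Execution payload (request shape abbreviated; the function name matches the example above, and parameterization keeps agent input out of the SQL text):

```python
def approve_deal_call(deal_id: str, approver: str) -> dict:
    """Payload sketch for the SQL Statement Execution API; the agent SP
    needs only EXECUTE on the function, not UPDATE on the table."""
    return {
        "statement": "SELECT prod.tools.approve_deal(:deal_id, :approver)",
        "parameters": [
            {"name": "deal_id", "value": deal_id},
            {"name": "approver", "value": approver},
        ],
    }
```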

Vector Search

Semantic similarity over governed data

Identity Model

  • Identity: typically M2M (App SP has SELECT on the index)
  • WorkspaceClient() no-args picks up auto-injected SP credentials
  • Scope: vector-search

No native per-user context. Vector Search doesn't have OBO support for per-user filtering.

Per-User Filtering Strategies

  • Pre-filter: include a user_group column in the indexed data; filter in the similarity search query.
  • Post-filter: retrieve top-K, then check the user's permissions on each result.
  • Shared KB: M2M is correct (same docs for all users; no filtering needed).
Vector Search

Vector Search — Patterns

Pre-Filter Pattern

from databricks.vector_search.client import VectorSearchClient

# No-args client picks up the App SP's auto-injected credentials (M2M)
vsc = VectorSearchClient()
index = vsc.get_index(
  endpoint_name="shared-endpoint",    # illustrative endpoint name
  index_name="prod.docs.knowledge_idx",
)

# Index includes a department column; filter before scoring
results = index.similarity_search(
  query_text=user_query,
  columns=["doc_id", "chunk_text"],   # illustrative column names
  filters={"department": user_dept},
  num_results=10,
)

Filter happens inside the vector search engine. Only matching rows are scored.

Delta Sync

Delta Sync index auto-updates when source table changes. No manual re-indexing needed.

Post-Filter Pattern

Retrieve top-K results, then check each result against user's permissions. Higher latency, but works when permissions are complex or external.
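A post-filter sketch with a stubbed permission check (can_read stands in for whatever ACL lookup you actually have):

```python
def post_filter(results: list, can_read, k: int) -> list:
    """Keep only results the caller may see; stop once k survive."""
    allowed = []
    for r in results:
        if can_read(r["doc_id"]):
            allowed.append(r)
        if len(allowed) == k:
            break
    return allowed


# Example with a stubbed permission check:
hits = [{"doc_id": i, "score": 1.0 - i / 10} for i in range(10)]
visible = post_filter(hits, can_read=lambda d: d % 2 == 0, k=3)
# visible -> doc_ids 0, 2, 4
```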

Trade-off: Pre-filter is faster but requires group info in the index. Post-filter is flexible but wastes retrieval budget on filtered-out results.

UC HTTP Connections

The governed way to call external services

What It Does

Governed connectivity to external APIs: Jira, GitHub, Salesforce, Slack, and more.

  • Four auth methods: Bearer Token, OAuth M2M, OAuth U2M Shared, OAuth U2M Per User
  • Access controlled via: GRANT USE CONNECTION ON CONNECTION <name> TO <principal>
  • Tokens injected by platform, never in code

Why It Matters

Without UC connections, agents either:

  • Hardcode API keys in environment variables (unauditable)
  • Use shared service accounts (no per-user identity)
  • Require custom token management (error-prone)

UC connections give you governed, auditable, rotatable external access — with GRANT/REVOKE at runtime.
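Calls through a connection are typically issued with the SQL http_request function; the sketch below composes such a call (treat the http_request signature as an assumption and check current docs; the conn and path values are illustrative and assumed to come from trusted config, not user input):

```python
def connection_call_sql(conn: str, method: str, path: str) -> str:
    """Compose a SQL call through a UC HTTP connection.
    The platform injects the credential; no token appears in code."""
    assert method in {"GET", "POST", "PUT", "PATCH", "DELETE"}
    return (
        "SELECT http_request("
        f"conn => '{conn}', method => '{method}', path => '{path}')"
    )
```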

UC HTTP Connections

Two-Sided Identity

Who does Databricks see vs. who does the external service see?

Auth Method | Databricks Sees | External Service Sees | Use Case
Bearer Token | Caller | Shared (one token) | Simple integrations, shared API keys
OAuth M2M | Caller | Shared (app creds) | Org-level access
OAuth U2M Shared | Caller | Shared (one user's) | Admin-delegated access
OAuth U2M Per User | Caller | Per-user ✓ | User-specific (Jira as them, GitHub as them)
Managed OAuth | Caller | Per-user ✓ | Google, SharePoint — Databricks handles OAuth
Only U2M Per User and Managed OAuth give true per-user identity on BOTH sides. All others share a single external identity.

UC HTTP Connections

UC HTTP Connections — Gotchas

Watch Out

  • GRANT/REVOKE is runtime — removing USE CONNECTION immediately removes tool access
  • Bearer token is static — rotate manually
  • U2M Per User requires user to have authorized the external app (OAuth consent)
  • Managed OAuth: limited providers (Google, SharePoint)

No Read-Only Scope

There is no way to scope a connection to read-only at the connection level.

The external service must enforce read vs. write permissions. UC controls who can use the connection, not what they can do through it.

Mitigation: Create separate connections with different external credentials (read-only vs. read-write) and GRANT them to different groups.

Tables

Tables — Row Filters & Column Masks

Data-level authorization, enforced at the SQL engine

Row Filters

  • SQL function evaluated at query time, attached to table
  • Fires regardless of access path: notebook, API, Genie, agent
  • Uses current_user() or is_member() to determine access

Column Masks

  • SQL function that transforms column values per user
  • Same trigger rules as row filters
  • Common pattern: mask PII for non-privileged users

ABAC governed tags for attribute-based rules. Tag columns with sensitivity levels, then write filters that check user attributes against tags.
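A row filter built on current_user() (per the bullet above), expressed here as DDL strings; the function, table, and column names are hypothetical:

```python
# Row filter: each user sees only rows they own (names are hypothetical)
ROW_FILTER_DDL = """
CREATE OR REPLACE FUNCTION prod.sec.deal_filter(owner_email STRING)
RETURNS BOOLEAN
RETURN owner_email = current_user();
""".strip()

# Attach it; fires on every access path (notebook, API, Genie, agent)
ATTACH_DDL = """
ALTER TABLE prod.sales.deals
SET ROW FILTER prod.sec.deal_filter ON (owner_email);
""".strip()
```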

Tables

current_user() vs is_member()

The #1 gotcha in table-level authorization

Access Path | current_user() | is_member()
Notebook (U2M) | Human ✓ | Human's groups ✓
App SQL (OBO direct) | Human ✓ | Human's groups ✓
Genie (OBO) | Human ✓ | Execution service ✗
Agent Bricks (OBO) | Human ✓ | Execution service ✗
Background job (M2M) | SP UUID | SP's groups ✓

Rule: Always use current_user()

It returns the correct identity in every access path. is_member() breaks in Genie and Agent Bricks OBO because the execution context is a service, not the user.

Lakebase

Lakebase — PG-Native Authorization

The ONLY resource that does NOT use UC for authorization

The Exception

  • Uses PostgreSQL-native authorization: roles, GRANT, RLS policies
  • AuthN: Databricks OAuth (token as PG password) or native PG roles + passwords
  • PG role per SP: app's SP client ID becomes the PG role name
NOT Unity Catalog

Two AuthZ Paths

Direct PG connection (apps, psql): PG roles + GRANT + RLS govern access.

Via lakehouse (notebooks, DBSQL, federation): UC policies apply, not PG grants.

Implication: Same data, different auth model depending on how you access it.

Lakebase

Lakebase — Roles & Apps Integration

Pre-Created Roles

  • Instance owner: LOGIN, CREATEDB, CREATEROLE, BYPASSRLS
  • databricks_superuser: NOLOGIN; inherits pg_read_all_data + pg_write_all_data + pg_monitor

System roles (databricks_control_plane, databricks_monitor, databricks_writer_*, databricks_reader_*, databricks_gateway) are auto-created. Do not modify.

App SP → PG Role

Adding Lakebase as app resource auto-creates a PG role = SP client ID.

  • PGHOST: auto-injected
  • PGUSER: SP client ID = PG role
  • PGDATABASE: auto-injected

Role gets CONNECT + CREATE. Additional grants (SELECT, INSERT) must be added manually per-table.

Token refresh: @databricks/lakebase (auto) or SDK generate_database_credential() (manual).
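Connecting with the token as the PG password can be sketched from the injected env vars (DSN assembly only; obtain the token via generate_database_credential() or the @databricks/lakebase helper as noted above):

```python
import os


def lakebase_dsn(oauth_token: str) -> str:
    """libpq-style DSN from the auto-injected app env vars, using a
    Databricks OAuth token as the Postgres password (a sketch)."""
    host = os.environ["PGHOST"]
    user = os.environ["PGUSER"]      # SP client ID = PG role
    db = os.environ["PGDATABASE"]
    return (
        f"host={host} dbname={db} user={user} "
        f"password={oauth_token} sslmode=require"
    )
```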

Lakebase

Lakebase — UC Registration & Query Federation

Register in Unity Catalog

Creates a read-only UC catalog mirroring PG database structure. Enables browsing in Catalog Explorer, lineage tracking, and audit logs.

w.postgres.create_catalog(
  catalog=Catalog(spec=CatalogCatalogSpec(
    postgres_database="mydb",
    branch="projects/.../production",
  )),
  catalog_id="my-catalog",
).wait()

Prereqs: CREATE CATALOG on metastore + serverless SQL warehouse.

Cross-Source Queries

-- Join Lakebase + Delta
SELECT c.conversation_id,
       u.subscription_tier
FROM lakebase_catalog.public.conversations c
JOIN main.analytics.users u
  ON c.user_id = u.user_id;

Read-only: Cannot modify Lakebase data through UC queries. One database per catalog. Branch-bound.

Max 20 synced tables per source table. Metadata caching — new PG objects may not appear immediately.

Lakebase

Lakebase — Lakehouse Sync & Data API

Lakehouse Sync (Postgres → Delta)

Continuous CDC via wal2delta extension. SCD Type 2 history — every insert, update, delete preserved.

-- Step 1: Required before sync
ALTER TABLE my_table
  REPLICA IDENTITY FULL;

-- Destination: lb_<table>_history
-- Columns: _change_type,
-- _timestamp, _lsn, _xid

Gotchas: Partitioned tables unsupported. Schema changes break sync. Re-enable after disable = data loss (no re-snapshot). pgvector/PostGIS types unsupported.

Data API (PostgREST)

GET → SELECT
POST → INSERT
PATCH → UPDATE
DELETE → DELETE

Single “authenticator” PG role + RLS policies.

-- RLS per user
CREATE POLICY user_data
  ON tasks USING (
    user_id = current_setting(
      'request.jwt.claims'
    )::json->>'sub'
  );

Lakebase governs itself (PG) — UC governs everything else. But UC Registration + Lakehouse Sync bridge the two worlds.

Comparison

All Seven Resources

Resource | AuthZ Model | Per-User? | Key Gotcha
Serving Endpoints | UC + scopes | OBO: yes | ModelServingUserCredentials only in Serving context
Genie | UC + genie scopes | Yes (OBO) | Needs BOTH genie + dashboards.genie scopes
UC Functions | UC EXECUTE | Inherits context | Safe write pattern for agents
Vector Search | UC SELECT | Pre/post filter | No native per-user context
UC HTTP Connections | USE CONNECTION | Per User OAuth | Two-sided identity problem
Tables | Row filters / masks | current_user() | is_member() fails in Genie/Agent Bricks OBO
Lakebase | PG GRANT + RLS | PG RLS | NOT UC — only PG-native exception
Decision Guide

Which Authorization Path?

Does your resource use UC for authorization?
  • Yes (6 resources): standard pattern (SP grants, row filters, OAuth scopes)
  • No (Lakebase only): PG-native (roles, GRANT, RLS policies)

Does the user need per-user identity at the external service?
  • Yes: UC Connection with U2M Per User or Managed OAuth
  • No: Bearer or M2M OAuth is fine

Do agents need write access to tables?
  • Yes: wrap writes in UC Functions; grant EXECUTE, not table writes
  • No (read only): row filters + column masks on the table directly
Summary

Seven resources. Six use UC.
One uses PG-native.

1. Authorization is at the resource, not at the app.
2. Use current_user() everywhere. It works in every path.
3. Lakebase is the exception — PG GRANT + RLS, not UC.

github.com/bhavink/applied-ai-governance