Databricks · AI Governance

AI Gateway Patterns

When to use Databricks AI Gateway, external API gateways, UC Connections — and when to use no gateway at all.

Pattern 1 — No Gateway
Pattern 2 — AI Gateway
Pattern 3 — UC Connections
Pattern 4 — External Gateway


The Challenge

Your AI platform calls external models. Who controls that traffic?

Agents, knowledge retrieval, and orchestration workflows all call LLM endpoints, both internal foundation models and external providers like OpenAI or Anthropic. Each call carries cost, latency, and compliance implications. Without a control point, there is no rate limiting, no usage tracking, no guardrails, and no fallback routing.

But not every deployment needs a gateway. Some traffic is purely internal and already governed by Unity Catalog. The question is where to place the control point, and whether you need one at all. This deck walks through four patterns and helps you decide.


Core Principle

Two options. Neither replaces the other.
Neither is mandatory.

Databricks AI Gateway

Governs LLM endpoint traffic — rate limits, guardrails, usage tracking, fallback routing. Configured per serving endpoint. UC-aware identity model.

External API Gateway

Manages the boundary between outside callers and Databricks — auth translation, API catalog, per-tenant rate limiting. Customer-managed infrastructure.

What neither should govern: internal Databricks traffic between agents, Genie, custom MCP servers, and serving endpoints. That traffic is governed by Unity Catalog at the data plane, and a gateway that re-issues its own token on that path breaks the OBO token chain.


Overview

The Four Traffic Patterns

🟢

Pattern 1 — Internal Traffic

Agent ↔ Genie · FM API · Agent Bricks · Vector Search · UC Functions · Custom MCP

No Gateway

🔵

Pattern 2 — LLM Endpoint Governance

Rate limits per user/group · Content guardrails · Usage tracking · Fallback routing

AI Gateway

🟡

Pattern 3 — Outbound External

Agent calls external APIs · Third-party MCP servers · External LLMs not on FM API

UC Connections + Serverless Network Policies (SNP)

🔴

Pattern 4 — Inbound External

External clients calling Databricks · Enterprise apps · Partners · Customer portals

External Gateway


Pattern 1

🟢 No Gateway

Internal Traffic

All Databricks AI Services

Genie (NL-to-SQL, AI/BI chatbot)
FM API (Foundation Model API)
Agent Bricks · Knowledge Assistant
Vector Search (RAG retrieval)
UC Functions + DBSQL
Custom MCP Server (on Databricks Apps)

Auth Methods

OBO — On-Behalf-Of User

User's token flows end-to-end. UC row filters evaluate current_user() — the actual requesting user, not the app SP.

M2M — Service Principal

App SP identity. For shared resources (vector indexes, knowledge bases) where user-level isolation is not needed.


Pattern 1 — Why No Gateway

A gateway in the OBO path breaks governance

❌ Gateway intercepts

1

App sends user token to gateway

2

Gateway re-issues its own token.
Original user identity is lost.

3

Genie calls UC with gateway identity

current_user() = gateway service
Row filters evaluate wrong identity ⚠️

✅ Direct OBO

1

User token flows: App → Genie (unchanged)

2

Genie calls UC with user's identity

3

Row filter: rep_email = current_user()

current_user() = user@company.com
Per-user data access enforced ✅

Key principle: UC enforcement happens at query execution time, inside the compute layer. A gateway observes HTTP traffic — it cannot observe which rows were filtered or what current_user() returned.
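The identity difference between the two flows can be shown with a minimal, self-contained Python simulation. It calls no Databricks API: `Token`, `run_query`, the `CLAIMS` rows, and the `gateway-svc@company.com` identity are all hypothetical stand-ins. The simulated row filter mirrors `rep_email = current_user()` by reading the identity from whichever token reaches query execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical bearer token; `subject` is what current_user() would return."""
    subject: str


# Hypothetical rows in a UC table whose row filter is: rep_email = current_user()
CLAIMS = [
    {"claim_id": 1, "rep_email": "alice@company.com"},
    {"claim_id": 2, "rep_email": "bob@company.com"},
]


def run_query(token: Token) -> list[int]:
    """Simulate query execution: current_user() comes from the token that arrives."""
    current_user = token.subject
    return [row["claim_id"] for row in CLAIMS if row["rep_email"] == current_user]


def direct_obo(user_token: Token) -> list[int]:
    # App -> Genie -> UC: the user's token is forwarded unchanged.
    return run_query(user_token)


def via_reissuing_gateway(user_token: Token) -> list[int]:
    # The gateway authenticates the caller, then presents its own service identity.
    # user_token is deliberately not forwarded: this is where the identity is lost.
    gateway_token = Token(subject="gateway-svc@company.com")
    return run_query(gateway_token)


alice = Token(subject="alice@company.com")
print(direct_obo(alice))             # [1]
print(via_reissuing_gateway(alice))  # []
```

Here the gateway identity matches no rows, so Alice sees nothing; if the gateway's service principal held broad grants she could instead see every row. Either way the filter evaluates the wrong identity, and nothing in the HTTP traffic shows it.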


Pattern 2

🔵 LLM Governance

Databricks AI Gateway

Rate Limiting

Per Databricks user, group, or endpoint. Tokens per minute + requests per minute. Enforced using UC identity — the same model as row filters and column masks.

Content Guardrails

Input and output safety filtering. PII detection (block or mask). Topic filtering. Applied uniformly at the endpoint — no per-application code needed.

Usage Tracking

Token consumption per identity to system tables. Cost attribution by team, application, or user group. MLflow integration for experiment tracking.

Fallback Routing

Fail over to a backup model when a request to the primary fails (for example, with 429 rate-limit or 5xx errors). Traffic splitting across model versions. No custom retry or failover code.

Configured directly on the serving endpoint — not a separate infrastructure hop. Works with OBO and M2M token flows. Compatible with FM API, custom models, and external models via AI Gateway.
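Per-identity rate limiting behaves like a set of counters keyed by the caller's identity. The Python sketch below is a hypothetical fixed-window limiter that enforces a requests-per-minute and a tokens-per-minute cap for each identity; it illustrates the concept only and is not how Databricks implements AI Gateway. `IdentityRateLimiter`, its parameters, and the group names are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Window:
    start: float = float("-inf")
    requests: int = 0
    tokens: int = 0


class IdentityRateLimiter:
    """Hypothetical fixed-window limiter: requests/min and tokens/min per identity."""

    def __init__(self, rpm: int, tpm: int, clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.windows: Dict[str, Window] = {}

    def allow(self, identity: str, tokens: int) -> bool:
        now = self.clock()
        window = self.windows.setdefault(identity, Window())
        if now - window.start >= 60:
            # Start a fresh one-minute window for this identity only.
            window.start, window.requests, window.tokens = now, 0, 0
        if window.requests + 1 > self.rpm or window.tokens + tokens > self.tpm:
            return False  # an endpoint would typically reject with HTTP 429 here
        window.requests += 1
        window.tokens += tokens
        return True


# Usage with a controllable clock so the example is deterministic.
fake_now = [0.0]
limiter = IdentityRateLimiter(rpm=2, tpm=1000, clock=lambda: fake_now[0])
print(limiter.allow("group:appeals-reviewers", 400))  # True
print(limiter.allow("group:appeals-reviewers", 400))  # True
print(limiter.allow("group:appeals-reviewers", 100))  # False: over 2 requests/min
print(limiter.allow("group:billing", 900))            # True: separate budget
fake_now[0] = 61.0
print(limiter.allow("group:appeals-reviewers", 100))  # True: new window
```

A production limiter also has to account for output tokens, which are only known after the response; this sketch charges a caller-supplied token count up front.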


Pattern 3

🟡 Outbound External

UC HTTP Connections + Serverless Network Policies

The Proxy Model

App / Agent (sp-appeals)
→ /api/2.0/mcp/external/npi_registry_conn: checks the USE CONNECTION grant
→ ✅ External Service: credential injected; the app never holds the raw token

UC HTTP Connections

Credentials stored encrypted in UC
USE CONNECTION grant = per-SP authorization
sp-appeals ✅ for NPI registry; sp-billing ❌
App code never receives raw credential values

Serverless Network Policy (SNP)

Workspace-level FQDN allowlist for all serverless compute
Blocks egress to any host not on the list
Network layer: "can this workspace reach this host?"
UC Connections layer: "is this SP authorized to authenticate?"
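The shape of the proxy model can be sketched in Python. Everything here is hypothetical: `CONNECTIONS`, `GRANTS`, `proxy_request`, the host, and the secret value stand in for UC's encrypted credential store and USE CONNECTION grants, and no real network call is made. What it shows is the ordering: the grant is checked and the credential attached on the server side, and the caller receives only the external service's response, never the secret.

```python
from dataclasses import dataclass
from typing import Dict


class PermissionDenied(Exception):
    """Caller lacks USE CONNECTION on the connection (surfaced as HTTP 403)."""


@dataclass(frozen=True)
class Connection:
    host: str
    bearer_token: str  # held server-side only; never returned to the caller


# Hypothetical server-side state standing in for UC's encrypted store and grants.
CONNECTIONS = {"npi_registry_conn": Connection("npiregistry.example.org", "s3cr3t-npi")}
GRANTS = {("sp-appeals", "npi_registry_conn")}  # (principal, connection) with USE CONNECTION


def fake_external_service(host: str, path: str, headers: Dict[str, str]) -> dict:
    # Stand-in for the external API: it only accepts the expected bearer token.
    authorized = headers.get("Authorization") == "Bearer s3cr3t-npi"
    return {"host": host, "path": path, "status": 200 if authorized else 401}


def proxy_request(principal: str, connection_name: str, path: str) -> dict:
    """Check the grant, inject the credential server-side, return only the response."""
    if (principal, connection_name) not in GRANTS:
        raise PermissionDenied(f"{principal} lacks USE CONNECTION on {connection_name}")
    conn = CONNECTIONS[connection_name]
    headers = {"Authorization": f"Bearer {conn.bearer_token}"}  # never leaves the proxy
    return fake_external_service(conn.host, path, headers)


print(proxy_request("sp-appeals", "npi_registry_conn", "/lookup?number=1234567890"))
try:
    proxy_request("sp-billing", "npi_registry_conn", "/lookup?number=1234567890")
except PermissionDenied as err:
    print("403:", err)
```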


Pattern 3 — Defense-in-Depth

Two independent layers of control

Layer 1 — Network (SNP)

Workspace-level FQDN allowlist

Blocks egress regardless of which app is calling. Governs: "Can this workspace reach this host at all?"

Layer 2 — Credential (UC Connections)

Per-SP authorization gate

Even if the host is reachable, the app needs a USE CONNECTION grant to get credentials injected. Governs: "Is this SP authorized to authenticate?"

Threat Model

Threat	Defended by
App calls an unapproved external host	SNP: FQDN not on the allowlist, so the host is unreachable at the network layer
App calls an approved host it is not authorized for	UC Connections: no USE CONNECTION grant → 403; credentials are never injected
App exfiltrates stored credentials	Mitigated by design: app code never receives the raw credential; the proxy injects it server-side
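The threat table can be restated as two independent checks applied in order. In this hypothetical Python sketch the SNP allowlist is modeled as a set of exact FQDNs and the UC layer as a set of (service principal, connection) grants; it also assumes each connection targets the host being called. The class name, hosts, and result strings are illustrative assumptions, not real Databricks behavior or error messages. The third threat has no check of its own: `evaluate` never returns a credential value, which mirrors the "mitigated by design" row.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class EgressPolicy:
    allowed_fqdns: FrozenSet[str]       # Layer 1: SNP allowlist (exact-match assumption)
    grants: FrozenSet[Tuple[str, str]]  # Layer 2: (service principal, connection) grants

    def evaluate(self, principal: str, host: str, connection: Optional[str] = None) -> str:
        # Layer 1 runs first and ignores which app or service principal is calling.
        if host not in self.allowed_fqdns:
            return "blocked by SNP: host unreachable"
        # Reachable host, but without a UC connection no stored credential is attached.
        if connection is None:
            return "reachable, no credential injected"
        # Layer 2 gates credential injection per service principal.
        if (principal, connection) not in self.grants:
            return "blocked by UC Connections: 403, no credential injected"
        return "allowed: credential injected server-side"


policy = EgressPolicy(
    allowed_fqdns=frozenset({"npiregistry.example.org", "coverage.example.com"}),
    grants=frozenset({("sp-appeals", "npi_registry_conn")}),
)
print(policy.evaluate("sp-appeals", "unknown-host.example.net", "npi_registry_conn"))
print(policy.evaluate("sp-billing", "npiregistry.example.org", "npi_registry_conn"))
print(policy.evaluate("sp-appeals", "npiregistry.example.org", "npi_registry_conn"))
```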


Pattern 4

🔴 Inbound External

External API Gateway

When to use

Requests originate outside Databricks
External clients use enterprise SSO or API keys not natively handled by Databricks
Multiple external tenants need per-tenant rate limits
You need an API catalog, versioned endpoints, or developer portal
Regulatory or enterprise policy requires an API facade

What it provides

Auth Translation

Enterprise SSO / API key → Databricks OAuth token. Databricks doesn't natively manage external client identities.

Per-Tenant Rate Limits

Rate limit by external subscription tier, organization, or API key — not by Databricks identity.

API Catalog + Dev Portal

OpenAPI specs, versioned endpoints, subscription management — none provided natively to external clients.

Options: Azure APIM · AWS API Gateway · Kong · Apigee · Any OpenAPI-compatible gateway. The external gateway is additive — it can sit in front of a Databricks endpoint that also has Pattern 2 AI Gateway configured.
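What an external gateway does on each inbound request can be sketched in a few lines: resolve the external API key to a tenant, apply that tenant's limit, and only then obtain a Databricks credential for the backend call. This Python sketch is hypothetical throughout: the API keys, tenants, limits, and `exchange_for_databricks_token` are illustrative, and a real deployment would use the gateway product's policy engine and a genuine OAuth token exchange rather than this code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Tenant:
    name: str
    tier: str  # subscription tier, informational here
    requests_per_minute: int
    used_this_minute: int = 0  # window reset omitted for brevity


# Hypothetical subscription data owned by the external gateway, not by Databricks.
API_KEYS = {
    "key-acme-123": Tenant("acme-health", "gold", requests_per_minute=3),
    "key-beta-456": Tenant("beta-partners", "bronze", requests_per_minute=1),
}


def exchange_for_databricks_token(tenant: Tenant) -> str:
    # Placeholder for a real OAuth exchange; returns a fake token string.
    return f"dbx-token-for-{tenant.name}"


def handle_inbound(api_key: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Databricks token or None) for one inbound request."""
    tenant = API_KEYS.get(api_key)
    if tenant is None:
        return 401, None  # unknown external identity: never reaches Databricks
    if tenant.used_this_minute >= tenant.requests_per_minute:
        return 429, None  # limit keyed by subscription, not by Databricks identity
    tenant.used_this_minute += 1
    return 200, exchange_for_databricks_token(tenant)


print(handle_inbound("key-beta-456"))  # (200, 'dbx-token-for-beta-partners')
print(handle_inbound("key-beta-456"))  # (429, None)
print(handle_inbound("bad-key"))       # (401, None)
```

Because the gateway rejects over-limit tenants before any Databricks token is issued, per-tenant limits and the endpoint's own AI Gateway limits can coexist without either knowing about the other.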


Comparison

Databricks AI Gateway vs External API Gateway

Dimension	Databricks AI Gateway	External API Gateway
Where it sits	On the Databricks serving endpoint	In front of Databricks (customer-managed)
Identity awareness	UC-aware — knows Databricks users and groups	Manages external client identities
Rate limiting	Per Databricks user / group / endpoint	Per external tenant / subscription / API key
Guardrails	Input + output safety, PII, topic filtering	Not provided natively; requires custom plugins
Usage tracking	Token-level → system tables + MLflow	Request-level → gateway analytics
Auth	Validates Databricks OAuth tokens	Translates external identities → Databricks tokens
Use when	Governing LLM consumption within Databricks	Managing access from external enterprise clients

Additive by design: external gateway handles boundary crossing (external identity → Databricks token); AI Gateway handles LLM governance at the endpoint; Unity Catalog handles data access. Each layer governs what it owns.


Reference Scenario

Prior Authorization Appeals Agent

Fictionalized scenario — all four patterns in one architecture. RegionalCare Health Plan automates context assembly for human appeal reviewers.

Traffic	Pattern	Governance mechanism
Case management system → Appeals endpoint	4 — Ext. Gateway	Auth translation (enterprise SSO → Databricks token), rate limit per org
Endpoint LLM consumption	2 — AI Gateway	Rate limit per reviewer team · usage tracking by dept · content guardrails
Agent → Genie (member eligibility, claim history)	1 — No Gateway	OBO — UC row filters enforce per-reviewer data access
Agent → Vector Search (clinical guidelines)	1 — No Gateway	M2M — shared knowledge, same for all reviewers
Agent → NPI Registry + CMS Coverage DB	3 — UC Connections	USE CONNECTION grant per SP · SNP allows FQDNs · credentials injected server-side


Decision Framework

Which pattern for which traffic?

Where does the request originate?

Outside Databricks

Enterprise apps · partners · customer portals · external workflows

→ Pattern 4 — External Gateway

Inside Databricks — going where?

External service

API · MCP server · external LLM

→ Pattern 3

Databricks service — no LLM governance

→ Pattern 1

LLM endpoint — need rate limits or guardrails

→ Pattern 2
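The decision tree can be written as a small function. This is a hypothetical Python sketch: `Origin`, `Destination`, and `choose_patterns` are illustrative names, not part of any Databricks SDK. It goes one step beyond the slide: because Patterns 2 and 4 are additive, an external call that lands on a governed LLM endpoint returns both, matching the appeals endpoint in the reference scenario.

```python
from enum import Enum, auto
from typing import Set


class Origin(Enum):
    OUTSIDE_DATABRICKS = auto()  # enterprise apps, partners, customer portals
    INSIDE_DATABRICKS = auto()   # agents, apps, and jobs running on Databricks


class Destination(Enum):
    EXTERNAL_SERVICE = auto()    # external API, MCP server, or LLM outside FM API
    DATABRICKS_SERVICE = auto()  # Genie, Vector Search, UC Functions and similar
    GOVERNED_LLM = auto()        # LLM endpoint that needs rate limits or guardrails


def choose_patterns(origin: Origin, destination: Destination) -> Set[int]:
    """Walk the decision tree; may return two patterns because 2 and 4 are additive."""
    patterns: Set[int] = set()
    if origin is Origin.OUTSIDE_DATABRICKS:
        patterns.add(4)  # boundary crossing: external identity to Databricks token
    if destination is Destination.GOVERNED_LLM:
        patterns.add(2)  # applied at the endpoint regardless of where the call started
    elif origin is Origin.INSIDE_DATABRICKS:
        if destination is Destination.EXTERNAL_SERVICE:
            patterns.add(3)
        else:
            patterns.add(1)  # direct OBO or M2M, governed by UC
    return patterns


print(sorted(choose_patterns(Origin.INSIDE_DATABRICKS, Destination.DATABRICKS_SERVICE)))  # [1]
print(sorted(choose_patterns(Origin.INSIDE_DATABRICKS, Destination.EXTERNAL_SERVICE)))    # [3]
print(sorted(choose_patterns(Origin.OUTSIDE_DATABRICKS, Destination.GOVERNED_LLM)))       # [2, 4]
```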


Summary

Quick Reference

Traffic Type	Pattern	Approach
Agent ↔ Genie · FM API · Agent Bricks · Vector Search · UC Functions	1 — No Gateway	OBO or M2M · UC row filters at the data plane
LLM endpoint — rate limits, guardrails, or cost tracking needed	2 — AI Gateway	Databricks AI Gateway configured on the endpoint
Agent calling external APIs, MCP servers, or external LLMs	3 — UC Connections	UC HTTP Connections + SNP · credential never in app code
External clients (enterprise apps, partners) calling Databricks	4 — Ext. Gateway	External API Gateway (APIM · Kong · AWS API GW)

Patterns 2 + 4 are additive. External gateway handles boundary crossing; Databricks AI Gateway handles LLM governance. Neither interferes with the other.

OBO token chain is non-negotiable. Any intermediary that re-issues its own token breaks UC governance silently. Keep internal traffic direct.

Reference: AI-GATEWAY-PATTERNS.md · Interactive: interactive/orchestration/ai-gateway-patterns.html

