Databricks · AI Governance

AI Gateway Patterns

When to use Databricks AI Gateway, external API gateways, UC Connections — and when to use no gateway at all.

Pattern 1: No Gateway · Pattern 2: AI Gateway · Pattern 3: UC Connections · Pattern 4: External Gateway
01 / 14
The Challenge

Your AI platform calls external models. Who controls that traffic?

Agents, knowledge retrieval, and orchestration workflows all call LLM endpoints, both internal foundation models and external providers like OpenAI or Anthropic. Each call carries cost, latency, and compliance implications. Without a control point, there is no rate limiting, no usage tracking, no guardrails, and no fallback routing.

But not every deployment needs a gateway. Some traffic is purely internal and already governed by Unity Catalog. The question is where to place the control point, and whether you need one at all. This deck walks through four patterns and helps you decide.

02 / 14
Core Principle

Two options. Neither replaces the other.
Neither is mandatory.

Databricks AI Gateway

Governs LLM endpoint traffic — rate limits, guardrails, usage tracking, fallback routing. Configured per serving endpoint. UC-aware identity model.

External API Gateway

Manages the boundary between outside callers and Databricks — auth translation, API catalog, per-tenant rate limiting. Customer-managed infrastructure.

What neither should govern: internal Databricks traffic between agents, Genie, custom MCP servers, and serving endpoints. That traffic is governed by Unity Catalog at the data plane — inserting a gateway into that path breaks the OBO token chain.
03 / 14
Overview

The Four Traffic Patterns

🟢 Pattern 1: Internal Traffic → No Gateway
Agent ↔ Genie · FM API · Agent Bricks · Vector Search · UC Functions · Custom MCP

🔵 Pattern 2: LLM Endpoint Governance → AI Gateway
Rate limits per user/group · Content guardrails · Usage tracking · Fallback routing

🟡 Pattern 3: Outbound External → UC Connections + SNP
Agent calls external APIs · Third-party MCP servers · External LLMs not on FM API

🔴 Pattern 4: Inbound External → External Gateway
External clients calling Databricks · Enterprise apps · Partners · Customer portals
04 / 14
Pattern 1
🟢 No Gateway

Internal Traffic

All Databricks AI Services
  • Genie (NL-to-SQL, AI/BI chatbot)
  • FM API (Foundation Model API)
  • Agent Bricks · Knowledge Assistant
  • Vector Search (RAG retrieval)
  • UC Functions + DBSQL
  • Custom MCP Server (on Databricks Apps)
Auth Methods
OBO — On-Behalf-Of User

User's token flows end-to-end. UC row filters evaluate current_user() — the actual requesting user, not the app SP.

M2M — Service Principal

App SP identity. For shared resources (vector indexes, knowledge bases) where user-level isolation is not needed.
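
The choice between the two auth methods can be sketched as a small helper. This is an illustration under stated assumptions, not a Databricks API: it assumes the app receives the user's token in an `X-Forwarded-Access-Token` request header (verify this against your Databricks Apps setup), and `Credential` and `pick_credential` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping

# Assumption: header in which the platform forwards the user's token to the app.
USER_TOKEN_HEADER = "x-forwarded-access-token"


@dataclass(frozen=True)
class Credential:
    mode: str   # "OBO" or "M2M"
    token: str


def pick_credential(headers: Mapping[str, str], sp_token: str,
                    per_user_data: bool) -> Credential:
    """Use the caller's token for per-user data, the app SP token otherwise."""
    lowered = {name.lower(): value for name, value in headers.items()}
    user_token = lowered.get(USER_TOKEN_HEADER)
    if per_user_data:
        if not user_token:
            # Fail closed rather than silently falling back to the app SP.
            raise PermissionError("per-user data requires the caller's token (OBO)")
        return Credential("OBO", user_token)
    return Credential("M2M", sp_token)
```

Failing closed when the user token is missing keeps a per-user query from quietly running with the broader service principal identity.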

05 / 14
Pattern 1 — Why No Gateway

A gateway in the OBO path breaks governance

❌ Gateway intercepts
1. App sends user token to gateway.
2. Gateway re-issues its own token. Original user identity is lost.
3. Genie calls UC with gateway identity. current_user() = gateway service, so row filters evaluate the wrong identity ⚠️

✅ Direct OBO
1. User token flows App → Genie (unchanged).
2. Genie calls UC with user's identity.
3. Row filter: rep_email = current_user(), with current_user() = user@company.com. Per-user data access enforced ✅

Key principle: UC enforcement happens at query execution time, inside the compute layer. A gateway observes HTTP traffic — it cannot observe which rows were filtered or what current_user() returned.
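
The two flows can be mimicked in a few lines of plain Python. This is a toy model, not Unity Catalog code: `Token`, `query_with_row_filter`, `gateway_reissue`, and the sample rows are all hypothetical, and the filter only imitates `rep_email = current_user()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    subject: str  # the identity current_user() would resolve to


ROWS = [
    {"rep_email": "user@company.com", "account": "Acme"},
    {"rep_email": "other@company.com", "account": "Globex"},
]


def query_with_row_filter(token: Token) -> list[dict]:
    """Imitates the row filter rep_email = current_user()."""
    current_user = token.subject
    return [row for row in ROWS if row["rep_email"] == current_user]


def gateway_reissue(_incoming: Token) -> Token:
    """A gateway that replaces the caller's token with its own identity."""
    return Token(subject="gateway-service")


user_token = Token(subject="user@company.com")
direct_rows = query_with_row_filter(user_token)                    # direct OBO
proxied_rows = query_with_row_filter(gateway_reissue(user_token))  # via gateway
```

In this toy the gateway identity matches no rows, so the user simply sees nothing. If the gateway identity were granted broad access instead, every user would see the same rows, which is the silent failure the slide warns about.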
06 / 14
Pattern 2
🔵 LLM Governance

Databricks AI Gateway

Rate Limiting

Per Databricks user, group, or endpoint. Tokens per minute + requests per minute. Enforced using UC identity — the same model as row filters and column masks.
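
To make "per identity, per minute" concrete, here is a minimal fixed-window limiter keyed by caller. It only illustrates the semantics; it is not how AI Gateway is implemented, and `PerKeyRateLimiter` is a hypothetical name.

```python
from collections import defaultdict


class PerKeyRateLimiter:
    """Fixed-window counter: at most `limit` calls per key per window."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._counts = defaultdict(int)  # (key, window index) -> calls seen

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self._counts[bucket] >= self.limit:
            return False  # this key has used up the current window
        self._counts[bucket] += 1
        return True
```

The key can be a user, a group, or the endpoint itself; one caller exhausting its window does not affect another. The sketch never evicts old windows, which a real limiter would need to do.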

Content Guardrails

Input and output safety filtering. PII detection (block or mask). Topic filtering. Applied uniformly at the endpoint — no per-application code needed.
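
The block-or-mask choice for PII can be pictured with a toy filter. The two regular expressions below are deliberately simplistic stand-ins chosen for this sketch (a managed guardrail would use far more robust detection), and `apply_pii_guardrail` is a hypothetical name.

```python
import re

# Simplistic illustrative patterns, not production-grade PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def apply_pii_guardrail(text: str, behavior: str = "MASK") -> str:
    """Return text unchanged, masked, or raise if PII is found and blocked."""
    if behavior not in {"MASK", "BLOCK"}:
        raise ValueError(f"unknown behavior: {behavior}")
    if not (EMAIL.search(text) or SSN.search(text)):
        return text
    if behavior == "BLOCK":
        raise ValueError("request blocked: PII detected")
    return SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))
```

Because the filter sits at the endpoint, the same function would apply to every application's prompts and completions without each application carrying its own copy.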

Usage Tracking

Token consumption per identity to system tables. Cost attribution by team, application, or user group. MLflow integration for experiment tracking.

Fallback Routing

Route to a backup model when the primary fails, for example on rate-limit (429) or server (5xx) errors. Traffic splitting across model versions. Zero custom throttling code.

Configured directly on the serving endpoint, not as a separate infrastructure hop. Works with OBO and M2M token flows. Compatible with FM API endpoints, custom model endpoints, and external model endpoints.
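
A configuration covering the four capabilities might be assembled as follows. This is a sketch only: the field names (`rate_limits`, `guardrails`, `usage_tracking_config`, `fallback_config`) and the request path reflect one reading of the serving-endpoint AI Gateway REST API and should be checked against the current API reference. The endpoint name `appeals-llm` is hypothetical, and nothing here sends a request.

```python
import json


def build_ai_gateway_config(calls_per_minute: int) -> dict:
    """Assemble an AI Gateway settings payload (field names are assumptions)."""
    return {
        # Rate limiting: one limit per Databricks user, renewed every minute.
        "rate_limits": [
            {"key": "user", "calls": calls_per_minute, "renewal_period": "minute"},
        ],
        # Content guardrails on both the prompt and the completion.
        "guardrails": {
            "input": {"pii": {"behavior": "BLOCK"}},
            "output": {"pii": {"behavior": "BLOCK"}},
        },
        # Usage tracking to system tables.
        "usage_tracking_config": {"enabled": True},
        # Fallback routing across the endpoint's served entities.
        "fallback_config": {"enabled": True},
    }


ENDPOINT_NAME = "appeals-llm"  # hypothetical endpoint name
request_path = f"/api/2.0/serving-endpoints/{ENDPOINT_NAME}/ai-gateway"
request_body = json.dumps(build_ai_gateway_config(100), indent=2)
```

Keeping the settings in one payload per endpoint matches the slide's point: governance is attached to the endpoint, not to a separate proxy.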
07 / 14
Pattern 3
🟡 Outbound External

UC HTTP Connections + Serverless Network Policies

The Proxy Model
App / Agent (sp-appeals)
  → /api/2.0/mcp/external/npi_registry_conn (checks USE CONNECTION grant → ✅)
  → External Service (+ injected credential; app never holds raw token)

UC HTTP Connections
  • Credentials stored encrypted in UC
  • USE CONNECTION grant = per-SP authorization
  • sp-appeals ✅ for NPI registry; sp-billing ❌
  • App code never receives raw credential values
Serverless Network Policy (SNP)
  • Workspace-level FQDN allowlist for all serverless compute
  • Blocks egress to any host not on the list
  • Network layer: "can this workspace reach this host?"
  • UC Connections layer: "is this SP authorized to authenticate?"
08 / 14
Pattern 3 — Defense-in-Depth

Two independent layers of control

Layer 1 — Network (SNP)
Workspace-level FQDN allowlist

Blocks egress regardless of which app is calling. Governs: "Can this workspace reach this host at all?"

Layer 2 — Credential (UC Connections)
Per-SP authorization gate

Even if the host is reachable, the app needs a USE CONNECTION grant to get credentials injected. Governs: "Is this SP authorized to authenticate?"

Threat Model
| Threat | Defended by |
| --- | --- |
| App calls an unapproved external host | SNP: FQDN not on allowlist, unreachable at network layer |
| App calls approved host it's not authorized for | UC Connections: no USE CONNECTION grant → 403, credentials never injected |
| App exfiltrates stored credentials | Not possible: app code never receives raw credential; proxy injects server-side |
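
The threat model can be walked through with a toy proxy that applies both layers in turn. Everything here is hypothetical and simplified: `Connection`, `proxy_call`, the host names, and the stubbed `_send` stand in for the real network policy and Unity Catalog checks, and the order of the two checks is arbitrary in this sketch.

```python
from dataclasses import dataclass, field

ALLOWED_FQDNS = {"npi.example.org"}  # hypothetical SNP allowlist


@dataclass
class Connection:
    name: str
    host: str
    secret: str                      # held by the proxy, never returned to the app
    grantees: set = field(default_factory=set)  # SPs with USE CONNECTION


def _send(host: str, headers: dict) -> int:
    """Stub for the outbound HTTP call; a real proxy would send the request."""
    return 200


def proxy_call(sp: str, conn: Connection) -> dict:
    # Layer 1 (network): is the host on the workspace allowlist at all?
    if conn.host not in ALLOWED_FQDNS:
        return {"status": "blocked", "layer": "network"}
    # Layer 2 (credential): does this SP hold USE CONNECTION?
    if sp not in conn.grantees:
        return {"status": 403, "layer": "credential"}
    # Credential is injected here, server-side; it is not part of the result.
    status = _send(conn.host, {"Authorization": f"Bearer {conn.secret}"})
    return {"status": status, "layer": None}


npi = Connection("npi_registry_conn", "npi.example.org", "s3cr3t", {"sp-appeals"})
rogue = Connection("rogue_conn", "evil.example.net", "other", {"sp-appeals"})
```

Calling `proxy_call("sp-appeals", npi)` succeeds, `proxy_call("sp-billing", npi)` stops at the credential layer, and `proxy_call("sp-appeals", rogue)` stops at the network layer even though the grant exists. In no case does the returned value contain the secret.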
09 / 14
Pattern 4
🔴 Inbound External

External API Gateway

When to use
  • Requests originate outside Databricks
  • External clients use enterprise SSO or API keys not natively handled by Databricks
  • Multiple external tenants need per-tenant rate limits
  • You need an API catalog, versioned endpoints, or developer portal
  • Regulatory or enterprise policy requires an API facade
What it provides
Auth Translation

Enterprise SSO / API key → Databricks OAuth token. Databricks doesn't natively manage external client identities.

Per-Tenant Rate Limits

Rate limit by external subscription tier, organization, or API key — not by Databricks identity.

API Catalog + Dev Portal

OpenAPI specs, versioned endpoints, subscription management — none provided natively to external clients.

Options: Azure APIM · AWS API Gateway · Kong · Apigee · Any OpenAPI-compatible gateway. The external gateway is additive — it can sit in front of a Databricks endpoint that also has Pattern 2 AI Gateway configured.
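
The gateway's two jobs, per-tenant limits and auth translation, can be sketched as follows. The tenant registry, tier limits, and function names are hypothetical. The token request assumes a Databricks OAuth machine-to-machine (client credentials) exchange against the workspace's `/oidc/v1/token` endpoint, which should be confirmed for your deployment; the code only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Hypothetical tenant registry kept by the external gateway.
TENANTS = {
    "key-gold-123": {"tenant": "acme", "tier": "gold"},
    "key-free-456": {"tenant": "globex", "tier": "free"},
}
TIER_LIMITS_PER_MINUTE = {"gold": 600, "free": 60}


def resolve_tenant(api_key: str) -> dict:
    """Map an external API key to a tenant and its rate limit."""
    tenant = TENANTS.get(api_key)
    if tenant is None:
        raise PermissionError("unknown API key")
    return {**tenant, "limit_per_minute": TIER_LIMITS_PER_MINUTE[tenant["tier"]]}


def build_token_request(workspace_host: str) -> tuple[str, str]:
    """URL and form body for a client-credentials token exchange (assumed flow)."""
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urlencode({"grant_type": "client_credentials", "scope": "all-apis"})
    return url, body
```

The rate limit is keyed by the external tenant, not by any Databricks identity, which is why this logic belongs in the external gateway and not in AI Gateway.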
10 / 14
Comparison

Databricks AI Gateway vs External API Gateway

| Dimension | Databricks AI Gateway | External API Gateway |
| --- | --- | --- |
| Where it sits | On the Databricks serving endpoint | In front of Databricks (customer-managed) |
| Identity awareness | UC-aware: knows Databricks users and groups | Manages external client identities |
| Rate limiting | Per Databricks user / group / endpoint | Per external tenant / subscription / API key |
| Guardrails | Input + output safety, PII, topic filtering | Not provided natively; requires custom plugins |
| Usage tracking | Token-level → system tables + MLflow | Request-level → gateway analytics |
| Auth | Validates Databricks OAuth tokens | Translates external identities → Databricks tokens |
| Use when | Governing LLM consumption within Databricks | Managing access from external enterprise clients |

Additive by design: external gateway handles boundary crossing (external identity → Databricks token); AI Gateway handles LLM governance at the endpoint; Unity Catalog handles data access. Each layer governs what it owns.
11 / 14
Reference Scenario

Prior Authorization Appeals Agent

Fictionalized scenario — all four patterns in one architecture. RegionalCare Health Plan automates context assembly for human appeal reviewers.

| Traffic | Pattern | Governance mechanism |
| --- | --- | --- |
| Case management system → Appeals endpoint | 4: Ext. Gateway | Auth translation (enterprise SSO → Databricks token), rate limit per org |
| Endpoint LLM consumption | 2: AI Gateway | Rate limit per reviewer team · usage tracking by dept · content guardrails |
| Agent → Genie (member eligibility, claim history) | 1: No Gateway | OBO: UC row filters enforce per-reviewer data access |
| Agent → Vector Search (clinical guidelines) | 1: No Gateway | M2M: shared knowledge, same for all reviewers |
| Agent → NPI Registry + CMS Coverage DB | 3: UC Connections | USE CONNECTION grant per SP · SNP allows FQDNs · credentials injected server-side |
12 / 14
Decision Framework

Which pattern for which traffic?

Where does the request originate?

Outside Databricks (enterprise apps · partners · customer portals · external workflows)
  → Pattern 4: External Gateway

Inside Databricks, going where?
  • External service (API · MCP server · external LLM) → Pattern 3
  • Databricks service, no LLM governance → Pattern 1
  • LLM endpoint, need rate limits or guardrails → Pattern 2
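
The decision tree above reduces to a short function. The argument names and string values are hypothetical labels chosen for this sketch.

```python
def choose_pattern(origin: str, destination: str = "databricks",
                   needs_llm_governance: bool = False) -> str:
    """Map a traffic flow to one of the four patterns.

    origin / destination: "external" or "databricks".
    needs_llm_governance: rate limits, guardrails, or usage tracking required.
    """
    for value in (origin, destination):
        if value not in {"external", "databricks"}:
            raise ValueError(f"unexpected value: {value}")
    if origin == "external":
        return "Pattern 4: External Gateway"
    if destination == "external":
        return "Pattern 3: UC Connections + SNP"
    if needs_llm_governance:
        return "Pattern 2: AI Gateway"
    return "Pattern 1: No Gateway"
```

The function returns a single pattern per flow. An inbound external request that lands on a governed LLM endpoint would pick up Pattern 2 as well, since the two gateways are additive.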
13 / 14
Summary

Quick Reference

| Traffic Type | Pattern | Approach |
| --- | --- | --- |
| Agent ↔ Genie · FM API · Agent Bricks · Vector Search · UC Functions | 1: No Gateway | OBO or M2M · UC row filters at the data plane |
| LLM endpoint: rate limits, guardrails, or cost tracking needed | 2: AI Gateway | Databricks AI Gateway configured on the endpoint |
| Agent calling external APIs, MCP servers, or external LLMs | 3: UC Connections | UC HTTP Connections + SNP · credential never in app code |
| External clients (enterprise apps, partners) calling Databricks | 4: Ext. Gateway | External API Gateway (APIM · Kong · AWS API GW) |

Patterns 2 + 4 are additive. External gateway handles boundary crossing; Databricks AI Gateway handles LLM governance. Neither interferes with the other.
OBO token chain is non-negotiable. Any intermediary that re-issues its own token breaks UC governance silently. Keep internal traffic direct.
Reference: AI-GATEWAY-PATTERNS.md · Interactive: interactive/orchestration/ai-gateway-patterns.html
14 / 14