When to use Databricks AI Gateway, external API gateways, UC Connections — and when to use no gateway at all.
Agents, knowledge retrieval, and orchestration workflows all call LLM endpoints, both internal foundation models and external providers like OpenAI or Anthropic. Each call carries cost, latency, and compliance implications. Without a control point, there is no rate limiting, no usage tracking, no guardrails, and no fallback routing.
But not every deployment needs a gateway. Some traffic is purely internal and already governed by Unity Catalog. The question is where to place the control point, and whether you need one at all. This deck walks through four patterns and helps you decide.
Databricks AI Gateway: governs LLM endpoint traffic with rate limits, guardrails, usage tracking, and fallback routing. Configured per serving endpoint. UC-aware identity model.
External API gateway: manages the boundary between outside callers and Databricks through auth translation, an API catalog, and per-tenant rate limiting. Customer-managed infrastructure.
OBO (on-behalf-of): the user's token flows end-to-end. UC row filters evaluate current_user() as the actual requesting user, not the app SP.
M2M (machine-to-machine): the app SP identity is used. Suited to shared resources (vector indexes, knowledge bases) where user-level isolation is not needed.
Example row filter predicate: rep_email = current_user(). The rows a caller sees depend on the identity that current_user() returns.
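The difference between the two identity modes can be illustrated with a small simulation. This is a sketch, not Databricks code: `apply_row_filter`, the sample rows, and the e-mail addresses are hypothetical, and only the predicate `rep_email = current_user()` comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    rep_email: str
    account: str


# Hypothetical sample data standing in for a UC-governed table.
ROWS = [
    Row("alice@example.com", "Acme"),
    Row("bob@example.com", "Globex"),
    Row("alice@example.com", "Initech"),
]


def apply_row_filter(rows, current_user):
    """Mimic the predicate rep_email = current_user()."""
    return [r for r in rows if r.rep_email == current_user]


# OBO: the filter is evaluated with the requesting user's identity.
obo_rows = apply_row_filter(ROWS, "alice@example.com")

# M2M: the filter is evaluated with the app service principal's identity,
# which matches no rep_email in this sample.
m2m_rows = apply_row_filter(ROWS, "app-sp@example.com")
```

In this toy setup the OBO call returns only Alice's accounts, while the M2M call returns nothing, which is why M2M suits shared resources that need no per-user filtering.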
Rate limiting: per Databricks user, group, or endpoint, in tokens per minute and requests per minute. Enforced using UC identity, the same model as row filters and column masks.
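To make the combination of requests-per-minute and tokens-per-minute limits concrete, here is a minimal fixed-window sketch. It is not how AI Gateway is implemented (that is not described in this deck); the class name `FixedWindowLimiter`, the limit values, and the identities are all illustrative assumptions.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Per-identity request and token limits within a 60-second window."""

    def __init__(self, requests_per_minute, tokens_per_minute):
        self.rpm = requests_per_minute
        self.tpm = tokens_per_minute
        # identity -> [window index, requests used, tokens used]
        self._usage = defaultdict(lambda: [None, 0, 0])

    def allow(self, identity, tokens, now):
        """Return True and record usage if the call fits both limits."""
        window = int(now // 60)
        state = self._usage[identity]
        if state[0] != window:  # a new minute started: reset the counters
            state[0], state[1], state[2] = window, 0, 0
        if state[1] + 1 > self.rpm or state[2] + tokens > self.tpm:
            return False
        state[1] += 1
        state[2] += tokens
        return True


limiter = FixedWindowLimiter(requests_per_minute=2, tokens_per_minute=1000)
results = [
    limiter.allow("team-a", 400, now=0),   # fits both limits
    limiter.allow("team-a", 700, now=10),  # would exceed the token limit
    limiter.allow("team-a", 500, now=20),  # fits: 900 tokens, 2 requests
    limiter.allow("team-a", 10, now=30),   # would be a third request
    limiter.allow("team-a", 10, now=61),   # next window, counters reset
    limiter.allow("team-b", 10, now=30),   # separate identity, own counters
]
```

The `now` argument is passed in explicitly so the behaviour is deterministic; a real limiter would read a clock and would likely use a sliding window or token bucket instead.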
Guardrails: input and output safety filtering, PII detection (block or mask), and topic filtering. Applied uniformly at the endpoint, so no per-application code is needed.
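The "block or mask" choice for PII can be sketched as follows. This is a deliberately naive illustration that uses two regular expressions; production guardrails rely on dedicated detectors, and the function name `guard` and the placeholder tags are assumptions made for this example.

```python
import re

# Naive patterns for illustration only; real PII detection is far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def guard(text, mode="mask"):
    """Mask detected PII, or raise if mode is 'block' and PII is present."""
    found = bool(EMAIL.search(text) or SSN.search(text))
    if found and mode == "block":
        raise ValueError("request blocked: PII detected")
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)


masked = guard("Member jane.doe@example.com, SSN 123-45-6789, appeals claim.")
```

Because the filter runs in one place for every caller, individual applications do not need their own copy of this logic.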
Usage tracking: token consumption per identity is logged to system tables. Cost attribution by team, application, or user group. MLflow integration for experiment tracking.
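Cost attribution boils down to grouping token counts by a chosen dimension. The sketch below shows that aggregation over in-memory records; the field names (`identity`, `team`, `input_tokens`, `output_tokens`) are hypothetical and do not reflect the actual system-table schema.

```python
from collections import defaultdict

# Hypothetical usage events; real system-table columns may differ.
USAGE_EVENTS = [
    {"identity": "alice@example.com", "team": "appeals",
     "input_tokens": 1200, "output_tokens": 300},
    {"identity": "bob@example.com", "team": "appeals",
     "input_tokens": 800, "output_tokens": 200},
    {"identity": "intake-app-sp", "team": "intake",
     "input_tokens": 500, "output_tokens": 100},
]


def total_tokens_by(events, dimension):
    """Sum input and output tokens per value of the given dimension."""
    totals = defaultdict(int)
    for event in events:
        totals[event[dimension]] += event["input_tokens"] + event["output_tokens"]
    return dict(totals)


by_team = total_tokens_by(USAGE_EVENTS, "team")
by_identity = total_tokens_by(USAGE_EVENTS, "identity")
```

In practice the same grouping would be expressed as a query against the usage tables rather than computed in application code.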
Fallback routing: route to a backup model if the primary exceeds latency thresholds. Traffic splitting across model versions. Zero custom throttling code.
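For comparison, this is roughly the kind of custom code that endpoint-level fallback routing replaces. It is a minimal sketch using only the standard library; `call_with_fallback` and the three stub models are hypothetical, and the gateway's own routing logic is not shown in this deck.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_fallback(prompt, primary, backup, timeout_s):
    """Try `primary`; use `backup` if it errors or exceeds `timeout_s`."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, prompt)
    try:
        return future.result(timeout=timeout_s), "primary"
    except Exception:
        # Covers both a timeout from future.result() and an error
        # raised inside the primary call.
        return backup(prompt), "backup"
    finally:
        pool.shutdown(wait=False)  # do not block on a slow primary call


# Stub models for illustration only.
def fast_model(prompt):
    return f"fast: {prompt}"


def slow_model(prompt):
    time.sleep(0.3)
    return f"slow: {prompt}"


def backup_model(prompt):
    return f"backup: {prompt}"
```

For example, `call_with_fallback("hi", slow_model, backup_model, timeout_s=0.05)` returns the backup reply because the primary takes longer than the threshold.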
Blocks egress regardless of which app is calling. Governs: "Can this workspace reach this host at all?"
USE CONNECTION grant = per-SP authorization. Even if the host is reachable, the app needs a USE CONNECTION grant to get credentials injected. Governs: "Is this SP authorized to authenticate?"
| Threat | Defended by |
|---|---|
| App calls an unapproved external host | SNP — FQDN not on allowlist, unreachable at network layer |
| App calls approved host it's not authorized for | UC Connections — no USE CONNECTION grant → 403, credentials never injected |
| App exfiltrates stored credentials | Not possible — app code never receives raw credential; proxy injects server-side |
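
The two layers in the table above can be read as two independent checks. The following sketch models them as plain lookups; the hostnames, connection names, service principal names, and the order in which the checks run are all assumptions made for illustration, not a description of how the platform evaluates them.

```python
# Layer 1: workspace-level egress allowlist (hypothetical FQDNs).
EGRESS_ALLOWLIST = {"npi.example.com", "coverage.example.com"}

# Hypothetical connections and the hosts they point at.
CONNECTIONS = {
    "npi_registry": "npi.example.com",
    "coverage_db": "coverage.example.com",
    "shadow_api": "unapproved.example.net",
}

# Layer 2: (service principal, connection) pairs holding USE CONNECTION.
USE_CONNECTION_GRANTS = {("appeals-agent-sp", "npi_registry")}


def check_outbound_call(service_principal, connection):
    """Return the outcome of an outbound call under both layers."""
    host = CONNECTIONS[connection]
    if host not in EGRESS_ALLOWLIST:
        return "blocked at network layer"
    if (service_principal, connection) not in USE_CONNECTION_GRANTS:
        return "403: no USE CONNECTION grant"
    # The app never sees the credential; it is attached on the server side.
    return "allowed: proxy injects credential"
```

Each threat row maps to one branch: an unapproved host fails the first check, an approved host without a grant fails the second, and the success branch never hands the credential to the caller.
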
Auth translation: enterprise SSO or API key → Databricks OAuth token. Databricks doesn't natively manage external client identities.
Per-tenant rate limiting: limit by external subscription tier, organization, or API key, not by Databricks identity.
API catalog: OpenAPI specs, versioned endpoints, and subscription management, none of which are provided natively to external clients.
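A rough sketch of what the external gateway resolves for each incoming request: which tenant the API key belongs to, which tier limit applies, and which Databricks identity the call should run as. Everything here is hypothetical (the keys, tiers, limits, and the idea of one service principal per organization), and the actual OAuth token exchange is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    org: str
    tier: str


# Hypothetical gateway-side registry of external API keys.
API_KEYS = {
    "key-123": Tenant(org="partner-a", tier="gold"),
    "key-456": Tenant(org="partner-b", tier="free"),
}

# Hypothetical requests-per-minute limit per subscription tier.
TIER_RPM = {"gold": 600, "free": 60}


def resolve_caller(api_key):
    """Map an external API key to its tenant, limit, and Databricks identity."""
    tenant = API_KEYS.get(api_key)
    if tenant is None:
        raise PermissionError("unknown API key")
    return {
        "org": tenant.org,
        "rpm_limit": TIER_RPM[tenant.tier],
        # Assumption: the gateway requests a Databricks OAuth token for a
        # service principal mapped to the organization.
        "databricks_principal": f"sp-{tenant.org}",
    }
```

The rate limit here is keyed on the external tenant, which is the distinction from AI Gateway limits keyed on Databricks users and groups.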
| Dimension | Databricks AI Gateway | External API Gateway |
|---|---|---|
| Where it sits | On the Databricks serving endpoint | In front of Databricks (customer-managed) |
| Identity awareness | UC-aware — knows Databricks users and groups | Manages external client identities |
| Rate limiting | Per Databricks user / group / endpoint | Per external tenant / subscription / API key |
| Guardrails | Input + output safety, PII, topic filtering | Not provided natively; requires custom plugins |
| Usage tracking | Token-level → system tables + MLflow | Request-level → gateway analytics |
| Auth | Validates Databricks OAuth tokens | Translates external identities → Databricks tokens |
| Use when | Governing LLM consumption within Databricks | Managing access from external enterprise clients |
Fictionalized scenario — all four patterns in one architecture. RegionalCare Health Plan automates context assembly for human appeal reviewers.
| Traffic | Pattern | Governance mechanism |
|---|---|---|
| Case management system → Appeals endpoint | 4 — Ext. Gateway | Auth translation (enterprise SSO → Databricks token), rate limit per org |
| Endpoint LLM consumption | 2 — AI Gateway | Rate limit per reviewer team · usage tracking by dept · content guardrails |
| Agent → Genie (member eligibility, claim history) | 1 — No Gateway | OBO — UC row filters enforce per-reviewer data access |
| Agent → Vector Search (clinical guidelines) | 1 — No Gateway | M2M — shared knowledge, same for all reviewers |
| Agent → NPI Registry + CMS Coverage DB | 3 — UC Connections | USE CONNECTION grant per SP · SNP allows FQDNs · credentials injected server-side |
Enterprise apps · partners · customer portals · external workflows
API · MCP server · external LLM
| Traffic Type | Pattern | Approach |
|---|---|---|
| Agent ↔ Genie · FM API · Agent Bricks · Vector Search · UC Functions | 1 — No Gateway | OBO or M2M · UC row filters at the data plane |
| LLM endpoint — rate limits, guardrails, or cost tracking needed | 2 — AI Gateway | Databricks AI Gateway configured on the endpoint |
| Agent calling external APIs, MCP servers, or external LLMs | 3 — UC Connections | UC HTTP Connections + SNP · credential never in app code |
| External clients (enterprise apps, partners) calling Databricks | 4 — Ext. Gateway | External API Gateway (APIM · Kong · AWS API GW) |
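
The decision table above can be condensed into a small function. This is one possible reading: the three boolean inputs and the precedence among them are assumptions for this sketch, and a real architecture (as in the RegionalCare scenario) applies a different pattern to each traffic flow rather than one pattern overall.

```python
from enum import Enum


class Pattern(Enum):
    NO_GATEWAY = 1
    AI_GATEWAY = 2
    UC_CONNECTIONS = 3
    EXTERNAL_GATEWAY = 4


def choose_pattern(caller_is_external, target_is_external, needs_llm_controls):
    """Pick a pattern for a single traffic flow."""
    if caller_is_external:
        # External clients calling into Databricks.
        return Pattern.EXTERNAL_GATEWAY
    if target_is_external:
        # Agent calling external APIs, MCP servers, or external LLMs.
        return Pattern.UC_CONNECTIONS
    if needs_llm_controls:
        # LLM endpoint needing rate limits, guardrails, or cost tracking.
        return Pattern.AI_GATEWAY
    # Purely internal traffic governed by OBO or M2M and UC row filters.
    return Pattern.NO_GATEWAY


# The RegionalCare flows, expressed with the hypothetical flags above.
flows = {
    "case management -> appeals endpoint": choose_pattern(True, False, False),
    "endpoint LLM consumption": choose_pattern(False, False, True),
    "agent -> Genie": choose_pattern(False, False, False),
    "agent -> NPI Registry": choose_pattern(False, True, False),
}
```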