Pattern: deployments/full-private
Status: ✅ Production Ready
The Full Private (Air-Gapped) pattern provides a fully isolated Azure Databricks deployment. It is intended for:
- ✅ Highly regulated industries (financial services, healthcare)
- ✅ Zero-trust network architectures
- ✅ Air-gapped requirements (no internet access)
- ✅ Strict data residency (all traffic on the Azure backbone)
- ✅ Compliance mandates (HIPAA, PCI-DSS, FedRAMP)
┌──────────────────────────────────────────────────────────────────┐
│ User Network (On-Premises or VPN) │
└────────────┬─────────────────────────────────────────────────────┘
│
│ (Private DNS resolution)
│ (VPN or ExpressRoute)
↓
┌──────────────────────────────────────────────────────────────────┐
│ Customer VNet (VNet Injection) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Private Link Subnet │ │
│ │ ┌────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ PE: UI/API Endpoint │ │ PE: Browser Auth │ │ │
│ │ │ → Control Plane │ │ → Control Plane │ │ │
│ │ └────────────────────────┘ └──────────────────────────┘ │ │
│ │ ┌────────────────────────┐ ┌──────────────────────────┐ │ │
│ │ │ PE: DBFS Storage │ │ PE: UC Storage │ │ │
│ │ │ → Customer Storage │ │ → Customer Storage │ │ │
│ │ └────────────────────────┘ └──────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────┐ ┌──────────────────────────┐ │
│ │ Public/Host Subnet │ │ Private/Container Subnet │ │
│ │ (10.178.0.0/26) │ │ (10.178.1.0/26) │ │
│ │ │ │ │ │
│ │ - Driver Nodes │ │ - Worker Nodes │ │
│ │ - No Public IPs (NPIP) │ │ - No Public IPs (NPIP) │ │
│ │ - NO NAT Gateway │ │ - NO NAT Gateway │ │
│ └────────────────────────────┘ └──────────────────────────┘ │
│ │ │ │
│ └───────────────┬───────────────────┘ │
│ │ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Network Security Group (NSG) │ │
│ │ - Custom rules (when public access disabled) │ │
│ │ - Worker-to-worker communication │ │
│ └────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
│ │
│ (Private Link - Control Plane) │ (Private Link - Storage)
↓ ↓
┌──────────────────────────────────────────────────────────────────┐
│ Databricks Control Plane (Microsoft-Managed) │
│ - SCC Relay (backend Private Link) │
│ - API Service (frontend Private Link) │
│ - Cluster Management │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│ Azure Storage (ADLS Gen2) - Customer Subscription │
│ - DBFS Root Storage (via Private Endpoint) │
│ - UC Metastore Storage (via Private Endpoint) │
│ - UC External Location (via Private Endpoint) │
└──────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────┐
│ Network Connectivity Configuration (NCC) - Optional │
│ - Enables serverless → customer storage connectivity │
│ - PE rules require manual approval in Azure Portal │
│ - Setup: See docs/04-SERVERLESS-SETUP.md │
└────────────────────────────────────────────────────────────────────┘
Legend:
| Feature | Non-PL | Full Private |
|---|---|---|
| Control Plane Access | Public internet | Private Link only |
| User Access | Any internet connection | VPN/ExpressRoute required |
| Internet Egress | ✅ NAT Gateway (PyPI/Maven) | ❌ None (air-gapped) |
| Storage Connectivity | Service Endpoints | Private Link |
| Package Management | Internet repos | Customer repos required |
| Deployment Complexity | Low | High |
| Security Posture | Secure | Maximum security |
sequenceDiagram
actor User
participant VPN as VPN/ExpressRoute
participant PE as Private Endpoint
participant UI as Databricks UI
participant CP as Control Plane
participant Cluster as Cluster VMs<br/>(VNet)
participant Storage as Azure Storage<br/>(Private Link)
User->>VPN: 1. Connect via VPN
VPN->>PE: Private DNS resolution
PE->>UI: Access workspace
UI->>CP: 2. Create Cluster
CP->>Cluster: 3. Provision VMs (NPIP)
Cluster->>CP: 4. Establish SCC (Private Link)
Cluster->>Storage: 5. Access Storage (Private Link)
CP->>User: Cluster RUNNING
Timeline: ~3-5 minutes from creation to ready state
Key Points:
User → VPN/ExpressRoute → Private Link Subnet → Private Endpoint → Databricks UI
Requirements:
- Network connectivity to customer VNet (VPN/ExpressRoute/Bastion)
- Private DNS resolution configured
- IP in allow-list (if IP Access Lists enabled)
Access Methods:
User → Private Endpoint (UI/API) → Databricks Control Plane
├─ POST /api/2.0/clusters/create
├─ Payload: {node_type, count, dbr_version}
└─ Response: Cluster ID (pending state)
Network Path: Private Link → Databricks SaaS (no public internet)
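The cluster creation request above can be sketched as a small helper that builds the JSON body. The field names (`cluster_name`, `spark_version`, `node_type_id`, `num_workers`) are those of the Clusters API; the function name, the example values, and the `WORKSPACE_HOST` and `DATABRICKS_TOKEN` variables are illustrative assumptions, not part of this deployment.

```shell
# Build the JSON body for POST /api/2.0/clusters/create.
# Arguments: cluster name, runtime version, node type, worker count.
build_cluster_payload() (
  name=$1; spark_version=$2; node_type=$3; workers=$4
  printf '{"cluster_name": "%s", "spark_version": "%s", "node_type_id": "%s", "num_workers": %d}\n' \
    "$name" "$spark_version" "$node_type" "$workers"
)

# Print an example payload (values are placeholders).
build_cluster_payload demo 14.3.x-scala2.12 Standard_D4ds_v5 2

# Hypothetical call from a machine that resolves the workspace via Private Link:
# curl -sS -X POST "https://${WORKSPACE_HOST}/api/2.0/clusters/create" \
#   -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
#   -d "$(build_cluster_payload demo 14.3.x-scala2.12 Standard_D4ds_v5 2)"
```

The request only succeeds from a client that reaches the workspace through the frontend Private Endpoint, since there is no public path.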
sequenceDiagram
participant CP as Control Plane
participant ARM as Azure ARM
participant VNet as Customer VNet
CP->>ARM: Create VMs (no public IPs)
ARM->>VNet: Provision Driver (Public Subnet)
ARM->>VNet: Provision Workers (Private Subnet)
VNet-->>ARM: VMs Created
ARM-->>CP: Provisioning Complete
Resources Created:
Cluster VMs → Private Endpoint (Backend) → Control Plane SCC Relay
Protocol: HTTPS/WebSocket (443)
Direction: Outbound only (VNet initiates)
Purpose: Cluster management, commands, monitoring
Routing: Private Link (NOT via public internet)
Private Link Architecture:
graph LR
Cluster[Cluster VMs]
PE[Private Endpoint]
subgraph Storage["Azure Storage (Customer)"]
DBFS[DBFS Root]
UC[UC Metastore]
ExtLoc[External Location]
end
Cluster --> PE
PE --> DBFS
PE --> UC
PE --> ExtLoc
style PE fill:#e8f5e9
style Storage fill:#fff9c4
Access Pattern:
Authentication: Managed Identity (Access Connector) via RBAC
❌ NO internet access for packages
Customer Responsibilities:
1. Host internal PyPI mirror (e.g., JFrog Artifactory, Nexus)
2. Configure pip.conf to point to internal mirror
3. Pre-install libraries via init scripts
4. Use private container registry for custom images
Example Init Script:
#!/bin/bash
# Configure pip to use internal PyPI mirror
cat > /etc/pip.conf << EOF
[global]
index-url = https://pypi.company.internal/simple
trusted-host = pypi.company.internal
EOF
# Install common libraries from internal mirror
pip install pandas numpy scikit-learn
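Before baking the mirror settings into an init script, it can help to dry-run them against a scratch file. The sketch below is a parameterized variant of the script above; the function name `write_pip_conf` is hypothetical, and `pypi.company.internal` is the same placeholder hostname used in the example.

```shell
# Write a pip configuration that points at an internal mirror.
# $1 = destination file (/etc/pip.conf on a cluster node)
# $2 = mirror hostname
write_pip_conf() (
  dest=$1; mirror_host=$2
  cat > "$dest" <<EOF
[global]
index-url = https://${mirror_host}/simple
trusted-host = ${mirror_host}
EOF
)

# Dry run: write to a temporary file, inspect it, then clean up.
scratch=$(mktemp)
write_pip_conf "$scratch" pypi.company.internal
cat "$scratch"
rm -f "$scratch"
```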
| Traffic Type | Source | Destination | Path | Authentication |
|---|---|---|---|---|
| UI/API Access | User | Databricks SaaS | VPN → Private Endpoint | AAD/Bearer Token |
| Control Plane (SCC) | Cluster VMs | Databricks SaaS | Private Endpoint (Backend) | Databricks-managed |
| DBFS Access | Cluster VMs | DBFS (Customer) | Private Endpoint | Managed Identity |
| UC Metastore | Cluster VMs | UC Storage (Customer) | Private Endpoint | Managed Identity |
| External Data | Cluster VMs | External Location | Private Endpoint | Managed Identity |
| Worker-to-Worker | Worker VMs | Worker VMs | Within VNet | N/A |
| Logs/Metrics | Cluster VMs | Event Hub | Private Endpoint (optional) | Databricks-managed |
| Package Downloads | ❌ | ❌ | NONE (air-gapped) | N/A |
Key Routing:
| Feature | Status | Details |
|---|---|---|
| Secure Cluster Connectivity (NPIP) | ✅ Always enabled | No public IPs on clusters |
| VNet Injection | ✅ Always enabled | Deploy into customer VNet |
| Private Link (Control Plane) | ✅ Always enabled | Frontend (UI/API) + Backend (SCC) |
| Private Link (Storage) | ✅ Always enabled | All storage via Private Endpoints |
| Unity Catalog | ✅ Mandatory | Data governance and access control |
| Customer-Managed Keys (CMK) | ✅ Default enabled | Managed services + Disks + DBFS |
| BYOV Support | ✅ Optional | Bring Your Own VNet/Subnets/NSG |
| IP Access Lists | ✅ Optional | Restrict workspace access by IP |
| Private DNS Zones | ✅ Auto-created | Azure-integrated DNS for Private Endpoints |
| Service Endpoint Policy (SEP) | ✅ Optional | Storage egress control for classic compute |
| NCC (Serverless) | ✅ Optional | Private Link for serverless compute |
| Feature | Status | Reason | Alternative |
|---|---|---|---|
| NAT Gateway | ❌ Not included | Air-gapped design | Use internal package repos |
| Service Endpoints | ❌ Not used | Private Link provides stronger isolation | N/A |
| Public Internet Egress | ❌ Not allowed | Air-gapped requirement | Internal repos required |
Required:
Network Requirements:
- VNet: /16 to /20 recommended
- Public/Host subnet: /26 minimum (64 IPs)
- Private/Container subnet: /26 minimum (64 IPs)
- Private Link subnet: /27 minimum (32 IPs)
# 1. Navigate to deployment folder
cd deployments/full-private
# 2. Copy and configure variables
cp terraform.tfvars.example terraform.tfvars
vim terraform.tfvars
# 3. Initialize Terraform
terraform init
# 4. Review deployment plan
terraform plan
# 5. Deploy
terraform apply
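When choosing subnet CIDRs for terraform.tfvars, note that the IP counts quoted under Network Requirements are total addresses. Azure reserves 5 addresses in every subnet, so fewer are usable. A minimal sketch of that arithmetic follows; the function name is illustrative.

```shell
# Usable addresses in an Azure subnet with the given prefix length.
# Total addresses are 2^(32 - prefix); Azure reserves 5 per subnet.
subnet_usable_ips() (
  prefix=$1
  total=$(( 1 << (32 - prefix) ))
  echo $(( total - 5 ))
)

for prefix in 26 27; do
  echo "/${prefix}: $(subnet_usable_ips "$prefix") usable addresses"
done
```

A /26 therefore leaves 59 usable addresses and a /27 leaves 27, which is worth keeping in mind when planning for growth.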
Full-Private deployments rely on Azure Private DNS zones to resolve Private Endpoint FQDNs to private IP addresses within your VNet. This ensures all traffic stays on the Azure backbone and never traverses the public internet.
graph TB
User[User/Application]
VNet[Customer VNet]
subgraph "Private DNS Zones"
DBDNSZone["privatelink.azuredatabricks.net"]
DFSDNSZone["privatelink.dfs.core.windows.net"]
BlobDNSZone["privatelink.blob.core.windows.net"]
end
subgraph "Private Endpoints"
UIPE[UI/API PE<br/>databricks_ui_api]
AuthPE[Browser Auth PE<br/>browser_authentication]
DBFSPE[DBFS Storage PE<br/>dfs]
UCPE[UC Storage PE<br/>dfs]
end
User -->|1. Query| VNet
VNet -->|2. DNS Lookup| DBDNSZone
VNet -->|2. DNS Lookup| DFSDNSZone
DBDNSZone -->|3. Returns Private IP| VNet
DFSDNSZone -->|3. Returns Private IP| VNet
VNet -->|4. Connect via Private IP| UIPE
VNet -->|4. Connect via Private IP| DBFSPE
style DBDNSZone fill:#e1f5fe
style DFSDNSZone fill:#e1f5fe
style BlobDNSZone fill:#e1f5fe
style UIPE fill:#c8e6c9
style AuthPE fill:#c8e6c9
Key Components:
This deployment automatically creates and configures three Private DNS zones:
| DNS Zone | Purpose | Resources |
|---|---|---|
| privatelink.azuredatabricks.net | Databricks Control Plane access | UI/API endpoint, Browser Auth |
| privatelink.dfs.core.windows.net | ADLS Gen2 Data Lake Storage | DBFS, UC Metastore, UC External |
| privatelink.blob.core.windows.net | Blob Storage (legacy/fallback) | DBFS Blob endpoint |
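To see which of the three zones should answer for a given hostname, the mapping in the table can be expressed as a small lookup. This is a sketch for reasoning about the zones, not something the deployment ships; the function name is hypothetical.

```shell
# Map a public FQDN to the Private DNS zone that should hold its record.
# Returns a non-zero status for hostnames outside the three zones.
privatelink_zone_for() (
  case "$1" in
    *.azuredatabricks.net)   echo "privatelink.azuredatabricks.net" ;;
    *.dfs.core.windows.net)  echo "privatelink.dfs.core.windows.net" ;;
    *.blob.core.windows.net) echo "privatelink.blob.core.windows.net" ;;
    *) echo "unknown"; exit 1 ;;
  esac
)

privatelink_zone_for adb-1234567890123456.12.azuredatabricks.net
```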
Auto-Configuration:
Databricks Private Link uses two distinct sub-resource types for different access patterns:
databricks_ui_api (Workspace-Specific)
Purpose: Direct workspace access for UI, REST API, and data plane communication (SCC)
FQDN Pattern:
adb-<workspace-id>.<random-id>.azuredatabricks.net
Use Cases:
Characteristics:
Example:
# Workspace URL
https://adb-1234567890123456.12.azuredatabricks.net
# DNS Resolution (via Private Link)
nslookup adb-1234567890123456.12.azuredatabricks.net
# Answer: 10.178.2.10 (Private IP in Private Link subnet)
browser_authentication (Regional, Shared)
Purpose: Azure AD authentication redirect for browser-based login
FQDN Pattern:
adb-<workspace-id>.azuredatabricks.net (no random-id)
Use Cases:
Characteristics:
Example:
# Auth URL (during Azure AD login)
https://adb-1234567890123456.azuredatabricks.net/login.html
# DNS Resolution (via Private Link)
nslookup adb-1234567890123456.azuredatabricks.net
# Answer: 10.178.2.11 (Private IP in Private Link subnet)
sequenceDiagram
accTitle: DNS Resolution Flow for Databricks Private Link Access
accDescr: This diagram shows the 10-step process of how DNS resolution works when accessing a Databricks workspace via Private Link
actor User
participant Browser
participant VPN as VPN/ExpressRoute
participant DNS as Private DNS Zone
participant PE as Private Endpoint
participant WS as Databricks Workspace
rect rgb(230, 240, 255)
Note over User,WS: Phase 1: Initial Access
User->>Browser: 1. Navigate to workspace URL
Browser->>VPN: 2. DNS query (adb-<workspace-id>.<random>.azuredatabricks.net)
end
rect rgb(255, 245, 230)
Note over VPN,DNS: Phase 2: DNS Resolution
VPN->>DNS: 3. Lookup in privatelink.azuredatabricks.net
DNS->>VPN: 4. Return Private IP (10.178.2.10)
end
rect rgb(230, 255, 240)
Note over VPN,WS: Phase 3: Connection Establishment
VPN->>PE: 5. Connect to Private IP
PE->>WS: 6. Forward to Databricks Control Plane
end
rect rgb(255, 240, 245)
Note over User,WS: Phase 4: Authentication (Azure AD)
WS->>Browser: 7. Redirect to Azure AD (via browser_authentication PE)
Browser->>DNS: 8. DNS lookup (adb-<workspace-id>.azuredatabricks.net)
DNS->>Browser: 9. Return Private IP (10.178.2.11)
Browser->>WS: 10. Complete auth, access workspace
end
Timeline:
Region: East US 2
Workspace A:
├─ databricks_ui_api: adb-1111111111111111.12.azuredatabricks.net → 10.178.2.10
└─ browser_authentication: adb-1111111111111111.azuredatabricks.net → 10.178.2.11
Workspace B:
├─ databricks_ui_api: adb-2222222222222222.12.azuredatabricks.net → 10.178.2.12
└─ browser_authentication: adb-2222222222222222.azuredatabricks.net → 10.178.2.11 (SHARED)
Key Points:
- Each workspace needs its own databricks_ui_api Private Endpoint
- One browser_authentication endpoint can be shared (regional)
- One privatelink.azuredatabricks.net DNS zone serves all workspaces
Cost Optimization: Sharing the browser_authentication endpoint reduces Private Endpoint costs in multi-workspace deployments.
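The saving from a shared browser_authentication endpoint is simple to quantify. The sketch below counts only the Databricks control plane endpoints in one region (storage endpoints are excluded); both function names are illustrative.

```shell
# One databricks_ui_api endpoint per workspace plus one shared
# browser_authentication endpoint for the region.
shared_auth_pe_count() (
  workspaces=$1
  echo $(( workspaces + 1 ))
)

# For comparison: a dedicated browser_authentication endpoint per workspace.
dedicated_auth_pe_count() (
  workspaces=$1
  echo $(( workspaces * 2 ))
)

for n in 2 5 10; do
  echo "workspaces=$n shared=$(shared_auth_pe_count "$n") dedicated=$(dedicated_auth_pe_count "$n")"
done
```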
Region: East US 2
VNet: 10.178.0.0/20
DNS Zone: privatelink.azuredatabricks.net (linked to VNet)
├─ Workspace A (East US 2): adb-1111111111111111.12.azuredatabricks.net
└─ Workspace B (East US 2): adb-2222222222222222.12.azuredatabricks.net
Region: West US 2
VNet: 10.179.0.0/20
DNS Zone: privatelink.azuredatabricks.net (linked to VNet)
├─ Workspace C (West US 2): adb-3333333333333333.10.azuredatabricks.net
└─ Workspace D (West US 2): adb-4444444444444444.10.azuredatabricks.net
Architecture:
Cross-Region Access:
Storage Account: <workspace-prefix>dbfs<suffix>.dfs.core.windows.net
DNS Resolution Flow:
1. Cluster queries: <storage-account>.dfs.core.windows.net
2. Azure DNS redirects: <storage-account>.privatelink.dfs.core.windows.net
3. Private DNS zone returns: 10.178.2.20 (Private IP)
4. Cluster connects via Private Link
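Step 2 of the flow above is a plain string rewrite: `privatelink` is inserted after the storage account name. A minimal sketch, with a hypothetical function name and a placeholder account name:

```shell
# Derive the privatelink name that a storage FQDN is redirected to.
storage_privatelink_name() (
  fqdn=$1
  account=${fqdn%%.*}   # text before the first dot
  suffix=${fqdn#*.}     # text after the first dot
  echo "${account}.privatelink.${suffix}"
)

storage_privatelink_name mystorageacct.dfs.core.windows.net
```

The resulting name is the record to look for in the matching Private DNS zone when a storage lookup fails.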
DNS Records Created:
| Resource | Public FQDN | Private DNS Record | Private IP |
|---|---|---|---|
| DBFS Storage (DFS) | <prefix>dbfs<suffix>.dfs.core.windows.net | <prefix>dbfs<suffix>.privatelink.dfs.core.windows.net | 10.178.2.20 |
| UC Metastore (DFS) | <prefix>uc<suffix>.dfs.core.windows.net | <prefix>uc<suffix>.privatelink.dfs.core.windows.net | 10.178.2.21 |
| UC External (DFS) | <prefix>ext<suffix>.dfs.core.windows.net | <prefix>ext<suffix>.privatelink.dfs.core.windows.net | 10.178.2.22 |
Auto-Configured by Terraform:
# List all Private DNS zones
az network private-dns zone list \
--resource-group <rg-name> \
--output table
# Expected output:
# Name ResourceGroup Location
# ------------------------------------- ------------------- --------
# privatelink.azuredatabricks.net <rg-name> global
# privatelink.dfs.core.windows.net <rg-name> global
# privatelink.blob.core.windows.net <rg-name> global
# Check VNet links for Databricks DNS zone
az network private-dns link vnet list \
--resource-group <rg-name> \
--zone-name privatelink.azuredatabricks.net \
--output table
# Expected: VNet should be linked with registrationEnabled=false
From within VNet (via VPN, bastion, or VM):
# Test Databricks workspace resolution
nslookup adb-<workspace-id>.<random-id>.azuredatabricks.net
# Expected Output:
# Server: <dns-server>
# Address: <dns-server-ip>
#
# Non-authoritative answer:
# Name: adb-<workspace-id>.<random-id>.azuredatabricks.net
# Address: 10.178.2.10 ← Private IP (not public)
# Test storage resolution
nslookup <storage-account>.dfs.core.windows.net
# Expected Output:
# Name: <storage-account>.privatelink.dfs.core.windows.net
# Address: 10.178.2.20 ← Private IP
Important: DNS resolution must return private IPs (10.178.x.x in these examples), not public IPs. If you see public IPs, either the Private DNS zone is not correctly linked to your VNet or the client is not using a resolver that can see the zone.
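When checking many hostnames, the private-versus-public test can be scripted. The sketch below classifies an IPv4 address against the RFC 1918 private ranges; the function name is illustrative, and it assumes the address is already extracted from the nslookup output.

```shell
# Succeeds (exit status 0) when the IPv4 address is in an RFC 1918 range:
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_private_ipv4() (
  ip=$1
  first=${ip%%.*}
  rest=${ip#*.}
  second=${rest%%.*}
  case "$first" in
    10)  exit 0 ;;
    172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;
    192) [ "$second" -eq 168 ] ;;
    *)   exit 1 ;;
  esac
)

for ip in 10.178.2.10 20.62.1.1; do
  if is_private_ipv4 "$ip"; then
    echo "$ip: private"
  else
    echo "$ip: PUBLIC (check DNS zone link and resolver)"
  fi
done
```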
# List A records in Databricks DNS zone
az network private-dns record-set a list \
--resource-group <rg-name> \
--zone-name privatelink.azuredatabricks.net \
--output table
# Expected: A records for workspace UI/API and browser auth
# List A records in DFS DNS zone
az network private-dns record-set a list \
--resource-group <rg-name> \
--zone-name privatelink.dfs.core.windows.net \
--output table
# Expected: A records for DBFS, UC metastore, UC external storage
Symptoms:
$ nslookup adb-<workspace-id>.<random-id>.azuredatabricks.net
Server: <dns-server>
Address: <dns-server-ip>
** server can't find adb-<workspace-id>.<random-id>.azuredatabricks.net: NXDOMAIN
Root Causes:
Solution:
# 1. Verify Private DNS zone exists
az network private-dns zone show \
--resource-group <rg-name> \
--name privatelink.azuredatabricks.net
# 2. Check VNet link
az network private-dns link vnet show \
--resource-group <rg-name> \
--zone-name privatelink.azuredatabricks.net \
--name <link-name>
# 3. Verify Private Endpoint DNS integration
az network private-endpoint show \
--resource-group <rg-name> \
--name <pe-name> \
--query 'privateDnsZoneGroups[0].privateDnsZoneConfigs[0].privateDnsZoneId'
Symptoms:
$ nslookup adb-<workspace-id>.<random-id>.azuredatabricks.net
Name: adb-<workspace-id>.<random-id>.azuredatabricks.net
Address: 20.62.x.x ← Public IP (WRONG!)
Root Causes:
Solution:
# Ensure you're connected via VPN/ExpressRoute
# Verify DNS resolver is set to Azure DNS
# Windows:
ipconfig /all
# Check DNS Servers: Should include 168.63.129.16 or VNet DNS
# Linux:
cat /etc/resolv.conf
# nameserver 168.63.129.16 (or VNet DNS)
# macOS:
scutil --dns
# Should route through VPN DNS
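On a Linux VM inside the VNet, the resolver check above can be automated. This is a sketch under the assumption that the machine uses Azure-provided DNS directly; on-premises clients usually reach it through a DNS forwarder instead, since 168.63.129.16 is only reachable from within Azure. The function name is hypothetical.

```shell
# Succeeds when the given resolv.conf-style file lists Azure-provided DNS.
# Defaults to /etc/resolv.conf when no file is passed.
uses_azure_dns() (
  file=${1:-/etc/resolv.conf}
  [ -r "$file" ] && \
    grep -Eq '^nameserver[[:space:]]+168\.63\.129\.16([[:space:]]|$)' "$file"
)

if uses_azure_dns; then
  echo "Resolver is Azure-provided DNS"
else
  echo "Resolver is not Azure-provided DNS (custom DNS or forwarder in use?)"
fi
```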
Symptoms:
Error: abfss://<container>@<storage-account>.dfs.core.windows.net: Name or service not known
Root Causes:
Solution:
# 1. Verify storage Private Endpoint
az network private-endpoint list \
--resource-group <rg-name> \
--query "[?contains(name, 'dbfs') || contains(name, 'uc')].{Name:name, State:privateLinkServiceConnections[0].privateLinkServiceConnectionState.status}" \
--output table
# Expected: All endpoints in "Approved" state
# 2. Test DNS resolution
nslookup <storage-account>.dfs.core.windows.net
# Should return private IP (10.x.x.x)
# 3. Check storage firewall rules
az storage account show \
--name <storage-account> \
--resource-group <rg-name> \
--query networkRuleSet.defaultAction
# If "Deny", ensure Private Endpoint is approved
- *.azuredatabricks.net to Azure DNS
- privatelink.* zones to public DNS
For detailed network architecture, including DNS flow for multi-workspace hub-and-spoke deployments, see:
After successful deployment, disable public network access:
# terraform.tfvars
enable_public_network_access = false
terraform apply
This enforces Private Link only access (no public internet).
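A small guard can confirm the setting is actually present before `terraform apply` is run, for example in a CI step. This is a sketch; the function name is illustrative and the check is a plain text match on the variables file, not a Terraform evaluation.

```shell
# Succeeds when the tfvars file sets enable_public_network_access = false.
# Defaults to terraform.tfvars in the current directory.
public_access_disabled() (
  tfvars=${1:-terraform.tfvars}
  [ -r "$tfvars" ] && \
    grep -Eq '^[[:space:]]*enable_public_network_access[[:space:]]*=[[:space:]]*false([[:space:]]|#|$)' "$tfvars"
)

if [ -f terraform.tfvars ]; then
  if public_access_disabled terraform.tfvars; then
    echo "Public network access is disabled"
  else
    echo "Public network access is still enabled" >&2
  fi
fi
```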
Ensure Private DNS zones are linked to your VNet:
# List Private DNS zones
az network private-dns zone list --output table
# Link to VNet (if not auto-linked)
az network private-dns link vnet create \
--resource-group <rg-name> \
--zone-name privatelink.azuredatabricks.net \
--name databricks-dns-link \
--virtual-network <vnet-id> \
--registration-enabled false
Required for air-gapped deployments:
Configure Init Scripts:
#!/bin/bash
# /dbfs/init-scripts/configure-repos.sh
# Configure pip
cat > /etc/pip.conf << EOF
[global]
index-url = https://pypi.company.internal/simple
trusted-host = pypi.company.internal
EOF
# Configure Maven
mkdir -p /home/ubuntu/.m2
cat > /home/ubuntu/.m2/settings.xml << EOF
<settings>
<mirrors>
<mirror>
<id>company-maven</id>
<url>https://maven.company.internal/repository/maven-central/</url>
<mirrorOf>central</mirrorOf>
</mirror>
</mirrors>
</settings>
EOF
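Because clusters have no internet egress, any init script that still references a public package host will hang until it times out. A quick scan can catch that before the script is attached to a cluster. The sketch below is an assumption about how one might check; the function name and the list of hosts are illustrative and may need extending for other ecosystems.

```shell
# Print (with line numbers) any lines that reference public package hosts.
# Exit status 0 means at least one public reference was found.
find_public_repo_refs() (
  grep -nE 'pypi\.org|files\.pythonhosted\.org|repo1\.maven\.org|repo\.maven\.apache\.org' "$1"
)

# Example usage against a local copy of the init script:
# if find_public_repo_refs configure-repos.sh; then
#   echo "Public repository references found; replace them with internal mirrors" >&2
# fi
```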
Private Link Isolation:
Air-Gapped Architecture:
Unity Catalog:
Storage Security:
Customer-Managed Keys (CMK):
Issue: Cannot Access Workspace
Error: Unable to connect to workspace URL
Solution:
nslookup <workspace-url>
Issue: Cluster Cannot Start
Symptom: Cluster stuck in PENDING state
Solution:
Issue: Cannot Install Libraries
Error: pip install fails with connection timeout
Solution: This is expected in air-gapped deployments. Libraries cannot be downloaded from the internet.
Workarounds:
- /27 minimum, plan for growth
- enable_public_network_access = false after initial deployment
Pattern Version: 1.0
Status: ✅ Production Ready
Terraform Version: >= 1.5