Visual Guide: Understand the complete deployment architecture through modular diagrams.
- 7 Terraform Modules → 65-70 AWS/Databricks Resources
- 15-20 minutes deployment time
- Private Link + Unity Catalog + CMK Encryption
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
%%{init: {'flowchart': {'htmlLabels': false}}}%%
graph TB
subgraph "AWS Account"
subgraph "VPC 10.0.0.0/22"
subgraph "Public Subnets /26"
NAT["NAT Gateways<br/>2 AZs<br/>High Availability"]
IGW["Internet<br/>Gateway"]
end
subgraph "Private Subnets /24 - Databricks Clusters"
CLUSTER["Cluster Nodes<br/>Spark Workers<br/>502 IPs total"]
end
subgraph "PrivateLink Subnets /26 - VPC Endpoints"
VPCE["VPC Endpoints<br/>• Workspace 8443-8451<br/>• Relay SCC 6666<br/>• AWS Services"]
end
subgraph "Storage Layer"
S3["S3 Buckets<br/>• DBFS Root<br/>• UC Metastore<br/>• UC External<br/>KMS Encrypted"]
end
end
subgraph "IAM Layer"
ROLES["IAM Roles<br/>• Cross-Account<br/>• UC Metastore<br/>• UC External<br/>• Instance Profile"]
end
subgraph "Encryption Layer"
KMS["KMS Keys<br/>• S3 Buckets<br/>• Workspace CMK<br/> DBFS/EBS/MS"]
end
end
subgraph "Databricks Control Plane"
CONTROL["Databricks SaaS<br/>accounts.cloud.databricks.com"]
end
subgraph "Unity Catalog"
UC["Metastore<br/>Catalogs<br/>External Locations"]
end
CLUSTER -->|Private Link| VPCE
VPCE -.->|Backend Private| CONTROL
CLUSTER -->|NAT| NAT
NAT --> IGW
CLUSTER -->|Gateway Endpoint| S3
ROLES -->|Permissions| S3
ROLES -->|Permissions| KMS
KMS -->|Encrypts| S3
CONTROL -->|Provisions| UC
UC -->|Stores Metadata| S3
style CONTROL fill:#FF3621
style S3 fill:#569A31
style VPCE fill:#FF9900
style UC fill:#1B72E8
Key Components:
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
flowchart TD
START["terraform apply"] --> NET["1. Networking Module<br/>VPC, Subnets, Security Groups<br/>VPC Endpoints"]
NET --> IAM["2. IAM Module<br/>Cross-Account Role<br/>UC Metastore Role<br/>Instance Profile"]
IAM --> KMS["3. KMS Module Optional<br/>S3 Encryption Key<br/>Workspace CMK<br/>+ UC Role KMS Policy"]
KMS --> STORAGE["4. Storage Module<br/>S3 Buckets<br/>DBFS Root<br/>UC Buckets"]
STORAGE --> WORKSPACE["5. Databricks Workspace<br/>MWS Resources<br/>Private Access Settings<br/>Workspace Creation"]
WORKSPACE --> UC["6. Unity Catalog Module<br/>Metastore Assignment<br/>External Location<br/>Workspace Catalog<br/>+ External Role KMS Policy"]
UC --> USER["7. User Assignment<br/>Workspace Admin<br/>Permissions"]
USER --> END["Deployment Complete"]
style START fill:#569A31
style END fill:#1B72E8
style KMS fill:#FF9900
style UC fill:#FF3621
Critical Dependencies:
Docs: Databricks Terraform Provider
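The ordering in the flowchart above can be expressed as a dependency graph and sorted topologically. The following is a minimal sketch, assuming each module depends only on the one drawn immediately before it; the module labels are illustrative names, not identifiers taken from the Terraform code.

```python
from graphlib import TopologicalSorter

# Each module maps to the modules that must be applied before it,
# following the arrows in the deployment flowchart.
# The labels are illustrative, not real Terraform module names.
DEPENDS_ON = {
    "networking": set(),
    "iam": {"networking"},
    "kms": {"iam"},
    "storage": {"kms"},
    "workspace": {"storage"},
    "unity_catalog": {"workspace"},
    "user_assignment": {"unity_catalog"},
}


def deployment_order(depends_on):
    """Return one valid apply order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


order = deployment_order(DEPENDS_ON)
print(" -> ".join(order))
```

Terraform derives this ordering itself from resource references, so a sketch like this is mainly useful for reasoning about which module a failure will block.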
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
graph TB
subgraph "VPC 10.0.0.0/22 1024 IPs"
subgraph "AZ-1 us-west-1a"
PUB1["Public Subnet<br/>10.0.0.0/26<br/>59 IPs<br/>NAT GW"]
PRIV1["Private Subnet<br/>10.0.1.0/24<br/>251 IPs<br/>Clusters"]
PL1["PrivateLink Subnet<br/>10.0.3.0/26<br/>59 IPs<br/>VPC Endpoints"]
end
subgraph "AZ-2 us-west-1c"
PUB2["Public Subnet<br/>10.0.0.64/26<br/>59 IPs<br/>NAT GW"]
PRIV2["Private Subnet<br/>10.0.2.0/24<br/>251 IPs<br/>Clusters"]
PL2["PrivateLink Subnet<br/>10.0.3.64/26<br/>59 IPs<br/>VPC Endpoints"]
end
end
PUB1 -.->|Internet| IGW[Internet Gateway]
PUB2 -.->|Internet| IGW
PRIV1 -->|via| PUB1
PRIV2 -->|via| PUB2
style PRIV1 fill:#569A31
style PRIV2 fill:#569A31
style PL1 fill:#FF9900
style PL2 fill:#FF9900
IP Allocation:
Docs: VPC and Subnets
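The subnet sizes in the diagram can be checked with Python's standard `ipaddress` module. AWS reserves five addresses in every subnet, so a /26 yields 59 usable addresses and a /24 yields 251. This is a minimal sketch using the CIDR blocks from the diagram; the subnet names are illustrative labels.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5

VPC = ipaddress.ip_network("10.0.0.0/22")

# CIDR blocks from the subnet diagram; the keys are illustrative names.
SUBNETS = {
    "public-az1": "10.0.0.0/26",
    "public-az2": "10.0.0.64/26",
    "private-az1": "10.0.1.0/24",
    "private-az2": "10.0.2.0/24",
    "privatelink-az1": "10.0.3.0/26",
    "privatelink-az2": "10.0.3.64/26",
}


def usable_ips(cidr):
    """Addresses available to workloads after the AWS reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def validate(vpc, subnets):
    """Raise ValueError if a subnet falls outside the VPC or overlaps another."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            raise ValueError(f"{name} ({net}) is outside {vpc}")
    names = list(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                raise ValueError(f"{first} overlaps {second}")


validate(VPC, SUBNETS)
for name, cidr in SUBNETS.items():
    print(f"{name:16} {cidr:15} {usable_ips(cidr)} usable IPs")
```

The two private subnets together provide 502 addresses for cluster nodes, which matches the figure in the architecture overview.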
Private Subnet Route Table:

| Destination | Target | Description |
|---|---|---|
| 10.0.0.0/22 | local | VPC-internal traffic |
| 0.0.0.0/0 | nat-gateway | Internet via NAT |
PrivateLink Subnet Route Table:

| Destination | Target | Description |
|---|---|---|
| 10.0.0.0/22 | local | VPC-internal only |
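
Route tables are evaluated by longest-prefix match: the most specific route that contains the destination wins. The sketch below models the two tables above to show why private subnets can reach the internet and PrivateLink subnets cannot. It covers only the routes listed in the tables, and the function name is hypothetical.

```python
import ipaddress

# Routes copied from the two route tables above: (destination CIDR, target).
PRIVATE_ROUTES = [
    ("10.0.0.0/22", "local"),
    ("0.0.0.0/0", "nat-gateway"),
]
PRIVATELINK_ROUTES = [
    ("10.0.0.0/22", "local"),
]


def resolve_target(routes, destination):
    """Return the target of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches)[1]


print(resolve_target(PRIVATE_ROUTES, "10.0.3.12"))      # VPC-internal
print(resolve_target(PRIVATE_ROUTES, "151.101.0.223"))  # public internet
print(resolve_target(PRIVATELINK_ROUTES, "151.101.0.223"))
```

Because the PrivateLink table has no default route, anything placed in those subnets can only talk to addresses inside the VPC.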
%%{init: {'theme': 'base'}}%%
sequenceDiagram
autonumber
actor User as User/Admin
participant WS as Databricks<br/>Workspace UI
participant CP as Control Plane<br/>via Private Link
participant VPC as Customer VPC
participant CLUSTER as Spark Cluster
participant S3 as S3 DBFS/UC
participant UC as Unity Catalog
User->>WS: Create Cluster
WS->>CP: API Call dbc-*.cloud.databricks.com:8443
Note over WS,CP: DNS returns private IP 10.0.3.x
CP->>VPC: Launch EC2 Instances
VPC->>CLUSTER: Provision nodes in private subnets
CLUSTER->>CP: Register via Relay VPCE:6666
Note over CLUSTER,CP: Secure Cluster Connectivity
CLUSTER->>S3: Mount DBFS via Gateway Endpoint
CLUSTER->>UC: Query catalog metadata
UC-->>CLUSTER: Return table locations
CLUSTER->>S3: Read/Write data with UC permissions
CLUSTER-->>User: Cluster Ready
Timeline:
Docs: Cluster Creation
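Step 2 of the sequence depends on DNS returning a private address in the PrivateLink subnets (10.0.3.x). A quick way to confirm this is to resolve the workspace hostname from inside the VPC and test the result. The following is a minimal sketch, assuming the PrivateLink CIDR blocks from the subnet diagram; the function name is hypothetical.

```python
import ipaddress

# PrivateLink subnets from the subnet diagram.
PRIVATELINK_SUBNETS = [
    ipaddress.ip_network("10.0.3.0/26"),
    ipaddress.ip_network("10.0.3.64/26"),
]


def resolves_via_private_link(resolved_ip):
    """True if the resolved address belongs to a PrivateLink subnet."""
    address = ipaddress.ip_address(resolved_ip)
    return any(address in subnet for subnet in PRIVATELINK_SUBNETS)


# Example addresses chosen for illustration only.
print(resolves_via_private_link("10.0.3.12"))
print(resolves_via_private_link("10.0.1.25"))
```

On a host inside the VPC, the input could come from `socket.gethostbyname(hostname)` for the workspace URL. A public address in the result would suggest that private DNS for the VPC endpoint is not in effect.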
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
flowchart TD
START["Cluster Node<br/>Initiates Traffic"] --> DNS{DNS Query<br/>What is destination?}
DNS -->|S3 bucket| S3PATH["S3 Gateway Endpoint<br/>FREE, VPC-internal"]
DNS -->|dbc-*.cloud.databricks.com| DBDNS{Private Link<br/>Enabled?}
DNS -->|Public internet| NATPATH["NAT Gateway<br/>→ Internet Gateway"]
DBDNS -->|Yes| PRIV["Private IP 10.0.3.x<br/>→ VPC Endpoint<br/>→ Private Link"]
DBDNS -->|No| NATPATH
S3PATH --> S3["S3 Buckets<br/>DBFS, Unity Catalog"]
PRIV --> CONTROL["Databricks<br/>Control Plane"]
NATPATH --> INTERNET["Public Internet<br/>Maven, PyPI, etc"]
style S3PATH fill:#569A31
style PRIV fill:#FF9900
style NATPATH fill:#FF3621
Key Decision Points:
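The decision flow in the flowchart above can be sketched as a small function. The hostname checks here are simplified assumptions made for illustration: real routing is decided by DNS answers and route tables, not by string matching, and the returned labels are hypothetical.

```python
from fnmatch import fnmatch


def route_for(hostname, enable_private_link):
    """Classify the path a cluster node's traffic takes, per the flowchart."""
    host = hostname.lower().rstrip(".")

    # Assumption: treat any *.amazonaws.com name containing an "s3" label
    # as S3 traffic, which goes through the gateway endpoint.
    if host.endswith(".amazonaws.com") and (host.startswith("s3.") or ".s3." in host):
        return "s3-gateway-endpoint"

    # Workspace URLs follow the dbc-*.cloud.databricks.com pattern.
    if fnmatch(host, "dbc-*.cloud.databricks.com"):
        return "private-link" if enable_private_link else "nat-gateway"

    # Everything else (Maven, PyPI, ...) leaves through the NAT gateway.
    return "nat-gateway"


# Example hostnames are made up for illustration.
for name in ("my-bucket.s3.us-west-1.amazonaws.com",
             "dbc-a1b2c3d4-e5f6.cloud.databricks.com",
             "pypi.org"):
    print(name, "->", route_for(name, enable_private_link=True))
```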
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
pie title "Resource Distribution 70 Total"
"Networking 30" : 30
"IAM/Security 12" : 12
"Storage 4" : 4
"Databricks 15" : 15
"Unity Catalog 6" : 6
"Optional CMK 3" : 3
VPC & Subnets (9):
├── 1 VPC
├── 2 Public subnets
├── 2 Private subnets (Databricks clusters)
├── 2 PrivateLink subnets (VPC endpoints)
└── 2 NAT Gateways
Routing (7):
├── 3 Route tables (public, private, privatelink)
└── 6 Route table associations
VPC Endpoints (6):
├── Databricks Workspace VPCE (8443-8451) [Conditional: Private Link]
├── Databricks Relay VPCE (6666) [Conditional: Private Link]
├── S3 Gateway Endpoint (FREE, regional) [Always]
├── STS Interface Endpoint (regional) [Always]
├── Kinesis Interface Endpoint (regional) [Always]
└── RDS Endpoint: NOT CONFIGURED (Unity Catalog deployment)
Regional Endpoint Benefits:
├── Lower latency (direct regional connections)
├── Reduced cost (no cross-region data transfer)
└── Better security (traffic stays in region)
Security Groups (8):
├── Workspace SG + 6 rules
└── VPCE SG + 1 rule
Docs: VPC Requirements
Cross-Account Role (3):
├── IAM role
├── IAM policy (inline, Databricks-generated)
└── Policy attachment
Unity Catalog Metastore Role (3):
├── IAM role
├── IAM policy
└── Policy attachment
Instance Profile (3):
├── IAM role
├── IAM policy
└── IAM instance profile
UC External Location Role (3):
├── Created in Unity Catalog module
├── Workspace-specific
└── Includes workspace ID in name
Docs: IAM Roles
S3 Bucket Encryption:
├── KMS key
├── KMS alias
└── Key policy
Workspace CMK (optional):
├── KMS key (DBFS/EBS/Managed Services)
├── KMS alias
└── Key policy
IAM Policies:
├── UC Metastore role KMS policy
└── UC External role KMS policy
Docs: Customer-Managed Keys
S3 Buckets:
├── DBFS Root bucket
├── Unity Catalog metastore bucket
├── Unity Catalog external location bucket
└── Unity Catalog root storage bucket (conditional)
Docs: S3 Bucket Configuration
MWS Resources:
├── Credentials configuration
├── Storage configuration
├── Network configuration
├── Customer-managed keys (optional)
└── Workspace
Private Access Settings:
├── PAS object (can be reused)
└── Public access control
Docs: Workspace Configuration
Metastore:
├── Metastore (or use existing)
├── Workspace assignment
└── Admin grants
External Storage:
├── Storage credential
├── External location
├── IAM role (workspace-specific)
├── IAM policy
└── Location grants
Workspace Catalog:
├── Catalog
├── Default namespace setting
└── Catalog admin grants
Docs: Unity Catalog Setup
Always Created (55):
├── Networking: VPC, Subnets, NAT, AWS Service Endpoints
├── IAM: All roles
├── Storage: All S3 buckets
├── Workspace: MWS resources
└── Unity Catalog: Metastore assignment, catalog
Optional based on enable_private_link=true (2):
├── Databricks Workspace VPCE
└── Databricks Relay VPCE
Optional based on enable_encryption=true (3):
├── S3 encryption KMS key
└── 2x IAM policies for UC roles
Optional based on enable_workspace_cmk=true (2):
├── Workspace CMK key
└── Key policy
Optional, created only when existing_private_access_settings_id is not set (1):
└── Private Access Settings (PAS)
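The tallies in this list can be added up for a given set of flags. The sketch below only reproduces the arithmetic of the list above, so it will not necessarily match the totals quoted elsewhere in this guide. It assumes a new Private Access Settings object is created whenever no existing ID is supplied; the `reuse_private_access_settings` field is a hypothetical stand-in for `existing_private_access_settings_id` being set.

```python
from dataclasses import dataclass

ALWAYS_CREATED = 55  # baseline from the "Always Created" list


@dataclass(frozen=True)
class Options:
    enable_private_link: bool = False
    enable_encryption: bool = False
    enable_workspace_cmk: bool = False
    # Hypothetical flag: True when existing_private_access_settings_id is set.
    reuse_private_access_settings: bool = False


def listed_resource_count(options):
    """Sum the per-flag counts given in the conditional resource list."""
    count = ALWAYS_CREATED
    if options.enable_private_link:
        count += 2  # Workspace VPCE + Relay VPCE
    if options.enable_encryption:
        count += 3  # S3 KMS key + 2 IAM policies for UC roles
    if options.enable_workspace_cmk:
        count += 2  # Workspace CMK key + key policy
    if not options.reuse_private_access_settings:
        count += 1  # new Private Access Settings object (assumption)
    return count


full = Options(enable_private_link=True, enable_encryption=True,
               enable_workspace_cmk=True)
print(listed_resource_count(full))
```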
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
flowchart LR
START["Configuration<br/>Choice"] --> PL{enable_private_link}
PL -->|true| FULL["Full Private Link<br/>All Databricks traffic<br/>via VPC Endpoints"]
PL -->|false| PUBLIC["Public Internet<br/>via NAT Gateway<br/>Lowest Cost"]
FULL --> ENC{enable_encryption}
PUBLIC --> ENC
ENC -->|true| CMK["+ S3 KMS Encryption<br/>Customer-Managed Keys"]
ENC -->|false| NOCMK["AWS-Managed<br/>Encryption"]
CMK --> WCMK{enable_workspace_cmk}
NOCMK --> WCMK
WCMK -->|true| FULLCMK["+ Workspace CMK<br/>DBFS/EBS/MS Encryption"]
WCMK -->|false| NOWCMK["Standard Encryption"]
style FULL fill:#569A31
style PUBLIC fill:#FF9900
style FULLCMK fill:#1B72E8
Configuration Matrix:
| Scenario | enable_private_link | enable_encryption | enable_workspace_cmk | Cost |
|---|---|---|---|---|
| Development | false | false | false | $ |
| Production Basic | true | false | false | $$ |
| Production Secure | true | true | false | $$$ |
| Maximum Security | true | true | true | $$$$ |
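
Each row of the matrix maps directly onto three boolean variables. As a minimal sketch, the presets can be kept in a lookup table and rendered as variable assignments in `terraform.tfvars` syntax. The flag names come from the matrix; the helper name and the idea of generating the file are illustrative assumptions, not part of the deployment code.

```python
FLAGS = ("enable_private_link", "enable_encryption", "enable_workspace_cmk")

# Scenario presets copied from the configuration matrix.
SCENARIOS = {
    "Development": (False, False, False),
    "Production Basic": (True, False, False),
    "Production Secure": (True, True, False),
    "Maximum Security": (True, True, True),
}


def to_tfvars(scenario):
    """Render one scenario as tfvars-style assignments (lowercase booleans)."""
    values = SCENARIOS[scenario]  # KeyError for an unknown scenario name
    return "\n".join(
        f"{name} = {str(value).lower()}" for name, value in zip(FLAGS, values)
    )


print(to_tfvars("Production Secure"))
```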
- Architecture understood → 02-IAM-SECURITY.md - IAM roles and policies
- Ready to deploy → 04-QUICK-START.md - 5-minute deployment