
Databricks on AWS - Terraform Deployments

Production-ready Terraform configurations for deploying secure Databricks workspaces on AWS with Private Link, Unity Catalog, and customer-managed encryption.

🔑 Authentication Setup Guide → Stuck on AWS/Databricks authentication? Start here!


πŸ“ Repository Structure

awsdb4u/
├── aws-pl-ws/                          # Databricks Private Link workspace deployments
│   ├── databricks-aws-production/      # ✨ Production-ready deployment (Recommended)
│   │   ├── modules/                    # 7 modular Terraform modules
│   │   ├── docs/                       # Visual-first documentation
│   │   ├── terraform.tfvars.example    # Configuration template
│   │   └── quick-destroy.sh            # Safe cleanup script
│   └── modular-version/                # Legacy version (deprecated)
└── README.md                           # This file

🚀 Available Deployments

databricks-aws-production (Recommended)

Production-ready, fully modularized Terraform deployment with comprehensive documentation and enterprise features.

Key Features:

Documentation:

Quick Deploy:

cd aws-pl-ws/databricks-aws-production
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
terraform init
terraform apply

modular-version (Legacy)

Original modular version - deprecated in favor of databricks-aws-production.

Migration: Users of modular-version should migrate to databricks-aws-production; the comparison below summarizes what you gain.


📋 Comparison

| Feature | databricks-aws-production | modular-version (legacy) |
|---|---|---|
| Status | ✅ Active | ⚠️ Deprecated |
| Documentation | Visual-first, comprehensive | Basic README |
| Quick Start | 5-minute guide | Manual configuration |
| Architecture Diagrams | ✅ Mermaid diagrams | ❌ None |
| Troubleshooting | ✅ Detailed guide | ❌ Limited |
| CMK Support | ✅ Dual-layer (S3 + Workspace) | ✅ Basic |
| BYOR Support | ❌ Removed (CREATE only) | ❌ N/A |
| Module Count | 7 modules | 7 modules |
| Configuration | .tfvars.example template | Manual setup |

🎯 Which Deployment Should I Use?

Choose databricks-aws-production if you want comprehensive documentation, a 5-minute quick start, architecture diagrams, detailed troubleshooting, and dual-layer CMK support (see the comparison above).

Use modular-version only if you are maintaining an existing deployment that has not yet been migrated.


πŸ—οΈ Architecture Overview

Both deployments create a secure, production-ready Databricks workspace, as shown in the architecture diagram below.

Architecture Diagram

graph TB
    subgraph "AWS VPC (10.0.0.0/22)"
        subgraph "Public Subnets (/26)"
            NAT1[NAT Gateway 1<br/>10.0.0.0/26]
            NAT2[NAT Gateway 2<br/>10.0.0.64/26]
            IGW[Internet Gateway]
        end

        subgraph "Private Subnets (/24 - Databricks Clusters)"
            PRIV1[Private Subnet 1<br/>10.0.1.0/24<br/>251 usable IPs]
            PRIV2[Private Subnet 2<br/>10.0.2.0/24<br/>251 usable IPs]
        end

        subgraph "PrivateLink Subnets (/26 - VPC Endpoints)"
            VPCE1[Workspace VPCE<br/>10.0.3.0/26]
            VPCE2[Relay VPCE<br/>10.0.3.64/26]
            STS[STS VPCE]
            KINESIS[Kinesis VPCE]
        end

        subgraph "Storage"
            S3[S3 Buckets<br/>DBFS + UC]
            S3GW[S3 Gateway Endpoint]
        end
    end

    subgraph "Databricks Control Plane"
        CONTROL[Databricks Control Plane<br/>accounts.cloud.databricks.com]
    end

    PRIV1 -->|NAT| NAT1
    PRIV2 -->|NAT| NAT2
    NAT1 --> IGW
    NAT2 --> IGW

    PRIV1 -.->|Private Link| VPCE1
    PRIV2 -.->|Private Link| VPCE2

    VPCE1 -.->|Backend Private Link| CONTROL
    VPCE2 -.->|Secure Cluster Connectivity| CONTROL

    PRIV1 -->|S3 Access| S3GW
    PRIV2 -->|S3 Access| S3GW
    S3GW --> S3

    style CONTROL fill:#FF3621
    style S3 fill:#569A31
    style VPCE1 fill:#FF9900
    style VPCE2 fill:#FF9900

📚 Detailed Documentation

For detailed technical documentation, architecture, and configuration guides, see:

👉 databricks-aws-production Documentation


🚦 Getting Started

Prerequisites (5 minutes)

1. Databricks Account Requirements

2. AWS Requirements

3. Local Requirements

4. Databricks User


🚀 Quick Deployment

1. Choose Your Deployment

cd aws-pl-ws/databricks-aws-production

2. Configure

# Copy example configuration
cp terraform.tfvars.example terraform.tfvars

# Edit with your values
nano terraform.tfvars

3. Set Environment Variables

Add to ~/.zshrc or ~/.bashrc:

export TF_VAR_databricks_account_id="your-account-id"
export TF_VAR_databricks_client_id="your-service-principal-client-id"
export TF_VAR_databricks_client_secret="your-service-principal-secret"

4. Deploy

terraform init
terraform plan
terraform apply

⏱️ Deployment Time: ~15-20 minutes

Full Guide: Quick Start Documentation


📖 Documentation Index

databricks-aws-production

| Document | Description |
|---|---|
| 00-PREREQUISITES | System setup & credentials |
| 01-ARCHITECTURE | Architecture & deployment flow |
| 02-IAM-SECURITY | IAM roles & policies |
| 03-NETWORK-ENCRYPTION | Network security & encryption |
| 04-QUICK-START | 5-minute deployment guide |
| 05-TROUBLESHOOTING | Common issues & solutions |

🔧 Configuration Examples

Minimal Configuration (Default Security)

workspace_name = "my-databricks-workspace"
region         = "us-west-1"
prefix         = "dbx"

# S3 Buckets (globally unique)
root_storage_bucket_name                = "mycompany-dbx-root-storage"
unity_catalog_bucket_name               = "mycompany-dbx-uc-metastore"
unity_catalog_root_storage_bucket_name  = "mycompany-dbx-uc-root-storage"
unity_catalog_external_bucket_name      = "mycompany-dbx-uc-external"

# Security (defaults)
enable_private_link  = true   # Private Link enabled
enable_encryption    = true   # S3 KMS encryption
enable_workspace_cmk = false  # Workspace CMK disabled

Maximum Security Configuration

workspace_name = "my-secure-workspace"
region         = "us-west-1"

# Full encryption
enable_private_link  = true   # Private Link
enable_encryption    = true   # S3 KMS encryption
enable_workspace_cmk = true   # Workspace CMK (DBFS + EBS + Managed Services)

# Public access control
public_access_enabled = false  # Block public internet access

# S3 Buckets
root_storage_bucket_name                = "mycompany-dbx-root-storage"
unity_catalog_bucket_name               = "mycompany-dbx-uc-metastore"
unity_catalog_root_storage_bucket_name  = "mycompany-dbx-uc-root-storage"
unity_catalog_external_bucket_name      = "mycompany-dbx-uc-external"

πŸ› οΈ What Gets Deployed?

AWS Resources (65-70 resources)

├── VPC + 3 subnet tiers (public/private/privatelink)
├── NAT Gateways (2 for HA)
├── Internet Gateway
├── Security Groups (2)
├── VPC Endpoints (5):
│   ├── Databricks Workspace (Interface)
│   ├── Databricks Relay (Interface)
│   ├── S3 (Gateway - FREE)
│   ├── STS (Interface)
│   └── Kinesis (Interface)
├── S3 Buckets (4):
│   ├── DBFS root storage
│   ├── Unity Catalog metastore
│   ├── Unity Catalog root storage
│   └── Unity Catalog external
├── IAM Roles (4):
│   ├── Cross-account role
│   ├── Instance profile role
│   ├── UC metastore role
│   └── UC external role
└── KMS Keys (2 - optional):
    ├── S3 encryption key
    └── Workspace CMK

Databricks Resources

├── Workspace (with Private Link)
├── Unity Catalog Metastore
├── Unity Catalog Assignment
├── Storage Credentials
├── External Locations
└── Workspace Catalog (optional)

πŸ” Key Features Explained

Traffic Flow:

Databricks Clusters (Private Subnets)
    ↓ (Private)
VPC Endpoints (PrivateLink Subnets)
    ↓ (AWS PrivateLink)
Databricks Control Plane

Benefits:

Dual-Layer Encryption

Layer 1: S3 Bucket Encryption (enable_encryption = true)

Layer 2: Workspace CMK (enable_workspace_cmk = true)

Note: Both layers can be enabled simultaneously for maximum security.

Unity Catalog Integration

Data Governance:

Multi-Workspace Pattern:

Single Unity Catalog Metastore
    ├── Workspace 1 (Production)
    ├── Workspace 2 (Development)
    └── Workspace 3 (Staging)

βš™οΈ Advanced Configuration

Reuse Existing Resources

Reuse Unity Catalog Metastore

# Skip metastore creation, use existing
metastore_id = "existing-metastore-id"

Reuse Private Access Settings

# Share PAS across multiple workspaces in same region
existing_private_access_settings_id = "existing-pas-id"

Use Existing KMS Key for Workspace CMK

enable_workspace_cmk = true
existing_workspace_cmk_key_arn   = "arn:aws:kms:us-west-1:123456789012:key/12345678-..."
existing_workspace_cmk_key_alias = "alias/databricks-workspace-cmk"

Custom Network Configuration

vpc_cidr                 = "10.0.0.0/22"
private_subnet_cidrs     = ["10.0.1.0/24", "10.0.2.0/24"]
privatelink_subnet_cidrs = ["10.0.3.0/26", "10.0.3.64/26"]
public_subnet_cidrs      = ["10.0.0.0/26", "10.0.0.64/26"]

# Manual AZ selection (or leave empty for auto-detect)
availability_zones = ["us-west-1a", "us-west-1c"]

🧹 Cleanup

Safe Destroy

cd aws-pl-ws/databricks-aws-production
terraform destroy

Issues? See Destroy Troubleshooting


🆘 Troubleshooting

Common issues and solutions:

| Issue | Quick Fix |
|---|---|
| Bucket already exists | Change bucket names in terraform.tfvars |
| AWS auth error | aws sso login --profile your-profile |
| Can't access workspace | Wait 20 minutes after deployment |
| EIP limit exceeded | Release unused Elastic IPs |
| Provider errors | Run terraform init -upgrade |

Full Guide: Troubleshooting Documentation


📞 Support & Resources

Documentation

Getting Help

  1. Check Troubleshooting Guide
  2. Review Architecture Documentation
  3. Enable Terraform debug logs: export TF_LOG=DEBUG
  4. Contact Databricks support for account-specific issues

πŸ“ Version History

| Version | Status | Notes |
|---|---|---|
| databricks-aws-production | ✅ Active | Production-ready, recommended |
| modular-version | ⚠️ Deprecated | Legacy version, migrate to production |

🤝 Contributing

Improvements and bug fixes are welcome:

  1. Follow visual-first documentation pattern
  2. Test changes thoroughly
  3. Update relevant documentation
  4. Submit issues for questions

📄 License

This configuration is provided as-is for reference purposes.


Ready to Deploy? → Quick Start Guide ⚡

1. AWS Provider

provider "aws" {
  region  = "us-west-2"
  profile = "your-aws-profile"  # OR use default credentials
}

Authentication Options:

2. Databricks Account Provider

provider "databricks" {
  alias         = "account"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.client_id
  client_secret = var.client_secret
}

Used for:

3. Databricks Workspace Provider

provider "databricks" {
  alias         = "workspace"
  host          = module.databricks_workspace.workspace_url
  client_id     = var.client_id
  client_secret = var.client_secret
}

Used for:


AWS Infrastructure Components

1. Networking (modules/networking)

Creates a 3-tier VPC architecture:

VPC Configuration

VPC CIDR: 10.0.0.0/22 (1024 total IPs - optimized for single workspace)
├── DNS Hostnames: Enabled
└── DNS Support: Enabled

Subnets (2 Availability Zones)

| Subnet Type | Purpose | CIDR | Usable IPs | Count |
|---|---|---|---|---|
| Public | NAT Gateways, Internet Gateway | 10.0.0.0/26, 10.0.0.64/26 | 59 each | 2 |
| Private | Databricks Clusters (compute) | 10.0.1.0/24, 10.0.2.0/24 | 251 each | 2 |
| PrivateLink | VPC Endpoints | 10.0.3.0/26, 10.0.3.64/26 | 59 each | 2 |

CIDR Allocation Strategy:

Route Tables

NAT Gateways

Why Two NAT Gateways?

| Aspect | 2 NAT Gateways (HA) | 1 NAT Gateway (Cost) |
|---|---|---|
| Availability | ✅ If one AZ fails, the other continues | ❌ Single point of failure |
| Cost | ~$64/month | ~$32/month (50% savings) |
| Cross-AZ Charges | ✅ No extra cost | ❌ $0.01/GB transfer fee |
| Production Ready | ✅ Recommended | ❌ Dev/test only |

💡 Cost Optimization: For dev/test environments, you can use a single NAT gateway to save ~$32/month. Update nat_gateway_count = 1 in the networking module configuration. However, two NAT gateways are strongly recommended for production to ensure high availability.
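
For example, a dev/test override of the networking module might look like the sketch below; the input names other than nat_gateway_count are assumed:

module "networking" {
  source = "./modules/networking"

  vpc_cidr                 = var.vpc_cidr
  public_subnet_cidrs      = var.public_subnet_cidrs
  private_subnet_cidrs     = var.private_subnet_cidrs
  privatelink_subnet_cidrs = var.privatelink_subnet_cidrs

  # 1 NAT gateway saves ~$32/month but removes AZ redundancy; keep 2 for production
  nat_gateway_count = 1
}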


2. VPC Endpoints (modules/networking/vpc_endpoints.tf)

Databricks-Specific Endpoints

| Endpoint | Type | Purpose | Port | Subnets |
|---|---|---|---|---|
| Workspace VPC Endpoint | Interface | REST API, UI access | 443, 8443-8451 | PrivateLink |
| Relay VPC Endpoint | Interface | Secure Cluster Connectivity | 6666 | PrivateLink |

Registration with Databricks:

resource "databricks_mws_vpc_endpoint" "workspace_vpce" {
  account_id          = var.databricks_account_id
  aws_vpc_endpoint_id = aws_vpc_endpoint.workspace.id
  vpc_endpoint_name   = "${var.prefix}-workspace-vpce"
  region              = var.region
}

AWS Service Endpoints

| Endpoint | Type | Purpose |
|---|---|---|
| S3 | Gateway | DBFS, logs, artifacts |
| STS | Interface | IAM role assumption |
| Kinesis | Interface | Logging and lineage |

Private DNS: All interface endpoints have private_dns_enabled = true for automatic DNS resolution.
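
As a sketch of the pattern (resource and variable names assumed), an interface endpoint with private DNS looks like:

resource "aws_vpc_endpoint" "sts" {
  vpc_id             = aws_vpc.this.id
  service_name       = "com.amazonaws.${var.region}.sts"
  vpc_endpoint_type  = "Interface"
  subnet_ids         = aws_subnet.privatelink[*].id
  security_group_ids = [aws_security_group.vpc_endpoints.id]

  # Resolves sts.<region>.amazonaws.com to the endpoint's private IPs inside the VPC
  private_dns_enabled = true
}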


3. Security Groups (modules/networking/security_groups.tf)

Workspace Security Group (for Databricks Clusters)

Ingress Rules:

✅ TCP 0-65535 from self (cluster-to-cluster communication)
✅ UDP 0-65535 from self (cluster-to-cluster communication)

Egress Rules:

✅ TCP 0-65535 to self (cluster-to-cluster)
✅ UDP 0-65535 to self (cluster-to-cluster)
✅ TCP 443 to VPC Endpoint SG (HTTPS to control plane via Private Link)
✅ TCP 443 to 0.0.0.0/0 (library downloads, external APIs, S3 access)
✅ TCP 3306 to 0.0.0.0/0 (external metastore - optional)
✅ TCP 6666 to VPC Endpoint SG (Secure Cluster Connectivity)
✅ TCP 8443-8451 to VPC Endpoint SG (Unity Catalog, control plane)
✅ TCP/UDP 53 to 0.0.0.0/0 (DNS resolution)
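
In Terraform, the self-referencing cluster-to-cluster rules above are typically written with self = true; a minimal sketch (resource names assumed, UDP rules analogous):

resource "aws_security_group" "workspace" {
  name   = "${var.prefix}-workspace-sg"
  vpc_id = aws_vpc.this.id

  # Cluster-to-cluster traffic on all ports (self-referencing)
  ingress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    self      = true
  }

  egress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    self      = true
  }

  # HTTPS to the internet for library downloads, external APIs, and S3
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}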

Important Notes:

VPC Endpoint Security Group

Ingress Rules:

✅ TCP 443 from Workspace SG (HTTPS from clusters)
✅ TCP 6666 from Workspace SG (SCC from clusters)
✅ TCP 8443-8451 from Workspace SG (Unity Catalog, control plane)

Egress Rules:

✅ ALL to 0.0.0.0/0 (to Databricks control plane)

4. Storage (modules/storage)

Creates S3 buckets with encryption and versioning:

| Bucket | Purpose | Encryption | Versioning | Public Access |
|---|---|---|---|---|
| Root Storage | Workspace DBFS root | AES256 | Enabled | Blocked |
| UC Metastore | Unity Catalog metastore | AES256 | Enabled | Blocked |
| UC Root Storage | UC root storage location | AES256 | Enabled | Blocked |
| UC External | External data locations | AES256 | Enabled | Blocked |

Bucket Policy: Databricks-generated policy attached to root storage bucket for cross-account access.
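
A sketch of how that policy can be generated and attached using the provider's databricks_aws_bucket_policy data source (bucket resource name assumed):

data "databricks_aws_bucket_policy" "root_storage" {
  bucket = aws_s3_bucket.root_storage.bucket
}

resource "aws_s3_bucket_policy" "root_storage" {
  bucket = aws_s3_bucket.root_storage.id
  policy = data.databricks_aws_bucket_policy.root_storage.json
}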


5. IAM Roles and Policies (modules/iam)

Cross-Account Role

Purpose: Allows Databricks control plane to manage workspace resources

Trust Policy: Databricks AWS account (414351767826)
Permissions: EC2, VPC, S3 (workspace management)

Generated by: databricks_aws_crossaccount_policy data source
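
A sketch of how the role is typically wired up with these data sources (resource names assumed):

data "databricks_aws_assume_role_policy" "cross_account" {
  external_id = var.databricks_account_id
}

data "databricks_aws_crossaccount_policy" "this" {}

resource "aws_iam_role" "cross_account" {
  name               = "${var.prefix}-crossaccount"
  assume_role_policy = data.databricks_aws_assume_role_policy.cross_account.json
}

resource "aws_iam_role_policy" "cross_account" {
  name   = "${var.prefix}-crossaccount-policy"
  role   = aws_iam_role.cross_account.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}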

Instance Profile Role

Purpose: Grants Databricks clusters access to S3 and AWS services

Trust Policy: EC2 service
Permissions:
  - S3 access to workspace buckets
  - EC2 instance metadata access
  - CloudWatch logs (optional)

Unity Catalog Role

Purpose: Grants Unity Catalog access to S3 data locations

Trust Policy: Databricks Unity Catalog AWS account
Permissions:
  - S3 access to UC metastore bucket
  - S3 access to UC data buckets
  - KMS decrypt (if CMK enabled)

6. Customer-Managed Keys (Optional) (modules/kms)

When enable_workspace_cmk = true:

Single KMS Key for Workspace

Use Cases:

Key Policy:

Permissions:
  ✅ Account root (key administration)
  ✅ Databricks cross-account role (encrypt/decrypt)
  ✅ EC2 service (EBS volume encryption via workspace VPCE)
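
Once the key and its policy exist, the key is registered with the Databricks account; a sketch using databricks_mws_customer_managed_keys (key and alias references assumed):

resource "databricks_mws_customer_managed_keys" "workspace_cmk" {
  provider   = databricks.account
  account_id = var.databricks_account_id

  aws_key_info {
    key_arn   = aws_kms_key.workspace_cmk.arn
    key_alias = aws_kms_alias.workspace_cmk.name
  }

  # One key covering notebooks/secrets (MANAGED_SERVICES) and DBFS/EBS (STORAGE)
  use_cases = ["MANAGED_SERVICES", "STORAGE"]
}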

Policy Conditions:


Databricks Resources

1. Workspace Configuration (modules/databricks_workspace)

Credentials

databricks_mws_credentials
├── Role ARN: Cross-account role
└── Purpose: AWS resource management

Storage Configuration

databricks_mws_storage_configurations
├── Bucket: Root storage bucket
└── Purpose: DBFS root storage

Network Configuration

databricks_mws_networks
├── VPC ID
├── Subnet IDs: Private subnets
├── Security Group: Workspace SG
└── VPC Endpoints:
    ├── rest_api: Workspace VPC Endpoint ID
    └── dataplane_relay: Relay VPC Endpoint ID
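
A sketch of the corresponding resource (variable and endpoint resource names assumed):

resource "databricks_mws_networks" "this" {
  provider           = databricks.account
  account_id         = var.databricks_account_id
  network_name       = "${var.prefix}-network"
  vpc_id             = var.vpc_id
  subnet_ids         = var.private_subnet_ids
  security_group_ids = [var.workspace_security_group_id]

  vpc_endpoints {
    # REST API traffic uses the workspace endpoint,
    # secure cluster connectivity uses the relay endpoint
    rest_api        = [databricks_mws_vpc_endpoint.workspace_vpce.vpc_endpoint_id]
    dataplane_relay = [databricks_mws_vpc_endpoint.relay_vpce.vpc_endpoint_id]
  }
}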

Private Access Settings

databricks_mws_private_access_settings
├── Public Access: Configurable (default: enabled)
├── Private Access Level: ENDPOINT or ACCOUNT
└── Region: Workspace region

Workspace

databricks_mws_workspaces
├── Credentials ID
├── Storage Configuration ID
├── Network ID
├── Private Access Settings ID
├── CMK IDs (optional):
│   ├── Managed Services CMK ID
│   └── Storage CMK ID
└── Pricing Tier: ENTERPRISE
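
A sketch of the workspace resource tying these IDs together (referenced resource and variable names assumed; the CMK arguments apply only when enable_workspace_cmk = true):

resource "databricks_mws_workspaces" "workspace" {
  provider       = databricks.account
  account_id     = var.databricks_account_id
  workspace_name = var.workspace_name
  aws_region     = var.region
  pricing_tier   = "ENTERPRISE"

  credentials_id             = databricks_mws_credentials.this.credentials_id
  storage_configuration_id   = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id                 = databricks_mws_networks.this.network_id
  private_access_settings_id = databricks_mws_private_access_settings.this.private_access_settings_id

  # Only set when workspace CMK is enabled
  managed_services_customer_managed_key_id = var.managed_services_cmk_id
  storage_customer_managed_key_id          = var.storage_cmk_id
}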

2. Unity Catalog (modules/unity_catalog)

Metastore

databricks_metastore
├── Name: {prefix}-metastore
├── Region: Workspace region
├── Storage Root: s3://uc-metastore-bucket/metastore
└── Owner: Metastore admin email

Metastore Assignment

databricks_metastore_assignment
├── Workspace ID
├── Metastore ID
└── Default Catalog: "main"
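
A sketch of the metastore and its workspace assignment (variable names assumed):

resource "databricks_metastore" "this" {
  provider      = databricks.account
  name          = "${var.prefix}-metastore"
  region        = var.region
  storage_root  = "s3://${var.unity_catalog_bucket_name}/metastore"
  owner         = var.metastore_admin_email
  force_destroy = true
}

resource "databricks_metastore_assignment" "this" {
  provider             = databricks.account
  workspace_id         = var.workspace_id
  metastore_id         = databricks_metastore.this.id
  default_catalog_name = "main"
}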

Storage Credentials (Optional)

databricks_storage_credential (root_storage)
├── IAM Role: UC root storage role ARN
└── Purpose: Access to UC root storage bucket

databricks_storage_credential (external_storage)
├── IAM Role: UC external storage role ARN
└── Purpose: Access to external data buckets

External Locations (Optional)

databricks_external_location (root_storage)
├── URL: s3://uc-root-storage-bucket/
└── Credential: root_storage

databricks_external_location (external_location)
├── URL: s3://uc-external-bucket/
└── Credential: external_storage

Workspace Catalog (Optional)

databricks_catalog
├── Name: {prefix}_catalog
├── Storage Root: s3://uc-root-storage-bucket/catalog
└── Grants: ALL_PRIVILEGES to workspace admin

3. User Assignment (modules/user_assignment)

Assigns existing Databricks account user as workspace admin:

data "databricks_user" "workspace_access"
├── Provider: databricks.account
└── User Name: workspace_admin_email

databricks_mws_permission_assignment
├── Workspace ID
├── Principal ID: User ID
├── Permissions: ["ADMIN"]
└── Lifecycle: ignore_changes on principal_id
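
A sketch of the lookup and assignment (variable names assumed):

data "databricks_user" "workspace_access" {
  provider  = databricks.account
  user_name = var.workspace_admin_email
}

resource "databricks_mws_permission_assignment" "workspace_admin" {
  provider     = databricks.account
  workspace_id = var.workspace_id
  principal_id = data.databricks_user.workspace_access.id
  permissions  = ["ADMIN"]

  lifecycle {
    # Mirrors the ignore_changes noted above so a re-resolved principal ID does not force replacement
    ignore_changes = [principal_id]
  }
}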

Prerequisites:


Deployment Flow

Module Dependency Graph

graph TD
    A[AWS Infrastructure] --> B[Networking Module]
    A --> C[Storage Module]
    A --> D[IAM Module]
    A --> E[KMS Module - Optional]

    B --> F[VPC + Subnets]
    B --> G[Security Groups]
    B --> H[NAT Gateways]
    B --> I[VPC Endpoints]

    I --> J[Register with Databricks]

    E --> K[Create KMS Key]
    K --> L[Apply Key Policy]

    J --> M[Databricks Workspace Module]
    C --> M
    D --> M
    L --> M

    M --> N[Private Access Settings]
    M --> O[Credentials Config]
    M --> P[Storage Config]
    M --> Q[Network Config]
    N --> R[Create Workspace]
    O --> R
    P --> R
    Q --> R

    R --> S[Unity Catalog Module]
    C --> S

    S --> T[Create Metastore]
    T --> U[Assign to Workspace]
    U --> V[Storage Credentials]
    V --> W[External Locations]
    W --> X[Workspace Catalog]

    U --> Y[User Assignment Module]

    style A fill:#f9f,stroke:#333
    style M fill:#bbf,stroke:#333
    style S fill:#bfb,stroke:#333
    style Y fill:#fbb,stroke:#333
    style R fill:#FF3621

Deployment Sequence

sequenceDiagram
    participant TF as Terraform
    participant AWS as AWS
    participant DB_ACC as Databricks Account
    participant DB_WS as Databricks Workspace

    Note over TF,AWS: Phase 1: AWS Infrastructure
    TF->>AWS: Create VPC + Subnets
    TF->>AWS: Create Security Groups
    TF->>AWS: Create NAT Gateways
    TF->>AWS: Create S3 Buckets
    TF->>AWS: Create IAM Roles
    TF->>AWS: Create VPC Endpoints

    Note over TF,DB_ACC: Phase 2: Register Endpoints
    TF->>DB_ACC: Register Workspace VPCE
    TF->>DB_ACC: Register Relay VPCE
    DB_ACC-->>TF: VPC Endpoint IDs

    Note over TF,AWS: Phase 3: KMS (Optional)
    TF->>AWS: Create KMS Key
    TF->>AWS: Apply Key Policy
    TF->>DB_ACC: Register CMK

    Note over TF,DB_ACC: Phase 4: Workspace
    TF->>DB_ACC: Create Private Access Settings
    TF->>DB_ACC: Create Credentials Config
    TF->>DB_ACC: Create Storage Config
    TF->>DB_ACC: Create Network Config
    TF->>DB_ACC: Create Workspace
    DB_ACC-->>TF: Workspace ID + URL

    Note over TF,DB_ACC: Phase 5: Unity Catalog
    TF->>DB_ACC: Create UC Metastore
    TF->>DB_ACC: Assign Metastore to Workspace
    TF->>DB_WS: Create Storage Credentials
    TF->>DB_WS: Create External Locations
    TF->>DB_WS: Create Workspace Catalog

    Note over TF,DB_ACC: Phase 6: User Assignment
    TF->>DB_ACC: Lookup User
    TF->>DB_ACC: Assign User as Admin

    Note over DB_ACC: Wait 20 minutes for<br/>Backend Private Link

Critical Dependencies

Module-Level Dependencies

module "databricks_workspace" {
  depends_on = [
    module.networking,  # VPC, subnets, VPC endpoints
    module.storage,     # S3 buckets
    module.iam,         # Cross-account role, instance profile
    module.kms          # KMS keys (if enabled)
  ]
}

module "unity_catalog" {
  depends_on = [
    module.databricks_workspace,  # Workspace must exist
    module.storage                # UC buckets must exist
  ]
}

module "user_assignment" {
  depends_on = [
    module.unity_catalog  # UC metastore must be assigned first (depends_on requires whole-module references, not output attributes)
  ]
}

Resource-Level Dependencies

Within Databricks Workspace Module:

VPC Endpoints Registration
    ↓
Private Access Settings (depends on VPC endpoints)
    ↓
Network Configuration (depends on VPC endpoints)
    ↓
Workspace Creation (depends on all configurations)

Within Unity Catalog Module:

Metastore Creation
    ↓
Metastore Assignment to Workspace
    ↓
Storage Credentials (IAM roles must exist)
    ↓
External Locations (credentials must exist)
    ↓
Workspace Catalog (external locations must exist)

Configuration

1. Required Variables

Create terraform.tfvars:

# AWS Configuration
region      = "us-west-2"
aws_profile = "your-aws-profile"  # Or use default credentials

# Databricks Account Configuration
databricks_account_id = "your-account-id"
client_id             = "your-service-principal-client-id"
client_secret         = "your-service-principal-secret"

# Workspace Configuration
workspace_name        = "my-databricks-workspace"
workspace_admin_email = "admin@example.com"

# Metastore Configuration
metastore_admin_email = "admin@example.com"

# Network Configuration
vpc_cidr             = "10.0.0.0/22"
private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
public_subnet_cidrs  = ["10.0.0.0/26", "10.0.0.64/26"]
privatelink_subnet_cidrs = ["10.0.3.0/26", "10.0.3.64/26"]

# VPC Endpoint Services (region-specific)
workspace_vpce_service = "com.amazonaws.vpce.us-west-2.vpce-svc-xxxxx"
relay_vpce_service     = "com.amazonaws.vpce.us-west-2.vpce-svc-yyyyy"

# AWS Account
aws_account_id = "123456789012"

# Tags
tags = {
  Environment = "dev"
  Project     = "databricks-privatelink"
  ManagedBy   = "terraform"
}

2. Optional Features

Customer-Managed Keys

enable_workspace_cmk = true
cmk_admin_arn        = "arn:aws:iam::123456789012:user/admin"

IP Access Lists

enable_ip_access_lists = true
allowed_ip_addresses   = ["1.2.3.4/32", "5.6.7.8/32"]

Unity Catalog Workspace Catalog

create_workspace_catalog = true

Private Access Settings

public_access_enabled = false      # Fully private workspace
private_access_level  = "ACCOUNT"  # or "ENDPOINT"

Deployment

Step 1: Initialize Terraform

cd aws-pl-ws/modular-version
terraform init

Step 2: Validate Configuration

terraform validate

Step 3: Review Plan

terraform plan

Step 4: Apply Configuration

terraform apply

Deployment Time: ~15-20 minutes

Step 5: Wait for Private Link Stabilization

⚠️ IMPORTANT: Wait 20 minutes after workspace creation before creating clusters.

This allows the backend Private Link connection to fully stabilize.
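
If you prefer Terraform itself to enforce this pause, one option (not part of this configuration) is a time_sleep resource from the hashicorp/time provider:

resource "time_sleep" "wait_for_private_link" {
  # Pause 20 minutes after the workspace module finishes before dependent resources run
  depends_on      = [module.databricks_workspace]
  create_duration = "20m"
}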

Step 6: Verify Deployment

  1. Access workspace at the output URL
  2. Log in with workspace admin email
  3. Verify Unity Catalog metastore is assigned
  4. Create a test cluster to verify connectivity

Troubleshooting

Common Issues and Solutions

1. Provider Type Mismatch Error

Error:

Error: Provider type mismatch
The local name "databricks.account" in the root module represents provider
"hashicorp/databricks", but "databricks.account" in module represents
"databricks/databricks".

Solution:

rm -rf .terraform .terraform.lock.hcl
terraform init -upgrade

Cause: Terraform provider cache needs to be refreshed.


2. Security Group Egress Rules Warning

Warning:

Warning: Egress rules in the Security Group sg-xxxxx are not configured correctly.
See the requirements at https://docs.databricks.com/administration-guide/cloud-configurations/aws/customer-managed-vpc.html#security-groups.

Common Causes:

  1. Redundant S3 prefix list rule - Having both 0.0.0.0/0 and S3 prefix list for port 443
  2. Missing required egress rules - Not allowing traffic to VPC endpoints or self-referencing rules
  3. Manual AWS Console changes - Rules added/modified outside of Terraform

Solution:

Check for redundant rules:

# View all egress rules for the workspace security group
aws ec2 describe-security-groups \
  --group-ids <WORKSPACE_SG_ID> \
  --query 'SecurityGroups[0].IpPermissionsEgress[?FromPort==`443`]' \
  --region <REGION>

If you see both 0.0.0.0/0 and a prefix list for port 443, remove the prefix list:

# Remove redundant S3 prefix list rule (change pl-xxxxx to your prefix list ID)
aws ec2 revoke-security-group-egress \
  --group-id <WORKSPACE_SG_ID> \
  --ip-permissions 'IpProtocol=tcp,FromPort=443,ToPort=443,PrefixListIds=[{PrefixListId=pl-xxxxx}]' \
  --region <REGION>

Verify Terraform configuration matches AWS:

terraform plan  # Should show no changes if in sync

Why this happens:

Cause: Drift between Terraform state and actual AWS configuration, or manual changes in AWS Console.


3. VPC Endpoint Service Not Found

Error:

Error: creating EC2 VPC Endpoint: InvalidServiceName

Solution:

  1. Verify you’re using the correct region
  2. Get the correct VPC endpoint service names from Databricks support:
    # Contact Databricks support for your region-specific service names
    workspace_vpce_service = "com.amazonaws.vpce.REGION.vpce-svc-xxxxx"
    relay_vpce_service     = "com.amazonaws.vpce.REGION.vpce-svc-yyyyy"
    

4. Cannot Create User Assignment Error

Error:

Error: cannot create mws permission assignment: Permission assignment APIs are not available

Solution: This API requires:

Workaround:

# In terraform.tfvars, leave empty to skip:
workspace_admin_email = ""

# Then assign admin manually via Databricks UI

5. KMS Key Policy Circular Dependency

Error:

Error: Cycle: module.kms, module.iam

Solution: The KMS module constructs the cross-account role ARN internally:

locals {
  cross_account_role_arn = "arn:aws:iam::${var.aws_account_id}:role/${var.prefix}-crossaccount"
}

Verify: Remove explicit depends_on = [module.iam] from KMS module call in root main.tf.


6. Workspace URL Double HTTPS Error

Error:

Error: Config: host=https://https://...

Solution: The workspace URL output should not include https:// prefix:

# Correct:
output "workspace_url" {
  value = databricks_mws_workspaces.workspace.workspace_url
}

# Incorrect:
output "workspace_url" {
  value = "https://${databricks_mws_workspaces.workspace.workspace_url}"
}

7. Cluster Creation Fails After Workspace Deployment

Error:

Cluster creation failed: Unable to connect to data plane

Solution:

  1. Wait 20 minutes after workspace creation for backend Private Link to stabilize
  2. Verify VPC endpoints are in β€œavailable” state:
    aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxxxx
    
  3. Check security group rules allow traffic to VPC endpoints
  4. Verify NAT gateways are healthy in both AZs

8. S3 Access Denied Errors

Error:

Error: Access Denied when accessing S3 bucket

Solution:

  1. Verify IAM roles have correct permissions:
    terraform state show 'module.iam.aws_iam_role.instance_profile_role'
    terraform state show 'module.iam.aws_iam_role.unity_catalog_role'
    
  2. Check bucket policies are attached:
    aws s3api get-bucket-policy --bucket your-bucket-name
    
  3. Verify IAM role assumption is working:
    aws sts assume-role --role-arn arn:aws:iam::ACCOUNT:role/ROLE-NAME --role-session-name test
    

9. Unity Catalog Metastore Assignment Fails

Error:

Error: cannot assign metastore to workspace

Solution:

  1. Verify workspace is in RUNNING state:
    terraform state show 'module.databricks_workspace.databricks_mws_workspaces.workspace'
    
  2. Check metastore exists:
    terraform state show 'module.unity_catalog.databricks_metastore.this'
    
  3. Ensure workspace and metastore are in the same region
  4. Verify account has Unity Catalog enabled

10. Destroy Fails with Dynamic Provider Error

Error:

Error: cannot read storage credential: failed during request visitor

Solution: Use targeted destroy:

# Step 1: Remove Unity Catalog resources
terraform destroy -target=module.user_assignment
terraform destroy -target=module.unity_catalog

# Step 2: Remove workspace
terraform destroy -target=module.databricks_workspace

# Step 3: Remove remaining resources
terraform destroy

Alternative: Use the provided destroy script:

./scripts/pre-destroy.sh
terraform destroy

Debug Commands

Check Terraform State

terraform state list
terraform state show 'module.name.resource.name'

Refresh State

terraform refresh

View Outputs

terraform output
terraform output -json | jq

Enable Debug Logging

export TF_LOG=DEBUG
export TF_LOG_PATH=terraform-debug.log
terraform apply

Validate Network Connectivity

# Test VPC endpoint DNS resolution
nslookup workspace-endpoint-name.vpce-svc-xxxxx.us-west-2.vpce.amazonaws.com

# Check NAT Gateway status
aws ec2 describe-nat-gateways --region us-west-2

# Verify security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxx

Additional Resources


Support

For issues or questions:

  1. Check the Troubleshooting section above
  2. Review Terraform debug logs (TF_LOG=DEBUG)
  3. Consult Databricks documentation
  4. Contact Databricks support for account-specific issues

License

This configuration is provided as-is for reference purposes.