A Terraform configuration for deploying a basic Databricks workspace on Google Cloud Platform (GCP) using a customer-managed VPC.
This deployment creates a basic Databricks workspace. The architecture is shown below:
```mermaid
graph TB
    subgraph "GCP Project - Host/Shared VPC"
        subgraph "Customer VPC"
            SUBNET[Node Subnet<br/>Databricks Clusters]
            NAT[Cloud NAT<br/>Optional]
        end
    end
    subgraph "GCP Project - Service/Consumer"
        subgraph "Databricks Managed"
            GKE[GKE Cluster<br/>Control Plane Components]
            GCS[GCS Bucket<br/>DBFS Storage]
        end
    end
    subgraph "Databricks Control Plane"
        CONTROL[Databricks Control Plane<br/>accounts.gcp.databricks.com]
    end
    subgraph "Users"
        USER[Workspace Users<br/>Web Browser]
    end
    SUBNET --> CONTROL
    SUBNET --> GCS
    GKE --> SUBNET
    USER --> CONTROL
    CONTROL --> GKE
    style CONTROL fill:#FF3621
    style GCS fill:#4285F4
    style GKE fill:#4285F4
    style SUBNET fill:#34A853
```
This is a minimal workspace deployment. It does NOT include advanced security or networking features. For these features, see:

- `../byovpc-psc-ws/`
- `../byovpc-cmek-ws/`
- `../byovpc-psc-cmek-ws/`
- `../end2end/`
- `../infra4db/`

You also need access to the Databricks account console (https://accounts.gcp.databricks.com) and a provisioning service account (e.g., `automation-sa@project.iam.gserviceaccount.com`).

This configuration requires a pre-existing VPC with appropriate subnets. To create the infrastructure, use `../infra4db/` first.
Required:

- Node subnet: /24 CIDR recommended (251 usable IPs)

The service account needs these IAM roles on both projects:
On Service/Consumer Project (where workspace will be created):

- `roles/compute.networkAdmin`
- `roles/iam.serviceAccountAdmin`
- `roles/resourcemanager.projectIamAdmin`
- `roles/storage.admin`

On Host/Shared VPC Project (if using Shared VPC):

- `roles/compute.networkUser`
- `roles/compute.securityAdmin`

For detailed role requirements, see the Databricks documentation.
You need two project IDs:

- Service/Consumer Project (`google_project_name`): where Databricks resources will be created
- Host/Shared VPC Project (`google_shared_vpc_project`): where your VPC network exists

Note: If not using Shared VPC, both values should be the same project ID.

You also need the Google Cloud SDK (`gcloud` CLI) configured and an admin user email for the `databricks_admin_user` variable.

This deployment uses three Terraform providers:
Manages resources in the service/consumer project.
provider "google" {
project = var.google_project_name
region = var.google_region
}
Manages resources in the host/shared VPC project.
provider "google" {
alias = "vpc_project"
project = var.google_shared_vpc_project
region = var.google_region
}
Creates workspace and account-level configurations.
provider "databricks" {
alias = "accounts"
host = "https://accounts.gcp.databricks.com"
google_service_account = var.google_service_account_email
}
Used for creating the account-level network configuration and the workspace itself.
Manages workspace-level configurations after workspace creation.
provider "databricks" {
alias = "workspace"
host = databricks_mws_workspaces.databricks_workspace.workspace_url
google_service_account = var.google_service_account_email
}
Used for workspace-level configuration such as creating the admin user and its group membership.
Option 1: Service Account Impersonation

```shell
# Set the service account to impersonate
gcloud config set auth/impersonate_service_account automation-sa@project.iam.gserviceaccount.com

# Generate an access token
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)
```

Option 2: Service Account Key

```shell
# Download a service account key
gcloud iam service-accounts keys create ~/sa-key.json \
  --iam-account=automation-sa@project.iam.gserviceaccount.com

# Set the environment variable
export GOOGLE_APPLICATION_CREDENTIALS=~/sa-key.json
```
Security Best Practice: Use Option 1 (impersonation) to avoid managing key files.
For detailed authentication guide, see ../sa-impersonation.md.
Before deploying the workspace, ensure you have:

- An existing VPC, referenced by the `google_vpc_id` variable, in the `google_shared_vpc_project` project
- An existing node subnet, referenced by the `node_subnet` variable, with a /24 CIDR (251 usable IPs)
- A deployment region, set in the `google_region` variable

Egress (Outbound) - Required:

- `*.gcp.databricks.com` (control plane)
- `*.googleapis.com` (GCP APIs)
- `*.docker.io`, `*.maven.org`, `*.pypi.org` (package downloads)

Ingress (Inbound) - Optional:

- Not required for this deployment; for private connectivity, see `../byovpc-psc-ws/`

Minimum required firewall rules (managed separately):
Internal cluster traffic:

- Source: Node subnet CIDR
- Target: Node subnet CIDR
- Protocols: TCP, UDP, ICMP (all ports)

Egress:

- Source: Node subnet CIDR
- Target: 0.0.0.0/0
- Protocols: TCP 443, 3306 (HTTPS, external metastore)
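The internal rule could be sketched in Terraform as follows. The resource name, VPC name, project, and CIDR below are placeholders; the actual rules are managed in `../infra4db/`:

```hcl
# Sketch of the intra-subnet firewall rule; all names and the CIDR are illustrative.
resource "google_compute_firewall" "databricks_internal" {
  name    = "databricks-allow-internal" # hypothetical rule name
  network = "my-vpc-network"            # your VPC name
  project = "my-host-project"           # host/shared VPC project

  # Node subnet CIDR, both as source and (implicitly, via network) target
  source_ranges = ["10.0.0.0/24"]

  allow {
    protocol = "tcp"
  }
  allow {
    protocol = "udp"
  }
  allow {
    protocol = "icmp"
  }
}
```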
For infrastructure creation including firewall rules, use ../infra4db/.
resource "databricks_mws_networks" "databricks_network"
Creates:
Key Attributes:
- `network_name`: generated with a random suffix for uniqueness
- `network_project_id`: host/shared VPC project
- `vpc_id`: your VPC name
- `subnet_id`: your node subnet name
- `subnet_region`: must match the workspace region

`resource "databricks_mws_workspaces" "databricks_workspace"`
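A minimal sketch of this resource, assuming the variable names used elsewhere in this guide (the actual code in this repo may differ):

```hcl
# Sketch: account-level network configuration; attribute wiring is assumed.
resource "databricks_mws_networks" "databricks_network" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "databricks-network-${random_string.databricks_suffix.result}"

  gcp_network_info {
    network_project_id = var.google_shared_vpc_project
    vpc_id             = var.google_vpc_id
    subnet_id          = var.node_subnet
    subnet_region      = var.google_region
  }
}
```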
Creates the Databricks workspace itself, including the Databricks-managed GKE cluster and GCS (DBFS) bucket.

Key Attributes:

- `workspace_name`: display name in the Databricks console
- `location`: GCP region for the workspace
- `cloud_resource_container.gcp.project_id`: your service project
- `network_id`: links to the network configuration

Deployment Time: ~10-15 minutes
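A corresponding sketch for the workspace resource, again with attribute wiring assumed from the variables described in this guide:

```hcl
# Sketch: workspace creation; attribute values are assumptions.
resource "databricks_mws_workspaces" "databricks_workspace" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = var.databricks_workspace_name
  location       = var.google_region

  cloud_resource_container {
    gcp {
      project_id = var.google_project_name
    }
  }

  network_id = databricks_mws_networks.databricks_network.network_id
}
```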
resource "databricks_user" "me"
resource "databricks_group_member" "ws_admin_member0"
Creates:
admins groupgraph TD
A[Start] --> B[Authenticate with GCP]
B --> C[Verify Existing VPC & Subnet]
C --> D[Create Random Suffix]
D --> E[Create Network Configuration]
E --> F[Create Databricks Workspace]
F --> G[Wait for Workspace Provisioning]
G --> H[Lookup Admins Group]
H --> I[Create User in Workspace]
I --> J[Add User to Admins Group]
J --> K[Workspace Ready]
style A fill:#4285F4
style K fill:#34A853
style F fill:#FF3621
style G fill:#FBBC04
```mermaid
sequenceDiagram
    participant TF as Terraform
    participant GCP as Google Cloud
    participant DB_ACC as Databricks Account
    participant DB_WS as Databricks Workspace

    Note over TF,GCP: Phase 1: Validation
    TF->>GCP: Verify VPC exists
    TF->>GCP: Verify Subnet exists
    TF->>GCP: Verify Service Account permissions

    Note over TF,DB_ACC: Phase 2: Network Configuration
    TF->>DB_ACC: Create Network Config
    DB_ACC-->>TF: Network ID

    Note over TF,DB_ACC: Phase 3: Workspace Creation
    TF->>DB_ACC: Create Workspace
    DB_ACC->>GCP: Deploy GKE Cluster
    DB_ACC->>GCP: Create GCS Bucket
    DB_ACC->>GCP: Configure Networking
    GCP-->>DB_ACC: Resources Ready
    DB_ACC-->>TF: Workspace URL + ID

    Note over TF,DB_WS: Phase 4: User Assignment
    TF->>DB_WS: Lookup Admins Group
    TF->>DB_WS: Create User
    TF->>DB_WS: Add User to Admins Group
    DB_WS-->>TF: User Configured

    Note over DB_WS: Workspace Ready for Use
```
Edit `providers.auto.tfvars`:

```hcl
# Service Account for Terraform authentication
google_service_account_email = "automation-sa@my-service-project.iam.gserviceaccount.com"

# Service/Consumer Project (where workspace will be created)
google_project_name = "my-service-project"

# Host/Shared VPC Project (where VPC network exists)
# If not using Shared VPC, use the same value as google_project_name
google_shared_vpc_project = "my-host-project"

# GCP Region
google_region = "us-central1"
```
Edit `workspace.auto.tfvars`:

```hcl
# Databricks Account ID (found in Account Console)
databricks_account_id = "12345678-1234-1234-1234-123456789abc"

# Databricks Account Console URL
databricks_account_console_url = "https://accounts.gcp.databricks.com"

# Workspace Name
databricks_workspace_name = "my-databricks-workspace"

# Admin User Email (must be a valid user in your organization)
databricks_admin_user = "admin@mycompany.com"

# Existing VPC Name
google_vpc_id = "my-vpc-network"

# Existing Subnet Name
node_subnet = "databricks-node-subnet"
```
Before deployment, verify that authentication is configured:
```shell
# Option 1: Service Account Impersonation (Recommended)
gcloud config set auth/impersonate_service_account automation-sa@project.iam.gserviceaccount.com
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)

# Option 2: Service Account Key
export GOOGLE_APPLICATION_CREDENTIALS=~/sa-key.json
```
```shell
cd gcp/gh-repo/gcp/terraform-scripts/byovpc-ws
terraform init
```
Expected Output:

```
Initializing provider plugins...
- Installing databricks/databricks...
- Installing hashicorp/google...
- Installing hashicorp/random...

Terraform has been successfully initialized!
```
```shell
terraform validate
terraform plan
```
Review the plan carefully.

Expected Resources:

- `random_string.databricks_suffix`
- `databricks_mws_networks.databricks_network`
- `databricks_mws_workspaces.databricks_workspace`
- `databricks_user.me`
- `databricks_group_member.ws_admin_member0`

```shell
terraform apply
```
Type `yes` when prompted.
Deployment Time: ~10-15 minutes
Once the apply completes, view the outputs:
```shell
terraform output
```

Expected Outputs:

```
workspace_url = "https://12345678901234.1.gcp.databricks.com"
```
After successful deployment, the following outputs are available:
| Output | Description | Example |
|---|---|---|
| `workspace_url` | URL to access the Databricks workspace | `https://1234567890123456.1.gcp.databricks.com` |
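This output is presumably declared along these lines (a sketch based on the resource names used in this configuration):

```hcl
# Sketch: expose the workspace URL after creation.
output "workspace_url" {
  description = "URL to access the Databricks workspace"
  value       = databricks_mws_workspaces.databricks_workspace.workspace_url
}
```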
To view outputs:

```shell
# All outputs
terraform output

# A specific output
terraform output workspace_url

# JSON format
terraform output -json
```
Error:

```
Error: google: could not find default credentials
```
Solution:

```shell
# Verify authentication
gcloud auth list

# Re-authenticate
gcloud auth application-default login

# Or set service account impersonation
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)
```
Error:

```
Error: service account not found in Databricks account
```

Solution: ensure the provisioning service account has been added to your Databricks account with the required permissions (see `../sa-impersonation.md`).
Error:

```
Error: network not found: databricks-vpc
```
Solution:

```shell
# Verify the VPC and subnet exist in the host project
gcloud compute networks list --project=my-host-project

gcloud compute networks subnets list \
  --network=databricks-vpc \
  --project=my-host-project
```

Then update `workspace.auto.tfvars` with the actual names:

```hcl
google_vpc_id = "actual-vpc-name"
node_subnet   = "actual-subnet-name"
```
Error:

```
Error: googleapi: Error 403: Permission denied
```
Solution:
Verify service account has required roles:
```shell
# Check service project permissions
gcloud projects get-iam-policy my-service-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:automation-sa@my-service-project.iam.gserviceaccount.com"

# Check host project permissions (if using Shared VPC)
gcloud projects get-iam-policy my-host-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:automation-sa@my-service-project.iam.gserviceaccount.com"
```
Grant missing roles:
```shell
# On service project
gcloud projects add-iam-policy-binding my-service-project \
  --member="serviceAccount:automation-sa@my-service-project.iam.gserviceaccount.com" \
  --role="roles/compute.networkAdmin"

# On host project (if using Shared VPC)
gcloud projects add-iam-policy-binding my-host-project \
  --member="serviceAccount:automation-sa@my-service-project.iam.gserviceaccount.com" \
  --role="roles/compute.networkUser"
```
Error:

```
Error: timeout while waiting for workspace to become ready
```
Solution:
This can happen if GCP resource quotas are exceeded or there are networking issues.
```shell
# Check project quotas
gcloud compute project-info describe --project=my-service-project
```
```shell
# Remove from Terraform state
terraform state rm databricks_mws_workspaces.databricks_workspace

# Delete manually in Account Console
# Then re-run terraform apply
```
Error: User sees “Access Denied” when trying to log in to workspace.
Solution:

```shell
# Verify the admin user assignment is in state
terraform state show databricks_group_member.ws_admin_member0

# Re-apply the user resources if needed
terraform apply -target=databricks_user.me
terraform apply -target=databricks_group_member.ws_admin_member0
```
Error:

```
Error: network configuration with name already exists
```
Solution:
The random suffix is not unique. This is rare but can happen.
```shell
# Force a new random suffix
terraform taint random_string.databricks_suffix
terraform apply
```

On Terraform 0.15.2 and later, `terraform apply -replace=random_string.databricks_suffix` is the recommended equivalent of `taint`.
To destroy all resources created by this configuration:

```shell
terraform destroy
```
Warning: This will:

- Delete the Databricks workspace and its network configuration
- Delete the DBFS GCS bucket and its data (if `force_destroy = true`)

Before destroying, back up any notebooks, data, or configurations you need to keep.

Note: The VPC and subnets are NOT destroyed, as they were not created by this configuration.
After successfully deploying your basic workspace, consider:

- `../byovpc-psc-ws/` for a PSC-enabled workspace
- `../byovpc-cmek-ws/` for a CMEK-enabled workspace
- `../end2end/` for a complete workspace with Unity Catalog
- `../uc/` for standalone Unity Catalog setup
- Configure Cluster Policies: control cluster configurations and costs
- Set Up IP Access Lists: restrict access to specific IP ranges
For issues or questions, refer to the Databricks documentation or your account team.
This configuration is provided as a reference implementation for deploying Databricks workspaces on GCP.