A comprehensive Terraform configuration for deploying a production-ready Databricks workspace on Google Cloud Platform (GCP) with Unity Catalog for data governance, external storage locations, cluster policies, and complete user/group management.
This deployment creates a complete, production-ready Databricks platform: a workspace, Unity Catalog governance, managed and external storage, user/group management, cluster policies, and network security.
Note: This configuration assumes you already have VPC infrastructure. For infrastructure creation, see `../infra4db/`.
```mermaid
graph TB
    subgraph "GCP Project - Host/Shared VPC"
        subgraph "Customer VPC"
            SUBNET[Node Subnet<br/>Databricks Clusters]
            NAT[Cloud NAT]
        end
    end

    subgraph "GCP Project - Service/Consumer"
        subgraph "Databricks Workspace"
            WS[Workspace<br/>Notebooks & Clusters]
            subgraph "Cluster Policies"
                CP1[Fair Use Policy<br/>Max 10 DBU/hr]
                CP2[Auto-termination<br/>20 minutes]
                TAGS[Custom Tags<br/>Team & CostCenter]
            end
        end
        subgraph "Unity Catalog"
            META[Metastore<br/>Data Governance]
            subgraph "Catalogs"
                MAIN[Main Catalog<br/>Default]
                DEV[Dev Catalog<br/>Development]
            end
            subgraph "Schemas"
                DEVDB[DevDB Schema<br/>Dev Database]
            end
        end
        subgraph "Storage Accounts"
            GCS_META[GCS Bucket<br/>Metastore Storage]
            GCS_EXT[GCS Bucket<br/>External Location]
        end
        subgraph "Storage Credentials"
            CRED1[Default Credentials<br/>Databricks SA]
            CRED2[External Credentials<br/>Databricks SA]
        end
    end

    subgraph "Databricks Account"
        subgraph "Groups"
            UC_ADMIN[UC Admins Group]
            GROUP1[Data Engineering Group]
            GROUP2[Data Science Group]
        end
        subgraph "Users"
            ADMIN1[Admin User 1]
            ADMIN2[Service Account]
            USER1[Dev User]
            USER2[Data Scientist]
        end
    end

    subgraph "Databricks Control Plane"
        CONTROL[Databricks Control Plane<br/>accounts.gcp.databricks.com]
    end

    META --> GCS_META
    CRED1 --> GCS_META
    CRED2 --> GCS_EXT
    DEV --> DEVDB
    META --> MAIN
    META --> DEV
    WS --> META
    WS --> CP1
    CP1 --> CP2
    CP2 --> TAGS
    UC_ADMIN --> ADMIN1
    UC_ADMIN --> ADMIN2
    GROUP1 --> USER1
    GROUP2 --> USER2
    WS --> GROUP1
    WS --> GROUP2
    SUBNET --> CONTROL
    CONTROL --> WS

    style META fill:#FF3621
    style UC_ADMIN fill:#FBBC04
    style WS fill:#4285F4
    style CP1 fill:#34A853
```
This configuration provides a complete Databricks deployment, not just infrastructure:
| Layer | Components | Purpose |
|---|---|---|
| Infrastructure | VPC, Subnets, Workspace | Foundation for compute and storage |
| Data Governance | Unity Catalog, Metastore | Centralized metadata and access control |
| Storage | Managed + External locations | Organized data storage with credentials |
| Access Management | Groups, Users, Permissions | RBAC across all resources |
| Cost Control | Cluster policies, Tags | Spend management and attribution |
| Security | IP access lists, Firewall | Network and access security |
Prerequisites:

- A Databricks account with access to the account console (https://accounts.gcp.databricks.com)
- A GCP service account for automation (e.g. `automation-sa@project.iam.gserviceaccount.com`)

This configuration requires a pre-existing VPC. Use `../infra4db/` to create:

Required:

- VPC network
- Node subnet (at least a /24 CIDR)

On the Service/Consumer Project, the service account needs:

- `roles/compute.networkAdmin`
- `roles/iam.serviceAccountAdmin`
- `roles/resourcemanager.projectIamAdmin`
- `roles/storage.admin`

On the Host/Shared VPC Project (if using Shared VPC):

- `roles/compute.networkUser`
- `roles/compute.securityAdmin`

The service account must also be a Databricks account admin, which is required for `databricks_mws_permission_assignment`.

This deployment uses the following Terraform provider configurations:
provider "google" {
project = var.google_project_name
region = var.google_region
}
provider "google" {
alias = "vpc_project"
project = var.google_shared_vpc_project
region = var.google_region
}
provider "databricks" {
alias = "accounts"
host = "https://accounts.gcp.databricks.com"
google_service_account = var.google_service_account_email
}
Used for:
provider "databricks" {
alias = "workspace"
host = databricks_mws_workspaces.databricks_workspace.workspace_url
google_service_account = var.google_service_account_email
}
Used for:
Network requirements:

- VPC: the `google_vpc_id` variable (in the `google_shared_vpc_project` project)
- Subnet: the `node_subnet` variable, at least a /24 (251 usable IPs)

Egress (Required):

- `*.gcp.databricks.com` (control plane)
- `*.googleapis.com` (GCP APIs)
- `*.docker.io`, `*.maven.org`, `*.pypi.org` (packages)

Workspace (`workspace.tf`): `resource "databricks_mws_workspaces" "databricks_workspace"`

Creates the workspace: registers the network configuration with the Databricks account, deploys the compute plane into the customer VPC, and provisions the DBFS bucket (see the sketch below).
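A minimal sketch of the workspace resource on GCP, following the `databricks_mws_workspaces` schema; the network resource reference is an assumption, not the exact file contents:

```hcl
resource "databricks_mws_workspaces" "databricks_workspace" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = var.databricks_workspace_name
  location       = var.google_region

  cloud_resource_container {
    gcp {
      project_id = var.google_project_name
    }
  }

  # Network configuration registered separately (resource name assumed)
  network_id = databricks_mws_networks.this.network_id
}
```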
Unity Catalog Metastore (`unity-setup.tf`): `resource "databricks_metastore" "this"`

Creates the metastore, the top-level container for Unity Catalog metadata, backed by a dedicated GCS bucket.

Configuration: `force_destroy = true` (for testing).

`resource "databricks_metastore_data_access" "first"`
Creates the default storage credential (a Databricks-managed service account) with:

- `storage.objectAdmin` on the metastore bucket
- `storage.legacyBucketReader` on the metastore bucket

`resource "databricks_metastore_assignment" "this"`

Links the metastore to the workspace. A combined sketch of all three resources follows.
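A minimal sketch of the metastore chain, assuming the resource schemas from the Databricks provider docs; the bucket reference and credential name are assumptions:

```hcl
resource "databricks_metastore" "this" {
  provider      = databricks.accounts
  name          = var.metastore_name
  storage_root  = "gs://${google_storage_bucket.metastore_bucket.name}" # bucket name assumed
  region        = var.google_region
  force_destroy = true # for testing only
}

resource "databricks_metastore_data_access" "first" {
  provider     = databricks.accounts
  metastore_id = databricks_metastore.this.id
  name         = "default-credential" # name assumed

  # Databricks creates and manages the GCP service account
  databricks_gcp_service_account {}

  is_default = true
}

resource "databricks_metastore_assignment" "this" {
  provider     = databricks.accounts
  workspace_id = databricks_mws_workspaces.databricks_workspace.workspace_id
  metastore_id = databricks_metastore.this.id
}
```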
resource "databricks_grants" "all_grants"
Grants:
CREATE_CATALOG, CREATE_EXTERNAL_LOCATION, CREATE_STORAGE_CREDENTIALUSE_CONNECTION, CREATE_EXTERNAL_LOCATION, CREATE_STORAGE_CREDENTIALunity-setup.tf)resource "databricks_group" "uc_admins"
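A sketch of the metastore-level grant, using the `databricks_grants` resource with a `metastore` target; the principal is an assumption based on the groups described below:

```hcl
resource "databricks_grants" "all_grants" {
  provider  = databricks.workspace
  metastore = databricks_metastore.this.id

  # Admin group receives the full creation privilege set
  grant {
    principal  = databricks_group.uc_admins.display_name
    privileges = ["CREATE_CATALOG", "CREATE_EXTERNAL_LOCATION", "CREATE_STORAGE_CREDENTIAL"]
  }
}
```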
Groups (`unity-setup.tf`): `resource "databricks_group" "uc_admins"`

Purpose: Unity Catalog administrators with metastore-wide privileges.

Members: the admin user and the automation service account (a membership sketch follows).
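A minimal sketch of the group and its membership; the user resource and names are assumptions for illustration:

```hcl
resource "databricks_group" "uc_admins" {
  provider     = databricks.accounts
  display_name = var.uc_admin_group_name
}

resource "databricks_user" "admin" {
  provider  = databricks.accounts
  user_name = var.databricks_admin_user
}

resource "databricks_group_member" "admin_in_uc_admins" {
  provider  = databricks.accounts
  group_id  = databricks_group.uc_admins.id
  member_id = databricks_user.admin.id
}
```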
resource "databricks_group" "data_eng"
resource "databricks_group" "data_science"
Purpose:
Workspace Assignment:
resource "databricks_mws_permission_assignment"
data_science → ADMIN roledata_eng → USER roleunity-objects-management.tf)resource "databricks_catalog" "dev"
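A sketch of the workspace assignments; the resource names `add_admin_group` and `add_non_admin_group` match the names used in the troubleshooting section below:

```hcl
resource "databricks_mws_permission_assignment" "add_admin_group" {
  provider     = databricks.accounts
  workspace_id = databricks_mws_workspaces.databricks_workspace.workspace_id
  principal_id = databricks_group.data_science.id
  permissions  = ["ADMIN"]
}

resource "databricks_mws_permission_assignment" "add_non_admin_group" {
  provider     = databricks.accounts
  workspace_id = databricks_mws_workspaces.databricks_workspace.workspace_id
  principal_id = databricks_group.data_eng.id
  permissions  = ["USER"]
}
```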
Catalogs and Schemas (`unity-objects-management.tf`): `resource "databricks_catalog" "dev"`

Configuration: catalog named `dev`.

Grants (one privilege set per group):

- `USE_CATALOG`, `CREATE_SCHEMA`
- `USE_CATALOG`, `CREATE_SCHEMA`
- `USE_CATALOG`, `CREATE_SCHEMA`, `USE_SCHEMA`

`resource "databricks_schema" "dev_database"`
Configuration: created in the `dev` catalog, named `devdb`.

Grants:

- `USE_SCHEMA`
- `USE_SCHEMA`

A sketch of the catalog and schema resources follows.
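An illustrative sketch of the dev catalog and schema; the comments and options are assumptions based on the description above:

```hcl
resource "databricks_catalog" "dev" {
  provider     = databricks.workspace
  metastore_id = databricks_metastore.this.id
  name         = "dev"
  comment      = "Development catalog"
}

resource "databricks_schema" "dev_database" {
  provider     = databricks.workspace
  catalog_name = databricks_catalog.dev.name
  name         = "devdb"
  comment      = "Development database"
}
```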
External Storage (`unity-objects-management.tf`): `resource "google_storage_bucket" "ext_bucket"`

Purpose: GCS bucket that backs the external location.

`resource "databricks_storage_credential" "external_storage1_credential"`

Creates a Databricks-managed service account credential scoped to the external bucket.
Permissions:
- `roles/storage.objectAdmin` (read/write)
- `roles/storage.legacyBucketReader` (list)

Grants:

- `CREATE_EXTERNAL_TABLE`, `READ_FILES`, `WRITE_FILES`
- `ALL_PRIVILEGES`

`resource "databricks_external_location" "external_storage1"`
Configuration: named `the-ext-location`.

Purpose: registers the external bucket as a governed Unity Catalog location for external tables and files (see the sketch below).
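A sketch of the credential, bucket IAM bindings, and external location; the credential name and IAM resource names are assumptions:

```hcl
resource "databricks_storage_credential" "external_storage1_credential" {
  provider = databricks.workspace
  name     = "external-storage-credential" # name assumed

  # Databricks creates and manages the GCP service account
  databricks_gcp_service_account {}
}

# Grant the Databricks-managed SA access to the external bucket
resource "google_storage_bucket_iam_member" "ext_object_admin" {
  bucket = google_storage_bucket.ext_bucket.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${databricks_storage_credential.external_storage1_credential.databricks_gcp_service_account[0].email}"
}

resource "databricks_external_location" "external_storage1" {
  provider        = databricks.workspace
  name            = "the-ext-location"
  url             = "gs://${google_storage_bucket.ext_bucket.name}"
  credential_name = databricks_storage_credential.external_storage1_credential.id
}
```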
Cluster Policies (`cluster_policies.tf`): `resource "databricks_cluster_policy" "fair_use"`

Policy Definition:
| Setting | Type | Value | Purpose |
|---|---|---|---|
| `dbus_per_hour` | range | max: 10 | Cost control |
| `autotermination_minutes` | fixed | 20 | Prevent runaway costs |
| `custom_tags.Team` | fixed | From variable | Cost attribution |
| `custom_tags.CostCenter` | fixed | From variable | Billing allocation |
Permissions: `CAN_USE` granted to the team groups.

Benefits: caps per-cluster spend, terminates idle clusters, and enforces tags for cost attribution (see the sketch below).
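A sketch of the fair-use policy; `local.default_policy` matches the name referenced in the troubleshooting section, and the variables match the tfvars shown later:

```hcl
locals {
  default_policy = {
    "dbus_per_hour" = {
      "type"     = "range"
      "maxValue" = 10 # cost control
    }
    "autotermination_minutes" = {
      "type"  = "fixed"
      "value" = 20 # prevent runaway costs
    }
    "custom_tags.Team" = {
      "type"  = "fixed"
      "value" = var.custom_tag_team
    }
    "custom_tags.CostCenter" = {
      "type"  = "fixed"
      "value" = var.custom_tag_cost_center
    }
  }
}

resource "databricks_cluster_policy" "fair_use" {
  provider   = databricks.workspace
  name       = var.cluster_policy1_name
  definition = jsonencode(local.default_policy)
}
```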
The deployment proceeds in phases:

```mermaid
sequenceDiagram
    participant TF as Terraform
    participant GCP as Google Cloud
    participant DB_ACC as Databricks Account
    participant DB_WS as Databricks Workspace
    participant UC as Unity Catalog

    Note over TF,DB_ACC: Phase 1: Workspace
    TF->>DB_ACC: Create Network Configuration
    TF->>DB_ACC: Create Workspace
    DB_ACC->>GCP: Deploy GKE Cluster
    DB_ACC->>GCP: Create DBFS Bucket
    DB_ACC-->>TF: Workspace URL

    Note over TF,DB_ACC: Phase 2: Groups and Users
    TF->>DB_ACC: Create UC Admins Group
    TF->>DB_ACC: Create Data Engineering Group
    TF->>DB_ACC: Create Data Science Group
    TF->>DB_ACC: Create Users
    TF->>DB_ACC: Add Users to Groups

    Note over TF,GCP: Phase 3: Storage
    TF->>GCP: Create Metastore GCS Bucket
    TF->>GCP: Create External GCS Bucket

    Note over TF,UC: Phase 4: Unity Catalog
    TF->>UC: Create Metastore
    TF->>UC: Create Default Storage Credential
    TF->>GCP: Grant Bucket Permissions to SA
    TF->>UC: Assign Metastore to Workspace
    TF->>UC: Grant Metastore Permissions

    Note over TF,DB_ACC: Phase 5: Workspace Assignments
    TF->>DB_ACC: Assign Data Science Group (ADMIN)
    TF->>DB_ACC: Assign Data Engineering Group (USER)

    Note over TF,DB_WS: Phase 6: Catalogs & Schemas
    TF->>DB_WS: Create Dev Catalog
    TF->>DB_WS: Grant Catalog Permissions
    TF->>DB_WS: Create DevDB Schema
    TF->>DB_WS: Grant Schema Permissions

    Note over TF,UC: Phase 7: External Storage
    TF->>UC: Create Storage Credential
    TF->>GCP: Grant External Bucket Permissions
    TF->>UC: Create External Location
    TF->>UC: Grant External Location Permissions

    Note over TF,DB_WS: Phase 8: Cluster Policies
    TF->>DB_WS: Create Fair Use Policy
    TF->>DB_WS: Grant Policy Permissions

    Note over TF,DB_WS: Phase 9: Workspace Config
    TF->>DB_WS: Enable IP Access Lists
    TF->>DB_WS: Configure Allowed IPs

    Note over DB_WS: Complete Platform Ready
```
The configuration uses `depends_on` extensively to ensure proper ordering:
```
Workspace
   ↓
Groups & Users
   ↓
Metastore Creation
   ↓
Metastore Assignment
   ↓
Workspace Assignments (Groups)
   ↓
Catalogs, Schemas, External Locations, Cluster Policies
   ↓
Permissions and Grants
```
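For example, a catalog-level grant can declare its upstream dependencies explicitly; this is an illustrative sketch, and the resource name `dev_catalog_grants` is an assumption:

```hcl
resource "databricks_grants" "dev_catalog_grants" {
  provider = databricks.workspace
  catalog  = databricks_catalog.dev.name

  grant {
    principal  = databricks_group.data_eng.display_name
    privileges = ["USE_CATALOG", "CREATE_SCHEMA"]
  }

  # Permissions apply only after the metastore is assigned and the
  # group is added to the workspace.
  depends_on = [
    databricks_metastore_assignment.this,
    databricks_mws_permission_assignment.add_non_admin_group,
  ]
}
```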
Edit `providers.auto.tfvars`:
```hcl
# Service Account
google_service_account_email = "automation-sa@my-service-project.iam.gserviceaccount.com"

# Projects
google_project_name       = "my-service-project"
google_shared_vpc_project = "my-host-project"

# Region
google_region = "us-central1"
```
Edit `workspace.auto.tfvars`:
```hcl
# Databricks Account
databricks_account_id          = "12345678-1234-1234-1234-123456789abc"
databricks_account_console_url = "https://accounts.gcp.databricks.com"
databricks_workspace_name      = "my-production-workspace"
databricks_admin_user          = "admin@mycompany.com"

# Network Configuration
google_vpc_id = "my-vpc-network"
node_subnet   = "databricks-node-subnet"
```
Edit `unity-setup.auto.tfvars`:
```hcl
# Unity Catalog Groups
uc_admin_group_name = "unity-catalog-admins"
group_name1         = "data-engineering"
group_name2         = "data-science"

# Metastore Name
metastore_name = "production-metastore"

# External Storage
external_storage = "external-data"
```
Edit `cluster_policies.auto.tfvars`:
```hcl
# Cluster Policy
cluster_policy1_name = "fair-use"

# Custom Tags for Cost Attribution
custom_tag_team        = "DataPlatform"
custom_tag_cost_center = "Engineering-12345"
```
Before deployment:
```bash
# Option 1: Service Account Impersonation
gcloud config set auth/impersonate_service_account automation-sa@project.iam.gserviceaccount.com
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)

# Option 2: Service Account Key
export GOOGLE_APPLICATION_CREDENTIALS=~/sa-key.json
```
```bash
cd gcp/gh-repo/gcp/terraform-scripts/end2end
terraform init
terraform plan
```
The plan should list roughly 40-50 resources.
```bash
terraform apply
```
Deployment Time: ~20-30 minutes
Once the apply completes, inspect the outputs:

```bash
terraform output
```

```
workspace_url = "https://12345678901234.1.gcp.databricks.com"
metastore_id = "uuid"
uc_admins_group_id = "group-id"
...
```
```sql
-- In Databricks SQL or a notebook
SHOW CATALOGS;
-- Should show: main, dev

SHOW SCHEMAS IN dev;
-- Should show: devdb, information_schema

USE CATALOG dev;
USE SCHEMA devdb;

-- Test table creation
CREATE TABLE test_table (id INT, name STRING);
INSERT INTO test_table VALUES (1, 'test');
SELECT * FROM test_table;

-- Create an external table (Unity Catalog treats tables with an
-- explicit LOCATION as external; the column list is illustrative)
CREATE TABLE dev.devdb.external_test (id INT, name STRING)
LOCATION 'gs://external-data-<region>-<suffix>/test_data';

-- Verify access
SELECT * FROM dev.devdb.external_test;
```
As Data Engineering User:
```sql
-- Should work
USE CATALOG dev;
CREATE SCHEMA test_schema;

-- Should fail (no access to main)
USE CATALOG main;
CREATE SCHEMA test_schema;
```
As Data Science Admin: repeat the same statements; broader access is expected, since the group holds the workspace ADMIN role.
| Output | Description |
|---|---|
| `workspace_url` | Databricks workspace URL |
| `workspace_id` | Workspace ID for metastore assignment |
| `metastore_id` | Unity Catalog metastore ID |
| `uc_admins_group_id` | UC Admins group ID |
| `data_eng_group_id` | Data Engineering group ID |
| `data_science_group_id` | Data Science group ID |
| `dev_catalog_name` | Development catalog name |
| `external_location_name` | External location name |
| `cluster_policy_id` | Fair use cluster policy ID |
| `ingress_firewall_enabled` | IP access list status |
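The table corresponds to output declarations along these lines (an illustrative excerpt, not the full `outputs.tf`):

```hcl
output "workspace_url" {
  description = "Databricks workspace URL"
  value       = databricks_mws_workspaces.databricks_workspace.workspace_url
}

output "metastore_id" {
  description = "Unity Catalog metastore ID"
  value       = databricks_metastore.this.id
}

output "cluster_policy_id" {
  description = "Fair use cluster policy ID"
  value       = databricks_cluster_policy.fair_use.id
}
```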
View all outputs:

```bash
terraform output
terraform output -json | jq
```
Error:

```
Error: cannot assign metastore to workspace
```

Solution:

```bash
terraform state show databricks_mws_workspaces.databricks_workspace
```

- Ensure the workspace and metastore are in the same region
- Check that Unity Catalog is enabled for the account
Error:

```
Error: cannot create mws permission assignment: Permission assignment APIs are not available
```

Solution: this API requires Unity Catalog to be assigned:

```bash
terraform state show databricks_metastore_assignment.this
```

- Ensure `depends_on` includes the metastore assignment
Error:

```
Error: cannot create storage credential
```

Solution:

- Verify the metastore assignment completed

```bash
terraform state show databricks_grants.all_grants
```

- Ensure `depends_on` includes the grants
Error:

```
Error: external location validation failed: cannot access bucket
```

Solution:

```bash
gcloud storage buckets get-iam-policy gs://external-data-bucket
```

- Check that both `storage.objectAdmin` and `storage.legacyBucketReader` are granted
- Wait 1-2 minutes for IAM propagation, then retry:

```bash
terraform apply -target=databricks_external_location.external_storage1
```
Error:

```
Error: permission denied when creating catalog
```

Solution:

```bash
terraform state show databricks_mws_permission_assignment.add_admin_group
terraform state show databricks_grants.all_grants
```
Error:

```
Error: cannot create cluster policy
```

Solution:

```bash
terraform state show databricks_mws_permission_assignment.add_non_admin_group
```

Validate the policy JSON in the Terraform console:

```bash
terraform console
> jsonencode(local.default_policy)
```
```bash
# Check workspace status
terraform state show databricks_mws_workspaces.databricks_workspace

# Check metastore
terraform state show databricks_metastore.this

# Check metastore assignment
terraform state show databricks_metastore_assignment.this

# Check groups
terraform state show databricks_group.uc_admins
terraform state show databricks_group.data_eng
terraform state show databricks_group.data_science

# Check workspace assignments
terraform state list | grep mws_permission_assignment

# Check catalogs and schemas
terraform state show databricks_catalog.dev
terraform state show databricks_schema.dev_database

# Check external storage
terraform state show google_storage_bucket.ext_bucket
terraform state show databricks_storage_credential.external_storage1_credential
terraform state show databricks_external_location.external_storage1

# Check cluster policy
terraform state show databricks_cluster_policy.fair_use

# View all outputs
terraform output -json | jq
```
⚠️ Before destroying:
Step 1: Remove metastore data access from state (Terraform limitation):

```bash
# This resource cannot be destroyed via Terraform
terraform state rm databricks_metastore_data_access.first
```

Step 2: Destroy resources:

```bash
terraform destroy
```
Manual Cleanup: after `terraform destroy`, manually delete the metastore in the Databricks Account Console if needed.
After deploying your complete platform:

- Add Private Service Connect for private connectivity (`../byovpc-psc-ws/`)
- Add customer-managed encryption keys (`../byovpc-cmek-ws/`)

This configuration is provided as a reference implementation for deploying complete, production-ready Databricks workspaces with Unity Catalog on GCP.