
Databricks End-to-End Deployment with Unity Catalog

A comprehensive Terraform configuration for deploying a production-ready Databricks workspace on Google Cloud Platform (GCP) with Unity Catalog for data governance, external storage locations, cluster policies, and complete user/group management.

Table of Contents

  • Architecture Overview
  • Prerequisites
  • What’s Included
  • Provider Configuration
  • GCP Infrastructure Requirements
  • Databricks Resources
  • Deployment Flow
  • Configuration
  • Deployment
  • Post-Deployment
  • Outputs
  • Troubleshooting
  • Cleanup
  • Additional Resources
  • Next Steps
  • Best Practices Implemented
  • License

Architecture Overview

This deployment creates a complete, production-ready Databricks platform with:

Note: This configuration assumes you already have VPC infrastructure. For infrastructure creation, see ../infra4db/.

Architecture Diagram

graph TB
    subgraph "GCP Project - Host/Shared VPC"
        subgraph "Customer VPC"
            SUBNET[Node Subnet<br/>Databricks Clusters]
            NAT[Cloud NAT]
        end
    end

    subgraph "GCP Project - Service/Consumer"
        subgraph "Databricks Workspace"
            WS[Workspace<br/>Notebooks & Clusters]

            subgraph "Cluster Policies"
                CP1[Fair Use Policy<br/>Max 10 DBU/hr]
                CP2[Auto-termination<br/>20 minutes]
                TAGS[Custom Tags<br/>Team & CostCenter]
            end
        end

        subgraph "Unity Catalog"
            META[Metastore<br/>Data Governance]

            subgraph "Catalogs"
                MAIN[Main Catalog<br/>Default]
                DEV[Dev Catalog<br/>Development]
            end

            subgraph "Schemas"
                DEVDB[DevDB Schema<br/>Dev Database]
            end
        end

        subgraph "Storage Accounts"
            GCS_META[GCS Bucket<br/>Metastore Storage]
            GCS_EXT[GCS Bucket<br/>External Location]
        end

        subgraph "Storage Credentials"
            CRED1[Default Credentials<br/>Databricks SA]
            CRED2[External Credentials<br/>Databricks SA]
        end
    end

    subgraph "Databricks Account"
        subgraph "Groups"
            UC_ADMIN[UC Admins Group]
            GROUP1[Data Engineering Group]
            GROUP2[Data Science Group]
        end

        subgraph "Users"
            ADMIN1[Admin User 1]
            ADMIN2[Service Account]
            USER1[Dev User]
            USER2[Data Scientist]
        end
    end

    subgraph "Databricks Control Plane"
        CONTROL[Databricks Control Plane<br/>accounts.gcp.databricks.com]
    end

    META --> GCS_META
    CRED1 --> GCS_META
    CRED2 --> GCS_EXT
    DEV --> DEVDB

    META --> MAIN
    META --> DEV

    WS --> META
    WS --> CP1
    CP1 --> CP2
    CP2 --> TAGS

    UC_ADMIN --> ADMIN1
    UC_ADMIN --> ADMIN2
    GROUP1 --> USER1
    GROUP2 --> USER2

    WS --> GROUP1
    WS --> GROUP2

    SUBNET --> CONTROL
    CONTROL --> WS

    style META fill:#FF3621
    style UC_ADMIN fill:#FBBC04
    style WS fill:#4285F4
    style CP1 fill:#34A853

What Makes This “End-to-End”?

This configuration provides a complete Databricks deployment, not just infrastructure:

| Layer | Components | Purpose |
| --- | --- | --- |
| Infrastructure | VPC, Subnets, Workspace | Foundation for compute and storage |
| Data Governance | Unity Catalog, Metastore | Centralized metadata and access control |
| Storage | Managed + External locations | Organized data storage with credentials |
| Access Management | Groups, Users, Permissions | RBAC across all resources |
| Cost Control | Cluster policies, Tags | Spend management and attribution |
| Security | IP access lists, Firewall | Network and access security |

Prerequisites

1. Databricks Account Requirements

2. GCP Requirements

Existing VPC Infrastructure

This configuration requires a pre-existing VPC. Use ../infra4db/ to create:

Required:

GCP Service Account Permissions

On Service/Consumer Project:

On Host/Shared VPC Project (if using Shared VPC):

GCP Projects

  1. Service/Consumer Project: Where workspace will be created
  2. Host/Shared VPC Project: Where VPC network exists

3. Local Requirements

4. Users


What’s Included

Core Components

1. Databricks Workspace

2. Unity Catalog Setup

3. Account-Level Groups

4. Users and Assignments

5. Catalogs and Schemas

6. External Storage

7. Cluster Policies

8. Workspace Permissions


Provider Configuration

This deployment uses two Terraform providers (google and databricks) across four provider configurations:

1. Google Provider (Default)

provider "google" {
  project = var.google_project_name
  region  = var.google_region
}

2. Google Provider (VPC Project)

provider "google" {
  alias   = "vpc_project"
  project = var.google_shared_vpc_project
  region  = var.google_region
}

3. Databricks Account Provider

provider "databricks" {
  alias                  = "accounts"
  host                   = "https://accounts.gcp.databricks.com"
  google_service_account = var.google_service_account_email
}

Used for:

4. Databricks Workspace Provider

provider "databricks" {
  alias                  = "workspace"
  host                   = databricks_mws_workspaces.databricks_workspace.workspace_url
  google_service_account = var.google_service_account_email
}

Used for:
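
If the repository does not already pin these providers, a minimal required_providers block looks like the following (a sketch; versions are omitted and should match whatever the repo actually pins):

terraform {
  required_providers {
    # Google provider, used by both the default and vpc_project configurations
    google = {
      source = "hashicorp/google"
    }
    # Databricks provider, used by both the accounts and workspace configurations
    databricks = {
      source = "databricks/databricks"
    }
  }
}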


GCP Infrastructure Requirements

VPC and Subnet Requirements

VPC Network

Node Subnet

Network Connectivity

Egress (Required):


Databricks Resources

1. Workspace (workspace.tf)

resource "databricks_mws_workspaces" "databricks_workspace"

Creates:

Key Features:
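
As a rough sketch of what workspace.tf declares (exact arguments may differ; the databricks_mws_networks resource name is assumed here):

resource "databricks_mws_workspaces" "databricks_workspace" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = var.databricks_workspace_name
  location       = var.google_region

  # Service/consumer project that hosts the workspace
  cloud_resource_container {
    gcp {
      project_id = var.google_project_name
    }
  }

  # Network configuration built from the existing VPC and node subnet
  # (databricks_mws_networks resource name assumed)
  network_id = databricks_mws_networks.this.network_id
}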

2. Unity Catalog Setup (unity-setup.tf)

Metastore

resource "databricks_metastore" "this"

Creates:

Configuration:
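
A minimal sketch of the metastore resource (the metastore bucket resource name is assumed; see unity-setup.tf for the real arguments):

resource "databricks_metastore" "this" {
  provider      = databricks.accounts
  name          = var.metastore_name
  region        = var.google_region
  # Default managed storage for the metastore -- bucket resource name assumed
  storage_root  = "gs://${google_storage_bucket.metastore_bucket.name}"
  force_destroy = true
}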

Default Storage Credential

resource "databricks_metastore_data_access" "first"

Creates:
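
A sketch of the default credential, which asks Databricks to provision a Google service account for the metastore (the credential name shown is an assumption):

resource "databricks_metastore_data_access" "first" {
  provider     = databricks.accounts
  metastore_id = databricks_metastore.this.id
  name         = "default-storage-credential" # name assumed
  is_default   = true

  # Databricks creates and manages this service account; its email is
  # exported for GCS IAM bindings
  databricks_gcp_service_account {}
}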

Metastore Assignment

resource "databricks_metastore_assignment" "this"

Links:
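
The assignment ties the metastore to the workspace created earlier; a typical form (default_catalog_name shown for illustration):

resource "databricks_metastore_assignment" "this" {
  provider             = databricks.accounts
  metastore_id         = databricks_metastore.this.id
  workspace_id         = databricks_mws_workspaces.databricks_workspace.workspace_id
  default_catalog_name = "main"
}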

Metastore Grants

resource "databricks_grants" "all_grants"

Grants:
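
A sketch of metastore-level grants to the UC Admins group; the actual privilege list in unity-setup.tf may differ:

resource "databricks_grants" "all_grants" {
  provider  = databricks.workspace
  metastore = databricks_metastore.this.id

  grant {
    principal = databricks_group.uc_admins.display_name
    # Illustrative privileges only -- adjust to the real grant list
    privileges = ["CREATE_CATALOG", "CREATE_EXTERNAL_LOCATION", "CREATE_STORAGE_CREDENTIAL"]
  }

  depends_on = [databricks_metastore_assignment.this]
}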

3. Groups and Users (unity-setup.tf)

Unity Catalog Admins Group

resource "databricks_group" "uc_admins"

Purpose:

Members:
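
Groups, users, and memberships are all account-level objects. A minimal sketch (the user and membership resource names are assumed):

resource "databricks_group" "uc_admins" {
  provider     = databricks.accounts
  display_name = var.uc_admin_group_name
}

resource "databricks_user" "admin_user" { # name assumed
  provider  = databricks.accounts
  user_name = var.databricks_admin_user
}

resource "databricks_group_member" "uc_admin_member" { # name assumed
  provider  = databricks.accounts
  group_id  = databricks_group.uc_admins.id
  member_id = databricks_user.admin_user.id
}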

Workspace Groups

resource "databricks_group" "data_eng"
resource "databricks_group" "data_science"

Purpose:

Workspace Assignment:

resource "databricks_mws_permission_assignment"

4. Catalogs and Schemas (unity-objects-management.tf)

Dev Catalog

resource "databricks_catalog" "dev"

Configuration:

Grants:
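
A sketch of the dev catalog plus an illustrative grant to the data engineering group (the grants resource name and privilege list are assumptions):

resource "databricks_catalog" "dev" {
  provider      = databricks.workspace
  name          = "dev"
  comment       = "Development catalog"
  force_destroy = true

  depends_on = [databricks_metastore_assignment.this]
}

resource "databricks_grants" "dev_catalog_grants" { # name assumed
  provider = databricks.workspace
  catalog  = databricks_catalog.dev.name

  grant {
    principal  = var.group_name1                  # data-engineering
    privileges = ["USE_CATALOG", "CREATE_SCHEMA"] # illustrative
  }
}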

DevDB Schema

resource "databricks_schema" "dev_database"

Configuration:

Grants:
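
The schema sits under the dev catalog; roughly:

resource "databricks_schema" "dev_database" {
  provider     = databricks.workspace
  catalog_name = databricks_catalog.dev.name
  name         = "devdb"
  comment      = "Development database"
}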

5. External Storage (unity-objects-management.tf)

External GCS Bucket

resource "google_storage_bucket" "ext_bucket"

Purpose:

Storage Credential

resource "databricks_storage_credential" "external_storage1_credential"

Creates:

Permissions:

Grants:

External Location

resource "databricks_external_location" "external_storage1"

Configuration:

Purpose:
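
The three pieces chain together: the credential's provider-managed service account is granted access on the bucket, and the external location then points at the bucket using that credential. A sketch under assumed names (bucket naming, IAM member resource name); per the troubleshooting section, roles/storage.legacyBucketReader is also required:

resource "google_storage_bucket" "ext_bucket" {
  name     = "${var.external_storage}-${var.google_region}" # naming assumed
  location = var.google_region
}

resource "databricks_storage_credential" "external_storage1_credential" {
  provider = databricks.workspace
  name     = "${var.external_storage}-credential" # naming assumed
  databricks_gcp_service_account {}
}

# Grant the credential's service account access to the bucket
# (a second binding for roles/storage.legacyBucketReader follows the same pattern)
resource "google_storage_bucket_iam_member" "ext_bucket_admin" { # name assumed
  bucket = google_storage_bucket.ext_bucket.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${databricks_storage_credential.external_storage1_credential.databricks_gcp_service_account[0].email}"
}

resource "databricks_external_location" "external_storage1" {
  provider        = databricks.workspace
  name            = var.external_storage
  url             = "gs://${google_storage_bucket.ext_bucket.name}"
  credential_name = databricks_storage_credential.external_storage1_credential.id

  depends_on = [google_storage_bucket_iam_member.ext_bucket_admin]
}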

6. Cluster Policies (cluster_policies.tf)

resource "databricks_cluster_policy" "fair_use"

Policy Definition:

| Setting | Type | Value | Purpose |
| --- | --- | --- | --- |
| dbus_per_hour | range | max: 10 | Cost control |
| autotermination_minutes | fixed | 20 | Prevent runaway costs |
| custom_tags.Team | fixed | From variable | Cost attribution |
| custom_tags.CostCenter | fixed | From variable | Billing allocation |

Permissions:

Benefits:
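
Putting the policy table together, the definition and its permission grant look roughly like this (the permissions resource name is assumed; the troubleshooting section confirms that local.default_policy and jsonencode are used):

locals {
  default_policy = {
    "dbus_per_hour" = {
      type     = "range"
      maxValue = 10
    }
    "autotermination_minutes" = {
      type  = "fixed"
      value = 20
    }
    "custom_tags.Team" = {
      type  = "fixed"
      value = var.custom_tag_team
    }
    "custom_tags.CostCenter" = {
      type  = "fixed"
      value = var.custom_tag_cost_center
    }
  }
}

resource "databricks_cluster_policy" "fair_use" {
  provider   = databricks.workspace
  name       = var.cluster_policy1_name
  definition = jsonencode(local.default_policy)
}

# Allow non-admin groups to use (but not edit) the policy -- resource name assumed
resource "databricks_permissions" "fair_use_can_use" {
  provider          = databricks.workspace
  cluster_policy_id = databricks_cluster_policy.fair_use.id

  access_control {
    group_name       = var.group_name1
    permission_level = "CAN_USE"
  }
}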


Deployment Flow

High-Level Sequence

sequenceDiagram
    participant TF as Terraform
    participant GCP as Google Cloud
    participant DB_ACC as Databricks Account
    participant DB_WS as Databricks Workspace
    participant UC as Unity Catalog

    Note over TF,DB_ACC: Phase 1: Workspace
    TF->>DB_ACC: Create Network Configuration
    TF->>DB_ACC: Create Workspace
    DB_ACC->>GCP: Deploy GKE Cluster
    DB_ACC->>GCP: Create DBFS Bucket
    DB_ACC-->>TF: Workspace URL

    Note over TF,DB_ACC: Phase 2: Groups and Users
    TF->>DB_ACC: Create UC Admins Group
    TF->>DB_ACC: Create Data Engineering Group
    TF->>DB_ACC: Create Data Science Group
    TF->>DB_ACC: Create Users
    TF->>DB_ACC: Add Users to Groups

    Note over TF,GCP: Phase 3: Storage
    TF->>GCP: Create Metastore GCS Bucket
    TF->>GCP: Create External GCS Bucket

    Note over TF,UC: Phase 4: Unity Catalog
    TF->>UC: Create Metastore
    TF->>UC: Create Default Storage Credential
    TF->>GCP: Grant Bucket Permissions to SA
    TF->>UC: Assign Metastore to Workspace
    TF->>UC: Grant Metastore Permissions

    Note over TF,DB_ACC: Phase 5: Workspace Assignments
    TF->>DB_ACC: Assign Data Science Group (ADMIN)
    TF->>DB_ACC: Assign Data Engineering Group (USER)

    Note over TF,DB_WS: Phase 6: Catalogs & Schemas
    TF->>DB_WS: Create Dev Catalog
    TF->>DB_WS: Grant Catalog Permissions
    TF->>DB_WS: Create DevDB Schema
    TF->>DB_WS: Grant Schema Permissions

    Note over TF,UC: Phase 7: External Storage
    TF->>UC: Create Storage Credential
    TF->>GCP: Grant External Bucket Permissions
    TF->>UC: Create External Location
    TF->>UC: Grant External Location Permissions

    Note over TF,DB_WS: Phase 8: Cluster Policies
    TF->>DB_WS: Create Fair Use Policy
    TF->>DB_WS: Grant Policy Permissions

    Note over TF,DB_WS: Phase 9: Workspace Config
    TF->>DB_WS: Enable IP Access Lists
    TF->>DB_WS: Configure Allowed IPs

    Note over DB_WS: Complete Platform Ready

Dependency Management

The configuration uses depends_on extensively to ensure proper ordering:

Workspace
  ↓
Groups & Users
  ↓
Metastore Creation
  ↓
Metastore Assignment
  ↓
Workspace Assignments (Groups)
  ↓
Catalogs, Schemas, External Locations, Cluster Policies
  ↓
Permissions and Grants

Configuration

1. Update Provider Configuration

Edit providers.auto.tfvars:

# Service Account
google_service_account_email = "automation-sa@my-service-project.iam.gserviceaccount.com"

# Projects
google_project_name = "my-service-project"
google_shared_vpc_project = "my-host-project"

# Region
google_region = "us-central1"

2. Update Workspace Configuration

Edit workspace.auto.tfvars:

# Databricks Account
databricks_account_id = "12345678-1234-1234-1234-123456789abc"
databricks_account_console_url = "https://accounts.gcp.databricks.com"
databricks_workspace_name = "my-production-workspace"
databricks_admin_user = "admin@mycompany.com"

# Network Configuration
google_vpc_id = "my-vpc-network"
node_subnet = "databricks-node-subnet"

3. Update Unity Catalog Configuration

Edit unity-setup.auto.tfvars:

# Unity Catalog Groups
uc_admin_group_name = "unity-catalog-admins"
group_name1 = "data-engineering"
group_name2 = "data-science"

# Metastore Name
metastore_name = "production-metastore"

# External Storage
external_storage = "external-data"

4. Update Cluster Policy Configuration

Edit cluster_policies.auto.tfvars:

# Cluster Policy
cluster_policy1_name = "fair-use"

# Custom Tags for Cost Attribution
custom_tag_team = "DataPlatform"
custom_tag_cost_center = "Engineering-12345"

5. Variable Validation Checklist

Before deployment:


Deployment

Step 1: Authenticate with GCP

# Option 1: Service Account Impersonation
gcloud config set auth/impersonate_service_account automation-sa@project.iam.gserviceaccount.com
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)

# Option 2: Service Account Key
export GOOGLE_APPLICATION_CREDENTIALS=~/sa-key.json

Step 2: Navigate to Directory

cd gcp/gh-repo/gcp/terraform-scripts/end2end

Step 3: Initialize Terraform

terraform init

Step 4: Review Plan

terraform plan

Expected Resources (~40-50):

Step 5: Apply Configuration

terraform apply

Deployment Time: ~20-30 minutes

Progress:

  1. Workspace creation (~10-12 min)
  2. Groups and users (~2-3 min)
  3. Storage buckets (~1 min)
  4. Unity Catalog metastore (~2-3 min)
  5. Metastore assignment (~1-2 min)
  6. Workspace group assignments (~1-2 min)
  7. Catalogs and schemas (~2-3 min)
  8. External storage (~2-3 min)
  9. Cluster policies (~1 min)
  10. Permissions and grants (~2-3 min)

Step 6: Verify Deployment

terraform output

Check outputs:

workspace_url = "https://12345678901234.1.gcp.databricks.com"
metastore_id = "uuid"
uc_admins_group_id = "group-id"
...

Post-Deployment

Step 1: Access Workspace

  1. Navigate to workspace URL
  2. Log in with admin user email
  3. Verify Unity Catalog is enabled

Step 2: Verify Unity Catalog

-- In Databricks SQL or Notebook
SHOW CATALOGS;
-- Should show: main, dev

SHOW SCHEMAS IN dev;
-- Should show: devdb, information_schema

USE CATALOG dev;
USE SCHEMA devdb;

-- Test table creation
CREATE TABLE test_table (id INT, name STRING);
INSERT INTO test_table VALUES (1, 'test');
SELECT * FROM test_table;

Step 3: Test External Location

-- Create external table
CREATE EXTERNAL TABLE dev.devdb.external_test
LOCATION 'gs://external-data-<region>-<suffix>/test_data';

-- Verify access
SELECT * FROM dev.devdb.external_test;

Step 4: Test Cluster Policy

  1. Go to Compute → Cluster Policies
  2. Verify “fair-use cluster policy” exists
  3. Create cluster using policy
  4. Verify:
    • Auto-termination set to 20 minutes
    • Custom tags applied
    • DBU limit enforced

Step 5: Verify Permissions

As Data Engineering User:

-- Should work
USE CATALOG dev;
CREATE SCHEMA test_schema;

-- Should fail (no access to main)
USE CATALOG main;
CREATE SCHEMA test_schema;

As Data Science Admin:


Outputs

| Output | Description |
| --- | --- |
| workspace_url | Databricks workspace URL |
| workspace_id | Workspace ID for metastore assignment |
| metastore_id | Unity Catalog metastore ID |
| uc_admins_group_id | UC Admins group ID |
| data_eng_group_id | Data Engineering group ID |
| data_science_group_id | Data Science group ID |
| dev_catalog_name | Development catalog name |
| external_location_name | External location name |
| cluster_policy_id | Fair use cluster policy ID |
| ingress_firewall_enabled | IP access list status |

View all outputs:

terraform output
terraform output -json | jq
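
If you extend outputs.tf, entries typically follow this pattern (a sketch; values assumed to match the resources described above):

output "workspace_url" {
  description = "Databricks workspace URL"
  value       = databricks_mws_workspaces.databricks_workspace.workspace_url
}

output "metastore_id" {
  description = "Unity Catalog metastore ID"
  value       = databricks_metastore.this.id
}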

Troubleshooting

Common Issues

1. Metastore Assignment Fails

Error:

Error: cannot assign metastore to workspace

Solution:

  1. Verify workspace is running:
    terraform state show databricks_mws_workspaces.databricks_workspace
    
  2. Ensure workspace and metastore are in same region

  3. Check Unity Catalog is enabled for account

  4. Wait a few minutes and retry

2. Group-to-Workspace Assignment Fails

Error:

Error: cannot create mws permission assignment: Permission assignment APIs are not available

Solution:

This API requires Unity Catalog to be assigned:

  1. Verify metastore assignment completed:
    terraform state show databricks_metastore_assignment.this
    
  2. Ensure depends_on includes metastore assignment

  3. Check service account is account admin

3. Storage Credential Creation Fails

Error:

Error: cannot create storage credential

Solution:

  1. Verify metastore assignment completed

  2. Check user has required metastore grants:
    terraform state show databricks_grants.all_grants
    
  3. Ensure depends_on includes grants

  4. Verify GCS bucket exists

4. External Location Validation Fails

Error:

Error: external location validation failed: cannot access bucket

Solution:

  1. Verify Databricks SA has bucket permissions:
    gcloud storage buckets get-iam-policy gs://external-data-bucket
    
  2. Check that both roles/storage.objectAdmin and roles/storage.legacyBucketReader are granted

  3. Wait 1-2 minutes for IAM propagation

  4. Re-apply:
    terraform apply -target=databricks_external_location.external_storage1
    

5. Cannot Create Catalog or Schema

Error:

Error: permission denied when creating catalog

Solution:

  1. Verify workspace group assignment completed:
    terraform state show databricks_mws_permission_assignment.add_admin_group
    
  2. Check metastore grants:
    terraform state show databricks_grants.all_grants
    
  3. Ensure using workspace provider (not account provider)

6. Cluster Policy Creation Fails

Error:

Error: cannot create cluster policy

Solution:

  1. Verify groups are assigned to workspace:
    terraform state show databricks_mws_permission_assignment.add_non_admin_group
    
  2. Check JSON policy is valid:
    terraform console
    > jsonencode(local.default_policy)
    
  3. Ensure custom tag variables are defined

Debug Commands

# Check workspace status
terraform state show databricks_mws_workspaces.databricks_workspace

# Check metastore
terraform state show databricks_metastore.this

# Check metastore assignment
terraform state show databricks_metastore_assignment.this

# Check groups
terraform state show databricks_group.uc_admins
terraform state show databricks_group.data_eng
terraform state show databricks_group.data_science

# Check workspace assignments
terraform state list | grep mws_permission_assignment

# Check catalogs and schemas
terraform state show databricks_catalog.dev
terraform state show databricks_schema.dev_database

# Check external storage
terraform state show google_storage_bucket.ext_bucket
terraform state show databricks_storage_credential.external_storage1_credential
terraform state show databricks_external_location.external_storage1

# Check cluster policy
terraform state show databricks_cluster_policy.fair_use

# View all outputs
terraform output -json | jq

Cleanup

Important Notes

⚠️ Before destroying:

  1. Export all important notebooks and data
  2. Terminate all running clusters
  3. Remove metastore data access resource from state

Cleanup Steps

Step 1: Remove metastore data access (Terraform limitation):

# This resource cannot be destroyed via Terraform
terraform state rm databricks_metastore_data_access.first

Step 2: Destroy resources:

terraform destroy

Manual Cleanup:

After terraform destroy, manually delete the metastore in Databricks Account Console if needed.


Additional Resources


Next Steps

After deploying your complete platform:

  1. Customize Cluster Policies:
    • Add more policies for different teams
    • Implement ML-specific policies
    • Configure spot instance policies
  2. Expand Unity Catalog:
    • Create production catalog
    • Set up staging environments
    • Configure data classification
  3. Implement CI/CD:
    • Automate notebook deployment
    • Set up environment promotion
    • Configure approval workflows
  4. Add Monitoring:
    • Enable audit logging
    • Set up cost alerts
    • Monitor cluster usage
  5. Security Enhancements:
    • Add Private Service Connect (see ../byovpc-psc-ws/)
    • Enable CMEK (see ../byovpc-cmek-ws/)
    • Configure IP access lists
  6. Data Organization:
    • Define catalog structure
    • Set up data retention policies
    • Implement data quality checks

Best Practices Implemented

✅ Infrastructure as Code

✅ Data Governance

✅ Cost Management

✅ Security

✅ Organization

✅ Automation


License

This configuration is provided as a reference implementation for deploying complete, production-ready Databricks workspaces with Unity Catalog on GCP.