
Create Databricks Workspace

Objective

Create a Databricks workspace in a customer-managed VPC. The VPC can be either a Shared VPC or a standalone customer-managed VPC.

Customer Managed VPC Architecture

graph TB
    subgraph "Databricks Account"
        ACCT[Account Console]
        NC[Network Configuration]
    end

    subgraph "Customer GCP Project"
        VPC[Customer Managed VPC]

        subgraph "VPC Configuration"
            SUBNET[Primary Subnet<br/>/29 to /9]
            FW[Firewall Rules]
            PGA[Private Google Access<br/>Enabled]
            NAT[Cloud NAT<br/>Egress]
        end

        subgraph "Databricks Workspace"
            WS[Workspace]
            SA[Compute Service Account<br/>databricks-compute@project]
            CLUSTER[Clusters<br/>GCE Instances]
        end

        subgraph "IAM & Policies"
            ORGPOL[Organization Policies]
            IMGPOL[Trusted Image Policy<br/>databricks-external-images]
            AUTHPOL[Storage Auth Policy<br/>SERVICE_ACCOUNT_HMAC]
        end
    end

    ACCT --> NC
    NC --> VPC
    VPC --> SUBNET
    VPC --> FW
    VPC --> PGA
    VPC --> NAT

    NC --> WS
    WS --> SA
    WS --> CLUSTER
    CLUSTER --> SUBNET

    ORGPOL -.validates.-> WS
    IMGPOL -.validates.-> CLUSTER
    AUTHPOL -.validates.-> SA

    style ACCT fill:#1E88E5
    style VPC fill:#4285F4
    style WS fill:#1E88E5
    style CLUSTER fill:#43A047
    style ORGPOL fill:#FF6F00

Before you begin

Domain Restricted Sharing

If your Google Cloud organization enforces the Domain Restricted Sharing organization policy, add the Google Workspace customer ID for Databricks (C01p0oudw) to the policy's allowed list. Alternatively, you can override the policy at the project level instead of modifying it at the organization level.

For more information, see Restricting identities by domain.
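If you manage organization policies with Terraform, a project-level override could look like the minimal sketch below, using the google provider's google_project_organization_policy resource. The project ID and your own customer ID are placeholders; the Databricks customer ID is the one listed above.

resource "google_project_organization_policy" "allowed_domains" {
  project    = "my-databricks-project" # hypothetical project ID
  constraint = "constraints/iam.allowedPolicyMemberDomains"

  list_policy {
    allow {
      # Your own Google Workspace customer ID plus the Databricks customer ID.
      values = ["YOUR_WORKSPACE_CUSTOMER_ID", "C01p0oudw"]
    }
  }
}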

Trusted Image Policies

Add the databricks-external-images project to the trusted image policy for the GCP project that hosts the Databricks workspace.
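For example, assuming the standard compute.trustedImageProjects constraint, a project-level Terraform sketch (placeholder project ID) might be:

resource "google_project_organization_policy" "trusted_images" {
  project    = "my-databricks-project" # hypothetical project ID
  constraint = "constraints/compute.trustedImageProjects"

  list_policy {
    allow {
      values = ["projects/databricks-external-images"]
    }
  }
}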

Restrict Authentication Types Cloud Storage Policy

Make sure the policy allows SERVICE_ACCOUNT_HMAC_SIGNED_REQUESTS authentication; more details here.
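As an illustration, assuming the storage.restrictAuthTypes constraint, the sketch below denies only user-account HMAC requests so that SERVICE_ACCOUNT_HMAC_SIGNED_REQUESTS remains allowed; verify the exact value syntax against the linked documentation.

resource "google_project_organization_policy" "storage_auth_types" {
  project    = "my-databricks-project" # hypothetical project ID
  constraint = "constraints/storage.restrictAuthTypes"

  list_policy {
    deny {
      # Denying only user HMAC keys leaves SERVICE_ACCOUNT_HMAC_SIGNED_REQUESTS allowed.
      values = ["in:USER_ACCOUNT_HMAC_SIGNED_REQUESTS"]
    }
  }
}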

Service Accounts

Workspace SA: Created in the regional Control Plane, this service account is specific to the workspace and is assigned privileges to create and manage resources inside the Databricks Compute Plane. Its email address looks like db-{workspaceid}@prod-gcp-{region}.iam.gserviceaccount.com

Compute SA: Databricks uses a service account in the Compute Plane named databricks-compute@{workspace-project}.iam.gserviceaccount.com as the service account attached to every VM it launches in the GCP project. This service account can be pre-created in the project used by the Databricks workspace, in which case the workspace automatically uses it.
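If you choose to pre-create the Compute SA, a minimal Terraform sketch (placeholder project ID) is:

resource "google_service_account" "databricks_compute" {
  project      = "my-databricks-project" # hypothetical project ID
  account_id   = "databricks-compute"    # yields databricks-compute@my-databricks-project.iam.gserviceaccount.com
  display_name = "Databricks compute plane service account"
}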

Storage SAs: One or more Google service accounts in the Control Plane are used to set up Unity Catalog (UC) credentials that grant access to UC-managed storage in your projects and in the Compute Plane. A Storage SA generates a short-lived token with privileges to access the data and provides it to the compute cluster process; the privileges are scoped down to the specific requested operation.

Service Account Interaction Flow

sequenceDiagram
    participant DCP as Databricks<br/>Control Plane
    participant WSA as Workspace SA<br/>db-{workspaceid}@prod-gcp-{region}
    participant CSA as Compute SA<br/>databricks-compute@project
    participant GCE as GCE Instances
    participant GCS as GCS Buckets
    participant UC as Unity Catalog
    participant SSA as Storage SA<br/>(UC Credential)

    Note over DCP,WSA: Workspace Creation
    DCP->>WSA: Create Workspace SA<br/>in Control Plane
    WSA->>CSA: Validate/Create<br/>Compute SA in Project

    Note over DCP,GCE: Cluster Launch
    DCP->>WSA: Launch Cluster Request
    WSA->>GCE: Create GCE Instances
    GCE->>CSA: Attach Compute SA<br/>to VMs

    Note over CSA,GCS: Data Access (Non-UC)
    CSA->>GCS: Access Data<br/>(Using Compute SA permissions)

    Note over UC,SSA: Unity Catalog Data Access
    DCP->>UC: Request Data Access
    UC->>SSA: Generate Short-lived Token<br/>(Scoped Permissions)
    SSA->>GCE: Provide Token to Cluster
    GCE->>GCS: Access UC Managed Data<br/>(Using Storage SA token)

FAQ

Quick sizing guideline

| Subnet Size | Total Nodes Per Workspace |
| ----------- | ------------------------- |
| /26         | 30                        |
| /25         | 60                        |
| /24         | 120                       |
| /23         | 250                       |
| /22         | 500                       |
| /21         | 1000                      |
| /20         | 2000                      |
| /19         | 4000                      |

Total Nodes Per Workspace = Total number of concurrent nodes (compute instances) supported by the workspace at a given point in time.

Subnet Sizing Visualization

graph LR
    subgraph "Subnet Size Selection"
        S26["/26 CIDR<br/>30 Nodes<br/>Small Dev/Test"]
        S24["/24 CIDR<br/>120 Nodes<br/>Medium Workloads"]
        S22["/22 CIDR<br/>500 Nodes<br/>Large Production"]
        S20["/20 CIDR<br/>2000 Nodes<br/>Enterprise Scale"]
        S19["/19 CIDR<br/>4000 Nodes<br/>Very Large Scale"]
    end

    S26 -->|Scale Up| S24
    S24 -->|Scale Up| S22
    S22 -->|Scale Up| S20
    S20 -->|Scale Up| S19

    S26 -.cannot resize.-> S26

    style S26 fill:#90CAF9
    style S24 fill:#64B5F6
    style S22 fill:#42A5F5
    style S20 fill:#1E88E5
    style S19 fill:#1565C0

Important Note: Subnet CIDR ranges cannot be changed after workspace creation. Choose carefully based on your expected growth!

Subnet CIDR ranges

| Network resource or attribute | Description           | Range                               |
| ----------------------------- | --------------------- | ----------------------------------- |
| Primary subnet                | Classic compute nodes | /29 to /9                           |
| Region                        | VPC region            | Workspace and VPC region must match |
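A minimal Terraform sketch of a matching VPC and subnet (project ID, region, and CIDR are placeholders; the /22 range follows the sizing table above):

resource "google_compute_network" "databricks" {
  project                 = "my-databricks-project" # hypothetical project ID
  name                    = "databricks-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "databricks" {
  project                  = "my-databricks-project"
  name                     = "databricks-subnet"
  region                   = "us-central1"  # must match the workspace region
  network                  = google_compute_network.databricks.id
  ip_cidr_range            = "10.10.0.0/22" # roughly 500 nodes per the sizing table
  private_ip_google_access = true           # Private Google Access enabled
}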

Recommendation

Create Workspace (using UI)

Step-by-step guide

Create Workspace (using Terraform)

Please follow the public documentation. Here are a few sample Terraform scripts for deploying a workspace with a bring-your-own VPC.
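The sketch below is a hedged outline based on the databricks_mws_networks and databricks_mws_workspaces resources from the Databricks Terraform provider; the account ID, project, region, and names are placeholders, and the provider alias assumes an account-level databricks provider is configured. It reuses the VPC and subnet from the earlier sketch.

resource "databricks_mws_networks" "this" {
  provider     = databricks.accounts
  account_id   = var.databricks_account_id
  network_name = "databricks-network"

  gcp_network_info {
    network_project_id = "my-databricks-project" # hypothetical project ID
    vpc_id             = google_compute_network.databricks.name
    subnet_id          = google_compute_subnetwork.databricks.name
    subnet_region      = "us-central1"
  }
}

resource "databricks_mws_workspaces" "this" {
  provider       = databricks.accounts
  account_id     = var.databricks_account_id
  workspace_name = "my-workspace"
  location       = "us-central1" # workspace and VPC region must match

  cloud_resource_container {
    gcp {
      project_id = "my-databricks-project"
    }
  }

  network_id = databricks_mws_networks.this.network_id
}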

Validate setup

Cluster Validation Flow

sequenceDiagram
    participant Admin
    participant WS as Workspace
    participant CP as Control Plane
    participant VPC as Customer VPC
    participant GCE as GCE Instances
    participant NB as Notebook

    Admin->>WS: Create Test Cluster
    WS->>CP: Request Cluster Creation

    CP->>VPC: Validate Network Config
    VPC-->>CP: Network OK

    CP->>GCE: Launch Instances
    activate GCE
    GCE->>VPC: Attach to Subnet
    GCE->>CP: Connect via SCC Relay
    CP-->>WS: Cluster Ready
    WS-->>Admin: Cluster Running ✓

    Admin->>NB: Create Notebook
    Admin->>NB: Run Test Command<br/>%sql show tables
    NB->>GCE: Execute Command
    GCE->>NB: Return Results
    NB-->>Admin: Command Success ✓
    deactivate GCE

    Note over Admin,NB: Workspace Validated!

Troubleshooting

Common Failure Scenarios

graph TB
    START[Cluster Launch Initiated]

    START --> CHECK1{Network<br/>Configuration<br/>Valid?}
    CHECK1 -->|No| FAIL1[Network Config Error<br/>Fix: Verify subnet,<br/>firewall rules]
    CHECK1 -->|Yes| CHECK2{VPC Firewall<br/>Allows Egress?}

    CHECK2 -->|No| FAIL2[DBR_CLUSTER_LAUNCH_TIMEOUT<br/>Fix: Allow egress to<br/>Control Plane]
    CHECK2 -->|Yes| CHECK3{Cloud NAT<br/>Attached?}

    CHECK3 -->|No| FAIL3[No Internet Access<br/>Fix: Attach Cloud NAT<br/>to VPC subnets]
    CHECK3 -->|Yes| CHECK4{GCP Resource<br/>Quota Available?}

    CHECK4 -->|No| FAIL4[Quota Exceeded Error<br/>Fix: Request quota<br/>increase]
    CHECK4 -->|Yes| CHECK5{Organization<br/>Policies OK?}

    CHECK5 -->|No| FAIL5[Policy Violation<br/>Fix: Update org policies<br/>or project settings]
    CHECK5 -->|Yes| SUCCESS[Cluster Running ✓]

    style START fill:#1E88E5
    style SUCCESS fill:#43A047
    style FAIL1 fill:#E53935
    style FAIL2 fill:#E53935
    style FAIL3 fill:#E53935
    style FAIL4 fill:#E53935
    style FAIL5 fill:#E53935
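For the "No Internet Access / attach Cloud NAT" fix above, a minimal Terraform sketch (placeholder project and region, reusing the VPC from the earlier example) is:

resource "google_compute_router" "databricks" {
  project = "my-databricks-project" # hypothetical project ID
  name    = "databricks-router"
  region  = "us-central1"
  network = google_compute_network.databricks.id
}

resource "google_compute_router_nat" "databricks" {
  project                            = "my-databricks-project"
  name                               = "databricks-nat"
  router                             = google_compute_router.databricks.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}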