
Databricks Common Questions & Answers

Your guide to frequently asked questions across Databricks topics

Quick Navigation:

  • General Concepts
  • Networking
  • Authentication & Identity
  • Security & Compliance
  • Troubleshooting
  • Performance & Optimization
  • How to Use This Guide

General Concepts

Compute Types

Q: What’s the difference between classic compute and serverless compute?

A: Databricks offers two compute types with different networking models:

  • Classic compute - clusters run in your cloud account, inside either a Databricks-managed or a customer-managed VPC/VNet, so you own the network configuration.
  • Serverless compute - clusters run in a network inside Databricks' cloud account; no customer networking setup is required.


Networking

General Networking

Q: Do I need customer-managed networking, or can I use Databricks defaults?

A: For classic compute, it depends on your requirements:

  • Databricks-managed networking - fastest to set up; fine for pilots and proofs of concept.
  • Customer-managed networking - required for PrivateLink/Private Link, custom egress filtering, and most compliance regimes.

Recommendation: For production, customer-managed networking is the recommended approach.

Q: Can I convert a workspace from Databricks-managed to customer-managed networking?

A: No. The networking model is set during workspace creation and cannot be changed. However, you can create a new workspace with customer-managed networking and migrate workloads to it.

Q: Can multiple workspaces share the same network?

A: Yes! This is a common pattern: multiple workspaces can share one VPC/VNet (on Azure, each workspace still needs its own pair of subnets). Plan IP capacity for all workspaces combined.
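
One way to sanity-check a shared-network plan is to verify that each workspace's proposed subnets sit inside the VPC/VNet range and do not overlap each other. A minimal sketch using Python's standard ipaddress module (the CIDRs shown are hypothetical):

```python
import ipaddress

def validate_subnet_plan(network_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the plan is consistent."""
    network = ipaddress.ip_network(network_cidr)
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []
    # Every subnet must be carved out of the shared VPC/VNet range.
    for s in subnets:
        if not s.subnet_of(network):
            problems.append(f"{s} is outside {network}")
    # No two workspaces' subnets may overlap.
    for i, a in enumerate(subnets):
        for b in subnets[i + 1:]:
            if a.overlaps(b):
                problems.append(f"{a} overlaps {b}")
    return problems

# Two workspaces sharing one 10.0.0.0/16 network, one subnet each:
print(validate_subnet_plan("10.0.0.0/16", ["10.0.1.0/24", "10.0.2.0/24"]))  # []
```

An overlapping or out-of-range CIDR shows up as a human-readable problem string instead of an empty list.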

Q: How much IP space do I really need?

A: Use this formula:

AWS:

Databricks uses 2 IPs per node
AWS reserves 5 IPs per subnet

Required IPs = (Max concurrent nodes × 2) + 5

Example for 100 nodes:
(100 × 2) + 5 = 205 IPs
Recommend /24 (251 usable IPs)

Azure:

Databricks uses 1 IP per node
Azure reserves 5 IPs per subnet

Required IPs = (Max concurrent nodes) + 5

Example for 100 nodes:
100 + 5 = 105 IPs
Recommend /24 (251 usable IPs)

GCP:

Databricks uses 1 IP per node
GCP reserves 4 IPs per subnet

Required IPs = (Max concurrent nodes) + 4

Example for 100 nodes:
100 + 4 = 104 IPs
Recommend /24 (252 usable IPs)

Add 30-50% buffer for growth.
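
The per-cloud arithmetic above can be wrapped in a small sizing helper. A sketch; the per-node and reserved-IP constants are the figures from this guide:

```python
import math

# Per-cloud constants from the sizing guidance above: IPs consumed per node
# and IPs the cloud provider reserves in every subnet.
PER_NODE_IPS = {"aws": 2, "azure": 1, "gcp": 1}
RESERVED_IPS = {"aws": 5, "azure": 5, "gcp": 4}

def required_ips(cloud, max_nodes, growth_buffer=0.0):
    """IPs needed for max_nodes concurrent nodes, plus an optional growth buffer."""
    base = max_nodes * PER_NODE_IPS[cloud] + RESERVED_IPS[cloud]
    return math.ceil(base * (1 + growth_buffer))

def recommended_prefix(cloud, max_nodes, growth_buffer=0.0):
    """Smallest CIDR prefix (/28 down to /16) whose usable IPs cover the need."""
    need = required_ips(cloud, max_nodes, growth_buffer)
    for prefix in range(28, 15, -1):
        usable = 2 ** (32 - prefix) - RESERVED_IPS[cloud]
        if usable >= need:
            return prefix
    raise ValueError("requirement exceeds a /16")

print(required_ips("aws", 100))        # 205, matching the example above
print(recommended_prefix("aws", 100))  # 24, i.e. a /24
```

With the 30-50% buffer applied, required_ips("aws", 100, 0.5) comes to 308, which pushes the recommendation from a /24 to a /23.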

Q: What happens if I run out of IP addresses?

A: New clusters fail to launch and running clusters cannot scale up, because the cloud provider has no free IPs to assign in the subnet. Existing nodes keep running. Recovery usually means moving to larger subnets (often a new workspace), so size generously up front.


AWS Networking

Q: Why does Databricks require outbound 0.0.0.0/0 access?

A: Databricks clusters need outbound access for:

  1. Control plane communication - heartbeats, logs, job submissions (required)
  2. Unity Catalog metadata - via ports 8443-8451 (required)
  3. AWS services - S3, EC2, STS, Kinesis (required)
  4. Package repositories - PyPI, Maven, CRAN (optional - can use mirrors)
  5. Legacy Hive metastore - port 3306 (optional, not needed with Unity Catalog)

Security group egress rules must allow 0.0.0.0/0. For fine-grained control, use a firewall or proxy to filter specific destinations.
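
A quick way to verify these egress paths from inside the VPC is a plain TCP connect test. A sketch; substitute your region's control plane hostname when you run it:

```python
import socket

# Egress ports listed above: HTTPS (443) plus the Unity Catalog range 8443-8451.
REQUIRED_PORTS = [443] + list(range(8443, 8452))

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_egress(host):
    """Map each required port to whether it is reachable from this machine."""
    return {port: can_reach(host, port) for port in REQUIRED_PORTS}
```

Run check_egress() from a VM in the Databricks subnet against your regional control plane host; every port should come back True before you create the workspace.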

Q: Can I block specific outbound destinations?

A: Yes, but at the right layer: leave security-group egress open to 0.0.0.0/0 (Databricks requires it) and do the filtering in an egress firewall or proxy that permits only approved destinations.

This gives you control while meeting Databricks requirements.

Q: Are VPC endpoints required?

A: No, but they are recommended: a gateway endpoint for S3 keeps storage traffic on the AWS backbone and off your NAT Gateway (cutting data-processing charges), and interface endpoints for services such as STS and Kinesis keep that traffic private as well.

Q: Why does Databricks use 2 IP addresses per node?

A: Databricks assigns:

  1. Management IP - control plane communication, logging, metrics
  2. Spark application IP - data processing, shuffle traffic

This separation improves security and network performance.

Q: Can I use AWS PrivateLink without customer-managed VPC?

A: No. AWS PrivateLink requires customer-managed VPC. You cannot use PrivateLink with Databricks-managed networking.

Q: Where do I find control plane NAT IPs for my region?

A: See official documentation: Databricks Control Plane IPs

These IPs are needed when you maintain egress allow-lists or storage firewall rules that must admit control-plane traffic.

Q: Can I access S3 buckets in different regions?

A: Yes, but with considerations: cross-region requests add latency and data-transfer charges, and S3 gateway endpoints serve only same-region buckets, so cross-region traffic flows through your NAT Gateway.

Q: Do I need NAT Gateway in every Availability Zone?

A: Not strictly, but it is recommended for production: a single NAT Gateway works, yet it adds cross-AZ data charges and becomes a single point of failure. Deploy one NAT Gateway per Availability Zone used by your subnets.

Q: Why do Network ACLs require allowing 0.0.0.0/0 inbound?

A: This is about stateless return traffic, not unsolicited inbound calls: NACLs do not track connections, so replies to outbound requests arrive on high-numbered ephemeral ports and must be explicitly allowed back in.

This is secure because security groups are stateful and still drop any traffic your clusters did not initiate; the NACL rule merely lets legitimate replies through.
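
The ephemeral-port behaviour behind this rule can be observed locally. A sketch: the OS, not the application, picks the high-numbered source port that return traffic must be allowed to reach:

```python
import socket
import threading

# Open a local listener standing in for a remote HTTPS endpoint.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
t = threading.Thread(target=server.accept, daemon=True)
t.start()

# Make an "outbound" connection to it, as a cluster node would to port 443.
client = socket.create_connection(("127.0.0.1", server.getsockname()[1]))
ephemeral_port = client.getsockname()[1]  # OS-chosen source port

t.join(timeout=1)  # let the listener accept before tearing down
client.close()
server.close()

# Replies are addressed to this high-numbered port, which is why a stateless
# NACL needs an inbound allow rule covering the ephemeral range.
print(f"return traffic targets ephemeral port {ephemeral_port}")
```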


Azure Networking

Q: Why does Azure require two subnets?

A: Azure Databricks uses subnets differently than AWS. Each workspace requires its own pair, a host ("public") subnet and a container ("private") subnet, both delegated to the Azure Databricks service.

Q: What is subnet delegation and why is it needed?

A: Subnet delegation grants Azure Databricks permission to create resources in your subnet: the subnets are delegated to Microsoft.Databricks/workspaces, which lets the service inject cluster network interfaces and manage the required NSG rules, while Azure blocks other services from using those subnets.

Q: What are Azure Service Tags and why are they better?

A: Service Tags are Azure-managed labels that represent the IP ranges of Azure services (for example AzureDatabricks, Storage, Sql):

Benefits:

  • No manual IP lists to maintain - Microsoft updates the ranges automatically
  • Simpler, shorter NSG rules
  • Rules keep working when service IPs change

Q: Can I use the same VNet for multiple workspaces?

A: Yes, with proper planning: each workspace needs its own dedicated subnet pair inside the VNet, so carve out non-overlapping CIDRs per workspace and size the VNet for all of them combined.

Q: Do Azure NSGs have the same stateless issues as AWS NACLs?

A: No! This is a key difference: NSGs are stateful, so return traffic for outbound connections is allowed automatically and no broad inbound ephemeral-port rules are needed.

Q: What’s the difference between NAT Gateway and Azure Firewall?

A:

| Feature | NAT Gateway | Azure Firewall |
| --- | --- | --- |
| Purpose | Basic outbound NAT | Advanced security appliance |
| Filtering | None | URL/FQDN, IP, application rules |
| Logging | Basic | Comprehensive |
| Cost | Lower (~$40/month) | Higher (~$1000/month) |
| Use case | Simple outbound access | Compliance, deep inspection |
| Setup | Simple | Complex |

Recommendation: Start with NAT Gateway, add Azure Firewall if compliance requires egress filtering.

Q: Can I use Azure Private Link without VNet injection?

A: No. Azure Private Link requires VNet injection (customer-managed VNet). You cannot use Private Link with Databricks-managed networking.

Q: How do I access ADLS Gen2 securely?

A: Three options:

  1. Service Endpoints (recommended for simplicity):
    • Enable on private subnet
    • Traffic stays on Azure backbone
    • Free
  2. Private Endpoints (maximum security):
    • Deploy private endpoint for storage account
    • Fully private connectivity
    • Additional cost per endpoint
  3. Storage Firewall:
    • Restrict storage to specific VNets
    • Add exception for Azure Databricks

Q: Why does Databricks use only 1 IP per node on Azure (vs 2 on AWS)?

A: Different architecture: on Azure each node uses a single NIC and IP for both management and Spark traffic, whereas on AWS Databricks assigns separate management and Spark application IPs.


GCP Networking

Q: How many subnets does GCP require?

A: GCP requires only 1 subnet per workspace - simpler than AWS (2) or Azure (2).

Q: What is Private Google Access and why is it important?

A: Private Google Access allows VMs without external IPs to reach Google services (GCS, Artifact Registry, BigQuery, etc.) using internal IP addresses:

Benefits:

  • No external IPs or NAT charges for Google-service traffic
  • Traffic stays on the Google backbone, never the public internet
  • Free

Critical: Must be enabled on the subnet for Databricks to work properly.

Q: What’s the difference between Cloud NAT and Private Google Access?

A:

| Feature | Cloud NAT | Private Google Access |
| --- | --- | --- |
| Purpose | Access to the internet | Access to Google services |
| Traffic destination | External websites, Databricks control plane | GCS, GAR, BigQuery, etc. |
| Cost | Per GB processed | Free |
| Required for | Databricks control plane access | Google service access |
| Traffic path | Through NAT to the internet | Google backbone only |

Both are typically needed: Cloud NAT for control-plane and internet traffic (e.g. package repositories), Private Google Access for GCS, Artifact Registry, and other Google services.

Q: What is a Shared VPC in GCP?

A: A Shared VPC (also called Cross Project Network or XPN) allows you to separate the host project that owns and administers the network from the service projects that consume it (such as the project hosting your Databricks workspace).

Use cases: centralized network administration, shared firewall and Cloud NAT configuration, and organization-wide IP planning.

Note: Don’t confuse this with “sharing a VPC between workspaces” - both standalone and Shared VPCs can host multiple workspaces.

Q: Can subnets be shared across multiple workspaces?

A: Yes! Unlike Azure (where each workspace needs unique subnets), GCP allows multiple workspaces to share a single subnet; just size its IP range for all workspaces combined.

Q: What is VPC Service Controls (VPC-SC)?

A: VPC Service Controls provides an additional security perimeter around Google Cloud resources:

Purpose: mitigate data exfiltration by allowing Google API access (GCS, BigQuery, etc.) only from inside a defined perimeter, even if credentials are stolen.

When to use: strict compliance regimes or highly sensitive data where network controls alone are not enough.

Note: VPC-SC is complex to set up. Only implement if compliance requires it.

Q: What is Private Service Connect (PSC)?

A: Private Service Connect provides private connectivity to the Databricks control plane, keeping that traffic off the public internet.

Similar to AWS PrivateLink or Azure Private Link.

Q: Do GCP firewall rules work like AWS Security Groups?

A: Similar but with key differences:

| Feature | GCP Firewall Rules | AWS Security Groups |
| --- | --- | --- |
| Level | VPC-level | Instance-level |
| Statefulness | Stateful | Stateful |
| Default | Deny ingress, allow egress | Deny all unless allowed |
| Tags | Yes (network tags) | No (attached to instances) |
| Complexity | Medium | Medium |

Both automatically allow return traffic (stateful).

Q: Why does Databricks use only 1 IP per node on GCP (vs 2 on AWS)?

A: Similar to Azure, different architecture: each node uses a single network interface and IP for both management and Spark traffic, whereas AWS assigns two.

Q: What happens if I forget to enable Private Google Access?

A: Databricks clusters will fail to start because they cannot reach GCS (workspace root storage, cluster logs) or pull container images from Artifact Registry over internal IPs.

Fix: Enable Private Google Access on the subnet (gcloud compute networks subnets update <subnet-name> --region <region> --enable-private-ip-google-access) and retry cluster creation.


Authentication & Identity

Coming soon - content from authentication.md will be added here


Security & Compliance

Q: Is traffic between control plane and compute plane encrypted?

A: Yes, all traffic is encrypted: control plane to compute plane communication uses TLS, and traffic to cloud storage services is TLS-encrypted as well.

Q: Can I restrict which storage buckets clusters can access?

A: Yes, multiple layers of control:

  1. Cloud IAM permissions - limit which buckets role/principal can access
  2. Storage bucket policies - restrict access to specific VPCs/endpoints/IPs
  3. Unity Catalog - fine-grained access control on tables/files
  4. VPC/Private Endpoints - network-level restrictions

Combine these for defense-in-depth.

Q: How do I audit network traffic?

A: Cloud-specific options:

AWS: enable VPC Flow Logs on the Databricks subnets; use CloudTrail for API-level auditing.

Azure: enable NSG flow logs via Network Watcher; add Azure Firewall diagnostic logs if you use one.

GCP: enable VPC Flow Logs and Firewall Rules Logging on the subnet; query them in Cloud Logging.

Q: Can I deploy Databricks in a fully air-gapped environment?

A: No; at minimum, clusters must reach the Databricks control plane over HTTPS (443, plus 8443-8451 for Unity Catalog) and your cloud provider's storage and compute APIs. Package repositories can be replaced with internal mirrors.


Troubleshooting

Q: Cluster won’t start - how do I debug network issues?

A: Follow this checklist:

  1. Check subnet capacity:
    • AWS: aws ec2 describe-subnets --subnet-ids <subnet-id>
    • Azure: Check in Azure Portal under subnet properties
    • GCP: gcloud compute networks subnets describe <subnet-name> --region <region>
    • Look for available IP count
  2. Verify security/firewall rules:
    • Check egress rules for required ports (443, 8443-8451)
    • Ensure intra-VNet/VPC traffic allowed
    • Verify service tags (Azure) or CIDR ranges correct
  3. Check route tables:
    • Verify default route (0.0.0.0/0) points to NAT Gateway/Firewall
    • Ensure route table associated with correct subnet
  4. Verify NAT/outbound connectivity:
    • AWS: Check NAT Gateway state
    • Azure: Check NAT Gateway or Azure Firewall
    • GCP: Check Cloud NAT configuration
  5. DNS settings (AWS only):
    • Ensure DNS hostnames enabled
    • Ensure DNS resolution enabled

Q: Cluster started but can’t access storage - what’s wrong?

A: Common causes:

  1. IAM/permissions:
    • Verify instance profile/managed identity has storage read/write
    • Check trust policy allows Databricks to assume role
  2. Storage access policies:
    • Check bucket/storage account policies
    • Verify VPC/VNet allowed in firewall rules
    • Check private endpoint configuration
  3. Network connectivity:
    • Verify VPC/private endpoints configured
    • Check endpoint policies (if restricted)
    • Test connectivity from compute subnet

Q: How do I test networking before creating workspace?

A: Pre-deployment tests:

  1. Launch test VM in same subnet
  2. Test outbound connectivity:
    # Test control plane
    curl -I https://accounts.cloud.databricks.com
    
    # Test storage
    # AWS: curl -I https://s3.<region>.amazonaws.com
    # Azure: curl -I https://<storage>.blob.core.windows.net
    # GCP: curl -I https://storage.googleapis.com
    
    # Test DNS resolution
    nslookup accounts.cloud.databricks.com
    
  3. Verify routing:
    traceroute 8.8.8.8  # Should go through NAT Gateway
    
  4. Check security rules allow required traffic
  5. Verify IAM/permissions on test VM
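
The manual steps above can be folded into a minimal preflight script. A sketch; the endpoint list mirrors the curl targets shown and should be adjusted for your cloud and region:

```python
import socket

ENDPOINTS = [
    "accounts.cloud.databricks.com",  # control plane (AWS example)
    "storage.googleapis.com",         # swap in your cloud's storage host
]

def resolve(host):
    """Resolved IPs for host, or an empty list if DNS lookup fails."""
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return []

def preflight(hosts=ENDPOINTS):
    """DNS and TCP-443 reachability report, one entry per endpoint."""
    report = {}
    for host in hosts:
        ips = resolve(host)
        reachable = False
        if ips:
            try:
                with socket.create_connection((host, 443), timeout=5):
                    reachable = True
            except OSError:
                pass
        report[host] = {"dns": bool(ips), "tcp_443": reachable}
    return report
```

Run it on the test VM in the target subnet; both flags should be True for every endpoint before you create the workspace.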

Performance & Optimization

Coming soon - performance-related Q&A will be added here


How to Use This Guide

  1. Search - Use Ctrl/Cmd+F to find your question
  2. Navigate - Use table of contents for topic browsing
  3. Cross-reference - Questions link to relevant guides
  4. Contribute - Found an answer elsewhere? It should be here!


Last Updated: January 15, 2026
Maintainer: Databricks Platform Engineering


Don’t see your question? Check the official documentation or ask in the Databricks Community.