
03 - Network Security & Encryption

Network Guide: Traffic flows, security groups, and encryption layers visualized.

Quick Reference

πŸ”’ 2 Encryption Layers (Independent):
β”œβ”€β”€ S3 Bucket Encryption (enable_encryption)
└── Workspace CMK (enable_workspace_cmk)

πŸ›‘οΈ 2 Security Groups:
β”œβ”€β”€ Workspace SG (cluster nodes)
└── VPCE SG (VPC endpoints)

🌐 Regional VPC Endpoints (Cost Optimized):
β”œβ”€β”€ S3 Gateway Endpoint (FREE)
β”œβ”€β”€ STS Interface Endpoint
└── Kinesis Interface Endpoint

1. Traffic Flow Patterns

1.1 Control Plane Access Flow (Private Link)

%%{init: {'theme': 'base'}}%%
sequenceDiagram
    autonumber
    participant C as Cluster Node<br/>10.0.1.5
    participant DNS as VPC DNS<br/>10.0.0.2
    participant RT as Route Table
    participant SG as Security Group
    participant VPCE as VPC Endpoint<br/>10.0.3.5
    participant DB as Databricks<br/>Control Plane

    C->>DNS: Resolve dbc-*.cloud.databricks.com
    DNS-->>C: Returns 10.0.3.5 (private IP)
    C->>RT: Lookup route for 10.0.3.5
    RT-->>C: Use "local" route (VPC-internal)
    C->>SG: Check egress rule TCP 8443
    SG-->>C: Allow (rule: TCP 8443-8451 β†’ vpce_sg)
    C->>VPCE: Send request to 10.0.3.5:8443
    VPCE->>DB: Forward via Private Link
    DB-->>VPCE: Response
    VPCE-->>C: Response

Key Points:

- DNS resolves the workspace URL to the VPC endpoint's private IP (10.0.3.5), so no public route is ever needed.
- The "local" route keeps the request inside the VPC; neither the NAT Gateway nor the Internet Gateway is involved.
- The workspace SG's egress rule (TCP 8443-8451 β†’ vpce_sg) must be mirrored by the VPCE SG's ingress rule, or the connection is dropped.

Docs: Private Link Architecture
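Steps 3-4 of the sequence above (route lookup for the endpoint IP) can be sketched with Python's `ipaddress` module. The route table below is hypothetical; the 10.0.0.0/16 VPC CIDR is an assumption inferred from the 10.0.x.x addresses in the diagram.

```python
from ipaddress import ip_address, ip_network

# Hypothetical route table for the workspace VPC (CIDR assumed to be
# 10.0.0.0/16 based on the addresses in the diagram above).
ROUTES = [
    (ip_network("10.0.0.0/16"), "local"),      # VPC-internal traffic
    (ip_network("0.0.0.0/0"), "nat-gateway"),  # default route
]

def lookup_route(dest: str) -> str:
    """Return the target of the most specific route matching the destination."""
    matches = [(net, target) for net, target in ROUTES
               if ip_address(dest) in net]
    # Longest-prefix match, as a real route table would do.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup_route("10.0.3.5"))    # VPC endpoint IP -> "local"
print(lookup_route("52.27.10.1"))  # public IP -> "nat-gateway"
```

Because 10.0.3.5 falls inside the VPC CIDR, the more specific "local" route wins and the request never reaches the NAT Gateway.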

1.2 S3 Access Flow

%%{init: {'theme': 'base'}}%%
flowchart LR
    C["Cluster Node"] -->|1. S3 API call| RT["Route Table"]
    RT -->|2. Match prefix list| GW["S3 Gateway<br/>Endpoint"]
    GW -->|3. VPC-internal| S3["S3 Bucket"]
    S3 -->|4. If encrypted| KMS["KMS Key<br/>Decrypt"]
    KMS -->|5. Decrypted data| S3
    S3 -->|6. Response| C

    style GW fill:#569A31
    style KMS fill:#FF9900

Always FREE - No data transfer charges!
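Step 2 of the flow (prefix-list match) is what steers S3 traffic into the gateway endpoint. A sketch of that decision, with made-up CIDRs standing in for the real managed prefix list (`com.amazonaws.<region>.s3`):

```python
from ipaddress import ip_address, ip_network

# Illustrative S3 prefix list; the real entries come from the pl-xxxx
# managed prefix list for com.amazonaws.<region>.s3 (these CIDRs are made up).
S3_PREFIX_LIST = [ip_network("52.92.0.0/17"), ip_network("52.218.128.0/17")]

def next_hop(dest_ip: str) -> str:
    """Route an S3 API call: a prefix-list match wins over the default route."""
    if any(ip_address(dest_ip) in net for net in S3_PREFIX_LIST):
        return "s3-gateway-endpoint"  # step 2 in the flow above
    return "nat-gateway"              # everything else

print(next_hop("52.92.1.10"))  # S3 range -> gateway endpoint
print(next_hop("8.8.8.8"))     # non-S3  -> NAT Gateway
```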


2. Security Group Rules

2.1 Workspace Security Group (Cluster Nodes)

Attached To: EC2 instances in private subnets

Egress Rules (Outbound)

Rule 1: Cluster to Cluster Communication
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 0-65535
β”œβ”€β”€ Destination: self (workspace_sg)
└── Purpose: Spark worker communication

Rule 2: Cluster to Cluster UDP
β”œβ”€β”€ Protocol: UDP
β”œβ”€β”€ Port Range: 0-65535
β”œβ”€β”€ Destination: self (workspace_sg)
└── Purpose: Spark shuffle operations

Rule 3: Control Plane API (Private Link)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 8443-8451
β”œβ”€β”€ Destination: vpce_sg
└── Purpose: Workspace REST API via VPCE

Rule 4: Secure Cluster Connectivity (Private Link)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 6666
β”œβ”€β”€ Destination: vpce_sg
└── Purpose: Relay/SCC via VPCE

Rule 5: FIPS Encryption (Optional)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 2443
β”œβ”€β”€ Destination: 0.0.0.0/0
└── Purpose: FIPS encryption for compliance security profile

Rule 6: Public Internet (if needed)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 443, 53
β”œβ”€β”€ Destination: 0.0.0.0/0
└── Purpose: Maven, PyPI, DNS

Ingress Rules (Inbound)

Rule 1: TCP from Clusters
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 0-65535
β”œβ”€β”€ Source: self (workspace_sg)
└── Purpose: Allow worker-to-worker

Rule 2: UDP from Clusters
β”œβ”€β”€ Protocol: UDP
β”œβ”€β”€ Port Range: 0-65535
β”œβ”€β”€ Source: self (workspace_sg)
└── Purpose: Allow shuffle traffic

Docs: Security Groups
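The egress rules above reduce to a simple allow-list check. A minimal sketch (security-group references are modeled as the strings "self" and "vpce_sg"; the optional FIPS rule is omitted):

```python
# Sketch of the workspace SG egress rules above. "self" and "vpce_sg"
# stand in for security-group references; port ranges are inclusive.
EGRESS_RULES = [
    ("tcp", 0, 65535, "self"),       # Rule 1: cluster-to-cluster
    ("udp", 0, 65535, "self"),       # Rule 2: Spark shuffle
    ("tcp", 8443, 8451, "vpce_sg"),  # Rule 3: control-plane API
    ("tcp", 6666, 6666, "vpce_sg"),  # Rule 4: SCC relay
    ("tcp", 443, 443, "0.0.0.0/0"),  # Rule 6: Maven, PyPI
    ("tcp", 53, 53, "0.0.0.0/0"),    # Rule 6: DNS
]  # Rule 5 (TCP 2443, FIPS) omitted: optional

def egress_allowed(proto: str, port: int, dest: str) -> bool:
    return any(proto == p and lo <= port <= hi and dest == d
               for p, lo, hi, d in EGRESS_RULES)

print(egress_allowed("tcp", 8443, "vpce_sg"))    # API call: True
print(egress_allowed("tcp", 6666, "0.0.0.0/0"))  # SCC only via VPCE: False
```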

2.2 VPC Endpoint Security Group

Attached To: Databricks VPC endpoints (workspace + relay)

Egress Rules

Rule 1: Allow All Outbound
β”œβ”€β”€ Protocol: All
β”œβ”€β”€ Port Range: All
β”œβ”€β”€ Destination: 0.0.0.0/0
└── Purpose: VPCE to Databricks

Ingress Rules

Rule 1: From Workspace SG (8443-8451)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 8443-8451
β”œβ”€β”€ Source: workspace_sg
└── Purpose: Allow API calls

Rule 2: From Workspace SG (6666)
β”œβ”€β”€ Protocol: TCP
β”œβ”€β”€ Port Range: 6666
β”œβ”€β”€ Source: workspace_sg
└── Purpose: Allow SCC
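Note that the VPCE SG's ingress rules exactly mirror the workspace SG's egress rules toward it (Rules 3 and 4 in 2.1); if either side is missing a rule, traffic is silently dropped. A quick consistency check of that invariant:

```python
# Each rule is (protocol, from_port, to_port). The two sets below encode
# the paired rules from sections 2.1 and 2.2; in practice you would load
# them from your IaC state rather than hard-code them.
workspace_egress_to_vpce = {("tcp", 8443, 8451), ("tcp", 6666, 6666)}
vpce_ingress_from_workspace = {("tcp", 8443, 8451), ("tcp", 6666, 6666)}

# Every egress rule toward the VPCE must have a matching ingress rule.
missing = workspace_egress_to_vpce - vpce_ingress_from_workspace
assert not missing, f"VPCE SG is missing ingress rules: {missing}"
print("security groups are consistent")
```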

3. Encryption Layers

3.1 Dual Encryption Architecture

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
flowchart TD
    subgraph "Layer 1: S3 Bucket Encryption"
        KMS1["KMS Key<br/>S3 Encryption"]
        S3["S3 Buckets<br/>β€’ DBFS Root<br/>β€’ UC Metastore<br/>β€’ UC External"]
        KMS1 -->|Encrypts| S3
    end

    subgraph "Layer 2: Workspace CMK"
        KMS2["KMS Key<br/>Workspace Storage"]
        DBFS["DBFS Root<br/>at-rest"]
        EBS["EBS Volumes<br/>cluster storage"]
        MS["Managed Services<br/>notebooks, jobs"]
        KMS2 -->|Encrypts| DBFS
        KMS2 -->|Encrypts| EBS
        KMS2 -->|Encrypts| MS
    end

    style KMS1 fill:#569A31
    style KMS2 fill:#FF9900

Independent Configuration:

- enable_encryption and enable_workspace_cmk are independent flags; you can enable either, both, or neither.
- Layer 1 protects data at rest in S3; Layer 2 protects workspace storage (DBFS root, EBS volumes) and managed services (notebooks, jobs).

Docs: Customer-Managed Keys

3.2 KMS Key Usage

Layer 1 - S3 Bucket Encryption:
β”œβ”€β”€ When: enable_encryption = true
β”œβ”€β”€ Key Created: aws_kms_key.databricks
β”œβ”€β”€ Encrypts: All S3 buckets (SSE-KMS)
└── Permissions: UC roles get KMS permissions

Layer 2 - Workspace CMK:
β”œβ”€β”€ When: enable_workspace_cmk = true
β”œβ”€β”€ Key Created: aws_kms_key.workspace_storage
β”œβ”€β”€ Encrypts: DBFS root, EBS, Managed Services
└── Permissions: In KMS key policy (Databricks service principal)

3.3 Key Rotation

AWS Automatic Rotation (Enabled by default):
β”œβ”€β”€ Rotates underlying key material annually
β”œβ”€β”€ ARN remains the same
β”œβ”€β”€ Applies to both Layer 1 and Layer 2 keys
└── No action required

Manual Rotation to Different Key:
β”œβ”€β”€ Managed Services CMK: βœ… Supported
β”œβ”€β”€ Storage CMK (DBFS/EBS): ❌ Not supported
└── S3 Bucket keys: βœ… Update S3 bucket config

Docs: Key Rotation
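The two flags in 3.2 map to two `aws_kms_key` resources. A minimal Terraform sketch (resource names taken from 3.2; key policies, aliases, and the Databricks service-principal grants are omitted):

```hcl
# Sketch only: the two KMS keys named in 3.2, with AWS automatic
# annual rotation enabled as described above.
resource "aws_kms_key" "databricks" {
  description         = "Layer 1: S3 bucket encryption (SSE-KMS)"
  enable_key_rotation = true
}

resource "aws_kms_key" "workspace_storage" {
  description         = "Layer 2: workspace CMK (DBFS root, EBS, managed services)"
  enable_key_rotation = true
}
```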


4. Network Scenarios

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#e1e1e1'}}}%%
flowchart TD
    START["enable_private_link"] -->|true| PL["Private Link Path"]
    START -->|false| PUB["Public Internet Path"]

    PL --> PLDNS["DNS returns<br/>private IP 10.0.3.x"]
    PLDNS --> PLVPCE["Traffic via<br/>VPC Endpoint"]
    PLVPCE --> PLDB["Databricks<br/>Private Link"]

    PUB --> PUBDNS["DNS returns<br/>public IP"]
    PUBDNS --> NAT["Traffic via<br/>NAT Gateway"]
    NAT --> IGW["Internet<br/>Gateway"]
    IGW --> PUBDB["Databricks<br/>Public Internet"]

    style PL fill:#569A31
    style PUB fill:#FF9900

Comparison:

| Aspect | Private Link (true) | Public Internet (false) |
|---|---|---|
| DNS Resolution | Private IP 10.0.3.x | Public IP |
| Traffic Path | VPC Endpoint β†’ Private Link | NAT β†’ Internet |
| Data Egress Charges | Lower | Higher |
| Security | No internet exposure | Internet-routable |
| Cost | VPCE charges ~$7.20/day | NAT charges variable |

5. Port Requirements

5.1 Critical Ports

Databricks Control Plane:
β”œβ”€β”€ 8443-8451: REST API, Unity Catalog, WebSockets
β”œβ”€β”€ 6666: Secure Cluster Connectivity (ONLY with Private Link)
└── 2443: FIPS encryption (ONLY if compliance security profile enabled)

AWS Services:
β”œβ”€β”€ 443: S3 Gateway, STS, Kinesis (via regional VPC endpoints)
└── 3306: MySQL metastore (LEGACY - NOT USED with Unity Catalog)

Public Internet (via NAT Gateway):
β”œβ”€β”€ 443: Maven Central, PyPI, Docker registries
└── 53: DNS resolution

5.2 Port 8443-8451 Range Explained

Why 9 ports (8443-8451)?

8443: Primary workspace API
8444-8451: WebSocket connections, streaming, long-running jobs

All 9 ports required for full functionality!

Warning: Restricting to only 8443 will break WebSocket features

Docs: Port Requirements
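A quick way to sanity-check an SG port range against the requirements in 5.2: compute which of the nine control-plane ports it fails to cover.

```python
# All nine control-plane ports from 5.2 (range is inclusive, hence 8452).
REQUIRED_CP_PORTS = set(range(8443, 8452))

def missing_ports(allowed_lo: int, allowed_hi: int) -> set:
    """Return the required control-plane ports NOT covered by the SG range."""
    allowed = set(range(allowed_lo, allowed_hi + 1))
    return REQUIRED_CP_PORTS - allowed

print(missing_ports(8443, 8451))  # full range -> empty set, all good
print(missing_ports(8443, 8443))  # only 8443 -> 8444-8451 missing (WebSockets break)
```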


6. DNS Resolution

6.1 Private DNS for VPC Endpoints

%%{init: {'theme': 'base'}}%%
sequenceDiagram
    participant C as Cluster
    participant DNS as VPC DNS
    participant VPCE as VPC Endpoint

    Note over VPCE: private_dns_enabled = true
    C->>DNS: Query dbc-abc123.cloud.databricks.com
    DNS->>VPCE: Check VPCE private hosted zone
    VPCE-->>DNS: Return 10.0.3.5 (private IP)
    DNS-->>C: 10.0.3.5

    Note over C: Traffic stays in VPC!

Key Setting: private_dns_enabled = true on VPC endpoint
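A simple way to verify private DNS is working from inside the VPC: resolve the workspace URL (e.g. with `socket.gethostbyname`) and check the result is an RFC 1918 address. The classification step, with example IPs:

```python
from ipaddress import ip_address

def resolves_privately(resolved_ip: str) -> bool:
    """True if DNS handed back a private address, i.e. traffic will stay
    inside the VPC rather than leave via the NAT Gateway."""
    return ip_address(resolved_ip).is_private

# With private_dns_enabled = true, the workspace URL resolves to the
# VPC endpoint's private IP:
print(resolves_privately("10.0.3.5"))       # True
# Without it, DNS returns a public IP (example address, not a real one):
print(resolves_privately("44.234.192.47"))  # False
```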

Without Private DNS:

- The workspace URL resolves to a public IP, so traffic leaves the VPC via the NAT Gateway instead of the VPC endpoint.
- Private Link is effectively bypassed even though the endpoint exists, so keep private_dns_enabled = true.


7. Regional VPC Endpoints

7.1 Why Use Regional Endpoints?

βœ… Already Configured: This deployment uses regional VPC endpoints for all AWS services:

- S3 Gateway Endpoint (FREE)
- STS Interface Endpoint
- Kinesis Interface Endpoint

βœ… Benefits:

- Traffic to AWS services stays on the AWS network instead of the public internet
- Lower data transfer costs than routing through the NAT Gateway
- Lower latency, since requests stay in-region

Docs: Configure Regional Endpoints

7.2 Spark Configuration for Regional Endpoints (Optional)

While VPC endpoints handle AWS service traffic automatically, you may optionally configure Spark to use regional S3/STS endpoints explicitly. This is useful for enforcing data residency requirements.

⚠️ Important: This configuration prevents cross-region S3 access. Only apply if all your S3 buckets are in the same region.

Option A: Notebook-Level Configuration

Add to the beginning of your notebook:

Scala:

%scala
spark.conf.set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
spark.conf.set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")

Python:

%python
spark.conf.set("fs.s3a.stsAssumeRole.stsEndpoint", "https://sts.<region>.amazonaws.com")
spark.conf.set("fs.s3a.endpoint", "https://s3.<region>.amazonaws.com")

Replace <region> with your AWS region (e.g., us-west-2).

Option B: Cluster-Level Configuration

Add to cluster Spark config (Cluster β†’ Edit β†’ Advanced Options β†’ Spark):

spark.hadoop.fs.s3a.endpoint https://s3.<region>.amazonaws.com
spark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint https://sts.<region>.amazonaws.com

Option C: Cluster Policy Enforcement

Create or update your cluster policy to enforce regional endpoints for all clusters:

{
  "spark_conf.fs.s3a.endpoint": {
    "type": "fixed",
    "value": "https://s3.<region>.amazonaws.com"
  },
  "spark_conf.fs.s3a.stsAssumeRole.stsEndpoint": {
    "type": "fixed",
    "value": "https://sts.<region>.amazonaws.com"
  }
}
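The Spark conf and the cluster policy above must use the same endpoint URLs; a small helper (not part of Databricks, shown only to illustrate the mapping between the two formats) can render both from the region name:

```python
import json

def regional_spark_conf(region: str) -> dict:
    """Cluster-level Spark conf from Option B, for a given region."""
    return {
        "spark.hadoop.fs.s3a.endpoint": f"https://s3.{region}.amazonaws.com",
        "spark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint": f"https://sts.{region}.amazonaws.com",
    }

def regional_cluster_policy(region: str) -> str:
    """Cluster policy JSON from Option C, derived from the same conf."""
    policy = {
        # Cluster-policy keys drop the "spark.hadoop." prefix and use a
        # "spark_conf." prefix instead, matching the JSON example above.
        key.replace("spark.hadoop.", "spark_conf."): {"type": "fixed", "value": value}
        for key, value in regional_spark_conf(region).items()
    }
    return json.dumps(policy, indent=2)

print(regional_cluster_policy("us-west-2"))
```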

7.3 When to Apply Spark Regional Configuration

βœ… Apply When:

- All S3 buckets your workloads access are in the workspace region
- You need to enforce data residency (traffic must stay in-region)

❌ Do NOT Apply When:

- Jobs read or write S3 buckets in other regions
- You rely on cross-region replication or globally accessed buckets

7.4 How Regional Endpoints Work

%%{init: {'theme': 'base'}}%%
sequenceDiagram
    participant Cluster as Cluster Node
    participant DNS as VPC DNS
    participant VPCE as VPC Endpoint<br/>(Regional)
    participant S3 as S3 Service<br/>(Regional)

    Note over Cluster,S3: Without Spark Config (Default)
    Cluster->>DNS: Resolve s3.amazonaws.com (global)
    DNS-->>Cluster: Private IP (VPC endpoint)
    Cluster->>VPCE: Request via VPC endpoint
    VPCE->>S3: Regional service
    S3-->>VPCE: Response
    VPCE-->>Cluster: Response

    Note over Cluster,S3: With Spark Regional Config
    Cluster->>DNS: Resolve s3.<region>.amazonaws.com
    DNS-->>Cluster: Private IP (VPC endpoint)
    Cluster->>VPCE: Request via VPC endpoint
    VPCE->>S3: Regional service (enforced)
    S3-->>VPCE: Response (same region only)
    VPCE-->>Cluster: Response

Key Differences:

- Without the Spark config, Spark may target the global S3 endpoint; the VPC endpoint still intercepts the traffic, but cross-region access remains possible.
- With the config, only the regional endpoint is used: same-region access is enforced and cross-region requests fail.

7.5 Troubleshooting Regional Endpoints

Issue: β€œAccess Denied” after applying Spark config

Cause: S3 bucket is in a different region than the workspace
Solution: Either move the bucket to the workspace region, or remove the Spark regional config

Issue: Cross-region replication stopped working

Cause: Regional endpoint config blocks cross-region S3 access
Solution: Remove fs.s3a.endpoint and fs.s3a.stsAssumeRole.stsEndpoint from the Spark config

Issue: Can’t access buckets with global S3 URLs

Cause: Regional config enforces regional URLs only
Solution: Update S3 paths to use the s3://bucket/path format (Spark handles the endpoint conversion)

Docs: Troubleshoot Regional Endpoints


Next Steps

βœ… Network security understood β†’ 04-QUICK-START.md - Deploy now!

βœ… Need troubleshooting β†’ 05-TROUBLESHOOTING.md - Common issues

Docs: Network Security