This document contains solutions to common issues encountered during deployment, updates, and destruction of Azure Databricks workspaces.
Error Message:
Error: ServiceEndpointPolicyCannotBeDeletedIfReferencedBySubnet
Service Endpoint Policy cannot be deleted because it is in use with subnet(s).
Root Cause: Azure prevents deletion of a Service Endpoint Policy while it’s still referenced by subnets.
Solutions:
For New Deployments (created after graceful destroy fix):
terraform destroy # Automatic cleanup
For Existing Deployments (created before graceful destroy fix):
# Step 1: Remove SEP from subnets manually
az network vnet subnet update \
--resource-group <RG_NAME> \
--vnet-name <VNET_NAME> \
--name <PUBLIC_SUBNET_NAME> \
--remove serviceEndpointPolicies
az network vnet subnet update \
--resource-group <RG_NAME> \
--vnet-name <VNET_NAME> \
--name <PRIVATE_SUBNET_NAME> \
--remove serviceEndpointPolicies
# Step 2: Destroy
terraform destroy
How to find your resource names:
terraform output -raw resources | jq -r '.network.vnet_name'
terraform output -raw resources | jq -r '.network.subnet_names'
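If you prefer not to copy values by hand, the two outputs above can drive the az commands directly. A small shell sketch, assuming jq is available and that .network.subnet_names is a JSON array (hence the []); <RG_NAME> still needs to be filled in:
# Detach the SEP from every subnet reported by the Terraform outputs
VNET_NAME=$(terraform output -raw resources | jq -r '.network.vnet_name')
for SUBNET in $(terraform output -raw resources | jq -r '.network.subnet_names[]'); do
  az network vnet subnet update \
    --resource-group <RG_NAME> \
    --vnet-name "$VNET_NAME" \
    --name "$SUBNET" \
    --remove serviceEndpointPolicies
done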
Error Message:
Error: Network Connectivity Config is unable to be deleted because
it is attached to one or more workspaces
Root Cause: Removing the NCC binding has a propagation delay; the API may still see the NCC as “attached” when deletion is attempted.
Solution:
# Remove NCC from state (workspace deletion will clean it up)
terraform state rm module.ncc.databricks_mws_network_connectivity_config.this
# Continue with destroy
terraform destroy
Why this is safe: removing the NCC from Terraform state does not delete anything in Azure or Databricks; the NCC is cleaned up automatically when the workspace it is attached to is deleted.
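If you want to confirm the state removal before moving on, listing what remains in state is enough (the grep filter is only an illustration):
# Should print nothing once the NCC has been removed from state
terraform state list | grep -i network_connectivity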
Error Message:
Error: cannot delete metastore data access: Storage credential 'xxx-metastore-access'
cannot be deleted because it is configured as this metastore's root credential.
Root Cause:
Unity Catalog protects the root storage credential from deletion to prevent data access issues. The force_destroy parameter may not work in all cases due to API limitations.
Solution:
# Remove metastore resources from state
terraform state rm module.unity_catalog.databricks_metastore_data_access.this
terraform state rm module.unity_catalog.databricks_metastore.this
# Continue with destroy
terraform destroy
Prevention (for new deployments):
# modules/unity-catalog/main.tf
resource "databricks_metastore" "this" {
provider = databricks.account
name = var.metastore_name
storage_root = "abfss://..."
region = var.location
force_destroy = true # Set on initial creation
}
Important Notes:
Keep force_destroy = true in production deployments.
force_destroy = true doesn't make deletion dangerous; it only allows Terraform to delete the metastore when you explicitly run terraform destroy.
Error Message:
Error: cannot update metastore: UpdateMetastore delta_sharing_recipient_token_lifetime_in_seconds
can not be 0, which is infinite token lifetime.
Root Cause:
The Databricks API validates certain metastore parameters during updates. Setting force_destroy from false to true triggers an update operation that fails API validation.
Solution: Remove metastore from state and continue:
terraform state rm module.unity_catalog.databricks_metastore_data_access.this
terraform state rm module.unity_catalog.databricks_metastore.this
terraform destroy
Error Message:
Error: Security rule AllowVnetInBound conflicts with rule
Microsoft.Databricks-workspaces_UseOnly_databricks-worker-to-worker-inbound.
Root Cause: In non-Private Link deployments, Databricks automatically creates its own NSG rules; custom rules conflict with them.
Solution: NSG rule creation is already conditional - only for Private Link deployments:
# modules/networking/nsg-rules.tf
resource "azurerm_network_security_rule" "inbound_vnet_to_vnet" {
count = var.enable_private_link && !var.enable_public_network_access ? 1 : 0
# ...
}
Rule Summary:
Error Message:
Error: cannot create metastore: Failed to retrieve tenant ID for given token
Root Cause:
Missing DATABRICKS_AZURE_TENANT_ID environment variable for Databricks account provider.
Solution: Export required environment variables:
# Azure Authentication
export ARM_SUBSCRIPTION_ID="your-subscription-id"
export ARM_TENANT_ID="your-tenant-id"
export ARM_CLIENT_ID="your-client-id" # If using service principal
export ARM_CLIENT_SECRET="your-client-secret" # If using service principal
# Databricks Authentication
export DATABRICKS_ACCOUNT_ID="your-account-id"
export DATABRICKS_AZURE_TENANT_ID="$ARM_TENANT_ID"
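Before running Terraform, a quick sanity check that the variables are actually set in the current shell (values are masked here only for the example's sake):
# List the relevant variables without printing their values
env | grep -E '^(ARM_|DATABRICKS_)' | sed 's/=.*/=***/'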
Provider Configuration:
# deployments/*/providers.tf
provider "databricks" {
alias = "account"
host = "https://accounts.azuredatabricks.net"
account_id = var.databricks_account_id
azure_tenant_id = var.databricks_azure_tenant_id
azure_use_msi = false
azure_environment = "public"
}
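The two variables referenced in the provider block can be populated from the exports above via Terraform's TF_VAR_ convention; this sketch assumes the deployment declares variables with exactly these names:
# Map the exported values onto the Terraform variables used by the provider block
export TF_VAR_databricks_account_id="$DATABRICKS_ACCOUNT_ID"
export TF_VAR_databricks_azure_tenant_id="$DATABRICKS_AZURE_TENANT_ID"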
Error Message:
Error: checking for existing Container "metastore": unexpected status 403
(403 This request is not authorized to perform this operation.)
Root Cause:
The storage account is created with default_action = "Deny" in its network rules, which prevents Terraform running locally from creating containers.
Solution:
Use default_action = "Allow" during initial deployment:
# modules/unity-catalog/main.tf
resource "azurerm_storage_account" "metastore" {
name = local.metastore_storage_name
resource_group_name = var.resource_group_name
location = var.location
account_tier = "Standard"
account_replication_type = "LRS"
is_hns_enabled = true
network_rules {
default_action = "Allow" # Required for initial container creation
bypass = ["AzureServices"]
}
}
Post-Deployment Lockdown (optional): after a successful deployment, you can manually switch the storage account back to denying default network access and rely on the Service Endpoint Policy for security, as sketched below.
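A sketch of that lockdown with the Azure CLI (storage account and resource group names are placeholders; keep the AzureServices bypass so platform services can still reach the account):
# Switch the metastore storage account back to deny-by-default
az storage account update \
  --name <STORAGE_NAME> \
  --resource-group <RG_NAME> \
  --default-action Deny \
  --bypass AzureServices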
For New Deployments (automatic cleanup):
terraform destroy
For Existing Deployments (manual cleanup):
# 1. Start destroy
terraform destroy
# 2. If SEP errors occur, remove from subnets
az network vnet subnet update \
--resource-group <RG_NAME> \
--vnet-name <VNET_NAME> \
--name <SUBNET_NAME> \
--remove serviceEndpointPolicies
# 3. If NCC errors occur, remove from state
terraform state rm module.ncc.databricks_mws_network_connectivity_config.this
# 4. If metastore errors occur, remove from state
terraform state rm module.unity_catalog.databricks_metastore_data_access.this
terraform state rm module.unity_catalog.databricks_metastore.this
# 5. Complete destroy
terraform destroy
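After the final destroy, it is worth confirming nothing is left behind (resource group name is a placeholder):
# 6. Verify: state should be empty and the resource group gone
terraform state list
az group exists --name <RG_NAME>   # expect "false"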
# Validate Terraform configuration
terraform validate
# Plan without applying
terraform plan
# Check authentication
az account show
databricks auth env --profile account
# List resources in state
terraform state list
# Show specific resource
terraform state show 'module.unity_catalog.databricks_metastore.this'
# Remove resource from state (does not delete in cloud)
terraform state rm 'resource.address'
# Import existing resource
terraform import 'resource.address' 'resource-id'
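As a concrete, purely illustrative example, re-importing a VNet would look roughly like this; the module address is hypothetical and must match your actual state layout:
# Hypothetical: import an existing VNet into the networking module
terraform import 'module.networking.azurerm_virtual_network.this' \
  "/subscriptions/<SUB_ID>/resourceGroups/<RG_NAME>/providers/Microsoft.Network/virtualNetworks/<VNET_NAME>"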
Terraform Debug Logs:
export TF_LOG=DEBUG
export TF_LOG_PATH="terraform-debug.log"
terraform apply
Databricks Provider Logs:
export DATABRICKS_DEBUG_TRUNCATE_BYTES=10000
export DATABRICKS_DEBUG_HEADERS=true
terraform apply 2>&1 | tee apply-debug.log
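Once captured, the log can be scanned for failing API calls with ordinary text tools, for example:
# Quickly surface errors in the captured log
grep -i "error" apply-debug.log | head -n 20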
Azure Resources:
# List resource group contents
az resource list --resource-group <RG_NAME> --output table
# Check workspace status
az databricks workspace show \
--name <WORKSPACE_NAME> \
--resource-group <RG_NAME>
# Check storage account
az storage account show \
--name <STORAGE_NAME> \
--resource-group <RG_NAME>
Databricks Resources:
# List metastores (requires account admin)
databricks metastores list --account-id <ACCOUNT_ID>
# Show workspace details
databricks workspace get --workspace-id <WORKSPACE_ID>
Set force_destroy = true for Metastores:
resource "databricks_metastore" "this" {
  provider      = databricks.account
  name          = var.metastore_name
  force_destroy = true # Essential for clean destroy
}
resource "azurerm_network_security_rule" "example" {
count = var.enable_private_link ? 1 : 0 # Only for Private Link
# ...
}
resource "azurerm_storage_account" "example" {
network_rules {
default_action = "Allow" # Required initially
bypass = ["AzureServices"]
}
}
locals {
  all_tags = merge(var.tags, {
    Owner     = var.tag_owner
    KeepUntil = var.tag_keepuntil
  })
}
# Always test the full lifecycle
terraform apply
terraform destroy
# If successful, apply to production