# Problem Solver

Quick fixes for common deployment issues.

**Tip:** Use `Ctrl+F` to find your error message.
| Category | Jump To |
|---|---|
| Prerequisites | Setup Issues |
| Terraform Errors | Terraform Issues |
| AWS Errors | AWS Issues |
| Databricks Errors | Databricks Issues |
| KMS/Encryption | Encryption Issues |
| Destroy Problems | Destroy Issues |
## Setup Issues

### `terraform: command not found`

**Solution:**

```bash
# Install Terraform
brew install terraform   # macOS
# or download from https://terraform.io
```

**Docs:** Install Terraform
### `Unable to locate credentials`

**Solution:**

```bash
# Check AWS credentials
aws sts get-caller-identity --profile your-profile

# If that fails, configure:
aws configure --profile your-profile
# or
aws sso login --profile your-profile
```

**Docs:** AWS Auth
### Terraform prompts for `databricks_client_id`

**Symptom:** Terraform asks for the `databricks_client_id` input interactively.

**Solution:**

```bash
# Check the variable is exported
echo $TF_VAR_databricks_client_id

# If empty, set in ~/.zshrc:
export TF_VAR_databricks_client_id="your-id"
export TF_VAR_databricks_client_secret="your-secret"

# Reload
source ~/.zshrc
```

**Docs:** Environment Setup
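The checks above can be bundled into a small pre-flight helper. This is a sketch: `check_tf_vars` is a hypothetical function name, and the variable list covers only the three `TF_VAR_*` names used in this guide (extend it for your own tfvars).

```bash
# Sketch: fail fast if any Terraform input variable used by this guide is
# missing from the environment (extend the list for your own setup).
check_tf_vars() {
  local v missing=0
  for v in TF_VAR_databricks_client_id \
           TF_VAR_databricks_client_secret \
           TF_VAR_databricks_account_id; do
    if [ -z "${!v}" ]; then
      echo "MISSING: $v"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "All required TF_VAR variables are set"
  return "$missing"
}
```

Run `check_tf_vars` before `terraform plan`; a nonzero exit status tells you which exports are still missing.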
## Terraform Issues

### `Error: Missing required argument`

**Full Error:**

```
Error: Missing required argument

  on main.tf line 50, in module "databricks_workspace":
  50: module "databricks_workspace" {

The argument "databricks_client_id" is required, but no definition was found.
```

**Solution:** Set the environment variables (see the issue above).
### `Error: Unsupported argument`

**Full Error:**

```
Error: Unsupported argument

  on modules/unity_catalog/variables.tf line 42:
  42: client_id = var.databricks_client_id

An argument named "client_id" is not expected here.
```

**Cause:** Variable name mismatch after refactoring.

**Solution:**

```bash
# Pull the latest code
git pull

# Re-initialize
terraform init -upgrade
```
### `Error: Invalid reference in variable validation`

**Full Error:**

```
Error: Invalid reference in variable validation

The condition for variable "existing_workspace_cmk_key_alias" can only refer to the variable itself
```

**Cause:** A Terraform variable `validation` block cannot reference other variables, so cross-variable validation is not supported there.

**Solution:** The validation has been moved into module logic (fixed in the current version).
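For reference, the restriction and one common workaround look like this. This is a sketch, not the project's exact fix: it reuses the `enable_workspace_cmk` and `existing_workspace_cmk_key_alias` names from this guide, and `terraform_data` requires Terraform 1.4+ (`precondition` blocks themselves need 1.2+).

```hcl
variable "enable_workspace_cmk" {
  type    = bool
  default = false
}

variable "existing_workspace_cmk_key_alias" {
  type    = string
  default = ""
  # Invalid -- a validation block may only reference its own variable:
  # validation {
  #   condition     = !var.enable_workspace_cmk || var.existing_workspace_cmk_key_alias != ""
  #   error_message = "..."
  # }
}

# Cross-variable checks can live in a resource precondition instead:
resource "terraform_data" "validate_cmk_inputs" {
  lifecycle {
    precondition {
      condition     = !var.enable_workspace_cmk || var.existing_workspace_cmk_key_alias != ""
      error_message = "Set existing_workspace_cmk_key_alias when enable_workspace_cmk is true."
    }
  }
}
```

The precondition is evaluated at plan time, so invalid flag combinations fail before any resources are created.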
## AWS Issues

### `BucketAlreadyExists`

**Full Error:**

```
Error: creating S3 Bucket (mycompany-dbx-root): BucketAlreadyExists
```

**Solution:**

```hcl
# Change bucket names in terraform.tfvars
root_storage_bucket_name = "mycompany-dbx-root-v2"
# or add a different prefix
```

**Tip:** A random suffix is added automatically, but the base name must still be unique.
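Since S3 bucket names are globally unique across all AWS accounts, one way to pick a base name that is unlikely to collide is to fold your account ID into it. A sketch — `unique_bucket_name` is a hypothetical helper, not part of this project:

```bash
# Hypothetical helper: build a bucket name from a base plus the account id.
# S3 bucket names must be lowercase and globally unique.
unique_bucket_name() {   # usage: unique_bucket_name <base> [account-id]
  local base=$1
  local account=${2:-$(aws sts get-caller-identity --query Account --output text)}
  echo "${base}-${account}" | tr '[:upper:]' '[:lower:]'
}
```

For example, `unique_bucket_name mycompany-dbx-root` yields something like `mycompany-dbx-root-123456789012`, which you can paste into `root_storage_bucket_name`.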
### `MalformedPolicyDocumentException: Policy contains invalid principals`

**Full Error:**

```
Error: creating KMS Key: MalformedPolicyDocumentException: Policy contains a statement with one or more invalid principals
```

**Cause:** Circular dependency: the KMS key policy referenced an IAM role before it existed.

**Solution:** Fixed in the current version (modules reordered: IAM → KMS → Storage).

**Details:** KMS Unity Catalog Fix
### `InvalidServiceName` (VPC endpoint)

**Full Error:**

```
Error: InvalidServiceName: The Vpc Endpoint Service 'com.amazonaws.vpce.us-west-1.vpce-svc-xxxxx' does not exist
```

**Solution:** VPC endpoint service names are region-specific and auto-detected.

**Supported Regions:**

**Manual Override (if needed):**

```hcl
workspace_vpce_service = "com.amazonaws.vpce.us-west-1.vpce-svc-actual-id"
relay_vpce_service     = "com.amazonaws.vpce.us-west-1.vpce-svc-actual-id"
```
## Databricks Issues

### Cannot create external location / `kms:GenerateDataKey` denied

**Full Error:**

```
Error: cannot create external location: AWS IAM role does not have WRITE, DELETE permissions on url s3://...
User: arn:aws:sts::account:assumed-role/dbx-catalog-xxx/databricks is not authorized to perform: kms:GenerateDataKey
```

**Cause:** The Unity Catalog role was missing KMS permissions when `enable_encryption = true`.

**Solution:** Fixed in the current version; the KMS policies are added automatically.

**IAM Propagation:** If it still fails, the policy exists but may not have propagated yet. Wait 60 seconds and retry:

```bash
terraform apply
```

**Details:** KMS Fix Documentation
### `401 Unauthorized`

**Full Error:**

```
Error: cannot authenticate Databricks account: 401 Unauthorized
```

**Solution:**

```bash
# Verify Service Principal credentials
echo $TF_VAR_databricks_client_id
echo $TF_VAR_databricks_account_id

# Test authentication
curl -X GET \
  -u "$TF_VAR_databricks_client_id:$TF_VAR_databricks_client_secret" \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/$TF_VAR_databricks_account_id/workspaces"
```

**Check:** The Service Principal must have the Account Admin role.
### Workspace loads but can't create clusters

**Symptom:** The workspace URL loads, but cluster creation fails.

**Solution:** Wait about 20 minutes for the Private Link backend to stabilize.

**Why?** Databricks provisions backend infrastructure after workspace creation.

**Verify:**

```bash
# Check workspace status
terraform output workspace_status
# Should show: RUNNING
```
## Encryption Issues

### `enable_encryption` vs `enable_workspace_cmk`

**Question:** Which encryption flag should I use?

**Answer:** They are independent encryption layers:

```
Layer 1 - S3 Bucket Encryption (enable_encryption):
├── Encrypts: S3 buckets (DBFS, UC metastore, UC external)
├── Use for:  Data at rest in S3
└── Cost:     KMS key charges

Layer 2 - Workspace CMK (enable_workspace_cmk):
├── Encrypts: DBFS root, EBS volumes, Managed Services
├── Use for:  Workspace-level encryption
└── Cost:     KMS key charges
```

You can enable:

- Neither (AWS-managed encryption)
- One or the other
- Both simultaneously ✅

**Docs:** Encryption Layers
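For example, enabling both layers at once is just two flags. A minimal `terraform.tfvars` fragment, assuming the flag names described above:

```hcl
# terraform.tfvars -- both encryption layers enabled
enable_encryption    = true  # Layer 1: KMS encryption for the S3 buckets
enable_workspace_cmk = true  # Layer 2: workspace CMK (DBFS root, EBS, managed services)
```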
### Key rotation

**Question:** How does key rotation work?

**Answer:**

AWS Automatic Rotation (enabled by default):

- ✅ Rotates key material annually
- ✅ ARN stays the same
- ✅ No action required
- ✅ Applies to both encryption layers

Manual Rotation to a Different Key:

- ✅ Managed Services CMK: supported
- ❌ Storage CMK: NOT supported (auto-rotation only)
- ✅ S3 bucket keys: update the bucket config

**Databricks Docs:** Key Rotation
## Destroy Issues

### `DependencyViolation` deleting subnet/VPC

**Full Error:**

```
Error: deleting subnet: DependencyViolation: The subnet has dependencies and cannot be deleted
Error: deleting VPC: DependencyViolation: The vpc has dependencies and cannot be deleted
```

**Cause:** Databricks launched cluster nodes (EC2) whose ENIs (network interfaces) are not tracked by Terraform.

**Solution:**

**Step 1:** Find the VPC ID

```bash
VPC_ID=$(terraform output -raw vpc_id)
```
**Step 2:** Terminate EC2 instances

```bash
# Find instances
aws ec2 describe-instances \
  --filters "Name=vpc-id,Values=$VPC_ID" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[*].Instances[*].[InstanceId,State.Name]' \
  --output table

# Terminate
INSTANCE_IDS=$(aws ec2 describe-instances \
  --filters "Name=vpc-id,Values=$VPC_ID" \
            "Name=instance-state-name,Values=running,stopped" \
  --query 'Reservations[*].Instances[*].InstanceId' \
  --output text)
aws ec2 terminate-instances --instance-ids $INSTANCE_IDS

# Wait
aws ec2 wait instance-terminated --instance-ids $INSTANCE_IDS
```
**Step 3:** Delete unattached ENIs

```bash
ENI_IDS=$(aws ec2 describe-network-interfaces \
  --filters "Name=vpc-id,Values=$VPC_ID" \
            "Name=status,Values=available" \
  --query 'NetworkInterfaces[*].NetworkInterfaceId' \
  --output text)
for ENI in $ENI_IDS; do
  aws ec2 delete-network-interface --network-interface-id $ENI
done
```

**Step 4:** Retry the destroy

```bash
terraform destroy
```
## Other Issues

### `cannot create permission assignment: resource not found`

**Full Error:**

```
Error: cannot create permission assignment: resource not found
```

**Cause:** The user assignment ran before Unity Catalog resources were ready.

**Solution:** Fixed in the current version (`depends_on` added).

**Workaround (if needed):**

```bash
# Create everything except the user assignment
terraform apply -target=module.unity_catalog

# Then create the user assignment
terraform apply
```
### Using an existing metastore

**Symptom:** You want to use an existing metastore instead of creating a new one.

**Solution:**

```hcl
# In terraform.tfvars
metastore_id = "your-existing-metastore-id"
```

This skips metastore creation and only assigns the workspace to the existing metastore.
### Deployment is slow

**Symptom:** Deployment takes more than 30 minutes.

**Expected Time:**

**Check for:**

**Not Normal:**
### VPC CIDR overlaps a reserved range

**Full Error:**

```
Error: Invalid value for variable "vpc_cidr": VPC CIDR overlaps with Databricks reserved range
```

**Reserved CIDRs (avoid these):**

- ❌ 127.187.216.0/24 (Databricks internal)
- ❌ 192.168.216.0/24 (Databricks internal)
- ❌ 198.18.216.0/24 (Databricks internal)
- ❌ 172.17.0.0/16 (Docker default)

**Solution:**

```hcl
# Use a different CIDR
vpc_cidr = "10.0.0.0/22"     # ✅ Good
vpc_cidr = "172.16.0.0/16"   # ✅ Good
vpc_cidr = "192.168.0.0/16"  # ✅ Good (avoid the .216 subnet)
```
## Debugging

**Enable Terraform debug logging:**

```bash
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform-debug.log
terraform apply
```

**Check recent AWS API calls:**

```bash
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateVpc \
  --max-results 10
```
**Inspect outputs and logs:**

```bash
terraform output workspace_id
cat terraform-debug.log
```

## Common Error Patterns

| Error Pattern | Typical Cause | Solution |
|---|---|---|
| `403 Forbidden` | IAM permissions | Check AWS/Databricks service principal permissions |
| `404 Not Found` | Resource doesn't exist | Check resource IDs, region |
| `401 Unauthorized` | Auth failure | Verify credentials, environment variables |
| `400 Bad Request` | Invalid parameter | Check terraform.tfvars values |
| `409 Conflict` | Resource already exists | Change names or import existing |
| `DependencyViolation` | Resource in use | Clean up dependencies first |
| `InvalidParameter` | Wrong value | Check AWS/Databricks API documentation |
## Still Stuck?

Open an issue with:

- Output of `terraform version`
- Your `terraform.tfvars` (redact secrets!)

**Docs:** All Documentation