This guide covers best practices for managing users and groups in Databricks on Google Cloud, with a focus on security, scalability, and efficient administration. It is based on the official Databricks documentation, Users and Groups Best Practices.
graph TB
subgraph "Identity Provider"
IDP[Google Cloud Identity<br/>or External IdP]
SCIM[SCIM Provisioning<br/>Automated Sync]
SSO[Single Sign-On<br/>SAML/OIDC]
end
subgraph "Databricks Account"
ACCT[Account Console]
USERS[Users]
GROUPS[Groups]
SP[Service Principals<br/>For Automation]
end
subgraph "Access Control"
WS_ACCESS[Workspace Access]
CLUSTER_ACCESS[Cluster Permissions]
DATA_ACCESS[Data Permissions<br/>Unity Catalog]
JOB_ACCESS[Job Permissions]
end
subgraph "Audit & Compliance"
AUDIT[Audit Logs]
REVIEW[Access Reviews]
REVOKE[Auto-Revocation<br/>Inactive Users]
end
IDP --> SCIM
SCIM --> USERS
SCIM --> GROUPS
SSO --> ACCT
USERS --> GROUPS
GROUPS --> WS_ACCESS
GROUPS --> CLUSTER_ACCESS
GROUPS --> DATA_ACCESS
GROUPS --> JOB_ACCESS
SP --> JOB_ACCESS
WS_ACCESS --> AUDIT
DATA_ACCESS --> AUDIT
AUDIT --> REVIEW
REVIEW --> REVOKE
style IDP fill:#4285F4
style SCIM fill:#43A047
style GROUPS fill:#1E88E5
style SP fill:#FF6F00
style AUDIT fill:#8E24AA
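The provisioning path in the diagram above (identity provider, then SCIM, then users and groups) can be illustrated with the request bodies a SCIM client would send. This is a minimal sketch, not the exact payload of any particular IdP connector: the schema URNs come from the SCIM 2.0 core schema (RFC 7643), while the helper names `build_scim_user` and `build_scim_group` and the choice of attributes are assumptions made for illustration.

```python
import json

# Core resource schema URNs defined by SCIM 2.0 (RFC 7643).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def build_scim_user(email, display_name, active=True):
    """Build a SCIM User resource body (hypothetical helper)."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "active": active,
    }


def build_scim_group(display_name, member_ids):
    """Build a SCIM Group resource body listing members by their SCIM id."""
    return {
        "schemas": [SCIM_GROUP_SCHEMA],
        "displayName": display_name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


if __name__ == "__main__":
    user = build_scim_user("jane.doe@example.com", "Jane Doe")
    group = build_scim_group("data-engineers", ["1001", "1002"])
    print(json.dumps(user, indent=2))
    print(json.dumps(group, indent=2))
```

In practice the IdP generates and sends these bodies itself; building them by hand is mainly useful for testing or for a one-off bulk import.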
graph TB
subgraph "User Groups"
DATA_ENG[Data Engineers Group]
DATA_SCI[Data Scientists Group]
DATA_ANAL[Data Analysts Group]
ADMIN[Admins Group]
end
subgraph "Workspace Permissions"
WS_ADMIN[Workspace Admin<br/>Full Control]
WS_USER[Workspace User<br/>Standard Access]
WS_VIEW[Workspace Viewer<br/>Read-only]
end
subgraph "Cluster Permissions"
CLUSTER_CREATE[Can Create Clusters]
CLUSTER_RESTART[Can Restart Clusters]
CLUSTER_ATTACH[Can Attach to Clusters]
end
subgraph "Data Permissions"
UC_ADMIN[Unity Catalog Admin<br/>Manage Catalogs]
DATA_OWNER[Data Owner<br/>Full Table Access]
DATA_READ[Data Reader<br/>SELECT Only]
end
ADMIN --> WS_ADMIN
ADMIN --> CLUSTER_CREATE
ADMIN --> UC_ADMIN
DATA_ENG --> WS_USER
DATA_ENG --> CLUSTER_CREATE
DATA_ENG --> DATA_OWNER
DATA_SCI --> WS_USER
DATA_SCI --> CLUSTER_ATTACH
DATA_SCI --> DATA_READ
DATA_ANAL --> WS_VIEW
DATA_ANAL --> CLUSTER_ATTACH
DATA_ANAL --> DATA_READ
style ADMIN fill:#E53935
style DATA_ENG fill:#1E88E5
style DATA_SCI fill:#43A047
style DATA_ANAL fill:#FF6F00
style WS_ADMIN fill:#FDD835
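The group-to-permission mapping in the diagram above can be written down as data, which makes it easy to answer "what does a user in these groups end up with?". The sketch below is illustrative only: the group keys, the level names, and especially the ranking of levels within each scope are assumptions, not Databricks identifiers or a documented precedence rule.

```python
# Group-to-permission matrix transcribed from the diagram.
# Keys and level names are illustrative labels, not Databricks API values.
GROUP_PERMISSIONS = {
    "admins": {"workspace": "admin", "cluster": "create", "data": "uc_admin"},
    "data-engineers": {"workspace": "user", "cluster": "create", "data": "owner"},
    "data-scientists": {"workspace": "user", "cluster": "attach", "data": "read"},
    "data-analysts": {"workspace": "viewer", "cluster": "attach", "data": "read"},
}

# Assumed ordering from least to most privileged within each scope.
RANK = {
    "workspace": ["viewer", "user", "admin"],
    "cluster": ["attach", "restart", "create"],
    "data": ["read", "owner", "uc_admin"],
}


def effective_permissions(groups):
    """Return the highest level per scope across all of a user's groups."""
    result = {}
    for group in groups:
        for scope, level in GROUP_PERMISSIONS[group].items():
            order = RANK[scope]
            if scope not in result or order.index(level) > order.index(result[scope]):
                result[scope] = level
    return result


if __name__ == "__main__":
    print(effective_permissions(["data-analysts", "data-engineers"]))
```

Keeping the matrix as data also gives access reviews a single place to compare intended permissions with what is actually granted.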
sequenceDiagram
participant Admin
participant ACCT as Account Console
participant SP as Service Principal
participant JOB as Databricks Job
participant API as Databricks API
participant DATA as Data Sources
Admin->>ACCT: Create Service Principal
ACCT-->>SP: SP Created with Client ID
Admin->>SP: Generate Access Token
SP-->>Admin: Token (expires in N days)
Admin->>SP: Assign Permissions<br/>(Job runner, Data access)
Note over SP,JOB: Automated Job Execution
JOB->>SP: Authenticate with Token
SP->>ACCT: Validate Token
ACCT-->>SP: Token Valid
SP->>JOB: Authorization Granted
JOB->>DATA: Access Data<br/>(via SP permissions)
DATA-->>JOB: Data Returned
Note over API: API Access
API->>SP: API Call with Token
SP->>ACCT: Validate & Authorize
ACCT-->>API: Request Completed
Note over Admin,SP: Best Practices:<br/>- Rotate tokens regularly<br/>- Use least privilege<br/>- One SP per application
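Two of the best practices noted in the sequence diagram, regular token rotation and one service principal per application, lend themselves to simple automated checks. The following is a rough sketch under stated assumptions: the function names, the seven-day rotation window, and the inventory format (application name mapped to service principal ID) are all hypothetical, and no Databricks API is called.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def needs_rotation(issued_at, lifetime_days, rotate_before_days=7, now=None):
    """Return True once a token is within `rotate_before_days` of expiry."""
    now = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(days=lifetime_days)
    return now >= expires_at - timedelta(days=rotate_before_days)


def shared_principals(app_to_sp):
    """Return service principals used by more than one application."""
    apps_by_sp = defaultdict(list)
    for app, sp_id in app_to_sp.items():
        apps_by_sp[sp_id].append(app)
    return {sp: sorted(apps) for sp, apps in apps_by_sp.items() if len(apps) > 1}


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    check_time = datetime(2024, 1, 25, tzinfo=timezone.utc)
    print(needs_rotation(issued, lifetime_days=30, now=check_time))
    print(shared_principals({"etl": "sp-1", "reporting": "sp-1", "ml": "sp-2"}))
```

A scheduled job could run checks like these against an inventory of tokens and flag anything due for rotation or shared across applications.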
stateDiagram-v2
[*] --> Onboarding: New User/Employee
Onboarding --> Provisioned: SCIM Auto-provision
Provisioned --> GroupAssignment: Assign to Groups
GroupAssignment --> Active: Access Granted
Active --> AccessReview: Quarterly Review
AccessReview --> Active: Access Still Needed
AccessReview --> Modified: Role Changed
AccessReview --> Inactive: No Activity Detected
Modified --> GroupAssignment: Update Permissions
Inactive --> Warning90Days: 90 Days No Activity
Warning90Days --> Disabled: Auto-disable Account
Active --> Offboarding: Employee Leaves
Offboarding --> Revoked: Remove All Access
Disabled --> [*]
Revoked --> [*]
note right of Active
Continuous monitoring
Audit logs tracked
Anomaly detection
end note
note right of Disabled
Account disabled
Can be re-enabled
Data preserved
end note
note right of Revoked
Immediate revocation
No grace period
Compliance requirement
end note
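The lifecycle in the state diagram above maps directly onto a transition table. The sketch below encodes exactly the transitions drawn in the diagram; the event names (such as `scim_provision` and `inactive_90_days`) are hypothetical labels chosen for this example, and the code only models the states rather than changing any account.

```python
# (state, event) -> next state, transcribed from the lifecycle diagram.
# Event names are illustrative labels, not Databricks or SCIM identifiers.
TRANSITIONS = {
    ("Onboarding", "scim_provision"): "Provisioned",
    ("Provisioned", "assign_groups"): "GroupAssignment",
    ("GroupAssignment", "grant_access"): "Active",
    ("Active", "quarterly_review"): "AccessReview",
    ("AccessReview", "still_needed"): "Active",
    ("AccessReview", "role_changed"): "Modified",
    ("AccessReview", "no_activity"): "Inactive",
    ("Modified", "update_permissions"): "GroupAssignment",
    ("Inactive", "inactive_90_days"): "Warning90Days",
    ("Warning90Days", "auto_disable"): "Disabled",
    ("Active", "employee_leaves"): "Offboarding",
    ("Offboarding", "remove_access"): "Revoked",
}

TERMINAL_STATES = {"Disabled", "Revoked"}


def advance(state, event):
    """Return the next lifecycle state, rejecting transitions not in the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, state="Onboarding"):
    """Apply a sequence of events starting from `state`."""
    for event in events:
        state = advance(state, event)
    return state


if __name__ == "__main__":
    onboarding = ["scim_provision", "assign_groups", "grant_access"]
    print(run(onboarding))
    print(run(onboarding + ["employee_leaves", "remove_access"]))
```

Rejecting unknown transitions is deliberate: an account cannot jump from onboarding straight to disabled, for instance, without passing through the states the process defines.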
graph TB
subgraph "Audit Data Sources"
LOGS[Audit Logs<br/>Account & Workspace]
SYS[System Tables<br/>system.access.*]
GCS[GCS Audit Logs<br/>Storage Access]
end
subgraph "Monitoring Activities"
LOGIN[Login Attempts<br/>Success/Failure]
PERM[Permission Changes<br/>Grants/Revokes]
DATA[Data Access<br/>Table Queries]
API[API Calls<br/>Automation Activity]
end
subgraph "Alerting"
ANOM[Anomaly Detection<br/>Unusual Patterns]
FAILED[Failed Access<br/>Multiple Attempts]
PRIVESC[Privilege Escalation<br/>Admin Changes]
EXFIL[Data Exfiltration<br/>Large Exports]
end
subgraph "Actions"
NOTIFY[Notify Security Team]
SUSPEND[Suspend Account]
REVIEW[Trigger Review]
TICKET[Create Incident Ticket]
end
LOGS --> LOGIN
LOGS --> PERM
SYS --> DATA
SYS --> API
GCS --> DATA
LOGIN --> FAILED
PERM --> PRIVESC
DATA --> EXFIL
API --> ANOM
ANOM --> NOTIFY
FAILED --> SUSPEND
PRIVESC --> REVIEW
EXFIL --> TICKET
style LOGS fill:#1E88E5
style ANOM fill:#FF6F00
style FAILED fill:#E53935
style PRIVESC fill:#E53935
style EXFIL fill:#E53935
style NOTIFY fill:#FDD835
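The alerting flow in the diagram above pairs each alert type with one response. A minimal sketch of that routing, together with a simple repeated-failed-login detector, might look like the following. The event dictionaries use a made-up shape (`user`, `action`, `success`) rather than the real audit log schema, and the threshold of five failures is an arbitrary assumption.

```python
from collections import Counter

# Alert type -> response, transcribed from the diagram.
ALERT_ACTIONS = {
    "anomaly": "notify_security_team",
    "failed_access": "suspend_account",
    "privilege_escalation": "trigger_review",
    "data_exfiltration": "create_incident_ticket",
}


def failed_login_alerts(events, threshold=5):
    """Return users with at least `threshold` failed logins.

    Each event is a dict with hypothetical keys: user, action, success.
    """
    failures = Counter(
        event["user"]
        for event in events
        if event["action"] == "login" and not event["success"]
    )
    return sorted(user for user, count in failures.items() if count >= threshold)


def action_for(alert_type):
    """Look up the response for an alert type; unknown types raise KeyError."""
    return ALERT_ACTIONS[alert_type]


if __name__ == "__main__":
    events = [{"user": "mallory", "action": "login", "success": False}] * 5
    events.append({"user": "alice", "action": "login", "success": True})
    for user in failed_login_alerts(events):
        print(user, "->", action_for("failed_access"))
```

In a real deployment the events would come from the audit logs or system tables shown in the diagram, and the detection would more likely run as a scheduled query than as in-memory Python.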
For more details, refer to the official Databricks documentation, Users and Groups Best Practices.
Following these practices helps keep user and group management in Databricks on Google Cloud secure and efficient.