Set up a Databricks connection for Python (Beta)
This documentation describes one or more public beta features that are in development. Beta features are subject to quick, iterative changes; therefore, the current user experience in the Sigma service can differ from the information provided on this page.
This page should not be considered official published documentation until Sigma removes this notice and the beta flag on the corresponding feature(s) in the Sigma service. For the full beta feature disclaimer, see Beta features.
If you set up Python for your Databricks connection, you can write and run Python code in Sigma in a notebook-style experience. This lets you perform more complex analysis on your data and reduces friction between business analytics and data science.
After setting up your connection for Python, you can write and run Python code in a Sigma workbook.
Requirements
To set up your Databricks connection for Python, the following requirements must be met.
In your Sigma organization:
- You must be assigned the Admin account type.
In Databricks:
- You must have access to an all-purpose compute resource, or create one. To create one, you must either be a workspace admin or have the Allow unrestricted cluster creation user entitlement. See Connect to all-purpose and jobs compute in the Databricks documentation. If you create an all-purpose compute resource to run Python from Sigma, start with a small cluster. Monitor usage, then set up autoscaling or resize the cluster as needed.
- If you run Databricks on Azure, you must use Databricks Runtime 11.3 LTS or higher. See Databricks Runtime release notes versions and compatibility in the Databricks documentation.
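If you want to confirm the runtime version from code, the following minimal Python sketch compares a Databricks Runtime version string against the 11.3 minimum. It assumes the version string starts with major.minor (for example, 11.3 or 13.3.x-scala2.12) and that the DATABRICKS_RUNTIME_VERSION environment variable is set on the cluster; verify both assumptions in your workspace. The helper names parse_runtime_version and meets_minimum_runtime are hypothetical, and the check compares version numbers only, so it does not confirm that the runtime is an LTS release.

```python
import os
import re

# Minimum runtime named in the Azure requirement (Databricks Runtime 11.3 LTS).
MIN_RUNTIME = (11, 3)


def parse_runtime_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from strings such as "11.3" or "13.3.x-scala2.12"."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum_runtime(version: str, minimum: tuple[int, int] = MIN_RUNTIME) -> bool:
    """True if the numeric runtime version is at or above the minimum.

    Compares version numbers only; it does not check LTS status.
    """
    return parse_runtime_version(version) >= minimum


if __name__ == "__main__":
    # Assumption: on a Databricks cluster this variable holds a value like "13.3".
    current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if current is None:
        print("DATABRICKS_RUNTIME_VERSION is not set; run this on the cluster.")
    else:
        print(current, "meets minimum:", meets_minimum_runtime(current))
```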
Additional requirements apply to set up a new Databricks connection. See Connect to Databricks.
Limitations
- You must have a write-back destination set up to use Python, but your write-back destination cannot use default storage.
- Set an automatic termination policy for the all-purpose compute cluster based on your expected usage. Sigma recommends a policy of at least 3 hours. If the cluster is terminated, users assigned an account type with the Write Python permission enabled can restart it.
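The compute recommendations on this page (start with a small cluster, use autoscaling, and set an automatic termination policy of at least 3 hours) correspond to fields in the Databricks Clusters API request body: autoscale with min_workers and max_workers, and autotermination_minutes. The following minimal Python sketch builds such a body for POST /api/2.0/clusters/create without sending it. The function name small_python_cluster_spec, the cluster name, the runtime string, the node type, and the worker counts are hypothetical placeholders; check the Clusters API reference for the fields your workspace requires.

```python
import json

# Sigma recommends an automatic termination policy of at least 3 hours (180 minutes).
RECOMMENDED_AUTOTERMINATION_MINUTES = 3 * 60


def small_python_cluster_spec(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    max_workers: int = 2,
    autotermination_minutes: int = RECOMMENDED_AUTOTERMINATION_MINUTES,
) -> dict:
    """Build a Clusters API request body for a small autoscaling all-purpose cluster."""
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1.")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Start small; Databricks adds workers up to max_workers as load grows.
        "autoscale": {"min_workers": 1, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
    }


if __name__ == "__main__":
    # Hypothetical values; use a runtime and node type available in your workspace.
    spec = small_python_cluster_spec(
        cluster_name="sigma-python",
        spark_version="13.3.x-scala2.12",
        node_type_id="Standard_DS3_v2",
    )
    print(json.dumps(spec, indent=2))
```

Raise max_workers or autotermination_minutes later if monitoring shows the cluster is undersized or terminating during normal working hours.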
Set up a Databricks connection to work with Python
To create or modify a Databricks connection in Sigma to work with Python, follow the steps to connect to Databricks, then complete the additional configuration steps in Databricks and in Sigma.
Configure Databricks
After you complete the steps to Configure Databricks, perform the following additional steps to configure Databricks for use with Sigma and Python:
- Identify the cluster ID of the all-purpose compute cluster. See Get identifiers for workspace objects in the Databricks documentation.
- Confirm that the access token or service principal you plan to use to connect to this compute has CAN RESTART permission on the compute resource. See Compute ACLs in the Access control lists topic in the Databricks documentation.
  Sigma requires CAN RESTART so that users can restart a terminated compute cluster from a workbook. If you do not want to grant this level of permission to the access token or service principal, use CAN ATTACH TO instead and manage the cluster in the Databricks workspace interface.
- Install libraries on your Databricks cluster for use in Python. See Cluster libraries in the Databricks documentation. Several libraries, such as DBUtils and pyspark, are included by default.
  Sigma recommends installing the latest version of the libraries. Other libraries, such as pandas, pyflakes, numpy, scipy, and requests, might also be useful.
- To allow API calls made from Python code in Sigma, ensure the network configuration for the cluster allows egress to the relevant API endpoints.
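One way to find the cluster ID is to open the compute resource in the Databricks workspace and read it from the page URL. The following Python sketch extracts the ID, assuming the URL contains a segment of the form /clusters/<cluster-id>. The exact URL layout can vary between Databricks UI versions, so treat the pattern as an assumption. The function name cluster_id_from_url and the example URL and ID are hypothetical.

```python
import re
from typing import Optional

# Assumption: the compute page URL contains ".../clusters/<cluster-id>" in its path or fragment.
_CLUSTER_ID_PATTERN = re.compile(r"/clusters/([0-9A-Za-z-]+)")


def cluster_id_from_url(url: str) -> Optional[str]:
    """Return the cluster ID found in a Databricks compute URL, or None if absent."""
    match = _CLUSTER_ID_PATTERN.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical workspace host and cluster ID.
    url = "https://example.cloud.databricks.com/#setting/clusters/0123-456789-abcdefgh/configuration"
    print(cluster_id_from_url(url))
```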
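If you manage cluster permissions through the Databricks Permissions API rather than the workspace interface, the CAN RESTART grant is expressed as an access control list entry with permission_level set to CAN_RESTART (or CAN_ATTACH_TO). The following minimal Python sketch only builds the JSON body; it assumes you send it with PATCH /api/2.0/permissions/clusters/<cluster-id>, which updates existing grants rather than replacing them. Service principals are identified by application ID in service_principal_name, and a personal access token acts as the user who owns it, identified by user_name. The helper name cluster_permission_body and the example values are hypothetical.

```python
import json

# Cluster permission levels discussed on this page, in the Permissions API's spelling.
ALLOWED_LEVELS = {"CAN_RESTART", "CAN_ATTACH_TO"}


def cluster_permission_body(principal: str, is_service_principal: bool, level: str = "CAN_RESTART") -> dict:
    """Build a Permissions API body that grants one principal a level on a cluster."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"Expected one of {sorted(ALLOWED_LEVELS)}, got {level!r}")
    # Service principals are identified by application ID; users by user name.
    key = "service_principal_name" if is_service_principal else "user_name"
    return {"access_control_list": [{key: principal, "permission_level": level}]}


if __name__ == "__main__":
    cluster_id = "0123-456789-abcdefgh"  # hypothetical
    body = cluster_permission_body("00000000-0000-0000-0000-000000000000", is_service_principal=True)
    # PATCH keeps other existing grants on the cluster.
    print(f"PATCH /api/2.0/permissions/clusters/{cluster_id}")
    print(json.dumps(body, indent=2))
```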
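Cluster libraries can also be installed with the Databricks Libraries API. The following minimal Python sketch builds a request body for POST /api/2.0/libraries/install that installs PyPI packages by name; leaving the version unpinned resolves to the latest version available from PyPI at install time, in line with the recommendation to use the latest versions. The package list mirrors the libraries named on this page, some of which may already ship with your Databricks Runtime. The helper name library_install_body and the cluster ID are hypothetical, and the sketch does not send the request.

```python
import json

# Libraries this page mentions as potentially useful; adjust to your needs.
SUGGESTED_PACKAGES = ["pandas", "pyflakes", "numpy", "scipy", "requests"]


def library_install_body(cluster_id: str, packages: list[str]) -> dict:
    """Build a Libraries API install body that adds PyPI packages to a cluster.

    Unpinned package names resolve to the latest version on PyPI at install time.
    """
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": name}} for name in packages],
    }


if __name__ == "__main__":
    body = library_install_body("0123-456789-abcdefgh", SUGGESTED_PACKAGES)  # hypothetical cluster ID
    print("POST /api/2.0/libraries/install")
    print(json.dumps(body, indent=2))
```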
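To check egress, you can run a quick TCP reachability test from a Databricks notebook attached to the cluster. The following Python sketch uses only the standard library socket module. It confirms that a TCP connection opens, not that an API call succeeds end to end; authentication, proxies, and TLS inspection are out of scope. The function name can_reach and the default port 443 are assumptions for illustration.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_reach("api.example.com") returns False if the host is blocked by the cluster's network configuration, cannot be resolved, or does not respond within the timeout; api.example.com is a placeholder for the endpoint your Python code calls.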
Create or update a Databricks connection in Sigma
After you complete the steps to Create a Databricks connection in Sigma, but before you finish creating your connection, complete the steps to set up Python:
- In the Python section, turn on the toggle for Enable Python queries.
- For Compute cluster, specify the Cluster ID of your all-purpose compute cluster.
- Follow the steps to Finish creating your connection.
After you set up Python for Databricks, users can write and run Python code in a Sigma workbook.
