Set up a Databricks connection for Python (Beta)

🚩
This documentation describes one or more public beta features that are in development. Beta features are subject to quick, iterative changes; therefore the current user experience in the Sigma service can differ from the information provided in this page.
This page should not be considered official published documentation until Sigma removes this notice and the beta flag on the corresponding feature(s) in the Sigma service. For the full beta feature disclaimer, see Beta features.

If you set up Python with your Databricks connection, you can write and run Python code in Sigma as a notebook-style experience, elevating the level of complex analysis that you can perform on your data and reducing friction between business analytics and data science.

After setting up your connection for Python, you can write and run Python code in a Sigma workbook.

Requirements

To complete the one-time setup of your Databricks connection for Python, you must meet the following requirements.

You must meet the following requirements in your Sigma organization:

You must be assigned the Admin account type.

You must meet the following requirements in Databricks:

You must have access to an all-purpose compute resource, or create one. To create one, you must either be an Admin or have the Allow unrestricted cluster creation user entitlement. See Connect to all-purpose and jobs compute in the Databricks documentation.

💡
If you create an all-purpose compute resource to run Python from Sigma, start by creating a small cluster. Monitor usage and set up auto-scaling or change the size of the cluster based on usage.
If you run Databricks on Azure, you must be running Databricks Runtime 11.3 LTS or later. See Databricks Runtime release notes versions and compatibility in the Databricks documentation.
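
The runtime requirement above can be checked with a short script. The following is a minimal sketch, not an official Sigma or Databricks utility: the function names are hypothetical, and it assumes the runtime version is available as a string that starts with the major and minor numbers, such as "11.3" or "13.3.x-scala2.12". It compares version numbers only and does not verify that the runtime is an LTS release.

```python
import re

# Minimum runtime stated in the requirement above for Databricks on Azure.
MIN_RUNTIME = (11, 3)


def parse_runtime_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as '11.3' or '13.3.x-scala2.12'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple[int, int] = MIN_RUNTIME) -> bool:
    """Return True when the runtime version is at or above the minimum."""
    return parse_runtime_version(version) >= minimum
```

For example, meets_minimum("10.4.x-scala2.12") returns False, while meets_minimum("13.3.x-scala2.12") returns True.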

Additional requirements apply if you set up a new Databricks connection. See Connect to Databricks.

Limitations

You must have a write-back destination set up to use Python, but your write-back destination cannot use default storage.
Set a reasonable automatic termination policy for the all-purpose compute. Sigma recommends a policy of at least 3 hours, based on your expected usage. If the all-purpose compute cluster is terminated, users assigned an account type with the Write Python permission enabled can restart the cluster.
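
As an illustration of the termination recommendation, the following sketch builds the part of a cluster specification that controls automatic termination and auto-scaling. It assumes you manage the cluster through the Databricks Clusters API, where these settings are named autotermination_minutes and autoscale. The helper names and the worker counts are hypothetical values chosen for the example, not Sigma recommendations.

```python
import json


def termination_minutes(hours: float) -> int:
    """Convert an idle timeout in hours to the minutes value used in a cluster spec."""
    return int(round(hours * 60))


def small_cluster_settings(idle_hours: float = 3,
                           min_workers: int = 1,
                           max_workers: int = 2) -> dict:
    """Build a partial cluster spec: idle timeout plus a small auto-scaling range."""
    if idle_hours <= 0:
        raise ValueError("idle_hours must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")
    return {
        "autotermination_minutes": termination_minutes(idle_hours),
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


# Print the fragment so it can be merged into a full cluster definition.
print(json.dumps(small_cluster_settings(), indent=2))
```

With the defaults, the 3-hour policy becomes an autotermination_minutes value of 180. Adjust the worker range after you monitor actual usage.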

Set up a Databricks connection to work with Python

To create or modify a Databricks connection in Sigma to work with Python, complete the steps to connect to Databricks and complete additional configuration steps in both Databricks and Sigma.

Configure Databricks

After you complete the steps to Configure Databricks, perform the following additional steps to configure Databricks for use with Sigma and Python:

Identify the cluster ID of the all-purpose compute cluster. See Get identifiers for workspace objects in the Databricks documentation.
Confirm that the access token or service principal you plan to use to connect to this compute has CAN RESTART permissions for the compute resource. See Compute ACLs in the Access control lists topic in the Databricks documentation.

💡
Sigma requires CAN RESTART to allow users to restart a terminated compute cluster from a workbook. If you do not want to grant this level of permission to the access token or service principal, use CAN ATTACH TO instead and manage the cluster in the Databricks workspace interface.
Install libraries on your Databricks cluster for use in Python. See Cluster libraries in the Databricks documentation. Several libraries, such as DBUtils and pyspark, are included by default.

Sigma recommends installing the latest version of the libraries. Other libraries, such as pandas, pyflakes, numpy, scipy, and requests, might also be useful.
To allow API calls made from Python code in Sigma, ensure the network configuration for the cluster allows egress to the relevant API endpoints.
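
To see which of the libraries mentioned in the steps above are already available in a Python environment, you can run a check like the following on the cluster. This is a minimal sketch that uses only the Python standard library. The package list and the function name are illustrative, not a list that Sigma requires.

```python
from importlib import metadata

# Illustrative list based on the libraries named above; edit as needed.
CANDIDATES = ["pandas", "numpy", "scipy", "requests", "pyflakes"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report one line per package.
for name, version in installed_versions(CANDIDATES).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed can then be added as a cluster library.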

💡
Set the time zone of your cluster in the Spark configuration to the same time zone specified in your Sigma organization. To identify the time zone of your Sigma organization, see Change the account time zone. For details on specifying the time zone in your Databricks cluster, see Set Spark configuration properties on Databricks in the Databricks documentation.
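
The time zone tip above can be sketched as follows. This example assumes that the relevant Spark property is spark.sql.session.timeZone and that the cluster Spark configuration accepts one "key value" pair per line. Confirm both against the Databricks documentation for your workspace. The function names and the example time zone are hypothetical.

```python
# Assumption: this is the Spark property that controls the session time zone.
SPARK_TZ_KEY = "spark.sql.session.timeZone"


def timezone_conf(sigma_time_zone: str) -> dict:
    """Build the Spark setting that matches the Sigma organization time zone."""
    return {SPARK_TZ_KEY: sigma_time_zone}


def spark_conf_lines(conf: dict) -> str:
    """Render settings as 'key value' lines, one setting per line."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


def timezone_matches(cluster_conf: dict, sigma_time_zone: str) -> bool:
    """Check whether an existing cluster configuration already uses the Sigma time zone."""
    return cluster_conf.get(SPARK_TZ_KEY) == sigma_time_zone


# Example time zone only; use the value configured in your Sigma organization.
print(spark_conf_lines(timezone_conf("America/Los_Angeles")))
```

The printed line can be pasted into the Spark configuration of the cluster, and timezone_matches can be used to compare an existing configuration against the Sigma setting.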

Create or update a Databricks connection in Sigma

After you complete the steps to Create a Databricks connection in Sigma, but before you finish creating your connection, complete the steps to set up Python:

In the Python section, turn on the toggle for Enable Python queries.
For Compute cluster, specify the Cluster ID of your all-purpose compute cluster.
Follow the steps to Finish creating your connection.
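
If you have the URL of the compute page open in the Databricks workspace, the cluster ID for the Compute cluster field can be extracted from it. The following sketch assumes the URL contains a /clusters/<cluster-id> segment. The workspace host and cluster ID in the example are made up, and the function name is hypothetical.

```python
import re


def cluster_id_from_url(url: str) -> str:
    """Return the path segment that follows '/clusters/' in a compute page URL."""
    match = re.search(r"/clusters/([^/?#]+)", url)
    if match is None:
        raise ValueError(f"No cluster ID found in URL: {url!r}")
    return match.group(1)


# Hypothetical workspace host and cluster ID, for illustration only.
example_url = (
    "https://example.cloud.databricks.com/"
    "#setting/clusters/1234-567890-abcde123/configuration"
)
print(cluster_id_from_url(example_url))
```

The extracted value is what you enter for Compute cluster in the Python section of the connection.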

After you set up Python for Databricks, users can write and run Python code in a Sigma workbook.
