Set up a Databricks connection for Python (Beta)

🚩

This documentation describes a public beta feature and is under construction. This page should not be considered part of our published documentation until this notice, and the corresponding Beta flag on the feature in the Sigma service, are removed. As with any beta feature, the feature discussed below is subject to quick, iterative changes. The latest experience in the Sigma service might differ from the contents of this document.

Beta features are subject to the Beta features disclaimer.

If you set up Python with your Databricks connection, you can write and run Python code in Sigma in a notebook-style experience. This lets you perform more complex analysis on your data and reduces friction between business analytics and data science.

After setting up your connection for Python, you can write and run Python code in a Sigma workbook.

Requirements

To set up your Databricks connection for Python, the following requirements must be met.

In your Sigma organization:

In Databricks:

  • You must have access to an all-purpose compute resource, or create one. To create one, you must either be an Admin or have the Allow unrestricted cluster creation user entitlement. See Connect to all-purpose and jobs compute in the Databricks documentation.

    💡

    If you create an all-purpose compute resource to run Python from Sigma, start by creating a small cluster. Monitor usage and set up auto-scaling or change the size of the cluster based on usage.

  • If you run Databricks on Azure, you must be running Databricks Runtime 11.3 LTS or later. See Databricks Runtime release notes versions and compatibility in the Databricks documentation.
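
As a rough illustration of the Azure runtime requirement, the following sketch compares a runtime version string against 11.3. It assumes the string has the form that the Databricks Clusters API reports in its spark_version field (for example, 11.3.x-scala2.12). The function name meets_minimum_runtime is hypothetical, and the check does not verify whether a release is LTS.

```python
import re

# Minimum runtime version stated above for Databricks on Azure.
MINIMUM_RUNTIME = (11, 3)


def meets_minimum_runtime(spark_version, minimum=MINIMUM_RUNTIME):
    """Return True if a runtime string such as '11.3.x-scala2.12' is at least `minimum`."""
    # Assumption: the string starts with "<major>.<minor>".
    match = re.match(r"^(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {spark_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= tuple(minimum)


if __name__ == "__main__":
    for version in ("10.4.x-scala2.12", "11.3.x-scala2.12", "13.3.x-scala2.12"):
        print(version, meets_minimum_runtime(version))
```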

Additional requirements apply to set up a new Databricks connection. See Connect to Databricks.

Limitations

  • You must have a write-back destination set up to use Python, but your write-back destination cannot use default storage.
  • Set a reasonable automatic termination policy for the all-purpose compute. Sigma recommends a policy of at least 3 hours, based on your expected usage. If the all-purpose compute cluster is terminated, users assigned an account type with the Write Python permission enabled can restart the cluster.
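
To make the termination recommendation concrete, here is a minimal sketch of a cluster specification expressed as a Python dictionary, with a helper that checks the termination setting. It assumes the field names used by the Databricks Clusters API (autotermination_minutes, autoscale, spark_version, node_type_id). The cluster name, node type, and worker counts are placeholder values, not Sigma recommendations, and the helper name termination_policy_ok is hypothetical.

```python
# "At least 3 hours", expressed in minutes.
RECOMMENDED_MINIMUM_MINUTES = 3 * 60

# Placeholder specification for a small all-purpose cluster.
cluster_spec = {
    "cluster_name": "sigma-python",        # hypothetical name
    "spark_version": "11.3.x-scala2.12",   # example runtime string
    "node_type_id": "Standard_DS3_v2",     # placeholder Azure node type
    "autoscale": {"min_workers": 1, "max_workers": 2},
    "autotermination_minutes": 180,
}


def termination_policy_ok(spec, minimum_minutes=RECOMMENDED_MINIMUM_MINUTES):
    """Return True if the spec sets automatic termination to at least `minimum_minutes`."""
    # A missing value or 0 is treated as "no automatic termination policy set".
    minutes = spec.get("autotermination_minutes", 0)
    return minutes >= minimum_minutes


if __name__ == "__main__":
    print(termination_policy_ok(cluster_spec))
```

Treating a missing or zero value as a failure is a design choice for this sketch: it flags clusters that never terminate automatically as well as clusters that terminate too soon.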

Set up a Databricks connection to work with Python

To create or modify a Databricks connection in Sigma to work with Python, complete the steps to connect to Databricks, then complete the additional configuration steps in both Databricks and Sigma.

Configure Databricks

After you complete the steps to Configure Databricks, perform the following additional steps to configure Databricks for use with Sigma and Python:

  1. Identify the cluster ID of the all-purpose compute cluster. See Get identifiers for workspace objects in the Databricks documentation.

  2. Confirm that the access token or service principal you plan to use to connect to this compute has CAN RESTART permissions for the compute resource. See Compute ACLs in the Access control lists topic in the Databricks documentation.

    💡

    Sigma requires CAN RESTART to allow users to restart a terminated compute cluster from a workbook. If you do not want to grant this level of permission to the access token or service principal, use CAN ATTACH TO instead and manage the cluster in the Databricks workspace interface.

  3. Install libraries on your Databricks cluster for use in Python. See Cluster libraries in the Databricks documentation. Several libraries, such as DBUtils and pyspark, are included by default.

    Sigma recommends installing the latest version of the libraries. Other libraries, such as pandas, pyflakes, numpy, scipy, and requests, might also be useful.

  4. To allow API calls made from Python code in Sigma, ensure the network configuration for the cluster allows egress to the relevant API endpoints.
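
One way to confirm the library installation in step 3 is to run a short check on the cluster itself, for example in a Databricks notebook attached to it. The sketch below uses only the Python standard library to report which modules cannot be found. The module list mirrors the libraries named in step 3, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Top-level module names for the libraries mentioned in step 3.
SUGGESTED_MODULES = ["pandas", "pyflakes", "numpy", "scipy", "requests"]


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SUGGESTED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All suggested modules are installed.")
```

This check only confirms that a module can be located. It does not verify versions or that the cluster can reach external API endpoints.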

Create or update a Databricks connection in Sigma

After you complete the steps to Create a Databricks connection in Sigma, but before you finish creating your connection, complete the steps to set up Python:

  1. In the Python section, turn on the toggle for Enable Python queries.
  2. For Compute cluster, specify the Cluster ID of your all-purpose compute cluster.
  3. Follow the steps to Finish creating your connection.
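
The Cluster ID entered in step 2 can be read from the cluster's page URL in the Databricks workspace. The sketch below assumes that the ID appears directly after a /clusters/ segment in that URL, as described in the Databricks documentation on workspace object identifiers. The workspace host and cluster ID in the example are made up, and the function name extract_cluster_id is hypothetical.

```python
import re


def extract_cluster_id(cluster_url):
    """Return the path segment that follows '/clusters/' in a URL, or None if absent."""
    # Assumption: the cluster ID is the segment immediately after "/clusters/".
    match = re.search(r"/clusters/([^/?#]+)", cluster_url)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up workspace host and cluster ID, for illustration only.
    example_url = (
        "https://example.cloud.databricks.com/#/setting/clusters/"
        "0123-456789-abcde123/configuration"
    )
    print(extract_cluster_id(example_url))
```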

After you set up Python for Databricks, users can write and run Python code in a Sigma workbook.