Write and run Python code in Sigma (Beta)

🚩
This documentation describes one or more public beta features that are in development. Beta features are subject to quick, iterative changes; therefore the current user experience in the Sigma service can differ from the information provided in this page.
This page should not be considered official published documentation until Sigma removes this notice and the beta flag on the corresponding feature(s) in the Sigma service. For the full beta feature disclaimer, see Beta features.

You can write and run Python code in Sigma to perform data science, data engineering, and data analysis tasks. For example, define and run a churn prediction or price optimization model, clean and classify data sources, or enrich data from your data platform on demand with information from an API.

📘
This feature isn't supported by all data platform connections. To check if your connection supports it, see Supported data platforms and feature compatibility.

If you want, you can perform complex data aggregation tasks in Sigma, then perform additional analysis in Python.

When you write Python code in Sigma, you can do any of the following:

Reference data elements like tables, pivot tables, and input tables.
Use the values from many control elements in your code.
Build tables and charts with the output from your Python code.
Call API endpoints and work with the response.
Import libraries, including custom libraries, available either in your Databricks instance or in the Snowflake conda channel, depending on the connection that you use to run Python code.
Use autocomplete to reference data elements and controls in your workbook, and select libraries.
If you connect to Databricks, run Databricks notebooks and work with the output.

For examples, see Examples on this page.

To write and run Python code in Sigma, you must first set up Python on your connection.

For a Databricks connection, see Set up a Databricks connection for Python.
For a Snowflake connection, see Set up a Snowflake connection for Python.

User requirements

To add a Python element to a document and modify the code, you must be assigned an account type granted the Write Python permission. See License and account type overview.
To run Python code, you must be granted Can Use access to the connection. To perform some actions in your Python code, you might need higher level access. See Data access overview.
To run Python code on a Snowflake connection that dynamically assigns roles or uses OAuth, your role must also be granted USAGE on the write-back database, schema, and the RUN_PYTHON_CODE stored procedure. For details, see Create or update your connection.

Limitations

If you use a JWT-signed secure embed, you can embed an element, page, or workbook that contains a Python element. Public embeds and secure embeds that are not signed by a JWT are not supported.

The following is unsupported:

Exporting the output from Python code. Scheduled exports use the last run Python output instead of re-running the Python code at the time of the export.
Materializing the output of Python code. Scheduled materializations use the last run Python output instead of re-running the Python code when the materialization runs.
Referencing columns from drill down control elements.
Modifying Python code in a custom view.

Add a Python element to a workbook or data model

All Python code in Sigma is run in a Python element that you add to a workbook or data model.

Create a workbook or data model, or open an existing one and start editing.
In the Add element bar, select Data, then choose Python to add the element to the canvas. You can also drag the element to the canvas.

The element automatically defaults to the available connection that supports running Python code, or you can choose a connection.

💡
By default, all Python elements are named Code. You can rename a Python element by double-clicking the element name in the editor panel or the workbook page overview. Renaming the element can make reviewing the query history for a workbook and data lineage clearer.
Start to write Python code.

📘
Python elements and Python code are not visible when viewing a published data model.

Write Python code

You must be editing your workbook to write Python code in Python elements.

Sigma provides several available Python methods for working with other Sigma elements:

Depending on the connection that you use to run your Python code, you can work with your data as different default data structures:

For a Databricks connection, work with your data as PySpark DataFrames, or convert data into other formats.
For a Snowflake connection, work with your data as Snowpark DataFrames, or convert data into other formats.

For detailed code examples, see Examples.

📘
When running Python code on a Databricks connection for datasets of millions of rows, some data manipulation operations like toPandas, collect, groupBy, orderBy, sort, and distinct can cause an out of memory (OOM) error in the underlying Spark cluster.

Work with packages in your Python code

The connection that you use to run Python code includes specific Python libraries and packages. A user with an Admin account type can specify additional packages to include.

When you select a Python element, the editor panel lists all of the included libraries in the Installed packages section. To use any of those libraries in your Python code, import them first.

For example, to use the pandas library:

import pandas as pd

Or to use the requests library to make API calls:

import requests

If you run Python code on a Databricks connection, you do not need to import the Databricks Utilities (dbutils) package because it is initialized in every Python session:

dbutils.notebook.run('/Workspace/Users/[email protected]/Example Notebook', 60)

If you run Python code on a Snowflake connection, you do not need to import the Snowpark library because it is initialized in every Python session:

session.get_current_user_role()

Use autocomplete when writing Python code

When you write Python code, Sigma autocompletes the available Sigma methods and element names from your document.

Sigma also supports autocomplete and general language support for functions and methods in the following libraries:

pandas, numpy, scikit-learn (sklearn), requests, and scipy
For Python run against a Databricks connection: pyspark with spark and dbutils globals
For Python run against a Snowflake connection: snowflake-snowpark-python with session global

Language support provides error linting, type hinting, method details, auto importing, and more.

Turn off autocomplete for Python libraries

You can turn off autocomplete for libraries at any time. When you do, you turn off autocomplete for all Python elements only for your user.

To turn off autocomplete for Python libraries for your user:

Open a workbook for editing.
For any Python element, select More then select Turn off Python language support.

Autocomplete no longer happens for Python libraries when writing code.

Run Python code

After writing your code in a Python element, click Run to run the code.

📘
You can only run a Python element in a data model while editing the data model.

While the code runs, the Run button is unavailable and shows that the code is running. Dependent elements start to refresh.

After the code runs, any output from your script is shown, including any error messages. Dependent elements update.

To review details about the Python code run, use the query history. For example, retrieve a request ID for troubleshooting purposes, or identify long-running code. See Examine queries in a workbook or data model.

Who the Python code runs as

Your code runs in your data platform:

In Databricks, Python code runs as the role associated with the user running the code:
- For OAuth connections, the user's credentials are used to run Python code.
- For connections using basic authentication, Python code runs using the service account credentials of your connection.
In Snowflake, Python code runs as the role associated with the user running the code:
- For key pair connections, the specified role is used to run Python code. If no role is specified, the default role granted to the user is used to run Python code.
- For key pair connections that dynamically assign a role, the dynamically assigned role is used to run Python code.
- For OAuth connections, the user's credentials are used to run Python code.

Run Python code from an action

You can run the code in a Python element in a workbook using an action. You can trigger an action from a UI element like a button, a change to a control element, or a selection in a data element.

📘
To add the Run Python element action to a workbook, you must have the Write Python permission enabled on your account type.
To run a Run Python element action, you need at least Can use access to the connection.

For example, if your Python code uses the input from a control element, add an action to the control to run the dependent Python element when the control value is changed:

Select the trigger element for the action. In this example, a slider control element.
In the editor panel, select Actions.
In the Actions panel, click Add action in an existing sequence, or click Add action sequence to create a new one.
In the Action modal, configure the required fields to define the response:
- For Action, select Run Python element.
- For Element, select the Python element that you want to run. When you hover over different element names, the relevant element is highlighted on the workbook canvas.

After setting up the action, test it out. For example, change the value of the slider control and confirm that the associated Python element runs.

Use Python output in child elements

When you use the sigma.output() method to make the contents of a variable in your code available to other elements in your workbook, you can then create child elements with the variable.

Update your code to use sigma.output(). See the method reference for sigma_output().
Run your code. See Run Python code.
In the Variables: section, select the name of the variable, then choose the type of data element that you want to create: Chart, Table, or Pivot table.

A data element of the selected type is created and appears below the Python element on the canvas.

Whenever the parent Python code is run, child elements update.

📘
If your Python code is in a data model, create a child element from the output and use that child element as the data source in workbooks and other data models downstream.

Examples

You can write Python code to run models for data science tasks, enrich your data analysis with third-party data sources, and more in Sigma. Review the following examples for sample code and scenarios that you can adapt for your own use cases:

Run a Databricks notebook
Convert a Snowpark DataFrame to a pandas DataFrame
Collect data from an API
Display text output from Python code
Process and enrich data from an API

Basic examples are also available in the Python method reference.

Run a Databricks notebook

If you run Python on a Databricks connection and have a Databricks notebook, you can start a job to run the notebook from a Sigma Python element. Depending on what the notebook does, you can work with the output in Sigma.

📘
You must have at least CAN RUN permissions on the notebook and access to the folder or workspace in which the notebook is stored.

To run a Databricks notebook from Sigma, use the run command of the Notebook utility (dbutils.notebook) of Databricks Utilities (dbutils). For specific syntax, see Orchestrate notebooks and modularize code in notebooks in the Databricks documentation.

For example, to run a notebook called Example Notebook with a timeout limit of 60 seconds:

dbutils.notebook.run('/Workspace/Users/[email protected]/Example Notebook', 60)

If your notebook uses widgets to add parameters, you can pass values from Sigma control elements to the notebook when you run the notebook.

For example, to pass the value of a text input control with a control ID of message to a notebook parameter called sigma_message, make sure your Databricks notebook includes the following syntax:

# Create widget that can listen for the inputs from the Sigma workbook
dbutils.widgets. text("sigma_message" , "test")

# Create widget reference
sigma_message = dbutils.widgets.get ("sigma_message")

Then use the following syntax in a Sigma Python element:

dbutils.notebook.run(
    "/Workspace/Users/[email protected]/Example Notebook",
    60,
    {"sigma_message": sigma.get_control_value("message")},
)

Work with the output

After you run the notebook, how you work with the results depends on what your notebook does:

If your notebook updates a table in Databricks in a catalog and schema that is available in Sigma, you can add a data element to your workbook and use the updated table as the data source.
If your notebook returns data as the output in the dbutils.notebook.exit(<output>) command, you can work with it in your Python element. For details using the exit command of the Notebook utility (dbutils.notebook) of Databricks Utilities (dbutils), see the Databricks documentation.

For example, if you have a Databricks notebook to run a performance model based on a cluster_size widget and output the results with the exit command of dbutils.notebook, you can call the notebook from Sigma with the parameter value specified by a control element.

Given a workbook with a number input control with a control ID of total-nodes and a notebook called Performance Modeling, write and run the following Python code to run the notebook and display the results:

model_output = dbutils.notebook.run(
    "/Workspace/[email protected]/Performance Modeling",
    120,
    {"cluster_size": sigma.get_control_element("total_nodes")},
)

display(model_output)

If your notebook outputs a data structure that you can work with in Sigma functions, such as JSON or a PySpark DataFrame, as part of the exit command of dbutils.notebook, you can work with the output in a child element. To do so, use the sigma_output method:

model_output = dbutils.notebook.run(
    "/Workspace/[email protected]/Performance Modeling",
    120,
    {"cluster_size": sigma.get_control_element("total_nodes")},
)

display(model_output)

sigma.output("performance_results", model_output)

Convert a Snowpark DataFrame to a pandas DataFrame

If you run Python on a Snowflake connection, you might want to convert a Snowpark DataFrame to a pandas DataFrame.

To convert a Snowpark DataFrame to a pandas DataFrame, the snowpark_df.to_pandas method does not currently work. Instead, use a workaround:

Instead of using the snowpark_df.to_pandas method:

snowpark_df = sigma.get_element("TABLE_NAME")

df = snowpark_df.to_pandas()

Use a workaround with the collect() method:

snowpark_df = sigma.get_element("TABLE_NAME")

df = pd.DataFrame(snowpark_df.collect())

Collect data from an API

You can write Python code to call an API from Sigma, letting you enrich data from your data platform with information available through APIs on the web.

📘
To work with API endpoints in your Python code, your connection must allow egress to the public Internet or the relevant API endpoints:

For a Databricks connection, your Databricks cluster must allow egress.

For a Snowflake connection, you must set up a Snowflake network rule and external access integration to allow egress. See Set up a Snowflake connection for Python.

With this example code, you can retrieve metadata about specific Pokémon from the open source PokéAPI, store the response in a pandas DataFrame, then make the DataFrame available to downstream elements in your Sigma workbook.

You can optionally make the code interactive, where users can choose one or more Pokémon from a multi-select list control to provide the list of Pokémon to retrieve details about.

Refer to the following example code:

import requests
import pandas as pd

def get_pokemon_data(pokemon_list):
    base_url = "https://pokeapi.co/api/v2/pokemon/"
    pokemon_data = []

    for pokemon in pokemon_list:
        response = requests.get(f"{base_url}{pokemon}")
        if response.status_code == 200:
            data = response.json()
            pokemon_info = {
                "name": data["name"],
                "id": data["id"],
                "height": data["height"],
                "weight": data["weight"],
                "types": ", ".join([t["type"]["name"] for t in data["types"]])
            }
            pokemon_data.append(pokemon_info)
        else:
            print(f"Failed to retrieve data for {pokemon}")

    return pd.DataFrame(pokemon_data)

# Example usage
# You can also input the pokemon_list with a control
# pokemon_list= sigma.get_control_value(‘pokemon’)
pokemon_list = ["pikachu", "charizard", "bulbasaur", "squirtle"] 
df = get_pokemon_data(pokemon_list)

sigma.output("Pokemon",df)

Display text output from Python code

If your Python code outputs a variable whose value is a string, or a DataFrame with a string column, you can display the string output in a dynamic text formula in a text element, button label, element title, and more.

For example, write code to forecast a cost trend, and use a conditional statement to output a string with information about the trend.

Reference this pseudocode:

# get the cost data

df = sigma.get_element("plugs")

# forecast a trend

forecast = # your forecasting code goes here and outputs an integer for the trend

    # the forecast can include other data in the forecast variable which you
    # can chart on a scatter plot
    trend = # integer output of the forecast

# evaluate the trend forecast

trend_eval = "More expensive" if trend > 1000 else "Same or less expensive"

# make the trend and trend_eval available to other Sigma elements

sigma.output("cost_forecast", forecast)
sigma.output("evaluation", trend_eval)

With code like this, you can do the following to display a scatter plot with dynamic text that references the evaluation:

Create a Chart from the cost_forecast output.
Change the chart element to a scatter plot, then rename it to the following: This forecast is =[Code: evaluation/trend], where the formula after the = is a dynamic text formula that references the trend column of the "evaluation" output from the Python code, in the format: [<Code Element Title>: <Output Name>/<DataFrame Column Name>].

Process and enrich data from an API

In this example, you can process data from your data platform and enrich it with an API call.

📘
To work with API endpoints in your Python code, your connection must allow egress to the public Internet or the relevant API endpoints:

For a Databricks connection, your Databricks cluster must allow egress.

For a Snowflake connection, you must set up a Snowflake network rule and external access integration to allow egress. See Set up a Snowflake connection for Python.

For a given table of sales data, retrieve the current relevant exchange rate and calculate the effect of the exchange rate on your total revenue.

🚩
Running this example requires an API key from fixer.io. Ensure that your planned usage meets the restrictions for the free tier.

To perform this example, do the following:

Open an existing workbook or create one.
Add a text input control to your workbook to use to provide the API key to your code and update the control ID to Fixer-API-Key.

🚩
This is not a secure method of storing or providing API keys.
Add a table with the plugs electronics sample data from the Sigma Sample Database: sigma_sample_data.examples__plugs_electronics.plugs_electronics_hands_on_lab_data.
In the plugs table, create three new columns with the following formulas and names:
- A Sales column with a formula of [Price] * [Quantity]
- A COGs column with a formula of [Cost] * [Quantity]
- A Profit column with a formula of [Sales] - [COGs]
In the plugs table, format the Profit column to use Euros as the currency.
Add a Python element to your workbook, and use the sample code provided below.
Run the Python code, then create a child table with the current_exchange_rates variable. Two columns exist in the output: rate and currency.
In the plugs table, add a new column called Adjusted Profit to calculate the exchange rate for your Profit, with the following formula:

[Profit] * MaxIf([Code: current_exchange_rates/rate], [Code: current_exchange_rates/currency] = "USD")

Example code

import requests
import pandas as pd
from datetime import datetime

def fetch_exchange_rates(url):

    response = requests.get(url)

    if response.status_code == 200:

        data = response.json()

        if data.get('success'):

            rates = data.get('rates', {})
            
            df = pd.DataFrame([
                {'currency': currency, 'rate': rate} 
                for currency, rate in rates.items()
            ])
            
            df['base'] = data.get('base')
            df['date'] = data.get('date')
            df['timestamp'] = data.get('timestamp')

            if 'timestamp' in df.columns:
                df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
            
            return df
        else:
            print(f"API Error: {data.get('error', {}).get('info', 'Unknown error')}")
            return None
    else:
        print(f"HTTP Error: {response.status_code}")
        return None

# Example usage
if __name__ == "__main__":
    api_key = sigma.get_control_value('Fixer-API-Key')
    url = "http://data.fixer.io/api/latest?access_key=" + api_key
    exchange_rates_df = fetch_exchange_rates(url)
    
    if exchange_rates_df is not None:
        print(exchange_rates_df)

sigma.output('current_exchange_rates', exchange_rates_df)

Manage your Databricks Python compute cluster

If you run Python code on a Databricks connection, the Python code runs on an all purpose compute cluster configured when the connection to Databricks was set up.

If your service principal configured on the connection has CAN RESTART on the cluster, any user with the Write Python permission enabled on their account type can manage the cluster.

While customizing or editing a workbook, if you select a Python element, you can review the status of the cluster:

Whether the cluster is ready, pending (restarting), or terminated.
After how many minutes of inactivity the cluster will terminate.

If the cluster is terminated, you can start it in one of the following ways:

Select the Python element, then in the editor panel, select Start cluster.
Run the code in the Python element by clicking Run or triggering an action that runs the code in the Python element.

📘
Restarting the cluster can take up to 5 minutes, during which time the cluster status updates to Pending.

If you encounter any issues managing the status of your all purpose compute cluster in Sigma, work with your Databricks admin to update the cluster in the Databricks console.

Updated 12 days ago

Write and run Python code in Sigma (Beta)

User requirements

Limitations

Add a Python element to a workbook or data model

Write Python code

When running Python code on a Databricks connection for datasets of millions of rows, some data manipulation operations like `toPandas`, `collect`, `groupBy`, `orderBy`, `sort`, and `distinct` can cause an out of memory (OOM) error in the underlying Spark cluster.

Work with packages in your Python code

Use autocomplete when writing Python code

Turn off autocomplete for Python libraries

Run Python code

Who the Python code runs as

Run Python code from an action

Use Python output in child elements

Examples

Run a Databricks notebook

You must have at least CAN RUN permissions on the notebook and access to the folder or workspace in which the notebook is stored.

Work with the output

Convert a Snowpark DataFrame to a pandas DataFrame

Collect data from an API

Display text output from Python code

Process and enrich data from an API

Example code

Manage your Databricks Python compute cluster

Restarting the cluster can take up to 5 minutes, during which time the cluster status updates to Pending.

User requirements

Limitations

Add a Python element to a workbook or data model

Write Python code

When running Python code on a Databricks connection for datasets of millions of rows, some data manipulation operations like toPandas, collect, groupBy, orderBy, sort, and distinct can cause an out of memory (OOM) error in the underlying Spark cluster.

Work with packages in your Python code

Use autocomplete when writing Python code

Turn off autocomplete for Python libraries

Run Python code

Who the Python code runs as

Run Python code from an action

Use Python output in child elements

Examples

Run a Databricks notebook

You must have at least CAN RUN permissions on the notebook and access to the folder or workspace in which the notebook is stored.

Work with the output

Convert a Snowpark DataFrame to a pandas DataFrame

Collect data from an API

Display text output from Python code

Process and enrich data from an API

Example code

Manage your Databricks Python compute cluster

Restarting the cluster can take up to 5 minutes, during which time the cluster status updates to Pending.

When running Python code on a Databricks connection for datasets of millions of rows, some data manipulation operations like `toPandas`, `collect`, `groupBy`, `orderBy`, `sort`, and `distinct` can cause an out of memory (OOM) error in the underlying Spark cluster.