Connect to Databricks

This document provides instructions on how to configure a Databricks connection with Sigma.

Requirements

  • Admin privileges in your Sigma organization; see Account types.
  • Admin privileges in Databricks
  • We recommend that you avoid granting excessive permissions to the account you use when connecting to your data store; for example, you do not require SYSADMIN-level access.

Limitations

When connecting from Sigma to a Databricks data store, Sigma does not support the following features and interactions:

  • Dataset warehouse views
  • Connecting to the warehouse through OAuth
  • Support for function ArrayDistinct

Configure Databricks

Complete the following steps in Databricks before you add a Databricks connection to Sigma.

  1. Create a Databricks SQL endpoint if one doesn't already exist; see Databricks documentation on their specific SQL dialect. Databricks SQL endpoints are only available for Databricks accounts on their Premium plan or higher.

  2. We recommend that you disable Auto stop on your Databricks endpoint. In the Databricks SQL menu, navigate to SQL Warehouses and select the endpoint to use in Sigma.

  3. Click Edit in the more ••• menu and turn off Auto stop.

    📘

    If Auto stop is enabled for your Databricks SQL endpoint, then your first query will be slow or timeout when the SQL endpoint is in a suspended state.

    Starter_Endpoint.png

  4. Go to the SQL Admin Console to grant privileges to Databrick users or groups to add to Sigma.

    • For most use cases, SELECT privileges on tables and USAGE privileges on schemas/catalogs are sufficient.
    • You need to grant SELECT privileges for each table in a schema, even if the user has USAGE privileges on the schema.
    • Tables in the catalog hive_metastore require READ_METADATA privileges.
    • To enable the use of write-access features like input tables, materialization, and CSV upload, you must grant MODIFY privileges on a write schema.
    • Unity Catalog is Databricks’s newest catalog and is currently in public preview. Eventually, it will replace Hive Metastore. If Unity Catalog is enabled, Sigma can access catalogs, schemas, and tables managed by Unity Catalog.

Configure Sigma

Before setting up a connection in Sigma, create an access token,  and gather the Server hostname and HTTP path from your Databricks Starter Endpoint

StarterEP.png

Follow these basic steps to create a connection:

  1. Click the user icon at the top right of your screen. 
    The user icon is usually composed of your initials.

  2. In the drop-down menu, select Add connection.
    Select the Add connection option

  3. The Add new connections page appears.
    The connections page lists all supported database types, as icons

  4. In the Connection details, specify these:

    Name
    Specify the Name of the new connection. Sigma displays this name in the connection list.
    Here, we use Databricks connection.
    Type
    Select the tile that represents the data warehouse you use.
    Here, click the Databricks tile.
  5. In the Connection Credentials section, specify the following:

    Host
    The address of your database.
    This can be a URL, or an IP address.
    HTTP path
    Enter the HTTP path of your Databricks cluster here.
    Access token
    Generate a token in Databricks, using these instructions, and enter its value here.
    For example, t457e0fo09iv7rwqnkof124084sou9jt.

    Connection credentials for Databricks

  6. In the Connection Features section, specify the following:

    Connection timeout
    The time before timeout (or cancellation), in seconds, that Sigma waits for the query to return results.
    Default is 120, or 2 minutes.
    Maximum is 600, or 10 minutes.
    Use friendly names
    This switch makes column names from the data source more readable.
    For example, a database column ORDER_NUMBER appears as Order Number.
    On by default.

    Connection features for Databricks

  7. In the Write Access section, decide if you require write access.
    See Set up write access.

    Enable write access
    Necessary for CSV upload and Materialization.
    Off by default.
    If on, specify the Write schema field.
    Write catalog
    The catalog where Sigma writes tables. 
    The first layer of the object hierarchy, used to organize your data assets.
    Appears only if Enable write access is on.
    Write schema
    The schema where Sigma writes tables. 
    The second layer of the object hierarchy, and contain tables and views; also known as databases.
    Appears only if Enable write access is on.

    Write access for the connection

  8. After you specify all the parameters of the connection, click Create.

  9. After you successfully create your connection, Sigma displays it on the screen.
    New connection created successfully

  10. To verify your connection, click Browse Connection, and then explore the visible databases and tables.
    Explore dbs and tables

  11. Click Add Permission to grant data access for users in your organization.
    See Data permissions.
    alt

  12. The new connection also appears in the list of connections you have in your account. 

Databricks Partner Connect

Databricks is one of Sigma's partners, so you can quickly establish a connection between them through the Databricks Partner Connect interface.