Configure an external storage integration with Google Cloud Storage
Configure an external storage integration using a customer-owned Google Cloud Storage (GCS) bucket to give your organization full control over file location, access, retention, and encryption. Some features, like file upload columns in input tables and exports to cloud storage, require an external storage integration. Other features, like CSV upload, use a Sigma-owned bucket by default, but you can enable the use of a customer-owned bucket instead.
This document explains how to configure a storage integration with a customer-owned GCS bucket. For information about the general and feature-specific advantages of an external storage integration, see External storage integration overview.
Most of the storage integration configuration requires you to complete steps within a GCP project or your data platform. Because these workflows are maintained and updated by a third party, the steps detailed in this document may not match the current UI and terminology of GCP or your data platform.
Requirements
The ability to configure a storage integration that uses a customer-owned GCS bucket requires the following:
- In Sigma, you must be assigned the Admin account type.
- In GCP, you must be granted administrative permissions or have the ability to create and manage a GCS bucket.
- In GCP, you must also be granted permissions required to create and manage core security policies (e.g., IAM roles and service account configurations).
- Your Sigma organization must be hosted in GCP. If your organization is hosted in Amazon Web Services (AWS) or Microsoft Azure, see Configure an external storage integration with Amazon S3 or Configure an external storage integration with Azure Blob Storage.
Configure a storage integration with GCS
To configure a storage integration that uses your own GCS bucket, complete the following procedures:
- Configure a target bucket in GCP
- Configure a service account in GCP
- Configure warehouse-specific external storage access
- Add a GCS integration in Sigma
- Grant impersonation access to the service account in GCP
- Enable cross-origin resource sharing (CORS) in GCP
- Create an IP allowlist in GCP
Configure a target bucket in GCP
In your GCP project, create a dedicated target bucket for file uploads. You can select storage options as you see fit, but Standard storage is recommended. For detailed instructions, see Create a bucket in the GCP documentation.
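If you prefer the gcloud CLI over the console, the bucket creation above can be sketched as follows. The bucket name, project, and location are placeholders; substitute your own values.

```shell
# Placeholder bucket name, project, and location; choose your own.
# Standard storage is the recommended class for this integration.
gcloud storage buckets create gs://my-sigma-uploads \
  --project=my-project \
  --location=us-central1 \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access
```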
Configure a service account in GCP
In your GCP project, configure a service account with Storage Object Admin permissions to mediate access to the bucket. For detailed instructions, see Create service accounts in the GCP documentation.
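As a sketch of the same setup from the gcloud CLI, the commands below create a service account and grant it Storage Object Admin on the target bucket. The account, project, and bucket names are placeholders.

```shell
# Placeholder names; adjust for your project and bucket.
gcloud iam service-accounts create sigma-storage \
  --project=my-project \
  --display-name="Sigma external storage"

# Grant the service account Storage Object Admin on the target bucket.
gcloud storage buckets add-iam-policy-binding gs://my-sigma-uploads \
  --member="serviceAccount:sigma-storage@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```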
Configure warehouse-specific external storage access
If your organization is connected to Snowflake, Databricks, or BigQuery, and CSV upload is configured to use external stages, you must configure external storage access. The required permissions depend on your data platform. Refer to the relevant section below.
- Configure external storage access in Snowflake
- Configure external storage access in Databricks
- Configure external storage access in BigQuery
Configure external storage access in Snowflake
1. In Snowflake, create a storage integration object that enables access to the GCS bucket. See Configure an integration for Google Cloud Storage in the Snowflake documentation. Record the name of the integration object for later use.
2. Use the `DESCRIBE INTEGRATION {integration_name}` command to retrieve the name of the service account referenced by the integration object.
3. Grant the following permissions to the service account:
   - Storage Bucket Viewer
   - Storage Object Viewer
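The Snowflake portion of this setup can be sketched in a worksheet as follows. The integration name and bucket path are placeholders; the GCP service account Snowflake creates appears in the `STORAGE_GCP_SERVICE_ACCOUNT` property of the describe output.

```sql
-- Placeholder names: replace my_gcs_int and the bucket path with your own.
CREATE STORAGE INTEGRATION my_gcs_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ('gcs://my-sigma-uploads/');

-- Retrieve the service account Snowflake created for the integration.
DESCRIBE INTEGRATION my_gcs_int;
```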
Configure external storage access in Databricks
1. In Databricks, create an external location. See External locations in the Databricks documentation. Record the name of the external location for later use.
2. Configure a credential set to be used with the external location. See Credentials.
3. Grant the following permissions to the service account associated with the credential set:
   - Storage Object Viewer
   - Storage Object Creator
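The IAM grants in the last step can be applied with the gcloud CLI; the bucket name and the service account shown on the Databricks credential are placeholders here.

```shell
# Placeholder values: replace with your bucket and the service account
# associated with the Databricks credential set.
BUCKET="gs://my-sigma-uploads"
SA="databricks-cred@my-project.iam.gserviceaccount.com"

# Storage Object Viewer: read objects in the bucket.
gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="serviceAccount:$SA" \
  --role="roles/storage.objectViewer"

# Storage Object Creator: write new objects to the bucket.
gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="serviceAccount:$SA" \
  --role="roles/storage.objectCreator"
```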
Configure external storage access in BigQuery
1. Identify the service account or identities associated with your organization's BigQuery connection.
2. In Google Cloud Console, open the target bucket, then grant the service account the Storage Object Viewer IAM role.
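The same grant can be made from the gcloud CLI instead of the console; the bucket and service account names below are placeholders.

```shell
# Placeholder values: use your bucket and the service account identified
# from the BigQuery connection.
gcloud storage buckets add-iam-policy-binding gs://my-sigma-uploads \
  --member="serviceAccount:bq-connection@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```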
Add a GCS integration in Sigma
You can now add a storage integration in Sigma using a GCS bucket.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required GCP credentials:
   - In the Provider section, select Google Cloud Storage.
   - In the Service account field, enter the service account ID.
   - (Required for Snowflake and Databricks connections only) In the Warehouse storage integration name field, enter the name of the Snowflake storage integration object or Databricks external location. See Configure warehouse-specific external storage access for more information.
   - In the Bucket name field, enter the name of the GCS bucket.
4. Click Save.
Grant impersonation access to the service account in GCP
In your GCP project, grant impersonation access to the service account with the Service Account Token Creator role. For detailed instructions, see Service account impersonation and Manage access to service accounts in the GCP documentation.
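The Service Account Token Creator grant can be applied from the gcloud CLI as sketched below. Both the service account and the principal being granted impersonation are placeholders; use the principal identified during setup.

```shell
# Placeholder values; substitute your service account and the principal
# that needs impersonation access.
SA="sigma-storage@my-project.iam.gserviceaccount.com"
MEMBER="serviceAccount:caller@other-project.iam.gserviceaccount.com"

# Allow MEMBER to mint tokens for (impersonate) the service account.
gcloud iam service-accounts add-iam-policy-binding "$SA" \
  --member="$MEMBER" \
  --role="roles/iam.serviceAccountTokenCreator"
```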
Enable cross-origin resource sharing (CORS) in GCP
In your GCP project, enable CORS for the target bucket. For detailed instructions, see Set up and view CORS configurations in the GCP documentation.
Use the following CORS configuration:
```json
{
  "origin": ["https://app.sigmacomputing.com"],
  "method": ["GET", "POST", "PUT"],
  "responseHeader": ["*"],
  "maxAgeSeconds": 3600
}
```

This snippet shows a single rule that can be added to an existing list of CORS rules. If there are no other CORS rules configured, wrap the snippet in `[]`.
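One way to apply the configuration is with the gcloud CLI: save the rule list to a JSON file and update the bucket. The bucket and file names are placeholders.

```shell
# cors.json must contain the CORS rules wrapped in a JSON array.
# The bucket name is a placeholder.
gcloud storage buckets update gs://my-sigma-uploads --cors-file=cors.json
```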
Create an IP allowlist in GCP
(Optional) In your GCP project, create IP filtering rules to limit access to your bucket based on IP address. Only traffic from approved IP address ranges will be allowed. For detailed instructions, see Create or update IP filtering rules on an existing bucket in the GCP documentation.
Before you create the filtering rules, you must obtain the relevant IP address ranges.
- Sigma cluster IP addresses: See Add Sigma IPs to the allowlist.
- User IP addresses: Office IP addresses, VPN IP addresses, and any other IP addresses that your Sigma organization users will use to access Sigma.
- Data platform IP addresses: IP addresses used by Snowflake or Databricks instances connected to your Sigma organization. This is only required when using external stages for CSV uploads because the data platform must access the bucket directly.
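Once you have collected the ranges above, the filtering rules can be sketched roughly as below. This is an assumption-heavy sketch: the JSON field names and the `--ip-filter-file` flag follow the GCS IP filtering documentation but may vary by gcloud version, and the CIDR ranges shown are documentation examples only.

```shell
# Sketch only: verify the JSON schema and flag against current GCP docs.
# The CIDR ranges below are placeholders from documentation-reserved space.
cat > ip-filter.json <<'EOF'
{
  "mode": "Enabled",
  "publicNetworkSource": {
    "allowedIpCidrRanges": ["203.0.113.0/24", "198.51.100.10/32"]
  }
}
EOF
gcloud storage buckets update gs://my-sigma-uploads --ip-filter-file=ip-filter.json
```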
