Configure storage integration using customer-owned bucket
This documentation describes one or more public beta features that are in development. Beta features are subject to quick, iterative changes; therefore the current user experience in the Sigma service can differ from the information provided in this page.
This page should not be considered official published documentation until Sigma removes this notice and the beta flag on the corresponding feature(s) in the Sigma service. For the full beta feature disclaimer, see Beta features.
Add an external storage integration that uses a customer-owned bucket1 in Amazon S3, Google Cloud Storage (GCS), or Azure Blob Storage. This storage integration supports the following features:
- CSV upload
- File upload column in input tables
- Export to cloud storage
This document explains the general and feature-specific advantages of the external storage integration and how to configure one with a customer-owned S3, GCS, or Azure blob bucket.
1This document uses "bucket" as a generic term for the top-level storage unit when describing general concepts and AWS- or GCP-specific configurations. When describing Azure-specific configurations, however, this document uses "container" to be consistent with Azure Blob Storage terminology for the top-level storage unit.
Much of the storage integration configuration involves completing steps within your cloud storage platform. Because these workflows are maintained and updated by a third party, the steps detailed in this document may differ from the cloud storage platform's current UI and terminology.
Requirements
The ability to configure a storage integration that uses a customer-owned bucket requires the following:
- In Sigma, you must be assigned the Admin account type.
- In your cloud provider, you must be granted administrative permissions or have the ability to create and manage a storage bucket. Supported cloud providers are:
- Amazon Web Services (AWS) (other S3-compatible providers are not supported)
- Google Cloud Platform (GCP)
- Microsoft Azure
- In your cloud provider, you must also be granted permissions required to create and manage core security policies (e.g., IAM roles, ARN definitions, or trust policies).
- The customer-owned bucket must be in the same cloud platform as your Sigma organization. For example, if your Sigma organization is hosted in AWS, you must use an S3 bucket.
Advantages of using a customer-owned storage bucket
Some Sigma features (like file column uploads) have storage flows that require a customer-owned bucket and cannot be used when an external storage integration isn't configured. Other features (like CSV upload) have default storage flows that use a Sigma-owned bucket, but you can enable the use of a customer-owned bucket instead.
When a feature uses a Sigma-owned bucket to stage, cache, and store files, you cannot see, manage, or access the bucket. These restrictions can conflict with your company's security and compliance requirements. When you choose to use a customer-owned bucket, however, your company gains full control over the following:
- Data location (where files live)
- IAM and RBAC policies (who has file access)
- TTL and lifecycle rules (how long files are retained)
- Encryption configuration and keys (how files are encrypted)
This level of access can be necessary if your company has strict compliance requirements, needs to maintain full control over its data, or wants to customize its storage experience to align with its existing infrastructure and security policies.
There are also feature-specific advantages to using a customer-owned bucket. The following table compares each supported feature's default storage flow (without a storage integration) to the customer-owned bucket storage flow (with a storage integration).
| Feature | Without storage integration | With storage integration |
|---|---|---|
| CSV upload2 | Staging files are temporarily stored in a Sigma-owned bucket before loading to your data platform. Sigma controls the bucket region and lifecycle (24-hour TTL). | Staging files are temporarily stored in the customer-owned bucket before loading to your data platform. Your company controls the bucket region and TTL. This helps your organization meet security and compliance requirements that could otherwise block the use of CSV upload. |
| File upload column | File upload columns cannot be used. | Long-lived files that can contain sensitive information are stored in the customer-owned bucket, which offers control over data management, security, and compliance. |
| Export to cloud storage2 | Your data platform builds the export file and writes it to a customer-owned bucket using its own storage integration. This data flow can introduce platform-specific formatting and other inconsistencies in comparison to what users see in Sigma. | Sigma can build the export file and write it directly to the customer-owned bucket. This data flow applies platform-agnostic export logic that results in a cleaner and more consistent export format that aligns with what users see in Sigma. |
2CSV uploads and exports require additional configurations to use the storage integration. For more information, see Configure CSV upload and storage options.
Configure a storage integration with Amazon S3
To configure a storage integration that uses your own S3 bucket, you must complete the following procedures:
- Create an S3 bucket and IAM policy in AWS
- Create a custom IAM role in AWS
- Add an AWS integration in Sigma
- Update the custom IAM role in AWS
- Enable cross-origin resource sharing (CORS) in AWS
- Create an IP allowlist in AWS
Create an S3 bucket and IAM policy in AWS
In your AWS account, create an S3 bucket and an IAM policy to allow bucket access. For detailed instructions, see Creating a general purpose bucket and Creating IAM policies in the AWS documentation.
When creating the IAM policy, use the following policy template. Replace the {{customer_s3_bucket_name}} and {{prefix}} placeholders with the name of your S3 bucket and the folder path prefix that the integration must be allowed to access.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}"
},
{
"Action": "s3:ListBucket",
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}",
"Condition": {
"StringLike": {
"s3:prefix": [
"{{prefix}}/*"
]
}
}
},
{
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:PutObjectTagging"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}/{{prefix}}/*"
}
]
}

Create a custom IAM role in AWS
In your AWS account, create a custom IAM role that Sigma can assume. This role must be created before you add the storage integration in Sigma because the integration uses credentials AWS issues for the role. For detailed instructions, see Creating an IAM role in the AWS documentation.
While creating the IAM role, ensure that your configurations match these requirements for the integration with Sigma:
- Select AWS Account as the trusted entity type.
- When prompted for an Account ID, you should use your AWS account ID as a temporary value. After you add an AWS integration in Sigma, you must update the IAM role to modify the trusted relationship and grant access to Sigma.
- When creating the role, ensure you select Require external ID.
- When prompted for an external ID, enter a placeholder value (for example, 0000). Sigma generates an external ID when you add an AWS integration, after which you must update the IAM role.
- When selecting permissions, use the IAM policy you just created.
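For reference, the trust policy that AWS generates from these temporary values looks roughly like the following. The account ID (111111111111) and the external ID (0000) are placeholders that you replace with the Sigma-provided values in a later step:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111111111111:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "0000"
                }
            }
        }
    ]
}
```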
Add an AWS S3 integration in Sigma
You can now add a storage integration in Sigma using an S3 bucket.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required AWS credentials.
   - In the Provider section, select AWS S3.
   - In the AWS IAM role ARN field, enter the Role ARN value obtained when you created the IAM role.
   - In the Bucket name field, enter the S3 destination folder path that includes the bucket and folder path prefix specified in the IAM policy.
4. Click Save, then record the AWS IAM user ARN and AWS external role ARN displayed in the integration details.
Update the custom IAM role trust in AWS
In your AWS account, edit the trust policy document using the ARN values recorded after you created the integration in Sigma. For detailed instructions, see Editing the trust relationship for an existing role in the AWS documentation.
Use the following trust policy template, replacing {{aws_iam_user_arn}} and {{aws_external_role_arn}} with the ARN values you recorded in Sigma.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "{{aws_iam_user_arn}}"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "{{aws_external_role_arn}}"
}
}
}
]
}

Enable cross-origin resource sharing (CORS) in AWS
In your AWS account, enable CORS for the S3 bucket. For detailed instructions, see Configuring cross-origin resource sharing (CORS) in the AWS documentation.
Use the following CORS configuration:
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"PUT",
"POST"
],
"AllowedOrigins": [
"https://app.sigmacomputing.com"
],
"ExposeHeaders": [
"Access-Control-Allow-Origin",
"ETag"
]
}

This snippet shows a single rule that can be added to an existing list of CORS rules. If there are no other CORS rules configured, wrap the snippet in [].
Create an IP allowlist in AWS
(Optional) In your AWS account, use policy condition keys to limit access to your bucket based on IP address. Only traffic from approved IP address ranges will be allowed. For detailed instructions, see AWS global condition context keys in the AWS documentation.
Before you specify the policy conditions, you must obtain the relevant IP address ranges.
- Sigma cluster IP addresses: See Add Sigma IPs to the allowlist.
- User IP addresses: Office IP addresses, VPN IP addresses, and any other IP addresses that your Sigma organization users will use to access Sigma.
- Data platform IP addresses: IP addresses used by Snowflake or Databricks instances connected to your Sigma organization. This is only required when using external stages for CSV uploads because the data platform must access the bucket directly.
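As an illustration, a bucket policy statement that blocks traffic from outside approved ranges might look like the following. The bucket name placeholder and the CIDR ranges are examples; adapt the statement to your own policy structure and address ranges:

```json
{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
        "arn:aws:s3:::{{customer_s3_bucket_name}}",
        "arn:aws:s3:::{{customer_s3_bucket_name}}/*"
    ],
    "Condition": {
        "NotIpAddress": {
            "aws:SourceIp": [
                "203.0.113.0/24",
                "198.51.100.0/24"
            ]
        }
    }
}
```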
Configure a storage integration with GCS
To configure a storage integration that uses your own GCS bucket, you must complete the following procedures:
- Configure a target bucket in GCP
- Configure a service account in GCP
- Configure warehouse-specific external storage access
- Add a GCP integration in Sigma
- Grant impersonation access to the service account in GCP
- Enable cross-origin resource sharing (CORS) in GCP
- Create an IP allowlist in GCP
Configure a target bucket in GCP
In your GCP project, create a dedicated target bucket for file uploads. You can select storage options as you see fit, but Standard storage is recommended. For detailed instructions, see Create a bucket in the GCP documentation.
Configure a service account in GCP
In your GCP project, configure a service account with Storage Object Admin permissions to mediate access to the bucket. For detailed instructions, see Create service accounts in the GCP documentation.
Configure warehouse-specific external storage access in your data platform
If your organization is connected to Snowflake, Databricks, or BigQuery, and CSV upload is configured to use external stages, you must configure external storage access. The required permissions depend on your data platform. Refer to the relevant section below.
- Configure external storage access in Snowflake
- Configure external storage access in Databricks
- Configure external storage access in BigQuery
Configure external storage access in Snowflake
1. In Snowflake, create a storage integration object that enables access to the GCS bucket. See Configure an integration for Google Cloud Storage in the Snowflake documentation. Record the name of the integration object for later use.
2. Use the DESCRIBE INTEGRATION {integration_name} command to retrieve the name of the service account referenced by the integration object.
3. Grant the following permissions to the service account:
   - Storage Bucket Viewer
   - Storage Object Viewer
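For example, granting these roles at the bucket level produces IAM policy bindings similar to the following sketch. The service account address is a placeholder for the value reported by Snowflake, and the role IDs correspond to the Storage Bucket Viewer and Storage Object Viewer roles:

```json
{
    "bindings": [
        {
            "role": "roles/storage.bucketViewer",
            "members": [
                "serviceAccount:snowflake-sa@example-project.iam.gserviceaccount.com"
            ]
        },
        {
            "role": "roles/storage.objectViewer",
            "members": [
                "serviceAccount:snowflake-sa@example-project.iam.gserviceaccount.com"
            ]
        }
    ]
}
```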
Configure external storage access in Databricks
1. In Databricks, create an external location. See External locations in the Databricks documentation. Record the name of the external location for later use.
2. Configure a credential set to be used with the external location. See Credentials.
3. Grant the following permissions to the service account associated with the credential set:
   - Storage Object Viewer
   - Storage Object Creator
Configure external storage access in BigQuery
1. Identify the service account or identities associated with your organization's BigQuery connection.
2. In Google Cloud Console, open the target bucket, then grant the service account the Storage Object Viewer IAM role.
Add a GCS integration in Sigma
You can now add a storage integration in Sigma using a GCS bucket.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required GCP credentials.
   - In the Provider section, select Google Cloud Storage.
   - In the Service account field, enter the service account ID.
   - (Required for Snowflake and Databricks connections only) In the Warehouse storage integration name field, enter the name of the Snowflake storage integration object or Databricks external location. See Configure warehouse-specific external storage access for more information.
   - In the Bucket name field, enter the name of the GCS bucket.
4. Click Save.
Grant impersonation access to the service account in GCP
In your GCP project, grant impersonation access to the service account with the Service Account Token Creator role. For detailed instructions, see Service account impersonation and Manage access to service accounts in the GCP documentation.
Enable cross-origin resource sharing (CORS) in GCP
In your GCP project, enable CORS for the target bucket. For detailed instructions, see Set up and view CORS configurations in the GCP documentation.
Use the following CORS configuration:
{
"origin": ["https://app.sigmacomputing.com"],
"method": ["GET", "POST", "PUT"],
"responseHeader": ["*"],
"maxAgeSeconds": 3600
}

This snippet shows a single rule that can be added to an existing list of CORS rules. If there are no other CORS rules configured, wrap the snippet in [].
Create an IP allowlist in GCP
(Optional) In your GCP project, create IP filtering rules to limit access to your bucket based on IP address. Only traffic from approved IP address ranges will be allowed. For detailed instructions, see Create or update IP filtering rules on an existing bucket in the GCP documentation.
Before you create the filtering rules, you must obtain the relevant IP address ranges.
- Sigma cluster IP addresses: See Add Sigma IPs to the allowlist.
- User IP addresses: Office IP addresses, VPN IP addresses, and any other IP addresses that your Sigma organization users will use to access Sigma.
- Data platform IP addresses: IP addresses used by Snowflake or Databricks instances connected to your Sigma organization. This is only required when using external stages for CSV uploads because the data platform must access the bucket directly.
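As a sketch, an IP filtering configuration file passed to gcloud storage buckets update --ip-filter-file takes roughly the following shape. The exact schema is an assumption here; verify it against the GCP documentation, and replace the example CIDR ranges with your own:

```json
{
    "mode": "Enabled",
    "publicNetworkSource": {
        "allowedIpCidrRanges": [
            "203.0.113.0/24",
            "198.51.100.0/24"
        ]
    }
}
```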
Configure a storage integration with Azure Blob Storage (Beta)
To configure a storage integration that uses your own Azure Blob container (referred to generally as a bucket), you must complete the following procedures:
- Record your Microsoft Entra tenant ID in Azure
- Create a storage account and container in Azure
- Create a custom role in Azure
- Add an Azure Blob Storage integration in Sigma
- Create an enterprise application in Azure
- Assign roles to the enterprise application in Azure
- Enable cross-origin resource sharing (CORS) in Azure
- Create an IP allowlist in Azure
Record your Microsoft Entra tenant ID in Azure
In your Azure portal, find and record the ID for your Microsoft Entra tenant. This ID is required to add an Azure Blob Storage integration in Sigma. For detailed instructions, see Find your Microsoft Entra tenant in the Azure documentation.
Create a storage account and container in Azure
In your Azure portal, create a storage account and target storage container for file uploads. For detailed instructions, see Create an Azure storage account and Create a container in the Azure documentation.
Because cross-origin resource sharing (CORS) is required, creating a new, dedicated storage account is recommended to enforce clear security boundaries that prevent unintended cross-origin access to resources.
Create a custom role in Azure
In your Azure portal, create a custom role to grant Sigma permission to the target storage container. For detailed instructions, see Create or update Azure custom roles using the Azure portal in the Azure documentation.
When referencing the Azure documentation, use the following guidance:
1. In Step 2: Choose how to start, follow the Start from scratch path, and create the custom role in the resource group that contains your storage account.
2. In Step 3: Basics, enter a name and description for the role, then proceed without additional custom configuration until you reach Step 6: JSON.
3. In Step 6: JSON, replace the permissions block with the following JSON:

   "permissions": [
       {
           "actions": [],
           "notActions": [],
           "dataActions": [
               "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
               "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
           ],
           "notDataActions": []
       }
   ]
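In context, the complete role definition shown on the JSON tab would look roughly like the following sketch. The role name, description, and the subscription and resource group identifiers in assignableScopes are placeholders:

```json
{
    "properties": {
        "roleName": "Sigma Storage Access",
        "description": "Allows Sigma to read and write blobs in the target container.",
        "assignableScopes": [
            "/subscriptions/{subscription-id}/resourceGroups/{resource-group}"
        ],
        "permissions": [
            {
                "actions": [],
                "notActions": [],
                "dataActions": [
                    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
                    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
                ],
                "notDataActions": []
            }
        ]
    }
}
```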
Add an Azure Blob Storage integration in Sigma
You can now add a storage integration in Sigma using an Azure Blob Storage container.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required Azure credentials.
   - In the Provider section, select Azure Blob Storage.
   - In the Storage account name field, enter the name of the dedicated storage account.
   - In the Azure tenant ID field, enter your Microsoft Entra tenant ID.
   - In the Bucket name field, enter the name of the target storage container.
   - In the Path prefix field, enter any folder path prefix.
4. Click Save, then record the Azure broker application ID displayed in the integration details. You will need this value in an upcoming configuration step.
Create an enterprise application in Azure
Use Azure CLI to create an enterprise application that registers Sigma in your Azure environment. For detailed instructions, see Create an enterprise application from a multitenant application in the Azure documentation. Apply the Azure broker application ID you recorded when you added the storage integration in Sigma.
When you successfully create the enterprise application, Sigma auto-generates the application name using the format sigma-broker-{hash}. This name is unique to your Sigma organization and will be required in an upcoming configuration step.
Assign roles to the enterprise application in Azure
In your Azure portal, assign the following roles to the enterprise application to allow Sigma to operate on the target storage container.
- Storage Blob Delegator role: Enables Sigma to generate short-lived user delegation SAS tokens.
- Custom role created in the Create a custom role in Azure section: Allows Sigma to read and write to the target storage container.
For detailed instructions, see Assign Azure roles using the Azure portal.
When assigning the Storage Blob Delegator role, use the following guidance:
- In Step 1: Identify the needed scope, search for and select your storage account.
- In Step 3: Select the appropriate role, select the Storage Blob Delegator role.
- In Step 4: Select who needs access, search for and select the enterprise application you created in Create an enterprise application in Azure (named sigma-broker-{hash}).
When assigning the custom role, use the following guidance:
- In Step 1: Identify the needed scope, search for and select the target storage container in your storage account.
- In Step 3: Select the appropriate role, select the custom role you created in Create a custom role in Azure.
- In Step 4: Select who needs access, search for and select the enterprise application you created in Create an enterprise application in Azure (named sigma-broker-{hash}).
Enable cross-origin resource sharing (CORS) in Azure
In Azure, enable CORS for the storage account. For more information about CORS, see Cross-Origin Resource Sharing (CORS) support for Azure Storage in the Azure documentation. For portal navigation guidance, see Azure's Create a CORS rule quickstart.
UI terminology can differ in your Azure portal and may display as Settings > Resource sharing (CORS) or another variation instead of Settings > CORS, as shown in the Azure documentation.
When configuring CORS for Blob service, set the following values:
- Allowed origins: https://app.sigmacomputing.com
- Allowed methods: PUT, POST, GET
- Allowed headers: *
- Exposed headers: Access-Control-Allow-Origin
- Max age: 3600
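If you manage the storage account through infrastructure as code rather than the portal, the equivalent settings appear as a corsRules entry on the blob service resource. The following ARM-template fragment is a sketch; confirm the property names against the Microsoft.Storage resource reference:

```json
{
    "cors": {
        "corsRules": [
            {
                "allowedOrigins": ["https://app.sigmacomputing.com"],
                "allowedMethods": ["GET", "PUT", "POST"],
                "allowedHeaders": ["*"],
                "exposedHeaders": ["Access-Control-Allow-Origin"],
                "maxAgeInSeconds": 3600
            }
        ]
    }
}
```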
Create an IP allowlist in Azure
(Optional) In your Azure portal, create an IP network rule to limit access to your storage account based on IP address. Only traffic from approved IP address ranges will be allowed. For detailed instructions, see Create an IP network rule for Azure Storage in the Azure documentation.
Before you create the network rule, you must obtain the relevant IP address ranges.
- Sigma cluster IP addresses: See Add Sigma IPs to the allowlist.
- User IP addresses: Office IP addresses, VPN IP addresses, and any other IP addresses that your Sigma organization users will use to access Sigma.
- Data platform IP addresses: IP addresses used by Snowflake or Databricks instances connected to your Sigma organization. This is only required when using external stages for CSV uploads because the data platform must access the storage container directly.
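For reference, the same restriction expressed in a storage account's networkAcls block (for example, in an ARM template) looks roughly like this sketch; the CIDR ranges are placeholders for your approved ranges:

```json
{
    "networkAcls": {
        "defaultAction": "Deny",
        "ipRules": [
            { "value": "203.0.113.0/24", "action": "Allow" },
            { "value": "198.51.100.0/24", "action": "Allow" }
        ]
    }
}
```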