Configure storage integration using customer-owned bucket (Beta)
This documentation describes one or more public beta features that are in development. Beta features are subject to quick, iterative changes; therefore, the current user experience in the Sigma service can differ from the information provided on this page.
This page should not be considered official published documentation until Sigma removes this notice and the beta flag on the corresponding feature(s) in the Sigma service. For the full beta feature disclaimer, see Beta features.
Add an external storage integration that uses a customer-owned bucket in Amazon S3 or Google Cloud Storage (GCS). This storage integration supports the following features:
- CSV upload
- File upload column in input tables
- Export to cloud bucket
This document explains the general and feature-specific advantages of the external storage integration and how to configure one with a customer-owned S3 or GCS bucket.
Much of the storage integration configuration involves completing steps within your cloud storage platform. Because these workflows are maintained and updated by a third party, the steps detailed in this document may differ from the cloud storage platform's current UI and terminology.
Requirements
The ability to configure a storage integration that uses a customer-owned bucket requires the following:
- In Sigma, you must be assigned the Admin account type.
- In your cloud provider, you must be granted administrative permissions or have the ability to create and manage a storage bucket. Supported cloud providers are:
- Amazon Web Services (AWS) (other S3-compatible providers are not supported)
- Google Cloud Platform (GCP)
- In your cloud provider, you must also be granted permissions required to create and manage core security policies (e.g., IAM roles, ARN definitions, or trust policies).
- The customer-owned bucket must be in the same cloud platform as your Sigma organization. If your Sigma organization is hosted in AWS, you must use an S3 bucket. If your Sigma organization is hosted in GCP, you must use a GCS bucket.
Advantages of using a customer-owned storage bucket
Some Sigma features (like file column uploads) have storage flows that require a customer-owned bucket and cannot be used when an external storage integration isn't configured. Other features (like CSV upload) have default storage flows that use a Sigma-owned bucket, but you can enable the use of a customer-owned bucket instead.
When a feature uses a Sigma-owned bucket to stage, cache, and store files, you cannot see, manage, or access the bucket. These restrictions can conflict with your company's security and compliance requirements. When you choose to use a customer-owned bucket, however, your company gains full control over the following:
- Data location (where files live)
- IAM and RBAC policies (who has file access)
- TTL and lifecycle rules (how long files are retained)
- Encryption configuration and keys (how files are encrypted)
This level of access can be necessary if your company has strict compliance requirements, needs to maintain full control over its data, or wants to customize its storage experience to align with its existing infrastructure and security policies.
There are also feature-specific advantages to using a customer-owned bucket. The following table compares each supported feature's default storage flow (without a storage integration) to the customer-owned bucket storage flow (with a storage integration).
| Feature | Without storage integration | With storage integration |
|---|---|---|
| CSV upload¹ | Staging files are temporarily stored in a Sigma-owned bucket before loading to your data platform. Sigma controls the bucket region and lifecycle (24-hour TTL). | Staging files are temporarily stored in the customer-owned bucket before loading to your data platform. Your company controls the bucket region and TTL. This helps your organization meet security and compliance requirements that could otherwise block the use of CSV upload. |
| File upload column | File upload columns cannot be used. | Long-lived files that can contain sensitive information are stored in the customer-owned bucket, which offers control over data management, security, and compliance. |
| Export to cloud storage¹ | Your data platform builds the export file and writes it to a customer-owned bucket using its own storage integration. This data flow can introduce platform-specific formatting and other inconsistencies compared to what users see in Sigma. | Sigma can build the export file and write it directly to the customer-owned bucket. This data flow applies platform-agnostic export logic that results in a cleaner and more consistent export format that aligns with what users see in Sigma. |
¹ CSV uploads and exports require additional configurations to use the storage integration. For more information, see Configure CSV upload and storage options.
Configure a storage integration with Amazon S3
To configure a storage integration that uses your own S3 bucket, you must complete the following procedures:
- Create an S3 bucket and IAM policy in AWS
- Create a custom IAM role in AWS
- Add an AWS S3 integration in Sigma
- Update the custom IAM role trust in AWS
- Enable CORS for the S3 bucket in AWS
Create an S3 bucket and IAM policy in AWS
In your AWS account, create an S3 bucket and an IAM policy to allow bucket access. For detailed instructions, see Creating a general purpose bucket and Creating IAM policies in the AWS documentation.
When creating the IAM policy, use the following policy template. Replace the {{customer_s3_bucket_name}} and {{prefix}} placeholders with the name of your S3 bucket and the folder path prefix that the integration must be allowed to access.
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:GetBucketLocation",
"s3:ListBucket"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}"
},
{
"Action": "s3:ListBucket",
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}",
"Condition": {
"StringLike": {
"s3:prefix": [
"{{prefix}}/*"
]
}
}
},
{
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:PutObjectTagging"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}/{{prefix}}/*"
}
]
}
Create a custom IAM role in AWS
In your AWS account, create a custom IAM role that Sigma can assume. This role must be created before you add the storage integration in Sigma because the integration uses credentials AWS issues for the role. For detailed instructions, see Creating an IAM role in the AWS documentation.
While creating the IAM role, ensure that your configurations match the following requirements for the integration with Sigma (a CLI sketch of the full sequence follows this list):
- Select AWS Account as the trusted entity type.
- When prompted for an Account ID, enter your own AWS account ID as a temporary value. After you add an AWS integration in Sigma, you must update the IAM role to modify the trust relationship and grant access to Sigma.
- Select Require external ID.
- When prompted for an external ID, enter a placeholder value (for example, 0000). Sigma generates an external ID when you add an AWS integration, after which you must update the IAM role.
- When selecting permissions, use the IAM policy you just created.
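If you prefer to script this setup, the same sequence can be performed with the AWS CLI. The following is a minimal sketch, not an official Sigma procedure: the file name sigma-s3-policy.json (the completed template from the previous section), the policy and role names, and the account ID 123456789012 are illustrative placeholders.

```sh
# Temporary trust policy: trust your own account and require a placeholder
# external ID. Both values are replaced after Sigma generates the real ones.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "StringEquals": { "sts:ExternalId": "0000" } }
    }
  ]
}
EOF

# Create the bucket-access policy from the completed template.
aws iam create-policy \
  --policy-name sigma-storage-access \
  --policy-document file://sigma-s3-policy.json

# Create the role with the temporary trust policy, then attach the policy.
aws iam create-role \
  --role-name sigma-storage-role \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
  --role-name sigma-storage-role \
  --policy-arn arn:aws:iam::123456789012:policy/sigma-storage-access
```

The sts:ExternalId condition in the trust policy is the CLI equivalent of the console's Require external ID option.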
Add an AWS S3 integration in Sigma
You can now add a storage integration in Sigma using an S3 bucket.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required AWS credentials:
   - In the Provider section, select AWS S3.
   - In the AWS IAM role ARN field, enter the Role ARN value obtained when you created the IAM role.
   - In the Bucket name field, enter the S3 destination path, including the bucket name and the folder path prefix specified in the IAM policy.
4. Click Save, then record the AWS IAM user ARN and AWS external role ARN displayed in the integration details.
Update the custom IAM role trust in AWS
In your AWS account, edit the trust policy document using the ARN values recorded after you created the integration in Sigma. For detailed instructions, see Editing the trust relationship for an existing role in the AWS documentation.
Use the following trust policy template, replacing {{aws_iam_user_arn}} and {{aws_external_role_arn}} with the ARN values you recorded in Sigma.
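If you manage the role from the command line, the completed template can be applied with update-assume-role-policy. A sketch, reusing the illustrative role and file names from the earlier CLI example:

```sh
# Overwrite the role's trust policy with the completed template below,
# saved locally as trust-policy.json.
aws iam update-assume-role-policy \
  --role-name sigma-storage-role \
  --policy-document file://trust-policy.json
```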
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "{{aws_iam_user_arn}}"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "{{aws_external_role_arn}}"
}
}
}
]
}
Enable CORS for the S3 bucket in AWS
In your AWS account, enable cross-origin resource sharing (CORS) for the S3 bucket. For detailed instructions, see Configuring cross-origin resource sharing (CORS) in the AWS documentation.
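The configuration shown below can also be applied from the command line. A sketch; the bucket name is an illustrative placeholder, and note that put-bucket-cors expects the rule array wrapped in a CORSRules object:

```sh
# cors.json (illustrative name) wraps the rule array shown below:
#   {"CORSRules": [ ... ]}
aws s3api put-bucket-cors \
  --bucket my-sigma-bucket \
  --cors-configuration file://cors.json
```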
Use the following CORS configuration:
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"PUT",
"POST"
],
"AllowedOrigins": [
"https://app.sigmacomputing.com"
],
"ExposeHeaders": [
"Access-Control-Allow-Origin"
]
}
]
Configure a storage integration with GCS
To configure a storage integration that uses your own GCS bucket, you must complete the following procedures:
- Configure a target bucket in GCP
- Configure a service account in GCP
- Configure warehouse-specific external storage access
- Add a GCS integration in Sigma
- Grant impersonation access to the service account in GCP
- Enable CORS for the target bucket in GCP
Configure a target bucket in GCP
In your GCP project, create a dedicated target bucket for file uploads. You can select storage options as you see fit, but Standard storage is recommended. For detailed instructions, see Create a bucket in the GCP documentation.
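If you use the gcloud CLI rather than the console, creating the bucket is a single command. A sketch; the bucket name and location are illustrative placeholders:

```sh
# Create a Standard-class bucket in a region of your choice.
gcloud storage buckets create gs://my-sigma-uploads \
  --location=us-central1 \
  --default-storage-class=STANDARD
```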
Configure a service account in GCP
In your GCP project, configure a service account with Storage Object Admin permissions to mediate access to the bucket. For detailed instructions, see Create service accounts in the GCP documentation.
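A gcloud sketch of the same setup; the service account name, project ID, and bucket name are illustrative placeholders:

```sh
# Create the service account that mediates access to the bucket.
gcloud iam service-accounts create sigma-storage \
  --display-name="Sigma storage integration"

# Grant it Storage Object Admin on the target bucket only.
gcloud storage buckets add-iam-policy-binding gs://my-sigma-uploads \
  --member="serviceAccount:sigma-storage@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"
```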
Configure warehouse-specific external storage access
If your organization is connected to Snowflake, Databricks, or BigQuery, and CSV upload is configured to use external stages, you must configure external storage access. The required permissions depend on your data platform; refer to the relevant section below. Each grant follows the same gcloud pattern, sketched after this list.
- Configure external storage access in Snowflake
- Configure external storage access in Databricks
- Configure external storage access in BigQuery
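In each case, the grant is a bucket-level IAM binding on the target bucket. A generic gcloud sketch; the member is a placeholder for the warehouse-owned service account, and the role should be swapped for each role named in the relevant section (for example, roles/storage.objectViewer for Storage Object Viewer):

```sh
# Bind the warehouse's service account to the target bucket with the
# role your data platform requires.
gcloud storage buckets add-iam-policy-binding gs://my-sigma-uploads \
  --member="serviceAccount:WAREHOUSE_SA@example.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```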
Configure external storage access in Snowflake
1. In Snowflake, create a storage integration object that enables access to the GCS bucket. See Configure an integration for Google Cloud Storage in the Snowflake documentation. Record the name of the integration object for later use.
2. Use the DESCRIBE INTEGRATION {integration_name} command to retrieve the name of the service account referenced by the integration object.
3. Grant the following permissions to the service account:
   - Storage Bucket Viewer
   - Storage Object Viewer
Configure external storage access in Databricks
1. In Databricks, create an external location. See External locations in the Databricks documentation. Record the name of the external location for later use.
2. Configure a credential set to be used with the external location. See Credentials.
3. Grant the following permissions to the service account associated with the credential set:
   - Storage Object Viewer
   - Storage Object Creator
Configure external storage access in BigQuery
1. Identify the service account or identities associated with your organization's BigQuery connection.
2. In the Google Cloud console, open the target bucket, then grant the service account the Storage Object Viewer IAM role.
Add a GCS integration in Sigma
You can now add a GCS integration in Sigma.
1. In Sigma, go to Administration > Account > General Settings.
2. In the Storage Integration > External storage integration section, click Add.
3. In the Add storage integration modal, provide the required GCP credentials:
   - In the Provider section, select Google Cloud Storage.
   - In the Service account field, enter the service account ID.
   - (Required for Snowflake and Databricks connections only) In the Warehouse storage integration name field, enter the name of the Snowflake storage integration object or Databricks external location. See Configure warehouse-specific external storage access for more information.
   - In the Bucket name field, enter the name of the GCS bucket.
4. Click Save.
Grant impersonation access to the service account in GCP
In your GCP project, grant impersonation access to the service account with the Service Account Token Creator role. For detailed instructions, see Service account impersonation and Manage access to service accounts in the GCP documentation.
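A gcloud sketch, reusing the illustrative service account from earlier; the member shown is a placeholder for the principal that must be allowed to impersonate the service account:

```sh
# Allow the named principal to create tokens for the storage service account.
gcloud iam service-accounts add-iam-policy-binding \
  sigma-storage@my-project.iam.gserviceaccount.com \
  --member="serviceAccount:PRINCIPAL@example.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```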
Enable CORS for the target bucket in GCP
In your GCP project, enable cross-origin resource sharing (CORS) for the target bucket. For detailed instructions, see Set up and view CORS configurations in the GCP documentation.
Use the following CORS configuration:
[
{
"origin": ["https://app.sigmacomputing.com"],
"method": ["GET", "POST", "PUT"],
"responseHeader": ["*"],
"maxAgeSeconds": 3600
}
]
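To apply this configuration with the gcloud CLI instead of the console, save the array above to a file and update the bucket. A sketch; the file and bucket names are illustrative:

```sh
# cors-config.json contains the JSON array shown above.
gcloud storage buckets update gs://my-sigma-uploads \
  --cors-file=cors-config.json
```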
