Configure an external storage integration with Amazon S3
Configure an external storage integration using a customer-owned Amazon S3 bucket to give your organization full control over file location, access, retention, and encryption. Some features, like file upload columns in input tables and exports to cloud storage, require an external storage integration. Other features, like CSV upload, have default storage flows that use a Sigma-owned bucket, but you can enable the use of a customer-owned bucket instead.
This document explains how to configure a storage integration with a customer-owned S3 bucket. For information about the general and feature-specific advantages of an external storage integration, see External storage integration overview.
Most of the storage integration configuration requires you to complete steps within an AWS account. Because AWS maintains and updates these workflows, the UI and terminology described in this document may differ from what you see in AWS.
Requirements
To configure a storage integration that uses a customer-owned S3 bucket, you need the following:
- In Sigma, you must be assigned the Admin account type.
- In AWS, you must be granted administrative permissions or have the ability to create and manage an S3 bucket.
- In AWS, you must also be granted permissions required to create and manage core security policies (IAM roles, ARN definitions, and trust policies).
- Your Sigma organization must be hosted in AWS. If your organization is hosted in Google Cloud Platform (GCP) or Microsoft Azure, see Configure an external storage integration with Google Cloud Storage or Configure an external storage integration with Azure Blob Storage.
Configure a storage integration with Amazon S3
To configure a storage integration that uses your own S3 bucket, complete the following procedures:
- Create an S3 bucket and IAM policy in AWS
- Create a custom IAM role in AWS
- Add an AWS S3 integration in Sigma
- Update the custom IAM role trust in AWS
- Enable cross-origin resource sharing (CORS) in AWS
- Create an IP allowlist in AWS
Create an S3 bucket and IAM policy in AWS
In your AWS account, create an S3 bucket and an IAM policy to allow bucket access. For detailed instructions, see Creating a general purpose bucket and Creating IAM policies in the AWS documentation.
When creating the IAM policy, use the following policy template. Replace the {{customer_s3_bucket_name}} and {{prefix}} placeholders with the name of your S3 bucket and the folder path prefix that the integration must be allowed to access.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}"
    },
    {
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:PutObjectTagging"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::{{customer_s3_bucket_name}}/{{prefix}}/*"
    }
  ]
}
Create a custom IAM role in AWS
In your AWS account, create a custom IAM role that Sigma can assume. This role must be created before you add the storage integration in Sigma because the integration uses credentials AWS issues for the role. For detailed instructions, see Creating an IAM role in the AWS documentation.
While creating the IAM role, ensure that your configurations match these requirements for the integration with Sigma:
- Select AWS Account as the trusted entity type.
- When prompted for an Account ID, you should use your AWS account ID as a temporary value. After you add an AWS S3 integration in Sigma, you must update the IAM role to modify the trusted relationship and grant access to Sigma.
- When creating the role, ensure you select Require external ID.
- When prompted for an external ID, enter a placeholder value (for example, 0000). Sigma generates an external ID when you add an AWS S3 integration in Sigma, after which you must update the IAM role.
- When selecting permissions, use the IAM policy you just created.
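As a rough illustration of what these temporary settings amount to, the following Python sketch builds a trust policy document with your own account as the trusted principal and a placeholder external ID. The function name `build_trust_policy` and the account ID `123456789012` are hypothetical; the AWS console normally generates an equivalent document for you, so treat this as a way to understand the structure rather than a required step.

```python
import json


def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Return a trust policy that lets one principal assume the role.

    The StringEquals condition on sts:ExternalId corresponds to the
    "Require external ID" option chosen while creating the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical temporary values: your own account as the trusted principal
# and the placeholder external ID suggested in the list above.
ACCOUNT_ID = "123456789012"  # assumption: replace with your AWS account ID
temporary_policy = build_trust_policy(f"arn:aws:iam::{ACCOUNT_ID}:root", "0000")

print(json.dumps(temporary_policy, indent=2))
```

The same structure is used for the final trust policy later in this document, with the temporary principal and external ID replaced by the values that Sigma provides.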
Add an AWS S3 integration in Sigma
You can now add a storage integration in Sigma using an S3 bucket.
- In Sigma, go to Administration > Account > General Settings.
- In the Storage Integration > External storage integration section, click Add.
- In the Add storage integration modal, provide the required AWS credentials:
  - In the Provider section, select AWS S3.
  - In the AWS IAM role ARN field, enter the Role ARN value obtained when you created the IAM role.
  - In the Bucket name field, enter the S3 destination folder path that includes the bucket and folder path prefix specified in the IAM policy.
- Click Save, then record the AWS IAM user ARN and AWS external role ARN displayed in the integration details.
Update the custom IAM role trust in AWS
In your AWS account, edit the trust policy document using the ARN values recorded after you created the integration in Sigma. For detailed instructions, see Editing the trust relationship for an existing role in the AWS documentation.
Use the following trust policy template, replacing {{aws_iam_user_arn}} and {{aws_external_role_arn}} with the ARN values you recorded in Sigma.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "{{aws_iam_user_arn}}"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "{{aws_external_role_arn}}"
        }
      }
    }
  ]
}
Enable cross-origin resource sharing (CORS) in AWS
In your AWS account, enable CORS for the S3 bucket. For detailed instructions, see Configuring cross-origin resource sharing (CORS) in the AWS documentation.
Use the following CORS configuration:
{
  "AllowedHeaders": [
    "*"
  ],
  "AllowedMethods": [
    "GET",
    "PUT",
    "POST"
  ],
  "AllowedOrigins": [
    "https://app.sigmacomputing.com"
  ],
  "ExposeHeaders": [
    "Access-Control-Allow-Origin",
    "ETag"
  ],
  "MaxAgeSeconds": 3600
}
This snippet shows a single rule that can be added to an existing list of CORS rules. If there are no other CORS rules configured, wrap the snippet in [].
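The difference between adding the rule to an existing list and wrapping it in [] can be sketched in Python. The helper name `merge_cors_rule` and the example existing rule are hypothetical and only illustrate the shape of the final configuration; the result still has to be pasted into the bucket's CORS settings in AWS.

```python
import json

# The single CORS rule from this section.
SIGMA_CORS_RULE = {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST"],
    "AllowedOrigins": ["https://app.sigmacomputing.com"],
    "ExposeHeaders": ["Access-Control-Allow-Origin", "ETag"],
    "MaxAgeSeconds": 3600,
}


def merge_cors_rule(existing_rules: list, new_rule: dict) -> list:
    """Return a new rule list that contains new_rule exactly once."""
    merged = list(existing_rules)  # copy so the caller's list is not modified
    if new_rule not in merged:
        merged.append(new_rule)
    return merged


# With no existing rules, the result is the rule wrapped in a list.
print(json.dumps(merge_cors_rule([], SIGMA_CORS_RULE), indent=2))

# With a hypothetical existing rule, the Sigma rule is appended after it.
existing = [{"AllowedMethods": ["GET"], "AllowedOrigins": ["https://example.com"]}]
print(json.dumps(merge_cors_rule(existing, SIGMA_CORS_RULE), indent=2))
```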
Create an IP allowlist in AWS
(Optional) In your AWS account, use policy condition keys to limit access to your bucket based on IP address. Only traffic from approved IP address ranges will be allowed. For detailed instructions, see AWS global condition context keys in the AWS documentation.
Before you specify the policy conditions, you must obtain the relevant IP address ranges.
- Sigma cluster IP addresses: See Add Sigma IPs to the allowlist.
- User IP addresses: Office IP addresses, VPN IP addresses, and any other IP addresses that your Sigma organization users will use to access Sigma.
- Data platform IP addresses: IP addresses used by Snowflake or Databricks instances connected to your Sigma organization. This is only required when using external stages for CSV uploads because the data platform must access the bucket directly.
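Once the ranges are collected, one possible way to express the restriction is an IpAddress condition on the aws:SourceIp global condition key. The Python sketch below adds that condition to every statement of a policy document. It assumes the condition is applied to a permissions policy shaped like the one created earlier; your organization might place it in a bucket policy instead. The helper name `add_ip_allowlist`, the bucket name `example-bucket`, the prefix `sigma`, and the CIDR ranges (reserved documentation ranges) are all hypothetical.

```python
import ipaddress
import json


def add_ip_allowlist(policy: dict, cidr_ranges: list) -> dict:
    """Return a copy of policy whose statements apply only to the given CIDR ranges."""
    # ip_network raises ValueError for malformed ranges or ranges with host bits set.
    normalized = [str(ipaddress.ip_network(cidr)) for cidr in cidr_ranges]
    restricted = json.loads(json.dumps(policy))  # deep copy of plain JSON data
    for statement in restricted["Statement"]:
        conditions = statement.setdefault("Condition", {})
        conditions["IpAddress"] = {"aws:SourceIp": normalized}
    return restricted


# Hypothetical policy with placeholder bucket name and prefix.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["s3:PutObject", "s3:GetObject", "s3:PutObjectTagging"],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::example-bucket/sigma/*",
        }
    ],
}

# Hypothetical ranges standing in for Sigma, user, and data platform IPs.
allowed_ranges = ["203.0.113.0/24", "198.51.100.0/24"]

print(json.dumps(add_ip_allowlist(example_policy, allowed_ranges), indent=2))
```

Validating each range before it reaches the policy catches typos early, because a malformed range raises an error instead of producing a policy that silently blocks legitimate traffic.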
