Version: 25.7

External S3 bucket storage

Snorkel can use Amazon S3 as external storage for datasets and related objects such as datasources and uploaded files. When external storage is configured, Snorkel performs in-platform authorization checks to verify that a user may access a file stored in S3.

Prerequisites

Before configuring external S3 storage, ensure you have:

  • A Snorkel instance that is either Snorkel-hosted or running on Amazon EKS
  • An S3 bucket with appropriate permissions
  • AWS credentials and permissions to create IAM roles

Configuration

On-premises instances

For on-premises deployments, configure external storage in your Snorkel configuration:

external_storage:
  enabled: false
  # bucket: "s3://my-company-snorkel-storage-bucket"
  # roleArn: "arn:aws:iam::123456789012:role/SnorkelStorageRole"
  # region: "us-west-2"
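
Before applying the configuration, it can help to check that the three values have the expected shape. The following is a minimal sketch, not part of Snorkel: the helper name `check_external_storage` is hypothetical, and it assumes that enabling the feature means setting `enabled` to true and supplying the `bucket`, `roleArn`, and `region` values shown commented out above. It checks format only and does not contact AWS.

```python
import re

# Hypothetical helper, not a Snorkel API. It only checks the shape of the values.
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def check_external_storage(cfg: dict) -> list[str]:
    """Return a list of problems found in an external_storage mapping."""
    problems = []
    if not cfg.get("enabled", False):
        # Assumption: nothing else is read while the feature is switched off.
        return problems
    bucket = cfg.get("bucket", "")
    if not bucket.startswith("s3://") or len(bucket) <= len("s3://"):
        problems.append("bucket should look like s3://<bucket-name>")
    if not ROLE_ARN_RE.match(cfg.get("roleArn", "")):
        problems.append("roleArn should look like arn:aws:iam::<account-id>:role/<name>")
    if not REGION_RE.match(cfg.get("region", "")):
        problems.append("region should look like us-west-2")
    return problems


if __name__ == "__main__":
    example = {
        "enabled": True,
        "bucket": "s3://my-company-snorkel-storage-bucket",
        "roleArn": "arn:aws:iam::123456789012:role/SnorkelStorageRole",
        "region": "us-west-2",
    }
    for problem in check_external_storage(example):
        print(problem)
```

An empty list means the values are well-formed; it does not confirm that the bucket or role exists.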

Note: For on-premises installations, follow the Amazon EKS documentation on IAM roles for service accounts to set up the IAM configuration.

Contact Snorkel support for detailed installation guidance specific to your environment.

Snorkel-hosted instances

For Snorkel-hosted instances, follow these steps to configure cross-account S3 access:

Step 1: Obtain OIDC issuer URL

Contact Snorkel support to receive the OIDC issuer URL from your Snorkel cluster.

Step 2: Create IAM OIDC provider

Follow the AWS cross-account access documentation to:

  1. Create an IAM OIDC provider for your cluster.
  2. Assign IAM roles to Kubernetes service accounts using the issuer URL obtained from Snorkel in the previous step.
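
The two steps above come together in the role's trust policy, which names the OIDC provider and the Kubernetes service account allowed to assume the role. The sketch below builds such a policy in the standard AWS format for web identity federation. The function name `build_trust_policy` is hypothetical, and the issuer URL, namespace, and service account name in the example are placeholders: the real values come from Snorkel support.

```python
import json


def build_trust_policy(issuer_url: str, account_id: str,
                       namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that lets one Kubernetes service account assume the role."""
    # IAM refers to the OIDC provider by the issuer URL without the scheme.
    issuer = issuer_url.removeprefix("https://").rstrip("/")
    # account_id is the AWS account in which the IAM OIDC provider was created.
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:aud": "sts.amazonaws.com",
                        f"{issuer}:sub": subject,
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_trust_policy(
        "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
        "123456789012",
        "example-namespace",
        "example-service-account",
    )
    print(json.dumps(policy, indent=2))
```

Pinning the `sub` condition to a single service account keeps other workloads in the cluster from assuming the role.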

Step 3: Configure IAM role permissions

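The IAM role you create must grant the following S3 permissions. The block below is a single policy statement; place it inside the Statement array of the role's permissions policy: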

{
  "Effect": "Allow",
  "Action": [
    "s3:CreateBucket",
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "s3:DeleteObject"
  ],
  "Resource": [
    "arn:aws:s3:::your-bucket-name",
    "arn:aws:s3:::your-bucket-name/*"
  ]
}

Important: Replace your-bucket-name with your actual S3 bucket name.
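
To avoid hand-editing the bucket name in two places, the statement can be generated. The following sketch wraps the statement above in a complete policy document with the standard `Version` and `Statement` keys. The function name `build_s3_policy` is hypothetical; the actions and resource patterns are copied from the statement above.

```python
import json

# Actions taken from the permissions listed in this step.
S3_ACTIONS = [
    "s3:CreateBucket",
    "s3:GetObject",
    "s3:PutObject",
    "s3:ListBucket",
    "s3:DeleteObject",
]


def build_s3_policy(bucket_name: str) -> dict:
    """Return a full IAM policy document granting S3_ACTIONS on one bucket."""
    # Reject s3:// URLs, ARNs, and object paths; only the bare name is expected.
    if not bucket_name or any(ch in bucket_name for ch in "/:"):
        raise ValueError("pass the bare bucket name, not an s3:// URL or ARN")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in the bucket
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_policy("your-bucket-name"), indent=2))
```

Both resource entries are needed: bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:GetObject` apply to the `/*` ARN.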

Step 4: Provide role ARN to Snorkel

After creating the IAM role, provide the role ARN to Snorkel support to complete the configuration.

Final configuration

Once external storage is configured with the S3 bucket, role ARN, and AWS region, Snorkel stores and manages datasets in your S3 bucket and continues to enforce its in-platform authorization checks on access to those files.