Version: 25.5

Snorkel Kubernetes installation overview

This topic is your starting point to deploy the Snorkel AI Data Development Platform on Kubernetes. Installing the Snorkel AI Data Development Platform is a two-step process:

  1. Preparing Infrastructure Resources
  2. Installing Snorkel into the Kubernetes Cluster

Preparing infrastructure resources

At a high level, the Snorkel AI Data Development Platform requires the following pieces of infrastructure for deployment:

  • An operational Kubernetes (K8s) cluster
  • NFS or equivalent (NAS drive, etc.)
  • A domain to create URLs for the platform

If you are using a major cloud provider (AWS, GCP, or Azure) for Snorkel, you should deploy into a new, dedicated K8s cluster. However, deploying into an existing cluster or on-premises is also possible. Snorkel offers Terraform and manual methods for provisioning all required cloud infrastructure on the three major cloud providers in preparation for installing the Snorkel AI Data Development Platform.

Assumptions and requirements

The Snorkel AI Data Development Platform can run on any Kubernetes installation as long as it satisfies the requirements listed under Cluster Specifications and under Storage and ingress.

Cluster Specifications

Category                              Requirement
Minimum Kubernetes version            1.25+
Recommended Kubernetes distribution   AWS EKS, GCP GKE, Azure AKS, OpenShift 4.9+
Node processor                        x86_64
Namespace total CPU                   64+
Namespace total RAM                   360 GB+
Namespace total GPU                   4+ T4 or better
Available per-pod CPU                 16+
Available per-pod RAM                 64 GB+
Storage volumes                       768 GB+ (use-case dependent) NFS-equivalent with read/write access
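
As a rough illustration of how the minimums in the table above could be checked before an install, here is a minimal Python sketch. It is not part of Snorkel's tooling. The `check_cluster` function, the layout of the `cluster` dictionary, and its key names are all hypothetical; you would fill the dictionary in yourself from whatever inventory you have of the cluster. The sketch accepts `amd64` as well as `x86_64` because Kubernetes reports x86_64 nodes as `amd64` in the `kubernetes.io/arch` label. It does not check the GPU model.

```python
import re

# Minimums copied from the Cluster Specifications table.
MIN_KUBERNETES_VERSION = (1, 25)
MIN_NAMESPACE = {"cpu": 64, "ram_gb": 360, "gpu": 4}
MIN_PER_POD = {"cpu": 16, "ram_gb": 64}
MIN_STORAGE_GB = 768
ACCEPTED_ARCHS = {"x86_64", "amd64"}  # Kubernetes labels x86_64 nodes as amd64


def parse_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.27.3' or '1.25'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_cluster(cluster: dict) -> list[str]:
    """Return human-readable shortfalls; an empty list means all minimums are met.

    The expected shape of `cluster` is an assumption made for this sketch.
    """
    problems = []
    if parse_version(cluster["version"]) < MIN_KUBERNETES_VERSION:
        problems.append(f"Kubernetes {cluster['version']} is older than 1.25")
    if cluster["arch"] not in ACCEPTED_ARCHS:
        problems.append(f"unsupported node architecture: {cluster['arch']}")
    for scope, minimums in (("namespace", MIN_NAMESPACE), ("per_pod", MIN_PER_POD)):
        for resource, minimum in minimums.items():
            actual = cluster[scope].get(resource, 0)
            if actual < minimum:
                problems.append(f"{scope} {resource}: {actual} < {minimum}")
    if cluster["storage_gb"] < MIN_STORAGE_GB:
        problems.append(f"storage: {cluster['storage_gb']} GB < {MIN_STORAGE_GB} GB")
    return problems


if __name__ == "__main__":
    # Example values only; replace with figures from your own cluster.
    example = {
        "version": "v1.27.3",
        "arch": "amd64",
        "namespace": {"cpu": 64, "ram_gb": 384, "gpu": 4},
        "per_pod": {"cpu": 16, "ram_gb": 64},
        "storage_gb": 1024,
    }
    for problem in check_cluster(example):
        print(problem)
```

Returning a list of shortfalls rather than a single boolean makes it easier to see everything that needs attention in one pass.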

Storage and ingress

To run the Snorkel AI Data Development Platform in your K8s cluster, whether it’s running in a private datacenter or public cloud, you must meet these requirements:

  • A StorageClass that supports the ReadWriteOnce access mode.
  • A StorageClass that supports the ReadWriteMany access mode (e.g., NFS or another shared filesystem).
    • For AWS, Elastic File System (EFS) is a good option.
  • An IngressController that is running.
  • A DNS domain/zone that is ready for the Snorkel AI Data Development Platform to use. Snorkel requires five subdomains.
  • A TLS certificate issued by a publicly trusted certificate authority.
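
One way to confirm the two StorageClass requirements above is to create a small test PersistentVolumeClaim against each class and see whether it binds. The following Python sketch only generates the manifests as JSON, which `kubectl` accepts alongside YAML. The `pvc_manifest` helper, the claim names, and the StorageClass names `standard-rwo` and `shared-rwx` are placeholders invented for this example, not names that Snorkel defines; substitute the classes that exist in your cluster.

```python
import json

# Access modes named in the storage requirements.
REQUIRED_ACCESS_MODES = {"ReadWriteOnce", "ReadWriteMany"}


def pvc_manifest(name: str, storage_class: str, access_mode: str, size: str = "1Gi") -> dict:
    """Build a minimal PersistentVolumeClaim manifest for one access mode."""
    if access_mode not in REQUIRED_ACCESS_MODES:
        raise ValueError(f"unexpected access mode: {access_mode!r}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical StorageClass names; replace with your own.
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            pvc_manifest("probe-rwo", "standard-rwo", "ReadWriteOnce"),
            pvc_manifest("probe-rwx", "shared-rwx", "ReadWriteMany"),
        ],
    }
    print(json.dumps(manifests, indent=2))
```

You could pipe the output to `kubectl apply -f -` in a scratch namespace and then inspect the claims with `kubectl get pvc`. Note that a StorageClass whose volume binding mode is `WaitForFirstConsumer` leaves a claim in `Pending` until a pod mounts it, so a pending claim does not by itself mean the access mode is unsupported.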

Major cloud infrastructure scripts

For the three major cloud providers, we provide Terraform and manual instructions to spin up the required infrastructure that satisfies the requirements above.

AWS

GCP

Azure

Install Snorkel into the Kubernetes cluster

Once all of the required infrastructure resources are provisioned, you're ready to deploy the Snorkel AI Data Development Platform into the Kubernetes cluster. Choose from these methods of installation into the cluster: