Version: 0.91

Snorkel Flow Kubernetes Installation Overview

Overview

This document is your starting point for deploying Snorkel Flow on Kubernetes. Installing Snorkel Flow is a two-step process:

  1. Preparing Infrastructure Resources
  2. Installing Snorkel Flow into the Kubernetes Cluster

Preparing Infrastructure Resources

At a high level, Snorkel Flow requires the following pieces of infrastructure for deployment:

  • An operational Kubernetes cluster
  • NFS or equivalent (NAS drive, etc.)
  • A domain for the platform's URLs

For the three major clouds (AWS, GCP, and Azure), Snorkel Flow prefers to deploy into a new Kubernetes cluster, but deploying into an existing cluster or on premises is also possible. Below, we offer both Terraform and manual methods for provisioning all of the required infrastructure on each of these clouds before you install Snorkel Flow.

Assumptions and Requirements

Snorkel Flow can run on any Kubernetes installation as long as it satisfies the cluster specification, storage, and ingress requirements below.

Cluster Specifications

| Category | Requirement |
| --- | --- |
| Minimum Kubernetes version | 1.25+ |
| Recommended Kubernetes distribution | AWS EKS, GCP GKE, Azure AKS, OpenShift 4.9+ |
| Node processor | x86_64 |
| Namespace total CPU | 64+ |
| Namespace total RAM | 360 GB+ |
| Namespace total GPU | 4+ T4 or better |
| Available per-pod CPU | 16+ |
| Available per-pod RAM | 64 GB+ |
| Storage volumes | 768 GB+ (use-case dependent), NFS-equivalent with read/write access |
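Before installing, it can help to check a cluster against these minimums. The Python sketch below is hypothetical and not a Snorkel Flow tool: it assumes you have already collected the server version (for example from `kubectl version`) and each node's allocatable CPU, RAM, and GPU count into a hypothetical `Node` record. It also treats the namespace's available capacity as the sum over all nodes, which only holds if no quota or competing workloads reduce it, and it does not check the GPU model (T4 or better) or storage.

```python
from __future__ import annotations

from dataclasses import dataclass

# Minimums taken from the Cluster Specifications table.
MIN_K8S_VERSION = (1, 25)
MIN_NAMESPACE_CPU = 64
MIN_NAMESPACE_RAM_GB = 360
MIN_NAMESPACE_GPU = 4
MIN_POD_CPU = 16
MIN_POD_RAM_GB = 64
# Kubernetes labels x86_64 nodes as "amd64" (kubernetes.io/arch), so accept both spellings.
X86_ARCHES = {"amd64", "x86_64"}


@dataclass
class Node:
    """Hypothetical summary of one node's allocatable resources."""
    cpu: float
    ram_gb: float
    gpus: int
    arch: str = "amd64"


def parse_version(version: str) -> tuple[int, int]:
    """Turn a version such as 'v1.27.3' or '1.28+' into a (major, minor) tuple of ints."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor.rstrip("+"))


def preflight(version: str, nodes: list[Node]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if parse_version(version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {version} is older than 1.25")
    if any(n.arch not in X86_ARCHES for n in nodes):
        problems.append("every node must be x86_64")
    if sum(n.cpu for n in nodes) < MIN_NAMESPACE_CPU:
        problems.append("total CPU is below 64")
    if sum(n.ram_gb for n in nodes) < MIN_NAMESPACE_RAM_GB:
        problems.append("total RAM is below 360 GB")
    if sum(n.gpus for n in nodes) < MIN_NAMESPACE_GPU:
        problems.append("fewer than 4 GPUs in total")
    # A pod runs on a single node, so at least one node must fit the per-pod minimums.
    if not any(n.cpu >= MIN_POD_CPU and n.ram_gb >= MIN_POD_RAM_GB for n in nodes):
        problems.append("no node can fit a 16 CPU / 64 GB pod")
    return problems


# Example: two hypothetical GPU nodes that together just meet the minimums.
nodes = [Node(cpu=32, ram_gb=192, gpus=2), Node(cpu=32, ram_gb=192, gpus=2)]
print(preflight("v1.27.3", nodes))  # → []
```

The per-pod check looks at individual nodes rather than totals because a single pod cannot span nodes: a cluster with plenty of aggregate CPU can still fail to schedule a 16-CPU pod if every node is small.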

Storage and Ingress

To run Snorkel Flow in your Kubernetes cluster, whether it’s running in a private data center or public cloud (AWS/GCP/Azure), we will need the following:

  • A StorageClass that supports the ReadWriteOnce access mode.
  • A StorageClass that supports the ReadWriteMany access mode (e.g., NFS or another shared file system).
    • For AWS, Elastic File System (EFS) is a good option.
  • A running ingress controller.
  • A DNS domain/zone that is ready for Snorkel Flow to use.
    • Snorkel Flow requires 5 subdomains.
  • A TLS certificate issued by a publicly recognized certificate authority.
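One way to confirm that both storage classes can provision volumes with the required access modes is to create a small test PersistentVolumeClaim against each and check that it binds. The Python sketch below builds those claims as plain dictionaries and prints them as a JSON `List`, which `kubectl apply -f -` accepts on standard input. The claim names, the sizes, and the storage class names `gp3` and `efs-sc` are hypothetical placeholders; substitute the classes that exist in your cluster.

```python
import json


def pvc(name: str, storage_class: str, access_mode: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    # Restricted to the two access modes this page asks for.
    if access_mode not in {"ReadWriteOnce", "ReadWriteMany"}:
        raise ValueError(f"unexpected access mode: {access_mode}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical class names: "gp3" stands in for a block-storage class (e.g. EBS)
# and "efs-sc" for an EFS-backed shared file system class.
claims = [
    pvc("storage-test-rwo", "gp3", "ReadWriteOnce", "10Gi"),
    pvc("storage-test-rwx", "efs-sc", "ReadWriteMany", "10Gi"),
]

# Wrap both claims in a v1 List so a single `kubectl apply -f -` creates them.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": claims}, indent=2))
```

After applying, `kubectl get pvc` should eventually report both claims as `Bound`. A class configured with `volumeBindingMode: WaitForFirstConsumer` keeps its claim `Pending` until a pod mounts it, so `Pending` alone is not necessarily a failure. Delete the test claims once you have checked them.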

Major Cloud Infrastructure Scripts

For the three major cloud environments (AWS, GCP, and Azure), we provide Terraform scripts and manual instructions for provisioning infrastructure that satisfies the requirements above.

AWS

GCP

Azure

Installing Snorkel Flow Into the Kubernetes Cluster

At this point, all of the required infrastructure resources are provisioned, and you're ready to deploy Snorkel Flow into the Kubernetes cluster! We offer two methods of installation into the cluster: