Snorkel Flow Kubernetes Installation Overview
Overview
This document is your starting point for deploying Snorkel Flow on Kubernetes. Installing Snorkel Flow is a two-step process:
- Preparing infrastructure resources
- Installing Snorkel Flow into the Kubernetes cluster
Preparing Infrastructure Resources
At a high level, Snorkel Flow requires the following pieces of infrastructure for deployment:
- An operational Kubernetes cluster
- NFS or an equivalent shared filesystem (NAS drive, etc.)
- A domain for creating the platform's URLs
On the three major clouds (AWS, GCP, and Azure), Snorkel Flow prefers to deploy into a new Kubernetes cluster, but deploying into an existing cluster or on premises is also possible. Below, we offer both Terraform and manual methods for provisioning all required infrastructure on each of these clouds in preparation for installing Snorkel Flow.
Assumptions and Requirements
Snorkel Flow can run on any Kubernetes installation that satisfies our requirements for cluster specifications, storage, and ingress.
Cluster Specifications
| Category | Requirement |
|---|---|
| Minimum Kubernetes version | 1.25+ |
| Recommended Kubernetes distribution | AWS EKS, GCP GKE, Azure AKS, OpenShift 4.9+ |
| Node processor | x86_64 |
| Namespace total CPU | 64+ |
| Namespace total RAM | 360 GB+ |
| Namespace total GPU | 4+ (T4 or better) |
| Available per-pod CPU | 16+ |
| Available per-pod RAM | 64 GB+ |
| Storage volumes | 768 GB+ (use-case dependent), NFS-equivalent with read/write access |
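The minimums in the table above lend themselves to a quick preflight check. The following is a minimal sketch in Python, assuming you have already gathered the cluster's figures yourself (for example from `kubectl version` and `kubectl describe nodes`). The `ClusterSummary` fields and the `preflight_problems` helper are hypothetical and are not part of Snorkel Flow's tooling; the check covers only the GPU count, not the GPU model.

```python
import re
from dataclasses import dataclass

# Minimums transcribed from the Cluster Specifications table.
MIN_K8S_VERSION = (1, 25)
MIN_NAMESPACE_CPU = 64
MIN_NAMESPACE_RAM_GB = 360
MIN_NAMESPACE_GPUS = 4      # the "T4 or better" GPU model is not checked here
MIN_POD_CPU = 16
MIN_POD_RAM_GB = 64
MIN_STORAGE_GB = 768        # use-case dependent; larger workloads may need more
# Kubernetes labels x86_64 nodes as "amd64" (kubernetes.io/arch), so accept both spellings.
ALLOWED_ARCHES = {"x86_64", "amd64"}


@dataclass
class ClusterSummary:
    """Hypothetical summary of the target cluster and namespace, gathered by hand or by your own tooling."""
    server_version: str     # e.g. an API server gitVersion such as "v1.27.3-eks-1234"
    node_arch: str
    namespace_cpu: float
    namespace_ram_gb: float
    namespace_gpus: int
    max_pod_cpu: float      # largest CPU amount a single pod can be scheduled with
    max_pod_ram_gb: float
    storage_gb: float       # capacity available on the NFS-equivalent volume


def parse_k8s_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.27.3-eks-1234' or '1.25'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def preflight_problems(c: ClusterSummary) -> list[str]:
    """Return human-readable problems; an empty list means every listed minimum is met."""
    problems = []
    if parse_k8s_version(c.server_version) < MIN_K8S_VERSION:
        problems.append(f"Kubernetes {c.server_version} is older than 1.25")
    if c.node_arch not in ALLOWED_ARCHES:
        problems.append(f"node architecture {c.node_arch} is not x86_64")
    # (actual value, minimum, description) for each numeric requirement
    checks = [
        (c.namespace_cpu, MIN_NAMESPACE_CPU, "namespace CPU"),
        (c.namespace_ram_gb, MIN_NAMESPACE_RAM_GB, "namespace RAM (GB)"),
        (c.namespace_gpus, MIN_NAMESPACE_GPUS, "namespace GPUs"),
        (c.max_pod_cpu, MIN_POD_CPU, "per-pod CPU"),
        (c.max_pod_ram_gb, MIN_POD_RAM_GB, "per-pod RAM (GB)"),
        (c.storage_gb, MIN_STORAGE_GB, "shared storage (GB)"),
    ]
    for actual, minimum, label in checks:
        if actual < minimum:
            problems.append(f"{label}: {actual} is below the minimum of {minimum}")
    return problems


# Example: a cluster whose Kubernetes version is too old and that has too few GPUs.
summary = ClusterSummary(
    server_version="v1.24.9",
    node_arch="amd64",
    namespace_cpu=96,
    namespace_ram_gb=384,
    namespace_gpus=2,
    max_pod_cpu=16,
    max_pod_ram_gb=64,
    storage_gb=1024,
)
for problem in preflight_problems(summary):
    print(problem)
```

Keeping the thresholds as named constants makes it easy to raise them (for example, the storage size) when a specific use case needs more than the listed minimums.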
Storage and Ingress
To run Snorkel Flow in your Kubernetes cluster, whether it runs in a private data center or a public cloud (AWS, GCP, or Azure), you will need the following:
- A StorageClass that supports the ReadWriteOnce access mode.
- A StorageClass that supports the ReadWriteMany access mode (e.g., NFS or another shared filesystem).
  - For AWS, Elastic File System (EFS) is a good option.
- A running ingress controller.
  - For AWS, the AWS Load Balancer Controller is a good option.
- A DNS domain or zone that is ready for Snorkel Flow to use.
  - Snorkel Flow requires 5 subdomains.
- A TLS certificate issued by a publicly recognized certificate authority.
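One way to confirm that both storage requirements are met is to create a small test claim against each StorageClass and check that it binds. The Python sketch below only generates the PersistentVolumeClaim manifests as JSON, which `kubectl` accepts alongside YAML. The namespace, the claim names, and the StorageClass names (`gp3`, `efs-sc`) are hypothetical placeholders for your own.

```python
import json

# Hypothetical names: substitute your own namespace and StorageClass names.
NAMESPACE = "snorkel-preflight"
RWO_STORAGE_CLASS = "gp3"      # e.g. an EBS-backed class on AWS
RWX_STORAGE_CLASS = "efs-sc"   # e.g. an EFS-backed class on AWS

VALID_ACCESS_MODES = {"ReadWriteOnce", "ReadWriteMany"}


def pvc_manifest(name: str, storage_class: str, access_mode: str, size: str = "1Gi") -> dict:
    """Build a minimal PersistentVolumeClaim manifest as a plain dict."""
    if access_mode not in VALID_ACCESS_MODES:
        raise ValueError(f"unexpected access mode: {access_mode!r}")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Wrap both claims in a v1 List so a single `kubectl apply` creates them together.
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        pvc_manifest("preflight-rwo", RWO_STORAGE_CLASS, "ReadWriteOnce"),
        pvc_manifest("preflight-rwx", RWX_STORAGE_CLASS, "ReadWriteMany"),
    ],
}
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `make_pvcs.py` and the namespace already exists, `python make_pvcs.py | kubectl apply -f -` creates the claims and `kubectl get pvc -n snorkel-preflight` shows their status; `python make_pvcs.py | kubectl delete -f -` removes them afterward. A StorageClass whose `volumeBindingMode` is `WaitForFirstConsumer` keeps a claim `Pending` until a pod mounts it, so `Pending` alone is not a failure for such classes.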
Major Cloud Infrastructure Scripts
For the three major cloud environments (AWS, GCP, and Azure), we provide Terraform and manual instructions for provisioning infrastructure that satisfies the requirements above.
AWS
GCP
Azure
Installing Snorkel Flow Into the Kubernetes Cluster
At this point, all of the required infrastructure resources are provisioned, and you're ready to deploy Snorkel Flow into the Kubernetes cluster! We offer 2 methods of installation into the cluster: