Version: 0.94

Deploying with Helm

Prerequisites

Before installing Snorkel Flow, make sure the following prerequisites are met.

  • You have sent Snorkel AI a Docker ID that you’ll use for the initial installation and updates.
  • You have access to a Kubernetes cluster with specs outlined in the Snorkel Flow Kubernetes Installation Overview.
  • You have privileges to schedule pods, deployments, services, ingresses, and secrets, as well as to create namespaces, within a specific AWS EKS cluster. When referring to “the EKS cluster” in the remainder of this document, we will be referring to this cluster.
  • You have a workspace with access to EKS and the ability to save your Snorkel Flow configuration file, which will be generated during this setup, to a permanent location. This guide assumes an Ubuntu virtual machine is available.
  • You have a set of Snorkel Flow Helm charts provided by Snorkel AI.

Note: These instructions assume that you will install Snorkel Flow using Amazon's Elastic Kubernetes Service (EKS). Snorkel Flow can run on any Kubernetes installation that follows the specs in the Snorkel Flow Kubernetes Installation Overview, but this tutorial focuses on EKS. If you have specific questions about how Snorkel Flow is configured, please contact Snorkel support. For general management questions about Kubernetes or EKS, please refer to their respective documentation.

System Dependencies

From your workspace, install the command line tools required to set up Kubernetes resources. The commands in this guide use kubectl and helm.
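
As a quick pre-flight check, the sketch below reports which tools are not yet on your PATH. It assumes that kubectl and helm, the two commands used later in this guide, are the tools you need. The function name missing_tools is hypothetical and not part of Snorkel Flow.

```shell
# Print the name of each command-line tool that is not on PATH.
# `missing_tools` is a hypothetical helper, not part of Snorkel Flow.
missing_tools() {
  for tool in "$@"; do
    # `command -v` succeeds only if the shell can resolve the name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Usage: prints nothing when both tools are installed.
# missing_tools kubectl helm
```

If the function prints a tool name, install that tool before continuing.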

Ensure Cluster Access

Next, ensure that kubectl is configured to access the cluster that Snorkel Flow will be deployed into. Run the following and confirm that the pods returned are what you expect:

kubectl get pods --all-namespaces

If the output of this command is what you expect to be running in the cluster already, we can move on!
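
Beyond listing pods, it may help to confirm the privileges named in the prerequisites. The sketch below uses kubectl auth can-i to ask whether the current credentials may create each resource type. The function name check_create_permissions is hypothetical, and the resource list is taken from the prerequisites above.

```shell
# Report whether the current user may create each resource type this guide
# relies on. `check_create_permissions` is a hypothetical helper.
check_create_permissions() {
  for resource in namespaces pods deployments services ingresses secrets; do
    # `kubectl auth can-i` prints "yes" or "no" for the current credentials;
    # `|| true` keeps going when the answer is "no" (non-zero exit status).
    answer=$(kubectl auth can-i create "$resource" 2>/dev/null) || true
    printf '%s: %s\n' "$resource" "${answer:-unknown}"
  done
}

# Usage:
# check_create_permissions
```

Checks for namespaced resources apply to the namespace of the current kubectl context, so you may want to rerun this after switching namespaces.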

Creating the Kubernetes Namespace

All resources created by the Snorkel Flow utilities will live in a single namespace within the EKS cluster. Create this namespace, and set your kubectl context to point to this namespace.

# Replace snorkeldemo with a short, unique, alphanumeric name for this instance of Snorkel Flow, such as "snorkeldemo" or "snorkelproduction".
PROJECT_NAME=snorkeldemo
kubectl create namespace "$PROJECT_NAME"
kubectl config set-context --current --namespace "$PROJECT_NAME"

The last command sets the namespace of the current kubectl context to the new namespace. For the remainder of the setup, we will be operating in this namespace unless otherwise noted.
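
The following sketch shows one way to sanity-check the name before creating the namespace and to confirm the context switch afterwards. Kubernetes namespace names must be at most 63 characters; the check below also applies the stricter "alphanumeric" rule suggested above and assumes lowercase letters only. Both function names are hypothetical.

```shell
# Return 0 if the name contains only lowercase letters and digits and is
# at most 63 characters long. `is_valid_project_name` is a hypothetical helper.
is_valid_project_name() {
  name=$1
  case $name in
    '') return 1 ;;  # empty name
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;  # disallowed character
  esac
  [ "${#name}" -le 63 ]
}

# Print the namespace that the current kubectl context points at.
# `current_namespace` is a hypothetical helper.
current_namespace() {
  kubectl config view --minify --output 'jsonpath={..namespace}'
}

# Usage:
# is_valid_project_name "$PROJECT_NAME" || echo "choose a different name"
# [ "$(current_namespace)" = "$PROJECT_NAME" ] && echo "context is set"
```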

Accessing Docker Hub In Kubernetes

In order for your EKS cluster to download Snorkel Flow images, you must give the cluster a registry credential that it can use as an image pull secret. This can be done by running the following:

# Replace each placeholder with your own value; the quotes keep special characters intact.
DOCKER_USERNAME="<username for accessing the Docker registry>"
DOCKER_PASSWORD="<password for the associated Docker registry user>"
DOCKER_EMAIL="<email used for Docker Hub>"
kubectl create secret docker-registry regcred \
-n "$PROJECT_NAME" \
--docker-server=index.docker.io \
--docker-username="$DOCKER_USERNAME" \
--docker-password="$DOCKER_PASSWORD" \
--docker-email="$DOCKER_EMAIL"
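
To confirm that the secret was created as intended, a check along the following lines may be useful. It relies on the fact that Kubernetes gives docker-registry secrets the type kubernetes.io/dockerconfigjson. The function name is_pull_secret is hypothetical.

```shell
# Return 0 if the named secret exists in the namespace and has the type
# Kubernetes assigns to docker-registry secrets.
# `is_pull_secret` is a hypothetical helper.
is_pull_secret() {
  secret=$1
  namespace=$2
  # Fetch only the secret's type; fail if the secret cannot be read.
  secret_type=$(kubectl get secret "$secret" -n "$namespace" \
    -o 'jsonpath={.type}' 2>/dev/null) || return 1
  [ "$secret_type" = "kubernetes.io/dockerconfigjson" ]
}

# Usage:
# is_pull_secret regcred "$PROJECT_NAME" && echo "pull secret looks right"
```

This confirms only that the secret exists with the expected type, not that the credentials themselves are valid.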

Install Snorkel Flow with Helm

Finally, we'll use the set of Snorkel Flow Helm charts to install the full Snorkel Flow platform. With Helm, installation takes two steps:

  • Edit the values.yaml file for your specific installation
  • Use the helm CLI to deploy Snorkel Flow

Tailor the values.yaml File

The following is a guide for fields within the values.yaml file. Documentation is also provided in the values.yaml file itself.

| Name | Type | Default | Description |
|---|---|---|---|
| projectName | string | snorkelflow | The namespace Snorkel Flow will be installed in. |
| version | string | [YOUR_SNORKEL_FLOW_VERSION] | The version of Snorkel Flow being deployed. |
| image.imageNames | map | {} | (Optional) A key/value mapping of service to images, only used if an internal repository is being referenced. |
| pagerduty_key | string | | (Optional) A PagerDuty key if paging is needed. |
| affinity.binPackAlwaysPinnedPods | boolean | false | More efficiently bin-pack pods for scaleup and scaledown purposes. |
| affinity.tolerations | list | | (Optional) A list of tolerations for non-gpu pods. |
| affinity.nodeAffinity | map | | (Optional) A map of node affinity fields for non-gpu pods. |
| autoscaling.worker_autoscaling | string | "0" | Turn on or off autoscaling for worker services (engine, model-trainer, ray-worker). |
| autoscaling.cluster_autoscaling | map | {"pod_disruption_budget_toggling": "0", "business_hour_start_utc": 11, "business_hour_end_utc": 4} | Configure ability for k8s to move Snorkel Flow pods around during non-business hours. |
| traffic.basePath | string | | (Optional) Configure a base path for Snorkel Flow. |
| traffic.istio.enabled | boolean | false | Configure istio if using istio as an ingress gateway. |
| traffic.istio.mtls | map | {"enabled": false} | Configures mutual TLS for istio if istio is enabled. |
| traffic.istio.gateway | map | {"create": true} | Allow automatic creation of an istio gateway or not if istio is enabled. |
| traffic.ingresses.domain | string | "snorkel-ai.com" | URL domain of the Snorkel Flow install (e.g. for snorkelflow.snorkel-ai.com, domain would be snorkel-ai.com). |
| traffic.ingresses.ingressClassName | string | null | The name of the ingress class being used for ingresses, if not using default. |
| traffic.ingresses.serviceType | string | "ClusterIP" | The ServiceType of services that require an ingress (should be either NodePort or ClusterIP). |
| traffic.ingresses.cloudProvider | string | | (Optional) Specify the cloud provider the ingresses are for or leave blank otherwise. Currently supported providers: gcp, aws, azure. |
| traffic.ingresses.tlsHosts | map | {"enabled": false} | Add tls hosts to the ingress (typically not needed). |
| traffic.ingresses.annotations | map | {} | Global annotations that are applied to all ingress objects, typically used for ingress controller specific annotations. |
| traffic.ingresses.services.[SERVICE] | map | {[SERVICE]: {"enabled": true, "urlPrefix": [SERVICE_URL_PREFIX], "annotations": {}}} | Configure ingress objects for each individual ingress. |
| traffic.tls | map | {"key_secret_name": "envoy-front-proxy-envoy-tls-key-pem-secret", "cert_secret_name": "envoy-front-proxy-envoy-tls-cert-pem-secret"} | Configure a cert/key pair for envoy to terminate TLS. |
| traffic.allowAllInboundTrafficOnKeyServices | boolean | true | Permit select services to receive all inbound traffic, defaulting to true. This applies to services that have a direct ingress object. |
| traffic.allowInternetAccess | boolean | true | Permit services to access the internet, typically used for downloading external models. |
| traffic.networkPolicies.enabled | boolean | false | Enable Snorkel Flow network policies to be applied. |
| traffic.networkPolicies.ingresses | map | | (Optional) Create additional networkPolicy ingress blocks for services with a direct ingress. Typically used to specify inbound traffic from a specific in-cluster ingress controller. |
| gpu.enabled | boolean | false | Deploy Snorkel Flow with GPU support. |
| gpu.gpu_config.tolerations | list | | (Optional) A list of tolerations for gpu-enabled pods. |
| gpu.gpu_config.node_selectors | map | | (Optional) Key/value pairs of node selectors for the gpu pods. |
| gpu.gpu_config.schedulerName | string | | (Optional) Configure a GPU scheduler (e.g. if using Run:ai). |
| gpu.separate_gpu_pods | boolean | false | Separates worker pods into cpu and gpu pods. |
| prefect.enabled | boolean | true | Use Prefect workflow engine. |
| namespace.enabled | boolean | true | Enable namespace creation as part of the helm deploy. |
| services.env | map | {} | Key/value pairs of shared environment variables to append to all pods. |
| services.labels | map | {} | Key/value pairs of shared labels to append to all pods. |
| services.[SERVICE].resources | map | {} | Configure non-default resource allocations for each service. |
| services.[SERVICE].env | map | {} | Key/value pairs of environment variables to append to the given service. |
| services.[SERVICE].labels | map | {} | Key/value pairs of labels to append to the given service. |
| services.[SERVICE].min_replicas | int | 0 | For autoscaled services, configure a minimum replica count. |
| services.[SERVICE].max_replicas | int | VARIES | For autoscaled services, configure a maximum replica count. |
| services.db.shared_buffers | string | "2GB" | Configure the database shared buffers size. This should ideally be 25% of the requested database memory. |
| services.jupyterhub.enabled | boolean | true | Enable the in-platform per-user notebook service. Disabling this will default to a single shared notebook service. |
| services.jupyterhub.singleUserNotebook.serviceAccountName | string | "snorkelflow-jupyterhub-user-sa" | The name of the service account to bind to the single-user notebook pods. |
| services.jupyterhub.singleUserNotebook.startTimeout | int | 300 | The start timeout of a single-user notebook pod, in seconds. |
| services.jupyterhub.singleUserNotebook.gpu | boolean | false | Whether notebook pods should spin up with a GPU (gpu.gpu_config must be filled out). |
| services.jupyterhub.singleUserNotebook.resources | map | {"cpu_guarantee": 1, "cpu_limit": 1, "memory_gurantee": "2G", "memory_limit": "8G"} | Define resource requests and limits for single-user notebook pods. |
| services.jupyterhub.singleUserNotebook.storage | map | {"dynamicClass": "null", "type": "dynamic"} | Define storage settings for the single-user notebook pods. |
| services.secretGenerator.enabled | boolean | false | Enable the secrets generator job, which will dynamically create secrets once and exit. Typically not needed. |
| volumes.[VOLUME].storageClass | string | VARIES | Specify the storageclass of the volume if not using the default. |
| volumes.[VOLUME].storageRequest | string | VARIES | Amount of storage requested for a particular volume. |
| volumes.[VOLUME].volumeName | string | | (Optional) Specify the volumeName if using a specific PersistentVolume. |
| volumes.[VOLUME].persistentVolume.enabled | boolean | false | Enable creation of a corresponding PersistentVolume object in the charts for a PVC. |
| volumes.[VOLUME].persistentVolume.driver | map | {} | Configure a driver for the PersistentVolume being created. Refer to k8s documentation for all driver plugins. |
| authorization.adRoles.enabled | boolean | false | Enable active directory roles in Snorkel Flow. |
| authorization.adRoles.oidc | map | {} | (Optional) Configure authorization for OIDC. It's possible to configure this once the platform is running. ex: {"claim": "claim", "prefix": "prefix", "separator": "_"} |
| authorization.adRoles.saml | map | {} | (Optional) Configure authorization for SAML. It's possible to configure this once the platform is running. ex: {"attributeName": "SnorkelRoles", "prefix": "prefix", "separator": "_"} |
| authentication.jwt | map | {"enabled": false} | Use external JWT to login. Please consult Snorkel support before enabling. |
| authentication.oidc | map | | Configure OIDC authentication at deploy time. This is not required, as it's possible to also configure OIDC once the platform is running. |
| authentication.role | map | {"key": null, "value": null} | (Optional) Define a role from your cloud provider for deployments to use. |
| pretrained_models.enabled | string | false | Enable model crate kubernetes job and configure worker pods to use it. (Also requires specifying a valid model crate image in image.imageNames.pretrainedModelImage). |
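
As an illustration of how a few of these fields might be set, the sketch below writes a small override file. It assumes that dotted names in the guide above (for example traffic.ingresses.domain) correspond to nested YAML keys, which is the usual Helm convention but should be checked against the comments in your own values.yaml. The function name, the file name, and every value shown are hypothetical.

```shell
# Write a small override file to the path given as $1.
# The nesting assumes dotted field names map to nested YAML keys;
# verify this against your own values.yaml before relying on it.
# `write_values_override` and all values below are hypothetical.
write_values_override() {
  cat > "$1" <<'EOF'
projectName: snorkeldemo
traffic:
  ingresses:
    domain: example.com
    serviceType: ClusterIP
gpu:
  enabled: false
EOF
}

# Usage:
# write_values_override my-overrides.yaml
```

Helm accepts several --values flags and gives precedence to the rightmost file, so an override file like this can be passed after the main values.yaml instead of editing it in place.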

Deploy Snorkel Flow with the Helm CLI

To see the output of the templated charts with the values in the values.yaml file, run the following:

helm template --values [PATH_TO_VALUES_FILE] [PATH_TO_CHART_DIRECTORY]
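
The rendered output can be long. One way to get an overview is to count the rendered objects by kind, as in the sketch below. The function name summarize_kinds is hypothetical, and the approach assumes that each rendered manifest has a top-level kind: line starting at column 0.

```shell
# Read rendered manifests on stdin and print one "Kind: count" line per
# resource kind. `summarize_kinds` is a hypothetical helper.
summarize_kinds() {
  # Keep only top-level `kind:` lines, strip the key, then count duplicates.
  grep '^kind:' |
    sed 's/^kind:[[:space:]]*//' |
    sort |
    uniq -c |
    awk '{ print $2 ": " $1 }'
}

# Usage:
# helm template --values [PATH_TO_VALUES_FILE] [PATH_TO_CHART_DIRECTORY] | summarize_kinds
```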

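If the rendered output looks correct, install Snorkel Flow by running the following. Helm 3 requires a release name (or the --generate-name flag), so [RELEASE_NAME] stands for a name of your choosing:

helm install --values [PATH_TO_VALUES_FILE] [RELEASE_NAME] [PATH_TO_CHART_DIRECTORY]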

At this point Snorkel Flow should be successfully installed into your cluster!
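
To check on the installation, you might count the pods that have not yet reached a healthy state. The sketch below is one way to do that, assuming the default column layout of kubectl get pods. The function name count_unready_pods is hypothetical.

```shell
# Print the number of pods in namespace $1 whose STATUS column is neither
# Running nor Completed. `count_unready_pods` is a hypothetical helper.
count_unready_pods() {
  # With --no-headers, the columns are NAME READY STATUS RESTARTS AGE.
  pods=$(kubectl get pods -n "$1" --no-headers 2>/dev/null) || return 1
  printf '%s\n' "$pods" |
    awk 'NF && $3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Usage:
# count_unready_pods "$PROJECT_NAME"
```

A result of 0 suggests that every pod is running or has completed. Pods can take several minutes to start, so a non-zero count shortly after installation is not necessarily a problem. Running helm list -n "$PROJECT_NAME" shows the release and its status.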