Version: 0.93

Setup and Installation

Snorkel Flow Python SDK

The snorkelflow SDK can be provided in several formats for installation on the Snorkel Flow host server or remote nodes:

  • A wheel file, e.g., snorkelflow-xx.yy.zz-py3-none-any.whl
  • A Docker image containing the wheel file

Installation instructions for each distribution method can be found below.

Tip

In all cases, we recommend using a Python or Anaconda virtual environment.

Note

The snorkelflow SDK requires Python 3.8 or later.

Note

The snorkelflow wheel requires pip >= 19.0. To upgrade pip, run python3 -m pip install --upgrade pip.
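The two version requirements above can be checked before installing. The following is a minimal sketch, not part of the snorkelflow SDK; the helper names parse_major_minor, meets_minimum, and environment_ok are hypothetical.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # from the note above
MIN_PIP = (19, 0)    # from the note above


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '23.0.1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if the version is at least the given (major, minor)."""
    return parse_major_minor(version) >= minimum


def environment_ok() -> bool:
    """Check the running interpreter and the pip installed alongside it."""
    if sys.version_info[:2] < MIN_PYTHON:
        return False
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        return False  # pip is not installed in this environment
    return meets_minimum(pip_version, MIN_PIP)
```

Calling environment_ok() inside the virtual environment you plan to install into reports whether both requirements are met.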

Installing from a wheel file

Install the snorkelflow package by running:

python3 -m pip install path/to/snorkelflow-xx.yy.zz-py3-none-any.whl

replacing path/to/snorkelflow-xx.yy.zz-py3-none-any.whl with the correct path and file name.

The snorkelflow SDK defines a number of extras, each of which specifies the external libraries needed for additional functionality:

  • data: dependencies needed for the dataset loading utilities in snorkelflow
  • debug: dependencies needed to generate support bundles for debugging performance
  • install: dependencies needed for the installer tool (snorkel-install)
  • kubernetes: dependencies needed to generate Kubernetes configurations
  • sdk: dependencies needed for the software development kit interface
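Several extras can be requested in one install. The sketch below builds the argument passed to python3 -m pip install, assuming pip's standard comma-separated extras syntax; the function name wheel_requirement is hypothetical and only the extras listed above are accepted.

```python
# Extras named in the list above.
KNOWN_EXTRAS = ("data", "debug", "install", "kubernetes", "sdk")


def wheel_requirement(wheel_path: str, extras=()) -> str:
    """Return the pip argument for a wheel path plus optional extras."""
    unknown = sorted(set(extras) - set(KNOWN_EXTRAS))
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    if not extras:
        return wheel_path
    # pip reads 'file.whl[a,b]' as "install file.whl with extras a and b".
    return f"{wheel_path}[{','.join(extras)}]"
```

Quote the resulting string on the command line so that the shell does not interpret the square brackets.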

For example, to install the sdk extras, run:

python3 -m pip install 'path/to/snorkelflow-xx.yy.zz-py3-none-any.whl[sdk]'

replacing path/to/snorkelflow-xx.yy.zz-py3-none-any.whl with the correct path and file name.

Installing from a Docker image

The snorkelflow SDK is bundled in a Docker image called snorkelai/snorkelflow-whl. Extract the snorkelflow SDK by running:

VERSION=<snorkelflow version>
LOCAL_WHL_PATH=/tmp/whl
mkdir -p $LOCAL_WHL_PATH
CONTAINER_ID=$(docker create snorkelai/snorkelflow-whl:$VERSION bash)
docker cp $CONTAINER_ID:/ $LOCAL_WHL_PATH

replacing <snorkelflow version> with the provided version of snorkelflow. Once the wheel file is extracted, follow the instructions for Installing from a wheel file using the wheel file located in $LOCAL_WHL_PATH.
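Because the docker cp command above copies the container's root directory, the wheel may end up in a subdirectory of $LOCAL_WHL_PATH. A minimal sketch for locating it follows; the function name find_snorkelflow_wheels is hypothetical, and the file name pattern is assumed from the wheel naming shown earlier.

```python
from pathlib import Path


def find_snorkelflow_wheels(root: str) -> list:
    """Return all snorkelflow wheel files under root, sorted by path."""
    # rglob searches root and every directory below it.
    return sorted(Path(root).rglob("snorkelflow-*.whl"))
```

For example, find_snorkelflow_wheels("/tmp/whl") returns the candidate paths to pass to python3 -m pip install.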

Authenticating your SDK Client

SDK usage outside the Snorkel Flow platform requires authentication via an API key. Instructions on SDK authentication can be found in Snorkel Flow API Authentication.

Installing from a standalone Anaconda channel

Instructions for installation from a standalone Anaconda channel are provided separately by Snorkel AI.

Snorkel Flow Platform

Installing Snorkel Flow

Follow the separately distributed Snorkel Flow Installation Guide to install the platform.

Configuring Snorkel Flow

Configuration options for the Snorkel Flow platform can be updated before or after installation. Changes made after the platform is deployed take effect only after the platform is restarted:

snorkel-install shutdown && snorkel-install bootstrap

Snorkel Flow user configuration file

The user configuration file (.snorkel-flow.yaml) is the primary interface for configuring the platform. The CLI needs to know the location of this configuration file to execute commands. It first checks the path supplied via -c/--config-file if the argument is provided, then checks the home directory for a file named .snorkel-flow.yaml (or similar variations). The file follows YAML formatting and can be edited using a standard text editor.
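The lookup order described above can be sketched as follows. This illustrates the described behavior and is not the CLI's actual implementation: the function name is hypothetical, only the exact file name .snorkel-flow.yaml is checked (the "similar variations" are not modeled), and an explicitly supplied path is assumed to take precedence without a fallback.

```python
from pathlib import Path
from typing import Optional

DEFAULT_NAME = ".snorkel-flow.yaml"


def resolve_config_path(explicit: Optional[str], home: Path) -> Optional[Path]:
    """Pick the configuration file: explicit path first, then the home directory."""
    if explicit is not None:
        # Corresponds to a path supplied via -c/--config-file.
        return Path(explicit)
    candidate = home / DEFAULT_NAME
    return candidate if candidate.is_file() else None
```

A typical call is resolve_config_path(None, Path.home()), which returns None when no configuration file exists in the home directory.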

User configuration file options

| Key | User value required? | Value type | Default value | Description |
| --- | --- | --- | --- | --- |
| version | Yes | string | | Snorkel Flow version (e.g. 0.18.0) |
| project_name | Yes | string | | Name of the docker-compose project (use "snorkelflow") |
| domain | No | string | | Kubernetes only: The top-level domain for Kubernetes ingress generation, sub-domains will be generated following the convention {project_name}-{service}.{domain}. |
| pagerduty_key | No | string | | If set, this is the API integration key that Grafana will use as its default notification channel. This can be used to respond quickly to issues that might be occurring in Snorkel Flow. |
| host_ip | Yes | string | | IP address or host name where Snorkel Flow will be served from |
| work_directory | Yes | string | | Absolute path of the working directory where configuration files will be stored. |
| mount_directory | Yes | string | | Absolute path of the mount directory. Data in this directory can be ingested directly by the platform. |
| platform | No | string | "docker" | Container platform. Options: "docker", "podman", "k8s" |
| podman | No | map | {} | Additional options for Podman. Optional keys: "infra_image" (name of the desired infra image) |
| image_name_suffix | No | string | "" | Suffix to append to all image names |
| docker_groups | No | list | [] | Supplementary groups to run the container processes as. Snorkel Flow will be able to access files for which these groups have permissions. See https://docs.docker.com/engine/reference/run/#additional-groups. |
| ui_ingress_port | No | int | 8900 | Ingress port for the web UI |
| tdm_api_ingress_port | No | int | 8686 | Ingress port for the TDM API |
| engine_notebook_port | No | int | 8888 | HTTP ingress port for the in-platform Notebook server. If left empty, defaults to 8888. No effect on Kubernetes deployments. |
| notebook_tls_port | No | int | | TLS ingress port for the in-platform Notebook server. If left empty, defaults to 18888. No effect on Kubernetes deployments. |
| notebook_ip | No | str | | Parameter to set if notebook should live on a different domain. If left empty, defaults to host_ip. No effect on Kubernetes deployments. |
| minio_ingress_port | No | int | 8989 | Ingress port for the MinIO web UI |
| banner_text | No | string | "" | Custom text to display in the banner on all pages of the web UI |
| set_memory_limits | No | boolean | false | Whether to set automatically-calculated memory limits based on the host machine. Only applicable if platform is set to "docker" or "podman". Consult with Snorkel support before enabling. |
| tls | No | map | {} | Configuration options for TLS. See “TLS Configuration Options” table below for options. |
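Five keys in the table above require a user value, and two of them must be absolute paths. The sketch below checks an already-parsed configuration, represented as a plain dictionary, against those rules. The function name config_problems is hypothetical, and the check assumes a POSIX host.

```python
from pathlib import PurePosixPath

# Keys marked "Yes" in the "User value required?" column.
REQUIRED_KEYS = ("version", "project_name", "host_ip", "work_directory", "mount_directory")
# Keys whose description calls for an absolute path.
ABSOLUTE_PATH_KEYS = ("work_directory", "mount_directory")


def config_problems(config: dict) -> list:
    """Return human-readable problems found in a parsed user configuration."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing required key: {key}")
    for key in ABSOLUTE_PATH_KEYS:
        value = config.get(key)
        if value and not PurePosixPath(value).is_absolute():
            problems.append(f"{key} must be an absolute path: {value}")
    return problems
```

An empty list means that the required keys are present and the directory settings are absolute; it does not validate the values themselves.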

The following table details TLS configuration options:

TLS configuration options

| Key | User value required? | Value type | Default value | Description |
| --- | --- | --- | --- | --- |
| key_pem_path | No | string | "" | A path to a file containing a private key that will be used to secure connections to the Snorkel platform. All services included in the Snorkel platform will be proxied by an Envoy instance using this key on different ports. Defaults to a key and Snorkel-issued self-signed certificate. |
| cert_pem_path | No | string | "" | A path to a file containing a certificate that will be presented to clients connecting the Snorkel platform. All services included in the Snorkel platform will use this certificate on different ports. Defaults to a Snorkel-issued self-signed certificate. Note: it is strongly recommended that you issue and configure your own certificate that matches any hostname you set up for Snorkel Flow to ensure a secure connection. |

The following table details Kubernetes-specific configuration options. Two performance options, engine_dask_replicas and studio_dask_replicas, are available for creating additional pods; however, changing their default values is not recommended. Please contact support@snorkel.ai if performance degrades.

K8s configuration options

| Key | User value required? | Value type | Default value | Description |
| --- | --- | --- | --- | --- |
| engine_dask_replicas | No | int | 2 | Advanced: setting for creating additional engine_dask pods |
| studio_dask_replicas | No | int | 1 | Advanced: setting for creating additional studio_dask pods |

Advanced configuration options

Additional configuration options are available through manually editing lower-level platform configuration files, though this should be avoided in most circumstances. If you find yourself frequently using any of these options, submit a feature request to Snorkel AI to add it to the user configuration file.

For single server installations, advanced configuration options are applied by manually editing the docker-compose.yml found in the platform work_directory. To prevent the platform from overwriting manual edits when starting, use the --no-overwrite option for every subsequent snorkel-install bootstrap call. After each platform upgrade, the manual edits will be overwritten with the default settings and will need to be reapplied.

For Kubernetes installations, advanced configuration options are applied by manually editing and reapplying the Kubernetes configuration files.

Configuring TLS for Snorkel Flow

Note
Kubernetes installations may also configure TLS by modifying ingress configurations to match their infrastructure. The provided instructions apply to single-host installations of Snorkel Flow.

Note
Podman installations that configure TLS will not provide any access on the privileged port 443.

Snorkel Flow supports TLS (https://) connections for the main web UI, MinIO, and Notebook interfaces. In addition, Snorkel supports TLS usage in the SDK when communicating with the TDM and Studio APIs. These can be enabled by performing the following:

  • Generate a public/private key pair and a certificate chain that can be used to terminate TLS. The certificate should be signed by a trusted authority and include a Subject Alternative Name matching the domain/host where you will be hosting Snorkel Flow (the host_ip setting). A single, non-wildcard certificate is all that is required. You can use any service or authority you trust to sign your key. For more information on how to set up your keys, consult an online tutorial on generating TLS keys and certificates.
  • Upload your private key file and certificate file as .pem files on a filesystem accessible by the snorkel-install CLI when running installation or upgrade commands.
  • Set the TLS key and certificate paths in the configuration. You can use the snorkel-install config set CLI to do this:

    CONFIG_PATH=path/to/snorkel_config.yaml
    snorkel-install config set --config-file $CONFIG_PATH tls.key_pem_path /path/to/private/key.pem # replace with the path to the key
    snorkel-install config set --config-file $CONFIG_PATH tls.cert_pem_path /path/to/private/cert_chain.pem # replace with the path to the certificate

  • The next time you run snorkel-install bootstrap, these keys will be made available to Envoy, a load balancer and service proxy that provides TLS support for our services.
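Before setting tls.key_pem_path and tls.cert_pem_path, it can help to confirm that the two .pem files are readable and belong together. The sketch below uses Python's standard ssl module for that; the function name tls_pair_loads is hypothetical. It does not check the Subject Alternative Name or whether the certificate is trusted.

```python
import ssl


def tls_pair_loads(cert_pem_path: str, key_pem_path: str) -> bool:
    """Return True if the certificate chain and private key load as a pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises ssl.SSLError for malformed PEM data or a key that does not
        # match the certificate, and OSError for unreadable files.
        context.load_cert_chain(certfile=cert_pem_path, keyfile=key_pem_path)
    except (ssl.SSLError, OSError):
        return False
    return True
```

A False result points to a missing file, malformed PEM content, or a key that does not match the certificate.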

HTTPS connectivity for all services is available through Envoy. The following table shows which plaintext ports map to TLS ports hosted on Envoy.

| Service | Plaintext Port | TLS Port |
| --- | --- | --- |
| Flow UI | 80 | 443 |
| Minio | 8989 | 18989 |
| Notebook | 8888 | 18888 |
| Studio API | 8484 | 18484 |
| TDM API | 8686 | 18686 |
| Admin | 7777 | 17777 |
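The port mapping above can be captured in a small lookup when building service URLs by hand. This is a sketch using the ports from the table, which may differ if ingress ports were changed in the user configuration file; the names PORTS and service_url are hypothetical.

```python
# (plaintext port, TLS port) per service, copied from the table above.
PORTS = {
    "Flow UI": (80, 443),
    "Minio": (8989, 18989),
    "Notebook": (8888, 18888),
    "Studio API": (8484, 18484),
    "TDM API": (8686, 18686),
    "Admin": (7777, 17777),
}


def service_url(host: str, service: str, tls: bool = True) -> str:
    """Build the base URL for a service on the given host."""
    plaintext_port, tls_port = PORTS[service]
    scheme, port = ("https", tls_port) if tls else ("http", plaintext_port)
    return f"{scheme}://{host}:{port}"
```

Apart from the Flow UI, each TLS port in the table is the plaintext port plus 10000.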

Configuring a MinIO Password

Snorkel Flow supports the ability to use a custom access and secret key for MinIO. This can be enabled by performing the following before booting up the app the first time:

  1. Use snorkel-install config set to set the minio fields in the Snorkel Flow configuration file:

    CONFIG_PATH=path/to/snorkel_config.yaml
    snorkel-install config set --config-file $CONFIG_PATH minio.minio_access_key [ACCESS_KEY]
    snorkel-install config set --config-file $CONFIG_PATH minio.minio_secret_key [SECRET_KEY]

  2. When you run snorkel-install bootstrap to boot up the app, the MinIO instance in Snorkel Flow will use the credentials specified above.
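If you need values for [ACCESS_KEY] and [SECRET_KEY], one option is to generate them randomly. The sketch below uses Python's standard secrets module. The function name and the lengths of 20 and 40 characters are assumptions, chosen because hexadecimal strings contain no characters that a shell or CLI would treat specially.

```python
import secrets


def generate_minio_credentials() -> tuple:
    """Return a random (access_key, secret_key) pair of hexadecimal strings."""
    access_key = secrets.token_hex(10)  # 10 random bytes -> 20 hex characters
    secret_key = secrets.token_hex(20)  # 20 random bytes -> 40 hex characters
    return access_key, secret_key
```

Store the generated values somewhere safe before running the snorkel-install config set commands.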

Configuring a Grafana Password

Snorkel Flow supports the ability to use a custom username and password for Grafana. This can be enabled by performing the following:

  1. Use snorkel-install config set to set the grafana fields in the Snorkel Flow configuration file:

    CONFIG_PATH=path/to/snorkel_config.yaml
    snorkel-install config set --config-file $CONFIG_PATH grafana.admin_username [USERNAME]
    snorkel-install config set --config-file $CONFIG_PATH grafana.admin_password [PASSWORD]

  2. When you run snorkel-install bootstrap to boot up the app, the Grafana instance in Snorkel Flow will use the credentials specified above.

Configuring an External Postgres for Snorkel Flow

Snorkel Flow supports the ability to use an external Postgres (e.g. an AWS RDS instance) instead of the in-platform Postgres as its database. This can be enabled by performing the following:

  1. Create the external Postgres instance, and take note of the username, password, and URL.

  2. Use snorkel-install config set to set the external_postgres fields in the Snorkel Flow configuration file:

    CONFIG_PATH=path/to/snorkel_config.yaml
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.username [USERNAME]
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.password [PASSWORD]
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.url [URL]

  3. The next time you run snorkel-install bootstrap, Snorkel Flow will use the Postgres instance specified above. Existing data that lives in the in-cluster Postgres (if any) will NOT be transferred over.

Configuring an External Redis for Snorkel Flow

Snorkel Flow supports the ability to use an external Redis (e.g. an AWS ElastiCache instance) instead of the in-platform Redis. This can be enabled by performing the following:

  1. Create the external Redis instance, and take note of the url and password (if any). Also, take note if SSL (e.g. encryption in transit) is enabled on the Redis instance.

  2. Use snorkel-install config set to set the external_redis fields in the Snorkel Flow configuration file:

    CONFIG_PATH=path/to/snorkel_config.yaml
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.url [URL]
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.ssl [True|False]
    snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.password [PASSWORD]

  3. The next time you run snorkel-install bootstrap, Snorkel Flow will use the Redis instance specified above. Existing data that lives in the in-cluster Redis (if any) will NOT be transferred over.
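Before bootstrapping against an external Postgres or Redis instance, it can be useful to confirm that the Snorkel Flow host can open a TCP connection to it. The following sketch uses only the standard socket module; the function name can_connect is hypothetical. It checks reachability only, not credentials or SSL settings.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Postgres conventionally listens on port 5432 and Redis on port 6379; use whatever host and port your external instance actually exposes.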

Changing the MinIO password

By default, the MinIO access key and secret key are both set to snorkeladmin. To change them, edit the docker-compose.yaml file if using a single server installation or the Kubernetes configuration deployment file for MinIO if using Kubernetes. Instructions for single-server installations are shown below:

  1. Shut down the platform: snorkel-install shutdown
  2. Edit the docker-compose.yaml file to add environment variables for the MinIO service called MINIO_ACCESS_KEY_OLD and MINIO_SECRET_KEY_OLD where the values are set to the current access key and secret key, respectively. Update the MINIO_ACCESS_KEY and MINIO_SECRET_KEY environment variables to be the desired access key and secret key, respectively. See the MinIO documentation for updating credentials.
  3. Restart the platform without overwriting the configuration file: snorkel-install bootstrap --no-overwrite
  4. Once the platform has restarted, remove the MINIO_ACCESS_KEY_OLD and MINIO_SECRET_KEY_OLD environment variables from the docker-compose.yaml file to avoid double encryption of data.

Upgrading Snorkel Flow

For single-server installations, the snorkel-install CLI can upgrade the platform either automatically, given access to the snorkelflow-whl container image, or manually. In both cases, the new platform container images need to be loaded on the local Docker client or accessible to pull from Docker Hub.

Upgrading using the snorkelflow-whl container image

If your snorkelflow Python package is installed via the snorkelflow-whl container image, run snorkel-install upgrade --version <version>, replacing <version> with the new version. If the upgrade succeeds, the CLI should print out a success message.

Upgrading manually

Use the following procedure to upgrade manually:

  1. Run snorkel-install shutdown
  2. Run snorkel-install config set version <version>, replacing <version> with the new version
  3. Reinstall the new snorkelflow Python package using the same procedure as the original platform installation
  4. Run snorkel-install bootstrap

If the upgrade succeeds, the CLI should print out a success message.
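The manual procedure above can be scripted so that each step runs only if the previous one succeeded. The sketch below assumes the package was originally installed from a wheel file, so the reinstall step uses python3 -m pip install; the function names are hypothetical and the snorkel-install invocations are copied from the steps above.

```python
import subprocess


def manual_upgrade_commands(version: str, wheel_path: str) -> list:
    """Return the manual upgrade steps as argument lists, in order."""
    return [
        ["snorkel-install", "shutdown"],
        ["snorkel-install", "config", "set", "version", version],
        # Assumption: the original installation used a wheel file.
        ["python3", "-m", "pip", "install", wheel_path],
        ["snorkel-install", "bootstrap"],
    ]


def run_all(commands: list) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Inspecting the list returned by manual_upgrade_commands before passing it to run_all is a simple way to review what will be executed.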

Uninstalling Snorkel Flow

To uninstall Snorkel Flow and remove all files, use the following procedure for single-server installations:

  1. Run snorkel-install shutdown --hard and answer the prompt
  2. Remove Snorkel Flow-created directories in the configured mount directory
  3. Remove the configured work directory
  4. Uninstall the snorkelflow Python package
  5. Remove the Snorkel Flow container images

Network topology of Snorkel Flow

The following table lists the ports that Snorkel Flow opens and makes available. These ports may be changed or expanded in future releases. The “Network or Localhost Only” column indicates whether the port needs to be reachable from other hosts on the network or only needs to be available on the loopback interface. Firewall rules only need to permit the ports designated as “Network” ports.

| Service | Description | Plaintext Port | TLS Port | Network or Localhost Only |
| --- | --- | --- | --- | --- |
| Flow UI | The primary web interface for Snorkel. | 80 | 443 | Network |
| Envoy | Administrative portal for Envoy, which is used for terminating TLS. | 9901 | | Localhost Only |
| Flow UI | Alternate plaintext port for Flow. Used for reverse-proxying. | 8900 | | Localhost Only |
| Grafana | Metrics dashboard to help diagnose problems in Snorkel. | 3000 | | Network |
| InfluxDB | Metrics storage backend for Grafana. | 9999 | | Localhost Only |
| Minio | File management UI. | 8989 | 18989 | Network |
| Notebook | Jupyter Notebook server, exposed via Flow UI. | 8888 | 18888 | Network |
| Notebook API | Internal management API for Notebooks. | 8282 | | Localhost Only |
| Studio API | APIs controlling studio functions. | 8484 | 18484 | Network |
| TDM API | APIs controlling training data. | 8686 | 18686 | Network |
| Admin | Administrative portal for Snorkel. | 7777 | 17777 | Network |
| Engine | Debugging view for asynchronous job execution of up to 64 workers. | 12000-12063 | | Network |
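When writing firewall rules, the "Network" ports from the table above can be collected into a single list. The sketch below hard-codes the ports from the table, so it would need updating if ports are reconfigured or change in a later release; the names are hypothetical.

```python
# Engine debugging ports: 12000 through 12063 inclusive, one per worker.
ENGINE_PORTS = range(12000, 12064)

# Plaintext and TLS ports of every service marked "Network" in the table.
NETWORK_PORTS = sorted(
    {80, 443, 3000, 8989, 18989, 8888, 18888, 8484, 18484, 8686, 18686, 7777, 17777}
    | set(ENGINE_PORTS)
)

# Ports that only need to be available on the loopback interface.
LOCALHOST_ONLY_PORTS = [8282, 8900, 9901, 9999]


def needs_firewall_rule(port: int) -> bool:
    """Return True if the port must be reachable from other hosts."""
    return port in NETWORK_PORTS
```

The Engine range contains 64 ports, matching the limit of 64 workers stated in the table.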