Setup and Installation
Snorkel Flow Python SDK
The snorkelflow SDK can be provided in several formats for installation on the Snorkel Flow host server or remote nodes:
- A wheel file, e.g., snorkelflow-xx.yy.zz-py3-none-any.whl
- A Docker image containing the wheel file
Installation instructions for each distribution method can be found below.
In all cases, we recommend using a Python or Anaconda virtual environment.
- The snorkelflow SDK requires Python 3.8+.
- The snorkelflow wheel requires pip >= 19.0. Run python3 -m pip install --upgrade pip to upgrade pip.
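The two prerequisites above can be checked before installing. The following is a minimal sketch, not part of the SDK: the helper names (parse_version, meets_minimum, installed_pip_version) are illustrative, and the version parsing is deliberately simple (leading integers only).

```python
import sys

MIN_PYTHON = (3, 8)  # from the requirement "Python 3.8+"
MIN_PIP = (19, 0)    # from the requirement "pip >= 19.0"


def parse_version(text):
    """Convert a version string such as '23.1.2' into a tuple of its leading integers."""
    numbers = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def meets_minimum(version, minimum):
    """Return True if the version string is at least the minimum tuple."""
    parsed = parse_version(version)
    # Pad with zeros so that '19' compares equal to (19, 0).
    parsed += (0,) * (len(minimum) - len(parsed))
    return parsed >= minimum


def installed_pip_version():
    """Return the installed pip version string, or None if pip is not found."""
    from importlib import metadata  # available from Python 3.8

    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    python_ok = sys.version_info[:2] >= MIN_PYTHON
    print("Python OK:", python_ok)
    if python_ok:
        pip_version = installed_pip_version()
        pip_ok = pip_version is not None and meets_minimum(pip_version, MIN_PIP)
        print("pip OK:", pip_ok)
```

Run it with the interpreter of the virtual environment you intend to install into, since the check reports on that interpreter and its pip.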
Installing from a wheel file
Install the snorkelflow package by running:
python3 -m pip install path/to/snorkelflow-xx.yy.zz-py3-none-any.whl
replacing path/to/snorkelflow-xx.yy.zz-py3-none-any.whl with the correct path and file name.
There are a number of extras defined for the snorkelflow SDK, which specify external libraries needed for extra functionality:
- data: dependencies needed for the dataset loading utilities in snorkelflow
- debug: dependencies needed to generate support bundles for debugging performance
- install: dependencies needed for the installer tool (snorkel-install)
- kubernetes: dependencies needed to generate Kubernetes configurations
- sdk: dependencies needed for the software development kit interface
For example, to install the sdk extras, run:
python3 -m pip install 'path/to/snorkelflow-xx.yy.zz-py3-none-any.whl[sdk]'
replacing path/to/snorkelflow-xx.yy.zz-py3-none-any.whl with the correct path and file name.
Installing from a Docker image
The snorkelflow SDK will be bundled in a Docker image called snorkelai/snorkelflow-whl. Extract the snorkelflow SDK by running:
VERSION=<snorkelflow version>
LOCAL_WHL_PATH=/tmp/whl
mkdir -p $LOCAL_WHL_PATH
CONTAINER_ID=$(docker create snorkelai/snorkelflow-whl:$VERSION bash)
docker cp $CONTAINER_ID:/ $LOCAL_WHL_PATH
replacing <snorkelflow version> with the provided version of snorkelflow. Once the wheel file is extracted, follow the instructions for Installing from a wheel file using the wheel file located in $LOCAL_WHL_PATH.
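Because the extraction copies the container filesystem into $LOCAL_WHL_PATH, the wheel may sit in a subdirectory. The sketch below searches for it and composes the pip command, optionally with extras. The helper names are illustrative, and the assumption that the file name starts with snorkelflow- and ends in .whl comes from the example file name shown earlier.

```python
import os
import shlex
import sys
from pathlib import Path

# Extras named in this guide.
KNOWN_EXTRAS = ("data", "debug", "install", "kubernetes", "sdk")


def find_wheels(root):
    """Return all snorkelflow wheel files found anywhere under root, sorted by path."""
    return sorted(Path(root).rglob("snorkelflow-*.whl"))


def pip_install_args(wheel_path, extras=()):
    """Build the argument list for installing the wheel, e.g. 'file.whl[sdk]'."""
    unknown = [extra for extra in extras if extra not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"unknown extras: {unknown}")
    target = str(wheel_path)
    if extras:
        target += "[" + ",".join(extras) + "]"
    # sys.executable is used instead of 'python3' so that the command targets
    # the interpreter (and virtual environment) running this script.
    return [sys.executable, "-m", "pip", "install", target]


if __name__ == "__main__":
    root = Path(os.environ.get("LOCAL_WHL_PATH", "/tmp/whl"))
    wheels = find_wheels(root) if root.is_dir() else []
    for wheel in wheels:
        # Print the command rather than running it, so it can be reviewed first.
        print(shlex.join(pip_install_args(wheel, ["sdk"])))
```

The script only prints the command. Copy it into a shell, or pass the argument list to subprocess.run, once you have confirmed the wheel path.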
Authenticating your SDK Client
SDK usage outside the Snorkel Flow platform requires authentication via an API key. Instructions on SDK authentication can be found in Snorkel Flow API Authentication.
Installing from a standalone Anaconda channel
Instructions for installation from a standalone Anaconda channel are provided separately by Snorkel AI.
Snorkel Flow Platform
Installing Snorkel Flow
Follow the separately distributed Snorkel Flow Installation Guide to install the platform.
Configuring Snorkel Flow
Configuration options for the Snorkel Flow platform can be updated before or after installation. Changes made after the platform is deployed take effect only once the platform is restarted:
snorkel-install shutdown && snorkel-install bootstrap
Snorkel Flow user configuration file
The user configuration file (.snorkel-flow.yaml) is the primary interface for configuring the platform. The CLI needs to know the location of this configuration file to execute commands. It first checks the path supplied via -c/--config-file if the argument is provided, then checks the home directory for a file named .snorkel-flow.yaml (or similar variations). The file follows YAML formatting and can be edited using a standard text editor.
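The lookup order described above can be sketched as follows. This is an illustration of the documented order only, not the CLI's actual implementation: the function name is hypothetical, the "similar variations" of the file name are not specified here and are therefore not handled, and raising an error when an explicit path is missing is an assumption.

```python
from pathlib import Path

DEFAULT_NAME = ".snorkel-flow.yaml"


def resolve_config_path(cli_path=None, home=None):
    """Return the configuration file to use, following the documented order.

    cli_path: value passed via -c/--config-file, or None if not provided.
    home: home directory to search; defaults to the current user's home.
    """
    if cli_path is not None:
        candidate = Path(cli_path).expanduser()
        if candidate.is_file():
            return candidate
        # Assumption: an explicit path that does not exist is treated as an error.
        raise FileNotFoundError(f"configuration file not found: {candidate}")

    home_dir = Path(home) if home is not None else Path.home()
    candidate = home_dir / DEFAULT_NAME
    if candidate.is_file():
        return candidate
    raise FileNotFoundError(f"no {DEFAULT_NAME} found in {home_dir}")
```

Passing the path explicitly is the less surprising choice on hosts where several users or several deployments share a machine.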
User configuration file options
Key | User value required? | Value type | Default value | Description |
---|---|---|---|---|
version | Yes | string | | Snorkel Flow version (e.g. 0.18.0) |
project_name | Yes | string | | Name of the docker-compose project (use "snorkelflow") |
domain | No | string | | Kubernetes only: The top-level domain for Kubernetes ingress generation, sub-domains will be generated following the convention {project_name}-{service}.{domain}. |
pagerduty_key | No | string | | If set, this is the API integration key that Grafana will use as its default notification channel. This can be used to respond quickly to issues that might be occurring in Snorkel Flow. |
host_ip | Yes | string | | IP address or host name where Snorkel Flow will be served from |
work_directory | Yes | string | | Absolute path of the working directory where configuration files will be stored. |
mount_directory | Yes | string | | Absolute path of the mount directory. Data in this directory can be ingested directly by the platform. |
platform | No | string | "docker" | Container platform. Options: "docker" , "podman" , "k8s" |
podman | No | map | {} | Additional options for Podman. Optional keys: "infra_image" (name of the desired infra image) |
image_name_suffix | No | string | "" | Suffix to append to all image names |
docker_groups | No | list | [] | Supplementary groups to run the container processes as. Snorkel Flow will be able to access files for which these groups have permissions. See https://docs.docker.com/engine/reference/run/#additional-groups. |
ui_ingress_port | No | int | 8900 | Ingress port for the web UI |
tdm_api_ingress_port | No | int | 8686 | Ingress port for the TDM API |
engine_notebook_port | No | int | 8888 | HTTP ingress port for the in-platform Notebook server. If left empty, defaults to 8888. No effect on Kubernetes deployments. |
notebook_tls_port | No | int | | TLS ingress port for the in-platform Notebook server. If left empty, defaults to 18888. No effect on Kubernetes deployments. |
notebook_ip | No | string | | Parameter to set if notebook should live on a different domain. If left empty, defaults to host_ip. No effect on Kubernetes deployments. |
minio_ingress_port | No | int | 8989 | Ingress port for the MinIO web UI |
banner_text | No | string | "" | Custom text to display in the banner on all pages of the web UI |
set_memory_limits | No | boolean | false | Whether to set automatically-calculated memory limits based on the host machine. Only applicable if platform is set to "docker" or "podman" . Consult with Snorkel support before enabling. |
tls | No | map | {} | Configuration options for TLS. See “TLS Configuration Options” table below for options. |
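As a rough illustration of how the table reads in practice, the sketch below treats a parsed configuration as a plain dictionary, reports which required keys are missing, fills in a few of the documented defaults, and applies the Kubernetes sub-domain convention. The helper names and all example values (host, paths, service name, domain) are hypothetical. The key names, required flags, and defaults are taken from the table.

```python
# Keys marked "Yes" in the "User value required?" column.
REQUIRED_KEYS = ("version", "project_name", "host_ip", "work_directory", "mount_directory")

# A subset of the documented default values.
DEFAULTS = {
    "platform": "docker",
    "image_name_suffix": "",
    "ui_ingress_port": 8900,
    "tdm_api_ingress_port": 8686,
    "engine_notebook_port": 8888,
    "minio_ingress_port": 8989,
    "banner_text": "",
    "set_memory_limits": False,
}

VALID_PLATFORMS = ("docker", "podman", "k8s")


def missing_required_keys(config):
    """Return the required keys that are absent or empty in the configuration."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def apply_defaults(config):
    """Return a copy of the configuration with documented defaults filled in."""
    merged = dict(DEFAULTS)
    merged.update(config)
    return merged


def ingress_host(project_name, service, domain):
    """Apply the {project_name}-{service}.{domain} convention (Kubernetes only)."""
    return f"{project_name}-{service}.{domain}"


# Hypothetical values, for illustration only.
example = {
    "version": "0.18.0",
    "project_name": "snorkelflow",
    "host_ip": "snorkel.example.com",
    "work_directory": "/opt/snorkel/work",
    "mount_directory": "/opt/snorkel/mount",
}

if __name__ == "__main__":
    print("missing:", missing_required_keys(example))
    print("platform:", apply_defaults(example)["platform"])
    print("ingress:", ingress_host("snorkelflow", "minio", "example.com"))
```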
The following table details TLS configuration options:
| Key | User value required? | Value type | Default value | Description |
| --- | --- | --- | --- | --- |
| key_pem_path | No | string | "" | A path to a file containing a private key that will be used to secure connections to the Snorkel platform. All services included in the Snorkel platform will be proxied by an Envoy instance using this key on different ports. Defaults to a key and Snorkel-issued self-signed certificate. |
| cert_pem_path | No | string | "" | A path to a file containing a certificate that will be presented to clients connecting to the Snorkel platform. All services included in the Snorkel platform will use this certificate on different ports. Defaults to a Snorkel-issued self-signed certificate. Note: it is strongly recommended that you issue and configure your own certificate that matches any hostname you set up for Snorkel Flow to ensure a secure connection. |
The following table details Kubernetes-specific configuration options. Two performance options, engine_dask_replicas and studio_dask_replicas, are available for creating additional pods; however, changing their default values is not recommended. Please contact support@snorkel.ai if performance degradations occur.
| Key | User value required? | Value type | Default value | Description |
| --- | --- | --- | --- | --- |
| engine_dask_replicas | No | int | 2 | Advanced: setting for creating additional engine_dask pods |
| studio_dask_replicas | No | int | 1 | Advanced: setting for creating additional studio_dask pods |
Advanced configuration options
Additional configuration options are available through manually editing lower-level platform configuration files, though this should be avoided in most circumstances. If you find yourself frequently using any of these options, submit a feature request to Snorkel AI to add it to the user configuration file.
For single server installations, advanced configuration options are applied by manually editing the docker-compose.yml found in the platform work_directory. To prevent the platform from overwriting manual edits when starting, use the --no-overwrite option for every subsequent snorkel-install bootstrap call. After each platform upgrade, the manual edits will be overwritten with the default settings and will need to be reapplied.
For Kubernetes installations, advanced configuration options are applied by manually editing and reapplying the Kubernetes configuration files.
Configuring TLS for Snorkel Flow
Note
Kubernetes installations may also configure TLS by modifying ingress configurations to match their infrastructure. The provided instructions apply to single-host installations of Snorkel Flow.
Note
Podman installations that configure TLS will not provide any access on the privileged port 443.
Snorkel Flow supports TLS (https://) connections for the main web UI, MinIO, and Notebook interfaces. In addition, Snorkel supports TLS usage in the SDK when communicating with the TDM and Studio APIs. These can be enabled by performing the following:
- Derive a public/private key pair and a certificate chain that can be used to terminate TLS. The certificate should be signed by a trusted authority and include a Subject Alternative Name matching the domain/host where you will be hosting Snorkel Flow (the host_ip setting). A single, non-wildcard certificate is all that is required. You can use any service or authority you trust to sign your key. For more information on how to set up your keys, please follow a tutorial online.
- Upload your private key file and certificate file as .pem files on a filesystem accessible by the snorkel-install CLI when running installation or upgrade commands.
- Set the tls key and certificate paths in the configuration. You can use the snorkel-install config set CLI to do this:
CONFIG_PATH=path/to/snorkel_config.yaml
snorkel-install config set --config-file $CONFIG_PATH tls.key_pem_path /path/to/private/key.pem # replace with the path to the key
snorkel-install config set --config-file $CONFIG_PATH tls.cert_pem_path /path/to/private/cert_chain.pem # replace with the path to the certificate
- The next time you run snorkel-install bootstrap, these keys will be made available to Envoy, a load balancer and service proxy that provides TLS support for our services.
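Before pointing tls.key_pem_path and tls.cert_pem_path at your files, it can help to confirm that each file is PEM-encoded and that the two have not been swapped. The sketch below only inspects the PEM BEGIN labels. It does not verify that the key matches the certificate, that the chain is trusted, or that the Subject Alternative Name is correct. The function names are illustrative.

```python
from pathlib import Path

BEGIN = "-----BEGIN "
DASHES = "-----"


def pem_labels(path):
    """Return the labels of all PEM blocks in the file, e.g. ['CERTIFICATE']."""
    labels = []
    text = Path(path).read_text(encoding="ascii", errors="replace")
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(BEGIN) and line.endswith(DASHES):
            labels.append(line[len(BEGIN):-len(DASHES)])
    return labels


def check_tls_files(key_path, cert_path):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Private keys appear under labels such as 'PRIVATE KEY' or 'RSA PRIVATE KEY'.
    if not any(label.endswith("PRIVATE KEY") for label in pem_labels(key_path)):
        problems.append(f"{key_path}: no PEM private key block found")
    if "CERTIFICATE" not in pem_labels(cert_path):
        problems.append(f"{cert_path}: no PEM certificate block found")
    return problems
```

For a stronger check, Python's ssl.SSLContext.load_cert_chain(certfile, keyfile) raises an error when the key and certificate do not belong together.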
HTTPS connectivity for all services is available through Envoy. The following table shows which plaintext ports map to TLS ports hosted on Envoy.
Service | Plaintext Port | TLS Port |
---|---|---|
Flow UI | 80 | 443 |
Minio | 8989 | 18989 |
Notebook | 8888 | 18888 |
Studio API | 8484 | 18484 |
TDM API | 8686 | 18686 |
Admin | 7777 | 17777 |
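The mapping in the table can be kept as a small lookup when scripting against the platform. The sketch below builds base URLs from it. The function name and the example host are hypothetical, the ports are the defaults listed above (ingress ports can be changed in the user configuration file), and no URL paths are implied.

```python
# service -> (plaintext port, TLS port), copied from the table above.
TLS_PORTS = {
    "Flow UI": (80, 443),
    "Minio": (8989, 18989),
    "Notebook": (8888, 18888),
    "Studio API": (8484, 18484),
    "TDM API": (8686, 18686),
    "Admin": (7777, 17777),
}


def service_url(host, service, use_tls=True):
    """Return the base URL for a service, omitting the port when it is the scheme default."""
    plaintext_port, tls_port = TLS_PORTS[service]
    scheme, port = ("https", tls_port) if use_tls else ("http", plaintext_port)
    default_port = 443 if use_tls else 80
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}"


if __name__ == "__main__":
    for name in TLS_PORTS:
        print(name, service_url("snorkel.example.com", name))
```

Apart from the Flow UI, each TLS port in the table is the plaintext port plus 10000, which makes the mapping easy to remember.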
Configuring a MinIO Password
Snorkel Flow supports the ability to use a custom access and secret key for MinIO. This can be enabled by performing the following before booting up the app the first time:
- Use snorkel-install config set to set the minio fields in the Snorkel Flow configuration file:
CONFIG_PATH=path/to/snorkel_config.yaml
snorkel-install config set --config-file $CONFIG_PATH minio.minio_access_key [ACCESS_KEY]
snorkel-install config set --config-file $CONFIG_PATH minio.minio_secret_key [SECRET_KEY]
- When you run snorkel-install bootstrap to boot up the app, the MinIO instance in Snorkel Flow will use the credentials specified above.
Configuring a Grafana Password
Snorkel Flow supports the ability to use a custom username and password for Grafana. This can be enabled by performing the following:
- Use snorkel-install config set to set the grafana fields in the Snorkel Flow configuration file:
CONFIG_PATH=path/to/snorkel_config.yaml
snorkel-install config set --config-file $CONFIG_PATH grafana.admin_username [USERNAME]
snorkel-install config set --config-file $CONFIG_PATH grafana.admin_password [PASSWORD]
- When you run snorkel-install bootstrap to boot up the app, the Grafana instance in Snorkel Flow will use the credentials specified above.
Configuring an External Postgres for Snorkel Flow
Snorkel Flow supports the ability to use an external Postgres (e.g. an AWS RDS instance) instead of the in-platform Postgres as its database. This can be enabled by performing the following:
- Create the external Postgres instance, and take note of the username, password, and URL.
- Use snorkel-install config set to set the external_postgres fields in the Snorkel Flow configuration file:
CONFIG_PATH=path/to/snorkel_config.yaml
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.username [USERNAME]
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.password [PASSWORD]
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_postgres.url [URL]
- The next time you run snorkel-install bootstrap, Snorkel Flow will use the Postgres specified above. Existing data that lives in the in-cluster Postgres (if any) will NOT be transferred over.
Configuring an External Redis for Snorkel Flow
Snorkel Flow supports the ability to use an external Redis (e.g. an AWS ElastiCache instance) instead of the in-platform Redis. This can be enabled by performing the following:
- Create the external Redis instance, and take note of the URL and password (if any). Also, take note if SSL (e.g. encryption in transit) is enabled on the Redis instance.
- Use snorkel-install config set to set the external_redis fields in the Snorkel Flow configuration file:
CONFIG_PATH=path/to/snorkel_config.yaml
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.url [URL]
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.ssl [True|False]
snorkel-install config set --config-file $CONFIG_PATH external_plugins.external_redis.password [PASSWORD]
- The next time you run snorkel-install bootstrap, Snorkel Flow will use the Redis specified above. Existing data that lives in the in-cluster Redis (if any) will NOT be transferred over.
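The config set commands above address nested settings with dotted keys such as external_plugins.external_redis.url. The sketch below turns a nested dictionary into the corresponding command lines so they can be reviewed before being run. The helper names and the example URL and password are hypothetical, and the expected URL format is not specified in this guide. Only the command shape shown above is used.

```python
import shlex


def flatten(settings, prefix=""):
    """Flatten a nested dict into (dotted_key, value_string) pairs, preserving order."""
    items = []
    for key, value in settings.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            items.extend(flatten(value, dotted))
        else:
            items.append((dotted, str(value)))
    return items


def config_set_commands(config_path, settings):
    """Return one 'snorkel-install config set' argument list per setting."""
    return [
        ["snorkel-install", "config", "set", "--config-file", str(config_path), key, value]
        for key, value in flatten(settings)
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the details of your own Redis instance.
    redis_settings = {
        "external_plugins": {
            "external_redis": {
                "url": "redis.example.internal:6379",
                "ssl": True,
                "password": "change-me",
            }
        }
    }
    for command in config_set_commands("path/to/snorkel_config.yaml", redis_settings):
        print(shlex.join(command))
```

The same helper applies to the minio, grafana, and external_postgres keys, since they follow the same pattern.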
Changing the MinIO password
By default, the MinIO access key and secret key are both set to snorkeladmin. To change them, edit the docker-compose.yaml file if using a single server installation or the Kubernetes configuration deployment file for MinIO if using Kubernetes. Instructions for single-server installations are shown below:
- Shut down the platform:
snorkel-install shutdown
- Edit the docker-compose.yaml file to add environment variables for the MinIO service called MINIO_ACCESS_KEY_OLD and MINIO_SECRET_KEY_OLD, where the values are set to the current access key and secret key, respectively. Update the MINIO_ACCESS_KEY and MINIO_SECRET_KEY environment variables to be the desired access key and secret key, respectively. See the MinIO documentation for updating credentials.
- Restart the platform without overwriting the configuration file:
snorkel-install bootstrap --no-overwrite
- Once the platform has restarted, remove the MINIO_ACCESS_KEY_OLD and MINIO_SECRET_KEY_OLD environment variables from the docker-compose.yaml file to avoid double encryption of data.
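The rotation has two phases: the MinIO service first runs with both the old and the new credentials, and then with the new ones only. The sketch below shows the environment variables for each phase as plain dictionaries. The function names are illustrative, and how the values are written into docker-compose.yaml is left to the manual edit described in the steps above.

```python
def rotation_environment(current_access, current_secret, new_access, new_secret):
    """Environment for the first restart: new credentials plus the *_OLD pair."""
    return {
        "MINIO_ACCESS_KEY": new_access,
        "MINIO_SECRET_KEY": new_secret,
        "MINIO_ACCESS_KEY_OLD": current_access,
        "MINIO_SECRET_KEY_OLD": current_secret,
    }


def final_environment(rotation_env):
    """Environment after rotation: the *_OLD variables are removed."""
    return {name: value for name, value in rotation_env.items() if not name.endswith("_OLD")}


if __name__ == "__main__":
    # 'snorkeladmin' is the documented default; the new values are placeholders.
    phase_one = rotation_environment("snorkeladmin", "snorkeladmin", "new-access", "new-secret")
    print(sorted(phase_one))
    print(sorted(final_environment(phase_one)))
```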
Upgrading Snorkel Flow
For single-server installations, the snorkel-install CLI can upgrade the platform either automatically, given access to the snorkelflow-whl container image, or manually. In both cases, the new platform container images need to be loaded on the local Docker client or accessible to pull from Docker Hub.
Upgrading using the snorkelflow-whl container image
If your snorkelflow Python package is installed via the snorkelflow-whl container image, run snorkel-install upgrade --version <version>, replacing <version> with the new version. If the upgrade succeeds, the CLI should print out a success message.
Upgrading manually
Use the following procedure to upgrade manually:
- Run snorkel-install shutdown
- Run snorkel-install config set version <version>, replacing <version> with the new version
- Reinstall the new snorkelflow Python package using the same procedure as the original platform installation
- Run snorkel-install bootstrap
If the upgrade succeeds, the CLI should print out a success message.
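The manual procedure can be written down as an ordered list of commands, which makes it easy to review or log before anything runs. This is a sketch under one stated assumption: the package was originally installed from a wheel file, so the reinstall step uses pip with a wheel path. The function names, version, and wheel path are hypothetical.

```python
import shlex
import subprocess
import sys


def manual_upgrade_commands(version, wheel_path):
    """Return the upgrade steps, in order, as argument lists."""
    return [
        ["snorkel-install", "shutdown"],
        ["snorkel-install", "config", "set", "version", version],
        # Assumption: the original installation used a wheel file.
        [sys.executable, "-m", "pip", "install", str(wheel_path)],
        ["snorkel-install", "bootstrap"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first one that fails."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    steps = manual_upgrade_commands("xx.yy.zz", "path/to/snorkelflow-xx.yy.zz-py3-none-any.whl")
    for step in steps:
        # Dry run: print only. Call run_all(steps) to execute for real.
        print(shlex.join(step))
```

Stopping at the first failure matters here: running bootstrap after a failed package install would start the platform with a mismatched CLI and configured version.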
Uninstalling Snorkel Flow
To uninstall Snorkel Flow and remove all files, use the following procedure for single-server installations:
- Run snorkel-install shutdown --hard and answer the prompt
- Remove Snorkel Flow-created directories in the configured mount directory
- Remove the configured work directory
- Uninstall the snorkelflow Python package
- Remove the Snorkel Flow container images
Network topology of Snorkel Flow
The following table maps the ports that are opened and made available by Snorkel Flow. These ports may be changed or expanded in future releases. The “Network or Localhost Only” column indicates whether the port needs to be routable from other hosts on the network or only needs to be available on the loopback interface. It is sufficient to limit firewall rules to permitting only those ports designated as “Network” ports.
Service | Description | Plaintext Port | TLS Port | Network or Localhost Only |
---|---|---|---|---|
Flow UI | The primary web interface for Snorkel. | 80 | 443 | Network |
Envoy | Administrative portal for Envoy, which is used for terminating TLS. | 9901 | | Localhost Only |
Flow UI | Alternate plaintext port for Flow. Used for reverse-proxying. | 8900 | | Localhost Only |
Grafana | Metrics dashboard to help diagnose problems in Snorkel. | 3000 | | Network |
InfluxDB | Metrics storage backend for Grafana. | 9999 | | Localhost Only |
Minio | File management UI. | 8989 | 18989 | Network |
Notebook | Jupyter Notebook server, exposed via Flow UI. | 8888 | 18888 | Network |
Notebook API | Internal management API for Notebooks. | 8282 | | Localhost Only |
Studio API | APIs controlling studio functions. | 8484 | 18484 | Network |
TDM API | APIs controlling training data. | 8686 | 18686 | Network |
Admin | Administrative portal for Snorkel. | 7777 | 17777 | Network |
Engine | Debugging view for asynchronous job execution of up to 64 workers. | 12000-12063 | | Network |
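To derive a firewall allow-list from the table, the rows can be encoded as data and filtered on the “Network” designation. The sketch below does that. The data structure and function name are illustrative, and the ports are the defaults listed above, so they need adjusting if ingress ports were changed in the user configuration file.

```python
# (service, plaintext ports, TLS port or None, scope), copied from the table above.
PORT_TABLE = [
    ("Flow UI", [80], 443, "Network"),
    ("Envoy", [9901], None, "Localhost Only"),
    ("Flow UI (alternate)", [8900], None, "Localhost Only"),
    ("Grafana", [3000], None, "Network"),
    ("InfluxDB", [9999], None, "Localhost Only"),
    ("Minio", [8989], 18989, "Network"),
    ("Notebook", [8888], 18888, "Network"),
    ("Notebook API", [8282], None, "Localhost Only"),
    ("Studio API", [8484], 18484, "Network"),
    ("TDM API", [8686], 18686, "Network"),
    ("Admin", [7777], 17777, "Network"),
    # 12000-12063 inclusive: one port per worker, up to 64 workers.
    ("Engine", list(range(12000, 12064)), None, "Network"),
]


def network_ports(table=PORT_TABLE):
    """Return the sorted list of ports that must be reachable from other hosts."""
    ports = set()
    for _service, plaintext_ports, tls_port, scope in table:
        if scope != "Network":
            continue
        ports.update(plaintext_ports)
        if tls_port is not None:
            ports.add(tls_port)
    return sorted(ports)


if __name__ == "__main__":
    print(network_ports())
```

If only TLS access is intended, the plaintext ports of services that also have a TLS port could be left closed. Whether that suits a given deployment is a local policy decision rather than something this guide prescribes.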