Version: 0.93

Single-Node Installation

note

For environments disconnected from the Internet, please use Air-gapped Installation.
For a GPU-enabled environment, please see GPU-Enabled Installation.

Prerequisites

Before installing Snorkel Flow, make sure the following have been done.

  • You have sent the Docker ID that you’ll use for the initial installation and updates to Snorkel AI at support@snorkel.ai.
  • You have permissions to allocate an AWS EC2 instance with the required hardware specs as described in the System Requirements section.

System Requirements

| Category | Type | Minimum Requirement |
|---|---|---|
| Hardware (Recommended: AWS EC2 m5.8xlarge or similar with 2TB+ hard drive) | Processor | 64-bit |
|   | CPU | 16 CPU (Recommended: 32+ CPU) |
|   | RAM | 64 GB (Recommended: 128GB+) |
|   | Available disk space | 1 TB, not including data (Recommended: 2TB+) |
| Operating system (Recommended: AWS EC2 Ubuntu 20.04 Base AMI or similar) | Operating system | Ubuntu 18.04, CentOS 7, or Amazon Linux |
| Software | Docker | 18.09+ |
|   | docker-compose | 1.24.0 |
|   | Python | 3.8+ |
|   | pip | 19.0+ |
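
The hardware minimums above can be checked before starting the installation. The following is a minimal sketch, not part of Snorkel Flow: the function names find_shortfalls and measure_host are hypothetical, the RAM measurement relies on os.sysconf and so assumes a Linux host, and 1 TB is treated as 1000 GB.

```python
import os
import shutil

# Minimums taken from the System Requirements table.
MIN_CPUS = 16
MIN_RAM_GB = 64
MIN_DISK_GB = 1000  # 1 TB, treated here as 1000 GB (assumption)


def find_shortfalls(cpus, ram_gb, disk_gb):
    """Return a list of human-readable messages for unmet minimums."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"CPU: found {cpus}, need {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: found {ram_gb:.0f} GB, need {MIN_RAM_GB} GB")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: found {disk_gb:.0f} GB free, need {MIN_DISK_GB} GB")
    return problems


def measure_host(path="/"):
    """Measure CPU count, total RAM, and free disk space (Linux only)."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).free
    return cpus, ram_bytes / 1e9, disk_bytes / 1e9


if __name__ == "__main__":
    for line in find_shortfalls(*measure_host()) or ["All minimums met."]:
        print(line)
```

Pass the mount point of your data volume to measure_host if the Snorkel Flow data will not live on the root file system.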

System Dependencies

The following example commands show how to install software requirements from the System Requirements spec on an AWS EC2 Ubuntu 18.04 Base AMI. Software installation on other Debian-based Linux images should be similar. If these have already been installed via other means, you may skip this step.

Update apt, install pip and Snappy

sudo apt update
sudo apt install python3-pip python-pip-whl


On some distributions, the "fsspec" module is not present. The command below ensures that the module is installed:

 pip3 install fsspec

Install Docker and docker-compose

Add your user to the Docker user group using the command below. Log out and back in afterward for changes to take effect.

sudo usermod -aG docker $USER
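
Once Docker and docker-compose are installed, their versions can be compared against the System Requirements table. This is an illustrative sketch only: parse_version and meets_minimum are hypothetical helpers, and the docker-compose value in the table (1.24.0) is assumed here to be a minimum rather than an exact pin.

```python
import re
import shutil
import subprocess

# Versions from the System Requirements table, treated as minimums (assumption).
MINIMUMS = {"docker": (18, 9), "docker-compose": (1, 24, 0)}


def parse_version(text):
    """Extract the first dotted version number from `--version` output."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples, e.g. (20, 10, 7) >= (18, 9)."""
    return found >= minimum


if __name__ == "__main__":
    for tool, minimum in MINIMUMS.items():
        if shutil.which(tool) is None:
            print(f"{tool}: not found on PATH")
            continue
        output = subprocess.run(
            [tool, "--version"], capture_output=True, text=True, check=False
        ).stdout
        found = parse_version(output)
        status = "ok" if meets_minimum(found, minimum) else "too old"
        print(f"{tool}: {'.'.join(map(str, found))} ({status})")
```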

Python3 vs. Conda

This installation uses standard Python3 libraries during setup. If some steps are not working, you may be using conda binaries and need to switch back to the standard Python installation. To do so, remove or comment out the following block in your ~/.bashrc and restart your installation bash session:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/home/centos/anaconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/home/centos/anaconda3/etc/profile.d/conda.sh" ]; then
        . "/home/centos/anaconda3/etc/profile.d/conda.sh"
    else
        export PATH="/home/centos/anaconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
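
To see whether your shell is still resolving python3 or pip3 to conda binaries, a rough check like the one below may help. It is a heuristic sketch: looks_like_conda is a hypothetical helper that only inspects the path for a component containing "conda" (such as anaconda3 or miniconda3), so a conda install in an unusually named directory would not be detected.

```python
import shutil
from pathlib import Path


def looks_like_conda(executable):
    """Heuristic: True if any path component mentions conda."""
    return any("conda" in part.lower() for part in Path(executable).parts)


if __name__ == "__main__":
    for tool in ("python3", "pip3"):
        path = shutil.which(tool)
        if path is None:
            print(f"{tool}: not found on PATH")
        elif looks_like_conda(path):
            print(f"{tool}: {path} (looks like a conda binary)")
        else:
            print(f"{tool}: {path}")
```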

Accessing Docker Hub

This login step lets you download the command-line interface (CLI) and Docker images required to run Snorkel Flow.

docker login # Enter credentials for Docker ID provided to Snorkel AI

Install the snorkelflow Python package

The snorkelflow SDK is bundled in a Docker image called snorkelai/snorkelflow-whl, hosted on Docker Hub. We recommend extracting the snorkelflow SDK by running the following:

VERSION=<snorkelflow version>
LOCAL_WHL_PATH=/tmp/whl
mkdir -p $LOCAL_WHL_PATH
CONTAINER_ID=$(docker create snorkelai/snorkelflow-whl:$VERSION bash)
docker cp $CONTAINER_ID:/ $LOCAL_WHL_PATH

Replace <snorkelflow version> with the version of snorkelflow you were provided. The wheel file will then be located in $LOCAL_WHL_PATH.

You can now install the snorkelflow SDK by running:

python3 -m pip install $LOCAL_WHL_PATH/snorkelflow-$VERSION-py3-none-any.whl[install]

Installing extras

Several extras are defined for the snorkelflow SDK. Each one specifies the external libraries needed for a piece of extra functionality.

  • data: dependencies needed for the dataset loading utilities in snorkelflow
  • debug: dependencies needed to generate support bundles for debugging performance
  • kubernetes: dependencies needed to generate Kubernetes configurations

For example, to install the data and debug extras, run:

python3 -m pip install \
$LOCAL_WHL_PATH/snorkelflow-$VERSION-py3-none-any.whl[data,debug]

You may need to install a few additional dependencies, such as gcc-c++ and python3-devel, to compile the required extensions. You should now have the snorkel-install command-line utility on your PATH, which you can verify by running which snorkel-install. If the command is not found, check that your Python virtual environment has been correctly configured.
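
The same verification can be scripted. The sketch below uses only the Python standard library; locate_cli and in_virtualenv are hypothetical helper names, and the virtual-environment test simply compares sys.prefix with sys.base_prefix.

```python
import shutil
import sys


def locate_cli(name="snorkel-install"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    path = locate_cli()
    if path:
        print(f"snorkel-install found at {path}")
    else:
        print("snorkel-install not found on PATH")
        print(f"virtual environment active: {in_virtualenv()}")
```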

Generate a Snorkel Flow config

User installation settings for the Snorkel Flow platform are stored in a YAML config file. The following instructions generate a default config file at the location given by --path (snorkel-config.yaml in this example). For instructions on customizing installation settings, contact a Snorkel AI team member.

# DNS-resolvable hostname or IP address of the server running Snorkel Flow
HOST_IP=<hostname or IP address>
# Absolute path to a directory that Snorkel Flow will read and write data to.
# A persistent volume is recommended.
LOCAL_DATA_PATH=<absolute path to data directory>
# Absolute path to a directory that Snorkel Flow will use during installation.
# Some sensitive data is stored here, so it should not be the same as LOCAL_DATA_PATH.
WORK_DIR=<absolute path to work directory>
# A short, unique, alphanumeric name for this instance, such as "snorkeldemo" or "snorkelproduction"
PROJECT_NAME=<project name>
# The version of Snorkel Flow being installed; same value as in the previous step
VERSION=<snorkelflow version>
snorkel-install generate-config \
--host-ip $HOST_IP \
--mount-directory $LOCAL_DATA_PATH \
--work-directory $WORK_DIR \
--project-name $PROJECT_NAME \
--version=$VERSION \
--path snorkel-config.yaml

The HOST_IP option may be the DNS name for an external proxy, such as an ELB, so long as all relevant service ports Snorkel Flow exposes are available on the proxy. You can view a full list of service ports in the in-app user guide.

If the data files you want to use are on your local file system (as opposed to being accessed via other protocols like HTTP(S) or AWS S3), you’ll need to set up local file support. All local files should be under a single parent directory, but can be organized in subdirectories. Local file support is enabled by setting the LOCAL_DATA_PATH option and setting up a mount in this location.
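
The constraints on these settings (absolute paths, a work directory separate from the data directory, an alphanumeric project name) can be checked before calling snorkel-install generate-config. The following is a hedged sketch: validate_settings is a hypothetical helper, not something the snorkel-install CLI provides, and it reads the same shell variable names used above from the environment, so they would need to be exported first.

```python
import os
import re


def validate_settings(host_ip, data_path, work_dir, project_name):
    """Return a list of problems with the given settings (empty if none)."""
    problems = []
    if not host_ip.strip():
        problems.append("HOST_IP must not be empty")
    for label, path in (("LOCAL_DATA_PATH", data_path), ("WORK_DIR", work_dir)):
        if not os.path.isabs(path):
            problems.append(f"{label} must be an absolute path: {path!r}")
    if os.path.normpath(data_path) == os.path.normpath(work_dir):
        problems.append("WORK_DIR must differ from LOCAL_DATA_PATH")
    if not re.fullmatch(r"[A-Za-z0-9]+", project_name):
        problems.append(f"PROJECT_NAME must be alphanumeric: {project_name!r}")
    return problems


if __name__ == "__main__":
    issues = validate_settings(
        os.environ.get("HOST_IP", ""),
        os.environ.get("LOCAL_DATA_PATH", ""),
        os.environ.get("WORK_DIR", ""),
        os.environ.get("PROJECT_NAME", ""),
    )
    for issue in issues or ["Settings look consistent."]:
        print(issue)
```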

Install Snorkel Flow

Finally, we’ll use the snorkel-install CLI to install the full Snorkel Flow platform. Snorkel Flow is shipped as a collection of Docker containers orchestrated by docker-compose. The snorkel-install command below will pull the container images from Snorkel AI’s private registries and run them using docker-compose.

snorkel-install bootstrap -c snorkel-config.yaml --overwrite

If you see an output that says

💫 Snorkel Flow bootstrap successful!

then congratulations, you have successfully installed Snorkel Flow! Navigate to http://<HOST_IP> in your browser to access the Snorkel Flow interface. You will need to finalize the installation by adding a license key.

Air-gapped Installation

Snorkel AI supports Single-Node Installation in environments without connection to the Internet by providing all necessary binaries and images to be run.

See additional requirements here.

Prerequisites

Before attempting an air-gapped install, please ensure you have a compressed .tar.gz containing your installation files. If you do not have one, please reach out to your Account Executive at Snorkel AI to have one delivered. Please also ensure your intended VM meets the requirements detailed in System Requirements.

Installation

  1. Copy the provided tarball to your VM's home directory

  2. Extract your delivery

    tar -xzf <your_delivery>.tar.gz
  3. Create a Python3 Virtual Environment

    python3.8 -m venv .env
  4. Activate the environment

    source .env/bin/activate 
  5. Run the installation script

    ./install.sh --install-deps
  6. Exit out of the instance and log back in to fix user groups.

  7. Re-activate the Virtual Environment

    source .env/bin/activate

  8. Create your work and mount directories. For example, run mkdir /home/centos/workdir and mkdir /home/centos/mountdir to keep them under /home/centos/, or place them wherever you prefer, including on attached block storage.

  9. Set the following environment variables in your terminal session. If you're unsure which version you have, it matches the tag of the Snorkel Flow images shown when you list Docker images:

    IP=<your instance public IP or public url you intend to use>
    VERSION=<your version number, it would be formatted like 0.52.12-cuda>
    WORKDIR=<path to workdir>
    MOUNTDIR=<path to mountdir>
    PROJECTNAME=<the project name you want to use for this install>
  10. Run snorkel-install generate-config with the correct parameters. The project name can be anything you like; it is used in Docker resource naming:

    snorkel-install generate-config --host-ip $IP -d $MOUNTDIR \
    --work-directory $WORKDIR --project-name $PROJECTNAME \
    --version $VERSION --path ~/.snorkel-flow.yaml --overwrite

    For additional configuration options (including TLS configuration) please refer to Snorkel Flow Configuration

  11. Set your configuration:

    snorkel-install config set --config-file \
    /home/centos/.snorkel-flow.yaml image_name_suffix ""
  12. Run your installation:

    snorkel-install bootstrap -c \
    /home/centos/.snorkel-flow.yaml --overwrite
  13. Go to the Snorkel Flow application at https://<your_IP> and either upload your license.key through the provided web UI or use the command line tool:

    snorkel-install license update license.key 
  14. Ensure your platform is running through the web UI at https://<your_IP>.
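
For step 9, the version can be read off the tags of the loaded images. The sketch below is one possible way to do that and rests on assumptions: snorkel_tags is a hypothetical helper, and it assumes that the Snorkel Flow image repositories contain the word "snorkel". It calls docker images with a --format template that prints one repository:tag pair per line.

```python
import subprocess


def snorkel_tags(listing, keyword="snorkel"):
    """Collect the distinct tags of images whose repository contains `keyword`."""
    tags = set()
    for line in listing.splitlines():
        # Split on the last colon so registry ports (host:5000/...) stay intact.
        repository, sep, tag = line.strip().rpartition(":")
        if sep and keyword in repository and tag != "<none>":
            tags.add(tag)
    return sorted(tags)


if __name__ == "__main__":
    output = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("Candidate VERSION values:", snorkel_tags(output))
```

If more than one tag is printed, images from an earlier delivery are probably still loaded.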

Upgrade Existing Instance

For an existing instance, you can simply load new containers into your local images and re-run a snorkel install specifying those images.

If your environment is using conda, please follow the directions in Python3 vs. Conda.

  1. Copy the provided tarball to your VM's home directory

  2. If they exist from a previous installation, remove the following directories and files: snorkel containers, snorkel-docker-build, testing-suite, snorkelflow-whl, snorkelflow-whl-dependencies, install.sh, requirements.txt

  3. Extract your delivery:

    tar -xzf <your_delivery>.tar.gz
  4. If you don't already have one, create a Python3 virtual environment with python3.8 -m venv .env, then activate it with source .env/bin/activate

  5. Delete all previous images:

  6. Run the install script: ./install.sh --install-deps

  7. Shut down the previous install: snorkel-install shutdown -c <path to config>

  8. Edit the version within your config file to match the new version of Snorkel Flow you've just loaded.

  9. Run: snorkel-install bootstrap -c <path to config> --overwrite

  10. Navigate to the web UI to confirm that the platform is running.

GPU-Enabled Installation

If you’d like to install a GPU-enabled instance, you’ll need to add "default-runtime": "nvidia" to /etc/docker/daemon.json somewhere in the top level of that JSON file. For example:

$ cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
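
If you prefer to script this edit rather than change the file by hand, something like the following could be used. It is a sketch under assumptions: with_nvidia_default is a hypothetical helper, it refuses to proceed when no "nvidia" runtime is registered, and it prints the updated JSON instead of writing to /etc/docker/daemon.json, since that file normally requires root to modify.

```python
import json
import sys
from pathlib import Path


def with_nvidia_default(config):
    """Return a copy of a daemon.json dict with nvidia as the default runtime."""
    if "nvidia" not in config.get("runtimes", {}):
        raise KeyError("no 'nvidia' runtime registered in daemon.json")
    updated = dict(config)
    updated["default-runtime"] = "nvidia"
    return updated


if __name__ == "__main__":
    # Optional first argument: path to daemon.json.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/etc/docker/daemon.json")
    current = json.loads(path.read_text()) if path.exists() else {}
    print(json.dumps(with_nvidia_default(current), indent=4))
```

Review the printed JSON, then write it to /etc/docker/daemon.json with appropriate privileges.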

Restart the docker service: sudo service docker restart

You can confirm that GPU now works by running

docker run --rm nvidia/cuda:11.0-base nvidia-smi

(or use whatever nvidia/cuda image is available to your organization).

$ docker run --rm nvidia/cuda:11.0-base nvidia-smi
Thu Nov 18 22:00:59 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.142.00   Driver Version: 450.142.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   22C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Once the above steps have been verified, you can run the installation steps. Running Snorkel Flow containers will utilize this nvidia runtime. You can validate this using the following Python commands in the Snorkel Flow Jupyter Notebook:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x...>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'GeForce GTX 950M'

Note
If you already started a snorkel cluster before doing the above, you’ll have to rebootstrap the cluster to enable GPU support.

Note
If the "nvidia" runtime does not already exist in /etc/docker/daemon.json, you may have to follow these instructions to enable it: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker