Version: 0.91

Snorkel Flow architecture overview

Snorkel Flow architecture comprises of several building blocks working in conjunction with each other as described in the illustration below.

Copy of DRAFT - Snorkel Flow Services Architecture August 2023.webp From a foundation level, the platform is an aggregation of the following services:

1. Jupyter Services

This is made up of the open source framework that is used to create individual notebook servers along with the open source proxy used by Jupyterhub.

2. User Interface

This is the Snorkel Flow Graphical User interface, where users can perform data labeling, data management, and annotation workflows within our on-platform UI.

3. Proxy

This is the open source service proxy that is the main entry point into the platform.

4. Batch Compute

This section comprises of the engine services for small jobs, workload execution, parallelization of distributed workload and model training.

5. Backend Services

This is made up of the tdm-api, which forms the backbone of Snorkel Flow central API services.

6. Storage Services

This layer is made up of Postgres that provides an open source RDBMS for tabular data, Redis open source in-memory cache, open source MinIO object storage and the Snorkel Flow Model Registry that is used to store MLFlow deployments.

7. Interactive Compute

This part of the platform is represented by studio-api that provides interactive data analysis and studio-ray which is the open source framework for the parallelization of distributed workloads.

8. Monitoring

This section provides telemetry and observability into the platform is is made up of Grafana, InfluxDB and Telegraf that provide an observability dashboard for platform metrics, a time series database and the collection of platform metrics respectively.

Implementation context

The Snorkel Flow platform is designed to be deployed on Kubernetes environments that are hosted on AWS (EKS), Azure (AKS) and GCP (GKE), as well as with platforms running on private cloud environments. For details on deployment options that are a good fit for you, please contact a Snorkel AI representative for discussing the choices that are available.

1. Jupyter Services​

2. User Interface​

3. Proxy​

4. Batch Compute​

5. Backend Services​

6. Storage Services​

7. Interactive Compute​

8. Monitoring​

Implementation context​