Data preparation
Before uploading your data to Snorkel Flow, we recommend that you go through the basic preparation steps outlined on this page (in addition to any custom prepro...
Supported data source types
This page provides information about the data source types that Snorkel Flow supports. You can follow the steps outlined in Data preparation and Data upload to ...
Upload ground truth
We will demonstrate how to upload ground truth (GT) in Snorkel Flow for various applications. Below, we describe the two types of GT in Snorkel Flow (document-l...
Uploading a dataset
Snorkel Flow organizes data into data sources and datasets:
Uploading files to file collections
User files are the assets related to data points in a given data source. For example, if you have a PDF application, the user files would be the actual .pdf fil...
Scanned PDF guide
Scanned PDF documents are created from scanned images of printed documents. They don't contain information about the text in the document and where it is locate...
Manage the data sources for a model
This article provides an overview of the Snorkel Flow features that allow you to manage the data sources for a model after app creation. For example, you might ...
Managing file collections
Snorkel Flow provides options for managing the files in your datasets. After uploading a dataset, select the Files tab to manage and view the associated files. ...
Working with MinIO
Notice of deprecation
Re-split data
Sometimes, new data is added to a dataset, or the data distribution changes in an application. When this happens, we may want to re-split the dataset in a...
Tips for splitting and partitioning data
When working with a new dataset, one of the most important steps is to create three representative splits of data.
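As a rough illustration of what "representative" can mean here, the sketch below splits labeled data points into three parts while keeping the label proportions similar in each. It is plain Python and does not use the Snorkel Flow SDK. The function name, the split names `train`, `valid`, and `test`, and the 80/10/10 fractions are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign each index to one of three splits, stratified by label.

    The split names and default fractions are illustrative assumptions.
    The third split receives whatever remains after the first two.
    """
    rng = random.Random(seed)

    # Group data point indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    splits = {"train": [], "valid": [], "test": []}
    for indices in by_label.values():
        # Shuffle within each label so the assignment is random but reproducible.
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_valid = int(len(indices) * fractions[1])
        splits["train"] += indices[:n_train]
        splits["valid"] += indices[n_train:n_train + n_valid]
        splits["test"] += indices[n_train + n_valid:]
    return splits


# Hypothetical imbalanced dataset: 20 "spam" and 80 "ham" data points.
example_labels = ["spam"] * 20 + ["ham"] * 80
example_splits = stratified_three_way_split(example_labels)
```

Stratifying by label is only one way to keep splits representative. When labels are not yet available, a seeded random split, or grouping by a metadata field such as source or date, may be a better fit.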
Training set overview: Review your training sets
This page shows you a list of the training sets that have been created from your LF packages. For a given training set, you can edit its name and view summary s...