Data preparation
Before uploading your data to the Snorkel AI Data Development Platform, we recommend that you go through the basic preparation steps outlined on this page (in a...
Supported data source types
This page provides information about the data source types that the Snorkel AI Data Development Platform supports. You can follow the steps outlined in Data pre...
Uploading ground truth
This topic demonstrates how to upload ground truth (GT) in the Snorkel AI Data Development Platform. There are two types of GT in the Snorkel AI Data Developmen...
Uploading a dataset
The Snorkel AI Data Development Platform organizes data into data sources and datasets:
Uploading files to file collections
User files are the assets related to data points in a given data source. For example, if you have a PDF application, the user files would be the actual .pdf fil...
Managing file collections
The Snorkel AI Data Development Platform provides options for managing the files in your datasets. After uploading a dataset, select the Files tab to manage and...
Working with MinIO
Notice of deprecation
Tips for splitting and partitioning data
When working with a new dataset, one of the most important steps is to create three representative splits of data.
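As a rough illustration of creating three splits, the sketch below shuffles rows with a fixed seed and cuts them by fraction. It uses plain Python rather than any Snorkel API. The function name `split_rows`, the 80/10/10 fractions, and the split names `train`, `valid`, and `test` are all assumptions made for this example. A seeded random shuffle is only one way to approximate a representative split.

```python
import random


def split_rows(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle rows and cut them into three splits by fraction.

    Hypothetical helper for illustration; the fractions are assumed values.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must be three values that sum to 1")

    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the splits, reproducible.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    first_end = int(n * fractions[0])
    second_end = first_end + int(n * fractions[1])
    # The last split takes whatever remains, so no row is dropped.
    return (
        shuffled[:first_end],
        shuffled[first_end:second_end],
        shuffled[second_end:],
    )


# Assumed split names; the row IDs here are placeholder integers.
train, valid, test = split_rows(range(100))
```

Shuffling before cutting avoids splits that merely mirror the original row order, for example when rows are sorted by date or source.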
Using data slices
A slice is a filtered subset of data rows that share a specific characteristic, like a topic, language, or error type.
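To make the idea of a slice concrete, the following sketch filters a small list of rows by a shared characteristic. It is plain Python, not the platform's slice feature. The rows, the `language` field, and the `make_slice` helper are hypothetical and exist only for this example.

```python
# Placeholder rows; the field names are assumptions for this sketch.
rows = [
    {"id": 1, "language": "en", "text": "Reset my password"},
    {"id": 2, "language": "es", "text": "No puedo iniciar sesión"},
    {"id": 3, "language": "en", "text": "Update billing address"},
    {"id": 4, "language": "es", "text": "Cambiar mi correo"},
]


def make_slice(rows, predicate):
    """Return the rows for which predicate(row) is true, in original order."""
    return [row for row in rows if predicate(row)]


# A slice of all rows that share one characteristic: Spanish-language text.
spanish_slice = make_slice(rows, lambda row: row["language"] == "es")
spanish_ids = [row["id"] for row in spanish_slice]
```

Expressing the slice as a predicate keeps the definition reusable: the same function can be applied to another split of the data to select the matching rows there.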