Version: 0.94

Manage the data sources for a model

This article provides an overview of the Snorkel Flow features that allow you to manage the data sources for a model after app creation. For example, you might need to update the data sources for a model if your dataset contains data points that are not important for your use case. Or, you might need to upload ground truth for a data source.

To access this page, first navigate to the overview page for your app, then select the View Data Sources button.

Data sources table

The main table on this page lets you toggle which data sources are enabled for a model. Each data source in your app appears as a row in the table. For each data source, the table shows the number of data points, the number of ground truth (GT) labels, the data source identifier, and the storage path of the data source file.

Select one or more data sources in the table, then use the Enable and Disable buttons to enable or disable them for this model. This changes which data points are used to train the model.

You can also enable or disable data sources for dev split sampling. The dev split is a collection of data points used to build and evaluate the performance of your model and labeling functions (LFs). Because these data points are not used during training, you might want to exclude certain data sources from dev split sampling.
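The effect of the two toggles can be pictured with a small conceptual sketch. This is not Snorkel Flow code or its SDK: the `DataSource` class, its `enabled` and `dev_sampling` fields, and both helper functions are hypothetical names invented for illustration. The sketch also assumes that a data source must be enabled for the model before it can contribute to dev split sampling, which this article does not state.

```python
import random
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical stand-in for one row of the data sources table."""
    uid: int
    n_points: int
    enabled: bool = True        # assumed flag: enabled for this model
    dev_sampling: bool = True   # assumed flag: eligible for dev split sampling


def training_pool(sources):
    """Return (source uid, row index) pairs drawn from enabled sources only."""
    return [(s.uid, i) for s in sources if s.enabled for i in range(s.n_points)]


def sample_dev_split(sources, k, seed=0):
    """Sample up to k data points from sources eligible for dev sampling."""
    pool = [
        (s.uid, i)
        for s in sources
        if s.enabled and s.dev_sampling  # assumption: both flags must be on
        for i in range(s.n_points)
    ]
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


sources = [
    DataSource(uid=1, n_points=3),
    DataSource(uid=2, n_points=2, enabled=False),
    DataSource(uid=3, n_points=4, dev_sampling=False),
]
train = training_pool(sources)
dev = sample_dev_split(sources, k=10)
```

In this toy setup, data source 2 contributes nothing because it is disabled, and data source 3 contributes to the training pool but is never sampled into the dev split.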

Uploading ground truth

Select the Upload GTs button to upload GT for your model. You can either specify a path to a file that contains ground truth labels or upload the file directly. Follow the instructions in the dialog carefully, because each task type has different requirements for the format of the uploaded file. See the Upload ground truth article for more information.

Note: Uploading new GT values overwrites existing ones.
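Because an upload overwrites existing values, it can help to check a file against your current labels before uploading it. The following is a minimal sketch in plain Python, not Snorkel Flow functionality. It assumes labels are held as dictionaries keyed by a data point identifier, and that data points missing from the upload keep their existing labels. Both points are assumptions, and the function names and example identifiers are hypothetical.

```python
def find_overwrites(existing, uploaded):
    """Return the IDs whose existing label would be replaced by a different one."""
    return sorted(
        uid
        for uid, label in uploaded.items()
        if uid in existing and existing[uid] != label
    )


def apply_upload(existing, uploaded):
    """Return a new mapping in which uploaded labels win over existing ones."""
    merged = dict(existing)   # copy so the original mapping is left untouched
    merged.update(uploaded)   # uploaded values overwrite matching keys
    return merged


# Hypothetical identifiers and labels, for illustration only.
existing_gt = {"doc::1": "spam", "doc::2": "ham"}
new_gt = {"doc::2": "spam", "doc::3": "ham"}

changed = find_overwrites(existing_gt, new_gt)
merged_gt = apply_upload(existing_gt, new_gt)
```

Here `changed` lists the data points whose label would change, so you can review them before committing to the upload.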