Application data control pane
This page walks through the functionality and settings of the application data control pane.
The application data control pane includes controls for the following actions:
Allocate dedicated resources
If a particularly large dataset is slowing down your application, click Shared resources to control the resources allocated per application. Dedicated resources cache your dataset in memory, where it can be fetched and operated on quickly. For more information about dedicated resources, see Using dedicated resources to speed up applications with large datasets.
Compute embeddings
If available for your use case, you can generate embeddings using SimCSE or RAG. You can use these embeddings to create labeling functions (LFs) during model development. For more information about embeddings, see Utilizing embeddings and Embedding based cluster LFs.
Select data splits and resample data
Use the data splits dropdown to change the active data split or resample data.
Select split
Click the split dropdown to change which split is currently enabled in Application Studio.
Resample data
The longer you iterate, the more likely your LFs are to become tuned to the particular data points that you’ve labeled in your dev set (that is, to overfit). For this reason, it’s good practice to periodically refresh your dev set by resampling it from the train split.
Click Resample data to open a modal with various resampling options:
- Sample size: The approximate number of data points to sample from a split.
- Max labeled: The maximum number of labeled data points to include in the sample.
- Min per class: The minimum number of data points per class to include in the sample.
- Random seed: A random integer seed to use for deterministic sampling.
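To illustrate how these four options could interact, here is a minimal Python sketch of seeded resampling. It is not the product's implementation: the function name `resample_dev_set`, the record shape (dicts with `id` and `label` keys), and the way conflicts between options are resolved are all assumptions made for the example.

```python
import random
from collections import defaultdict


def resample_dev_set(train_points, sample_size, max_labeled, min_per_class, seed):
    """Hypothetical sketch of seeded resampling; not the product's algorithm.

    train_points: list of dicts with an "id" and a "label" (None if unlabeled).
    """
    rng = random.Random(seed)  # same seed -> same shuffles -> same sample

    labeled = [p for p in train_points if p["label"] is not None]
    unlabeled = [p for p in train_points if p["label"] is None]
    rng.shuffle(labeled)
    rng.shuffle(unlabeled)

    # Step 1: take up to min_per_class labeled points from each class.
    # Assumption: this minimum takes priority over max_labeled if they conflict.
    by_class = defaultdict(list)
    for p in labeled:
        by_class[p["label"]].append(p)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(by_class[label][:min_per_class])

    # Step 2: add more labeled points until the max_labeled cap is reached.
    chosen_ids = {p["id"] for p in chosen}
    for p in labeled:
        if len(chosen) >= max_labeled:
            break
        if p["id"] not in chosen_ids:
            chosen.append(p)
            chosen_ids.add(p["id"])

    # Step 3: fill the remainder of the sample with unlabeled points.
    remaining = max(sample_size - len(chosen), 0)
    chosen.extend(unlabeled[:remaining])
    return chosen[:sample_size]
```

Calling the function twice with the same inputs and the same seed returns the same sample, which is the property the Random seed option is meant to provide.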
Export data
Click Export to open a modal with various export options.
Commit model node
Click Commit model node to save a specified model into the application pipeline. You must commit your model if you want to deploy your application. For more information about deployments, see Deploying Snorkel-built models.
Export training set labels
Click Export training set labels to export a specified training set into a CSV file. You can specify which columns to export as well as one or more splits to include data from.
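Once exported, the CSV file can be inspected with standard tooling. The sketch below uses Python's built-in `csv` module to count rows per label value. The function name `count_labels` and the `label` column name are assumptions for illustration; the actual column names depend on what you chose to export.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="label"):
    """Count rows per label value in an exported CSV.

    The default column name "label" is an assumption; pass the name
    that appears in your own export.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # uses the header row for column names
        return Counter(row[label_column] for row in reader)
```

A quick count like this is one way to check class balance in the exported training set before using it elsewhere.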
Export Studio dataset
Click Export Studio dataset to export your dataset into a CSV file. You can specify which columns to export as well as which model to include predictions from.