Using data slices
A slice is a filtered subset of data rows that share a specific characteristic, like a topic, language, or error type. By using filters, users can create slices to focus on high-impact data areas for analysis, debugging, and targeted model training.
There are two types of slices:
- Manual slices: Individual datapoints are directly assigned to a slice.
- Programmatic slices: A slicing function is applied across the dataset to automatically group relevant datapoints. Slicing functions can be composed from pre-built templates.
Slices have some limitations:
- You cannot create batches from a slice.
- Slices cannot be directly used for model training.
- Slice definitions cannot be exported between projects.
When to use slices
There are two primary use cases for slices within Snorkel Flow, one for annotators and one for data scientists.
Annotator
As an annotator, you can create slices to focus on specific data segments that need careful labeling or review, which helps prioritize data that aligns with specific project goals.
- For more information about how to annotate, see Walkthrough for annotators.
- For more information about how to review annotations, see Walkthrough for reviewers.
Data scientist
As a data scientist, you can use slices to analyze model performance on specific subsets and improve model accuracy in targeted areas. On the Evaluate page, you can track model performance across slices to identify where the model is excelling or needs improvement. In Studio, you can filter data by slice to focus on high-impact subsets for training and error analysis. Continue reading to learn more about how to create and use slices through the UI and SDK.
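To make per-slice performance tracking concrete, the following sketch computes accuracy separately for each slice. It uses plain Python rather than the Snorkel Flow SDK. The function name accuracy_by_slice, the slice names, and the toy labels are all hypothetical and serve only to illustrate the idea.

```python
def accuracy_by_slice(y_true, y_pred, slice_membership):
    """Return {slice_name: accuracy} for every non-empty slice.

    slice_membership maps a slice name to the set of row indices in that slice.
    """
    results = {}
    for slice_name, indices in slice_membership.items():
        if not indices:
            continue  # skip empty slices to avoid dividing by zero
        correct = sum(1 for i in indices if y_true[i] == y_pred[i])
        results[slice_name] = correct / len(indices)
    return results


# Toy data (hypothetical): six datapoints and two overlapping slices.
y_true = ["a", "b", "a", "a", "b", "b"]
y_pred = ["a", "b", "b", "a", "a", "b"]
slices = {
    "topic_disputes": {0, 1, 2},
    "non_english": {2, 3, 4, 5},
}

per_slice = accuracy_by_slice(y_true, y_pred, slices)
```

In this sketch a datapoint can belong to more than one slice, so the per-slice counts need not add up to the size of the dataset.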
Where to find slices
- Annotation Suite
- Studio Page
- Evaluate Page
- Batches
Edit slice membership
You can search for a slice, add a new slice, or check and uncheck current slices for the specific datapoint.
- Select the Slices icon.
- Select the Edit slice membership button.
- Edit the slice membership:
  - If you want to search for a slice, type the slice name in the Search or add slice bar.
  - If you want to add a new slice, type the slice name in the Search or add slice bar and select Create new slice.
  - If you want to check or uncheck current slices for the specific datapoint, select the checkbox next to the slice name.
Manage slices
You can search for a slice, delete a slice, or edit the name of a slice.
- Select the Slices icon.
- Select the Manage slices button.
- Manage the slice:
  - To search for a slice, type the slice name in the Search bar.
  - To delete a slice, select the Delete icon to the left of the slice name.
  - To edit the name of a slice, type the updated name in the specific slice bar.
Manage slices using the SDK
In the SDK, use the Slice class to create and manage slices within a dataset. Slices can be defined manually or programmatically, and are created and retrieved through the create() and get() methods.
- Initialize a slice. Use Slice.create() to define a new slice by specifying the dataset, name, and optional description. Add a configuration (SliceConfig) to define a slicing function, which automatically assigns datapoints to the slice based on criteria. A slice defined this way is a programmatic slice.
- Add or remove datapoints:
  - Manually: Use add_x_uids(x_uids) to add specific datapoints by UID to a slice.
  - Programmatically: Configure the slice using SliceConfig to automatically assign datapoints based on the criteria.
- Retrieve and list slices:
  - Use Slice.get(dataset, slice) to retrieve a specific slice.
  - Use Slice.list(dataset) to list all slices associated with a dataset.
- Update and manage slices:
  - Modify properties like name or description using update([name, description, config]).
  - Remove datapoints by UID from a slice using remove_x_uids(x_uids).
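The order of these calls can be sketched with a small in-memory stand-in. The class InMemorySlice below is hypothetical and is not the Snorkel Flow SDK. It mirrors only the method names listed above so the workflow can run without a Snorkel Flow instance, and the real Slice class may differ in arguments, return values, and behavior. The dataset ID and UIDs are made up for illustration.

```python
class InMemorySlice:
    """Hypothetical stand-in mirroring the method names described above."""

    _registry = {}  # maps (dataset, slice name) -> InMemorySlice

    def __init__(self, dataset, name, description=None):
        self.dataset = dataset
        self.name = name
        self.description = description
        self.x_uids = set()

    @classmethod
    def create(cls, dataset, name, description=None):
        new_slice = cls(dataset, name, description)
        cls._registry[(dataset, name)] = new_slice
        return new_slice

    @classmethod
    def get(cls, dataset, slice):
        return cls._registry[(dataset, slice)]

    @classmethod
    def list(cls, dataset):
        return [s for (d, _), s in cls._registry.items() if d == dataset]

    def add_x_uids(self, x_uids):
        self.x_uids.update(x_uids)

    def remove_x_uids(self, x_uids):
        self.x_uids.difference_update(x_uids)

    def update(self, name=None, description=None):
        if name is not None:
            # Re-key the registry entry so get() finds the renamed slice.
            del self._registry[(self.dataset, self.name)]
            self.name = name
            self._registry[(self.dataset, name)] = self
        if description is not None:
            self.description = description


# Create a manual slice, then adjust membership and rename it.
review = InMemorySlice.create(dataset=1, name="needs_review", description="Ambiguous rows")
review.add_x_uids(["doc::0", "doc::1", "doc::2"])
review.remove_x_uids(["doc::1"])
review.update(name="needs_second_review")

same = InMemorySlice.get(1, "needs_second_review")
names = [s.name for s in InMemorySlice.list(1)]
```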
Programmatic slices
To create a programmatic slice through the Python SDK, call Slice.create with a SliceConfig object that defines the criteria for including datapoints in the slice.
This code snippet is an example of how to create a programmatic slice using the regex template:
from snorkelflow.sdk.fine_tuning_app import FineTuningApp
from snorkelflow.sdk.slices import Slice, SliceConfig
from templates import RegexTemplateSchema
from snorkelflow.utils.graph import DEFAULT_GRAPH

# Look up the application whose dataset the slice belongs to.
app_name = "evaluation_workflow_example"
ft_app = FineTuningApp.get(app_name)

# Create a programmatic slice: datapoints whose "question" field matches
# the regex are assigned to the slice automatically.
topic_disputes_slice = Slice.create(
    name="topic_disputes",
    dataset=ft_app.dataset_uid,
    config=SliceConfig(
        templates=[
            RegexTemplateSchema(
                field="question",
                regex_pattern=r"\b(appeal|appealed|dispute|disputed|disputes)\b",
                case_sensitive=False,
            )
        ],
        graph=DEFAULT_GRAPH,
    ),
)
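To see which questions a pattern like this would select, it can help to try it locally first. The sketch below uses only Python's standard re module and assumes the regex template applies the pattern with Python search semantics, with re.IGNORECASE standing in for case_sensitive=False. The helper name matches_topic_disputes and the example questions are hypothetical.

```python
import re

# Same pattern as in the slice definition. re.IGNORECASE plays the role of
# case_sensitive=False (an assumption about how the template applies it).
PATTERN = re.compile(
    r"\b(appeal|appealed|dispute|disputed|disputes)\b", re.IGNORECASE
)


def matches_topic_disputes(question: str) -> bool:
    """Return True if the question contains one of the listed whole words."""
    return PATTERN.search(question) is not None


# Hypothetical questions mapped to the expected match result.
examples = {
    "How do I dispute a charge?": True,
    "My APPEAL was denied, what now?": True,
    "The customer is disputing the invoice.": False,  # "disputing" is not listed
    "This product is very appealing.": False,  # \b blocks partial-word matches
}
results = {q: matches_topic_disputes(q) for q in examples}
```

Because of the \b word boundaries, only the five listed word forms match. Variants such as "disputing" would need to be added to the pattern explicitly.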
After a programmatic slice is created, you can still adjust its membership manually:
- Slice Overrides: Let you adjust slice membership manually when an automatic slicing function doesn't fully capture the desired datapoints. When viewing a datapoint, you can use the checkbox to manually override its inclusion in a slice, either adding it to a slice that missed it or removing it from one it shouldn't be in.
- Inclusion List: Ensures specific datapoints are part of a slice, regardless of the slicing function's criteria.
- Exclusion List: Excludes specific datapoints from a slice, even if they meet the automatic slicing function's criteria.
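One way to picture how overrides combine with a slicing function is as set arithmetic: start from the function's matches, add the inclusion list, then remove the exclusion list. The sketch below is an illustration in plain Python, not the SDK's implementation. The function name resolve_membership and the UIDs are hypothetical. The text above does not say what happens if a datapoint is on both lists, so this sketch treats that case as an error.

```python
def resolve_membership(function_matches, inclusion_list, exclusion_list):
    """Combine slicing-function output with manual overrides.

    Assumes a UID is never on both lists; the precedence in that case is
    unspecified, so the conflict is rejected instead of guessed at.
    """
    matches = set(function_matches)
    inclusion = set(inclusion_list)
    exclusion = set(exclusion_list)

    conflict = inclusion & exclusion
    if conflict:
        raise ValueError(f"UIDs on both override lists: {sorted(conflict)}")

    # Inclusion adds datapoints the function missed; exclusion removes
    # datapoints the function matched but that should not be in the slice.
    return (matches | inclusion) - exclusion


final = resolve_membership(
    function_matches={"doc::1", "doc::2", "doc::3"},
    inclusion_list={"doc::7"},
    exclusion_list={"doc::2"},
)
```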