Version: 0.95

Using data slices

A slice is a filtered subset of data rows that share a specific characteristic, like a topic, language, or error type. By using filters, users can create slices to focus on high-impact data areas for analysis, debugging, and targeted model training.

There are two types of slices:

  • Manual slices: Individual datapoints are directly assigned to a slice.
  • Programmatic slices: A slicing function is applied across the dataset to automatically group relevant datapoints. Slicing functions can be composed from pre-built templates.
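To make the distinction concrete, here is a minimal, self-contained Python sketch. It is not Snorkel Flow code: the rows, the `x_uid` values, and the hypothetical `is_spanish` slicing function are made up for illustration. A manual slice is modeled as an explicit set of datapoint UIDs, while a programmatic slice is modeled as a function evaluated against every row.

```python
# Hypothetical rows; each datapoint is identified by a UID (x_uid).
rows = [
    {"x_uid": "doc::0", "language": "en", "text": "Where is my order?"},
    {"x_uid": "doc::1", "language": "es", "text": "Donde esta mi pedido?"},
    {"x_uid": "doc::2", "language": "en", "text": "Cancel my subscription."},
    {"x_uid": "doc::3", "language": "es", "text": "Cancelar mi suscripcion."},
]

# Manual slice: membership is exactly the UIDs someone assigned by hand.
manual_slice = {"doc::0", "doc::3"}

# Programmatic slice: a slicing function is evaluated on every row,
# so membership follows from the data rather than from hand assignment.
def is_spanish(row):
    return row["language"] == "es"

programmatic_slice = {row["x_uid"] for row in rows if is_spanish(row)}

print(sorted(manual_slice))        # ['doc::0', 'doc::3']
print(sorted(programmatic_slice))  # ['doc::1', 'doc::3']
```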

note

Slices have some limitations:

  • You cannot create batches from a slice.
  • Slices cannot be directly used for model training.
  • Slice definitions cannot be exported between projects.

When to use slices

There are two primary use cases for slices in Snorkel Flow, one for annotators and one for data scientists.

Annotator

As an annotator, you can create slices to focus on data segments that need careful labeling or review, so you can prioritize the data that matters most for your project's goals.

Data scientist

As a data scientist, you can use slices to analyze model performance on specific subsets of data and improve accuracy in targeted areas. On the Evaluate page, you can track model performance across slices to identify where the model excels and where it needs improvement. In Studio, you can filter data by slice to focus on high-impact subsets for training and error analysis. Continue reading to learn how to create and use slices through the UI and SDK.
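Tracking performance across slices amounts to computing a metric separately on each slice's datapoints. The sketch below shows the idea in plain Python with accuracy as the metric; the labels, predictions, and slice names are hypothetical, and the Evaluate page may compute and display its metrics differently.

```python
# Hypothetical gold labels and model predictions, keyed by datapoint UID.
gold = {"a": "yes", "b": "no", "c": "yes", "d": "no", "e": "yes"}
pred = {"a": "yes", "b": "yes", "c": "yes", "d": "no", "e": "no"}

# Hypothetical slice membership: slice name -> set of UIDs.
slices = {
    "topic_disputes": {"a", "b"},
    "non_english": {"c", "d", "e"},
}


def accuracy(uids):
    """Fraction of datapoints in `uids` whose prediction matches the gold label."""
    if not uids:
        return float("nan")
    correct = sum(1 for uid in uids if pred[uid] == gold[uid])
    return correct / len(uids)


per_slice = {name: accuracy(uids) for name, uids in slices.items()}
per_slice["overall"] = accuracy(gold.keys())

for name, acc in per_slice.items():
    print(f"{name}: {acc:.2f}")
# topic_disputes: 0.50
# non_english: 0.67
# overall: 0.60
```

A slice whose accuracy falls well below the overall figure, like `topic_disputes` here, is a natural candidate for targeted labeling or error analysis.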

Where to find slices

  • Annotation Suite
  • Studio page
  • Evaluate page
  • Batches

Edit slice membership

You can search for a slice, add a new slice, or add and remove the current datapoint from existing slices.

  1. Select the Slices icon.

  2. Select the Edit slice membership button.

  3. Edit the slice membership:

    • To search for a slice, type the slice name in the Search or add slice bar.
    • To add a new slice, type the slice name in the Search or add slice bar and select Create new slice.
    • To add or remove the current datapoint from an existing slice, select or clear the checkbox next to the slice name.

Manage slices

You can search for a slice, delete a slice, or edit the name of a slice.

  1. Select the Slices icon.

  2. Select the Manage slices button.

  3. Manage the slice:

    • To search for a slice, type the slice name in the Search bar.
    • To delete a slice, select the Delete icon to the left of the slice name.
    • To rename a slice, type the new name in that slice's name field.

Manage slices using the SDK

In the SDK, use the Slice class to create and manage slices (subsets of data) within a dataset. Slices can be defined manually or programmatically, and are created and accessed through the create() and get() methods.

  1. Initialize a slice. Use Slice.create() to define a new slice by specifying the dataset, name, and optional description. To create a programmatic slice, also pass a configuration (SliceConfig) that defines a slicing function; datapoints that meet its criteria are assigned to the slice automatically.

  2. Add or remove datapoints.

    • Manually: Use add_x_uids(x_uids) to add specific datapoints by UID to a slice.
    • Programmatically: Configure the slice using SliceConfig to automatically assign datapoints based on the criteria.
  3. Retrieve and list slices:

    • Use Slice.get(dataset, slice) to retrieve a specific slice.
    • Use Slice.list(dataset) to list all slices associated with a dataset.
  4. Update and manage slices:

    • Modify properties such as the name, description, or config using update([name, description, config]).
    • Remove datapoints by UID from a slice using remove_x_uids(x_uids).
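The following toy class walks through the same lifecycle in memory, so the steps can be run without a Snorkel Flow instance. It is a hypothetical stand-in, not the SDK: `ToySlice`, its class-level registry, and its signatures are assumptions that only mirror the method names listed above (`create`, `get`, `list`, `add_x_uids`, `remove_x_uids`, `update`).

```python
from typing import Dict, Optional, Set, Tuple


class ToySlice:
    """Hypothetical in-memory stand-in for a slice; NOT the Snorkel Flow SDK."""

    # Every toy slice, keyed by (dataset, slice name).
    _registry: Dict[Tuple[int, str], "ToySlice"] = {}

    def __init__(self, dataset: int, name: str, description: Optional[str] = None):
        self.dataset = dataset
        self.name = name
        self.description = description
        self.x_uids: Set[str] = set()

    @classmethod
    def create(cls, dataset, name, description=None):
        key = (dataset, name)
        if key in cls._registry:
            raise ValueError(f"slice {name!r} already exists in dataset {dataset}")
        new = cls(dataset, name, description)
        cls._registry[key] = new
        return new

    @classmethod
    def get(cls, dataset, slice_name):
        return cls._registry[(dataset, slice_name)]

    @classmethod
    def list(cls, dataset):
        return [s for (ds, _), s in cls._registry.items() if ds == dataset]

    def add_x_uids(self, x_uids):
        self.x_uids.update(x_uids)

    def remove_x_uids(self, x_uids):
        self.x_uids.difference_update(x_uids)

    def update(self, name=None, description=None):
        # Renaming also moves the registry key so get() finds the new name.
        if name is not None and name != self.name:
            if (self.dataset, name) in self._registry:
                raise ValueError(f"slice {name!r} already exists")
            del self._registry[(self.dataset, self.name)]
            self.name = name
            self._registry[(self.dataset, name)] = self
        if description is not None:
            self.description = description


DATASET = 1  # hypothetical dataset UID
disputes = ToySlice.create(DATASET, "topic_disputes", "Questions about appeals")
disputes.add_x_uids(["doc::0", "doc::2", "doc::5"])
disputes.remove_x_uids(["doc::5"])
disputes.update(name="disputes")

print(sorted(ToySlice.get(DATASET, "disputes").x_uids))  # ['doc::0', 'doc::2']
print([s.name for s in ToySlice.list(DATASET)])          # ['disputes']
```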

Programmatic slices

To create a programmatic slice through the Python SDK, call Slice.create with a SliceConfig object that defines the slicing function, that is, the criteria for including datapoints in the slice.

note

We only support Regex and Keyword templates for programmatic slices.
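As a rough mental model, each template can be thought of as a predicate over one field of a datapoint. The sketch below expresses a keyword-style check and a regex-style check as plain Python functions. The function names, parameters, and matching rules (whole-word keyword matching, optional case sensitivity) are assumptions for illustration; they are not the Snorkel Flow template classes, whose exact matching behavior may differ.

```python
import re
from typing import Iterable


def keyword_predicate(text: str, keywords: Iterable[str], case_sensitive: bool = False) -> bool:
    """Hypothetical keyword check: True if any keyword appears as a whole word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return any(re.search(rf"\b{re.escape(kw)}\b", text, flags) for kw in keywords)


def regex_predicate(text: str, pattern: str, case_sensitive: bool = False) -> bool:
    """Hypothetical regex check: True if the pattern matches anywhere in the text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


question = "I want to Dispute my bill"
print(keyword_predicate(question, ["refund", "dispute"]))              # True
print(keyword_predicate(question, ["dispute"], case_sensitive=True))   # False
print(regex_predicate(question, r"\bbill(s|ing)?\b"))                  # True
```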

The following code snippet creates a programmatic slice using the regex template:

```python
from snorkelflow.sdk.fine_tuning_app import FineTuningApp
from snorkelflow.sdk.slices import Slice, SliceConfig
from templates import RegexTemplateSchema
from snorkelflow.utils.graph import DEFAULT_GRAPH

# Look up the existing fine-tuning app whose dataset will hold the slice.
app_name = 'evaluation_workflow_example'
ft_app = FineTuningApp.get(app_name)

# Create a programmatic slice: datapoints whose "question" field contains
# one of the listed words (whole words, case-insensitive) join the slice.
topic_disputes_slice = Slice.create(
    name="topic_disputes",
    dataset=ft_app.dataset_uid,
    config=SliceConfig(
        templates=[
            RegexTemplateSchema(
                field="question",
                regex_pattern=r"\b(appeal|appealed|dispute|disputed|disputes)\b",
                case_sensitive=False,
            )
        ],
        graph=DEFAULT_GRAPH,
    ),
)
```

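To sanity-check a regex template before creating the slice, you can run the same pattern locally with Python's `re` module. This sketch assumes that `case_sensitive=False` corresponds to case-insensitive matching and that a datapoint joins the slice when the pattern matches anywhere in its `question` field; the sample questions are made up.

```python
import re

# Same pattern as in the RegexTemplateSchema example; case-insensitive.
pattern = re.compile(r"\b(appeal|appealed|dispute|disputed|disputes)\b", re.IGNORECASE)

questions = [
    "How do I appeal a parking ticket?",
    "The charge was DISPUTED last month.",
    "Can I file an appeals request online?",
    "What are your opening hours?",
]

matches = [bool(pattern.search(q)) for q in questions]
for q, m in zip(questions, matches):
    print(m, "-", q)
# True - How do I appeal a parking ticket?
# True - The charge was DISPUTED last month.
# False - Can I file an appeals request online?
# False - What are your opening hours?
```

Note that "appeals" does not match: the word boundaries (`\b`) require one of the listed word forms exactly, so add `appeals` to the alternation if that form should be included.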
You can refine the membership of a programmatic slice with the following:

  • Slice Overrides: Allows you to adjust slice memberships manually when an automatic slicing function doesn't fully capture the desired datapoints. When viewing a datapoint, you can use the checkbox to manually override its inclusion in a slice, either adding it to a slice it missed or removing it from one it shouldn't be in.
  • Inclusion List: Ensures specific datapoints are part of a slice, regardless of the slicing function's criteria.
  • Exclusion List: Excludes specific datapoints from a slice, even if they meet the slicing function's criteria.
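One way to reason about how overrides combine with a slicing function is as set arithmetic: the datapoints the function matches, plus the inclusion list, minus the exclusion list. The sketch below is a hypothetical model of that behavior, not Snorkel Flow's implementation; the `effective_membership` helper is made up, and letting the exclusion list win when a UID appears in both lists is an assumption.

```python
from typing import Callable, Dict, Iterable, Set

Row = Dict[str, str]


def effective_membership(
    rows: Iterable[Row],
    slicing_fn: Callable[[Row], bool],
    inclusion: Set[str],
    exclusion: Set[str],
) -> Set[str]:
    """Hypothetical model: (function matches | inclusion list) - exclusion list."""
    matched = {row["x_uid"] for row in rows if slicing_fn(row)}
    # Assumption: exclusion is applied last, so it wins over inclusion.
    return (matched | inclusion) - exclusion


rows = [
    {"x_uid": "doc::0", "question": "How do I dispute a fee?"},
    {"x_uid": "doc::1", "question": "I want to contest this fine."},
    {"x_uid": "doc::2", "question": "Dispute resolution office hours?"},
]


def mentions_dispute(row: Row) -> bool:
    return "dispute" in row["question"].lower()


members = effective_membership(
    rows,
    mentions_dispute,
    inclusion={"doc::1"},  # the function missed this paraphrase
    exclusion={"doc::2"},  # matched, but not actually a dispute
)
print(sorted(members))  # ['doc::0', 'doc::1']
```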