Snorkel AI Data Development Platform v25.8 (STS) release notes
Breaking changes
SDK
SDK module, function, class, and parameter list at bottom of documentation
Snorkel has removed a large number of older SDK functions in our ongoing effort to streamline the platform.
This list is long for 25.8, so refer to the SDK removals section at the end of this document.
Infrastructure
- JupyterHub is now disabled by default and will be removed in a future release (see deprecation notice below). If you rely on Snorkel-hosted notebooks, enable JupyterHub explicitly in your deployment configuration before upgrading.
  To re-enable in-platform notebook support, change the following flag in your Helm values.yaml file:

  Name                   Value
  services.jupyterhub    {"enabled": true}

- The ExploreNotebook example notebook was removed because it no longer contains relevant examples.
- MinIO was removed from the default deployment. Read about your options for Managing files in data storage.
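For the JupyterHub flag above, the following is a minimal sketch of how the Name/Value pair maps onto nested keys, assuming the dotted name services.jupyterhub corresponds to nested mappings in values.yaml (the usual Helm convention). The set_nested helper and the starting values dictionary are hypothetical stand-ins, not part of Snorkel or Helm.

```python
import json


def set_nested(values: dict, dotted_key: str, value) -> dict:
    """Set values[a][b]... = value for a dotted key such as "a.b", creating parents."""
    keys = dotted_key.split(".")
    node = values
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return values


# Hypothetical stand-in for the parsed contents of an existing values.yaml.
values = {"services": {"jupyterhub": {"enabled": False}}}

# The flag from the release note: name services.jupyterhub, value {"enabled": true}.
set_nested(values, "services.jupyterhub", json.loads('{"enabled": true}'))

print(json.dumps(values, indent=2))
```

The printed structure shows the nesting to reproduce in values.yaml: an enabled key under jupyterhub, which itself sits under services.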
Data management
- SliceConfig was removed. Migrate any code that imports or constructs SliceConfig to the new template-based slice APIs for create/update.
  - SliceConfig - Replaced with SliceTemplateGraph in slice operations
  - SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
- Tag IDs no longer identify slices. If you previously referenced slices via tag IDs, switch to using slice_uid everywhere.
- Transform types are now standardized. If your code branches on transform types, update it to use the type discriminator exposed by the SDK and API.
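Branching on the type discriminator could look roughly like the sketch below. It assumes transforms reach your code as dictionaries with a "type" key; the type names ("filter", "rename"), the payload fields, and the handler functions are all hypothetical and only illustrate the dispatch pattern. Check the SDK and API reference for the actual standardized values.

```python
from typing import Any, Callable, Dict

Transform = Dict[str, Any]


def handle_filter(transform: Transform) -> str:
    return f"filter on {transform.get('column')}"


def handle_rename(transform: Transform) -> str:
    return f"rename {transform.get('old')} -> {transform.get('new')}"


# Hypothetical type names and payload fields; only the "type" key comes from
# the release note.
HANDLERS: Dict[str, Callable[[Transform], str]] = {
    "filter": handle_filter,
    "rename": handle_rename,
}


def dispatch(transform: Transform) -> str:
    """Route a transform to its handler using the type discriminator."""
    kind = transform.get("type")
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unsupported transform type: {kind!r}") from None
    return handler(transform)
```

Raising on an unknown type, rather than silently skipping it, makes it obvious when a transform type appears that the code does not yet handle.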
Deprecations
Infrastructure
Built-in notebooks will be fully removed in 25.11. We recommend running the Snorkel SDK locally with your chosen notebook platform, and exporting the contents of any notebooks currently hosted by Snorkel to import into your platform. Follow the SDK quickstart to get started with running the SDK locally.
Features and improvements
Evaluation
- You can view detailed error messages for evaluator execution and metric computation failures directly in Benchmark run details, helping you diagnose issues faster.
- You can view the datapoints in each error cluster during error-mode analysis while tuning LLMAJ evaluators. This can give you additional insight into why the error occurs. Read more in the Improve LLMAJ alignment documentation.
SDK updates for evaluation
You can programmatically run error analysis with the new ErrorAnalysis class, making it easier to script and automate evaluation workflows.
Data management
- Slices are simpler to create. In the GUI, you can create a new data slice from an active search filter and apply the slice to new data. Learn more about using data slices.
- You can continue to define slices from the SDK using regex-style templates, but the format for those templates has been updated. See the Slice class documentation for details.
SDK
- The example DefaultNotebook is updated.
Docs
You can read the SDK reference docs for the Taxonomy Distillation SDK, an SDK extension for distilling taxonomies discovered by large language models into smaller, more efficient models. Reach out to your Snorkel contact to access this extension.
Bug fixes
Data management
- All users can list usable connectors without elevated permissions.
- Ingesting 100 or more data sources succeeds reliably.
- Running a filter that matches no rows no longer fails.
- Data filters accept context_uid without error.
Annotation
- The batch name displays correctly on the first load of the annotation page.
- Exporting multiple batches produces multiple batch exports, as expected.
Evaluation
- When viewing traces, the ground truth now renders correctly for all datapoints.
SDK
- Dataset.get_dataframe() returns more than 100 rows as expected.
- LabelSchema.create() supports multi-label use cases correctly.
Known issues
Data management
- The dataset size shown in the GUI does not always match the actual file size.
- Splitting a dataset by percentage can fail if file upload is disabled.
- Uploading large files from S3 shows an error on first submit during data ingestion, but works when submitted again.
- File upload errors are silently ignored on submit.
- Users without access to the default workspace receive a 404 error when trying to add a datasource to a dataset.
- Dataset preview incorrectly renders all content in rows.
User interface
- The breadcrumb navigation overlaps with page content while scrolling.
SDK removals
Entire modules removed
The annotation workflow has moved from function-based node-level operations to object-oriented dataset-level batch management. Use the Batch and LabelSchema classes in the develop SDK instead.
- snorkelai.sdk.client.annotations - All annotation interfaces including:
  - Annotation class
  - get_annotations
  - add_annotation
  - update_annotation
  - delete_annotation
  - add_annotations
  - aggregate_annotations
  - commit_annotations
  - get_interannotator_agreement
- snorkelai.sdk.client.comments - All comment interfaces including:
  - create_comment
  - get_comment
  - delete_comment
  - edit_comment
  - delete_datapoint_comments
  - get_comments
- snorkelai.sdk.client.transfer - All data transfer functions including:
  - export_ground_truth
  - import_ground_truth
  - convert_span_gt_csv_to_span_format
  - transfer_comments
  - transfer_tags
  - export_tag_types
  - import_tag_types
- snorkelai.sdk.client.gts - All ground truth label interfaces including:
  - create_ground_truth_version
  - list_ground_truth_versions
  - load_ground_truth_version
  - delete_ground_truth_version
  - add_ground_truth
  - get_ground_truth
  - get_document_ground_truth
  - get_span_level_ground_truth_conflicts
  - align_external_ground_truth
- snorkelai.sdk.client.file_storage_configs - All file storage configuration functions including:
  - get_file_storage_configs
  - get_file_storage_config
  - create_file_storage_config
  - set_default_file_storage_config
  - delete_file_storage_config
  - get_file_storage_config_uid
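One way to find affected code before upgrading is to scan your scripts for imports of the removed modules. The sketch below uses Python's standard ast module and takes its module names from the list above. The function name find_removed_imports is an illustrative choice, not part of the Snorkel SDK, and the scan only catches static import statements (not dynamic imports or attribute access such as client.annotations).

```python
import ast

# Module paths taken from the "Entire modules removed" list.
REMOVED_MODULES = (
    "snorkelai.sdk.client.annotations",
    "snorkelai.sdk.client.comments",
    "snorkelai.sdk.client.transfer",
    "snorkelai.sdk.client.gts",
    "snorkelai.sdk.client.file_storage_configs",
)


def _is_removed(module: str) -> bool:
    return any(module == m or module.startswith(m + ".") for m in REMOVED_MODULES)


def find_removed_imports(source: str):
    """Return sorted (line_number, module) pairs for imports of removed modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if _is_removed(alias.name):
                    hits.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if _is_removed(node.module):
                hits.append((node.lineno, node.module))
            else:
                # Catches "from snorkelai.sdk.client import annotations".
                for alias in node.names:
                    candidate = f"{node.module}.{alias.name}"
                    if _is_removed(candidate):
                        hits.append((node.lineno, candidate))
    return sorted(hits)


example = (
    "import snorkelai.sdk.client.gts as gts\n"
    "from snorkelai.sdk.client.annotations import get_annotations\n"
    "from snorkelai.sdk.client import comments, utils\n"
)
for lineno, module in find_removed_imports(example):
    print(f"line {lineno}: {module} was removed in 25.8")
```

To check a real file, pass its contents to the function, for example find_removed_imports(open(path).read()).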
Functions removed from existing modules
- snorkelai.sdk.client.utils:
  - get_application_uid
  - get_lf_uid
  - get_operator_uid
  - get_tag_type_uid
  - get_batch_uid
  - check_model_node
  - convert_fmt_user_to_raw
  - convert_fmt_user_to_raw_df
  - convert_fmt_raw_to_user
  - convert_fmt_raw_to_user_df
Classes replaced
- SliceConfig - Replaced with SliceTemplateGraph in slice operations
- SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
Parameter changes
- SnorkelSDKContext.from_endpoint_url - The minio_endpoint parameter was removed and is no longer required.
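If connection settings are stored in a shared dictionary or config file, a small filter can strip the removed parameter before the settings reach SnorkelSDKContext.from_endpoint_url. This is a minimal sketch: the helper drop_removed_params, the endpoint_url key, and the example URLs are assumptions for illustration. Only the minio_endpoint name comes from these notes.

```python
# Parameter names removed in 25.8, per the "Parameter changes" list.
REMOVED_PARAMS = ("minio_endpoint",)


def drop_removed_params(kwargs: dict):
    """Return (cleaned kwargs, names that were dropped) without mutating the input."""
    cleaned = {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}
    dropped = sorted(set(kwargs) - set(cleaned))
    return cleaned, dropped


# Hypothetical saved settings; the endpoint_url key name is an assumption.
old_settings = {
    "endpoint_url": "https://snorkel.example.com",
    "minio_endpoint": "https://minio.example.com",
}
new_settings, dropped = drop_removed_params(old_settings)
print("dropped:", dropped)

# new_settings can then be passed on to the SDK, for example as keyword
# arguments to SnorkelSDKContext.from_endpoint_url, assuming the remaining
# keys match that method's actual parameter names.
```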
Migration guidance for annotations
For annotation workflows, migrate from the removed annotations module functions to the develop SDK:
- Replace get_annotations() with Batch.get_dataframe(include_annotations=True)
- Replace commit_annotations() with Batch.commit(source_uid, label_schema_uids)
- Use Batch.create() and LabelSchema.create() for new annotation workflows
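The first two replacements can be sketched as thin wrappers. This assumes get_dataframe and commit are instance methods on a Batch object and take the arguments exactly as written in the list above; return types are not specified in these notes. StubBatch is a stand-in so the sketch runs without the SDK and is not part of Snorkel. In real code, pass a Batch obtained from the develop SDK instead.

```python
from typing import Any, Sequence


def load_annotations(batch: Any):
    """Replacement for get_annotations(): read the batch with annotations included."""
    return batch.get_dataframe(include_annotations=True)


def commit_batch(batch: Any, source_uid: Any, label_schema_uids: Sequence[Any]):
    """Replacement for commit_annotations(): commit through the batch object."""
    return batch.commit(source_uid, label_schema_uids)


class StubBatch:
    """Stand-in that records calls; it returns plain lists, not real dataframes."""

    def __init__(self):
        self.calls = []

    def get_dataframe(self, include_annotations=False):
        self.calls.append(("get_dataframe", include_annotations))
        row = {"row": 0}
        if include_annotations:
            row["annotation"] = "example-label"
        return [row]

    def commit(self, source_uid, label_schema_uids):
        self.calls.append(("commit", source_uid, list(label_schema_uids)))


stub = StubBatch()
rows = load_annotations(stub)
commit_batch(stub, 1, [10, 11])
print(stub.calls)
```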