Snorkel AI Data Development Platform v25.8 (STS) release notes

Breaking changes

SDK

SDK module, function, class, and parameter list at bottom of documentation

Snorkel has removed a large number of older SDK functions in our ongoing effort to streamline the platform.

This list is long for 25.8, so refer to the SDK removals section at the end of this document.

Infrastructure

JupyterHub is now disabled by default and will be removed in a future release (see deprecation notice below). If you rely on Snorkel-hosted notebooks, enable JupyterHub explicitly in your deployment configuration before upgrading.

To re-enable in-platform notebook support, change the following flag in your Helm values.yaml file:

Name	Value
services.jupyterhub	`{"enabled": true}`

The ExploreNotebook example notebook was removed because it no longer contains relevant examples.
MinIO was removed from the default deployment. Read about your options for Managing files in data storage.
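
As a rough illustration of the JupyterHub flag above, the following sketch expands the dotted name into the nested structure that a values override typically uses. It assumes that `services.jupyterhub` denotes nested keys (the usual Helm convention, not stated in these notes); the helper name `nest` is hypothetical.

```python
import json


def nest(dotted_name, value):
    """Expand a dotted value name into nested dictionaries."""
    nested = value
    # Wrap from the innermost key outward: jupyterhub first, then services.
    for key in reversed(dotted_name.split(".")):
        nested = {key: nested}
    return nested


# Assumption: the dotted name in the table maps to nested keys in values.yaml.
override = nest("services.jupyterhub", {"enabled": True})
print(json.dumps(override, indent=2))
```

The printed structure places `enabled` under `jupyterhub`, which in turn sits under `services`. Check the result against your own values.yaml layout before applying it.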


Data management

SliceConfig was removed. Migrate any code that imports or constructs SliceConfig to the new template-based slice APIs for create and update operations.
- SliceConfig - Replaced with SliceTemplateGraph in slice operations
- SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
Tag IDs no longer identify slices. If you previously referenced slices via tag IDs, switch to using slice_uid everywhere.
Transform types are now standardized. If your code branches on transform types, update it to use the type discriminator exposed by the SDK and API.

Deprecations

Infrastructure

Built-in notebooks will be fully removed in 25.11. We recommend running the Snorkel SDK locally with your chosen notebook platform, and exporting the contents of any notebooks currently hosted by Snorkel to import into your platform. Follow the SDK quickstart to get started with running the SDK locally.

Features and improvements

Evaluation

You can view detailed error messages for evaluator execution and metric computation failures directly in Benchmark run details, helping you diagnose issues faster.
You can view the datapoints in each error cluster during error-mode analysis while tuning LLMAJ evaluators. This can give you additional insight into why the error occurs. Read more in the Improve LLMAJ alignment documentation.

Data management

Slices are simpler to create. In the GUI, you can create a new data slice from an active search filter and then apply the slice to new data. Learn more about using data slices.
You can continue to define slices from the SDK using regex-style templates, but the format for those templates has been updated. See the Slice class documentation for details.

SDK

The example DefaultNotebook is updated.

Docs

You can read the SDK reference docs for the Taxonomy Distillation SDK, an SDK extension for distilling taxonomies discovered by large language models into smaller, more efficient models. Reach out to your Snorkel contact to access this extension.

Bug fixes

Data management

All users can list usable connectors without elevated permissions.
Ingesting 100 or more data sources succeeds reliably.
Running a filter that matches no rows no longer fails.
Data filters accept context_uid without error.

Annotation

The batch name displays correctly on the first load of the annotation page.
Exporting multiple batches produces multiple batch exports, as expected.

Evaluation

When viewing traces, the ground truth now renders correctly for all datapoints.

SDK

Dataset.get_dataframe() returns more than 100 rows as expected.
LabelSchema.create() supports multi-label use cases correctly.

Known issues

Data management

The dataset size shown in the GUI does not always match the actual file size.
Splitting a dataset by percentage can fail if file upload is disabled.
Uploading large files from S3 shows an error on first submit during data ingestion, but works when submitted again.
File upload errors are silently ignored on submit.
Users without access to the default workspace receive a 404 error when trying to add a datasource to a dataset.
Dataset preview incorrectly renders all content in rows.

User interface

The breadcrumb navigation overlaps with page content while scrolling.

SDK

The ErrorAnalysis class documentation is available, but the feature is not currently operative.

SDK removals

Entire modules removed

The annotation workflow has moved from function-based node-level operations to object-oriented dataset-level batch management. Use the Batch and LabelSchema classes in the develop SDK instead.

snorkelai.sdk.client.annotations - All annotation interfaces including:
- Annotation class
- get_annotations
- add_annotation
- update_annotation
- delete_annotation
- add_annotations
- aggregate_annotations
- commit_annotations
- get_interannotator_agreement
snorkelai.sdk.client.comments - All comment interfaces including:
- create_comment
- get_comment
- delete_comment
- edit_comment
- delete_datapoint_comments
- get_comments
snorkelai.sdk.client.transfer - All data transfer functions including:
- export_ground_truth
- import_ground_truth
- convert_span_gt_csv_to_span_format
- transfer_comments
- transfer_tags
- export_tag_types
- import_tag_types
snorkelai.sdk.client.gts - All ground truth label interfaces including:
- create_ground_truth_version
- list_ground_truth_versions
- load_ground_truth_version
- delete_ground_truth_version
- add_ground_truth
- get_ground_truth
- get_document_ground_truth
- get_span_level_ground_truth_conflicts
- align_external_ground_truth
snorkelai.sdk.client.file_storage_configs - All file storage configuration functions including:
- get_file_storage_configs
- get_file_storage_config
- create_file_storage_config
- set_default_file_storage_config
- delete_file_storage_config
- get_file_storage_config_uid

Functions removed from existing modules

snorkelai.sdk.client.utils:
- get_application_uid
- get_lf_uid
- get_operator_uid
- get_tag_type_uid
- get_batch_uid
- check_model_node
- convert_fmt_user_to_raw
- convert_fmt_user_to_raw_df
- convert_fmt_raw_to_user
- convert_fmt_raw_to_user_df
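
Before upgrading, it may help to audit existing code for imports of the modules and functions listed above. The following is a minimal sketch using only the Python standard library. The helper name `find_removed_imports` and the `other_helper` import in the sample are hypothetical. It inspects import statements only, so attribute access such as `utils.get_batch_uid(...)` after a plain module import is not detected.

```python
import ast

# Modules removed in 25.8, copied from the list above.
REMOVED_MODULES = {
    "snorkelai.sdk.client.annotations",
    "snorkelai.sdk.client.comments",
    "snorkelai.sdk.client.transfer",
    "snorkelai.sdk.client.gts",
    "snorkelai.sdk.client.file_storage_configs",
}

# Functions removed from snorkelai.sdk.client.utils, copied from the list above.
REMOVED_UTILS = {
    "get_application_uid", "get_lf_uid", "get_operator_uid",
    "get_tag_type_uid", "get_batch_uid", "check_model_node",
    "convert_fmt_user_to_raw", "convert_fmt_user_to_raw_df",
    "convert_fmt_raw_to_user", "convert_fmt_raw_to_user_df",
}


def find_removed_imports(source):
    """Return sorted (line, name) pairs for imports that 25.8 no longer provides."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import snorkelai.sdk.client.gts [as gts]
            for alias in node.names:
                if alias.name in REMOVED_MODULES:
                    findings.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module in REMOVED_MODULES:
                # from snorkelai.sdk.client.annotations import ...
                findings.append((node.lineno, node.module))
                continue
            for alias in node.names:
                full = f"{node.module}.{alias.name}"
                # from snorkelai.sdk.client import annotations
                if full in REMOVED_MODULES:
                    findings.append((node.lineno, full))
                # from snorkelai.sdk.client.utils import get_batch_uid
                elif (node.module == "snorkelai.sdk.client.utils"
                      and alias.name in REMOVED_UTILS):
                    findings.append((node.lineno, full))
    return sorted(findings)


# Sample input; other_helper is a made-up name that should not be flagged.
sample = """
from snorkelai.sdk.client.annotations import get_annotations
from snorkelai.sdk.client.utils import get_batch_uid, other_helper
import snorkelai.sdk.client.gts as gts
"""

for lineno, name in find_removed_imports(sample):
    print(f"line {lineno}: {name}")
```

To audit a project, read each `.py` file and pass its contents to `find_removed_imports`.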

Classes replaced

SliceConfig - Replaced with SliceTemplateGraph in slice operations
SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem

Parameter changes

SnorkelSDKContext.from_endpoint_url - The minio_endpoint parameter was removed and is no longer required.
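
Calls that still pass the removed parameter can be located in a similar way. The sketch below flags any `from_endpoint_url(...)` call that supplies a `minio_endpoint` keyword. The helper name is hypothetical, and the sample call shows only the removed keyword because these notes do not describe the remaining parameters; `other_kwargs` is a placeholder.

```python
import ast


def find_minio_endpoint_calls(source):
    """Return line numbers of from_endpoint_url(...) calls passing minio_endpoint."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both SnorkelSDKContext.from_endpoint_url(...) and a bare name.
        if isinstance(func, ast.Attribute):
            name = func.attr
        elif isinstance(func, ast.Name):
            name = func.id
        else:
            continue
        if name == "from_endpoint_url" and any(
            kw.arg == "minio_endpoint" for kw in node.keywords
        ):
            lines.append(node.lineno)
    return sorted(lines)


# Sample input; the source is parsed, never executed.
sample = '''
ctx = SnorkelSDKContext.from_endpoint_url(
    minio_endpoint="https://minio.example.com",
    **other_kwargs,
)
'''

print(find_minio_endpoint_calls(sample))
```

Removing the keyword from each flagged call should be sufficient, since the notes state that the parameter is no longer required.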

Migration guidance for annotations

For annotation workflows, migrate from the removed annotations module functions to the develop SDK:

Replace get_annotations() with Batch.get_dataframe(include_annotations=True)
Replace commit_annotations() with Batch.commit(source_uid, label_schema_uids)
Use Batch.create() and LabelSchema.create() for new annotation workflows
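
The replacements above can be wrapped in thin helpers so that call sites change in one place. This is only a sketch: it assumes `get_dataframe` and `commit` are called on a Batch instance obtained elsewhere, and it does not import the SDK because the import path and return types are not given in these notes. `_RecordingBatch` is a hypothetical stand-in used here to show the calls without a Snorkel deployment.

```python
def fetch_annotations(batch):
    """Stand-in for the removed get_annotations()."""
    return batch.get_dataframe(include_annotations=True)


def commit_batch(batch, source_uid, label_schema_uids):
    """Stand-in for the removed commit_annotations()."""
    return batch.commit(source_uid, label_schema_uids)


class _RecordingBatch:
    """Hypothetical stub that records calls; not part of the Snorkel SDK."""

    def __init__(self):
        self.calls = []

    def get_dataframe(self, include_annotations=False):
        self.calls.append(("get_dataframe", include_annotations))

    def commit(self, source_uid, label_schema_uids):
        self.calls.append(("commit", source_uid, label_schema_uids))


# Placeholder identifiers, chosen only for illustration.
stub = _RecordingBatch()
fetch_annotations(stub)
commit_batch(stub, 7, [1, 2])
print(stub.calls)
```

In real code, pass a Batch object from the develop SDK instead of the stub, and consult the Batch class documentation for the exact argument types.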
