Snorkel AI Data Development Platform v25.8 (STS) release notes
Breaking changes
SDK
SDK module, function, class, and parameter list at bottom of documentation
Snorkel has removed a large number of older SDK functions in our ongoing effort to streamline the platform.
This list is long for 25.8, so refer to the SDK removals section at the end of this document.
Infrastructure
- JupyterHub is now disabled by default and will be removed in a future release (see deprecation notice below). If you rely on Snorkel-hosted notebooks, enable JupyterHub explicitly in your deployment configuration before upgrading.
  To re-enable in-platform notebook support, set the following flag in your Helm values.yaml file:
  Name: services.jupyterhub
  Value: {"enabled": true}
- The ExploreNotebook example notebook was removed because it no longer contains relevant examples.
- MinIO was removed from the default deployment. Read about your options for Managing files in data storage.
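As a rough illustration of the JupyterHub flag, the following Python sketch expands the dotted name services.jupyterhub into the nested mapping that a values.yaml file would typically hold. It assumes the dotted name maps onto nested YAML keys, as is conventional for Helm values. The helper expand_dotted_key is hypothetical and not part of any Snorkel tooling.

```python
import json


def expand_dotted_key(name: str, value):
    """Expand a dotted key such as "a.b" into nested dicts: {"a": {"b": value}}."""
    nested = value
    # Wrap the value from the innermost key outward.
    for part in reversed(name.split(".")):
        nested = {part: nested}
    return nested


# Flag from the release notes: name "services.jupyterhub", value {"enabled": true}.
# Assumption: the dotted name corresponds to nested keys in values.yaml.
jupyterhub_override = expand_dotted_key("services.jupyterhub", {"enabled": True})

# JSON is also valid YAML, so this text can be merged into values.yaml by hand.
print(json.dumps(jupyterhub_override, indent=2))
```

Check the result against the structure of your own values.yaml before applying it, since chart layouts can differ between deployments.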
Data management
- SliceConfig removed. Migrate any code that imports or constructs SliceConfig to the new template-based slice APIs for create/update.
  - SliceConfig - Replaced with SliceTemplateGraph in slice operations
  - SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
- Tag IDs no longer identify slices. If you previously referenced slices via tag IDs, switch to using slice_uid everywhere.
- Transform types are now standardized. If your code branches on transform types, update it to use the type discriminator exposed by the SDK and API.
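For code that branches on transform types, one possible pattern is to dispatch on the type field instead of inspecting class names or payload shape. The sketch below is illustrative only: the transform payloads, the type values ("regex", "threshold"), and the handler names are hypothetical placeholders, not values defined by the Snorkel SDK or API.

```python
from typing import Any, Callable, Dict


def handle_regex(transform: Dict[str, Any]) -> str:
    return f"regex transform with pattern {transform['pattern']!r}"


def handle_threshold(transform: Dict[str, Any]) -> str:
    return f"threshold transform at {transform['value']}"


# Hypothetical type values; substitute the ones your SDK version reports.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "regex": handle_regex,
    "threshold": handle_threshold,
}


def describe_transform(transform: Dict[str, Any]) -> str:
    """Dispatch on the `type` discriminator and fail loudly on unknown types."""
    transform_type = transform.get("type")
    handler = HANDLERS.get(transform_type)
    if handler is None:
        raise ValueError(f"Unsupported transform type: {transform_type!r}")
    return handler(transform)


print(describe_transform({"type": "regex", "pattern": "refund"}))
```

Raising on unknown types, rather than silently skipping them, makes it easier to notice transform types that your code does not yet handle.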
Deprecations
Infrastructure
Built-in notebooks will be fully removed in 25.11. We recommend running the Snorkel SDK locally with your chosen notebook platform, and exporting the contents of any notebooks currently hosted by Snorkel to import into your platform. Follow the SDK quickstart to get started with running the SDK locally.
Features and improvements
Evaluation
- You can view detailed error messages for evaluator execution and metric computation failures directly in Benchmark run details, helping you diagnose issues faster.
- You can view the datapoints in each error cluster during error-mode analysis while tuning LLMAJ evaluators. This can give you additional insight into why the error occurs. Read more in the Improve LLMAJ alignment documentation.
Data management
- Slices are simpler to create. In the GUI, you can create a new data slice from an active search filter and then apply the slice to new data. Learn more about using data slices.
- You can continue to define slices from the SDK using regex-style templates, but the format for those templates has been updated. See the Slice class documentation for details.
SDK
- The example DefaultNotebook is updated.
Docs
You can read the SDK reference docs for the Taxonomy Distillation SDK, an SDK extension for distilling taxonomies discovered by large language models into smaller, more efficient models. Reach out to your Snorkel contact to access this extension.
Bug fixes
Data management
- All users can list usable connectors without elevated permissions.
- Ingesting 100 or more data sources succeeds reliably.
- Running a filter that matches no rows no longer fails.
- Data filters accept context_uid without error.
Annotation
- The batch name displays correctly on the first load of the annotation page.
- Exporting multiple batches produces multiple batch exports, as expected.
Evaluation
- When viewing traces, the ground truth now renders correctly for all datapoints.
SDK
- Dataset.get_dataframe() returns more than 100 rows as expected.
- LabelSchema.create() supports multi-label use cases correctly.
Known issues
Data management
- The dataset size shown in the GUI does not always match the actual file size.
- Splitting a dataset by percentage can fail if file upload is disabled.
- Uploading large files from S3 shows an error on first submit during data ingestion, but works when submitted again.
- File upload errors are silently ignored on submit.
- Users without access to the default workspace receive a 404 error when trying to add a datasource to a dataset.
- Dataset preview incorrectly renders all content in rows.
User interface
- The breadcrumb navigation overlaps with page content while scrolling.
SDK
- The ErrorAnalysis class documentation is available, but the feature is not currently operative.
SDK removals
Entire modules removed
The annotation workflow has moved from function-based node-level operations to
object-oriented dataset-level batch management. Use the Batch and
LabelSchema classes in the develop SDK instead.
- snorkelai.sdk.client.annotations - All annotation interfaces including:
  - Annotation class
  - get_annotations
  - add_annotation
  - update_annotation
  - delete_annotation
  - add_annotations
  - aggregate_annotations
  - commit_annotations
  - get_interannotator_agreement
- snorkelai.sdk.client.comments - All comment interfaces including:
  - create_comment
  - get_comment
  - delete_comment
  - edit_comment
  - delete_datapoint_comments
  - get_comments
- snorkelai.sdk.client.transfer - All data transfer functions including:
  - export_ground_truth
  - import_ground_truth
  - convert_span_gt_csv_to_span_format
  - transfer_comments
  - transfer_tags
  - export_tag_types
  - import_tag_types
- snorkelai.sdk.client.gts - All ground truth label interfaces including:
  - create_ground_truth_version
  - list_ground_truth_versions
  - load_ground_truth_version
  - delete_ground_truth_version
  - add_ground_truth
  - get_ground_truth
  - get_document_ground_truth
  - get_span_level_ground_truth_conflicts
  - align_external_ground_truth
- snorkelai.sdk.client.file_storage_configs - All file storage configuration functions including:
  - get_file_storage_configs
  - get_file_storage_config
  - create_file_storage_config
  - set_default_file_storage_config
  - delete_file_storage_config
  - get_file_storage_config_uid
Functions removed from existing modules
- snorkelai.sdk.client.utils:
  - get_application_uid
  - get_lf_uid
  - get_operator_uid
  - get_tag_type_uid
  - get_batch_uid
  - check_model_node
  - convert_fmt_user_to_raw
  - convert_fmt_user_to_raw_df
  - convert_fmt_raw_to_user
  - convert_fmt_raw_to_user_df
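Before upgrading, it may help to audit your own code for imports of the removed modules and functions listed above. The following is a minimal sketch using Python's standard ast module. It does not import the Snorkel SDK, and the function find_removed_imports is a hypothetical helper, not part of the SDK. It only inspects import statements, so attribute access such as utils.get_lf_uid(...) after a plain module import is not detected.

```python
import ast

# Modules removed in 25.8, per the list above.
REMOVED_MODULES = {
    "snorkelai.sdk.client.annotations",
    "snorkelai.sdk.client.comments",
    "snorkelai.sdk.client.transfer",
    "snorkelai.sdk.client.gts",
    "snorkelai.sdk.client.file_storage_configs",
}

# Functions removed from snorkelai.sdk.client.utils, per the list above.
UTILS_MODULE = "snorkelai.sdk.client.utils"
REMOVED_UTILS = {
    "get_application_uid", "get_lf_uid", "get_operator_uid",
    "get_tag_type_uid", "get_batch_uid", "check_model_node",
    "convert_fmt_user_to_raw", "convert_fmt_user_to_raw_df",
    "convert_fmt_raw_to_user", "convert_fmt_raw_to_user_df",
}


def find_removed_imports(source: str) -> list:
    """Return sorted (line_number, removed_name) pairs found in import statements."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. `import snorkelai.sdk.client.gts`
            for alias in node.names:
                if alias.name in REMOVED_MODULES:
                    findings.append((node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module in REMOVED_MODULES:
                # e.g. `from snorkelai.sdk.client.gts import add_ground_truth`
                findings.append((node.lineno, node.module))
            elif node.module == UTILS_MODULE:
                # Only some utils functions were removed; report those by name.
                for alias in node.names:
                    if alias.name in REMOVED_UTILS:
                        findings.append((node.lineno, f"{UTILS_MODULE}.{alias.name}"))
            else:
                # e.g. `from snorkelai.sdk.client import gts`
                for alias in node.names:
                    full_name = f"{node.module}.{alias.name}"
                    if full_name in REMOVED_MODULES:
                        findings.append((node.lineno, full_name))
    return sorted(findings)


# Example input; `keep_me` is a made-up name standing in for a function that was not removed.
example = (
    "from snorkelai.sdk.client.annotations import get_annotations\n"
    "from snorkelai.sdk.client.utils import get_lf_uid, keep_me\n"
    "from snorkelai.sdk.client import gts\n"
)

for lineno, name in find_removed_imports(example):
    print(f"line {lineno}: {name}")
```

To scan a project, read each .py file and pass its contents to find_removed_imports.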
Classes replaced
- SliceConfig - Replaced with SliceTemplateGraph in slice operations
- SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
Parameter changes
- SnorkelSDKContext.from_endpoint_url - The minio_endpoint parameter was removed and is no longer required.
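Calls that still pass minio_endpoint as a keyword argument can be located with a small static check. The sketch below uses Python's standard ast module and does not import the Snorkel SDK. The helper find_minio_endpoint_calls is hypothetical. In the example input, every argument name other than minio_endpoint is a placeholder, as are the URLs. Positional arguments and **kwargs are not inspected.

```python
import ast


def find_minio_endpoint_calls(source: str) -> list:
    """Return line numbers of from_endpoint_url(...) calls that pass minio_endpoint."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match `SnorkelSDKContext.from_endpoint_url(...)` and a bare `from_endpoint_url(...)`.
        if isinstance(func, ast.Attribute):
            name = func.attr
        else:
            name = getattr(func, "id", None)
        if name != "from_endpoint_url":
            continue
        if any(keyword.arg == "minio_endpoint" for keyword in node.keywords):
            lines.append(node.lineno)
    return sorted(lines)


# Example input; `endpoint` and both URLs are placeholders, not documented values.
example = (
    "ctx = SnorkelSDKContext.from_endpoint_url(\n"
    "    endpoint='https://snorkel.example.com',\n"
    "    minio_endpoint='https://minio.example.com',\n"
    ")\n"
)

print(find_minio_endpoint_calls(example))
```

Each reported line number marks where the call begins, so the fix is to delete the minio_endpoint argument from that call.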
Migration guidance for annotations
For annotation workflows, migrate from the removed annotations module
functions to the develop SDK:
- Replace get_annotations() with Batch.get_dataframe(include_annotations=True)
- Replace commit_annotations() with Batch.commit(source_uid, label_schema_uids)
- Use Batch.create() and LabelSchema.create() for new annotation workflows
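The replacements above can be sketched as a small helper that works against any object exposing the two Batch calls named in this guidance. This is a sketch under stated assumptions, not the SDK's definitive interface: BatchLike is a hypothetical structural stand-in, the argument types and the default for include_annotations are assumptions, and how you obtain a Batch instance is not covered here. Consult the Batch class documentation for the actual signatures.

```python
from typing import Any, List, Protocol


class BatchLike(Protocol):
    """Hypothetical stand-in for the develop SDK's Batch; covers only the two calls named above."""

    def get_dataframe(self, include_annotations: bool = False) -> Any: ...

    def commit(self, source_uid: int, label_schema_uids: List[int]) -> Any: ...


def review_and_commit(
    batch: BatchLike, source_uid: int, label_schema_uids: List[int]
) -> Any:
    """Fetch a batch's annotations, then commit them for the given label schemas."""
    # Previously: get_annotations(...)
    annotations = batch.get_dataframe(include_annotations=True)
    # Previously: commit_annotations(...)
    batch.commit(source_uid, label_schema_uids)
    return annotations
```

Because the helper depends only on the call shapes, it can be exercised with a simple fake object in unit tests before you run it against a live deployment.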