
Snorkel AI Data Development Platform v25.8 (STS) release notes

Breaking changes

SDK

Snorkel has removed a large number of older SDK modules, functions, classes, and parameters as part of an ongoing effort to streamline the platform. Because the list for 25.8 is long, it appears in the SDK removals section at the end of this document.

Infrastructure

  • JupyterHub is now disabled by default and will be removed in a future release (see deprecation notice below). If you rely on Snorkel-hosted notebooks, enable JupyterHub explicitly in your deployment configuration before upgrading.

    To re-enable in-platform notebook support, change the following flag in your Helm values.yaml file:

    Name                   Value
    services.jupyterhub    {"enabled": true}
  • The ExploreNotebook example notebook was removed because it no longer contains relevant examples.

  • MinIO was removed from the default deployment. To learn about your options, see Managing files in data storage.
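If you keep Helm overrides in a separate file, the JupyterHub setting above can also be generated programmatically. The Python sketch below assumes that the dotted name `services.jupyterhub` means the nested keys `services`, `jupyterhub`, `enabled` in values.yaml; confirm that against your chart before applying it. It prints the override as JSON, which is also valid YAML, so you can redirect the output to a file and pass it to `helm upgrade` with `-f`. The helper names are hypothetical and not part of any Snorkel tooling.

```python
import json
from typing import Any


def set_dotted(values: dict[str, Any], dotted_name: str, value: Any) -> dict[str, Any]:
    """Merge `value` into `values` at the nested path named by `dotted_name`.

    If the target already holds a mapping and `value` is a mapping, the two are
    merged key by key so that sibling settings in the override are preserved.
    """
    *parents, leaf = dotted_name.split(".")
    node = values
    for key in parents:
        # Create intermediate mappings as needed (e.g. "services").
        node = node.setdefault(key, {})
    if isinstance(value, dict) and isinstance(node.get(leaf), dict):
        node[leaf].update(value)
    else:
        node[leaf] = value
    return values


def build_jupyterhub_override() -> dict[str, Any]:
    # Name/Value pair from the release notes table; the nesting is an assumption.
    return set_dotted({}, "services.jupyterhub", {"enabled": True})


if __name__ == "__main__":
    # Redirect stdout to a file, then pass that file to `helm upgrade -f`.
    print(json.dumps(build_jupyterhub_override(), indent=2))
```

For example, `python jupyterhub_override.py > jupyterhub-override.json` (both file names are illustrative).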

Data management

  • SliceConfig and SliceConfigTemplatesItem were removed. Migrate any code that imports or constructs them to the new template-based slice APIs for creating and updating slices:
    • SliceConfig - Replaced with SliceTemplateGraph in slice operations
    • SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem
  • Tag IDs no longer identify slices. If you previously referenced slices via tag IDs, switch to using slice_uid everywhere.
  • Transform types are now standardized. If your code branches on transform types, update it to use the type discriminator exposed by the SDK and API.
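A first pass of the SliceConfig migration can be automated by locating the old class names in your code. The sketch below uses word-boundary regular expressions so that `SliceConfig` does not also match inside `SliceConfigTemplatesItem`, and reports each line with the replacement name from these notes. It only points at references: the release notes do not say that `SliceTemplateGraph` accepts the same arguments as `SliceConfig`, so port constructor arguments by hand using the Slice class documentation. The function name is hypothetical.

```python
import re

# Removed class name -> replacement named in the 25.8 release notes.
SLICE_RENAMES = {
    "SliceConfig": "SliceTemplateGraph",
    "SliceConfigTemplatesItem": "SliceTemplateGraphTemplatesItem",
}

# \b word boundaries keep "SliceConfig" from matching the start of
# "SliceConfigTemplatesItem" (the next character "T" is a word character).
_PATTERNS = {old: re.compile(rf"\b{re.escape(old)}\b") for old in SLICE_RENAMES}


def find_slice_config_refs(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_name, suggested_name) for every match.

    Matches inside comments and strings are reported too; review each one.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, pattern in _PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, old, SLICE_RENAMES[old]))
    return hits


if __name__ == "__main__":
    sample = "config = SliceConfig()\nitem = SliceConfigTemplatesItem()\n"
    for lineno, old, new in find_slice_config_refs(sample):
        print(f"line {lineno}: {old} -> {new}")
```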

Deprecations

Infrastructure

Built-in notebooks will be fully removed in 25.11. We recommend running the Snorkel SDK locally in the notebook platform of your choice; export the contents of any notebooks currently hosted by Snorkel and import them into that platform. Follow the SDK quickstart to get started with running the SDK locally.
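Exported notebooks are .ipynb files, which are JSON documents in the Jupyter notebook format, and most notebook platforms import them directly. If you would rather run the code as a plain script next to a local SDK install, the standard-library sketch below concatenates a notebook's code cells. It is a simplified stand-in for `jupyter nbconvert --to script`: it drops outputs and markdown cells, and it comments out IPython magics and shell escapes (lines starting with `%` or `!`) instead of translating them. The function name is hypothetical.

```python
import json
import sys


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of an nbformat v4 notebook as one Python script."""
    nb = json.loads(notebook_json)
    chunks = []
    # `index` is the cell's position in the notebook, counting all cell types.
    for index, cell in enumerate(nb.get("cells", []), start=1):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows "source" to be a string or a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        lines = []
        for line in text.splitlines():
            # IPython magics and shell escapes are not valid Python; comment them out.
            if line.lstrip().startswith(("%", "!")):
                line = "# " + line
            lines.append(line)
        chunks.append(f"# --- cell {index} ---\n" + "\n".join(lines))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    # Usage: python nb_to_script.py exported.ipynb > exported.py
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as fh:
            sys.stdout.write(notebook_to_script(fh.read()))
```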

Features and improvements

Evaluation

  • You can view detailed error messages for evaluator execution and metric computation failures directly in Benchmark run details, helping you diagnose issues faster.
  • You can view the datapoints in each error cluster during error-mode analysis while tuning LLM-as-a-judge (LLMAJ) evaluators, which gives you additional insight into why each error occurs. Read more in the Improve LLMAJ alignment documentation.

SDK updates for evaluation

You can programmatically run error analysis with the new ErrorAnalysis class, making it easier to script and automate evaluation workflows.

Data management

  • Slices are simpler to create. You can create a new data slice in the GUI from an active search filter and apply that slice to new data. Learn more about using data slices.
  • You can continue to define slices from the SDK using regex-style templates, but the format for those templates has been updated. See the Slice class documentation for details.

SDK

  • The DefaultNotebook example notebook has been updated.

Docs

You can read the SDK reference docs for the Taxonomy Distillation SDK, an SDK extension for distilling taxonomies discovered by large language models into smaller, more efficient models. Reach out to your Snorkel contact to access this extension.

Bug fixes

Data management

  • All users can list usable connectors without elevated permissions.
  • Ingesting 100 or more data sources succeeds reliably.
  • Running a filter that matches no rows no longer fails.
  • Data filters accept context_uid without error.

Annotation

  • The batch name displays correctly on the first load of the annotation page.
  • Exporting multiple batches produces multiple batch exports, as expected.

Evaluation

  • When viewing traces, the ground truth now renders correctly for all datapoints.

SDK

  • Dataset.get_dataframe() returns more than 100 rows as expected.
  • LabelSchema.create() supports multi-label use cases correctly.

Known issues

Data management

  • The dataset size shown in the GUI does not always match the actual file size.
  • Splitting a dataset by percentage can fail if file upload is disabled.
  • During data ingestion, uploading large files from S3 shows an error on the first submit but succeeds when submitted again.
  • File upload errors are silently ignored on submit.
  • Users without access to the default workspace receive a 404 error when trying to add a datasource to a dataset.
  • Dataset preview incorrectly renders all content in rows.

User interface

  • The breadcrumb navigation overlaps with page content while scrolling.

SDK removals

Entire modules removed

The annotation workflow has moved from function-based node-level operations to object-oriented dataset-level batch management. Use the Batch and LabelSchema classes in the develop SDK instead.

  • snorkelai.sdk.client.annotations - All annotation interfaces including:
    • Annotation class
    • get_annotations
    • add_annotation
    • update_annotation
    • delete_annotation
    • add_annotations
    • aggregate_annotations
    • commit_annotations
    • get_interannotator_agreement
  • snorkelai.sdk.client.comments - All comment interfaces including:
    • create_comment
    • get_comment
    • delete_comment
    • edit_comment
    • delete_datapoint_comments
    • get_comments
  • snorkelai.sdk.client.transfer - All data transfer functions including:
    • export_ground_truth
    • import_ground_truth
    • convert_span_gt_csv_to_span_format
    • transfer_comments
    • transfer_tags
    • export_tag_types
    • import_tag_types
  • snorkelai.sdk.client.gts - All ground truth label interfaces including:
    • create_ground_truth_version
    • list_ground_truth_versions
    • load_ground_truth_version
    • delete_ground_truth_version
    • add_ground_truth
    • get_ground_truth
    • get_document_ground_truth
    • get_span_level_ground_truth_conflicts
    • align_external_ground_truth
  • snorkelai.sdk.client.file_storage_configs - All file storage configuration functions including:
    • get_file_storage_configs
    • get_file_storage_config
    • create_file_storage_config
    • set_default_file_storage_config
    • delete_file_storage_config
    • get_file_storage_config_uid

Functions removed from existing modules

  • snorkelai.sdk.client.utils:
    • get_application_uid
    • get_lf_uid
    • get_operator_uid
    • get_tag_type_uid
    • get_batch_uid
    • check_model_node
    • convert_fmt_user_to_raw
    • convert_fmt_user_to_raw_df
    • convert_fmt_raw_to_user
    • convert_fmt_raw_to_user_df

Classes replaced

  • SliceConfig - Replaced with SliceTemplateGraph in slice operations
  • SliceConfigTemplatesItem - Replaced with SliceTemplateGraphTemplatesItem

Parameter changes

  • SnorkelSDKContext.from_endpoint_url - The minio_endpoint parameter was removed and is no longer required.

Migration guidance for annotations

For annotation workflows, migrate from the removed annotations module functions to the develop SDK:

  • Replace get_annotations() with Batch.get_dataframe(include_annotations=True)
  • Replace commit_annotations() with Batch.commit(source_uid, label_schema_uids)
  • Use Batch.create() and LabelSchema.create() for new annotation workflows
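The guidance above names direct replacements for only two functions. A sketch like the following can help triage existing code by listing each call to a removed annotation function with the replacement suggested in these notes, or with the general Batch.create() and LabelSchema.create() advice when no one-to-one mapping is given. It is a hypothetical helper that matches on the called name only, so a same-named method on an unrelated object would also be reported. The hint strings are text copied from this section, not calls the helper makes.

```python
import ast

# One-to-one replacements listed in the 25.8 migration guidance.
DIRECT_REPLACEMENTS = {
    "get_annotations": "Batch.get_dataframe(include_annotations=True)",
    "commit_annotations": "Batch.commit(source_uid, label_schema_uids)",
}
# Other functions from the removed annotations module; no direct replacement is listed.
OTHER_REMOVED = {
    "add_annotation", "update_annotation", "delete_annotation",
    "add_annotations", "aggregate_annotations", "get_interannotator_agreement",
}
FALLBACK = "no direct replacement listed; rebuild with Batch.create() and LabelSchema.create()"


def _called_name(call: ast.Call) -> str:
    """Return the final identifier of the called expression, e.g. 'f' for x.f()."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def annotation_migration_hints(source: str) -> list[tuple[int, str, str]]:
    """Return sorted (line_number, removed_call, hint) for calls to removed functions."""
    hints = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _called_name(node)
        if name in DIRECT_REPLACEMENTS:
            hints.append((node.lineno, name, DIRECT_REPLACEMENTS[name]))
        elif name in OTHER_REMOVED:
            hints.append((node.lineno, name, FALLBACK))
    return sorted(hints)


if __name__ == "__main__":
    sample = "df = get_annotations(node_uid)\n"
    for lineno, old, hint in annotation_migration_hints(sample):
        print(f"line {lineno}: {old}() -> {hint}")
```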