Snorkel AI Data Development Platform v25.6 (STS) release notes
Breaking changes
SDK
- The following SDK functions have been removed from `snorkelai.sdk.client.datasources`:
  - `create_datasource`
  - `delete_datasource`
  - `get_datasources`
  - `prep_and_ingest_datasource`
  - `split_datasources_by_percent`
  - `update_datasource`
  - `add_active_datasources`
  - `get_node_datasources`
  - `put_node_datasource`
  - `refresh_active_datasources`
- The following SDK functions have been removed from `snorkelai.sdk.client.evaluation`:
  - `client.evaluation.create_evaluation_report`
  - `client.evaluation.preview_custom_prompt_metric`

  Use the Snorkel GUI to run evaluations.
- The following SDK functions have been removed from `snorkelai.sdk.client.metrics`:
  - `client.metrics.add_metric_to_node`
  - `client.metrics.delete_metric_from_node`
  - `client.metrics.get_candidate_extractor_metrics`
  - `client.metrics.get_df_metrics`
  - `client.metrics.get_model_metrics`
  - `client.metrics.list_available_metrics`
  - `client.metrics.register_custom_metric`
  - `client.metrics.tune_threshold_on_valid`
- The following SDK functions have been removed from `snorkelai.sdk.client.models`:
  - `client.models.add_predictions`
  - `client.models.get_models`
  - `client.models.get_predictions`
  - `client.models.register_model`
  - `client.models.register_trained_model`
  - `client.models.train_custom_model`
  - `client.models.train_model`
- The following SDK function has been removed from `snorkelai.sdk.client.nodes`:
  - `client.nodes.commit_model_to_node`
- The following SDK functions no longer take `start_date` and `end_date` as parameters:
  - `client.get_node_data`
  - `client.get_ground_truth`
  - `client.get_span_level_ground_truth_conflicts`
  - `client.get_model_metrics`
  - `client.get_predictions`
- `snorkelai.sdk.client.download_remote_object` has been removed.
User interface
- The JupyterLab application no longer includes a button linking to the integrated Jupyter notebook app. Instead, access the app from the Notebook option in the left navigation menu.
Features and improvements
Data upload
- The dataset and data source upload user experience is now streamlined and standardized. Read the updated dataset upload guide.
- Added GUI-based authentication support for Amazon S3 and Google Cloud Storage to enable secure data access.
Annotation
- Annotation now supports multi-label tagging for sequence tagging.
- For spans, the inter-annotator agreement metric can now be configured so that an exact span match is not required for agreement. You can set the required percentage overlap when you create a new label schema and select the overlapping span functionality. Instructions for defining a custom agreement threshold are part of the multi-schema annotation dataset upload guide.
- Reviewers can now choose which labels are ground truth on a per-annotation, per-document basis.
- The annotation task creation interface now displays error messages when names conflict and when entries exceed the character limit.
- Annotation task creation supports single and multi-label for sequence tagging.
- Quality of life and GUI improvements for annotation task creation.
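The configurable span-overlap agreement described above can be illustrated with a short sketch. Snorkel's exact formula is not documented here, so this is only one plausible way to compute percentage overlap between two annotated spans; the function names and the choice to normalize by the shorter span are assumptions, not the platform's implementation.

```python
def span_overlap_pct(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Fraction of the shorter span covered by the overlap of spans a and b.

    Spans are (start, end) character offsets with an exclusive end.
    This is an illustrative metric, not Snorkel's internal formula.
    """
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter if shorter else 0.0


def spans_agree(a: tuple[int, int], b: tuple[int, int], threshold: float = 0.5) -> bool:
    """Two annotations agree when their overlap meets the configured threshold."""
    return span_overlap_pct(a, b) >= threshold
```

With a threshold below 1.0, annotators who highlight slightly different boundaries around the same text can still be counted as agreeing, which is the behavior the new setting enables.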
Prompt development
- You can now export a CSV file with data from the current prompt run, including inputs, model information, and the LLM's response.
- You can now export a JSON file with a prompt template, containing the model, prompt, and metadata for a prompt version.
Evaluation
- From the evaluation dashboard, you can now filter data by slice, score, and inter-annotator agreement. Use this dashboard to quickly surface problematic outputs and find targets for prompt development.
- The agreement score filter now uses the more intuitive agree and disagree options for binary and ordinal criteria. Previously, this used unintuitive percentage-based inputs.
- You can now create custom code-based evaluators via the SDK (`snorkelai.sdk.develop.CodeEvaluator`). Code evaluators let you use Python to deterministically, quickly, and automatically assign the correct label to a datapoint during evaluation. In the Snorkel GUI, you can run these evaluators as part of a benchmark and see the results. Read about how to create a code evaluator using the SDK.
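To make the idea concrete, here is a minimal sketch of the kind of deterministic logic a code evaluator might run. The `CodeEvaluator` interface itself is defined by the snorkelai SDK and is not reproduced here; the function below is a standalone, hypothetical example of labeling a response with plain Python.

```python
import re

# Hypothetical example: a regex check is deterministic, fast, and needs no
# LLM call, which is the kind of logic a code evaluator is suited for.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")


def label_response(response: str) -> str:
    """Assign a label to an LLM response based on whether it leaks an email.

    Returns "contains_pii" when an email address is present, "clean" otherwise.
    """
    return "contains_pii" if EMAIL_PATTERN.search(response) else "clean"
```

Because the logic is pure Python, the same input always yields the same label, so results are reproducible across benchmark runs.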
Bug fixes
Annotation
- Fixed a bug where only one batch was exported when multiple batches were selected for export.
- Cursor position for label schemas no longer resets after saving.
- Fixed a scrolling bug with tables.
Known issues
Data upload
- The dataset size shown in the GUI does not always match the actual file size.
Annotation
- In review mode, the GUI does not scroll to the selected annotation.
- In review mode, the annotation filter shows individual status rather than group status.
- If all labels are rejected, an annotation is marked Resolved even if no ground truth was committed.
Prompt development
- In the prompt workflow, users are unable to select a freeform annotation label schema.