Snorkel AI Data Development Platform v25.6 (STS) release notes
Breaking changes
SDK
- The following SDK functions have been removed from snorkelai.sdk.client.datasources:
  - create_datasource
  - delete_datasource
  - get_datasources
  - prep_and_ingest_datasource
  - split_datasources_by_percent
  - update_datasource
  - add_active_datasources
  - get_node_datasources
  - put_node_datasource
  - refresh_active_datasources
- The following SDK functions have been removed from snorkelai.sdk.client.evaluation:
  - client.evaluation.create_evaluation_report
  - client.evaluation.preview_custom_prompt_metric

  Use the Snorkel GUI to run evaluations.
- The following SDK functions have been removed from snorkelai.sdk.client.metrics:
  - client.metrics.add_metric_to_node
  - client.metrics.delete_metric_from_node
  - client.metrics.get_candidate_extractor_metrics
  - client.metrics.get_df_metrics
  - client.metrics.get_model_metrics
  - client.metrics.list_available_metrics
  - client.metrics.register_custom_metric
  - client.metrics.tune_threshold_on_valid
- The following SDK functions have been removed from snorkelai.sdk.client.models:
  - client.models.add_predictions
  - client.models.get_models
  - client.models.get_predictions
  - client.models.register_model
  - client.models.register_trained_model
  - client.models.train_custom_model
  - client.models.train_model
- The following SDK function has been removed from snorkelai.sdk.client.nodes:
  - client.nodes.commit_model_to_node
- The following SDK functions no longer take start_date and end_date as parameters:
  - client.get_node_data
  - client.get_ground_truth
  - client.get_span_level_ground_truth_conflicts
  - client.get_model_metrics
  - client.get_predictions
- snorkelai.sdk.client.download_remote_object has been removed.
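With the start_date and end_date parameters removed, date filtering can instead be done client-side after fetching. The sketch below illustrates the pattern on plain Python records; the actual shape of the data returned by functions such as client.get_node_data, and the timestamp field name, are assumptions here.

```python
from datetime import datetime

# Hypothetical replacement for the removed start_date/end_date parameters:
# fetch all rows first (e.g. via client.get_node_data), then filter locally.
def filter_by_date(rows, start_date, end_date, key="created_at"):
    """Keep rows whose timestamp falls within [start_date, end_date]."""
    return [r for r in rows if start_date <= r[key] <= end_date]

rows = [
    {"uid": 1, "created_at": datetime(2025, 5, 1)},
    {"uid": 2, "created_at": datetime(2025, 6, 15)},
    {"uid": 3, "created_at": datetime(2025, 7, 1)},
]
kept = filter_by_date(rows, datetime(2025, 6, 1), datetime(2025, 6, 30))
```

If the data comes back as a DataFrame rather than a list of records, the same boolean-mask idea applies.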
User interface
- The JupyterLab application no longer includes a button linking to the integrated Jupyter notebook app. Access the app from the Notebook left navigation menu instead.
Features and improvements
Data upload
- The dataset and data source upload user experience is now streamlined and standardized. Read the updated dataset upload guide.
- Added GUI-based authentication support for Amazon S3 and Google Cloud Storage to enable secure data access.
Annotation
- Annotation now supports multi-label tagging for sequence tagging.
- For spans, the inter-annotator agreement metric no longer requires strict overlap. You can set a percentage-overlap threshold when you create a new label schema and select the overlapping span functionality. Instructions for defining a custom agreement threshold are part of uploading a multi-schema annotation dataset.
- Reviewers can now choose which labels are ground truth on a per-annotation, per-document basis.
- The annotation task creation interface now displays error messages when names conflict and when entries exceed the character limit.
- Annotation task creation supports single- and multi-label sequence tagging.
- Quality of life and GUI improvements for annotation task creation.
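To illustrate the idea behind percentage-based span agreement, here is a minimal sketch that counts two character spans as agreeing when their intersection covers at least a configurable fraction of the shorter span. The denominator choice and threshold semantics are assumptions for illustration; the platform's exact agreement formula may differ.

```python
def overlap_fraction(a, b):
    """Fraction of the shorter (start, end) span covered by the
    intersection of spans a and b."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    intersection = max(0, end - start)
    shorter = min(a[1] - a[0], b[1] - b[0])
    return intersection / shorter if shorter else 0.0

def spans_agree(a, b, threshold=0.5):
    # Spans count as agreeing once they overlap at least `threshold`,
    # rather than requiring exact (strict) boundary matches.
    return overlap_fraction(a, b) >= threshold
```

With threshold=1.0 this reduces to requiring that one span fully contain the other, while lower thresholds tolerate boundary disagreements between annotators.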
Prompt development
- You can now export a CSV file with data from the current prompt run, including inputs, model information, and the LLM's response.
- You can now export a JSON file with a prompt template, containing the model, prompt, and metadata for a prompt version.
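An exported prompt-run CSV can be post-processed with standard tooling. The sketch below uses Python's csv module on an inline sample; the column names (input, model, response) are hypothetical stand-ins, since the export's actual header names are not specified here.

```python
import csv
import io

# Inline stand-in for an exported prompt-run CSV file.
# Real exports include inputs, model information, and the LLM's response;
# the exact header names used here are assumptions.
sample = io.StringIO(
    "input,model,response\n"
    '"What is 2+2?",gpt-4o,"4"\n'
)
rows = list(csv.DictReader(sample))
```

For a real export, replace the io.StringIO sample with `open("export.csv", newline="")`.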
Evaluation
- From the evaluation dashboard, you can now filter data by slice, score, and inter-annotator agreement. Use this dashboard to quickly identify problematic outputs and find targets for prompt development.
- The agreement score filter now uses the more intuitive agree and disagree options for binary and ordinal criteria. Previously, this used unintuitive percentage-based inputs.
- You can now create custom code-based evaluators via the SDK (snorkelai.sdk.develop.CodeEvaluator). Code evaluators let you use Python to deterministically, quickly, and automatically assign the correct label to a datapoint during evaluation. In the Snorkel GUI, you can run these evaluators as part of a benchmark and see the results. Read about how to create a code evaluator using the SDK.
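Below is a minimal sketch of the kind of deterministic check a code evaluator wraps: a plain Python function that assigns a label from a rule. The labels, the datapoint field name, and the wrapping step are hypothetical; consult the SDK documentation for the exact snorkelai.sdk.develop.CodeEvaluator interface.

```python
import re

# A deterministic, rule-based labeler of the kind a code evaluator wraps:
# label a response "fail" if it leaks an email address, "pass" otherwise.
# (Label names and the "response" field are illustrative assumptions.)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pii_free(datapoint: dict) -> str:
    """Return a label for one datapoint based on a deterministic rule."""
    return "fail" if EMAIL_RE.search(datapoint["response"]) else "pass"
```

Because the rule is pure Python, it runs quickly and produces the same label for the same input every time, which is what makes code evaluators suited to automated benchmark runs.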
Bug fixes
Annotation
- Fixed a bug where exporting multiple batches exported only one batch.
- Cursor position for label schemas no longer resets after saving.
- Fixed a scrolling bug with tables.
Known issues
Data upload
- The dataset size shown in the GUI does not always match the actual file size.
Annotation
- In review mode, the GUI does not scroll to the annotation.
- In review mode, the annotation filter shows individual status rather than group status.
- If all labels are rejected, an annotation is marked Resolved even if no ground truth was committed.
Prompt development
- In the prompt workflow, users are unable to select a freeform annotation label schema.