Snorkel AI Data Development Platform v25.6 (STS) release notes
Breaking changes
SDK
- The following SDK functions have been removed from `snorkelai.sdk.client.datasources`:
  - `create_datasource`
  - `delete_datasource`
  - `get_datasources`
  - `prep_and_ingest_datasource`
  - `split_datasources_by_percent`
  - `update_datasource`
  - `add_active_datasources`
  - `get_node_datasources`
  - `put_node_datasource`
  - `refresh_active_datasources`
- The following SDK functions have been removed from `snorkelai.sdk.client.evaluation`:
  - `client.evaluation.create_evaluation_report`
  - `client.evaluation.preview_custom_prompt_metric`

  Use the Snorkel GUI to run evaluations.
- The following SDK functions have been removed from `snorkelai.sdk.client.metrics`:
  - `client.metrics.add_metric_to_node`
  - `client.metrics.delete_metric_from_node`
  - `client.metrics.get_candidate_extractor_metrics`
  - `client.metrics.get_df_metrics`
  - `client.metrics.get_model_metrics`
  - `client.metrics.list_available_metrics`
  - `client.metrics.register_custom_metric`
  - `client.metrics.tune_threshold_on_valid`
- The following SDK functions have been removed from `snorkelai.sdk.client.models`:
  - `client.models.add_predictions`
  - `client.models.get_models`
  - `client.models.get_predictions`
  - `client.models.register_model`
  - `client.models.register_trained_model`
  - `client.models.train_custom_model`
  - `client.models.train_model`
- The following SDK function has been removed from `snorkelai.sdk.client.nodes`:
  - `client.nodes.commit_model_to_node`
- The following SDK functions no longer take `start_date` and `end_date` as parameters:
  - `client.get_node_data`
  - `client.get_ground_truth`
  - `client.get_span_level_ground_truth_conflicts`
  - `client.get_model_metrics`
  - `client.get_predictions`
- `snorkelai.sdk.client.download_remote_object` has been removed.
User interface
- The JupyterLab application no longer includes a button linking to the integrated Jupyter notebook app. Instead, access the app from the Notebook option in the left navigation menu.
Features and improvements
Data upload
- The dataset and data source upload user experience is now streamlined and standardized. Read the updated dataset upload guide.
- Added GUI-based authentication support for Amazon S3 and Google Cloud Storage to enable secure data access.
Annotation
- Annotation now supports multi-label tagging for sequence tagging.
- For spans, the inter-annotator agreement metric can now be configured so that an exact span match is not required for agreement. You can set the required percentage overlap when you create a new label schema and select the overlapping span functionality. Instructions for defining a custom agreement threshold are part of the multi-schema annotation dataset upload guide.
- Reviewers can now choose which labels are ground truth on a per-annotation, per-document basis.
- The annotation task creation interface now displays error messages when names conflict and when entries exceed the character limit.
- Annotation task creation supports single and multi-label for sequence tagging.
- Quality of life and GUI improvements for annotation task creation.
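The configurable span-overlap agreement described above can be illustrated with a short sketch. Snorkel's exact formula is not documented here, so this is only one plausible way to compute percentage overlap between two annotated spans; the function names and the choice to normalize by the shorter span are assumptions, not the platform's implementation.

```python
def span_overlap_pct(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Fraction of the shorter span covered by the overlap of spans a and b.

    Spans are (start, end) character offsets with an exclusive end.
    This is an illustrative metric, not Snorkel's internal formula.
    """
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter if shorter else 0.0


def spans_agree(a: tuple[int, int], b: tuple[int, int], threshold: float = 0.5) -> bool:
    """Two annotations agree when their overlap meets the configured threshold."""
    return span_overlap_pct(a, b) >= threshold
```

With a threshold below 1.0, annotators who highlight slightly different boundaries around the same text can still be counted as agreeing, which is the behavior the new setting enables.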
Prompt development
- You can now export a CSV file with data from the current prompt run, including inputs, model information, and the LLM's response.
- You can now export a JSON file with a prompt template, containing the model, prompt, and metadata for a prompt version.
Evaluation
- From the evaluation dashboard, you can now filter data by slice, score, and inter-annotator agreement. Use this dashboard to quickly surface problematic outputs and find targets for prompt development.
- The agreement score filter now uses the more intuitive agree and disagree options for binary and ordinal criteria. Previously, this used unintuitive percentage-based inputs.
- You can now create custom code-based evaluators via the SDK (`snorkelai.sdk.develop.CodeEvaluator`). Code evaluators let you use Python to deterministically, quickly, and automatically assign the correct label to a datapoint during evaluation. In the Snorkel GUI, you can run these evaluators as part of a benchmark and see the results. Read about how to create a code evaluator using the SDK.
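To make the idea concrete, here is a minimal sketch of the kind of deterministic logic a code evaluator might run. The `CodeEvaluator` interface itself is defined by the snorkelai SDK and is not reproduced here; the function below is a standalone, hypothetical example of labeling a response with plain Python.

```python
import re

# Hypothetical example: a regex check is deterministic, fast, and needs no
# LLM call, which is the kind of logic a code evaluator is suited for.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")


def label_response(response: str) -> str:
    """Assign a label to an LLM response based on whether it leaks an email.

    Returns "contains_pii" when an email address is present, "clean" otherwise.
    """
    return "contains_pii" if EMAIL_PATTERN.search(response) else "clean"
```

Because the logic is pure Python, the same input always yields the same label, so results are reproducible across benchmark runs.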
Bug fixes
Annotation
- Fixed a bug where only one batch was exported when multiple batches were selected for export.
- Cursor position for label schemas no longer resets after saving.
- Fixed a scrolling bug with tables.
Known issues
Data upload
- The dataset size shown in the GUI does not always match the actual file size.
Annotation
- In review mode, the GUI does not scroll to the selected annotation.
- In review mode, the annotation filter shows individual status rather than group status.
- If all labels are rejected, an annotation is marked Resolved even if no ground truth was committed.
Prompt development
- In the prompt workflow, users are unable to select a freeform annotation label schema.