Snorkel AI Data Development Platform v25.7 (LTS) release notes
Breaking changes
SDK
Dataset.get_dataframe no longer accepts both the split and datasource_uid parameters simultaneously.
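Code that previously passed both arguments now has to pick one. The following is a minimal sketch of a caller-side guard. It assumes nothing about the SDK beyond the two parameter names above; build_get_dataframe_kwargs is a hypothetical helper, not part of the Snorkel SDK, and the argument types are assumptions.

```python
from typing import Any, Dict, Optional


def build_get_dataframe_kwargs(
    split: Optional[str] = None,
    datasource_uid: Optional[int] = None,
) -> Dict[str, Any]:
    """Return keyword arguments holding at most one of the two filters.

    Hypothetical helper: it only mirrors the rule from these release notes
    (split and datasource_uid are mutually exclusive) and never calls the SDK.
    The str/int types are assumptions made for illustration.
    """
    if split is not None and datasource_uid is not None:
        raise ValueError("Pass either split or datasource_uid, not both.")
    kwargs: Dict[str, Any] = {}
    if split is not None:
        kwargs["split"] = split
    if datasource_uid is not None:
        kwargs["datasource_uid"] = datasource_uid
    return kwargs
```

The returned dictionary can then be unpacked into the call on an existing dataset object, so the conflict surfaces in your own code before the request reaches the platform.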
SDK module, function, and class list at bottom of documentation
Snorkel has removed a large number of older SDK functions in our ongoing effort to streamline the platform.
This list is long for 25.7, so refer to the SDK removals section at the end of this document.
Features and improvements
Data management
- You can now use external Amazon S3 buckets to store datasets and associated files. See S3 external storage.
- Updated SDK for data management:
  - Dataset.create_datasource now automatically adds datasources to the annotation node.
  - Dataset.create_datasource can now automatically generate uid_col values.
Annotation
Prompt development
- You can enhance prompts with one- and few-shot ground-truth examples by selecting one or more datapoints to inject directly into the prompt. Each datapoint includes the input columns, the LLM's response from the current run, and the ground truth. See Improve LLMAJ alignment.
- You can run prompts on a filtered subset of datapoints in both prompt development and LLMAJ evaluation workflows.
Evaluation
- You can use the new Improve Prompt error analysis feature to find and fix accuracy problems in prompts for LLMAJ workflows. After you run a prompt, the feature automatically groups evaluator–human disagreements and suggests targeted prompt changes, so you can see what went wrong and how to improve the prompt. See Improve LLMAJ alignment.
- You can execute benchmark runs on the test split via snorkelai.sdk.develop.benchmarks.Benchmark.execute(splits=["test"]).
- You can edit ground truth scores directly from the evaluation results interface to quickly correct or add ground truth labels without leaving the evaluation workflow.
- You can pivot the evaluation table to display criteria as either rows or columns, depending on your analytical needs.
SDK updates for evaluation
- You can create(), update(), and get() LLMAJ evaluators; see snorkelai.sdk.develop.PromptEvaluator for details.
- You can execute LLMAJ runs using PromptEvaluator methods: execute(), get_execution_result(), poll_execution_result(), and get_executions().
- You can manage benchmarks and criteria through the SDK with comprehensive create(), get(), update(), and archive() capabilities.
- You no longer need to provide a workspace_id in Benchmark.create(); if omitted, the system infers workspace_uid from context.
- You can provide the workspace name instead of workspace ID in Benchmark.create().
Integrations
- Updated documentation for how to connect external models.
- When running a custom model, you can use custom headers with custom inference service integrations, even for non-OpenAI-compatible services.
Bug fixes
SDK
- Fixed Dataset.create_datasource when used with pandas DataFrames.
- Fixed optional description and workspace_uid fields in Benchmark and Criteria create() and update() methods.
Known issues
User interface
- Exporting multiple batches only exports one batch.
Data management
- The dataset size shown in the GUI does not always match the actual file size.
SDK removals
Entire modules removed
- The entire snorkelai.sdk.client.nodes module has been removed, including these functions:
client.nodes.add_active_datasources
client.nodes.add_node
client.nodes.add_node_hierarchy
client.nodes.commit_builtin_operator
client.nodes.commit_custom_operator
client.nodes.delete_node
client.nodes.fit_and_commit
client.nodes.get_model_node
client.nodes.get_model_nodes
client.nodes.get_node
client.nodes.get_node_data
client.nodes.get_node_datasources
client.nodes.get_node_input_cols
client.nodes.get_node_inputs_data
client.nodes.get_node_label_map
client.nodes.get_node_output_data
client.nodes.get_node_settings
client.nodes.get_node_uid
client.nodes.get_preprocessing_issues
client.nodes.list_nodes
client.nodes.put_node_datasource
client.nodes.refresh_active_datasources
client.nodes.set_node_settings
client.nodes.uncommit_operator
- The entire snorkelai.sdk.client.operators module has been removed, including these functions:
client.operators.add_operator
client.operators.add_operator_class
client.operators.check_conflicting_operator_name
client.operators.delete_operator
client.operators.execute_operators
client.operators.get_custom_operators
client.operators.get_default_operator
client.operators.get_default_operators
client.operators.get_operator_code
client.operators.get_operator_config
- The entire snorkelai.sdk.client.lfs module has been removed, including these functions:
client.lfs.archive_lf
client.lfs.archive_lfs
client.lfs.delete_lf
client.lfs.execute_lfs
client.lfs.get_lf
client.lfs.get_lfs
- The entire snorkelai.sdk.client.lf_packages module has been removed, including these functions:
client.lf_packages.delete_lf_package
client.lf_packages.export_lf_packages
client.lf_packages.import_lf_packages
client.lf_packages.transfer_lf_packages
- The entire snorkelai.sdk.client.lf_templates module has been removed, including these functions:
client.lf_templates.add_lf_template_class
client.lf_templates.delete_lf_template
client.lf_templates.get_lf_templates
- The entire snorkelai.sdk.client.dataset_views module has been removed, including these functions:
client.dataset_views.create_dataset_view
client.dataset_views.delete_dataset_view
client.dataset_views.get_dataset_view
client.dataset_views.get_dataset_views
client.dataset_views.update_dataset_view
- The entire snorkelai.sdk.client.batches module has been removed, including these functions:
client.batches.create_batches
client.batches.delete_batch
client.batches.get_batches
client.batches.get_x_uids_from_batch
client.batches.update_batch
- The entire snorkelai.sdk.client.training_sets module has been removed, including these functions:
client.training_sets.delete_training_set
client.training_sets.get_training_set
Functions removed from existing modules
- The following SDK functions have been removed from snorkelai.sdk.client.applications:
client.applications.add_block_to_application
client.applications.create_app_version
client.applications.create_application
client.applications.create_classification_application
client.applications.create_hocr_classification_application
client.applications.create_hocr_extraction_application
client.applications.create_multilabel_classification_application
client.applications.create_native_pdf_classification_application
client.applications.create_native_pdf_extraction_application
client.applications.create_sequence_tagging_application
client.applications.create_text_extraction_application
client.applications.delete_application
client.applications.duplicate_application
client.applications.execute_graph_on_data
client.applications.get_application
client.applications.get_applications
client.applications.list_app_versions
client.applications.load_app_version
client.applications.set_application_visibility
client.applications.update_application
client.applications.visualize_application_graph
- The following SDK functions have been removed from snorkelai.sdk.client.blocks:
client.blocks.delete_operator_block
client.blocks.duplicate_block
client.blocks.get_operator_block
client.blocks.get_operator_blocks
- The following SDK functions have been removed from snorkelai.sdk.client.fm_suite:
client.fm_suite.run_lf_inference
- The following SDK functions have been removed from snorkelai.sdk.client.gts:
client.gts.get_inferred_document_ground_truth_from_span_ground_truth
- The following SDK functions have been removed from snorkelai.sdk.client.transfer:
client.transfer.export_lfs
client.transfer.export_node_data
client.transfer.import_lfs
client.transfer.import_node_data
client.transfer.transfer_annotations
client.transfer.transfer_gts
client.transfer.transfer_lfs
client.transfer.transfer_lfs_by_name
Classes removed
- The ModelNode SDK class has been removed from snorkelai.sdk.develop.
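Before upgrading, it can help to check existing scripts and notebooks for references to the modules removed above. The following is a minimal sketch that uses only the Python standard library. The module names come from the "Entire modules removed" list; find_removed_module_usage is a hypothetical helper, not part of the Snorkel SDK. It only detects dotted references such as client.nodes.get_node, so imports written as "from snorkelai.sdk.client import nodes" and functions removed from modules that still exist would need their own patterns.

```python
import re
from typing import Dict, List

# Modules removed entirely in 25.7, as listed in these release notes.
REMOVED_MODULES = (
    "nodes",
    "operators",
    "lfs",
    "lf_packages",
    "lf_templates",
    "dataset_views",
    "batches",
    "training_sets",
)

# Matches dotted references such as client.nodes.get_node or
# snorkelai.sdk.client.lfs. The trailing \b stops "lfs" from matching
# a longer name such as "lfs_extra".
_PATTERN = re.compile(r"\bclient\.(" + "|".join(REMOVED_MODULES) + r")\b")


def find_removed_module_usage(source: str) -> Dict[int, List[str]]:
    """Map 1-based line numbers to the removed module names referenced there."""
    hits: Dict[int, List[str]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        found = _PATTERN.findall(line)
        if found:
            hits[lineno] = found
    return hits
```

To scan a file, read its contents as text and pass the string to find_removed_module_usage; an empty dictionary means no dotted references to the removed modules were found.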