Snorkel AI Data Development Platform v25.7 (LTS) release notes
Breaking changes
SDK
- Dataset.get_dataframe no longer accepts the split and datasource_uid parameters together; pass one or the other.
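If your code previously passed both arguments, a small guard can surface the change with a clear error instead of a failed call. The following is a minimal sketch, not part of the Snorkel SDK: get_dataframe_one_filter is a hypothetical helper, and it assumes that Dataset.get_dataframe accepts split and datasource_uid as keyword arguments and that datasource UIDs are integers.

```python
from typing import Any, Optional


def get_dataframe_one_filter(
    dataset: Any,
    split: Optional[str] = None,
    datasource_uid: Optional[int] = None,  # assumed to be an integer UID
    **kwargs: Any,
) -> Any:
    """Hypothetical wrapper around Dataset.get_dataframe (not part of the SDK).

    Starting in 25.7, get_dataframe no longer accepts split and datasource_uid
    together, so this wrapper rejects that combination before calling it.
    """
    if split is not None and datasource_uid is not None:
        raise ValueError(
            "Pass either split or datasource_uid to get_dataframe, not both."
        )
    # Forward only the filter that was actually provided.
    if split is not None:
        kwargs["split"] = split
    if datasource_uid is not None:
        kwargs["datasource_uid"] = datasource_uid
    return dataset.get_dataframe(**kwargs)
```

For example, get_dataframe_one_filter(dataset, split="test") forwards only split, where dataset is an existing Dataset object. Choosing which of the two filters to keep in migrated code depends on your data, so the sketch fails fast rather than guessing.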
SDK module, function, and class removals
- Snorkel has removed a large number of older SDK modules, functions, and classes as part of our ongoing effort to streamline the platform. The 25.7 list is long, so it appears in the SDK removals section at the end of this document.
Features and improvements
Data management
- You can now use external Amazon S3 buckets to store datasets and associated files. See S3 external storage.
- SDK updates for data management:
  - Dataset.create_datasource now automatically adds datasources to the annotation node.
  - Dataset.create_datasource can now automatically generate uid_col values.
Annotation
Prompt development
- You can enhance prompts with one-shot and few-shot ground-truth examples by selecting one or more datapoints to inject directly into the prompt. Each datapoint includes the input columns, the LLM's response from the current run, and the ground truth. See Improve LLMAJ alignment.
- You can run prompts on a filtered subset of datapoints in both prompt development and LLMAJ evaluation workflows.
Evaluation
- You can use the new Improve Prompt error analysis feature to identify and improve prompt accuracy for LLMAJ workflows. After running a prompt, this feature automatically groups evaluator–human disagreements and suggests targeted prompt changes. You can see what went wrong and how to improve the prompt. See Improve LLMAJ alignment.
- You can execute benchmark runs on the test split via snorkelai.sdk.develop.benchmarks.Benchmark.execute(splits=["test"]).
- You can edit ground truth scores directly from the evaluation results interface to quickly correct or add ground truth labels without leaving the evaluation workflow.
- You can pivot the evaluation table to display criteria as either rows or columns, depending on your analytical needs.
SDK updates for evaluation
- You can create(), update(), and get() LLMAJ evaluators; see snorkelai.sdk.develop.PromptEvaluator for details.
- You can execute LLMAJ runs using the PromptEvaluator methods execute(), get_execution_result(), poll_execution_result(), and get_executions().
- You can manage benchmarks and criteria through the SDK with create(), get(), update(), and archive() methods.
- You no longer need to provide a workspace_id in Benchmark.create(); if omitted, the system infers workspace_uid from context.
- You can provide the workspace name instead of the workspace ID in Benchmark.create().
Integrations
- Updated the documentation on connecting external models.
- When running a custom model, you can use custom headers with custom inference service integrations, even for non-OpenAI-compatible services.
Bug fixes
SDK
- Fixed Dataset.create_datasource when used with pandas DataFrames.
- Fixed handling of the optional description and workspace_uid fields in the Benchmark and Criteria create() and update() methods.
Known issues
User interface
- Exporting multiple batches only exports one batch.
Data management
- The dataset size shown in the GUI does not always match the actual file size.
SDK removals
Entire modules removed
- The entire snorkelai.sdk.client.nodes module has been removed, including these functions:
  - client.nodes.add_active_datasources
  - client.nodes.add_node
  - client.nodes.add_node_hierarchy
  - client.nodes.commit_builtin_operator
  - client.nodes.commit_custom_operator
  - client.nodes.delete_node
  - client.nodes.fit_and_commit
  - client.nodes.get_model_node
  - client.nodes.get_model_nodes
  - client.nodes.get_node
  - client.nodes.get_node_data
  - client.nodes.get_node_datasources
  - client.nodes.get_node_input_cols
  - client.nodes.get_node_inputs_data
  - client.nodes.get_node_label_map
  - client.nodes.get_node_output_data
  - client.nodes.get_node_settings
  - client.nodes.get_node_uid
  - client.nodes.get_preprocessing_issues
  - client.nodes.list_nodes
  - client.nodes.put_node_datasource
  - client.nodes.refresh_active_datasources
  - client.nodes.set_node_settings
  - client.nodes.uncommit_operator
- The entire snorkelai.sdk.client.operators module has been removed, including these functions:
  - client.operators.add_operator
  - client.operators.add_operator_class
  - client.operators.check_conflicting_operator_name
  - client.operators.delete_operator
  - client.operators.execute_operators
  - client.operators.get_custom_operators
  - client.operators.get_default_operator
  - client.operators.get_default_operators
  - client.operators.get_operator_code
  - client.operators.get_operator_config
- The entire snorkelai.sdk.client.lfs module has been removed, including these functions:
  - client.lfs.archive_lf
  - client.lfs.archive_lfs
  - client.lfs.delete_lf
  - client.lfs.execute_lfs
  - client.lfs.get_lf
  - client.lfs.get_lfs
- The entire snorkelai.sdk.client.lf_packages module has been removed, including these functions:
  - client.lf_packages.delete_lf_package
  - client.lf_packages.export_lf_packages
  - client.lf_packages.import_lf_packages
  - client.lf_packages.transfer_lf_packages
- The entire snorkelai.sdk.client.lf_templates module has been removed, including these functions:
  - client.lf_templates.add_lf_template_class
  - client.lf_templates.delete_lf_template
  - client.lf_templates.get_lf_templates
- The entire snorkelai.sdk.client.dataset_views module has been removed, including these functions:
  - client.dataset_views.create_dataset_view
  - client.dataset_views.delete_dataset_view
  - client.dataset_views.get_dataset_view
  - client.dataset_views.get_dataset_views
  - client.dataset_views.update_dataset_view
- The entire snorkelai.sdk.client.batches module has been removed, including these functions:
  - client.batches.create_batches
  - client.batches.delete_batch
  - client.batches.get_batches
  - client.batches.get_x_uids_from_batch
  - client.batches.update_batch
- The entire snorkelai.sdk.client.training_sets module has been removed, including these functions:
  - client.training_sets.delete_training_set
  - client.training_sets.get_training_set
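To find code that still references the removed modules listed above, a plain text search is often enough. The sketch below is a hypothetical helper, not a Snorkel tool. It only matches the dotted client.<module> names as they are written in these notes (for example snorkelai.sdk.client.nodes or client.lfs.get_lf), so aliased imports such as from snorkelai.sdk.client import nodes are not detected.

```python
import re
import sys
from pathlib import Path

# Module names listed under "Entire modules removed" in the 25.7 notes.
REMOVED_MODULES = [
    "nodes", "operators", "lfs", "lf_packages", "lf_templates",
    "dataset_views", "batches", "training_sets",
]

# Heuristic: "client.<module>" followed by a word boundary, so that
# "client.lfs" matches but a longer name such as "client.lfsx" does not.
REMOVED_PATTERN = re.compile(
    r"\bclient\.(" + "|".join(REMOVED_MODULES) + r")\b"
)


def find_removed_module_refs(source: str) -> list[tuple[int, str]]:
    """Return (line number, module name) for each match in a source string."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in REMOVED_PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and collect matches per file."""
    results = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_removed_module_refs(text)
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    root_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_path, file_hits in scan_tree(root_dir).items():
        for lineno, module in file_hits:
            print(f"{file_path}:{lineno}: uses removed module client.{module}")
```

Running the script from a project root prints one line per reference. The same approach extends to the individual functions removed from existing modules by adding their dotted names to the pattern.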
Functions removed from existing modules
- The following SDK functions have been removed from snorkelai.sdk.client.applications:
  - client.applications.add_block_to_application
  - client.applications.create_app_version
  - client.applications.create_application
  - client.applications.create_classification_application
  - client.applications.create_hocr_classification_application
  - client.applications.create_hocr_extraction_application
  - client.applications.create_multilabel_classification_application
  - client.applications.create_native_pdf_classification_application
  - client.applications.create_native_pdf_extraction_application
  - client.applications.create_sequence_tagging_application
  - client.applications.create_text_extraction_application
  - client.applications.delete_application
  - client.applications.duplicate_application
  - client.applications.execute_graph_on_data
  - client.applications.get_application
  - client.applications.get_applications
  - client.applications.list_app_versions
  - client.applications.load_app_version
  - client.applications.set_application_visibility
  - client.applications.update_application
  - client.applications.visualize_application_graph
- The following SDK functions have been removed from snorkelai.sdk.client.blocks:
  - client.blocks.delete_operator_block
  - client.blocks.duplicate_block
  - client.blocks.get_operator_block
  - client.blocks.get_operator_blocks
- The following SDK function has been removed from snorkelai.sdk.client.fm_suite:
  - client.fm_suite.run_lf_inference
- The following SDK function has been removed from snorkelai.sdk.client.gts:
  - client.gts.get_inferred_document_ground_truth_from_span_ground_truth
- The following SDK functions have been removed from snorkelai.sdk.client.transfer:
  - client.transfer.export_lfs
  - client.transfer.export_node_data
  - client.transfer.import_lfs
  - client.transfer.import_node_data
  - client.transfer.transfer_annotations
  - client.transfer.transfer_gts
  - client.transfer.transfer_lfs
  - client.transfer.transfer_lfs_by_name
Classes removed
- The ModelNode SDK class has been removed from snorkelai.sdk.develop.