Snorkel Flow v0.96 (STS) release notes

New feature

(Beta) Prompt development

note

This is a beta feature available to customers using a Snorkel-hosted instance of Snorkel Flow. Beta features may have known gaps or bugs, but are functional workflows and eligible for Snorkel Support. To access beta features, contact Snorkel Support to enable the feature flag for your Snorkel-hosted instance.

Prompt development, sometimes called prompt engineering, is the process of designing and refining inputs to guide AI models, such as large language models (LLMs), to produce high-quality, task-specific outputs. In Snorkel Flow, prompt development workflows empower users to create, experiment with, and optimize prompts iteratively. This ensures efficient, precise, and reproducible outcomes for downstream tasks.
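The iterate-and-score loop described above can be sketched in pure Python. Everything below (the `toy_llm` stub, the templates, the dev set) is hypothetical and only illustrates the shape of an iterative prompt development workflow, not Snorkel Flow's actual API:

```python
# Illustrative sketch only: Snorkel Flow's prompt development workflow handles
# this internally. A toy keyword-matching stub stands in for a real LLM call.

def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: 'classifies' by keyword matching."""
    text = prompt.rsplit("Text:", 1)[-1].lower()
    return "positive" if ("great" in text or "love" in text) else "negative"

# A tiny labeled dev set used to score each prompt variant.
dev_set = [
    ("I love this product", "positive"),
    ("This is great", "positive"),
    ("Terrible experience", "negative"),
]

# Candidate prompt templates to iterate over.
templates = [
    "Classify the sentiment. Text: {text}",
    "Answer 'positive' or 'negative' only. Text: {text}",
]

def accuracy(template: str) -> float:
    """Score one template against the labeled dev set."""
    correct = sum(
        toy_llm(template.format(text=text)) == label
        for text, label in dev_set
    )
    return correct / len(dev_set)

# Keep the template that scores best on the dev set; in practice you would
# inspect failures, refine the wording, and repeat.
best = max(templates, key=accuracy)
print(best, accuracy(best))
```

The point is the loop, not the stub: each iteration scores prompt variants against held-out labeled data so refinements are measured rather than guessed.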

For more, see Prompt development overview and Create a prompt development workflow.

Deprecations

Label + training

  • Users can no longer build custom labeling functions.

SDK

  • Deprecated image uploads.
  • Deprecated sf.get_dataset_data and other dataset functions in favor of snorkelflow.sdk.Dataset.

ML tasks

  • Dropped support for Python 3.8.

Bug fixes

Provisioning

  • Cleaned up duplicated Studio API environment variables.
  • Removed inference and added singleuser-notebook to extra images.
  • Fixed hardcoded project for network policy in helm template.

User interface

  • Fixed the back button not working from the Jobs page.
  • Fixed Collections component that sent requests twice.
  • Application recency is now tracked on URL change.
  • Fixed logo header.
  • Fixed evaluation page if no models exist.
  • Fixed draw not working with embeddings.
  • Fixed popup not showing up in correct place for multi-schema annotation.
  • Fixed pagination for dataset collections.
  • Fixed broken Studio export where # is in selected column names.
  • Fixed search text being cleared in the batch label schema dropdown.
  • Fixed overflow in the edit workspace dialog.
  • Fixed multi-node selector to navigate to Develop when user changes a node.
  • Fixed filter dropdown text that wasn't fully displaying on Deployments page.
  • Fixed incorrect app links on new application and application setup pages.
  • Fixed labeling function (LF) names not truncating in LF table.

Data + slices

  • Fixed incorrect slice membership for MDV table view multiselect.
  • Enabled selected documents to be added to slice from cluster data explorer.

Label + training

  • Stopped using non-deployable LFs as features for AutoML.
  • If x_uids do not exist in df when computing metrics for sequence tagging, ground truths (GT) are ignored.
  • Fixed ascii offset issues with the new MSA dataset preprocessing.

ML tasks: PDF

  • Added support for min_per_class for resampling word-based PDF applications.
  • Fixed the Studio filter at GT=negative for word-based PDF.
  • Fixed a negated LF for word-based PDF.
  • Excluded unknown and negative classes when computing model metrics for word-based PDF.

Annotation

  • Display correct labels when in reviewer mode.
  • Display MSA marker on Datasets page.
  • Fixed annotation dataviewer that was not scrolling properly.
  • Fixed ranking view getting closed while selecting information.
  • Fixed ranking dataset view throwing error while loading in annotation.

SDK

  • GT spans are now automatically aligned across all preprocessors in sf.align_external_ground_truth; specifying preprocessor_node_uid is no longer required.

Foundation models

  • Updated prompt template copy.

Improvements

User interface

  • Removed the jobs dialog from the Datasets page.
  • Updated the Evaluate Model Metrics table to include a Ground Truth count column alongside current aggregated scores.
  • Enhanced the pop-up Data Viewer in the Evaluation Table to display all metrics on a datapoint level.
  • Added the ability for users to use their registered foundation models (FMs) in the evaluation suite, including using FMs as judges to evaluate their datasets.
  • Removed label options for sequence tagging.
  • Removed redundant View Cell types.
  • Cleaned up the sidebar if a user has no workspaces.

Foundation models

  • View prompt LFs in the native viewer.

SDK

  • Scoped Open_file workspace.
  • Added a description to LabelSchema SDK.

Data + slices

  • Added improvements to top-level evaluation module.

Enterprise infra

  • Made the JupyterLab view the default view, as opposed to the single notebook view.

Known issues

Application

  • Model IDs are not unique on the user interface.
  • View correct is broken when previewing a prompt LF.
  • Unable to annotate batch with specific user.
  • Evaluation data popup does not show any data.
  • Dataset creation with split data by % does not create data sources.
  • Create custom operator link is broken: 404 page not found.
  • Unable to download annotation batches in on-prem instance.
  • The "Duplicate LF" action in the UI should be blocked for Prompt LFs.
  • Unable to export dataset annotation.
  • Unable to see annotate in the sidebar.
  • Text input for prompt changes viewport position.
  • When adding a dataset view, validate that the label schema is attached to the dataset.
  • "Preview LF" on selection in Studio should not happen in index column.
  • Error on first load of application post-upgrade.
  • All selected words should have the same label instead of flipping their labels at bounding box selection.
  • Onboarding accordion not opening at expected time.
  • Need error message when data sources have inconsistent schemas during app creation.
  • If labels are added before the primary text field is selected, labels are wiped when the primary text field is chosen.
  • No data found when applying a filter that yields fewer results than current pagination.
  • New labels added on the Onboarding page are not visible in Studio.
  • Saving settings in the Label schema section animates endlessly when the Next button is clicked a second time.
  • Reviewer workflow broken for sequence tagging overlapping spans.
  • File Download Endpoint (/api/download) enables user to read arbitrary files.
  • DAG drawing is misleading when the first nodes in the downstream block are not connected to the last node in the upstream block.
  • Bulk accepting suggested LFs takes a while, causing confusion for users about whether the action was successful.

Sequence tagging

  • Opening metadata breaks up the highlighting in Studio.
  • When annotating, "label validation error" shows up in Studio.
  • Error seen in Studio when a sequence tagging app is loaded.
  • Confusion and clarity matrix numbers don't add up for sequence tagging applications.

MLF/Data Platform

  • Populator saves two different sets of arrow files at the same path when applying.
  • "Override LF labels with ground truth labels where available" doesn't override LF labels with the negative GT labels.
  • Annotation Filter with Annotation = negative/GT = negative doesn't work in word-based PDF.
  • Create LF fails silently for LLM LF.
  • In a populated evaluation report, the user who created the report is recorded as the requesting user, rather than the original user from the populated app.
  • Swap from tags to slices in get_node_data.
  • Snorkel Flow-generated labels are not updated for word-based PDF.
  • Cannot Modify OpenAI module with both base_url and azure_endpoint.
  • GT is not propagated to the labels column in the df.
  • Error when deleting custom models.
  • Uploading a large CSV shows unrelated errors during data upload.
  • Internal server error when uploading large CSV data via S3.
  • Dev split is not maintained for PDF IE when copying an application.
  • Server error seen when creating an LF.
  • Error seen while duplicating chatbot app.
  • Studio /dataset and /advanced-lf-state error out with a cryptic error message when there is no span.
  • In a hierarchical app, all but the first LabelFilters stay connected with the model node after adding another node in between.