Snorkel Flow v0.96 (STS) release notes
New feature
(Beta) Prompt development
This is a beta feature available to customers using a Snorkel-hosted instance of Snorkel Flow. Beta features may have known gaps or bugs, but are functional workflows and eligible for Snorkel Support. To access beta features, contact Snorkel Support to enable the feature flag for your Snorkel-hosted instance.
Prompt development, sometimes called prompt engineering, is the process of designing and refining inputs to guide AI models, such as large language models (LLMs), to produce high-quality, task-specific outputs. In Snorkel Flow, prompt development workflows empower users to create, experiment with, and optimize prompts iteratively. This ensures efficient, precise, and reproducible outcomes for downstream tasks.
For more information, see Prompt development overview and Create a prompt development workflow.
Deprecations
Label + training
- Users can no longer build custom labeling functions.
SDK
- Deprecated image uploads.
- Deprecated sf.get_dataset_data and other dataset functions in favor of snorkelflow.sdk.Dataset.
ML tasks
- Dropped support for Python 3.8.
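Scripts that run against this release may want to fail early on an unsupported interpreter. The following is a minimal sketch, not part of the Snorkel Flow SDK. It assumes that 3.9 is the lowest supported version now that 3.8 is dropped, which these notes do not state explicitly. The names MIN_SUPPORTED and is_supported are hypothetical.

```python
import sys

# Assumption: with Python 3.8 dropped, 3.9 is treated as the minimum here.
# The release notes do not state the exact minimum version.
MIN_SUPPORTED = (3, 9)


def is_supported(version_info=sys.version_info, minimum=MIN_SUPPORTED):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {current} meets the assumed minimum.")
    else:
        print(f"Python {current} is below the assumed minimum; upgrade first.")
```

Comparing only the major and minor components keeps the check independent of patch releases.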
Bug fixes
Provisioning
- Cleaned up duplicated Studio API environment variables.
- Removed inference and added singleuser-notebook to extra images.
- Fixed hardcoded project for network policy in helm template.
User interface
- Fixed the back button that wasn't working from the Jobs page.
- Fixed Collections component that sent requests twice.
- Track application recency on URL change.
- Fixed logo header.
- Fixed evaluation page if no models exist.
- Fixed draw not working with embeddings.
- Fixed popup not showing up in correct place for multi-schema annotation.
- Fixed pagination for dataset collections.
- Fixed broken Studio export where # is in selected column names.
- Fixed cleared search text in batch label schema dropdown.
- Edited workspace dialog overflows.
- Fixed multi-node selector to navigate to Develop when user changes a node.
- Fixed filter dropdown text that wasn't fully displaying on Deployments page.
- Fixed incorrect app links on new application and application setup pages.
- Fixed labeling function (LF) names not truncating in LF table.
Data + slices
- Fixed incorrect slice membership for MDV table view multiselect.
- Enabled selected documents to be added to slice from cluster data explorer.
Label + training
- Stopped using non-deployable LFs as features for AutoML.
- If x_uids do not exist in df when computing metrics for sequence tagging, ground truths (GT) are ignored.
- Fixed ASCII offset issues with the new MSA dataset preprocessing.
ML tasks: PDF
- Added support for min_per_class for resampling word-based PDF applications.
- Fixed the Studio filter at GT=negative for word-based PDF.
- Fixed a negated LF for word-based PDF.
- Excluded unknown and negative classes when computing model metrics for word-based PDF.
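To illustrate what excluding classes from a metric can look like, here is a minimal sketch in plain Python. It is not Snorkel Flow's implementation: the label names, the choice of accuracy as the metric, and the decision to filter on the ground truth class are all assumptions made for illustration, and accuracy_excluding is a hypothetical helper.

```python
from typing import Optional, Sequence

# Hypothetical label names; the real class names are not given in these notes.
EXCLUDED = frozenset({"UNKNOWN", "NEGATIVE"})


def accuracy_excluding(
    ground_truth: Sequence[str],
    predictions: Sequence[str],
    excluded: frozenset = EXCLUDED,
) -> Optional[float]:
    """Accuracy over data points whose ground truth is not an excluded class.

    Returns None when no data points remain after filtering.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have equal length")
    # Keep only pairs whose ground truth label is a class we want to score.
    kept = [(gt, pred) for gt, pred in zip(ground_truth, predictions)
            if gt not in excluded]
    if not kept:
        return None
    correct = sum(1 for gt, pred in kept if gt == pred)
    return correct / len(kept)


gt = ["DATE", "NEGATIVE", "AMOUNT", "UNKNOWN", "DATE"]
pred = ["DATE", "DATE", "DATE", "AMOUNT", "DATE"]
# Three data points remain after filtering, two of which are correct.
score = accuracy_excluding(gt, pred)
```

Without the filter, the two excluded rows would count as errors and pull the score down, even though they say nothing about how well the model separates the classes of interest.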
Annotation
- Display correct labels when in reviewer mode.
- Display MSA marker on Datasets page.
- Fixed annotation dataviewer that was not scrolling properly.
- Fixed ranking view getting closed while selecting information.
- Fixed ranking dataset view throwing error while loading in annotation.
SDK
- Automatically align GT spans over all preprocessors in sf.align_external_ground_truth; there is no need to specify preprocessor_node_uid.
Foundation models
- Updated prompt template copy.
Improvements
User interface
- Removed jobs dialog in Datasets page.
- Updated the Evaluate Model Metrics table to include a Ground Truth count column alongside current aggregated scores.
- Enhanced the pop-up Data Viewer in the Evaluation Table to display all metrics on a datapoint level.
- Added the ability for users to use their registered foundation models in the evaluation suite. Users can use FMs as judges to evaluate their dataset.
Foundation models
- View prompt LF in native viewer.
User interface
- Removed label options for sequence tagging.
- Removed redundant View Cell types.
- Cleaned up sidebar if user has no workspaces.
SDK
- Scoped Open_file workspace.
- Added a description to LabelSchema SDK.
Data + slices
- Added improvements to top-level evaluation module.
Enterprise infra
- Made the JupyterLab view the default view, as opposed to the single notebook view.
Known issues
Application
- Model IDs are not unique on the user interface.
- View correct is broken when previewing a prompt LF.
- Unable to annotate batch with specific user.
- Evaluation data popup does not show any data.
- Dataset creation with split data by % does not create data sources.
- Create custom operator link is broken: 404 page not found.
- Unable to download annotation batches in on-prem instance.
- The "Duplicate LF" action in the UI should be blocked for Prompt LFs.
- Unable to export dataset annotation.
- Unable to see annotate in the sidebar.
- Text input for prompt changes viewport position.
- When adding a dataset view, validate that the label schema is attached to the dataset.
- "Preview LF" on selection in Studio should not happen in index column.
- Error on first load of application post-upgrade.
- All selected words should have the same label instead of flipping their labels at bounding box selection.
- Onboarding accordion not opening at expected time.
- Need error message when data sources have inconsistent schemas during app creation.
- If labels are added before the primary text field is selected, labels are wiped when the primary text field is chosen.
- No data found when applying a filter that yields fewer results than current pagination.
- New labels added on the Onboarding page are not visible in Studio.
- Saving settings in the Label schema section shows an endless loading animation when the Next button is clicked a second time.
- Reviewer workflow broken for sequence tagging overlapping spans.
- File Download Endpoint (/api/download) enables user to read arbitrary files.
- DAG drawing is misleading when the first nodes in the downstream block are not connected to the last node in the upstream block.
- Bulk accepting suggested LFs takes a while, which leaves users unsure whether the action was successful.
Sequence tagging
- Opening metadata breaks up the highlighting in Studio.
- When annotating, "label validation error" shows up in Studio.
- Error seen in Studio when a sequence tagging app is loaded.
- Confusion and clarity matrix numbers don't add up for sequence tagging applications.
MLF/Data Platform
- Populator saves two different sets of arrow files at the same path when applying.
- "Override LF labels with ground truth labels where available" doesn't override LF labels with the negative GT labels.
- Annotation Filter with Annotation = negative/GT = negative doesn't work in word-based PDF.
- Create LF fails silently for LLM LF.
- In a populated evaluation report, the user who created the report is the requester user, rather than the original user from the populated app.
- Swap from tags to slices in get_node_data.
- Snorkel Flow-generated labels are not updated for word-based PDF.
- Cannot modify OpenAI module with both base_url and azure_endpoint.
- GT is not getting propagated to labels column in the df.
- Error when deleting custom models.
- Uploading a large CSV file shows unrelated errors during data upload.
- Internal server error occurs during S3 upload of large CSV data.
- Dev split is not maintained for PDF IE when copying an application.
- Server error seen when creating an LF.
- Error seen while duplicating chatbot app.
- Studio /dataset and /advanced-lf-state error out with a cryptic error message when there is no span.
- In a hierarchical app, all but the first LabelFilters stay connected with the model node after adding another node in between.