Snorkel Flow v0.94 (LTS) release notes
Breaking Changes
SDK
- The snorkelflow.ingest.docs.dirtree_to_parquet() function now creates the Parquet file in a workspace-scoped directory.
Deprecations
Annotation
- Deprecated annotation_rate from the annotation overview.
Foundation models
- Deprecated OpenAI GPT-3 Fine-tuning.
SDK
- Deprecated the Prompt Regex Generator.
Upcoming SDK deprecation
- The following SDK MinIO file upload functions will be replaced or removed. You can replace them with new functions to be released in v0.95 LTS. The new functions upload file(s) to a workspace-scoped directory:
  - [upcoming deprecation notice] upload_to_minio(); use upload_file instead
  - [upcoming deprecation notice] upload_fileobj_to_minio()
  - [upcoming deprecation notice] upload_files_to_minio()
  - [upcoming deprecation notice] upload_dir_to_minio(); use upload_dir instead
  - [new] download_file()
  - [new] download_dir()
  - [new] upload_dir()
  - [new] upload_file()
  - [new] list_dir()
- The MinIO Console will be deprecated as of Q4 2024, but will remain accessible until the end of Q1 2025 to support backwards compatibility for older workflows. After Q1 2025, access to the MinIO Console will be removed. Instead, use the new Files feature for uploading PDFs and images. By the end of Q1 2025, the Files feature will support arbitrary file type uploads and basic file management utilities, reaching parity with the core features of the MinIO Console. For files that are not PDFs or images, you can continue to use Snorkel Flow's SDK.
- The MinIO client will be removed from the Notebook environment. For example, the following command will no longer be available in a notebook:
  !mc <command/arguments>
- The Boto3 SDK client to upload and download files from MinIO will also be deprecated. Users are encouraged to use the new Files feature.
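
To prepare for these removals, it can help to scan existing notebooks and scripts for the names listed above. The following is a minimal sketch and is not part of the Snorkel Flow SDK: the helper find_deprecated_usage and the replacement mapping are illustrative, built only from the function names in this notice. Functions for which the notice names no replacement map to None.

```python
import re

# Deprecated names from this notice, mapped to the stated replacement
# (None where the notice names no direct replacement).
DEPRECATED = {
    "upload_to_minio": "upload_file",
    "upload_fileobj_to_minio": None,
    "upload_files_to_minio": None,
    "upload_dir_to_minio": "upload_dir",
}

# Matches a call to any deprecated function, e.g. "upload_to_minio(".
CALL_RE = re.compile(r"\b(" + "|".join(DEPRECATED) + r")\s*\(")
# Matches a notebook shell escape that invokes the MinIO client.
MC_RE = re.compile(r"^\s*!mc\b")
# Matches a boto3 import statement.
BOTO_RE = re.compile(r"^\s*(import|from)\s+boto3\b")


def find_deprecated_usage(source):
    """Return (line_number, finding, suggestion) tuples for a source string."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, DEPRECATED[name]))
        if MC_RE.match(line):
            findings.append((lineno, "!mc", None))
        if BOTO_RE.match(line):
            findings.append((lineno, "boto3", None))
    return findings
```

Running find_deprecated_usage over the text of each notebook cell or script gives a list of lines to review before upgrading. A suggestion of None means the replacement has to be chosen by hand.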
Bug fixes
Machine Learning: PDF
- Removed rendering of PDFs in Studio record view for extraction apps.
- Fixed PDF processing to prevent unwanted lines from being added to PDFs.
- Added PDF URL details to error messages from the Tesseract operator.
- Fixed an issue where Studio showed all checkboxes in native PDF docs as unchecked.
Foundation Models
- Refreshed prompt templates.
- Supported custom inference services without logprobs.
- Improved clarity and reduced spam in FM usage tracking errors.
- Increased the SageMaker session duration.
- Set the selectedLabel correctly from the previous prompt in sequence tagging.
- Resolved prompting bugs for Amazon SageMaker.
- Prevented showing a "no models" state during loading.
- Explicitly requested fields data from the classification prompt view.
User Interface
- Fixed API calls that used the wrong ID after login.
- Fixed an issue where token metrics didn’t show up on some apps.
- Rounded label density to a reasonable number of decimal places.
- Fixed an issue where the suggested LF modal sometimes showed an empty columns list.
- Prevented unexpected logouts.
- Fixed scroll and pagination for Application List.
- Fixed scrolling to PDF Spans for single pages.
- Fixed the inactivity timeout.
- Fixed the ability to manually resize the Studio left sidebar.
- Corrected the casing of model metric names.
Annotation
- Restricted dataset views to classification label schemas only.
- Correctly included the last label during LS creation.
- Adjusted the positioning of the sequence tagging popup.
- Fixed an issue where editing a label changed all labels.
- Defaulted to page view for word-based PDFs.
- Prevented filters from being carried over between Studio and annotation.
- Restricted annotators to viewing only their progress.
- Filtered text label schemas from inter-annotator agreement.
- Fixed an issue where saved annotations didn’t show up in the ranking view.
- Filtered and searched within the sequence tagging popup.
- Displayed the correct completion status for a batch.
- Removed sort parameters for dataset calls in annotation studio.
- Fixed URL parsing.
Data + Slices
- Fixed offset calculation for existing highlights.
- Removed UNKNOWN as a label option for sequence tagging.
- Filtered out unknown labels for sequence tagging annotation.
- Enabled SDK documentation for the Slice module.
- Made highlights display correctly across new lines.
- Memoized query params generation to prevent infinite dataset fetching.
Label + Training
- Correctly copied the model schema from the payload.
- Threw the "Only binary tasks support thresholding." error earlier.
SDK
- Fixed a bug that gave the Annotator role access to all Developer permissions in Notebooks.
- Identified the default workspace name when changed from "default."
- Handled cases with no predictions in sf.get_node_data.
- Raised an error when the join column was not unique in FineTuningApp.import_ground_truth.
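
Before importing ground truth, you can check the join column for duplicates yourself. This is a minimal sketch in plain Python that does not use the SDK. The function name and the row format (a list of dicts) are assumptions for illustration and do not describe how FineTuningApp.import_ground_truth performs its check.

```python
from collections import Counter


def duplicated_join_values(rows, join_column):
    """Return the join-column values that occur more than once, in first-seen order."""
    counts = Counter(row[join_column] for row in rows)
    return [value for value, n in counts.items() if n > 1]
```

An empty result means every row has a distinct join value.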
Provisioning
- Fixed Helm ingress indentation.
ML Tasks: Text
- Added support for auto-generating negative labels in node block copying.
- Updated the Sequence NER LF template to more explicitly store the correct fields.
Supervision
- Fetched the correct LF configuration from the backend.
Features
Foundation Models
- Enabled prompt FMs to extract multiple entities for sequence tagging.
- Added SDK sf.prompt_fm() and sf.prompt_fm_over_dataset() methods for prompting FMs.
- Added model_name and detail view support for the sf.get_external_model_endpoint SDK method.
- Added warm start support to GenAI apps.
Provisioning
- Enabled conditional use of file-based Prefect secrets in model-trainer.
- Enabled conditional use of file-based secrets for JupyterHubProxy.
- Enabled conditional use of file-based InfluxDB secrets in flow-ui.
- Allowed conditional use of file-based database and InfluxDB environment variables.
- Enabled conditional use of file-based secrets for JupyterHub.
- Supported conditional use of file-based TDM_CONN_STR environment variables.
- Mounted MinIO secret keys as files when secretsFromFile was true.
- Enabled conditional use of file-based secrets for tdm-api.
- Enabled TDM API to read a JupyterHub secret from a mounted file.
- Removed MinIO secret environment variables from InferenceService.
- Enabled conditional use of file-based secrets for Telegraf.
- Enabled conditional use of file-based secrets for Grafana.
- Removed MinIO secret environment variables from RayHead and StudioRayHead.
- Removed MinIO secret environment variables from JupyterHub and JupyterHubProxy.
- Enabled reading a Postgres password from a file.
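
The file-based secret items above follow one pattern: when secretsFromFile is enabled, a service reads a credential from a mounted file, presumably instead of from an environment variable. A minimal sketch of that pattern follows. The helper name read_secret and the default mount directory are hypothetical and are not taken from the Snorkel Flow Helm chart.

```python
import os
from pathlib import Path


def read_secret(name, secrets_from_file, secrets_dir="/etc/secrets"):
    """Return secret `name` from a mounted file or from the environment.

    secrets_dir is a hypothetical mount path, and secrets_from_file mirrors
    the secretsFromFile setting by name only.
    """
    if secrets_from_file:
        # Assumes one file per key; strip the trailing newline if present.
        return (Path(secrets_dir) / name).read_text().strip()
    # Fall back to the environment variable of the same name.
    return os.environ[name]
```

Keeping the switch in one helper means the calling code does not need to know which delivery mechanism is active.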
Data + Slices
- Added custom metrics support to Finetuning App.
- Supported custom metrics with the evaluation report.
- Transitioned cluster view to MDV.
User Interface
- Improved styles for class-level metrics when there were too many.
- Added an audio file cell.
- Added admin settings to gate access to Notebooks/Deployments at the instance level.
- Updated the Jobs link in the sidebar to open a new page instead of a modal.
Annotation
- Grouped retrieved contexts of a prompt in ranking view for annotation.
- Introduced annotation of retrieved context in single response view and ranking view.
- Made label table labels editable.
Label + Training
- Added per-class token metrics.
Machine Learning: PDF
- Added a new layout-aware parsing library for native PDFs. All new applications will use the new parser.
Image
- Enabled an improved computer vision onboarding experience.
Data + App Management
- Added support for authenticated S3 and GCS buckets.
Improvements
SDK
- Enabled conditional use of file-based MinIO secrets.
- Migrated MinIO SDK functionality to HTTP Storage Proxy.
- Enabled the SDK to support datasets with non-text fields for prompting.
- Enabled sequence tagging prompting for Text2Text models via SDK.
Machine Learning: PDF
- Improved PDF rendering performance in Studio.
- Added a filter to the PDF dataset templates that removes data with parsing issues.
User Interface
- Added the year to the date format on the application page.
- Improved memory utilization of Studio.
Annotation
- Made word-based label schemas selectable.
- Sorted model and training fields in annotation mode.
- Made annotator names available via tooltip in annotator progress.
Label + Training
- Modified the base behavior of training Transformer models.
Data + Slices
- Added more explanation to /download error messages.
Foundation Models
- Displayed abstained examples during prompt previews in sequence tagging.
- Displayed FM outputs for examples during prompt previews in sequence tagging.
- Enabled extraction of multiple entities in sequence tagging.
- Updated the prompt template when the model changed.
- Created prompt templates for sequence tagging.
- Displayed time remaining for in-progress LFs.
- Enabled SageMaker models to be configured with a provider prefix.
- Verified SageMaker authentication setup for fine-tuning.
- Improved performance for preview prompt LFs for all OpenAI models when rate limit values are set. Expect 10x-20x speed improvements.
Known Issues
- When the DAG (directed acyclic graph) is stale, users can still create a deployment.
- If labels are added before the primary text field is selected, labels are removed when the primary text field is chosen.
- The Reviewer workflow is broken for overlapping spans in sequence tagging.
- The message "No data found" appears when you apply a filter that yields fewer results than the current pagination.
- You cannot switch labels from one label schema to another.