2024.R2 LTS (v0.93) Snorkel Flow release notes

What's new

Annotation

Updated to show only dataset views that match the batch label schemas.
Added persistent filters for multi-schema annotation.
Integrated annotator progress.
Integrated review page endpoints for multi-schema annotation.
Integrated annotation-rate.
Added prompt prefix.
Added ranking view for annotations.
Added grouping by column to batch data endpoint.
Added multi-label support for multi-schema annotation.
Enabled multi-label label schema for multi-schema annotation.
Updated multi-highlight styles to support multi-schema annotation use cases.
Added Batch to Python SDK.
Added LabelSchema to Python SDK.

Data + App Management

Improved text label with success, error and loading state.
Added text label option in create label schema in dataset.
Added save and retrieve freeform label schema in dataset annotation.
Added multiple retrieved contexts in single response view.
Added support of highlight in sequence tagging label schema in response view.
Added comments textbox for single response view.
Hid displayed columns for single response view.

Data + Slices

Added Evaluate page and navigation.
Added evaluation report CRUD endpoints.
Added support for slices in LLM evaluation.
Added endpoints and SDK functions for slice creation and membership.
Added db table for evaluation reports.
Improved the get methods in DatasetTagTypeStore.
Integrated Model Metrics evaluation page.

Enterprise Infrastructure

Created new SAML Helm configs for AD roles sync.
Added RBAC gating for multi-schema annotation export endpoint.
Added RBAC gate logging endpoint using user selectors.
Added endpoints for RBAC resource types activation, deactivation, and enabled endpoints.
Updated roles on login via SAML attributes.
Added UserSelector and UserSelectorMapping for tenant scoped roles.

Front-end Infrastructure

Created a new homepage.
Added download controls for data access control.
Added authentication settings for session timeout.
Added User Uploaded Files page.

Foundation models

Updated integrations in foundation model management.
Added remote local inference services.
Added render data for foundation model management integrations and models.
Added Snorkel Flow SDK support for third-party fine-tuning.
Added support for gated foundation models in local inference service.
Added confidence support for OpenAI.
Added Azure OpenAI and Azure ML inference support.
Added support for Llama fine-tuning on SageMaker.
Added endpoint for setting foundation model integration secrets.
Added APIs for third-party fine-tuning and inference.

Machine Learning

PDFs

Added page limit filter to application template.
Added operator to map table predictions to spans.
Added check on document size.
Added error message when page size of more than 50 is entered.
Added sample by docs option for PDF tasks.
Added Table Detection operator.
Added support for all text2text models in PDF extraction predictor.

Text

Added text label space to store freeform text annotation.

SDK

Added export controls to JupyterHub notebook.
Added Python SDK support for annotation batch export.

Improvements and bug fixes

Annotation

Added tooltip to multi-label confirm button.
Created distinction for multi-schema annotation datasets from non-multi-schema annotation datasets.
Enforced one label schema for ranking view.
Created opt-in for multi-schema annotation.
Improved dataset response structure and SDK.
Fixed bug where ranking view did not show data.
Updated the selected text to display on click.
Updated reviewer mode to show the correct user-selected labels.
Fixed unresponsive annotation page.
Added delete operation for sequence tagging multi-schema annotation.
Modified sequence tagging label popup to display correctly in LLM response view.
Added tooltip for free-text in reviewer mode.
Created filter configuration for multi-label label schema.
Fixed sequence tagging label that wasn’t updating correctly.
Fixed label-schema form errors for sequence tagging.
Hid other annotations while in annotation mode.
Updated URL fields to display correctly and sort labels.
Excluded text label schemas from aggregate annotations.
Fixed query parameters for batch export.
Updated user role permissions for multi-schema annotation.
Fixed export for text label schema.
Fixed multi-label bug.

General

Added support for boolean values in EmbeddedTable.
Fixed multi-label tooltip clipped by container.
Fixed sidebar overlapping issues.
Various other bug fixes and improvements.

Data + App Management

Updated to show text label as type in label schema table.
Fixed bugs in dataset batch setup for text label.
Fixed saved text label that wasn’t showing after loading a page.
Fixed bug that caused label schema list to get cut off from bottom while scrolling.

Data + Slices

Reserved "All Data" and "No Slice Assigned" slice names.
Updated the evaluation page URL printed by the SDK.
Updated the LLM evaluation to include all slices in the dataset by default.
Added support for float type in modular data viewer.
Fixed bug that called Populate-labeling function-suggestion in a loop when it failed in an application.
Fixed broken sorting in ModelMetricsTable to make overall and no-slice rows sticky.
Set the correct page when switching from Record to Prompt view in the Prompt modular data viewer.
Fixed metrics/evaluation page integration bug.
Fixed the nesting order of evaluation scores.
Renamed metric from “reward model” to “quality model” and fixed the evaluation page URL.
Updated acceptance rate calculation to exclude null and UNKNOWN values.
Fixed highlighting bugs.
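
The acceptance rate change above can be illustrated with a minimal sketch. It assumes the rate is the share of accepted labels among valid ones; the "ACCEPT" and "REJECT" values, the "UNKNOWN" string sentinel, and the function name are hypothetical and not taken from the Snorkel Flow SDK.

```python
from typing import Optional, Sequence

UNKNOWN = "UNKNOWN"  # assumed sentinel; the product's internal representation may differ


def acceptance_rate(
    labels: Sequence[Optional[str]], accepted: str = "ACCEPT"
) -> Optional[float]:
    """Share of accepted labels among those that are neither None nor UNKNOWN."""
    # Drop null and UNKNOWN entries so they do not count toward the denominator.
    valid = [label for label in labels if label is not None and label != UNKNOWN]
    if not valid:
        return None  # no valid labels, so the rate is undefined
    return sum(label == accepted for label in valid) / len(valid)


labels = ["ACCEPT", "REJECT", None, "UNKNOWN", "ACCEPT"]
# Three valid labels remain after filtering, two of which are accepted.
print(acceptance_rate(labels))
```

Before this kind of filtering, the same five labels would be divided by five rather than three, which understates the rate.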

Enterprise Infrastructure

Moved supported_features into access token.
Added default value for _watcher in HTTPClient in sf context.
Created an explicit global context in notebooks and prevented a segfault on exit.
If the authentication token is missing for SSO OIDC provider, you are now forwarded to login.
Fixed file browser button for notebook iframe.
Normalized JupyterHub username path.

User interface

Fixed file collection tab state.
Added the Create Application button when no recent applications exist.
Removed the feature flag for user file upload.
Shared refreshed access token across tabs during inactivity timeout.
Added option for no-timeout in authorization settings.
Removed single quotes from labels input label.
Removed (Optional) from Labels input label in add block form.
Fixed DOM-mangling issue during Datasets page refresh.
Fixed loading spinner on data accordion save during onboarding.
Fixed bug that caused selecting file collection to do nothing.

Foundation models

Added a feature flag on foundation model (FM) management and prompt modular data viewer.
Added filtering for third-party fine-tuning and inference input data by x_uids.
Added invalid OpenAI model name errors.
Added support for FM management for external models across workspaces.
Removed auto-advance option for PDF document prompting.
Added keyboard shortcut support for PDF prompt result pagination. You can use the left and right arrow keys to change pages and the up and down arrow keys to move across labels.
Surfaced correct error messaging for FM providers.
Updated to reset prompt state on create prompt labeling function.
Removed color props option for PDF prompt.
Updated to start pagination from page 1 in prompt results.
Updated to reset CreateExternalLLMModal internal state on close.
Fixed foundation model management styling issues.
Updated to render full integrations list and full set of tags in model card.
Added support for previewing multi-field prompts.
Modified to use default workspace for external model endpoints.
Added message to show when no matches are found.
Added handling of the prompt prefix when loading a previous prompt.
Updated ground truth from document view in PDF prompt view.

Label + Training

Added validation for sufficient data points for model training.

Machine Learning

PDFs

Added better error handling for Tesseract Featurizer for PDFs in onboarding.
Fixed preprocessing issues that weren’t working for classification.
Added better error handling.
Fixed autoscroll bug.
Fixed PDF page offset bug.
Fixed the recall metrics for LLM label functions.

Text

Fixed the NOT logic in sequence tagging.

SDK

Added getter function to get slices by name.

Deprecations and breaking changes

Machine Learning

PDFs

Renamed get_skipped_datapoints SDK function to get_preprocessing_issues.

Image

Removed ImageClassification template.

SDK

Deleted sf.update_node_label_space.
Removed the expired add_node_and_commit. Use add_node.
Removed the expired make_and_commit_op_version_to_node. Use commit_builtin_operator.
Removed the expired uncommit_op_version_from_node. Use uncommit_operator.
Removed the expired commit_op_version. Use commit_custom_operator or commit_builtin_operator.
Renamed commit_custom_operator_to_node to commit_custom_operator.
Deprecated sf.commit_custom_operator_to_node. Use sf.commit_custom_operator.
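
To find affected calls in existing notebooks or scripts, a small scanner along these lines may help. It is a sketch only: the old and new names are copied from the entries in this section, while the helper name find_removed_calls and the sample snippet are hypothetical. It matches names textually and does not check arguments, so each hit still needs manual review.

```python
import re
from typing import List, Optional, Tuple

# Old SDK name -> suggested replacement, as listed in this section.
# None means the notes name no replacement.
REPLACEMENTS = {
    "get_skipped_datapoints": "get_preprocessing_issues",
    "update_node_label_space": None,
    "add_node_and_commit": "add_node",
    "make_and_commit_op_version_to_node": "commit_builtin_operator",
    "uncommit_op_version_from_node": "uncommit_operator",
    "commit_op_version": "commit_custom_operator or commit_builtin_operator",
    "commit_custom_operator_to_node": "commit_custom_operator",
}

# Word boundaries keep commit_op_version from matching inside
# make_and_commit_op_version_to_node.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b")


def find_removed_calls(source: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (line number, old name, replacement) for each old name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REPLACEMENTS[name]))
    return hits


# Hypothetical snippet; the argument names are placeholders.
sample = "sf.add_node_and_commit(x)\nsf.commit_builtin_operator(z)\nsf.commit_op_version(y)\n"
for lineno, old, new in find_removed_calls(sample):
    print(f"line {lineno}: {old} -> {new or 'no replacement listed'}")
```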

Known Issues

In multi-schema annotations, overlapping spans for distinct label schemas do not work correctly.
Reviewers and Annotators have permissions to create Notebooks. In the future, these permissions will be restricted to Developers+.
In Studio, there can be a memory leak when switching between Studio views.
When onboarding a classification application with a PDF that includes more than 100 pages, the file is ignored but the onboarding continues without notification.
In the Suggested LFs pane, the Columns to use dropdown menu doesn’t always populate with columns. When this happens, the Generate LFs button is grayed out and prevents you from updating the suggested LFs.
To work around this bug, switch pages to records beyond the current record view or switch through snippet view. This action triggers Snorkel Flow to refresh the data, which updates the missing columns data.
