2024.R2 LTS (v0.93) Snorkel Flow release notes
What's new
Annotation
- Updated to show only dataset views that match the batch label schemas.
- Added persistent filters for multi-schema annotation.
- Integrated annotator progress.
- Integrated review page endpoints for multi-schema annotation.
- Integrated annotation-rate.
- Added prompt prefix.
- Added ranking view for annotations.
- Added grouping by column to batch data endpoint.
- Added multi-label support for multi-schema annotation.
- Enabled multi-label label schema for multi-schema annotation.
- Updated multi-highlight styles to support multi-schema annotation use cases.
- Added Batch to Python SDK.
- Added LabelSchema to Python SDK.
Data + App Management
- Improved text label with success, error and loading state.
- Added text label option in create label schema in dataset.
- Added saving and retrieving of freeform label schemas in dataset annotation.
- Added multiple retrieved contexts in single response view.
- Added support for highlighting in sequence tagging label schema in response view.
- Added comments textbox for single response view.
- Hid displayed columns for single response view.
Data + Slices
- Added Evaluate page and navigation.
- Added evaluation report CRUD endpoints.
- Added support for slices in LLM evaluation.
- Added endpoints and SDK functions for slice creation and membership.
- Added db table for evaluation reports.
- Improved the get methods in DatasetTagTypeStore.
- Integrated Model Metrics evaluation page.
Enterprise Infrastructure
- Created new SAML Helm configs for AD roles sync.
- Added RBAC gating for multi-schema annotation export endpoint.
- Added RBAC gate logging endpoint using user selectors.
- Added endpoints for RBAC resource types activation, deactivation, and enabled endpoints.
- Updated roles on login via SAML attributes.
- Added UserSelector and UserSelectorMapping for tenant-scoped roles.
Front-end Infrastructure
- Created a new homepage.
- Added Data Access Control: Download Controls.
- Added authentication settings for session timeout.
- Added User Uploaded Files page.
Foundation models
- Updated integrations in foundation model management.
- Added remote local inference services.
- Added render data for foundation model management integrations and models.
- Added Snorkel Flow SDK support for third-party fine-tuning.
- Added support for gated foundation models in local inference service.
- Added confidence support for OpenAI.
- Added Azure OpenAI and Azure ML inference support.
- Added support for llama fine-tuning for SageMaker.
- Added endpoint for setting foundation model integration secrets.
- Added APIs for third-party fine-tuning and inference.
Machine Learning
PDFs
- Added page limit filter to application template.
- Added operator to map table predictions to spans.
- Added check on document size.
- Added error message when page size of more than 50 is entered.
- Added sample by docs option for PDF tasks.
- Added Table Detection operator.
- Added support for all text2text models in PDF extraction predictor.
Text
- Added text label space to store freeform text annotation.
SDK
- Added export controls to JupyterHub notebook.
- Added Python SDK support for annotation batch export.
Improvements and bug fixes
Annotation
- Added tooltip to multi-label confirm button.
- Created distinction for multi-schema annotation datasets from non-multi-schema annotation datasets.
- Enforced one label schema for ranking view.
- Created opt-in for multi-schema annotation.
- Improved dataset response structure and SDK.
- Fixed bug where ranking view did not show data.
- Updated the selected text to display on click.
- Updated reviewer mode to show the correct user-selected labels.
- Fixed unresponsive annotation page.
- Added delete operation for sequence tagging multi-schema annotation.
- Modified sequence tagging label popup to display correctly in LLM response view.
- Added tooltip for free-text in reviewer mode.
- Created filter configuration for multi-label label schema.
- Fixed sequence tagging label that wasn’t updating correctly.
- Fixed label-schema form errors for sequence tagging.
- Hid other annotations while in annotation mode.
- Updated URL fields to display correctly and sort labels.
- Excluded text label schemas from aggregate annotations.
- Fixed query parameters for batch export.
- Updated user role permissions for multi-schema annotation.
- Fixed export for text label schema.
- Fixed multi-label bug.
General
- Added support for boolean values in EmbeddedTable.
- Fixed multi-label tooltip clipped by container.
- Fixed sidebar overlapping issues.
- Various other bug fixes and improvements.
Data + App Management
- Updated to show text label as type in label schema table.
- Fixed bugs in dataset batch setup for text label.
- Fixed saved text label that wasn’t showing after loading a page.
- Fixed bug that caused label schema list to get cut off from bottom while scrolling.
Data + Slices
- Reserved "All Data" and "No Slice Assigned" slice names.
- Updated the evaluation page URL printed by the SDK.
- Updated the LLM evaluation to include all slices in the dataset by default.
- Added support for float type in modular data viewer.
- Fixed bug that called Populate-labeling function-suggestion in a loop if it failed in an application.
- Fixed broken sorting in ModelMetricsTable to make overall and no-slice rows sticky.
- Set the correct page when switching from Record to Prompt view in the Prompt modular data viewer.
- Fixed metrics/evaluation page integration bug.
- Fixed the nesting order of evaluation scores.
- Fixed metric name from “reward model” to “quality model” and evaluation page URL.
- Updated acceptance rate calculation to exclude null and UNKNOWN values.
- Fixed highlighting bugs.
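The acceptance rate change above can be illustrated with a minimal sketch. It assumes the rate is the share of accepted labels among the labels that are counted, and it uses hypothetical label values ("ACCEPT", "REJECT", "UNKNOWN"); the actual label names and formula in Snorkel Flow may differ.

```python
from typing import Iterable, Optional

# Hypothetical label values, used only for illustration.
ACCEPT = "ACCEPT"
UNKNOWN = "UNKNOWN"


def acceptance_rate(labels: Iterable[Optional[str]]) -> Optional[float]:
    """Share of accepted labels among labels that are neither null nor UNKNOWN."""
    # Drop null and UNKNOWN values before computing the rate.
    counted = [label for label in labels if label is not None and label != UNKNOWN]
    if not counted:
        return None  # no usable labels, so the rate is undefined
    return sum(label == ACCEPT for label in counted) / len(counted)


if __name__ == "__main__":
    sample = ["ACCEPT", "REJECT", None, "UNKNOWN", "ACCEPT"]
    print(acceptance_rate(sample))
```

Under these assumptions, null and UNKNOWN entries change neither the numerator nor the denominator, so a dataset with many unlabeled rows no longer pulls the rate down.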
Enterprise Infrastructure
- Moved supported_features into access token.
- Added default value for _watcher in HTTPClient in sf context.
- Created explicit global context in notebook and prevented segv on exit.
- If the authentication token is missing for SSO OIDC provider, you are now forwarded to login.
- Fixed file browser button for notebook iframe.
- Normalized JupyterHub username path.
User interface
- Fixed file collection tab state.
- Added the Create Application button when no recent applications exist.
- Removed the feature flag for user file upload.
- Shared refreshed access token across tabs during inactivity timeout.
- Added option for no-timeout in authorization settings.
- Removed single quotes from labels input label.
- Removed (Optional) from Labels input label in add block form.
- Fixed DOM-mangling issue during Datasets page refresh.
- Fixed loading spinner on data accordion save during onboarding.
- Fixed bug that caused selecting file collection to do nothing.
Foundation models
- Added a feature flag on foundation model (FM) management and prompt modular data viewer.
- Added filtering for third-party fine-tuning and inference input data by x_uids.
- Added invalid OpenAI model name errors.
- Added support for FM management for external models across workspaces.
- Removed auto-advance option for PDF document prompting.
- Added keyboard shortcut support for PDF prompt result pagination. You can use the left and right arrow keys to change pages and the up and down arrow keys to move across labels.
- Surfaced correct error messaging for FM providers.
- Updated to reset prompt state on create prompt labeling function.
- Removed color props option for PDF prompt.
- Updated to start pagination from page 1 in prompt results.
- Updated to reset CreateExternalLLMModal internal state on close.
- Fixed foundation model management styling issues.
- Updated to render full integrations list and full set of tags in model card.
- Added support for previewing multi-field prompts.
- Modified to use default workspace for external model endpoints.
- Added message to show when no matches are found.
- Added handling of the prompt prefix when loading a previous prompt.
- Updated ground truth from document view in PDF prompt view.
Label + Training
- Added validation for sufficient data points for model training.
Machine Learning
PDFs
- Added better error handling for Tesseract Featurizer for PDFs in onboarding.
- Fixed preprocessing issues that weren’t working for classification.
- Added better error handling.
- Fixed autoscroll bug.
- Fixed PDF page offset bug.
- Fixed the recall metrics for LLM label functions.
Text
- Fixed the NOT logic in sequence tagging.
SDK
- Added getter function to get slices by name.
Deprecations and breaking changes
Machine Learning
PDFs
- Renamed get_skipped_datapoints SDK function to get_preprocessing_issues.
Image
- Removed the ImageClassification template.
SDK
- Deleted sf.update_node_label_space.
- Removed the expired add_node_and_commit. Use add_node.
- Removed the expired make_and_commit_op_version_to_node. Use commit_builtin_operator.
- Removed the expired uncommit_op_version_from_node. Use uncommit_operator.
- Removed the expired commit_op_version. Use commit_custom_operator or commit_builtin_operator.
- Renamed commit_custom_operator_to_node to commit_custom_operator.
- Deprecated sf.commit_custom_operator_to_node. Use sf.commit_custom_operator.
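To check existing notebooks or scripts against the removals and renames in this section, a small scanner along these lines may help. This is only a sketch: the old and new names are taken from this section, while SDK_RENAMES and find_outdated_calls are hypothetical helpers and not part of the Snorkel Flow SDK.

```python
import re
from typing import Dict, List, Optional, Tuple

# Old SDK name -> replacement named in this section.
# None means the notes name no replacement.
SDK_RENAMES: Dict[str, Optional[str]] = {
    "update_node_label_space": None,
    "add_node_and_commit": "add_node",
    "make_and_commit_op_version_to_node": "commit_builtin_operator",
    "uncommit_op_version_from_node": "uncommit_operator",
    "commit_op_version": "commit_custom_operator or commit_builtin_operator",
    "commit_custom_operator_to_node": "commit_custom_operator",
    "get_skipped_datapoints": "get_preprocessing_issues",
}


def find_outdated_calls(source: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (line number, old name, suggested replacement) for each match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in SDK_RENAMES.items():
            # Word boundaries keep commit_op_version from matching inside
            # make_and_commit_op_version_to_node (underscore is a word character).
            if re.search(rf"\b{re.escape(old)}\b", line):
                findings.append((lineno, old, new))
    return findings


if __name__ == "__main__":
    snippet = "sf.add_node_and_commit(...)\nsf.commit_custom_operator(...)\n"
    for lineno, old, new in find_outdated_calls(snippet):
        print(f"line {lineno}: {old} -> {new or 'no replacement listed'}")
```

The scanner only matches names as text, so it also flags mentions in comments and strings; treat its output as a list of places to review rather than as a definitive migration report.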
Known Issues
- In multi-schema annotations, overlapping spans for distinct label schemas do not work correctly.
- Reviewers and Annotators have permissions to create Notebooks. In the future, these permissions will be restricted to Developers+.
- In Studio, there can be a memory leak when switching between Studio views.
- When onboarding a classification application with a PDF that includes more than 100 pages, the file is ignored but the onboarding continues without notification.
- In the Suggested LFs pane, the Columns to use dropdown menu doesn’t always populate with columns. When this happens, the Generate LFs button is grayed out and prevents you from updating the suggested LFs.
To work around this bug, switch pages to records beyond the current record view or switch through snippet view. This action triggers Snorkel Flow to refresh the data, which updates the missing columns data.