
2024.R1 STS (v0.90) Snorkel Flow release notes

Note

STS releases serve demos and POVs exclusively; no customers will be migrated to v0.90. These release notes are for internal use only and are intended to help the field team understand new user-facing features, improvements, bug fixes, and deprecations.

What's new

Studio

  • Snippet view has been added to the dataviewer when creating applications using the guided flow.
    • Highlights are supported.
    • You can select what view you want from the dropdown.

Foundation model suite

  • For applications with FM capability, the Prompts LFs tab is displayed by default when opening an application (vs. the Patterns LF tab).

Image

  • Added support for fast model training for multi-label image models.
    • Added a setting for a default negative:positive GT ratio.
  • Added a new page size setting that lets the user set the number of data points displayed per page in the Grid view.

Deployment

  • Added support for deploying a model to the Databricks Unity Catalog workspace.
    • The deployment name must be in the form <catalog>.<schema>.<model>. For other model registries, a period (.) cannot be part of the name.
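
The naming rule above can be sketched as a small pre-flight check. This is an illustrative sketch only: `is_valid_deployment_name` is a hypothetical helper, not part of the Snorkel Flow SDK, and the set of characters allowed in each name part is an assumption.

```python
import re

# Hypothetical helper, not part of the Snorkel Flow SDK.
# Assumption: each name part is a non-empty run of letters, digits,
# underscores, or hyphens. The real registry rules may differ.
_PART = r"[A-Za-z0-9_-]+"
_UNITY_NAME = re.compile(rf"{_PART}\.{_PART}\.{_PART}")


def is_valid_deployment_name(name: str, unity_catalog: bool) -> bool:
    """Return True if `name` fits the naming rule described above."""
    if unity_catalog:
        # Exactly three dot-separated parts: <catalog>.<schema>.<model>
        return _UNITY_NAME.fullmatch(name) is not None
    # Other model registries: any non-empty name without a period
    return bool(name) and "." not in name
```

For example, `main.default.my_model` would pass for Unity Catalog, while the same name would be rejected for any other registry because it contains periods.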

Improvements and bug fixes

Studio

  • When creating an application using the guided flow, the define label schema step is now before the define preprocessors step. The new guided flow steps are as follows:
    1. Application name + description
    2. Define data and task type
    3. Define label schema
    4. Define preprocessors
    5. Define preprocessors
  • De-cluttered the sample data preview for PDF applications by removing unnecessary fields.
  • LF votes in the dataviewer now update automatically when ground truth is updated there.
  • Sorting the table in the Labeling Functions pane now works properly.
  • The Export Studio dataset option now correctly downloads the model predictions from the selected model (vs. the current model in Studio).
  • Fixed a pagination issue that appeared on applications that contain very large data points.

Model training

  • Threshold tuning now searches for the model decision threshold in increments of 0.01 instead of 0.05, making the search five times finer. The finer grid makes it more likely that the search finds the optimal decision threshold.
  • When training a custom model, clearer and more regular status updates display under the progress bar.
  • When building a model from a previous AutoML model, the model config under Model options now correctly shows the config from the best model from the AutoML run.
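
To illustrate the threshold tuning change noted above (0.05 increments to 0.01 increments), here is a minimal sketch of a grid search over decision thresholds. It is not the platform's implementation: the function names are hypothetical, the open interval (0, 1) is an assumed search range, and accuracy stands in for whatever metric the platform actually optimizes.

```python
from typing import List, Sequence


def threshold_grid(step: float) -> List[float]:
    """Candidate thresholds strictly between 0 and 1, spaced `step` apart.

    Assumption: the search covers the open interval (0, 1).
    """
    n = round(1 / step)
    return [round(i * step, 2) for i in range(1, n)]


def best_threshold(scores: Sequence[float], labels: Sequence[int], step: float) -> float:
    """Pick the grid threshold that maximizes accuracy (a stand-in metric)."""

    def accuracy(t: float) -> float:
        # A data point is predicted positive when its score reaches the threshold
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        return hits / len(labels)

    return max(threshold_grid(step), key=accuracy)


coarse = threshold_grid(0.05)  # 19 candidate thresholds
fine = threshold_grid(0.01)    # 99 candidate thresholds
```

With scores `[0.12, 0.13, 0.9]` and labels `[0, 1, 1]`, no threshold on the 0.05 grid separates the first two points, while the 0.01 grid contains 0.13, which classifies all three correctly.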

Foundation model suite

  • The Enable local LLM inference service option was added to the replicated deployment options.
  • Clearer error messaging for local inference.
  • Clearer naming convention for prompt LFs (e.g., abbreviation FFP -> PROMPT).
  • When previewing a prompt on a sample of data points, ground truth examples are now guaranteed to be included in the sample.
  • For multi-label applications, all labels are now mapped correctly (vs. just the first label).
  • Fixed an issue where prompt builder snippets didn’t always show highlights in sequence tagging applications.

Image

  • Fast models will now only display classes that they were trained on when using the model-based LF composition or when viewing fast model analysis metrics.
  • Users are alerted to switch into the valid split to view fast model analysis.
  • The Text and Image tools were merged into a single Similarity tool.
  • Fixed an issue where selecting a label during LF composition could display both present and absent matches, when it should only display one or the other.
  • Fixed an issue where the similarity histogram calculation was incorrectly filtered by view filters.

PDF

  • The PDF prompt builder can now read files from MinIO and files with an HTTPS path.
  • The Rich Document Expression LF builder is now case-insensitive by default.
  • For PDF applications where page splitting is used, the spans are now sorted in numerical page order in the Span view (instead of alphabetical).
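
The difference between alphabetical and numerical page order can be seen in a short sketch. The span identifiers below are made up for illustration; the actual naming of split pages in Snorkel Flow may differ.

```python
import re


def page_number(span_id: str) -> int:
    """Extract the trailing page number from a hypothetical id like 'doc_page_10'."""
    match = re.search(r"(\d+)$", span_id)
    return int(match.group(1)) if match else -1


# Hypothetical identifiers for three pages of one split document
pages = ["doc_page_10", "doc_page_2", "doc_page_1"]

alphabetical = sorted(pages)                # string comparison puts page 10 before page 2
numerical = sorted(pages, key=page_number)  # pages in reading order: 1, 2, 10
```

Sorting by the parsed integer keeps page 10 after page 2, which matches the order in which a reader moves through the document.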

Embeddings

  • Embeddings created from embeddings home cannot be used in model training, so these fields are no longer shown as input field options when configuring a model.

Deployment

  • A model signature is now included by default in both the Snorkel Flow UI and the SDK (signature=True is now the default in MLflowDeployment.create). See the mlflow docs for more details about the signature.
  • Deployments are now compatible with Python 3.9 and 3.10.
  • Deployments can now be run in Snorkel Flow if the source application has been deleted.

Deprecations and breaking changes

SDK

  • Deprecated sf.poll_job_status_with_timeout in favor of sf.poll_job_status.
  • Deprecated sf.export_workflow_config.

Deployment

  • Deployments that were created in v0.21 or earlier will no longer run in the platform. Please export and run them outside the platform, or recreate them for in-platform inference.
  • Deployments that have a removed built-in operator (such as SpanJoiner) as part of the DAG will no longer run in the platform. Please export and run them outside the platform, or recreate them for in-platform inference.
  • Deployments created between v0.22 and v0.51 still run in the platform but will return a dataframe with a different index and different names for monitoring-related columns.

These breaking changes are a result of needing to upgrade networkx from 2.3 to 2.4 in order to remain compatible with the underlying libraries and Python.

Previously, when a deployment was created, the Snorkel Flow source code was unique to that deployment.

Now, only the artifacts will be unique to a deployment, and all deployments will use the latest Snorkel Flow source code, which will always be compatible with the underlying libraries and Python. As a result, some older deployments are incompatible with the current Snorkel Flow source code. The bullet list above gives the criteria for deployments that will no longer work.

Known issues