Version: 0.91

Utilizing embeddings

In Snorkel Flow, you can use embeddings across your end-to-end workflow: understanding your data, labeling programmatically, and training models.

Computing embeddings during app creation

You can choose to compute embeddings when you create an application or at any time while you are developing your model in Studio.

When you create an application using the guided workflow, the compute embeddings option appears at the Development Settings step. By default, it is set to Yes, and embeddings are computed on the selected primary text/PDF field using SimCSE.

App-creation-dev-settings.webp

Computing embeddings in Embedding Home

You can also compute additional embeddings during model development on the Studio page. Click the Embeddings dropdown at the top right of the Studio page, then click the add new embeddings button to open the Embedding Home modal.

Embedding-home-entry.webp

In the modal, select the desired parameters and click compute.

Embedding-home-add-new.webp

You can track the calculation progress in the table or by hovering over the Embeddings dropdown. When the calculation is complete, you can use the new embeddings for embedding-powered features in Studio.

Embedding-home-load.webp

Embedding-powered features in Studio

Below is a list of features powered by embeddings:

  • Embedding-based cluster workflow
  • Data summary - embedding map / top n-gram
  • Embedding field for model training

Embeddings-cluster-view.webp

Embeddings-data-summary.webp

Embeddings-model-field.webp
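To give a rough intuition for what the cluster view and the top n-gram summary do with embeddings, here is a minimal, self-contained sketch. It is not Snorkel Flow's implementation and does not use any Snorkel Flow API. It assumes toy two-dimensional vectors standing in for real embeddings (such as SimCSE output), a simple k-means loop with cosine similarity as the clustering method, and whitespace tokenization. The helper names `cosine`, `kmeans`, and `top_ngrams` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_clusters(vectors, centroids):
    """Assign each vector to the index of its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
        for v in vectors
    ]


def kmeans(vectors, k, iterations=10):
    """Tiny k-means variant; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = assign_clusters(vectors, centroids)
    for _ in range(iterations):
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = assign_clusters(vectors, centroids)
        if new_labels == labels:
            break  # Assignments stopped changing.
        labels = new_labels
    return labels


def top_ngrams(texts, n=2, top=3):
    """Count word n-grams over the texts and return the most frequent ones."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(top)


# Toy documents with made-up 2-D "embeddings" (real embeddings have many more dimensions).
docs = [
    ("reset my password please", [0.9, 0.1]),
    ("refund my order today", [0.1, 0.9]),
    ("reset my password now", [0.8, 0.2]),
    ("refund my order now", [0.2, 0.8]),
]

labels = kmeans([vec for _, vec in docs], k=2)

# Summarize each cluster by its most frequent bigrams.
for cluster_id in sorted(set(labels)):
    texts = [text for (text, _), label in zip(docs, labels) if label == cluster_id]
    print(cluster_id, top_ngrams(texts, n=2, top=2))
```

Grouping documents by embedding similarity and then summarizing each group with its most frequent n-grams is one way to spot themes in the data that may be worth turning into labeling logic.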

Supported use cases

The table below shows the availability of embedding computation and embedding-powered features per use case in the 0.91 release:

Raw text classification | Raw text candidate extraction | Raw text sequence tagging | PDF classification | PDF extraction | All multi-label cases
Compute embeddings
Embedding-based cluster workflow (Cluster view)
Embedding map (in Data Summary Pane)
Top n-gram (in Data Summary Pane)
Embedding field for model training