Version: 0.93

Utilize embeddings

This page walks through how to create and use embeddings in Snorkel Flow. You can utilize embeddings across your end-to-end workflow: understanding your data, labeling programmatically, and model training.

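There are two ways to compute embeddings:

  • During application creation, at the Development Settings step.
  • In embedding home, from the Embeddings dropdown in Studio.

Once your embeddings are computed, you can use them in Studio.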

Compute embeddings during application creation

When you create an application using the guided workflow, the Compute embeddings option appears at the Development Settings step. It is set to Yes by default, and the selected field defaults to the primary text or PDF field. The embeddings are calculated using SimCSE.

App-creation-dev-settings.webp
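
To make concrete what an embedding provides, the sketch below compares documents by the cosine similarity of their vectors. It is not Snorkel Flow code. The document names and three-dimensional vectors are made up for illustration, whereas real SimCSE embeddings have hundreds of dimensions and are computed by the platform.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (illustrative values only).
embeddings = {
    "loan_request": [0.9, 0.1, 0.0],
    "mortgage_inquiry": [0.8, 0.2, 0.1],
    "sports_recap": [0.0, 0.1, 0.9],
}

def most_similar(name, embeddings):
    """Return the other document whose embedding is closest to `name`."""
    query = embeddings[name]
    others = (k for k in embeddings if k != name)
    return max(others, key=lambda k: cosine_similarity(query, embeddings[k]))

print(most_similar("loan_request", embeddings))  # prints "mortgage_inquiry"
```

Documents with related meaning end up with vectors that point in similar directions, which is the property that embedding-based search, clustering, and modeling rely on.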

Compute embeddings in embedding home

You can also compute additional embeddings during model development in Studio. To do so, click the Embeddings dropdown in the top-right corner of your screen, then click Add new embeddings. For sequence tagging applications, you can calculate RAG embeddings. For more information, see Prompting with document chunking (RAG).

Embedding-home-entry.webp

In the modal, select the desired parameters, then click Compute.

Embedding-home-add-new.webp

You can track calculation progress in the table or by hovering over the Embeddings dropdown. Once the calculation is complete, you can use the new embeddings throughout Studio.

Embedding-home-load.webp

Embedding-powered features in Studio

The following features in Studio are powered by embeddings:

  • Embedding-based cluster workflow (Cluster view).

Embeddings-cluster-view.webp

  • Embedding map and top n-grams in the Data Summary pane.

Embeddings-data-summary.webp

  • You can use the embedding field as an input for model training.

Embeddings-model-field.webp
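
As a rough illustration of what using an embedding field as a model input means, the sketch below trains a nearest-centroid classifier directly on embedding vectors. This is a generic, simplified stand-in and not how Snorkel Flow trains models. The function names, two-dimensional vectors, and labels are all hypothetical.

```python
def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(X, y):
    """Group training embeddings by label and average each group."""
    by_label = {}
    for vec, label in zip(X, y):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the label whose centroid is nearest to `vec`."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], vec))

# Hypothetical 2-D embeddings and labels (illustrative values only).
X_train = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y_train = ["finance", "finance", "sports", "sports"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [0.7, 0.3]))  # prints "finance"
```

The model never sees the raw text here, only the vectors, so the quality of the embeddings largely determines how well classes can be separated.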

Supported use cases

The table below shows the availability of embedding compute and embedding-powered features per use case in the 2024.R2 LTS (v0.93) release:

| Feature | Raw text classification | Raw text candidate extraction | Raw text sequence tagging | PDF classification | PDF extraction | All multi-label cases |
| --- | --- | --- | --- | --- | --- | --- |
| Compute embeddings (simcse, spacy, clip) | | | | | | |
| Compute RAG embeddings | | | | | | |
| Embedding-based cluster workflow (Cluster view) | | | | | | |
| Embedding map (in Data Summary Pane) | | | | | | |
| Top n-gram (in Data Summary Pane) | | | | | | |
| Embedding field for model training | | | | | | |