Utilizing embeddings
In Snorkel Flow, you can use embeddings across your end-to-end workflow: understanding your data, labeling programmatically, and training models.
Computing embeddings during app creation
You can choose to compute embeddings when you create an application, or at any time while you are developing your model in Studio.
When you create an application using the guided workflow, the compute embeddings option appears at the Development Settings step. By default, it is set to Yes, and embeddings are computed on the selected primary text/PDF field using SimCSE.
Computing embeddings in Embedding Home
You can also compute additional embeddings during model development on the Studio page. Click the Embeddings dropdown at the top right of the Studio page, then click the add new embeddings button to open the Embedding Home modal.
In the modal, select the desired parameters and click compute.
You can track calculation progress in the table or by hovering over the Embeddings dropdown. When the calculation is complete, you can use the new embeddings for embedding-powered features in Studio.
Embedding-powered features in Studio
Below is a list of features powered by embeddings.
- Embedding-based cluster workflow
- Data summary - embedding map / top n-gram
- Embedding field for model training
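To make the "top n-gram" summary more concrete, here is a minimal, generic sketch of counting the most frequent word n-grams in a group of documents. It is only an illustration of the idea: the function name `top_ngrams`, the whitespace tokenization, and the sample texts are assumptions, and this is not how Snorkel Flow necessarily computes its Data Summary.

```python
from collections import Counter


def top_ngrams(texts, n=2, k=3):
    """Return the k most frequent word n-grams across texts (illustrative only)."""
    counts = Counter()
    for text in texts:
        # Assumption: simple lowercase + whitespace tokenization.
        tokens = text.lower().split()
        # Slide a window of n tokens over each document.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(k)


# Hypothetical documents that might fall into one embedding cluster.
cluster_texts = [
    "please reset my password",
    "reset my password today",
    "cannot reset my password",
]
result = top_ngrams(cluster_texts, n=2, k=2)
print(result)
```

In this toy example, the two most frequent bigrams summarize what the grouped documents have in common, which is the kind of signal a top n-gram view is meant to surface.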
Supported use cases
The table below shows the availability of embedding computation and embedding-powered features per use case in the 0.91 release:
Feature | Raw text classification | Raw text candidate extraction | Raw text sequence tagging | PDF classification | PDF extraction | All multi-label cases |
---|---|---|---|---|---|---|
Compute embeddings | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
Embedding-based cluster workflow (Cluster view) | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
Embedding map (in Data Summary Pane) | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
Top n-gram (in Data Summary Pane) | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
Embedding field for model training | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
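If you want to check this availability programmatically (for example, in a project setup script), the table can be transcribed into a small lookup. The sketch below is a hypothetical helper, not part of any Snorkel Flow API; the names `USE_CASES`, `SUPPORT`, `is_supported`, and `supported_features` are assumptions, and the values are copied from the 0.91 table above.

```python
# Hypothetical lookup transcribed from the 0.91 availability table.
# Column order matches the table header.
USE_CASES = (
    "Raw text classification",
    "Raw text candidate extraction",
    "Raw text sequence tagging",
    "PDF classification",
    "PDF extraction",
    "All multi-label cases",
)

# One row per feature; True corresponds to a check mark in the table.
SUPPORT = {
    "Compute embeddings": (True, True, False, True, False, False),
    "Embedding-based cluster workflow (Cluster view)": (True, True, False, True, False, False),
    "Embedding map (in Data Summary Pane)": (True, True, False, False, False, False),
    "Top n-gram (in Data Summary Pane)": (True, True, False, False, False, False),
    "Embedding field for model training": (True, True, False, False, False, False),
}


def is_supported(feature: str, use_case: str) -> bool:
    """Return whether the feature is listed as available for the use case."""
    return SUPPORT[feature][USE_CASES.index(use_case)]


def supported_features(use_case: str) -> list:
    """List the features marked available for the given use case."""
    return [name for name in SUPPORT if is_supported(name, use_case)]


print(supported_features("PDF classification"))
```

Because the values are hard-coded from a single release, a lookup like this would need to be updated whenever the availability table changes.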