Ground truth annotations
While Snorkel Flow allows you to programmatically generate labels for your training split, having ground truth (GT) labels for data points in the other splits is still recommended.
- In the dev split, GT-labeled examples can assist with discovering and iterating on LFs.
- In the valid and test splits, GT-labeled examples can be used to evaluate model performance and facilitate error analysis.
Annotation options
There are two ways to update GT labels in Snorkel Flow:
- Updating GT directly on the Label page
- Creating annotations in Annotation Studio and then committing those to GT labels
Annotation mode (label page)
On the Label page, GT can be updated via the top bar, the Annotation side bar (when enabled), or both, depending on the application type (details below).
To enter Annotation Mode on the Label page, click the icon on the right side of the dataviewer. From there, ensure that Edit ground truth is selected in the dropdown menu.
To exit Annotation Mode, click the icon in the top-right corner of the pane.
Annotation Studio
Annotation Studio consists of two pages—Batch and Annotate. GT labels are applied via this high-level process:
- Create and assign “batches” of data points to specific users to annotate via the Batch page.
- Users view the data points belonging to that batch in the Annotate page and propose GT labels for each.
- If desired, annotations from more than one source can be aggregated (e.g., via majority vote) on the Batch page to form a new set of annotations.
- Finally, a specific annotation set is selected and committed as GT for those data points.
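The aggregation step above can be illustrated with a minimal sketch of majority voting over annotations from several sources. This is not Snorkel Flow's implementation: the function names, the input layout (a mapping from annotator to per-data-point labels), and the choice to leave ties unresolved are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the most common label, or None for empty input or a tie."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    # A tie for first place is left unresolved (assumption for this sketch).
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def aggregate_annotations(
    annotations_by_source: Dict[str, Dict[str, str]]
) -> Dict[str, Optional[str]]:
    """Combine per-annotator labels into one label per data point.

    annotations_by_source maps a hypothetical annotator name to a
    {data_point_id: label} dictionary.
    """
    labels_per_point: Dict[str, List[str]] = {}
    for annotations in annotations_by_source.values():
        for point_id, label in annotations.items():
            labels_per_point.setdefault(point_id, []).append(label)
    return {
        point_id: majority_vote(labels)
        for point_id, labels in labels_per_point.items()
    }
```

In this sketch, a data point on which annotators are evenly split receives no aggregated label, so it can be reviewed manually before anything is committed as GT.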
For more information about Annotation Studio, see Annotation Studio overview.
Navigation while annotating
Navigation works the same way on the Label page (in Annotation Mode) and on the Annotation page: you can move through data points using the same keyboard shortcuts that apply elsewhere on the Label page.
- For classification tasks, use the down or right arrows to advance to the next data point, and the up or left arrows to go to the previous one.
- For tasks with spans, use up/down to go to the previous/next span within a document, and left/right to go to the previous/next document.
You may apply filters to the data as usual to limit which data points are displayed.
Adding annotations
The steps for adding a GT label in the Label page or proposed GT label in the Annotation page are the same, but vary by application type.
Note that in addition to GT labels, you may add notes (for free-form comments) or tags (for arbitrary grouping).
(Classification) adding ground truth
To modify the GT label of an individual data point, you can use the dropdown at the top of the dataviewer when viewing data in Record View. Alternatively, you can enable Annotation Mode and click on the appropriate label in the Annotation pane or use the corresponding shortcut key (indicated to the left of the label).
(Extraction) adding span ground truth
In extraction tasks, GT labels are collected for individual spans. See the Information extraction: Extracting execution dates from contracts tutorial for a description of how these spans are created.
Once GT extractions exist, you can propagate these labels to individual spans using the auto-fill functionality available in the Annotation pane. Auto-fill works by performing a string comparison between document-level extractions and all spans found in their corresponding documents. If a span matches a specified GT extraction, it receives a Positive label. All other spans in that document receive a Negative label.
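The auto-fill behavior described above can be sketched as follows for a single document. This is an illustration only, not the product's code: the function name and argument layout are hypothetical, and the sketch assumes an exact string match, since the text does not say whether any normalization (such as case folding or whitespace trimming) is applied.

```python
from typing import List


def auto_fill_span_labels(
    gt_extractions: List[str], candidate_spans: List[str]
) -> List[str]:
    """Label each candidate span of one document by string comparison.

    gt_extractions: document-level GT extraction strings.
    candidate_spans: text of every span found in the same document.
    Exact matching is an assumption made for this sketch.
    """
    gt_set = set(gt_extractions)
    return [
        "Positive" if span_text in gt_set else "Negative"
        for span_text in candidate_spans
    ]
```

For example, with a GT extraction of "January 1, 2020", a span with that exact text is labeled Positive and every other span in the document is labeled Negative.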
(Entity classification) adding entity ground truth
For entity classification tasks, you can assign a GT label per entity in each document using the Annotation pane. As with extraction tasks, you may also explicitly update the GT label for a given span in Record view by selecting it and updating the GT label dropdown. Note that this will update the corresponding entity label as well.
Auto-fill works similarly to extraction tasks, except that labels are propagated based on entity assignments rather than string matches.
Evaluating annotations
(Extraction) span ground truth conflicts
In extraction tasks, there may be conflicting GT labels for the same span text due to labeling errors. You can use the snorkelflow.client.get_span_level_ground_truth_conflicts SDK method to get the list of span-level GT conflicts. You can also view the span GT label distribution for a span with conflicting span GTs in the GUI by placing the cursor on the pink dot to the left of Ground truth label.
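To make the notion of a span-level conflict concrete, here is a minimal standalone sketch that finds span texts carrying more than one distinct GT label and reports the label distribution for each. It does not call the SDK method named above and does not reflect that method's arguments or return format; the function name and the (span text, label) input layout are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def find_span_gt_conflicts(
    span_labels: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Return the label distribution for each span text with conflicting GT.

    span_labels: (span_text, gt_label) pairs, one per labeled span.
    A span text is in conflict if it has more than one distinct label.
    """
    distribution: Dict[str, Counter] = defaultdict(Counter)
    for span_text, label in span_labels:
        distribution[span_text][label] += 1
    return {
        span_text: dict(counts)
        for span_text, counts in distribution.items()
        if len(counts) > 1
    }
```

A span text labeled Positive twice and Negative once would be reported with that distribution, which mirrors the kind of information the GUI shows on the pink dot.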