Creating good labeling functions
Developing labeling functions that balance precision and coverage is task dependent and not always straightforward. First and foremost, it is critical to leverage the iterative development cycle in the Snorkel Flow platform, making full use of the tools in the Analyze pane. To complement those tools, this article covers a number of best practices and tips for creating solid labeling functions.
Best practices
Labeling data when starting a new task
- Page through a few examples in Label Studio to get ideas for labeling functions. As you move through, you can also assign ground truth (GT) labels easily with keyboard shortcuts. These GT labels will also be used for error analysis, particularly for identifying the types of examples your model currently gets wrong.
- Your goal should be to write several labeling functions with high empirical precision, with at least one per class; it is not necessary to achieve 100% coverage. (For a rough idea of what such functions can look like in code, see the sketch after this list.)
- You can use filters to hone in on specific data points. Filtering by ground truth label will limit you to data points that already have ground truth labels – you may also want to look through the rest of the dataset to get ideas!
- The empirical precision shown in Label Studio is just a rough estimate based on the data in the current split. Increasing the accuracy of your labeling functions will help the performance of your model, and as a loose guideline, **we recommend refining any LFs that show up as “red” in the interface**. To refine your LF:
  - Use the View Incorrect and View Correct filter buttons to see data points where the LF was incorrect. Try making the LF pattern more specific and re-saving the LF to overwrite the old one.
  - Add conditions to your LF using the Add condition and Negate options in the advanced options dropdown menu.
  - If you suspect your LF might be more accurate than the current estimates show, try filtering to data points where your LF voted but the ground truth label is UNKNOWN and supplying new ground truth labels for those examples. This might also reveal ways to refine your LF.
  - You can optionally view metrics computed on ground truth (GT) instead of estimated metrics.
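In Snorkel Flow you typically author these functions through the UI or the Python SDK. Purely as an illustration of what “one high-precision keyword LF per class” can look like, here is a minimal sketch using the open-source `snorkel` package; the label schema, the `text` field, and the keywords are assumptions for a hypothetical loans-vs-employment task, not a prescribed setup.

```python
from snorkel.labeling import labeling_function

# Hypothetical label schema for a loans-vs-employment document classifier.
ABSTAIN, LOAN, EMPLOYMENT = -1, 0, 1

@labeling_function()
def lf_mentions_loan(x):
    # High-precision keyword vote for LOAN; abstain on everything else.
    # `x` is a data point (e.g., a DataFrame row) with a `text` field.
    return LOAN if "loan" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_mentions_employer(x):
    # At least one LF per class: a keyword vote for EMPLOYMENT.
    return EMPLOYMENT if "employer" in x.text.lower() else ABSTAIN
```

Each function votes only when its pattern matches and abstains (returns -1) otherwise, which is what keeps precision high at the cost of coverage.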
Using the Analysis pane to create LFs
After training a model, the Analysis pane will guide the refinement of existing LFs and the creation of new ones.
The Clarity Matrix shows how your model’s predictions fared relative to the labels from your labeling functions (de-noised and combined by Snorkel Flow) at a high level:
- The upper-left box shows data points where the model is incorrect because the labeling functions were incorrect.
  - If this is the darkest box, then you should click here to try to refine those incorrect labeling functions.
- The middle-left box shows data points where the labeling functions were correct but the model was incorrect.
  - If this is the darkest box and you have spent some time iterating on your LFs, then you should click here to return to the Models page and try a more powerful and/or better-tuned model.
- The lower-left box shows data points where all of the labeling functions abstained, and the model did not successfully generalize.
  - If this is the darkest box, then you should click here to write new labeling functions that cover these data points.
The Confusion Matrix shows where the model confused one class for another (the off-diagonal buckets). You can click on these buckets to filter data points and write or edit labeling functions to correct that error mode. For example, if the matrix shows that 2 of the 15 loan data points are instead classified as employment, clicking on that box would be a good place to start refining your labeling functions.
The Error Correlations view traces model errors back to the labeling functions most correlated with those errors. If an LF is correlated with a large number of errors, you may want to refine or delete it.
The Label Distributions view shows the relationship between the empirically measured class distribution and the distributions implied by your labeling functions and model. If there are significant discrepancies, you may want to write more or fewer labeling functions for that class, and/or select Oversample data to match class distribution on valid split under the Tune hyperparameter options on the Models page.
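If you also inspect LF outputs programmatically (for example with the open-source `snorkel` package), a few lines of pandas give a similar sanity check. The toy label matrix and label values below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def class_distribution(labels, abstain=-1):
    # Fraction of each class among non-abstaining labels.
    labels = np.asarray(labels).ravel()
    labels = labels[labels != abstain]
    return pd.Series(labels).value_counts(normalize=True).sort_index()

# Toy stand-ins: Y_dev holds ground-truth labels, L_dev is an
# (n_points x n_lfs) matrix of labeling-function votes (-1 = abstain).
Y_dev = np.array([0, 0, 1, 1, 1, 2])
L_dev = np.array([[0, -1], [0, 0], [-1, 1], [1, 1], [1, -1], [-1, -1]])

print(pd.DataFrame({
    "ground_truth": class_distribution(Y_dev),
    "lf_votes": class_distribution(L_dev),
}).fillna(0.0))
```

A class that is common in the ground truth but rare among LF votes (here, class 2) is a signal to write more labeling functions for it.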
The longer you iterate, the more likely your LFs are to become tuned to the particular data points you’ve labeled in your dev set (i.e., overfitting). For this reason, it’s good practice to periodically refresh your dev set by resampling it from the train split using the resample option on the split selector. Note that since these data points will already have been seen by the model, we suggest training a new model to maximize the usefulness of the analysis tools.
To resample your data, click Resample data... from the split selector dropdown.
For more information about resampling data, see Application data control pane.
Tips and tools
Quantity over quality
Labeling functions do not need to be perfect. Better labeling functions will in general improve the performance of your model, as will having more, and more diverse, labeling functions.
Labeling functions will in general be noisy and incomplete in their coverage. Snorkel Flow will denoise these labeling functions as well as handle conflicts and overlaps among them, and then train a final machine learning model to generalize to data points your labeling functions do not cover.
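Snorkel Flow performs this de-noising and final model training for you in the platform. Purely as a rough sketch of the same idea with the open-source `snorkel` library (the tiny DataFrame, label values, and keyword LFs below are assumptions for illustration):

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, LOAN, EMPLOYMENT = -1, 0, 1  # hypothetical label schema

@labeling_function()
def lf_loan(x):
    return LOAN if "loan" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_mortgage(x):
    return LOAN if "mortgage" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_employer(x):
    return EMPLOYMENT if "employer" in x.text.lower() else ABSTAIN

# Toy training data; in practice this would be your unlabeled train split.
df_train = pd.DataFrame({"text": [
    "new car loan approved",
    "mortgage refinance application",
    "employer verification letter",
    "unrelated note",
]})

# Apply the LFs to get an (n_points x n_lfs) vote matrix, then fit the label
# model, which de-noises the votes and resolves conflicts and overlaps.
L_train = PandasLFApplier([lf_loan, lf_mortgage, lf_employer]).apply(df_train)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)

# Probabilistic training labels for a downstream discriminative model, which
# can then generalize to data points (like "unrelated note") where every LF abstained.
probs_train = label_model.predict_proba(L_train)
```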
You can click on the "View LF Coverage" button to see the coverage for each class.
Task summary statistics
In the upper-left corner of the screen, you can see the performance of the current set of labeling functions on the selected split, as well as dataset statistics.
Remove poorly performing labeling functions
While labeling functions that perform better than average can improve your end results, labeling functions that perform below average can add noise to your model and harm its performance. One way to handle this is to use the estimated accuracies in Label Studio as a rough guide to spotting very poorly performing labeling functions, and then remove or improve them, e.g. by adding boolean conditions to narrow them and make them more precise.
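As a hedged illustration of the “add boolean conditions” idea, here is what narrowing an over-broad keyword LF might look like with the open-source `snorkel` decorator; the class, field name, and keywords are assumptions, and in Snorkel Flow you would do the equivalent through the Add condition option.

```python
from snorkel.labeling import labeling_function

ABSTAIN, EMPLOYMENT = -1, 1  # hypothetical label values

@labeling_function()
def lf_income_broad(x):
    # Over-broad: "income" alone also fires on loan and investment documents.
    return EMPLOYMENT if "income" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_income_from_employer(x):
    # Narrowed with an extra boolean condition: trades coverage for precision.
    text = x.text.lower()
    return EMPLOYMENT if ("income" in text and "employer" in text) else ABSTAIN
```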
Improving existing Labeling Functions
After you’ve defined some labeling functions, you can improve their performance even further by using the strategies below:
- Inject domain expertise directly
  - Goal: Increase training set coverage with less overfitting risk (rather than looking at specific examples for LF ideation)
  - Strategy: Reference external resources (e.g., 3+ digit SIC code categories from https://siccode.com/) to write keyword/regex labeling functions
- Leverage “Trusted Labeling Functions” for label correction
  - Goal: “Correct” noisy labeling functions (functions with low precision, and perhaps high coverage)
  - Strategy: Write targeted labeling functions to override noisy labels, and select “Trust this labeling function” in the UI
- Pass through servable labeling functions as modeling features
  - Goal: Augment modeling signal with high-quality labeling functions
  - Strategy: Selectively pass LF outputs that are:
    - Servable: available at inference time
    - High precision: to avoid model overfitting
  - How: When defining model training configuration settings in the “Train New Model” pane, under “Feature Libraries”, set the option to “True” before clicking the “Train Model” button.
- Detect overfitting using estimated accuracies (see the sketch after this list)
  - Goal: Mitigate overfitting in settings without a validation split
  - Strategy: Remove or prune labeling functions with a large gap between Snorkel-estimated train and dev accuracy
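For the last strategy, here is a hedged sketch of how you might surface train/dev accuracy gaps outside the platform with the open-source `snorkel` package. The LFs, toy data, and 0.15 gap threshold are illustrative assumptions, and the “estimated” train accuracy is approximated here as each LF’s agreement with the label model’s de-noised votes.

```python
import numpy as np
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier, LFAnalysis
from snorkel.labeling.model import LabelModel

ABSTAIN, LOAN, EMPLOYMENT = -1, 0, 1  # hypothetical label schema

@labeling_function()
def lf_loan(x):
    return LOAN if "loan" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_employer(x):
    return EMPLOYMENT if "employer" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_salary(x):
    return EMPLOYMENT if "salary" in x.text.lower() else ABSTAIN

lfs = [lf_loan, lf_employer, lf_salary]
df_train = pd.DataFrame({"text": [
    "car loan approved", "loan and salary statement", "employer W-2 on file", "salary slip attached",
]})
df_dev = pd.DataFrame({"text": ["student loan payoff", "employer reference", "salary negotiation"]})
Y_dev = np.array([LOAN, EMPLOYMENT, EMPLOYMENT])  # small hand-labeled dev set

applier = PandasLFApplier(lfs)
L_train, L_dev = applier.apply(df_train), applier.apply(df_dev)

# Proxy for Snorkel-estimated train accuracy: agreement with the label model's votes.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)
train_acc = LFAnalysis(L_train, lfs).lf_summary(Y=label_model.predict(L_train))["Emp. Acc."]

# Empirical dev accuracy against real ground truth labels.
dev_acc = LFAnalysis(L_dev, lfs).lf_summary(Y=Y_dev)["Emp. Acc."]

gap = train_acc - dev_acc
print(gap[gap > 0.15])  # candidates to prune or refine
```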