Version: 0.91

Introduction to labeling functions (LFs)

Overview

Snorkel Flow users use labeling functions (LFs) to programmatically label data. Each LF specifies a set of conditions and a mapping to a class label that applies when those conditions are met. If a data point satisfies an LF's conditions, that mapping (the LF's vote) is combined with all other signals by the Snorkel Flow labeling model to decide how to label the data point. If the conditions are not met for a data point, the LF abstains (votes UNKNOWN). LFs can be renamed, edited, or deleted.

The key data programming abstraction in Snorkel, a labeling function (LF), is a programmatic rule or heuristic that assigns labels to unlabeled data. Each labeling function works on one class: it votes on whether a data point has a certain label.
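As a rough illustration of the vote-or-abstain behavior described above, a single-class LF can be sketched in plain Python. This is not Snorkel Flow's API: the function name `lf_mentions_refund`, the `"billing"` class, and the use of `None` as the UNKNOWN marker are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical abstain marker standing in for UNKNOWN.
UNKNOWN = None


def lf_mentions_refund(text: str) -> Optional[str]:
    """Vote "billing" when the text mentions a refund; otherwise abstain."""
    if "refund" in text.lower():
        return "billing"  # condition met: vote for this LF's single class
    return UNKNOWN  # condition not met: abstain


docs = [
    "Please issue a refund for my last order.",
    "How do I reset my password?",
]
votes = [lf_mentions_refund(d) for d in docs]
```

Here the LF votes on the first document and abstains on the second; it never votes for any class other than its own.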

The Snorkel Core Label Model estimates LF accuracies and aggregates LF votes into training labels without relying on ground truth labels.

LF composer toolbar

LFs can be composed using several tools, including:

1. Search based

2. Cluster based

3. Foundation model based

LF summary

The LF Summary pane reports the following columns for the current split. Select which columns to show using the Columns icon in the top right of the tab:

  • Labels: The label assigned by the labeling function

  • Voted (specific to the data point selected in the Data Viewer tab):

    • Gray ✓ voted, but no GT exists
    • Green ✓ voted correctly based on assigned GT
    • Red x voted incorrectly based on assigned GT
  • Precision (GT): The (number of correct LF votes) / (total number of LF votes), based on the existing GT labels. This is only an estimate, and it may be based on a very small dev set. The learned precision that Snorkel Flow uses to generate probabilistic labels for data points is not calculated until a label package is created; it can then be viewed on the LF Packages page.

  • Precision (Estimated): The same as Precision (GT) but using estimated GT labels (based on LF votes) wherever manual GT labels do not exist. (Total LF votes for class in agreement with estimated GT labels) / (Total LF votes for class).

  • Recall (GT): The (number of correct LF votes) / (total number of examples in class), based on the existing GT labels.

  • Recall (Estimated): The same as Recall (GT), but using estimated GT labels (based on LF votes) wherever manual GT labels do not exist. (Total LF votes for class in agreement with majority) / (Total majority votes for class).

  • Coverage: The percentage of data points in the current split that this labeling function votes on.

  • Count (GT): The total number of LF votes generated by this LF on the current split.
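The GT-based columns above can be reproduced by hand on a small example. The sketch below is not Snorkel Flow code; the function name `lf_summary` and the data layout (one vote and one optional GT label per data point, with `None` meaning "abstained" or "no GT") are assumptions. It also assumes that Precision (GT) only counts votes on data points that have a GT label.

```python
from typing import Dict, List, Optional


def lf_summary(
    votes: List[Optional[str]],
    gt: List[Optional[str]],
    label: str,
) -> Dict[str, Optional[float]]:
    """Compute coverage, count, Precision (GT), and Recall (GT) for one LF."""
    n = len(votes)
    voted = [i for i, v in enumerate(votes) if v is not None]
    # Assumption: precision is measured only where a GT label exists.
    voted_with_gt = [i for i in voted if gt[i] is not None]
    correct = sum(1 for i in voted_with_gt if votes[i] == gt[i])
    in_class = sum(1 for g in gt if g == label)
    return {
        "coverage": len(voted) / n if n else 0.0,
        "count": len(voted),
        "precision_gt": correct / len(voted_with_gt) if voted_with_gt else None,
        "recall_gt": correct / in_class if in_class else None,
    }


# Five data points: the LF votes "spam" on three and abstains on two.
votes = ["spam", None, "spam", "spam", None]
gt = ["spam", "spam", "ham", None, "spam"]
summary = lf_summary(votes, gt, label="spam")
```

In this example the LF votes on 3 of 5 data points (coverage 0.6), is correct on 1 of the 2 votes that have GT (precision 0.5), and finds 1 of the 3 GT examples in its class (recall 1/3).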

LF settings

For each labeling function, there are a few settings available in the three-dot menu to the left of each saved LF row in the Active LF list.

  • Rename this LF: Opens an input for renaming an LF. Note that LF names have to be unique.
  • Trust this LF: Tells the labeling model to default to the label specified by this LF for the data points it votes on.
  • Archive this LF: Sends the LF to the Inactive LF list.
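One way to picture the effect of trusting an LF is as an override applied on top of the labeling model's output. The sketch below only illustrates that idea and is not how Snorkel Flow implements it; `resolve_label`, its parameters, and the rule that the first trusted LF in the list wins when several vote are assumptions.

```python
from typing import Dict, List, Optional


def resolve_label(
    lf_votes: Dict[str, Optional[str]],
    trusted_lfs: List[str],
    model_label: Optional[str],
) -> Optional[str]:
    """Return a trusted LF's vote if one exists, else the model's label."""
    for name in trusted_lfs:
        vote = lf_votes.get(name)
        if vote is not None:
            return vote  # a trusted LF voted: default to its label
    return model_label  # no trusted LF voted: keep the aggregated label


# Hypothetical LF names and votes for a single data point.
lf_votes = {"lf_refund": "billing", "lf_password": None}

with_trust = resolve_label(lf_votes, ["lf_refund"], model_label="account")
without_trust = resolve_label(lf_votes, [], model_label="account")
abstaining_trust = resolve_label(lf_votes, ["lf_password"], model_label="account")
```

A trusted LF that abstains on a data point has no effect there, which matches the description above: the override applies only to the data points the LF votes on.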