Version: 0.95

Multi-polar LFs

This page walks through how to create multi-polar LFs and multi-polar crowdworker LFs, and explains when you should take advantage of this flexibility to create highly complex LFs.

Multi-polar LFs can be used to incorporate advanced logic into LFs, allowing an individual LF to vote differently based on different conditions. These LFs can be developed in the in-platform notebook.

What is a multi-polar LF?

The key difference between a multi-polar LF and a typical uni-polar LF is that a multi-polar LF can vote differently depending on the condition:

  • Uni-polar labeling functions can only output a single label from a given label space.
  • Multi-polar labeling functions can output any label from a given label space.
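The distinction above can be sketched in plain Python. This is an illustration only: it uses no platform API, the class names and keywords are made up, and the "ABSTAIN" sentinel for "no vote" is an assumption.

```python
# Plain-Python illustration; no platform API is used here.
ABSTAIN = "ABSTAIN"  # hypothetical sentinel meaning "no vote"


def unipolar_spam_lf(text: str) -> str:
    """Uni-polar: the only label this function can ever vote for is 'spam'."""
    return "spam" if "free money" in text.lower() else ABSTAIN


def multipolar_lf(text: str) -> str:
    """Multi-polar: the voted label depends on which condition matches."""
    lowered = text.lower()
    if "free money" in lowered:
        return "spam"
    if "meeting agenda" in lowered:
        return "work"
    if "happy birthday" in lowered:
        return "personal"
    return ABSTAIN
```

Covering the same three classes with uni-polar functions would take three separate functions, one per label.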

Here are some examples of multi-polar LFs:

  • Code that evaluates the similarity between a data point and each class in a consistent way, and then votes for the “most similar” class.
  • Crowdworker labels from the same source/vendor.
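As a rough sketch of the first example, the function below scores a data point against one reference vector per class and votes for the closest class. The vectors, the cosine-similarity choice, and the min_similarity threshold are all illustrative assumptions, not part of the platform.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_class(point, class_vectors, min_similarity=0.5):
    """Vote for the class whose reference vector is closest to `point`.

    Abstains when no class reaches `min_similarity` (a hypothetical threshold).
    """
    best_label, best_score = None, -1.0
    for label, vector in class_vectors.items():
        score = cosine(point, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else "ABSTAIN"
```

Because every class is scored the same way, a single function covers the whole label space, however many classes it contains.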

One of the objectives of multi-polar LFs is to reduce the number of LFs that are needed for high-cardinality use cases (e.g., a classification application with a large number of possible classes).

Once you create a multi-polar LF, you can view per-class statistics in the LF summary pane. Click the icon on the right side of the Label column to view the coverage and precision metrics by class name.

Why use multi-polar LFs?

Multi-polar LFs are generally more performant and provide more accurate measurement than creating a larger number of separate uni-polar LFs.

Performance

Some labeling functions are naturally multi-polar, for example, crowdworker LFs, model-based LFs, and embedding-based LFs. The cardinality for these functions tends to be high, which causes usability and performance issues if you’re restricted to using only uni-polar labeling functions.

For single-label applications, being limited to uni-polar labeling functions means creating at least as many LFs as the application has classes. For example, an application with a cardinality of 1,000 would need at least 1,000 LFs. Multi-polar LFs reduce the number of LFs that are needed and therefore improve performance.
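To make the count concrete, here is a small plain-Python sketch with a hypothetical keyword per class. The class names, keywords, and helper names are invented for illustration, and no platform API is involved.

```python
# Hypothetical label space: 1,000 classes, each with one trigger keyword.
KEYWORDS = {f"class_{i}": f"keyword_{i}" for i in range(1000)}


def make_unipolar_lf(label, keyword):
    """Build a uni-polar function that can only vote for `label`."""
    def lf(text):
        return label if keyword in text.split() else "ABSTAIN"
    return lf


# The uni-polar approach needs one function per class.
unipolar_lfs = [make_unipolar_lf(label, kw) for label, kw in KEYWORDS.items()]


def multipolar_lf(text):
    """A single function that can vote for any of the 1,000 classes."""
    tokens = set(text.split())
    for label, keyword in KEYWORDS.items():
        if keyword in tokens:
            return label
    return "ABSTAIN"
```

Here len(unipolar_lfs) is 1000, while the multi-polar version is one function that covers the same conditions.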

Accurate measurement

A single multi-polar LF reports one consolidated measure of precision and coverage, rather than spreading those metrics across many separate uni-polar LFs.
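One way to picture per-class metrics for a single-label multi-polar LF is sketched below. The definitions are assumptions made for illustration (coverage as the share of data points that received a given vote, precision as the share of those votes that match ground truth); the platform's exact formulas may differ.

```python
from collections import Counter


def per_class_metrics(votes, ground_truth):
    """Per-class coverage and precision for one LF's votes.

    `votes` and `ground_truth` are parallel lists of labels; "ABSTAIN"
    marks data points where the LF did not vote.
    """
    total = len(votes)
    voted = Counter(v for v in votes if v != "ABSTAIN")
    correct = Counter(
        v for v, truth in zip(votes, ground_truth) if v != "ABSTAIN" and v == truth
    )
    return {
        label: {"coverage": count / total, "precision": correct[label] / count}
        for label, count in voted.items()
    }
```

All the numbers come from one list of votes, so they describe the behavior of one LF across the whole label space.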

How to create a multi-polar LF

You can create multi-polar LFs using the in-platform notebook. Use @labeling_function() to define your LF, and sf.add_code_lf() to add the LF to the model node for your application.

Here is an example of a multi-polar LF for a multi-label classification application:

@labeling_function()
def sample_multi_polar_code_lf(x):
    # Multi-label LF: every return value holds a vote for each class.
    if x.release_countries is not None and "Hong Kong" in x.release_countries:
        return {
            'World Cinema': "PRESENT",
            'Japanese Movies': "ABSTAIN",
            'Short Film': "ABSTAIN",
            'Black and White': "ABSTAIN"
        }
    elif x.languages is not None and "Japanese" in x.languages:
        return {
            'World Cinema': "ABSTAIN",
            'Japanese Movies': "PRESENT",
            'Short Film': "ABSTAIN",
            'Black and White': "ABSTAIN"
        }
    elif x.runtime is not None and x.runtime < 50:
        # The None check mirrors the guards above and avoids a TypeError
        # when runtime is missing.
        return {
            'World Cinema': "ABSTAIN",
            'Japanese Movies': "ABSTAIN",
            'Short Film': "PRESENT",
            'Black and White': "ABSTAIN"
        }
    # No condition matched: abstain on every class.
    return {
        'World Cinema': "ABSTAIN",
        'Japanese Movies': "ABSTAIN",
        'Short Film': "ABSTAIN",
        'Black and White': "ABSTAIN"
    }

# Add the LF to the model node and mark it as multi-polar.
lf = sf.add_code_lf(node, sample_multi_polar_code_lf, is_multipolar=True)
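The repeated dictionaries in the example can be built by a small helper. The sketch below is plain Python; make_votes and CLASSES are hypothetical names, and whether such a helper can be defined inside or alongside a registered LF in the notebook is an assumption to verify.

```python
# Hypothetical helper; the class names are taken from the example above.
CLASSES = ("World Cinema", "Japanese Movies", "Short Film", "Black and White")


def make_votes(*present):
    """Return a vote for every class: PRESENT for those named, ABSTAIN otherwise."""
    unknown = set(present) - set(CLASSES)
    if unknown:
        raise ValueError(f"Unknown classes: {sorted(unknown)}")
    return {name: ("PRESENT" if name in present else "ABSTAIN") for name in CLASSES}
```

With this helper, a branch could return make_votes("Short Film") instead of spelling out all four entries, and the fallback would be make_votes().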

Multi-polar crowdworker LFs

The upload format for multi-polar crowdworker LFs supports multiple labels per data point. Compared to the standard crowdworker builder, the multi-polar crowdworker LF accepts both single-label and multi-label inputs.

Here is an example CSV file of a multi-polar crowdworker for a multi-label classification application:

uid,label
doc::10009844,"{""Black and White"": ""PRESENT"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}"
doc::10016997,"{""Black and White"": ""ABSTAIN"", ""Japanese Movies"": ""ABSENT"", ""Short Film"": ""PRESENT"", ""World Cinema"": ""ABSTAIN""}"
doc::10004664,"{""Black and White"": ""ABSTAIN"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}"
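A file in this shape can be produced with Python's standard csv and json modules, as in the sketch below. It assumes the serialized dictionary is JSON, which matches the quoting in the example above; build_crowdworker_csv and its input structure are hypothetical.

```python
import csv
import io
import json

# Class names from the example above, in the order they appear in each row.
CLASSES = ["Black and White", "Japanese Movies", "Short Film", "World Cinema"]


def build_crowdworker_csv(votes_by_uid):
    """Serialize {uid: {class: vote}} into CSV text with a full label dict per row.

    Classes missing from a uid's votes are filled in as "ABSTAIN".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["uid", "label"])
    for uid, votes in votes_by_uid.items():
        label = {name: votes.get(name, "ABSTAIN") for name in CLASSES}
        # csv.writer quotes the JSON field and doubles its inner quotes.
        writer.writerow([uid, json.dumps(label)])
    return buffer.getvalue()
```

For example, build_crowdworker_csv({"doc::10009844": {"Black and White": "PRESENT"}}) yields a header line followed by a row in the same form as the first data row above.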

Note

The CSV file does not have to cover every x_uid. If an x_uid is not covered, the crowdworker LF abstains on it. For example, doc::1000708 is not included in the CSV file above, so the LF abstains for all ground truth classes on that data point.

For multi-label applications, the label of any row should be a serialized dictionary with all ground truth classes.

  • doc::10009844,"{""Black and White"": ""PRESENT""}" is not a valid form. Specifying only a subset of the classes is not allowed.
  • doc::10009844,"{""Black and White"": ""PRESENT"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}" is a valid form. All ground truth classes are specified in the label.
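Before uploading, a file can be checked for rows that omit classes. The sketch below is a local pre-check only, not the platform's validation; it assumes the label column holds JSON, and find_invalid_rows and the CLASSES set are hypothetical names.

```python
import csv
import io
import json

# Class names from the example application above.
CLASSES = {"Black and White", "Japanese Movies", "Short Film", "World Cinema"}


def find_invalid_rows(csv_text):
    """Return (uid, missing_classes) for every row whose label omits a class."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = json.loads(row["label"])
        missing = CLASSES - set(label)
        if missing:
            problems.append((row["uid"], sorted(missing)))
    return problems
```

An empty result means every row names all four classes; the invalid form shown above would be reported with its three missing classes.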