Multi-polar LFs
This page walks through how to create multi-polar LFs and multi-polar crowdworker LFs, and explains when you should take advantage of this flexibility to create highly complex LFs.
Multi-polar LFs can be used to incorporate advanced logic into LFs, allowing an individual LF to vote differently based on different conditions. These LFs can be developed in the in-platform notebook.
What is a multi-polar LF?
The key difference between multi-polar LFs and a typical uni-polar LF is the ability to vote differently based on different conditions:
- Uni-polar labeling functions can only output a single label from a given label space.
- Multi-polar labeling functions can output any label from a given label space.
Here are some examples of multi-polar LFs:
- Code that evaluates the similarity between a data point and each class in a consistent way, and then votes for the “most similar” class.
- Crowdworker labels from the same source/vendor.
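The first example above can be sketched in plain Python. This is only an illustration, not the platform's API: the class keywords, the `cosine` and `most_similar_class` names, the `min_score` threshold, and the use of `None` as the abstain value are all assumptions made for the sketch.

```python
import math
from collections import Counter

# Hypothetical class "prototypes": a few keywords describing each class.
CLASS_KEYWORDS = {
    "World Cinema": "foreign international subtitled festival",
    "Short Film": "short brief minutes vignette",
    "Black and White": "monochrome black white noir",
}
ABSTAIN = None  # placeholder; the platform's abstain value may differ


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_class(text: str, min_score: float = 0.1):
    """Score the text against every class the same way, then vote for the best one."""
    words = Counter(text.lower().split())
    scores = {
        label: cosine(words, Counter(keywords.split()))
        for label, keywords in CLASS_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Abstain when even the best class is not similar enough.
    return best if scores[best] >= min_score else ABSTAIN


print(most_similar_class("a monochrome noir thriller"))
```

Because every class is scored by the same function, a single LF can vote for any class in the label space instead of requiring one LF per class.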
One of the objectives of multi-polar LFs is to reduce the number of LFs that are needed for high-cardinality use cases (e.g., a classification application with a large number of possible classes).
Once you create a multi-polar LF, you can view per-class statistics in the LF summary pane. Click the icon on the right side of the Label column to view the coverage and precision metrics by class name.
Why use multi-polar LFs?
Multi-polar LFs are generally more performant and provide more accurate measurement than creating a larger number of separate uni-polar LFs.
Performance
Some labeling functions are naturally multi-polar, for example, crowdworker LFs, model-based LFs, and embedding-based LFs. The cardinality for these functions tends to be high, which causes usability and performance issues if you’re restricted to using only uni-polar labeling functions.
For single-label applications, being limited to purely uni-polar labeling functions means creating at least as many LFs as the cardinality of the application. For example, an application with a cardinality of 1000 would need at least 1000 LFs to cover every class. Multi-polar LFs reduce the number of LFs that are needed and therefore improve performance.
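The difference in LF count can be illustrated with a small plain-Python sketch. It does not use the platform SDK; the keyword rules, the `make_unipolar_lfs` and `multipolar_lf` names, and `None` as the abstain value are hypothetical.

```python
# Hypothetical rules for a 1000-class application: one keyword per class.
RULES = {f"kw{i:04d}": f"class_{i}" for i in range(1000)}


def make_unipolar_lfs(keyword_to_label):
    """Build one function per class; each can only vote for its own label or abstain."""
    def make(keyword, label):
        def lf(text):
            return label if keyword in text.lower() else None
        return lf
    return [make(keyword, label) for keyword, label in keyword_to_label.items()]


def multipolar_lf(text, keyword_to_label=RULES):
    """A single function that can vote for any class in the label space."""
    lowered = text.lower()
    for keyword, label in keyword_to_label.items():
        if keyword in lowered:
            return label
    return None  # abstain


unipolar_lfs = make_unipolar_lfs(RULES)
print(len(unipolar_lfs))                 # one LF per class
print(multipolar_lf("contains kw0042"))  # one LF covers every class
```

Both approaches encode the same rules, but the uni-polar version has to be registered, run, and measured as 1000 separate LFs.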
Accurate Measurement
Using multi-polar LFs provides one consolidated measure of precision and coverage.
How to create a multi-polar LF
You can create multi-polar LFs using the in-platform notebook. Use @labeling_function() to define your LF, and sf.add_code_lf() to add the LF to the model node for your application.
Here is an example of a multi-polar LF for a multi-label classification application:
@labeling_function()
def sample_multi_polar_code_lf(x):
    # Each branch returns a vote for every class in the label space.
    if x.release_countries is not None and "Hong Kong" in x.release_countries:
        return {
            'World Cinema': "PRESENT",
            'Japanese Movies': "ABSTAIN",
            'Short Film': "ABSTAIN",
            'Black and White': "ABSTAIN"
        }
    elif x.languages is not None and "Japanese" in x.languages:
        return {
            'World Cinema': "ABSTAIN",
            'Japanese Movies': "PRESENT",
            'Short Film': "ABSTAIN",
            'Black and White': "ABSTAIN"
        }
    # Guard against a missing runtime, as with the other fields.
    elif x.runtime is not None and x.runtime < 50:
        return {
            'World Cinema': "ABSTAIN",
            'Japanese Movies': "ABSTAIN",
            'Short Film': "PRESENT",
            'Black and White': "ABSTAIN"
        }
    # No condition matched: abstain on every class.
    return {
        'World Cinema': "ABSTAIN",
        'Japanese Movies': "ABSTAIN",
        'Short Film': "ABSTAIN",
        'Black and White': "ABSTAIN"
    }

# Register the function as a multi-polar LF on the model node.
lf = sf.add_code_lf(node, sample_multi_polar_code_lf, is_multipolar=True)
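Writing out the full dictionary in every branch is repetitive, so one option is a small helper that fills in "ABSTAIN" for every class that is not named. The sketch below is plain Python that can be tried outside the platform: `vote_for`, `movie_votes`, and the `SimpleNamespace` row are hypothetical stand-ins, the decorator and `sf.add_code_lf()` call are omitted, and the `None` check on `runtime` is an added assumption.

```python
from types import SimpleNamespace

LABELS = ("World Cinema", "Japanese Movies", "Short Film", "Black and White")


def vote_for(*present):
    """Return a vote for every class: PRESENT for the named ones, ABSTAIN otherwise."""
    unknown = set(present) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown classes: {sorted(unknown)}")
    return {label: "PRESENT" if label in present else "ABSTAIN" for label in LABELS}


def movie_votes(x):
    """Same branching logic as the LF above, expressed with the helper."""
    if x.release_countries is not None and "Hong Kong" in x.release_countries:
        return vote_for("World Cinema")
    if x.languages is not None and "Japanese" in x.languages:
        return vote_for("Japanese Movies")
    if x.runtime is not None and x.runtime < 50:  # assumption: runtime may be missing
        return vote_for("Short Film")
    return vote_for()  # abstain on every class


# A stand-in for one data point, used only to exercise the logic locally.
row = SimpleNamespace(release_countries=None, languages=["Japanese"], runtime=120)
print(movie_votes(row))
```

The helper also guarantees that every returned dictionary contains all classes, so a branch cannot accidentally omit one.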
Multi-polar crowdworker LFs
The upload format for multi-polar crowdworkers supports input for multiple labels at a time. Compared to the standard crowdworker builder, the multi-polar crowdworker allows for both single-label and multi-label inputs.
Here is an example CSV file of a multi-polar crowdworker for a multi-label classification application:
uid,label
doc::10009844,"{""Black and White"": ""PRESENT"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}"
doc::10016997,"{""Black and White"": ""ABSTAIN"", ""Japanese Movies"": ""ABSENT"", ""Short Film"": ""PRESENT"", ""World Cinema"": ""ABSTAIN""}"
doc::10004664,"{""Black and White"": ""ABSTAIN"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}"
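A file in this shape can be produced with Python's standard `csv` and `json` modules: the label dictionary is serialized to JSON, and the CSV writer doubles the embedded quotes. This is a minimal sketch; the `write_crowdworker_csv` name and the in-memory buffer are illustrative choices, not part of the platform.

```python
import csv
import io
import json


def write_crowdworker_csv(rows, out):
    """Write (uid, votes) pairs as a two-column CSV with a JSON-serialized label."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["uid", "label"])
    for uid, votes in rows:
        # sort_keys keeps the class order stable from row to row.
        writer.writerow([uid, json.dumps(votes, sort_keys=True)])


rows = [
    ("doc::10009844", {
        "Black and White": "PRESENT",
        "Japanese Movies": "ABSTAIN",
        "Short Film": "ABSTAIN",
        "World Cinema": "ABSTAIN",
    }),
]

buffer = io.StringIO()
write_crowdworker_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.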
Note
The CSV file does not have to cover all x_uids. If an x_uid is not covered, the crowdworker LF votes abstain. For example, doc::1000708 is not included in the CSV file above. Therefore the LF votes abstain for all ground truth classes.
For multi-label applications, the label of any row should be a serialized dictionary with all ground truth classes.
- doc::10009844,"{""Black and White"": ""PRESENT""}" is not a valid form. You cannot just say that one of the classes is present.
- doc::10009844,"{""Black and White"": ""PRESENT"", ""Japanese Movies"": ""ABSTAIN"", ""Short Film"": ""ABSTAIN"", ""World Cinema"": ""ABSTAIN""}" is a valid form. All ground truth classes are specified in the label.
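Before uploading, this rule can be checked locally. The sketch below is a hypothetical pre-upload check written with the standard library only; the `invalid_rows` name is made up for illustration, and the set of allowed vote values is an assumption based on the values that appear in the examples on this page.

```python
import csv
import io
import json

CLASSES = {"Black and White", "Japanese Movies", "Short Film", "World Cinema"}
ALLOWED_VOTES = {"PRESENT", "ABSENT", "ABSTAIN"}  # assumption: values seen on this page


def invalid_rows(csv_text):
    """Return (uid, reason) pairs for rows whose label is not a complete dictionary."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        uid = row["uid"]
        try:
            votes = json.loads(row["label"])
        except json.JSONDecodeError:
            problems.append((uid, "label is not valid JSON"))
            continue
        if not isinstance(votes, dict):
            problems.append((uid, "label is not a dictionary"))
            continue
        missing = CLASSES - votes.keys()
        if missing:
            problems.append((uid, f"missing classes: {sorted(missing)}"))
        bad = {v for v in votes.values() if v not in ALLOWED_VOTES}
        if bad:
            problems.append((uid, f"unexpected vote values: {sorted(map(str, bad))}"))
    return problems


sample = (
    "uid,label\n"
    'doc::10009844,"{""Black and White"": ""PRESENT""}"\n'
)
for uid, reason in invalid_rows(sample):
    print(uid, reason)
```

An empty result means every row names all ground truth classes with a recognized vote.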