Custom LFs
Custom LF builders via the Python SDK
This section describes how to register a custom LF builder from a Jupyter notebook and then use that LF builder in Studio.
Defining the classes
To develop a custom LF builder, we need to define two classes: a custom LF template and a custom LF template schema. To define a custom LF template schema, we write a class via the SDK that inherits from the TemplateSchema class, in which we define the input fields, their data types, optional default values, and optional validators. This is similar to a pydantic BaseModel class.
For example, if we want to define a custom LF builder that can label based on a numeric field value comparison (say, "if field num_images in the dataset is > 4, then label as spam"), we will need 3 inputs to the LF: the data field (e.g., num_images), the comparison operator (e.g., >), and the value to compare (e.g., 4).
Along with these, a docstring in the custom LF template schema class can optionally specify how the inputs render in Studio. For example, the docstring determines the order in which the inputs are rendered. Input fields should be enclosed in square brackets. If an input name contains the word field, a dropdown list of all available data fields is rendered in the UI. To show only integer fields in that dropdown, specify (int) beside the name; similarly, specify (string) for string fields. See the docstring Field [field](int) is [operator] [value] used below.
from templates.utils import TemplateSchema
from pydantic import validator

class CustomNumericTemplateSchema(TemplateSchema):
    '''
    Field [field](int) is [operator] [value]
    '''
    field: str
    operator: str = "="
    value: float

    @validator("operator")
    def check_operator(cls, op: str) -> str:
        if op not in [">", "<", "="]:
            raise ValueError(f"Invalid operator {op}. Should be one of the [ >, <, =]")
        return op
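The validator defined here checks that the operator input is valid. In Studio, the docstring in the above example renders as the input row "Field [field] is [operator] [value]", where [field] is a dropdown list restricted to integer data fields.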
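To make the bracket convention concrete, the sketch below parses a layout docstring into its input names and optional type hints with a regular expression. It only illustrates the convention described above: parse_layout is a hypothetical helper, not part of the SDK, and Studio's actual parsing may differ.

```python
import re
from typing import List, Optional, Tuple

# Matches "[name]" optionally followed by "(type)", e.g. "[field](int)".
PLACEHOLDER = re.compile(r"\[(\w+)\](?:\((\w+)\))?")


def parse_layout(docstring: str) -> List[Tuple[str, Optional[str]]]:
    """Return (input_name, type_hint) pairs in the order they appear.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER.finditer(docstring)]


layout = parse_layout("Field [field](int) is [operator] [value]")
print(layout)  # [('field', 'int'), ('operator', None), ('value', None)]
```

The order of the pairs mirrors the order in which the inputs appear in the docstring, and a missing type hint is reported as None.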
To define a custom LF template, we write a class via the SDK that inherits from the Template class. We need to define the following fields and methods as part of this class: template_type, abbreviation, description, menu_type, docs_link (optional), template_schema, __init__, and check.
template_type is a unique identifier for the LF builder and should always start with custom.
abbreviation is used to easily distinguish between LF builders in Studio.
description is a description of the LF builder, shown in the UI help tooltip.
menu_type is a dictionary with three required fields: name, value, and category. value should be a unique template_type. category is used to group LF builders in Studio and to filter them by application type.
docs_link is an optional user guide link for the LF builder, linked from the help tooltip.
template_schema is the name of the template schema class that should be used with this LF builder.
__init__ is a required method that receives the TemplateConfig input, a dict of all the inputs defined in the TemplateSchema. Store these inputs so that they can be used in the check method.
check is a required method. For sequence tagging, it should return a List[Tuple[int, int]] of span votes, each with a span_start and a span_end (for example, [(1, 5), (6, 10)]). For all other kinds of labeling, it should return a boolean: true if the LF should vote for the label, false otherwise. See the example below:
from templates.template import Template, TemplateConfig
import pandas as pd

class CustomNumericTemplate(Template):
    """Custom LF Template based on numerical comparisons"""

    template_type = "custom_numeric"
    abbreviation = "NUM"
    description = "If [field] has a numeric value that is [>, =, etc.] [value], then label."
    menu_type = {
        "name": "Custom Numeric Builder",
        "value": template_type,
        "category": ["custom"],
    }
    docs_link = (
        "/user_guide/reference/label/lf_builders.html#"
        + "numeric-builder-numerical-lfs"
    )
    template_schema = "CustomNumericTemplateSchema"

    def __init__(self, template_config: TemplateConfig) -> None:
        """LF Template based on numeric comparisons.

        Heuristic:
        "If x[field] [OPERATOR] [value], return True"
        """
        self._field = template_config["field"]
        self.op_str = template_config["operator"]
        self._value = template_config["value"]

    def check(self, x: pd.Series) -> bool:
        field_value = x[self._field]
        if self.op_str == "=":
            return field_value == self._value
        elif self.op_str == ">":
            return field_value > self._value
        elif self.op_str == "<":
            return field_value < self._value
        return False
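The example above returns a boolean. For sequence tagging, check returns a list of (span_start, span_end) tuples instead. The following is a minimal sketch of that return shape only. It uses a plain stand-in class rather than the SDK's Template base class, a dict in place of the pandas row, and a hypothetical regular-expression heuristic. It also assumes that spans are character offsets with an exclusive end, which this section does not specify.

```python
import re
from typing import List, Mapping, Tuple


class RegexSpanCheck:
    """Stand-in showing the span-vote return shape (not the SDK Template class)."""

    def __init__(self, template_config: Mapping[str, str]) -> None:
        # Same pattern as above: read the inputs once, reuse them in check().
        self._field = template_config["field"]
        self._pattern = re.compile(template_config["pattern"])

    def check(self, x: Mapping[str, str]) -> List[Tuple[int, int]]:
        text = x[self._field]
        # One (span_start, span_end) tuple per regex match;
        # an empty list means the LF casts no span votes for this row.
        return [m.span() for m in self._pattern.finditer(text)]


lf = RegexSpanCheck({"field": "body", "pattern": r"\d+"})
spans = lf.check({"body": "call 555 or 8841"})
print(spans)  # [(5, 8), (12, 16)]
```

Whether the platform expects character or token offsets, and inclusive or exclusive ends, should be confirmed against the SDK reference before adapting this sketch.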
Registering the class
After defining the LF template and LF template schema classes, we are ready to register them for use in Snorkel Flow.
We do so by calling the snorkelflow.client.add_lf_template_class SDK method and passing in our class definitions, CustomNumericTemplate and CustomNumericTemplateSchema.
import snorkelflow.client as sf

sf.add_lf_template_class(
    CustomNumericTemplate, CustomNumericTemplateSchema,
)
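Before calling add_lf_template_class, it can help to check locally that the template class follows the conventions listed earlier. The helper below is a hypothetical sketch, not part of the SDK. It only checks that the listed attributes exist, that template_type starts with custom, and that menu_type has the three required keys. It treats docs_link as optional and is demonstrated on a stand-in class so that it runs without the SDK installed.

```python
from typing import List

# Attributes this section lists as required (docs_link is optional).
REQUIRED_ATTRS = ("template_type", "abbreviation", "description",
                  "menu_type", "template_schema", "check")
REQUIRED_MENU_KEYS = {"name", "value", "category"}


def find_template_problems(cls: type) -> List[str]:
    """Return a list of convention violations (empty if none were found)."""
    problems = [f"missing {name}" for name in REQUIRED_ATTRS if not hasattr(cls, name)]
    template_type = getattr(cls, "template_type", "")
    if not str(template_type).startswith("custom"):
        problems.append("template_type should start with 'custom'")
    missing_keys = REQUIRED_MENU_KEYS - set(getattr(cls, "menu_type", {}))
    if missing_keys:
        problems.append(f"menu_type is missing keys: {sorted(missing_keys)}")
    return problems


class StandInTemplate:
    """Stand-in with the same class attributes as CustomNumericTemplate."""
    template_type = "custom_numeric"
    abbreviation = "NUM"
    description = "Numeric comparison"
    menu_type = {"name": "Custom Numeric Builder", "value": template_type,
                 "category": ["custom"]}
    template_schema = "CustomNumericTemplateSchema"

    def check(self, x) -> bool:
        return False


print(find_template_problems(StandInTemplate))  # []
```

In a notebook, the same helper could be called on CustomNumericTemplate itself; an empty list means none of these specific conventions were violated, not that registration is guaranteed to succeed.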