Version: 0.91

Custom LFs

Custom LF builders via the Python SDK

This section describes how to register a custom LF builder from a Jupyter notebook, and then use that LF builder in Studio.

Defining the classes

To develop a custom LF builder, we need to define two classes: a custom LF template and a custom LF template schema. To define a custom LF template schema, we write a class via the SDK that inherits from the TemplateSchema class, in which we define the input fields, their data types, optional default values, and optional validators. This is similar to a pydantic BaseModel class.

For example, if we want to define a custom LF builder that can label based on a numeric field value comparison (say, "if field num_images in the dataset is > 4, then label as spam"), we will need 3 inputs to the LF: the data field (e.g., num_images), the comparison operator (e.g., >), and the value to compare (e.g., 4).

Along with these, a docstring in the custom LF template schema class can optionally specify how the inputs render in Studio. For example, the docstring determines the order in which the inputs render. Input fields should be enclosed in square brackets. If an input name contains the word field, a dropdown list with all available data fields is rendered in the UI. To restrict the dropdown to integer fields, specify (int) beside the name; similarly, use (string) for string fields. See the docstring Field [field](int) is [operator] [value] used below.

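from pydantic import validator
from templates.utils import TemplateSchema


class CustomNumericTemplateSchema(TemplateSchema):
    '''
    Field [field](int) is [operator] [value]
    '''
    # Name of the data field to compare (rendered as a dropdown of int fields).
    field: str
    # Comparison operator; defaults to equality.
    operator: str = "="
    # Value to compare the field against.
    value: float

    @validator("operator")
    def check_operator(cls, op: str) -> str:
        # Reject anything other than the three supported operators.
        if op not in [">", "<", "="]:
            raise ValueError(f"Invalid operator {op}. Should be one of [>, <, =]")
        return op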

The validator defined here checks that the operator input is valid. The docstring in the example above renders in Studio as follows:

custom_lf_1.webp
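
To make the docstring convention more concrete, the sketch below shows one way the bracketed input names and their optional type hints could be pulled out of such a string. This is purely illustrative: parse_placeholders is a hypothetical helper, not part of the SDK, and it is not a description of how Studio actually parses the docstring.

```python
import re
from typing import List, Optional, Tuple

# Matches "[name]" optionally followed by "(type)", e.g. "[field](int)".
PLACEHOLDER_RE = re.compile(r"\[(\w+)\](?:\((\w+)\))?")


def parse_placeholders(template: str) -> List[Tuple[str, Optional[str]]]:
    """Return (input_name, type_hint) pairs in the order they appear.

    Hypothetical helper for illustration only.
    """
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER_RE.finditer(template)]


print(parse_placeholders("Field [field](int) is [operator] [value]"))
```

Under these assumptions, the order of the returned pairs matches the order in which the inputs would appear in the rendered form, and only field carries a type hint.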

To define a custom LF template, we write a class via the SDK that inherits from the Template class. We need to define several fields and methods as part of this class: template_type, abbreviation, description, menu_type, docs_link, template_schema, __init__, and check.

template_type is a unique identifier for the LF builder and should always start with custom.

abbreviation is used to easily distinguish between LF builders in Studio.

description specifies a description for the LF builder. It is shown in the UI help tooltip.

menu_type is a dictionary with 3 required fields: name, value, and category. value should be a unique template_type. category is used to group LF builders in Studio and to filter by application type.

docs_link is an optional user guide link for the LF builder. It is linked in the help tooltip.

template_schema is the template schema class name that should be used with this LF builder.

__init__ is a required method that receives the TemplateConfig input, which is a dict of all the inputs defined in the TemplateSchema. Store these input values here so that they can be used in the check method.

check is a required method that should return List[Tuple[int, int]] for sequence tagging labeling (span votes with span_start and span_end, for example [(1, 5), (6, 10)]), or a boolean for all other kinds of labeling (true if the LF should vote for the label, false otherwise). See the example below:

import pandas as pd
from templates.template import Template, TemplateConfig


class CustomNumericTemplate(Template):
    """Custom LF Template based on numerical comparisons"""

    template_type = "custom_numeric"
    abbreviation = "NUM"
    description = "If [field] has a numeric value that is [>, =, etc.] [value], then label."
    menu_type = {
        "name": "Custom Numeric Builder",
        "value": template_type,
        "category": ["custom"],
    }
    docs_link = (
        "/user_guide/reference/label/lf_builders.html#"
        + "numeric-builder-numerical-lfs"
    )
    template_schema = "CustomNumericTemplateSchema"

    def __init__(self, template_config: TemplateConfig) -> None:
        """LF Template based on numeric comparisons.

        Heuristic:
        "If x[field] [OPERATOR] [value], return True"
        """
        # Keep the validated inputs so that check() can use them.
        self._field = template_config["field"]
        self.op_str = template_config["operator"]
        self._value = template_config["value"]

    def check(self, x: pd.Series) -> bool:
        # Compare the row's field value against the configured value.
        field_value = x[self._field]
        if self.op_str == "=":
            return field_value == self._value
        elif self.op_str == ">":
            return field_value > self._value
        elif self.op_str == "<":
            return field_value < self._value
        return False
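
The example above returns a boolean. For sequence tagging, check returns span offsets instead. The following is a minimal sketch of what such span logic might look like, written as a plain function so that it runs without the SDK. The function name find_number_spans and the digit-matching heuristic are hypothetical, and treating span_end as exclusive (as Python's re offsets do) is an assumption that should be checked against the SDK's expectations.

```python
import re
from typing import List, Tuple


def find_number_spans(text: str) -> List[Tuple[int, int]]:
    """Return (span_start, span_end) pairs for every run of digits in text.

    Hypothetical stand-in for the body of a sequence tagging check method.
    span_end is exclusive here, following re.Match.end().
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\d+", text)]


print(find_number_spans("call 555 or 1234"))  # [(5, 8), (12, 16)]
```

Inside a real template, the same logic would read the text from the configured field of the row (for example x[self._field]) and return the list of spans from check.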

Registering the class

After defining the LF template and LF template schema classes, we are ready to register them for use in Snorkel Flow.

We do so by calling the snorkelflow.client.add_lf_template_class SDK method, passing in our class definitions CustomNumericTemplate and CustomNumericTemplateSchema.

import snorkelflow.client as sf

sf.add_lf_template_class(
    CustomNumericTemplate, CustomNumericTemplateSchema,
)
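
Because a registered builder runs inside Studio, it can be useful to sanity-check the comparison logic locally on a few sample rows first. The sketch below reproduces the numeric comparison with a lookup table built on Python's standard operator module, using a plain dict as a stand-in for the data row. The names OPERATORS and numeric_check are hypothetical and are not part of the SDK.

```python
import operator
from typing import Any, Callable, Dict, Mapping

# Map the operator strings accepted by the schema to comparison functions.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def numeric_check(row: Mapping[str, Any], field: str, op_str: str, value: float) -> bool:
    """Hypothetical stand-in for the numeric comparison done in check."""
    compare = OPERATORS.get(op_str)
    if compare is None:
        # Unknown operator: do not vote, like the final `return False` in check.
        return False
    return bool(compare(row[field], value))


row = {"num_images": 6}
print(numeric_check(row, "num_images", ">", 4))  # True
print(numeric_check(row, "num_images", "<", 4))  # False
```

A lookup table keeps the set of supported operators in one place, which makes it easier to keep the check logic and the schema validator in agreement if more operators are added later.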