Version: 0.95

Custom labeling function builders

In Snorkel Flow, labeling functions (LFs) play a key role in programmatically labeling data. When predefined LF builders do not meet your specific needs, you can create custom labeling function builders using the Python SDK within Jupyter Notebook. This guide explains how to define and register custom LF builders that can be used directly in Snorkel Flow Studio, allowing you to extend the labeling logic for your application.

The process involves defining custom LF templates and schemas, which help structure the logic and data for labeling, and then registering them for use in your labeling workflows.

Defining the classes

Creating a custom LF builder requires defining two key components:

  1. A custom LF template class.
  2. A custom LF template schema class.

Defining a custom LF template schema

The schema defines the input fields, their data types, optional default values, and optional validators. This is similar to a pydantic BaseModel class.

For example, if you want to define an LF that labels data based on the following numeric field comparison:
“If the field num_images in the dataset is greater than 4, then label it as spam.”

This LF needs the following inputs:

  • Data field: num_images
  • Comparison operator: >
  • Value to compare: 4

In addition, a docstring in the custom LF template schema class can optionally control how the inputs render in Develop (Studio). For example, the docstring sets the order in which the inputs appear. Enclose each input name in square brackets. If an input name contains the word field, the UI renders a dropdown list of all available data fields. To restrict that dropdown to integer fields, add (int) after the name; use (string) for string fields. See the docstring Field [field](int) is [operator] [value] in the example below.

from templates.utils import TemplateSchema
from pydantic import validator

class CustomNumericTemplateSchema(TemplateSchema):
    '''
    Field [field](int) is [operator] [value]
    '''
    field: str
    operator: str = "="
    value: float

    @validator("operator")
    def check_operator(cls, op: str) -> str:
        if op not in [">", "<", "="]:
            raise ValueError(f"Invalid operator {op}. Should be one of the [ >, <, =]")
        return op

In this example:

  • The @validator ensures that only valid operators are used.
  • The docstring in the above example defines how the UI will render the fields in Snorkel Flow Studio:

[Image: custom_lf_1.webp]
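To make the docstring convention concrete, the sketch below extracts the bracketed inputs and their optional type hints from the example docstring. It is only an illustration of the rules described above: parse_ui_docstring and the PLACEHOLDER pattern are hypothetical names, not part of the Snorkel Flow SDK, and the way Studio actually parses the docstring may differ.

```python
import re
from typing import Dict, List, Optional

# Hypothetical helper, NOT part of the Snorkel Flow SDK. It only illustrates
# the "[name](type)" placeholder convention described in this guide.
PLACEHOLDER = re.compile(r"\[(?P<name>\w+)\](?:\((?P<type>\w+)\))?")


def parse_ui_docstring(doc: str) -> List[Dict[str, Optional[object]]]:
    """Return the bracketed inputs in order, each with its optional type hint."""
    inputs = []
    for match in PLACEHOLDER.finditer(doc):
        name = match.group("name")
        inputs.append({
            "name": name,
            # None when the placeholder has no "(type)" suffix.
            "type": match.group("type"),
            # Assumed reading of the rule: names containing "field"
            # render as a dropdown of data fields.
            "is_field_dropdown": "field" in name,
        })
    return inputs


inputs = parse_ui_docstring("Field [field](int) is [operator] [value]")
for item in inputs:
    print(item)
```

Here the literal word Field at the start of the docstring is plain text, while [field](int) is an input restricted to integer data fields, and [operator] and [value] are untyped inputs rendered in that order.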

Defining a custom LF template

Next, define the custom LF template class, which contains the logic for the labeling function. This class must inherit from the Template class in the Snorkel Flow SDK, and must implement several required fields and methods:

  • template_type: A unique identifier for the LF builder. It should start with custom.
  • abbreviation: A short identifier to distinguish LF builders.
  • description: A description visible in the UI tooltip to explain the LF builder’s function.
  • menu_type: A dictionary to categorize the LF builder in the UI.
  • template_schema: The schema class name that defines the input fields for this LF builder.
  • __init__: Initializes the LF template with the schema inputs.
  • check: Contains the labeling logic and returns a boolean value or a list of spans.
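As a local sanity check before registering a template, the requirements listed above can be turned into a small checker. This is a sketch under stated assumptions: missing_template_members is a hypothetical helper, not an SDK function, and it only tests for the presence of the listed members and the custom prefix, not for their correctness.

```python
from typing import List

# Names taken from the list above. The helper itself is hypothetical and is
# not part of the Snorkel Flow SDK.
REQUIRED_ATTRIBUTES = (
    "template_type",
    "abbreviation",
    "description",
    "menu_type",
    "template_schema",
)
# __init__ is omitted here because every Python class already has one.
REQUIRED_METHODS = ("check",)


def missing_template_members(cls: type) -> List[str]:
    """List the required members that cls neither defines nor inherits."""
    problems = [name for name in REQUIRED_ATTRIBUTES if not hasattr(cls, name)]
    problems += [
        name for name in REQUIRED_METHODS
        if not callable(getattr(cls, name, None))
    ]
    template_type = getattr(cls, "template_type", None)
    if isinstance(template_type, str) and not template_type.startswith("custom"):
        problems.append("template_type should start with 'custom'")
    return problems


class IncompleteTemplate:
    """Deliberately incomplete class used to exercise the checker."""

    template_type = "numeric"  # lacks the "custom" prefix
    abbreviation = "NUM"

    def check(self, x):
        return False


problems = missing_template_members(IncompleteTemplate)
print(problems)
```

An empty list means every listed member is present; it does not guarantee that registration will succeed.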

Here’s an example of a custom numeric LF builder based on the schema defined earlier:

from templates.template import Template, TemplateConfig
import pandas as pd

class CustomNumericTemplate(Template):
    """Custom LF Template based on numerical comparisons"""

    template_type = "custom_numeric"
    abbreviation = "NUM"
    description = "If [field] has a numeric value that is [>, =, etc.] [value], then label."
    menu_type = {
        "name": "Custom Numeric Builder",
        "value": template_type,
        "category": ["custom"],
    }
    docs_link = (
        "/user_guide/reference/label/lf_builders.html#"
        + "numeric-builder-numerical-lfs"
    )
    template_schema = "CustomNumericTemplateSchema"

    def __init__(self, template_config: TemplateConfig) -> None:
        """LF Template based on numeric comparisons.

        Heuristic:
        "If x[field] [OPERATOR] [value], return True"
        """
        self._field = template_config["field"]
        self.op_str = template_config["operator"]
        self._value = template_config["value"]

    def check(self, x: pd.Series) -> bool:
        field_value = x[self._field]
        if self.op_str == "=":
            return field_value == self._value
        elif self.op_str == ">":
            return field_value > self._value
        elif self.op_str == "<":
            return field_value < self._value
        return False

In this example:

  • The __init__ method pulls the inputs (field, operator, and value) from the schema.
  • The check method runs the actual comparison logic and returns True or False based on the field value and operator.
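The comparison logic can be tried outside Snorkel Flow with a stand-in class. The sketch below is not the SDK template: NumericCheckSketch is a hypothetical name, it does not inherit from Template, and it uses plain dicts in place of pd.Series rows (both support x[field] lookups). It also swaps the if/elif chain for a lookup table built on the standard operator module.

```python
import operator
from typing import Any, Callable, Dict, Mapping

# Map the schema's operator strings onto stdlib comparison functions.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


class NumericCheckSketch:
    """Stand-in for CustomNumericTemplate; not tied to the Snorkel Flow SDK."""

    def __init__(self, template_config: Mapping[str, Any]) -> None:
        self._field = template_config["field"]
        # Raises KeyError for an unsupported operator string.
        self._compare = COMPARATORS[template_config["operator"]]
        self._value = template_config["value"]

    def check(self, x: Mapping[str, Any]) -> bool:
        # A dict stands in for the pd.Series row here.
        return bool(self._compare(x[self._field], self._value))


# "If num_images is greater than 4, then label it."
lf = NumericCheckSketch({"field": "num_images", "operator": ">", "value": 4})
rows = [{"num_images": 7}, {"num_images": 4}, {"num_images": 1}]
results = [lf.check(row) for row in rows]
print(results)  # [True, False, False]
```

One behavioral difference from the template above: an unknown operator fails at construction time with a KeyError, rather than falling through to return False in check. In the real builder, the schema validator is what rejects invalid operators.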

Registering the class

After defining the LF template and LF template schema classes, you can register them for use in Snorkel Flow.

To do so, call the snorkelflow.client.add_lf_template_class SDK method and pass in the class definitions CustomNumericTemplate and CustomNumericTemplateSchema.

import snorkelflow.client as sf

sf.add_lf_template_class(
    CustomNumericTemplate, CustomNumericTemplateSchema,
)

By following this guide, you can create custom LF builders to fit your specific application needs. This process provides flexibility for handling complex labeling scenarios programmatically, giving you more control over your data labeling in Snorkel Flow.