snorkelflow.operators
Functionality for writing custom operators (preprocessors and postprocessors) in Python.
In addition to the built-in operators, you can develop a custom operator like the one below and use it as part of an application. In a nutshell, a custom operator is a Python function that accepts and returns dataframes, decorated with one of the Snorkel Flow operator decorators.
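import pandas as pd
from snorkelflow.operators import pandas_featurizer

# Adds a constant column; the value of "num" is supplied through `resources`.
@pandas_featurizer(
    name="add_num", input_schema={}, output_schema={"added_num": int},
    resources={"num": 7})
def add_num(df: pd.DataFrame, num: int) -> pd.DataFrame:
    df["added_num"] = num
    return df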
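Before decorating a function, it can help to check its body locally on a plain pandas dataframe. The following is a minimal sketch that assumes only pandas is installed. The name `add_num_plain` and the sample data are hypothetical, and `num` is passed by hand here rather than through `resources`.

```python
import pandas as pd


def add_num_plain(df: pd.DataFrame, num: int) -> pd.DataFrame:
    """Undecorated version of the function body, for local testing only."""
    out = df.copy()  # work on a copy so the caller's dataframe is not mutated
    out["added_num"] = num
    return out


# Hypothetical sample input with a single text column.
sample = pd.DataFrame({"text": ["a", "b", "c"]})
result = add_num_plain(sample, num=7)
print(result)
```

The new column holds integers, which is consistent with the `output_schema={"added_num": int}` declared in the decorator.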
For simplicity, parameters for custom operators are generally hard-coded when the operator is developed, e.g., "num": 7 in the resources argument of the add_num example above.
To create custom operators that take in parameters when they are used, you can instead create custom operator classes.
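The difference between a hard-coded parameter and one supplied at use time can be sketched in plain Python. This is only an illustration of the idea, not the Snorkel Flow operator class API, which this page does not show. The class `AddNum` and its `apply` method are hypothetical names.

```python
import pandas as pd


class AddNum:
    """Hypothetical stand-in for a parameterized operator (not a Snorkel Flow class)."""

    def __init__(self, num: int) -> None:
        # The parameter is chosen when the operator is used, not when it is written.
        self.num = num

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["added_num"] = self.num
        return out


df = pd.DataFrame({"text": ["a", "b"]})
by_seven = AddNum(num=7).apply(df)  # same logic, different parameter values
by_three = AddNum(num=3).apply(df)
```

The same operator logic is reused with two different values of `num`, which a function with a hard-coded value cannot offer without being edited.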
The diagram below should help you decide which approach to take and which decorator/class to use.
Special decorators
We recommend using special decorators rather than generic decorators whenever possible, as they are easier to use and less error-prone.
For example, use field_extractor rather than dask_extractor.
| Decorator for generating candidate spans for extraction tasks. |
| Decorator for splitting PDFs into groups of pages. |
| Decorator for filtering rows of a dataframe. |
| Decorator for converting span text to a standard format. |
| Decorator for aggregating span-level model predictions to document-level predictions for extraction tasks. |
| Decorator for aggregating lower-level model predictions to higher-level predictions. |
Generic decorators
If none of the special decorators above suits your needs, you can use one of the generic decorators below.
In Snorkel Flow, operators, whether built-in or custom, are applied to a Dask dataframe, which is composed of many smaller Pandas dataframes (see here for more details).
We recommend using pandas_featurizer or pandas_operator, which let the user-defined function handle one Pandas dataframe at a time for simplicity, unless you have to work with the whole Dask dataframe.
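When per-partition processing is enough, and when it is not, can be illustrated without Dask. In the sketch below, a list of pandas dataframes stands in for the partitions of a Dask dataframe. This is a simplification for illustration, and the function names and sample data are hypothetical.

```python
import pandas as pd


def add_length(df: pd.DataFrame) -> pd.DataFrame:
    """Row-wise feature: safe to compute one partition at a time."""
    out = df.copy()
    out["length"] = out["text"].str.len()
    return out


def center_length(df: pd.DataFrame) -> pd.DataFrame:
    """Uses a mean over all rows it sees, so the result depends on partitioning."""
    out = df.copy()
    out["centered"] = out["length"] - out["length"].mean()
    return out


full = pd.DataFrame({"text": ["a", "bb", "ccc", "dddd"]})
partitions = [full.iloc[:2], full.iloc[2:]]  # stand-in for Dask partitions

# Row-wise logic: applying it per partition matches applying it to the whole.
per_part = pd.concat([add_length(p) for p in partitions])
whole = add_length(full)
same_rowwise = per_part.equals(whole)

# Global statistic: per-partition results differ from the whole-dataframe result.
per_part_centered = pd.concat([center_length(add_length(p)) for p in partitions])
whole_centered = center_length(whole)
same_centered = per_part_centered.equals(whole_centered)

print(same_rowwise, same_centered)
```

Logic like `add_length` fits the one-Pandas-dataframe-at-a-time style, while logic like `center_length` is the kind of case where working with the whole dataframe is needed.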
| Decorator for adding columns to a dataframe. |
| Decorator that wraps a function mapping a |
| Decorator that wraps a function mapping a |
| Decorator to define Dask Combiner from a function. |
| Decorator to define Dask Extractor from a function. |
Classes
| Operator class that adds one or more columns (features) to a DataFrame. |
| Operator class that performs some transformation on dask dataframes. |