Operators: Transform and process your data
In Snorkel Flow, operators perform transformations over DataFrames. Consider an application where you want to classify emails as SPAM or HAM (not spam). Before you begin your modeling work, you can clean up some of the data to make it easier to work with.
For example, emails often contain excess whitespace and odd formatting, so you might want to trim whitespace in the main body of each email. Once your email data is in a DataFrame, where the body is represented in a column called text, you can write labeling functions and train a model. Finally, you can add a post-processor to filter out all SPAM emails and leave only non-spam ones.
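The two data-cleaning steps in this example can be sketched in plain pandas. This is only an illustration of the transformations, not Snorkel Flow's operator API: the function names, the sample rows, and the label column (standing in for the model's predictions) are assumptions; only the text column name comes from the example above.

```python
import pandas as pd

# Hypothetical sample data. The "text" column matches the column named above;
# the "label" column and its values are assumptions standing in for model output.
emails = pd.DataFrame(
    {
        "text": ["  Win a FREE prize!!!  \n", "\tHi team,   meeting at 3pm.  "],
        "label": ["SPAM", "HAM"],
    }
)


def trim_whitespace(df: pd.DataFrame) -> pd.DataFrame:
    """Pre-processing: strip both ends and collapse internal whitespace runs."""
    out = df.copy()
    out["text"] = out["text"].str.strip().str.replace(r"\s+", " ", regex=True)
    return out


def drop_spam(df: pd.DataFrame) -> pd.DataFrame:
    """Post-processing: keep only the rows that are not labeled SPAM."""
    return df[df["label"] != "SPAM"].reset_index(drop=True)


cleaned = trim_whitespace(emails)
ham_only = drop_spam(cleaned)
print(ham_only)
```

Both functions take a DataFrame and return a new one without mutating the input, which is the same shape of contract the rest of this page describes for operators.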
Each of these steps can be performed with an operator. Internally, an operator takes a Dask DataFrame as input, performs a transformation, and outputs a Dask DataFrame.
Snorkel Flow provides a set of built-in operators for commonly used transformations and also lets you write your own user-defined operators. Using Application Studio, you can commit operators to different nodes in the application graph to compose end-to-end applications.