Version: 0.96

snorkelflow.studio

Functionality for writing custom labeling functions.

Basic LF decorator

See Code LFs.

Advanced LF decorators

Labeling functions that use common NLP libraries

To write an LF that needs only a SpaCy or Stanza model, use the @spacy_labeling_function or @stanza_labeling_function decorator. The decorated function must take an additional argument, nlp, which is the loaded model.

Tip: We recommend SpaCy for most cases, since Stanza may be slow to apply.

from snorkelflow.studio import spacy_labeling_function

@spacy_labeling_function(name="spacy_companies")
def spacy_companies(x, nlp):
    # nlp is the SpaCy model supplied by the decorator.
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"

# sf and node are assumed to be defined earlier in your session.
sf.add_code_lf(node, spacy_companies, label="LABEL")

from snorkelflow.studio import stanza_labeling_function

@stanza_labeling_function(name="stanza_companies")
def stanza_companies(x, nlp):
    # nlp is the Stanza model supplied by the decorator.
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"

# sf and node are assumed to be defined earlier in your session.
sf.add_code_lf(node, stanza_companies, label="LABEL")
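
Because the model arrives as an ordinary argument, the matching logic can be checked locally without loading a model. The sketch below is only an illustration: companies_logic is an undecorated copy of the function body, and fake_nlp is a hypothetical stub (not part of snorkelflow, SpaCy, or Stanza) that treats every capitalized token as an entity.

```python
from types import SimpleNamespace


def companies_logic(x, nlp):
    # Same body as the LFs above, kept undecorated so it can be called directly.
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"


def fake_nlp(text):
    # Hypothetical stub: every capitalized token becomes an "entity".
    ents = [
        SimpleNamespace(text=token.strip(".,"))
        for token in text.split()
        if token[:1].isupper()
    ]
    return SimpleNamespace(ents=ents)


hit = companies_logic(SimpleNamespace(text="Apple released a new phone."), fake_nlp)
miss = companies_logic(SimpleNamespace(text="The weather is nice."), fake_nlp)
```

A real model will find different entities than this stub, so a check like this exercises only the control flow of the function, not the quality of the resulting labels.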

Passing external resources using a resources_fn

When a resource is very large, such as a SpaCy model, passing it directly via resources can make the serialized labeling function very large. In those cases, pass the resource through a function instead, using the @resources_fn_labeling_function decorator. Write a function that creates the resources and returns a dictionary mapping resource names to resources, then pass that function to the decorator as resources_fn. The decorated labeling function receives each resource as an additional argument named after its dictionary key. The resources function runs only once, and its results are made available to every invocation of the labeling function.

from snorkelflow.studio import resources_fn_labeling_function

# Load a spacy model and return it, together with the spacy module itself,
# as a dictionary.
def get_nlp():
    import spacy
    return {"nlp": spacy.load("en_core_web_sm"), "spacy": spacy}

# Decorate a function that takes nlp and spacy (keys from the get_nlp dict above)
# as additional arguments.
@resources_fn_labeling_function(name="my_other_lf", resources_fn=get_nlp)
def starts_with_noun(x, nlp, spacy):
    doc = nlp(x.text)
    if len(doc) == 0:
        # Empty text has no first token to inspect.
        return "UNKNOWN"
    first_word = doc[0]
    return "LABEL_A" if first_word.pos == spacy.parts_of_speech.NOUN else "UNKNOWN"

# Function to download and return an nltk tokenizer and part-of-speech tagger.
def get_nltk_tokenizer_parser():
    import nltk
    nltk.download("punkt", download_dir="/tmp/nltk")
    nltk.download("averaged_perceptron_tagger", download_dir="/tmp/nltk")
    # Make sure nltk searches the download directory when loading the data.
    nltk.data.path.append("/tmp/nltk")
    return {"tokenize": nltk.word_tokenize, "parse": nltk.pos_tag}

@resources_fn_labeling_function(
    name="check_vbp", resources_fn=get_nltk_tokenizer_parser,
)
def check_vbp(x, tokenize, parse):
    # parse returns (token, tag) pairs; only the tag is needed here.
    pos_pairs = parse(tokenize(x.text))
    for _, pos in pos_pairs:
        if pos == "VBP":
            return "LABEL"
    return "UNKNOWN"
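
The run-once behavior described in this section can be pictured with a small stand-alone sketch. It is not the snorkelflow implementation: with_resources_once, get_keywords, and mentions_refund are hypothetical names invented for illustration, and the sketch only assumes that the resources function returns a dictionary whose keys match the extra argument names.

```python
from functools import wraps
from types import SimpleNamespace


def with_resources_once(resources_fn):
    # Hypothetical stand-in for the idea behind resources_fn; not the real decorator.
    def decorator(fn):
        resources = None  # populated on the first call only

        @wraps(fn)
        def wrapper(x):
            nonlocal resources
            if resources is None:
                resources = resources_fn()
            # Each dictionary key is passed as a keyword argument of the same name.
            return fn(x, **resources)

        return wrapper

    return decorator


calls = {"n": 0}  # counts how often the resources function runs


def get_keywords():
    calls["n"] += 1
    return {"keywords": {"refund", "chargeback"}}


@with_resources_once(get_keywords)
def mentions_refund(x, keywords):
    tokens = x.text.lower().split()
    return "LABEL" if any(token in keywords for token in tokens) else "UNKNOWN"


rows = [
    SimpleNamespace(text="Please issue a refund"),
    SimpleNamespace(text="Hello there"),
]
results = [mentions_refund(row) for row in rows]
```

After both rows are processed, calls["n"] is 1: the resources are built on the first invocation and reused afterwards, which is why expensive setup belongs in the resources function rather than in the labeling function body.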

Classes

resources_fn_labeling_function([name, ...])

Subclass of the @labeling_function decorator that allows passing resources via a function.

spacy_labeling_function([name, resources])

Convenience decorator for spacy based labeling functions.

stanza_labeling_function([name, resources])

Convenience decorator for stanza based labeling functions.