snorkelflow.studio
Functionality for writing custom labeling functions.
Basic LF decorator
See Code LFs.
Advanced LF decorators
Labeling functions that use common NLP Libraries
To write an LF that only needs a spaCy or Stanza model, use the @spacy_labeling_function or @stanza_labeling_function decorator.
The decorated function must take nlp as an additional argument; nlp is the loaded model.
from snorkelflow.studio import spacy_labeling_function

@spacy_labeling_function(name="spacy_companies")
def spacy_companies(x, nlp):
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    # Run the spaCy pipeline and vote "LABEL" on the first matching entity.
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"

# sf and node are assumed to be defined earlier in the session.
sf.add_code_lf(node, spacy_companies, label="LABEL")
from snorkelflow.studio import stanza_labeling_function

@stanza_labeling_function(name="stanza_companies")
def stanza_companies(x, nlp):
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    # Run the Stanza pipeline and vote "LABEL" on the first matching entity.
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"

# sf and node are assumed to be defined earlier in the session.
sf.add_code_lf(node, stanza_companies, label="LABEL")
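Because the LF body only calls nlp(x.text) and reads doc.ents, its logic can be checked locally before it is registered. The following is a minimal sketch, not part of the snorkelflow API: companies_lf_body is an undecorated copy of the body above, and fake_nlp is a hypothetical stand-in for a real spaCy or Stanza pipeline that treats every capitalized token as an entity.

```python
from types import SimpleNamespace

# Undecorated copy of the LF body, so it can be called directly.
def companies_lf_body(x, nlp):
    companies = {"Microsoft", "Yahoo", "Apple", "Google"}
    doc = nlp(x.text)
    for ent in doc.ents:
        if ent.text in companies:
            return "LABEL"
    return "UNKNOWN"

# Hypothetical stand-in for a model: every capitalized,
# whitespace-separated token becomes an "entity" with a .text attribute.
def fake_nlp(text):
    ents = [
        SimpleNamespace(text=tok.strip(".,"))
        for tok in text.split()
        if tok[:1].isupper()
    ]
    return SimpleNamespace(ents=ents)

hit = companies_lf_body(SimpleNamespace(text="She joined Google in 2019."), fake_nlp)
miss = companies_lf_body(SimpleNamespace(text="no entities here"), fake_nlp)
print(hit, miss)
```

A stub like this exercises only the control flow; entity quality still depends on the real model.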
Passing external resources using a resources_fn
When a resource is very large, such as a spaCy model, passing it via resources can make
the serialized labeling function very large as well. In those cases,
you can pass the resource via a function using the @resources_fn_labeling_function decorator.
Write a function that creates the resources and returns a dictionary mapping resource
names to the resources.
Then use the decorator with that function, and the labeling function being decorated
can take the resources (denoted by the resource names) as additional arguments.
The resources function is only run once, and the results are made available
to all invocations of the labeling function.
from snorkelflow.studio import resources_fn_labeling_function

# Load a spaCy model and return it, together with the spacy module itself,
# as a dictionary.
def get_nlp():
    import spacy
    return {"nlp": spacy.load("en_core_web_sm"), "spacy": spacy}

# Decorate a function that takes nlp and spacy (the keys of the get_nlp dict above)
# as additional arguments.
@resources_fn_labeling_function(name="my_other_lf", resources_fn=get_nlp)
def starts_with_noun(x, nlp, spacy):
    doc = nlp(x.text)
    first_word = doc[0]
    # Vote "LABEL_A" only when the first token is tagged as a noun.
    return "LABEL_A" if first_word.pos == spacy.parts_of_speech.NOUN else "UNKNOWN"

# Download the NLTK data and return a tokenizer and a POS tagger.
def get_nltk_tokenizer_parser():
    import nltk
    nltk.download("punkt", download_dir="/tmp/nltk")
    nltk.download("averaged_perceptron_tagger", download_dir="/tmp/nltk")
    # Make sure NLTK searches the custom download directory.
    nltk.data.path.append("/tmp/nltk")
    return {"tokenize": nltk.word_tokenize, "parse": nltk.pos_tag}

@resources_fn_labeling_function(
    name="check_vbp", resources_fn=get_nltk_tokenizer_parser,
)
def check_vbp(x, tokenize, parse):
    # pos_tag returns (token, tag) pairs; "VBP" marks a non-3rd-person present verb.
    pos_pairs = parse(tokenize(x.text))
    for token, pos in pos_pairs:
        if pos == "VBP":
            return "LABEL"
    return "UNKNOWN"
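The run-once behavior described above can be pictured with a plain-Python sketch. This is an illustration under assumptions, not how snorkelflow implements the decorator: with_resources, get_keywords, and mentions_refund are hypothetical names, and the sketch only shows the idea of building the resources dictionary a single time and injecting its entries by name into every call.

```python
import functools
import inspect
from types import SimpleNamespace

def with_resources(resources_fn):
    """Hypothetical decorator: build resources once, inject them by name."""

    @functools.lru_cache(maxsize=None)
    def cached_resources():
        # Runs resources_fn on first use only; later calls reuse the dict.
        return resources_fn()

    def decorator(fn):
        # Every parameter after the first (the data point) names a resource.
        wanted = list(inspect.signature(fn).parameters)[1:]

        @functools.wraps(fn)
        def wrapper(x):
            resources = cached_resources()
            missing = [name for name in wanted if name not in resources]
            if missing:
                raise KeyError(f"resources_fn did not provide: {missing}")
            return fn(x, **{name: resources[name] for name in wanted})

        return wrapper

    return decorator

calls = {"count": 0}  # counts how often the resources function runs

def get_keywords():
    calls["count"] += 1
    return {"keywords": {"refund", "chargeback"}}

@with_resources(get_keywords)
def mentions_refund(x, keywords):
    words = x.text.lower().split()
    return "LABEL" if any(w in keywords for w in words) else "UNKNOWN"

rows = [SimpleNamespace(text="Please refund me"), SimpleNamespace(text="Hello there")]
labels = [mentions_refund(row) for row in rows]
print(labels, calls["count"])
```

The sketch also makes the naming contract visible: each extra parameter of the labeling function has to match a key in the dictionary returned by the resources function.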
Classes
| | Subclass of |
| spacy_labeling_function | Convenience decorator for spaCy-based labeling functions. |
| stanza_labeling_function | Convenience decorator for Stanza-based labeling functions. |