LLM automation
Snorkel Flow integrates large language model (LLM) automation at several points in the platform to support your machine learning development cycle.
Together with warm start, prompt builder, and fine-tuning, LLM automation puts foundation models (FMs) to work for users. See Foundation model suite to learn more about our approach to foundation models.
Prerequisite
Before you can use this feature, your OpenAI API key must be stored in Snorkel's secure database. As an admin user, set the key as a secret through the SDK as follows:
sf.set_secret("OPENAI_API_KEY", "your_openai_api_key_here")
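To avoid pasting the key into a notebook or script, one option is to read it from an environment variable first. The following is a minimal sketch, not part of the Snorkel Flow SDK: the helper name `read_api_key` and the choice of the `OPENAI_API_KEY` environment variable are assumptions made for illustration.

```python
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`; raise if it is missing or blank.

    Hypothetical helper for illustration, not a Snorkel Flow function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# With the SDK available as `sf` (as in the call above), an admin user
# could then pass the value along without hardcoding it:
#     sf.set_secret("OPENAI_API_KEY", read_api_key())
```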
Use cases
- **Suggesting regex patterns from natural language prompts**
Provide a natural language prompt, and this method suggests a regular expression that matches your request. This makes brainstorming regex patterns faster.
Example:
sf.lfs.create_regex_from_prompt("Capturing all capitalized words")
>> '\\b[A-Z]\\w+'
- **Suggesting keywords for supported applications inside Snorkel Flow**
Provide a natural language prompt, and this method suggests keyword labeling functions that represent your desired data labels. This makes keyword brainstorming for supported applications in Snorkel Flow faster.
Example:
sf.add_prompt_keyword_lf(node, text_field="summary", label="Short Film")
>> # Snorkel Labeling Function with keywords: "Short Film", "Short Movie", "Short Video", "Short Clip", "Short Animation", etc.
- **Suggesting code labeling functions for supported applications inside Snorkel Flow**
Provide a natural language prompt, and this method suggests a code labeling function that implements your description. This makes writing custom code labeling functions for supported applications in Snorkel Flow faster.
Example:
sf.create_prompt_code_lf(
    nodes,
    prompt='check if famous early 1900s short film actors are in x.cast',
    label={'Short Film': "PRESENT"}
)
>> # Output:
@labeling_function(name="""autogenerate-{'Short Film': 'PRESENT'}-check if famous""")
def custom_lf(x):
    short_film_actors = ['Charlie Chaplin', 'Buster Keaton', 'Harold Lloyd']
    if any(actor in x.cast for actor in short_film_actors):
        return {'Short Film': 'PRESENT'}
    else:
        return {'_default': 'ABSTAIN'}
sf.add_code_lf(node_uid, custom_lf, is_multipolar=True)
- **Suggesting code operators inside Snorkel Flow**
Provide a natural language prompt, and this method suggests Snorkel-compatible code that performs your request. This speeds up the creation of custom Snorkel operators.
Example:
sf.create_custom_operator_from_prompt(
    prompt='use spacy to extract all nouns from a column called text, return a list of tuple of start and end char',
    operator_type='pandas_operator'
)
>> # Output:
@pandas_operator(name="extract_nouns", input_schema={"text": str}, output_schema={"nouns": list})
def extract_nouns(df: pd.DataFrame) -> pd.DataFrame:
    import spacy
    nlp = spacy.load("en_core_web_sm")
    df["nouns"] = [
        [(token.idx, token.idx + len(token)) for token in nlp(text) if token.pos_ == "NOUN"]
        for text in df["text"]
    ]
    return df
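Because each of these methods returns a suggestion generated by an LLM, it can help to try a suggestion on a few hand-picked inputs before relying on it. The sketch below does this for the regex and code labeling function examples above, using only the Python standard library. The sample sentence, the sample records, and the name `custom_lf_logic` are made up for illustration. The labeling function body is copied without its `@labeling_function` decorator so that it runs outside Snorkel Flow.

```python
import re
from types import SimpleNamespace

# Pattern suggested in the regex example above, written as a raw string.
suggested_pattern = r"\b[A-Z]\w+"

# Made-up sample sentence for a quick check.
sample_text = "Snorkel Flow suggests Patterns for A user"
matches = re.findall(suggested_pattern, sample_text)
print(matches)  # ['Snorkel', 'Flow', 'Patterns']
# "A" is not matched: \w+ requires at least one more word character
# after the capital letter, so one-letter capitalized words are skipped.


def custom_lf_logic(x):
    """Body of the suggested labeling function, without the decorator."""
    short_film_actors = ['Charlie Chaplin', 'Buster Keaton', 'Harold Lloyd']
    if any(actor in x.cast for actor in short_film_actors):
        return {'Short Film': 'PRESENT'}
    return {'_default': 'ABSTAIN'}


# Made-up records exposing a `cast` attribute, as the function expects.
hit = SimpleNamespace(cast=["Buster Keaton", "Another Actor"])
miss = SimpleNamespace(cast=["Another Actor"])
print(custom_lf_logic(hit))   # {'Short Film': 'PRESENT'}
print(custom_lf_logic(miss))  # {'_default': 'ABSTAIN'}
```

A check like this shows, for example, that the suggested pattern skips one-letter capitalized words such as "A", which may or may not be what the prompt intended.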