snorkelflow.client.fm_suite.prompt_fm_over_dataset
- snorkelflow.client.fm_suite.prompt_fm_over_dataset(prompt_template, dataset, x_uids, model_name, model_type=None, runs_per_prompt=1, sync=True, cache_name='default', system_prompt=None, **fm_hyperparameters)
Run a prompt over a dataset. Any field in the dataset can be referenced in the prompt by using curly braces, {field_name}.
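As an illustrative sketch, the `{field_name}` placeholder substitution behaves like Python's `str.format` applied to each row's fields (the actual substitution happens inside the platform; the `row` dict below is a made-up example):

```python
# Hypothetical row from the dataset; field names become template placeholders.
row = {"email_subject": "Fill in survey for $50 amazon voucher"}
prompt_template = "{email_subject}. What is this email about?"

# Each row's fields are substituted into the template before it is sent
# to the foundation model.
prompt = prompt_template.format(**row)
# prompt == "Fill in survey for $50 amazon voucher. What is this email about?"
```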
- Parameters:
  - prompt_template (str) – The prompt template used to format input rows sent to the foundation model.
  - dataset (Union[str, int]) – The name or UID of the dataset containing the data we want to prompt over.
  - x_uids (List[str]) – The x_uids of the rows within the dataset to prompt over.
  - model_name (str) – The name of the foundation model to use.
  - model_type (Optional[LLMType], default: None) – How the foundation model should be used; must be one of the LLMType values.
  - runs_per_prompt (int, default: 1) – The number of times to run inference over each x_uid; note that each response can differ. All responses are cached.
  - sync (bool, default: True) – Whether to wait for the job to complete before returning the result.
  - cache_name (str, default: 'default') – The cache name is used in the hash construction. To rerun a prompt and get a different result, change the cache name to one that has not been used before. For example:
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")  # hits the cache
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o", cache_name="run_2")  # hits a different part of the cache
    The meaning of life is to have fun!
  - system_prompt (Optional[str], default: None) – The system prompt to prepend to the prompt.
  - fm_hyperparameters (Any) – Additional keyword arguments to pass to the foundation model, such as temperature and max_tokens.
- Return type:
  Union[DataFrame, str]
- Returns:
  df – DataFrame containing the predictions for the data points. There are two columns: the input prompt and the output of the foundation model.
  job_id – The job ID of the prompt inference job (returned when sync=False), which can be used to monitor progress with sf.poll_job_status(job_id).
Examples
>>> sf.prompt_fm_over_dataset(prompt_template="{email_subject}. What is this email about?", dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4")
  | email_subject                         | generated_text                                                      | perplexity
-------------------------------------------------------------------------------------------------------------------------
0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher   | 0.891
1 | Hey it's Bob, free on Sat?            | The email is from your friend Bob asking if you're free on Saturday | 0.787
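With sync=False the call returns a job ID string instead of a DataFrame, which can be polled with sf.poll_job_status. A minimal polling helper might look like the sketch below; the status strings "running" and "pending" are assumptions for illustration, not documented values:

```python
import time

def wait_for_job(sf, job_id, interval_s=5.0):
    """Poll sf.poll_job_status(job_id) until the job leaves an in-progress state.

    The in-progress status names here ("running", "pending") are assumed
    for illustration; consult the SDK for the actual values.
    """
    while True:
        status = sf.poll_job_status(job_id)
        if status not in ("running", "pending"):
            return status
        time.sleep(interval_s)

# Usage (sketch, assuming an authenticated SnorkelFlow client `sf`):
# job_id = sf.prompt_fm_over_dataset(
#     prompt_template="{email_subject}. What is this email about?",
#     dataset=1,
#     x_uids=["0", "1"],
#     model_name="openai/gpt-4",
#     sync=False,
# )
# final_status = wait_for_job(sf, job_id)
```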