Version: 25.1

snorkelflow.client.fm_suite.prompt_fm_over_dataset

snorkelflow.client.fm_suite.prompt_fm_over_dataset(prompt_template, dataset, x_uids, model_name, model_type=None, runs_per_prompt=1, sync=True, cache_name='default', system_prompt=None, **fm_hyperparameters)

Run a prompt over a dataset. Any field in the dataset can be referenced in the prompt template using curly braces, e.g. {field_name}.
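For instance, a template can reference several dataset fields at once (the field names below are hypothetical):

>>> prompt_template = (
...     "Subject: {email_subject}\n"
...     "Body: {email_body}\n"
...     "Summarize this email in one sentence."
... )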

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| prompt_template | str |  | The prompt template used to format input rows sent to the foundation model. |
| dataset | Union[str, int] |  | The name or UID of the dataset containing the data to prompt over. |
| x_uids | List[str] |  | The x_uids of the rows within the dataset to prompt over. |
| model_name | str |  | The name of the foundation model to use. |
| model_type | Optional[LLMType] | None | How the foundation model should be used; must be one of the LLMType values. |
| runs_per_prompt | int | 1 | The number of times to run inference over each x_uid. Each response can differ; all responses are cached. |
| sync | bool | True | Whether to wait for the job to complete before returning the result. |
| cache_name | str | 'default' | The cache name is used in the hash construction. To rerun a prompt and get a different result, change the cache name to one that has not been used before (see the example below). |
| system_prompt | Optional[str] | None | The system prompt to prepend to the prompt. |
| fm_hyperparameters | Any |  | Additional keyword arguments to pass to the foundation model, such as temperature and max_tokens. |

For example, with the related sf.prompt_fm:

>>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")
The meaning of life is to work...
>>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")  # hits the cache
The meaning of life is to work...
>>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o", cache_name="run_2")  # hits a different part of the cache
The meaning of life is to have fun!
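Similarly, a system prompt and common hyperparameters such as temperature and max_tokens can be passed directly as keyword arguments; a minimal sketch, with illustrative values and the same dataset and model as the example further down:

>>> sf.prompt_fm_over_dataset(
...     prompt_template="{email_subject}. What is this email about?",
...     dataset=1,
...     x_uids=["0", "1"],
...     model_name="openai/gpt-4",
...     system_prompt="You are a concise email classifier.",
...     temperature=0.2,
...     max_tokens=64,
... )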

Return type

Union[DataFrame, str]

Returns

  • df – DataFrame containing the predictions for the data points. There are two columns: the input prompt and the output of the foundation model. Returned when sync=True.

  • job_id – The job ID of the prompt inference job, which can be used to monitor progress with sf.poll_job_status(job_id). Returned when sync=False.
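A minimal sketch of asynchronous usage with sync=False, reusing the illustrative prompt, dataset, and model from the example below:

>>> job_id = sf.prompt_fm_over_dataset(
...     prompt_template="{email_subject}. What is this email about?",
...     dataset=1,
...     x_uids=["0", "1"],
...     model_name="openai/gpt-4",
...     sync=False,
... )
>>> sf.poll_job_status(job_id)  # monitor progress until the job completes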

Examples

>>> sf.prompt_fm_over_dataset(prompt_template="{email_subject}. What is this email about?", dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4")
  | email_subject                         | generated_text                                                       | perplexity
------------------------------------------------------------------------------------------------------------------------------
0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher    | 0.891
1 | Hey it's Bob, free on Sat?            | The email is from your friend Bob asking if you're free on Saturday  | 0.787