snorkelflow.client.fm_suite.prompt_fm_over_dataset
- snorkelflow.client.fm_suite.prompt_fm_over_dataset(prompt_template, dataset, x_uids, model_name, model_type=None, runs_per_prompt=1, sync=True, cache_name='default', system_prompt=None, **fm_hyperparameters)
Run a prompt over a dataset. Any field in the dataset can be referenced in the prompt by using curly braces, {field_name}.
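As an illustrative sketch, the `{field_name}` placeholder substitution behaves like Python's `str.format` applied to each row's fields (the actual substitution happens inside the platform; the `row` dict below is a made-up example):

```python
# Hypothetical row from the dataset; field names become template placeholders.
row = {"email_subject": "Fill in survey for $50 amazon voucher"}
prompt_template = "{email_subject}. What is this email about?"

# Each row's fields are substituted into the template before it is sent
# to the foundation model.
prompt = prompt_template.format(**row)
# prompt == "Fill in survey for $50 amazon voucher. What is this email about?"
```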
- Parameters:
  - prompt_template (str) – The prompt template used to format input rows sent to the foundation model.
  - dataset (Union[str, int]) – The name or UID of the dataset containing the data we want to prompt over.
  - x_uids (List[str]) – The x_uids of the rows within the dataset to prompt over.
  - model_name (str) – The name of the foundation model to use.
  - model_type (Optional[LLMType], default: None) – How the foundation model should be used; must be one of the LLMType values.
  - runs_per_prompt (int, default: 1) – The number of times to run inference over each x_uid; note that each response can differ. All responses are cached.
  - sync (bool, default: True) – Whether to wait for the job to complete before returning the result.
  - cache_name (str, default: 'default') – The cache name is used in the hash construction. To rerun a prompt and get a different result, change the cache name to one that has not been used before. For example:
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")  # hits the cache
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o", cache_name="run_2")  # hits a different part of the cache
    The meaning of life is to have fun!
  - system_prompt (Optional[str], default: None) – The system prompt to prepend to the prompt.
  - fm_hyperparameters (Any) – Additional keyword arguments to pass to the foundation model, such as temperature and max_tokens.
- Return type:
  Union[DataFrame, str]
- Returns:
  df – DataFrame containing the predictions for the data points. There are two columns: the input prompt and the output of the foundation model.
  job_id – The job ID of the prompt inference job (returned when sync=False), which can be used to monitor progress with sf.poll_job_status(job_id).
Examples
>>> sf.prompt_fm_over_dataset(prompt_template="{email_subject}. What is this email about?", dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4")
  | email_subject                         | generated_text                                                      | perplexity
-------------------------------------------------------------------------------------------------------------------------
0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher   | 0.891
1 | Hey it's Bob, free on Sat?            | The email is from your friend Bob asking if you're free on Saturday | 0.787
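With sync=False the call returns a job ID string instead of a DataFrame, which can be polled with sf.poll_job_status. A minimal polling helper might look like the sketch below; the status strings "running" and "pending" are assumptions for illustration, not documented values:

```python
import time

def wait_for_job(sf, job_id, interval_s=5.0):
    """Poll sf.poll_job_status(job_id) until the job leaves an in-progress state.

    The in-progress status names here ("running", "pending") are assumed
    for illustration; consult the SDK for the actual values.
    """
    while True:
        status = sf.poll_job_status(job_id)
        if status not in ("running", "pending"):
            return status
        time.sleep(interval_s)

# Usage (sketch, assuming an authenticated SnorkelFlow client `sf`):
# job_id = sf.prompt_fm_over_dataset(
#     prompt_template="{email_subject}. What is this email about?",
#     dataset=1,
#     x_uids=["0", "1"],
#     model_name="openai/gpt-4",
#     sync=False,
# )
# final_status = wait_for_job(sf, job_id)
```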