Version: 25.1

snorkelflow.client.fm_suite.prompt_fm_over_dataset

snorkelflow.client.fm_suite.prompt_fm_over_dataset(prompt_template, dataset, x_uids, model_name, model_type=None, runs_per_prompt=1, sync=True, cache_name='default', system_prompt=None, **fm_hyperparameters)

Run a prompt over a dataset. Any field in the dataset can be referenced in the prompt template using curly braces, e.g. {field_name}.
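For example, a template that references a hypothetical email_subject field from the dataset:

>>> prompt_template = "Subject: {email_subject}. Summarize this email in one sentence."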

Parameters:
  • prompt_template (str) – The prompt template used to format input rows sent to the foundation model.

  • dataset (Union[str, int]) – The name or UID of the dataset containing the data we want to prompt over.

  • x_uids (List[str]) – The x_uids of the rows within the dataset to prompt over.

  • model_name (str) – The name of the foundation model to use.

  • model_type (Optional[LLMType], default: None) – How the foundation model should be used; must be one of the LLMType values.

  • runs_per_prompt (int, default: 1) – The number of times to run inference per x_uid; note that each response can differ. All responses are cached (see the example after this list).

  • sync (bool, default: True) – Whether to wait for the job to complete before returning the result.

  • cache_name (str, default: 'default') – The cache name is used in the hash construction. To run a prompt again and get a different result, change the cache name to one that hasn't been used before. For example:

    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o")  # hits the cache
    The meaning of life is to work...
    >>> sf.prompt_fm("What is the meaning of life?", "openai/gpt-4o", cache_name="run_2")  # hits a different part of the cache
    The meaning of life is to have fun!

  • system_prompt (Optional[str], default: None) – The system prompt to prepend to the prompt.

  • fm_hyperparameters (Any) – Additional keyword arguments to pass to the foundation model, such as temperature, max_tokens, etc. (see the example below).
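A sketch of how the optional arguments combine. The keyword names temperature and max_tokens come from the parameter description above; the exact set of accepted hyperparameters depends on the chosen model:

>>> sf.prompt_fm_over_dataset(
...     prompt_template="{email_subject}. What is this email about?",
...     dataset=1,
...     x_uids=["0", "1"],
...     model_name="openai/gpt-4",
...     runs_per_prompt=3,  # three cached responses per row
...     temperature=0.7,    # forwarded via **fm_hyperparameters
...     max_tokens=128,     # forwarded via **fm_hyperparameters
... )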

Return type:

Union[DataFrame, str]

Returns:

  • df – DataFrame containing the predictions for the data points, with columns for the input prompt and the output of the foundation model (returned when sync=True).

  • job_id – The job ID of the prompt inference job, which can be used to monitor progress with sf.poll_job_status(job_id) (returned when sync=False).
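A minimal sketch of the asynchronous flow when sync=False, using the sf.poll_job_status call noted above:

>>> job_id = sf.prompt_fm_over_dataset(
...     prompt_template="{email_subject}. What is this email about?",
...     dataset=1,
...     x_uids=["0", "1"],
...     model_name="openai/gpt-4",
...     sync=False,  # return a job id immediately instead of waiting
... )
>>> sf.poll_job_status(job_id)  # monitor progress of the inference job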

Examples

>>> sf.prompt_fm_over_dataset(
...     prompt_template="{email_subject}. What is this email about?",
...     dataset=1,
...     x_uids=["0", "1"],
...     model_name="openai/gpt-4",
... )
   | email_subject                         | generated_text                                                      | perplexity
---+---------------------------------------+---------------------------------------------------------------------+-----------
 0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher   | 0.891
 1 | Hey it's Bob, free on Sat?            | The email is from your friend Bob asking if you're free on Saturday | 0.787