Version: 25.9

snorkelai.sdk.client.synthetic.augment_dataset

snorkelai.sdk.client.synthetic.augment_dataset(dataset, x_uids, model_name, runs_per_prompt=1, prompt='Your task is to rewrite the a set of text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original.', fields=None, sync=True, **fm_hyperparameters)

Augment each selected row of the dataset `runs_per_prompt` times and return a dataframe containing only the synthetic data. By default, all fields are augmented, and the foundation model rewrites all fields of a row in a single inference step.

Parameters

Name	Type	Default	Info
dataset	`Union[str, int]`		The name or UID of the dataset to generate a synthetic augmentation of.
x_uids	`List[str]`		The x_uids within the dataset to augment.
model_name	`str`		The name of the foundation model to use.
runs_per_prompt	`int`	`1`	The number of times to augment each row.
prompt	`str`	`'Your task is to rewrite the a set of text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original.'`	The prompt passed to the foundation model for each row. Note that by default, the fields are appended to the prompt to make the following: “Rewrite the following text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original. Return your answer in a json format with the same keys as the fields: [field_1, field_2, …] Here is the data you have to rewrite…”. To override this default behavior, pass at least one field wrapped in curly braces, e.g. {field_1}, within the prompt and no additional text will be appended to the prompt.
fields	`Optional[List[str]]`	`None`	The fields to augment. If not provided, all fields will be augmented.
sync	`bool`	`True`	Whether to wait for the job to complete before returning the result.
fm_hyperparameters	`Any`		Additional keyword arguments to pass to the foundation model such as temperature, max_tokens, etc.
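
The placeholder rule for `prompt` can be checked locally before submitting a job. The sketch below is only an illustration: `placeholder_fields` is a hypothetical helper (not part of the SDK), and it assumes placeholders follow Python's `{name}` format syntax, which the SDK documentation does not confirm. The commented-out call shows how such a prompt might be passed along with `fields` and a foundation model hyperparameter.

```python
from string import Formatter


def placeholder_fields(prompt: str) -> set[str]:
    """Return the names used as {name} placeholders in a prompt.

    Hypothetical helper for illustration; the SDK's own detection
    logic is not documented here and may differ.
    """
    return {name for _, name, _, _ in Formatter().parse(prompt) if name}


# A prompt without placeholders: per the table above, the fields and
# the JSON-format instructions would be appended automatically.
plain_prompt = "Rewrite the text fields whilst retaining the core meaning."

# A prompt with placeholders: per the table above, nothing is appended.
custom_prompt = "Paraphrase this email. Subject: {subject} Body: {body}"

print(placeholder_fields(plain_prompt))
print(sorted(placeholder_fields(custom_prompt)))

# Assumed usage, with `sai` being the client alias used in this page's examples:
# sai.augment_dataset(
#     dataset=1,
#     x_uids=["0", "1"],
#     model_name="openai/gpt-4",
#     prompt=custom_prompt,
#     fields=["subject", "body"],
#     temperature=0.2,  # forwarded through **fm_hyperparameters
# )
```

Checking placeholder names against the dataset's field names up front can catch a misspelled field before an inference job is launched.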

Return type

Union[DataFrame, str]

Returns

df – Dataframe containing the augmentations for the data points.
job_id – The job id of the augment data job which can be used to monitor progress with sai.poll_job_status(job_id).
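
Because the return type is a union, calling code may want to branch on what came back. The following is a minimal sketch, assuming that a string result is a job id (presumably what is returned when `sync=False`) and anything else is the dataframe. `run_augmentation` and the two stand-in functions are hypothetical names used only so the example runs without the SDK.

```python
from typing import Any, Callable, Optional, Tuple


def run_augmentation(
    augment_fn: Callable[..., Any], **kwargs: Any
) -> Tuple[Optional[Any], Optional[str]]:
    """Call augment_fn and split its result into (dataframe, job_id).

    Assumption: a str result is a job id; any other result is the
    dataframe of augmentations.
    """
    result = augment_fn(**kwargs)
    if isinstance(result, str):
        return None, result
    return result, None


# Stand-ins for sai.augment_dataset, for demonstration only.
def fake_sync_augment(**kwargs: Any) -> list:
    return [{"subject": "Free survey for $50"}]  # placeholder for a DataFrame


def fake_async_augment(**kwargs: Any) -> str:
    return "job-123"  # made-up job id


df, job_id = run_augmentation(fake_sync_augment, dataset=1, sync=True)
print(df is not None, job_id)

df, job_id = run_augmentation(fake_async_augment, dataset=1, sync=False)
print(df, job_id)
```

In real use, `augment_fn` would be `sai.augment_dataset`, and a returned job id could then be passed to `sai.poll_job_status(job_id)` to monitor progress.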

Examples

>>> sai.augment_dataset(dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4", runs_per_prompt=2)
   | subject                                | body                                                                    | perplexity
-----------------------------------------------------------------------------------------------------------------------------------
0  | Fill in survey for $50 amazon voucher  | The email is asking you to fill in a survey for an amazon voucher       | 0.891
1  | Hey it's Bob, free on Sat?             | The email is from your friend Bob asking if you're free on Saturday     | 0.787
0  | Free survey for $50                    | Want a free $50 amazon voucher? Fill in our survey.                     | 0.911
1  | No Plans on Sat, Bob?                  | Let's meet up on Sat. Bob.                                              | 0.991
