Version: 25.9

snorkelai.sdk.client.synthetic.augment_dataset

snorkelai.sdk.client.synthetic.augment_dataset(dataset, x_uids, model_name, runs_per_prompt=1, prompt='Your task is to rewrite the a set of text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original.', fields=None, sync=True, **fm_hyperparameters)

Augment each selected row of the dataset runs_per_prompt times and return a DataFrame containing only the synthetic data. By default, all fields are augmented, and the foundation model rewrites all fields of a row in a single inference step.

Parameters

Name | Type | Default | Info
dataset | Union[str, int] | | The name or UID of the dataset to augment.
x_uids | List[str] | | The x_uids within the dataset to augment.
model_name | str | | The name of the foundation model to use.
runs_per_prompt | int | 1 | The number of times to augment each row.
prompt | str | 'Your task is to rewrite the a set of text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original.' | The prompt passed to the foundation model for each row. By default, the field data is appended to the prompt, producing: “Rewrite the following text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original. Return your answer in a json format with the same keys as the fields: [field_1, field_2, …] Here is the data you have to rewrite…”. To override this default behavior, include at least one field name wrapped in curly braces, e.g. {field_1}, in the prompt; no additional text is then appended to the prompt.
fields | Optional[List[str]] | None | The fields to augment. If not provided, all fields are augmented.
sync | bool | True | Whether to wait for the job to complete before returning the result.
fm_hyperparameters | Any | | Additional keyword arguments to pass to the foundation model, such as temperature, max_tokens, etc.
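
The prompt override described above depends on the prompt containing at least one field name in curly braces. The following is a minimal sketch of a local pre-flight check, not part of the SDK: `placeholder_fields` and `build_augment_kwargs` are hypothetical helper names, and the field names `subject` and `body` are taken from the example on this page. It only assembles keyword arguments; the actual call to `augment_dataset` is left as a comment.

```python
from string import Formatter


def placeholder_fields(prompt):
    """Return the set of names used as {name} placeholders in the prompt."""
    return {name for _, name, _, _ in Formatter().parse(prompt) if name}


def build_augment_kwargs(dataset, x_uids, model_name, prompt, fields,
                         **fm_hyperparameters):
    """Hypothetical helper: validate a custom prompt and collect the arguments.

    Assumption: a custom prompt should reference only fields that are also
    listed in `fields`, and should contain at least one placeholder so that
    no default text is appended.
    """
    used = placeholder_fields(prompt)
    if not used:
        raise ValueError("prompt contains no {field} placeholder")
    unknown = used - set(fields)
    if unknown:
        raise ValueError(f"prompt references unknown fields: {sorted(unknown)}")
    return dict(dataset=dataset, x_uids=x_uids, model_name=model_name,
                prompt=prompt, fields=fields, **fm_hyperparameters)


kwargs = build_augment_kwargs(
    dataset=1,
    x_uids=["0", "1"],
    model_name="openai/gpt-4",
    prompt="Rewrite {subject} and {body} in a more formal tone.",
    fields=["subject", "body"],
    temperature=0.7,  # forwarded as a foundation model hyperparameter
)
# With the SDK available, the call would then be: sai.augment_dataset(**kwargs)
```

Checking the placeholders locally surfaces a misspelled field name before any inference job is submitted.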

Return type

Union[DataFrame, str]

Returns

  • df – DataFrame containing the augmentations for the data points.

  • job_id – The job ID of the augmentation job, which can be used to monitor progress with sai.poll_job_status(job_id).
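
Because the return type is a union, calling code may want to handle the job ID case explicitly. The sketch below assumes that passing sync=False yields the job ID string rather than a DataFrame, which the parameter description suggests but does not state outright. `start_augmentation` is a hypothetical wrapper, and `_StubClient` is a stand-in for the SDK module so that the sketch runs without a Snorkel instance.

```python
class _StubClient:
    """Stand-in for the SDK client module (illustration only)."""

    def augment_dataset(self, dataset, x_uids, model_name, sync=True, **kwargs):
        # Assumption: the real call returns a job ID string when sync=False.
        return "job-123" if not sync else None


def start_augmentation(client, dataset, x_uids, model_name, **kwargs):
    """Hypothetical wrapper: submit the job without waiting and return its ID."""
    result = client.augment_dataset(
        dataset=dataset, x_uids=x_uids, model_name=model_name,
        sync=False, **kwargs,
    )
    if not isinstance(result, str):
        raise TypeError("expected a job ID string when sync=False")
    return result


job_id = start_augmentation(
    _StubClient(), dataset=1, x_uids=["0", "1"],
    model_name="openai/gpt-4", runs_per_prompt=2,
)
# With the real SDK, progress could then be checked with
# sai.poll_job_status(job_id).
```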

Examples

>>> sai.augment_dataset(dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4", runs_per_prompt=2)
  | subject                               | body                                                                | perplexity
----------------------------------------------------------------------------------------------------------------------------
0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher   | 0.891
1 | Hey it's Bob, free on Sat?            | The email is from your friend Bob asking if you're free on Saturday | 0.787
0 | Free survey for $50                   | Want a free $50 amazon voucher? Fill in our survey.                 | 0.911
1 | No Plans on Sat, Bob?                 | Let's meet up on Sat. Bob.                                          | 0.991