snorkelai.sdk.client.synthetic.augment_dataset
snorkelai.sdk.client.synthetic.augment_dataset(dataset, x_uids, model_name, runs_per_prompt=1, prompt='Your task is to rewrite the a set of text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original.', fields=None, sync=True, **fm_hyperparameters)
Augment each row of the dataset runs_per_prompt times and return a DataFrame containing only the synthetic data. By default, all fields are augmented, and the foundation model augments each row (all of its fields) in a single inference step.
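The sketch below is not the SDK's implementation. It shows how the work described above scales: there is one foundation-model request per row per run, and each request carries all of the row's selected fields. The names `plan_augmentation` and `AugmentRequest` are hypothetical, and the run-major ordering is an assumption. It was chosen to mirror the x_uid order (0, 1, 0, 1) in the example output below.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AugmentRequest:
    """Hypothetical: one foundation-model call covering a single row."""
    x_uid: str
    run: int
    fields: Dict[str, str]


def plan_augmentation(
    rows: Dict[str, Dict[str, str]],
    x_uids: List[str],
    runs_per_prompt: int = 1,
    fields: Optional[List[str]] = None,
) -> List[AugmentRequest]:
    """Hypothetical sketch: list the per-row requests an augmentation implies."""
    requests = []
    # Run-major order (assumption): every x_uid once, then every x_uid again.
    for run in range(runs_per_prompt):
        for uid in x_uids:
            row = rows[uid]
            # fields=None means "augment every field", as documented.
            selected = list(row) if fields is None else fields
            # All selected fields of a row travel together in one request.
            requests.append(AugmentRequest(uid, run, {f: row[f] for f in selected}))
    return requests


# Made-up sample rows for illustration.
rows = {
    "0": {"subject": "subject 0", "body": "body 0"},
    "1": {"subject": "subject 1", "body": "body 1"},
}
plan = plan_augmentation(rows, ["0", "1"], runs_per_prompt=2)
print(len(plan))                # 4 (2 x_uids x 2 runs)
print([r.x_uid for r in plan])  # ['0', '1', '0', '1']
```

Under these assumptions, the number of inference calls is len(x_uids) × runs_per_prompt. Passing fields narrows each request but does not change the count.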
Parameters
dataset (Union[str, int]) – The name or UID of the dataset to generate a synthetic augmentation of.
x_uids (List[str]) – The x_uids within the dataset to augment.
model_name (str) – The name of the foundation model to use.
runs_per_prompt (int, default 1) – The number of times to augment each row.
prompt (str, default: the prompt shown in the signature) – The prompt passed to the foundation model for each row. By default, the fields and the row's data are appended to the prompt, producing a prompt of the form: “Rewrite the following text fields whilst retaining the core meaning. You should keep the same language and ensure each re-written field is of a similar length to the original. Return your answer in a json format with the same keys as the fields: [field_1, field_2, …] Here is the data you have to rewrite…”. To override this default behavior, include at least one field name wrapped in curly braces, e.g. {field_1}, within the prompt; no additional text is then appended to the prompt.
fields (Optional[List[str]], default None) – The fields to augment. If not provided, all fields are augmented.
sync (bool, default True) – Whether to wait for the job to complete before returning the result.
fm_hyperparameters (Any) – Additional keyword arguments passed to the foundation model, such as temperature or max_tokens.
Return type
Union[DataFrame, str]
Returns
df – DataFrame containing the augmentations for the data points (returned when sync=True).
job_id – The ID of the augmentation job (returned when sync=False), which can be used to monitor progress with sai.poll_job_status(job_id).
Examples
>>> sai.augment_dataset(dataset=1, x_uids=["0", "1"], model_name="openai/gpt-4", runs_per_prompt=2)
| subject | body | perplexity
-----------------------------------------------------------------------------------------------------------------------------------
0 | Fill in survey for $50 amazon voucher | The email is asking you to fill in a survey for an amazon voucher | 0.891
1 | Hey it's Bob, free on Sat? | The email is from your friend Bob asking if you're free on Saturday | 0.787
0 | Free survey for $50 | Want a free $50 amazon voucher? Fill in our survey. | 0.911
1 | No Plans on Sat, Bob? | Let's meet up on Sat. Bob. | 0.991
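The prompt-override rule described for the prompt parameter can be illustrated with a small helper. `build_prompt` is hypothetical and not part of the SDK. The text it appends in the default case only approximates the wording quoted in the parameter description. Detecting placeholders with a regular expression is likewise an assumption about how such a rule could be implemented.

```python
import json
import re
from typing import Dict, List

# Matches {name} placeholders such as {subject} or {field_1}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def build_prompt(prompt: str, row: Dict[str, str], fields: List[str]) -> str:
    """Hypothetical sketch of how a per-row prompt could be assembled."""
    used = set(PLACEHOLDER.findall(prompt)) & set(fields)
    if used:
        # Override mode: substitute placeholders naming selected fields,
        # leave any other braces untouched, and append nothing.
        return PLACEHOLDER.sub(
            lambda m: row[m.group(1)] if m.group(1) in fields else m.group(0),
            prompt,
        )
    # Default mode: append the field names and the row's data.
    # This wording approximates the documented template (assumption).
    data = {f: row[f] for f in fields}
    return (
        f"{prompt} Return your answer in a json format with the same keys "
        f"as the fields: {fields} Here is the data you have to rewrite: "
        f"{json.dumps(data)}"
    )


row = {"subject": "Hey it's Bob, free on Sat?", "body": "Let's meet up."}
print(build_prompt("Rewrite this subject line: {subject}", row, ["subject", "body"]))
# Rewrite this subject line: Hey it's Bob, free on Sat?
```

Only brace-wrapped names that match a selected field switch on override mode. In this sketch, any other braces in a custom prompt pass through unchanged.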