snorkelflow.sdk.FTDataset
- class snorkelflow.sdk.FTDataset(df, dataset_uid, label_schema_uid, model_node_uid)
Bases: object
- __init__(df, dataset_uid, label_schema_uid, model_node_uid)
Methods
__init__(df, dataset_uid, label_schema_uid, ...)
append(ft_dataset): Append the given FTDataset to the current FTDataset.
create_annotation_batches([assignees]): Create annotation batches for the FTDataset.
export_data(format, filepath): Export the data in the FTDataset to the specified format and write it to the provided filepath.
filter([source_uids, splits, x_uids, ...]): Filter the dataset based on the given filters.
get_data(): Get the data associated with the fine-tuning dataset.
get_x_uids(): Get the x_uids in the FTDataset.
set_as_dev_set(): Resample the x_uids within the FTDataset as the dev set for the fine-tuning application.

- append(ft_dataset)
Append the given FTDataset to the current FTDataset.
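The following is a minimal, hypothetical sketch of what appending one dataset to another could look like. `ToyFTDataset` is an illustrative stand-in, not the SDK class, and it assumes that `append` mutates the current dataset in place by concatenating x_uids; this section does not say whether the real method copies, deduplicates, or returns a new object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFTDataset:
    """Hypothetical stand-in for FTDataset: only an ordered list of x_uids."""

    x_uids: List[str] = field(default_factory=list)

    def append(self, other: "ToyFTDataset") -> None:
        # Assumption: rows of `other` are added to the end of `self`, in place,
        # with no deduplication.
        self.x_uids.extend(other.x_uids)

    def get_x_uids(self) -> List[str]:
        # Return a copy so callers cannot mutate the dataset by accident.
        return list(self.x_uids)


a = ToyFTDataset(["doc::1", "doc::2"])
b = ToyFTDataset(["doc::3"])
a.append(b)
print(a.get_x_uids())  # ['doc::1', 'doc::2', 'doc::3']
```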
- create_annotation_batches(assignees=None)
Create annotation batches for the FTDataset. One batch is created for each split that the dataset's x_uids belong to.
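The one-batch-per-split rule can be illustrated with a short sketch that groups x_uids by split. The `(x_uid, split)` row tuples and the `group_x_uids_by_split` helper are hypothetical and do not reflect the SDK's internals; the real method also accepts `assignees`, which this sketch does not model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (x_uid, split). Not the SDK's internal representation.
Row = Tuple[str, str]


def group_x_uids_by_split(rows: List[Row]) -> Dict[str, List[str]]:
    """Return one list of x_uids per split, i.e. one candidate batch per split."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for x_uid, split in rows:
        batches[split].append(x_uid)
    # Convert to a plain dict; insertion order (first split seen first) is kept.
    return dict(batches)


rows = [("doc::1", "train"), ("doc::2", "dev"), ("doc::3", "train")]
batches = group_x_uids_by_split(rows)
print(batches)  # {'train': ['doc::1', 'doc::3'], 'dev': ['doc::2']}
```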
- export_data(format, filepath)
Export the data in the FTDataset to the specified format and write it to the provided filepath.
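As a rough illustration of the export step, the sketch below writes rows to a file path in one of two formats using only the standard library. The format names `"csv"` and `"jsonl"`, the `export_rows` helper, and the row layout are assumptions made for this example; the formats that `export_data` actually accepts are not listed in this section.

```python
import csv
import json
import os
import tempfile
from typing import Any, Dict, List


def export_rows(rows: List[Dict[str, Any]], format: str, filepath: str) -> None:
    """Write rows to `filepath`. "csv" and "jsonl" are hypothetical format names.

    The parameter is named `format` only to mirror export_data(format, filepath).
    """
    if format == "csv":
        # Column order follows the keys of the first row (assumes uniform rows).
        fieldnames = list(rows[0].keys()) if rows else []
        with open(filepath, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    elif format == "jsonl":
        # One JSON object per line.
        with open(filepath, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"unsupported format: {format!r}")


# Usage: export two toy rows as JSON Lines into a temporary directory.
rows = [
    {"x_uid": "doc::1", "text": "great product", "label": "positive"},
    {"x_uid": "doc::2", "text": "broke in a day", "label": "negative"},
]
out_dir = tempfile.mkdtemp()
jsonl_path = os.path.join(out_dir, "ft_data.jsonl")
export_rows(rows, "jsonl", jsonl_path)
```

Raising on an unknown format string, rather than silently picking a default, makes a typo in the format name fail loudly before any file is written.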
- filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None)
Filter the dataset based on the given filters.
Parameters
source_uids (Optional[List[int]], default None): The source uids to filter by.
splits (Optional[List[str]], default None): The splits to filter by.
x_uids (Optional[List[str]], default None): The x_uids to filter by.
feature_hashes (Optional[List[str]], default None): The feature hashes to filter by.
has_gt (Optional[bool], default None): Filter by the existence or non-existence of ground truth.
Returns
The filtered dataset.
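The sketch below shows one plausible reading of how these parameters narrow a dataset: each argument left as `None` is ignored, and the remaining filters are combined with AND. The `Row` fields and the `filter_rows` helper are hypothetical, and the AND combination is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    # Hypothetical per-row fields mirroring the filter parameters.
    x_uid: str
    source_uid: int
    split: str
    feature_hash: str
    ground_truth: Optional[str]  # None means "no ground truth"


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
) -> List[Row]:
    """Keep rows that pass every non-None filter (AND semantics are assumed)."""
    out = []
    for r in rows:
        if source_uids is not None and r.source_uid not in source_uids:
            continue
        if splits is not None and r.split not in splits:
            continue
        if x_uids is not None and r.x_uid not in x_uids:
            continue
        if feature_hashes is not None and r.feature_hash not in feature_hashes:
            continue
        # has_gt=True keeps labeled rows; has_gt=False keeps unlabeled rows.
        if has_gt is not None and (r.ground_truth is not None) != has_gt:
            continue
        out.append(r)
    return out


rows = [
    Row("doc::1", 1, "train", "h1", "positive"),
    Row("doc::2", 1, "dev", "h2", None),
    Row("doc::3", 2, "train", "h3", None),
]
unlabeled_train = filter_rows(rows, splits=["train"], has_gt=False)
print([r.x_uid for r in unlabeled_train])  # ['doc::3']
```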
- get_data()
Get the data associated with the fine-tuning dataset.
- get_x_uids()
Get the x_uids in the FTDataset.