snorkelflow.sdk.QualityDataset
- class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)
Bases: FTDataset
- __init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)
Methods
__init__(df, dataset_uid, label_schema_uid, ...)
append(ft_dataset): Append the given FTDataset to the current FTDataset.
create_annotation_batches([assignees]): Create an annotation batch for the ft dataset.
export_data(format, filepath): Export the data in the FTDataset to the specified format and write to the provided filepath.
filter([source_uids, splits, x_uids, ...]): Filter the dataset based on the given filters.
get_data(): Get the data associated with the fine tuning dataset.
get_x_uids(): Get the x_uids in the FTDataset.
set_as_dev_set(): Resample the x_uids within the FTDataset as the dev set for the fine tuning application.
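As a rough illustration of how append and export_data from the listing above might be combined, here is a minimal sketch. The helper name merge_and_export is hypothetical and not part of the SDK. The listing does not say whether append modifies the dataset in place or returns a new one, nor which format values export_data accepts, so the sketch handles both append behaviours and leaves the format to the caller.

```python
from typing import Any


def merge_and_export(base: Any, extra: Any, export_format: str, filepath: str) -> Any:
    """Append `extra` to `base`, export the result, and return it.

    `base` and `extra` are assumed to be already-constructed FTDataset
    (or QualityDataset) objects. This helper is hypothetical.
    """
    # append(ft_dataset) is listed as appending the given FTDataset to the
    # current one. Assumption: it may either return the combined dataset or
    # mutate `base` and return None, so both cases are handled here.
    combined = base.append(extra)
    if combined is None:
        combined = base

    # export_data(format, filepath) is called positionally, in the order
    # shown in the method listing. Supported format values are not listed
    # on this page, so the caller supplies one.
    combined.export_data(export_format, filepath)
    return combined
```

Passing the format and file path through unchanged keeps the sketch from guessing at values this page does not document.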
- filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None, labels=None, confidence_threshold=None)
Filter the dataset based on the given filters.
Parameters
Name                  Type                 Default  Info
source_uids           Optional[List[int]]  None     The source uids to filter by.
splits                Optional[List[str]]  None     The splits to filter by.
x_uids                Optional[List[str]]  None     The x uids to filter by.
feature_hashes        Optional[List[str]]  None     The feature hashes to filter by.
has_gt                Optional[bool]       None     Filter by the existence / non-existence of ground truth.
labels                Optional[List[str]]  None     The labels to filter by.
confidence_threshold  Optional[float]      None     The confidence threshold to filter by.
Returns
The filtered dataset
Return type
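A minimal sketch of calling filter with keyword arguments, assuming an already-constructed dataset object. The helper name select_labeled_examples is hypothetical. Only parameter names from the signature above are used. How the filters combine and how confidence_threshold is compared are not stated on this page, so the sketch makes no claim about either.

```python
from typing import Any, List, Optional


def select_labeled_examples(
    dataset: Any,
    labels: List[str],
    splits: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
) -> Any:
    """Return the subset of `dataset` that has ground truth and the given labels.

    `dataset` is assumed to be a QualityDataset (or any object exposing the
    documented `filter` method). This helper is hypothetical.
    """
    # Every filter argument defaults to None in the documented signature.
    # Assumption: None means "do not filter on this field", so the optional
    # arguments are passed through unchanged.
    return dataset.filter(
        splits=splits,
        has_gt=True,  # keep only rows where ground truth exists
        labels=labels,
        confidence_threshold=confidence_threshold,
    )
```

Using keyword arguments rather than positional ones keeps the call readable and independent of parameter order, which helps when only a few of the seven filters are set.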