Version: 0.93

snorkelflow.sdk.QualityDataset

class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

__init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Methods

`__init__`(df, dataset_uid, label_schema_uid, ...)
`append`(ft_dataset)	Append the given FTDataset to the current FTDataset.
`create_annotation_batches`([assignees])	Create an annotation batch for the FTDataset.
`export_data`(format, filepath)	Export the data in the FTDataset to the specified format and write to the provided filepath.
`filter`([source_uids, splits, x_uids, ...])	Filter the dataset based on the given filters.
`get_data`()	Get the data associated with the fine-tuning dataset.
`get_x_uids`()	Get the x_uids in the FTDataset.
`set_as_dev_set`()	Resample the x_uids within the FTDataset as the dev set for the fine-tuning application.

filter

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None, labels=None, confidence_threshold=None)

Filter the dataset based on the given filters.

Name	Type	Default	Info
source_uids	`Optional[List[int]]`	`None`	The source uids to filter by.
splits	`Optional[List[str]]`	`None`	The splits to filter by.
x_uids	`Optional[List[str]]`	`None`	The x_uids to filter by.
feature_hashes	`Optional[List[str]]`	`None`	The feature hashes to filter by.
has_gt	`Optional[bool]`	`None`	Filter by whether ground truth exists (`True`) or does not exist (`False`).
labels	`Optional[List[str]]`	`None`	The labels to filter by.
confidence_threshold	`Optional[float]`	`None`	The confidence threshold to filter by.
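
The parameter table above can be illustrated with a minimal pure-Python sketch. It is not the SDK implementation: `filter_rows` is a hypothetical stand-in, the row keys (`source_uid`, `split`, `feature_hash`, `gt`, `label`, `confidence`) are invented for illustration, and the semantics shown are assumptions (a parameter left as `None` applies no constraint, all supplied filters are combined with AND, and `confidence_threshold` keeps rows at or above the threshold). Check the behavior of `QualityDataset.filter` itself before relying on any of these.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
    labels: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
) -> List[Row]:
    """Hypothetical stand-in mirroring the filter() signature (assumed semantics)."""

    def keep(row: Row) -> bool:
        # Assumption: None means "do not filter on this field".
        if source_uids is not None and row["source_uid"] not in source_uids:
            return False
        if splits is not None and row["split"] not in splits:
            return False
        if x_uids is not None and row["x_uid"] not in x_uids:
            return False
        if feature_hashes is not None and row["feature_hash"] not in feature_hashes:
            return False
        # Assumption: has_gt=True keeps rows with ground truth, False keeps rows without.
        if has_gt is not None and (row["gt"] is not None) != has_gt:
            return False
        if labels is not None and row["label"] not in labels:
            return False
        # Assumption: rows at or above the threshold are kept.
        if confidence_threshold is not None and row["confidence"] < confidence_threshold:
            return False
        return True

    return [row for row in rows if keep(row)]


# Made-up rows; the column names are illustrative only.
rows = [
    {"x_uid": "doc::0", "source_uid": 1, "split": "train", "feature_hash": "a1",
     "gt": "SPAM", "label": "SPAM", "confidence": 0.9},
    {"x_uid": "doc::1", "source_uid": 1, "split": "valid", "feature_hash": "b2",
     "gt": None, "label": "HAM", "confidence": 0.6},
    {"x_uid": "doc::2", "source_uid": 2, "split": "train", "feature_hash": "c3",
     "gt": None, "label": "SPAM", "confidence": 0.4},
]

train_spam = filter_rows(rows, splits=["train"], labels=["SPAM"])
confident = filter_rows(rows, confidence_threshold=0.5)
unlabeled = filter_rows(rows, has_gt=False)

print([r["x_uid"] for r in train_spam])  # ['doc::0', 'doc::2']
print([r["x_uid"] for r in confident])   # ['doc::0', 'doc::1']
print([r["x_uid"] for r in unlabeled])   # ['doc::1', 'doc::2']
```

Under these assumptions, supplying several arguments narrows the result, and calling the function with no filters returns every row unchanged.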