Version: 0.93

snorkelflow.sdk.QualityDataset

class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Bases: FTDataset

__init__

__init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Methods

__init__(df, dataset_uid, label_schema_uid, ...)

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

create_annotation_batches([assignees])

Create an annotation batch for the FTDataset.

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

filter([source_uids, splits, x_uids, ...])

Filter the dataset based on the given filters.

get_data()

Get the data associated with the fine-tuning dataset.

get_x_uids()

Get the x_uids in the FTDataset.

set_as_dev_set()

Resample the x_uids within the FTDataset as the dev set for the fine-tuning application.

filter

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None, labels=None, confidence_threshold=None)

Filter the dataset based on the given filters.

Parameters

Name | Type | Default | Info
source_uids | Optional[List[int]] | None | The source UIDs to filter by.
splits | Optional[List[str]] | None | The splits to filter by.
x_uids | Optional[List[str]] | None | The x_uids to filter by.
feature_hashes | Optional[List[str]] | None | The feature hashes to filter by.
has_gt | Optional[bool] | None | Filter by the presence (True) or absence (False) of ground truth.
labels | Optional[List[str]] | None | The labels to filter by.
confidence_threshold | Optional[float] | None | The confidence threshold to filter by.

Returns

The filtered dataset.

Return type

QualityDataset
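This reference does not state how multiple filters combine or in which direction confidence_threshold cuts. The stdlib-only sketch models one plausible reading: a filter left as None is ignored, the remaining filters are combined with AND, and confidence_threshold keeps rows whose confidence is at least the threshold. Row and filter_rows are hypothetical names, not part of snorkelflow.sdk, and the fields assumed to back labels and confidence_threshold are guesses; verify actual behavior against the SDK before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical stand-in for one row of the dataset's underlying DataFrame."""
    x_uid: str
    source_uid: int
    split: str
    feature_hash: str
    has_gt: bool       # whether the row has a ground-truth label
    label: str         # assumed: the label that `labels` is matched against
    confidence: float  # assumed: the model's confidence in `label`


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
    labels: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
) -> List[Row]:
    """Return the rows that pass every filter that is not None (assumed AND semantics)."""
    kept = []
    for row in rows:
        if source_uids is not None and row.source_uid not in source_uids:
            continue
        if splits is not None and row.split not in splits:
            continue
        if x_uids is not None and row.x_uid not in x_uids:
            continue
        if feature_hashes is not None and row.feature_hash not in feature_hashes:
            continue
        if has_gt is not None and row.has_gt != has_gt:
            continue
        if labels is not None and row.label not in labels:
            continue
        # Assumed: the threshold is an inclusive lower bound on confidence.
        if confidence_threshold is not None and row.confidence < confidence_threshold:
            continue
        kept.append(row)
    return kept


# Toy data for illustration only.
rows = [
    Row("doc::0", 1, "train", "h0", True, "spam", 0.95),
    Row("doc::1", 1, "dev", "h1", False, "ham", 0.40),
    Row("doc::2", 2, "dev", "h2", True, "spam", 0.70),
]

print([r.x_uid for r in filter_rows(rows, splits=["dev"], confidence_threshold=0.5)])
# ['doc::2']
```

In this sketch, passing an empty list (for example splits=[]) removes every row, whereas None disables that filter; whether the SDK treats empty lists the same way is not documented here.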