Version: 0.94

snorkelflow.sdk.QualityDataset

class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Bases: FTDataset

__init__

__init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Methods

__init__(df, dataset_uid, label_schema_uid, ...)
append(ft_dataset): Append the given FTDataset to the current FTDataset.
create_annotation_batches([assignees]): Create an annotation batch for the FTDataset.
export_data(format, filepath): Export the data in the FTDataset to the specified format and write it to the provided filepath.
filter([source_uids, splits, x_uids, ...]): Filter the dataset based on the given filters.
get_data(): Get the data associated with the fine-tuning dataset.
get_x_uids(): Get the x_uids in the FTDataset.
mix(mix_on, weights, n_samples[, seed]): Mix the dataset by split, source_uid, or slice based on the given weights, returning up to n_samples samples.
sample(n[, seed]): Sample n samples from the FTDataset.
save(name): Save the FTDataset as a slice.
set_as_dev_set(): Resample the x_uids within the FTDataset as the dev set for the fine-tuning application.
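The mix() summary says only that samples are drawn per group according to weights and that at most n_samples are returned; it does not say how the budget is split. The following is a minimal, hypothetical sketch in plain Python (mix_rows is not a snorkelflow function) that assumes weights maps each group value of the mix_on key to a relative weight, gives each group a floored proportional share of n_samples, caps that share at the group's size, and samples without replacement. Flooring keeps the total at or below n_samples, which matches the "up to" wording.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional


def mix_rows(
    rows: List[dict],
    mix_on: str,
    weights: Dict[Hashable, float],
    n_samples: int,
    seed: Optional[int] = None,
) -> List[dict]:
    """Hypothetical weighted mix: group rows by row[mix_on], sample per group."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rng = random.Random(seed)  # seeded for reproducible draws
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[mix_on]].append(row)

    mixed: List[dict] = []
    for key, weight in weights.items():
        members = groups.get(key, [])
        # Assumed allocation: floored proportional share, capped by group size,
        # so the combined result never exceeds n_samples.
        quota = min(len(members), int(n_samples * weight / total))
        mixed.extend(rng.sample(members, quota))
    return mixed


rows = [{"x_uid": f"t{i}", "split": "train"} for i in range(10)] + [
    {"x_uid": f"d{i}", "split": "dev"} for i in range(10)
]
sample = mix_rows(rows, "split", {"train": 3, "dev": 1}, n_samples=8, seed=0)
print(len(sample))
```

With a 3:1 weighting and n_samples=8, this sketch draws six train rows and two dev rows; if a group is too small, its share is capped and fewer than n_samples rows come back.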

filter

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, slices=None, has_gt=None, labels=None, confidence_threshold=None)

Filter the dataset based on the given filters.

Parameters

source_uids (Optional[List[int]], default None): The source uids to filter by.
splits (Optional[List[str]], default None): The splits to filter by.
x_uids (Optional[List[str]], default None): The x uids to filter by.
feature_hashes (Optional[List[str]], default None): The feature hashes to filter by.
slices (Optional[List[Slice]], default None): The slices to filter by; rows within at least one slice are included.
has_gt (Optional[bool], default None): Filter by the existence / non-existence of ground truth.
labels (Optional[List[str]], default None): The labels to filter by.
confidence_threshold (Optional[float], default None): The confidence threshold to filter by.

Returns

The filtered dataset.

Return type

QualityDataset
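The parameter list states that slices are combined with OR (a row in at least one slice is kept) but does not spell out how the different filters combine. The sketch below is a hypothetical, in-memory emulation in plain Python, not the snorkelflow SDK: Row and filter_rows are illustrative names, slices are represented by name strings instead of Slice objects, and it assumes that non-None filters are combined with AND and that confidence_threshold keeps rows whose confidence is at or above the threshold.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for one datapoint; not a snorkelflow type."""
    x_uid: str
    source_uid: int
    split: str
    label: Optional[str]  # assumed: the label assigned to this row
    confidence: float     # assumed: confidence score for that label
    has_gt: bool          # whether the row has ground truth
    feature_hash: str = ""
    slices: FrozenSet[str] = frozenset()


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    slices: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
    labels: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
) -> List[Row]:
    """Keep rows that pass every filter that is not None (None means no filter)."""

    def keep(r: Row) -> bool:
        if source_uids is not None and r.source_uid not in source_uids:
            return False
        if splits is not None and r.split not in splits:
            return False
        if x_uids is not None and r.x_uid not in x_uids:
            return False
        if feature_hashes is not None and r.feature_hash not in feature_hashes:
            return False
        # Documented behavior: a row passes if it is in at least one slice.
        if slices is not None and not (r.slices & set(slices)):
            return False
        if has_gt is not None and r.has_gt != has_gt:
            return False
        if labels is not None and r.label not in labels:
            return False
        # Assumption: the threshold is an inclusive lower bound on confidence.
        if confidence_threshold is not None and r.confidence < confidence_threshold:
            return False
        return True

    return [r for r in rows if keep(r)]


rows = [
    Row("a", 1, "train", "spam", 0.9, True, slices=frozenset({"s1"})),
    Row("b", 1, "dev", "ham", 0.4, False),
    Row("c", 2, "train", "spam", 0.7, False, slices=frozenset({"s2"})),
]
print([r.x_uid for r in filter_rows(rows, splits=["train"], confidence_threshold=0.8)])
```

Against a real QualityDataset, the corresponding call would use the documented keyword arguments, for example ds.filter(splits=["train"], confidence_threshold=0.8), and return a new QualityDataset instead of a list.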