Version: 25.2


class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Bases: FTDataset

__init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)


__init__(df, dataset_uid, label_schema_uid, ...)


Append the given FTDataset to the current FTDataset.


Create an annotation batch for the FTDataset.

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

filter([source_uids, splits, x_uids, ...])

Filter the dataset based on the given filters.


Get the data associated with the fine-tuning dataset.


Get the x_uids in the FTDataset.

mix(mix_on, weights, n_samples[, seed])

Mix the dataset by split, source_uid, or slice based on the given weights, returning up to n_samples samples.

sample(n[, seed])

Sample n samples from the FTDataset.


Save the FTDataset as a slice.


Resample the x_uids within the FTDataset as the dev set for the fine-tuning application.

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, slices=None, has_gt=None, labels=None, confidence_threshold=None)

Filter the dataset based on the given filters.

Parameters:

  • source_uids (Optional[List[int]], default: None) – The source uids to filter by

  • splits (Optional[List[str]], default: None) – The splits to filter by

  • x_uids (Optional[List[str]], default: None) – The x uids to filter by

  • feature_hashes (Optional[List[str]], default: None) – The feature hashes to filter by

  • slices (Optional[List[Slice]], default: None) – The slices to filter by; rows that fall within at least one of the given slices are included

  • has_gt (Optional[bool], default: None) – Filter by whether ground truth exists (True) or does not exist (False)

  • labels (Optional[List[str]], default: None) – The labels to filter by

  • confidence_threshold (Optional[float], default: None) – The confidence threshold to filter by


Returns:

The filtered dataset

Return type:

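Example:

The following is a minimal sketch of calling filter with a few of the keyword arguments documented above. It assumes that dataset is an existing QualityDataset (or any FTDataset) obtained elsewhere, and that "dev" is a valid split name in your application. The helper name select_labeled_dev_rows is hypothetical and not part of the SDK. How multiple filters combine is not stated on this page, so check the result on your own data before relying on it.

```python
def select_labeled_dev_rows(dataset, min_confidence=0.9):
    """Hypothetical helper: narrow a dataset to dev-split rows with ground truth.

    `dataset` is assumed to expose the `filter` method documented above.
    Arguments left out of the call keep their default of None.
    """
    return dataset.filter(
        splits=["dev"],                       # assumed split name
        has_gt=True,                          # only rows that have ground truth
        confidence_threshold=min_confidence,  # confidence threshold to filter by
    )
```

The helper passes every filter by keyword, so it does not depend on the positional order of the parameters in the signature.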