Version: 0.94

snorkelflow.sdk.QualityDataset

class snorkelflow.sdk.QualityDataset(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Bases: FTDataset

__init__(df, dataset_uid, label_schema_uid, model_node_uid, label_map)

Methods

__init__(df, dataset_uid, label_schema_uid, ...)

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

create_annotation_batches([assignees])

Create an annotation batch for the FTDataset.

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

filter([source_uids, splits, x_uids, ...])

Filter the dataset based on the given filters.

get_data()

Get the data associated with the fine tuning dataset.

get_x_uids()

Get the x_uids in the FTDataset.

mix(mix_on, weights, n_samples[, seed])

Mix the dataset by split, source_uid, or slice based on the given weights, returning up to n_samples samples.

sample(n[, seed])

Sample n samples from the FTDataset.

save(name)

Save the FTDataset as a slice.

set_as_dev_set()

Resample the x_uids within the FTDataset as the dev set for the fine tuning application.

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, slices=None, has_gt=None, labels=None, confidence_threshold=None)

Filter the dataset based on the given filters.

Parameters:
  • source_uids (Optional[List[int]], default: None) – The source uids to filter by

  • splits (Optional[List[str]], default: None) – The splits to filter by

  • x_uids (Optional[List[str]], default: None) – The x uids to filter by

  • feature_hashes (Optional[List[str]], default: None) – The feature hashes to filter by

  • slices (Optional[List[Slice]], default: None) – The slices to filter by; rows that belong to at least one of the given slices are included

  • has_gt (Optional[bool], default: None) – Filter by the existence / non-existence of ground truth

  • labels (Optional[List[str]], default: None) – The labels to filter by

  • confidence_threshold (Optional[float], default: None) – The confidence threshold to filter by

Returns:

The filtered dataset

Return type:

QualityDataset
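
The following is a minimal, library-independent sketch of how optional filters like these are commonly applied. It does not call the snorkelflow SDK and is not its implementation. The helper filter_rows, the row keys (x_uid, source_uid, split, label, confidence, gt), the AND-combination of filters, and the treatment of confidence_threshold as an inclusive lower bound are all assumptions made for illustration; feature_hashes and slices are omitted for brevity.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
    labels: Optional[List[str]] = None,
    confidence_threshold: Optional[float] = None,
) -> List[Row]:
    """Hypothetical stand-in for a filter() call; NOT the SDK's code.

    Assumptions: a filter left as None is skipped, the remaining filters
    are combined with AND, and confidence_threshold is an inclusive
    lower bound on a per-row "confidence" value.
    """

    def keep(row: Row) -> bool:
        if source_uids is not None and row["source_uid"] not in source_uids:
            return False
        if splits is not None and row["split"] not in splits:
            return False
        if x_uids is not None and row["x_uid"] not in x_uids:
            return False
        # has_gt=True keeps rows with ground truth, has_gt=False keeps rows without.
        if has_gt is not None and (row["gt"] is not None) != has_gt:
            return False
        if labels is not None and row["label"] not in labels:
            return False
        if confidence_threshold is not None and row["confidence"] < confidence_threshold:
            return False
        return True

    return [row for row in rows if keep(row)]


# Made-up example rows; the keys are illustrative, not the SDK's schema.
rows = [
    {"x_uid": "doc::0", "source_uid": 1, "split": "train", "label": "good", "confidence": 0.92, "gt": "good"},
    {"x_uid": "doc::1", "source_uid": 1, "split": "train", "label": "bad", "confidence": 0.40, "gt": None},
    {"x_uid": "doc::2", "source_uid": 2, "split": "valid", "label": "good", "confidence": 0.75, "gt": None},
    {"x_uid": "doc::3", "source_uid": 2, "split": "train", "label": "good", "confidence": 0.55, "gt": "bad"},
]

confident_train = filter_rows(rows, splits=["train"], labels=["good"], confidence_threshold=0.5)
unlabeled = filter_rows(rows, has_gt=False)

print([r["x_uid"] for r in confident_train])  # ['doc::0', 'doc::3']
print([r["x_uid"] for r in unlabeled])        # ['doc::1', 'doc::2']
```

With the real class, the documented signature suggests a call of the same shape, for example passing splits, labels, and confidence_threshold as keyword arguments to filter() on an existing QualityDataset; how the filters actually combine should be confirmed against the SDK.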