Version: 0.93

snorkelflow.sdk.FTDataset

class snorkelflow.sdk.FTDataset(df, dataset_uid, label_schema_uid, model_node_uid)

Bases: object

__init__(df, dataset_uid, label_schema_uid, model_node_uid)

Methods

__init__(df, dataset_uid, label_schema_uid, ...)

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

create_annotation_batches([assignees])

Create an annotation batch for the ft dataset.

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

filter([source_uids, splits, x_uids, ...])

Filter the dataset based on the given filters.

get_data()

Get the data associated with the fine tuning dataset.

get_x_uids()

Get the x_uids in the FTDataset.

set_as_dev_set()

Resample the x_uids within the FTDataset as the dev set for the fine tuning application.

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

Parameters:

ft_dataset (FTDataset) – The FTDataset to append to the current FTDataset

Returns:

The appended FTDataset

Return type:

FTDataset

create_annotation_batches(assignees=None)

Create annotation batches for the FTDataset. One batch is created for each split that contains x_uids from the dataset.

Parameters:

assignees (Optional[List[int]], default: None) – The user uids of the assignees of the annotation batch

Returns:

The created annotation batches

Return type:

List[Dict[str, Any]]
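
The per-split behavior described above can be pictured with a small stand-alone sketch. It does not call the Snorkel Flow SDK: `group_x_uids_by_split`, the x_uid values, and the split names are hypothetical, and the dictionaries actually returned by `create_annotation_batches` may be structured differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_x_uids_by_split(rows: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (x_uid, split) pairs by split, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for x_uid, split in rows:
        grouped[split].append(x_uid)
    return dict(grouped)


# Hypothetical x_uids and split names, for illustration only.
rows = [("doc::0", "train"), ("doc::1", "valid"), ("doc::2", "train")]
x_uids_by_split = group_x_uids_by_split(rows)

# Under the documented behavior, a dataset like this would yield one
# annotation batch per distinct split, i.e. two batches here.
print(sorted(x_uids_by_split))
```

The number of keys in `x_uids_by_split` corresponds to the number of batches one would expect for such a dataset.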

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

Parameters:
  • format (ExportFormat) – The format to export the data to.

  • filepath (str) – The filepath to write the exported data to.

Return type:

None

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None)

Filter the dataset based on the given filters.

Parameters:
  • source_uids (Optional[List[int]], default: None) – The source uids to filter by

  • splits (Optional[List[str]], default: None) – The splits to filter by

  • x_uids (Optional[List[str]], default: None) – The x uids to filter by

  • feature_hashes (Optional[List[str]], default: None) – The feature hashes to filter by

  • has_gt (Optional[bool], default: None) – Filter by whether ground truth exists (True) or does not exist (False)

Returns:

The filtered dataset

Return type:

FTDataset
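
As a rough mental model of the optional filter parameters, the sketch below applies the same idea to plain dictionaries. It is not the SDK implementation: `filter_rows`, the row keys, and the sample values are hypothetical, and it assumes that a filter left as `None` is ignored and that the remaining filters are combined with AND, which the reference above does not state explicitly.

```python
from typing import Any, Dict, List, Optional


def filter_rows(
    rows: List[Dict[str, Any]],
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
) -> List[Dict[str, Any]]:
    """Keep rows that satisfy every filter that is not None (assumed AND semantics)."""
    kept = []
    for row in rows:
        if splits is not None and row["split"] not in splits:
            continue
        if x_uids is not None and row["x_uid"] not in x_uids:
            continue
        # A row "has ground truth" here when its hypothetical "gt" field is set.
        if has_gt is not None and (row["gt"] is not None) != has_gt:
            continue
        kept.append(row)
    return kept


# Hypothetical rows, for illustration only.
rows = [
    {"x_uid": "doc::0", "split": "train", "gt": "yes"},
    {"x_uid": "doc::1", "split": "train", "gt": None},
    {"x_uid": "doc::2", "split": "valid", "gt": "no"},
]
labeled_train = filter_rows(rows, splits=["train"], has_gt=True)
print([row["x_uid"] for row in labeled_train])
```

Because `filter` returns an FTDataset, calls on a real dataset can presumably be chained in the same spirit, for example narrowing by split first and then by ground truth.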

get_data()

Get the data associated with the fine tuning dataset.

Returns:

The data associated with the fine tuning dataset

Return type:

pd.DataFrame

get_x_uids()

Get the x_uids in the FTDataset.

Returns:

The x_uids in the FTDataset

Return type:

List[str]

set_as_dev_set()

Resample the x_uids within the FTDataset as the dev set for the fine tuning application. Note that a dev set may contain only x_uids that are also in the train set.

Return type:

None
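
Since the dev set is restricted to x_uids from the train set, it can help to check a candidate list before calling `set_as_dev_set`. The helper below is a stand-alone sketch with hypothetical names and values; it does not use the SDK and only illustrates the membership check.

```python
from typing import Iterable, List


def x_uids_outside_train(candidate: Iterable[str], train: Iterable[str]) -> List[str]:
    """Return the candidate x_uids that are not present in the train set."""
    train_set = set(train)
    return [x_uid for x_uid in candidate if x_uid not in train_set]


# Hypothetical x_uids, for illustration only.
train_x_uids = ["doc::0", "doc::1", "doc::2"]
candidate_dev_x_uids = ["doc::1", "doc::7"]

missing = x_uids_outside_train(candidate_dev_x_uids, train_x_uids)
if missing:
    print("Not in the train set:", missing)
```

With a real FTDataset, the candidate list would typically come from `get_x_uids()`. Obtaining the train x_uids through something like `filter(splits=["train"])` is an assumption about the split name, not something this reference confirms.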