Version: 0.93

snorkelflow.sdk.FTDataset

class snorkelflow.sdk.FTDataset(df, dataset_uid, label_schema_uid, model_node_uid)

Bases: object

__init__

__init__(df, dataset_uid, label_schema_uid, model_node_uid)

Methods

__init__(df, dataset_uid, label_schema_uid, ...)

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

create_annotation_batches([assignees])

Create annotation batches for the FTDataset, one per split.

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

filter([source_uids, splits, x_uids, ...])

Filter the dataset based on the given filters.

get_data()

Get the data associated with the fine-tuning dataset.

get_x_uids()

Get the x_uids in the FTDataset.

set_as_dev_set()

Resample the dev set of the fine-tuning application so that it consists of the x_uids in this FTDataset.
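Because `filter` returns an FTDataset, these methods can be chained. The sketch below is an illustrative pipeline that calls only the methods documented on this page. `prepare_ft_dataset` is a hypothetical helper, the split name `"train"` is an assumption, and the choice of which points form the dev set is arbitrary. It accepts any object with these methods, so it can be exercised against a fake.

```python
from typing import Any, Dict, List, Optional


def prepare_ft_dataset(
    ft_dataset: Any, assignees: Optional[List[int]] = None
) -> List[Dict[str, Any]]:
    """Hypothetical pipeline built only from the documented FTDataset methods."""
    # Illustrative choice: use the train-split points that have ground truth as the dev set.
    # Assumption: the train split is named "train" (a dev set may only hold train x_uids).
    ft_dataset.filter(splits=["train"], has_gt=True).set_as_dev_set()
    # Send every point without ground truth out for annotation;
    # create_annotation_batches makes one batch per split.
    unlabeled = ft_dataset.filter(has_gt=False)
    return unlabeled.create_annotation_batches(assignees=assignees)
```

Passing the dataset in as a parameter, rather than constructing it inside the helper, keeps the sketch independent of how an FTDataset is obtained, which this page does not describe.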

append

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| ft_dataset | FTDataset | | The FTDataset to append to the current FTDataset. |

Returns

The appended FTDataset

Return type

FTDataset
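This page does not say whether `append` mutates the current FTDataset, how it treats x_uids present in both datasets, or whether both datasets must share the same uids. The toy model below makes all three choices explicitly so they can be compared against the real behavior. `ToyFTDataset` is hypothetical, tracks only x_uids, and every rule it enforces is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyFTDataset:
    """Hypothetical stand-in for FTDataset that tracks only its x_uids and owning uids."""

    dataset_uid: int
    label_schema_uid: int
    model_node_uid: int
    x_uids: List[str] = field(default_factory=list)

    def append(self, other: "ToyFTDataset") -> "ToyFTDataset":
        # Assumption: only datasets bound to the same dataset, label schema and model node combine.
        key = (self.dataset_uid, self.label_schema_uid, self.model_node_uid)
        if key != (other.dataset_uid, other.label_schema_uid, other.model_node_uid):
            raise ValueError("cannot append FTDatasets bound to different uids")
        # Assumption: keep this dataset's order, then add x_uids from `other` not already present.
        seen = set(self.x_uids)
        merged = self.x_uids + [x for x in other.x_uids if x not in seen]
        # Assumption: a new object is returned and `self` is left unchanged.
        return ToyFTDataset(*key, x_uids=merged)
```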

create_annotation_batches

create_annotation_batches(assignees=None)

Create annotation batches for the FTDataset. One batch is created for each split that contains x_uids from the dataset.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| assignees | Optional[List[int]] | None | The user uids of the assignees of the annotation batches. |

Returns

The created annotation batches

Return type

List[Dict[str, Any]]
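The "one batch per split" rule amounts to grouping the dataset's x_uids by split. The sketch below shows that grouping with the standard library. `plan_batches` is hypothetical, and the keys of each returned dict (`split`, `x_uids`, `assignees`) are illustrative only; the fields of the real dicts are not documented on this page.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


def plan_batches(
    rows: List[Tuple[str, str]], assignees: Optional[List[int]] = None
) -> List[Dict[str, Any]]:
    """Group (x_uid, split) pairs into one hypothetical batch description per split."""
    by_split: Dict[str, List[str]] = defaultdict(list)
    for x_uid, split in rows:
        by_split[split].append(x_uid)
    # One batch per split, in order of first appearance; every batch gets the same assignees.
    return [
        {"split": split, "x_uids": x_uids, "assignees": list(assignees or [])}
        for split, x_uids in by_split.items()
    ]
```

With the rows `("a", "train")`, `("b", "dev")`, `("c", "train")` this yields two batches, one for `train` holding `a` and `c` and one for `dev` holding `b`.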

export_data

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| format | ExportFormat | | The format to export the data to. |
| filepath | str | | The filepath to write the exported data to. |

Return type

None
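`export_data` dispatches on a format value and writes a file as a side effect, returning None. The members of `ExportFormat` are not listed on this page, so the sketch below uses a hypothetical `ToyExportFormat` with CSV and JSONL purely for illustration; `toy_export` is likewise hypothetical and operates on plain dicts rather than the FTDataset's DataFrame.

```python
import csv
import json
from enum import Enum
from typing import Any, Dict, List


class ToyExportFormat(Enum):
    # Hypothetical members; the real ExportFormat values are not listed on this page.
    CSV = "csv"
    JSONL = "jsonl"


def toy_export(rows: List[Dict[str, Any]], format: ToyExportFormat, filepath: str) -> None:
    """Write rows to filepath in the requested format; returns None, like export_data."""
    if format is ToyExportFormat.CSV:
        with open(filepath, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]) if rows else [])
            writer.writeheader()
            writer.writerows(rows)
    elif format is ToyExportFormat.JSONL:
        with open(filepath, "w", encoding="utf-8") as f:
            # One JSON object per line.
            for row in rows:
                f.write(json.dumps(row) + "\n")
    else:
        raise ValueError(f"unsupported format: {format}")
```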

filter

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None)

Filter the dataset based on the given filters.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| source_uids | Optional[List[int]] | None | The source uids to filter by. |
| splits | Optional[List[str]] | None | The splits to filter by. |
| x_uids | Optional[List[str]] | None | The x_uids to filter by. |
| feature_hashes | Optional[List[str]] | None | The feature hashes to filter by. |
| has_gt | Optional[bool] | None | Filter by the presence (True) or absence (False) of ground truth. |

Returns

The filtered dataset

Return type

FTDataset
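A natural reading of the signature is that a filter left at `None` is not applied and the remaining filters are combined with AND. That reading is an assumption, not something this page states. The sketch below models it over a hypothetical `Row` type; the real FTDataset stores its data as a pandas DataFrame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical row shape; the real FTDataset holds a pandas DataFrame."""

    x_uid: str
    source_uid: int
    split: str
    feature_hash: str
    has_gt: bool


def toy_filter(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
) -> List[Row]:
    """Keep rows matching every filter that is not None (assumed AND semantics)."""

    def keep(r: Row) -> bool:
        return (
            (source_uids is None or r.source_uid in source_uids)
            and (splits is None or r.split in splits)
            and (x_uids is None or r.x_uid in x_uids)
            and (feature_hashes is None or r.feature_hash in feature_hashes)
            and (has_gt is None or r.has_gt == has_gt)
        )

    return [r for r in rows if keep(r)]
```

Under this model, an empty list such as `splits=[]` matches nothing, whereas `splits=None` matches everything.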

get_data

get_data()

Get the data associated with the fine-tuning dataset.

Returns

The data associated with the fine-tuning dataset

Return type

pd.DataFrame

get_x_uids

get_x_uids()

Get the x_uids in the FTDataset.

Returns

The x_uids in the FTDataset

Return type

List[str]

set_as_dev_set

set_as_dev_set()

Resample the dev set of the fine-tuning application so that it consists of the x_uids in this FTDataset. Note that a dev set may contain only x_uids that are in the train set.

Return type

None
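Because a dev set may contain only train x_uids, it can help to check a candidate set before calling `set_as_dev_set`. The helper below is a hypothetical sketch that assumes you have a mapping from x_uid to split name and that the train split is literally named `"train"`.

```python
from typing import Dict, Iterable, List


def invalid_dev_x_uids(candidate: Iterable[str], split_of: Dict[str, str]) -> List[str]:
    """Return candidate x_uids that are not in the train split (assumed to be named "train")."""
    # Unknown x_uids map to None, so they are reported as invalid too.
    return [x for x in candidate if split_of.get(x) != "train"]
```

With the real SDK, narrowing first with the documented `filter` method, for example `ft_dataset.filter(splits=["train"]).set_as_dev_set()`, should keep the dev set inside the train split under the same naming assumption.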