Version: 0.93

snorkelflow.sdk.FTDataset

class snorkelflow.sdk.FTDataset(df, dataset_uid, label_schema_uid, model_node_uid)

Bases: object

__init__(df, dataset_uid, label_schema_uid, model_node_uid)

Methods

`__init__`(df, dataset_uid, label_schema_uid, ...)
`append`(ft_dataset)	Append the given FTDataset to the current FTDataset.
`create_annotation_batches`([assignees])	Create an annotation batch for the ft dataset.
`export_data`(format, filepath)	Export the data in the FTDataset to the specified format and write to the provided filepath.
`filter`([source_uids, splits, x_uids, ...])	Filter the dataset based on the given filters.
`get_data`()	Get the data associated with the fine tuning dataset.
`get_x_uids`()	Get the x_uids in the FTDataset.
`set_as_dev_set`()	Resample the x_uids within the FTDataset as the dev set for the fine tuning application.

append

append(ft_dataset)

Append the given FTDataset to the current FTDataset.

Parameters

Name	Type	Default	Info
ft_dataset	`FTDataset`		The FTDataset to append to the current FTDataset.

Returns: The appended FTDataset
Return type: FTDataset
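
A minimal sketch of the call pattern, not Snorkel's implementation. `combine` uses only the documented `append` method and works with its return value, since the reference does not say whether `append` also modifies the receiver. `StubFTDataset` is a hypothetical stand-in so the snippet runs without a Snorkel Flow instance; its simple concatenation and the sample x_uids are assumptions.

```python
from typing import List


class StubFTDataset:
    """Hypothetical stand-in for FTDataset that only tracks x_uids."""

    def __init__(self, x_uids: List[str]) -> None:
        self._x_uids = list(x_uids)

    def append(self, ft_dataset: "StubFTDataset") -> "StubFTDataset":
        # Assumption: rows of the argument follow the rows of the receiver.
        return StubFTDataset(self._x_uids + ft_dataset.get_x_uids())

    def get_x_uids(self) -> List[str]:
        return list(self._x_uids)


def combine(first, second):
    """Append `second` to `first` and return the resulting dataset."""
    return first.append(second)


merged = combine(StubFTDataset(["doc::0"]), StubFTDataset(["doc::1", "doc::2"]))
print(merged.get_x_uids())  # ['doc::0', 'doc::1', 'doc::2']
```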

create_annotation_batches

create_annotation_batches(assignees=None)

Create annotation batches for the FTDataset. One batch is created for each split that contains x_uids from the dataset.

Parameters

Name	Type	Default	Info
assignees	`Optional[List[int]]`	`None`	The user uids of the assignees of the annotation batches.

Returns: The created annotation batches, one per split
Return type: List[Dict[str, Any]]

export_data

export_data(format, filepath)

Export the data in the FTDataset to the specified format and write to the provided filepath.

Parameters

Name	Type	Default	Info
format	`ExportFormat`		The format to export the data to.
filepath	`str`		The filepath to write the exported data to.

Return type: None

filter

filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None)

Filter the dataset based on the given filters.

Parameters

Name	Type	Default	Info
source_uids	`Optional[List[int]]`	`None`	The source uids to filter by.
splits	`Optional[List[str]]`	`None`	The splits to filter by.
x_uids	`Optional[List[str]]`	`None`	The x uids to filter by.
feature_hashes	`Optional[List[str]]`	`None`	The feature hashes to filter by.
has_gt	`Optional[bool]`	`None`	Filter by the existence / non-existence of ground truth.

Returns: The filtered dataset
Return type: FTDataset
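
A minimal sketch of a typical call pattern, not Snorkel's implementation. `labeled_train_uids` uses only the documented `filter` and `get_x_uids` methods. `StubFTDataset` is a hypothetical in-memory stand-in so the snippet runs without a Snorkel Flow instance. The split name `"train"`, the row fields, and the assumption that several filters combine with AND are illustrative guesses rather than documented behavior.

```python
from typing import Any, Dict, List, Optional


class StubFTDataset:
    """Hypothetical in-memory stand-in for FTDataset (not the real class)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def filter(
        self,
        splits: Optional[List[str]] = None,
        x_uids: Optional[List[str]] = None,
        has_gt: Optional[bool] = None,
    ) -> "StubFTDataset":
        # Assumption: a filter left as None is ignored; the rest are ANDed.
        rows = self._rows
        if splits is not None:
            rows = [r for r in rows if r["split"] in splits]
        if x_uids is not None:
            rows = [r for r in rows if r["x_uid"] in x_uids]
        if has_gt is not None:
            rows = [r for r in rows if (r["label"] is not None) == has_gt]
        return StubFTDataset(rows)

    def get_x_uids(self) -> List[str]:
        return [r["x_uid"] for r in self._rows]


def labeled_train_uids(dataset) -> List[str]:
    """Return x_uids in the train split that already have ground truth."""
    return dataset.filter(splits=["train"], has_gt=True).get_x_uids()


rows = [
    {"x_uid": "doc::0", "split": "train", "label": "good"},
    {"x_uid": "doc::1", "split": "train", "label": None},
    {"x_uid": "doc::2", "split": "valid", "label": "bad"},
]
print(labeled_train_uids(StubFTDataset(rows)))  # ['doc::0']
```

Because `filter` returns an FTDataset, calls can be chained before reading back the x_uids or the data.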

get_data

get_data()

Get the data associated with the fine tuning dataset.

Returns: The data associated with the fine tuning dataset
Return type: pd.DataFrame

get_x_uids

get_x_uids()

Get the x_uids in the FTDataset.

Returns: The x_uids in the FTDataset
Return type: List[str]

set_as_dev_set

set_as_dev_set()

Resample the x_uids within the FTDataset as the dev set for the fine tuning application. Note that a dev set may contain only x_uids that are in the train set.

Return type: None
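
The train-set constraint can be checked on the client side before the call. The sketch below is one way to do that under stated assumptions, not part of the SDK: `set_dev_set_checked` is a hypothetical helper, `StubFTDataset` is a stand-in that only records whether `set_as_dev_set` was called, and the set of train x_uids is assumed to have been collected beforehand (for example via `filter` and `get_x_uids`).

```python
from typing import List, Set


class StubFTDataset:
    """Hypothetical stand-in for FTDataset (not the real class)."""

    def __init__(self, x_uids: List[str]) -> None:
        self._x_uids = list(x_uids)
        self.is_dev_set = False  # stub-only flag, for demonstration

    def get_x_uids(self) -> List[str]:
        return list(self._x_uids)

    def set_as_dev_set(self) -> None:
        self.is_dev_set = True


def set_dev_set_checked(candidate, train_x_uids: Set[str]) -> None:
    """Call set_as_dev_set only if every x_uid is in the train set."""
    outside = sorted(set(candidate.get_x_uids()) - train_x_uids)
    if outside:
        raise ValueError(f"x_uids not in the train set: {outside}")
    candidate.set_as_dev_set()


train = {"doc::0", "doc::1", "doc::2"}
dev = StubFTDataset(["doc::0", "doc::2"])
set_dev_set_checked(dev, train)
print(dev.is_dev_set)  # True
```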
