snorkelflow.sdk.FTDataset
- class snorkelflow.sdk.FTDataset(df, dataset_uid, label_schema_uid, model_node_uid)
Bases: object
- __init__(df, dataset_uid, label_schema_uid, model_node_uid)
Methods
- __init__(df, dataset_uid, label_schema_uid, ...)
- append(ft_dataset): Append the given FTDataset to the current FTDataset.
- create_annotation_batches([assignees]): Create an annotation batch for the ft dataset.
- export_data(format, filepath): Export the data in the FTDataset to the specified format and write to the provided filepath.
- filter([source_uids, splits, x_uids, ...]): Filter the dataset based on the given filters.
- get_data(): Get the data associated with the fine tuning dataset.
- get_x_uids(): Get the x_uids in the FTDataset.
- set_as_dev_set(): Resample the x_uids within the FTDataset as the dev set for the fine tuning application.
- append(ft_dataset)
Append the given FTDataset to the current FTDataset.
- create_annotation_batches(assignees=None)
Create annotation batches for the FTDataset. One batch is created for each split that contains x_uids from the dataset.
- Parameters:
assignees (Optional[List[int]], default: None) – The user uids of the assignees of the annotation batch
- Returns:
The created annotation batches
- Return type:
List[Dict[str, Any]]
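The one-batch-per-split behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call the Snorkel Flow SDK, `plan_batches_by_split` is a hypothetical helper, and the dictionary keys (`split`, `x_uids`, `assignees`) are invented for the example rather than taken from the real return value.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def plan_batches_by_split(
    rows: List[Dict[str, str]],
    assignees: Optional[List[int]] = None,
) -> List[Dict[str, Any]]:
    """Group x_uids by split and describe one batch per split.

    Hypothetical stand-in: the real create_annotation_batches return
    format is not documented here, so these keys are assumptions.
    """
    x_uids_by_split: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        x_uids_by_split[row["split"]].append(row["x_uid"])

    # One batch description per split, in a stable (sorted) order.
    return [
        {"split": split, "x_uids": x_uids, "assignees": list(assignees or [])}
        for split, x_uids in sorted(x_uids_by_split.items())
    ]


rows = [
    {"x_uid": "doc::0", "split": "train"},
    {"x_uid": "doc::1", "split": "valid"},
    {"x_uid": "doc::2", "split": "train"},
]
batches = plan_batches_by_split(rows, assignees=[7])
```

With x_uids spread over two splits, the sketch yields two batch descriptions, which matches the documented return type of a list.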
- export_data(format, filepath)
Export the data in the FTDataset to the specified format and write to the provided filepath.
- Parameters:
format (ExportFormat) – The format to export the data to.
filepath (str) – The filepath to write the exported data to.
- Return type:
None
- filter(source_uids=None, splits=None, x_uids=None, feature_hashes=None, has_gt=None)
Filter the dataset based on the given filters.
- Parameters:
source_uids (Optional[List[int]], default: None) – The source uids to filter by
splits (Optional[List[str]], default: None) – The splits to filter by
x_uids (Optional[List[str]], default: None) – The x uids to filter by
feature_hashes (Optional[List[str]], default: None) – The feature hashes to filter by
has_gt (Optional[bool], default: None) – Filter by the existence / non-existence of ground truth
- Returns:
The filtered dataset
- Return type:
FTDataset
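As a rough illustration of how optional filters like these are commonly combined, the sketch below applies them to a list of plain dictionaries. It does not use the Snorkel Flow SDK. It assumes that a filter left as None is ignored and that all supplied filters must match (logical AND); the documentation above does not confirm either point. `filter_rows` and the row keys (`source_uid`, `split`, `x_uid`, `feature_hash`, `gt`) are hypothetical names chosen for the example.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def filter_rows(
    rows: List[Row],
    source_uids: Optional[List[int]] = None,
    splits: Optional[List[str]] = None,
    x_uids: Optional[List[str]] = None,
    feature_hashes: Optional[List[str]] = None,
    has_gt: Optional[bool] = None,
) -> List[Row]:
    """Keep rows that satisfy every filter that was supplied (assumed AND)."""

    def keep(row: Row) -> bool:
        if source_uids is not None and row["source_uid"] not in source_uids:
            return False
        if splits is not None and row["split"] not in splits:
            return False
        if x_uids is not None and row["x_uid"] not in x_uids:
            return False
        if feature_hashes is not None and row["feature_hash"] not in feature_hashes:
            return False
        # has_gt=True keeps rows with ground truth, has_gt=False keeps rows without.
        if has_gt is not None and (row["gt"] is not None) != has_gt:
            return False
        return True

    return [row for row in rows if keep(row)]


rows = [
    {"x_uid": "doc::0", "source_uid": 1, "split": "train", "feature_hash": "a1", "gt": "yes"},
    {"x_uid": "doc::1", "source_uid": 1, "split": "train", "feature_hash": "b2", "gt": None},
    {"x_uid": "doc::2", "source_uid": 2, "split": "valid", "feature_hash": "c3", "gt": "no"},
]
train_with_gt = filter_rows(rows, splits=["train"], has_gt=True)
without_gt = filter_rows(rows, has_gt=False)
```

Treating None as "no constraint" mirrors the `default: None` on every parameter, so calling the function with no arguments returns all rows unchanged.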
- get_data()
Get the data associated with the fine tuning dataset.
- Returns:
The data associated with the fine tuning dataset
- Return type:
pd.DataFrame
- get_x_uids()
Get the x_uids in the FTDataset.
- Returns:
The x_uids in the FTDataset
- Return type:
List[str]
- set_as_dev_set()
Resample the x_uids within the FTDataset as the dev set for the fine tuning application. Note that a dev set must only contain x_uids that are in the train set.
- Return type:
None
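Because a dev set must only contain x_uids that are in the train set, it can help to check a candidate list before calling set_as_dev_set(). The following is a minimal sketch of such a check in plain Python. It does not call the Snorkel Flow SDK, `x_uids_outside_train` is a hypothetical helper, and how the SDK itself reacts to a violation is not stated above.

```python
from typing import Iterable, List


def x_uids_outside_train(
    candidate_x_uids: Iterable[str],
    train_x_uids: Iterable[str],
) -> List[str]:
    """Return the candidate x_uids that are not part of the train set."""
    train = set(train_x_uids)
    return [x_uid for x_uid in candidate_x_uids if x_uid not in train]


train_x_uids = ["doc::0", "doc::1", "doc::2"]
candidate_x_uids = ["doc::1", "doc::5"]

offending = x_uids_outside_train(candidate_x_uids, train_x_uids)
if offending:
    print(f"Not in the train set, remove before resampling: {offending}")
```

An empty result means every candidate x_uid belongs to the train set, so the candidates satisfy the constraint in the note above.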