snorkelflow.sdk.Batch
- class snorkelflow.sdk.Batch(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)
Bases: object
The Batch object represents an annotation batch in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) annotation batches.
- __init__(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)
Create a batch object in-memory with necessary properties. This constructor should not be called directly; obtain Batch objects through the create() and get() methods instead.
- Parameters:
name (str) – The name of the batch
uid (int) – The UID for the batch within Snorkel Flow
dataset_uid (int) – The UID for the dataset within Snorkel Flow
label_schemas (List[LabelSchema]) – The list of label schemas associated with this batch
batch_size (int) – The number of examples in the batch
ts (datetime) – The timestamp at which the batch was created
x_uids (List[str]) – The UIDs for the examples in the batch
Methods
__init__(name, uid, dataset_uid, ...) – Create a batch object in-memory with necessary properties.
commit(source_uid[, label_schema_uids]) – Commit a source on a batch as ground truth.
create(cls, dataset_uid[, name, assignees, ...]) – Create one or more annotation batches for a dataset.
delete(batch_uid) – Delete an annotation batch by its UID.
export(path[, selected_fields, ...]) – Export the batch to a zipped CSV file.
get(batch_uid) – Retrieve an annotation batch by its UID.
get_dataframe([selected_fields, ...]) – Get a pandas DataFrame representation of the batch.
update([name, assignees, expert_source_uid]) – Update properties of the annotation batch.
Attributes
batch_size – The number of examples in the batch.
dataset_uid – The UID for the dataset within Snorkel Flow.
label_schemas – The list of label schemas associated with this batch.
name – The name of the batch.
ts – The timestamp at which the batch was created.
uid – The UID for the batch within Snorkel Flow.
x_uids – The UIDs for the examples in the batch.
- commit(source_uid, label_schema_uids=None)
Commit a source on a batch as ground truth.
- Parameters:
source_uid (int) – The UID for the source on the batch
label_schema_uids (Optional[List[int]], default: None) – The label schema UIDs to commit, defaults to all label schemas if not set
- Return type:
None
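As a rough illustration of how get() and commit() fit together, the sketch below fetches a batch by UID and commits one source on it. The helper name `commit_source` and its `batch_cls` parameter are hypothetical and not part of the SDK; in real use `batch_cls` would be `snorkelflow.sdk.Batch`. Passing the class in keeps the sketch runnable without a Snorkel Flow installation.

```python
from typing import Any, List, Optional


def commit_source(
    batch_cls: Any,
    batch_uid: int,
    source_uid: int,
    label_schema_uids: Optional[List[int]] = None,
) -> Any:
    """Fetch a batch by UID and commit one source on it as ground truth.

    Hypothetical helper: batch_cls is expected to expose get(), and the
    object it returns commit(), as documented for snorkelflow.sdk.Batch.
    """
    batch = batch_cls.get(batch_uid)
    # Leaving label_schema_uids as None commits all label schemas,
    # which is the documented default.
    batch.commit(source_uid, label_schema_uids=label_schema_uids)
    return batch
```

With the SDK installed, a call might look like `commit_source(Batch, batch_uid=12, source_uid=3)` after `from snorkelflow.sdk import Batch`; both UIDs here are placeholders.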
- classmethod create(cls, dataset_uid, name=None, assignees=None, label_schemas=None, batch_size=None, num_batches=None, randomize=False, random_seed=123, selection_strategy=None, split=None, x_uids=None, filter_by_x_uids_not_in_batch=False, divide_x_uids_evenly_to_assignees=False)
Create one or more annotation batches for a dataset.
Typically, Dataset.create_batches() is the recommended entrypoint for creating batches.
- Parameters:
dataset_uid (int) – The UID for the dataset within Snorkel Flow
name (Optional[str], default: None) – The name of the batch
assignees (Optional[List[int]], default: None) – The user UIDs for the assignees of the batches
label_schemas (Optional[List[LabelSchema]], default: None) – The label schemas assigned for the batches
batch_size (Optional[int], default: None) – The size of the batches
num_batches (Optional[int], default: None) – The number of batches
randomize (Optional[bool], default: False) – Whether to randomize the batches
random_seed (Optional[int], default: 123) – The seed for the randomization
selection_strategy (Optional[SelectionStrategy], default: None) – The SelectionStrategy for the batches
split (Optional[str], default: None) – The split (“train”, “test”, or “valid”) of the batches
x_uids (Optional[List[str]], default: None) – A list of datapoint UIDs to create batches from
filter_by_x_uids_not_in_batch (Optional[bool], default: False) – Whether to restrict the new batches to datapoints that are not already in a batch
divide_x_uids_evenly_to_assignees (Optional[bool], default: False) – Whether to divide the datapoints evenly among the provided assignees
- Returns:
The list of created batches
- Return type:
List[Batch]
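The parameters above can be combined in many ways; the following is a minimal sketch of one plausible combination, creating shuffled, fixed-size batches for a single split. The helper `create_randomized_batches` and its `batch_cls` parameter are hypothetical. The sketch assumes that the `cls` shown in the signature is the implicit classmethod argument, so only keyword arguments from the documented signature are passed.

```python
from typing import Any, List, Sequence


def create_randomized_batches(
    batch_cls: Any,
    dataset_uid: int,
    assignees: Sequence[int],
    batch_size: int,
    split: str = "train",
    random_seed: int = 123,
) -> List[Any]:
    """Create shuffled annotation batches of a fixed size for one split.

    Hypothetical helper: batch_cls stands in for snorkelflow.sdk.Batch.
    """
    return batch_cls.create(
        dataset_uid=dataset_uid,
        assignees=list(assignees),
        batch_size=batch_size,
        randomize=True,
        # A fixed seed keeps the shuffle reproducible across runs.
        random_seed=random_seed,
        split=split,
    )
```

Whether `batch_size` and `num_batches` may be set together is not stated here, so the sketch sets only `batch_size`.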
- classmethod delete(batch_uid)
Delete an annotation batch by its UID.
- Parameters:
batch_uid (int) – The UID for the batch within Snorkel Flow
- Return type:
None
- export(path, selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000, csv_delimiter=',', quote_char='"', escape_char='\\\\')
Export the batch to a zipped CSV file.
- Parameters:
path (Union[str, Path]) – The path to the zipped CSV file. If the path does not end in .zip, the .zip extension is appended.
selected_fields (Optional[List[str]], default: None) – A list of fields to export. If not set, all fields will be exported.
include_annotations (bool, default: False) – Whether to include annotations in the export
include_ground_truth (bool, default: False) – Whether to include ground truth in the export
max_rows (int, default: 10000) – The maximum number of rows to export
csv_delimiter (str, default: ',') – The delimiter to use for CSV fields
quote_char (str, default: '"') – The character to use for quoted fields in the CSV
escape_char (str, default: '\\\\') – The character to use for escaping special characters in the CSV
- Returns:
The path to the zipped CSV file
- Return type:
pathlib.Path
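A short sketch of an export call may help show how the path handling and the returned value work together. The helper `export_with_labels` is hypothetical; it uses only the documented export() parameters and the `uid` and `batch_size` properties, then inspects the archive with the standard-library `zipfile` module. The names of the files inside the archive are not documented on this page, so the sketch lists them rather than assuming one.

```python
import zipfile
from pathlib import Path
from typing import Any, List, Tuple, Union


def export_with_labels(batch: Any, out_dir: Union[str, Path]) -> Tuple[Path, List[str]]:
    """Export a batch with annotations and ground truth, then list the archive members.

    Hypothetical helper: batch is expected to behave like snorkelflow.sdk.Batch.
    """
    target = Path(out_dir) / f"batch_{batch.uid}.zip"
    exported = Path(
        batch.export(
            target,
            include_annotations=True,
            include_ground_truth=True,
            # Assumption: raising max_rows to the batch size avoids truncation
            # for batches larger than the documented default of 10000.
            max_rows=max(batch.batch_size, 10000),
        )
    )
    # Open the archive at the path export() returned and list its contents.
    with zipfile.ZipFile(exported) as archive:
        members = archive.namelist()
    return exported, members
```

Using the returned path instead of the requested one matters because export() may append `.zip` to a path that lacks it.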
- classmethod get(batch_uid)
Retrieve an annotation batch by its UID.
- Parameters:
batch_uid (int) – The UID for the batch within Snorkel Flow
- Returns:
The batch object
- Return type:
Batch
- get_dataframe(selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000)
Get a pandas DataFrame representation of the batch.
- Parameters:
selected_fields (Optional[List[str]], default: None) – A list of fields to include in the DataFrame. If not set, all fields will be included.
include_annotations (bool, default: False) – Whether to include annotations in the DataFrame
include_ground_truth (bool, default: False) – Whether to include ground truth in the DataFrame
max_rows (int, default: 10000) – The maximum number of rows to include in the DataFrame
- Returns:
The pandas DataFrame representation of the batch
- Return type:
pd.DataFrame
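Because `max_rows` defaults to 10000, a larger batch could come back truncated. The sketch below sizes `max_rows` from the documented `batch_size` property. The helper `load_full_batch` is hypothetical, and it assumes the DataFrame holds one row per example, which this page does not confirm.

```python
from typing import Any, List, Optional


def load_full_batch(
    batch: Any,
    fields: Optional[List[str]] = None,
    include_annotations: bool = False,
) -> Any:
    """Return the batch as a DataFrame, requesting at least batch_size rows.

    Hypothetical helper: batch is expected to behave like snorkelflow.sdk.Batch.
    Assumes one DataFrame row per example in the batch.
    """
    return batch.get_dataframe(
        selected_fields=fields,
        include_annotations=include_annotations,
        # Never request fewer rows than the batch holds (minimum of 1).
        max_rows=max(batch.batch_size, 1),
    )
```

If annotations turn out to add extra rows per example, `max_rows` would need a larger value than `batch_size`.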
- update(name=None, assignees=None, expert_source_uid=None)
Update properties of the annotation batch.
- Parameters:
name (Optional[str], default: None) – The new name of the batch
assignees (Optional[List[int]], default: None) – The user UIDs for the new assignees of the batches
expert_source_uid (Optional[int], default: None) – The UID for the new expert source of the batches
- Return type:
None
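All three update() parameters default to None, so a caller can pass only the properties that should change. The sketch below forwards just the values that were supplied. The helper `update_batch` is hypothetical, and it assumes that an omitted parameter leaves the corresponding property unchanged, which this page does not state explicitly.

```python
from typing import Any, Dict, List, Optional


def update_batch(
    batch: Any,
    name: Optional[str] = None,
    assignees: Optional[List[int]] = None,
    expert_source_uid: Optional[int] = None,
) -> Dict[str, Any]:
    """Call update() with only the fields the caller provided; return what was sent.

    Hypothetical helper: batch is expected to behave like snorkelflow.sdk.Batch.
    """
    candidates = {
        "name": name,
        "assignees": assignees,
        "expert_source_uid": expert_source_uid,
    }
    # Drop anything left at None so it is not forwarded to update().
    changes = {key: value for key, value in candidates.items() if value is not None}
    if changes:
        batch.update(**changes)
    return changes
```

Returning the forwarded values makes it easy to log which properties were changed.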
- property batch_size: int
The number of examples in the batch.
- property dataset_uid: int
The UID for the dataset within Snorkel Flow.
- property label_schemas: List[LabelSchema]
The list of label schemas associated with this batch.
- property name: str
The name of the batch.
- property ts: datetime
The timestamp at which the batch was created.
- property uid: int
The UID for the batch within Snorkel Flow.
- property x_uids: List[str]
The UIDs for the examples in the batch.