Version: 25.3

snorkelflow.sdk.Batch

class snorkelflow.sdk.Batch(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Bases: object

The Batch object represents an annotation batch in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) annotation batches.

__init__

__init__(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Create an in-memory Batch object with the necessary properties. Do not call this constructor directly; obtain Batch objects through the create() and get() class methods instead.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| name | str | | The name of the batch. |
| uid | int | | The UID for the batch within Snorkel Flow. |
| dataset_uid | int | | The UID for the dataset within Snorkel Flow. |
| label_schemas | List[LabelSchema] | | The list of label schemas associated with this batch. |
| batch_size | int | | The number of examples in the batch. |
| ts | datetime | | The timestamp at which the batch was created. |
| x_uids | List[str] | | The UIDs for the examples in the batch. |
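Since the constructor is not meant to be called directly, code typically receives a Batch from get() or create() and then reads its attributes. The following is a minimal sketch that uses only the attributes documented on this page; `summarize_batch` is a hypothetical helper for illustration and is not part of the SDK.

```python
from typing import Any


def summarize_batch(batch: Any) -> str:
    """Build a one-line summary from the documented Batch attributes.

    `batch` is expected to be an object returned by Batch.get() or
    Batch.create(). This helper is illustrative, not part of the SDK.
    """
    return (
        f"Batch {batch.uid} ({batch.name!r}) on dataset {batch.dataset_uid}: "
        f"{batch.batch_size} examples, {len(batch.label_schemas)} label schema(s), "
        f"created {batch.ts.isoformat()}"
    )
```

With a live Snorkel Flow connection, this could be called as `summarize_batch(Batch.get(batch_uid))`.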

Methods

__init__(name, uid, dataset_uid, ...)

Create a batch object in-memory with necessary properties.

commit(source_uid[, label_schema_uids])

Commit a source on a batch as ground truth.

create(cls, dataset_uid[, name, assignees, ...])

Create one or more annotation batches for a dataset.

delete(batch_uid)

Delete an annotation batch by its UID.

export(path[, selected_fields, ...])

Export the batch to a zipped CSV file.

get(batch_uid)

Retrieve an annotation batch by its UID.

get_dataframe([selected_fields, ...])

Get a pandas DataFrame representation of the batch.

update([name, assignees, expert_source_uid])

Update properties of the annotation batch.

Attributes

batch_size

The number of examples in the batch.

dataset_uid

The UID for the dataset within Snorkel Flow.

label_schemas

The list of label schemas associated with this batch.

name

The name of the batch.

ts

The timestamp at which the batch was created.

uid

The UID for the batch within Snorkel Flow.

x_uids

The UIDs for the examples in the batch.

commit

commit(source_uid, label_schema_uids=None)

Commit a source on a batch as ground truth.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| source_uid | int | | The UID for the source on the batch. |
| label_schema_uids | Optional[List[int]] | None | The label schema UIDs to commit, defaults to all label schemas if not set. |

Return type

None
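As a rough illustration of the two calling modes of commit(), the sketch below wraps the call in a hypothetical helper, `commit_source`. The guard against an empty list is an assumption added for this example only, because this page does not say what an empty `label_schema_uids` list means.

```python
from typing import Any, List, Optional


def commit_source(
    batch: Any,
    source_uid: int,
    label_schema_uids: Optional[List[int]] = None,
) -> None:
    """Commit one source on a batch as ground truth.

    Passing None relies on the documented default of committing all label
    schemas. An empty list is rejected here because its meaning is not
    documented (a guard added for this sketch only).
    """
    if label_schema_uids is not None and not label_schema_uids:
        raise ValueError("label_schema_uids must be None or a non-empty list")
    batch.commit(source_uid, label_schema_uids=label_schema_uids)
```

Calling `commit_source(batch, source_uid)` commits every label schema; passing a list narrows the commit to those schema UIDs.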

create

classmethod create(cls, dataset_uid, name=None, assignees=None, label_schemas=None, batch_size=None, num_batches=None, randomize=False, random_seed=123, selection_strategy=None, split=None, x_uids=None, filter_by_x_uids_not_in_batch=False, divide_x_uids_evenly_to_assignees=False)

Create one or more annotation batches for a dataset.

Dataset.create_batches() is the recommended entry point for creating batches.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| dataset_uid | int | | The UID for the dataset within Snorkel Flow. |
| name | Optional[str] | None | The name of the batch. |
| assignees | Optional[List[int]] | None | The user UIDs for the assignees of the batches. |
| label_schemas | Optional[List[LabelSchema]] | None | The label schemas assigned for the batches. |
| batch_size | Optional[int] | None | The size of the batches. |
| num_batches | Optional[int] | None | The number of batches. |
| randomize | Optional[bool] | False | Whether to randomize the batches. |
| random_seed | Optional[int] | 123 | The seed for the randomization. |
| selection_strategy | Optional[SelectionStrategy] | None | The SelectionStrategy for the batches. |
| split | Optional[str] | None | The split (“train”, “test”, or “valid”) of the batches. |
| x_uids | Optional[List[str]] | None | A list of datapoint UIDs to create batches from. |
| filter_by_x_uids_not_in_batch | Optional[bool] | False | Whether to create the batches only from datapoints that are not already in a batch. |
| divide_x_uids_evenly_to_assignees | Optional[bool] | False | Whether to divide the datapoints evenly among the provided assignees. |

Returns

The list of created batches

Return type

List[Batch]
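To show how the keyword arguments of create() fit together, here is a minimal sketch of a hypothetical wrapper, `create_random_batches`. It assumes `batch_cls` is `snorkelflow.sdk.Batch` and that fixed-size, randomized batches from a single split are wanted; the validation steps are additions for this example, not SDK behavior.

```python
from typing import Any, List, Optional

# Split names as listed in the parameter description on this page.
DOCUMENTED_SPLITS = {"train", "test", "valid"}


def create_random_batches(
    batch_cls: Any,
    dataset_uid: int,
    assignees: List[int],
    batch_size: int,
    split: str = "train",
    random_seed: int = 123,
    name: Optional[str] = None,
) -> List[Any]:
    """Create randomized, fixed-size batches from one split of a dataset.

    `batch_cls` is expected to be snorkelflow.sdk.Batch. The checks below
    are local sanity checks for this sketch, not part of the SDK.
    """
    if split not in DOCUMENTED_SPLITS:
        raise ValueError(f"split must be one of {sorted(DOCUMENTED_SPLITS)}")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return batch_cls.create(
        dataset_uid,
        name=name,
        assignees=assignees,
        batch_size=batch_size,
        randomize=True,
        random_seed=random_seed,
        split=split,
    )
```

Keeping `random_seed` explicit makes it clear which seed a given set of batches was created with.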

delete

classmethod delete(batch_uid)

Delete an annotation batch by its UID.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| batch_uid | int | | The UID for the batch within Snorkel Flow. |

Return type

None
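Because delete() takes only a UID, it can be worth confirming that the UID points at the intended batch before deleting it. The sketch below combines the documented get() and delete() class methods in a hypothetical helper, `delete_batch_if_in_dataset`; the dataset check is a precaution chosen for this example, not something the SDK requires.

```python
from typing import Any


def delete_batch_if_in_dataset(
    batch_cls: Any, batch_uid: int, expected_dataset_uid: int
) -> bool:
    """Delete a batch only if it belongs to the expected dataset.

    `batch_cls` is expected to be snorkelflow.sdk.Batch. Returns True if the
    batch was deleted and False if the dataset UID did not match.
    """
    batch = batch_cls.get(batch_uid)
    if batch.dataset_uid != expected_dataset_uid:
        return False
    batch_cls.delete(batch_uid)
    return True
```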

export

export(path, selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000, csv_delimiter=',', quote_char='"', escape_char='\\\\')

Export the batch to a zipped CSV file.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| path | Union[str, Path] | | The path to the zipped CSV file. If the path does not end in .zip, the .zip extension is appended to it. |
| selected_fields | Optional[List[str]] | None | A list of fields to export. If not set, all fields will be exported. |
| include_annotations | bool | False | Whether to include annotations in the export. |
| include_ground_truth | bool | False | Whether to include ground truth in the export. |
| max_rows | int | 10000 | The maximum number of rows to export. |
| csv_delimiter | str | ',' | The delimiter to use for CSV fields. |
| quote_char | str | '"' | The character to use for quoted fields in the CSV. |
| escape_char | str | '\\\\' | The character to use for escaping special characters in the CSV. |

Returns

The path to the zipped CSV file

Return type

pathlib.Path
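The path rule described above (append .zip when the path does not already end in it) can be sketched as a small pure function, which is handy for predicting where the export will land before calling export(). `expected_export_path` is a hypothetical helper; it assumes a plain, case-sensitive suffix check, which may differ from what the SDK does internally, so the value returned by export() remains authoritative.

```python
from pathlib import Path
from typing import Union


def expected_export_path(path: Union[str, Path]) -> Path:
    """Predict the output path under the documented '.zip' rule.

    Assumption for this sketch: the check is a case-sensitive test on the
    end of the path string.
    """
    text = str(path)
    return Path(text if text.endswith(".zip") else text + ".zip")
```

For instance, a path such as `exports/batch_7` would be expected to become `exports/batch_7.zip` under this assumption, while `exports/batch_7.zip` would be left unchanged.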

get

classmethod get(batch_uid)

Retrieve an annotation batch by its UID.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| batch_uid | int | | The UID for the batch within Snorkel Flow. |

Returns

The batch object

Return type

Batch

get_dataframe

get_dataframe(selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000)

Get a pandas DataFrame representation of the batch.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| selected_fields | Optional[List[str]] | None | A list of fields to include in the DataFrame. If not set, all fields will be included. |
| include_annotations | bool | False | Whether to include annotations in the DataFrame. |
| include_ground_truth | bool | False | Whether to include ground truth in the DataFrame. |
| max_rows | int | 10000 | The maximum number of rows to include in the DataFrame. |

Returns

The pandas DataFrame representation of the batch

Return type

pd.DataFrame
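Since max_rows caps the size of the returned DataFrame, a result that contains exactly max_rows rows may be incomplete. The sketch below wraps get_dataframe() in a hypothetical helper, `load_batch_frame`, that returns the frame together with a flag for that case. The flag is only a heuristic chosen for this example: a batch holding exactly max_rows rows would also trigger it.

```python
from typing import Any, List, Optional, Tuple


def load_batch_frame(
    batch: Any,
    selected_fields: Optional[List[str]] = None,
    max_rows: int = 10000,
) -> Tuple[Any, bool]:
    """Fetch the batch as a DataFrame and flag a possibly truncated result.

    Returns (df, hit_cap), where hit_cap is True when the frame contains at
    least max_rows rows, meaning more rows may exist than were returned.
    """
    df = batch.get_dataframe(
        selected_fields=selected_fields,
        include_annotations=True,
        include_ground_truth=True,
        max_rows=max_rows,
    )
    hit_cap = len(df) >= max_rows
    return df, hit_cap
```

If the flag is set, one option is to call the helper again with a larger max_rows.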

update

update(name=None, assignees=None, expert_source_uid=None)

Update properties of the annotation batch.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| name | Optional[str] | None | The new name of the batch. |
| assignees | Optional[List[int]] | None | The user UIDs for the new assignees of the batch. |
| expert_source_uid | Optional[int] | None | The UID for the new expert source of the batch. |

Return type

None
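As an illustration of update(), the sketch below renames a batch and replaces its assignees in a single call. `rename_and_reassign` is a hypothetical helper, and the blank-name check is an addition for this example. This page does not say whether update() also refreshes the in-memory attributes, so the sketch makes no assumption about `batch.name` after the call.

```python
from typing import Any, Iterable


def rename_and_reassign(batch: Any, new_name: str, assignees: Iterable[int]) -> None:
    """Rename a batch and set its assignees via the documented update().

    The blank-name check is a local guard for this sketch, not SDK behavior.
    """
    if not new_name.strip():
        raise ValueError("new_name must not be blank")
    batch.update(name=new_name, assignees=list(assignees))
```

To be sure of reading current values after an update, the batch can be fetched again with get().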

property batch_size: int

The number of examples in the batch.

property dataset_uid: int

The UID for the dataset within Snorkel Flow.

property label_schemas: List[LabelSchema]

The list of label schemas associated with this batch.

property name: str

The name of the batch.

property ts: datetime

The timestamp at which the batch was created.

property uid: int

The UID for the batch within Snorkel Flow.

property x_uids: List[str]

The UIDs for the examples in the batch.