Version: 0.96

snorkelflow.sdk.Batch

class snorkelflow.sdk.Batch(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Bases: object

The Batch object represents an annotation batch in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) annotation batches.

__init__(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Create a batch object in memory with the necessary properties. Do not call this constructor directly; obtain Batch objects through the create() and get() methods instead.

Parameters:
  • name (str) – The name of the batch

  • uid (int) – The UID for the batch within Snorkel Flow

  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • label_schemas (List[LabelSchema]) – The list of label schemas associated with this batch

  • batch_size (int) – The number of examples in the batch

  • ts (datetime) – The timestamp at which the batch was created

  • x_uids (List[str]) – The UIDs for the examples in the batch
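As a brief illustration of the attributes listed above, the sketch below builds a one-line summary of a batch. The helper name summarize_batch and the sample values are hypothetical; the helper relies only on the documented attributes, so a stand-in object is used here to keep the example runnable without the SDK.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_batch(batch):
    """Return a one-line summary built from the documented Batch attributes.

    `summarize_batch` is a hypothetical helper, not part of the SDK.
    """
    created = batch.ts.strftime("%Y-%m-%d")
    return (
        f"{batch.name} (uid={batch.uid}, dataset={batch.dataset_uid}): "
        f"{batch.batch_size} examples, created {created}"
    )


# With the SDK installed, a real object would come from:
#     from snorkelflow.sdk import Batch
#     batch = Batch.get(batch_uid)
# The stand-in below only mimics the documented attributes; its values are made up.
example = SimpleNamespace(
    name="review-round-1",
    uid=7,
    dataset_uid=3,
    batch_size=2,
    ts=datetime(2024, 1, 15),
    x_uids=["doc::0", "doc::1"],
)
print(summarize_batch(example))
```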

Methods

__init__(name, uid, dataset_uid, ...)

Create a batch object in-memory with necessary properties.

commit(source_uid[, label_schema_uids])

Commit a source on a batch as ground truth.

create(cls, dataset_uid[, name, assignees, ...])

Create one or more annotation batches for a dataset.

delete(batch_uid)

Delete an annotation batch by its UID.

export(path[, selected_fields, ...])

Export the batch to a zipped CSV file.

get(batch_uid)

Retrieve an annotation batch by its UID.

get_dataframe([selected_fields, ...])

Get a pandas DataFrame representation of the batch.

update([name, assignees, expert_source_uid])

Update properties of the annotation batch.

Attributes

batch_size

The number of examples in the batch.

dataset_uid

The UID for the dataset within Snorkel Flow.

label_schemas

The list of label schemas associated with this batch.

name

The name of the batch.

ts

The timestamp at which the batch was created.

uid

The UID for the batch within Snorkel Flow.

x_uids

The UIDs for the examples in the batch.

commit(source_uid, label_schema_uids=None)

Commit a source on a batch as ground truth.

Parameters:
  • source_uid (int) – The UID for the source on the batch

  • label_schema_uids (Optional[List[int]], default: None) – The label schema UIDs to commit. Defaults to all label schemas if not set.

Return type:

None
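A minimal sketch of calling commit() follows. The wrapper name commit_as_ground_truth is hypothetical, and rejecting an empty list is a choice made by this sketch (the reference does not say how the SDK treats an empty list); only the documented commit(source_uid, label_schema_uids=None) signature is assumed.

```python
def commit_as_ground_truth(batch, source_uid, label_schema_uids=None):
    """Commit `source_uid` on `batch` as ground truth.

    Hypothetical wrapper: passing None keeps the documented default,
    which commits all label schemas on the batch.
    """
    if label_schema_uids is not None and len(label_schema_uids) == 0:
        # Guard added by this sketch to avoid an ambiguous request.
        raise ValueError("Pass None to commit all label schemas, not an empty list")
    batch.commit(source_uid, label_schema_uids=label_schema_uids)
```

Called as commit_as_ground_truth(batch, source_uid), it commits every label schema; passing a list of UIDs restricts the commit to those schemas.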

classmethod create(cls, dataset_uid, name=None, assignees=None, label_schemas=None, batch_size=None, num_batches=None, randomize=False, random_seed=123, selection_strategy=None, split=None, x_uids=None, filter_by_x_uids_not_in_batch=False, divide_x_uids_evenly_to_assignees=False)

Create one or more annotation batches for a dataset.

Typically, Dataset.create_batches() is the recommended entrypoint for creating batches.

Parameters:
  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • name (Optional[str], default: None) – The name of the batch

  • assignees (Optional[List[int]], default: None) – The user UIDs for the assignees of the batches

  • label_schemas (Optional[List[LabelSchema]], default: None) – The label schemas assigned for the batches

  • batch_size (Optional[int], default: None) – The size of the batches

  • num_batches (Optional[int], default: None) – The number of batches

  • randomize (Optional[bool], default: False) – Whether to randomize the batches

  • random_seed (Optional[int], default: 123) – The seed for the randomization

  • selection_strategy (Optional[SelectionStrategy], default: None) – The SelectionStrategy for the batches

  • split (Optional[str], default: None) – The split (“train”, “test”, or “valid”) of the batches

  • x_uids (Optional[List[str]], default: None) – A list of datapoint uids to create batches from

  • filter_by_x_uids_not_in_batch (Optional[bool], default: False) – Whether to create batches only from datapoints that are not already in a batch

  • divide_x_uids_evenly_to_assignees (Optional[bool], default: False) – Whether to divide the datapoints evenly among the provided assignees

Returns:

The list of created batches

Return type:

List[Batch]
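The parameters above can be combined as in the following sketch, which requests randomized batches from the train split. The helper name create_train_batches, the batch name "train-review", and the default batch size of 50 are illustrative assumptions; the class is passed in as an argument so the sketch runs without the SDK installed.

```python
def create_train_batches(batch_cls, dataset_uid, assignees, batch_size=50):
    """Create randomized annotation batches from the train split.

    `batch_cls` is expected to be snorkelflow.sdk.Batch (or any class with
    the same create() classmethod). Hypothetical helper, not part of the SDK.
    """
    return batch_cls.create(
        dataset_uid=dataset_uid,
        name="train-review",   # illustrative name
        assignees=assignees,   # user UIDs
        batch_size=batch_size,
        randomize=True,
        random_seed=123,       # documented default, stated explicitly
        split="train",
    )


# With the SDK installed:
#     from snorkelflow.sdk import Batch
#     batches = create_train_batches(Batch, dataset_uid, assignees)
```

The call returns the list of created batches, so the caller can read each batch's uid or name afterwards.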

classmethod delete(batch_uid)

Delete an annotation batch by its UID.

Parameters:
  • batch_uid (int) – The UID for the batch within Snorkel Flow

Return type:

None

export(path, selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000, csv_delimiter=',', quote_char='"', escape_char='\\\\')

Export the batch to a zipped CSV file.

Parameters:
  • path (Union[str, Path]) – The path to the zipped CSV file. If the path does not end in .zip, the .zip extension is appended.

  • selected_fields (Optional[List[str]], default: None) – A list of fields to export. If not set, all fields will be exported.

  • include_annotations (bool, default: False) – Whether to include annotations in the export

  • include_ground_truth (bool, default: False) – Whether to include ground truth in the export

  • max_rows (int, default: 10000) – The maximum number of rows to export

  • csv_delimiter (str, default: ',') – The delimiter to use for CSV fields

  • quote_char (str, default: '"') – The character to use for quoted fields in the CSV

  • escape_char (str, default: '\\\\') – The character to use for escaping special characters in the CSV

Returns:

The path to the zipped CSV file

Return type:

pathlib.Path
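To show how an exported file might be consumed, the sketch below reads rows back from a zipped CSV using only the Python standard library. It assumes the archive holds a single UTF-8 CSV file with a header row, which this reference does not state; the helper name read_exported_rows, the member name, and the column names are hypothetical.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path


def read_exported_rows(zip_path, delimiter=",", quotechar='"'):
    """Read rows from the first CSV member of an exported archive.

    Assumption: the archive contains one UTF-8 CSV file with a header row.
    Pass the same delimiter and quote character that were used for export.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter=delimiter, quotechar=quotechar)
            return list(reader)


# Stand-in archive so the sketch runs without the SDK. With a real batch:
#     exported = batch.export("batch_export", include_annotations=True)
#     rows = read_exported_rows(exported)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "batch_export.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("batch.csv", 'x_uid,text\ndoc::0,"hello, world"\n')
    rows = read_exported_rows(demo_zip)

print(rows)
```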

classmethod get(batch_uid)

Retrieve an annotation batch by its UID.

Parameters:
  • batch_uid (int) – The UID for the batch within Snorkel Flow

Returns:

The batch object

Return type:

Batch

get_dataframe(selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000)

Get a pandas DataFrame representation of the batch.

Parameters:
  • selected_fields (Optional[List[str]], default: None) – A list of fields to include in the DataFrame. If not set, all fields will be included.

  • include_annotations (bool, default: False) – Whether to include annotations in the DataFrame

  • include_ground_truth (bool, default: False) – Whether to include ground truth in the DataFrame

  • max_rows (int, default: 10000) – The maximum number of rows to include in the DataFrame

Returns:

The pandas DataFrame representation of the batch

Return type:

pd.DataFrame
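Because max_rows caps the number of rows returned, a batch larger than the default would come back incomplete. The sketch below raises the cap to the batch size when needed; the helper name load_full_dataframe is hypothetical, and it uses only the documented batch_size attribute and get_dataframe() parameters.

```python
DEFAULT_MAX_ROWS = 10000  # documented default for max_rows


def load_full_dataframe(batch, selected_fields=None):
    """Fetch the batch with annotations and ground truth, without row cut-off.

    Hypothetical helper: raises max_rows to batch.batch_size when the batch
    holds more examples than the documented default allows.
    """
    max_rows = max(batch.batch_size, DEFAULT_MAX_ROWS)
    return batch.get_dataframe(
        selected_fields=selected_fields,
        include_annotations=True,
        include_ground_truth=True,
        max_rows=max_rows,
    )
```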

update(name=None, assignees=None, expert_source_uid=None)

Update properties of the annotation batch.

Parameters:
  • name (Optional[str], default: None) – The new name of the batch

  • assignees (Optional[List[int]], default: None) – The user UIDs for the new assignees of the batch

  • expert_source_uid (Optional[int], default: None) – The UID for the new expert source of the batch

Return type:

None
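A small sketch of update() follows. The helper name rename_and_reassign is hypothetical; it forwards only the properties that were supplied and skips the call entirely when nothing would change. Whether the in-memory attributes of the batch are refreshed after update() is not stated in this reference, so the sketch makes no assumption about it.

```python
def rename_and_reassign(batch, new_name=None, assignees=None):
    """Update the batch name and/or assignees, returning what was sent.

    Hypothetical helper built on the documented update() parameters.
    """
    changes = {}
    if new_name is not None:
        changes["name"] = new_name
    if assignees is not None:
        changes["assignees"] = list(assignees)  # user UIDs
    if changes:
        batch.update(**changes)
    return changes
```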

property batch_size: int

The number of examples in the batch.

property dataset_uid: int

The UID for the dataset within Snorkel Flow.

property label_schemas: List[LabelSchema]

The list of label schemas associated with this batch.

property name: str

The name of the batch.

property ts: datetime

The timestamp at which the batch was created.

property uid: int

The UID for the batch within Snorkel Flow.

property x_uids: List[str]

The UIDs for the examples in the batch.