Version: 0.93


class snorkelflow.sdk.Batch(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Bases: object

The Batch object represents an annotation batch in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) annotation batches.

__init__(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)

Create a Batch object in memory with the necessary properties. Do not call this constructor directly; obtain batches through the create() and get() methods instead.

  • name (str) – The name of the batch

  • uid (int) – The UID for the batch within Snorkel Flow

  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • label_schemas (List[LabelSchema]) – The list of label schemas associated with this batch

  • batch_size (int) – The number of examples in the batch

  • ts (datetime) – The timestamp at which the batch was created

  • x_uids (List[str]) – The UIDs for the examples in the batch


commit(source_uid, label_schema_uids=None)

Commit a source on a batch as ground truth.

  • source_uid (int) – The UID for the source on the batch

  • label_schema_uids (Optional[List[int]], default: None) – The label schema UIDs to commit, defaults to all label schemas if not set

Return type: None
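
As a minimal sketch of passing label_schema_uids explicitly, the helper below commits a source for a named subset of label schemas. The helper name commit_selected_schemas and its schema_names argument are hypothetical, and the code assumes each LabelSchema exposes name and uid attributes, which this page does not confirm. It takes the batch as an argument and relies only on the label_schemas property and the commit() signature documented above.

```python
from typing import Any, Iterable, List


def commit_selected_schemas(
    batch: Any, source_uid: int, schema_names: Iterable[str]
) -> List[int]:
    """Commit `source_uid` as ground truth for the named label schemas only.

    `batch` is expected to behave like a Batch instance (hypothetical helper).
    """
    wanted = set(schema_names)
    # Assumption: each LabelSchema has `name` and `uid` attributes.
    uids = [schema.uid for schema in batch.label_schemas if schema.name in wanted]
    if not uids:
        raise ValueError("no label schema on this batch matches the given names")
    # Passing label_schema_uids=None instead would commit every label schema.
    batch.commit(source_uid=source_uid, label_schema_uids=uids)
    return uids
```

With a live deployment, the batch argument would typically come from Batch.get(batch_uid).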


classmethod create(dataset_uid, name=None, assignees=None, label_schemas=None, batch_size=None, num_batches=None, randomize=False, random_seed=123, selection_strategy=None, split=None, x_uids=None)

Create one or more annotation batches for a dataset.

In most cases, Dataset.create_batches() is the recommended entry point for creating batches.

  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • name (Optional[str], default: None) – The name of the batch

  • assignees (Optional[List[int]], default: None) – The user UIDs for the assignees of the batches

  • label_schemas (Optional[List[LabelSchema]], default: None) – The label schemas assigned for the batches

  • batch_size (Optional[int], default: None) – The size of the batches

  • num_batches (Optional[int], default: None) – The number of batches

  • randomize (Optional[bool], default: False) – Whether to randomize the batches

  • random_seed (Optional[int], default: 123) – The seed for the randomization

  • selection_strategy (Optional[SelectionStrategy], default: None) – The SelectionStrategy for the batches

  • split (Optional[str], default: None) – The split (“train”, “test”, or “valid”) of the batches

  • x_uids (Optional[List[str]], default: None) – The UIDs of the examples to include in the batch


Returns: The list of created batches

Return type: List[Batch]
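
The sketch below illustrates how batch_size, num_batches, randomize, and random_seed could interact when a list of x_uids is split into batches. It is not Snorkel Flow's implementation: plan_batches is a hypothetical function, and both the chunking rule and the precedence of batch_size over num_batches are assumptions made for illustration. It only shows why a fixed seed makes a randomized split reproducible.

```python
import math
import random
from typing import List, Optional, Sequence


def plan_batches(
    x_uids: Sequence[str],
    batch_size: Optional[int] = None,
    num_batches: Optional[int] = None,
    randomize: bool = False,
    random_seed: int = 123,
) -> List[List[str]]:
    """Split example UIDs into consecutive chunks (illustrative only)."""
    uids = list(x_uids)
    if randomize:
        # A dedicated Random instance with a fixed seed gives the same order
        # on every call, without touching the global random state.
        random.Random(random_seed).shuffle(uids)
    if batch_size is None:
        # Assumption: with only num_batches given, use equally sized chunks,
        # which yields at most num_batches batches.
        parts = num_batches if num_batches else 1
        batch_size = max(1, math.ceil(len(uids) / parts))
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [uids[i : i + batch_size] for i in range(0, len(uids), batch_size)]
```

Calling plan_batches twice with the same random_seed returns identical batches, which is the property the random_seed parameter is meant to provide.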


classmethod delete(batch_uid)

Delete an annotation batch by its UID.


  • batch_uid (int) – The UID for the batch within Snorkel Flow

Return type: None


export(path, selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000, csv_delimiter=',', quote_char='"', escape_char='\\')

Export the batch to a zipped CSV file.

  • path (Union[str, Path]) – The path to the zipped CSV file. If the path does not end in .zip, the .zip extension is appended to it.

  • selected_fields (Optional[List[str]], default: None) – A list of fields to export. If not set, all fields will be exported.

  • include_annotations (bool, default: False) – Whether to include annotations in the export

  • include_ground_truth (bool, default: False) – Whether to include ground truth in the export

  • max_rows (int, default: 10000) – The maximum number of rows to export

  • csv_delimiter (str, default: ',') – The delimiter to use for CSV fields

  • quote_char (str, default: '"') – The character to use for quoted fields in the CSV

  • escape_char (str, default: '\\') – The character to use for escaping special characters in the CSV


Returns: The path to the zipped CSV file

Return type:
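
A minimal sketch of reading an exported archive back with the Python standard library, using the same delimiter, quote, and escape settings that were passed to export(). The function names read_exported_rows and expected_export_path are hypothetical. The code assumes the archive contains a single UTF-8 CSV member with a header row and that the .zip check is a plain case-sensitive suffix test; neither detail is confirmed by this page.

```python
import csv
import io
import zipfile
from pathlib import Path
from typing import Dict, List, Union


def expected_export_path(path: Union[str, Path]) -> Path:
    """Mirror the documented rule: append .zip unless the path already ends in it."""
    text = str(path)
    return Path(text if text.endswith(".zip") else text + ".zip")


def read_exported_rows(
    zip_path: Union[str, Path],
    csv_delimiter: str = ",",
    quote_char: str = '"',
    escape_char: str = "\\",  # csv requires a one-character escapechar
) -> List[Dict[str, str]]:
    """Return the rows of the first CSV file found in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError(f"no CSV file found in {zip_path}")
        with archive.open(csv_names[0]) as raw:
            # newline="" lets the csv module handle embedded line breaks itself.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(
                text,
                delimiter=csv_delimiter,
                quotechar=quote_char,
                escapechar=escape_char,
            )
            return list(reader)
```

Reading with the same three CSV settings used for the export matters most when fields contain the delimiter or quote character.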


classmethod get(batch_uid)

Retrieve an annotation batch by its UID.


  • batch_uid (int) – The UID for the batch within Snorkel Flow


Returns: The batch object

Return type: Batch


get_dataframe(selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000)

Get a pandas DataFrame representation of the batch.

  • selected_fields (Optional[List[str]], default: None) – A list of fields to include in the DataFrame. If not set, all fields will be included.

  • include_annotations (bool, default: False) – Whether to include annotations in the DataFrame

  • include_ground_truth (bool, default: False) – Whether to include ground truth in the DataFrame

  • max_rows (int, default: 10000) – The maximum number of rows to include in the DataFrame


Returns: The pandas DataFrame representation of the batch

Return type: DataFrame
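
Because max_rows defaults to 10000, a larger batch would presumably be capped unless max_rows is raised. The sketch below passes the batch's own batch_size as max_rows and turns on both annotation flags. The helper name labeled_dataframe is hypothetical, and the code assumes the DataFrame has at most one row per example, which this page does not state.

```python
from typing import Any, List, Optional


def labeled_dataframe(batch: Any, selected_fields: Optional[List[str]] = None) -> Any:
    """Fetch the batch as a DataFrame with annotations and ground truth.

    `batch` is expected to behave like a Batch instance (hypothetical helper).
    """
    return batch.get_dataframe(
        selected_fields=selected_fields,  # None means all fields
        include_annotations=True,
        include_ground_truth=True,
        # Assumption: one row per example, so batch_size rows cover the batch.
        # max(..., 1) avoids passing 0 for an empty batch.
        max_rows=max(batch.batch_size, 1),
    )
```

With a live deployment, this could be called as labeled_dataframe(Batch.get(batch_uid), selected_fields=[...]) with field names from your own dataset.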


update(name=None, assignees=None, expert_source_uid=None)

Update properties of the annotation batch.

  • name (Optional[str], default: None) – The new name of the batch

  • assignees (Optional[List[int]], default: None) – The user UIDs for the new assignees of the batch

  • expert_source_uid (Optional[int], default: None) – The UID for the new expert source of the batch

Return type: None


property batch_size: int

The number of examples in the batch.

property dataset_uid: int

The UID for the dataset within Snorkel Flow.

property label_schemas: List[LabelSchema]

The list of label schemas associated with this batch.

property name: str

The name of the batch.

property ts: datetime

The timestamp at which the batch was created.

property uid: int

The UID for the batch within Snorkel Flow.

property x_uids: List[str]

The UIDs for the examples in the batch.
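
The read-only properties above can be gathered into a plain dictionary for logging or inspection. This is a minimal sketch: summarize_batch is a hypothetical helper that reads only the documented properties, and the size_matches_x_uids check assumes that batch_size equals the number of x_uids.

```python
from typing import Any, Dict


def summarize_batch(batch: Any) -> Dict[str, Any]:
    """Collect the documented Batch properties into a JSON-friendly dict."""
    return {
        "uid": batch.uid,
        "name": batch.name,
        "dataset_uid": batch.dataset_uid,
        "batch_size": batch.batch_size,
        # ts is a datetime; isoformat() gives a sortable string.
        "created": batch.ts.isoformat(),
        "num_label_schemas": len(batch.label_schemas),
        # Assumption: batch_size should equal the number of example UIDs.
        "size_matches_x_uids": batch.batch_size == len(batch.x_uids),
    }
```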