snorkelflow.sdk.Batch
- class snorkelflow.sdk.Batch(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)
Bases: object
The Batch object represents an annotation batch in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) annotation batches.
- __init__(name, uid, dataset_uid, label_schemas, batch_size, ts, x_uids)
Create a batch object in-memory with necessary properties. This constructor should not be called directly; obtain Batch instances through the create() and get() methods instead.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| name | str | | The name of the batch. |
| uid | int | | The UID for the batch within Snorkel Flow. |
| dataset_uid | int | | The UID for the dataset within Snorkel Flow. |
| label_schemas | List[LabelSchema] | | The list of label schemas associated with this batch. |
| batch_size | int | | The number of examples in the batch. |
| ts | datetime | | The timestamp at which the batch was created. |
| x_uids | List[str] | | The UIDs for the examples in the batch. |
Methods

| Method | Description |
| --- | --- |
| __init__(name, uid, dataset_uid, ...) | Create a batch object in-memory with necessary properties. |
| commit(source_uid[, label_schema_uids]) | Commit a source on a batch as ground truth. |
| create(cls, dataset_uid[, name, assignees, ...]) | Create one or more annotation batches for a dataset. |
| delete(batch_uid) | Delete an annotation batch by its UID. |
| export(path[, selected_fields, ...]) | Export the batch to a zipped CSV file. |
| get(batch_uid) | Retrieve an annotation batch by its UID. |
| get_dataframe([selected_fields, ...]) | Get a pandas DataFrame representation of the batch. |
| update([name, assignees, expert_source_uid]) | Update properties of the annotation batch. |
Attributes

| Attribute | Description |
| --- | --- |
| batch_size | The number of examples in the batch. |
| dataset_uid | The UID for the dataset within Snorkel Flow. |
| label_schemas | The list of label schemas associated with this batch. |
| name | The name of the batch. |
| ts | The timestamp at which the batch was created. |
| uid | The UID for the batch within Snorkel Flow. |
| x_uids | The UIDs for the examples in the batch. |
- commit(source_uid, label_schema_uids=None)
Commit a source on a batch as ground truth.
- classmethod create(cls, dataset_uid, name=None, assignees=None, label_schemas=None, batch_size=None, num_batches=None, randomize=False, random_seed=123, selection_strategy=None, split=None, x_uids=None, filter_by_x_uids_not_in_batch=False, divide_x_uids_evenly_to_assignees=False)
Create one or more annotation batches for a dataset.
Typically, Dataset.create_batches() is the recommended entrypoint for creating batches.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| dataset_uid | int | | The UID for the dataset within Snorkel Flow. |
| name | Optional[str] | None | The name of the batch. |
| assignees | Optional[List[int]] | None | The user UIDs for the assignees of the batches. |
| label_schemas | Optional[List[LabelSchema]] | None | The label schemas assigned for the batches. |
| batch_size | Optional[int] | None | The size of the batches. |
| num_batches | Optional[int] | None | The number of batches. |
| randomize | Optional[bool] | False | Whether to randomize the batches. |
| random_seed | Optional[int] | 123 | The seed for the randomization. |
| selection_strategy | Optional[SelectionStrategy] | None | The SelectionStrategy for the batches. |
| split | Optional[str] | None | The split (“train”, “test”, or “valid”) of the batches. |
| x_uids | Optional[List[str]] | None | A list of datapoint UIDs to create batches from. |
| filter_by_x_uids_not_in_batch | Optional[bool] | False | Whether to create batches only from datapoints that are not already in a batch. |
| divide_x_uids_evenly_to_assignees | Optional[bool] | False | Whether to divide the datapoints evenly among the provided assignees. |

Returns
The list of created batches
Return type
List[Batch]
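As a rough illustration of how the keyword arguments above fit together, the sketch below wraps create() in a small helper. The helper name create_randomized_batches, the default values it picks, and the batch name "review-batch" are hypothetical; only the keyword names come from the signature above. The class is passed in as an argument so the snippet can be read and run without the SDK installed.

```python
from typing import Any, List, Optional, Sequence


def create_randomized_batches(
    batch_cls: Any,
    dataset_uid: int,
    assignees: Optional[Sequence[int]] = None,
    batch_size: int = 50,
    split: str = "train",
    random_seed: int = 123,
) -> List[Any]:
    """Create shuffled annotation batches from one split of a dataset.

    batch_cls is expected to be snorkelflow.sdk.Batch (hypothetical wiring:
    `from snorkelflow.sdk import Batch`). Every argument is passed by keyword
    so the call does not depend on the positional order of the signature.
    """
    return batch_cls.create(
        dataset_uid=dataset_uid,
        name="review-batch",  # hypothetical name, chosen for illustration
        assignees=list(assignees) if assignees is not None else None,
        batch_size=batch_size,
        randomize=True,
        random_seed=random_seed,
        split=split,
    )
```

With the SDK available, a call might look like create_randomized_batches(Batch, dataset_uid=1, assignees=[3, 4]), where the UIDs are placeholders. The return value is whatever create() returns, documented above as a list of Batch objects.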
- classmethod delete(batch_uid)
Delete an annotation batch by its UID.
- export(path, selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000, csv_delimiter=',', quote_char='"', escape_char='\\\\')
Export the batch to a zipped CSV file.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| path | Union[str, Path] | | The path to the zipped CSV file. If the path does not end in .zip, the .zip suffix is appended to it. |
| selected_fields | Optional[List[str]] | None | A list of fields to export. If not set, all fields will be exported. |
| include_annotations | bool | False | Whether to include annotations in the export. |
| include_ground_truth | bool | False | Whether to include ground truth in the export. |
| max_rows | int | 10000 | The maximum number of rows to export. |
| csv_delimiter | str | ',' | The delimiter to use for CSV fields. |
| quote_char | str | '"' | The character to use for quoted fields in the CSV. |
| escape_char | str | '\\\\' | The character to use for escaping special characters in the CSV. |

Returns
The path to the zipped CSV file
Return type
pathlib.Path
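A minimal sketch of an export call that uses the documented keyword arguments. Both helper names are hypothetical and not part of the SDK. with_zip_suffix only mirrors the suffix rule described for path so a caller can predict where the file should land; how the SDK itself treats edge cases (for example an upper-case .ZIP suffix) is an assumption here.

```python
from pathlib import Path
from typing import Any, Union


def with_zip_suffix(path: Union[str, Path]) -> Path:
    """Return path unchanged if it ends in .zip, otherwise append .zip.

    Illustrative re-statement of the documented rule, not SDK code.
    """
    p = Path(path)
    if p.name.endswith(".zip"):
        return p
    return p.with_name(p.name + ".zip")


def export_with_labels(batch: Any, path: Union[str, Path], max_rows: int = 10000) -> Any:
    """Export a batch together with its annotations and ground truth.

    batch is expected to be a Batch instance, e.g. one returned by Batch.get().
    The CSV formatting options are left at their documented defaults.
    """
    return batch.export(
        path=path,
        include_annotations=True,
        include_ground_truth=True,
        max_rows=max_rows,
    )
```

Because max_rows caps the export, a batch larger than the default of 10000 rows would need a larger value passed explicitly.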
- classmethod get(batch_uid)
Retrieve an annotation batch by its UID.
- get_dataframe(selected_fields=None, include_annotations=False, include_ground_truth=False, max_rows=10000)
Get a pandas DataFrame representation of the batch.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| selected_fields | Optional[List[str]] | None | A list of fields to include in the DataFrame. If not set, all fields will be included. |
| include_annotations | bool | False | Whether to include annotations in the DataFrame. |
| include_ground_truth | bool | False | Whether to include ground truth in the DataFrame. |
| max_rows | int | 10000 | The maximum number of rows to include in the DataFrame. |

Returns
The pandas DataFrame representation of the batch
Return type
pd.DataFrame
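The sketch below shows one way to call get_dataframe() and notice when the max_rows cap may have cut the result short. The helper name load_batch_frame is hypothetical, and the truncation check rests on the assumption that a result with max_rows or more rows might be incomplete; the documentation above does not say how truncation is signalled.

```python
from typing import Any, List, Optional, Tuple


def load_batch_frame(
    batch: Any,
    fields: Optional[List[str]] = None,
    max_rows: int = 10000,
) -> Tuple[Any, bool]:
    """Fetch a batch as a DataFrame and flag a possibly truncated result.

    batch is expected to be a Batch instance. Returns the DataFrame and a
    boolean that is True when the row count reached max_rows.
    """
    df = batch.get_dataframe(
        selected_fields=fields,
        include_annotations=True,
        max_rows=max_rows,
    )
    # Assumption: hitting the cap means more rows may exist than were returned.
    possibly_truncated = len(df) >= max_rows
    return df, possibly_truncated
```

If the flag comes back True, calling the helper again with a larger max_rows is a reasonable next step.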
- update(name=None, assignees=None, expert_source_uid=None)
Update properties of the annotation batch.
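A short sketch of combining get() and update() to rename a batch by its UID. The helper name rename_batch is hypothetical. Since the page does not state whether update() returns a value or modifies the object in place, the helper fetches the batch again afterwards instead of relying on either behavior.

```python
from typing import Any


def rename_batch(batch_cls: Any, batch_uid: int, new_name: str) -> Any:
    """Rename an annotation batch and return a freshly retrieved copy.

    batch_cls is expected to be snorkelflow.sdk.Batch.
    """
    batch = batch_cls.get(batch_uid)
    batch.update(name=new_name)
    # Re-fetch so the caller sees the stored state, whatever update() returns.
    return batch_cls.get(batch_uid)
```

The other documented keyword arguments of update(), assignees and expert_source_uid, can be passed the same way.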
- property batch_size: int
The number of examples in the batch.
- property dataset_uid: int
The UID for the dataset within Snorkel Flow.
- property label_schemas: List[LabelSchema]
The list of label schemas associated with this batch.
- property name: str
The name of the batch.
- property ts: datetime
The timestamp at which the batch was created.
- property uid: int
The UID for the batch within Snorkel Flow.
- property x_uids: List[str]
The UIDs for the examples in the batch.