Version: 0.94

snorkelflow.sdk.LabelSchema

class snorkelflow.sdk.LabelSchema(name, uid, dataset_uid, label_map)

Bases: object

The LabelSchema object represents a label schema in Snorkel Flow. Currently, this interface only represents Dataset-level (not Node-level) label schemas.

__init__(name, uid, dataset_uid, label_map)

Create a label schema object in memory with the necessary properties. Do not call this constructor directly; obtain LabelSchema objects through the create() and get() methods instead.

Parameters:
  • name (str) – The name of the label schema

  • uid (int) – The UID for the label schema within Snorkel Flow

  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • label_map (Dict[str, int]) – The label map of the label schema

Methods

__init__(name, uid, dataset_uid, label_map)

Create a label schema object in-memory with necessary properties.

copy(name[, description, label_map, ...])

Copy a label schema.

create(dataset_uid, name, data_type, ...[, ...])

Create a label schema for a dataset.

delete(label_schema)

Delete a label schema by name or UID.

get(label_schema)

Retrieve a label schema by name or UID.

Attributes

dataset_uid

The UID for the dataset within Snorkel Flow.

label_map

The label map of the label schema.

name

The name of the label schema.

uid

The UID for the label schema within Snorkel Flow.

copy(name, description=None, label_map=None, label_descriptions=None, updated_label_schema=None)

Copy a label schema.

Parameters:
  • name (str) – The name of the new label schema

  • description (Optional[str], default: None) – The description of the new label schema

  • label_map (Optional[Dict[str, int]], default: None) – The label map of the new label schema

  • label_descriptions (Optional[Dict[str, str]], default: None) – The label descriptions of the new label schema

  • updated_label_schema (Optional[Dict[str, str]], default: None) – The update mapping to apply to the new label schema. This is a dictionary mapping label names in the current label schema to label names in the new label schema. A label that is removed from the current label schema is mapped to None. Examples:

      1. Rename "old_1" to "new_1" and remove "old_2": {"old_1": "new_1", "old_2": None}
      2. Merge "old_1" and "old_2" into "new_1": {"old_1": "new_1", "old_2": "new_1"}
      3. Split "old_1" into "new_1" and "new_2", and keep assets labeled as "old_1" at "new_1": {"old_1": "new_1"}
      4. Add "new_3": None (no change to the existing assets)

Returns:

The new label schema object

Return type:

LabelSchema
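The update-mapping examples above can be illustrated with a small standalone sketch. It does not call the SDK and is not how Snorkel Flow implements copy(); the helper name apply_update_mapping is hypothetical, and leaving labels that are absent from the mapping unchanged is an assumption made for illustration.

```python
from typing import Dict, List, Optional


def apply_update_mapping(
    labels: List[str],
    mapping: Optional[Dict[str, Optional[str]]],
) -> List[Optional[str]]:
    """Relabel a list of asset labels according to an update mapping.

    Hypothetical helper for illustration only; not part of snorkelflow.sdk.
    """
    if mapping is None:
        # Example 4: with no mapping, existing labels are left untouched.
        return list(labels)
    # Assumption: a label that does not appear in the mapping keeps its name.
    # A label mapped to None is treated as removed.
    return [mapping.get(label, label) for label in labels]


old_labels = ["old_1", "old_2", "old_3"]

# Example 1: rename "old_1" to "new_1" and remove "old_2".
renamed = apply_update_mapping(old_labels, {"old_1": "new_1", "old_2": None})

# Example 2: merge "old_1" and "old_2" into "new_1".
merged = apply_update_mapping(old_labels, {"old_1": "new_1", "old_2": "new_1"})

# Example 4: no mapping, so nothing changes.
unchanged = apply_update_mapping(old_labels, None)
```

In a real call, the same dictionaries would be passed as the updated_label_schema argument of copy(), together with a label_map that defines the new label names.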

classmethod create(dataset_uid, name, data_type, task_type, label_map, multi_label=False, description=None, label_column=None, label_descriptions=None, primary_field=None)

Create a label schema for a dataset.

Typically, Dataset.create_label_schema() is the recommended entrypoint for creating label schemas.

Parameters:
  • dataset_uid (int) – The UID for the dataset within Snorkel Flow

  • name (str) – The name of the label schema

  • data_type (str) – The data type of the label schema

  • task_type (str) – The task type of the label schema

  • label_map (Union[Dict[str, int], List[str]]) – A dictionary mapping label names to their integer values, or a list of label names

  • multi_label (bool, default: False) – Whether the label schema is a multi-label schema

  • description (Optional[str], default: None) – A description of the label schema

  • label_column (Optional[str], default: None) – The name of the column that contains the labels

  • label_descriptions (Optional[Dict[str, str]], default: None) – A dictionary mapping label names to their descriptions

  • primary_field (Optional[str], default: None) – The primary field of the label schema

Returns:

The label schema object

Return type:

LabelSchema
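Because label_map accepts either a dictionary or a list, and label_descriptions is keyed by label name, it can help to check the two for consistency before calling create(). The following is a minimal sketch of such a pre-flight check. The helper check_label_inputs is hypothetical, and the rules it enforces (unique names, unique integer values, descriptions only for known labels) are assumptions rather than documented SDK behavior.

```python
from typing import Dict, List, Optional, Union

LabelMap = Union[Dict[str, int], List[str]]


def check_label_inputs(
    label_map: LabelMap,
    label_descriptions: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the label names after basic consistency checks.

    Hypothetical helper for illustration only; not part of snorkelflow.sdk.
    """
    # Iterating a dict yields its keys, so this works for both accepted forms.
    names = list(label_map)
    if len(set(names)) != len(names):
        raise ValueError("label names must be unique")

    if isinstance(label_map, dict):
        values = list(label_map.values())
        if not all(isinstance(value, int) for value in values):
            raise ValueError("label values must be integers")
        # Assumption: two labels should not share one integer value.
        if len(set(values)) != len(values):
            raise ValueError("label values must be unique")

    unknown = set(label_descriptions or {}) - set(names)
    if unknown:
        raise ValueError(f"descriptions given for unknown labels: {sorted(unknown)}")
    return names


names_from_dict = check_label_inputs(
    {"NEGATIVE": 0, "POSITIVE": 1},
    {"POSITIVE": "Text with a favorable tone"},
)
names_from_list = check_label_inputs(["NEGATIVE", "POSITIVE"])
```

The checked label_map and label_descriptions can then be passed to create() unchanged; the SDK may apply its own validation on top of this.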

classmethod delete(label_schema)

Delete a label schema by name or UID.

Parameters:

label_schema (Union[str, int]) – The name or UID of the label schema

Return type:

None

classmethod get(label_schema)

Retrieve a label schema by name or UID.

Parameters:

label_schema (Union[str, int]) – The name or UID of the label schema

Returns:

The label schema object

Return type:

LabelSchema

Raises:

ValueError – If no label schema is found with the given name or UID
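Since get() raises ValueError when nothing matches, callers that treat a missing schema as a normal case may want to wrap the lookup. The sketch below shows one way to do so. The wrapper get_schema_or_none and the FakeLabelSchema stand-in are hypothetical and exist only so the example runs without a Snorkel Flow instance; in practice the class passed in would be snorkelflow.sdk.LabelSchema.

```python
from typing import Any, Optional, Union


def get_schema_or_none(schema_cls: Any, label_schema: Union[str, int]) -> Optional[Any]:
    """Look up a label schema by name or UID, returning None if it is missing."""
    try:
        return schema_cls.get(label_schema)
    except ValueError:
        # Documented behavior: get() raises ValueError when nothing is found.
        return None


class FakeLabelSchema:
    """Hypothetical stand-in that mimics only the documented get() contract."""

    _known = {"sentiment": 7}  # name -> UID, made-up values for the demo

    @classmethod
    def get(cls, label_schema: Union[str, int]) -> Any:
        if label_schema in cls._known or label_schema in cls._known.values():
            return ("schema", label_schema)
        raise ValueError(f"No label schema found for {label_schema!r}")


found_by_name = get_schema_or_none(FakeLabelSchema, "sentiment")
found_by_uid = get_schema_or_none(FakeLabelSchema, 7)
missing = get_schema_or_none(FakeLabelSchema, "does_not_exist")
```

Passing the class as an argument keeps the wrapper testable with a stand-in and avoids importing the SDK at module load time.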

property dataset_uid: int

The UID for the dataset within Snorkel Flow.

property label_map: Dict[str, int]

The label map of the label schema.

property name: str

The name of the label schema.

property uid: int

The UID for the label schema within Snorkel Flow.