Version: 25.9

snorkelai.sdk.develop.Cluster

final class snorkelai.sdk.develop.Cluster(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))

Bases: Base

Provides methods for viewing and updating clusters, and for viewing the datapoints assigned to a cluster.

Clusters represent groups of similar datapoints identified during error analysis. They help identify common failure patterns in model predictions and provide insights for targeted improvements. Clusters can currently only be created and deleted through the ErrorAnalysis class.

Read more in the Error Analysis Guide.

Using the Cluster class requires the following import:

from snorkelai.sdk.develop import Cluster

__init__

__init__(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))

Initializes a Cluster instance.

Parameters

  • cluster_uid (int) – Unique identifier for the cluster.
  • error_analysis_uid (int) – Unique identifier for the associated error analysis run.
  • name (str) – Name of the cluster.
  • description (Optional[str], default None) – Description of the cluster.
  • improvement_strategy (Optional[str], default None) – Suggested improvement strategy for the cluster.
  • examples (Optional[List[str]], default None) – Example datapoints in the cluster.
  • datapoint_count (int, default 0) – Number of datapoints in the cluster.
  • virtualized_dataset_uid (Optional[int], default None) – Unique identifier for the virtualized dataset containing the datapoints in the cluster.
  • created_at (Optional[datetime], default datetime.datetime(2025, 10, 9, 1, 48, 17, 386394)) – Timestamp when the cluster was created.
  • updated_at (Optional[datetime], default datetime.datetime(2025, 10, 9, 1, 48, 17, 386397)) – Timestamp when the cluster was last updated.

Methods

  • __init__(cluster_uid, error_analysis_uid, name) – Initializes a Cluster instance.
  • create() – Not supported; raises NotImplementedError. Use ErrorAnalysis to create clusters.
  • delete(cluster_uid) – Not supported; raises NotImplementedError. Delete the associated error analysis run to remove clusters.
  • get(cluster_uid) – Retrieves an existing cluster by its unique identifier.
  • get_cluster_membership() – Fetches the datapoints assigned to this cluster.
  • get_clusters(error_analysis_uid, benchmark_uid) – Fetches clusters from a completed error analysis run.
  • update([name, description]) – Updates the cluster properties.

Attributes

  • created_at – The timestamp when the cluster was created.
  • datapoint_count – The number of datapoints in the cluster.
  • description – The description of the cluster.
  • error_analysis_uid – The unique identifier for the associated error analysis run.
  • examples – Example datapoints in the cluster.
  • improvement_strategy – The suggested improvement strategy for the cluster.
  • name – The name of the cluster.
  • uid – The unique identifier for the cluster.
  • updated_at – The timestamp when the cluster was last updated.
  • virtualized_dataset_uid – The unique identifier for the virtualized dataset containing the datapoints in the cluster.

create

classmethod create()

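Not supported. Clusters are created by running error analysis through the ErrorAnalysis class; calling this method raises NotImplementedError.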

Raises

NotImplementedError – Cluster creation is not supported directly. Use ErrorAnalysis to create clusters.

Return type

Cluster

delete

classmethod delete(cluster_uid)

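Intended to delete the cluster identified by cluster_uid. Direct deletion is not supported; calling this method raises NotImplementedError.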

Parameters

cluster_uid (int) – Unique identifier of the cluster to delete.

Raises

NotImplementedError – Cluster deletion is not supported. Delete the associated error analysis run to remove clusters.

Return type

None

get

classmethod get(cluster_uid)

Retrieves an existing cluster by its unique identifier.

Parameters

cluster_uid (int) – Unique identifier of the cluster to retrieve.

Returns

The Cluster instance for the specified cluster.

Return type

Cluster

Raises

ValueError – If no cluster exists with the given cluster_uid.

Example

from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)

get_cluster_membership

get_cluster_membership()

Fetches the datapoints assigned to this cluster.

Returns

DataFrame containing all the datapoints in the cluster.

Return type

pd.DataFrame

Raises

ValueError – If there are no datapoints assigned to the cluster.

Example

from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
membership_df = cluster.get_cluster_membership()

get_clusters

classmethod get_clusters(error_analysis_uid, benchmark_uid)

Fetches clusters from a completed error analysis.

Parameters

  • error_analysis_uid (int) – Unique identifier of the error analysis run.
  • benchmark_uid (int) – Unique identifier of the benchmark associated with the error analysis run.

Returns

List of clusters identified by the error analysis run.

Return type

List[Cluster]

Raises

  • RuntimeError – If called before the error analysis run is complete.

  • ValueError – If the error analysis run failed or was deleted.

Example

from snorkelai.sdk.develop import Cluster
clusters = Cluster.get_clusters(error_analysis_uid=123, benchmark_uid=456)

update

update(name=None, description=None)

Updates the cluster properties.

Parameters

  • name (Optional[str], default None) – The new name for the cluster.
  • description (Optional[str], default None) – The new description for the cluster.

Raises

ValueError – If the cluster update fails.

Return type

None

Example

from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
cluster.update(name="New cluster name", description="Updated description")

property created_at: datetime

The timestamp when the cluster was created.

property datapoint_count: int

The number of datapoints in the cluster.

property description: str | None

The description of the cluster.

property error_analysis_uid: int

The unique identifier for the associated error analysis run.

property examples: List[str] | None

Example datapoints in the cluster.

property improvement_strategy: str | None

The suggested improvement strategy for the cluster.

property name: str

The name of the cluster.

property uid: int

The unique identifier for the cluster.

property updated_at: datetime

The timestamp when the cluster was last updated.

property virtualized_dataset_uid: int | None

The unique identifier for the virtualized dataset containing the datapoints in the cluster.
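The properties above expose all of a cluster's fields. To log or export them, a minimal sketch (cluster_to_record is a hypothetical helper, not an SDK method) copies the documented properties into a JSON-safe dict, converting the datetime fields to ISO 8601 strings. SimpleNamespace stands in for a real Cluster so the sketch runs without the SDK.

```python
import json
from datetime import datetime
from types import SimpleNamespace
from typing import Any, Dict

# Property names documented for snorkelai.sdk.develop.Cluster.
CLUSTER_FIELDS = (
    "uid",
    "error_analysis_uid",
    "name",
    "description",
    "improvement_strategy",
    "examples",
    "datapoint_count",
    "virtualized_dataset_uid",
    "created_at",
    "updated_at",
)


def cluster_to_record(cluster: Any) -> Dict[str, Any]:
    """Hypothetical helper: snapshot the documented properties into a JSON-safe dict."""
    record: Dict[str, Any] = {}
    for field_name in CLUSTER_FIELDS:
        value = getattr(cluster, field_name)
        # datetime objects are not JSON-serializable, so store ISO 8601 strings.
        record[field_name] = value.isoformat() if isinstance(value, datetime) else value
    return record


if __name__ == "__main__":
    # SimpleNamespace stands in for a Cluster returned by Cluster.get(...).
    demo = SimpleNamespace(
        uid=7, error_analysis_uid=3, name="Date parsing", description=None,
        improvement_strategy=None, examples=["example datapoint"],
        datapoint_count=1, virtualized_dataset_uid=None,
        created_at=datetime(2025, 1, 2, 3, 4, 5),
        updated_at=datetime(2025, 1, 2, 3, 4, 6),
    )
    print(json.dumps(cluster_to_record(demo), indent=2))
```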