snorkelai.sdk.develop.Cluster
- final class snorkelai.sdk.develop.Cluster(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))
Bases: Base
Provides methods for viewing and updating clusters and for viewing the datapoints assigned to a cluster.
Clusters represent groups of similar datapoints identified during error analysis. They help identify common failure patterns in model predictions and provide insights for targeted improvements. Clusters can currently only be created and deleted through the ErrorAnalysis class.
Read more in the Error Analysis Guide.
Using the Cluster class requires the following import:
from snorkelai.sdk.develop import Cluster
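As a minimal sketch of how the attributes documented on this page might be used to prioritize failure patterns, the helper below ranks clusters by `datapoint_count`. It relies only on the `name`, `datapoint_count`, and `improvement_strategy` attributes. `StandInCluster` and `rank_clusters` are hypothetical names introduced here so the snippet runs without a Snorkel connection; they are not part of the SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class StandInCluster:
    """Hypothetical stand-in exposing a few of the documented Cluster attributes."""

    name: str
    datapoint_count: int = 0
    improvement_strategy: Optional[str] = None


def rank_clusters(clusters: Iterable) -> List:
    """Return clusters ordered from most to fewest datapoints."""
    return sorted(clusters, key=lambda c: c.datapoint_count, reverse=True)


# Illustrative data only; real objects would come from Cluster.get_clusters(...).
demo = [
    StandInCluster("Truncated answers", 12, "Raise the output token limit"),
    StandInCluster("Wrong units", 40),
    StandInCluster("Off-topic replies", 7),
]
ranked = rank_clusters(demo)
for c in ranked:
    strategy = c.improvement_strategy or "no strategy suggested"
    print(f"{c.datapoint_count:>4}  {c.name}  ({strategy})")
```

Because the helper only reads attributes, the same function should work on the list returned by `Cluster.get_clusters`, assuming those objects expose the properties listed in the Attributes section.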
- __init__(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))
Initializes a Cluster instance.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| cluster_uid | int | | Unique identifier for the cluster. |
| error_analysis_uid | int | | Unique identifier for the associated error analysis run. |
| name | str | | Name of the cluster. |
| description | Optional[str] | None | Description of the cluster. |
| improvement_strategy | Optional[str] | None | Suggested improvement strategy for the cluster. |
| examples | Optional[List[str]] | None | Example datapoints in the cluster. |
| datapoint_count | int | 0 | Number of datapoints in the cluster. |
| virtualized_dataset_uid | Optional[int] | None | Unique identifier for the virtualized dataset containing the datapoints in the cluster. |
| created_at | Optional[datetime] | datetime.datetime(2025, 10, 9, 1, 48, 17, 386394) | Timestamp when the cluster was created. |
| updated_at | Optional[datetime] | datetime.datetime(2025, 10, 9, 1, 48, 17, 386397) | Timestamp when the cluster was last updated. |
Methods
| Method | Description |
| --- | --- |
| __init__(cluster_uid, error_analysis_uid, name) | Initializes a Cluster instance. |
| create() | Creates this cluster. |
| delete(cluster_uid) | Deletes this cluster. |
| get(cluster_uid) | Retrieves an existing cluster by its unique identifier. |
| get_cluster_membership() | Fetches datapoint membership for a specific cluster. |
| get_clusters(error_analysis_uid, benchmark_uid) | Fetches clusters from a completed error analysis. |
| update([name, description]) | Updates the cluster properties. |
Attributes
| Attribute | Description |
| --- | --- |
| created_at | The timestamp when the cluster was created. |
| datapoint_count | The number of datapoints in the cluster. |
| description | The description of the cluster. |
| error_analysis_uid | The unique identifier for the associated error analysis run. |
| examples | Example datapoints in the cluster. |
| improvement_strategy | The suggested improvement strategy for the cluster. |
| name | The name of the cluster. |
| uid | The unique identifier for the cluster. |
| updated_at | The timestamp when the cluster was last updated. |
| virtualized_dataset_uid | The unique identifier for the virtualized dataset containing the datapoints in the cluster. |
- classmethod create()
Creates this cluster.
- classmethod delete(cluster_uid)
Deletes this cluster.
- classmethod get(cluster_uid)
Retrieves an existing cluster by its unique identifier.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| cluster_uid | int | | Unique identifier of the cluster to retrieve. |
Returns
The Cluster instance for the specified cluster.
Return type
Cluster
Raises
ValueError – If no cluster exists with the given ID.
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
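Since `get` is documented to raise `ValueError` when no cluster exists with the given ID, callers that treat a missing cluster as a normal outcome may want to convert that exception into `None`. The following is a minimal sketch of that pattern, not SDK behavior: `get_or_none` and `fake_get` are hypothetical names, and `fake_get` only imitates the documented contract so the snippet runs offline.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_or_none(getter: Callable[..., T], cluster_uid: int) -> Optional[T]:
    """Call getter(cluster_uid=...) and return None if it raises ValueError."""
    try:
        return getter(cluster_uid=cluster_uid)
    except ValueError:
        return None


# Hypothetical stand-in for Cluster.get: raises ValueError for unknown IDs.
def fake_get(cluster_uid: int) -> str:
    known = {123: "cluster-123"}
    if cluster_uid not in known:
        raise ValueError(f"No cluster exists with ID {cluster_uid}")
    return known[cluster_uid]


found = get_or_none(fake_get, 123)
missing = get_or_none(fake_get, 999)
print(found, missing)
```

With the SDK available, the equivalent call would be `get_or_none(Cluster.get, 123)`.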
- get_cluster_membership()
Fetches datapoint membership for a specific cluster.
Returns
DataFrame containing all the datapoints in the cluster.
Return type
pd.DataFrame
Raises
ValueError – If there are no datapoints assigned to the cluster.
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
membership_df = cluster.get_cluster_membership()
- classmethod get_clusters(error_analysis_uid, benchmark_uid)
Fetches clusters from a completed error analysis.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| error_analysis_uid | int | | Unique identifier of the error analysis run. |
| benchmark_uid | int | | Unique identifier of the benchmark associated with the error analysis run. |
Returns
List of clusters.
Return type
List[Cluster]
Raises
RuntimeError – If called before analysis is complete.
ValueError – If analysis failed or was deleted.
Example
from snorkelai.sdk.develop import Cluster
clusters = Cluster.get_clusters(error_analysis_uid=123, benchmark_uid=456)
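Because `get_clusters` is documented to raise `RuntimeError` when called before the analysis is complete, one possible way to wait for results is a bounded retry loop. The sketch below is an assumption about how a caller might handle this, not an SDK feature: `wait_for_clusters`, its `attempts` and `delay_s` parameters, and the `fake_fetch` stand-in are all hypothetical. `ValueError` is deliberately not caught, since a failed or deleted analysis will not recover on retry.

```python
import time
from typing import Callable, List


def wait_for_clusters(
    fetch: Callable[[], List], attempts: int = 5, delay_s: float = 2.0
) -> List:
    """Call fetch() until it stops raising RuntimeError or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except RuntimeError:
            # Analysis not complete yet: re-raise on the final attempt.
            if attempt == attempts:
                raise
            time.sleep(delay_s)
    raise AssertionError("unreachable")


# Hypothetical stand-in for Cluster.get_clusters: succeeds on the third call.
calls = {"n": 0}


def fake_fetch() -> List[str]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("analysis is not complete")
    return ["cluster-a", "cluster-b"]


result = wait_for_clusters(fake_fetch, attempts=5, delay_s=0.0)
print(result, calls["n"])
```

With the SDK available, `fetch` could be `lambda: Cluster.get_clusters(error_analysis_uid=123, benchmark_uid=456)`.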
- update(name=None, description=None)
Updates the cluster properties.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| name | Optional[str] | None | The new name for the cluster. |
| description | Optional[str] | None | The new description for the cluster. |
Return type
None
Raises
ValueError – If an error occurs during the cluster update.
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
cluster.update(name="New cluster name", description="Updated description")
- property created_at: datetime
The timestamp when the cluster was created.
- property datapoint_count: int
The number of datapoints in the cluster.
- property description: str | None
The description of the cluster.
- property error_analysis_uid: int
The unique identifier for the associated error analysis run.
- property examples: List[str] | None
Example datapoints in the cluster.
- property improvement_strategy: str | None
The suggested improvement strategy for the cluster.
- property name: str
The name of the cluster.
- property uid: int
The unique identifier for the cluster.
- property updated_at: datetime
The timestamp when the cluster was last updated.
- property virtualized_dataset_uid: int | None
The unique identifier for the virtualized dataset containing the datapoints in the cluster.