snorkelai.sdk.develop.Cluster
- final class snorkelai.sdk.develop.Cluster(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))
Bases: Base
Provides methods for viewing and updating clusters and the ability to view datapoints assigned to a cluster.
Clusters represent groups of similar datapoints identified during error analysis. They help identify common failure patterns in model predictions and provide insights for targeted improvements. Clusters can currently only be created and deleted through the ErrorAnalysis class.
Read more in the Error Analysis Guide.
Using the Cluster class requires the following import:
from snorkelai.sdk.develop import Cluster
- __init__(cluster_uid, error_analysis_uid, name, description=None, improvement_strategy=None, examples=None, datapoint_count=0, virtualized_dataset_uid=None, created_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386394), updated_at=datetime.datetime(2025, 10, 9, 1, 48, 17, 386397))
Initializes a Cluster instance.
Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| `cluster_uid` | `int` | | Unique identifier for the cluster. |
| `error_analysis_uid` | `int` | | Unique identifier for the associated error analysis run. |
| `name` | `str` | | Name of the cluster. |
| `description` | `Optional[str]` | `None` | Description of the cluster. |
| `improvement_strategy` | `Optional[str]` | `None` | Suggested improvement strategy for the cluster. |
| `examples` | `Optional[List[str]]` | `None` | Example datapoints in the cluster. |
| `datapoint_count` | `int` | `0` | Number of datapoints in the cluster. |
| `virtualized_dataset_uid` | `Optional[int]` | `None` | Unique identifier for the virtualized dataset containing the datapoints in the cluster. |
| `created_at` | `Optional[datetime]` | `datetime.datetime(2025, 10, 9, 1, 48, 17, 386394)` | Timestamp when the cluster was created. |
| `updated_at` | `Optional[datetime]` | `datetime.datetime(2025, 10, 9, 1, 48, 17, 386397)` | Timestamp when the cluster was last updated. |
Methods
| Method | Description |
|---|---|
| `__init__(cluster_uid, error_analysis_uid, name)` | Initializes a Cluster instance. |
| `create()` | Creates this cluster. |
| `delete(cluster_uid)` | Deletes this cluster. |
| `get(cluster_uid)` | Retrieves an existing cluster by its unique identifier. |
| `get_cluster_membership()` | Fetches datapoint membership for a specific cluster. |
| `get_clusters(error_analysis_uid, benchmark_uid)` | Fetches clusters from a completed error analysis. |
| `update([name, description])` | Updates the cluster properties. |

Attributes

| Attribute | Description |
|---|---|
| `created_at` | The timestamp when the cluster was created. |
| `datapoint_count` | The number of datapoints in the cluster. |
| `description` | The description of the cluster. |
| `error_analysis_uid` | The unique identifier for the associated error analysis run. |
| `examples` | Example datapoints in the cluster. |
| `improvement_strategy` | The suggested improvement strategy for the cluster. |
| `name` | The name of the cluster. |
| `uid` | The unique identifier for the cluster. |
| `updated_at` | The timestamp when the cluster was last updated. |
| `virtualized_dataset_uid` | The unique identifier for the virtualized dataset containing the datapoints in the cluster. |

- classmethod create()
Creates this cluster.
- classmethod delete(cluster_uid)
Deletes this cluster.
- classmethod get(cluster_uid)
Retrieves an existing cluster by its unique identifier.
Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| `cluster_uid` | `int` | | Unique identifier of the cluster to retrieve. |

Returns
The Cluster instance for the specified cluster.
Return type
Cluster
Raises
ValueError – If no cluster exists with the given ID.
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
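Because `get` raises `ValueError` for an unknown ID, code that looks up clusters from user-supplied IDs may prefer a `None` result over an exception. The following is a minimal sketch that relies only on the documented behaviour. `find_cluster` and `_StubCluster` are hypothetical names that are not part of the SDK; the stub stands in for `Cluster` so the snippet runs without a Snorkel deployment.

```python
from typing import Any, Optional


def find_cluster(cluster_cls: Any, cluster_uid: int) -> Optional[Any]:
    """Return cluster_cls.get(cluster_uid=...), or None if the ID is unknown."""
    try:
        return cluster_cls.get(cluster_uid=cluster_uid)
    except ValueError:
        # Documented: raised when no cluster exists with the given ID.
        return None


class _StubCluster:
    """Hypothetical stand-in for Cluster; only ID 123 'exists'."""

    def __init__(self, uid: int) -> None:
        self.uid = uid

    @classmethod
    def get(cls, cluster_uid: int) -> "_StubCluster":
        if cluster_uid != 123:
            raise ValueError(f"No cluster with ID {cluster_uid}")
        return cls(cluster_uid)


found = find_cluster(_StubCluster, 123)
missing = find_cluster(_StubCluster, 999)
```

With the real SDK, the class itself would be passed in, for example `find_cluster(Cluster, 123)`.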
- get_cluster_membership()
Fetches datapoint membership for a specific cluster.
Returns
DataFrame containing all the datapoints in the cluster.
Return type
pd.DataFrame
Raises
ValueError – If there are no datapoints assigned to the cluster.
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
membership_df = cluster.get_cluster_membership()
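Since `get_cluster_membership` raises `ValueError` when a cluster has no datapoints, a caller that only needs a count may want to treat that case as zero. The sketch below assumes nothing beyond the documented return type and exception. `membership_size` and both stub classes are hypothetical and exist only so the snippet is runnable on its own.

```python
from typing import Any


def membership_size(cluster: Any) -> int:
    """Number of datapoints in the cluster, or 0 if none are assigned."""
    try:
        members = cluster.get_cluster_membership()
    except ValueError:
        # Documented: raised when no datapoints are assigned to the cluster.
        return 0
    # len() of a pandas DataFrame is its row count.
    return len(members)


class _EmptyCluster:
    """Hypothetical stand-in that mimics a cluster with no datapoints."""

    def get_cluster_membership(self):
        raise ValueError("No datapoints assigned to this cluster")


class _SmallCluster:
    """Hypothetical stand-in; a list replaces the DataFrame because only len() is used."""

    def get_cluster_membership(self):
        return [{"id": 0}, {"id": 1}]


empty_size = membership_size(_EmptyCluster())
small_size = membership_size(_SmallCluster())
```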
- classmethod get_clusters(error_analysis_uid, benchmark_uid)
Fetches clusters from a completed error analysis.
Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| `error_analysis_uid` | `int` | | Unique identifier of the error analysis run. |
| `benchmark_uid` | `int` | | Unique identifier of the benchmark associated with the error analysis run. |

Returns
List of clusters.
Return type
List[Cluster]
Raises
RuntimeError – If called before analysis is complete.
ValueError – If analysis failed or was deleted.
Example
from snorkelai.sdk.develop import Cluster
clusters = Cluster.get_clusters(error_analysis_uid=123, benchmark_uid=456)
- update(name=None, description=None)
Updates the cluster properties.
Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| `name` | `Optional[str]` | `None` | The new name for the cluster, by default None. |
| `description` | `Optional[str]` | `None` | The new description for the cluster, by default None. |

Raises
ValueError – If there are other errors during cluster update.
Return type
None
Example
from snorkelai.sdk.develop import Cluster
cluster = Cluster.get(cluster_uid=123)
cluster.update(name="New cluster name", description="Updated description")
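When updates are driven by a script, it can be useful to skip the call if nothing would change. The sketch below assumes that passing `None` for an argument leaves that field untouched, which the `None` defaults suggest but the reference does not state outright. `update_if_changed` and `_StubCluster` are hypothetical, and the stub's behaviour of refreshing its own attributes is an assumption made for the demonstration.

```python
from typing import Any, Dict, List, Optional


def update_if_changed(
    cluster: Any,
    name: Optional[str] = None,
    description: Optional[str] = None,
) -> bool:
    """Call cluster.update() with only the fields that differ; return True if called."""
    changes: Dict[str, str] = {}
    if name is not None and name != cluster.name:
        changes["name"] = name
    if description is not None and description != cluster.description:
        changes["description"] = description
    if not changes:
        return False
    cluster.update(**changes)
    return True


class _StubCluster:
    """Hypothetical stand-in that records update() calls."""

    def __init__(self, name: str, description: Optional[str] = None) -> None:
        self.name = name
        self.description = description
        self.update_calls: List[Dict[str, Optional[str]]] = []

    def update(self, name: Optional[str] = None, description: Optional[str] = None) -> None:
        self.update_calls.append({"name": name, "description": description})
        if name is not None:
            self.name = name
        if description is not None:
            self.description = description


stub = _StubCluster("Timeouts", "Old text")
first = update_if_changed(stub, name="Timeouts", description="New text")
second = update_if_changed(stub, name="Timeouts", description="New text")
```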
- property created_at: datetime
The timestamp when the cluster was created.
- property datapoint_count: int
The number of datapoints in the cluster.
- property description: str | None
The description of the cluster.
- property error_analysis_uid: int
The unique identifier for the associated error analysis run.
- property examples: List[str] | None
Example datapoints in the cluster.
- property improvement_strategy: str | None
The suggested improvement strategy for the cluster.
- property name: str
The name of the cluster.
- property uid: int
The unique identifier for the cluster.
- property updated_at: datetime
The timestamp when the cluster was last updated.
- property virtualized_dataset_uid: int | None
The unique identifier for the virtualized dataset containing the datapoints in the cluster.
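The properties above are enough to build a quick overview of which failure patterns affect the most datapoints. A minimal sketch, using only the documented `name`, `datapoint_count`, and `improvement_strategy` properties; `summarize_clusters` is a hypothetical helper, and `SimpleNamespace` objects with made-up values stand in for real `Cluster` instances.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def summarize_clusters(clusters: Iterable[Any]) -> List[str]:
    """One line per cluster, largest cluster first."""
    ordered = sorted(clusters, key=lambda c: c.datapoint_count, reverse=True)
    lines = []
    for c in ordered:
        # improvement_strategy is documented as str | None.
        strategy = c.improvement_strategy or "no strategy suggested"
        lines.append(f"{c.name} ({c.datapoint_count} datapoints): {strategy}")
    return lines


# Made-up demo data; real input would come from Cluster.get_clusters(...).
demo = [
    SimpleNamespace(name="Short answers", datapoint_count=4, improvement_strategy=None),
    SimpleNamespace(
        name="Date formats", datapoint_count=17, improvement_strategy="Add date examples"
    ),
]
report = summarize_clusters(demo)
```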