snorkelai.sdk.develop.ModelNode
- class snorkelai.sdk.develop.ModelNode(uid, application_uid, config)
Bases: Node
The ModelNode class represents a model node.
- __init__(uid, application_uid, config)
Methods
| Method | Description |
| --- | --- |
| __init__(uid, application_uid, config) | |
| get(node_uid) | Fetches a node by its UID. |
| get_comments([username]) | Retrieve the comments left on the current model node. |
| get_dataframe([split, columns]) | Retrieve the data being passed directly through this node. |
| get_ground_truth([split, user_format]) | Retrieve ground truth data for the current model node. |
| get_lfs() | Retrieve a list of currently active labeling functions for the current model node. |
| get_training_set(training_set_uid[, split, ...]) | Retrieve a training set for this model node, specified by the training set UID. |
Attributes
| Attribute | Description |
| --- | --- |
| application_uid | The unique identifier for the application this node belongs to. |
| config | Returns the detailed configuration information for this node. |
| uid | The unique identifier for this node. |
- get_comments(username=None)
Retrieve the comments left on the current model node. This method returns a Pandas DataFrame whose columns contain the metadata and content of each comment.
Examples
>>> my_node.get_comments()
comment_uid user_uid x_uid body created_at is_edited
7 3 doc::1 hello 2023-09-26T17 False
- get_dataframe(split=None, columns=None)
Retrieve the data being passed directly through this node. Can be filtered by a split or by a subset of columns (useful for large datasets). The data can also optionally include tag and comment metadata.
This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| split | Optional[str] | None | Optionally restrict the data retrieved to a particular split, by default None (i.e., all splits). |
| columns | Optional[List[str]] | None | Optionally restrict the columns returned by this function, by default None. Useful for large datasets to significantly speed up retrieval time. |
Returns
A dataframe of the data being passed directly through this node, optionally filtered by split and/or columns and indexed by x_uid. This DataFrame is the result of all preprocessing in the DAG pipeline up to this point.
Return type
pd.DataFrame
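The split and column restrictions described above can be combined to pull only the data that is needed, one split at a time. The following is a minimal sketch, not part of the SDK: `fetch_by_split` and `_StubNode` are hypothetical names, the column name `text` is made up, and the split names are borrowed from the other methods on this page. The stub stands in for a real ModelNode so that the snippet runs without a Snorkel connection.

```python
from typing import Any, Dict, List, Optional


def fetch_by_split(node: Any, splits: List[str],
                   columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Call node.get_dataframe once per split, requesting only `columns`."""
    return {s: node.get_dataframe(split=s, columns=columns) for s in splits}


class _StubNode:
    """Hypothetical stand-in for a ModelNode (assumption, not the real class)."""

    def get_dataframe(self, split=None, columns=None):
        # A real node would return a pd.DataFrame indexed by x_uid;
        # the stub only echoes the arguments it received.
        return {"split": split, "columns": columns}


# "text" is an invented column name used purely for illustration.
frames = fetch_by_split(_StubNode(), ["train", "valid"], columns=["text"])
```

With a real node, each value in `frames` would be the DataFrame for that split, already transformed by the upstream DAG operators.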
- get_ground_truth(split=None, user_format=False)
Retrieve ground truth data for the current model node. Optionally filter by a particular split.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| split | Optional[str] | None | Which data split to select, by default None (all splits). Can be one of “dev”, “train”, “valid”, or “test”. |
| user_format | bool | False | Whether to return the ground truth in a human-readable format, by default False. |
Returns
A Pandas DataFrame mapping the data index to the ground truth label. If user_format is True, the label column will contain human-readable label names.
Return type
pd.DataFrame
- get_lfs()
Retrieve a list of currently active labeling functions for the current model node.
Examples
>>> my_node.get_lfs()
[
LF(name='LF 1', label=3, templates=[...]),
LF(name='LF 2', label=2, templates=[...]),
LF(name='LF 3', label=1, templates=[...]),
]
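A common follow-up is to check how the active labeling functions are distributed across labels. The sketch below assumes that each returned LF object exposes `name` and `label` attributes, as the printed representation suggests but this page does not confirm. The namedtuple and the list contents are hand-written stand-ins, including the extra entry "LF 4".

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the LF objects shown in the example output.
LF = namedtuple("LF", ["name", "label"])

# Hand-written list; with a real node this would be my_node.get_lfs().
lfs = [LF("LF 1", 3), LF("LF 2", 2), LF("LF 3", 1), LF("LF 4", 3)]

# Count how many active labeling functions vote for each label.
lfs_per_label = Counter(lf.label for lf in lfs)
```

Labels with few or no labeling functions are candidates for additional coverage.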
- get_training_set(training_set_uid, split=None, user_format=True)
Retrieve a training set for this model node, specified by the training set UID. Also allows filtering the training set by a particular data split.
Examples
>>> my_node.get_training_set(1)
training_set_labels training_set_probs
doc::100 stock [0.030..., 0.024...]
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| training_set_uid | int | | A training set UID, which can be found under the “Models” accordion in Developer Studio. |
| split | Optional[str] | None | An optional data split to return predictions for, by default None (all splits). Can be one of “train”, “dev”, “valid”, or “test”. |
| user_format | bool | True | Whether to return the predictions in a human-readable or compressed integer format, by default True (returning a human-readable format). |
Returns
A Pandas DataFrame of training set predictions and probabilities, indexed by x_uid. If user_format is True, the preds column will contain human-readable label names.
Return type
pd.DataFrame
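Training set labels are often compared against ground truth. Note that get_training_set defaults to user_format=True while get_ground_truth defaults to user_format=False, so both should be requested in the same format before comparing. The sketch below works on plain dictionaries keyed by x_uid; `label_agreement` is a hypothetical helper, the data is hand-written, and the column names in the comments (`training_set_labels` from the example output, `label` from the get_ground_truth description) are assumptions about the real frames.

```python
from typing import Dict, Hashable


def label_agreement(train_labels: Dict[Hashable, str],
                    gt_labels: Dict[Hashable, str]) -> float:
    """Fraction of x_uids present in both mappings whose labels match."""
    shared = train_labels.keys() & gt_labels.keys()
    if not shared:
        return 0.0
    matches = sum(train_labels[uid] == gt_labels[uid] for uid in shared)
    return matches / len(shared)


# Hand-written stand-ins. With real data the mappings could come from, e.g.:
#   my_node.get_training_set(1)["training_set_labels"].to_dict()
#   my_node.get_ground_truth(user_format=True)["label"].to_dict()
train = {"doc::100": "stock", "doc::101": "bond", "doc::102": "stock"}
truth = {"doc::100": "stock", "doc::101": "stock", "doc::103": "bond"}

# Only doc::100 and doc::101 appear in both; one of the two labels matches.
score = label_agreement(train, truth)
```

Rows that appear in only one of the two mappings are ignored, which keeps the score meaningful when ground truth covers only part of a split.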