snorkelflow.sdk.ModelNode
- class snorkelflow.sdk.ModelNode(uid, application_uid, config)
Bases: Node
ModelNode class represents a model node.
- __init__(uid, application_uid, config)
Methods
__init__(uid, application_uid, config)
get(node_uid): Fetches a node by its UID.
get_comments([username]): Retrieve the comments left on the current model node.
get_dataframe([split, columns]): Retrieve the data being passed directly through this node.
get_ground_truth([split, user_format]): Retrieve ground truth data for the current model node.
get_lfs(): Retrieve a list of currently active labeling functions for the current model node.
get_model_preds([model_uid, split, ...]): Retrieve model predictions and probabilities for the current model node and a given model UID.
get_tags([is_context]): Retrieve tags put on the current model node.
get_training_set(training_set_uid[, split, ...]): Retrieve a training set for this model node, specified by the training set UID.
Attributes
application_uid
The unique identifier for the application this node belongs to
config
Returns the detailed configuration information for this node
uid
The unique identifier for this node
- get_comments(username=None)
Retrieve the comments left on the current model node. This method returns a Pandas DataFrame whose columns contain the metadata and content of each comment.
Examples
>>> my_node.get_comments()
comment_uid user_uid x_uid body created_at is_edited
7 3 doc::1 hello 2023-09-26T17 False
- Parameters:
username (Optional[str], default: None) – Optionally, return only a specific user’s comments. By default, returns all comments.
- Returns:
A Pandas DataFrame containing the comments left on the model node.
- Return type:
pd.DataFrame
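The columns shown in the example output can be used to organize comments per data point. The following is a minimal sketch, not part of the SDK: `comments_by_datapoint` is a hypothetical helper that works on plain row dictionaries, which a DataFrame can produce with pandas' `to_dict("records")`. The second and third sample rows are invented for illustration.

```python
from collections import defaultdict


def comments_by_datapoint(records):
    """Group comment bodies by the x_uid of the data point they were left on."""
    grouped = defaultdict(list)
    for row in records:
        grouped[row["x_uid"]].append(row["body"])
    return dict(grouped)


# Stand-in rows using the columns from the example output above. Rows 8 and 9
# are invented. With the SDK, the rows could come from:
#   my_node.get_comments().to_dict("records")
sample_records = [
    {"comment_uid": 7, "user_uid": 3, "x_uid": "doc::1", "body": "hello",
     "created_at": "2023-09-26T17", "is_edited": False},
    {"comment_uid": 8, "user_uid": 3, "x_uid": "doc::1", "body": "needs review",
     "created_at": "2023-09-26T18", "is_edited": False},
    {"comment_uid": 9, "user_uid": 5, "x_uid": "doc::2", "body": "ok",
     "created_at": "2023-09-27T09", "is_edited": True},
]

grouped = comments_by_datapoint(sample_records)
print(grouped)
```

Working on row dictionaries keeps the helper independent of pandas, so it can be exercised without a live node.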
- get_dataframe(split=None, columns=None)
Retrieve the data being passed directly through this node. Can be filtered by a split or by a subset of columns (useful for large datasets). The data can also optionally include tag and comment metadata.
This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.
- Parameters:
split (Optional[str], default: None) – Optionally restrict the data retrieved to a particular split, by default None (i.e., all splits).
columns (Optional[List[str]], default: None) – Optionally restrict the columns returned by this function, by default None. Useful for large datasets to significantly speed up retrieval time.
- Returns:
A dataframe of the data being passed directly through this node, optionally filtered by split and/or columns and indexed by x_uid. This DataFrame is the result of all preprocessing in the DAG pipeline up to this point.
- Return type:
pd.DataFrame
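As a rough illustration of the `split` and `columns` parameters, the sketch below counts rows per split by calling `get_dataframe` once for each split. `rows_per_split` and `FakeNode` are hypothetical names introduced here. `FakeNode` only mimics the documented signature so the example runs without the SDK, and the split names are assumed to be the same ones listed for `get_ground_truth`.

```python
def rows_per_split(node, splits=("dev", "train", "valid", "test"), columns=None):
    """Return {split: row count}, fetching each split separately."""
    return {
        split: len(node.get_dataframe(split=split, columns=columns))
        for split in splits
    }


class FakeNode:
    """Hypothetical stand-in for a ModelNode; not part of the SDK."""

    def __init__(self, rows_by_split):
        self._rows_by_split = rows_by_split

    def get_dataframe(self, split=None, columns=None):
        # Mirrors the documented signature but returns a list of dicts
        # instead of a DataFrame. len() reports the row count for both.
        rows = self._rows_by_split.get(split, [])
        if columns is not None:
            rows = [{c: row[c] for c in columns} for row in rows]
        return rows


fake = FakeNode({
    "dev": [{"text": "a", "n": 1}],
    "train": [{"text": "b", "n": 2}, {"text": "c", "n": 3}],
})
sizes = rows_per_split(fake, columns=["text"])
print(sizes)
```

With a real node, passing a short `columns` list should keep each fetch small, in line with the note above about large datasets.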
- get_ground_truth(split=None, user_format=False)
Retrieve ground truth data for the current model node. Optionally filter by a particular split.
- Parameters:
split (Optional[str], default: None) – Which data split to select, by default None (all splits). Can be one of “dev”, “train”, “valid”, or “test”.
user_format (bool, default: False) – Whether to return the ground truth in a human-readable format, by default False.
- Returns:
A Pandas DataFrame mapping the data index to the ground truth label. If user_format is True, the label column will contain human-readable label names.
- Return type:
pd.DataFrame
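A common follow-up is checking class balance in the ground truth. The sketch below assumes the labels are taken from the `label` column mentioned above. `label_distribution` is a hypothetical helper, and the sample labels are invented. It accepts any iterable of labels, so a pandas Series or a plain list both work.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: fraction of rows} for an iterable of ground truth labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Invented labels standing in for the `label` column. With the SDK, this
# could be: my_node.get_ground_truth(split="dev", user_format=True)["label"]
sample_labels = ["stock", "loan", "stock", "stock"]

distribution = label_distribution(sample_labels)
print(distribution)
```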
- get_lfs()
Retrieve a list of currently active labeling functions for the current model node.
Examples
>>> my_node.get_lfs()
[
LF(name='LF 1', label=3, templates=[...]),
LF(name='LF 2', label=2, templates=[...]),
LF(name='LF 3', label=1, templates=[...]),
]
- Returns:
A list of all currently active labeling functions for the current model node.
- Return type:
List[LF]
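The example output suggests each labeling function carries a name and a label. Assuming these are readable as `name` and `label` attributes (an assumption based only on the printed representation above), the active labeling functions can be grouped by the label they vote for. `FakeLF` and `lf_names_by_label` are hypothetical names used so the sketch runs without the SDK.

```python
from collections import defaultdict, namedtuple


def lf_names_by_label(lfs):
    """Group labeling function names by the label each one assigns."""
    grouped = defaultdict(list)
    for lf in lfs:
        grouped[lf.label].append(lf.name)
    return dict(grouped)


# Hypothetical stand-in for the SDK's LF objects, exposing only the fields
# visible in the example output. "LF 4" is invented for illustration.
FakeLF = namedtuple("FakeLF", ["name", "label", "templates"])
sample_lfs = [
    FakeLF("LF 1", 3, []),
    FakeLF("LF 2", 2, []),
    FakeLF("LF 3", 1, []),
    FakeLF("LF 4", 3, []),
]

by_label = lf_names_by_label(sample_lfs)
print(by_label)
# With the SDK (assuming the attributes exist): lf_names_by_label(my_node.get_lfs())
```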
- get_model_preds(model_uid=None, split=None, is_context=False, user_format=True)
Retrieve model predictions and probabilities for the current model node and a given model UID. If no model UID is provided, the most recent model’s predictions are returned.
Examples
>>> my_node.get_model_preds()
preds probs
x_uid
doc::994 0 [0.543..., 0.080..., 0.37...]
doc::999 2 [0.327..., 0.201..., 0.4...]
- Parameters:
model_uid (Optional[int], default: None) – The UID of a trained model, by default the latest model. All trained models can be seen from the “Models” accordion in Developer Studio.
split (Optional[str], default: None) – Optionally filter model predictions by split, by default returns predictions for all splits. Splits can be one of “train”, “dev”, “valid”, or “test”.
is_context (bool, default: False) – When True, retrieves predictions at the document level instead of the span level, by default False. Only applicable for information extraction tasks.
user_format (bool, default: True) – Whether to return the predictions in a human-readable or compressed integer format, by default True (returning a human-readable format).
- Returns:
A Pandas DataFrame of model predictions and probabilities, indexed by x_uid. If user_format is True, the preds column will contain human-readable label names.
- Return type:
pd.DataFrame
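The `probs` column can be used to surface predictions the model is unsure about. The sketch below assumes each `probs` entry is a sequence of per-class probabilities, as the example output suggests. `low_confidence` is a hypothetical helper, and the sample values are invented. It only needs an object with `.items()`, which both a dict and a pandas Series provide.

```python
def low_confidence(probs_by_uid, threshold=0.5):
    """Return the x_uids whose highest class probability is below threshold."""
    return [
        x_uid
        for x_uid, probs in probs_by_uid.items()
        if max(probs) < threshold
    ]


# Invented probabilities keyed by x_uid. With the SDK, this could be:
#   my_node.get_model_preds()["probs"]
sample_probs = {
    "doc::1": [0.90, 0.05, 0.05],
    "doc::2": [0.40, 0.35, 0.25],
}

uncertain = low_confidence(sample_probs, threshold=0.5)
print(uncertain)
```

The threshold of 0.5 is arbitrary here and would normally be tuned to the number of classes and the task.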
- get_tags(is_context=False)
Retrieve tags put on the current model node. For information extraction tasks, this method allows for fine-grained control over whether you want to retrieve tags at the document level or at the span level.
Examples
>>> my_node.get_tags()
x_uid
doc::10005 [loan-err, new_tag1]
doc::10006 [new_tag1]
doc::10198 [Key-EMP-error]
Name: tags, dtype: object
- Parameters:
is_context (bool, default: False) – When True, retrieves tags at the document level instead of the span level, by default False. Only applicable for information extraction tasks.
- Returns:
A Pandas Series containing the tags put on the model node, indexed by x_uid.
- Return type:
pd.Series
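Because the result maps each x_uid to its tags, it is sometimes handy to invert it and look up data points by tag. The sketch below assumes each entry is a list of tag strings, as the example output suggests. `datapoints_by_tag` is a hypothetical helper that relies only on `.items()`, so it accepts the returned Series or a plain dict such as the stand-in used here.

```python
from collections import defaultdict


def datapoints_by_tag(tags):
    """Invert an x_uid -> tags mapping into tag -> list of x_uids."""
    index = defaultdict(list)
    for x_uid, tag_list in tags.items():
        for tag in tag_list:
            index[tag].append(x_uid)
    return dict(index)


# Stand-in built from the example output above. With the SDK, this could be:
#   datapoints_by_tag(my_node.get_tags())
sample_tags = {
    "doc::10005": ["loan-err", "new_tag1"],
    "doc::10006": ["new_tag1"],
    "doc::10198": ["Key-EMP-error"],
}

tag_index = datapoints_by_tag(sample_tags)
print(tag_index["new_tag1"])
```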
- get_training_set(training_set_uid, split=None, user_format=True)
Retrieve a training set for this model node, specified by the training set UID. Also allows for filtering the training set by a particular data split.
Examples
>>> my_node.get_training_set(1)
training_set_labels training_set_probs
doc::100 stock [0.030..., 0.024...]
- Parameters:
training_set_uid (int) – A training set UID, which can be found under the “Models” accordion in Developer Studio.
split (Optional[str], default: None) – An optional data split to return predictions for, by default None (all splits). Can be one of “train”, “dev”, “valid”, or “test”.
user_format (bool, default: True) – Whether to return the predictions in a human-readable or compressed integer format, by default True (returning a human-readable format).
- Returns:
A Pandas DataFrame of training set predictions and probabilities, indexed by x_uid. If user_format is True, the preds column will contain human-readable label names.
- Return type:
pd.DataFrame