Version: 0.91


class snorkelflow.sdk.ModelNode(uid, application_uid, config)

Bases: Node

ModelNode class represents a model node.

__init__(uid, application_uid, config)


Fetches a node by its UID.


get_comments([username])

Retrieve the comments left on the current model node.

get_dataframe([split, columns])

Retrieve the data being passed directly through this node.

get_ground_truth([split, user_format])

Retrieve ground truth data for the current model node.


get_lfs()

Retrieve a list of currently active labeling functions for the current model node.

get_model_preds([model_uid, split, ...])

Retrieve model predictions and probabilities for the current model node and a given model UID.


get_tags([is_context])

Retrieve tags put on the current model node.

get_training_set(training_set_uid[, split, ...])

Retrieve a training set for this model node, specified by the training set UID.



application_uid

The unique identifier for the application this node belongs to.

config

The detailed configuration information for this node.

uid

The unique identifier for this node.


get_comments(username=None)

Retrieve the comments left on the current model node. This method returns a Pandas DataFrame whose columns contain the metadata and content of each comment.


>>> my_node.get_comments()
comment_uid  user_uid  x_uid   body   created_at     is_edited
7            3         doc::1  hello  2023-09-26T17  False

username (Optional[str], default: None) – Optionally, return only a specific user’s comments. By default, returns all comments.


A Pandas DataFrame containing the comments left on the model node.
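As a minimal sketch of working with the returned comments, the helper below groups comment text by the data point it was left on. The function name comment_bodies_by_x_uid is hypothetical and not part of the SDK; it relies only on the x_uid and body columns shown in the example output above, and assumes node is a ModelNode.

```python
from collections import defaultdict


def comment_bodies_by_x_uid(node, username=None):
    """Group comment text by the data point (x_uid) it was left on.

    Illustrative helper, not part of the SDK. `node` is assumed to be a
    ModelNode whose get_comments() result has `x_uid` and `body` columns.
    """
    comments = node.get_comments(username=username)
    grouped = defaultdict(list)
    # Iterating a DataFrame column yields its values row by row.
    for x_uid, body in zip(comments["x_uid"], comments["body"]):
        grouped[x_uid].append(body)
    return dict(grouped)
```

Calling comment_bodies_by_x_uid(my_node, username="some_user") would restrict the grouping to that user's comments; the username value here is a placeholder.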

get_dataframe(split=None, columns=None)

Retrieve the data being passed directly through this node. Can be filtered by a split or by a subset of columns (useful for large datasets). The data can also optionally include tag and comment metadata.

This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.

  • split (Optional[str], default: None) – Optionally restrict the data retrieved to a particular split, by default None (i.e., all splits)

  • columns (Optional[List[str]], default: None) – Optionally restrict the columns returned by this function, by default None. Useful for large datasets to significantly speed up retrieval time


A dataframe of the data being passed directly through this node, optionally filtered by split and/or columns and indexed by x_uid. This DataFrame is the result of all preprocessing in the DAG pipeline up to this point.
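The split and columns filters can be combined. The sketch below fetches the same column subset once per split; fetch_columns_by_split is a hypothetical helper, and the default split names are an assumption borrowed from the splits listed for the other methods on this page.

```python
def fetch_columns_by_split(node, columns, splits=("train", "dev", "valid", "test")):
    """Fetch a subset of columns separately for each split.

    Illustrative helper, not part of the SDK. `node` is assumed to be a
    ModelNode; the default split names are an assumption.
    """
    frames = {}
    for split in splits:
        # Restricting columns keeps retrieval fast on large datasets.
        frames[split] = node.get_dataframe(split=split, columns=list(columns))
    return frames
```

Each value in the returned dictionary is whatever get_dataframe() returns for that split, so it reflects all preprocessing up to this node.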

get_ground_truth(split=None, user_format=False)

Retrieve ground truth data for the current model node. Optionally filter by a particular split.

  • split (Optional[str], default: None) – Which data split to select, by default None (all splits). Can be one of “dev”, “train”, “valid”, or “test”.

  • user_format (bool, default: False) – Whether to return the ground truth in a human-readable format, by default False


A Pandas DataFrame mapping the data index to the ground truth label. If user_format is True, the label column will contain human-readable label names.
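One common use is checking how much ground truth exists in each split. The sketch below counts rows per split; ground_truth_counts is a hypothetical helper, and it assumes each row of the returned DataFrame corresponds to one labeled data point.

```python
def ground_truth_counts(node, splits=("dev", "train", "valid", "test")):
    """Count ground truth rows in each split.

    Illustrative helper, not part of the SDK. `node` is assumed to be a
    ModelNode; len() of a DataFrame is its number of rows.
    """
    return {split: len(node.get_ground_truth(split=split)) for split in splits}
```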

get_lfs()

Retrieve a list of currently active labeling functions for the current model node.


>>> my_node.get_lfs()
LF(name='LF 1', label=3, templates=[...]),
LF(name='LF 2', label=2, templates=[...]),
LF(name='LF 3', label=1, templates=[...]),

A list of all currently active labeling functions for the current model node.
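The returned list can be summarized per label. The sketch below is a hypothetical helper that assumes each LF object exposes name and label attributes, as the printed representation above suggests; that attribute access is an assumption rather than documented behavior.

```python
from collections import defaultdict


def lf_names_by_label(node):
    """Map each label to the names of the active LFs that vote for it.

    Illustrative helper, not part of the SDK. Assumes every object in
    node.get_lfs() has `name` and `label` attributes.
    """
    by_label = defaultdict(list)
    for lf in node.get_lfs():
        by_label[lf.label].append(lf.name)
    return dict(by_label)
```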

get_model_preds(model_uid=None, split=None, is_context=False, user_format=True)

Retrieve model predictions and probabilities for the current model node and a given model UID. If no model UID is provided, the most recent model’s predictions are returned.


>>> my_node.get_model_preds()
          preds  probs
doc::994  0      [0.543..., 0.080..., 0.37...]
doc::999  2      [0.327..., 0.201..., 0.4...]

  • model_uid (Optional[int], default: None) – The UID of a trained model, by default the latest model. All trained models can be seen from the “Models” accordion in Developer Studio.

  • split (Optional[str], default: None) – Optionally filter model predictions by split, by default returns predictions for all splits. Splits can be one of “train”, “dev”, “valid”, or “test”.

  • is_context (bool, default: False) – When True, retrieves predictions at the document level instead of the span level, by default False. Only applicable for information extraction tasks.

  • user_format (bool, default: True) – Whether to return the predictions in a human-readable or compressed integer format, by default True (returning a human-readable format).


A Pandas DataFrame of model predictions and probabilities, indexed by x_uid. If user_format is True, the preds column will contain human-readable label names.
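A quick sanity check on a trained model is the distribution of its predicted labels. The sketch below tallies the preds column shown in the example output; prediction_counts is a hypothetical helper and assumes node is a ModelNode.

```python
from collections import Counter


def prediction_counts(node, model_uid=None, split=None):
    """Tally how often each label is predicted.

    Illustrative helper, not part of the SDK. With model_uid=None the
    most recent model's predictions are used, per the method's docs.
    """
    preds = node.get_model_preds(model_uid=model_uid, split=split)
    # Iterating the `preds` column yields one predicted label per row.
    return Counter(preds["preds"])
```

A heavily skewed tally on the "valid" split, for instance, may be worth comparing against the ground truth distribution for the same split.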

get_tags(is_context=False)

Retrieve tags put on the current model node. For information extraction tasks, this method allows for fine-grained control over whether you want to retrieve tags at the document level or at the span level.


>>> my_node.get_tags()
doc::10005 [loan-err, new_tag1]
doc::10006 [new_tag1]
doc::10198 [Key-EMP-error]
Name: tags, dtype: object

is_context (bool, default: False) – When True, retrieves tags at the document level instead of the span level, by default False. Only applicable for information extraction tasks


A Pandas Series containing the tags put on the model node, indexed by x_uid.
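Because the Series maps each x_uid to its tags, it is often convenient to invert it. The sketch below builds a tag-to-data-points lookup; x_uids_by_tag is a hypothetical helper, and it assumes each Series value is an iterable of tag strings, as the example output suggests.

```python
from collections import defaultdict


def x_uids_by_tag(node, is_context=False):
    """Invert the tags Series into a mapping of tag -> list of x_uids.

    Illustrative helper, not part of the SDK. Assumes each value in the
    Series returned by get_tags() is an iterable of tag strings.
    """
    tags = node.get_tags(is_context=is_context)
    index = defaultdict(list)
    # Series.items() yields (x_uid, tags) pairs.
    for x_uid, tag_list in tags.items():
        for tag in tag_list:
            index[tag].append(x_uid)
    return dict(index)
```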

get_training_set(training_set_uid, split=None, user_format=True)

Retrieve a training set for this model node, specified by the training set UID. Also allows filtering the training set by a particular data split.


>>> my_node.get_training_set(1)
          training_set_labels  training_set_probs
doc::100  stock                [0.030..., 0.024...]

  • training_set_uid (int) – A training set UID, which can be found under the “Models” accordion in Developer Studio.

  • split (Optional[str], default: None) – An optional data split to return predictions for, by default None (all splits). Can be one of “train”, “dev”, “valid”, or “test”.

  • user_format (bool, default: True) – Whether to return the predictions in a human-readable or compressed integer format, by default True (returning a human-readable format).


A Pandas DataFrame of training set labels and probabilities, indexed by x_uid. If user_format is True, the training_set_labels column will contain human-readable label names.
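The probabilities can be used to keep only confidently labeled points. The sketch below is a hypothetical helper built on the training_set_labels and training_set_probs columns from the example output; the 0.9 threshold is an arbitrary illustration, and it assumes each probs value is a sequence of per-class probabilities.

```python
def confident_training_labels(node, training_set_uid, min_prob=0.9, split=None):
    """Return {x_uid: label} for rows whose top probability is at least min_prob.

    Illustrative helper, not part of the SDK. Assumes each value in
    `training_set_probs` is a sequence of per-class probabilities.
    """
    training_set = node.get_training_set(training_set_uid, split=split)
    labels = training_set["training_set_labels"]
    confident = {}
    # Series.items() yields (x_uid, probabilities) pairs.
    for x_uid, probs in training_set["training_set_probs"].items():
        if max(probs) >= min_prob:
            confident[x_uid] = labels[x_uid]
    return confident
```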

