Version: 0.96

snorkelflow.sdk.Node

class snorkelflow.sdk.Node(uid, application_uid, config)

Bases: ABC

A Node object represents an atomic data processing unit in Snorkel Flow.

Nodes Quickstart

from snorkelflow.sdk import Node
# Get a Node object by its UID
my_node = Node.get(123)

# Get a dataframe
df = my_node.get_dataframe()

Nodes Concepts

Nodes

A Node is a unit of data processing in Snorkel Flow. Nodes fall into two broad categories: Operators and Models. Operator nodes apply transformations to the data, such as adding a column, modifying a column’s values, combining multiple dataframes, or filtering rows. Model nodes are the machine learning hubs of the data pipeline, where you can query and fine-tune foundation models, explore your data, and train your own models. Nodes are connected in a particular order, which dictates how your data flows from your source Dataset to your final outputs. The Application DAG (directed acyclic graph) visualizes the order in which the nodes are arranged.
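
The idea of ordered nodes transforming data step by step can be illustrated with a toy model. This is only a sketch in plain Python, not how Snorkel Flow implements its DAG: the operator functions, the rows_at helper, and the use of a simple linear chain instead of a general graph are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rows = List[Row]


# Hypothetical stand-ins for Operator nodes: each takes rows and returns new rows.
def add_length_column(rows: Rows) -> Rows:
    """Operator that adds a column derived from an existing one."""
    return [{**row, "length": len(str(row["text"]))} for row in rows]


def drop_short_rows(rows: Rows) -> Rows:
    """Operator that filters rows by the derived column."""
    return [row for row in rows if row["length"] > 3]


def rows_at(source: Rows, pipeline: List[Callable[[Rows], Rows]], position: int) -> Rows:
    """Return the data as seen after the first `position` operators have run."""
    rows = source
    for operator in pipeline[:position]:
        rows = operator(rows)
    return rows


source_rows = [{"text": "hi"}, {"text": "hello"}]
pipeline = [add_length_column, drop_short_rows]

after_first = rows_at(source_rows, pipeline, 1)   # source plus a "length" column
after_second = rows_at(source_rows, pipeline, 2)  # short rows filtered out
```

Position 0 corresponds to the untouched source data, and each later position adds the effect of one more node, which is the sense in which the order of nodes dictates how data flows.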

Fetching Data in the Notebook

The Node object is the primary way to fetch data in the Snorkel Flow Notebook. Its get_dataframe() method returns a pandas DataFrame corresponding to what the DAG sees when it processes data at that point in the graph. This is useful for debugging and for understanding how your data is transformed throughout the data development process. However, because the Notebook environment often has fewer compute resources than the core Snorkel Flow backend, there are caveats to interacting with your data this way. By default, get_dataframe() returns a maximum of 10 rows of data to prevent the Notebook from running out of memory. This safety latch can be manually overridden, but do so with caution.
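
Because of the default row limit, a small result can be mistaken for the full data. The helper below is a minimal sketch of a guard against that: may_be_truncated and DEFAULT_ROW_CAP are hypothetical names, not part of the SDK, and the value 10 is taken from the description above.

```python
DEFAULT_ROW_CAP = 10  # default limit described above; hypothetical constant name


def may_be_truncated(row_count: int, cap: int = DEFAULT_ROW_CAP) -> bool:
    """Return True when the row count has reached the cap, so more rows may exist."""
    return row_count >= cap
```

In a notebook this might be called as may_be_truncated(len(df)) on the result of my_node.get_dataframe() before drawing conclusions from counts or aggregates.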

Methods

__init__(uid, application_uid, config)
get(node_uid) – Fetches a node by its UID.
get_dataframe() – Retrieve the data being passed directly through this node.

Attributes

application_uid – The unique identifier for the application this node belongs to.
config – Returns the detailed configuration information for this node.
uid – The unique identifier for this node.

classmethod get(node_uid)

Fetches a node by its UID. Returns either a ModelNode or OperatorNode object, which can be used to fetch and manipulate node-level data.

Parameters:

node_uid (int) – The UID for the particular node. You can find this UID by clicking on the node in the DAG view in Developer Studio, or by looking at the node_dag field of the return value of snorkelflow.client.get_application().

Returns:

A Node object that is an instance of either ModelNode or OperatorNode.

Return type:

Node
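
Since get() returns either a ModelNode or an OperatorNode, it can be handy to fetch several nodes and group them by kind. The following is a sketch under assumptions: fetch_nodes_by_kind is a hypothetical helper, the lookup function is passed in as an argument (in a notebook that would presumably be Node.get), and the two classes and fake_get defined here are stand-ins so the example runs without the SDK.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List


def fetch_nodes_by_kind(
    get_node: Callable[[int], Any], node_uids: Iterable[int]
) -> Dict[str, List[Any]]:
    """Fetch each node with `get_node` and group the results by class name."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for node_uid in node_uids:
        node = get_node(node_uid)
        grouped[type(node).__name__].append(node)
    return dict(grouped)


# Stand-in classes for illustration only; the real ones come from the SDK.
class ModelNode:
    def __init__(self, uid: int) -> None:
        self.uid = uid


class OperatorNode:
    def __init__(self, uid: int) -> None:
        self.uid = uid


def fake_get(node_uid: int) -> Any:
    """Stand-in lookup: even UIDs are model nodes, odd UIDs are operator nodes."""
    return ModelNode(node_uid) if node_uid % 2 == 0 else OperatorNode(node_uid)


grouped = fetch_nodes_by_kind(fake_get, [1, 2, 3])
```

Passing the lookup function as a parameter keeps the helper independent of how nodes are fetched, which also makes it easy to exercise outside the Notebook.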

abstract get_dataframe()

Retrieve the data being passed directly through this node.

This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.

Returns:

A DataFrame of the data being passed directly through this node. This DataFrame is the result of all preprocessing in the DAG pipeline up to this point.

Return type:

pd.DataFrame
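
One way to see the difference between the source data and the data at a node is to compare their column names. The helper below is a minimal sketch: compare_columns is a hypothetical name, and it works on any two iterables of column names, for example the .columns of the two DataFrames.

```python
from typing import Dict, Iterable, List


def compare_columns(
    source_columns: Iterable[str], node_columns: Iterable[str]
) -> Dict[str, List[str]]:
    """Report which columns were added or removed between the source and the node."""
    source = list(source_columns)
    node = list(node_columns)
    return {
        "added": [name for name in node if name not in source],
        "removed": [name for name in source if name not in node],
    }


# Example with made-up column names.
diff = compare_columns(["text", "label"], ["text", "label", "length"])
```

Columns listed under "added" were produced by preprocessing steps upstream of the node, while "removed" lists source columns that no longer reach it.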

property application_uid: int

The unique identifier for the application this node belongs to.

property config: Dict[str, Any]

The detailed configuration information for this node.

property uid: int

The unique identifier for this node.
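
The three properties can be combined into a short summary for logging or debugging. This is a sketch that relies only on the documented uid, application_uid, and config properties: describe_node is a hypothetical helper, and FakeNode with its config key is a stand-in used so the example runs without the SDK.

```python
from typing import Any, Dict


def describe_node(node: Any) -> str:
    """Build a one-line summary from a node's uid, application_uid, and config keys."""
    config_keys = sorted(node.config)
    return (
        f"{type(node).__name__} uid={node.uid} "
        f"application_uid={node.application_uid} config_keys={config_keys}"
    )


class FakeNode:
    """Stand-in exposing the same three properties; the config key is made up."""

    uid = 123
    application_uid = 7
    config: Dict[str, Any] = {"example_key": "example_value"}


summary = describe_node(FakeNode())
```

Only the config keys are printed because the contents of config vary by node and may be large.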