snorkelflow.sdk.Node
- class snorkelflow.sdk.Node(uid, application_uid, config)
Bases: ABC
The Node object represents atomic data processing units in Snorkel Flow.
Nodes Quickstart
from snorkelflow.sdk import Node
# Get a Node object by its UID
my_node = Node.get(123)
# Get a dataframe
df = my_node.get_dataframe()
Nodes Concepts
Nodes
A Node is a unit of data processing in Snorkel Flow. Nodes fall into two broad categories: Operators and Models. Operator nodes apply transformations to the data, such as adding a column, modifying a column’s values, combining multiple DataFrames, or filtering rows. Model nodes are the machine learning hubs of the data pipeline, where you can query and fine-tune foundation models, explore your data, and train your own models. Nodes are stitched together in a particular order, which dictates how your data flows from your source Dataset all the way to your final outputs. The Application DAG (directed acyclic graph) visualizes the order in which the nodes are organized.
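To make the idea of ordered nodes concrete, here is a toy, pure-Python sketch; it is not the Snorkel Flow API. Each hypothetical operator (add_length, drop_short, lowercase) takes the rows produced by its upstream node and returns new rows, and Python's graphlib.TopologicalSorter (Python 3.9+) derives the execution order from the edges. In this analogy, each per-node entry in the returned dictionary plays roughly the role of Node.get_dataframe() for that node, and the "dataset" entry plays the role of the source data.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]
Table = List[Row]


def add_length(rows: Table) -> Table:
    # "Adding a column": derive a new field from an existing one.
    return [{**row, "length": len(str(row["text"]))} for row in rows]


def drop_short(rows: Table) -> Table:
    # "Filtering rows": keep only rows with more than 3 characters of text.
    return [row for row in rows if row["length"] > 3]


def lowercase(rows: Table) -> Table:
    # "Modifying a column's values": normalize the text column.
    return [{**row, "text": str(row["text"]).lower()} for row in rows]


# Hypothetical operators and DAG: each node maps to the set of upstream nodes it reads.
# "dataset" is the source and applies no transformation of its own.
OPERATORS: Dict[str, Callable[[Table], Table]] = {
    "add_length": add_length,
    "drop_short": drop_short,
    "lowercase": lowercase,
}
DAG: Dict[str, Set[str]] = {
    "add_length": {"dataset"},
    "drop_short": {"add_length"},
    "lowercase": {"drop_short"},
}


def run_pipeline(source: Table) -> Dict[str, Table]:
    """Execute nodes in topological order, recording the output of every node."""
    outputs: Dict[str, Table] = {"dataset": source}
    for name in TopologicalSorter(DAG).static_order():
        if name == "dataset":
            continue  # the source rows are stored as-is
        (upstream,) = DAG[name]  # every node in this sketch has exactly one input
        outputs[name] = OPERATORS[name](outputs[upstream])
    return outputs


results = run_pipeline([{"text": "Hi"}, {"text": "Hello World"}])
print(results["dataset"])    # the untransformed source rows
print(results["lowercase"])  # rows after every upstream operator has run
```

Because each operator in this sketch builds new dictionaries instead of mutating its input, the source rows stay unchanged while every downstream node sees the accumulated transformations.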
Fetching Data in the Notebook
The Node object is the primary way to fetch data in the Snorkel Flow Notebook. Its get_dataframe() method returns a Pandas DataFrame containing the data the DAG sees when it processes data at that point in the graph. This is useful for debugging and for understanding how your data is transformed throughout the data development process. However, because the Notebook environment often has fewer compute resources than the core Snorkel Flow backend, keep a few caveats in mind when interacting with your data this way. By default, get_dataframe() returns at most 10 rows of data to prevent the Notebook from running out of memory. You can override this safety latch manually, but do so with caution.
- __init__(uid, application_uid, config)
Methods
__init__(uid, application_uid, config)
get(node_uid): Fetches a node by its UID.
get_dataframe(): Retrieve the data being passed directly through this node.
Attributes
application_uid: The unique identifier for the application this node belongs to
config: Returns the detailed configuration information for this node
uid: The unique identifier for this node
- classmethod get(node_uid)
Fetches a node by its UID. Returns either a ModelNode or OperatorNode object, which can be used to fetch and manipulate node-level data.
- Parameters:
node_uid (int) – The UID for the particular node. You can find this UID by clicking on the node in the DAG view in Developer Studio, or by looking at the node_dag field of the return value of snorkelflow.client.get_application().
- Returns:
A Node object, either an instance of ModelNode or OperatorNode.
- Return type:
Node
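The dispatch that get() describes, where one entry point returns either a ModelNode or an OperatorNode, can be illustrated with a small, self-contained toy. Everything below is hypothetical: the in-memory _FAKE_BACKEND dictionary, its "kind" field, and the placeholder get_dataframe() bodies stand in for backend calls this page does not document. The sketch only shows how an abstract base class with an abstract get_dataframe() and a factory classmethod fit together.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical stand-in for the backend: node UID -> stored record.
# The "kind" field and its values are invented for this sketch.
_FAKE_BACKEND: Dict[int, Dict[str, Any]] = {
    123: {"application_uid": 7, "kind": "operator", "config": {"op_type": "Filter"}},
    456: {"application_uid": 7, "kind": "model", "config": {"name": "my_model"}},
}


class Node(ABC):
    """Toy base class mirroring the documented uid/application_uid/config interface."""

    def __init__(self, uid: int, application_uid: int, config: Dict[str, Any]) -> None:
        self._uid = uid
        self._application_uid = application_uid
        self._config = config

    @property
    def uid(self) -> int:
        return self._uid

    @property
    def application_uid(self) -> int:
        return self._application_uid

    @property
    def config(self) -> Dict[str, Any]:
        return self._config

    @classmethod
    def get(cls, node_uid: int) -> "Node":
        """Look up a node and return the concrete subclass that matches its kind."""
        record = _FAKE_BACKEND[node_uid]  # raises KeyError for an unknown UID
        subclass = OperatorNode if record["kind"] == "operator" else ModelNode
        return subclass(node_uid, record["application_uid"], record["config"])

    @abstractmethod
    def get_dataframe(self) -> List[Dict[str, Any]]:
        """Each concrete node type decides how its data is fetched."""


class OperatorNode(Node):
    def get_dataframe(self) -> List[Dict[str, Any]]:
        return []  # placeholder: a real implementation would query the backend


class ModelNode(Node):
    def get_dataframe(self) -> List[Dict[str, Any]]:
        return []  # placeholder: a real implementation would query the backend


node = Node.get(123)
print(type(node).__name__, node.uid, node.application_uid)  # OperatorNode 123 7
```

Because get_dataframe() is abstract, calling Node(...) directly raises TypeError; only the concrete subclasses can be instantiated, which is why get() hands back one of them.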
- abstract get_dataframe()
Retrieve the data being passed directly through this node.
This DataFrame is not the same as the DataFrame returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the DataFrame returned by Node.get_dataframe() has also undergone all preprocessing and DAG transformations up to this point in the processing pipeline.
- Returns:
A DataFrame of the data being passed directly through this node. This DataFrame is the result of all preprocessing in the DAG pipeline up to this point.
- Return type:
pd.DataFrame
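When debugging, one quick way to see what upstream operators changed is to compare the columns of the source data with those of a node's data. The helper below is a hypothetical sketch, not part of the SDK. It accepts any iterables of column names, so in a notebook you could pass the .columns of a DataFrame from Dataset.get_dataframe() and of one from Node.get_dataframe() (the variable names in the comment are illustrative). It only reports column-level changes; row filtering or value edits will not show up.

```python
from typing import Iterable, List, Tuple


def describe_schema_change(
    source_columns: Iterable[str], node_columns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (added, removed) column names between the source data and a node's data.

    Additions follow the order of node_columns and removals follow the order of
    source_columns, so the result is deterministic.
    """
    source = list(source_columns)
    node = list(node_columns)
    source_set, node_set = set(source), set(node)
    added = [col for col in node if col not in source_set]
    removed = [col for col in source if col not in node_set]
    return added, removed


# Hypothetical notebook usage:
#   added, removed = describe_schema_change(dataset_df.columns, node_df.columns)
added, removed = describe_schema_change(["text", "label"], ["text", "label", "length"])
print(added, removed)  # ['length'] []
```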
- property application_uid: int
The unique identifier for the application this node belongs to
- property config: Dict[str, Any]
Returns the detailed configuration information for this node
- property uid: int
The unique identifier for this node
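The structure of config is not documented on this page, so a generic way to inspect it is to pretty-print it. This sketch assumes the value is a JSON-like dictionary; the example keys are made up, and default=str is a fallback that stringifies values the json module cannot serialize instead of raising an error.

```python
import json
from typing import Any, Dict


def format_config(config: Dict[str, Any]) -> str:
    """Render a config dictionary as indented, key-sorted JSON for easy reading.

    default=str converts values json cannot serialize natively (for example sets
    or datetimes) into strings instead of raising TypeError.
    """
    return json.dumps(config, indent=2, sort_keys=True, default=str)


# Hypothetical config contents; real keys depend on the node type.
example_config = {"op_type": "Filter", "op_config": {"field": "text", "min_length": 4}}
print(format_config(example_config))
```

In a notebook, the same helper could be applied to a fetched node, for example print(format_config(my_node.config)).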