Version: 25.6

snorkelai.sdk.client.nodes

In a data pipeline, nodes are configurable computation steps arranged in a directed acyclic graph (DAG) that organizes and connects the flow of your data. Each node is the site of one operation that transforms data, and the full DAG is made up of the operations at all of its nodes. Nodes also serve as containers that link data sources, processing operators, and other pipeline components together.

A node:

  • Can have an operator committed to it that defines its computation logic.

  • Serves as the point of execution for an operation that transforms data.

  • Can be associated with a specific expected operator type (like Featurizer or Model).

  • Can be connected to other nodes via the DAG with input/output relationships.

  • Acts as a container for configuration, settings, and metadata.

  • Has a unique identifier (node_uid) within the application.
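
The properties above can be pictured with a small, self-contained model. This is only an illustrative sketch, not the Snorkel SDK's own node class: `ToyNode`, its fields other than `node_uid`, and `execution_order` are hypothetical names, and rejecting an operator whose type differs from the expected type is an assumption made for the toy model. It uses Python's standard `graphlib` module to derive an order in which every node runs after its inputs.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a pipeline node (not the SDK's class)."""
    node_uid: int                                   # unique identifier
    expected_op_type: Optional[str] = None          # e.g. "Featurizer" or "Model"
    input_node_uids: list[int] = field(default_factory=list)  # upstream nodes
    operator: Optional[Callable] = None             # committed computation logic
    settings: dict = field(default_factory=dict)    # configuration and metadata

    def commit(self, op_type: str, fn: Callable) -> None:
        # Assumption for this sketch: a mismatched operator type is rejected.
        if self.expected_op_type is not None and op_type != self.expected_op_type:
            raise ValueError(
                f"node {self.node_uid} expects {self.expected_op_type}, got {op_type}"
            )
        self.operator = fn

    def uncommit(self) -> None:
        self.operator = None


def execution_order(nodes: list[ToyNode]) -> list[int]:
    """Return node_uids so that each node comes after all of its inputs."""
    # TopologicalSorter takes a mapping of node -> set of predecessors.
    graph = {n.node_uid: set(n.input_node_uids) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


# A three-node chain declared out of order: 1 -> 2 -> 3.
pipeline = [
    ToyNode(3, "Model", [2]),
    ToyNode(1),
    ToyNode(2, "Featurizer", [1]),
]
print(execution_order(pipeline))
```

Because the graph is acyclic, such an order always exists; `TopologicalSorter` raises `graphlib.CycleError` if the input relationships form a cycle.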

For example, a Featurizer node with an EmbeddingFeaturizer operator converts text to embeddings, adding a new column to your DataFrame.
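
As a rough sketch of what "adding a new column" means, the snippet below applies a featurizer-style operator to a list of row dictionaries, which stands in for a DataFrame. Everything here is hypothetical: `make_featurizer` and `toy_embedding` are illustrative helpers, not SDK functions, and the "embedding" is a two-number placeholder rather than the output of a real embedding model.

```python
from typing import Callable


def toy_embedding(text: str) -> list[float]:
    """Placeholder embedding: [character count, vowel count]."""
    lowered = text.lower()
    vowels = sum(lowered.count(v) for v in "aeiou")
    return [float(len(text)), float(vowels)]


def make_featurizer(input_col: str, output_col: str,
                    fn: Callable[[str], list[float]]) -> Callable:
    """Build an operator that adds output_col computed from input_col."""
    def operator(rows: list[dict]) -> list[dict]:
        # Return new rows so the node's input data is left unchanged.
        return [{**row, output_col: fn(row[input_col])} for row in rows]
    return operator


rows = [{"text": "hello world"}, {"text": "DAG"}]
featurize = make_featurizer("text", "text_embedding", toy_embedding)
out = featurize(rows)
print(out[0])
```

Each output row keeps its original columns and gains `text_embedding`, so downstream nodes can read both the raw text and the derived feature.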

Snorkel represents your data pipeline, including nodes, on the Pipeline (DAG) page in the GUI, making it easy to visualize how data transforms as it moves through your pipeline. You can create and modify nodes using both the GUI and the SDK.

Read more about building data pipelines in our user guide.

Functions

add_active_datasources(node[, ...]): Adds multiple datasources to a node and activates them all.
add_node(application[, input_node_uids, ...]): Adds a new node to an application's data processing pipeline.
add_node_hierarchy(application, ...): Creates a set of nodes for an application in a hierarchical structure, enabling multi-level classification.
commit_builtin_operator(node, op_type[, ...]): Commits a built-in operator to the node.
commit_custom_operator(node, operator_uid): Commits a custom operator to the specified node.
delete_node(node): Deletes the specified node.
fit_and_commit(node_uid, op_type, fit_config): Fits an operator given a node_uid and commits a fitted op_version to the node.
get_model_node(application): Returns the model node for an application, if there is exactly one.
get_model_nodes(application): Returns all model nodes in the application.
get_node(node): Gets detailed information about a specific node in a data pipeline (DAG).
get_node_data(node[, split, data, ...]): Gets a DataFrame from a model node, optionally including annotations and labels.
get_node_datasources(node[, compute_staleness]): Returns the datasources for a given node.
get_node_input_cols(application, node): Returns a list of available columns at the specified node.
get_node_inputs_data(application, node[, ...]): Gets the input DataFrames at a specific node.
get_node_label_map(node_uid): Returns the label map for a given model node.
get_node_output_data(application, node[, ...]): Gets the output DataFrame at a specific node.
get_node_settings(node, setting): Gets the specified node setting.
get_node_uid(application, search_op_type): Returns all nodes in the application with a specified expected_op_type.
get_preprocessing_issues(node): Returns all datapoint UIDs with errors or warnings raised during preprocessing up to (but excluding) the specified node.
list_nodes(application): Gets a list of all nodes in the specified application.
put_node_datasource(node, datasource_uid, ...): Sets the active state of a datasource in a node.
refresh_active_datasources(node): Refreshes the active datasources at a node.
set_node_settings(node, setting, setting_value): Sets the specified node setting.
uncommit_operator(node): Uncommits a committed operator from the node.