snorkelai.sdk.client.nodes
In a data pipeline, nodes are configurable computation steps that organize and connect the flow of your data in a Directed Acyclic Graph (DAG). Each node is the site of an operation that transforms data, and the connected nodes and their operations together make up the full DAG. Nodes serve as containers that link data sources, processing operators, and other pipeline components together.
A node:
Can have an operator committed to it that defines its computation logic.
Serves as the point of execution for an operation that transforms data.
Can be associated with a specific expected operator type (like Featurizer or Model).
Can be connected to other nodes via the DAG with input/output relationships.
Acts as a container for configuration, settings, and metadata.
Has a unique identifier (node_uid) within the application.
For example, a Featurizer node with an EmbeddingFeaturizer operator converts text to embeddings, adding a new column to your DataFrame.
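The node concepts described above can be illustrated with a small, self-contained sketch. It does not use the Snorkel SDK: ToyNode, toy_embedding_featurizer, and run_pipeline are hypothetical names invented for this illustration, a list of dicts stands in for a DataFrame, and the "embedding" is just two simple counts rather than a real model. It only shows the idea of a node holding a committed operator and running after its inputs in DAG order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]


@dataclass
class ToyNode:
    """Hypothetical stand-in for a pipeline node (not the Snorkel SDK class)."""
    node_uid: int
    expected_op_type: Optional[str] = None
    input_node_uids: List[int] = field(default_factory=list)
    operator: Optional[Operator] = None  # the "committed" operator, if any


def toy_embedding_featurizer(rows: List[Row]) -> List[Row]:
    """Toy featurizer: adds an 'embedding' column derived from 'text'."""
    # Assumption: a real featurizer would call an embedding model here.
    out = []
    for row in rows:
        text = str(row["text"])
        out.append({**row, "embedding": [len(text), text.count(" ") + 1]})
    return out


def run_pipeline(nodes: List[ToyNode], source_rows: List[Row]) -> Dict[int, List[Row]]:
    """Run every node after its inputs, following the DAG's topological order."""
    by_uid = {n.node_uid: n for n in nodes}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {n.node_uid: set(n.input_node_uids) for n in nodes}
    outputs: Dict[int, List[Row]] = {}
    for uid in TopologicalSorter(graph).static_order():
        node = by_uid[uid]
        # Nodes without inputs read the source rows; others read their first input.
        rows = outputs[node.input_node_uids[0]] if node.input_node_uids else source_rows
        # A node with no committed operator passes its input through unchanged.
        outputs[uid] = node.operator(rows) if node.operator else rows
    return outputs


nodes = [
    ToyNode(node_uid=1),
    ToyNode(
        node_uid=2,
        expected_op_type="Featurizer",
        input_node_uids=[1],
        operator=toy_embedding_featurizer,
    ),
]
result = run_pipeline(nodes, [{"text": "hello world"}])
print(result[2])
```

In this sketch, node 2 receives the rows produced by node 1 and returns them with one extra column, which mirrors how a Featurizer node adds a column without altering the existing ones.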
Snorkel represents your data pipeline, including nodes, on the Pipeline (DAG) page in the GUI, making it easy to visualize how data transforms as it moves through your pipeline. You can create and modify nodes using both the GUI and the SDK.
Read more about building data pipelines in our user guide:
Applications and operators: Build end-to-end AI applications introduces Snorkel’s approach to data pipelines.
Built-in operators describes available data transformation actions.
Functions
add_active_datasources (node[, ...]) | Adds multiple datasources to a node and activates them all. |
add_node (application[, input_node_uids, ...]) | Adds a new node to an application's data processing pipeline. |
add_node_hierarchy (application, ...) | Creates a set of nodes for an application in a hierarchical structure, enabling multi-level classification. |
commit_builtin_operator (node, op_type[, ...]) | Commits a built-in operator to the node. |
commit_custom_operator (node, operator_uid) | Commits a custom operator to the specified node. |
delete_node (node) | Deletes the specified node. |
fit_and_commit (node_uid, op_type, fit_config) | Fits an operator given a node_uid and commits a fitted op_version to the node. |
get_model_node (application) | Returns the model node for an application, if there is exactly one. |
get_model_nodes (application) | Returns all model nodes in the application. |
get_node (node) | Gets detailed information about a specific node in a data pipeline (DAG). |
get_node_data (node[, split, data, ...]) | Get a dataframe from a model node, optionally including annotations and labels. |
get_node_datasources (node[, compute_staleness]) | Returns the datasources for a given node. |
get_node_input_cols (application, node) | Returns a list of available columns at the specified node. |
get_node_inputs_data (application, node[, ...]) | Gets the input data frames at a specific node. |
get_node_label_map (node_uid) | Returns the label map for a given model node. |
get_node_output_data (application, node[, ...]) | Get the output dataframe at a specific node. |
get_node_settings (node, setting) | Get specified node setting. |
get_node_uid (application, search_op_type) | Returns all nodes in the application with a specified expected_op_type. |
get_preprocessing_issues (node) | Returns all datapoint UIDs with errors or warnings raised during preprocessing up to (but excluding) the node uid. |
list_nodes (application) | Gets a list of all nodes in the specified application. |
put_node_datasource (node, datasource_uid, ...) | Set the active state of a datasource in a node. |
refresh_active_datasources (node) | Refreshes the active data sources at a node. |
set_node_settings (node, setting, setting_value) | Sets the specified node setting. |
uncommit_operator (node) | Uncommits a committed operator from the node. |