Version: 0.91

snorkelflow.sdk.OperatorNode

class snorkelflow.sdk.OperatorNode(uid, application_uid, config)

Bases: Node

The OperatorNode class represents a non-model operator node.

__init__(uid, application_uid, config)

Methods

__init__(uid, application_uid, config)

get(node_uid)

Fetches a node by its UID.

get_dataframe([max_input_rows, ...])

Retrieve the data being passed directly through this node.

Attributes

application_uid

The unique identifier for the application this node belongs to

config

Returns the detailed configuration information for this node

uid

The unique identifier for this node

get_dataframe(max_input_rows=10, datasource_uids=None, partition=None)

Retrieve the data being passed directly through this node. By default, this method processes at most 10 rows of data to prevent the notebook from running out of memory. To override this limit, set max_input_rows to a higher value.

This dataframe is not the same as the one returned by Dataset.get_dataframe(). Dataset.get_dataframe() returns the raw source data, whereas the dataframe returned by Node.get_dataframe() has also been passed through every preprocessing (DAG) transformation upstream of this node in the processing pipeline.
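The difference between the source data and a node's view of it can be illustrated without the SDK. The sketch below is a conceptual model only, not how Snorkel Flow implements get_dataframe(). It represents rows as plain dictionaries and upstream operators as functions. The names `node_view`, `lowercase_text` and `add_length` are hypothetical. It takes the first `max_input_rows` source rows and applies each upstream operator in order.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Operator = Callable[[List[Row]], List[Row]]


def node_view(source: List[Row], upstream_ops: List[Operator],
              max_input_rows: int = 10) -> List[Row]:
    """Conceptual model: the first max_input_rows source rows after every
    upstream operator has been applied in order (hypothetical helper)."""
    # Copy each row so an operator that mutates in place cannot alter the source.
    rows = [dict(r) for r in source[:max_input_rows]]
    for op in upstream_ops:
        rows = op(rows)
    return rows


# Hypothetical upstream operators, for illustration only.
def lowercase_text(rows: List[Row]) -> List[Row]:
    return [{**r, "text": str(r["text"]).lower()} for r in rows]


def add_length(rows: List[Row]) -> List[Row]:
    return [{**r, "text_len": len(str(r["text"]))} for r in rows]


source = [{"text": "Hello World"}, {"text": "Snorkel"}, {"text": "DAG"}]
print(node_view(source, [lowercase_text, add_length], max_input_rows=2))
# [{'text': 'hello world', 'text_len': 11}, {'text': 'snorkel', 'text_len': 7}]
```

The source rows are left unchanged. The analogue of Dataset.get_dataframe() is `source` itself, and the analogue of Node.get_dataframe() is the transformed output.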

Parameters:
  • max_input_rows (int, default: 10) – The maximum number of rows to push through this node, by default 10

  • datasource_uids (Optional[List[int]], default: None) – A list of datasource UIDs to process, useful when you want to examine specific datasources, by default None. See the Dataset class for more information on fetching a datasource UID.

  • partition (Optional[int], default: None) – A specific file partition to process, by default None. Only applicable if the source dataset files are in a readily partitioned format.
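The parameters above can be combined to preview each datasource separately. The sketch below is a hypothetical helper, `preview_per_datasource`, and is not part of the SDK. It calls get_dataframe() once per datasource UID with the documented `max_input_rows` and `datasource_uids` keyword arguments. It is duck-typed, so any object with a matching get_dataframe() method works. The `SupportsGetDataframe` protocol is an assumption introduced for type hints only.

```python
from typing import Any, Dict, List, Optional, Protocol


class SupportsGetDataframe(Protocol):
    """Assumed shape of a node: only the documented get_dataframe() signature."""

    def get_dataframe(self, max_input_rows: int = 10,
                      datasource_uids: Optional[List[int]] = None,
                      partition: Optional[int] = None) -> Any: ...


def preview_per_datasource(node: SupportsGetDataframe,
                           datasource_uids: List[int],
                           max_input_rows: int = 10) -> Dict[int, Any]:
    """Hypothetical helper: fetch one preview per datasource UID so the
    outputs of this node can be compared side by side."""
    return {
        uid: node.get_dataframe(max_input_rows=max_input_rows,
                                datasource_uids=[uid])
        for uid in datasource_uids
    }
```

With the SDK, usage might look like `node = OperatorNode.get(node_uid)` followed by `previews = preview_per_datasource(node, [ds_a, ds_b], max_input_rows=50)`. This assumes get() is called on the class, and `node_uid`, `ds_a` and `ds_b` are placeholders for real UIDs. Each datasource is fetched in its own call, so each preview is capped at `max_input_rows` independently.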

Returns:

A DataFrame containing the results of pushing the source dataset through this node.

Return type:

pd.DataFrame