snorkelflow.sdk.OperatorNode
- class snorkelflow.sdk.OperatorNode(uid, application_uid, config)
Bases: Node
The OperatorNode class represents a non-model operator node.
- __init__(uid, application_uid, config)
Methods
__init__(uid, application_uid, config)
get(node_uid) – Fetches a node by its UID.
get_dataframe([max_input_rows, ...]) – Retrieve the data being passed directly through this node.
Attributes
application_uid – The unique identifier for the application this node belongs to.
config – The detailed configuration information for this node.
uid – The unique identifier for this node.
- get_dataframe(max_input_rows=10, datasource_uids=None, partition=None)
Retrieve the data being passed directly through this node. By default, this function processes at most 10 rows of data to prevent the notebook from running out of memory. To override this limit, set max_input_rows to a higher value.
This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.
- Parameters:
max_input_rows (int, default: 10) – The number of rows to push through this node, by default 10.
datasource_uids (Optional[List[int]], default: None) – A list of datasource UIDs to process, useful if you want to examine specific datasources, by default None. See the Dataset class for more information on fetching a datasource UID.
partition (Optional[int], default: None) – A specific file partition to process, by default None. Only applicable if the source dataset files are in a readily partitioned format.
- Returns:
A DataFrame displaying the results when the source dataset is pushed through this node.
- Return type:
pd.DataFrame
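The difference between source data and node output, and the effect of the row cap, can be pictured with a small, self-contained pandas sketch. This is a conceptual model only, not the SDK's implementation: preview_node_output, the datasource_uid column, and the lowercase/add_length operators are hypothetical stand-ins, the order "filter, cap, then transform" is an assumption, and the partition parameter is not modeled.

```python
import pandas as pd
from typing import Callable, List, Optional

# Hypothetical stand-in for an upstream DAG operator: DataFrame in, DataFrame out.
Transform = Callable[[pd.DataFrame], pd.DataFrame]


def preview_node_output(
    source: pd.DataFrame,
    upstream: List[Transform],
    max_input_rows: int = 10,
    datasource_uids: Optional[List[int]] = None,
) -> pd.DataFrame:
    """Conceptual model (hypothetical) of pushing a capped sample through a node."""
    df = source
    # Optionally restrict to specific datasources (assumes a 'datasource_uid' column).
    if datasource_uids is not None:
        df = df[df["datasource_uid"].isin(datasource_uids)]
    # Cap the number of input rows before any transformation runs.
    df = df.head(max_input_rows)
    # Apply every upstream operator in order, as the DAG would up to this node.
    for transform in upstream:
        df = transform(df)
    return df


# Hypothetical source data, analogous to what Dataset.get_dataframe() returns.
source = pd.DataFrame({
    "datasource_uid": [1, 1, 2, 2],
    "text": ["Hello World", "Foo Bar", "Snorkel", "Flow"],
})


def lowercase(df: pd.DataFrame) -> pd.DataFrame:
    # assign() returns a new frame, so the source data is never modified.
    return df.assign(text=df["text"].str.lower())


def add_length(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(n_chars=df["text"].str.len())


preview = preview_node_output(
    source, [lowercase, add_length], max_input_rows=2, datasource_uids=[2]
)
print(preview)
```

In this model, source keeps its original rows and columns, while preview contains only the selected, capped rows with every upstream transformation applied, mirroring the distinction between Dataset.get_dataframe() and Node.get_dataframe() described above.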