snorkelflow.sdk.OperatorNode
- class snorkelflow.sdk.OperatorNode(uid, application_uid, config)
Bases: Node

OperatorNode class represents a non-model, operator node.
- __init__(uid, application_uid, config)
Methods
| Method | Description |
| --- | --- |
| __init__(uid, application_uid, config) | |
| get(node_uid) | Fetches a node by its UID. |
| get_dataframe([max_input_rows, ...]) | Retrieve the data being passed directly through this node. |

Attributes

| Attribute | Description |
| --- | --- |
| application_uid | The unique identifier for the application this node belongs to. |
| config | Returns the detailed configuration information for this node. |
| uid | The unique identifier for this node. |

- get_dataframe(max_input_rows=10, datasource_uids=None, partition=None)
Retrieve the data being passed directly through this node. By default, this function processes at most 10 rows of data, to prevent the notebook from running out of memory. To override this limit, set max_input_rows to a higher value.

This dataframe is not the same as the dataframe returned by Dataset.get_dataframe(). While Dataset.get_dataframe() returns the source data, the dataframe returned by Node.get_dataframe() has also undergone all the preprocessing/DAG transformations up to this point in the processing pipeline.
Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| max_input_rows | int | 10 | The number of rows that should be pushed through this node, by default 10. |
| datasource_uids | Optional[List[int]] | None | A list of datasource UIDs to process, useful if you have some specific datasources you want to examine, by default None. See the Dataset class for more information on fetching a datasource UID. |
| partition | Optional[int] | None | A specific file partition to process, by default None. Only applicable if the source dataset files are in a readily partitioned format. |

Returns

A DataFrame displaying the results when the source dataset is pushed through this node.

Return type

pd.DataFrame
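The row cap described for get_dataframe can be illustrated with a small sketch. Because the SDK itself is not assumed to be installed here, `StubNode` is a hypothetical stand-in that copies only the documented signature and the `max_input_rows` behavior; `preview_node` is likewise an illustrative helper, not part of snorkelflow.

```python
import pandas as pd


class StubNode:
    """Hypothetical stand-in for an OperatorNode so the sketch runs without the SDK."""

    def __init__(self, frame: pd.DataFrame) -> None:
        self._frame = frame

    def get_dataframe(self, max_input_rows=10, datasource_uids=None, partition=None):
        # Mimics only the documented row cap. A real node would also apply
        # the upstream preprocessing/DAG transformations before returning.
        return self._frame.head(max_input_rows)


def preview_node(node, rows=10, datasource_uids=None):
    """Return up to `rows` rows of the data flowing through `node` (illustrative helper)."""
    return node.get_dataframe(max_input_rows=rows, datasource_uids=datasource_uids)


# Fake source data standing in for a dataset with 25 rows.
source = pd.DataFrame({"text": [f"doc {i}" for i in range(25)]})
node = StubNode(source)

default_preview = preview_node(node)          # uses the documented default of 10 rows
larger_preview = preview_node(node, rows=20)  # raises the cap explicitly
```

With the SDK available, a node fetched through the documented get(node_uid) method could presumably be passed to `preview_node` in place of the stub. Raising `rows` trades notebook memory for a larger sample, so it is worth increasing gradually.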