snorkelflow.operators.Operator
- class snorkelflow.operators.Operator
Bases: object
Operator class that performs a transformation on Dask DataFrames.
See Using custom data points for one example usage.
- __init__()
Methods
- __init__()
- check_fit_arguments(**kwargs): Check fit arguments.
- rtype: Dict[str, str]
- rtype: Performance
- estimate_perf(df): rtype: Performance
- examples(): List of examples, each pairing an input DataFrame with operator kwargs.
- execute(input_ddfs[, callback]): Public method for validating and executing the operator.
- fit(df, **kwargs): Error-handling wrapper around the _fit method, which fits the operator to training data.
- fit_arguments(): Return the fit argument types if the operator implements a _fit method, and None otherwise.
- fit_input_schema(**kwargs): Error-handling wrapper around the _fit_input_schema method.
- get_artifact_config(): Mapping from operator config keys to data artifact paths.
- get_datapoint_cols(input_datapoint_cols): Get the datapoint_cols for the operator's output DataFrame, given the datapoint columns of its inputs.
- get_datapoint_instance(input_datapoint_instances): Get the datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
- get_datapoint_type(input_datapoint_types): Get the datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.
- get_op_impl_version(): rtype: int
- get_operator_impl_version_to_ds_migration(): rtype: Dict[int, bool]
- get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can override this method to define how they return their predictions.
- is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
Attributes
- drop_schema: List of columns dropped by this operator.
- input_schema: Column-to-dtype map used to validate the dtypes of the input DataFrame.
- output_schema: Column-to-dtype map used to validate the dtypes of the output DataFrame.
- shuffle_cols: Indicates whether the operator requires a shuffle before execution.
- use_gpu_if_available: Whether this operator will run on a GPU if one is available.
- classmethod check_fit_arguments(**kwargs)
Check fit arguments.
Return type: None
- static examples()
List of examples, each pairing an input DataFrame with operator kwargs.
Return type: List[OperatorExample]
- execute(input_ddfs, callback=<function no_op_progress_callback>)
Public method for validating and executing the operator. Must return a DataFrame.
Return type: DataFrame
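The validate-then-execute flow can be sketched with the standard library alone. This is a minimal, hypothetical illustration, not Snorkel Flow's implementation: `SketchOperator`, the `_execute` hook, the `Frame` stand-in (a dict of column lists instead of a Dask DataFrame), and the float-valued progress callback are all assumptions. It only checks that the columns named in each schema are present, mirroring the "minimum required fields" rule documented for input_schema and output_schema.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Optional

# Stand-in for a DataFrame: column name -> list of values (assumption for this sketch).
Frame = Dict[str, List[Any]]


def no_op_progress_callback(fraction_done: float) -> None:
    """Hypothetical default callback that ignores progress updates."""


class SketchOperator(ABC):
    """Hypothetical, simplified analogue of an operator base class."""

    @property
    @abstractmethod
    def input_schema(self) -> Optional[Dict[str, Any]]:
        """Minimum columns each input must have; None allows any schema."""

    @property
    @abstractmethod
    def output_schema(self) -> Optional[Dict[str, Any]]:
        """Minimum columns the output must have; None allows any schema."""

    @abstractmethod
    def _execute(self, input_frames: List[Frame]) -> Frame:
        """Hypothetical hook where a subclass performs its transformation."""

    def execute(
        self,
        input_frames: List[Frame],
        callback: Callable[[float], None] = no_op_progress_callback,
    ) -> Frame:
        # Validate every input before doing any work.
        for frame in input_frames:
            self._check_columns(frame, self.input_schema, "input")
        callback(0.0)
        result = self._execute(input_frames)
        # Enforce the "must return a DataFrame" contract on the stand-in type.
        if not isinstance(result, dict):
            raise TypeError("_execute must return a frame")
        self._check_columns(result, self.output_schema, "output")
        callback(1.0)
        return result

    @staticmethod
    def _check_columns(frame: Frame, schema: Optional[Dict[str, Any]], kind: str) -> None:
        if schema is None:
            return  # None means any schema is allowed.
        missing = [col for col in schema if col not in frame]
        if missing:
            raise ValueError(f"{kind} frame is missing required columns: {missing}")


class TextLength(SketchOperator):
    """Hypothetical operator that adds a 'text_len' column."""

    @property
    def input_schema(self) -> Optional[Dict[str, Any]]:
        return {"text": str}

    @property
    def output_schema(self) -> Optional[Dict[str, Any]]:
        return {"text_len": int}

    def _execute(self, input_frames: List[Frame]) -> Frame:
        (frame,) = input_frames  # this operator expects exactly one input
        return {**frame, "text_len": [len(t) for t in frame["text"]]}


progress: List[float] = []
out = TextLength().execute([{"text": ["ab", "cde"]}], callback=progress.append)
print(out["text_len"])  # [2, 3]
```

Keeping validation in the public `execute` and the transformation in a separate hook means subclasses cannot skip the schema checks.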
- classmethod fit(df, **kwargs)
Error-handling wrapper around the _fit method, which fits the operator to training data.
Return type: Dict[str, Any]
- classmethod fit_arguments()
Return the fit argument types if the operator implements a _fit method, and None otherwise.
Return type: Optional[Dict[str, Any]]
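fit and fit_arguments both revolve around an optional _fit hook. A minimal sketch of that wrapper pattern, assuming a subclass defines _fit as a classmethod taking the training data plus keyword-only parameters. `FitMixin`, `OperatorFitError`, `MeanOp`, and `NoFitOp` are hypothetical names, and deriving the argument types from type annotations via `inspect.signature` is an assumption about how fit_arguments could work, not a description of the library.

```python
import inspect
from typing import Any, Dict, List, Optional


class OperatorFitError(Exception):
    """Hypothetical error raised when a subclass's _fit fails."""


class FitMixin:
    @classmethod
    def fit(cls, df: Any, **kwargs: Any) -> Dict[str, Any]:
        # Error-handling wrapper: delegate to _fit and re-raise failures with context.
        if not hasattr(cls, "_fit"):
            raise NotImplementedError(f"{cls.__name__} does not implement _fit")
        try:
            return cls._fit(df, **kwargs)
        except Exception as exc:
            raise OperatorFitError(f"fitting {cls.__name__} failed: {exc!r}") from exc

    @classmethod
    def fit_arguments(cls) -> Optional[Dict[str, Any]]:
        # None when the operator has no _fit; otherwise {parameter name: annotated type}.
        fit_impl = getattr(cls, "_fit", None)
        if fit_impl is None:
            return None
        params = inspect.signature(fit_impl).parameters
        return {name: p.annotation for name, p in params.items() if name != "df"}


class MeanOp(FitMixin):
    """Hypothetical operator that learns the mean of one column."""

    @classmethod
    def _fit(cls, df: Dict[str, List[float]], *, column: str) -> Dict[str, Any]:
        values = df[column]
        return {"mean": sum(values) / len(values)}


class NoFitOp(FitMixin):
    """Hypothetical operator without a _fit hook."""


print(MeanOp.fit_arguments())  # {'column': <class 'str'>}
print(NoFitOp.fit_arguments())  # None
print(MeanOp.fit({"x": [1.0, 2.0, 3.0]}, column="x"))  # {'mean': 2.0}
```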
- classmethod fit_input_schema(**kwargs)
Error-handling wrapper around the _fit_input_schema method.
Return type: Optional[Dict[str, Any]]
- get_artifact_config()
Mapping from operator config keys to data artifact paths.
The keys correspond to the artifact_config_keys attribute.
Return type: Dict[str, str]
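A minimal sketch of how such a mapping could be derived, assuming the artifact paths are stored in the operator's init params under the keys listed in artifact_config_keys. `ArtifactOp`, the `model_path` key, and the example path are hypothetical; the source only states that the returned keys correspond to artifact_config_keys.

```python
from typing import Any, Dict, List


class ArtifactOp:
    """Hypothetical operator whose 'model_path' init param points at a data artifact."""

    artifact_config_keys: List[str] = ["model_path"]

    def __init__(self, **init_params: Any) -> None:
        self.init_params = init_params

    def get_artifact_config(self) -> Dict[str, str]:
        # Keep only the config keys declared in artifact_config_keys.
        return {
            key: str(self.init_params[key])
            for key in self.artifact_config_keys
            if key in self.init_params
        }


op = ArtifactOp(model_path="artifacts/model.pkl", threshold=0.5)
print(op.get_artifact_config())  # {'model_path': 'artifacts/model.pkl'}
```

Non-artifact params such as `threshold` are left out, so callers can locate every data artifact an operator depends on without knowing its full config.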
- get_datapoint_cols(input_datapoint_cols)
Get the datapoint_cols for the operator's output DataFrame, given the datapoint columns of its inputs.
Return type: List[str]
- get_datapoint_instance(input_datapoint_instances)
Get the datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
Return type: DatapointType
- get_datapoint_type(input_datapoint_types)
Get the datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.
Return type: Type[DatapointType]
- classmethod get_op_impl_version()
Return type: int
- classmethod get_operator_impl_version_to_ds_migration()
Return type: Dict[int, bool]
- get_predictions(input_df, output_df)
Operators capable of producing predictions, such as Models and Extractors, can override this method to define how they return their predictions. input_df should be the input DataFrame to the operator. If spaces other than SequenceLabelSpace are required in the future, a parameter for them can be added here; however, try/except blocks will then be needed at import time to avoid circular dependencies.
Return type: Dict[str, Any]
- static is_disabled_by_feature_flag()
Returns a boolean indicating whether this operator is disabled by a feature flag.
Return type: bool
- artifact_config_keys: List[str] = []
- property drop_schema: List[str] | None
List of columns dropped by this operator.
If the value is None, no columns are dropped.
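The None convention can be illustrated with a small helper that computes which input columns survive the operator. This is a sketch only: `columns_after_drop` is a hypothetical name, and it assumes columns not listed in drop_schema pass through unchanged (columns the operator adds are ignored here).

```python
from typing import List, Optional


def columns_after_drop(input_cols: List[str], drop_schema: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: the input columns left after applying drop_schema."""
    if drop_schema is None:
        return list(input_cols)  # None means nothing is dropped.
    dropped = set(drop_schema)
    return [col for col in input_cols if col not in dropped]


print(columns_after_drop(["text", "tokens", "label"], ["tokens"]))  # ['text', 'label']
print(columns_after_drop(["text", "label"], None))  # ['text', 'label']
```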
- errors_file_path: Optional[str] = None
- fit_params: Optional[Dict[Any, Any]] = None
- init_params: Optional[Dict[Any, Any]] = None
- abstract property input_schema: Dict[str, Any] | None
Column-to-dtype map used to validate the dtypes of the input DataFrame.
If the value is None, any schema is allowed.
The listed columns are the minimum required fields; other fields are allowed by default.
If a column's type is None, any type is allowed for that field.
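These matching rules (None schema, minimum required columns, None type as a wildcard) can be sketched as a plain function. It assumes dtypes are compared as strings, such as the names pandas reports in `DataFrame.dtypes` ('int64', 'object'); the real operator may compare dtype objects differently, and `schema_violations` is a hypothetical name.

```python
from typing import Dict, List, Optional


def schema_violations(
    dtypes: Dict[str, str], schema: Optional[Dict[str, Optional[str]]]
) -> List[str]:
    """Hypothetical check of a column->dtype map against a minimum schema."""
    if schema is None:
        return []  # None: any schema is allowed.
    problems: List[str] = []
    for col, expected in schema.items():
        if col not in dtypes:
            problems.append(f"missing column {col!r}")
        elif expected is not None and dtypes[col] != expected:
            # A None type accepts any dtype; otherwise the dtype must match exactly.
            problems.append(f"column {col!r} has dtype {dtypes[col]!r}, expected {expected!r}")
    # Columns not mentioned in the schema are ignored: extra fields are allowed.
    return problems


observed = {"text": "object", "label": "int64", "score": "float64"}
print(schema_violations(observed, {"text": "object", "label": None}))  # []
print(schema_violations(observed, {"text": "int64", "span": None}))
```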
- is_deprecated: bool = False
- is_expensive: bool = False
- new_datapoint_cols: List[str] = []
- node_uid: Optional[int] = None
- operator_impl_version: int = 0
- operator_impl_version_to_ds_migration: Dict[int, bool] = {}
- property output_meta: Dict[str, Any] | None
- abstract property output_schema: Dict[str, Any] | None
Column-to-dtype map used to validate the dtypes of the output DataFrame.
If the value is None, any schema is allowed.
The listed columns are the minimum required fields; other fields are allowed by default.
If a column's type is None, any type is allowed for that field.
If you use the cache_features decorator and the order of features could change depending on initialization, do not sort the keys in output_schema.
- show_args_in_gui: bool = True
- property shuffle_cols: List[str]
Indicates whether the operator requires a shuffle before execution. The specified columns are shuffled into the same partition.
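Assuming "shuffled into the same partition" means that rows sharing values in shuffle_cols end up in one partition, hash partitioning is a common way to achieve that, and Dask's `DataFrame.shuffle` performs this kind of operation on Dask DataFrames. The following is a stdlib-only sketch of the idea, not a description of how Snorkel Flow schedules the shuffle; `hash_partition` and the row-dict representation are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def hash_partition(
    rows: List[Dict[str, Any]], shuffle_cols: Sequence[str], npartitions: int
) -> Dict[int, List[Dict[str, Any]]]:
    """Hypothetical hash shuffle: rows sharing shuffle_cols values land in one partition."""
    partitions: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in shuffle_cols)
        partitions[hash(key) % npartitions].append(row)  # % keeps the index non-negative
    return dict(partitions)


rows = [
    {"doc_id": 1, "span": "a"},
    {"doc_id": 2, "span": "b"},
    {"doc_id": 1, "span": "c"},
    {"doc_id": 3, "span": "d"},
]
parts = hash_partition(rows, ["doc_id"], npartitions=2)
```

Co-locating related rows this way lets a per-partition computation see every row for a given key, such as all spans of one document, without cross-partition communication.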
- property use_gpu_if_available: bool
Whether this operator will run on GPU if one is available.