snorkelflow.operators.Featurizer
- class snorkelflow.operators.Featurizer
Bases:
ChangeColumnsOperator class that adds one or more columns (features) to a DataFrame.
See Custom Operator Classes for examples.
- __init__()
Methods
- __init__()
- check_fit_arguments(**kwargs): Check fit arguments.
- column_docs(): Return type Dict[str, str].
- estimate_gpu_perf(df): Return type Performance.
- estimate_perf(df): Return type Performance.
- examples(): List of examples (pairs of input df and operator kwargs).
- execute(input_ddfs[, callback]): Public method for validating + executing operator.
- fit(df, **kwargs): Error handling wrapper for _fit method to fit operator to training data.
- fit_arguments(): Return fit argument types if operator implements _fit method, and None otherwise.
- fit_input_schema(**kwargs): Error handling wrapper for _fit_input_schema method.
- get_artifact_config(): Mapping from operator config keys to data artifact paths.
- get_datapoint_cols(input_datapoint_cols): Get datapoint_cols for output DataFrame of the operator, given columns for inputs.
- get_datapoint_instance(input_datapoint_instances): Return datapoint_instance of first input.
- get_datapoint_type(input_datapoint_types): Get datapoint_type for output DataFrame of the operator, given types for inputs.
- get_featurizer_hash(): Used as part of hash when storing features.
- get_op_impl_version(): Return type int.
- get_operator_impl_version_to_ds_migration(): Return type Dict[int, bool].
- get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can override this to define how they return their predictions.
- get_row_hashes(df): Returns a unique hash for each row in the input_df.
- is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
- is_featurizer(): Return type bool.
Attributes
- artifact_config_keys
- drop_schema
- errors_file_path
- fit_params
- init_params
- input_schema: Col to dtype map to validate the dtypes of the input dataframe.
- is_deprecated
- is_expensive
- new_datapoint_cols
- node_uid
- operator_impl_version
- operator_impl_version_to_ds_migration
- output_meta
- output_schema: Col to dtype map to validate the dtypes of the output dataframe.
- show_args_in_gui
- shuffle_cols: Used to tell whether operator requires shuffle before execution.
- use_gpu_if_available: Whether this operator will run on GPU if one is available.

- execute(input_ddfs, callback=<function no_op_progress_callback>)
Public method for validating and executing the operator. Must return a DataFrame.
Return type
DataFrame
- get_featurizer_hash()
Used as part of the hash when storing features. Must be overridden in order to use the cache_features decorator.
Return type
str
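
As a rough illustration of what an overridden get_featurizer_hash() could return, the sketch below derives a deterministic string from a featurizer's configuration. The class TextLengthFeaturizerSketch, its field and lowercase parameters, and the choice of SHA-256 over a JSON payload are all assumptions made for this example; the class is a plain stand-in that does not subclass Featurizer, and the library's own hashing scheme may differ.

```python
import hashlib
import json


class TextLengthFeaturizerSketch:
    """Hypothetical stand-in; a real implementation would subclass Featurizer."""

    def __init__(self, field: str, lowercase: bool = False) -> None:
        # Both parameters are made up for this illustration.
        self.field = field
        self.lowercase = lowercase

    def get_featurizer_hash(self) -> str:
        # Serialize the configuration with sorted keys so that identical
        # settings always yield the same payload, and hence the same hash.
        config = {"field": self.field, "lowercase": self.lowercase}
        payload = type(self).__name__ + json.dumps(config, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


same_a = TextLengthFeaturizerSketch("body").get_featurizer_hash()
same_b = TextLengthFeaturizerSketch("body").get_featurizer_hash()
different = TextLengthFeaturizerSketch("title").get_featurizer_hash()
```

In this sketch, two instances with the same configuration share a hash, while changing any setting changes it, which is the property a feature cache would plausibly rely on to avoid serving stale features.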
- get_row_hashes(df)
Returns a unique hash for each row in the input_df. Only relevant when caching with the cache_features decorator.
Return type
Series
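
To make the idea of per-row hashes concrete, here is a minimal standard-library sketch. It assumes rows are plain dictionaries and returns a list of hex digests, whereas the method documented here takes a DataFrame and returns a Series. The function row_hashes_sketch, its columns parameter, and the decision to hash only the columns a featurizer reads are hypothetical choices for illustration, not the library's implementation.

```python
import hashlib
import json
from typing import Any, Dict, List


def row_hashes_sketch(rows: List[Dict[str, Any]], columns: List[str]) -> List[str]:
    """Return one SHA-256 hex digest per row, computed over the given columns."""
    hashes = []
    for row in rows:
        # Hash only the selected columns, in a fixed order, so unrelated
        # columns do not affect the result.
        payload = json.dumps([row[col] for col in columns], default=str)
        hashes.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return hashes


rows = [
    {"body": "hello", "title": "a"},
    {"body": "world", "title": "b"},
    {"body": "hello", "title": "c"},
]
hashes = row_hashes_sketch(rows, ["body"])
```

Under these assumptions, the first and third rows share a hash because their body values match, so a cached feature value computed for one could be reused for the other.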
- drop_schema: Final = None