snorkelflow.operators.Featurizer
- class snorkelflow.operators.Featurizer
Bases: ChangeColumns
Operator class that adds one or more columns (features) to a DataFrame.
See Custom Operator Classes for examples.
- __init__()
Methods
- __init__()
- check_fit_arguments(**kwargs): Check fit arguments.
- column_docs() -> Dict[str, str]
- estimate_gpu_perf(df) -> Performance
- estimate_perf(df) -> Performance
- examples(): List of examples (pairs of input df and operator kwargs).
- execute(input_ddfs[, callback]): Public method for validating + executing operator.
- fit(df, **kwargs): Error handling wrapper for _fit method to fit operator to training data.
- fit_arguments(): Return fit argument types if operator implements _fit method, and None otherwise.
- fit_input_schema(**kwargs): Error handling wrapper for _fit_input_schema method.
- get_artifact_config(): Mapping from operator config keys to data artifact paths.
- get_datapoint_cols(input_datapoint_cols): Get datapoint_cols for output DataFrame of the operator, given columns for inputs.
- get_datapoint_instance(input_datapoint_instances): Return datapoint_instance of first input.
- get_datapoint_type(input_datapoint_types): Get datapoint_type for output DataFrame of the operator, given types for inputs.
- get_featurizer_hash(): Used as part of hash when storing features.
- get_op_impl_version() -> int
- get_operator_impl_version_to_ds_migration() -> Dict[int, bool]
- get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can subclass this to define how they return their predictions.
- get_row_hashes(df): Returns a unique hash for each row in the input_df.
- is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
- is_featurizer() -> bool
Attributes
- artifact_config_keys
- drop_schema
- errors_file_path
- fit_params
- init_params
- input_schema: Col to dtype map to validate the dtypes of the input dataframe.
- is_deprecated
- is_expensive
- new_datapoint_cols
- node_uid
- operator_impl_version
- operator_impl_version_to_ds_migration
- output_meta
- output_schema: Col to dtype map to validate the dtypes of the output dataframe.
- show_args_in_gui
- shuffle_cols: Used to tell whether operator requires shuffle before execution.
- use_gpu_if_available: Whether this operator will run on GPU if one is available.
- execute(input_ddfs, callback=<function no_op_progress_callback>)
Public method that validates and then executes the operator. Must return a DataFrame.
Return type: DataFrame
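The validate-then-execute pattern described for execute, together with the input_schema and output_schema column-to-dtype maps, can be illustrated with a minimal sketch. This is not the snorkelflow implementation: SketchFeaturizer, _compute, and _validate are hypothetical names, rows are plain dicts rather than a DataFrame, and the callback signature is an assumption.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def no_op_progress_callback(progress: float) -> None:
    """Stand-in progress callback that ignores updates (signature assumed)."""


class SketchFeaturizer:
    """Hypothetical stand-in for illustration; NOT the snorkelflow base class."""

    # Column-to-dtype maps, mirroring the documented attributes.
    input_schema: Dict[str, type] = {"text": str}
    output_schema: Dict[str, type] = {"text_length": int}

    def _compute(self, rows: List[Row]) -> List[Row]:
        # Add one feature column to every row without mutating the input.
        return [{**row, "text_length": len(row["text"])} for row in rows]

    @staticmethod
    def _validate(rows: List[Row], schema: Dict[str, type], label: str) -> None:
        # Check that every schema column is present and has the expected type.
        for i, row in enumerate(rows):
            for col, dtype in schema.items():
                if col not in row:
                    raise ValueError(f"{label} row {i} is missing column {col!r}")
                if not isinstance(row[col], dtype):
                    raise TypeError(
                        f"{label} row {i}: column {col!r} is not {dtype.__name__}"
                    )

    def execute(
        self,
        rows: List[Row],
        callback: Callable[[float], None] = no_op_progress_callback,
    ) -> List[Row]:
        # Validate inputs, run the feature computation, then validate outputs.
        self._validate(rows, self.input_schema, "input")
        out = self._compute(rows)
        self._validate(out, self.output_schema, "output")
        callback(1.0)  # report completion
        return out
```

In this sketch, a row whose text value is not a string is rejected before any feature is computed, which is the benefit of keeping validation in the public method and the feature logic in a private one.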
- get_featurizer_hash()
Used as part of the hash when storing features. Must be overridden in order to use the cache_features decorator.
Return type: str
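One plausible way to override get_featurizer_hash is to derive a stable string from the class name and the configuration that affects the computed features. The sketch below is an assumption about a reasonable approach, not the library's own scheme: HashedFeaturizerSketch and its field and lowercase parameters are hypothetical, and the class deliberately does not inherit from the real Featurizer so that it runs standalone.

```python
import hashlib
import json
from typing import Any, Dict


class HashedFeaturizerSketch:
    """Hypothetical standalone class; does not subclass the real Featurizer."""

    def __init__(self, field: str, lowercase: bool = False) -> None:
        self.field = field
        self.lowercase = lowercase

    def _config(self) -> Dict[str, Any]:
        # Everything that changes the computed features belongs in the hash.
        return {"field": self.field, "lowercase": self.lowercase}

    def get_featurizer_hash(self) -> str:
        # sort_keys makes the serialization independent of dict ordering.
        payload = json.dumps(
            {"class": type(self).__name__, "config": self._config()},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two instances with the same configuration produce the same hash, while changing any parameter produces a different one, so stored features would not be reused across configurations under this scheme.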
- get_row_hashes(df)
Returns a unique hash for each row in the input_df. Only relevant when caching with the cache_features decorator.
Return type: Series
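To show how per-row hashes might combine with a featurizer hash for caching, here is a minimal sketch. It is not how cache_features is implemented: the free functions get_row_hashes and cached_features, the in-memory dict cache, and the use of lists of dicts in place of a DataFrame and Series are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Cache = Dict[Tuple[str, str], Any]


def get_row_hashes(rows: List[Row], columns: List[str]) -> List[str]:
    """Hash the values of the given columns for each row (hypothetical helper)."""
    hashes = []
    for row in rows:
        # default=str keeps non-JSON values (e.g. dates) serializable.
        payload = json.dumps([row[c] for c in columns], default=str)
        hashes.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return hashes


def cached_features(
    rows: List[Row],
    columns: List[str],
    featurizer_hash: str,
    compute: Callable[[Row], Any],
    cache: Cache,
) -> List[Any]:
    """Compute a feature per row, reusing results keyed by both hashes."""
    results = []
    for row, row_hash in zip(rows, get_row_hashes(rows, columns)):
        key = (featurizer_hash, row_hash)
        if key not in cache:
            cache[key] = compute(row)  # only computed on a cache miss
        results.append(cache[key])
    return results
```

Note that in this sketch rows with identical content share a hash, so the feature is computed once and reused; a different featurizer hash yields different keys and forces recomputation.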
- drop_schema: Final = None