Version: 25.3

snorkelflow.operators.Featurizer

class snorkelflow.operators.Featurizer

Bases: ChangeColumns

Operator class that adds one or more columns (features) to a DataFrame.

See Custom Operator Classes for examples.
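As a rough illustration of what "adding columns (features)" means, the sketch below uses plain Python dictionaries as a stand-in for DataFrame rows. The class name `ToyLengthFeaturizer`, its `compute` method, and the `text_length` column are hypothetical and are not part of the snorkelflow API; a real featurizer would subclass Featurizer as described in Custom Operator Classes.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # stand-in for one DataFrame row (assumption for illustration)


class ToyLengthFeaturizer:
    """Hypothetical stand-in, NOT the snorkelflow Featurizer base class."""

    def __init__(self, field: str, target: str = "text_length") -> None:
        self.field = field    # existing column to read
        self.target = target  # new feature column to add

    def compute(self, rows: List[Row]) -> List[Row]:
        # Copy each row and add one new column; existing columns are untouched.
        return [{**row, self.target: len(row[self.field])} for row in rows]


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "snorkel flow"}]
featurized = ToyLengthFeaturizer(field="text").compute(rows)
```

The input rows are left unmodified, and the output carries every original column plus the new feature column.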

__init__

__init__()

Methods

__init__()
check_fit_arguments(**kwargs): Check fit arguments.
column_docs(): returns Dict[str, str]
estimate_gpu_perf(df): returns Performance
estimate_perf(df): returns Performance
examples(): List of examples (pairs of input df and operator kwargs).
execute(input_ddfs[, callback]): Public method for validating and executing the operator.
fit(df, **kwargs): Error handling wrapper for the _fit method, which fits the operator to training data.
fit_arguments(): Return fit argument types if the operator implements the _fit method, and None otherwise.
fit_input_schema(**kwargs): Error handling wrapper for the _fit_input_schema method.
get_artifact_config(): Mapping from operator config keys to data artifact paths.
get_datapoint_cols(input_datapoint_cols): Get datapoint_cols for the output DataFrame of the operator, given columns for inputs.
get_datapoint_instance(input_datapoint_instances): Return datapoint_instance of the first input.
get_datapoint_type(input_datapoint_types): Get datapoint_type for the output DataFrame of the operator, given types for inputs.
get_featurizer_hash(): Used as part of the hash when storing features.
get_op_impl_version(): returns int
get_operator_impl_version_to_ds_migration(): returns Dict[int, bool]
get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can override this to define how they return their predictions.
get_row_hashes(df): Returns a unique hash for each row in the input df.
is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
is_featurizer(): returns bool

Attributes

artifact_config_keys
drop_schema
errors_file_path
fit_params
init_params
input_schema: Col to dtype map to validate the dtypes of the input dataframe.
is_deprecated
is_expensive
new_datapoint_cols
node_uid
operator_impl_version
operator_impl_version_to_ds_migration
output_meta
output_schema: Col to dtype map to validate the dtypes of the output dataframe.
show_args_in_gui
shuffle_cols: Used to tell whether operator requires shuffle before execution.
use_gpu_if_available: Whether this operator will run on GPU if one is available.

execute

execute(input_ddfs, callback=<function no_op_progress_callback>)

Public method that validates and then executes the operator. Must return a DataFrame.

Return type

DataFrame
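The validate-then-execute pattern can be sketched in plain Python, using column-to-type maps in the spirit of the input_schema and output_schema attributes. This is only an assumption about the general shape of such a wrapper: `check_schema`, `validate_and_execute`, and `word_count` are hypothetical names, rows are plain dictionaries rather than DataFrames, and the actual checks performed by execute are not documented here.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]      # stand-in for one DataFrame row (assumption)
Schema = Dict[str, type]  # column name -> expected Python type


def check_schema(rows: List[Row], schema: Schema, label: str) -> None:
    """Raise if any row lacks a schema column or holds a value of the wrong type."""
    for i, row in enumerate(rows):
        for col, dtype in schema.items():
            if col not in row:
                raise ValueError(f"{label} row {i} is missing column {col!r}")
            if not isinstance(row[col], dtype):
                raise TypeError(
                    f"{label} row {i}: column {col!r} is not {dtype.__name__}"
                )


def validate_and_execute(
    rows: List[Row],
    compute: Callable[[List[Row]], List[Row]],
    input_schema: Schema,
    output_schema: Schema,
) -> List[Row]:
    """Hypothetical wrapper: validate inputs, run the operator, validate outputs."""
    check_schema(rows, input_schema, "input")
    result = compute(rows)
    check_schema(result, output_schema, "output")
    return result


def word_count(rows: List[Row]) -> List[Row]:
    # Toy feature: number of whitespace-separated tokens in the "text" column.
    return [{**row, "n_words": len(row["text"].split())} for row in rows]


out = validate_and_execute(
    [{"text": "a b c"}], word_count, {"text": str}, {"n_words": int}
)
```

Validating on both sides of the computation means a schema mismatch surfaces as an explicit error at the operator boundary instead of as a confusing failure in a downstream step.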

get_featurizer_hash

get_featurizer_hash()

Used as part of the hash when storing features. Must be overridden in order to use the cache_features decorator.

Return type

str
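One plausible way to build such a string is to hash the featurizer's name together with its configuration, so that changing a parameter yields a different hash and previously stored features are not reused by mistake. The sketch below is an assumption about a reasonable override, not the library's implementation; `toy_featurizer_hash` and the example config keys are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict


def toy_featurizer_hash(name: str, config: Dict[str, Any]) -> str:
    """Hypothetical helper: deterministic hash of a featurizer's identity."""
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps({"name": name, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


h1 = toy_featurizer_hash("TextLength", {"field": "text", "lower": True})
h2 = toy_featurizer_hash("TextLength", {"lower": True, "field": "text"})
h3 = toy_featurizer_hash("TextLength", {"field": "title", "lower": True})
```

Here `h1` and `h2` are equal because only the key order differs, while `h3` differs because the configuration changed.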

get_row_hashes

get_row_hashes(df)

Returns a unique hash for each row of the input DataFrame df. Only relevant when caching features with the cache_features decorator.

Return type

Series
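To show how per-row hashes could support feature caching, the sketch below hashes the columns a toy featurizer reads and uses the result, together with a featurizer-level key, to look up stored values. How cache_features actually combines these hashes is not described in this page, so everything here is an assumption: `row_hashes`, `cached_lengths`, and the dictionary cache are hypothetical, rows are plain dictionaries, and rows with identical content deliberately share a hash.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # stand-in for one DataFrame row (assumption)


def row_hashes(rows: List[Row], columns: List[str]) -> List[str]:
    """Hypothetical helper: content hash of the selected columns of each row."""
    hashes = []
    for row in rows:
        # default=str lets non-JSON values (e.g. dates) be serialized.
        payload = json.dumps([row[c] for c in columns], default=str)
        hashes.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return hashes


def cached_lengths(
    rows: List[Row], cache: Dict[Tuple[str, str], int], featurizer_key: str
) -> List[int]:
    """Compute a toy feature, reusing cached values keyed by (featurizer, row)."""
    results = []
    for row, row_hash in zip(rows, row_hashes(rows, ["text"])):
        key = (featurizer_key, row_hash)
        if key not in cache:
            cache[key] = len(row["text"])  # computed only on a cache miss
        results.append(cache[key])
    return results


cache: Dict[Tuple[str, str], int] = {}
data = [{"text": "ab"}, {"text": "abc"}, {"text": "ab"}]
lengths = cached_lengths(data, cache, featurizer_key="toy-v1")
```

With this keying, the repeated row is served from the cache, and switching `featurizer_key` forces every row to be recomputed.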

is_featurizer

static is_featurizer()

Return type

bool

drop_schema: Final = None