Version: 0.95

snorkelflow.operators.Operator

class snorkelflow.operators.Operator

Bases: object

Operator class that performs a transformation on Dask DataFrames.

See “Using custom data points” for an example usage.

__init__()

Methods

__init__()

check_fit_arguments(**kwargs)

Check fit arguments.

column_docs()

Return type:

Dict[str, str]

estimate_gpu_perf(df)

Return type:

Performance

estimate_perf(df)

Return type:

Performance

examples()

List of examples (pairs of input df and operator kwargs).

execute(input_ddfs[, callback])

Public method for validating and executing the operator.

fit(df, **kwargs)

Error-handling wrapper around the _fit method, which fits the operator to training data.

fit_arguments()

Return fit argument types if the operator implements the _fit method, and None otherwise.

fit_input_schema(**kwargs)

Error-handling wrapper around the _fit_input_schema method.

get_artifact_config()

Mapping from operator config keys to data artifact paths.

get_datapoint_cols(input_datapoint_cols)

Get the datapoint_cols for the operator's output DataFrame, given the datapoint columns of its inputs.

get_datapoint_instance(input_datapoint_instances)

Get the datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.

get_datapoint_type(input_datapoint_types)

Get the datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.

get_op_impl_version()

Return type:

int

get_operator_impl_version_to_ds_migration()

Return type:

Dict[int, bool]

get_predictions(input_df, output_df)

Operators capable of producing predictions, such as Models and Extractors, can override this to define how they return their predictions.

is_disabled_by_feature_flag()

Returns a boolean indicating whether this operator is disabled by a feature flag.

Attributes

artifact_config_keys

drop_schema

List of columns dropped by this operator.

errors_file_path

fit_params

init_params

input_schema

Mapping from column name to dtype, used to validate the dtypes of the input DataFrame.

is_deprecated

is_expensive

new_datapoint_cols

node_uid

operator_impl_version

operator_impl_version_to_ds_migration

output_meta

output_schema

Mapping from column name to dtype, used to validate the dtypes of the output DataFrame.

show_args_in_gui

shuffle_cols

Indicates whether the operator requires a shuffle before execution.

use_gpu_if_available

Whether this operator will run on GPU if one is available.

classmethod check_fit_arguments(**kwargs)

Check fit arguments.

Return type:

None

column_docs()
Return type:

Dict[str, str]

estimate_gpu_perf(df)
Return type:

Performance

estimate_perf(df)
Return type:

Performance

static examples()

List of examples (pairs of input df and operator kwargs).

Return type:

List[OperatorExample]

execute(input_ddfs, callback=<function no_op_progress_callback>)

Public method for validating and executing the operator. Must return a DataFrame.

Return type:

DataFrame
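The validate-then-execute contract can be sketched as follows. This is a minimal, hypothetical analogue for illustration only: MiniOperator, _execute, TextLength, and the dict-of-lists Frame standing in for a Dask DataFrame are assumptions, not the Snorkel Flow implementation.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, List[Any]]


def no_op_progress_callback(*args: Any, **kwargs: Any) -> None:
    """Hypothetical stand-in for the default progress callback; does nothing."""


class MiniOperator:
    """Hypothetical, simplified analogue of Operator (not the real base class)."""

    input_schema: Optional[Dict[str, Any]] = None
    output_schema: Optional[Dict[str, Any]] = None

    def _execute(self, input_frames: List[Frame], callback: Callable[..., None]) -> Frame:
        raise NotImplementedError

    def execute(
        self,
        input_frames: List[Frame],
        callback: Callable[..., None] = no_op_progress_callback,
    ) -> Frame:
        # Validate every input, run the transformation, then validate the output.
        for frame in input_frames:
            self._check_columns(frame, self.input_schema)
        output = self._execute(input_frames, callback)
        self._check_columns(output, self.output_schema)
        return output

    @staticmethod
    def _check_columns(frame: Frame, schema: Optional[Dict[str, Any]]) -> None:
        # Simplified: checks only that required columns exist, not their dtypes.
        if schema is None:
            return  # None means any schema is allowed
        missing = [col for col in schema if col not in frame]
        if missing:
            raise ValueError(f"missing required columns: {missing}")


class TextLength(MiniOperator):
    """Hypothetical operator that adds a text_len column."""

    input_schema = {"text": str}
    output_schema = {"text": str, "text_len": int}

    def _execute(self, input_frames, callback):
        frame = input_frames[0]
        return {**frame, "text_len": [len(t) for t in frame["text"]]}
```

For example, `TextLength().execute([{"text": ["ab", "cde"]}])` adds `"text_len": [2, 3]`, while an input lacking a `text` column fails validation with a ValueError before any transformation runs.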

classmethod fit(df, **kwargs)

Error-handling wrapper around the _fit method, which fits the operator to training data.

Return type:

Dict[str, Any]
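A minimal sketch of an error-handling wrapper around a class-level _fit, assuming that _fit returns the learned parameters as a dict. FittableOperator, OperatorFitError, and MeanCenter are hypothetical names; the real wrapper's error types and messages may differ.

```python
from typing import Any, Dict


class OperatorFitError(Exception):
    """Hypothetical error type used for illustration."""


class FittableOperator:
    @classmethod
    def _fit(cls, df: Any, **kwargs: Any) -> Dict[str, Any]:
        raise NotImplementedError

    @classmethod
    def fit(cls, df: Any, **kwargs: Any) -> Dict[str, Any]:
        # Wrap any failure in _fit with the name of the operator that failed,
        # keeping the original exception as __cause__.
        try:
            return cls._fit(df, **kwargs)
        except Exception as e:
            raise OperatorFitError(f"{cls.__name__} failed to fit: {e}") from e


class MeanCenter(FittableOperator):
    """Hypothetical operator that learns the mean of one column."""

    @classmethod
    def _fit(cls, df, column):
        values = df[column]
        return {"column": column, "mean": sum(values) / len(values)}
```

Here `MeanCenter.fit({"x": [1, 2, 3]}, column="x")` returns `{"column": "x", "mean": 2.0}`; an empty column raises ZeroDivisionError inside _fit, which surfaces as OperatorFitError.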

classmethod fit_arguments()

Return fit argument types if the operator implements the _fit method, and None otherwise.

Return type:

Optional[Dict[str, Any]]
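One way to implement “argument types if _fit is implemented, None otherwise” is to check whether a subclass overrides _fit and then read the override's type hints. This is a sketch under assumptions: SketchOperator and Thresholder are hypothetical, and excluding the df parameter and **kwargs is an illustrative choice rather than documented behavior.

```python
import inspect
from typing import Any, Dict, Optional, get_type_hints


class SketchOperator:
    @classmethod
    def _fit(cls, df: Any, **kwargs: Any) -> Dict[str, Any]:
        raise NotImplementedError

    @classmethod
    def fit_arguments(cls) -> Optional[Dict[str, Any]]:
        # If _fit is not overridden, the operator is not fittable.
        if cls._fit.__func__ is SketchOperator._fit.__func__:
            return None
        hints = get_type_hints(cls._fit.__func__)
        params = inspect.signature(cls._fit).parameters  # bound: excludes cls
        return {
            name: hints.get(name, Any)
            for name, p in params.items()
            if name != "df" and p.kind not in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
        }


class Thresholder(SketchOperator):
    """Hypothetical fittable operator."""

    @classmethod
    def _fit(cls, df: Any, column: str, quantile: float = 0.5) -> Dict[str, Any]:
        return {"column": column, "quantile": quantile}
```

With these definitions, `SketchOperator.fit_arguments()` is None and `Thresholder.fit_arguments()` is `{"column": str, "quantile": float}`.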

classmethod fit_input_schema(**kwargs)

Error-handling wrapper around the _fit_input_schema method.

Return type:

Optional[Dict[str, Any]]

get_artifact_config()

Mapping from operator config keys to data artifact paths.

Keys correspond to the artifact_config_keys property.

Return type:

Dict[str, str]
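A minimal sketch of how artifact_config_keys could drive get_artifact_config, assuming the operator keeps its init kwargs in a config dict and that keys absent from the config are skipped. ArtifactSketch, ModelOp, and the artifact path are hypothetical.

```python
from typing import Any, Dict, List


class ArtifactSketch:
    # Config keys whose values are data artifact paths (empty by default).
    artifact_config_keys: List[str] = []

    def __init__(self, **config: Any) -> None:
        self.config = config

    def get_artifact_config(self) -> Dict[str, str]:
        # Keep only the config entries named in artifact_config_keys.
        return {
            key: self.config[key]
            for key in self.artifact_config_keys
            if key in self.config
        }


class ModelOp(ArtifactSketch):
    """Hypothetical operator that loads a model from an artifact path."""

    artifact_config_keys = ["model_path"]
```

For example, `ModelOp(model_path="artifacts/model.bin", threshold=0.5).get_artifact_config()` returns only `{"model_path": "artifacts/model.bin"}`; threshold is ordinary config, not an artifact.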

get_datapoint_cols(input_datapoint_cols)

Get the datapoint_cols for the operator's output DataFrame, given the datapoint columns of its inputs.

Return type:

List[str]

get_datapoint_instance(input_datapoint_instances)

Get the datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.

Return type:

DatapointType

get_datapoint_type(input_datapoint_types)

Get the datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.

Return type:

Type[DatapointType]

classmethod get_op_impl_version()
Return type:

int

classmethod get_operator_impl_version_to_ds_migration()
Return type:

Dict[int, bool]

get_predictions(input_df, output_df)

Operators capable of producing predictions, such as Models and Extractors, can override this to define how they return their predictions. input_df should be the input DataFrame to the operator. If spaces other than SequenceLabelSpace are required in the future, a corresponding parameter can be added here; however, try/except blocks will then be required at import time to avoid circular dependencies.

Return type:

Dict[str, Any]

static is_disabled_by_feature_flag()

Returns a boolean indicating whether this operator is disabled by a feature flag.

Return type:

bool

artifact_config_keys: List[str] = []
property drop_schema: List[str] | None

List of columns dropped by this operator.

  • If the value is None, no columns are dropped.
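The drop_schema semantics, applied to a list of column names, can be sketched as below. output_columns is a hypothetical helper, not part of the API.

```python
from typing import List, Optional


def output_columns(input_columns: List[str], drop_schema: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: columns that remain after applying drop_schema."""
    if drop_schema is None:
        return list(input_columns)  # None: no columns are dropped
    dropped = set(drop_schema)
    # Preserve the original column order for the columns that survive.
    return [col for col in input_columns if col not in dropped]
```

For example, `output_columns(["a", "b", "c"], ["b"])` returns `["a", "c"]`, and passing None returns all three columns.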

errors_file_path: Optional[str] = None
fit_params: Optional[Dict[Any, Any]] = None
init_params: Optional[Dict[Any, Any]] = None
abstract property input_schema: Dict[str, Any] | None

Mapping from column name to dtype, used to validate the dtypes of the input DataFrame.

  • If the value is None, any schema is allowed.

  • These are the minimum required columns. Other columns are allowed by default.

  • If a column's dtype is None, any dtype is allowed for that column.
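The three validation rules above can be sketched as a single check. schema_errors is a hypothetical helper, and dtypes are represented here as plain strings such as "object" or "int64" for illustration; the real validation may compare dtypes differently.

```python
from typing import Any, Dict, List, Optional


def schema_errors(actual_dtypes: Dict[str, Any], schema: Optional[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: list violations of `schema` by a frame with `actual_dtypes`."""
    if schema is None:
        return []  # rule 1: any schema is allowed
    errors = []
    for col, expected in schema.items():
        if col not in actual_dtypes:
            # rule 2: the schema lists the minimum required columns
            errors.append(f"missing column {col!r}")
        elif expected is not None and actual_dtypes[col] != expected:
            errors.append(f"column {col!r} has dtype {actual_dtypes[col]!r}, expected {expected!r}")
        # rule 3: an expected dtype of None accepts any dtype
    # Columns that are not in the schema are never reported.
    return errors
```

For a frame with dtypes `{"text": "object", "label": "int64", "extra": "float64"}`, the schema `{"text": "object", "label": None}` passes: extra is ignored and label accepts any dtype.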

is_deprecated: bool = False
is_expensive: bool = False
new_datapoint_cols: List[str] = []
node_uid: Optional[int] = None
operator_impl_version: int = 0
operator_impl_version_to_ds_migration: Dict[int, bool] = {}
property output_meta: Dict[str, Any] | None
abstract property output_schema: Dict[str, Any] | None

Mapping from column name to dtype, used to validate the dtypes of the output DataFrame.

  • If the value is None, any schema is allowed.

  • These are the minimum required columns. Other columns are allowed by default.

  • If a column's dtype is None, any dtype is allowed for that column.

When using the cache_features decorator, don’t sort the keys of output_schema if the order of features could change depending on initialization.

show_args_in_gui: bool = True
property shuffle_cols: List[str]

Indicates whether the operator requires a shuffle before execution. Data is shuffled on the specified columns so that rows with the same values in those columns land in the same partition.
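Shuffling on columns is typically done by hash partitioning, which Dask exposes through DataFrame.shuffle. The pure-Python sketch below illustrates that technique only; shuffle_on and the doc_uid column are hypothetical, and zlib.crc32 is used because Python's built-in hash of strings is randomized per process.

```python
import zlib
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def shuffle_on(rows: Sequence[Row], cols: Sequence[str], npartitions: int) -> List[List[Row]]:
    """Hypothetical helper: hash-partition rows so equal keys share a partition."""
    partitions: List[List[Row]] = [[] for _ in range(npartitions)]
    for row in rows:
        # Build a deterministic key from the shuffle columns.
        key = repr(tuple(row[c] for c in cols)).encode("utf-8")
        partitions[zlib.crc32(key) % npartitions].append(row)
    return partitions
```

Every row with a given doc_uid ends up in exactly one partition, so a per-document operation can then run on each partition independently.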

property use_gpu_if_available: bool

Whether this operator will run on GPU if one is available.