snorkelflow.operators.Operator
- class snorkelflow.operators.Operator
Bases: object
Operator class that performs a transformation on Dask DataFrames.
See Using custom data points for an example usage.
- __init__()
Methods
- __init__()
- check_fit_arguments(**kwargs): Check fit arguments.
- column_docs(): Return type: Dict[str, str]
- estimate_gpu_perf(df): Return type: Performance
- estimate_perf(df): Return type: Performance
- examples(): List of examples (pairs of input df and operator kwargs).
- execute(input_ddfs[, callback]): Public method for validating and executing the operator.
- fit(df, **kwargs): Error-handling wrapper for the _fit method to fit the operator to training data.
- fit_arguments(): Return fit argument types if the operator implements a _fit method, and None otherwise.
- fit_input_schema(**kwargs): Error-handling wrapper for the _fit_input_schema method.
- get_artifact_config(): Mapping from operator config keys to data artifact paths.
- get_datapoint_cols(input_datapoint_cols): Get datapoint_cols for the operator's output DataFrame, given the columns of its inputs.
- get_datapoint_instance(input_datapoint_instances): Get datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
- get_datapoint_type(input_datapoint_types): Get datapoint_type for the operator's output DataFrame, given the types of its inputs.
- get_op_impl_version(): Return type: int
- get_operator_impl_version_to_ds_migration(): Return type: Dict[int, bool]
- get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can subclass this to define how they return their predictions.
- is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
Attributes
- drop_schema: List of cols that are dropped as a result of this operator.
- input_schema: Col-to-dtype map to validate the dtypes of the input DataFrame.
- output_schema: Col-to-dtype map to validate the dtypes of the output DataFrame.
- shuffle_cols: Used to tell whether the operator requires a shuffle before execution.
- use_gpu_if_available: Whether this operator will run on GPU if one is available.
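To make the reference below concrete, here is a minimal sketch of a custom subclass. It assumes that the abstract input_schema and output_schema properties are the only members a subclass must provide and that plain Python types are acceptable dtype values; the class and column names are hypothetical, and the hook that performs the actual transformation is omitted (see Using custom data points for a complete example).

```python
from typing import Any, Dict, Optional

from snorkelflow.operators import Operator


class CharCountOperator(Operator):
    """Hypothetical operator that adds a character-count column for a text column."""

    @property
    def input_schema(self) -> Optional[Dict[str, Any]]:
        # Minimum required input fields; other columns are allowed by default.
        return {"text": str}

    @property
    def output_schema(self) -> Optional[Dict[str, Any]]:
        # Columns this operator guarantees on its output DataFrame.
        return {"text": str, "char_count": int}
```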
- classmethod check_fit_arguments(**kwargs)
Check fit arguments.
- Return type:
None
- column_docs()
- Return type:
Dict[str, str]
- estimate_gpu_perf(df)
- Return type:
Performance
- estimate_perf(df)
- Return type:
Performance
- static examples()
List of examples (pairs of input df and operator kwargs).
- Return type:
List[OperatorExample]
- execute(input_ddfs, callback=<function no_op_progress_callback>)
Public method for validating and executing the operator. Must return a DataFrame.
- Return type:
DataFrame
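A hedged usage sketch of execute, assuming input_ddfs accepts a list of Dask DataFrames, that the callback can be left at its default, and that the hypothetical CharCountOperator sketched above can be instantiated directly:

```python
import dask.dataframe as dd
import pandas as pd

pdf = pd.DataFrame({"text": ["foo", "barbaz"]})
ddf = dd.from_pandas(pdf, npartitions=1)

op = CharCountOperator()      # hypothetical subclass sketched above
out_df = op.execute([ddf])    # validates input/output schemas, then runs the operator
```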
- classmethod fit(df, **kwargs)
Error-handling wrapper for the _fit method to fit the operator to training data.
- Return type:
Dict[str, Any]
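A hedged sketch of fitting an operator class to training data, assuming a pandas DataFrame is an acceptable df and that fit_arguments() (documented below) is the way to check whether an operator implements _fit; keyword arguments are omitted because they depend on the operator:

```python
import pandas as pd

train_df = pd.DataFrame({"text": ["foo", "barbaz"]})

# Only operators that implement _fit report fit argument types; others return None.
if CharCountOperator.fit_arguments() is not None:
    fitted_params = CharCountOperator.fit(train_df)  # -> Dict[str, Any]
```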
- classmethod fit_arguments()
Return fit argument types if the operator implements a _fit method, and None otherwise.
- Return type:
Optional[Dict[str, Any]]
- classmethod fit_input_schema(**kwargs)
Error-handling wrapper for the _fit_input_schema method.
- Return type:
Optional[Dict[str, Any]]
- get_artifact_config()
Mapping from operator config keys to data artifact paths.
Keys correspond to the artifact_config_keys property.
- Return type:
Dict[str, str]
- get_datapoint_cols(input_datapoint_cols)
Get datapoint_cols for the operator's output DataFrame, given the columns of its inputs.
- Return type:
List[str]
- get_datapoint_instance(input_datapoint_instances)
Get datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
- Return type:
DatapointType
- get_datapoint_type(input_datapoint_types)
Get datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.
- Return type:
Type[DatapointType]
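A hedged sketch of propagating datapoint metadata, assuming input_datapoint_types is a sequence with one entry per input DataFrame; the subclass name is hypothetical:

```python
class PassthroughDatapointOperator(CharCountOperator):
    def get_datapoint_type(self, input_datapoint_types):
        # Output rows keep the datapoint type of the first input DataFrame
        # (assumption: one entry per input, in input order).
        return input_datapoint_types[0]
```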
- classmethod get_op_impl_version()
- Return type:
int
- classmethod get_operator_impl_version_to_ds_migration()
- Return type:
Dict[int, bool]
- get_predictions(input_df, output_df)
Operators capable of producing predictions, such as Models and Extractors, can subclass this to define how they return their predictions. input_df should be the input DataFrame to the operator. In the future, if spaces other than the SequenceLabelSpace are required, we can add that parameter here; however, try/except blocks will be required at import time to avoid circular dependencies.
- Return type:
Dict[str, Any]
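A hedged sketch of a get_predictions override. The expected structure of the returned dict is not specified in this reference, so the sketch simply pairs a hypothetical uid column of the input with a hypothetical predictions column of the output:

```python
class MyExtractor(CharCountOperator):
    def get_predictions(self, input_df, output_df):
        # Hypothetical mapping from input row uid to predicted value; the actual
        # expected structure depends on the prediction space in use.
        return dict(zip(input_df["x_uid"], output_df["preds"]))
```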
- static is_disabled_by_feature_flag()
Returns a boolean indicating whether this operator is disabled by a feature flag.
- Return type:
bool
- artifact_config_keys: List[str] = []
- property drop_schema: List[str] | None
List of columns that are dropped as a result of this operator.
If the value is None, no columns are dropped.
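A hedged sketch of overriding drop_schema; the subclass and column names are hypothetical:

```python
from typing import List, Optional


class StripRawHtmlOperator(CharCountOperator):
    @property
    def drop_schema(self) -> Optional[List[str]]:
        # Columns removed by this operator; returning None drops nothing.
        return ["raw_html"]
```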
- errors_file_path: Optional[str] = None
- fit_params: Optional[Dict[Any, Any]] = None
- init_params: Optional[Dict[Any, Any]] = None
- abstract property input_schema: Dict[str, Any] | None
Column-to-dtype map to validate the dtypes of the input DataFrame.
If the value is None, any schema is allowed.
These are the _minimum_ required fields; other fields are allowed by default.
If a field's type is None, any type is allowed for that field.
- is_deprecated: bool = False
- is_expensive: bool = False
- new_datapoint_cols: List[str] = []
- node_uid: Optional[int] = None
- operator_impl_version: int = 0
- operator_impl_version_to_ds_migration: Dict[int, bool] = {}
- property output_meta: Dict[str, Any] | None
- abstract property output_schema: Dict[str, Any] | None
Column-to-dtype map to validate the dtypes of the output DataFrame.
If the value is None, any schema is allowed.
These are the _minimum_ required fields; other fields are allowed by default.
If a field's type is None, any type is allowed for that field.
Do not sort the keys in output_schema when using the cache_features decorator if the order of features could change depending on initialization.
- show_args_in_gui: bool = True
- property shuffle_cols: List[str]
Used to tell whether the operator requires a shuffle before execution. The specified columns will be shuffled into the same partition.
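A hedged sketch of overriding shuffle_cols so that rows sharing a key end up in the same partition before execution; the subclass and column names are hypothetical:

```python
from typing import List


class DocGroupedOperator(CharCountOperator):
    @property
    def shuffle_cols(self) -> List[str]:
        # Rows with the same "doc_uid" are shuffled into the same partition
        # before this operator executes.
        return ["doc_uid"]
```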
- property use_gpu_if_available: bool
Whether this operator will run on GPU if one is available.