snorkelflow.operators.Operator
- class snorkelflow.operators.Operator
Bases: object
Operator class that performs a transformation on Dask DataFrames.
See Using custom data points for an example usage.
- __init__()
Methods
- __init__()
- check_fit_arguments(**kwargs): Check fit arguments.
- column_docs(): Return type: Dict[str, str]
- estimate_gpu_perf(df): Return type: Performance
- estimate_perf(df): Return type: Performance
- examples(): List of examples (pairs of input df and operator kwargs).
- execute(input_ddfs[, callback]): Public method for validating and executing the operator.
- fit(df, **kwargs): Error-handling wrapper for the _fit method to fit the operator to training data.
- fit_arguments(): Return fit argument types if the operator implements a _fit method, and None otherwise.
- fit_input_schema(**kwargs): Error-handling wrapper for the _fit_input_schema method.
- get_artifact_config(): Mapping from operator config keys to data artifact paths.
- get_datapoint_cols(input_datapoint_cols): Get datapoint_cols for the operator's output DataFrame, given the columns of its inputs.
- get_datapoint_instance(input_datapoint_instances): Get datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
- get_datapoint_type(input_datapoint_types): Get datapoint_type for the operator's output DataFrame, given the types of its inputs.
- get_op_impl_version(): Return type: int
- get_operator_impl_version_to_ds_migration(): Return type: Dict[int, bool]
- get_predictions(input_df, output_df): Operators capable of producing predictions, such as Models and Extractors, can subclass this to define how they return their predictions.
- is_disabled_by_feature_flag(): Returns a boolean indicating whether this operator is disabled by a feature flag.
Attributes
- drop_schema: List of cols that are dropped as a result of this operator.
- input_schema: Col-to-dtype map to validate the dtypes of the input DataFrame.
- output_schema: Col-to-dtype map to validate the dtypes of the output DataFrame.
- shuffle_cols: Used to tell whether the operator requires a shuffle before execution.
- use_gpu_if_available: Whether this operator will run on GPU if one is available.
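To make the reference below concrete, here is a minimal sketch of a custom subclass. It assumes that the abstract input_schema and output_schema properties are the only members a subclass must provide and that plain Python types are acceptable dtype values; the class and column names are hypothetical, and the hook that performs the actual transformation is omitted (see Using custom data points for a complete example).

```python
from typing import Any, Dict, Optional

from snorkelflow.operators import Operator


class CharCountOperator(Operator):
    """Hypothetical operator that adds a character-count column for a text column."""

    @property
    def input_schema(self) -> Optional[Dict[str, Any]]:
        # Minimum required input fields; other columns are allowed by default.
        return {"text": str}

    @property
    def output_schema(self) -> Optional[Dict[str, Any]]:
        # Columns this operator guarantees on its output DataFrame.
        return {"text": str, "char_count": int}
```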
- classmethod check_fit_arguments(**kwargs)
Check fit arguments.
- Return type:
None
- column_docs()
- Return type:
Dict[str, str]
- estimate_gpu_perf(df)
- Return type:
Performance
- estimate_perf(df)
- Return type:
Performance
- static examples()
List of examples (pairs of input df and operator kwargs).
- Return type:
List[OperatorExample]
- execute(input_ddfs, callback=<function no_op_progress_callback>)
Public method for validating and executing the operator. Must return a DataFrame.
- Return type:
DataFrame
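A hedged usage sketch of execute, assuming input_ddfs accepts a list of Dask DataFrames, that the callback can be left at its default, and that the hypothetical CharCountOperator sketched above can be instantiated directly:

```python
import dask.dataframe as dd
import pandas as pd

pdf = pd.DataFrame({"text": ["foo", "barbaz"]})
ddf = dd.from_pandas(pdf, npartitions=1)

op = CharCountOperator()      # hypothetical subclass sketched above
out_df = op.execute([ddf])    # validates input/output schemas, then runs the operator
```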
- classmethod fit(df, **kwargs)
Error-handling wrapper for the _fit method to fit the operator to training data.
- Return type:
Dict[str, Any]
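A hedged sketch of fitting an operator class to training data, assuming a pandas DataFrame is an acceptable df and that fit_arguments() (documented below) is the way to check whether an operator implements _fit; keyword arguments are omitted because they depend on the operator:

```python
import pandas as pd

train_df = pd.DataFrame({"text": ["foo", "barbaz"]})

# Only operators that implement _fit report fit argument types; others return None.
if CharCountOperator.fit_arguments() is not None:
    fitted_params = CharCountOperator.fit(train_df)  # -> Dict[str, Any]
```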
- classmethod fit_arguments()
Return fit argument types if the operator implements a _fit method, and None otherwise.
- Return type:
Optional[Dict[str, Any]]
- classmethod fit_input_schema(**kwargs)
Error-handling wrapper for the _fit_input_schema method.
- Return type:
Optional[Dict[str, Any]]
- get_artifact_config()
Mapping from operator config keys to data artifact paths.
Keys correspond to the artifact_config_keys property.
- Return type:
Dict[str, str]
- get_datapoint_cols(input_datapoint_cols)
Get datapoint_cols for the operator's output DataFrame, given the columns of its inputs.
- Return type:
List[str]
- get_datapoint_instance(input_datapoint_instances)
Get datapoint_instance for the operator's output DataFrame, given the datapoint_instances of its inputs.
- Return type:
DatapointType
- get_datapoint_type(input_datapoint_types)
Get datapoint_type for the operator's output DataFrame, given the datapoint types of its inputs.
- Return type:
Type[DatapointType]
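A hedged sketch of propagating datapoint metadata, assuming input_datapoint_types is a sequence with one entry per input DataFrame; the subclass name is hypothetical:

```python
class PassthroughDatapointOperator(CharCountOperator):
    def get_datapoint_type(self, input_datapoint_types):
        # Output rows keep the datapoint type of the first input DataFrame
        # (assumption: one entry per input, in input order).
        return input_datapoint_types[0]
```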
- classmethod get_op_impl_version()
- Return type:
int
- classmethod get_operator_impl_version_to_ds_migration()
- Return type:
Dict[int, bool]
- get_predictions(input_df, output_df)
Operators capable of producing predictions, such as Models and Extractors, can subclass this to define how they return their predictions. input_df should be the input DataFrame to the operator. In the future, if spaces other than the SequenceLabelSpace are required, we can add that parameter here; however, try/except blocks will be required at import time to avoid circular dependencies.
- Return type:
Dict[str, Any]
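A hedged sketch of a get_predictions override. The expected structure of the returned dict is not specified in this reference, so the sketch simply pairs a hypothetical uid column of the input with a hypothetical predictions column of the output:

```python
class MyExtractor(CharCountOperator):
    def get_predictions(self, input_df, output_df):
        # Hypothetical mapping from input row uid to predicted value; the actual
        # expected structure depends on the prediction space in use.
        return dict(zip(input_df["x_uid"], output_df["preds"]))
```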
- static is_disabled_by_feature_flag()
Returns a boolean indicating whether this operator is disabled by a feature flag.
- Return type:
bool
- artifact_config_keys: List[str] = []
- property drop_schema: List[str] | None
List of columns that are dropped as a result of this operator.
If the value is None, no columns are dropped.
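A hedged sketch of overriding drop_schema; the subclass and column names are hypothetical:

```python
from typing import List, Optional


class StripRawHtmlOperator(CharCountOperator):
    @property
    def drop_schema(self) -> Optional[List[str]]:
        # Columns removed by this operator; returning None drops nothing.
        return ["raw_html"]
```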
- errors_file_path: Optional[str] = None
- fit_params: Optional[Dict[Any, Any]] = None
- init_params: Optional[Dict[Any, Any]] = None
- abstract property input_schema: Dict[str, Any] | None
Column-to-dtype map to validate the dtypes of the input DataFrame.
If the value is None, any schema is allowed.
These are the _minimum_ required fields; other fields are allowed by default.
If a field's type is None, any type is allowed for that field.
- is_deprecated: bool = False
- is_expensive: bool = False
- new_datapoint_cols: List[str] = []
- node_uid: Optional[int] = None
- operator_impl_version: int = 0
- operator_impl_version_to_ds_migration: Dict[int, bool] = {}
- property output_meta: Dict[str, Any] | None
- abstract property output_schema: Dict[str, Any] | None
Column-to-dtype map to validate the dtypes of the output DataFrame.
If the value is None, any schema is allowed.
These are the _minimum_ required fields; other fields are allowed by default.
If a field's type is None, any type is allowed for that field.
Do not sort the keys in output_schema when using the cache_features decorator if the order of features could change depending on initialization.
- show_args_in_gui: bool = True
- property shuffle_cols: List[str]
Used to tell whether the operator requires a shuffle before execution. The specified columns will be shuffled into the same partition.
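A hedged sketch of overriding shuffle_cols so that rows sharing a key end up in the same partition before execution; the subclass and column names are hypothetical:

```python
from typing import List


class DocGroupedOperator(CharCountOperator):
    @property
    def shuffle_cols(self) -> List[str]:
        # Rows with the same "doc_uid" are shuffled into the same partition
        # before this operator executes.
        return ["doc_uid"]
```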
- property use_gpu_if_available: bool
Whether this operator will run on GPU if one is available.