operators.candidates.extractor.RegexSpanFeaturizer
- class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)
A SpanFeaturizer that yields all matches for a given regular expression.
This operator applies a given regular expression over a specified field of the input DataFrame and adds each match as a new span in the output.
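The sketch below uses only Python's standard `re` module to illustrate the behaviour described above: every match of the regex in a given field becomes its own span record. It is not the operator's implementation. The helper `extract_spans` and the record keys `row`, `char_start`, `char_end` and `span_text` are hypothetical. Treating `char_end` as exclusive follows `re`'s convention and is an assumption about how the operator reports offsets.

```python
import re
from typing import Dict, List


def extract_spans(rows: List[Dict[str, str]], regex: str, field: str) -> List[Dict[str, object]]:
    """Return one record per regex match found in row[field].

    Hypothetical helper that mirrors the documented behaviour;
    it is not the library's implementation.
    """
    pattern = re.compile(regex)
    spans = []
    for row_idx, row in enumerate(rows):
        text = row[field]
        # finditer yields every non-overlapping match, left to right,
        # so a single row can contribute several spans (or none).
        for match in pattern.finditer(text):
            spans.append(
                {
                    "row": row_idx,
                    "char_start": match.start(),  # inclusive offset
                    "char_end": match.end(),      # exclusive offset (re convention)
                    "span_text": match.group(0),
                }
            )
    return spans


rows = [{"text": "Call 555-1234 or 555-9876."}, {"text": "No numbers here."}]
for span in extract_spans(rows, r"\d{3}-\d{4}", "text"):
    print(span)
```

The first row yields two spans and the second row yields none. This reflects the "each match as a new span" behaviour described above.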
- Parameters:
  - regex (str) – The regular expression to apply over the data.
  - field (str) – The name of the DataFrame column to apply the provided regex over.
  - ignore_case (bool, default: False) – If True, ignore case when matching the regular expression.
  - capture_group (int, default: 0) – The index of the capture group to extract from each match; 0 refers to the entire match.
  - col_suffix (Optional[str], default: None) – An optional suffix for the name of the column containing the extracted spans.
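As a rough illustration, `ignore_case` and `capture_group` can be mapped onto Python's `re` module as in the sketch below. The mapping assumes that `ignore_case=True` corresponds to `re.IGNORECASE` and that `capture_group` selects `Match.span(n)`. Skipping groups that did not participate in a match is also an assumption. The function `match_spans` is hypothetical, and `col_suffix` is not modelled here.

```python
import re
from typing import List, Tuple


def match_spans(
    text: str, regex: str, ignore_case: bool = False, capture_group: int = 0
) -> List[Tuple[int, int, str]]:
    # Hypothetical mapping of the documented parameters onto Python's re module.
    flags = re.IGNORECASE if ignore_case else 0
    results = []
    for match in re.finditer(regex, text, flags):
        start, end = match.span(capture_group)
        # A group that did not take part in the match reports (-1, -1);
        # skipping it here is an assumption, not documented behaviour.
        if start == -1:
            continue
        results.append((start, end, match.group(capture_group)))
    return results


text = "Invoice ID: A-17, invoice id: B-42"
pattern = r"invoice id: (\w-\d+)"
print(match_spans(text, pattern, ignore_case=True, capture_group=1))
print(match_spans(text, pattern, ignore_case=False, capture_group=1))
```

When `ignore_case=True`, both IDs are returned. Without it, only the lowercase occurrence matches. With `capture_group=1`, each span covers just the ID rather than the whole "invoice id: ..." match.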