operators.candidates.extractor.RegexSpanFeaturizer
- class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)
A SpanFeaturizer that yields all matches for a given regular expression.
This operator applies the given regular expression to a specified field of the incoming DataFrame and adds each match as a new span in the output.
Parameters
regex (str)
    The regular expression to apply over the data.
field (str)
    The name of the column in the DataFrame to apply the provided regex over.
ignore_case (bool, default: False)
    If true, ignore case when considering regular expression matches.
capture_group (int, default: 0)
    The capture group to take from each regex match.
col_suffix (Optional[str], default: None)
    An optional suffix for the column containing the extracted spans.
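
To make the roles of regex, ignore_case, and capture_group concrete, here is a minimal sketch using only Python's standard re module. It is not the operator's implementation: the helper name extract_spans and the (start, end, text) tuple layout are assumptions made for illustration, and the sketch works on a single string rather than a DataFrame column.

```python
import re
from typing import List, Tuple


def extract_spans(
    text: str,
    regex: str,
    ignore_case: bool = False,
    capture_group: int = 0,
) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, matched_text) for every match.

    This only mirrors the parameter semantics described above; the real
    operator's output format is not specified in this page.
    """
    flags = re.IGNORECASE if ignore_case else 0
    spans = []
    for match in re.finditer(regex, text, flags):
        # An optional group that did not participate in the match yields None.
        if match.group(capture_group) is None:
            continue
        spans.append(
            (
                match.start(capture_group),
                match.end(capture_group),
                match.group(capture_group),
            )
        )
    return spans


# Stand-in for one value of the column named by `field`.
value = "Order ID: a12 and id: B7"

# Group 1 keeps only the identifier; ignore_case lets "ID:" match as well.
print(extract_spans(value, r"id:\s*(\w+)", ignore_case=True, capture_group=1))

# Defaults: case-sensitive, and group 0 is the entire match.
print(extract_spans(value, r"id:\s*(\w+)"))
```

In Python's re module, group 0 always refers to the whole match, which is presumably why it is the default for capture_group; a positive index selects a parenthesized subgroup instead.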