operators.candidates.extractor.RegexSpanExtractor
- class operators.candidates.extractor.RegexSpanExtractor(regex, field, ignore_case=False, capture_group=0, col_suffix=None)
A SpanExtractor that yields all matches for a given regular expression.
This operator applies the given regular expression to a specified field of the incoming DataFrame and adds each match as a new span in the output.
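The matching behavior described above can be sketched with Python's standard `re` module. This is only an illustration, not the operator's implementation: the `extract_spans` helper, the list-of-dicts stand-in for a DataFrame, and the `(row id, char_start, char_end, text)` output shape are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for a DataFrame: one dict per row.
rows = [
    {"id": 0, "text": "Call 555-1234 or 555-9876."},
    {"id": 1, "text": "No numbers here."},
]

def extract_spans(rows, regex, field):
    """Yield (row id, char_start, char_end, matched text) for every match.

    Assumption: char_end is exclusive, as in Python's re.Match.end().
    """
    pattern = re.compile(regex)
    for row in rows:
        # finditer returns every non-overlapping match in the field's text.
        for m in pattern.finditer(row[field]):
            yield (row["id"], m.start(), m.end(), m.group())

spans = list(extract_spans(rows, r"\d{3}-\d{4}", "text"))
for span in spans:
    print(span)
```

A row with no matches, such as the second one here, simply contributes no spans.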
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| regex | str | | The regular expression to apply over the data. |
| field | str | | The name of the DataFrame column to apply the regex over. |
| ignore_case | bool | False | If true, ignore case when considering regular expression matches (defaults to false). |
| capture_group | str | 0 | The capture group of the regex whose match is returned as the span. |
| col_suffix | str | None | An optional suffix for the column containing the extracted spans. |
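To make the `ignore_case` and `capture_group` options concrete, here is a small sketch of how the equivalent settings behave in Python's `re` module. It assumes the operator follows standard `re` semantics (group 0 is the whole match, group 1 the first parenthesized group); the `match_spans` helper is hypothetical and not part of the library.

```python
import re

text = "Dr. Smith met dr. Jones."
regex = r"dr\.\s+(\w+)"

def match_spans(text, regex, ignore_case=False, capture_group=0):
    """Return (char_start, char_end, text) for the chosen group of each match."""
    flags = re.IGNORECASE if ignore_case else 0
    return [
        (m.start(capture_group), m.end(capture_group), m.group(capture_group))
        for m in re.finditer(regex, text, flags)
    ]

# Case-sensitive: only the lowercase "dr." matches.
case_sensitive = match_spans(text, regex)

# Case-insensitive, group 0: the whole match becomes the span.
whole = match_spans(text, regex, ignore_case=True)

# Case-insensitive, group 1: only the captured name becomes the span.
names = match_spans(text, regex, ignore_case=True, capture_group=1)
```

Selecting a capture group is useful when the pattern needs surrounding context to match but only part of the match should become the span.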