operators.candidates.extractor.RegexSpanFeaturizer
- class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)
A SpanFeaturizer that yields all matches for a given regular expression.
This operator applies the given regular expression to a specified field of the incoming DataFrame and adds each match as a new span in the output.
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| regex | str | | The regular expression to apply over the data. |
| field | str | | The name of the column in the DataFrame to apply the provided regex over. |
| ignore_case | bool | False | If true, ignore case when matching the regular expression. |
| capture_group | int | 0 | The capture group to take from each regex match. |
| col_suffix | Optional[str] | None | An optional suffix for the column containing the extracted spans. |
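
To make the roles of `regex`, `field`, `ignore_case`, and `capture_group` concrete, the following is a minimal sketch built only on Python's standard `re` module. It is not the operator's implementation: the helper `regex_spans`, the list-of-dicts input standing in for a DataFrame, and the output keys (`row`, `char_start`, `char_end`, `span_text`) are all assumptions made for illustration, and `col_suffix` is left out because the naming of the output column is not specified here.

```python
import re
from typing import Dict, Iterator, List


def regex_spans(
    rows: List[Dict[str, str]],
    regex: str,
    field: str,
    ignore_case: bool = False,
    capture_group: int = 0,
) -> Iterator[Dict[str, object]]:
    """Hypothetical helper: yield one record per regex match in rows[i][field]."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    for row_index, row in enumerate(rows):
        text = row[field]
        for match in pattern.finditer(text):
            # An optional group that did not participate returns None; skip it.
            if match.group(capture_group) is None:
                continue
            yield {
                "row": row_index,
                "char_start": match.start(capture_group),
                "char_end": match.end(capture_group),
                "span_text": match.group(capture_group),
            }


# Illustrative input: each dict plays the role of one DataFrame row.
rows = [
    {"body": "Contact Dr. Smith or dr. Jones."},
    {"body": "No titles here."},
]

# capture_group=1 keeps only the surname, not the whole "Dr. <name>" match.
spans = list(
    regex_spans(rows, r"dr\.\s+(\w+)", "body", ignore_case=True, capture_group=1)
)
for span in spans:
    print(span)
```

In this sketch, `capture_group=0` would return the whole match (Python's `re` convention for group 0), and `ignore_case=False` would skip the capitalized "Dr. Smith" because the pattern is written in lowercase.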