Version: 0.96

operators.candidates.extractor.RegexSpanFeaturizer

class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)

A SpanFeaturizer that yields all matches for a given regular expression.

This operator applies the given regular expression to the specified field of the input DataFrame and adds each match as a new span in the output.
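
The matching behavior described above can be illustrated with Python's standard `re` module alone. This is a minimal sketch, not the operator's implementation: the helper `find_spans`, the list-of-dicts stand-in for a DataFrame, and the `(row_index, char_start, char_end, text)` tuple layout are all assumptions made for illustration.

```python
import re

def find_spans(rows, regex, field):
    """Yield one (row_index, char_start, char_end, text) tuple per regex match.

    Hypothetical helper: `rows` is a list of dicts standing in for DataFrame
    rows. The end offset is exclusive, following `re`; the operator's own
    offset convention is not stated in this reference.
    """
    pattern = re.compile(regex)
    for i, row in enumerate(rows):
        # finditer returns every non-overlapping match, left to right
        for m in pattern.finditer(row[field]):
            yield (i, m.start(), m.end(), m.group())

rows = [
    {"text": "Call 555-1234 or 555-9876."},
    {"text": "No numbers here."},
]
spans = list(find_spans(rows, r"\d{3}-\d{4}", "text"))
print(spans)
```

A row with several matches contributes several spans, and a row with no match contributes none.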

Name	Type	Default	Info
regex	`str`		The regular expression to apply over the data.
field	`str`		The name of the column in the DataFrame to apply the provided regex over.
ignore_case	`bool`	`False`	If true, ignore case when matching the regular expression.
capture_group	`int`	`0`	The capture group of each regex match to use as the extracted span.
col_suffix	`Optional[str]`	`None`	An optional suffix for the column containing the extracted spans.
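
To show how `ignore_case` and `capture_group` could affect the extracted spans, here is a short sketch built on Python's `re` module. It assumes the operator follows `re` semantics (`re.IGNORECASE` for case-insensitive matching, group `0` meaning the whole match); the helper `match_group_spans` is hypothetical and does not call the operator.

```python
import re

def match_group_spans(text, regex, ignore_case=False, capture_group=0):
    """Return (char_start, char_end, text) for the chosen group of each match.

    Hypothetical helper mirroring the parameter names in the table above.
    """
    flags = re.IGNORECASE if ignore_case else 0
    spans = []
    for m in re.finditer(regex, text, flags):
        start, end = m.span(capture_group)
        if start != -1:  # -1 means the group did not take part in this match
            spans.append((start, end, m.group(capture_group)))
    return spans

text = "Dose: 20 mg, then dose: 5 MG"
regex = r"(\d+) mg"

whole = match_group_spans(text, regex, ignore_case=True)                     # group 0: entire match
numbers = match_group_spans(text, regex, ignore_case=True, capture_group=1)  # digits only
strict = match_group_spans(text, regex)                                      # case-sensitive

print(whole)
print(numbers)
print(strict)
```

Under these assumptions, selecting group `1` narrows each span to the digits while the unit still has to be present for the match to count.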