Version: 0.96

operators.candidates.extractor.RegexSpanFeaturizer

class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)

A SpanFeaturizer that yields all matches for a given regular expression.

This operator applies the given regular expression to the specified field of the input DataFrame and adds each match as a new span in the output.
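The matching behavior described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the operator's actual implementation: rows are modeled as a list of dicts instead of a DataFrame, `extract_regex_spans` is a hypothetical helper, and representing each span as a `(char_start, char_end, text)` tuple is an assumption made for the example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical span representation: (char_start, char_end, matched_text)
Span = Tuple[int, int, str]


def extract_regex_spans(rows: List[Dict[str, str]], regex: str, field: str) -> List[List[Span]]:
    """Hypothetical sketch: collect every match of `regex` in each row's `field`."""
    pattern = re.compile(regex)
    result = []
    for row in rows:
        text = row[field]
        # finditer yields every non-overlapping match, left to right
        result.append([(m.start(), m.end(), m.group()) for m in pattern.finditer(text)])
    return result


rows = [{"text": "Call 555-1234 or 555-9876."}, {"text": "No numbers here."}]
print(extract_regex_spans(rows, r"\d{3}-\d{4}", "text"))
# [[(5, 13, '555-1234'), (17, 25, '555-9876')], []]
```

Each match becomes its own span, and a row with no matches simply contributes no spans.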

Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| regex | str | | The regular expression to apply over the data. |
| field | str | | The name of the DataFrame column to apply the regular expression to. |
| ignore_case | bool | False | If True, ignore case when matching the regular expression. |
| capture_group | int | 0 | The index of the regex capture group to extract from each match. |
| col_suffix | Optional[str] | None | An optional suffix for the column containing the extracted spans. |
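The sketch below shows how `ignore_case` and `capture_group` plausibly map onto Python's standard `re` module, assuming the operator uses `re`-style semantics (not confirmed here). `match_spans` is a hypothetical helper; in `re`, group 0 is the entire match and group 1 onward are the parenthesized groups.

```python
import re
from typing import List, Tuple


def match_spans(text: str, regex: str, ignore_case: bool = False,
                capture_group: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical helper: (start, end) offsets of `capture_group` for each match."""
    flags = re.IGNORECASE if ignore_case else 0
    return [
        m.span(capture_group)
        for m in re.finditer(regex, text, flags)
        # a group that did not participate in a match reports (-1, -1); skip it
        if m.start(capture_group) != -1
    ]


text = "Price: USD 40, usd 75"
regex = r"USD (\d+)"
print(match_spans(text, regex))                                     # [(7, 13)]
print(match_spans(text, regex, ignore_case=True))                   # [(7, 13), (15, 21)]
print(match_spans(text, regex, ignore_case=True, capture_group=1))  # [(11, 13), (19, 21)]
```

With `capture_group=1`, only the digits are returned as spans, while the surrounding `USD ` still has to match.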