Version: 0.96

operators.candidates.extractor.RegexSpanFeaturizer

class operators.candidates.extractor.RegexSpanFeaturizer(regex, field, ignore_case=False, capture_group=0, col_suffix=None)

A SpanFeaturizer that yields all matches for a given regular expression.

This operator applies the given regular expression to a specified field of the input DataFrame and adds each match as a new span in the output.
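As a rough illustration of the idea (not the operator's actual implementation), the sketch below uses only Python's standard `re` module. The helper name `regex_spans`, the list-of-dicts input standing in for a DataFrame, and the `(row_index, char_start, char_end, text)` output shape are all assumptions made for this example.

```python
import re

def regex_spans(rows, regex, field):
    """Yield (row_index, char_start, char_end, text) for every match of
    `regex` in rows[i][field]. Hypothetical helper, for illustration only."""
    pattern = re.compile(regex)
    for i, row in enumerate(rows):
        # finditer returns every non-overlapping match, left to right
        for m in pattern.finditer(row[field]):
            yield (i, m.start(), m.end(), m.group())

# Plain dicts stand in for DataFrame rows here (an assumption of this sketch)
rows = [
    {"body": "Invoice 1042 was paid; invoice 1043 is open."},
    {"body": "No numbers here."},
]
spans = list(regex_spans(rows, r"\d+", "body"))
```

In this sketch a row with no matches contributes no spans, and a row with several matches contributes one span per match.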

Parameters:
  • regex (str) – The regular expression to apply over the data

  • field (str) – The name of the DataFrame column to apply the provided regex over

  • ignore_case (bool, default: False) – If True, ignore case when matching the regular expression

  • capture_group (int, default: 0) – The index of the regex capture group to return as the span (by Python `re` convention, group 0 is the entire match)

  • col_suffix (Optional[str], default: None) – An optional suffix for the column containing the extracted spans
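To make the effect of `ignore_case` and `capture_group` concrete, here is a minimal sketch built on Python's standard `re` module. It assumes the operator follows the usual `re` conventions (`re.IGNORECASE` for case-insensitive matching, group 0 for the whole match); the helper `match_spans` is hypothetical and is not part of the library.

```python
import re

def match_spans(text, regex, ignore_case=False, capture_group=0):
    """Return (char_start, char_end, text) for the chosen capture group of
    each match. Hypothetical helper mirroring the parameters above."""
    flags = re.IGNORECASE if ignore_case else 0
    results = []
    for m in re.finditer(regex, text, flags):
        # A group that did not participate in the match reports start == -1
        if m.start(capture_group) == -1:
            continue
        results.append(
            (m.start(capture_group), m.end(capture_group), m.group(capture_group))
        )
    return results

text = "Dr. Smith met DR. Jones."
regex = r"Dr\. (\w+)"

whole_case_sensitive = match_spans(text, regex)
whole_ignore_case = match_spans(text, regex, ignore_case=True)
names_only = match_spans(text, regex, ignore_case=True, capture_group=1)
```

With the default `capture_group=0` each span covers the full match ("Dr. Smith"), while `capture_group=1` narrows it to the parenthesized part ("Smith"). Whether the real operator skips non-participating groups, as this sketch does, is not stated in the reference above.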