Version: 25.2

operators.candidates.extractor.RegexSpanExtractor

class operators.candidates.extractor.RegexSpanExtractor(regex, field, ignore_case=False, capture_group=0, col_suffix=None)

A SpanExtractor that yields all matches for a given regular expression.

This operator applies the given regular expression to a specified field of the incoming DataFrame and adds each match as a new span in the output.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| regex | str | | The regular expression to apply over the data. |
| field | str | | The name of the column in the DataFrame to apply the provided regex over. |
| ignore_case | bool | False | If true, ignore case when considering regular expression matches. |
| capture_group | str | 0 | The capture group of the regex whose match is returned as the span. |
| col_suffix | str | None | An optional suffix for the column containing the extracted spans. |
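
The interaction between regex, ignore_case, and capture_group can be illustrated with a minimal sketch built only on Python's standard re module. It is not the operator's implementation: the helper name regex_spans, the sample row, and the (start, end, text) output shape with an exclusive end offset are assumptions made for illustration, and the real operator may represent spans differently.

```python
import re


def regex_spans(text, regex, ignore_case=False, capture_group=0):
    """Hypothetical helper: list (start, end, span_text) for each regex match.

    Mirrors the documented parameters, not the operator's actual code.
    Offsets are character positions with an exclusive end (an assumption).
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(regex, flags)
    spans = []
    for match in pattern.finditer(text):
        # An optional group that did not participate in the match yields None.
        if match.group(capture_group) is None:
            continue
        start, end = match.span(capture_group)
        spans.append((start, end, match.group(capture_group)))
    return spans


# Hypothetical stand-in for one DataFrame row; "body" plays the role of `field`.
row = {"body": "Invoice INV-1042 replaces inv-0999."}

# Group 0 is the whole match; group 1 is only the digits inside the parentheses.
whole_matches = regex_spans(row["body"], r"inv-(\d+)")
digit_spans = regex_spans(row["body"], r"inv-(\d+)", ignore_case=True, capture_group=1)
print(whole_matches)
print(digit_spans)
```

With the default capture_group of 0 the whole match becomes the span, while a non-zero group narrows the span to that group's text. Python's re also accepts a group name here when the pattern uses named groups.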