Version: 0.93

operators.candidates.rich_doc_features.RichDocRegexNGramDetector

class operators.candidates.rich_doc_features.RichDocRegexNGramDetector(regex, target_field=None, capture_group=0, case_sensitive=True)

Featurizer that detects ngrams matching a regex pattern.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| regex | str | | The regex pattern to search for. |
| target_field | Optional[str] | None | The name of the field to store the detected ngrams in. |
| capture_group | int | 0 | The capture group to use when extracting the ngram text. |
| case_sensitive | bool | True | Whether the regex search is case-sensitive. Set to False to ignore case. |
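The following is a minimal sketch of how the `capture_group` and `case_sensitive` parameters could interact, written with Python's standard `re` module. It is not the detector's actual implementation. The helper `detect_matches`, the sample text, and the pattern are all hypothetical and exist only for illustration. The sketch assumes that `capture_group=0` selects the whole match and that `case_sensitive=False` corresponds to `re.IGNORECASE`.

```python
import re


def detect_matches(text, regex, capture_group=0, case_sensitive=True):
    """Hypothetical stand-in: return the chosen group's text for every match."""
    # Assumption: case_sensitive=False maps onto re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.group(capture_group) for m in re.finditer(regex, text, flags)]


text = "Invoice INV-1042 was replaced by inv-2077."
pattern = r"INV-(\d+)"

# Group 0 is the entire match; the search is case-sensitive by default.
whole_matches = detect_matches(text, pattern)

# Group 1 keeps only the part inside the first pair of parentheses.
digits_only = detect_matches(text, pattern, capture_group=1)

# Ignoring case also picks up the lowercase "inv-2077".
any_case = detect_matches(text, pattern, capture_group=1, case_sensitive=False)

print(whole_matches, digits_only, any_case)
```

With the defaults, only the uppercase identifier matches and the full match text is returned. Selecting group 1 narrows the extracted text to the digits, and turning off case sensitivity widens the set of matches.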