Version: 0.93

operators.candidates.rich_doc_features.RichDocRegexNGramDetector

class operators.candidates.rich_doc_features.RichDocRegexNGramDetector(regex, target_field=None, capture_group=0, case_sensitive=True)

Featurizer that detects ngrams matching a regex pattern.

Parameters:
  • regex (str) – The regex pattern to search for

  • target_field (Optional[str], default: None) – The name of the field to store the detected ngrams in

  • capture_group (int, default: 0) – The capture group to use when extracting the ngram text

  • case_sensitive (bool, default: True) – Whether the regex match is case-sensitive; when False, case is ignored
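
The interaction between regex, capture_group, and case_sensitive can be illustrated with Python's standard re module. This is a minimal sketch, not the detector's actual implementation: the helper find_matches and the sample text are hypothetical, and it assumes capture_group follows the usual re convention in which group 0 is the whole match and group 1 is the first parenthesized group.

```python
import re
from typing import List


def find_matches(text: str, regex: str, capture_group: int = 0,
                 case_sensitive: bool = True) -> List[str]:
    """Hypothetical helper: return the chosen group's text for every match."""
    # Assumption: case_sensitive=False corresponds to re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(regex, flags)
    # Assumption: group 0 is the full match, group N the N-th capture group.
    return [m.group(capture_group) for m in pattern.finditer(text)]


text = "Invoice INV-1042 was paid; inv-2077 is still open."

# Default: whole match, case-sensitive.
print(find_matches(text, r"INV-(\d+)"))
# → ['INV-1042']

# Extract only the digits captured by the first group.
print(find_matches(text, r"INV-(\d+)", capture_group=1))
# → ['1042']

# Ignore case so the lowercase "inv-2077" also matches.
print(find_matches(text, r"INV-(\d+)", capture_group=1, case_sensitive=False))
# → ['1042', '2077']
```

Under these assumptions, a non-zero capture_group is useful when the pattern needs surrounding context to match (here the INV- prefix) but only part of the match should be extracted.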