operators.candidates.rich_doc_features.RichDocRegexNGramDetector
- class operators.candidates.rich_doc_features.RichDocRegexNGramDetector(regex, target_field=None, capture_group=0, case_sensitive=True)
Featurizer that detects ngrams matching a regex pattern.
- Parameters:
  - regex (str) – The regex pattern to search for.
  - target_field (Optional[str], default: None) – The name of the field in which to store the detected ngrams.
  - capture_group (int, default: 0) – The capture group to use when extracting the ngram text. Group 0 is the entire match.
  - case_sensitive (bool, default: True) – Whether the regex match is case-sensitive. Set to False to ignore case when searching for the pattern.
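
The effect of capture_group and case_sensitive can be illustrated with Python's standard re module. The following is a minimal sketch, not this class's implementation: the helper detect_ngrams and the sample text are hypothetical, and it assumes the two parameters map onto Match.group() and re.IGNORECASE in the usual way.

```python
import re


def detect_ngrams(text, regex, capture_group=0, case_sensitive=True):
    """Hypothetical stand-in that returns the text of every regex match.

    This is NOT the library's code; it only mirrors the documented
    meaning of capture_group and case_sensitive using the stdlib.
    """
    # case_sensitive=False is assumed to correspond to re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(regex, flags)
    # Group 0 is the whole match; group N is the N-th parenthesized group.
    return [m.group(capture_group) for m in pattern.finditer(text)]


text = "Invoice INV-1042 was paid; invoice inv-2210 is pending."

# Default settings: whole match, case-sensitive.
whole = detect_ngrams(text, r"INV-(\d+)")

# Keep only the digits captured by the first group.
ids = detect_ngrams(text, r"INV-(\d+)", capture_group=1)

# Ignore case so the lowercase "inv-2210" is detected as well.
ids_any_case = detect_ngrams(
    text, r"INV-(\d+)", capture_group=1, case_sensitive=False
)

print(whole)
print(ids)
print(ids_any_case)
```

With the default capture_group=0 the full matched span is kept, while a positive index keeps only the corresponding parenthesized group, which is useful when the pattern needs surrounding context to anchor the match but only part of it should be extracted.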