operators.candidates.extractor.EntityDictRegexSpanExtractor
- class operators.candidates.extractor.EntityDictRegexSpanExtractor(entity_dict_path, field, ignore_case=False, link_entities=True, col_suffix=None)
A SpanExtractor that yields spans matching the aliases in an entity-to-aliases dictionary and optionally links each span to its entity. The aliases may be regexes. For plain keywords, the Entity Dict Extractor is better suited.
This extractor is used for entity classification tasks. When link_entities is True (the default), it also annotates each span with the linked entity, using the dictionary value. By default, each provided regex is surrounded by \b (word boundary) anchors.
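The following is a minimal, standalone sketch of the matching and linking behavior described above. It is not the SDK implementation. It assumes the entity dictionary has already been loaded into a Python dict that maps each entity to a list of alias regexes, rather than read from entity_dict_path. The function name extract_regex_spans and the tuple output format are hypothetical. The sketch also wraps each alias in a non-capturing group before adding the \b anchors, so that an alias containing alternation (a|b) is bounded as a whole. That grouping is an assumption about how the wrapping behaves.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical output format: (char_start, char_end, matched_text, linked_entity_or_None)
Span = Tuple[int, int, str, Optional[str]]


def extract_regex_spans(
    text: str,
    entity_dict: Dict[str, List[str]],
    ignore_case: bool = False,
    link_entities: bool = True,
) -> List[Span]:
    """Sketch only: find every alias-regex match in `text`, optionally linking it to its entity."""
    flags = re.IGNORECASE if ignore_case else 0
    spans: List[Span] = []
    for entity, aliases in entity_dict.items():
        for alias in aliases:
            # Surround the alias regex with \b word-boundary anchors (assumed grouping),
            # so "apple" matches the word "Apple" but not the tail of "pineapple".
            pattern = re.compile(r"\b(?:" + alias + r")\b", flags)
            for m in pattern.finditer(text):
                linked = entity if link_entities else None
                spans.append((m.start(), m.end(), m.group(0), linked))
    # Return spans in document order.
    return sorted(spans)


if __name__ == "__main__":
    entity_dict = {
        "Apple Inc.": [r"apple(?: inc\.?)?", r"AAPL"],
        "Microsoft": [r"microsoft|msft"],
    }
    text = "Apple and MSFT rose; pineapple did not."
    for span in extract_regex_spans(text, entity_dict, ignore_case=True):
        print(span)
```

With ignore_case=True, the sample text yields one span for "Apple" linked to "Apple Inc." and one for "MSFT" linked to "Microsoft". "pineapple" is skipped because the leading \b anchor requires a word boundary before "apple". With the default ignore_case=False, the lowercase alias regexes match nothing in this text.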