Version: 25.2

operators.candidates.extractor.EntityDictRegexSpanFeaturizer

class operators.candidates.extractor.EntityDictRegexSpanFeaturizer(entity_dict_path, field, ignore_case=False, link_entities=True, col_suffix=None)

A SpanFeaturizer that yields (and optionally links) spans matching the entries of an entity-to-aliases dictionary, where the aliases may be regular expressions. For plain keyword matching, the Entity Dict Featurizer is better suited.

This featurizer is used for entity classification tasks. It additionally annotates each span with the linked entity, using the dictionary value. By default, each provided regex is surrounded by word-boundary anchors (\b), so it matches only whole words.
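
The behavior described above can be illustrated with a minimal sketch built only on Python's standard `re` module. This is not the library's implementation. The dictionary layout (entity name mapped to a list of alias regexes), the helper names `compile_patterns` and `find_spans`, and the output keys are all assumptions made for illustration; only the `ignore_case` and `link_entities` names mirror the constructor parameters.

```python
import re

# Hypothetical entity-to-aliases dictionary (layout is an assumption):
# each key is an entity, each value a list of alias regexes.
ENTITY_DICT = {
    "aspirin": ["aspirin", r"acetylsalicylic\s+acid"],
    "ibuprofen": ["ibuprofen", r"advil|motrin"],
}


def compile_patterns(entity_dict, ignore_case=False):
    """Compile every alias, wrapped in \\b word-boundary anchors."""
    flags = re.IGNORECASE if ignore_case else 0
    patterns = []
    for entity, aliases in entity_dict.items():
        for alias in aliases:
            # The non-capturing group keeps alternations such as
            # "advil|motrin" inside both word boundaries.
            patterns.append((entity, re.compile(rf"\b(?:{alias})\b", flags)))
    return patterns


def find_spans(text, patterns, link_entities=True):
    """Return matched spans, optionally annotated with their entity."""
    spans = []
    for entity, pattern in patterns:
        for match in pattern.finditer(text):
            span = {
                "char_start": match.start(),
                "char_end": match.end(),
                "span_text": match.group(0),
            }
            if link_entities:
                span["entity"] = entity
            spans.append(span)
    return sorted(spans, key=lambda s: s["char_start"])


text = "Patient took Advil and acetylsalicylic acid; aspiring writers did not."
for span in find_spans(text, compile_patterns(ENTITY_DICT, ignore_case=True)):
    print(span)
```

In this sketch the word boundaries are what keep "aspiring" from matching the alias "aspirin", and passing `link_entities=False` drops the `entity` key so that only the raw spans are returned.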