Version: 0.96

operators.candidates.extractor.EntityDictSpanExtractor

class operators.candidates.extractor.EntityDictSpanExtractor(entity_dict_path, field, ignore_case=False, link_entities=True, col_suffix=None)

SpanExtractor that yields (and optionally links) spans matching the aliases in an entity-to-aliases dictionary.

This operator is used for entity classification tasks. It additionally annotates each span with the linked entity, using the dictionary value. It is optimized for keyword aliases.

An example of the entity-to-aliases dictionary can be found in Entity Classification Tutorials.
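
As a rough illustration only, an entity-to-aliases dictionary can be thought of as a mapping from each canonical entity to a list of its aliases. The sketch below builds such a mapping and saves it to a file whose path could then be passed as entity_dict_path. The JSON file format, the file name, and the entities shown are assumptions made for this example; the tutorials remain the authoritative reference for the expected format.

```python
import json
import os
import tempfile

# Hypothetical mapping: canonical entity -> list of keyword aliases.
entity_dict = {
    "Apple Inc.": ["Apple", "AAPL"],
    "Microsoft Corporation": ["Microsoft", "MSFT"],
}

# Save the mapping to disk (JSON is an assumption, not a documented requirement).
entity_dict_path = os.path.join(tempfile.mkdtemp(), "entity_dict.json")
with open(entity_dict_path, "w", encoding="utf-8") as f:
    json.dump(entity_dict, f, indent=2)

# Read it back to confirm the file round-trips unchanged.
with open(entity_dict_path, encoding="utf-8") as f:
    loaded = json.load(f)
```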

Parameters:
  • entity_dict_path (str) – The path to the entity-to-aliases dictionary

  • field (str) – The dataframe column to extract spans from

  • ignore_case (bool, default: False) – If true, extraction is case-insensitive

  • link_entities (bool, default: True) – If true, the extracted span will be linked with its original entity/aliases

  • col_suffix (Optional[str], default: None) – An optional suffix for the column containing the extracted spans
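
To make the effect of ignore_case and link_entities concrete, here is a minimal, self-contained sketch of dictionary-based span extraction in plain Python. It is not the operator's implementation: the function name extract_spans, the output keys (char_start, char_end, span_text, entity), and the use of word-boundary regular expressions are all assumptions chosen for illustration.

```python
import re


def extract_spans(text, entity_dict, ignore_case=False, link_entities=True):
    """Hypothetical helper: find alias occurrences in text (illustrative only)."""
    flags = re.IGNORECASE if ignore_case else 0
    spans = []
    for entity, aliases in entity_dict.items():
        for alias in aliases:
            # Match the alias as a whole keyword, escaping regex metacharacters.
            pattern = r"\b" + re.escape(alias) + r"\b"
            for match in re.finditer(pattern, text, flags):
                span = {
                    "char_start": match.start(),
                    "char_end": match.end(),
                    "span_text": match.group(0),
                }
                if link_entities:
                    # Annotate the span with the entity its alias belongs to.
                    span["entity"] = entity
                spans.append(span)
    # Return spans in document order.
    return sorted(spans, key=lambda s: s["char_start"])


entity_dict = {"Apple Inc.": ["Apple", "AAPL"]}
text = "apple shares (AAPL) rose after Apple reported earnings."

case_sensitive = extract_spans(text, entity_dict)
case_insensitive = extract_spans(text, entity_dict, ignore_case=True)
unlinked = extract_spans(text, entity_dict, link_entities=False)
```

In this sketch, the case-sensitive call skips the lowercase "apple" at the start of the sentence, while the call with ignore_case=True picks it up. With link_entities=False, the spans carry only their offsets and text.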