Version: 0.91

operators.candidates.extractor_spacy.TokenSpanFeaturizer

class operators.candidates.extractor_spacy.TokenSpanFeaturizer(field, tokenizer='spacy', **spacy_span_kwargs)

A SpanFeaturizer that yields one span for every token produced by the selected tokenization strategy.

Given a valid tokenization strategy, this operator tokenizes the text in the specified dataframe column and emits a span for each resulting token.

Parameters:
  • field (str) – The dataframe column to apply the tokenization strategy over

  • tokenizer (str, default: 'spacy') – The tokenization strategy to use (one of 'spacy' or 'whitespace')
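
To make the idea of "one span per token" concrete, here is a minimal sketch of what a whitespace strategy could produce. It is not the operator's implementation: the helper names `whitespace_token_spans` and `featurize_rows`, the `(row_index, char_start, char_end, token)` tuple layout, and the use of plain dicts in place of dataframe rows are all assumptions made for illustration. This page does not specify the operator's actual output schema.

```python
import re
from typing import Dict, Iterator, List, Tuple


def whitespace_token_spans(text: str) -> Iterator[Tuple[int, int, str]]:
    """Hypothetical helper: yield (char_start, char_end, token) per token.

    A token here is any maximal run of non-whitespace characters.
    char_end is exclusive, so text[char_start:char_end] == token.
    """
    for match in re.finditer(r"\S+", text):
        yield match.start(), match.end(), match.group()


def featurize_rows(
    rows: List[Dict[str, str]], field: str
) -> List[Tuple[int, int, int, str]]:
    """Hypothetical helper: collect token spans for one field across rows.

    Each result is (row_index, char_start, char_end, token). Plain dicts
    stand in for dataframe rows to keep the sketch dependency-free.
    """
    spans = []
    for row_index, row in enumerate(rows):
        for start, end, token in whitespace_token_spans(row[field]):
            spans.append((row_index, start, end, token))
    return spans


rows = [{"text": "Hello world"}, {"text": "a  b"}]
spans = featurize_rows(rows, field="text")
for span in spans:
    print(span)
```

A 'spacy' strategy would generally split differently (for example, separating punctuation from adjacent words), so the spans for the same text can differ between the two strategies.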