operators.candidates.extractor_spacy.TokenSpanFeaturizer
- class operators.candidates.extractor_spacy.TokenSpanFeaturizer(field, tokenizer='spacy', **spacy_span_kwargs)
A SpanFeaturizer that yields every token as a span, given a selected tokenization strategy.
Given a valid tokenization strategy, this operator tokenizes the input dataframe and produces spans from the resulting tokens.
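To make the token-to-span idea concrete, here is a minimal sketch of what a whitespace tokenization strategy could produce for a single string. The `TokenSpan` record and its `char_start`/`char_end`/`span_text` fields are hypothetical and are not this operator's actual output schema; offsets are character positions with an exclusive end.

```python
import re
from typing import Iterator, NamedTuple


class TokenSpan(NamedTuple):
    # Hypothetical span record: character offsets into the source text (end exclusive)
    char_start: int
    char_end: int
    span_text: str


def whitespace_token_spans(text: str) -> Iterator[TokenSpan]:
    """Yield one span per whitespace-delimited token in `text`."""
    # \S+ matches each maximal run of non-whitespace characters
    for match in re.finditer(r"\S+", text):
        yield TokenSpan(match.start(), match.end(), match.group())


spans = list(whitespace_token_spans("Invoice total: $42.00"))
```

Because every token becomes its own span, slicing the source text with a span's offsets always recovers that token's text.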
- Parameters:
field (str) – The dataframe column to apply the tokenization strategy over.
tokenizer (str, default: 'spacy') – The tokenizer strategy (one of “spacy” or “whitespace”).
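The interplay of `field` and `tokenizer` can be sketched with a standalone helper. This is an illustrative assumption, not the operator's implementation: `featurize_token_spans`, the list-of-dicts stand-in for a dataframe, and the output record layout are all hypothetical, and the `**spacy_span_kwargs` pass-through is not modeled. The "spacy" branch uses spaCy's tokenizer-only `spacy.blank("en")` pipeline, which may differ from whatever pipeline the operator actually loads.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical span format: (char_start, char_end, span_text), end exclusive
Span = Tuple[int, int, str]
Tokenize = Callable[[str], Iterator[Span]]


def _make_whitespace() -> Tokenize:
    def tokenize(text: str) -> Iterator[Span]:
        # Each maximal run of non-whitespace characters is one token
        for m in re.finditer(r"\S+", text):
            yield (m.start(), m.end(), m.group())
    return tokenize


def _make_spacy() -> Tokenize:
    # Imported lazily so the whitespace strategy works without spaCy installed
    import spacy
    nlp = spacy.blank("en")  # tokenizer-only English pipeline, built once per call
    def tokenize(text: str) -> Iterator[Span]:
        for tok in nlp(text):
            # tok.idx is the token's character offset in the original text
            yield (tok.idx, tok.idx + len(tok.text), tok.text)
    return tokenize


TOKENIZER_FACTORIES: Dict[str, Callable[[], Tokenize]] = {
    "spacy": _make_spacy,
    "whitespace": _make_whitespace,
}


def featurize_token_spans(rows: List[dict], field: str, tokenizer: str = "spacy") -> List[dict]:
    """Return one span record per token in row[field] for every row (hypothetical helper)."""
    if tokenizer not in TOKENIZER_FACTORIES:
        raise ValueError(f"tokenizer must be one of {sorted(TOKENIZER_FACTORIES)}, got {tokenizer!r}")
    tokenize = TOKENIZER_FACTORIES[tokenizer]()
    out = []
    for row_idx, row in enumerate(rows):
        for start, end, text in tokenize(row[field]):
            out.append({"row": row_idx, "char_start": start, "char_end": end, "span_text": text})
    return out


rows = [{"text": "Ship to Berlin"}, {"text": "Paid"}]
spans = featurize_token_spans(rows, "text", tokenizer="whitespace")
```

Validating the strategy name up front mirrors the documented restriction to "spacy" or "whitespace", so an unsupported value fails before any row is processed.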