operators.candidates.extractor_spacy.TokenSpanFeaturizer
- class operators.candidates.extractor_spacy.TokenSpanFeaturizer(field, tokenizer='spacy', **spacy_span_kwargs)
A SpanFeaturizer that yields every token as a span, using the selected tokenization strategy.
Given a valid tokenization strategy, this operator will tokenize the input dataframe into spans based on the produced tokens.
Parameters
Name        Type   Default   Info
field       str              The dataframe column to apply the tokenization strategy over.
tokenizer   str    'spacy'   The tokenizer strategy (one of 'spacy' or 'whitespace').
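
To make "tokenizing a column into spans" concrete, here is a minimal, library-independent sketch of what a whitespace strategy could produce. It is not this operator's implementation: the helper name whitespace_token_spans, the (start, end, token) tuple layout, and the use of character offsets are all assumptions made for illustration, and plain dicts stand in for dataframe rows.

```python
import re


def whitespace_token_spans(text):
    """Hypothetical helper: one (char_start, char_end, token) tuple per
    run of non-whitespace characters. The end offset is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


# Stand-in for a dataframe: each dict is one row, "text" plays the role of `field`.
rows = [
    {"text": "Snorkel labels data"},
    {"text": "  two  tokens "},
]
field = "text"

# Apply the tokenization strategy over the selected column, row by row.
spans_per_row = [whitespace_token_spans(row[field]) for row in rows]

for row, spans in zip(rows, spans_per_row):
    for start, end, token in spans:
        # Each span can be mapped back to the original text by slicing.
        assert row[field][start:end] == token
```

Every token becomes exactly one span, and leading, trailing, or repeated whitespace never produces an empty span. A 'spacy' strategy would differ mainly in where token boundaries fall (for example, splitting punctuation from words).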