Version: 25.4

operators.spacy.SpacyPreprocessor

class operators.spacy.SpacyPreprocessor(field, target_field='doc', model='en_core_web_sm', disable=None, **spacy_kwargs)

Preprocessor that parses each document's field with spaCy and adds a JSON doc column.

Used by Sequence Tagging applications to add document metadata.
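The row-level behavior described above can be sketched roughly as follows. This is a minimal illustration, not the operator's implementation. It assumes rows are plain dicts and that the doc column holds a JSON string. The `add_doc_column` and `toy_parse` helpers are hypothetical: `toy_parse` is a whitespace tokenizer that stands in for a real spaCy call such as `nlp(text).to_json()`, so the sketch runs without spaCy installed.

```python
import json
from typing import Any, Callable, Dict, List


def add_doc_column(
    rows: List[Dict[str, Any]],
    field: str,
    parse: Callable[[str], Dict[str, Any]],
    target_field: str = "doc",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: copy each row and store a JSON parse of `field` in `target_field`."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input row is left unchanged
        parsed = parse(row[field])  # with spaCy this might be nlp(text).to_json()
        new_row[target_field] = json.dumps(parsed)  # assumption: column holds a JSON string
        out.append(new_row)
    return out


def toy_parse(text: str) -> Dict[str, Any]:
    """Hypothetical stand-in parser: whitespace tokens with character offsets."""
    tokens = []
    pos = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, pos)  # locate the token after the previous one
        end = start + len(word)
        tokens.append({"id": i, "start": start, "end": end})
        pos = end
    return {"text": text, "tokens": tokens}


rows = [{"id": 1, "text": "Ada wrote code"}]
result = add_doc_column(rows, field="text", parse=toy_parse)
print(result[0]["doc"])
```

Storing the parse as a serialized string keeps the column a simple scalar type. Downstream steps would need to call `json.loads` before using it.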

Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| field | str | | The field to parse with spaCy. |
| target_field | str | 'doc' | The field in which to store the parsed doc object. |
| model | str | 'en_core_web_sm' | The spaCy model to load (only models in strap are supported). |
| disable | Optional[List[str]] | None | Optional list of pipeline components to disable. |
| spacy_kwargs | Dict[str, Any] | | Keyword arguments forwarded to the spacy.load function. |
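One plausible way the model, disable, and spacy_kwargs parameters reach spaCy is sketched below. It is an assumption based on the parameter descriptions, not the operator's code. The `load_pipeline` function and its `loader` argument are hypothetical. `loader` exists only so the sketch can be exercised without spaCy or a downloaded model. When it is omitted, the function calls the real `spacy.load`, which accepts `disable` as a keyword argument and expects a list rather than `None`.

```python
from typing import Any, Callable, List, Optional


def load_pipeline(
    model: str = "en_core_web_sm",
    disable: Optional[List[str]] = None,
    loader: Optional[Callable[..., Any]] = None,  # hypothetical hook for testing
    **spacy_kwargs: Any,
) -> Any:
    """Hypothetical helper: forward the documented parameters to spacy.load."""
    if loader is None:
        import spacy  # imported lazily so a custom loader needs no spaCy install

        loader = spacy.load
    # Map None to an empty list, since spacy.load expects an iterable of names.
    return loader(model, disable=list(disable or []), **spacy_kwargs)


# Fake loader that records its arguments instead of loading a model.
calls = []


def fake_loader(name: str, **kwargs: Any) -> str:
    calls.append((name, kwargs))
    return "pipeline"


nlp = load_pipeline(disable=["ner"], loader=fake_loader, exclude=["lemmatizer"])
print(calls[0])
```

With spaCy and the model installed, calling `load_pipeline(disable=["ner"])` without a `loader` would call `spacy.load("en_core_web_sm", disable=["ner"])`. Any extra keyword arguments, such as `exclude`, are passed through unchanged.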