operators.whitespace.WhitespacePreprocessor
- class operators.whitespace.WhitespacePreprocessor(fields, to_replace=None, output_field_suffix='')
Preprocessor that normalizes whitespace.
This operator finds the different types of whitespace in a given text field and normalizes them to the regular space character (U+0020). By default, the following non-standard space characters are replaced with the regular space: U+00A0, U+2000 to U+200A, U+202F, U+205F, U+3000. See https://en.wikipedia.org/wiki/Whitespace_character for more details on what these Unicode code points mean.
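The replacement described above can be sketched in plain Python. This is an illustrative sketch only, not the operator's actual implementation: the names `DEFAULT_TO_REPLACE` and `normalize_whitespace` are hypothetical, and it assumes that a custom `to_replace` string overrides the default set rather than extending it.

```python
# Default characters listed above: U+00A0, U+2000 to U+200A, U+202F, U+205F, U+3000.
# (Hypothetical constant; the real operator may store these differently.)
DEFAULT_TO_REPLACE = (
    "\u00a0"
    + "".join(chr(cp) for cp in range(0x2000, 0x200B))  # U+2000..U+200A inclusive
    + "\u202f\u205f\u3000"
)


def normalize_whitespace(text, to_replace=None):
    """Replace every character in `to_replace` with a regular space (U+0020).

    Assumption: passing `to_replace` overrides the default set entirely.
    """
    chars = DEFAULT_TO_REPLACE if to_replace is None else to_replace
    # Build a translation table mapping each listed character to U+0020.
    table = str.maketrans({ch: " " for ch in chars})
    return text.translate(table)


# A no-break space (U+00A0) and an ideographic space (U+3000) become regular spaces.
print(normalize_whitespace("a\u00a0b\u3000c"))
```

Note that characters outside the list, such as tabs and newlines, are left untouched in this sketch unless they are passed explicitly via `to_replace`.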
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| fields | List[str] | (required) | The fields to apply whitespace pre-processing to. |
| to_replace | Optional[str] | None | A string containing all characters to be replaced with a regular space (U+0020). |
| output_field_suffix | Optional[str] | '' | To avoid updating in place, optionally specify a suffix to add to the specified fields. |
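To illustrate how the three parameters interact, here is a minimal standalone sketch. It does not call the real class: `preprocess_record` is a hypothetical helper, and treating a record as a plain `dict` is an assumption, since this page does not state what data structure the operator works on.

```python
def preprocess_record(record, fields, to_replace=None, output_field_suffix=""):
    """Hypothetical stand-in mimicking the documented parameters on a dict record."""
    # Documented defaults: U+00A0, U+2000..U+200A, U+202F, U+205F, U+3000.
    default = "\u00a0" + "".join(map(chr, range(0x2000, 0x200B))) + "\u202f\u205f\u3000"
    # Assumption: a custom `to_replace` overrides the defaults.
    chars = default if to_replace is None else to_replace
    table = str.maketrans({ch: " " for ch in chars})

    result = dict(record)  # shallow copy so the caller's record is not mutated
    for field in fields:
        value = record.get(field)
        if isinstance(value, str):
            # An empty suffix overwrites the field; otherwise a new field is written.
            result[field + output_field_suffix] = value.translate(table)
    return result


record = {"title": "a\u00a0b", "body": "x\u2009y", "id": 7}

# Empty suffix: the listed fields are updated in place.
in_place = preprocess_record(record, ["title", "body"])

# Non-empty suffix: the original field is kept and a new "title_clean" field is added.
suffixed = preprocess_record(record, ["title"], output_field_suffix="_clean")
```

With the default empty suffix, the normalized text overwrites each listed field. With a suffix such as `_clean`, the original value is preserved and the normalized text is stored alongside it.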