operators.candidates.extractor.ParagraphSpanExtractor
- class operators.candidates.extractor.ParagraphSpanExtractor(field, col_suffix=None)
Extracts spans (slices of documents) that contain paragraphs, using a regex.
This operator uses a regex pattern to extract every paragraph in the parent document as a span. Trailing newline characters are preserved for each paragraph.
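The behaviour described above can be illustrated with a minimal, standalone sketch. It does not use this library: the pattern, the function name `extract_paragraph_spans`, and the assumption that paragraphs are separated by blank lines are all hypothetical, and the operator's actual regex may differ. It only shows the idea of a span as a `(start, end)` slice of the parent text with trailing newlines kept.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (not taken from the library): a paragraph is a run of
# non-empty lines, followed by any trailing newline characters, which are
# kept as part of the match.
PARAGRAPH_PATTERN = re.compile(r"[^\n]+(?:\n[^\n]+)*\n*")


def extract_paragraph_spans(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, paragraph_text) for each paragraph in `text`.

    Each span is a slice of the parent text, so text[start:end] equals
    paragraph_text, including its trailing newlines.
    """
    return [
        (m.start(), m.end(), m.group(0))
        for m in PARAGRAPH_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    document = "First para.\nStill first.\n\nSecond para.\n"
    for start, end, paragraph in extract_paragraph_spans(document):
        print(start, end, repr(paragraph))
```

Because trailing newlines stay inside each span, concatenating the spans in order reproduces the parent text whenever it does not begin with a newline.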
Parameters
Name | Type | Default | Info
field | str | | The dataframe column to extract paragraph spans from.
col_suffix | Optional[str] | None | An optional suffix for the column containing the extracted spans.