operators.candidates.extractor.ParagraphSpanExtractor
- class operators.candidates.extractor.ParagraphSpanExtractor(field, col_suffix=None)
Extracts spans (slices of documents) that contain paragraphs, using a regex.
This operator uses a regex pattern to extract all paragraphs as spans from the parent document. Trailing newline characters are preserved for each paragraph.
- Parameters:
field (str) – The dataframe column to extract paragraph spans from.
col_suffix (Optional[str], default: None) – An optional suffix for the column containing the extracted spans.
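
The behavior described above (regex-based paragraph extraction that keeps trailing newlines) can be illustrated with a minimal standalone sketch. This is not the operator's actual implementation: the pattern `PARAGRAPH_RE` and the helper `paragraph_spans` are hypothetical names chosen for illustration, and the sketch assumes a paragraph is a run of non-empty lines separated from the next paragraph by one or more blank lines.

```python
import re

# Assumed pattern (not taken from the library): one or more non-empty lines,
# followed by every newline character that trails them.
PARAGRAPH_RE = re.compile(r"[^\n]+(?:\n[^\n]+)*\n*")


def paragraph_spans(text):
    """Return (char_start, char_end) offsets for each paragraph in text.

    Hypothetical helper: trailing newlines are kept inside each span,
    mirroring the behavior described for the operator.
    """
    return [(m.start(), m.end()) for m in PARAGRAPH_RE.finditer(text)]


doc = "First para.\nStill first.\n\nSecond para.\n"
for start, end in paragraph_spans(doc):
    # Each slice ends with the newline characters that followed the paragraph.
    print((start, end), repr(doc[start:end]))
```

Because each span includes its trailing newlines, the spans in this sketch tile the document with no gaps, so concatenating the slices reproduces the original text when it has no leading blank lines.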