Version: 0.94

operators.candidates.extractor.ParagraphSpanExtractor

class operators.candidates.extractor.ParagraphSpanExtractor(field, col_suffix=None)

Extracts paragraphs from documents as spans (slices of the document), using a regex.

This operator uses a regex pattern to extract every paragraph in the parent document as a span. Each paragraph keeps its trailing newline characters.

Parameters

| Name | Type | Default | Info |
| --- | --- | --- | --- |
| field | str | (required) | The dataframe column to extract paragraph spans from. |
| col_suffix | Optional[str] | None | An optional suffix for the column containing the extracted spans. |
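
The behavior described above can be illustrated with a minimal standalone sketch. This is not the operator's implementation: the regex `PARAGRAPH_PATTERN` and the helper `extract_paragraph_spans` are hypothetical names chosen for illustration, and the sketch assumes a paragraph is a run of non-blank lines followed by any trailing newline characters. The pattern the operator actually uses is not documented on this page.

```python
import re

# Assumed pattern (not taken from the library): one or more non-blank lines,
# followed by whatever newline characters trail the paragraph.
PARAGRAPH_PATTERN = re.compile(r"[^\n]+(?:\n[^\n]+)*\n*")


def extract_paragraph_spans(text):
    """Return (char_start, char_end, span_text) for each paragraph in text.

    Hypothetical helper for illustration only. Trailing newlines stay
    inside each span, mirroring the behavior described for the operator.
    """
    return [
        (match.start(), match.end(), match.group())
        for match in PARAGRAPH_PATTERN.finditer(text)
    ]


doc = "First paragraph,\nsecond line.\n\nSecond paragraph.\n"
spans = extract_paragraph_spans(doc)
for start, end, span_text in spans:
    # Each span is a slice of the parent document: doc[start:end] == span_text
    print(start, end, repr(span_text))
```

Because the trailing newlines are kept inside each span, concatenating the spans of a document that does not start with blank lines reproduces the original text, which makes it easy to map a span back to its character offsets in the parent document.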