Version: 0.91

operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor

class operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor(window=1, scope_unit='line', direction='before or after', feature_name_override=None)

Operator to compute structural Rich Doc features for a span.

This operator computes structural Rich Doc features for a span. Available features (each feature name may optionally carry a suffix):

rich_doc_proximate_text: The text in [window] [scope_unit]s [direction] of the span (e.g., the text in [1] [line] [before] the span).
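As a rough illustration of how window, scope_unit, and direction combine, the Python sketch below computes proximate text over a document that has already been split into scope units (for example, a list of lines). The function proximate_text, its sep argument, the direction spellings (taken from the parameter table), and the choice to join both sides for before_or_after are hypothetical assumptions; this is not the operator's actual implementation, which also has to locate the span within rich-document units.

```python
from typing import List

# Hypothetical direction labels mirroring the parameter table;
# the real operator may spell or interpret them differently.
DIRECTIONS = ("before_only", "after_only", "before_or_after")


def proximate_text(
    units: List[str],
    span_first: int,
    span_last: int,
    window: int = 1,
    direction: str = "before_or_after",
    sep: str = " ",
) -> str:
    """Return the text in `window` units before and/or after a span.

    `units` is the document pre-split into scope units (words, lines, ...),
    and the span occupies units[span_first] through units[span_last].
    """
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if window < 0:
        raise ValueError("window must be non-negative")
    pieces: List[str] = []
    if direction in ("before_only", "before_or_after"):
        # Up to `window` units immediately preceding the span,
        # clipped at the start of the document.
        pieces.extend(units[max(0, span_first - window):span_first])
    if direction in ("after_only", "before_or_after"):
        # Up to `window` units immediately following the span;
        # slicing past the end simply yields fewer units.
        pieces.extend(units[span_last + 1:span_last + 1 + window])
    # Assumption: both sides are joined into a single feature string.
    return sep.join(pieces)


lines = ["Invoice", "Total due:", "$120.00", "Thank you"]
print(proximate_text(lines, 2, 2, window=1, direction="before_only"))  # Total due:
print(proximate_text(lines, 2, 2))  # Total due: Thank you
```

Here the span is the line "$120.00"; with a one-line window, before_only yields only the preceding line, while the default direction in this sketch collects the neighboring line on each side.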

Parameters

| Name | Type | Default | Info |
|---|---|---|---|
| window | int | 1 | Number of scope units to extract feature text from. |
| scope_unit | str | 'line' | The unit to use (word / line / par / area / page). |
| direction | str | 'before or after' | The direction (before_only / after_only / before_or_after) to extract feature text from, relative to the span. |
| feature_name_override | Optional[str] | None | If not None, use this as the generated column name (instead of an auto-generated name). |