Version: 0.91

operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor

class operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor(window=1, scope_unit='line', direction='before or after', feature_name_override=None)

Operator that computes structural Rich Doc features for a span.

Available features (a suffix may optionally be appended to the feature name):

  • rich_doc_proximate_text – The text within [window] [scope_unit]s [direction] of the span (e.g., [1] [line] [before] the span).

Parameters:
  • window (int, default: 1) – Number of scope units to extract feature text from.

  • scope_unit (str, default: 'line') – The unit in which window is measured (word / line / par / area / page).

  • direction (str, default: 'before or after') – The direction, relative to the span, from which feature text is extracted (before_only / after_only / before_or_after).

  • feature_name_override (Optional[str], default: None) – If not None, use this as the generated column name (instead of an auto-generated name).
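To make the interaction of window, scope_unit and direction concrete, the following is a minimal standalone sketch of the idea behind rich_doc_proximate_text for scope_unit='line'. It does not call the library and is not its implementation: the function proximate_text, its arguments, and the sample document are hypothetical, and the sketch assumes the underscore spellings of the direction values listed above.

```python
from typing import List

# Direction values as listed in the parameter description above.
VALID_DIRECTIONS = {"before_only", "after_only", "before_or_after"}


def proximate_text(
    lines: List[str],
    span_line: int,
    window: int = 1,
    direction: str = "before_or_after",
) -> str:
    """Hypothetical helper: join the text of up to `window` lines
    before and/or after the line that contains the span."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if window < 0:
        raise ValueError("window must be non-negative")
    if not 0 <= span_line < len(lines):
        raise IndexError("span_line is outside the document")

    parts: List[str] = []
    if direction in ("before_only", "before_or_after"):
        # Lines immediately preceding the span's line, clipped at the start.
        parts.extend(lines[max(0, span_line - window):span_line])
    if direction in ("after_only", "before_or_after"):
        # Lines immediately following the span's line, clipped at the end.
        parts.extend(lines[span_line + 1:span_line + 1 + window])
    return " ".join(parts)


# Sample document, one string per line; the span sits on line index 2.
doc_lines = [
    "Invoice No. 1234",
    "Total due:",
    "$500.00",
    "Payable within 30 days",
]

print(proximate_text(doc_lines, 2, window=1, direction="before_only"))
print(proximate_text(doc_lines, 2, window=1, direction="before_or_after"))
```

With window=1 and direction='before_only', the sketch returns the single line above the span ("Total due:"), which is the kind of nearby label text this feature is meant to expose. The other scope units (word, par, area, page) would follow the same pattern with a different unit of segmentation.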