operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor
- class operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor(window=1, scope_unit='line', direction='before or after', feature_name_override=None)
Operator that computes structural Rich Doc features for a span.
Available features (optionally with a suffix on the feature name):
- rich_doc_proximate_text: The text in [window] [scope_unit]s [direction] of the span (e.g., [1] [line] [before] the span)
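To make the [window] / [scope_unit] / [direction] wording concrete, here is a minimal sketch of the idea for scope_unit='line'. It is not the operator's implementation: the function `proximate_text`, the list-of-lines input, and joining the selected lines with spaces are all assumptions made for illustration, as is treating before_or_after as "lines before plus lines after".

```python
from typing import List


def proximate_text(
    lines: List[str],
    span_line: int,
    window: int = 1,
    direction: str = "before_or_after",
) -> str:
    """Hypothetical helper: text of up to `window` lines around lines[span_line]."""
    if direction not in {"before_only", "after_only", "before_or_after"}:
        raise ValueError(f"unknown direction: {direction!r}")

    # Up to `window` lines immediately before the span's line (clamped at the top).
    before = lines[max(0, span_line - window):span_line]
    # Up to `window` lines immediately after the span's line.
    after = lines[span_line + 1:span_line + 1 + window]

    if direction == "before_only":
        picked = before
    elif direction == "after_only":
        picked = after
    else:
        # Assumption: "before or after" collects both sides.
        picked = before + after
    return " ".join(picked)


page = ["Invoice No. 1042", "Total due:", "$310.00", "Pay within 30 days"]
# The span "$310.00" sits on line index 2; look one line before it.
print(proximate_text(page, span_line=2, window=1, direction="before_only"))
# prints "Total due:"
```

With window=1 and direction before_only, the feature text for the amount is the label on the line above it, which is the kind of structural context the feature is described as capturing.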
Parameters
| Name | Type | Default | Info |
| --- | --- | --- | --- |
| window | int | 1 | Number of scope units to extract feature text from. |
| scope_unit | str | 'line' | The unit to use (word / line / par / area / page). |
| direction | str | 'before or after' | The direction (before_only / after_only / before_or_after) to extract feature text from, relative to the span. |
| feature_name_override | Optional[str] | None | If not None, use this as the generated column name (instead of an auto-generated name). |