operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor
- class operators.candidates.rich_doc_features.RichDocSpanStructuralPreprocessor(window=1, scope_unit='line', direction='before or after', feature_name_override=None)
Operator to compute structural Rich Doc features for a span.
Available features (optionally with a suffix on the feature name):
- rich_doc_proximate_text: The text in [window] [scope_unit]s [direction] of the span (e.g., [1] [line] [before] the span).
- Parameters:
  - window (int, default: 1) – Number of scope units to extract feature text from.
  - scope_unit (str, default: 'line') – The unit to use (word / line / par / area / page).
  - direction (str, default: 'before or after') – The direction (before_only / after_only / before_or_after) to extract feature text from, relative to the span.
  - feature_name_override (Optional[str], default: None) – If not None, use this as the generated column name (instead of an auto-generated name).
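
To make the interaction of window and direction concrete, here is a minimal, self-contained sketch of what "proximate text" could look like when scope_unit is 'line'. The function `proximate_text` and its arguments `lines` and `span_line` are hypothetical names chosen for illustration; this is not the operator's actual implementation, and the other scope units (word / par / area / page) are not modeled.

```python
from typing import List

# Direction values as listed in the parameter description above.
VALID_DIRECTIONS = {"before_only", "after_only", "before_or_after"}


def proximate_text(
    lines: List[str],
    span_line: int,
    window: int = 1,
    direction: str = "before_or_after",
) -> str:
    """Hypothetical helper: text of `window` lines around the span's line.

    `lines` holds the document's lines in reading order and `span_line`
    is the index of the line containing the span. The span's own line is
    excluded from the result.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if not 0 <= span_line < len(lines):
        raise IndexError("span_line is outside the document")

    # Up to `window` lines immediately before and after the span's line,
    # clipped at the document boundaries.
    before = lines[max(0, span_line - window):span_line]
    after = lines[span_line + 1:span_line + 1 + window]

    if direction == "before_only":
        picked = before
    elif direction == "after_only":
        picked = after
    else:  # before_or_after: assumed here to take both sides
        picked = before + after
    return " ".join(picked)


example_lines = [
    "Invoice No. 42",
    "Total due:",
    "$1,250.00",
    "Payable within 30 days",
]
# The span "$1,250.00" sits on line index 2.
label_context = proximate_text(example_lines, 2, window=1, direction="before_only")
# label_context == "Total due:"
```

In this sketch, a span such as an amount picks up the label on the preceding line as its feature text, which is the kind of signal the rich_doc_proximate_text feature is described as capturing. Treating before_or_after as "both sides" is an assumption of the sketch.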