operators.pdf.text_cluster.TextClusterSpanFeaturizer
- class operators.pdf.text_cluster.TextClusterSpanFeaturizer
Featurizer that creates a list of spans, with one span per horizontal text cluster.
This operator treats every text cluster created by TextClusterer as an individual span.
Prerequisite: The DAG must contain TextClusterer before TextClusterSpanFeaturizer.
This operator uses existing prepopulated RichDoc columns (no input required): {RichDocCols.TEXT_CLUSTERS: None, RichDocCols.TEXT_COL: str}
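The "one span per text cluster" behavior described above can be illustrated with a minimal, self-contained sketch. It does not use this operator's actual API or the RichDoc column layout: `TextCluster`, `Span`, and `clusters_to_spans` are hypothetical names, and representing a cluster as a character range over the document text is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextCluster:
    """Hypothetical stand-in for a text cluster: a character range in the text."""
    char_start: int
    char_end: int  # exclusive


@dataclass
class Span:
    """Hypothetical span record: character offsets plus the covered text."""
    char_start: int
    char_end: int
    text: str


def clusters_to_spans(text: str, clusters: List[TextCluster]) -> List[Span]:
    """Emit exactly one span per cluster, ordered by start offset."""
    ordered = sorted(clusters, key=lambda c: c.char_start)
    return [
        Span(c.char_start, c.char_end, text[c.char_start:c.char_end])
        for c in ordered
    ]


# Two clusters over a two-line document, given out of order on purpose.
doc_text = "Invoice 42\nTotal: $10"
clusters = [TextCluster(11, 21), TextCluster(0, 10)]
spans = clusters_to_spans(doc_text, clusters)

for span in spans:
    print(span.char_start, span.char_end, repr(span.text))
```

Each cluster maps to exactly one span and no clusters are merged or split, which is the property the description above states. In a real DAG the clusters would come from the upstream TextClusterer step instead of being constructed by hand.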