Version: 0.94

operators.pdf.text_cluster.TextClusterSpanFeaturizer

class operators.pdf.text_cluster.TextClusterSpanFeaturizer

Featurizer that creates a list of spans, with one span per horizontal text cluster.

This operator treats each text cluster created by TextClusterer as an individual span.

Prerequisite: The DAG must contain TextClusterer before TextClusterSpanFeaturizer.

This operator uses existing prepopulated RichDoc columns (no input required), including {RichDocCols.TEXT_CLUSTERS: None, RichDocCols.TEXT_COL: str}.
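As a rough illustration of the "one span per text cluster" idea, the sketch below maps clusters to spans in plain Python. It is not the operator's implementation and does not use the library's API: the TextCluster class, its char_start/char_end fields, and the clusters_to_spans helper are all hypothetical names chosen for this example, and the assumption that a cluster can be described by character offsets into the document text is ours, not something this page states.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextCluster:
    # Hypothetical stand-in for one horizontal text cluster, described by
    # half-open character offsets [char_start, char_end) into the text.
    char_start: int
    char_end: int


def clusters_to_spans(
    text: str, clusters: List[TextCluster]
) -> List[Tuple[int, int, str]]:
    """Return one (char_start, char_end, span_text) tuple per cluster."""
    spans = []
    for cluster in clusters:
        span_text = text[cluster.char_start:cluster.char_end]
        spans.append((cluster.char_start, cluster.char_end, span_text))
    return spans


# Assumed example data: a text column value and two clusters over it.
text = "Invoice 1234 Total: $50"
clusters = [TextCluster(0, 12), TextCluster(13, 23)]

spans = clusters_to_spans(text, clusters)
for span in spans:
    print(span)
```

The point of the sketch is the one-to-one mapping: the number of spans always equals the number of clusters, which is why the clustering step has to run earlier in the DAG.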