operators.pdf.table.TableFeaturizer
- class operators.pdf.table.TableFeaturizer(field='rich_doc_pdf_url', model='microsoft/table-transformer-structure-recognition', pages_field=None)
A featurizer that detects tables in PDF documents.
- Parameters:
field (str, default: 'rich_doc_pdf_url') – The name of the column containing the PDF URL paths.
model (str, default: 'microsoft/table-transformer-structure-recognition') – The pretrained Table Transformer model to use for table detection.
pages_field (Optional[str], default: None) – The name of the column containing the page numbers to run the operator on. If None, the operator runs on all pages.
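The `pages_field` semantics can be illustrated with a small, self-contained sketch. `pages_to_process` is a hypothetical helper, not part of `operators.pdf.table`. It assumes each row behaves like a dict, that page numbers are 0-based, and that out-of-range or duplicate page numbers are skipped. The source does not say how the operator handles those cases.

```python
from typing import Any, Dict, List, Optional


def pages_to_process(
    row: Dict[str, Any],
    num_pages: int,
    pages_field: Optional[str] = None,
) -> List[int]:
    """Hypothetical helper: pick which pages of one PDF to run table detection on.

    Assumptions (not confirmed by the TableFeaturizer docs):
      * page numbers are 0-based;
      * out-of-range and duplicate page numbers are silently skipped.
    """
    # Documented behavior: with no pages_field, every page is processed.
    if pages_field is None:
        return list(range(num_pages))

    # Otherwise only the pages listed in that column are processed.
    requested = row.get(pages_field) or []
    seen = set()
    selected = []
    for page in requested:
        # Assumed: drop invalid indices and repeats, keep the original order.
        if 0 <= page < num_pages and page not in seen:
            seen.add(page)
            selected.append(page)
    return selected


row = {"rich_doc_pdf_url": "report.pdf", "pages": [0, 2]}
print(pages_to_process(row, num_pages=5))                      # all five pages
print(pages_to_process(row, num_pages=5, pages_field="pages"))  # pages 0 and 2 only
```

Passing `pages_field=None` corresponds to the constructor default, so a featurizer built without that argument would scan every page of each PDF.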
- Returns:
A Table object containing the detected tables' metadata.
- Return type:
{RichDocCols.TABLES}