Version: 0.95

operators.pdf.table.TableFeaturizer

class operators.pdf.table.TableFeaturizer(field='rich_doc_pdf_url', model='microsoft/table-transformer-structure-recognition', pages_field=None)

A featurizer that detects tables in PDF documents using a pretrained Table Transformer model.

Parameters

field (str, default 'rich_doc_pdf_url'): The name of the column containing the PDF URL paths.

model (str, default 'microsoft/table-transformer-structure-recognition'): The pretrained Table Transformer model to use for table detection.

pages_field (Optional[str], default None): The name of the column containing the page numbers on which to run the operator. If None, the operator runs on all pages.
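The page selection controlled by pages_field can be sketched as follows. This is a hypothetical helper (select_pages is not part of the library), and it assumes that pages are numbered from 1, that the pages column holds a list of integers, that out-of-range or duplicate entries are ignored, and that a missing value selects no pages. The only behavior taken from the reference above is that pages_field=None means every page is processed.

```python
from typing import Any, List, Mapping, Optional


def select_pages(
    row: Mapping[str, Any],
    num_pages: int,
    pages_field: Optional[str] = None,
) -> List[int]:
    """Hypothetical sketch: pick the pages to run table detection on for one row.

    Assumes 1-based page numbers and a list of ints in the pages column.
    """
    if pages_field is None:
        # Documented behavior: no pages column means run on all pages.
        return list(range(1, num_pages + 1))

    # Assumption: a missing or None value in the column selects no pages.
    requested = row.get(pages_field) or []

    # Assumption: drop out-of-range and duplicate page numbers, keeping order.
    seen = set()
    selected: List[int] = []
    for page in requested:
        if 1 <= page <= num_pages and page not in seen:
            seen.add(page)
            selected.append(page)
    return selected
```

For example, with a three-page document, select_pages({"pages": [2, 5]}, 3, "pages") returns [2], while select_pages({}, 3) returns [1, 2, 3].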

Returns

A Table object containing the metadata of the detected tables.

Return type

{RichDocCols.TABLES}