operators.pdf.table.TableFeaturizer
- class operators.pdf.table.TableFeaturizer(field='rich_doc_pdf_url', model='microsoft/table-transformer-structure-recognition', pages_field=None)
A featurizer that detects tables in PDF documents.
Parameters
| Name | Type | Default | Info |
|---|---|---|---|
| field | str | 'rich_doc_pdf_url' | The name of the column containing the PDF URL paths. |
| model | str | 'microsoft/table-transformer-structure-recognition' | The pretrained Table Transformer model to use for table detection. |
| pages_field | Optional[str] | None | The name of the column containing the page numbers on which to run the operator. If None, the operator runs on all pages. |
Returns
A Table object containing the table metadata.
Return type
{RichDocCols.TABLES}
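Example
A minimal sketch of how the three constructor arguments fit together, assuming only the signature documented above. The helper `table_featurizer_kwargs` and the column name `table_pages` are hypothetical and not part of the library. The final call is left commented out because the import path and installation are not covered here.

```python
from typing import Any, Dict, Optional

# Defaults copied from the documented class signature.
DEFAULT_FIELD = "rich_doc_pdf_url"
DEFAULT_MODEL = "microsoft/table-transformer-structure-recognition"


def table_featurizer_kwargs(
    field: str = DEFAULT_FIELD,
    model: str = DEFAULT_MODEL,
    pages_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect and sanity-check constructor arguments."""
    if not field:
        raise ValueError("field must name the column holding the PDF URL paths")
    if not model:
        raise ValueError("model must name a pretrained Table Transformer model")
    if pages_field is not None and not pages_field:
        raise ValueError("pages_field must be None or a non-empty column name")
    return {"field": field, "model": model, "pages_field": pages_field}


# Run on all pages: pages_field stays None.
all_pages = table_featurizer_kwargs()

# Restrict to pages listed in a column; "table_pages" is a made-up column name.
some_pages = table_featurizer_kwargs(pages_field="table_pages")

# With the library available, the arguments would be passed straight through
# (assumed usage, based only on the documented signature):
# featurizer = TableFeaturizer(**some_pages)
```

Leaving `pages_field` as None keeps the documented default of processing every page; naming a column limits the operator to the pages listed there.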