Open Access

Table Recognition and Understanding from PDF Files

Author(s) - Tamir Hassan, Robert Baumgartner

Publication year - 2007

Publication title - Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)

Language(s) - English

DOI - 10.1109/icdar.2007.241

We propose a flexible method for detecting and understanding tables in PDF files, which is not reliant upon one particular feature being present, for example ruling lines or indentations, and is therefore applicable to a wide variety of visual presentations. We describe the steps required in transforming the low-level PDF instructions into text segments, lines and boxes on a page. We propose three different classifications for published tables, and develop methods to detect these tables and correctly identify their respective rows and columns. We also explain how to recognize spanning rows and columns, and multi-line rows. Experimental results show that our algorithm is effective in converting a wide variety of tabular presentations into HTML for information extraction purposes.
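As a rough illustration of the general idea of turning positioned text segments into rows, columns and finally HTML, the following minimal sketch may help. It is not the authors' algorithm: the `Segment` input format, the function names and the `tol` and `min_gap` thresholds are all assumptions made for this example, and it handles neither spanning cells nor multi-line rows, both of which the paper addresses.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Segment:
    """A text segment with its bounding box (hypothetical input format)."""
    text: str
    x0: float
    y0: float  # top edge; y grows downward in this sketch
    x1: float
    y1: float


def group_rows(segments, tol=2.0):
    """Group segments whose top edge lies within `tol` of the row's first segment."""
    rows = []
    for seg in sorted(segments, key=lambda s: (s.y0, s.x0)):
        if rows and abs(seg.y0 - rows[-1][0].y0) <= tol:
            rows[-1].append(seg)
        else:
            rows.append([seg])
    return rows


def column_bands(segments, min_gap=5.0):
    """Merge horizontal extents; a gap wider than `min_gap` separates two columns."""
    bands = []
    for x0, x1 in sorted((s.x0, s.x1) for s in segments):
        if bands and x0 - bands[-1][1] <= min_gap:
            bands[-1][1] = max(bands[-1][1], x1)
        else:
            bands.append([x0, x1])
    return bands


def to_html(segments):
    """Render the detected grid as an HTML table, leaving unmatched cells empty."""
    bands = column_bands(segments)
    lines = ["<table>"]
    for row in group_rows(segments):
        cells = [""] * len(bands)
        for seg in sorted(row, key=lambda s: s.x0):
            # Assign the segment to the band that contains its left edge.
            col = next(i for i, (b0, b1) in enumerate(bands) if b0 <= seg.x0 <= b1)
            cells[col] = (cells[col] + " " + seg.text).strip()
        lines.append("  <tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

Calling `to_html` on a list of `Segment` objects returns the table markup as a string. Grouping by whitespace gaps rather than ruling lines reflects the abstract's point that a method should not depend on one particular visual feature, although a real implementation would need considerably more robust heuristics.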
