Open Access
Automatic building of a large and complete dataset for image-based table structure recognition
Author(s) - Trần Quang Vinh, Nguyen Thi Ngọc Diep
Publication year - 2021
Publication title - Tạp chí Khoa học Đại học Quốc gia Hà Nội: Công nghệ Thông tin - Truyền thông (VNU Journal of Science: Computer Science and Communication Engineering)
Language(s) - English
Resource type - Journals
eISSN - 2615-9260
pISSN - 2588-1086
DOI - 10.25073/2588-1086/vnucsce.293
Subject(s) - table (database) , annotation , computer science , minimum bounding box , bounding overwatch , artificial intelligence , image (mathematics) , correctness , sequence (biology) , pattern recognition (psychology) , representation (politics) , automatic image annotation , information retrieval , data mining , image processing , algorithm , genetics , biology , politics , political science , law
Tables are one of the most common ways to represent structured data in documents. Existing research on image-based table structure recognition often relies on limited datasets, the largest of which, the ICDAR 19 Track B dataset, contains 3,789 human-labeled tables. The recent TableBank dataset for table structures contains 145K tables; however, the tables are labeled in an HTML tag sequence format, which impedes the development of image-based recognition methods. In this paper, we propose several processing methods that automatically convert an HTML tag sequence annotation into bounding box annotations for the table cells in a table image. By ensembling these methods, we converted 42,028 tables with high correctness, a set about 11 times larger than the largest existing dataset (ICDAR 19). We then demonstrate that with these bounding box annotations, a straightforward representation of objects in images, we can achieve much higher F1-scores for table structure recognition at many high IoU thresholds using only off-the-shelf deep learning models: an F1-score of 0.66 compared with the state-of-the-art 0.44 on the ICDAR 19 dataset. A further experiment shows that using explicit bounding box annotation for image-based table structure recognition yields higher accuracy (70.6%) than implicit text sequence annotation (only 33.8%). The experiments demonstrate the effectiveness of our dataset, the largest to date, and open up opportunities to generalize to real-world applications. Our dataset and experimental models are publicly available at shorturl.at/hwHY3
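To make the notion of an F1-score at an IoU threshold concrete, the following is a minimal sketch of a box-level evaluation for table cells. It is an illustration only and not the evaluation protocol used in the paper or in the ICDAR 19 competition: the function names `iou` and `f1_at_iou`, the `(x_min, y_min, x_max, y_max)` box layout, and the greedy one-to-one matching are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(pred: List[Box], gt: List[Box], threshold: float) -> float:
    """F1-score of predicted cell boxes against ground truth.

    Hypothetical simplification: each prediction is greedily matched to the
    unmatched ground-truth box with the highest IoU, and the pair counts as
    a true positive only if that IoU reaches the threshold.
    """
    matched = set()
    true_positives = 0
    for p in pred:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt):
            if j in matched:
                continue
            value = iou(p, g)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gt)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two ground-truth cells; three predictions, one slightly off, one spurious.
    gt_cells = [(0, 0, 10, 10), (10, 0, 20, 10)]
    pred_cells = [(0, 0, 10, 10), (11, 0, 20, 10), (30, 30, 40, 40)]
    for t in (0.6, 0.8, 0.95):
        print(t, f1_at_iou(pred_cells, gt_cells, t))
```

Raising the threshold makes the match criterion stricter, so a slightly misaligned cell that counts as correct at a low threshold becomes an error at a high one. This is why scores reported at high IoU thresholds are the more demanding comparison.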
