<title>Document structure analysis algorithms: a literature survey</title>
Author(s) -
Song Mao,
Azriel Rosenfeld,
Tapas Kanungo
Publication year - 2003
Publication title -
Proceedings of SPIE, the International Society for Optical Engineering / Proceedings of SPIE
Language(s) - English
Resource type - Conference proceedings
SCImago Journal Rank - 0.192
H-Index - 176
eISSN - 1996-756X
pISSN - 0277-786X
DOI - 10.1117/12.476326
Subject(s) - computer science , information retrieval , document structure description , tree structure , data structure , document layout analysis , algorithm , theoretical computer science , data mining , natural language processing , artificial intelligence , xml , programming language , world wide web , image (mathematics)
Document structure analysis can be regarded as a syntactic analysis problem. The order and containment relations among the physical or logical components of a document page can be described by an ordered tree structure and can be modeled by a tree grammar, which describes the page at the component level in terms of regions or blocks. This paper provides a detailed survey of past work on document structure analysis algorithms and summarizes the limitations of past approaches. In particular, we survey past work on document physical layout representations and algorithms, document logical structure representations and algorithms, and performance evaluation of document structure analysis algorithms. In the last section, we summarize this work and point out its limitations.
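As a rough illustration of the ordered tree idea in the abstract, the sketch below models a page as nested rectangular regions: containment is the parent-child relation, and order is the left-to-right order of siblings. It is not taken from the paper. The `Region` class, the `reading_order` helper, the `(left, top, right, bottom)` bounding-box convention, and the sample coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical convention: (left, top, right, bottom) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class Region:
    """One node of an ordered tree describing a page (illustrative only)."""
    label: str
    bbox: BBox
    children: List["Region"] = field(default_factory=list)

    def add(self, child: "Region") -> "Region":
        """Append a child, enforcing the containment relation."""
        left, top, right, bottom = self.bbox
        c_left, c_top, c_right, c_bottom = child.bbox
        inside = (left <= c_left and top <= c_top
                  and c_right <= right and c_bottom <= bottom)
        if not inside:
            raise ValueError(f"{child.label} is not contained in {self.label}")
        # Sibling order in the list encodes the order relation.
        self.children.append(child)
        return child


def reading_order(node: Region) -> List[str]:
    """Return leaf labels in pre-order, i.e. the order implied by the tree."""
    if not node.children:
        return [node.label]
    labels: List[str] = []
    for child in node.children:
        labels.extend(reading_order(child))
    return labels


# A made-up two-column page.
page = Region("page", (0, 0, 600, 800))
page.add(Region("title", (50, 40, 550, 90)))
body = page.add(Region("body", (50, 120, 550, 760)))
body.add(Region("column-1", (50, 120, 290, 760)))
body.add(Region("column-2", (310, 120, 550, 760)))

print(reading_order(page))
```

A tree grammar in the sense of the abstract would additionally constrain which labels may appear as children of which parents; this sketch only captures the tree itself.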