A hybrid intelligence approach to artifact recognition in digital publishing
Author(s) -
J. Fernando Vega-Riveros,
Hector J. Santos Villalobos
Publication year - 2006
Publication title -
Proceedings of SPIE, the International Society for Optical Engineering / Proceedings of SPIE
Language(s) - English
Resource type - Conference proceedings
SCImago Journal Rank - 0.192
H-Index - 176
eISSN - 1996-756X
pISSN - 0277-786X
DOI - 10.1117/12.646240
Subject(s) - computer science, artifact (error), segmentation, artificial intelligence, document layout analysis, set (abstract data type), information retrieval, data mining, pattern recognition (psychology), image (mathematics), programming language
The system presented integrates rule-based and case-based reasoning for artifact recognition in Digital Publishing. In Variable Data Printing (VDP), human proofing can be prohibitively expensive, since a job may contain millions of distinct instances. These instances may exhibit two types of artifacts: 1) evident defects, such as text overflow or overlapping, and 2) style-dependent artifacts, subtle defects that appear as inconsistencies with the original job design. We designed a Knowledge-Based Artifact Recognition tool for document segmentation, layout understanding, artifact detection, and document design quality assessment. Document evaluation is constrained by comparing the remaining instances against one instance of the VDP job that has been proofed by a human expert. Fundamental rules of document design are used in the rule-based component for document segmentation and layout understanding. Ambiguities in the design principles not covered by the rule-based system are analyzed by case-based reasoning using the nearest neighbor algorithm, where features from previous jobs are used to detect artifacts and inconsistencies within the document layout. We used a subset of XSL-FO and assembled a set of 44 document samples. The system detected all the job layout changes and obtained an overall average accuracy of 84.56%, with the highest accuracy, 92.82%, for overlapping and the lowest, 66.7%, for lack of white space.
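The two-stage idea in the abstract (explicit rules first, nearest-neighbor lookup over stored cases for whatever the rules leave undecided) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Box` type, the two-number feature vector (white-space ratio, deviation from the proofed reference), the case base, and the label names are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned layout region (hypothetical representation)."""
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """Rule-based check: True when the two boxes share positive area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def nearest_label(features, cases):
    """1-nearest-neighbor: label of the stored case closest in Euclidean distance."""
    _, label = min(cases, key=lambda case: math.dist(features, case[0]))
    return label


def check_instance(boxes, features, cases):
    """Apply the explicit overlap rule first, then fall back to the case base."""
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                return "overlap"
    return nearest_label(features, cases)


# Hypothetical case base: (white-space ratio, deviation from reference) -> label.
CASES = [
    ((0.40, 0.00), "ok"),
    ((0.10, 0.00), "lack-of-white-space"),
    ((0.40, 0.30), "layout-change"),
]

if __name__ == "__main__":
    print(check_instance([Box(0, 0, 10, 10), Box(5, 5, 10, 10)], (0.40, 0.00), CASES))
    print(check_instance([Box(0, 0, 10, 10), Box(20, 0, 10, 10)], (0.12, 0.02), CASES))
```

In this sketch the rule stage handles an evident defect (overlapping regions) deterministically, while the case base stands in for the style-dependent judgments that the abstract assigns to case-based reasoning.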