Detecting Mathematical Expressions in Scientific Document Images Using a U-Net Trained on a Diverse Dataset

Wataru Ohyama, Masakazu Suzuki, Seiichi Uchida

Open Access

Publication year - 2019

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2019.2945825

Subject(s) - computer science, optical character recognition, artificial intelligence, pipeline (software), benchmarking, pattern recognition, segmentation, machine learning, natural language processing

A method is proposed for detecting mathematical expressions in scientific document images. Inspired by the promising performance of U-Net, a convolutional network architecture originally proposed for the semantic segmentation of biomedical images, the proposed method performs image conversion with a U-Net. Because the method does not use any information from mathematical or linguistic grammar, it can serve as a supplementary bypass in the conventional mathematical optical character recognition (OCR) pipeline. The evaluation experiments confirmed that (1) the proposed method outperforms InftyReader, state-of-the-art software for mathematical OCR, in detecting both mathematical symbols and mathematical expressions; (2) it is important that the training dataset cover the variation in document styles; and (3) retraining with a small number of additional training samples is effective in improving performance. An additional contribution is the release of a dataset for benchmarking OCR of scientific documents.
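The abstract describes the detector only as "image conversion" by a U-Net. As a minimal sketch, assuming the network outputs a per-pixel probability that each pixel belongs to a mathematical expression, the snippet below shows one plausible way to turn such a map into symbol-level decisions: split the binarized page into 8-connected components (a rough stand-in for symbols) and mark each component as math by majority vote over its pixels. The function names, the 0.5 threshold, and the majority-vote rule are illustrative assumptions, not the authors' published procedure, and the U-Net itself is not shown.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]          # 0/1 binarized page image (1 = ink)
ProbMap = List[List[float]]     # assumed U-Net output: P(pixel is math)
BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def connected_components(binary: Grid) -> List[List[Tuple[int, int]]]:
    """Return the 8-connected foreground components as lists of (y, x) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first flood fill from this unvisited ink pixel.
                seen[y][x] = True
                queue = deque([(y, x)])
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def classify_symbols(binary: Grid, math_prob: ProbMap,
                     threshold: float = 0.5) -> List[Tuple[BBox, bool]]:
    """Hypothetical post-processing: label each component as math (True) when
    more than half of its pixels have probability >= threshold."""
    results = []
    for pixels in connected_components(binary):
        votes = sum(1 for (y, x) in pixels if math_prob[y][x] >= threshold)
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        bbox = (min(xs), min(ys), max(xs), max(ys))
        results.append((bbox, votes * 2 > len(pixels)))
    return results


if __name__ == "__main__":
    page = [[1, 1, 0, 0, 0],
            [1, 1, 0, 0, 1],
            [0, 0, 0, 0, 1]]
    prob = [[0.9, 0.8, 0.0, 0.0, 0.0],
            [0.7, 0.2, 0.0, 0.0, 0.1],
            [0.0, 0.0, 0.0, 0.0, 0.3]]
    print(classify_symbols(page, prob))
    # → [((0, 0, 1, 1), True), ((4, 1, 4, 2), False)]
```

Grouping the resulting math symbols into whole expressions would need a further step (for example, merging nearby math components), which is left out here because the abstract does not describe it.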
