2D Positional Embedding-based Transformer for Scene Text Recognition
Author(s) - Zobeir Raisi, Mohamed A. Naiel, Paul Fieguth, Steven Wardell, John Zelek
Publication year - 2021
Publication title - Journal of Computational Vision and Imaging Systems
Language(s) - English
Resource type - Journals
ISSN - 2562-0444
DOI - 10.15353/jcvis.v6i1.3533
Subject(s) - computer science , transformer , artificial intelligence , encoder , embedding , leverage (statistics) , architecture , pattern recognition (psychology) , text recognition , computer vision , image (mathematics) , engineering , geography , electrical engineering , archaeology , voltage , operating system
Recent state-of-the-art scene text recognition methods are primarily based on Recurrent Neural Networks (RNNs). However, these methods require one-dimensional (1D) features and are not designed for recognizing irregular-text instances, due to the loss of spatial information present in the original two-dimensional (2D) images. In this paper, we leverage a Transformer-based architecture to recognize both regular and irregular text in images captured in the wild. The proposed method combines a 2D positional encoder with the Transformer architecture to preserve the spatial information of 2D image features better than previous methods. Experiments on popular benchmarks, including the challenging COCO-Text dataset, demonstrate that the proposed scene text recognition method outperforms the state of the art in most cases, especially on irregular-text recognition.
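The abstract does not give the formula for the 2D positional encoder, so the following is only a minimal sketch of one common construction, assumed here for illustration: half of each embedding vector carries a sinusoidal encoding of the row index and the other half a sinusoidal encoding of the column index. The function names (`sinusoid_1d`, `positional_encoding_2d`), the base of 10000, and the half-and-half split are assumptions and may differ from the authors' method.

```python
import math


def sinusoid_1d(position, dim):
    """Sinusoidal encoding of a single integer position into `dim` values.

    Assumes `dim` is even; sine and cosine terms are interleaved.
    """
    vec = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec


def positional_encoding_2d(height, width, d_model):
    """Build a height x width grid of d_model-dimensional position vectors.

    Assumption for this sketch: the first d_model/2 values encode the row
    index and the last d_model/2 values encode the column index.
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return [
        [sinusoid_1d(y, half) + sinusoid_1d(x, half) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical feature-map size: 4 rows, 8 columns, 16 channels.
pe = positional_encoding_2d(4, 8, 16)
print(len(pe), len(pe[0]), len(pe[0][0]))
```

In a model of this kind, such a grid would typically be added element-wise to the 2D image feature map before the Transformer encoder, so that each location keeps a distinct row and column signature instead of a single 1D index.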