2D Positional Embedding-based Transformer for Scene Text Recognition
Author(s) -
Zobeir Raisi,
Mohamed A. Naiel,
Paul Fieguth,
Steven Wardell,
John Zelek
Publication year - 2021
Publication title -
Journal of Computational Vision and Imaging Systems
Language(s) - English
Resource type - Journals
ISSN - 2562-0444
DOI - 10.15353/jcvis.v6i1.3533
Subject(s) - computer science , transformer , artificial intelligence , encoder , embedding , leverage (statistics) , architecture , pattern recognition (psychology) , text recognition , computer vision , image (mathematics) , engineering , geography , electrical engineering , archaeology , voltage , operating system
Recent state-of-the-art scene text recognition methods are primarily based on Recurrent Neural Networks (RNNs); however, these methods require one-dimensional (1D) features and are not designed for recognizing irregular-text instances, due to the loss of the spatial information present in the original two-dimensional (2D) images. In this paper, we leverage a Transformer-based architecture for recognizing both regular and irregular text-in-the-wild images. The proposed method uses a 2D positional encoder with the Transformer architecture to better preserve the spatial information of 2D image features than previous methods. Experiments on popular benchmarks, including the challenging COCO-Text dataset, demonstrate that the proposed scene text recognition method outperforms the state-of-the-art in most cases, especially on irregular-text recognition.
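To illustrate the idea of a 2D positional encoding, the sketch below shows one common way to build a fixed sinusoidal encoding over both the height and width of a CNN feature map before flattening it into a token sequence for a Transformer encoder. This is a minimal, hypothetical implementation for illustration only; the exact formulation, feature dimensions, and backbone used in the paper may differ.

```python
# Minimal sketch of a 2D sinusoidal positional encoding (hypothetical;
# not necessarily the authors' exact formulation).
import torch

def positional_encoding_2d(d_model: int, height: int, width: int) -> torch.Tensor:
    """Return a (d_model, height, width) tensor of fixed 2D sinusoidal encodings.

    Half of the channels encode the horizontal (column) position and the
    other half encode the vertical (row) position, so the spatial layout
    survives when the feature map is later flattened for the Transformer.
    """
    assert d_model % 4 == 0, "d_model must be divisible by 4 for a 2D encoding"
    pe = torch.zeros(d_model, height, width)
    d_half = d_model // 2

    # Frequencies shared by the sine/cosine pairs, as in the standard
    # 1D Transformer positional encoding.
    div_term = torch.exp(
        torch.arange(0, d_half, 2).float()
        * (-torch.log(torch.tensor(10000.0)) / d_half)
    )

    pos_w = torch.arange(width).float().unsqueeze(1)   # (W, 1)
    pos_h = torch.arange(height).float().unsqueeze(1)  # (H, 1)

    # First half of the channels: encode the column index (x position).
    pe[0:d_half:2, :, :] = torch.sin(pos_w * div_term).t().unsqueeze(1).repeat(1, height, 1)
    pe[1:d_half:2, :, :] = torch.cos(pos_w * div_term).t().unsqueeze(1).repeat(1, height, 1)

    # Second half of the channels: encode the row index (y position).
    pe[d_half::2, :, :] = torch.sin(pos_h * div_term).t().unsqueeze(2).repeat(1, 1, width)
    pe[d_half + 1::2, :, :] = torch.cos(pos_h * div_term).t().unsqueeze(2).repeat(1, 1, width)
    return pe

# Usage: add the encoding to a CNN feature map, then flatten it into a
# sequence of H*W tokens for a standard Transformer encoder.
feats = torch.randn(8, 256, 8, 32)                   # (batch, channels, H, W)
feats = feats + positional_encoding_2d(256, 8, 32)   # broadcast over the batch
tokens = feats.flatten(2).permute(0, 2, 1)           # (batch, H*W, channels)
```

In this sketch the encoding is added before flattening, so each token carries both its row and column position rather than only its 1D index in the flattened sequence, which is the property the abstract attributes to the 2D positional encoder.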
