Transformer-based OCR System for Handwritten Signature Verification in Lawmaking Petitions: A Proof of Concept
Author(s) -
Elena Sanchez-Nielsen,
Ismael Martin-Herrera
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3620686
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Handwritten signature verification (HSV) is essential for validating lawmaking petitions, in which each signatory provides a handwritten signature and a personal identifier (ID). Manual verification is slow and error-prone because of the large volume of petitions and the variability of handwriting. This paper presents a methodology for automated HSV: an end-to-end transformer-based optical character recognition (OCR) pipeline built on TrOCR, a state-of-the-art text recognition model. Whereas TrOCR operates on cropped text segments, the proposed methodology extends it to integrated, full-document processing. The pipeline performs document structure detection, ID recognition, handwritten signature detection, and ID–signature pairing within a unified framework. As part of the methodology, we developed a synthetic handwriting generation method to create training data; it addresses data scarcity and protects privacy because no real personal information is used. The system was developed in collaboration with the Parliament of the Canary Islands to ensure institutional alignment and practical applicability. The evaluation examined the effects of post-processing, model fine-tuning, and dataset scale. The pipeline achieved 98.9% character recognition and 90.5% complete ID recognition on synthetic data, and 84.3% character recognition and 80.0% complete ID recognition on real petition images. In comparison, a convolutional recurrent neural network (CRNN) baseline achieved 81.2% character recognition and 12.6% complete ID recognition on synthetic data, and 66.2% and 7.4%, respectively, on real petitions. Post-processing improved recognition for models that had not been fine-tuned, whereas fine-tuning on larger datasets was crucial for high accuracy and generalization. In conclusion, this methodology demonstrates that extending TrOCR into a full-document, transformer-based pipeline provides a scalable, privacy-preserving solution for HSV in legislative processes that outperforms a conventional CRNN baseline and reduces the manual verification workload.
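The abstract reports two figures, character recognition and complete ID recognition, without defining them in this record. The Python sketch below shows one common way such figures are computed. It assumes, for illustration only, that character recognition is 1 minus the character error rate (CER: Levenshtein edits divided by the number of reference characters, summed over the dataset) and that complete ID recognition is the exact-match rate over IDs. The function names and the example ID strings are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the metric definitions below are assumptions, not the paper's code.
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def _check(refs: Sequence[str], preds: Sequence[str]) -> None:
    if len(refs) != len(preds) or not refs:
        raise ValueError("refs and preds must be non-empty and of equal length")


def character_recognition_rate(refs: Sequence[str], preds: Sequence[str]) -> float:
    """Assumed definition: 1 - CER, with edits and reference characters summed over the set."""
    _check(refs, preds)
    edits = sum(levenshtein(r, p) for r, p in zip(refs, preds))
    total_chars = sum(len(r) for r in refs)
    return max(0.0, 1.0 - edits / total_chars)  # clip: insertions can push CER above 1


def complete_id_rate(refs: Sequence[str], preds: Sequence[str]) -> float:
    """Assumed definition: share of IDs whose prediction matches the reference exactly."""
    _check(refs, preds)
    return sum(r == p for r, p in zip(refs, preds)) / len(refs)


# Made-up ground-truth and predicted IDs (not real personal data).
refs = ["12345678Z", "87654321X"]
preds = ["12345678Z", "87654821X"]  # one substituted digit in the second ID
print(round(character_recognition_rate(refs, preds), 3))  # 0.944
print(complete_id_rate(refs, preds))                      # 0.5
```

Under these assumed definitions, one wrong character costs little on the character metric but fails the entire ID, which would be consistent with complete ID figures falling far below character figures, as in the CRNN baseline's 12.6% versus 81.2% on synthetic data.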