
GeoViT: Mixed-Scale Transformer for Perspective Correction in Print-Cam Image Watermarking
Author(s) - Said Boujerfaoui, Anass Mancour-Billah, Hassan Douzi, Rachid Harba
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Journal
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3575472
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
Printed identity documents, such as ID cards and passports, continue to play a vital role in identity verification despite the growing adoption of digital authentication methods. The print-cam process, which involves printing a watermarked image and capturing it with a smartphone camera, offers a practical way to authenticate identity documents on mobile devices. However, this process introduces perspective distortions, compression artifacts, noise, and lighting variations, which make accurate watermark detection difficult. Existing distortion-correction techniques often fail to address these issues fully, especially in practical scenarios involving handheld capture under less controlled conditions. In this study, we propose GeoViT, a Transformer-based framework that enhances watermark robustness against print-cam attacks. GeoViT uses multi-head attention to capture global dependencies and spatial variations, improving feature extraction for distortion rectification. To overcome the limited ability of the standard Transformer feed-forward network to capture multi-scale information, we introduce a mixed-scale feed-forward network that generates robust features for geometric alignment. We also incorporate a mixture of expert feature compensators that integrates local context from CNN-based operators to refine the distortion correction. Our method significantly outperforms existing approaches in geometric accuracy, visual fidelity, and perceptual quality. Extensive experiments on a diverse set of ID images, captured under various conditions with different smartphone models, demonstrate that GeoViT substantially improves watermark robustness. These results highlight GeoViT's effectiveness as a secure and efficient solution for mobile-based identity document authentication and advance the development of watermarking techniques for real-time, smartphone-compatible systems.
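The abstract does not state whether GeoViT predicts a homography, a dense displacement field, or some other geometric representation. For context only, the classical baseline for correcting the perspective of a flat document photographed at an angle is a homography estimated from the four detected document corners. The sketch below is a minimal pure-Python version of that baseline (the direct linear transform with the bottom-right entry of H fixed to 1), assuming the corners have already been located. The function names, corner coordinates, and the 856 × 540 px target frame are hypothetical illustrations, not the authors' method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix so that row operations update a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration (e.g. three collinear corners)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Hypothetical helper: DLT homography from exactly four correspondences, with h33 = 1."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("need exactly four point correspondences")
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6*x + h7*y + 1) = h0*x + h1*y + h2, rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        # v * (h6*x + h7*y + 1) = h3*x + h4*y + h5, rearranged the same way.
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hmat: Matrix, p: Point) -> Point:
    """Map a point through H using homogeneous coordinates, then dehomogenize."""
    x, y = p
    u = hmat[0][0] * x + hmat[0][1] * y + hmat[0][2]
    v = hmat[1][0] * x + hmat[1][1] * y + hmat[1][2]
    w = hmat[2][0] * x + hmat[2][1] * y + hmat[2][2]
    return (u / w, v / w)


# Hypothetical corners of an ID card detected in a tilted smartphone photo (pixels),
# ordered top-left, top-right, bottom-right, bottom-left.
captured = [(112.0, 95.0), (903.0, 140.0), (870.0, 652.0), (80.0, 590.0)]
# Hypothetical canonical frame, close to the ISO/IEC 7810 ID-1 aspect ratio (85.60 x 53.98 mm).
canonical = [(0.0, 0.0), (856.0, 0.0), (856.0, 540.0), (0.0, 540.0)]
H = homography_from_4_points(captured, canonical)
print(apply_homography(H, captured[2]))  # values very close to (856.0, 540.0)
```

In a full pipeline, every pixel of the rectified frame would be filled by mapping it back through the inverse of H and sampling the captured image. A learned approach such as the one described in the abstract presumably aims to handle cases where corner detection or a single planar model is insufficient, but that is an inference, not a claim from the source.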