Open Access
ViT-Core: Lightweight Anomaly Detection Model using Transformer-based Feature Extractor
Author(s) - Byeong-Uk Jeon, Dong-Joon Suh, Joo-Chang Kim, Kyungyong Chung
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3618462
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In modern smart manufacturing, real-time anomaly detection is critical, yet state-of-the-art (SOTA) models like PatchCore are often too computationally demanding for practical deployment on resource-constrained edge devices. This creates a crucial gap between algorithmic potential and industrial application. To bridge this gap, we propose ViT-Core, a lightweight and efficient model for anomaly detection. Our primary innovation is to replace the conventional CNN backbone with an efficient Swin Transformer, which reduces feature map dimensionality and, consequently, memory usage. To maintain high accuracy, we employ a Cut-Paste-based transfer learning stage, a self-supervised process that fine-tunes the model to the target data distribution without requiring complex training or additional labels. Evaluated on the MVTec AD benchmark, ViT-Core substantially reduces computational overhead, decreasing memory usage by 49.2% and inference time by 49.5% compared to the PatchCore baseline. This optimization is achieved with a statistically insignificant difference in image-level classification performance (0.9859 AUROC vs. 0.9865). Moreover, ViT-Core improves anomaly localization, raising the pixel-level AUROC to 0.9817 from PatchCore's 0.9756. Consequently, ViT-Core offers a favorable balance of accuracy and efficiency, providing a practical and scalable solution that enables the deployment of high-performance, real-time quality inspection systems directly on existing industrial hardware.
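The link the abstract draws between feature dimensionality and memory usage can be illustrated with a small sketch of PatchCore-style scoring, in which each test patch is scored by its distance to the nearest feature stored in a memory bank of normal patches. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the bank size, the feature dimension of 64, and the random features are all hypothetical, and details such as backbone feature extraction and coreset subsampling are omitted.

```python
import numpy as np


def memory_bank_bytes(num_patches: int, feature_dim: int, dtype=np.float32) -> int:
    """Storage needed for a dense memory bank of shape (num_patches, feature_dim)."""
    return num_patches * feature_dim * np.dtype(dtype).itemsize


def patch_scores(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Distance from each query patch feature to its nearest memory-bank feature."""
    # Squared Euclidean distances via ||q||^2 + ||b||^2 - 2 q.b
    q2 = (query ** 2).sum(axis=1, keepdims=True)
    b2 = (bank ** 2).sum(axis=1)
    d2 = np.maximum(q2 + b2 - 2.0 * query @ bank.T, 0.0)  # clip rounding noise
    return np.sqrt(d2.min(axis=1))


def image_score(query: np.ndarray, bank: np.ndarray) -> float:
    """Image-level anomaly score: the largest patch-level score."""
    return float(patch_scores(query, bank).max())


# Hypothetical data: random vectors stand in for backbone features.
rng = np.random.default_rng(0)
bank = rng.normal(size=(1000, 64)).astype(np.float32)

# A "normal" image: patches that lie close to stored features.
normal_query = bank[:16] + 0.01 * rng.normal(size=(16, 64)).astype(np.float32)

# An "anomalous" image: one patch is pushed far from the bank.
anomalous_query = normal_query.copy()
anomalous_query[3] += 5.0

print(memory_bank_bytes(1000, 64), memory_bank_bytes(1000, 32))
print(image_score(normal_query, bank) < image_score(anomalous_query, bank))
```

For a fixed number of stored patches, the bank's size scales linearly with the feature dimension, so a backbone that emits lower-dimensional features shrinks the memory footprint proportionally. The per-patch scores double as a coarse localization map, while their maximum gives the image-level score.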
