
Minimizing model size of CNN-based Vehicle Make Recognition for Frontal Vehicle Images
Author(s) -
Wiput Puisamlee,
Rathachai Chawuthai
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3574187
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Vehicle Make and Model Recognition (VMMR) is a common component of Intelligent Transport Systems (ITS), image-based free-flow toll systems, and enforcement systems. These systems need to analyze and process images of the front of vehicles as evidence of use. Convolutional Neural Networks (CNNs) are a well-established technique for image classification and have been applied to problems in the VMMR domain. Increasing classification accuracy over a large number of classes requires more complex model structures and a greater number of internal parameters, which results in larger models and potentially longer processing times. This work aims to study and develop a smaller CNN model, suitable for devices with limited resources such as embedded computers and embedded computer cameras, for recognizing vehicle makes from frontal images. The experimental datasets were collected from actual free-flow toll systems, and a CNN model was developed that achieved 99% accuracy in recognizing vehicle makes. The developed model is smaller than the state-of-the-art CNN models tested (VGG16, InceptionV3, Yolo11m-cls, and ResNet50) and achieves over 90% accuracy. The resulting CTv1 model achieves an F1 score approximately 2.06% higher than that of the best of these models, InceptionV3, while reducing the number of parameters by 69.95%. The model was tested on a Raspberry Pi 3 Model B, where it processed images at an average speed of 1 second per image with a power consumption of 25 milliwatt-hours (mWh). Our study also reduces the CNN model size using Depthwise Separable Convolution and 1x1 Convolution Dimension Reduction (Bottleneck) methods, as well as adjusting the padding and stride of the convolutional layers, and evaluates the effect on accuracy, training time, processing time, and model size for vehicle make recognition.
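
To illustrate why the size-reduction techniques named in the abstract shrink a model, the following minimal sketch counts the weights in a single convolutional layer under each scheme and shows how padding and stride change the output size. The layer dimensions (3x3 kernel, 128 input channels, 256 output channels, 32 bottleneck channels, 224-pixel input) are illustrative assumptions and are not taken from the CTv1 architecture, which this record does not describe. The function names are hypothetical, and bias terms are omitted for simplicity.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def bottleneck_params(k, c_in, c_mid, c_out):
    """1x1 reduction to c_mid channels, followed by a k x k convolution."""
    return conv_params(1, c_in, c_mid) + conv_params(k, c_mid, c_out)


def conv_output_size(size, k, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - k) // stride + 1


if __name__ == "__main__":
    # Assumed, illustrative layer dimensions (not from the paper).
    k, c_in, c_mid, c_out = 3, 128, 32, 256

    standard = conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    bottleneck = bottleneck_params(k, c_in, c_mid, c_out)

    print("standard 3x3:        ", standard)    # 294912
    print("depthwise separable: ", separable)   # 33920
    print("1x1 bottleneck + 3x3:", bottleneck)  # 77824

    # A larger stride shrinks the feature map, which reduces later computation.
    print(conv_output_size(224, k, stride=1, padding=0))  # 222
    print(conv_output_size(224, k, stride=2, padding=1))  # 112
```

Under these assumed dimensions, the depthwise separable layer needs roughly one ninth of the weights of the standard layer, and the bottleneck variant roughly one quarter. The actual savings in any real model depend on its layer widths and on where the techniques are applied.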