
Lightweight Spatial Sliced-Concatenate-Multireceptive-Field Enhance and Joint Channel Attention Mechanism for Infrared Object Detection
Author(s) - Zhiheng Pan, Liuchao Xu, Chuandong Liang, Kui Pan, Mi Zhao, Min Lu
Publication year - 2022
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2022.3172504
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Infrared object detection has high application value in remote sensing because of its anti-interference ability and long detection distance. However, infrared images suffer from poor fine-grained information, low resolution, and low contrast, so conventional object detection methods perform rather poorly on them. This study proposes two novel lightweight attention mechanisms to address the problem. The sliced-concatenate and multi-receptive-field spatial group-wise enhance (SCMR-SGE) module uses grouped feature operations to enhance sub-features by generating an attention factor at each location in each semantic group, which also suppresses irrelevant information. The joint channel attention module selectively enhances or inhibits channel information through attention factors generated by three different pooling layers. Unlike previous work, each module is used only once, and the two modules are embedded into the feature pyramid network (FPN) instead of the backbone network. Based on YOLOv5m alone, our method reached an mAP50 of 82.7%, the best result on the original FLIR dataset, that is, the dataset without any processing for the imbalanced sample problem. At the same time, the detection speed is still maintained at around 60 FPS on a single GPU. Our experiments demonstrate that our lightweight attention mechanisms perform better than mainstream ones, and that the method of embedding our attention mechanisms into the CNN is effective and universal.
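The group-wise enhancement idea mentioned in the abstract (one attention factor per location in each semantic group) can be illustrated with a minimal NumPy sketch. This is not the authors' SCMR-SGE module: it shows only a plain spatial group-wise enhance step and omits the sliced-concatenate and multi-receptive-field parts, whose details the abstract does not give. The function name `group_enhance`, the scalar `gamma` and `beta` parameters, and the choice of a spatial mean as the group descriptor are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def group_enhance(x, groups, gamma=1.0, beta=0.0, eps=1e-5):
    """Illustrative spatial group-wise enhancement (hypothetical helper).

    x: feature map of shape (C, H, W); C must be divisible by `groups`.
    Returns an array of the same shape, rescaled per location and group.
    """
    c, h, w = x.shape
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")

    # Split channels into groups and flatten the spatial dimensions.
    g = x.reshape(groups, c // groups, h * w)          # (G, C/G, HW)

    # Assumed group descriptor: spatial mean of every channel in the group.
    desc = g.mean(axis=2, keepdims=True)               # (G, C/G, 1)

    # Similarity of each location's feature vector to its group descriptor.
    sim = (g * desc).sum(axis=1)                       # (G, HW)

    # Normalize the similarities over spatial positions within each group.
    sim = (sim - sim.mean(axis=1, keepdims=True)) / (
        sim.std(axis=1, keepdims=True) + eps
    )

    # One attention factor in (0, 1) per location and group.
    attn = sigmoid(gamma * sim + beta)                 # (G, HW)

    # Rescale every channel of a group by that group's attention map.
    return (g * attn[:, None, :]).reshape(c, h, w)
```

Because every attention factor lies strictly between 0 and 1, locations that resemble their group descriptor are attenuated less than dissimilar ones, which is the sense in which irrelevant information is suppressed. In a trained network `gamma` and `beta` would be learned, typically per group, rather than fixed scalars.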