Pillars-SCANet: A 3D Object Detection Algorithm Integrating Multi-Head Spatial and Channel Attention with Feature Pyramid

Hao Jiang; Ge Peng; Xin Wang; He Huang; Junxing Yang

Open Access

Author(s) - Hao Jiang, Ge Peng, Xin Wang, He Huang, Junxing Yang

Publication year - 2025

Publication title - IEEE Access

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 0.587

H-Index - 127

eISSN - 2169-3536

DOI - 10.1109/access.2025.3596703

Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation

Point cloud-based 3D object detection has attracted growing attention because point clouds capture the three-dimensional environment precisely, which is essential for autonomous driving. However, prevailing detection methods adapt poorly to varying target scales, which leads to inadequate detection across different target types. Furthermore, voxel-based methods, which are commonly adopted to speed up detection, convert point clouds into voxels or pillars. This transformation often neglects the disparity between the horizontal and vertical receptive fields when pillars are used to generate 2D pseudo-images. To mitigate these limitations, this study introduces Pillars-SCANet, a model equipped with a novel multi-scale feature extraction network and an adaptive attention feature fusion network. The former employs grouped residual attention modules that balance the horizontal and vertical receptive fields within the pillar encoding module, and it progresses through four stages to construct a multi-level feature pyramid. The latter improves the model’s adaptability to targets of various sizes by guiding features along both the channel and spatial dimensions. Extensive experiments indicate that Pillars-SCANet achieves a favorable balance between inference speed and detection accuracy. Its module design results in a parameter count of only 6.63M and an inference speed of 24 FPS. Evaluated on the KITTI dataset, Pillars-SCANet attains mean average precision (mAP) of 69.85%, 62.27%, and 71.68% on the BEV, 3D detection box, and AOS benchmarks, respectively, which are improvements of 3.66%, 3.07%, and 2.82% over the PointPillars network. On the Waymo dataset, Pillars-SCANet achieves an mAP of 72.3% and a mean average precision weighted by heading (mAPH) of 69.6%, which are improvements of 9.5% and 11.8%, respectively, over the PointPillars network.
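
The two ideas the abstract relies on, turning a point cloud into a 2D pseudo-image via pillars and then re-weighting features along the channel and spatial dimensions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`pillarize`, `channel_attention`, `spatial_attention`), the grid ranges, the three hand-picked channels, and the random weights are all assumptions made for illustration. The gating follows a generic channel-then-spatial attention pattern rather than the paper's multi-head or grouped residual modules, and a real detector would use a learned pillar encoder and learned convolutions.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 8.0), y_range=(0.0, 8.0), pillar_size=1.0):
    """Scatter (N, 3) points into a (3, H, W) pseudo-image.

    Channels are hand-picked for illustration: point count, mean z, max z.
    A learned pillar encoder would produce these features instead.
    """
    width = int(round((x_range[1] - x_range[0]) / pillar_size))
    height = int(round((y_range[1] - y_range[0]) / pillar_size))
    col = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(int)
    row = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(int)
    # Drop points that fall outside the grid.
    keep = (col >= 0) & (col < width) & (row >= 0) & (row < height)
    col, row, z = col[keep], row[keep], points[keep, 2]

    count = np.zeros((height, width))
    z_sum = np.zeros((height, width))
    z_max = np.full((height, width), -np.inf)
    # Unbuffered accumulation so repeated (row, col) indices are all applied.
    np.add.at(count, (row, col), 1.0)
    np.add.at(z_sum, (row, col), z)
    np.maximum.at(z_max, (row, col), z)

    occupied = count > 0
    z_mean = np.divide(z_sum, count, out=np.zeros_like(z_sum), where=occupied)
    z_max = np.where(occupied, z_max, 0.0)  # empty pillars become 0
    return np.stack([count, z_mean, z_max])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Gate each channel with a two-layer MLP over globally pooled features."""
    pooled = feat.mean(axis=(1, 2))        # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)  # ReLU, (C_hidden,)
    gate = sigmoid(w2 @ hidden)            # (C,), each value in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat):
    """Gate each cell from channel-wise mean and max (no learned conv here)."""
    gate = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # (H, W)
    return feat * gate[None, :, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform([0.0, 0.0, -1.0], [8.0, 8.0, 1.0], size=(200, 3))
    image = pillarize(points)
    w1 = rng.normal(size=(2, 3))  # hypothetical weights, untrained
    w2 = rng.normal(size=(3, 2))
    refined = spatial_attention(channel_attention(image, w1, w2))
    print(image.shape, refined.shape)
```

Because both gates lie strictly between 0 and 1, the sketch can only attenuate features; it shows where channel and spatial weighting act on the pseudo-image, not how the paper's fusion network learns those weights.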
