Open Access
Classification of short segment pediatric heart sounds based on a transformer-based convolutional neural network
Author(s) -
Md Hassanuzzaman,
Samit Kumar Ghosh,
Mohammad Nurul Akhtar Hasan,
Mohammad Abdullah Al Mamun,
Khawza I Ahmed,
Raqibul Mostafa,
Ahsan H Khandoker
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573870
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Congenital heart diseases (CHDs), caused by structural abnormalities in the heart and blood vessels, are a major public health concern and add substantially to the socioeconomic burden, particularly in pediatric populations. Phonocardiograms (PCGs), a non-invasive and cost-effective diagnostic modality, capture acoustic signals that reflect the mechanical activity of the heart and can reveal pathological patterns associated with various CHD types. This study investigates the minimum signal duration required for accurate automatic classification of heart sounds and evaluates signal quality using the root mean square of successive differences (RMSSD) and the zero-crossing rate (ZCR). Mel-frequency cepstral coefficients (MFCCs) are extracted as features and fed into a transformer-based residual one-dimensional convolutional neural network (1D-CNN) for classification. Experimental results show that a threshold of 0.4 for RMSSD and ZCR yields optimal classification performance, and that a minimum signal length of 5 seconds is required for reliable results. Shorter segments (3 seconds) lack sufficient diagnostic information, while longer segments (15 seconds) may introduce additional noise. The proposed model achieves a maximum classification accuracy of 93.69% with 5-second signals.
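The two signal-quality measures named in the abstract can be illustrated with a minimal sketch. It uses the textbook definitions of RMSSD and ZCR on fixed-length segments; it is not the authors' implementation. The function names, the 1000 Hz sampling rate, the synthetic sine input, and the non-overlapping segmentation are all assumptions made for illustration. The abstract does not say how the signal is normalized or in which direction the 0.4 threshold is applied, so the sketch only reports the two values per segment.

```python
import math


def split_segments(signal, fs, seconds):
    """Split a signal into non-overlapping segments of `seconds` length.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    """
    n = int(fs * seconds)
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def rmssd(x):
    """Root mean square of successive differences between adjacent samples."""
    squared_diffs = [(b - a) ** 2 for a, b in zip(x, x[1:])]
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


# Hypothetical input: a 12-second synthetic tone standing in for a PCG recording.
fs = 1000  # assumed sampling rate in Hz, not taken from the paper
signal = [math.sin(2 * math.pi * 50 * t / fs) for t in range(12 * fs)]

# 5-second segments, the duration the abstract reports as sufficient.
segments = split_segments(signal, fs, 5)
for index, segment in enumerate(segments):
    print(index, rmssd(segment), zero_crossing_rate(segment))
```

With real recordings, each segment's RMSSD and ZCR would be compared against the chosen threshold before MFCC extraction. The exact comparison rule would have to come from the full paper.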
