StructMamba: Structured Harmonic and Temporal Music Analysis via Dual-Axis Mamba and Attention
Author(s) - Amit Kumar Bairwa, Siddhanth Bhat, Tanishk Sawant, Manoj Kumar Bohra
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3617184
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
Modeling musical audio requires capturing hierarchical relationships between harmonic textures, rhythmic motifs, and long-range structural repetitions. Convolutional networks extract local features efficiently, while transformers provide global modeling, yet both face mismatches with musical structure. In this work we introduce StructMamba, a dual-axis architecture that unifies state-space modeling with global two-dimensional attention. Our design decomposes spectrogram modeling into frequency-wise and time-wise Mamba modules, enabling independent learning of harmonic and rhythmic dependencies before fusing them through structured attention. Evaluated on benchmark tasks in genre classification, onset detection, and structural segmentation, StructMamba outperforms strong CNN, transformer, and hybrid baselines, while maintaining stability in low-resource settings. Beyond accuracy, its internal representations align with music-theoretic constructs such as motifs, downbeats, and sectional boundaries, offering rare interpretability for deep audio models. These findings position StructMamba as an efficient and musically aligned solution for time–frequency audio modeling, with practical implications for music education, annotation, and production.
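The dual-axis idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. A plain diagonal linear state-space recurrence stands in for each Mamba module, and parameter-free single-head self-attention over all time-frequency positions stands in for the two-dimensional attention fusion. The function names (`ssm_scan`, `dual_axis_block`, `attention_fuse`), the tensor layout (frequency, time, channels), and the choice to sum the two branches before attention are all assumptions made for illustration.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Diagonal linear state-space recurrence along the first axis.

    x has shape (L, N, D): L steps, N independent sequences, D channels.
    a, b, c have shape (D,). The recurrence is
        h_t = a * h_{t-1} + b * x_t,    y_t = c * h_t.
    Assumption: a non-selective stand-in for a Mamba block.
    """
    h = np.zeros(x.shape[1:])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def dual_axis_block(spec, params_freq, params_time):
    """Scan a (F, T, D) spectrogram embedding along each axis separately."""
    # Frequency-wise branch: one scan over frequency bins per time frame.
    freq_out = ssm_scan(spec, *params_freq)
    # Time-wise branch: one scan over time frames per frequency bin.
    time_out = ssm_scan(spec.transpose(1, 0, 2), *params_time)
    return freq_out, time_out.transpose(1, 0, 2)


def attention_fuse(freq_out, time_out):
    """Fuse both branches with self-attention over all (f, t) positions.

    Assumption: the branches are summed, and attention uses no learned
    query/key/value projections.
    """
    n_freq, n_time, dim = freq_out.shape
    tokens = (freq_out + time_out).reshape(n_freq * n_time, dim)
    scores = tokens @ tokens.T / np.sqrt(dim)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ tokens).reshape(n_freq, n_time, dim)


# Toy input: 8 frequency bins, 6 time frames, 4 channels.
rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 6, 4))
params_freq = (np.full(4, 0.9), np.ones(4), np.ones(4))
params_time = (np.full(4, 0.5), np.ones(4), np.ones(4))

freq_out, time_out = dual_axis_block(spec, params_freq, params_time)
fused = attention_fuse(freq_out, time_out)
print(fused.shape)  # (8, 6, 4)
```

In this sketch the two scans never exchange information, so the frequency branch only sees dependencies within a frame and the time branch only sees dependencies within a bin. The attention step is the only place where the two axes interact, which mirrors the "independent learning, then fusion" ordering described above.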