
Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum
Author(s) -
Yuanming Zhang,
Jing Lu,
Fei Chen,
Haoliang Du,
Xia Gao,
Zhibin Lin
Publication year - 2025
Publication title -
ieee transactions on neural systems and rehabilitation engineering
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 1.093
H-Index - 140
eISSN - 1558-0210
pISSN - 1534-4320
DOI - 10.1109/tnsre.2025.3591819
Subject(s) - bioengineering , computing and processing , robotics and control systems , signal processing and analysis , communication, networking and broadcast technologies
Prior research on directional focus decoding, a.k.a. selective Auditory Attention Decoding (sAAD), has primarily focused on binary “left-right” tasks. However, decoding of the attended speaker’s precise direction is desired. Existing approaches often underutilize spatial audio information, resulting in suboptimal performance. In this paper, we address this limitation by leveraging a recent dataset containing two concurrent speakers at two of 14 possible directions. We demonstrate that models relying solely on EEG yield limited decoding accuracy in leave-one-out settings. To enhance performance, we propose to integrate spatial spectra as an additional input. We evaluate three model architectures, namely CNN, LSM-CNN, and Deformer, under two strategies for utilizing spatial information: all-in-one (end-to-end) and pairwise (two-stage) decoding. While all-in-one decoders directly take dual-modal inputs and output the attended direction, pairwise decoders first leverage spatial spectra to decode the competing pairs, and then a specific model is used to decode the attended direction. Our proposed all-in-one Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, using 1second decision windows (chance level: 50%, indicating random guessing). Meanwhile, the pairwise Sp-EEG-Deformer decoder achieves a 14-class decoding accuracy of 63.62% (10 s). Our experiments reveal that spatial spectra are particularly effective at reducing the 14-class problem into a binary one. On the other hand, EEG features are more discriminative and play a crucial role in precisely identifying the final attended direction within this reduced 2-class set. These results highlight the effectiveness of our proposed dual-modal directional decoding strategies.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom