Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

Yuanming Zhang; Jing Lu; Fei Chen; Haoliang Du; Xia Gao; Zhibin Lin

Open Access

Author(s) - Yuanming Zhang, Jing Lu, Fei Chen, Haoliang Du, Xia Gao, Zhibin Lin

Publication year - 2025

Publication title - IEEE Transactions on Neural Systems and Rehabilitation Engineering

Language(s) - English

Resource type - Magazines

SCImago Journal Rank - 1.093

H-Index - 140

eISSN - 1558-0210

pISSN - 1534-4320

DOI - 10.1109/tnsre.2025.3591819

Subject(s) - bioengineering, computing and processing, robotics and control systems, signal processing and analysis, communication, networking and broadcast technologies

Prior research on directional focus decoding, also known as selective Auditory Attention Decoding (sAAD), has primarily focused on binary “left-right” tasks. However, decoding of the attended speaker’s precise direction is desired. Existing approaches often underutilize spatial audio information, resulting in suboptimal performance. In this paper, we address this limitation by leveraging a recent dataset containing two concurrent speakers at two of 14 possible directions. We demonstrate that models relying solely on EEG yield limited decoding accuracy in leave-one-out settings. To enhance performance, we propose to integrate spatial spectra as an additional input. We evaluate three model architectures, namely CNN, LSM-CNN, and Deformer, under two strategies for utilizing spatial information: all-in-one (end-to-end) and pairwise (two-stage) decoding. While all-in-one decoders directly take dual-modal inputs and output the attended direction, pairwise decoders first leverage spatial spectra to decode the competing pair of directions, and then a specific model is used to decode the attended direction. Our proposed all-in-one Sp-EEG-Deformer model achieves 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, using 1-second decision windows (chance level: 50%, indicating random guessing). Meanwhile, the pairwise Sp-EEG-Deformer decoder achieves a 14-class decoding accuracy of 63.62% (10 s). Our experiments reveal that spatial spectra are particularly effective at reducing the 14-class problem to a binary one. On the other hand, EEG features are more discriminative and play a crucial role in precisely identifying the final attended direction within this reduced 2-class set. These results highlight the effectiveness of our proposed dual-modal directional decoding strategies.
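The pairwise (two-stage) strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper both stages are learned models, whereas here stage 1 is replaced by simple peak picking on a 14-bin spatial spectrum and stage 2 by a comparison of hypothetical per-direction EEG scores. The function names `candidate_pair` and `pairwise_decode`, and the input layout, are assumptions made for illustration only.

```python
from typing import Sequence, Tuple

N_DIRECTIONS = 14  # number of candidate directions stated in the abstract


def candidate_pair(spatial_spectrum: Sequence[float]) -> Tuple[int, int]:
    """Stage 1 (stand-in): pick the two strongest spectrum bins.

    Assumption: the two concurrent speakers show up as the two largest
    values of the spatial spectrum. The paper uses a learned decoder here.
    """
    if len(spatial_spectrum) != N_DIRECTIONS:
        raise ValueError(f"expected {N_DIRECTIONS} bins, got {len(spatial_spectrum)}")
    ranked = sorted(range(N_DIRECTIONS),
                    key=lambda d: spatial_spectrum[d],
                    reverse=True)
    first, second = ranked[0], ranked[1]
    return (min(first, second), max(first, second))


def pairwise_decode(spatial_spectrum: Sequence[float],
                    eeg_scores: Sequence[float]) -> int:
    """Stage 2 (stand-in): choose between the two candidates using EEG.

    Assumption: `eeg_scores[d]` is a hypothetical EEG-derived score for
    direction d. Scores for directions outside the candidate pair are
    ignored, which is what turns the 14-class task into a binary one.
    """
    a, b = candidate_pair(spatial_spectrum)
    return a if eeg_scores[a] >= eeg_scores[b] else b


# Toy example: speakers at directions 3 and 10.
spectrum = [0.1] * N_DIRECTIONS
spectrum[3], spectrum[10] = 0.9, 0.8

scores = [0.0] * N_DIRECTIONS
scores[3], scores[7], scores[10] = 0.2, 5.0, 0.6  # direction 7 is a distractor

print(candidate_pair(spectrum))           # the two competing directions
print(pairwise_decode(spectrum, scores))  # the attended direction among them
```

In this toy input the largest EEG score belongs to direction 7, but that direction is never considered because the spatial spectrum has already narrowed the choice to two candidates. The same reduction is consistent with the 50% chance level quoted in the abstract: once the competing pair is known, a random guess between the two is correct half of the time.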
