A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex



Open Access


Author(s) - Qingtian Zhang, Xiaolin Hu, Bo Hong, Bo Zhang

Publication year - 2019

Publication title - PLOS Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.628

H-Index - 182

eISSN - 1553-7358

pISSN - 1553-734X

DOI - 10.1371/journal.pcbi.1006766

Subject(s) - auditory cortex, neural coding, receptive field, inferior colliculus, formant, computer science, speech recognition, pattern recognition (psychology), vowel, voxel, auditory system, artificial intelligence, neuroscience, biology, nucleus

The auditory pathway consists of multiple stages, from the cochlear nucleus to the auditory cortex. Neurons at different stages have different functions and exhibit different response properties. It is unclear whether these stages share a common encoding mechanism. We trained an unsupervised deep learning model consisting of alternating sparse coding and max pooling layers on cochleogram-filtered human speech. Evaluation of the response properties revealed that computing units in lower layers exhibited spectro-temporal receptive fields (STRFs) similar to those of inferior colliculus neurons measured in physiological experiments, including properties such as sound onset and termination, checkerboard patterns, and spectral motion. Units in upper layers tended to be tuned to phonetic features such as plosivity and nasality, resembling the results of field recordings in human auditory cortex. Varying the sparseness level of the units in each higher layer revealed a positive correlation between the sparseness level and the strength of phonetic feature encoding. The activities of the units in the top layer, but not other layers, correlated with the dynamics of the first two formants (F1, F2) of all phonemes, indicating the encoding of phoneme dynamics in these units. These results suggest that the principles of sparse coding and max pooling may be universal in the human auditory pathway.
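The abstract describes each stage of the model as sparse coding followed by max pooling. The sketch below illustrates one such stage under several assumptions that are not taken from the paper: a fixed, hypothetical dictionary `D` (the model learns its dictionaries without supervision, which is not shown here), non-negative codes computed with ISTA (iterative shrinkage-thresholding), and non-overlapping max pooling over time frames. The function names and the parameters `lam`, `n_iter`, and `pool_size` are illustrative only.

```python
def shrink_nonneg(v, t):
    # Proximal step for t*a with the constraint a >= 0: subtract t, clip at zero.
    return max(v - t, 0.0)


def sparse_code(x, D, lam=0.1, n_iter=200):
    """Non-negative ISTA for min_{a>=0} 0.5*||x - sum_k a[k]*D[k]||^2 + lam*sum(a).

    x: one input frame (list of floats).
    D: hypothetical dictionary, a list of K atoms, each the same length as x.
    Larger lam drives more coefficients to exactly zero (sparser codes).
    """
    K, N = len(D), len(x)
    # Step size 1/L needs L >= largest eigenvalue of the Gram matrix of D.
    # The squared Frobenius norm is a cheap bound that never underestimates it.
    L = sum(d * d for atom in D for d in atom) or 1.0
    a = [0.0] * K
    for _ in range(n_iter):
        # Residual between the current reconstruction and the input.
        recon = [sum(a[k] * D[k][i] for k in range(K)) for i in range(N)]
        r = [recon[i] - x[i] for i in range(N)]
        # Gradient of the squared-error term with respect to each coefficient.
        g = [sum(D[k][i] * r[i] for i in range(N)) for k in range(K)]
        a = [shrink_nonneg(a[k] - g[k] / L, lam / L) for k in range(K)]
    return a


def max_pool(codes, size=2):
    """Non-overlapping max pooling over time, separately for each unit.

    codes: list of T frames, each a list of K coefficients.
    Trailing frames that do not fill a whole window are dropped.
    """
    pooled = []
    for start in range(0, len(codes) - size + 1, size):
        window = codes[start:start + size]
        pooled.append([max(frame[k] for frame in window) for k in range(len(window[0]))])
    return pooled


def sparse_pool_layer(frames, D, lam=0.1, pool_size=2):
    # One hypothetical stage: sparse-code every frame, then pool across time.
    codes = [sparse_code(x, D, lam) for x in frames]
    return max_pool(codes, pool_size)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(sparse_pool_layer([[1.0, 0.05], [0.2, 0.8]], identity))
```

To stack stages, the pooled output of one stage would be fed in as the frames of the next, with a dictionary whose atoms match that stage's code length. Raising `lam` is one way to vary the sparseness level of a layer, the quantity the abstract relates to the strength of phonetic feature encoding.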
