A framework for evaluating multimodal music mood classification
Author(s) - Hu Xiao, Choi Kahyun, Downie J. Stephen
Publication year - 2017
Publication title - Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.23649
Subject(s) - lyrics, computer science, music information retrieval, mood, set (abstract data type), feature (linguistics), speech recognition, audio signal processing, artificial intelligence, natural language processing, audio analyzer, audio signal, psychology, linguistics, musical, speech coding, art, visual arts, philosophy, literature, psychiatry, programming language
This research proposes a framework for music mood classification that uses multiple and complementary information sources, namely, music audio, lyric text, and social tags associated with music pieces. This article presents the framework and a thorough evaluation of each of its components. Experimental results on a large data set of 18 mood categories show that combining lyrics and audio significantly outperformed systems using audio‐only features. Automatic feature selection techniques were further shown to reduce the feature space. In addition, the examination of learning curves shows that the hybrid systems using lyrics and audio needed fewer training samples and shorter audio clips to achieve the same or better classification accuracies than systems using lyrics or audio alone. Finally, performance comparisons reveal the relative importance of audio and lyric features across mood categories.
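To make the idea of a hybrid lyrics-plus-audio system concrete, the sketch below shows one common way such a pipeline can be assembled: lyric text is vectorized, concatenated with precomputed audio features (feature-level fusion), reduced by automatic feature selection, and fed to a per-mood-category classifier. This is a minimal illustration assuming scikit-learn and toy data; the feature types, the chi-squared selector, the linear SVM, and all numbers are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a hybrid (lyrics + audio) mood classifier with
# automatic feature selection. All inputs and parameters are
# hypothetical placeholders for illustration only.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import LinearSVC

# Hypothetical inputs: one lyric string and one audio feature vector
# (e.g., spectral statistics) per song, plus a binary label for a
# single mood category such as "calm".
lyrics = ["dancing all night under neon lights",
          "slow rain on an empty street"]
audio_features = np.array([[0.82, 0.10, 0.55],
                           [0.21, 0.73, 0.40]])
labels = np.array([1, 0])  # 1 = song belongs to the mood category

# Turn lyrics into TF-IDF bag-of-words features.
lyric_matrix = TfidfVectorizer().fit_transform(lyrics).toarray()

# Feature-level fusion: concatenate lyric and audio features per song.
fused = np.hstack([lyric_matrix, audio_features])

# Scale to [0, 1] so chi-squared selection applies, keep the k best
# features, then fit a linear SVM as the per-category classifier.
model = make_pipeline(
    MinMaxScaler(),
    SelectKBest(chi2, k=4),
    LinearSVC(),
)
model.fit(fused, labels)
print(model.predict(fused))
```

In practice one such binary classifier would be trained for each of the mood categories, and the same pipeline can be rerun with lyric-only or audio-only feature blocks to reproduce the kind of single-source baselines the abstract compares against.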
