Predicting the perception of performed dynamics in music audio with ensemble learning

Anders Elowsson; Anders Friberg


Open Access


Author(s) - Anders Elowsson, Anders Friberg

Publication year - 2017

Publication title - The Journal of the Acoustical Society of America

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.619

H-Index - 187

eISSN - 1520-8524

pISSN - 0001-4966

DOI - 10.1121/1.4978245

Subject(s) - computer science, active listening, dynamics (music), perception, set (abstract data type), speech recognition, perceptron, feature (linguistics), ground truth, artificial intelligence, pattern recognition (psychology), acoustics, artificial neural network, psychology, physics, linguistics, philosophy, communication, neuroscience, programming language

By varying the dynamics in a musical performance, the musician can convey structure and different expressions. Spectral properties of most musical instruments change in a complex way with the performed dynamics, but dedicated audio features for modeling this parameter are lacking. In this study, feature extraction methods were developed to capture relevant attributes related to spectral characteristics and spectral fluctuations, the latter through a sectional spectral flux. Ground truth ratings of performed dynamics had previously been collected by asking listeners to rate how soft or loud the musicians played in a set of audio files. The ratings, averaged over subjects, were used to train three different machine learning models, using the audio features developed for the study as input. The best result, an R² of 0.84, was obtained with an ensemble of multilayer perceptrons. This result seems to be close to the upper bound, given the estimated uncertainty of the ground truth data. It is well above the performance of individual human listeners in the previous listening experiment, and on par with the performance achieved from the average rating of six listeners. The features were analyzed with a factorial design, which highlighted the importance of source separation in the feature extraction.
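The evaluation idea in the abstract (listener ratings averaged into a ground truth, predictions from several models combined into an ensemble, and the fit summarized as R²) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names and the toy numbers are hypothetical, the ensemble is assumed to be a plain mean of model outputs, and R² is assumed to be the coefficient of determination; the abstract specifies none of these details.

```python
from statistics import mean


def average_ratings(ratings_per_listener):
    """Average ratings over listeners.

    Input: one list of ratings per listener, all in the same clip order.
    Output: one averaged rating per clip.
    """
    return [mean(clip) for clip in zip(*ratings_per_listener)]


def ensemble_predict(predictions_per_model):
    """Combine models by taking the mean prediction per clip (an assumption)."""
    return [mean(clip) for clip in zip(*predictions_per_model)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumption)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up ratings from three listeners for four audio clips.
listener_ratings = [
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 4.0, 6.0, 8.0],
    [3.0, 5.0, 7.0, 9.0],
]
targets = average_ratings(listener_ratings)

# Made-up predictions from three models for the same four clips.
model_predictions = [
    [2.5, 4.5, 5.5, 8.5],
    [1.5, 3.5, 6.5, 7.5],
    [2.6, 4.0, 6.0, 7.4],
]
ensemble = ensemble_predict(model_predictions)

for i, pred in enumerate(model_predictions, start=1):
    print(f"model {i}: R^2 = {r_squared(targets, pred):.3f}")
print(f"ensemble: R^2 = {r_squared(targets, ensemble):.3f}")
```

In this toy data the individual models err in different directions, so their mean tracks the averaged ratings more closely than any single model does. That is the general motivation for ensembling, not a reproduction of the reported 0.84.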

