
Heterogeneous AI Music Generation Technology Integrating Fine-grained Control
Author(s) - Hongtao Wang, Li Gong
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3592699
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
As artificial intelligence algorithms continue to advance, researchers have increasingly harnessed them to generate music that resonates with human emotions, offering a novel means of alleviating the escalating pressures of contemporary life. To tackle the persistent issue of low accuracy in current emotion recognition and music generation systems, an approach was proposed that fuses a graph convolutional neural network with a channel attention mechanism for emotion recognition. This integrated model was then paired with a Transformer architecture, creating a framework capable of fine-grained control and heterogeneous music generation. When the emotion recognition model was compared against other leading models, it achieved accuracies of 97.3%, 95.8%, 96.9%, 98.4%, and 97.6% on the five electroencephalogram signal bands, respectively. All of these accuracies exceeded the 95% benchmark and surpassed the comparative models. In addition, the music generation model was evaluated against alternative approaches: the proposed model achieved an average mean square error of 0.27 and an average root mean square error of 0.24, both notably lower than those of the competing models, indicating greater precision and fidelity in the generated music. Together, these results validated the effectiveness of both the emotion recognition and music generation models developed in this work, which not only advances emotion detection and musical composition but also lays a theoretical foundation for subsequent investigations into emotion-aware music generation.
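
The abstract describes a pipeline in which a graph convolutional network with channel attention classifies emotion from EEG signals, and a Transformer then generates music conditioned on that emotion. Below is a minimal PyTorch sketch of such a pipeline under those assumptions; every class name, dimension, and hyperparameter (e.g. 62 EEG channels, a 512-token music vocabulary, 4 emotion classes) is illustrative and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """Single graph-convolution step over EEG channels: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # x: (batch, channels, features); adj: (channels, channels), normalized
        return F.relu(torch.einsum("ij,bjf->bif", adj, self.proj(x)))


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of EEG channels (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # squeeze over the feature axis, excite each channel with a learned weight
        weights = self.fc(x.mean(dim=-1))            # (batch, channels)
        return x * weights.unsqueeze(-1)


class EmotionConditionedMusicModel(nn.Module):
    """GCN + channel attention for EEG emotion recognition; a Transformer
    decoder then generates music tokens conditioned on the predicted emotion."""
    def __init__(self, channels=62, feat_dim=5, hidden=64,
                 n_emotions=4, vocab_size=512, max_len=256):
        super().__init__()
        self.gcn = GraphConv(feat_dim, hidden)
        self.attn = ChannelAttention(channels)
        self.classifier = nn.Linear(channels * hidden, n_emotions)
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.cond_proj = nn.Linear(n_emotions, hidden)
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, eeg, adj, music_tokens):
        # eeg: (batch, channels, feat_dim); music_tokens: (batch, seq_len) token ids
        h = self.attn(self.gcn(eeg, adj))
        emotion_logits = self.classifier(h.flatten(1))
        # project the emotion distribution into one conditioning vector (the "memory")
        cond = self.cond_proj(emotion_logits.softmax(-1)).unsqueeze(1)
        seq_len = music_tokens.size(1)
        pos = torch.arange(seq_len, device=music_tokens.device)
        tgt = self.token_emb(music_tokens) + self.pos_emb(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(eeg.device)
        dec = self.decoder(tgt, memory=cond, tgt_mask=causal)
        return emotion_logits, self.out(dec)         # emotion logits, next-token logits

In this sketch the predicted emotion distribution is projected into a single conditioning vector that the Transformer decoder attends to as memory; the paper's actual fine-grained control mechanism may differ.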