
Heterogeneous AI Music Generation Technology Integrating Fine-grained Control
Author(s) - Hongtao Wang, Li Gong
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3592699
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
As artificial intelligence algorithms continue to advance, researchers have increasingly harnessed them to generate music that resonates with human emotions, offering a novel means of alleviating the escalating pressures of contemporary life. To tackle the persistent issue of low accuracy in current emotion recognition and music generation systems, an approach was proposed that fuses a graph convolutional neural network with a channel attention mechanism for emotion recognition. This integrated model was then paired with a Transformer architecture, creating a framework capable of fine-grained control and heterogeneous music generation. When the emotion recognition model was compared against other leading models, it achieved accuracies of 97.3%, 95.8%, 96.9%, 98.4%, and 97.6% on the five electroencephalogram signal bands, respectively. All of these accuracies exceeded the 95% benchmark and surpassed the comparative models. In addition, the music generation model was evaluated against alternative approaches: the proposed model achieved an average mean square error of 0.27 and an average root mean square error of 0.24, both notably lower than those of the competing models, indicating greater precision and fidelity in the generated music. Together, these results validated the effectiveness of both the emotion recognition and music generation models developed in this work, which not only advances emotion detection and musical composition but also lays a theoretical foundation for subsequent investigations into emotion-aware music generation.
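
The abstract describes a pipeline in which a graph convolutional network with channel attention classifies emotion from EEG signals, and a Transformer then generates music conditioned on that emotion. Below is a minimal PyTorch sketch of such a pipeline under those assumptions; every class name, dimension, and hyperparameter (e.g. 62 EEG channels, a 512-token music vocabulary, 4 emotion classes) is illustrative and not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """Single graph-convolution step over EEG channels: H = ReLU(A_hat X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # x: (batch, channels, features); adj: (channels, channels), normalized
        return F.relu(torch.einsum("ij,bjf->bif", adj, self.proj(x)))


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style reweighting of EEG channels (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # squeeze over the feature axis, excite each channel with a learned weight
        weights = self.fc(x.mean(dim=-1))            # (batch, channels)
        return x * weights.unsqueeze(-1)


class EmotionConditionedMusicModel(nn.Module):
    """GCN + channel attention for EEG emotion recognition; a Transformer
    decoder then generates music tokens conditioned on the predicted emotion."""
    def __init__(self, channels=62, feat_dim=5, hidden=64,
                 n_emotions=4, vocab_size=512, max_len=256):
        super().__init__()
        self.gcn = GraphConv(feat_dim, hidden)
        self.attn = ChannelAttention(channels)
        self.classifier = nn.Linear(channels * hidden, n_emotions)
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.pos_emb = nn.Embedding(max_len, hidden)
        self.cond_proj = nn.Linear(n_emotions, hidden)
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, eeg, adj, music_tokens):
        # eeg: (batch, channels, feat_dim); music_tokens: (batch, seq_len) token ids
        h = self.attn(self.gcn(eeg, adj))
        emotion_logits = self.classifier(h.flatten(1))
        # project the emotion distribution into one conditioning vector (the "memory")
        cond = self.cond_proj(emotion_logits.softmax(-1)).unsqueeze(1)
        seq_len = music_tokens.size(1)
        pos = torch.arange(seq_len, device=music_tokens.device)
        tgt = self.token_emb(music_tokens) + self.pos_emb(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(eeg.device)
        dec = self.decoder(tgt, memory=cond, tgt_mask=causal)
        return emotion_logits, self.out(dec)         # emotion logits, next-token logits

In this sketch the predicted emotion distribution is projected into a single conditioning vector that the Transformer decoder attends to as memory; the paper's actual fine-grained control mechanism may differ.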