XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

January 15, 2025
Authors: Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu
cs.AI

Abstract

In recent years, remarkable advancements in artificial intelligence-generated content (AIGC) have been achieved in the fields of image synthesis and text generation, generating content comparable to that produced by humans. However, the quality of AI-generated music has not yet reached this standard, primarily due to the challenge of effectively controlling musical emotions and ensuring high-quality outputs. This paper presents a generalized symbolic music generation framework, XMusic, which supports flexible prompts (i.e., images, videos, texts, tags, and humming) to generate emotionally controllable and high-quality symbolic music. XMusic consists of two core components, XProjector and XComposer. XProjector parses the prompts of various modalities into symbolic music elements (i.e., emotions, genres, rhythms and notes) within the projection space to generate matching music. XComposer contains a Generator and a Selector. The Generator generates emotionally controllable and melodious music based on our innovative symbolic music representation, whereas the Selector identifies high-quality symbolic music by constructing a multi-task learning scheme involving quality assessment, emotion recognition, and genre recognition tasks. In addition, we build XMIDI, a large-scale symbolic music dataset that contains 108,023 MIDI files annotated with precise emotion and genre labels. Objective and subjective evaluations show that XMusic significantly outperforms the current state-of-the-art methods with impressive music quality. XMusic was named one of the nine Highlights of Collectibles at WAIC 2023. The project homepage of XMusic is https://xmusic-project.github.io.
