XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

January 15, 2025
Authors: Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu
cs.AI

Abstract

In recent years, artificial intelligence-generated content (AIGC) has advanced remarkably in image synthesis and text generation, producing content comparable to that created by humans. However, AI-generated music has not yet reached this standard, primarily because it is difficult to control musical emotion effectively and to ensure high-quality output. This paper presents XMusic, a generalized symbolic music generation framework that supports flexible prompts (i.e., images, videos, texts, tags, and humming) and generates emotionally controllable, high-quality symbolic music. XMusic consists of two core components: XProjector and XComposer. XProjector parses prompts of various modalities into symbolic music elements (i.e., emotions, genres, rhythms, and notes) within a projection space, so that the generated music matches the prompt. XComposer contains a Generator and a Selector. The Generator produces emotionally controllable and melodious music based on our novel symbolic music representation, while the Selector identifies high-quality symbolic music through a multi-task learning scheme that combines quality assessment, emotion recognition, and genre recognition. In addition, we build XMIDI, a large-scale symbolic music dataset containing 108,023 MIDI files annotated with precise emotion and genre labels. Objective and subjective evaluations show that XMusic significantly outperforms current state-of-the-art methods in music quality. XMusic was selected as one of the nine Highlights of Collectibles at the World Artificial Intelligence Conference (WAIC) 2023. The project homepage of XMusic is https://xmusic-project.github.io.
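The abstract describes the Selector only at a high level: it is trained with a multi-task scheme covering quality assessment, emotion recognition, and genre recognition, and it is used to pick high-quality music from the Generator's output. The sketch below is a minimal, hypothetical illustration of one way such a generate-then-select step could work. It assumes the Selector emits, for each candidate piece, a quality score in [0, 1] plus emotion and genre probability distributions. The names `SelectorOutput`, `multitask_loss`, and `select`, the example labels, the equal loss weights, and the filter-then-rank rule are all assumptions for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SelectorOutput:
    """Hypothetical per-candidate outputs of a multi-task selector."""
    quality: float                   # quality-assessment head, assumed in [0, 1]
    emotion_probs: Dict[str, float]  # emotion-recognition head (label -> prob)
    genre_probs: Dict[str, float]    # genre-recognition head (label -> prob)


def cross_entropy(probs: Dict[str, float], target: str) -> float:
    # Negative log-likelihood of the target class; clamp to avoid log(0).
    return -math.log(max(probs.get(target, 0.0), 1e-12))


def multitask_loss(out: SelectorOutput, quality_label: float, emotion: str,
                   genre: str, weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Assumed formulation: weighted sum of binary cross-entropy (quality)
    # and categorical cross-entropy (emotion, genre).
    q = min(max(out.quality, 1e-12), 1 - 1e-12)
    l_q = -(quality_label * math.log(q) + (1 - quality_label) * math.log(1 - q))
    l_e = cross_entropy(out.emotion_probs, emotion)
    l_g = cross_entropy(out.genre_probs, genre)
    w_q, w_e, w_g = weights
    return w_q * l_q + w_e * l_e + w_g * l_g


def argmax(probs: Dict[str, float]) -> str:
    # Label with the highest predicted probability.
    return max(probs, key=probs.get)


def select(candidates: List[SelectorOutput], emotion: str, genre: str) -> Optional[int]:
    # Keep candidates whose predicted emotion and genre match the control
    # targets, then return the index of the highest-quality one (None if none match).
    matching = [i for i, c in enumerate(candidates)
                if argmax(c.emotion_probs) == emotion and argmax(c.genre_probs) == genre]
    if not matching:
        return None
    return max(matching, key=lambda i: candidates[i].quality)


if __name__ == "__main__":
    # Hypothetical selector outputs for three generated candidates.
    cands = [
        SelectorOutput(0.9, {"happy": 0.2, "sad": 0.8}, {"pop": 0.7, "jazz": 0.3}),
        SelectorOutput(0.6, {"happy": 0.7, "sad": 0.3}, {"pop": 0.6, "jazz": 0.4}),
        SelectorOutput(0.8, {"happy": 0.9, "sad": 0.1}, {"pop": 0.8, "jazz": 0.2}),
    ]
    print(select(cands, "happy", "pop"))  # 2
```

Filtering on the recognized emotion and genre before ranking by quality keeps the control targets as hard constraints. A softer alternative would be to rank by a combined score, at the cost of occasionally returning a piece whose predicted emotion or genre misses the target.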
