PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation

Abstract


Music generation has progressed significantly, especially in the domain of audio generation. However, generating symbolic music that has both long-term structure and expressiveness remains a significant challenge. In this paper, we propose PerceiverS (Segmentation and Scale), a novel architecture that addresses this problem by leveraging both Effective Segmentation and Multi-Scale attention mechanisms. Our approach improves symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details. By combining cross-attention and self-attention in a Multi-Scale setting, PerceiverS captures long-range musical structure while preserving performance nuances. Evaluated on datasets such as Maestro, the proposed model shows improvements in generating coherent and diverse music with both structural consistency and expressive variation. Project demos and generated music samples are available at https://perceivers.github.io.
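The abstract describes the architecture only at a high level, so the following is not the PerceiverS implementation. It is a minimal sketch, under stated assumptions, of the generic Perceiver AR-style pattern of cross-attention followed by self-attention: the last `num_latents` positions of a sequence query the full context through causally masked cross-attention, and the resulting latents then attend to each other with causally masked self-attention. The helper names `attend` and `perceiver_ar_block` are hypothetical, and the sketch assumes a single attention head, no learned projections, and no Multi-Scale or Effective Segmentation logic.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; -inf scores (masked keys) get weight 0."""
    m = max(scores)  # assumes at least one key is unmasked
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector],
           query_positions: List[int], key_positions: List[int]) -> List[Vector]:
    """Single-head scaled dot-product attention with a causal mask:
    a query at absolute position p only sees keys at positions <= p.
    Illustrative only: identity projections instead of learned Q/K/V weights."""
    d = len(keys[0])
    out = []
    for q, qp in zip(queries, query_positions):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if kp <= qp else -math.inf
            for k, kp in zip(keys, key_positions)
        ]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def perceiver_ar_block(inputs: List[Vector], num_latents: int) -> List[Vector]:
    """Hypothetical Perceiver AR-style step (not the PerceiverS model):
    1) the last `num_latents` positions act as queries in a causally
       masked cross-attention over the whole input sequence;
    2) the resulting latents run causally masked self-attention."""
    n = len(inputs)
    positions = list(range(n))
    latent_positions = positions[n - num_latents:]
    latent_queries = inputs[n - num_latents:]
    latents = attend(latent_queries, inputs, inputs, latent_positions, positions)
    return attend(latents, latents, latents, latent_positions, latent_positions)


if __name__ == "__main__":
    # Toy "token embeddings": 8 positions, 4 dimensions each.
    seq = [[math.sin(i + j) for j in range(4)] for i in range(8)]
    out = perceiver_ar_block(seq, num_latents=3)
    print(len(out), len(out[0]))  # prints: 3 4
```

Because every query sees only earlier or equal positions, changing the final input token leaves the earlier latents unchanged. This is the property that lets the pattern be used for autoregressive generation while the cross-attention step keeps the cost of attending to a long context proportional to the number of latents rather than to the full sequence length squared.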
