DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions
April 20, 2025
Authors: Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji
cs.AI
Abstract
This study introduces a novel and interpretable model, DiffVox, for matching
vocal effects in music production. DiffVox, short for "Differentiable Vocal
Fx", integrates parametric equalisation, dynamic range control, delay, and
reverb with efficient differentiable implementations to enable gradient-based
optimisation for parameter estimation. Vocal presets are retrieved from two
datasets, comprising 70 tracks from MedleyDB and 365 tracks from a private
collection. Analysis of parameter correlations highlights strong relationships
between effects and parameters, such as the high-pass and low-shelf filters
often acting together to shape the low end, and the delay time correlating
with the intensity of the delayed signals. Principal component analysis reveals
connections to McAdams' timbre dimensions, where the most crucial component
modulates the perceived spaciousness while the secondary components influence
spectral brightness. Statistical testing confirms the non-Gaussian nature of
the parameter distribution, highlighting the complexity of the vocal effects
space. These initial findings on the parameter distributions set the foundation
for future research in vocal effects modelling and automatic mixing. Our source
code and datasets are accessible at https://github.com/SonyResearch/diffvox.
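
To make the idea of gradient-based parameter estimation concrete, here is a minimal sketch in plain Python. It is not the DiffVox implementation: the toy effect (a gain plus one fixed-length delay tap), the function names `apply_effect` and `fit_effect`, and all constants are assumptions chosen for illustration. Given a dry signal and a processed ("wet") signal, it recovers the effect parameters by gradient descent on the mean squared error, using hand-derived gradients.

```python
import random


def apply_effect(x, gain, mix, delay=3):
    """Toy effect (hypothetical): y[n] = gain * (x[n] + mix * x[n - delay])."""
    return [
        gain * (x[n] + mix * (x[n - delay] if n >= delay else 0.0))
        for n in range(len(x))
    ]


def fit_effect(dry, wet, delay=3, lr=0.5, steps=500):
    """Estimate (gain, mix) by gradient descent on the mean squared error."""
    gain, mix = 1.0, 0.0  # neutral starting point: unit gain, no delay tap
    n_samples = len(dry)
    delayed = [dry[n - delay] if n >= delay else 0.0 for n in range(n_samples)]
    for _ in range(steps):
        grad_gain = 0.0
        grad_mix = 0.0
        for x, d, target in zip(dry, delayed, wet):
            pre_gain = x + mix * d
            err = gain * pre_gain - target
            # Analytic derivatives of err**2 with respect to each parameter.
            grad_gain += 2.0 * err * pre_gain
            grad_mix += 2.0 * err * gain * d
        gain -= lr * grad_gain / n_samples
        mix -= lr * grad_mix / n_samples
    return gain, mix


rng = random.Random(0)
dry = [rng.uniform(-1.0, 1.0) for _ in range(256)]
wet = apply_effect(dry, gain=0.7, mix=0.4)  # reference with known parameters
est_gain, est_mix = fit_effect(dry, wet)
print(f"estimated gain={est_gain:.3f}, mix={est_mix:.3f}")
```

A real effects chain has many more parameters and recursive filters, so in practice the gradients would come from an automatic differentiation framework rather than being derived by hand as in this sketch.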