Population Aware Diffusion for Time Series Generation
January 1, 2025
Authors: Yang Li, Han Meng, Zhenyu Bi, Ingolv T. Urnes, Haipeng Chen
cs.AI
Abstract
Diffusion models have shown promising ability in generating high-quality time
series (TS) data. Despite the initial success, existing works mostly focus on
the authenticity of data at the individual level, but pay less attention to
preserving the population-level properties on the entire dataset. Such
population-level properties include value distributions for each dimension and
distributions of certain functional dependencies (e.g., cross-correlation, CC)
between different dimensions. For instance, when generating house energy
consumption TS data, the value distributions of the outside temperature and the
kitchen temperature should be preserved, as well as the distribution of CC
between them. Preserving such TS population-level properties is critical in
maintaining the statistical insights of the datasets, mitigating model bias,
and augmenting downstream tasks like TS prediction. Yet existing models often
overlook this goal. Hence, data generated by existing models often exhibits
distribution shifts from the original data. We propose Population-aware
Diffusion for Time Series (PaD-TS), a new TS generation model that better
preserves the population-level properties. The key novelties of PaD-TS include
1) a new training method explicitly incorporating TS population-level property
preservation, and 2) a new dual-channel encoder model architecture that better
captures the TS data structure. Empirical results on major benchmark datasets
show that PaD-TS can improve the average CC distribution shift score between
real and synthetic data by 5.9x while maintaining a performance comparable to
state-of-the-art models on individual-level authenticity.
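
To make the notion of a population-level CC distribution concrete, the following is a minimal sketch and not the paper's implementation or its evaluation metric. It assumes a dataset stored as a NumPy array of shape (n_samples, seq_len, n_dims), uses the lag-0 Pearson correlation as the cross-correlation, and compares real and synthetic CC distributions with a simple 1-D Wasserstein-style distance. The function names `cross_correlations`, `w1_equal_size`, and `cc_shift` are hypothetical and chosen for illustration only.

```python
import numpy as np


def cross_correlations(data: np.ndarray) -> np.ndarray:
    """Lag-0 Pearson correlation for every pair of dimensions, per sample.

    data: array of shape (n_samples, seq_len, n_dims).
    Returns an array of shape (n_samples, n_pairs).
    """
    _, seq_len, n_dims = data.shape
    centered = data - data.mean(axis=1, keepdims=True)
    std = centered.std(axis=1, keepdims=True)
    # Guard against constant series (zero standard deviation).
    normed = centered / np.where(std > 0, std, 1.0)
    corr = np.einsum("ntd,nte->nde", normed, normed) / seq_len
    i, j = np.triu_indices(n_dims, k=1)
    return corr[:, i, j]


def w1_equal_size(a: np.ndarray, b: np.ndarray) -> float:
    """1-D Wasserstein-1 distance between two equal-size empirical samples."""
    assert a.shape == b.shape, "this shortcut assumes equal sample sizes"
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def cc_shift(real: np.ndarray, synth: np.ndarray) -> float:
    """Average distance between real and synthetic CC distributions.

    Illustrative stand-in for a CC distribution shift score; the distance
    used here is an assumption, not the score reported in the paper.
    """
    cc_real = cross_correlations(real)
    cc_synth = cross_correlations(synth)
    per_pair = [
        w1_equal_size(cc_real[:, k], cc_synth[:, k])
        for k in range(cc_real.shape[1])
    ]
    return float(np.mean(per_pair))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(64, 24, 1))
    # "Real" data: two strongly correlated dimensions.
    real = np.concatenate(
        [base, base + 0.1 * rng.normal(size=(64, 24, 1))], axis=2
    )
    # "Synthetic" data: two independent dimensions, so the CC distribution differs.
    synth = rng.normal(size=(64, 24, 2))
    print("shift(real, real): ", cc_shift(real, real))
    print("shift(real, synth):", cc_shift(real, synth))
```

In this toy setup each synthetic series may look plausible on its own, yet the synthetic population loses the strong correlation between the two dimensions, which is the kind of distribution shift the abstract describes.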