

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

October 23, 2024
作者: Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong
cs.AI

Abstract

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions. https://github.com/HKUNLP/DiffuLLaMA
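As a hedged illustration of the recipe the abstract describes, the sketch below shows an absorbing-state ("mask") corruption step, one common formulation of discrete text diffusion. Everything here is illustrative, not the paper's actual code: `MASK_ID`, the function names, and the masking scheme are assumptions; consult the linked repository for the real implementation.

```python
import random

MASK_ID = -1  # illustrative absorbing [MASK] token id (an assumption, not the paper's)

def corrupt(tokens, t, rng):
    """Forward corruption at noise level t in [0, 1]: independently replace
    each token with MASK_ID with probability t. A diffusion LM is trained
    to recover the original tokens at the masked positions, attending
    bidirectionally over the partially masked sequence."""
    noisy, masked_positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < t:
            noisy.append(MASK_ID)
            masked_positions.append(i)
        else:
            noisy.append(tok)
    return noisy, masked_positions

def infill_middle(prefix, suffix, middle_len):
    """Fill-in-the-middle without prompt re-ordering: keep the sequence in
    natural order and mask only the unknown span; the model denoises the
    masked span in place."""
    return prefix + [MASK_ID] * middle_len + suffix
```

A training step would sample a noise level `t`, corrupt a batch with `corrupt`, run the adapted transformer (with its causal attention restriction lifted), and apply cross-entropy loss only at `masked_positions`; this is a sketch of the general masked-diffusion setup, not a claim about the paper's exact objective.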

