

TESS 2: A Large-Scale Generalist Diffusion Language Model

February 19, 2025
Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan
cs.AI

Abstract

We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models and matches, and sometimes exceeds, strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as the diffusion loss, and then performing further instruction tuning. We find that both the adaptation training and the choice of base model are crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure that aligns model outputs without training the underlying model. Finally, we show that TESS 2 improves further with increased inference-time compute, highlighting the value of diffusion LMs' fine-grained control over the amount of compute used at inference. Code and models are available at https://github.com/hamishivi/tess-2.
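The reward guidance idea admits a compact sketch: at each denoising step, score the model's current soft (simplex) prediction with a differentiable reward model, then nudge the predicted logits along the reward gradient, leaving the diffusion model itself untouched. The PyTorch sketch below is a minimal illustration under stated assumptions; `diffusion_lm`, `reward_model`, their signatures, the `vocab_size` attribute, and the linear re-noising schedule are all hypothetical stand-ins for exposition, not the paper's actual implementation.

```python
import torch

def reward_guided_denoise(diffusion_lm, reward_model, prompt_ids,
                          num_steps=100, guidance_scale=1.0, seq_len=64):
    """Minimal sketch of reward-guided diffusion decoding.

    Hypothetical interfaces: `diffusion_lm(noisy_logits, t, prompt_ids)`
    returns denoised token logits, and `reward_model(probs)` maps a soft
    (simplex) token sequence to a scalar reward. Names and signatures
    are illustrative, not the paper's API.
    """
    vocab_size = diffusion_lm.vocab_size  # assumed attribute
    # Start from pure noise over the vocabulary simplex.
    x_t = torch.randn(1, seq_len, vocab_size)

    for t in reversed(range(num_steps)):
        # Predict clean token logits from the current noisy state.
        logits = diffusion_lm(x_t, t, prompt_ids)

        # Score the soft token distribution with the reward model and
        # take the gradient of the reward w.r.t. the logits.
        logits = logits.detach().requires_grad_(True)
        reward = reward_model(torch.softmax(logits, dim=-1))
        grad = torch.autograd.grad(reward, logits)[0]

        # Nudge the prediction toward higher reward, then re-noise for
        # the next (less noisy) step; the linear schedule here is a
        # placeholder, real noise schedules are omitted.
        guided = logits + guidance_scale * grad
        noise_level = t / num_steps
        x_t = (1 - noise_level) * guided + noise_level * torch.randn_like(guided)

    return x_t.argmax(dim=-1)  # final token ids
```

Because the reward model only enters through its gradient at decode time, it can be swapped out or rescaled via `guidance_scale` without retraining the diffusion model, which is what makes the procedure modular.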
