
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

March 9, 2025
作者: Yingfeng Luo, Tong Zheng, Yongyu Mu, Bei Li, Qinghong Zhang, Yongqi Gao, Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu
cs.AI

Abstract

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on WMT and our datasets show that our method matches or surpasses a range of baselines in translation quality, while achieving 2.4 to 6.5 times inference speedups and a 75% reduction in the memory footprint of the KV cache. The method also demonstrates strong generalization across a variety of translation-related tasks.
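To make the architecture described in the abstract concrete, here is a minimal sketch, not the authors' implementation: a pre-trained LLM encodes the source sentence once, and a small NMT-style decoder cross-attends to those states to generate the translation. The class name, the linear adapter, and all dimensions and layer counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LLMEncodedNMT(nn.Module):
    """Lightweight NMT decoder that cross-attends to hidden states from a pre-trained LLM.

    This is a toy sketch of the idea in the abstract; the bridging "adapter" layer and
    all hyperparameters are assumptions, not the paper's actual configuration.
    """
    def __init__(self, llm_dim: int = 1024, dec_dim: int = 512, vocab_size: int = 32000):
        super().__init__()
        self.adapter = nn.Linear(llm_dim, dec_dim)        # map LLM states into the decoder's space
        self.tgt_embed = nn.Embedding(vocab_size, dec_dim)
        layer = nn.TransformerDecoderLayer(d_model=dec_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.out_proj = nn.Linear(dec_dim, vocab_size)

    def forward(self, llm_states: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # llm_states: [B, S, llm_dim], hidden states produced by the LLM over the source
        # sentence. They are computed once per input; only this small decoder runs
        # autoregressively at inference, which is where the claimed speedups and
        # KV-cache savings would come from.
        memory = self.adapter(llm_states)
        tgt = self.tgt_embed(tgt_ids)
        t = tgt_ids.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)  # causal mask
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out_proj(hidden)                      # next-token logits, [B, T, vocab_size]

# Toy run with random tensors standing in for real LLM states and target token ids.
model = LLMEncodedNMT()
llm_states = torch.randn(2, 20, 1024)
tgt_ids = torch.randint(0, 32000, (2, 15))
print(model(llm_states, tgt_ids).shape)  # torch.Size([2, 15, 32000])
```

In this reading, the expensive LLM forward pass over the source is non-autoregressive, so the per-step decoding cost and KV cache are determined by the small decoder rather than the LLM.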

