R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
February 27, 2025
Authors: Minggui He, Yilun Liu, Shimin Tao, Yuanchang Luo, Hongyong Zeng, Chang Su, Li Zhang, Hongxia Ma, Daimeng Wei, Weibin Meng, Hao Yang, Boxing Chen, Osamu Yoshie
cs.AI
Abstract
Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) such as DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered chains of thought (CoTs), remains underexplored. Existing methods either design a fixed CoT tailored to a specific MT sub-task (e.g., literature translation), or rely on synthesized CoTs that are unaligned with human reasoning and on supervised fine-tuning (SFT) prone to catastrophic forgetting, limiting their adaptability to diverse translation scenarios. This paper introduces R1-Translator (R1-T1), a novel framework that achieves inference-time reasoning for general MT via reinforcement learning (RL) with human-aligned CoTs comprising six common patterns. Our approach pioneers three innovations: (1) extending reasoning-based translation beyond MT sub-tasks to six languages and diverse tasks (e.g., legal/medical domain adaptation, idiom resolution); (2) formalizing six expert-curated CoT templates that mirror hybrid human strategies such as context-aware paraphrasing and back-translation; and (3) enabling self-evolving CoT discovery and anti-forgetting adaptation through RL with KL-constrained rewards. Experimental results show steady translation performance improvements across 21 languages and 80 translation directions on the Flores-101 test set, especially on the 15 languages unseen during training, while preserving general multilingual abilities compared with plain SFT.
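
The abstract does not spell out the exact form of the KL-constrained reward; a minimal sketch, assuming the standard KL-regularized objective used in RLHF-style training (the symbols $r_{\mathrm{MT}}$, $\beta$, and $\pi_{\mathrm{ref}}$ are illustrative, not taken from the paper), is:

$$
r_{\text{total}}(x, y) \;=\; r_{\mathrm{MT}}(x, y) \;-\; \beta \, D_{\mathrm{KL}}\!\big(\pi_{\theta}(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x)\big)
$$

Here $r_{\mathrm{MT}}$ scores translation quality, and the KL term penalizes divergence of the trained policy $\pi_{\theta}$ from a frozen reference model $\pi_{\mathrm{ref}}$; under this reading, the KL constraint is what curbs the catastrophic forgetting of general multilingual abilities that the authors attribute to plain SFT.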