DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
December 23, 2024
Authors: Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou
cs.AI
Abstract
Recently, O1-like models have emerged as representative examples,
illustrating the effectiveness of long chain-of-thought (CoT) in reasoning
tasks such as math and coding. In this paper, we introduce DRT-o1, an
attempt to bring the success of long CoT to neural machine translation (MT).
Specifically, literary texts often contain similes and metaphors, and
translating such texts into a target language is difficult in practice
because of cultural differences. In such cases, literal translation often
fails to convey the intended meaning effectively. Even for professional human
translators, considerable thought must be given to preserving semantics
throughout the translation process. To simulate LLMs' long thought ability in
MT, we first mine sentences containing similes or metaphors from existing
literary works, and then develop a multi-agent framework to translate these
sentences via long thought. In the multi-agent framework, a translator is used
to iteratively translate the source sentence following the suggestions provided by
an advisor. To ensure the effectiveness of the long thoughts, an evaluator is
also employed to judge whether the translation in the current round is better
than the previous one. In this manner, we collect tens of thousands of
long-thought MT samples, which are used to train our DRT-o1. The experimental
results on literary translation demonstrate the effectiveness of DRT-o1.
Using Qwen2.5-7B and Qwen2.5-14B as the backbones, DRT-o1 brings improvements
of 7.33~8.26 BLEU and 1.66~3.36 CometScore. Moreover, DRT-o1-7B outperforms
QwQ-32B-Preview by 7.82 BLEU and 1.46 CometScore, further demonstrating its
effectiveness. The project is available at https://github.com/krystalan/DRT-o1
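
To make the multi-agent loop described above concrete, here is a minimal sketch of how the translator/advisor/evaluator iteration could look. Everything in it is an assumption for illustration: the `chat` callable stands in for any chat-style LLM API, and the prompts, round limit, and stopping rule are hypothetical, not the paper's actual implementation.

```python
from typing import Callable, List, Tuple

# (system_prompt, user_prompt) -> model reply; plug in any chat-style LLM API.
Chat = Callable[[str, str], str]

def long_thought_translate(source: str, chat: Chat,
                           max_rounds: int = 5) -> Tuple[str, List[str]]:
    """Iteratively refine a translation via translator/advisor/evaluator agents."""
    thoughts: List[str] = []
    # Initial draft from the translator agent.
    draft = chat("You are a literary translator.",
                 f"Translate the following sentence into the target language:\n{source}")
    for round_id in range(1, max_rounds + 1):
        # Advisor: critique the current draft and suggest refinements.
        advice = chat("You are a translation advisor.",
                      f"Source: {source}\nDraft: {draft}\n"
                      "Point out meaning lost by literal translation and suggest fixes.")
        # Translator: revise the draft under the advisor's suggestions.
        revised = chat("You are a literary translator.",
                       f"Source: {source}\nDraft: {draft}\nAdvice: {advice}\n"
                       "Produce an improved translation.")
        # Evaluator: judge whether this round's translation beats the last one.
        verdict = chat("You are a translation evaluator.",
                       f"Source: {source}\nPrevious: {draft}\nCurrent: {revised}\n"
                       "Reply 'better' if Current improves on Previous, otherwise 'worse'.")
        thoughts.append(f"Round {round_id}: advice={advice!r} verdict={verdict!r}")
        if "better" not in verdict.lower():
            break  # stop once refinement no longer improves the translation
        draft = revised
    return draft, thoughts
```

In a pipeline like the one the abstract describes, the final draft together with the recorded rounds would form one long-thought MT sample for training.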