DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
December 23, 2024
Authors: Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou
cs.AI
Abstract
Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding. In this paper, we introduce DRT-o1, an attempt to bring the success of long CoT to neural machine translation (MT). Specifically, literary texts often involve similes and metaphors, and translating them into a target language is difficult in practice because of cultural differences; in such cases, literal translation often fails to convey the intended meaning. Even professional human translators must give considerable thought to preserving semantics throughout the translation process. To simulate LLMs' long-thought ability in MT, we first mine sentences containing similes or metaphors from existing literary books, and then develop a multi-agent framework to translate these sentences via long thought. In this framework, a translator iteratively translates the source sentence, guided by suggestions from an advisor. To ensure the effectiveness of the long thoughts, an evaluator also judges whether the current round's translation improves on the previous one. In this manner, we collect tens of thousands of long-thought MT samples, which are used to train DRT-o1. Experimental results on literary translation demonstrate the effectiveness of DRT-o1. Using Qwen2.5-7B and Qwen2.5-14B as backbones, DRT-o1 yields improvements of 7.33–8.26 BLEU and 1.66–3.36 CometScore. Moreover, DRT-o1-7B outperforms QwQ-32B-Preview by 7.82 BLEU and 1.46 CometScore, further demonstrating its effectiveness. The project is available at https://github.com/krystalan/DRT-o1.
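To make the data-collection procedure concrete, here is a minimal sketch of the iterative translator–advisor–evaluator loop described in the abstract. The agent interface (`call_llm`), prompts, scoring scale, and stopping rule are illustrative assumptions, not the authors' released implementation:

```python
# Hypothetical sketch of the translator/advisor/evaluator loop.
# call_llm, the prompts, and the 0-100 scoring scale are assumptions.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a backbone LLM (e.g., Qwen2.5)."""
    raise NotImplementedError

@dataclass
class LongThoughtSample:
    source: str
    # One (translation, advice, score) tuple per refinement round.
    thoughts: list = field(default_factory=list)
    final_translation: str = ""

def collect_long_thought(source: str, max_rounds: int = 5) -> LongThoughtSample:
    sample = LongThoughtSample(source=source)
    # Round 1: initial draft, advisor critique, evaluator score.
    translation = call_llm(f"Translate into the target language:\n{source}")
    advice = call_llm(f"Critique this translation of '{source}':\n{translation}")
    best_score = float(call_llm(f"Score 0-100 this translation:\n{translation}"))
    sample.thoughts.append((translation, advice, best_score))
    for _ in range(max_rounds - 1):
        # Translator refines the draft under the advisor's suggestions.
        translation = call_llm(
            f"Source: {source}\nPrevious draft: {translation}\n"
            f"Advisor suggestions: {advice}\nProduce an improved translation."
        )
        advice = call_llm(f"Critique this translation of '{source}':\n{translation}")
        score = float(call_llm(f"Score 0-100 this translation:\n{translation}"))
        sample.thoughts.append((translation, advice, score))
        # Evaluator keeps the long thought effective: stop once a round
        # no longer improves on the previous one.
        if score <= best_score:
            break
        best_score = score
    # The best-scoring round becomes the reference translation; the full
    # round-by-round trace is the long-thought training signal.
    sample.final_translation = max(sample.thoughts, key=lambda t: t[2])[0]
    return sample
```

The whole trace in `thoughts`, not just the final translation, is what makes the sample "long-thought" training data.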
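The reported gains are measured in BLEU and CometScore. A hedged example of how such scores are commonly computed with the `sacrebleu` and `unbabel-comet` packages follows; the example sentences, checkpoint name, and settings are assumptions, not the paper's evaluation setup:

```python
# Illustrative scoring of English-to-Chinese outputs with sacrebleu and COMET.
from sacrebleu.metrics import BLEU
from comet import download_model, load_from_checkpoint

sources = ["Her heart was a fragile bird, fluttering in her chest."]
hypotheses = ["她的心像一只脆弱的小鸟，在胸中扑腾。"]
references = ["她的心如一只脆弱的鸟儿，在胸口扑动。"]

# Chinese targets need the dedicated "zh" tokenizer for sensible BLEU.
bleu = BLEU(tokenize="zh").corpus_score(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")

# COMET is reference-based and also conditions on the source sentence.
model_path = download_model("Unbabel/wmt22-comet-da")  # assumed checkpoint
comet = load_from_checkpoint(model_path)
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet.predict(data, batch_size=8).system_score:.4f}")
```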