DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

December 23, 2024
Authors: Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou
cs.AI

Abstract

Recently, O1-like models have emerged as representative examples, illustrating the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding. In this paper, we introduce DRT-o1, an attempt to bring the success of long CoT to neural machine translation (MT). Specifically, consider literary texts that may contain similes and metaphors: translating such texts into a target language is very difficult in practice because of cultural differences, and literal translation often fails to convey the intended meaning. Even professional human translators must give considerable thought to preserving semantics throughout the translation process. To simulate LLMs' long-thought ability in MT, we first mine sentences containing similes or metaphors from existing literary books, and then develop a multi-agent framework to translate these sentences via long thought. In the multi-agent framework, a translator iteratively translates the source sentence, guided by suggestions from an advisor. To ensure the effectiveness of the long thoughts, an evaluator is also employed to judge whether the translation in the current round is better than the previous one. In this manner, we collect tens of thousands of long-thought MT samples, which are used to train our DRT-o1. Experimental results on literature translation demonstrate the effectiveness of DRT-o1. Using Qwen2.5-7B and Qwen2.5-14B as backbones, DRT-o1 yields improvements of 7.33~8.26 BLEU and 1.66~3.36 CometScore. Moreover, DRT-o1-7B outperforms QwQ-32B-Preview by 7.82 BLEU and 1.46 CometScore, further demonstrating its effectiveness. The project is available at https://github.com/krystalan/DRT-o1
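
To make the data-collection loop concrete, below is a minimal Python sketch of the translator/advisor/evaluator interaction described in the abstract. The three agent functions are hypothetical placeholders for LLM calls (the authors' actual prompts, scoring scheme, and stopping criterion live in the GitHub repository); only the loop structure follows the abstract.

```python
from typing import Optional

# Hypothetical stand-ins for the three LLM agents; replace each body with a
# real model call. Names and signatures are illustrative, not the paper's API.

def translator(source: str, advice: Optional[str]) -> str:
    """Translate `source`, revising the previous draft according to `advice`."""
    raise NotImplementedError("replace with an LLM call")

def advisor(source: str, draft: str) -> str:
    """Critique `draft` and suggest how to better preserve the source meaning."""
    raise NotImplementedError("replace with an LLM call")

def evaluator(source: str, draft: str) -> float:
    """Score the draft's translation quality (higher is better)."""
    raise NotImplementedError("replace with an LLM call")

def collect_long_thought(source: str, max_rounds: int = 5):
    """Iteratively refine a translation; the recorded trace is one
    long-thought MT training sample."""
    trace, advice = [], None
    best_draft, best_score = None, float("-inf")
    for _ in range(max_rounds):
        draft = translator(source, advice)   # translator revises the draft
        score = evaluator(source, draft)     # evaluator judges this round
        trace.append({"advice": advice, "draft": draft, "score": score})
        if score <= best_score:              # no improvement: stop the thought
            break
        best_draft, best_score = draft, score
        advice = advisor(source, draft)      # advisor guides the next round
    return best_draft, trace
```

Keeping the full trace, rather than only the final draft, is what turns the refinement process into long-CoT training data.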
