

Could Thinking Multilingually Empower LLM Reasoning?

April 16, 2025
Authors: Changjiang Gao, Xu Huang, Wenhao Zhu, Shujian Huang, Lei Li, Fei Yuan
cs.AI

Abstract

Previous work indicates that large language models exhibit a significant "English bias", i.e., they often perform better when tasks are presented in English. Interestingly, we have observed that using certain other languages in reasoning tasks can yield better performance than English. However, this phenomenon remains under-explored. In this paper, we explore the upper bound of harnessing multilingualism in reasoning tasks, suggesting that multilingual reasoning promises a significantly higher upper bound (by nearly 10 Acc@k points) than English-only reasoning, and a more robust one (tolerant of variations in translation quality and language choice). Besides analyzing the reasons behind this upper bound and the challenges in reaching it, we also find that common answer selection methods cannot achieve it, owing to their limitations and biases. These insights could pave the way for future research aimed at fully harnessing the potential of multilingual reasoning in LLMs.
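The abstract uses Acc@k without defining it. As a reading aid, here is a minimal sketch of the upper-bound notion it implies: a question counts as solved if the model answers correctly in at least one of k parallel attempts, where each attempt poses the question in a different language. The function and variable names (`acc_at_k`, `answer_with`, `is_correct`) are hypothetical placeholders for illustration, not the paper's code.

```python
def acc_at_k(questions, languages, answer_with, is_correct):
    """Fraction of questions solved in at least one of the given languages.

    answer_with(question, lang) -> the model's answer when the question
    is posed (translated) in `lang`.
    is_correct(question, answer) -> bool, whether the answer is right.
    """
    solved = 0
    for q in questions:
        # A question counts as solved if ANY language variant succeeds.
        if any(is_correct(q, answer_with(q, lang)) for lang in languages):
            solved += 1
    return solved / len(questions)

# Hypothetical usage: compare an English-only baseline (k repeated
# English samples) against a mix of k languages at the same budget.
# english_only = acc_at_k(qs, ["en"] * 4, answer_with, is_correct)
# multilingual = acc_at_k(qs, ["en", "zh", "de", "fr"], answer_with, is_correct)
```

Under this reading, the roughly 10-point gap reported in the abstract is the difference between these two Acc@k values at a matched sampling budget, which an answer selection method (e.g., majority voting over the k answers) may fail to recover.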

