

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models

October 2, 2024
作者: Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing Liu
cs.AI

Abstract

Model merging, such as model souping, is the practice of combining different models with the same architecture together without further training. In this work, we present a model merging methodology that addresses the difficulty of fine-tuning Large Language Models (LLMs) for target tasks in non-English languages, where task-specific data is often unavailable. We focus on mathematical reasoning and without in-language math data, facilitate cross-lingual transfer by composing language and math capabilities. Starting from the same pretrained model, we fine-tune separate "experts" on math instruction data in English and on generic instruction data in the target language. We then replace the top and bottom transformer layers of the math expert directly with layers from the language expert, which consequently enhances math performance in the target language. The resulting merged models outperform the individual experts and other merging methods on the math benchmark, MGSM, by 10% across four major languages where math instruction data is scarce. In addition, this layer swapping is simple, inexpensive, and intuitive, as it is based on an interpretative analysis of the most important parameter changes during the fine-tuning of each expert. The ability to successfully re-compose LLMs for cross-lingual transfer in this manner opens up future possibilities to combine model expertise, create modular solutions, and transfer reasoning capabilities across languages all post hoc.
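The layer-swapping operation described above can be sketched in a few lines. The following is a minimal, illustrative implementation only: the models are represented as flat parameter dictionaries keyed by layer index (real LLMs would use framework state dicts, e.g. PyTorch), and the key format and the number of bottom/top layers swapped are assumptions for illustration, not values taken from the paper.

```python
def swap_layers(math_expert, lang_expert, num_layers, n_bottom, n_top):
    """Return a merged model: the math expert's parameters, with its
    bottom n_bottom and top n_top transformer layers replaced by the
    corresponding layers from the language expert.

    Both experts are assumed to be flat dicts with keys like
    "layers.{i}.<param>" (a hypothetical naming scheme).
    """
    merged = dict(math_expert)
    # Indices of the layers to take from the language expert.
    swap_ids = set(range(n_bottom)) | set(range(num_layers - n_top, num_layers))
    for key, value in lang_expert.items():
        parts = key.split(".")
        if parts[0] == "layers" and int(parts[1]) in swap_ids:
            merged[key] = value
    return merged

# Toy example: 8 "layers" with placeholder parameters.
math_m = {f"layers.{i}.weight": f"math{i}" for i in range(8)}
lang_m = {f"layers.{i}.weight": f"lang{i}" for i in range(8)}
merged = swap_layers(math_m, lang_m, num_layers=8, n_bottom=2, n_top=2)
# Bottom 2 and top 2 layers now come from the language expert;
# the middle layers keep the math expert's parameters.
```

Because the two experts start from the same pretrained model, the swapped layers remain dimensionally compatible, which is what makes this training-free recomposition possible.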
