Facilitating large language model Russian adaptation with Learned Embedding Propagation
December 30, 2024
Authors: Mikhail Tikhomirov, Daniil Chernyshev
cs.AI
Abstract
Rapid advancements in large language model (LLM) technologies have led to the introduction of powerful open-source instruction-tuned LLMs whose text generation quality matches that of state-of-the-art counterparts such as GPT-4. While the emergence of such models accelerates the adoption of LLM technologies in sensitive-information environments, the authors of these models do not disclose the training data necessary to replicate their results, making the achievements model-exclusive. Since these open-source models are also multilingual, this in turn reduces the benefits of training language-specific LLMs, as improved inference computation efficiency becomes the only guaranteed advantage of such a costly procedure. More cost-efficient options, such as vocabulary extension and subsequent continued pre-training, are also inhibited by the lack of access to high-quality instruction-tuning data, since such data is the major factor behind the resulting LLM's task-solving capabilities. To address these limitations and cut the costs of the language adaptation pipeline, we propose Learned Embedding Propagation (LEP). Unlike existing approaches, our method has lower training data size requirements due to its minimal impact on existing LLM knowledge, which we reinforce using a novel ad-hoc embedding propagation procedure that allows skipping the instruction-tuning step and instead implants the new language knowledge directly into any existing instruct-tuned variant. We evaluated four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B, showing that LEP is competitive with traditional instruction-tuning methods, achieving performance comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct, with further gains in task-solving capability from self-calibration and continued tuning.
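To make the core idea concrete, below is a minimal sketch of embedding propagation: transplanting embeddings learned during Russian vocabulary adaptation of a base model into its instruction-tuned sibling, so the instruct model inherits the new tokens without repeating instruction tuning. The checkpoint names for the adapted model and the output path are hypothetical, and the simple delta-based rule shown here is only a stand-in for the learned propagation procedure proposed in the paper.

```python
# Sketch: propagate embeddings learned during Russian vocabulary adaptation
# of a base model into its instruction-tuned variant.  The adapted checkpoint
# name and output path are hypothetical; the delta rule is a simplified
# stand-in for the paper's learned propagation.
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
adapted = AutoModelForCausalLM.from_pretrained("org/Llama-3-8B-ru-extended")  # hypothetical
instruct = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

with torch.no_grad():
    v_old = base.get_input_embeddings().weight.shape[0]
    v_new = adapted.get_input_embeddings().weight.shape[0]

    # Grow the instruct model's vocabulary to the extended size
    # (this resizes both the input embeddings and the LM head).
    instruct.resize_token_embeddings(v_new)

    for get in ("get_input_embeddings", "get_output_embeddings"):
        e_base = getattr(base, get)().weight
        e_adapt = getattr(adapted, get)().weight
        e_inst = getattr(instruct, get)().weight

        # Overlapping tokens: shift the instruct embeddings by the change
        # that language adaptation induced on the base embeddings.
        e_inst[:v_old] += e_adapt[:v_old] - e_base[:v_old]
        # Newly added Russian tokens: copy the adapted embeddings directly.
        e_inst[v_old:] = e_adapt[v_old:]

instruct.save_pretrained("Llama-3-8B-Instruct-ru-lep")  # hypothetical output path
```

In the paper the propagation is learned rather than this fixed linear rule, and the resulting model can then be refined further with continued tuning or self-calibration, as described in the abstract.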