Facilitating large language model Russian adaptation with Learned Embedding Propagation
December 30, 2024
Authors: Mikhail Tikhomirov, Daniil Chernyshev
cs.AI
Abstract
Rapid advancements of large language model (LLM) technologies have led to the
introduction of powerful open-source instruction-tuned LLMs whose text
generation quality is comparable to state-of-the-art counterparts such as GPT-4.
While the emergence of such models accelerates the adoption of LLM technologies
in sensitive-information environments, the authors of such models do not
disclose the training data necessary for replicating the results, making
the achievements model-exclusive. Since these open-source models are also
multilingual, this in turn reduces the benefits of training language-specific
LLMs, as improved inference computation efficiency becomes the only guaranteed
advantage of such a costly procedure. More cost-efficient options such as
vocabulary extension and subsequent continued pre-training are also inhibited
by the lack of access to high-quality instruction-tuning data since it is the
major factor behind the resulting LLM task-solving capabilities. To address
these limitations and cut the costs of the language adaptation pipeline, we
propose Learned Embedding Propagation (LEP). Unlike existing approaches, our
method has lower training data size requirements due to its minimal impact on
existing LLM knowledge, which we reinforce using a novel ad-hoc embedding
propagation procedure that allows skipping the instruction-tuning step and
instead implants the new language knowledge directly into any existing
instruction-tuned variant. We
evaluated four Russian vocabulary adaptations for LLaMa-3-8B and Mistral-7B,
showing that LEP is competitive with traditional instruction-tuning methods,
achieving performance comparable to OpenChat 3.5 and LLaMa-3-8B-Instruct, with
further improvements via self-calibration and continued tuning enhancing
task-solving capabilities.
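The abstract describes implanting new language knowledge into an existing instruction-tuned model by carrying over embeddings learned during vocabulary adaptation of the base model, instead of re-running instruction tuning. Below is a minimal, hypothetical sketch of what such an embedding transplant could look like with Hugging Face `transformers`. The model names are placeholders, and the paper's actual LEP procedure is described as an ad-hoc propagation that presumably accounts for the drift between base and instruction-tuned weights, so this is an illustration of the general idea under simplifying assumptions, not the authors' method.

```python
# Hypothetical sketch: transplant embeddings learned during Russian vocabulary
# adaptation of a base model into its instruction-tuned sibling, skipping
# instruction tuning. Checkpoint names below are placeholders, not released models.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Vocabulary-extended base model after continued pre-training on Russian data.
adapted_base = AutoModelForCausalLM.from_pretrained("base-llm-ru-vocab")
adapted_tokenizer = AutoTokenizer.from_pretrained("base-llm-ru-vocab")

# Original instruction-tuned variant with the original (smaller) vocabulary.
instruct = AutoModelForCausalLM.from_pretrained("base-llm-instruct")

# Grow the instruct model's embedding and output matrices to the new vocabulary size.
new_vocab_size = adapted_base.get_input_embeddings().weight.shape[0]
instruct.resize_token_embeddings(new_vocab_size)

with torch.no_grad():
    # Simplest possible "propagation": overwrite the instruct model's input and
    # output embeddings with the ones learned on the adapted base model.
    # A more faithful procedure would correct for the shift between base and
    # instruct embeddings on shared tokens rather than copying wholesale.
    instruct.get_input_embeddings().weight.copy_(
        adapted_base.get_input_embeddings().weight
    )
    instruct.get_output_embeddings().weight.copy_(
        adapted_base.get_output_embeddings().weight
    )

# The extended tokenizer must ship with the resulting checkpoint.
instruct.save_pretrained("instruct-llm-ru-vocab")
adapted_tokenizer.save_pretrained("instruct-llm-ru-vocab")
```

The resulting checkpoint keeps the instruction-following behavior of the original instruct model while reading and generating text with the extended Russian vocabulary; any residual mismatch would then be addressed by the self-calibration and continued tuning steps mentioned in the abstract.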