

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

March 25, 2025
Authors: Itay Nakash, Nitay Calderon, Eyal Ben David, Elad Hoffer, Roi Reichart
cs.AI

Abstract

Large Language Models (LLMs) have shown impressive versatility as general-purpose models. However, their broad applicability comes at the cost of high computational overhead, particularly in auto-regressive decoding, where each step requires a forward pass. In domain-specific settings, general-purpose capabilities are unnecessary and can be traded for efficiency. In this work, we take a novel perspective on domain adaptation, reducing latency and computational costs by adapting the vocabulary to focused domains of interest. We introduce AdaptiVocab, an end-to-end approach for vocabulary adaptation designed to enhance LLM efficiency in low-resource domains. AdaptiVocab can be applied to any tokenizer and architecture, modifying the vocabulary by replacing tokens with domain-specific n-gram-based tokens, thereby reducing the number of tokens required for both input processing and output generation. AdaptiVocab initializes new n-token embeddings using an exponentially weighted combination of existing embeddings and employs a lightweight fine-tuning phase that can be performed efficiently on a single GPU. We evaluate two 7B LLMs across three niche domains, assessing efficiency, generation quality, and end-task performance. Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance.
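The embedding-initialization step described in the abstract lends itself to a short illustration. Below is a minimal PyTorch sketch of initializing a new n-gram token's embedding as an exponentially weighted combination of its constituent tokens' embeddings. The function name `init_ngram_embedding`, the decay factor `gamma`, and the direction of the weighting (later tokens weighted more heavily) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def init_ngram_embedding(token_ids, embedding_matrix, gamma=0.5):
    """Initialize an embedding for a new n-gram token.

    The new vector is an exponentially weighted, normalized combination of
    the embeddings of the n-gram's constituent tokens.

    token_ids:        ids of the original tokens the n-gram replaces
    embedding_matrix: (vocab_size, hidden_dim) tensor of existing embeddings
    gamma:            decay factor in (0, 1] -- a hypothetical default;
                      the paper's exact weighting scheme may differ
    """
    n = len(token_ids)
    # Weight later tokens more heavily (an assumption, not the paper's spec):
    # for n = 3, gamma = 0.5 this gives raw weights [0.25, 0.5, 1.0].
    weights = torch.tensor([gamma ** (n - 1 - i) for i in range(n)])
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    constituents = embedding_matrix[torch.tensor(token_ids)]  # (n, hidden_dim)
    return (weights.unsqueeze(1) * constituents).sum(dim=0)   # (hidden_dim,)

# Example: build an embedding for a new 3-token n-gram from a toy matrix.
emb = torch.randn(32000, 4096)  # stand-in for a 7B model's embedding table
new_vec = init_ngram_embedding([101, 2057, 304], emb)
print(new_vec.shape)            # torch.Size([4096])
```

In a full pipeline, vectors produced this way would seed both the input embedding and output (LM head) rows for each new n-gram token before the lightweight fine-tuning phase refines them.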
