

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

January 23, 2025
Authors: J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain
cs.AI

Abstract

The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that synergize low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Robust solutions for compressing and fine-tuning large pre-trained models are developed by integrating these methodologies. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
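To illustrate the core idea the abstract describes, the sketch below shows one way a weight-sharing super-network can be built over low-rank adapter ranks: a single maximal LoRA adapter holds the shared parameters, and smaller candidate sub-networks are activated by slicing its matrices, so architectures of different ranks reuse the same weights during search. This is a minimal PyTorch illustration under stated assumptions, not the authors' implementation; the class and method names (ElasticLoRALinear, set_active_rank) and the hyper-parameters are hypothetical.

```python
# Minimal sketch (not the authors' code) of an "elastic" LoRA adapter whose rank
# can be sub-sampled at search time, illustrating weight sharing across ranks.
import torch
import torch.nn as nn


class ElasticLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank adapter with a searchable rank."""

    def __init__(self, base: nn.Linear, max_rank: int = 32, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # keep pre-trained weights frozen (PEFT)
            p.requires_grad = False
        self.max_rank = max_rank
        self.alpha = alpha
        # Super-network weights: the largest adapter; smaller ranks reuse slices of it.
        self.lora_A = nn.Parameter(torch.randn(max_rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, max_rank))
        self.active_rank = max_rank

    def set_active_rank(self, r: int) -> None:
        """Activate a sub-network that uses only the first r rank dimensions."""
        self.active_rank = max(0, min(r, self.max_rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        r = self.active_rank
        if r > 0:
            a = self.lora_A[:r, :]   # (r, in_features), shared slice
            b = self.lora_B[:, :r]   # (out_features, r), shared slice
            y = y + (x @ a.t() @ b.t()) * (self.alpha / r)
        return y


# Toy usage: sample different ranks, as a NAS search loop might when scoring candidates.
layer = ElasticLoRALinear(nn.Linear(768, 768), max_rank=32)
x = torch.randn(2, 768)
for r in (32, 16, 4):
    layer.set_active_rank(r)
    _ = layer(x)  # every sub-network reuses the same underlying adapter weights
```

In a full pipeline, a search procedure would score sub-networks of different ranks against accuracy and memory or latency targets and retain the smallest configuration that meets them; the repository linked in the abstract provides the actual models and code.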
