Low-Rank Adapters Meet Neural Architecture Search for LLM Compression

Abstract

The rapid growth of Large Language Models (LLMs) has made fine-tuning and deployment computationally demanding. Low-rank adapters have recently proven effective for parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper discusses approaches that combine low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Integrating these methodologies yields robust solutions for compressing and fine-tuning large pre-trained models. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs by making them easier to deploy in resource-constrained environments. The resulting models have smaller memory footprints and faster inference times, paving the way for more practical and scalable LLM applications. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
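To make the central idea concrete, the following is a minimal sketch of how a low-rank adapter can be treated as a weight-sharing search space: one adapter is stored at a maximum rank, and smaller sub-adapters reuse a prefix of its weights. This is an illustration under stated assumptions, not the implementation from the linked repository; the class name `ElasticLowRankAdapter`, its methods, and the prefix-slicing scheme are all hypothetical choices made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class ElasticLowRankAdapter:
    """Hypothetical sketch: a frozen weight W plus a low-rank update B @ A.

    Sub-adapters of rank r <= max_rank reuse the first r rows of A and the
    first r columns of B, so every candidate rank shares the same weights.
    """

    def __init__(self, W, max_rank, seed=0):
        rng = random.Random(seed)
        self.W = W  # frozen pre-trained weight, shape (d_out, d_in)
        self.d_out, self.d_in = len(W), len(W[0])
        self.max_rank = max_rank
        # A gets a small random init; B starts at zero, so the adapter is
        # initially a no-op and the output equals the frozen layer's output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)]
                  for _ in range(max_rank)]
        self.B = [[0.0] * max_rank for _ in range(self.d_out)]

    def forward(self, x, rank):
        """Compute W x + B[:, :rank] (A[:rank] x) for the chosen rank."""
        if not 0 <= rank <= self.max_rank:
            raise ValueError("rank must be between 0 and max_rank")
        base = matvec(self.W, x)
        hidden = matvec(self.A[:rank], x)  # project down to `rank` dims
        delta = [sum(b * h for b, h in zip(row[:rank], hidden))
                 for row in self.B]  # project back up to d_out dims
        return [b + d for b, d in zip(base, delta)]

    def adapter_params(self, rank):
        """Number of adapter parameters active at the chosen rank."""
        return rank * (self.d_in + self.d_out)


# Usage: a 4x4 identity layer with an adapter of maximum rank 2.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
adapter = ElasticLowRankAdapter(W, max_rank=2)
x = [1.0, 2.0, 3.0, 4.0]
full = adapter.forward(x, rank=2)   # largest sub-adapter
small = adapter.forward(x, rank=1)  # smaller sub-adapter, shared weights
```

In this sketch a search procedure would only need to pick a rank per layer; the smaller configurations cost fewer active adapter parameters (`adapter_params`) without requiring separately trained weights.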
