Control LLM: Controlled Evolution for Intelligence Retention in LLM

Abstract

Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is catastrophic forgetting (CF), which hampers performance during Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT). We propose Control LLM, a novel approach that leverages parallel pre-trained and expanded transformer blocks, aligning their hidden states through interpolation strategies. This method effectively preserves performance on existing tasks while seamlessly integrating new knowledge. Extensive experiments demonstrate the effectiveness of Control LLM in both CPT and CSFT. On Llama3.1-8B-Instruct, it achieves significant improvements in mathematical reasoning (+14.4% on Math-Hard) and coding performance (+10% on MBPP-PLUS). On Llama3.1-8B, it enhances multilingual capabilities (+10.6% on C-Eval, +6.8% on CMMLU, and +30.2% on CMMLU-0shot-CoT). It surpasses existing methods and achieves state-of-the-art results among open-source models tuned from the same base model, while using substantially less data and compute. Crucially, these gains are realized while preserving strong original capabilities, with minimal degradation (<4.3% on MMLU) compared to >35% in open-source Math and Coding models. This approach has been successfully deployed in LinkedIn's GenAI-powered job seeker and Ads unit products. To support further research, we release the training and evaluation code (https://github.com/linkedin/ControlLLM) along with models trained on public datasets (https://huggingface.co/ControlLLM) to the community.
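
As a rough illustration of what aligning hidden states through interpolation could look like, the sketch below blends the output of a frozen pre-trained block with that of its trainable expanded copy. The abstract does not specify the interpolation strategy, so plain linear interpolation with a fixed coefficient is assumed here; the function name `interpolate_hidden_states`, the parameter `alpha`, and the toy vectors are hypothetical and are not taken from the released code.

```python
def interpolate_hidden_states(h_pretrained, h_expanded, alpha):
    """Linearly blend two hidden-state vectors of equal length.

    Assumption: a single fixed coefficient `alpha` in [0, 1], where
    alpha = 0 returns the pre-trained state unchanged and alpha = 1
    returns the expanded block's state.
    """
    if len(h_pretrained) != len(h_expanded):
        raise ValueError("hidden states must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * p + alpha * e
            for p, e in zip(h_pretrained, h_expanded)]


# Toy hidden states standing in for the outputs of the two parallel blocks.
h_pre = [1.0, 2.0, 3.0]   # from the frozen pre-trained block
h_exp = [3.0, 2.0, 1.0]   # from the trainable expanded block

frozen_only = interpolate_hidden_states(h_pre, h_exp, 0.0)
blended = interpolate_hidden_states(h_pre, h_exp, 0.5)
```

With `alpha = 0.0` the pre-trained state passes through untouched, which is one simple way to picture how the original capabilities could be retained while the expanded block contributes new knowledge at larger values of `alpha`.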
