Tracking Universal Features Through Fine-Tuning and Model Merging

Abstract

We study how features emerge, disappear, and persist across models fine-tuned on different domains of text. More specifically, we start from a base one-layer Transformer language model trained on a combination of the BabyLM corpus and a collection of Python code from The Stack. This base model is adapted to two new domains of text, TinyStories and the Lua programming language, respectively, and the two resulting models are then merged using spherical linear interpolation. Our exploration aims to provide deeper insights into the stability and transformation of features across typical transfer-learning scenarios using small-scale models and sparse auto-encoders.
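As an illustration of the merging step, the following is a minimal sketch of spherical linear interpolation between two weight vectors. It is not the authors' implementation: the function name `slerp`, the `eps` threshold, the fallback to plain linear interpolation for near-parallel vectors, and the choice to operate on a single flattened vector (rather than, say, per tensor) are all assumptions made for this example.

```python
import math


def slerp(w_a, w_b, t, eps=1e-8):
    """Spherically interpolate between two flat weight vectors.

    w_a, w_b: equal-length sequences of floats (hypothetical flattened
              weights of the two fine-tuned models).
    t:        interpolation factor; 0 returns w_a, 1 returns w_b.
    """
    # Plain linear interpolation, used as a fallback in degenerate cases.
    lerp = [(1.0 - t) * a + t * b for a, b in zip(w_a, w_b)]

    norm_a = math.sqrt(sum(a * a for a in w_a))
    norm_b = math.sqrt(sum(b * b for b in w_b))
    if norm_a < eps or norm_b < eps:
        return lerp  # the angle is undefined for a zero vector

    # Angle between the two vectors, clamped against rounding error.
    cos_omega = sum(a * b for a, b in zip(w_a, w_b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        return lerp  # vectors are (anti)parallel; assumed fallback

    # Standard slerp coefficients.
    c_a = math.sin((1.0 - t) * omega) / sin_omega
    c_b = math.sin(t * omega) / sin_omega
    return [c_a * a + c_b * b for a, b in zip(w_a, w_b)]
```

For example, `slerp([1.0, 0.0], [0.0, 1.0], 0.5)` yields a vector halfway along the arc between the two inputs, with both components equal to sqrt(0.5), whereas linear interpolation would give `[0.5, 0.5]` and shrink the norm.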
