Tracking Universal Features Through Fine-Tuning and Model Merging
October 16, 2024
Authors: Niels Horn, Desmond Elliott
cs.AI
Abstract
We study how features emerge, disappear, and persist across models fine-tuned
on different domains of text. More specifically, we start from a base one-layer
Transformer language model that is trained on a combination of the BabyLM
corpus and a collection of Python code from The Stack. This base model is
adapted to two new domains of text, TinyStories and the Lua programming
language, respectively; these two fine-tuned models are then merged using
spherical linear interpolation. Our exploration aims to provide deeper
insights into the stability and transformation of features across typical
transfer-learning scenarios, using small-scale models and sparse
auto-encoders.
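The merging step the abstract describes, spherical linear interpolation (SLERP), interpolates between two weight vectors along the great-circle arc between them rather than along a straight line. A minimal sketch of this idea is shown below; the function name and the flattened-weight-vector interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def slerp(w0, w1, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two flattened weight vectors.

    t=0 returns w0, t=1 returns w1; intermediate t follows the arc
    between the two directions. Falls back to linear interpolation
    when the vectors are nearly parallel (the angle is ~0).
    """
    # Angle between the two weight directions.
    v0 = w0 / (np.linalg.norm(w0) + eps)
    v1 = w1 / (np.linalg.norm(w1) + eps)
    dot = np.clip(np.dot(v0, v1), -1.0, 1.0)
    omega = np.arccos(dot)
    if omega < eps:
        # Nearly parallel: SLERP is numerically unstable, use LERP.
        return (1.0 - t) * w0 + t * w1
    so = np.sin(omega)
    # Standard SLERP weights: sin((1-t)Ω)/sinΩ and sin(tΩ)/sinΩ.
    return (np.sin((1.0 - t) * omega) / so) * w0 + (np.sin(t * omega) / so) * w1
```

In a model-merging setting, this would typically be applied per parameter tensor (flattened), producing a merged checkpoint whose features this paper then inspects with sparse auto-encoders.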