GPT or BERT: why not both?
October 31, 2024
Authors: Lucas Georges Gabriel Charpentier, David Samuel
cs.AI
Abstract
We present a simple way to merge masked language modeling with causal
language modeling. This hybrid training objective results in a model that
combines the strengths of both modeling paradigms within a single transformer
stack: GPT-BERT can be transparently used like any standard causal or masked
language model. We test the pretraining process that enables this flexible
behavior on the BabyLM Challenge 2024. The results show that the hybrid
pretraining outperforms masked-only or causal-only models. We openly release
the models, training corpora and code.
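To make the hybrid objective concrete, below is a minimal PyTorch-style sketch of training a single transformer stack on both losses: causal batches use a lower-triangular attention mask with next-token targets, masked batches use full bidirectional attention with targets only at masked positions. The class and function names (HybridLM, hybrid_step), the model sizes, and the fixed mixing weight are illustrative assumptions, not the authors' implementation, which differs in how it aligns the two objectives within one output head.

```python
# Minimal sketch of hybrid causal + masked language-model training.
# Hypothetical names (HybridLM, hybrid_step); not the GPT-BERT codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridLM(nn.Module):
    """One transformer stack shared by both objectives; only the
    attention mask and the loss targets change per batch."""

    def __init__(self, vocab_size=32000, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, causal: bool):
        x = self.embed(tokens)
        if causal:
            # GPT-style pass: each position attends only to earlier positions.
            mask = nn.Transformer.generate_square_subsequent_mask(
                tokens.size(1)).to(tokens.device)
            h = self.encoder(x, mask=mask)
        else:
            # BERT-style pass: full bidirectional attention.
            h = self.encoder(x)
        return self.lm_head(h)


def hybrid_step(model, causal_batch, masked_batch, masked_targets, mix=0.5):
    """One loss computation over both objectives.

    causal_batch:   (B, T) token ids; next-token targets are the shifted inputs.
    masked_batch:   (B, T) token ids with some positions replaced by [MASK].
    masked_targets: (B, T) original ids at masked positions, -100 elsewhere.
    """
    # Causal LM loss: predict token t+1 from the prefix up to t.
    logits = model(causal_batch[:, :-1], causal=True)
    clm_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        causal_batch[:, 1:].reshape(-1))

    # Masked LM loss: predict the original token at each masked position.
    logits = model(masked_batch, causal=False)
    mlm_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        masked_targets.reshape(-1),
        ignore_index=-100)

    # A fixed mixing weight is assumed here; the ratio of causal to masked
    # examples is a tunable choice in the hybrid pretraining setup.
    return mix * clm_loss + (1.0 - mix) * mlm_loss
```

Because both objectives share the same weights and output head, the trained model can afterwards be queried either autoregressively (causal mask) or as a masked-token predictor (bidirectional), which is what the abstract means by using it "transparently" as either kind of model.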