GPT or BERT: why not both?

Abstract

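We present a simple way to merge masked language modeling with causal language modeling. This hybrid training objective yields a model that combines the strengths of both modeling paradigms within a single transformer stack: GPT-BERT can be used transparently like any standard causal or masked language model. We test the pretraining process that enables this flexible behavior on the BabyLM Challenge 2024. The results show that hybrid pretraining outperforms masked-only and causal-only models. We openly release the models, training corpora, and code.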

