Instruction Following without Instruction Tuning

Abstract

Instruction tuning commonly means finetuning a language model on instruction-response pairs. We discover two forms of adaptation (tuning) that are deficient compared to instruction tuning, yet still yield instruction following; we call this implicit instruction tuning. We first find that instruction-response pairs are not necessary: training solely on responses, without any corresponding instructions, yields instruction following. This suggests pretrained models have an instruction-response mapping which is revealed by teaching the model the desired distribution of responses. However, we then find it's not necessary to teach the desired distribution of responses: instruction-response training on narrow-domain data like poetry still leads to broad instruction-following behavior like recipe generation. In particular, when instructions are very different from those in the narrow finetuning domain, models' responses do not adhere to the style of the finetuning domain. To begin to explain implicit instruction tuning, we hypothesize that very simple changes to a language model's distribution yield instruction following. We support this by hand-writing a rule-based language model which yields instruction following in a product-of-experts with a pretrained model. The rules are to slowly increase the probability of ending the sequence, penalize repetition, and uniformly change 15 words' probabilities. In summary, adaptations made without being designed to yield instruction following can do so implicitly.
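The three rules named above can be made concrete with a toy product-of-experts, where a hand-written expert's log-space scores are added to a base model's log-probabilities and the sum is renormalized. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the four-token vocabulary, the uniform base distribution, and every constant (`eos_rate`, `repeat_penalty`, the `boosted` word set and its shift) are hypothetical and chosen for illustration.

```python
import math

def rule_scores(vocab, generated, eos_token, boosted, step,
                eos_rate=0.1, repeat_penalty=1.0):
    """Log-space scores from three hand-written rules (all constants are illustrative)."""
    scores = {}
    for tok in vocab:
        s = 0.0
        if tok == eos_token:
            s += eos_rate * step      # rule 1: end-of-sequence score grows with length
        if tok in generated:
            s -= repeat_penalty       # rule 2: penalize tokens that were already generated
        s += boosted.get(tok, 0.0)    # rule 3: fixed shift for a small, hand-picked word set
        scores[tok] = s
    return scores

def product_of_experts(base_logprobs, scores):
    """Multiply the two experts (add in log space), then renormalize to a distribution."""
    combined = {t: base_logprobs[t] + scores[t] for t in base_logprobs}
    m = max(combined.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in combined.values()))
    return {t: math.exp(v - log_z) for t, v in combined.items()}

# Toy stand-in for a pretrained model: a uniform next-token distribution (assumption).
vocab = ["the", "cake", "flour", "<eos>"]
base = {t: math.log(1.0 / len(vocab)) for t in vocab}
boosted = {"flour": 0.5}              # hypothetical word whose score is shifted
history = ["the"]                     # tokens generated so far

p_early = product_of_experts(base, rule_scores(vocab, history, "<eos>", boosted, step=1))
p_late = product_of_experts(base, rule_scores(vocab, history, "<eos>", boosted, step=20))
```

In this sketch the end-of-sequence probability rises as `step` grows, the already-generated token "the" is down-weighted relative to "cake", and the shifted word "flour" is up-weighted. With a real pretrained model, `base` would be replaced by that model's next-token log-probabilities at each decoding step.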
