
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

December 11, 2024
Author: Sultan Alrashed
cs.AI

Abstract

We present SmolTulu-1.7b-Instruct, referenced in this report as SmolTulu-DPO-1130, an instruction-tuned language model that adapts AllenAI's Tulu 3 post-training pipeline to enhance Huggingface's SmolLM2-1.7B base model. Through comprehensive empirical analysis using a 135M parameter model, we demonstrate that the relationship between learning rate and batch size significantly impacts model performance in a task-dependent manner. Our findings reveal a clear split: reasoning tasks like ARC and GSM8K benefit from higher learning rate to batch size ratios, while pattern recognition tasks such as HellaSwag and IFEval show optimal performance with lower ratios. These insights informed the development of SmolTulu, which achieves state-of-the-art performance among sub-2B parameter models on instruction following, scoring 67.7% on IFEval (Δ11%), and on mathematical reasoning, with 51.6% on GSM8K (Δ3.4%); an alternate version scores 57.1% on ARC (Δ5.4%). We release our model, training recipes, and ablation studies to facilitate further research in efficient model alignment, demonstrating that careful adaptation of optimization dynamics can help bridge the capability gap between small and large language models.
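The central quantity in the abstract is the ratio of learning rate to (effective) batch size. As a minimal illustration only, the sketch below compares a "higher ratio" and a "lower ratio" configuration; all hyperparameter values are hypothetical assumptions, not the released SmolTulu training recipe.

```python
# Minimal sketch: compare the learning-rate-to-batch-size ratio of two
# hypothetical fine-tuning configurations. Values are illustrative only
# and are NOT the paper's actual hyperparameters.

def lr_to_bs_ratio(learning_rate: float, effective_batch_size: int) -> float:
    """Return the learning rate divided by the effective batch size."""
    return learning_rate / effective_batch_size

# Hypothetical "higher ratio" run (the paper reports such ratios favoring
# reasoning tasks like ARC and GSM8K).
high_ratio = lr_to_bs_ratio(learning_rate=9e-5, effective_batch_size=8)

# Hypothetical "lower ratio" run (the paper reports such ratios favoring
# pattern-recognition tasks like HellaSwag and IFEval).
low_ratio = lr_to_bs_ratio(learning_rate=2e-5, effective_batch_size=64)

print(f"high-ratio config: {high_ratio:.2e}")  # ~1.1e-05
print(f"low-ratio config:  {low_ratio:.2e}")   # ~3.1e-07
```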
