Tina: Tiny Reasoning Models via LoRA

April 22, 2025
Authors: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Willie Neiswanger
cs.AI

Abstract

How cost-effectively can strong reasoning abilities be achieved in language models? Driven by this fundamental question, we present Tina, a family of tiny reasoning models achieved with high cost-efficiency. Notably, Tina demonstrates that substantial reasoning performance can be developed using only minimal resources, by applying parameter-efficient updates during reinforcement learning (RL), using low-rank adaptation (LoRA), to an already tiny 1.5B parameter base model. This minimalist approach produces models that achieve reasoning performance which is competitive with, and sometimes surpasses, SOTA RL reasoning models built upon the same base model. Crucially, this is achieved at a tiny fraction of the computational post-training cost employed by existing SOTA models. In fact, the best Tina model achieves a >20% reasoning performance increase and 43.33% Pass@1 accuracy on AIME24, at only $9 USD post-training and evaluation cost (i.e., an estimated 260x cost reduction). Our work reveals the surprising effectiveness of efficient RL reasoning via LoRA. We validate this across multiple open-source reasoning datasets and various ablation settings starting with a single, fixed set of hyperparameters. Furthermore, we hypothesize that this effectiveness and efficiency stem from LoRA rapidly adapting the model to the structural format of reasoning rewarded by RL, while largely preserving the base model's underlying knowledge. In service of accessibility and open research, we fully open-source all code, training logs, and model weights & checkpoints.
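
To make the recipe concrete, the sketch below shows one way to run LoRA-based RL post-training on a 1.5B model with Hugging Face `peft` and `trl`. It is a minimal sketch, assuming GRPO as the RL algorithm; the base model checkpoint, toy dataset, reward function, and hyperparameter values are illustrative placeholders rather than the authors' exact setup.

```python
# Minimal sketch: RL post-training of a ~1.5B base model where LoRA adapters
# are the only trainable weights. Uses Hugging Face `peft` and `trl`'s
# GRPOTrainer; dataset, reward, and hyperparameters are illustrative assumptions.
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; GRPO expects a "prompt" column.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Solve step by step and give the final answer in \\boxed{}: 2 + 3 * 4 = ?",
        "Solve step by step and give the final answer in \\boxed{}: 17 - 9 = ?",
    ]
})

def format_reward(completions, **kwargs):
    """Toy rule-based reward: 1.0 if the completion contains a boxed answer."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

# LoRA config: low-rank adapters on the attention projections are the only
# parameters updated during RL; the base model's weights stay frozen.
peft_config = LoraConfig(
    r=16,                       # adapter rank (hypothetical value)
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # a 1.5B base model
    reward_funcs=format_reward,
    train_dataset=train_dataset,
    args=GRPOConfig(
        output_dir="tina-lora-grpo",
        per_device_train_batch_size=2,
        num_generations=2,          # completions sampled per prompt
        max_completion_length=256,
    ),
    peft_config=peft_config,
)
trainer.train()
```

Because only the low-rank adapter matrices are trained, gradient and optimizer memory scale with the adapter parameters rather than the full 1.5B-parameter model, which is where the post-training cost savings described above come from.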
