Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking

Abstract

Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially improve model performance. Even so, current models remain constrained by their handling of long texts and by the efficiency of reinforcement learning (RL) training. To address these issues, we propose a simple yet effective test-time scaling approach called Multi-round Thinking. This method iteratively refines the model's reasoning by feeding the previous round's answer back as part of the prompt for the next round. Extensive experiments across multiple models, including QwQ-32B and DeepSeek-R1, show consistent performance improvements on benchmarks such as AIME 2024, MATH-500, GPQA-diamond, and LiveCodeBench. For instance, the accuracy of QwQ-32B improved from 80.3% (Round 1) to 82.1% (Round 2) on the AIME 2024 dataset, while DeepSeek-R1 showed a similar increase from 79.7% to 82.0%. These results confirm that Multi-round Thinking is a broadly applicable, straightforward approach to achieving stable enhancements in model performance, underscoring its potential for future developments in test-time scaling techniques. The key prompt: {Original question prompt} The assistant's previous answer is: <answer> {last round answer} </answer>, and please re-answer.
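The loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` is a hypothetical callable standing in for any model call that takes a prompt and returns the final answer text, the function names are invented for this example, and the single space joining the question to the rest of the template is a guess at the original layout. Only the wording of the re-answer prompt comes from the abstract.

```python
from typing import Callable, List

# Wording taken from the "key prompt" in the abstract. The separator between
# the question and the rest of the template is an assumption.
PROMPT_TEMPLATE = (
    "{question} "
    "The assistant's previous answer is: <answer> {previous_answer} </answer>, "
    "and please re-answer."
)


def build_round_prompt(question: str, previous_answer: str) -> str:
    """Build the prompt for round k+1 from the question and the round-k answer."""
    return PROMPT_TEMPLATE.format(
        question=question, previous_answer=previous_answer
    )


def multi_round_thinking(
    question: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> answer
    rounds: int = 2,
) -> List[str]:
    """Run `rounds` rounds and return the answer produced in each round.

    Round 1 sees only the original question. Every later round sees the
    question plus the answer from the round immediately before it.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    answers = [generate(question)]  # Round 1: plain question
    for _ in range(rounds - 1):
        prompt = build_round_prompt(question, answers[-1])
        answers.append(generate(prompt))
    return answers
```

Calling `multi_round_thinking(question, generate, rounds=2)` corresponds to the Round 1 and Round 2 setting reported above; the last element of the returned list is the final answer. Whether `generate` should return only the final answer or also the reasoning trace is left open here, since the abstract refers only to the previous answer.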
