Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
March 26, 2025
Author: Yijiong Yu
cs.AI
Abstract
Recent advances in reasoning models have demonstrated significant
improvements in accuracy, particularly for complex tasks such as mathematical
reasoning, by employing detailed and comprehensive reasoning processes.
However, generating these lengthy reasoning sequences is computationally
expensive and time-consuming. To address this inefficiency, we leverage the
inherent parallelizability of certain tasks to accelerate the reasoning
process. Specifically, when multiple parallel reasoning branches exist, we
decode multiple tokens per step using a specialized attention mask, processing
them within a single sequence, which avoids additional memory usage.
Experimental results show that our method achieves over 100% speedup in
decoding time while maintaining answer quality.
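
To make the idea of a specialized attention mask concrete, the following is a minimal sketch, not the author's implementation. It assumes a hypothetical layout in which a shared prefix is followed by branch tokens interleaved step by step (one new token per branch per decoding step). Each branch token may attend to the shared prefix and to earlier tokens of its own branch, but not to other branches. The function name `build_branch_mask` and its parameters are illustrative assumptions.

```python
def build_branch_mask(prefix_len, num_branches, num_steps):
    """Build a boolean attention mask for parallel branches in one sequence.

    Assumed layout (hypothetical, for illustration only):
      [prefix tokens] [step 0: branch 0..B-1] [step 1: branch 0..B-1] ...
    mask[q][k] is True when query position q may attend to key position k.
    """
    total = prefix_len + num_branches * num_steps

    # branch_of[i] is None for shared-prefix tokens, else the branch index.
    branch_of = [None] * prefix_len + [
        b for _ in range(num_steps) for b in range(num_branches)
    ]

    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: only keys at or before the query
            shared = branch_of[k] is None          # prefix is visible to all
            same_branch = branch_of[k] == branch_of[q]
            mask[q][k] = shared or same_branch
    return mask


if __name__ == "__main__":
    # Two prefix tokens, two branches, two decoding steps.
    for row in build_branch_mask(prefix_len=2, num_branches=2, num_steps=2):
        print("".join("1" if allowed else "0" for allowed in row))
```

Because all branches share one sequence and one copy of the prefix, a mask of this kind lets several tokens be produced per step without duplicating the prefix across a batch, which is one way to read the abstract's claim of avoiding additional memory usage.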