HARP: Hesitation-Aware Reframing in Transformer Inference Pass
December 10, 2024
Authors: Romain Storaï, Seung-won Hwang
cs.AI
Abstract
This paper aims to improve the performance of large language models by addressing the variable computational demands of inference, where some tokens require more computational resources than others. We present HARP, a simple modification to the "off-the-shelf" Transformer forward pass. Drawing on hesitation and the framing effect in decision-making, HARP selectively applies additional computation when the model encounters uncertainty during token generation. Our method mimics human cognitive processes by pausing at difficult decision points and reframing inputs to obtain a different perspective. Unlike other approaches, HARP is model-agnostic, training-free, and easy to implement. We thoroughly evaluate our method across various downstream tasks and model sizes, demonstrating performance improvements of up to +5.16%. Notably, HARP achieves these gains while keeping inference up to twice as fast as beam search. Simple yet delivering significant gains, HARP offers a practical solution for enhancing the performance of Transformer-based language models with minimal computational overhead.
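
To make the idea concrete, below is a minimal sketch of what such a hesitation-aware decoding step could look like. It is not the paper's implementation: it assumes hesitation is detected via the Shannon entropy of the next-token distribution and approximates "reframing" by perturbing the input embeddings before a second forward pass; the names harp_style_step, entropy_threshold, and noise_scale are hypothetical.

import torch
import torch.nn.functional as F

# Hypothetical sketch of a hesitation-aware decoding step in the spirit of
# HARP. Assumptions not confirmed by the abstract: hesitation is measured by
# the Shannon entropy of the next-token distribution, and "reframing" is
# approximated by adding small Gaussian noise to the input embeddings before
# a second forward pass.

@torch.no_grad()
def harp_style_step(model, embed_fn, input_ids,
                    entropy_threshold=2.0, noise_scale=0.01):
    """One greedy decoding step that spends extra compute only when uncertain."""
    embeds = embed_fn(input_ids)                           # (1, seq_len, dim)
    logits = model(inputs_embeds=embeds).logits[:, -1, :]  # next-token logits
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

    if entropy.item() > entropy_threshold:                 # the model "hesitates"
        # Reframe: perturb the embeddings, run a second forward pass, and
        # average the two views of the next-token distribution.
        noisy = embeds + noise_scale * torch.randn_like(embeds)
        logits2 = model(inputs_embeds=noisy).logits[:, -1, :]
        probs = 0.5 * (probs + F.softmax(logits2, dim=-1))

    return probs.argmax(dim=-1, keepdim=True)              # (1, 1) next token id

# Example usage with a Hugging Face causal LM (model-agnostic, training-free):
#   next_id = harp_style_step(model, model.get_input_embeddings(), input_ids)
#   input_ids = torch.cat([input_ids, next_id], dim=-1)

Because the extra forward pass runs only at high-entropy steps, the average overhead stays small, consistent with the abstract's claim of minimal computational impact.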