Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes
October 22, 2024
Authors: Bryan R. Christ, Zack Gottesman, Jonathan Kropko, Thomas Hartvigsen
cs.AI
Abstract
Math reasoning is a highly active area of Large Language Model (LLM) research
because it is a hallmark of artificial intelligence. However, few works have
explored how math reasoning is encoded within LLM parameters and if it is a
skill that can be isolated within a model. Doing so could allow targeted
intervention to improve math performance without altering non-math behavior and
foster understanding of how models encode math reasoning. We introduce Math
Neurosurgery (MathNeuro), a method for isolating math-specific parameters in
LLMs using only forward passes. MathNeuro builds on existing work by using
weights and activations to calculate parameter importance, but isolates
math-specific parameters by removing those important for general language
tasks. Pruning the parameters MathNeuro identifies deletes an LLM's math reasoning
ability without destroying its general language ability. Scaling these
parameters by a small constant improves a pretrained or instruction-tuned LLM's
performance by 4-17% on GSM8K while leaving non-math behavior unaltered.
MathNeuro is also data efficient: most of its effectiveness holds when
identifying math-specific parameters using a single sample. MathNeuro
highlights the potential for future work to intervene on math-specific
parameters.
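The isolation step described above can be illustrated with a minimal numpy sketch. It assumes a Wanda-style importance score (|weight| times the activation norm from a calibration batch) and top-k selection, which matches the abstract's description of "using weights and activations to calculate parameter importance"; the function names, the top-k fraction, and the scaling constant are illustrative, not the paper's exact choices.

```python
import numpy as np

def importance(weight, activations):
    """Wanda-style score: |w_ij| * ||x_j||_2, computed from a calibration batch."""
    return np.abs(weight) * np.linalg.norm(activations, axis=0)

def math_specific_mask(weight, math_acts, lang_acts, top_frac=0.1):
    """Mark parameters important for math inputs but NOT for general-language inputs."""
    k = int(top_frac * weight.size)
    math_scores = importance(weight, math_acts).ravel()
    lang_scores = importance(weight, lang_acts).ravel()
    math_top = set(np.argsort(math_scores)[-k:].tolist())
    lang_top = set(np.argsort(lang_scores)[-k:].tolist())
    # Remove parameters that are also important for general language tasks.
    keep = np.fromiter(math_top - lang_top, dtype=np.intp)
    mask = np.zeros(weight.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(weight.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))          # one toy weight matrix
math_X = rng.normal(size=(4, 16))     # activations from math prompts (hypothetical)
lang_X = rng.normal(size=(4, 16))     # activations from general-language prompts

mask = math_specific_mask(W, math_X, lang_X)

# Interventions on the isolated parameters only:
W_pruned = np.where(mask, 0.0, W)       # pruning: deletes math reasoning ability
W_scaled = np.where(mask, 1.1 * W, W)   # scaling by a small constant: boosts math
```

Because the two top-k sets are differenced rather than intersected, parameters that matter for both math and general language are left untouched, which is how the method keeps non-math behavior unaltered.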