Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
November 1, 2024
Authors: Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
cs.AI
Abstract
Large Language Models (LLMs) demonstrate promising capabilities in solving
simple scientific problems but often produce hallucinations for complex ones.
While integrating LLMs with tools can increase reliability, this approach
typically results in over-reliance on tools, diminishing the model's ability to
solve simple problems through basic reasoning. In contrast, human experts first
assess problem complexity using domain knowledge before choosing an appropriate
solution approach. Inspired by this human problem-solving process, we propose a
novel two-component fine-tuning method. In the first component, World Knowledge
Distillation (WKD), LLMs learn directly from solutions generated using tool
information to internalize domain knowledge. In the second component, Tool Usage
Adaptation (TUA), we partition problems into easy and hard categories based on
the model's direct answering accuracy. While maintaining the same alignment
target for easy problems as in WKD, we train the model to intelligently switch
to tool usage for more challenging problems. We validate our method on six
scientific benchmark datasets, spanning mathematics, climate science and
epidemiology. On average, our models demonstrate a 28.18% improvement in answer
accuracy and a 13.89% increase in tool usage precision across all datasets,
surpassing state-of-the-art models including GPT-4o and Claude-3.5.
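The easy/hard partition described in the abstract for Tool Usage Adaptation can be illustrated with a minimal sketch. This is not the authors' implementation: the `Problem` type, the `sample_answer` callable, the exact-match correctness check, the number of samples, and the 0.5 threshold are all hypothetical choices made here for illustration, since the abstract only states that problems are split by the model's direct answering accuracy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Problem:
    question: str
    reference_answer: str


def direct_accuracy(
    problem: Problem,
    sample_answer: Callable[[str], str],
    n_samples: int = 4,  # hypothetical sample count
) -> float:
    """Fraction of direct (tool-free) answers that match the reference."""
    correct = sum(
        sample_answer(problem.question) == problem.reference_answer
        for _ in range(n_samples)
    )
    return correct / n_samples


def partition_by_difficulty(
    problems: List[Problem],
    sample_answer: Callable[[str], str],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
    n_samples: int = 4,
) -> Tuple[List[Problem], List[Problem]]:
    """Split problems into (easy, hard) by direct answering accuracy."""
    easy: List[Problem] = []
    hard: List[Problem] = []
    for problem in problems:
        accuracy = direct_accuracy(problem, sample_answer, n_samples)
        (easy if accuracy >= threshold else hard).append(problem)
    return easy, hard


def toy_model(question: str) -> str:
    """Stand-in for an LLM answering without tools (illustrative only)."""
    known = {"2 + 2": "4"}
    return known.get(question, "unknown")


problems = [
    Problem("2 + 2", "4"),
    Problem("simulate heat flux", "3.7"),
]
easy, hard = partition_by_difficulty(problems, toy_model)

# Easy problems keep the direct-answer target (as in WKD);
# hard problems are assigned a tool-usage target.
targets = [(p.question, "direct") for p in easy] + [
    (p.question, "tool") for p in hard
]
```

In this sketch, a problem the model already answers reliably stays on the direct-answer target, while a problem it fails is routed to a tool-usage target, mirroring the switch the abstract describes.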