
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

November 1, 2024
作者: Bohan Lyu, Yadi Cao, Duncan Watson-Parris, Leon Bergen, Taylor Berg-Kirkpatrick, Rose Yu
cs.AI

Abstract
Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but often produce hallucinations for complex ones. While integrating LLMs with tools can increase reliability, this approach typically results in over-reliance on tools, diminishing the model's ability to solve simple problems through basic reasoning. In contrast, human experts first assess problem complexity using domain knowledge before choosing an appropriate solution approach. Inspired by this human problem-solving process, we propose a novel two-component fine-tuning method. In the first component, World Knowledge Distillation (WKD), LLMs learn directly from solutions generated using tools' information, internalizing domain knowledge. In the second component, Tool Usage Adaptation (TUA), we partition problems into easy and hard categories based on the model's direct-answering accuracy. While maintaining the same alignment target for easy problems as in WKD, we train the model to intelligently switch to tool usage for more challenging problems. We validate our method on six scientific benchmark datasets spanning mathematics, climate science, and epidemiology. On average, our models demonstrate a 28.18% improvement in answer accuracy and a 13.89% increase in tool usage precision across all datasets, surpassing state-of-the-art models including GPT-4o and Claude-3.5.
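The TUA component described above hinges on a simple data-routing step: problems the model already answers well keep their direct-answer training target, while the rest are retargeted to a tool-use trace. A minimal sketch of that split is below; the field names (`direct_answer`, `tool_trace`), the `<tool_call>` tag format, and the 0.5 accuracy threshold are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the easy/hard partition used in Tool Usage Adaptation (TUA).
# `direct_accuracy` would in practice come from sampling the model's direct answers
# on each problem; here it is supplied as a precomputed mapping.

def partition_problems(problems, direct_accuracy, threshold=0.5):
    """Split problems into easy/hard by the model's direct-answer accuracy."""
    easy = [p for p in problems if direct_accuracy[p["id"]] >= threshold]
    hard = [p for p in problems if direct_accuracy[p["id"]] < threshold]
    return easy, hard

def build_tua_targets(easy, hard):
    """Easy problems keep the WKD-style direct-answer target;
    hard problems are retargeted to a tool-use trace (assumed tag format)."""
    targets = []
    for p in easy:
        targets.append({"question": p["question"], "target": p["direct_answer"]})
    for p in hard:
        targets.append({"question": p["question"],
                        "target": f"<tool_call>{p['tool_trace']}</tool_call>"})
    return targets
```

Keeping the easy-problem targets identical to WKD is what prevents the model from regressing into tool over-reliance: only the hard subset teaches the switch to tools.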

