ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Abstract



Chemical reasoning usually involves complex, multi-step processes that demand precise calculations, where even minor errors can cascade into downstream failures. Furthermore, large language models (LLMs) struggle to handle domain-specific formulas, execute reasoning steps accurately, and integrate code effectively when tackling chemical reasoning tasks. To address these challenges, we present ChemAgent, a novel framework that improves the performance of LLMs through a dynamic, self-updating library. The library is built by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries. When presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory, to support effective task decomposition and solution generation. We design three types of memory and a library-enhanced reasoning component, enabling LLMs to improve over time through experience. Experimental results on four chemical reasoning datasets from SciBench show that ChemAgent achieves performance gains of up to 46% (with GPT-4), significantly outperforming existing methods. Our findings suggest substantial potential for future applications, including drug discovery and materials science. Our code is available at https://github.com/gersteinlab/chemagent
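The abstract describes the self-updating library only at a high level: solved sub-tasks are stored, and pertinent entries (memory) are retrieved when a new problem arrives. As a minimal sketch of that store-then-retrieve loop, assuming a plain in-memory list and using token-overlap (Jaccard) similarity as a stand-in for whatever retriever ChemAgent actually uses, it might look like the following. The names `SelfUpdatingLibrary`, `MemoryEntry`, `add`, `retrieve`, the `kind` label, and the `verified` flag are all hypothetical and are not taken from the ChemAgent codebase.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryEntry:
    subtask: str   # natural-language description of a solved sub-task
    solution: str  # formula or code that solved it
    kind: str      # hypothetical label for the memory type


@dataclass
class SelfUpdatingLibrary:
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, subtask: str, solution: str, kind: str, verified: bool = True) -> bool:
        """Store a solved sub-task for later reuse; skip it if it failed verification."""
        if not verified:
            return False
        self.entries.append(MemoryEntry(subtask, solution, kind))
        return True

    def retrieve(self, query: str, k: int = 2, min_score: float = 0.1) -> list[MemoryEntry]:
        """Return up to k stored entries ranked by Jaccard similarity to the query."""
        q = _tokens(query)
        scored: list[tuple[float, MemoryEntry]] = []
        for entry in self.entries:
            e = _tokens(entry.subtask)
            union = q | e
            score = len(q & e) / len(union) if union else 0.0
            if score >= min_score:
                scored.append((score, entry))
        # Stable sort: entries with equal scores keep their insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]


if __name__ == "__main__":
    lib = SelfUpdatingLibrary()
    lib.add("compute molar mass of a compound", "M = sum(n_i * A_i)", "knowledge")
    lib.add("convert temperature from celsius to kelvin", "T_K = T_C + 273.15", "knowledge")
    lib.add("apply ideal gas law to find pressure", "p = n R T / V", "knowledge")
    # A step that failed verification is not written back, so it is never retrieved.
    lib.add("guess pressure without units", "p = 42", "execution", verified=False)

    hits = lib.retrieve("find the pressure of an ideal gas", k=1)
    print(hits[0].solution)  # prints "p = n R T / V"
```

Jaccard overlap keeps the sketch dependency-free; a real system would more plausibly rank entries with embeddings. The refinement step the abstract attributes to ChemAgent (adapting retrieved memory to the new problem) is left to the LLM and is not modeled here.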
