

FLARE: Faithful Logic-Aided Reasoning and Exploration

October 14, 2024
Authors: Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein
cs.AI

Abstract

Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation will have a more granular exploration and reasoning over the question space and scope. However, such methods struggle with generating outputs that are faithful to the intermediate chain of reasoning produced by the model. On the other end of the spectrum, neuro-symbolic methods such as Faithful CoT (F-CoT) propose to combine LLMs with external symbolic solvers. While such approaches boast a high degree of faithfulness, they usually require a model trained for code generation and struggle with tasks that are ambiguous or hard to formalise strictly. We introduce Faithful Logic-Aided Reasoning and Exploration (FLARE), a novel interpretable approach for traversing the problem space using task decompositions. We use the LLM to plan a solution, soft-formalise the query into facts and predicates using logic programming code, and simulate that code's execution using an exhaustive multi-hop search over the defined space. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers. Our method achieves SOTA results on 7 out of 9 diverse reasoning benchmarks. We also show that model faithfulness positively correlates with overall performance and further demonstrate that FLARE allows pinpointing the decisive factors sufficient for and leading to the correct answer with optimal reasoning during the multi-hop search.
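The abstract's pipeline — soft-formalise a query into logic-programming facts and predicates, then simulate the code's execution with an exhaustive multi-hop search — can be illustrated with a minimal sketch. Everything here (the `parent`/`grandparent` facts and the function names) is a hypothetical toy example, not the authors' implementation; the real method has the LLM generate the facts and rules from a natural-language query.

```python
from itertools import product

# Facts a soft-formalisation step might produce, in logic-programming style:
# each fact is a (predicate, arg1, arg2) triple.
facts = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
}

def apply_grandparent_rule(known):
    """One Horn rule, grandparent(X, Z) :- parent(X, Y), parent(Y, Z):
    join parent facts on the shared middle entity."""
    derived = set()
    for (p1, x, y1), (p2, y2, z) in product(known, known):
        if p1 == p2 == "parent" and y1 == y2:
            derived.add(("grandparent", x, z))
    return derived

def multi_hop_search(facts, query):
    """Exhaustively apply the rule to a fixpoint, recording every
    derived fact as one 'hop' of the search trace."""
    known = set(facts)
    trace = []
    while True:
        new = apply_grandparent_rule(known) - known
        if not new:
            break
        trace.extend(sorted(new))
        known |= new
    return query in known, trace

found, trace = multi_hop_search(facts, ("grandparent", "alice", "carol"))
```

The recorded `trace` is what makes the search inspectable: comparing it against the generated code's facts and rules is one plausible way to quantify how faithful the reasoning steps are to the formalisation.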
