ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
February 7, 2025
Authors: Yuwei Yin, Giuseppe Carenini
cs.AI
Abstract
Large language models (LLMs) achieve remarkable performance on challenging
benchmarks that are often structured as multiple-choice question-answering (QA)
tasks. Zero-shot Chain-of-Thought (CoT) prompting enhances reasoning in LLMs
but provides only vague and generic guidance ("think step by step"). This paper
introduces ARR, an intuitive and effective zero-shot prompting method that
explicitly incorporates three key steps in QA solving: analyzing the intent of
the question, retrieving relevant information, and reasoning step by step.
Comprehensive experiments across diverse and challenging QA tasks demonstrate
that ARR consistently improves the Baseline (without ARR prompting) and
outperforms CoT. Ablation and case studies further validate the positive
contributions of each component: analyzing, retrieving, and reasoning. Notably,
intent analysis plays a vital role in ARR. Additionally, extensive evaluations
across various model sizes, LLM series, and generation settings solidify the
effectiveness, robustness, and generalizability of ARR.
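
To make the contrast between the three settings concrete, the following is a minimal sketch of how a zero-shot multiple-choice prompt might be assembled for the Baseline, CoT, and ARR. The function name `build_prompt`, the `TRIGGERS` table, the prompt layout, and the exact wording of the ARR trigger sentence are illustrative assumptions based on the three steps named in the abstract; only the CoT phrase "think step by step" is quoted there.

```python
# Hypothetical trigger sentences appended after "Answer:".
# The ARR wording is an assumption that paraphrases the three steps in the
# abstract (analyze intent, retrieve information, reason step by step).
TRIGGERS = {
    "baseline": "",  # no trigger sentence at all
    "cot": "Let's think step by step.",
    "arr": (
        "Let's analyze the intent of the question, find relevant information, "
        "and answer the question with step-by-step reasoning."
    ),
}


def build_prompt(question: str, options: list[str], method: str = "arr") -> str:
    """Assemble a zero-shot multiple-choice QA prompt for the given method."""
    if method not in TRIGGERS:
        raise ValueError(f"unknown method: {method!r}")
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if len(options) > len(letters):
        raise ValueError("too many options to label with single letters")
    lines = [f"Question: {question}"]
    # Label each option (A), (B), ... in order.
    for letter, option in zip(letters, options):
        lines.append(f"({letter}) {option}")
    # The trigger sentence is the only part that differs between methods.
    lines.append(f"Answer: {TRIGGERS[method]}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    for name in ("baseline", "cot", "arr"):
        print(f"--- {name} ---")
        print(build_prompt("Which gas do plants absorb?", ["Oxygen", "Carbon dioxide"], name))
```

In this sketch the question and options are identical across methods, so any difference in model output can be attributed to the trigger sentence alone, which mirrors the comparison the abstract describes.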