

SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

March 6, 2025
Authors: Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai
cs.AI

Abstract

Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.