LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews
April 15, 2025
Authors: Sukannya Purkayastha, Zhuang Li, Anne Lauscher, Lizhen Qu, Iryna Gurevych
cs.AI
Abstract
Peer review is a cornerstone of quality control in scientific publishing. With the increasing workload, the unintended use of 'quick' heuristics, referred to as lazy thinking, has emerged as a recurring issue that compromises review quality. Automated methods to detect such heuristics can help improve the peer-reviewing process. However, there is limited NLP research on this issue, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy-thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy-thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines, which can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)
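To make the zero-shot setting described above concrete, below is a minimal sketch (not the authors' code) of how one might prompt an instruction-tuned LLM to label a single review sentence against a handful of illustrative lazy-thinking categories via the Hugging Face transformers pipeline. The label list, prompt wording, and model name are placeholder assumptions, not the taxonomy, prompts, or models used in the paper.

```python
# Sketch only: zero-shot probing of an LLM for lazy-thinking detection.
# The categories and model below are illustrative placeholders.
from transformers import pipeline

CANDIDATE_LABELS = [
    "The results are not surprising",
    "The approach is too simple",
    "The topic is too niche",
    "Not a lazy-thinking instance",
]

# Placeholder instruction-tuned model; any comparable chat/instruct model works.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-1.5B-Instruct")

def classify_sentence(sentence: str) -> str:
    """Ask the model to pick one illustrative label for a review sentence."""
    prompt = (
        "You are checking peer-review sentences for lazy-thinking heuristics.\n"
        f'Sentence: "{sentence}"\n'
        "Choose exactly one label from: " + "; ".join(CANDIDATE_LABELS) + "\n"
        "Label:"
    )
    out = generator(prompt, max_new_tokens=20, do_sample=False)
    # The pipeline returns the prompt plus the completion; keep only the completion.
    completion = out[0]["generated_text"][len(prompt):].strip()
    # Map the free-form completion back onto the closest candidate label.
    for label in CANDIDATE_LABELS:
        if label.lower() in completion.lower():
            return label
    return "Not a lazy-thinking instance"

print(classify_sentence("The proposed approach is too simple to be interesting."))
```

In the paper's setup, fine-tuning on the LazyReview annotations is what closes the gap that such zero-shot prompting leaves, which is the 10-20 point improvement reported in the abstract.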