LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews

Abstract

Peer review is a cornerstone of quality control in scientific publishing. As reviewer workload grows, the unintended use of "quick" heuristics, referred to as lazy thinking, has emerged as a recurring issue that compromises review quality. Automated methods to detect such heuristics can help improve the peer-review process. However, NLP research on this issue is limited, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset significantly boosts performance, by 10-20 points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines, which can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)
