Challenges in Trustworthy Human Evaluation of Chatbots

December 5, 2024
Authors: Wenting Zhao, Alexander M. Rush, Tanya Goyal
cs.AI

Abstract

Open community-driven platforms like Chatbot Arena that collect user preference data from site visitors have gained a reputation as one of the most trustworthy publicly available benchmarks for LLM performance. While now standard, it is tricky to implement effective guardrails to collect high-quality annotations from humans. In this paper, we demonstrate that three sources of bad annotations, both malicious and otherwise, can corrupt the reliability of open leaderboard rankings. In particular, we show that only 10% of poor quality votes by apathetic (site visitors not appropriately incentivized to give correct votes) or adversarial (bad actors seeking to inflate the ranking of a target model) annotators can change the rankings of models by up to 5 places on the leaderboard. Finally, we discuss open challenges in ensuring high-quality human annotations.
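Arena-style leaderboards aggregate pairwise preference votes into ratings (Chatbot Arena uses a Bradley-Terry style fit). The sketch below is not the paper's code; it is a minimal, self-contained simulation under assumed parameters (10 hypothetical models with evenly spaced strengths, 20,000 votes, a hypothetical target model index) that illustrates the mechanism the abstract describes: mixing in a 10% share of adversarial or apathetic votes can shift the fitted ranking of models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" strengths for 10 models on a Bradley-Terry scale.
true_strength = np.linspace(0.0, 2.0, 10)
n_models = len(true_strength)

def simulate_votes(n_votes, adversarial_frac=0.0, target=3):
    """Simulate pairwise preference votes; a fraction of voters always
    favor a target model when it appears, else vote at random."""
    votes = []
    for _ in range(n_votes):
        a, b = rng.choice(n_models, size=2, replace=False)
        if rng.random() < adversarial_frac:
            if target in (a, b):
                # Adversarial vote: always prefer the target model.
                winner, loser = target, (a if b == target else b)
            else:
                # Apathetic vote: random preference.
                winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
        else:
            # Honest vote: winner sampled from the Bradley-Terry model.
            p_a = 1.0 / (1.0 + np.exp(true_strength[b] - true_strength[a]))
            winner, loser = (a, b) if rng.random() < p_a else (b, a)
        votes.append((winner, loser))
    return votes

def fit_bradley_terry(votes, n_iters=200, lr=0.5):
    """Crude gradient ascent on the Bradley-Terry log-likelihood."""
    theta = np.zeros(n_models)
    for _ in range(n_iters):
        grad = np.zeros(n_models)
        for w, l in votes:
            p_w = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p_w
            grad[l] -= 1.0 - p_w
        theta += lr * grad / len(votes)
    return theta

clean = fit_bradley_terry(simulate_votes(20_000))
noisy = fit_bradley_terry(simulate_votes(20_000, adversarial_frac=0.10))

rank = lambda t: np.argsort(np.argsort(-t))  # 0 = top of the leaderboard
print("clean ranks:", rank(clean))
print("noisy ranks:", rank(noisy))
print("rank shift :", rank(noisy) - rank(clean))
```

In this toy setup the target model's fitted rating rises and nearby models are pushed down, so even a modest contaminated fraction reorders mid-leaderboard positions; the paper's measurements on real leaderboard data quantify this effect at up to 5 places.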
