LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models

April 14, 2025
作者: Minqian Liu, Zhiyang Xu, Xinyi Zhang, Heajun An, Sarvech Qadir, Qi Zhang, Pamela J. Wisniewski, Jin-Hee Cho, Sang Won Lee, Ruoxi Jia, Lifu Huang
cs.AI

Abstract

Recent advancements in Large Language Models (LLMs) have enabled them to approach human-level persuasion capabilities. However, such potential also raises concerns about the safety risks of LLM-driven persuasion, particularly their potential for unethical influence through manipulation, deception, exploitation of vulnerabilities, and many other harmful tactics. In this work, we present a systematic investigation of LLM persuasion safety through two critical aspects: (1) whether LLMs appropriately reject unethical persuasion tasks and avoid unethical strategies during execution, including cases where the initial persuasion goal appears ethically neutral, and (2) how influencing factors like personality traits and external pressures affect their behavior. To this end, we introduce PersuSafety, the first comprehensive framework for assessing persuasion safety, which consists of three stages: persuasion scene creation, persuasive conversation simulation, and persuasion safety assessment. PersuSafety covers 6 diverse unethical persuasion topics and 15 common unethical strategies. Through extensive experiments across 8 widely used LLMs, we observe significant safety concerns in most LLMs, including failures to identify harmful persuasion tasks and the use of various unethical persuasion strategies. Our study calls for greater attention to improving safety alignment in progressive, goal-driven conversations such as persuasion.
