Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

December 10, 2024
Authors: Lorenzo Cima, Alessio Miaschi, Amaury Trujillo, Marco Avvenuti, Felice Dell'Orletta, Stefano Cresci
cs.AI

Abstract

AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct a LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
