Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation
December 10, 2024
Authors: Lorenzo Cima, Alessio Miaschi, Amaury Trujillo, Marco Avvenuti, Felice Dell'Orletta, Stefano Cresci
cs.AI
Abstract
AI-generated counterspeech offers a promising and scalable strategy to curb
online toxicity through direct replies that promote civil discourse. However,
current counterspeech is one-size-fits-all, lacking adaptation to the
moderation context and the users involved. We propose and evaluate multiple
strategies for generating tailored counterspeech that is adapted to the
moderation context and personalized for the moderated user. We instruct an
LLaMA2-13B model to generate counterspeech, experimenting with various
configurations based on different contextual information and fine-tuning
strategies. We identify the configurations that generate persuasive
counterspeech through a combination of quantitative indicators and human
evaluations collected via a pre-registered mixed-design crowdsourcing
experiment. Results show that contextualized counterspeech can significantly
outperform state-of-the-art generic counterspeech in adequacy and
persuasiveness, without compromising other characteristics. Our findings also
reveal a poor correlation between quantitative indicators and human
evaluations, suggesting that these methods assess different aspects and
highlighting the need for nuanced evaluation methodologies. The effectiveness
of contextualized AI-generated counterspeech and the divergence between human
and algorithmic evaluations underscore the importance of increased human-AI
collaboration in content moderation.