Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation

Abstract

AI-generated counterspeech offers a promising and scalable strategy to curb online toxicity through direct replies that promote civil discourse. However, current counterspeech is one-size-fits-all, lacking adaptation to the moderation context and the users involved. We propose and evaluate multiple strategies for generating tailored counterspeech that is adapted to the moderation context and personalized for the moderated user. We instruct a LLaMA2-13B model to generate counterspeech, experimenting with various configurations based on different contextual information and fine-tuning strategies. We identify the configurations that generate persuasive counterspeech through a combination of quantitative indicators and human evaluations collected via a pre-registered mixed-design crowdsourcing experiment. Results show that contextualized counterspeech can significantly outperform state-of-the-art generic counterspeech in adequacy and persuasiveness, without compromising other characteristics. Our findings also reveal a poor correlation between quantitative indicators and human evaluations, suggesting that these methods assess different aspects and highlighting the need for nuanced evaluation methodologies. The effectiveness of contextualized AI-generated counterspeech and the divergence between human and algorithmic evaluations underscore the importance of increased human-AI collaboration in content moderation.
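The abstract reports a poor correlation between quantitative indicators and human evaluations. As a hedged illustration of how such agreement can be measured, the sketch below computes Spearman's rank correlation between per-item automatic scores and per-item mean human ratings. The choice of Spearman's ρ, the function names (`average_ranks`, `pearson`, `spearman`), and the toy numbers are assumptions made for illustration; they are not the authors' indicators, statistic, or data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: one automatic score and one mean human
    # rating per generated reply (NOT results from the study).
    metric_scores = [0.62, 0.48, 0.71, 0.55, 0.40]
    human_ratings = [3.0, 4.5, 2.5, 4.0, 3.5]
    print(spearman(metric_scores, human_ratings))
```

A rank-based statistic is one reasonable choice if human ratings are ordinal (for example, Likert-style scores), since it does not assume equal spacing between scale points; with SciPy installed, `scipy.stats.spearmanr` computes the same coefficient.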
