ChatPaper.aiChatPaper

使用通用多提示进行越狱

Jailbreaking with Universal Multi-Prompts

February 3, 2025
作者: Yu-Ling Hsu, Hsuan Su, Shang-Tse Chen
cs.AI

摘要

近年来,大型语言模型(LLMs)取得了快速发展,彻底改变了各种应用,并显著提高了便利性和生产率。然而,除了它们令人印象深刻的能力之外,也出现了伦理关切和新型攻击,如越狱。虽然大多数提示技术侧重于为个别情况优化对抗性输入,导致在处理大型数据集时产生更高的计算成本。较少的研究涉及训练通用攻击者的更一般设置,该攻击者可以转移到未见任务。在本文中,我们介绍了JUMP,这是一种基于提示的方法,旨在使用通用多提示来越狱LLMs。我们还将我们的方法调整为防御,我们称之为DUMP。实验结果表明,我们优化通用多提示的方法胜过现有技术。
English
Large language models (LLMs) have seen rapid development in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. While most prompting techniques focus on optimizing adversarial inputs for individual cases, resulting in higher computational costs when dealing with large datasets. Less research has addressed the more general setting of training a universal attacker that can transfer to unseen tasks. In this paper, we introduce JUMP, a prompt-based method designed to jailbreak LLMs using universal multi-prompts. We also adapt our approach for defense, which we term DUMP. Experimental results demonstrate that our method for optimizing universal multi-prompts outperforms existing techniques.

Summary

AI-Generated Summary

PDF92February 6, 2025