

Stealing User Prompts from Mixture of Experts

October 30, 2024
Authors: Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini
cs.AI

Abstract

Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice-Routing to fully disclose a victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that we can extract the entire prompt using O(VM^2) queries (with vocabulary size V and prompt length M) or 100 queries on average per token in the setting we consider. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
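
To make the exploited behavior concrete, here is a minimal PyTorch sketch of Expert-Choice Routing reduced to a single expert. The helper expert_choice_route and the example scores are illustrative assumptions, not the paper's code; the point is only that tokens tying exactly at the expert's capacity boundary are kept or dropped by torch.topk's tie-breaking, which on the CUDA backend depends on token position within the batch.

```python
import torch

# Illustrative sketch (not the paper's implementation): in Expert-Choice
# Routing, each expert keeps only the top-`capacity` tokens in the batch,
# ranked by router score. Tokens that tie at the capacity boundary are
# kept or dropped according to torch.topk's tie-breaking -- the side
# channel the attack observes.

def expert_choice_route(router_scores: torch.Tensor, capacity: int) -> torch.Tensor:
    """Return the indices of the tokens this expert accepts."""
    _, kept = torch.topk(router_scores, k=capacity)
    return kept

# A shared batch: adversary tokens at indices 0-1, victim token at index 2.
# All three tie at score 1.0, so whether the victim's token is kept depends
# purely on tie-breaking order, not on the score itself.
scores = torch.tensor([1.0, 1.0, 1.0, 0.2])
print(expert_choice_route(scores, capacity=2))
# e.g. tensor([0, 1]): the victim's tying token is dropped. By controlling
# which of its own tokens tie with a guessed victim token, the adversary
# turns keep-vs-drop into an observable signal about the victim's prompt.
```

In the paper's setting, a correct guess of a victim token produces exactly such a tie, and the resulting routing change is visible in the adversary's own outputs, so the prompt can be confirmed one token at a time.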
