Stealing User Prompts from Mixture of Experts

October 30, 2024
Authors: Itay Yona, Ilia Shumailov, Jamie Hayes, Nicholas Carlini
cs.AI

Abstract

Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice-Routing to fully disclose a victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that we can extract the entire prompt using O(VM^2) queries (with vocabulary size V and prompt length M), or 100 queries on average per token in the setting we consider. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
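To make the routing side channel concrete, below is a minimal sketch of Expert-Choice routing, not the paper's exact Mixtral setup: the function name `expert_choice_route` and the toy batch dimensions are illustrative assumptions. The key property is that each expert selects its top-k tokens from the whole shared batch, so when an attacker token ties exactly with a victim token on a router score, which one the expert keeps is decided by torch.topk's tie-handling (implementation-dependent; the paper targets the CUDA kernel's behavior).

```python
import torch

def expert_choice_route(router_logits: torch.Tensor, capacity: int) -> torch.Tensor:
    """Select tokens per expert under Expert-Choice routing.

    router_logits: (num_tokens, num_experts) affinity scores for one
    flattened batch that mixes attacker and victim tokens.
    Returns (num_experts, capacity) indices of the tokens each expert keeps.
    """
    # View the scores from the experts' side: (num_experts, num_tokens).
    scores = router_logits.t()
    # Each expert keeps its `capacity` highest-scoring tokens. When two
    # tokens tie exactly, torch.topk's tie-breaking is implementation-
    # dependent (the paper exploits the CUDA kernel's behavior), so the
    # outcome depends on where the tied tokens sit in the batch.
    _, selected = torch.topk(scores, k=capacity, dim=1)
    return selected

# Toy batch (illustrative numbers): 4 tokens, 2 experts, capacity 2.
# Tokens 1 and 3 tie exactly on expert 0, so whether the attacker's probe
# token (index 3) is kept or displaced leaks information about the
# co-batched victim token (index 1).
logits = torch.tensor([
    [0.9, 0.1],  # attacker filler token
    [0.5, 0.4],  # victim token
    [0.2, 0.8],  # attacker filler token
    [0.5, 0.3],  # attacker probe token, tied with the victim on expert 0
])
print(expert_choice_route(logits, capacity=2))
```

As the abstract describes, an attacker who co-locates crafted queries in the victim's batch can observe through their own outputs whether a probe token was accepted or displaced by an expert, and by iterating such probes over the vocabulary can recover the victim's prompt token by token.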
