Stealing User Prompts from Mixture of Experts

Abstract



Mixture-of-Experts (MoE) models improve the efficiency and scalability of dense language models by routing each token to a small number of experts in each layer. In this paper, we show how an adversary that can arrange for their queries to appear in the same batch of examples as a victim's queries can exploit Expert-Choice Routing to fully disclose the victim's prompt. We successfully demonstrate the effectiveness of this attack on a two-layer Mixtral model, exploiting the tie-handling behavior of the torch.topk CUDA implementation. Our results show that, in the setting we consider, we can extract the entire prompt using O({VM}^2) queries (with vocabulary size V and prompt length M), or an average of 100 queries per token. This is the first attack to exploit architectural flaws for the purpose of extracting user prompts, introducing a new class of LLM vulnerabilities.
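The leakage channel can be illustrated with a toy model of one expert under Expert-Choice Routing. This is a minimal sketch, not the paper's attack and not Mixtral's or PyTorch's code: the helpers `expert_choice_route` and `guess_is_dropped`, the `ROUTER_SCORE` table, and the batch layout are all hypothetical. It assumes that an expert keeps only its `capacity` highest-scoring tokens from the whole batch, that a first-layer router score depends only on the token id, and that ties go to the lower batch index (a stand-in for whatever order a real top-k kernel produces). Under those assumptions, a victim token identical to the attacker's guess wins the tie for the last slot and the guess is dropped, so the attacker's own routing outcome depends on the victim's prompt.

```python
from typing import Dict, List


def expert_choice_route(scores: List[float], capacity: int) -> List[int]:
    """Return the batch indices of the tokens that one expert selects.

    Expert-choice routing: the expert (not the token) picks the `capacity`
    highest-scoring tokens from the whole batch. Equal scores are broken in
    favour of the lower batch index; this is an ASSUMPTION standing in for
    the tie order of a real top-k kernel such as torch.topk on CUDA.
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:capacity])


# Hypothetical router scores for a single first-layer expert, where a token's
# score depends only on its token id (toy values, not from a real model).
ROUTER_SCORE: Dict[str, float] = {"fill": 9.0, "cat": 0.7, "dog": 0.4, "the": 0.1}


def guess_is_dropped(victim_token: str, guess: str, capacity: int = 3) -> bool:
    """Build one shared batch and report whether the attacker's guess was dropped.

    Hypothetical batch layout (flattened across sequences): the victim's token
    first, then capacity - 1 high-scoring attacker filler tokens, then the
    attacker's guess. The fillers take capacity - 1 slots, so the victim's
    token and the guess compete for the last one.
    """
    batch = [victim_token] + ["fill"] * (capacity - 1) + [guess]
    scores = [ROUTER_SCORE[tok] for tok in batch]
    selected = expert_choice_route(scores, capacity)
    guess_index = len(batch) - 1
    return guess_index not in selected


print(guess_is_dropped("cat", "cat"))  # True: tie, victim's earlier token wins
print(guess_is_dropped("the", "cat"))  # False: guess outscores the victim token
```

Note that in this toy the guess is also dropped when the victim's token simply scores higher (for example `guess_is_dropped("cat", "dog")` is `True`), so a single check does not identify the victim's token on its own; the full extraction procedure that reaches the query counts above is not reproduced here.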
