Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
April 7, 2025
Authors: Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song
cs.AI
Abstract
The proliferation of Large Language Models (LLMs) accessed via black-box APIs
introduces a significant trust challenge: users pay for services based on
advertised model capabilities (e.g., size, performance), but providers may
covertly substitute the specified model with a cheaper, lower-quality
alternative to reduce operational costs. This lack of transparency undermines
fairness, erodes trust, and complicates reliable benchmarking. Detecting such
substitutions is difficult because of the black-box nature of these APIs, which
typically limits interaction to input-output queries. This paper formalizes the
problem of model
substitution detection in LLM APIs. We systematically evaluate existing
verification techniques, including output-based statistical tests, benchmark
evaluations, and log probability analysis, under various realistic attack
scenarios like model quantization, randomized substitution, and benchmark
evasion. Our findings reveal the limitations of methods relying solely on text
outputs, especially against subtle or adaptive attacks. While log probability
analysis offers stronger guarantees when available, its accessibility is often
limited. We conclude by discussing the potential of hardware-based solutions
like Trusted Execution Environments (TEEs) as a pathway towards provable model
integrity, highlighting the trade-offs between security, performance, and
provider adoption. Code is available at
https://github.com/sunblaze-ucb/llm-api-audit
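
To make the log probability analysis mentioned in the abstract concrete, below is a minimal sketch of how an auditor with log-prob access might compare per-token log probabilities reported by an API against reference values computed locally from the claimed model. The function names, synthetic numbers, and tolerance threshold are hypothetical illustrations and are not taken from the paper or its released code.

```python
# Minimal sketch of a log-probability consistency check for model substitution
# auditing. The function names, data, and tolerance below are illustrative
# assumptions, not the paper's released implementation.
from typing import Sequence


def logprob_mismatch_score(api_logprobs: Sequence[float],
                           reference_logprobs: Sequence[float]) -> float:
    """Mean absolute difference between API-reported and reference log probs."""
    if len(api_logprobs) != len(reference_logprobs):
        raise ValueError("Log-prob sequences must align token for token.")
    diffs = [abs(a - r) for a, r in zip(api_logprobs, reference_logprobs)]
    return sum(diffs) / len(diffs)


def is_suspected_substitution(api_logprobs: Sequence[float],
                              reference_logprobs: Sequence[float],
                              tolerance: float = 0.05) -> bool:
    """Flag a possible substitution when the mismatch exceeds a tolerance.

    A small nonzero tolerance leaves room for benign numerical noise
    (e.g., from quantization); the value 0.05 is a placeholder.
    """
    return logprob_mismatch_score(api_logprobs, reference_logprobs) > tolerance


if __name__ == "__main__":
    # Synthetic per-token log probs: what the claimed model assigns locally
    # versus what the API reports for the same prompt and completion.
    reference = [-0.12, -1.05, -0.33, -2.40]
    api_close = [-0.11, -1.07, -0.35, -2.38]  # consistent with the claimed model
    api_far = [-0.50, -0.90, -1.10, -1.80]    # suggestive of a different model
    print(is_suspected_substitution(api_close, reference))  # False
    print(is_suspected_substitution(api_far, reference))    # True
```

As the abstract notes, log-prob access is often restricted in practice, and benign effects such as quantization can shift log probabilities slightly, so any real tolerance would need calibration against the provider's stated serving configuration.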