Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
April 7, 2025
Authors: Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song
cs.AI
Abstract
The proliferation of Large Language Models (LLMs) accessed via black-box APIs
introduces a significant trust challenge: users pay for services based on
advertised model capabilities (e.g., size, performance), but providers may
covertly substitute the specified model with a cheaper, lower-quality
alternative to reduce operational costs. This lack of transparency undermines
fairness, erodes trust, and complicates reliable benchmarking. Detecting such
substitutions is difficult because of the black-box nature of these APIs, which
typically limits interaction to input-output queries. This paper formalizes the
problem of model
substitution detection in LLM APIs. We systematically evaluate existing
verification techniques, including output-based statistical tests, benchmark
evaluations, and log probability analysis, under various realistic attack
scenarios like model quantization, randomized substitution, and benchmark
evasion. Our findings reveal the limitations of methods relying solely on text
outputs, especially against subtle or adaptive attacks. While log probability
analysis offers stronger guarantees when available, its accessibility is often
limited. We conclude by discussing the potential of hardware-based solutions
like Trusted Execution Environments (TEEs) as a pathway towards provable model
integrity, highlighting the trade-offs between security, performance, and
provider adoption. Code is available at
https://github.com/sunblaze-ucb/llm-api-audit
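
To make the log probability analysis mentioned in the abstract concrete, below is a minimal sketch of how an auditor with log-prob access might compare per-token log probabilities reported by an API against reference values computed locally from the claimed model. The function names, synthetic numbers, and tolerance threshold are hypothetical illustrations and are not taken from the paper or its released code.

```python
# Minimal sketch of a log-probability consistency check for model substitution
# auditing. The function names, data, and tolerance below are illustrative
# assumptions, not the paper's released implementation.
from typing import Sequence


def logprob_mismatch_score(api_logprobs: Sequence[float],
                           reference_logprobs: Sequence[float]) -> float:
    """Mean absolute difference between API-reported and reference log probs."""
    if len(api_logprobs) != len(reference_logprobs):
        raise ValueError("Log-prob sequences must align token for token.")
    diffs = [abs(a - r) for a, r in zip(api_logprobs, reference_logprobs)]
    return sum(diffs) / len(diffs)


def is_suspected_substitution(api_logprobs: Sequence[float],
                              reference_logprobs: Sequence[float],
                              tolerance: float = 0.05) -> bool:
    """Flag a possible substitution when the mismatch exceeds a tolerance.

    A small nonzero tolerance leaves room for benign numerical noise
    (e.g., from quantization); the value 0.05 is a placeholder.
    """
    return logprob_mismatch_score(api_logprobs, reference_logprobs) > tolerance


if __name__ == "__main__":
    # Synthetic per-token log probs: what the claimed model assigns locally
    # versus what the API reports for the same prompt and completion.
    reference = [-0.12, -1.05, -0.33, -2.40]
    api_close = [-0.11, -1.07, -0.35, -2.38]  # consistent with the claimed model
    api_far = [-0.50, -0.90, -1.10, -1.80]    # suggestive of a different model
    print(is_suspected_substitution(api_close, reference))  # False
    print(is_suspected_substitution(api_far, reference))    # True
```

As the abstract notes, log-prob access is often restricted in practice, and benign effects such as quantization can shift log probabilities slightly, so any real tolerance would need calibration against the provider's stated serving configuration.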