Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
April 7, 2025
Authors: Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song
cs.AI
Abstract
The proliferation of Large Language Models (LLMs) accessed via black-box APIs
introduces a significant trust challenge: users pay for services based on
advertised model capabilities (e.g., size, performance), but providers may
covertly substitute the specified model with a cheaper, lower-quality
alternative to reduce operational costs. This lack of transparency undermines
fairness, erodes trust, and complicates reliable benchmarking. Detecting such
substitutions is difficult due to the black-box nature, typically limiting
interaction to input-output queries. This paper formalizes the problem of model
substitution detection in LLM APIs. We systematically evaluate existing
verification techniques, including output-based statistical tests, benchmark
evaluations, and log probability analysis, under various realistic attack
scenarios like model quantization, randomized substitution, and benchmark
evasion. Our findings reveal the limitations of methods relying solely on text
outputs, especially against subtle or adaptive attacks. While log probability
analysis offers stronger guarantees when available, its accessibility is often
limited. We conclude by discussing the potential of hardware-based solutions
like Trusted Execution Environments (TEEs) as a pathway towards provable model
integrity, highlighting the trade-offs between security, performance, and
provider adoption. Code is available at
https://github.com/sunblaze-ucb/llm-api-audit
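
To make the output-based statistical tests mentioned above concrete, the sketch below compares the empirical distribution of a cheap output feature (the first word of each sampled completion) between the API under audit and a trusted reference deployment of the advertised model, using a chi-squared contingency test from SciPy. This is a minimal illustration of the general idea only, not the paper's protocol; the helper `sample_completions`, the first-word feature, and the significance threshold are assumptions introduced here for illustration.

# Minimal sketch (not the paper's protocol): output-based statistical test for
# model substitution. Sample completions for a fixed prompt from the API under
# audit and from a trusted reference deployment of the advertised model, then
# compare the empirical distributions of the first word of each completion.
from collections import Counter

import numpy as np
from scipy.stats import chi2_contingency


def first_word_counts(completions):
    # Reduce each completion to its first whitespace-delimited token.
    return Counter(
        c.strip().split()[0] if c.strip() else "<empty>" for c in completions
    )


def substitution_test(api_completions, reference_completions, alpha=0.01):
    """Test H0: both completion sets were produced by the same model.

    Returns (p_value, reject). A small p-value indicates the API's output
    distribution differs from the reference model's, which is consistent with
    substitution but can also reflect quantization or decoding differences.
    """
    api_counts = first_word_counts(api_completions)
    ref_counts = first_word_counts(reference_completions)
    vocab = sorted(set(api_counts) | set(ref_counts))
    table = np.array(
        [[api_counts.get(w, 0) for w in vocab],
         [ref_counts.get(w, 0) for w in vocab]]
    )
    _, p_value, _, _ = chi2_contingency(table)
    return p_value, p_value < alpha


# Hypothetical usage: sample_completions(endpoint, prompt, n) stands in for
# whatever client draws n temperature-1 samples of one prompt from an endpoint.
# api_out = sample_completions("https://api.provider.example/v1", PROMPT, 500)
# ref_out = sample_completions("http://localhost:8000/v1", PROMPT, 500)
# p, reject = substitution_test(api_out, ref_out)

Note that, as the abstract cautions, such text-only tests are weak against subtle substitutions (e.g., quantized variants) and adaptive providers; a more discriminative feature than the first word, or log-probability comparisons when available, would strengthen the check.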