
Tell me why: Visual foundation models as self-explainable classifiers

February 26, 2025
Authors: Hugues Turbé, Mina Bjelogrlic, Gianmarco Mengaldo, Christian Lovis
cs.AI

Abstract

Visual foundation models (VFMs) have become increasingly popular due to their state-of-the-art performance. However, interpretability remains crucial for critical applications. To this end, self-explainable models (SEMs) aim to provide interpretable classifiers that decompose predictions into a weighted sum of interpretable concepts. Despite their promise, recent studies have shown that these explanations often lack faithfulness. In this work, we combine VFMs with a novel prototypical architecture and specialized training objectives. By training only a lightweight head (approximately 1M parameters) on top of frozen VFMs, our approach (ProtoFM) offers an efficient and interpretable solution. Evaluations demonstrate that our approach achieves competitive classification performance while outperforming existing models across a range of interpretability metrics derived from the literature. Code is available at https://github.com/hturbe/proto-fm.
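
The abstract does not specify the head's exact form, so the following is only a minimal sketch of the general idea: a small prototypical head trained on top of frozen VFM patch features, whose class logits are a weighted sum of prototype-similarity scores (the SEM decomposition described above). The class name `PrototypicalHead`, the dimensions, the cosine-similarity scoring, and the max-pooling over patches are all illustrative assumptions, not ProtoFM's actual design; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

class PrototypicalHead(nn.Module):
    """Illustrative prototypical head over frozen VFM patch features.

    Hypothetical sketch: class logits are computed as a weighted sum of
    prototype activations, mirroring the SEM decomposition in the abstract.
    """

    def __init__(self, feat_dim: int, num_prototypes: int, num_classes: int):
        super().__init__()
        # Learnable concept prototypes living in the VFM feature space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        # Linear map from prototype activations to class logits
        # (the "weighted sum of interpretable concepts").
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, feat_dim) from a frozen backbone.
        feats = nn.functional.normalize(patch_feats, dim=-1)
        protos = nn.functional.normalize(self.prototypes, dim=-1)
        # Cosine similarity of every patch to every prototype: (B, P, K).
        sims = feats @ protos.t()
        # Max over patches: how strongly each concept appears anywhere
        # in the image, yielding per-concept scores of shape (B, K).
        concept_scores, _ = sims.max(dim=1)
        return self.classifier(concept_scores)

# Usage with a frozen backbone (backbone choice is illustrative):
# backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14').eval()
# for p in backbone.parameters():
#     p.requires_grad = False
# head = PrototypicalHead(feat_dim=768, num_prototypes=200, num_classes=10)
```

Because the classifier is linear in the concept scores, each logit decomposes exactly into a weighted sum of prototype activations, which is what makes the prediction inspectable; only the head's roughly 1M parameters are trained while the VFM stays frozen.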
