Knowledge Transfer Across Modalities with Natural Language Supervision
November 23, 2024
Authors: Carlo Alberto Barbano, Luca Molinaro, Emanuele Aiello, Marco Grangetto
cs.AI
Abstract
We present a way to learn novel concepts using only their textual descriptions. We call this method Knowledge Transfer. Similar to human perception, we leverage cross-modal interaction to introduce new concepts. We hypothesize that a pre-trained visual encoder has already learned enough low-level features (e.g., shape, appearance, color) to describe previously unknown high-level concepts. Given a textual description of the novel concept, our method aligns the visual encoder's known low-level features with the concept's high-level textual description. We show that Knowledge Transfer can introduce novel concepts into multimodal models very efficiently, requiring only a single description of the target concept. Our approach is compatible with both separate textual and visual encoders (e.g., CLIP) and with parameters shared across modalities. We also show that, following the same principle, Knowledge Transfer can improve concepts already known to the model. Leveraging Knowledge Transfer, we improve zero-shot performance across different tasks such as classification, segmentation, image-text retrieval, and captioning.
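To make the alignment idea concrete, below is a minimal, self-contained sketch of what a Knowledge-Transfer-style fine-tuning step could look like. It is an illustration only, not the authors' implementation: the toy encoders, the input featurization, the cosine-alignment loss, and all hyperparameters are assumptions. The only idea taken from the abstract is the mechanism itself: keep the text tower's embedding of the concept description fixed as a high-level target, and fine-tune the visual encoder so its low-level features move toward that target.

```python
# Conceptual sketch of cross-modal Knowledge Transfer (NOT the paper's code).
# Assumption: a CLIP-style two-tower model; here both towers are toy MLPs so
# the example runs without pre-trained weights or data downloads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for one pre-trained tower of a CLIP-like model."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

visual_encoder = ToyEncoder(in_dim=512)  # "pre-trained"; will be fine-tuned
text_encoder = ToyEncoder(in_dim=300)    # "pre-trained"; kept frozen here
for p in text_encoder.parameters():
    p.requires_grad_(False)

# Hypothetical inputs: a featurized textual description of the novel concept,
# and a small batch of image features the model should associate with it.
description_feats = torch.randn(1, 300)
image_feats = torch.randn(8, 512)

optimizer = torch.optim.AdamW(visual_encoder.parameters(), lr=1e-4)
target = text_encoder(description_feats)  # frozen high-level textual target

for step in range(100):
    optimizer.zero_grad()
    z_img = visual_encoder(image_feats)   # low-level visual features
    # Alignment objective: pull image embeddings toward the description
    # embedding (mean cosine distance to the frozen text target).
    loss = (1.0 - (z_img * target).sum(dim=-1)).mean()
    loss.backward()
    optimizer.step()
```

After such an alignment step, downstream zero-shot use would proceed as in any CLIP-style model: embed the concept name with the text encoder and score images by cosine similarity against it, which is presumably how gains on classification, segmentation, retrieval, and captioning would be measured.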