Knowledge Transfer Across Modalities with Natural Language Supervision
November 23, 2024
Authors: Carlo Alberto Barbano, Luca Molinaro, Emanuele Aiello, Marco Grangetto
cs.AI
Abstract
We present a way to learn novel concepts by only using their textual
description. We call this method Knowledge Transfer. Similarly to human
perception, we leverage cross-modal interaction to introduce new concepts. We
hypothesize that in a pre-trained visual encoder there are enough low-level
features already learned (e.g. shape, appearance, color) that can be used to
describe previously unknown high-level concepts. Provided with a textual
description of the novel concept, our method works by aligning the known
low-level features of the visual encoder to its high-level textual description.
We show that Knowledge Transfer can successfully introduce novel concepts in
multimodal models, in a very efficient manner, by only requiring a single
description of the target concept. Our approach is compatible with both
separate textual and visual encoders (e.g. CLIP) and shared parameters across
modalities. We also show that, following the same principle, Knowledge Transfer
can improve concepts already known by the model. Leveraging Knowledge Transfer
we improve zero-shot performance across different tasks such as classification,
segmentation, image-text retrieval, and captioning.
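The abstract describes aligning the visual encoder's known low-level features with the high-level textual description of a novel concept. One way to picture such an alignment is as a contrastive objective: an image's visual feature should match the novel concept's description more closely than the descriptions of already-known concepts. The following NumPy sketch illustrates that idea under assumed embeddings; the function name, the single-positive InfoNCE-style loss, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2norm(x):
    """Normalize vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def knowledge_transfer_loss(visual_feat, desc_emb, known_embs, temperature=0.07):
    """Hypothetical alignment loss: cross-entropy over cosine similarities,
    pulling the visual feature toward the novel concept's description
    (desc_emb) and away from descriptions of known concepts (known_embs)."""
    texts = l2norm(np.vstack([desc_emb, known_embs]))   # row 0 = positive
    logits = texts @ l2norm(visual_feat) / temperature  # cosine sims / T
    logits -= logits.max()                              # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    return -log_probs[0]                                # NLL of the positive

# Toy example: a feature aligned with the novel description incurs a much
# lower loss than one aligned with a known (negative) concept.
desc = np.array([0.9, 0.1, 0.0])                 # novel-concept description
known = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])              # known-concept descriptions
aligned = knowledge_transfer_loss(np.array([1.0, 0.0, 0.0]), desc, known)
misaligned = knowledge_transfer_loss(np.array([0.0, 1.0, 0.0]), desc, known)
```

Minimizing such a loss with respect to the visual (or shared) encoder's parameters, given only a single textual description as the positive, matches the efficiency claim in the abstract: no images of the novel concept are required.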