Are We Done with Object-Centric Learning?

Abstract

Object-centric learning (OCL) seeks to learn representations that encode only a single object, isolated from other objects or background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization, sample-efficient composition, and modeling of structured environments. Most research has focused on developing unsupervised mechanisms that separate objects into discrete slots in the representation space, evaluated using unsupervised object discovery. However, with recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. This achieves remarkable zero-shot performance on OOD object discovery benchmarks, is scalable to foundation models, and can handle a variable number of slots out of the box. Hence, the goal of OCL methods to obtain object-centric representations has been largely achieved. Despite this progress, a key question remains: How does the ability to separate objects within a scene contribute to broader OCL objectives, such as OOD generalization? We address this by investigating the OOD generalization challenge caused by spurious background cues through the lens of OCL. We propose a novel, training-free probe called Object-Centric Classification with Applied Masks (OCCAM), demonstrating that segmentation-based encoding of individual objects significantly outperforms slot-based OCL methods. However, challenges in real-world applications remain. We provide a toolbox for the OCL community to use scalable object-centric representations, and we focus on practical applications and fundamental questions, such as understanding object perception in human cognition. Our code is available at https://github.com/AlexanderRubinstein/OCCAM.
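As a rough illustration of what "separating objects in the pixel space and encoding them independently" can look like, the sketch below applies per-object masks to an image and encodes each masked copy on its own. It is not the OCCAM implementation. The masks are written by hand rather than produced by a segmentation model, the names `apply_masks` and `encode` are hypothetical, the zero fill for masked-out pixels is an assumption, and the mean-colour "encoder" is only a stand-in for a real image encoder.

```python
import numpy as np


def apply_masks(image, masks, fill=0.0):
    """Return one masked copy of `image` per object mask.

    image: float array of shape (H, W, C)
    masks: boolean array of shape (K, H, W), one mask per object
    fill:  value written to pixels outside the mask (an assumption)
    """
    image = np.asarray(image, dtype=np.float32)
    masks = np.asarray(masks, dtype=bool)
    # Broadcast (K, H, W, 1) against (1, H, W, C) to get (K, H, W, C).
    return np.where(masks[..., None], image[None], np.float32(fill))


def encode(masked_images):
    """Hypothetical stand-in encoder: mean colour over all pixels."""
    return masked_images.mean(axis=(1, 2))  # shape (K, C)


# Toy scene: a 4x4 RGB image whose left half is red and right half is green.
image = np.zeros((4, 4, 3), dtype=np.float32)
image[:, :2, 0] = 1.0
image[:, 2:, 1] = 1.0

# Hand-written masks standing in for the output of a segmentation model.
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, :, :2] = True  # covers the red region
masks[1, :, 2:] = True  # covers the green region

masked = apply_masks(image, masks)  # each copy shows one region only
features = encode(masked)           # one feature vector per mask
```

Each row of `features` depends only on the pixels inside its own mask, so a classifier applied to one row cannot pick up cues from the region covered by the other mask. Because the number of rows follows the number of masks, a variable number of objects needs no architectural change.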
