Are We Done with Object-Centric Learning?

Abstract

Object-centric learning (OCL) seeks to learn representations that encode only a single object, isolated from other objects and from background cues in a scene. This approach underpins various aims, including out-of-distribution (OOD) generalization, sample-efficient composition, and modeling of structured environments. Most research has focused on developing unsupervised mechanisms that separate objects into discrete slots in the representation space, evaluated on unsupervised object discovery. However, with recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. This achieves remarkable zero-shot performance on OOD object discovery benchmarks, scales to foundation models, and handles a variable number of slots out of the box. Hence, the OCL goal of obtaining object-centric representations has been largely achieved. Despite this progress, a key question remains: how does the ability to separate objects within a scene contribute to broader OCL objectives, such as OOD generalization? We address this question by investigating, through the lens of OCL, the OOD generalization challenge caused by spurious background cues. We propose a novel, training-free probe called Object-Centric Classification with Applied Masks (OCCAM) and demonstrate that segmentation-based encoding of individual objects significantly outperforms slot-based OCL methods. However, challenges in real-world applications remain. We provide a toolbox that lets the OCL community use scalable object-centric representations, so that it can focus on practical applications and fundamental questions, such as understanding object perception in human cognition. Our code is available at https://github.com/AlexanderRubinstein/OCCAM.
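The idea of separating objects in pixel space and encoding each one independently can be illustrated with a minimal, dependency-free Python sketch. It is not the OCCAM implementation. It assumes binary object masks have already been produced by some segmentation model, and `toy_encoder` is a hypothetical stand-in for a pretrained image encoder. The helper names (`apply_mask`, `toy_encoder`, `encode_objects`) and the choice to fill background pixels with 0.0 are illustrative assumptions.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as H x W rows of pixel values
Mask = List[List[bool]]    # binary object mask with the same H x W shape


def apply_mask(image: Image, mask: Mask, fill: float = 0.0) -> Image:
    """Keep pixels inside the mask and overwrite every other pixel with `fill` (assumed 0.0)."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [px if keep else fill for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def toy_encoder(image: Image) -> List[float]:
    """Hypothetical stand-in for a pretrained encoder: one feature per row (the row sum)."""
    return [sum(row) for row in image]


def encode_objects(
    image: Image,
    masks: List[Mask],
    encoder: Callable[[Image], List[float]] = toy_encoder,
) -> List[List[float]]:
    """Mask each object in pixel space, then encode every masked copy independently.

    The number of returned representations ("slots") equals len(masks).
    """
    return [encoder(apply_mask(image, m)) for m in masks]


if __name__ == "__main__":
    # One object occupying the middle column; the rest of the image is background.
    obj = [[False, True, False],
           [False, True, False]]
    img_a = [[0.9, 0.5, 0.9],
             [0.9, 0.5, 0.9]]  # object on a bright background
    img_b = [[0.1, 0.5, 0.1],
             [0.1, 0.5, 0.1]]  # same object on a dark background

    # Encoding the whole image mixes the background cue into the features...
    print(toy_encoder(img_a) == toy_encoder(img_b))  # False
    # ...while masking first makes the object encoding background-independent.
    print(encode_objects(img_a, [obj]) == encode_objects(img_b, [obj]))  # True
```

Because `masks` is an ordinary list, the sketch encodes however many objects the segmentation step returns, without fixing a slot count in advance.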
