Developing high-performing, yet interpretable models remains a critical challenge in modern AI. Concept-based models (CBMs) attempt to address this by extracting human-understandable concepts from a global encoding (e.g., image encoding) and then applying a linear classifier on the resulting concept activations, enabling transparent decision-making. However, their reliance on holistic image encodings limits their expressiveness in object-centric real-world settings and thus hinders their ability to solve complex vision tasks beyond single-label classification. To tackle these challenges, we introduce Object-Centric Concept Bottlenecks (OCB), a framework that combines the strengths of CBMs and pre-trained object-centric foundation models, boosting performance and interpretability. We evaluate OCB on complex image datasets and conduct a comprehensive ablation study to analyze key components of the framework, such as strategies for aggregating object-concept encodings. The results show that OCB outperforms traditional CBMs and allows one to make interpretable decisions for complex visual tasks.
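To make the architectural idea concrete, here is a minimal NumPy sketch of a concept bottleneck (encoding → concept activations → linear classifier) and an object-level variant that aggregates per-object concept activations before classification. All dimensions, weights, and the choice of max-pooling as the aggregation strategy are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper)
enc_dim, n_concepts, n_classes = 16, 6, 3
W_concept = rng.normal(size=(enc_dim, n_concepts))  # stand-in concept extractor
W_cls = rng.normal(size=(n_concepts, n_classes))    # linear classifier on concepts

def cbm_predict(image_encoding):
    """Standard CBM: one global image encoding -> concepts -> class scores."""
    concepts = 1 / (1 + np.exp(-image_encoding @ W_concept))  # concept activations
    return concepts, concepts @ W_cls

def ocb_predict(object_encodings, aggregate=np.max):
    """OCB-style sketch: per-object concept activations, aggregated (here: max)."""
    per_obj = 1 / (1 + np.exp(-object_encodings @ W_concept))  # (n_objects, n_concepts)
    pooled = aggregate(per_obj, axis=0)                        # aggregate over objects
    return pooled, pooled @ W_cls                              # interpretable scores

objects = rng.normal(size=(4, enc_dim))  # e.g., encodings of 4 detected objects
pooled_concepts, scores = ocb_predict(objects)
print(pooled_concepts.shape, scores.shape)  # (6,) (3,)
```

Because the final classifier remains linear over (pooled) concept activations, each class score still decomposes into per-concept contributions, which is what preserves interpretability in both variants.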