Object-Centric Learning (OCL) represents dense image or video pixels as sparse object features. Representative methods utilize discrete representations composed of Variational Autoencoder (VAE) template features to suppress pixel-level information redundancy and to guide object-level feature aggregation. The most recent advance, Grouped Discrete Representation (GDR), further decomposes these template features into attributes. However, its naive channel grouping as the decomposition may erroneously group channels belonging to different attributes together and discretize them into sub-optimal template attributes, which loses information and harms expressivity. We propose Organized GDR (OGDR), which organizes channels belonging to the same attribute together so that features are correctly decomposed into attributes. In unsupervised segmentation experiments, OGDR is consistently superior to GDR in augmenting classical transformer-based OCL methods, and it even improves state-of-the-art diffusion-based ones. Codebook PCA and representation-similarity analyses show that, compared with GDR, OGDR eliminates redundancy and better preserves information for guiding object representation learning. The source code is available in the supplementary material.
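The grouped discretization described above can be sketched as follows. This is an illustrative toy in NumPy, not the authors' implementation: all names, shapes, and the contiguous-reshape grouping are assumptions. The sketch shows naive channel grouping as in GDR, where a D-channel feature is split into G contiguous channel groups and each group is quantized against its own small codebook of template attributes; OGDR's contribution would correspond to reordering or projecting channels before this split so that each group holds channels of a single attribute.

```python
import numpy as np

# Illustrative sketch of grouped discrete representation (not the paper's code).
# A D-channel feature is split into G channel groups; each group is quantized
# independently to its nearest entry in a per-group codebook of "attributes".

rng = np.random.default_rng(0)
D, G, K = 8, 2, 4                      # feature dim, attribute groups, codebook size
feat = rng.normal(size=(D,))           # one dense feature vector

# One small codebook per group (hypothetical random templates).
codebooks = [rng.normal(size=(K, D // G)) for _ in range(G)]

def grouped_quantize(feat, G, codebooks):
    """Quantize each contiguous channel group to its nearest codebook entry."""
    groups = feat.reshape(G, -1)       # naive channel grouping, as in GDR
    parts = []
    for g, cb in zip(groups, codebooks):
        idx = np.argmin(((cb - g) ** 2).sum(axis=1))  # nearest template attribute
        parts.append(cb[idx])
    return np.concatenate(parts)       # recomposed discrete feature

q = grouped_quantize(feat, G, codebooks)
assert q.shape == feat.shape
```

Because the reshape simply cuts channels into contiguous blocks, channels that belong to different underlying attributes can land in the same group; organizing the channel order before grouping is what the abstract identifies as the fix.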