Object-Centric Learning (OCL) represents dense image or video pixels as sparse object features. Representative methods utilize discrete representations composed of Variational Autoencoder (VAE) template features to suppress pixel-level information redundancy and to guide object-level feature aggregation. The most recent advance, Grouped Discrete Representation (GDR), further decomposes these template features into attributes. However, its naive channel grouping as the decomposition may erroneously group channels belonging to different attributes together and discretize them into sub-optimal template attributes, which loses information and harms expressivity. We propose Organized GDR (OGDR), which organizes channels belonging to the same attribute together so that features are correctly decomposed into attributes. In unsupervised segmentation experiments, OGDR is consistently superior to GDR in augmenting classical transformer-based OCL methods, and it even improves state-of-the-art diffusion-based ones. Codebook PCA and representation-similarity analyses show that, compared with GDR, OGDR eliminates redundancy and better preserves information for guiding object representation learning. The source code is available in the supplementary material.
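The grouped discretization described above can be sketched as follows. This is an illustrative toy in NumPy, not the authors' implementation: all names, shapes, and the contiguous-reshape grouping are assumptions. The sketch shows naive channel grouping as in GDR, where a D-channel feature is split into G contiguous channel groups and each group is quantized against its own small codebook of template attributes; OGDR's contribution would correspond to reordering or projecting channels before this split so that each group holds channels of a single attribute.

```python
import numpy as np

# Illustrative sketch of grouped discrete representation (not the paper's code).
# A D-channel feature is split into G channel groups; each group is quantized
# independently to its nearest entry in a per-group codebook of "attributes".

rng = np.random.default_rng(0)
D, G, K = 8, 2, 4                      # feature dim, attribute groups, codebook size
feat = rng.normal(size=(D,))           # one dense feature vector

# One small codebook per group (hypothetical random templates).
codebooks = [rng.normal(size=(K, D // G)) for _ in range(G)]

def grouped_quantize(feat, G, codebooks):
    """Quantize each contiguous channel group to its nearest codebook entry."""
    groups = feat.reshape(G, -1)       # naive channel grouping, as in GDR
    parts = []
    for g, cb in zip(groups, codebooks):
        idx = np.argmin(((cb - g) ** 2).sum(axis=1))  # nearest template attribute
        parts.append(cb[idx])
    return np.concatenate(parts)       # recomposed discrete feature

q = grouped_quantize(feat, G, codebooks)
assert q.shape == feat.shape
```

Because the reshape simply cuts channels into contiguous blocks, channels that belong to different underlying attributes can land in the same group; organizing the channel order before grouping is what the abstract identifies as the fix.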