Despite tremendous progress over the past decade, deep learning methods generally fall short of human-level systematic generalization. It has been argued that explicitly capturing the underlying structure of data should allow connectionist systems to generalize in a more predictable and systematic manner. Indeed, evidence in humans suggests that interpreting the world in terms of symbol-like compositional entities may be crucial for intelligent behavior and high-level reasoning. Another common limitation of deep learning systems is that they require large amounts of training data, which can be expensive to obtain. In representation learning, large datasets are leveraged to learn generic data representations that may be useful for efficient learning of arbitrary downstream tasks. This thesis is about structured representation learning. We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure. In the first part of the thesis, we focus on representations that disentangle the explanatory factors of variation of the data. We scale up disentangled representation learning to a novel robotic dataset, and perform a systematic large-scale study on the role of pretrained representations for out-of-distribution generalization in downstream robotic tasks. The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities, such as objects in visual scenes. Object-centric learning methods learn to form meaningful entities from unstructured input, enabling symbolic information processing on a connectionist substrate. In this study, we train a selection of methods on several common datasets, and investigate their usefulness for downstream tasks and their ability to generalize out of distribution.