We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is well studied as dictionary learning and factor analysis. In this work, we propose a combinatorial model in which to study this question, motivated by the way objects occlude each other in a scene to form an image. First, we identify a property of a set of low-dimensional components, which we call "well-structuredness", that ensures no two components in the set are too similar. We show that well-structuredness is sufficient for learning the set of latent components underlying a set of sample instances. We then consider the following problem: given a set of components and an instance generated from some unknown subset of them, identify which parts of the instance arise from which components. We consider two variants: (1) determine the minimal number of components required to explain the instance; (2) determine the correct explanation for as many locations as possible. For the latter goal, we also devise a version that is robust to adversarial corruptions, under only a slightly stronger assumption on the components. Finally, we show that the learning problem is computationally infeasible in the absence of any assumptions.