Understanding a visual scene by inferring the identities and poses of its individual objects remains an open problem. Here we propose a neuromorphic solution that uses an efficient factorization network based on three key concepts: (1) a computational framework based on Vector Symbolic Architectures (VSA) with complex-valued vectors; (2) the design of Hierarchical Resonator Networks (HRN) to handle the fact that translation and rotation in visual scenes do not commute when applied in combination; (3) the design of a multi-compartment spiking phasor neuron model for implementing complex-valued resonator networks on neuromorphic hardware. The VSA framework uses vector binding operations to produce generative image models in which binding acts as the equivariant operation for geometric transformations. A scene can therefore be described as a sum of vector products, which a resonator network can efficiently factorize to infer the objects and their poses. The HRN enables a partitioned architecture in which vector binding is equivariant for horizontal and vertical translation within one partition, and for rotation and scaling within the other. The spiking neuron model allows the resonator network to be mapped onto efficient, low-power neuromorphic hardware. We demonstrate the approach on synthetic scenes composed of simple 2D shapes undergoing rigid geometric transformations and color changes. A companion paper demonstrates the same approach in real-world application scenarios for machine vision and robotics.
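To make the idea of a scene as a product of vectors more concrete, the following is a minimal NumPy sketch, not the implementation described here. It assumes unit-magnitude complex (phasor) vectors, binding by element-wise multiplication, positions encoded as integer powers of a random phasor vector, and a plain (non-hierarchical, non-spiking) resonator loop with a phase-only nonlinearity. All names (`to_phasor`, `resonator`, `H`, `V`, `shapes`) and all sizes are hypothetical choices for illustration. The scene contains a single object, so the sum of vector products reduces to one product.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096                 # vector dimension (illustrative assumption)
N_SHAPES, N_POS = 5, 6   # codebook sizes (illustrative assumption)

def to_phasor(z):
    """Project each component onto the unit circle, keeping only its phase."""
    return z / np.maximum(np.abs(z), 1e-12)

# Random base phases; row k of H (or V) is the base phasor vector raised to
# the integer power k, which encodes horizontal (or vertical) position k.
theta_h = rng.uniform(-np.pi, np.pi, N)
theta_v = rng.uniform(-np.pi, np.pi, N)
positions = np.arange(N_POS)
H = np.exp(1j * np.outer(positions, theta_h))
V = np.exp(1j * np.outer(positions, theta_v))
shapes = np.exp(1j * rng.uniform(-np.pi, np.pi, (N_SHAPES, N)))

# Generative model: one object = shape bound with its horizontal and vertical
# position (binding is element-wise multiplication).
scene = shapes[2] * H[3] * V[1]

# Equivariance: translating by 2 steps horizontally corresponds to binding
# the scene vector with the base vector raised to the power 2.
shifted = scene * np.exp(1j * 2 * theta_h)

def resonator(s, codebooks, n_iter=50):
    """Factorize s into one entry per codebook; return the entry indices."""
    # Start every factor estimate as the superposition of all its candidates.
    est = [to_phasor(C.sum(axis=0)) for C in codebooks]
    for _ in range(n_iter):
        for i, C in enumerate(codebooks):
            # Unbind the current estimates of all other factors from s.
            others = np.prod([e for j, e in enumerate(est) if j != i], axis=0)
            unbound = s * np.conj(others)
            # Clean up: weight each codebook entry by its similarity to the
            # unbound vector, superpose, and keep only the phases.
            weights = C.conj() @ unbound
            est[i] = to_phasor(weights @ C)
    # Read out the best-matching codebook entry for every factor.
    return [int(np.argmax(np.abs(C.conj() @ e))) for C, e in zip(codebooks, est)]

decoded = resonator(scene, [shapes, H, V])
print("shape, x, y indices:", decoded)
```

With these sizes the search space (5 × 6 × 6 combinations) is far smaller than the vector dimension, so the estimates would be expected to settle on the indices used to build `scene`. Rotation and scaling, the hierarchical partitioning, and the spiking neuron model are deliberately left out of this sketch.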