We propose a system for visual scene analysis and recognition based on encoding the sparse, latent feature-representation of an image into a high-dimensional vector that is subsequently factorized to parse scene content. The sparse feature representation is learned from image statistics via convolutional sparse coding, while scene parsing is performed by a resonator network. The integration of sparse coding with the resonator network increases the capacity of distributed representations and reduces collisions in the combinatorial search space during factorization. We find that for this problem the resonator network is capable of fast and accurate vector factorization, and we develop a confidence-based metric that assists in tracking the convergence of the resonator network.
翻译:我们提出一种视觉场景分析与识别系统:首先将图像的稀疏隐式特征表示编码为高维向量,再通过分解该向量来解析场景内容。稀疏特征表示通过卷积稀疏编码从图像统计特征中学习获得,场景解析则由谐振网络完成。稀疏编码与谐振网络的融合增强了分布式表示的容量,同时降低了分解过程中组合搜索空间的碰撞概率。实验表明,谐振网络在该问题中可实现快速且精确的向量分解,我们据此开发了基于置信度的度量指标,用于辅助追踪谐振网络的收敛过程。