Factorizers for Distributed Sparse Block Codes

Distributed sparse block codes (SBCs) provide compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge, however, is to disentangle, or factorize, such data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when the factorizer is queried with noisy SBCs, in which symbol representations are relaxed due to perceptual uncertainty and the approximations made when modern neural networks generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible, and hence generalized, form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an $\ell_\infty$-based similarity metric. Its random sampling mechanism, in combination with the search in superposition, makes it possible to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Second, the proposed factorizer maintains its high accuracy when queried with noisy product vectors generated by deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby $C$ trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer with $F$ factor codebooks, each containing $\sqrt[F]{C}$ fixed codevectors. We provide a methodology to flexibly integrate our factorizer into the classification layer of CNNs with a novel loss function. We demonstrate the feasibility of our method on four deep CNN architectures using the CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the numbers of parameters and operations are significantly reduced compared to the FCL.
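To make the factorization problem concrete, the following is a minimal NumPy sketch of the setting, not the proposed iterative factorizer. It assumes that codevectors have one-hot blocks and that binding is blockwise circular convolution, a common choice for SBCs that the abstract itself does not spell out. All function names and sizes (`random_codebook`, `bind`, `unbind`, `factorize_brute_force`, `F = 2`, `M = 10`, 4 blocks of length 64) are hypothetical and chosen only for illustration. The exhaustive search shown here is the baseline that a factorizer is meant to avoid, since its cost grows as $M^F$.

```python
import itertools

import numpy as np


def random_codebook(num_vectors, num_blocks, block_len, rng):
    """Return one-hot sparse block codevectors, shape (num_vectors, num_blocks, block_len)."""
    idx = rng.integers(0, block_len, size=(num_vectors, num_blocks))
    return np.eye(block_len)[idx]


def bind(x, y):
    """Blockwise circular convolution (assumed binding operation), computed via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x, axis=-1) * np.fft.fft(y, axis=-1), axis=-1))


def unbind(p, y):
    """Blockwise circular correlation; inverts bind when y has one-hot blocks."""
    return np.real(np.fft.ifft(np.fft.fft(p, axis=-1) * np.conj(np.fft.fft(y, axis=-1)), axis=-1))


def factorize_brute_force(product, codebooks):
    """Exhaustive baseline: try every index combination and keep the most similar one."""
    best, best_sim = None, -np.inf
    for combo in itertools.product(*(range(len(cb)) for cb in codebooks)):
        candidate = codebooks[0][combo[0]]
        for f in range(1, len(codebooks)):
            candidate = bind(candidate, codebooks[f][combo[f]])
        sim = float(np.sum(candidate * product))  # dot-product similarity
        if sim > best_sim:
            best, best_sim = combo, sim
    return best


rng = np.random.default_rng(0)
F, M, B, L = 2, 10, 4, 64  # factors, codevectors per codebook, blocks, block length (illustrative)
codebooks = [random_codebook(M, B, L, rng) for _ in range(F)]

# F codebooks with M codevectors each implicitly represent M**F combinations.
num_combinations = M ** F
num_stored_codevectors = F * M

# Compose a product vector from one codevector per factor, then recover the indices.
product = bind(codebooks[0][3], codebooks[1][7])
recovered = factorize_brute_force(product, codebooks)
print(num_combinations, num_stored_codevectors, recovered)
```

With these illustrative sizes, 20 stored codevectors stand in for 100 combinations, which mirrors the $\sqrt[F]{C}$ codebook size stated in the abstract for $C = 100$ and $F = 2$. If one factor is already known, `unbind(product, codebooks[1][7])` returns the other factor directly under the same assumptions.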
