Efficient Vector Symbolic Architectures from Histogram Recovery

Vector symbolic architectures (VSAs) are a family of information representation techniques that enable composition, i.e., the creation of complex information structures from atomic vectors via binding and superposition. They have recently found wide-ranging applications in neurosymbolic artificial intelligence (AI) systems and in hardware systems. Raviv recently proposed using random linear codes in VSAs, suggesting that their subcode structure enables efficient unbinding while preserving the quasi-orthogonality that is necessary for neural processing. Yet random linear codes are difficult to decode under noise, which severely limits the resulting VSA's ability to support recovery, i.e., the retrieval of information objects and their attributes from a noisy compositional representation. In this work we bridge this gap using coding-theoretic tools. First, we argue that the concatenation of Reed-Solomon and Hadamard codes is suitable for VSAs, because the resulting codewords are mutually quasi-orthogonal (a folklore result). Second, we show that the resulting compositional representations can be recovered by solving a problem we call histogram recovery. In histogram recovery, a collection of $N$ histograms over a finite field is given as input, and one must find a collection of Reed-Solomon codewords of length $N$ whose entry-wise symbol frequencies obey those histograms. We present an optimal solution to the histogram recovery problem using algorithms related to list decoding, and analyze the resulting noise resilience. Our results give rise to a noise-resilient VSA with formal guarantees regarding efficient encoding, quasi-orthogonality, and recovery, without relying on any heuristics or training, and while operating at improved parameters relative to similar solutions such as the Hadamard code.
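To make the two ingredients above concrete, the following is a minimal toy sketch, not the construction or the algorithm of the paper. It assumes a Reed-Solomon code over GF(8) with illustrative parameters (length 8, dimension 2), an inner binary Hadamard code written in ±1 form, plain entry-wise addition as superposition, and a noiseless setting. The recovery step is a brute-force search standing in for the list-decoding-based method the abstract refers to. All names (`gf_mul`, `rs_encode`, `hadamard`, `histograms`, `recover`) are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

Q = 8  # field size, also the inner Hadamard block length (toy value, assumed)
N = 8  # Reed-Solomon length: evaluate at every element of GF(8)
K = 2  # Reed-Solomon dimension: polynomials of degree < K


def gf_mul(a, b):
    """Multiply two elements of GF(8) = GF(2)[x] / (x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:      # degree reached 3: reduce by x^3 + x + 1
            a ^= 0b1011
    return r


def rs_encode(msg):
    """Evaluate the polynomial with coefficients msg at all field elements."""
    out = []
    for x in range(Q):
        acc = 0
        for coef in reversed(msg):   # Horner's rule, addition in GF(8) is XOR
            acc = gf_mul(acc, x) ^ coef
        out.append(acc)
    return out


def hadamard(s):
    """+/-1 Hadamard codeword of symbol s: entry x is (-1)^<s, x>."""
    return [1 - 2 * (bin(s & x).count("1") % 2) for x in range(Q)]


def encode(msg):
    """Concatenated code: Reed-Solomon outer code, Hadamard inner code."""
    return [e for sym in rs_encode(msg) for e in hadamard(sym)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def histograms(vec):
    """Per-position symbol counts, read off by correlating each block
    with every Hadamard codeword (exact only in the noiseless case)."""
    hists = []
    for i in range(N):
        block = vec[i * Q:(i + 1) * Q]
        counts = {s: dot(block, hadamard(s)) // Q for s in range(Q)}
        hists.append(Counter({s: c for s, c in counts.items() if c > 0}))
    return hists


def recover(hists, m):
    """Brute-force histogram recovery: all sets of m distinct messages
    whose codewords reproduce every histogram exactly."""
    cands = [msg for msg in product(range(Q), repeat=K)
             if all(hists[i][c] > 0 for i, c in enumerate(rs_encode(msg)))]
    return [combo for combo in combinations(cands, m)
            if all(Counter(rs_encode(msg)[i] for msg in combo) == hists[i]
                   for i in range(N))]


msgs = [(1, 2), (3, 0), (5, 6)]  # arbitrary example messages
superposed = [sum(col) for col in zip(*(encode(m) for m in msgs))]
hists = histograms(superposed)
solutions = recover(hists, len(msgs))

# Largest inner product between two distinct codewords of length N * Q.
max_ip = max(dot(encode(a), encode(b))
             for a, b in combinations(product(range(Q), repeat=K), 2))
print(solutions, max_ip)
```

In this toy setting, two distinct Reed-Solomon codewords agree in at most K - 1 positions, and distinct Hadamard codewords are orthogonal, so the inner product of two distinct concatenated codewords is at most Q(K - 1) out of a squared norm of NQ. This is the quasi-orthogonality property in miniature. The exhaustive search in `recover` is feasible only because the parameters are tiny, and it returns every consistent collection rather than assuming the solution is unique.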
