Vector symbolic architectures (VSAs) are a family of information representation techniques that enable composition, i.e., creating complex information structures from atomic vectors via binding and superposition, and have found wide-ranging applications in neurosymbolic artificial intelligence (AI) systems. Raviv recently proposed the use of random linear codes in VSAs, suggesting that their subcode structure enables efficient binding while preserving the quasi-orthogonality that is necessary for neural processing. Yet random linear codes are difficult to decode under noise, which severely limits the resulting VSA's ability to support recovery, i.e., the retrieval of information objects and their attributes from a noisy compositional representation. In this work, we bridge this gap by utilizing coding-theoretic tools. First, we argue that the concatenation of Reed-Solomon and Hadamard codes is suitable for VSAs, due to the mutual quasi-orthogonality of the resulting codewords (a folklore result). Second, we show that recovery of the resulting compositional representations can be performed by solving a problem we call histogram recovery. In histogram recovery, a collection of $N$ histograms over a finite field is given as input, and one must find a collection of Reed-Solomon codewords of length $N$ whose entry-wise symbol frequencies obey those histograms. We present an optimal solution to the histogram recovery problem using algorithms related to list-decoding, and analyze the resulting noise resilience. Our results give rise to a noise-resilient VSA with formal guarantees regarding efficient encoding, quasi-orthogonality, and recovery, without relying on any heuristics or training, and while operating at improved parameters relative to similar solutions such as the Hadamard code.
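To make the histogram recovery problem statement concrete, the following is a minimal sketch of a noiseless toy instance. It is a brute-force illustration of the problem definition only, not the list-decoding-based algorithm referred to above. The parameters (`P = 5`, `K = 2`, `N = 4`), the restriction to a prime field, the choice of evaluation points $0, \dots, N-1$, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

P = 5  # field size; assumed prime so that arithmetic mod P is a field
K = 2  # Reed-Solomon dimension: message polynomials have degree < K
N = 4  # code length: evaluate at the points 0..N-1 (requires N <= P)


def rs_encode(msg):
    """Evaluate the polynomial with coefficients msg (lowest degree first)
    at the points 0..N-1 over GF(P)."""
    return tuple(
        sum(c * pow(x, i, P) for i, c in enumerate(msg)) % P
        for x in range(N)
    )


def histograms(codewords):
    """Return one symbol-count histogram per coordinate."""
    return [Counter(cw[j] for cw in codewords) for j in range(N)]


def recover_brute_force(hists, m):
    """Return every multiset of m codewords whose per-coordinate
    histograms equal hists. Exponential in m; for illustration only."""
    all_cw = sorted(rs_encode(msg) for msg in product(range(P), repeat=K))
    return [
        cand
        for cand in combinations_with_replacement(all_cw, m)
        if histograms(cand) == hists
    ]


# A hidden collection of two codewords; only its histograms are observed.
secret = [rs_encode((1, 2)), rs_encode((3, 4))]
hists = histograms(secret)
solutions = recover_brute_force(hists, len(secret))
print(solutions)
```

In this toy instance the histograms determine the collection uniquely; in general, several collections may be consistent with the same histograms, which is why the sketch returns a list of candidates.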