Associative memory and probabilistic modeling are two fundamental topics in artificial intelligence. The first studies recurrent neural networks designed to denoise, complete, and retrieve data, whereas the second studies learning and sampling from probability distributions. Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables a useful flow of ideas in both directions. We showcase four examples. First, we propose new energy-based models that flexibly adapt their energy functions to new in-context datasets, an approach we term \textit{in-context learning of energy functions}. Second, we propose two new associative memory models: one that dynamically creates new memories as necessitated by the training data using Bayesian nonparametrics, and another that explicitly computes proportional memory assignments using the evidence lower bound. Third, using tools from associative memory, we analytically and numerically characterize the memory capacity of Gaussian kernel density estimators, a widespread tool in probabilistic modeling. Fourth, we study a widespread implementation choice in transformers -- normalization followed by self-attention -- to show it performs clustering on the hypersphere. Altogether, this work urges further exchange of useful ideas between these two continents of artificial intelligence.
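The correspondence between energy functions and negative log likelihoods can be made concrete with a Gaussian kernel density estimator. The following is a minimal sketch, not the paper's implementation: the function names `kde_energy` and `retrieve`, the bandwidth `sigma`, and the toy data are all illustrative assumptions. It treats the negative log density of the estimator as an energy and runs gradient descent on it with step size `sigma**2`, which reduces to the familiar mean-shift update.

```python
import numpy as np

def kde_energy(x, memories, sigma):
    """Energy = negative log of an unnormalized Gaussian KDE at x (illustrative)."""
    sq_dists = np.sum((memories - x) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    m = logits.max()
    # Stable log-mean-exp; the Gaussian normalizing constant is dropped
    # because it does not depend on x.
    return -(m + np.log(np.mean(np.exp(logits - m))))

def retrieve(x, memories, sigma, n_steps=50):
    """Gradient descent on kde_energy with step size sigma**2 (a mean-shift update)."""
    for _ in range(n_steps):
        sq_dists = np.sum((memories - x) ** 2, axis=1)
        logits = -sq_dists / (2.0 * sigma ** 2)
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()       # softmax over stored patterns
        x = weights @ memories         # move to the weighted mean of the memories
    return x

# Toy data (assumed for illustration): 5 stored patterns in 16 dimensions.
rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))
query = memories[2] + 0.1 * rng.normal(size=16)   # noisy copy of pattern 2
retrieved = retrieve(query, memories, sigma=0.5)
```

Under these assumptions, each update cannot increase the energy, so descending the energy is the same as ascending the estimated log density, and a noisy query settles near the stored pattern it was corrupted from.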