We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory configurations of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code, which allows us to cast the memorization problem in KHMs as a point-arrangement problem on a hypersphere. We show that KHMs attain optimal capacity when the feature space allows the memories to form an optimal spherical code. This unique perspective leads to: (i) an analysis of how KHMs achieve optimal memory capacity, together with the corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature, providing the first tight and optimal asymptotic memory capacity for modern Hopfield models; (ii) a sub-linear time algorithm $\mathtt{U}\text{-}\mathtt{Hop}$+ that reaches the optimal capacity of KHMs; (iii) an analysis of how the required feature dimension scales with the number of stored memories. These results improve both the retrieval capability of KHMs and the representation learning of the corresponding transformers. Experimentally, we provide thorough numerical results to support our theoretical findings.