We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code. This enables us to cast the memorization problem in KHMs as a point arrangement problem on a hypersphere. We show that KHMs achieve optimal capacity when the feature space allows the memories to form an optimal spherical code. This perspective leads to: (i) An analysis of how KHMs achieve optimal memory capacity, together with an identification of the corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and asymptotically optimal memory capacity result for modern Hopfield models. (ii) A sub-linear time algorithm $\mathtt{U}\text{-}\mathtt{Hop}$+ that reaches KHMs' optimal capacity. (iii) An analysis of how the required feature dimension scales with the number of stored memories. These efforts improve both the retrieval capability of KHMs and the representation learning of the corresponding transformers. Experimentally, we provide thorough numerical results to support our theoretical findings.
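To make the spherical-code viewpoint concrete: viewing stored memories as points on the unit hypersphere, a natural quality measure is the maximum pairwise cosine similarity (coherence); a lower coherence means a better-separated code, which the analysis above connects to higher memorization capacity. The following is a minimal illustrative sketch (the function name and setup are hypothetical, not from the paper):

```python
import numpy as np

def spherical_code_coherence(memories):
    """Project memories onto the unit hypersphere and return the
    maximum pairwise inner product (coherence). Lower values
    indicate a better-separated spherical code."""
    X = memories / np.linalg.norm(memories, axis=1, keepdims=True)
    G = X @ X.T                   # pairwise cosine similarities
    np.fill_diagonal(G, -np.inf)  # ignore self-similarity
    return G.max()

# Example: 100 random memories in a 64-dimensional feature space
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 64))
mu = spherical_code_coherence(M)
```

A feature map that lowers this coherence spreads the memories more evenly over the sphere, which is the geometric condition under which the capacity analysis applies.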