We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
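The core mechanism described above can be illustrated numerically. The sketch below, using the standard modern-Hopfield energy E(q) = -(1/β)·log Σᵢ exp(β xᵢᵀq) + ½‖q‖², shows that a single softmax-attention readout over the context tokens is exactly one gradient-descent step (step size 1) on that energy, with the context tokens as stored memories and the query as the initial state. The data, dimensions, and inverse temperature `beta` are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctx, dim, beta = 8, 4, 2.0
X = rng.normal(size=(n_ctx, dim))   # context tokens (memories), one per row
q = rng.normal(size=dim)            # query token, the initial state

def softmax(z):
    e = np.exp(z - z.max())         # shifted for numerical stability
    return e / e.sum()

def energy(v):
    # DAM energy: -(1/beta) * logsumexp(beta * X v) + 0.5 * ||v||^2
    scores = beta * X @ v
    lse = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    return -lse / beta + 0.5 * v @ v

def energy_grad(v):
    # grad E(v) = v - X^T softmax(beta * X v)
    return v - X.T @ softmax(beta * X @ v)

attn_out = X.T @ softmax(beta * X @ q)   # one attention readout over context
gd_step = q - 1.0 * energy_grad(q)       # one gradient step on E, lr = 1

assert np.allclose(attn_out, gd_step)    # the two updates coincide exactly
assert energy(attn_out) <= energy(q)     # and the energy does not increase
```

The equality is algebraic (the quadratic term of E cancels the identity part of the step), and the energy decrease follows from the concave-convex structure of E, so the assertions hold for any memories and query.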