In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to that of well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest a correspondence between certain energy-based associative memory models and Transformers or MLP-Mixer, and shed some light on the theoretical background of Transformer-type architecture design. In this paper, we generalize the correspondence to the recently introduced hierarchical Hopfield network and obtain iMixer, a novel generalization of the MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of an invertible, implicit, and iterative mixing module. We evaluate the model on image classification tasks with various datasets and find that iMixer achieves reasonable improvements over the vanilla MLP-Mixer baseline. The results imply that the correspondence between Hopfield networks and Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
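The abstract describes the iMixer module only at a high level. The sketch below illustrates one plausible reading of "MLP layers that propagate from the output side to the input side" under stated assumptions: the module output `y` is defined implicitly by `x = y + MLP(y)`, which is the inverse of an ordinary residual MLP block, and it is computed iteratively by fixed-point iteration. The function names, the tanh activation, the spectral-norm rescaling that makes the iteration a contraction, and the sizes are all hypothetical illustration choices, not the paper's implementation.

```python
import numpy as np


def mlp(y, w1, w2):
    """Hypothetical two-layer MLP applied along the last axis (tanh is 1-Lipschitz)."""
    return np.tanh(y @ w1) @ w2


def rescale(w, target):
    """Rescale w so that its spectral norm (largest singular value) equals target."""
    return w * (target / np.linalg.norm(w, 2))


def implicit_mixing(x, w1, w2, max_iter=100, tol=1e-12):
    """Find y with y + mlp(y) = x via the fixed-point iteration y <- x - mlp(y).

    Assumption: Lip(mlp) <= ||w1||_2 * ||w2||_2 < 1, so the iteration is a
    contraction and converges to the unique solution (Banach fixed-point theorem).
    """
    y = x.copy()
    for _ in range(max_iter):
        y_next = x - mlp(y, w1, w2)
        if np.max(np.abs(y_next - y)) < tol:
            return y_next
        y = y_next
    return y


rng = np.random.default_rng(0)
n_tokens, dim, hidden = 4, 8, 16
w1 = rescale(rng.standard_normal((dim, hidden)), 0.5)
w2 = rescale(rng.standard_normal((hidden, dim)), 0.5)  # so Lip(mlp) <= 0.25

x = rng.standard_normal((n_tokens, dim))
y = implicit_mixing(x, w1, w2)

# Residual of the implicit equation y + mlp(y) = x; should be close to zero.
print(np.max(np.abs(y + mlp(y, w1, w2) - x)))
```

In a Mixer-style token-mixing layer, the same routine would be applied to the transposed patch matrix so that the MLP acts across tokens. Fixed-point iteration is only one possible solver for the implicit equation; the sketch does not claim it is the one used in the paper.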