Transformers have succeeded across a wide range of domains and tasks, but they struggle with long input sequences because of their limited capacity. Increasing the input length is one solution, but stretching it indefinitely is unrealistic. Moreover, humans selectively remember and use only the relevant information in their inputs, whereas Transformers process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory, a major theory explaining human memory formation, to enhance long-term dependencies in neural networks. Memoria stores and retrieves units of information called engrams at multiple memory levels (working memory, short-term memory, and long-term memory), using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models such as BERT and GPT, we show that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperforms existing methods in sorting, language modeling, and long-text classification.
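As a rough illustration of Hebb's rule in general ("units that fire together wire together"), the sketch below strengthens the connection weight between every pair of units that are active at the same time. It is not Memoria's actual update, which the abstract does not specify; the function name `hebbian_update`, the dictionary-based weight store, and the learning rate `lr` are all hypothetical choices made for this example.

```python
from itertools import combinations


def hebbian_update(weights, active, lr=1.0):
    """Generic Hebbian step: add `lr` to the weight of each co-active pair.

    `weights` maps an ordered pair (i, j) with i < j to a float.
    `active` is the set of unit ids that are active together.
    This is an assumed, simplified rule, not the paper's formulation.
    """
    for i, j in combinations(sorted(active), 2):
        weights[(i, j)] = weights.get((i, j), 0.0) + lr
    return weights


# Units 1 and 2 are co-active twice, so their connection ends up strongest.
weights = {}
hebbian_update(weights, {0, 1, 2})
hebbian_update(weights, {1, 2})
```

Under this assumed rule, pairs that are repeatedly active together accumulate larger weights, which is the general idea behind connection weights that "change according to Hebb's rule".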