Despite their effectiveness in various fields, Transformer-based models still face the structural limitation of a fixed context length when processing long sequential inputs. Although various external memory techniques have been introduced, most of them fail to avoid fateful forgetting, in which even the most important memories are inevitably forgotten after a sufficient number of time steps. We designed Memoria, a memory system for artificial neural networks that draws inspiration from humans and applies various neuroscientific and psychological theories of memory. Experimentally, we demonstrate the effectiveness of Memoria on tasks such as sorting and language modeling, where it surpasses conventional techniques.