Existing question answering methods often assume that the input content (e.g., documents or videos) remains accessible while the task is being solved. As an alternative, memory networks were introduced to mimic the human process of incrementally comprehending and compressing information into a fixed-capacity memory. However, these models learn how to maintain memory only by backpropagating errors in the answers through the entire network. Humans, by contrast, are thought to have effective mechanisms for boosting their memorization capacity, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs, so that it memorizes the information important for solving question answering tasks from streaming data. The proposed mechanisms are applied in a self-supervised manner during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence dataset (bAbI) as well as long-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the importance of the proposed mechanisms for memory models.