Most existing forecasting systems are memory-based: they attempt to mimic human forecasting ability through various memory mechanisms, and they have advanced the temporal modeling of memory dependency. An obvious weakness of this paradigm, however, is that it can model only limited historical dependence and cannot transcend the past. In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm that models the entire temporal structure, including the past, present, and future. Based on this idea, we present the Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach to online action detection and anticipation. Owing to this design, MAT can handle both tasks in a unified manner. We evaluate MAT on four challenging benchmarks (TVSeries, THUMOS'14, HDD, and EPIC-Kitchens-100) for online action detection and anticipation, where it significantly outperforms all existing methods. Code is available at https://github.com/Echo0125/Memory-and-Anticipation-Transformer.