The transformer neural network architecture enables autoregressive sequence-to-sequence modeling through attention layers. Originally developed for machine translation, it has since revolutionized natural language processing. More recently, transformers have been applied to a wide variety of pattern recognition tasks, particularly in computer vision. In this literature review, we describe major advances in computer vision that utilize transformers. We then focus specifically on Multi-Object Tracking (MOT) and discuss how transformers are becoming increasingly competitive in state-of-the-art MOT work, while still lagging behind traditional deep learning methods.