Transformer for Object Re-Identification: A Survey

Object Re-identification (Re-ID) aims to identify specific objects across different times and scenes and is a widely researched task in computer vision. For a long time, the field has been driven predominantly by deep learning based on convolutional neural networks. In recent years, the emergence of Vision Transformers has spurred a growing number of studies on Transformer-based Re-ID, which have continuously broken performance records and brought significant progress to the field. Transformers offer a powerful, flexible, and unified solution that serves a wide array of Re-ID tasks. This paper provides a comprehensive review and in-depth analysis of Transformer-based Re-ID. Categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we elucidate the advantages the Transformer demonstrates in addressing the challenges across these domains. Considering the trend toward unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, which achieves state-of-the-art performance on both single-modal and cross-modal tasks. For the under-explored animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformers to this task and to facilitate future research. Finally, we discuss some important yet under-investigated open issues in the era of large foundation models. We believe this survey will serve as a new handbook for researchers in this field. A periodically updated website will be available at https://github.com/mangye16/ReID-Survey.
