Multi-Attention-Based Soft Partition Network for Vehicle Re-Identification

Vehicle re-identification aims to distinguish images of the same vehicle from images of other vehicles. The task is challenging because the same vehicle can look very different across viewpoints (large intra-instance differences), while similar vehicles differ only subtly (small inter-instance differences). To address this, researchers have extracted view-aware or part-specific features via spatial attention mechanisms, which usually produce noisy attention maps or otherwise require expensive additional annotation of metadata, such as key points, to improve map quality. Meanwhile, various handcrafted multi-attention architectures targeting specific viewpoints or vehicle parts have been proposed based on researchers' insights. However, this approach does not guarantee that the number and nature of the attention branches are optimal for real-world re-identification tasks. To address these problems, we propose a new vehicle re-identification network based on a multiple soft attention mechanism that captures various discriminative regions from different viewpoints more efficiently. Furthermore, the model significantly reduces noise in the spatial attention maps through a new method that creates an attention map for insignificant regions and then excludes it when generating the final result. We also combine a channel-wise attention mechanism with the spatial attention mechanism to efficiently select the semantic attributes that are important for vehicle re-identification. Our experiments show that the proposed model achieves state-of-the-art performance among attention-based methods that do not use metadata and is comparable to approaches that use metadata on the VehicleID and VERI-Wild datasets.
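The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the K part maps and one extra map for insignificant regions are obtained by a softmax across K+1 maps at each spatial location, that part features come from attention-weighted pooling, and that channel-wise attention is a simple sigmoid gate. The function name, weight shapes, and random weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_partition_features(feat, w_spatial, w_channel):
    """Hypothetical sketch of multi-attention soft partitioning.

    feat:      (C, H, W) backbone feature map
    w_spatial: (K + 1, C) weights of an assumed 1x1 convolution; the
               last row scores the insignificant-region map
    w_channel: (C, C) weights of an assumed channel-gating layer
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)                 # (C, HW)
    logits = w_spatial @ flat                     # (K+1, HW)
    # Assumption: the K+1 maps compete at every location (soft partition).
    attn = softmax(logits, axis=0)
    # Exclude the insignificant-region map from the final result.
    part_attn = attn[:-1]                         # (K, HW)
    # Attention-weighted average pooling, one feature vector per part.
    weights = part_attn / (part_attn.sum(axis=1, keepdims=True) + 1e-8)
    parts = weights @ flat.T                      # (K, C)
    # Assumed channel-wise attention: a sigmoid gate per part feature.
    gate = 1.0 / (1.0 + np.exp(-(parts @ w_channel.T)))
    return part_attn.reshape(-1, H, W), parts * gate

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))             # C=8, H=4, W=5
w_spatial = rng.standard_normal((4, 8))           # K=3 parts + 1 extra map
w_channel = rng.standard_normal((8, 8))
maps, parts = soft_partition_features(feat, w_spatial, w_channel)
print(maps.shape, parts.shape)
```

Because the discarded map absorbs part of the probability mass at each location, the remaining K maps sum to less than one there, which is one plausible way to keep uninformative regions from contributing to the part features.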
