Because person re-identification often suffers from pose changes and occlusions, some attentive local features tend to be suppressed when training CNNs. In this paper, we propose the Batch DropBlock (BDB) Network, a two-branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch. The global branch encodes the global salient representations. The feature dropping branch contains an attentive feature learning module called Batch DropBlock, which randomly drops the same region of all input feature maps in a batch to reinforce the attentive feature learning of local regions. The network then concatenates the features from both branches to provide a more comprehensive and spatially distributed feature representation. Albeit simple, our method achieves state-of-the-art performance on person re-identification and is also applicable to general metric learning tasks. For instance, we achieve 76.4% Rank-1 accuracy on the CUHK03-Detect dataset and 83.0% Recall-1 score on the Stanford Online Products dataset, outperforming existing works by a large margin (more than 6%).
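The core operation described above, dropping the same region from every feature map in a batch and then concatenating the two branches, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `batch_dropblock` and `concat_branches`, the drop ratios, the tensor shape, and the pooling choices (mean for the global branch, max for the dropping branch) are all assumptions made for the example and are not stated in the text.

```python
import numpy as np


def batch_dropblock(features, h_ratio=0.3, w_ratio=1.0, rng=None):
    """Zero one randomly placed rectangle, shared by the whole batch.

    features: array of shape (batch, channels, height, width).
    h_ratio, w_ratio: fraction of height/width to erase (illustrative
    defaults, not values taken from the text).
    """
    rng = np.random.default_rng() if rng is None else rng
    _, _, height, width = features.shape
    drop_h = int(round(h_ratio * height))
    drop_w = int(round(w_ratio * width))
    # A single offset is sampled per batch, so every sample and every
    # channel loses exactly the same spatial region.
    top = int(rng.integers(0, height - drop_h + 1))
    left = int(rng.integers(0, width - drop_w + 1))
    mask = np.ones((1, 1, height, width), dtype=features.dtype)
    mask[:, :, top:top + drop_h, left:left + drop_w] = 0
    return features * mask  # the mask broadcasts over batch and channels


def concat_branches(global_vec, drop_vec):
    """Concatenate per-sample feature vectors from the two branches."""
    return np.concatenate([global_vec, drop_vec], axis=1)


# Toy usage with an assumed (batch, channels, height, width) shape.
rng = np.random.default_rng(0)
feature_maps = rng.random((4, 8, 24, 8)).astype(np.float32)
dropped = batch_dropblock(feature_maps, rng=rng)

global_vec = feature_maps.mean(axis=(2, 3))  # assumed pooling, global branch
drop_vec = dropped.max(axis=(2, 3))          # assumed pooling, dropping branch
embedding = concat_branches(global_vec, drop_vec)
print(embedding.shape)
```

Because the rectangle is shared across the batch, no sample can rely on the erased region during that training step, which is the property the abstract attributes to Batch DropBlock. A per-sample random mask would be a different technique.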