Part-based approaches for fine-grained recognition do not show the expected performance gain over global methods, even though they explicitly focus on the small details that distinguish highly similar classes. We hypothesize that part-based methods lack a representation of local features that is invariant to the order of parts and can appropriately handle a varying number of visible parts. The order of parts is artificial and often given only by ground-truth annotations, whereas viewpoint variations and occlusions can leave some parts unobservable. We therefore propose integrating a Fisher vector encoding of part features into convolutional neural networks. The parameters of this encoding are estimated by an online EM algorithm jointly with those of the neural network and are more precise than the estimates of previous works. Our approach improves on state-of-the-art accuracy on three bird species classification datasets.
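To illustrate why a Fisher vector encoding suits a set of part features, the following is a minimal NumPy sketch, not the implementation described here. It assumes a diagonal-covariance Gaussian mixture with fixed, randomly chosen parameters, whereas the approach above estimates these parameters with online EM jointly with the network. The names `fisher_vector` and `parts` and the sizes `K` and `D` are illustrative assumptions, and the power and L2 normalization often applied to Fisher vectors is omitted.

```python
import numpy as np


def fisher_vector(x, weights, means, variances):
    """Encode a set of part features x (N, D) as a vector of length 2*K*D.

    weights: (K,) mixture weights, means: (K, D), variances: (K, D) diagonal.
    """
    n = x.shape[0]
    diff = x[:, None, :] - means[None, :, :]  # (N, K, D)

    # Log-density of every feature under every diagonal Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff**2 / variances, axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)
    )  # (N, K)

    # Soft assignments (posteriors), computed stably in log space.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)  # (N, K)

    # Gradients with respect to means and standard deviations,
    # averaged over the N parts so the result does not depend on N's order.
    normed = diff / np.sqrt(variances)  # (N, K, D)
    grad_mu = np.einsum("nk,nkd->kd", post, normed)
    grad_mu /= n * np.sqrt(weights)[:, None]
    grad_sigma = np.einsum("nk,nkd->kd", post, normed**2 - 1)
    grad_sigma /= n * np.sqrt(2 * weights)[:, None]

    return np.concatenate([grad_mu.ravel(), grad_sigma.ravel()])


# Hypothetical setup: K mixture components over D-dimensional part features.
rng = np.random.default_rng(0)
K, D = 3, 4
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

parts = rng.normal(size=(5, D))  # five visible parts
fv_all = fisher_vector(parts, weights, means, variances)
fv_shuffled = fisher_vector(parts[::-1], weights, means, variances)
fv_fewer = fisher_vector(parts[:3], weights, means, variances)  # two parts occluded
```

Because the encoding averages per-part statistics, `fv_all` and `fv_shuffled` agree up to floating-point error, and `fv_fewer` has the same length as `fv_all` even though fewer parts are visible. These are the two properties the abstract asks of a part representation.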