Fine-grained image classification (FGIC) is a challenging task in computer vision because subcategories differ only slightly in appearance while individual classes show large internal variation. Deep learning methods have achieved remarkable success in FGIC. In this paper, we propose a fusion approach to FGIC that combines global texture with local patch-based information. The first pipeline extracts deep features from multiple fixed-size, non-overlapping patches and encodes them through sequential modelling with a long short-term memory (LSTM) network. The second pipeline computes image-level textures at multiple scales using local binary patterns (LBP). The strengths of both streams are combined into a single efficient feature vector for image classification. The method is evaluated on eight datasets covering human faces, skin lesions, food dishes, marine life, and other domains, using four standard backbone CNNs. Our method achieves higher classification accuracy than existing methods by notable margins.
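The following pure-Python sketch illustrates how the two streams could fit together. It is an illustration of data flow only, not the authors' implementation, and it rests on several assumptions that the abstract does not state:

- The backbone CNN is replaced by a hypothetical `patch_descriptor`, which returns the mean and standard deviation of each patch's intensities.
- The LSTM is a single untrained layer with random weights.
- LBP compares each pixel with 8 neighbours on a square ring at integer radii, rather than using interpolated circular sampling.
- Fusion is plain concatenation.

```python
import math
import random

def split_patches(img, size):
    """Split a 2-D grayscale image (list of rows) into non-overlapping
    size x size patches in row-major order; incomplete border patches are dropped."""
    h, w = len(img), len(img[0])
    return [[row[left:left + size] for row in img[top:top + size]]
            for top in range(0, h - size + 1, size)
            for left in range(0, w - size + 1, size)]

def lbp_histogram(img, radius):
    """Simplified 8-neighbour LBP at an integer radius; returns a normalized 256-bin histogram."""
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius), (0, radius),
               (radius, radius), (radius, 0), (radius, -radius), (0, -radius)]
    h, w = len(img), len(img[0])
    hist, count = [0] * 256, 0
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= img[y][x]:  # neighbour >= centre sets the bit
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [v / count for v in hist] if count else [0.0] * 256

def multiscale_lbp(img, radii):
    """Concatenate LBP histograms computed at several radii (the texture stream)."""
    return [v for r in radii for v in lbp_histogram(img, r)]

def patch_descriptor(patch):
    """Hypothetical stand-in for backbone CNN features: mean and std of intensities."""
    vals = [v for row in patch for v in row]
    m = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
    return [m / 255.0, sd / 255.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.hid = hid_dim
        # One weight matrix per gate (input, forget, candidate, output) over [x, h].
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                      for _ in range(hid_dim)] for g in "ifgo"}
        self.b = {g: [0.0] * hid_dim for g in "ifgo"}

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def encode(self, seq):
        """Run the patch sequence through the LSTM; return the final hidden state."""
        h, c = [0.0] * self.hid, [0.0] * self.hid
        for x in seq:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            g = self._gate("g", xh, math.tanh)
            o = self._gate("o", xh, sigmoid)
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h

def fused_feature(img, patch_size=4, hid_dim=8, radii=(1, 2)):
    """Local stream (patches -> LSTM) concatenated with the texture stream (multi-scale LBP)."""
    seq = [patch_descriptor(p) for p in split_patches(img, patch_size)]
    local_vec = TinyLSTM(in_dim=len(seq[0]), hid_dim=hid_dim).encode(seq)
    return local_vec + multiscale_lbp(img, radii)  # concatenation assumed as fusion

rng = random.Random(42)
img = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
vec = fused_feature(img)
print(len(vec))  # 8 LSTM units + 2 x 256 LBP bins = 520
```

In a real system, `patch_descriptor` would be a pretrained backbone, and the LSTM and classifier would be trained end to end. The sketch only shows that the local stream summarizes patch order while the texture stream pools over the whole image.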