Fine-grained image classification (FGIC) is a challenging task in computer vision because visual differences between subcategories are small while variations within each class are large. Deep learning methods have achieved remarkable success in FGIC. In this paper, we propose a fusion approach to FGIC that combines global texture with local patch-based information. The first stream extracts deep features from a set of fixed-size non-overlapping patches and encodes them through sequential modelling with a long short-term memory (LSTM) network. The second stream computes image-level textures at multiple scales using local binary patterns (LBP). The two streams are integrated into a single efficient feature vector for image classification. The method is evaluated with four standard backbone CNNs on eight datasets covering human faces, skin lesions, food dishes, marine life, and other domains. Our method achieves higher classification accuracy than existing methods by notable margins.
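The following minimal NumPy sketch illustrates the two-stream idea. It is not the authors' implementation and makes several assumptions. A fixed random projection of each patch is a hypothetical stand-in for the backbone CNN. Patches are fed to the LSTM in row-major order. The LSTM is a single untrained layer whose final hidden state summarizes the patch sequence. The texture stream uses basic 8-neighbour LBP at radii 1, 2 and 3 with integer pixel offsets and no interpolation. The two streams are fused by plain concatenation. The actual patch size, LBP configuration and fusion rule are not specified in the abstract.

```python
import numpy as np

def lbp_codes(img, radius):
    """Basic 8-neighbour LBP with integer offsets on a square ring (no interpolation)."""
    h, w = img.shape
    r = radius
    center = img[r:h - r, r:w - r]
    # Neighbour offsets ordered clockwise starting at the top-left corner.
    offsets = [(-r, -r), (-r, 0), (-r, r), (0, r), (r, r), (r, 0), (r, -r), (0, -r)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[r + dy:h - r + dy, r + dx:w - r + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def multiscale_lbp_histogram(img, radii=(1, 2, 3)):
    """Texture stream: one normalized 256-bin LBP histogram per radius, concatenated."""
    feats = []
    for r in radii:
        hist = np.bincount(lbp_codes(img, r).ravel(), minlength=256).astype(np.float64)
        feats.append(hist / hist.sum())
    return np.concatenate(feats)

def non_overlapping_patches(img, patch):
    """Split an image into fixed-size non-overlapping patches in row-major order."""
    h, w = img.shape
    return [img[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            for i in range(h // patch) for j in range(w // patch)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(seq, hidden, rng):
    """Run an untrained single-layer LSTM over seq and return the final hidden state."""
    d = seq.shape[1]
    W = rng.standard_normal((4 * hidden, d)) * 0.1       # input-to-gate weights
    U = rng.standard_normal((4 * hidden, hidden)) * 0.1  # hidden-to-gate weights
    b = np.zeros(4 * hidden)
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in seq:
        z = W @ x + U @ h + b
        i = sigmoid(z[:hidden])                 # input gate
        f = sigmoid(z[hidden:2 * hidden])       # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
        g = np.tanh(z[3 * hidden:])             # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def fused_descriptor(img, patch=16, feat_dim=32, hidden=64, seed=0):
    """Concatenate the LSTM patch encoding with the multi-scale LBP texture histogram."""
    rng = np.random.default_rng(seed)
    flat = np.stack([p.ravel() for p in non_overlapping_patches(img, patch)])
    # Hypothetical stand-in for a backbone CNN: a fixed random linear projection per patch.
    proj = rng.standard_normal((patch * patch, feat_dim)) / patch
    local = lstm_encode(flat @ proj, hidden, rng)
    texture = multiscale_lbp_histogram(img)
    return np.concatenate([local, texture])

img = np.random.default_rng(1).random((64, 64))
desc = fused_descriptor(img)
print(desc.shape)
```

With a 64×64 input and 16×16 patches, the LSTM reads a sequence of 16 patch vectors, and the descriptor has 64 local dimensions followed by 3 × 256 texture dimensions. In a real system, the projection would be replaced by pretrained CNN features and the fused vector would be passed to a trained classifier.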