Food recognition has a wide range of applications, such as health-aware recommendation and self-service restaurants. Most previous food recognition methods first locate informative regions in a weakly supervised manner and then aggregate their features. However, errors in locating these regions limit the effectiveness of such methods to some extent. Instead of locating multiple regions, we propose a Progressive Self-Distillation (PSD) method, which progressively enhances the network's ability to mine more details for food recognition. PSD training involves multiple self-distillations performed simultaneously, in each of which a teacher network and a student network share the same embedding network. Because the student network receives a modified version of the teacher's input image in which some informative regions are masked, the teacher network produces stronger semantic representations than the student network. Guided by this semantically stronger teacher, the student network is encouraged to mine additional useful regions from the modified image, which enhances its own ability. Since the embedding network is shared, the teacher network's ability is enhanced as well. Through progressive training, the teacher network incrementally improves its ability to mine more discriminative regions. At inference time, only the teacher network is used; the student network is not needed. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance.
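The abstract does not specify how informative regions are selected, which divergence serves as the distillation loss, or how the multiple self-distillations are chained, so the following is only a minimal, dependency-free Python sketch under stated assumptions. A toy linear `shared_embedding` stands in for the embedding network, a CAM-like per-pixel score (`activation_map`) picks the pixels to mask, and a temperature-softened KL divergence is assumed as the distillation loss. Each masked view is treated as the student of the view before it and the teacher of the view after it. All function names, the masking rule, and the temperature are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Grid = List[List[float]]  # toy single-channel image, H x W


def shared_embedding(image: Grid, weights: List[Grid]) -> List[float]:
    """Hypothetical embedding network: one linear template per class.

    Teacher and student views both go through this function with the SAME
    weights, mirroring the shared embedding network (an assumption here).
    """
    h, w = len(image), len(image[0])
    return [sum(t[i][j] * image[i][j] for i in range(h) for j in range(w)) for t in weights]


def activation_map(image: Grid, weights: List[Grid]) -> Grid:
    """Per-pixel contribution to the predicted class (CAM-like for a linear model)."""
    logits = shared_embedding(image, weights)
    c = max(range(len(logits)), key=lambda k: logits[k])
    return [[weights[c][i][j] * image[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


def mask_top_regions(image: Grid, activation: Grid, k: int) -> Grid:
    """Copy of `image` with its k most activated pixels set to zero (assumed masking rule)."""
    h, w = len(image), len(image[0])
    ranked = sorted(((activation[i][j], i, j) for i in range(h) for j in range(w)), reverse=True)
    masked = [row[:] for row in image]
    for _, i, j in ranked[:k]:
        masked[i][j] = 0.0
    return masked


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation(teacher_logits: List[float], student_logits: List[float],
                    temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions; the teacher is the target."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def progressive_self_distillation_losses(image: Grid, weights: List[Grid],
                                         stages: int = 2, k: int = 1) -> List[float]:
    """One distillation loss per (teacher view, student view) pair along a masking chain.

    View s+1 is view s with its k most informative pixels masked. A supervised
    classification loss, which real training would presumably also need, is omitted.
    """
    views = [image]
    for _ in range(stages):
        views.append(mask_top_regions(views[-1], activation_map(views[-1], weights), k))
    return [kl_distillation(shared_embedding(t, weights), shared_embedding(s, weights))
            for t, s in zip(views, views[1:])]


if __name__ == "__main__":
    image = [[1.0, 0.2], [0.1, 0.9]]
    weights = [[[2.0, 0.0], [0.0, 1.0]],   # class 0 template
               [[0.0, 1.0], [1.0, 0.0]]]   # class 1 template
    print(progressive_self_distillation_losses(image, weights, stages=2, k=1))
```

In this toy run the unmasked view is more confident in class 0 than its masked student, which is the "stronger semantics" gap the distillation loss tries to close. Because both views pass through the same `shared_embedding` weights, any update that helps the student match the teacher on the masked view also changes the teacher; at inference only the unmasked view would be scored.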