Embrace Limited and Imperfect Training Datasets: Opportunities and Challenges in Plant Disease Recognition Using Deep Learning

Recent advances in deep learning have significantly improved plant disease recognition. However, satisfactory performance usually requires high-quality training datasets, which are difficult and expensive to collect. As a result, the scarcity of such datasets hinders the practical application of current deep learning-based methods in real-world scenarios. In this paper, we argue that embracing poor datasets is viable, and we aim to explicitly define the challenges associated with using them. To this end, we analyze the characteristics of high-quality datasets, namely large-scale images and desired annotations, and contrast them with the \emph{limited} and \emph{imperfect} nature of poor datasets. Challenges arise when training datasets deviate from these characteristics. To provide a comprehensive understanding, we propose a novel and informative taxonomy that categorizes these challenges, and we offer a brief overview of existing studies and approaches that address them. We believe that our paper sheds light on the importance of embracing poor datasets, enhances the understanding of the associated challenges, and contributes to the ambitious objective of deploying deep learning in real-world applications. Finally, to facilitate progress, we describe several outstanding questions and point out potential future directions. Although our primary focus is on plant disease recognition, we emphasize that the principles of embracing and analyzing poor datasets apply to a wider range of domains, including agriculture more broadly.
