This study proposes a complete deep learning framework for multi-class classification of plant diseases from high-resolution leaf imagery, with a particular focus on the behavior of ResNet-50 and a hybrid ResNet + Vision Transformer (ViT) design. A purpose-built image dataset of 15,200 training images and 3,800 validation images, spanning 38 classes across multiple crops including tomato, apple, and grape, was preprocessed with resizing, normalization, and data augmentation to improve model robustness. Multiple architectures, including ResNet-50, MobileNetV2, and EfficientNet-B0, were trained and compared with the hybrid ResNet + ViT model. All models were fine-tuned using the AdamW optimizer and cross-entropy loss, with early stopping applied to limit overfitting and improve generalization. Furthermore, interpretability techniques such as Grad-CAM and saliency maps were used to highlight disease-relevant regions, while segmentation-based analysis was performed to identify the affected parts of a leaf. Among the architectures considered, ResNet-50 achieved the highest accuracy of 98.74%, whereas the hybrid ResNet + ViT model achieved a competitive accuracy of 98.58%, indicating that the hybrid architecture was effective in capturing both local and global information. The experimental results show the promise of transformer-based models for highly accurate, interpretable, and computationally efficient automated multi-class, multi-disease classification systems, which can support crop management practices and precision farming.
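The early stopping rule mentioned in the abstract can be illustrated with a minimal, framework-independent sketch. The abstract does not state which quantity was monitored or what patience was used, so the choices here (monitoring validation loss, a `patience` of 3 epochs, and a `min_delta` threshold) are assumptions, and the names `EarlyStopping` and `run_with_early_stopping` are hypothetical helpers rather than the authors' code.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs.

    Assumption: validation loss is the monitored quantity; the abstract does
    not say which metric the authors used.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_with_early_stopping(val_losses, patience=3, min_delta=0.0):
    """Replay a sequence of per-epoch validation losses.

    Returns (last_epoch_run, best_epoch). In a real training loop each loss
    would come from evaluating the model on the validation set.
    """
    stopper = EarlyStopping(patience=patience, min_delta=min_delta)
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if stopper.step(epoch, loss):
            break
    return last_epoch, stopper.best_epoch
```

In practice the model weights from `best_epoch` would be saved and restored after stopping, so that the reported accuracy corresponds to the best validation checkpoint rather than the final epoch.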