Training a Custom CNN on Five Heterogeneous Image Datasets

Deep learning has transformed visual data analysis, and Convolutional Neural Networks (CNNs) have become highly effective at learning meaningful feature representations directly from images. Unlike traditional manual feature engineering, CNNs automatically extract hierarchical visual patterns, enabling strong performance across diverse real-world contexts. This study investigates the effectiveness of CNN-based architectures on five heterogeneous datasets spanning agricultural and urban domains: mango variety classification, paddy variety identification, road surface condition assessment, auto-rickshaw detection, and footpath encroachment monitoring. These datasets differ in illumination, resolution, environmental complexity, and class imbalance, and therefore call for adaptable and robust learning models. We evaluate a lightweight, task-specific custom CNN alongside established deep architectures, including ResNet-18 and VGG-16, trained both from scratch and with transfer learning. Through systematic preprocessing, augmentation, and controlled experimentation, we analyze how architectural complexity, model depth, and pre-training influence convergence, generalization, and performance across datasets of differing scale and difficulty. The key contributions of this work are: (1) an efficient custom CNN that achieves competitive performance across multiple application domains, and (2) a comprehensive comparative analysis showing when transfer learning and deeper architectures provide substantial advantages, particularly in data-constrained settings. These findings offer practical guidance for deploying deep learning models in resource-limited yet high-impact real-world visual classification tasks.
