Convolutional Neural Networks (CNNs) are a standard approach to visual recognition because they can learn hierarchical representations directly from raw pixels. Practitioners typically choose among three options: (i) training a compact custom CNN from scratch, (ii) using a large pre-trained CNN as a fixed feature extractor, and (iii) performing transfer learning via partial or full fine-tuning of a pre-trained backbone. This report presents a controlled comparison of these three paradigms across five real-world image classification datasets spanning road-surface defect recognition, agricultural variety identification, fruit/leaf disease recognition, pedestrian walkway encroachment recognition, and unauthorized vehicle recognition. Models are evaluated using accuracy and macro F1-score, complemented by two efficiency metrics: training time per epoch and parameter count. The results show that transfer learning consistently yields the strongest predictive performance, while the custom CNN offers an attractive efficiency--accuracy trade-off, especially when compute and memory budgets are constrained.