In contemporary computer vision applications, particularly image classification, backbones pre-trained on large datasets such as ImageNet are commonly employed as feature extractors. Despite the widespread use of these pre-trained convolutional neural networks (CNNs), there remains a gap in understanding how various resource-efficient backbones perform across diverse domains and dataset sizes. Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings across a variety of datasets, including natural images, medical images, galaxy images, and remote sensing images. This comprehensive analysis aims to help machine learning practitioners select the most suitable backbone for their specific problem, especially in scenarios involving small datasets, where fine-tuning a pre-trained network is crucial. Although attention-based architectures are gaining popularity, we observed that they tend to perform poorly compared to CNNs when fine-tuned on low-data tasks. We also observed that some CNN architectures, such as ConvNeXt, RegNet, and EfficientNet, consistently perform well relative to others across a diverse set of domains. Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones, facilitating informed model selection across a broad spectrum of computer vision domains. Our code is available at https://github.com/pranavphoenix/Backbones