Analysis of Hyperparameter Optimization Effects on Lightweight Deep Models for Real-Time Image Classification

Lightweight convolutional and transformer-based networks are increasingly preferred for real-time image classification, especially on resource-constrained devices. This study evaluates how hyperparameter optimization affects the accuracy and deployment feasibility of seven modern lightweight architectures: ConvNeXt-T, EfficientNetV2-S, MobileNetV3-L, MobileViT v2 (S and XS), RepVGG-A2, and TinyViT-21M, each trained on a class-balanced subset of 90,000 images from ImageNet-1K. Under standardized training settings, we investigate the influence of learning rate schedules, augmentation, optimizers, and initialization on model performance. Inference is benchmarked on an NVIDIA L40S GPU with batch sizes ranging from 1 to 512 to capture latency and throughput under real-time conditions. Controlled hyperparameter variation significantly alters convergence dynamics in lightweight CNN and transformer backbones, which gives insight into stability regions and deployment feasibility for edge artificial intelligence. Tuning alone improves top-1 accuracy by 1.5 to 3.5 percent over baselines, and select models (e.g., RepVGG-A2, MobileNetV3-L) deliver latency under 5 milliseconds and throughput above 9,800 frames per second, making them strong candidates for edge deployment. The study provides reproducible, subset-based insights into hyperparameter tuning for lightweight models and its role in balancing speed and accuracy. The code and logs are available at: https://vineetkumarrakesh.github.io/lcnn-opt
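The latency and throughput measurement described in the abstract can be illustrated with a minimal sketch. This is not the authors' benchmark code: the function names `benchmark` and `dummy_forward`, the warmup and repeat counts, the use of the median, and the reduced batch-size grid are all assumptions made for illustration, and a CPU loop stands in for the model so that the example runs without a GPU.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def benchmark(
    run_batch: Callable[[int], None],
    batch_sizes: Sequence[int] = (1, 8, 64, 512),  # assumed grid within 1..512
    warmup: int = 3,    # assumed number of untimed warmup calls
    repeats: int = 10,  # assumed number of timed calls per batch size
) -> List[Dict[str, float]]:
    """Time run_batch(batch_size); report median latency and throughput."""
    results = []
    for bs in batch_sizes:
        # Warmup calls are excluded from timing so one-off setup costs
        # (caches, lazy initialization) do not distort the measurement.
        for _ in range(warmup):
            run_batch(bs)

        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_batch(bs)
            # On a GPU, the device would need to be synchronized before
            # reading the clock; this sketch times a CPU callable only.
            timings.append(time.perf_counter() - start)

        median_s = statistics.median(timings)
        results.append({
            "batch_size": bs,
            "latency_ms": median_s * 1e3,   # time for one whole batch
            "images_per_s": bs / median_s,  # throughput
        })
    return results


def dummy_forward(batch_size: int) -> None:
    """Hypothetical stand-in for a forward pass; cost grows with batch size."""
    total = 0
    for i in range(batch_size * 1000):
        total += i * i


if __name__ == "__main__":
    for row in benchmark(dummy_forward):
        print(
            f"batch={row['batch_size']:>4}  "
            f"latency={row['latency_ms']:.3f} ms  "
            f"throughput={row['images_per_s']:.1f} img/s"
        )
```

In this sketch, latency is the time for one whole batch and throughput is the batch size divided by that time, so the two figures quoted in the abstract need not come from the same batch size: the lowest latency is typically observed at batch size 1 and the highest throughput at a large batch.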
