Traditional CNN models are trained and tested on relatively low resolution images (<300 px), and cannot be directly operated on large-scale images due to compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows to train the existing CNN architectures on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image at a time, ensuring that the majority of it is covered over the course of iterations. PatchGD thus extensively enjoys better memory and compute efficiency when training models on large scale images. PatchGD is thoroughly evaluated on two datasets - PANDA and UltraMNIST with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation clearly shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, and especially when the compute memory is limited.
翻译:传统CNN模型在相对低分辨率图像(<300像素)上训练和测试,由于计算和内存限制,无法直接处理大规模图像。我们提出补丁梯度下降(PatchGD),一种有效的学习策略,允许以端到端方式在现有CNN架构上训练大规模图像。PatchGD基于以下假设:与其一次性对整个图像进行基于梯度的更新,不如每次仅对图像的局部部分进行模型更新,并在迭代过程中确保覆盖图像的大部分区域,从而获得良好的解决方案。因此,PatchGD在训练大规模图像模型时,显著提升了内存和计算效率。我们在PANDA和UltraMNIST两个数据集上,结合ResNet50和MobileNetV2模型,在不同内存约束下对PatchGD进行了全面评估。评估结果明确表明,在处理大图像时,PatchGD比标准梯度下降方法更加稳定和高效,尤其是在计算内存受限的情况下。