Implicit Neural Representation (INR) represents complex shapes or objects as continuous functions rather than by explicitly defining their geometry or surface structure. Previous research has shown that neural networks used as INRs can compress images with performance comparable to traditional methods such as JPEG. However, INR holds potential for applications beyond image compression. This paper introduces Rapid-INR, a novel approach that uses INR to encode and compress images in order to accelerate neural network training in computer vision tasks. Our methodology stores the whole dataset directly in INR format on the GPU, mitigating the significant data communication overhead between the CPU and GPU during training. The decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, using the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6$\times$ speedup over the PyTorch training pipeline, as well as a maximum 1.2$\times$ speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering effort. Our implementation code is publicly available at https://anonymous.4open.science/r/INR-4BF7.
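To make the on-the-fly decoding step concrete, the following is a minimal NumPy sketch of how a batch of per-image INRs might be decoded to RGB in parallel. It is an illustration under stated assumptions, not the Rapid-INR implementation: the function names (`make_coords`, `init_inr_batch`, `decode_batch`), the layer sizes, the sine activation, and the sigmoid output are all hypothetical choices, and the weights are random rather than fitted to real images.

```python
import numpy as np


def make_coords(height, width):
    """Return an (H*W, 2) grid of (x, y) pixel coordinates normalized to [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=-1)


def init_inr_batch(num_images, hidden=16, num_hidden_layers=2, seed=0):
    """Create random weights for `num_images` independent coordinate MLPs.

    Each image gets its own small MLP (2 -> hidden -> ... -> 3). The sizes are
    hypothetical; in practice each MLP would be fitted to one image.
    """
    rng = np.random.default_rng(seed)
    sizes = [2] + [hidden] * num_hidden_layers + [3]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(num_images, fan_in, fan_out))
        b = np.zeros((num_images, 1, fan_out))
        layers.append((w, b))
    return layers


def decode_batch(layers, coords):
    """Decode every image in the batch at once.

    coords: (P, 2) shared pixel grid. Returns (N, P, 3) RGB values in [0, 1].
    The batched matmul evaluates all N MLPs on all P pixels in one call per layer.
    """
    h = coords
    for i, (w, b) in enumerate(layers):
        h = h @ w + b  # broadcasts to (N, P, fan_out)
        if i < len(layers) - 1:
            h = np.sin(h)  # assumed activation for hidden layers
        else:
            h = 1.0 / (1.0 + np.exp(-h))  # sigmoid keeps outputs in [0, 1]
    return h


# Example: decode 8 hypothetical 32x32 images from their INR weights.
coords = make_coords(32, 32)
layers = init_inr_batch(num_images=8)
rgb = decode_batch(layers, coords).reshape(8, 32, 32, 3)
```

In this sketch only the MLP weights and one shared coordinate grid need to be kept in memory; the RGB pixels are recomputed whenever a batch is requested. On a GPU the same batched matrix products would run on the device, which is the property the abstract relies on to avoid CPU-to-GPU transfers during training.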