Deep learning training is an expensive process that relies heavily on GPUs, but not every training workload saturates a modern, powerful GPU. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that partitions a GPU to better fit workloads that do not require all the memory and compute resources of a full GPU. In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads of three sizes, focusing on image recognition training with ResNet models. We investigate the behavior of these workloads when running in isolation on the variety of MIG instances the GPU allows, as well as when running in parallel on homogeneous instances co-located on the same GPU. Our results demonstrate that MIG can significantly improve GPU utilization when a workload is too small to utilize the whole GPU in isolation. By training multiple small models in parallel, the GPU performs more work per unit of time despite the increase in time-per-epoch, leading to $\sim$3 times the throughput. In contrast, for medium- and large-sized workloads, which already utilize the whole GPU well on their own, MIG provides only marginal performance improvements. Nevertheless, we observe that training models in parallel on separate MIG partitions does not exhibit interference, underlining the value of having functionality like MIG on modern GPUs.
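The relationship between a longer time-per-epoch and a higher aggregate throughput can be made concrete with a minimal sketch. The function name `aggregate_speedup` and the epoch times below are hypothetical and chosen only for illustration; they are not measurements from this work. The instance count of 7 is the maximum number of MIG instances an A100 supports, and is assumed here rather than taken from the abstract.

```python
def aggregate_speedup(num_instances: int,
                      epoch_time_full: float,
                      epoch_time_instance: float) -> float:
    """Ratio of aggregate throughput (epochs per second, summed over all
    parallel jobs) to the throughput of a single job on the whole GPU."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    if epoch_time_full <= 0 or epoch_time_instance <= 0:
        raise ValueError("epoch times must be positive")
    throughput_full = 1.0 / epoch_time_full                   # one job, full GPU
    throughput_parallel = num_instances / epoch_time_instance  # one job per instance
    return throughput_parallel / throughput_full


# Hypothetical numbers (NOT results from the paper): each job runs
# 2.3x slower on a small instance, but seven jobs run at once.
speedup = aggregate_speedup(num_instances=7,
                            epoch_time_full=10.0,
                            epoch_time_instance=23.0)
print(f"aggregate speedup: {speedup:.2f}x")
```

Under these assumed numbers, the slowdown of each individual job is outweighed by the number of jobs running concurrently, which is the effect the abstract describes for small workloads.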