Large neural networks can improve accuracy and generalization on tasks across many domains. However, this trend cannot continue indefinitely because hardware memory is limited. As a result, researchers have devised a number of memory optimization methods (MOMs), such as gradient checkpointing, quantization, and swapping, to alleviate the memory bottleneck. In this work, we study memory optimization methods and show that, although these strategies do lower peak memory usage, they can reduce training throughput by up to 9.3x. To provide practical guidelines for practitioners, we propose PAPAYA, a simple but effective performance model that quantitatively explains the trade-off between memory and training time. PAPAYA can be used to determine when to apply the various memory optimization methods when training different models. Based on implications derived from PAPAYA, we outline the circumstances in which memory optimization techniques are more advantageous. We assess the accuracy of PAPAYA and its derived implications on a variety of machine learning models, showing that it achieves an R² score of over 0.97 when predicting peak memory and throughput, and that it accurately predicts the effectiveness of MOMs across five evaluated models on vision and NLP tasks.