Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Liu et al. (2022) proposed extreme activation compression (EXACT), which drastically reduces memory consumption by quantizing the intermediate activation maps down to INT2 precision. They reported little to no loss in performance alongside large reductions in GPU memory consumption. In this work, we improve on the EXACT strategy by applying block-wise quantization to the intermediate activation maps. We experimentally analyze different block sizes and show a further reduction in memory consumption (>15%) and a runtime speedup per epoch (about 5%), even under extreme quantization, with performance trade-offs similar to those of the original EXACT. Further, we correct EXACT's assumption that the intermediate activation maps are uniformly distributed and show improved variance estimates for the quantization and dequantization steps.
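As a rough illustration of what block-wise quantization of an activation map means, the sketch below quantizes a flat list of activations to 2-bit integer codes (0 to 3), storing a separate minimum and scale for every block of `block_size` values. It assumes stochastic rounding (round up with probability equal to the fractional part), which keeps the quantizer unbiased; the function names `quantize_blockwise` and `dequantize_blockwise` are hypothetical, and this pure-Python version is not the authors' implementation, which operates on GPU tensors and may lay out blocks and metadata differently.

```python
import random
from typing import List, Optional, Tuple


def quantize_blockwise(
    x: List[float], block_size: int, bits: int = 2, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[float], List[float]]:
    """Quantize a flat activation vector block by block (hypothetical helper).

    Each block keeps its own minimum (zero point) and scale, so a block with a
    narrow value range is quantized with a finer step than the whole vector would be.
    """
    if rng is None:
        rng = random.Random(0)
    levels = 2 ** bits - 1  # INT2 -> integer codes 0..3
    codes: List[int] = []
    mins: List[float] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        lo, hi = min(block), max(block)
        # Guard against a constant block, where the range is zero.
        scale = (hi - lo) / levels if hi > lo else 1.0
        mins.append(lo)
        scales.append(scale)
        for v in block:
            t = (v - lo) / scale  # position on the [0, levels] grid, t >= 0
            floor_t = int(t)      # floor, since t is non-negative
            # Stochastic rounding: round up with probability equal to the fractional part.
            q = floor_t + (1 if rng.random() < t - floor_t else 0)
            codes.append(min(q, levels))  # clamp against floating-point overshoot
    return codes, mins, scales


def dequantize_blockwise(
    codes: List[int], mins: List[float], scales: List[float], block_size: int
) -> List[float]:
    """Map integer codes back to floats using each block's stored min and scale."""
    return [mins[i // block_size] + q * scales[i // block_size] for i, q in enumerate(codes)]


# Usage: the second block has a much narrower range, so it gets a finer step.
x = [0.1, -0.5, 2.0, 0.7, 3.3, 3.1, 3.2, 3.0]
codes, mins, scales = quantize_blockwise(x, block_size=4, rng=random.Random(42))
print(codes, scales)
print(dequantize_blockwise(codes, mins, scales, block_size=4))
```

The block size is the knob studied in the abstract: smaller blocks give each block a tighter range and therefore a smaller quantization step, but every block adds a stored minimum and scale, so metadata overhead grows as blocks shrink.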