Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing memory consumption. Liu et al. (2022) proposed extreme activation compression (EXACT), which drastically reduces GPU memory consumption by quantizing the intermediate activation maps down to INT2 precision, with little to no loss in performance. In this work, we improve on the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show a further reduction in memory consumption (>15%) and a runtime speedup per epoch (about 5%), even under extreme quantization, with performance trade-offs similar to those of the original EXACT. Further, we correct the assumption in EXACT that the intermediate activation maps are uniformly distributed, and show improved variance estimates for the quantization and dequantization steps.
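The block-wise idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `blockwise_quantize` and `blockwise_dequantize`, the default block size of 64, and the use of per-block min/max scaling with stochastic rounding are assumptions made for illustration, and other components of EXACT are omitted. The codes are also kept in `uint8` rather than packed into two bits, so the sketch shows the arithmetic, not the memory savings.

```python
import numpy as np

def blockwise_quantize(x, block_size=64, bits=2, rng=None):
    """Quantize an activation map block by block (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1                       # INT2 -> integer codes 0..3
    flat = x.reshape(-1).astype(np.float32)
    pad = (-flat.size) % block_size              # elements needed to fill the last block
    flat = np.pad(flat, (0, pad), mode="edge")   # edge padding keeps the block range unchanged
    blocks = flat.reshape(-1, block_size)
    lo = blocks.min(axis=1, keepdims=True)       # per-block zero point
    span = blocks.max(axis=1, keepdims=True) - lo
    span = np.where(span == 0, 1.0, span)        # constant blocks: avoid division by zero
    scaled = (blocks - lo) / span * levels
    floor = np.floor(scaled)
    # Stochastic rounding: round up with probability equal to the fractional part.
    codes = floor + (rng.random(scaled.shape) < (scaled - floor))
    codes = np.clip(codes, 0, levels).astype(np.uint8)
    return codes, lo, span, x.shape, pad

def blockwise_dequantize(codes, lo, span, shape, pad, bits=2):
    """Map integer codes back to floats using the stored per-block statistics."""
    levels = 2 ** bits - 1
    flat = (codes.astype(np.float32) / levels * span + lo).reshape(-1)
    if pad:
        flat = flat[:-pad]                       # drop the padding added during quantization
    return flat.reshape(shape)

# Example: a hypothetical 10 x 7 activation map, quantized in blocks of 16 values.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 7)).astype(np.float32)
codes, lo, span, shape, pad = blockwise_quantize(x, block_size=16, rng=rng)
x_hat = blockwise_dequantize(codes, lo, span, shape, pad)
```

Because each block carries its own zero point and range, an outlier only widens the quantization step of its own block rather than that of the whole tensor. Smaller blocks therefore reduce the reconstruction error but store more per-block statistics.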