Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, searching and ranking over a large item corpus usually incurs high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into a binary embedding space to reduce storage requirements and accelerate inference. Because the quantization function has zero gradient almost everywhere, these methods rely on the Straight-Through Estimator (STE) to prevent vanishing gradients during back-propagation. However, the STE often causes a gradient mismatch problem, leading to sub-optimal results. In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for learning discrete user/item representations that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings and a quantization module for compressing full-precision embeddings into low-bit ones. Consequently, HQ-GNN has lower memory requirements and faster inference than vanilla GNNs. To address the gradient mismatch problem in the STE, we further take into account the quantization errors and their second-order derivatives for better stability. Experimental results on several large-scale datasets show that HQ-GNN achieves a good balance between latency and performance.
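To make the quantization step and the STE concrete, the following is a minimal sketch in plain Python. It is not the HQ-GNN implementation: the function names `quantize` and `ste_grad`, the clipping range `[-1, 1]`, and the uniform grid are all assumptions chosen for illustration, and the Hessian-aware correction described above is not included. It only shows a generic low-bit quantizer and the plain STE backward rule whose mismatch the abstract refers to.

```python
import math
from typing import List


def quantize(x: List[float], bits: int = 1,
             lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Uniformly quantize each value onto a grid of 2**bits points in [lo, hi].

    Hypothetical helper for illustration; bits=1 gives binary codes {lo, hi}.
    """
    intervals = 2 ** bits - 1            # gaps between adjacent grid points
    step = (hi - lo) / intervals
    out = []
    for v in x:
        clipped = min(max(v, lo), hi)    # clip to the representable range
        idx = math.floor((clipped - lo) / step + 0.5)  # nearest grid index
        out.append(lo + idx * step)
    return out


def ste_grad(x: List[float], upstream: List[float],
             lo: float = -1.0, hi: float = 1.0) -> List[float]:
    """Plain straight-through estimator for the backward pass.

    The true derivative of quantize() is zero almost everywhere, so the STE
    pretends the quantizer is the identity inside [lo, hi] and passes the
    upstream gradient through unchanged; outside the range it returns zero.
    """
    return [g if lo <= v <= hi else 0.0 for v, g in zip(x, upstream)]


if __name__ == "__main__":
    emb = [0.3, -0.2, 1.7]                       # toy full-precision embedding
    print(quantize(emb, bits=1))                 # binary codes
    print(quantize(emb, bits=2))                 # 2-bit codes
    print(ste_grad(emb, [0.5, 0.5, 0.5]))        # surrogate gradient
```

The gap between the input and its quantized value is the quantization error. The plain STE ignores that gap in the backward pass, which is one way to see where the gradient mismatch comes from.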