The successful integration of graph neural networks into recommender systems (RSs) has led to a novel paradigm in collaborative filtering (CF): graph collaborative filtering (graph CF). By representing user-item data as an undirected, bipartite graph, graph CF exploits short- and long-range connections to extract collaborative signals that capture user preferences more accurately than traditional CF methods. Although the recent literature highlights the efficacy of various algorithmic strategies in graph CF, the impact of datasets and their topological features on recommendation performance has yet to be studied. To fill this gap, we propose a topology-aware analysis of graph CF. In this study, we (i) select several widely adopted recommendation datasets and use them to generate a large set of synthetic sub-datasets through two state-of-the-art graph sampling methods, (ii) measure eleven classical and topological characteristics of these sub-datasets, and (iii) evaluate the accuracy that four popular and recent graph-based RSs (i.e., LightGCN, DGCF, UltraGCN, and SVD-GCN) achieve on the generated sub-datasets. Finally, we present an explanatory framework that reveals the linear relationships between these characteristics and the accuracy measures. The results, statistically validated under different graph sampling settings, confirm the existence of strong dependencies between topological characteristics and accuracy in graph-based recommendation, offering a new perspective on how to interpret graph CF.
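The pipeline described in the abstract (measure characteristics of each sub-dataset, then relate them linearly to accuracy) can be illustrated with a minimal Python sketch. The abstract does not list the eleven characteristics, so the ones computed here (density, average user degree, average item degree) are illustrative assumptions, as are the function names `topological_characteristics` and `fit_line` and the toy interaction list. The sketch assumes a non-empty set of interactions and is not the authors' implementation.

```python
from collections import Counter


def topological_characteristics(interactions):
    """Compute a few simple characteristics of a user-item bipartite graph.

    `interactions` is an iterable of (user, item) pairs; duplicates are ignored.
    The chosen characteristics are illustrative, not the paper's exact set.
    """
    edges = set(interactions)
    user_degree = Counter(u for u, _ in edges)
    item_degree = Counter(i for _, i in edges)
    n_users, n_items, n_edges = len(user_degree), len(item_degree), len(edges)
    return {
        "n_users": n_users,
        "n_items": n_items,
        "n_interactions": n_edges,
        # Fraction of all possible user-item pairs that are observed.
        "density": n_edges / (n_users * n_items),
        "avg_user_degree": n_edges / n_users,
        "avg_item_degree": n_edges / n_items,
    }


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy sub-dataset (hypothetical values, for illustration only).
toy = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i1"), ("u3", "i3")]
stats = topological_characteristics(toy)
```

In an analysis of this kind, one such dictionary would be computed per sampled sub-dataset, and a function like `fit_line` (or a multivariate regression) would then relate each characteristic to the accuracy a given recommender obtains on that sub-dataset.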