Recommender Systems (RS) shape the filtering and curation of online content, yet we have limited understanding of how predictable their recommendation outputs are. We propose data-driven metrics that quantify the predictability of recommendation datasets by measuring the structural complexity of the user-item interaction matrix. High complexity indicates intricate interaction patterns that are harder to predict; low complexity indicates simpler, more predictable structures. We operationalize structural complexity via data perturbations, using singular value decomposition (SVD) to assess how stable the latent structure remains under perturbations. Our hypothesis is that random perturbations minimally affect highly organized data, but cause substantial structural disruption in intrinsically complex data. By analyzing prediction errors on perturbed interactions, we derive metrics that quantify this sensitivity at both the dataset and the interaction levels, yielding a principled measure of inherent predictability. Experiments on real-world datasets show that our structural complexity metrics correlate with the performance of state-of-the-art recommendation algorithms. We also demonstrate structure-aware data selection: in low-data settings, models trained on a carefully chosen subset of interactions with low structural perturbation error consistently outperform models trained on the full dataset. Thus, structural complexity serves both as a precise diagnostic of dataset complexity and as a principled foundation for efficient, data-centric training of RS.
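To make the perturbation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary user-item matrix, a perturbation that hides a random fraction of observed interactions, and a truncated SVD of the perturbed matrix as the predictor. The function names (`perturbation_errors`, `structural_complexity`), the rank, the hidden fraction, and the absolute-error score are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def perturbation_errors(R, rank=4, frac=0.2, rng=None):
    """Hide a fraction of observed interactions and score how well a
    truncated SVD of the perturbed matrix recovers them.

    Assumes R is a binary matrix (1 = observed interaction).
    Returns row indices, column indices, and per-interaction errors.
    """
    rng = np.random.default_rng(rng)
    rows, cols = np.nonzero(R)
    n_hide = max(1, int(frac * rows.size))
    picked = rng.choice(rows.size, size=n_hide, replace=False)
    hr, hc = rows[picked], cols[picked]

    # Perturb: set the selected observed interactions to zero.
    perturbed = R.astype(float).copy()
    perturbed[hr, hc] = 0.0

    # Rank-limited reconstruction of the perturbed matrix.
    U, s, Vt = np.linalg.svd(perturbed, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Interaction-level score: distance of the reconstruction from the
    # true value (1) at each hidden position.
    errors = np.abs(1.0 - approx[hr, hc])
    return hr, hc, errors


def structural_complexity(R, rank=4, frac=0.2, trials=10, seed=0):
    """Dataset-level score: mean perturbation error over several trials."""
    rng = np.random.default_rng(seed)
    per_trial = [
        perturbation_errors(R, rank, frac, rng)[2].mean()
        for _ in range(trials)
    ]
    return float(np.mean(per_trial))


def block_matrix(n_groups=4, size=10):
    """Toy 'highly organized' data: each user group likes one item group."""
    return np.kron(np.eye(n_groups), np.ones((size, size)))


def random_matrix(shape, density, seed=0):
    """Toy unstructured data: interactions placed uniformly at random."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < density).astype(float)


if __name__ == "__main__":
    structured = block_matrix()
    noisy = random_matrix(structured.shape, structured.mean(), seed=1)
    print("structured:", structural_complexity(structured))
    print("random:    ", structural_complexity(noisy))
```

On these toy matrices the block-structured data should score lower than the random data of equal density, which mirrors the stated hypothesis. The per-interaction errors returned by `perturbation_errors` are the kind of signal that a structure-aware selection step could rank by, for example by keeping interactions with low error, although the actual selection rule is not given in the abstract.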