The performance of Recommender Systems (RS) varies significantly across users, yet the underlying reasons for this variance remain poorly understood. This paper introduces a unified framework to analyze and explain this performance gap by quantifying user profile characteristics. We propose two novel, information-theoretic measures: Mean Surprise (S(u)), which captures a user's deviation from popular items and is closely related to popularity bias, and Mean Conditional Surprise (CS(u)), which measures the internal coherence of a user's interactions in a domain-agnostic manner. Through extensive experiments on 7 algorithms and 9 datasets, we demonstrate that these measures are strong predictors of recommendation performance. Our analysis reveals that performance gains from complex models are concentrated on "coherent" users, while all algorithms perform poorly on "incoherent" users. We show how these measures provide practical utility for the Web community by: (1) enabling robust, stratified evaluation to identify model weaknesses; (2) facilitating a novel analysis of the behavioral alignment of recommendations; and (3) guiding targeted system design, which we validate by training a specialized model on a segment of "coherent" users that achieves superior performance for that group with significantly less data. This work provides a new lens for understanding user behavior and offers practical tools for building more robust and efficient large-scale recommender systems.
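The abstract names the two measures but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' definitions. It assumes S(u) is the mean self-information, -log2 P(i), of the items in a user's profile, with P(i) taken as the fraction of users who interacted with item i. It further assumes CS(u) is the mean of -log2 P(i | j) over ordered pairs of distinct items in the profile, with P(i | j) estimated from co-occurrence counts. The function names `mean_surprise` and `mean_conditional_surprise` and the toy `profiles` data are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def mean_surprise(profiles):
    """Assumed S(u): average of -log2 P(i) over the items in u's profile.

    profiles maps each user to a set of item ids; P(i) is the share of
    users whose profile contains i (an assumption, not the paper's formula).
    """
    n_users = len(profiles)
    item_count = Counter(i for items in profiles.values() for i in set(items))
    scores = {}
    for user, items in profiles.items():
        items = set(items)
        bits = [-math.log2(item_count[i] / n_users) for i in items]
        scores[user] = sum(bits) / len(bits) if bits else math.nan
    return scores


def mean_conditional_surprise(profiles):
    """Assumed CS(u): average of -log2 P(i | j) over ordered item pairs in u's profile.

    P(i | j) is estimated as (#users with both i and j) / (#users with j).
    Profiles with fewer than two items have no pairs and get NaN.
    """
    item_count = Counter(i for items in profiles.values() for i in set(items))
    pair_count = Counter(
        pair for items in profiles.values() for pair in permutations(set(items), 2)
    )
    scores = {}
    for user, items in profiles.items():
        pairs = list(permutations(set(items), 2))
        if not pairs:
            scores[user] = math.nan
            continue
        # The user holds both items of every pair, so pair_count >= 1 and the log is finite.
        bits = [-math.log2(pair_count[(i, j)] / item_count[j]) for i, j in pairs]
        scores[user] = sum(bits) / len(bits)
    return scores


# Toy data (hypothetical): users "a" and "b" share a popular, co-occurring pair;
# "d" holds a rare item; "c" mixes items that seldom appear together.
profiles = {
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"x", "z"},
    "d": {"w", "z"},
}

s = mean_surprise(profiles)
cs = mean_conditional_surprise(profiles)
```

Under these assumptions, a user whose items are rare gets a higher S(u), and a user whose items seldom co-occur in other profiles gets a higher CS(u). That matches the abstract's reading of the measures as deviation from popular items and as internal coherence, respectively. Users could then be binned by either score for the stratified evaluation the abstract describes.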