In this paper, we argue that conventional unitary-invariant measures of recommender system (RS) performance, which are based on differences between predicted ratings and actual user ratings, fail to assess fundamental RS properties. More specifically, posing the optimization problem as one of predicting exact user ratings provides only an indirect, suboptimal approximation of what RS applications typically need: the ability to accurately predict user preferences. We argue that scalar measures such as RMSE and MAE, computed from differences between actual and predicted ratings, are only proxies for an RS's ability to accurately estimate user preferences. We propose a measure that we consider more fundamentally appropriate for assessing RS performance: rank-preference consistency, which simply counts the number of prediction pairs that are inconsistent with the user's expressed product preferences. For example, if an RS predicts that the user will prefer product A over product B, but the user's withheld ratings indicate that they prefer product B over product A, then rank-preference consistency has been violated. Our test results conclusively demonstrate that methods tailored to optimize arbitrary measures such as RMSE are not generally effective at accurately predicting user preferences. Thus, we conclude that conventional methods used for assessing RS performance are arbitrary and misleading.
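To make the pair-counting idea concrete, the following is a minimal sketch of how rank-preference violations might be counted for a single user, alongside RMSE for comparison. The function names, the toy ratings, and the handling of ties (a pair counts as a violation only when the predicted order is strictly opposite to a strict preference in the withheld ratings) are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations
from math import sqrt


def rank_preference_violations(actual, predicted):
    """Count item pairs whose predicted order contradicts the actual order.

    Assumption (not from the paper): ties in either list are not counted
    as violations; only strictly opposite orderings are.
    """
    violations = 0
    for i, j in combinations(range(len(actual)), 2):
        actual_diff = actual[i] - actual[j]
        predicted_diff = predicted[i] - predicted[j]
        # Opposite signs mean the prediction reverses the user's preference.
        if actual_diff * predicted_diff < 0:
            violations += 1
    return violations


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical withheld ratings for three products from one user.
actual = [3, 4, 5]
# Predictions close to the true values, but ordered incorrectly.
close_but_misordered = [4.2, 4.0, 4.1]
# Predictions far from the true values, but ordered correctly.
far_but_ordered = [1.0, 2.0, 3.0]

for name, predicted in [("close_but_misordered", close_but_misordered),
                        ("far_but_ordered", far_but_ordered)]:
    print(name, rmse(actual, predicted),
          rank_preference_violations(actual, predicted))
```

In this toy case, the predictions with the lower RMSE reverse two of the three preference pairs, while the predictions with the higher RMSE reverse none, which illustrates how the two measures can disagree.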