Learning to Recommend Using Non-Uniform Data

Learning user preferences for products from past purchases or reviews is a cornerstone of modern recommendation engines. One complication in this learning task is that some users are more likely than others to purchase or review products, and some products are more likely than others to be purchased or reviewed. This non-uniform pattern degrades the performance of many existing recommendation algorithms, because they assume that the observed data are sampled uniformly at random from user-product pairs. In addition, the existing literature on modeling non-uniformity either assumes that user interests are independent of the products or lacks theoretical understanding. In this paper, we first model user-product preferences as a partially observed matrix with a non-uniform observation pattern. Next, building on the literature on low-rank matrix estimation, we introduce a new weighted trace-norm penalized regression to predict the unobserved entries of the matrix. We then prove an upper bound on the prediction error of the proposed approach. The bound depends on several parameters determined by a weight matrix, which in turn depends on the joint distribution of users and products. Using this observation, we introduce a new optimization problem that selects a weight matrix minimizing the upper bound on the prediction error. The final product is a new estimator, NU-Recommend, that outperforms existing methods on both synthetic and real datasets. Our approach aims at accurate predictions for all users while prioritizing fairness. To achieve this, we employ a bias-variance tradeoff mechanism that ensures good overall prediction performance without compromising predictive accuracy for less active users.
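
To make the idea of a weighted trace-norm penalized regression concrete, the following is a minimal sketch and not the NU-Recommend estimator itself. It assumes one common form of the penalty, lam * ||diag(sqrt(row_w)) B diag(sqrt(col_w))||_*, with weights set to the empirical row and column observation frequencies rather than chosen by the optimization problem described above. The function names (`svt`, `weighted_trace_norm_fit`), the proximal-gradient solver, and all parameter values are illustrative assumptions.

```python
import numpy as np

def svt(X, tau):
    """Singular-value soft-thresholding: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def weighted_trace_norm_fit(Y, mask, row_w, col_w, lam=0.1, n_iter=300):
    """Proximal gradient for
         (1/n) * sum over observed (i, j) of (B_ij - Y_ij)^2
           + lam * ||diag(sqrt(row_w)) B diag(sqrt(col_w))||_*.
    Works in the rescaled variable X = diag(sqrt(row_w)) B diag(sqrt(col_w)),
    so that the penalty becomes a plain trace norm on X."""
    scale = 1.0 / np.sqrt(np.outer(row_w, col_w))  # B = scale * X, entrywise
    n_obs = mask.sum()
    step = n_obs / (2.0 * scale.max() ** 2)        # 1 / Lipschitz constant of the loss gradient
    X = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        resid = mask * (scale * X - Y)             # residuals on observed entries only
        grad = (2.0 / n_obs) * scale * resid
        X = svt(X - step * grad, step * lam)
    return scale * X

# Synthetic rank-2 matrix observed non-uniformly: entry (i, j) is seen
# with probability p_row[i] * p_col[j] (an assumption made for this demo).
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
p_row = rng.uniform(0.2, 0.9, size=30)
p_col = rng.uniform(0.2, 0.9, size=20)
mask = (rng.uniform(size=truth.shape) < np.outer(p_row, p_col)).astype(float)

# Empirical marginal observation frequencies, clipped away from zero.
row_w = np.maximum(mask.sum(axis=1) / mask.sum(), 1e-3)
col_w = np.maximum(mask.sum(axis=0) / mask.sum(), 1e-3)

B_hat = weighted_trace_norm_fit(truth * mask, mask, row_w, col_w, lam=0.05)
```

In this sketch, a frequently observed row or column receives a larger weight and is therefore penalized more heavily, while sparsely observed ones are penalized less. How the weights should actually be chosen is the question the paper's bound-minimizing optimization problem addresses.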