Collaborative filtering (CF) is a popular approach to building recommender systems (RSs): a user's ratings for new items are predicted from her past preferences and from the available preference information of other users. Despite their popularity, CF-based methods are often severely limited by the sparsity of the observed ratings. In this study, we explore data augmentation and refinement for Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction; these aspects of MMMF have not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach to rating augmentation based on self-training. We hypothesize that a CF algorithm's low-confidence predictions stem from some deficiency in the training data, and hence that the algorithm's performance can be improved by a systematic data augmentation strategy. We iteratively add some of the ratings predicted with high confidence to the training data and remove low-confidence entries through a refinement process. Repeating this process progressively improves prediction accuracy. We evaluate the method experimentally on several state-of-the-art CF algorithms; it yields informative rating augmentation and improves the performance of the baseline approaches.
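The augment-and-refine loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method evaluated in the study: plain regularized alternating least squares stands in for MMMF, the confidence score (distance of a real-valued prediction from the nearest rounding boundary) is a hypothetical proxy, and the names `factorize`, `confidence`, `self_train`, `add_thresh`, and `drop_thresh` are invented for this example. The sketch also assumes that refinement removes only previously added pseudo-ratings and never the originally observed ones.

```python
import numpy as np


def factorize(R, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Regularized ALS on the entries selected by `mask` (a stand-in for MMMF)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        for u in range(n_users):
            idx = mask[u]
            if idx.any():
                Vu = V[idx]
                U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, idx])
        for i in range(n_items):
            idx = mask[:, i]
            if idx.any():
                Ui = U[idx]
                V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[idx, i])
    return U @ V.T


def confidence(pred):
    """Hypothetical proxy: distance from the nearest rounding boundary, in [0, 0.5]."""
    return 0.5 - np.abs(pred - np.rint(pred))


def self_train(R, observed, rounds=3, add_thresh=0.4, drop_thresh=0.1,
               r_min=1, r_max=5):
    """Iteratively add confident pseudo-ratings and drop unconfident ones."""
    R_aug = R.astype(float).copy()
    train = observed.copy()
    for _ in range(rounds):
        pred = factorize(R_aug, train)
        conf = confidence(pred)
        in_range = (pred >= r_min - 0.5) & (pred <= r_max + 0.5)

        # Augmentation: label unlabeled cells whose prediction is confident.
        add = ~train & in_range & (conf >= add_thresh)
        R_aug[add] = np.clip(np.rint(pred[add]), r_min, r_max)

        # Refinement (assumption): drop only pseudo-ratings, never observed ones.
        drop = train & ~observed & (conf < drop_thresh)
        R_aug[drop] = 0.0

        train = (train | add) & ~drop
    return R_aug, train


# Toy example: 0 marks an unobserved rating.
R = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
observed = R > 0
R_aug, train = self_train(R, observed)
```

The two thresholds control how aggressively pseudo-ratings enter and leave the training set; in practice they would need tuning on held-out data, and a confidence measure tailored to the underlying CF algorithm would replace the rounding-distance proxy used here.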