Collaborative filtering (CF) has become a popular method for developing recommender systems (RS), in which a user's ratings for new items are predicted from her past preferences and the available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore data augmentation and refinement for Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction; these aspects have not been investigated before. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach to rating augmentation based on self-training. We hypothesize that a CF algorithm's low-confidence predictions stem from some deficiency in the training data, and hence that the algorithm's performance can be improved by adopting a systematic data augmentation strategy. We iteratively add some of the ratings predicted with high confidence to the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms; it leads to informative rating augmentation and improves the performance of the baseline approaches.
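The augment-and-refine loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how confidence is measured, so the sketch assumes a simple proxy (how close the real-valued score lies to the rating level it is rounded to), and it substitutes plain regularized matrix factorization for MMMF. All function names, thresholds, and hyperparameters (`factorize`, `predict_with_confidence`, `self_train`, `add_above`, `drop_below`) are hypothetical.

```python
import numpy as np

def factorize(R, mask, rank=2, steps=500, lr=0.01, reg=0.05, seed=0):
    """Plain regularized matrix factorization by gradient descent.
    A stand-in for any CF model; this is NOT MMMF."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((R.shape[0], rank))
    V = 0.1 * rng.standard_normal((R.shape[1], rank))
    for _ in range(steps):
        E = mask * (R - U @ V.T)  # error on training entries only
        U, V = U + lr * (E @ V - reg * U), V + lr * (E.T @ U - reg * V)
    return U @ V.T

def predict_with_confidence(scores, lo=1, hi=5):
    """Assumed confidence proxy: 1 when the score sits exactly on a
    rating level, 0 when it is half a level or more away from it."""
    ratings = np.clip(np.rint(scores), lo, hi)
    conf = np.clip(1.0 - 2.0 * np.abs(scores - ratings), 0.0, 1.0)
    return ratings, conf

def self_train(R, observed, rounds=3, add_per_round=2,
               add_above=0.6, drop_below=0.2):
    R = R.astype(float)          # astype returns a copy
    train = observed.copy()      # entries currently used for training
    for _ in range(rounds):
        ratings, conf = predict_with_confidence(factorize(R, train))
        # Refinement: discard earlier pseudo-ratings that are now low confidence.
        drop = train & ~observed & (conf < drop_below)
        train[drop] = False
        R[drop] = 0.0
        # Augmentation: add the most confident predictions not yet in training.
        cand = np.argwhere(~train & (conf >= add_above))
        order = np.argsort(-conf[cand[:, 0], cand[:, 1]])[:add_per_round]
        for i, j in cand[order]:
            R[i, j] = ratings[i, j]
            train[i, j] = True
    return R, train

# Toy 5x4 rating matrix; 0 marks an unobserved entry.
R0 = np.array([[5, 4, 0, 1],
               [4, 0, 0, 1],
               [1, 1, 0, 5],
               [0, 1, 5, 4],
               [1, 0, 4, 0]], dtype=float)
R_aug, train_mask = self_train(R0, R0 > 0)
print(int((train_mask & ~(R0 > 0)).sum()), "pseudo-ratings in the final training set")
```

In this sketch the originally observed ratings are never modified or removed; only pseudo-ratings added in earlier rounds are eligible for refinement. Whether the method in the paper applies refinement to observed entries as well is not stated in the abstract.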