Top-N recommendation aims to recommend to each consumer a small set of N items from a large collection, and Top-N accuracy is one of the most common metrics for evaluating a recommendation system. Many algorithms have been proposed to raise Top-N accuracy by learning user preferences from historical purchase data. This raises a natural question about predictability: is there an upper limit to such Top-N accuracy? This work investigates this predictability by studying the degree of regularity in a given set of user behavior data. Quantifying the predictability of Top-N recommendation requires simultaneously bounding the accuracy of the N most probable behaviors, which greatly increases the difficulty of the problem. To address this, we first uncover the associations among the N most probable behaviors and describe the user behavior distribution using information theory. We then apply Fano's inequality to bound this distribution and obtain the Top-N predictability. Extensive experiments on real-world data show significant improvements over state-of-the-art methods. We not only compute predictability for N targets but also obtain predictability values much closer to the true value than existing methods. We expect our results to assist research areas that require a quantitative measure of Top-N predictability.
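The sketch below shows roughly how Fano's inequality turns an entropy estimate into an accuracy ceiling. It uses the standard list-decoding form of Fano's inequality. Suppose a predictor outputs a list of N candidates out of M possible items, and the true next behavior lies in that list with probability Π. Then S <= h(Π) + (1 - Π) log2(M - N) + Π log2 N, where S is the user's behavior entropy in bits and h is the binary entropy. The largest Π that satisfies this inequality is an upper bound on Top-N accuracy. For N = 1 it reduces to the classical single-target bound.

This generic textbook bound is chosen for illustration. It is not necessarily the method proposed in this work, which additionally models the associations among the N most probable behaviors. The function names are hypothetical. So is the frequency-based entropy estimate, which ignores temporal order and is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def empirical_entropy(behaviors: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the item-frequency distribution.

    Assumption (hypothetical helper): a simple frequency-based estimate that
    ignores temporal order; it is not the estimator used in the paper.
    """
    total = len(behaviors)
    if total == 0:
        raise ValueError("need at least one behavior")
    counts = Counter(behaviors)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_list_bound(pi: float, m: int, n: int) -> float:
    """Right-hand side of the list-decoding Fano inequality.

    For a Top-n guess over m items with hit probability pi:
        S <= h(pi) + (1 - pi) * log2(m - n) + pi * log2(n)
    Requires 1 <= n < m.
    """
    miss = 1.0 - pi
    tail = miss * math.log2(m - n) if miss > 0.0 else 0.0
    return binary_entropy(pi) + tail + pi * math.log2(n)


def top_n_predictability(entropy_bits: float, m: int, n: int,
                         tol: float = 1e-12) -> float:
    """Largest pi with fano_list_bound(pi, m, n) >= entropy_bits.

    On [n/m, 1] the bound decreases from log2(m) to log2(n), so the
    crossing point is unique and can be found by bisection.
    """
    if not 1 <= n < m:
        raise ValueError("need 1 <= n < m")
    if entropy_bits <= math.log2(n):
        return 1.0  # bound not restrictive: perfect Top-n accuracy not ruled out
    if entropy_bits >= math.log2(m):
        return n / m  # maximal uncertainty: no better than a random list of n items
    lo, hi = n / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_list_bound(mid, m, n) > entropy_bits:
            lo = mid  # mid still satisfies the bound, so the limit is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical toy history over an assumed catalog of m = 100 items.
    history = ["a", "b", "a", "c", "a", "b"]
    s = empirical_entropy(history)
    for n in (1, 3):
        print(n, top_n_predictability(s, m=100, n=n))
```

Bisection is used because the bound has no closed-form inverse in Π. The code only relies on the bound being monotone on [N/M, 1], which holds because its derivative there is log2(N(1 - Π) / (Π(M - N))) <= 0.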