The strategy for selecting candidate sets (the set of items that the recommender system is expected to rank for each user) is an important decision in offline top-$N$ recommender system evaluation. Each candidate set is the union of the user's test items and an arbitrary number of non-relevant items, which we refer to as decoys. Previous studies have examined how candidate set size and selection strategy affect evaluation. In this paper, we extend this knowledge by studying how candidate set selection strategies interact with popularity bias, and we use simulation to assess whether sampled candidate sets yield metric estimates that are less biased with respect to the true metric values, that is, the values that would be obtained under complete data, which is typically unavailable in ordinary experiments.
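To make the candidate set definition concrete, the following is a minimal sketch of how one user's candidate set might be assembled as the union of test items and sampled decoys. The function name `build_candidate_set`, its parameters, the uniform sampling of decoys, and the exclusion of the user's training items from the decoy pool are all illustrative assumptions, not details taken from the paper.

```python
import random


def build_candidate_set(test_items, all_items, train_items, n_decoys, seed=0):
    """Return one user's candidate set: test items plus sampled decoys.

    Hypothetical helper: uniform sampling and the exclusion of training
    items are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    # Items the user has already interacted with cannot serve as decoys.
    excluded = set(test_items) | set(train_items)
    # Sort the pool so that sampling is reproducible for a given seed.
    pool = sorted(set(all_items) - excluded)
    # Draw decoys uniformly without replacement, capped at the pool size.
    decoys = rng.sample(pool, min(n_decoys, len(pool)))
    # The candidate set is the union of test items and decoys.
    return set(test_items) | set(decoys)


candidates = build_candidate_set(
    test_items={1, 2}, all_items=range(10), train_items={3}, n_decoys=4
)
```

Setting `n_decoys` to the size of the remaining pool would correspond to ranking every item the user has not interacted with, while smaller values give a sampled candidate set.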