Sequential recommendation and the next-item prediction task have recently become increasingly popular in the field of recommender systems. Two Transformer-based models, SASRec and BERT4Rec, currently serve as the state-of-the-art baselines. Over the past few years, quite a few publications have compared these two algorithms and proposed new state-of-the-art models. In most of these publications, BERT4Rec achieves better performance than SASRec. However, the two models are trained with different losses: BERT4Rec uses cross-entropy over a softmax across all items, while SASRec uses negative sampling and calculates a binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, the one used by BERT4Rec, then SASRec significantly outperforms BERT4Rec in terms of both quality and training speed. In addition, we show that SASRec can be trained effectively with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.