News recommendation aims to predict a user's click behavior from their past behaviors. Modeling user representations effectively is the key to recommending news that users prefer. Existing work mostly focuses on improving the supervised fine-tuning stage, while unsupervised pre-training methods that build on pre-trained language models (PLMs) and are optimized for user representations are still lacking. In this work, we propose an unsupervised pre-training paradigm with two tasks, user behavior masking and user behavior generation, both aimed at effective user behavior modeling. First, the user behavior masking task recovers masked user behaviors from their contextual behaviors, which lets the model capture stronger and more comprehensive news reading patterns. Second, a novel auxiliary user behavior generation task enhances the user representation vector derived from the user encoder. The pre-trained user modeling encoder is then used to obtain news and user representations in downstream fine-tuning. Evaluations on a real-world news benchmark show significant performance improvements over existing baselines.
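As a rough illustration of the user behavior masking idea, the sketch below hides some entries of a user's click history and records what the model would be asked to recover. It is not the method as implemented in this work: the `[MASK]` placeholder, the 15% masking probability, the function name `mask_behaviors`, and the rule that forces at least one masked position are all assumptions made for the example, since the abstract does not specify them.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a hidden behavior


def mask_behaviors(history, mask_prob=0.15, seed=None):
    """Randomly hide clicked-news IDs in one user's history.

    Returns (masked_history, targets), where targets maps each masked
    position to the original news ID the model should recover from the
    surrounding (contextual) behaviors.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for pos, news_id in enumerate(history):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[pos] = news_id
        else:
            masked.append(news_id)
    # Assumption: force one masked position so every non-empty
    # history contributes at least one training target.
    if history and not targets:
        pos = rng.randrange(len(history))
        targets[pos] = history[pos]
        masked[pos] = MASK
    return masked, targets


if __name__ == "__main__":
    clicks = ["N1", "N2", "N3", "N4", "N5"]  # made-up news IDs
    masked_clicks, labels = mask_behaviors(clicks, mask_prob=0.3, seed=0)
    print(masked_clicks, labels)
```

In an actual pre-training setup, the masked sequence would be fed to the user encoder and a loss would be computed only at the positions listed in `targets`; how that loss is defined is left open here.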