We introduce the Markovian Pre-trained Transformer (MPT) for next-item recommendation, a transferable model pre-trained entirely on synthetic Markov chains, yet capable of achieving state-of-the-art performance by fine-tuning only a lightweight adaptor. This counterintuitive success stems from an observed `Markovian' property: advanced sequential recommenders in practice rely primarily on the latest interaction to make predictions, while the historical interactions serve mainly as auxiliary cues for inferring the user's general, non-sequential identity. This characteristic requires a universal recommendation model to effectively summarize the user sequence, with particular emphasis on the latest interaction. MPT inherently has the potential to be universal and transferable. On the one hand, when trained to predict the next state of Markov chains, it acquires the ability to estimate transition probabilities from the context (an adaptive way of summarizing sequences) and to attend to the last state, ensuring accurate state transitions. On the other hand, unlike heterogeneous interaction data, an unlimited amount of controllable Markov chains can be synthesized to scale up model capacity. We conduct extensive experiments on five public datasets from three distinct platforms to validate the superiority of Markovian pre-training over traditional recommendation pre-training and recent language pre-training paradigms.
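To make the pre-training data concrete, below is a minimal sketch of how controllable synthetic Markov chains could be generated for next-state-prediction pre-training. The function name, the Dirichlet prior over transition rows, and all parameter values (`num_states`, `seq_len`, `concentration`) are illustrative assumptions, not the paper's actual data-generation procedure; the `concentration` knob is one plausible way to control how peaked (near-deterministic) the transitions are.

```python
import numpy as np

def sample_markov_chains(num_states=1000, num_chains=4, seq_len=50,
                         concentration=0.1, seed=0):
    """Sample synthetic Markov chains for next-state-prediction pre-training.

    Each chain is rolled out from a random row-stochastic transition matrix
    drawn from a Dirichlet prior; smaller `concentration` yields sharper,
    more deterministic transitions (hypothetical control knob).
    """
    rng = np.random.default_rng(seed)
    # Row i of P is the distribution P(next state | current state = i).
    P = rng.dirichlet(np.full(num_states, concentration), size=num_states)
    chains = np.empty((num_chains, seq_len), dtype=np.int64)
    chains[:, 0] = rng.integers(num_states, size=num_chains)
    for t in range(1, seq_len):
        for c in range(num_chains):
            chains[c, t] = rng.choice(num_states, p=P[chains[c, t - 1]])
    # Pre-training pairs: inputs chains[:, :-1], next-state targets chains[:, 1:].
    return chains

batch = sample_markov_chains()
print(batch.shape)  # (4, 50)
```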