Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation

Self-supervised sequential recommendation significantly improves recommendation performance by maximizing mutual information with well-designed data augmentations. However, mutual information estimation relies on the Kullback-Leibler (KL) divergence, which has several limitations: it is asymmetric, it requires a sample size that grows exponentially, and it makes training unstable. In addition, existing data augmentations are mostly stochastic and can break sequential correlations through random modifications. These two issues motivate us to investigate an alternative, robust mutual information measurement that can model uncertainty and alleviate the limitations of KL divergence. To this end, we propose MStein, a novel self-supervised learning framework for sequential recommendation based on Mutual WasserStein discrepancy minimization. We propose the Wasserstein Discrepancy Measurement to measure the mutual information between augmented sequences. The Wasserstein Discrepancy Measurement builds on the 2-Wasserstein distance, which is more robust, remains efficient with small batch sizes, and can model the uncertainty of stochastic augmentation processes. We also propose a novel contrastive learning loss based on the Wasserstein Discrepancy Measurement. Extensive experiments on four benchmark datasets demonstrate that MStein outperforms the baselines. Further quantitative analyses show its robustness against perturbations and its training efficiency across batch sizes. Finally, an analysis of the improvements indicates that MStein learns better representations for popular users and items with significant uncertainty. The source code is available at https://github.com/zfan20/MStein.
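As a rough illustration of the two components named in the abstract, the sketch below computes a 2-Wasserstein distance between two augmented-sequence representations and plugs it into a contrastive loss. It assumes that each augmented sequence is encoded as a diagonal Gaussian (a mean vector plus a standard-deviation vector). For that case, the squared 2-Wasserstein distance has the closed form ||m1 - m2||^2 + ||s1 - s2||^2. The Gaussian parameterization, the function names `w2_squared_diag` and `wasserstein_info_nce`, the `temperature` parameter, and the InfoNCE form that uses the negative squared distance as the logit are all illustrative assumptions, not the MStein implementation in the linked repository.

```python
import math
from typing import List, Sequence, Tuple

# Assumed representation: (mean vector, standard-deviation vector) of a diagonal Gaussian.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def w2_squared_diag(p: Gaussian, q: Gaussian) -> float:
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For N(m1, diag(s1^2)) and N(m2, diag(s2^2)) the general closed form
    reduces to ||m1 - m2||^2 + ||s1 - s2||^2.
    """
    (m1, s1), (m2, s2) = p, q
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    std_term = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return mean_term + std_term


def wasserstein_info_nce(view1: List[Gaussian], view2: List[Gaussian],
                         temperature: float = 1.0) -> float:
    """Hypothetical InfoNCE-style loss using -W2^2 / temperature as the logit.

    view1[i] and view2[i] are two augmentations of the same sequence (the
    positive pair); view2[j] for j != i act as in-batch negatives.
    """
    total = 0.0
    for i, anchor in enumerate(view1):
        logits = [-w2_squared_diag(anchor, other) / temperature for other in view2]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
        # Cross-entropy term: -log softmax(logits)[i].
        total += log_denom - logits[i]
    return total / len(view1)


# Usage: two toy sequences, each with two augmented views.
a = ([0.0, 1.0], [1.0, 1.0])
b = ([0.0, 1.0], [1.0, 1.0])
c = ([3.0, -1.0], [0.5, 2.0])
print(w2_squared_diag(a, c))  # → 14.25
print(wasserstein_info_nce([a, c], [b, c]))  # small: positives are closest
```

Unlike the KL divergence, the 2-Wasserstein distance is symmetric (`w2_squared_diag(a, c) == w2_squared_diag(c, a)`). It also accounts for differences in the standard deviations, which is how this sketch lets uncertainty from stochastic augmentation affect the similarity.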

翻译：自监督序列推荐通过精心设计的数据增强最大化互信息，显著提升了推荐性能。然而，互信息估计基于Kullback-Leibler散度的计算，存在非对称估计、样本量指数级需求以及训练不稳定等局限。此外，现有数据增强大多具有随机性，可能通过随机修改破坏序列相关性。这两个问题促使我们探索一种替代的鲁棒互信息度量方法，既能建模不确定性，又能缓解KL散度的局限。为此，我们提出一种基于交互Wasserstein散度最小化（MStein）的新型自监督学习框架，用于序列推荐。我们设计了Wasserstein散度度量（Wasserstein Discrepancy Measurement）来度量增强序列间的互信息，该度量建立在2-Wasserstein距离基础上，具有更强的鲁棒性、小批量数据下的高效性，并能建模随机增强过程的不确定性。我们还提出一种基于Wasserstein散度度量的新型对比学习损失函数。在四个基准数据集上的广泛实验表明，MStein方法优于基线模型。进一步的定量分析展示了其对扰动的鲁棒性及批量尺寸上的训练效率。最后，改进分析表明该方法能显著提升对具有较大不确定性的热门用户或物品的表示质量。源代码见https://github.com/zfan20/MStein。