Sequential recommendation (SR) models based on Transformers have achieved remarkable success. In computer vision and natural language processing, the self-attention mechanism of Transformers is known to suffer from the oversmoothing problem, i.e., the hidden representations of tokens become similar to one another. We show, for the first time, that the same problem occurs in the SR domain. Our investigation reveals the low-pass filtering nature of self-attention in SR, which causes oversmoothing. To address this, we propose a novel method called Beyond Self-Attention for Sequential Recommendation (BSARec), which leverages the Fourier transform to i) inject an inductive bias by considering fine-grained sequential patterns and ii) integrate low- and high-frequency information to mitigate oversmoothing. Our discovery marks a significant advance in the SR domain and is expected to bridge the gap for existing Transformer-based SR models. We evaluate the proposed approach through extensive experiments on 6 benchmark datasets. The experimental results demonstrate that our model outperforms 7 baseline methods in terms of recommendation performance.
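As a rough illustration of the frequency view described above, the sketch below splits a toy one-dimensional sequence into low- and high-frequency parts with a discrete Fourier transform and then recombines them. It is not the BSARec implementation: the function names, the `cutoff` parameter, and the scalar weight `beta` are assumptions introduced here for illustration only. With `cutoff=1` the low-frequency part is the sequence mean, which is what uniform averaging weights would produce, so the example also shows why pure averaging discards fine-grained variation.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split_frequencies(x, cutoff):
    """Split x into a low-frequency part and the high-frequency remainder.

    `cutoff` (hypothetical parameter) is the number of lowest frequencies kept.
    Bins k and n - k carry the same frequency for a real signal, so both are kept.
    """
    n = len(x)
    spectrum = dft(x)
    kept = [spectrum[k] if min(k, n - k) < cutoff else 0 for k in range(n)]
    low = idft(kept)
    high = [a - b for a, b in zip(x, low)]
    return low, high


def rescale_high(x, cutoff, beta):
    """Recombine the two parts, weighting the high-frequency part by `beta`.

    `beta` is an assumed scalar; beta = 0 gives a pure low-pass filter,
    beta = 1 returns the original sequence.
    """
    low, high = split_frequencies(x, cutoff)
    return [l + beta * h for l, h in zip(low, high)]


if __name__ == "__main__":
    sequence = [1.0, 5.0, 2.0, 8.0]  # toy values, not real interaction data
    low, high = split_frequencies(sequence, cutoff=1)
    print("low :", [round(v, 6) for v in low])
    print("high:", [round(v, 6) for v in high])
    print("mix :", [round(v, 6) for v in rescale_high(sequence, 1, 0.5)])
```

Setting `beta` to 0 collapses every position to the same value, the toy analogue of oversmoothing, while any nonzero `beta` restores part of the position-specific variation.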