Sequential recommenders weight historical interactions either through positional self-attention, as in Transformers, or through a single implicit decay schedule, as in State-Space Models. Neither makes the multi-scale temporal structure of real user behaviour explicit. We propose MARS, an encoder-agnostic aggregation operator that consumes real timestamps and produces $K$ summaries emphasising distinct recency scales, fused by a context-adaptive gate. MARS adds at most 6% to the parameter count and runs in $\mathcal{O}(LdK)$ time. It adapts to data density by automatically selecting between two encoder instantiations, MARS-T (Transformer) for sparse data and MARS-M (Mamba) for dense data, based on the average sequence length of the training set. On five public benchmarks against ten Transformer- and Mamba-based baselines under a unified RecBole protocol, MARS attains the best HR@10 on every benchmark. On sparse data, its mean relative gain over the strongest content-only Transformer baseline is +19.7% (reaching +36.2% on Games); on dense ML-1M, it improves on SIGMA by +3.2% HR@10 and +0.9% NDCG at 42% fewer MFLOPs. MARS thus lies on the accuracy-efficiency Pareto frontier across the data-density spectrum. A backbone-only ablation isolates the marginal contribution of MARS at +4% to +19% HR@10 on sparse data and motivates the dual-instantiation design. The code is included in the supplementary material.
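To make the aggregation step concrete, the following is a minimal NumPy sketch of one way a multi-scale recency operator with a context-adaptive gate could look. It is not the authors' implementation. The abstract does not specify the weighting kernel or the gate, so the exponential time-decay kernel, the time constants `taus`, the use of the latest encoder state as gate context, and all names (`multi_scale_recency_pool`, `W_gate`) are illustrative assumptions. It also assumes that $L$ is the sequence length and $d$ the hidden dimension.

```python
import numpy as np


def multi_scale_recency_pool(h, t, taus, W_gate):
    """Hypothetical multi-scale recency aggregation (illustrative only).

    h      : (L, d) encoder outputs for one user sequence
    t      : (L,)   real timestamps in ascending order
    taus   : (K,)   assumed time constants, one per recency scale
    W_gate : (d, K) assumed gate projection applied to the latest state
    Returns the fused (d,) summary, the (K, d) per-scale summaries,
    and the (K,) gate weights.
    """
    h = np.asarray(h, dtype=float)
    t = np.asarray(t, dtype=float)
    taus = np.asarray(taus, dtype=float)

    # Elapsed time between each interaction and the most recent one.
    gaps = t[-1] - t                                   # (L,)

    # Assumed kernel: exponential decay, one rate per scale,
    # normalised over the sequence with a stable softmax.
    logits = -gaps[None, :] / taus[:, None]            # (K, L)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # K weighted sums over L states of width d: O(L d K) work.
    summaries = weights @ h                            # (K, d)

    # Assumed gate: softmax over a linear map of the latest state.
    gate_logits = h[-1] @ W_gate                       # (K,)
    gate_logits = gate_logits - gate_logits.max()
    gate = np.exp(gate_logits)
    gate = gate / gate.sum()

    fused = gate @ summaries                           # (d,)
    return fused, summaries, gate


# Usage: 5 interactions, hidden size 4, three recency scales.
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 4))
t = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
fused, summaries, gate = multi_scale_recency_pool(
    h, t, taus=[0.5, 5.0, 1e9], W_gate=np.zeros((4, 3))
)
```

Under these assumptions, a small time constant yields a summary dominated by the latest interaction, while a very large one approaches a uniform mean over the sequence. The gate then mixes the scales per user context.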