Test-time augmentation (TTA) has emerged as a promising approach for mitigating data sparsity in sequential recommendation, improving inference accuracy without costly model retraining. However, existing TTA methods typically rely on uniform, user-agnostic augmentation strategies. We show that this "one-size-fits-all" design is inherently suboptimal because it neglects substantial behavioral heterogeneity across users, and we provide the first empirical demonstration that the optimal augmentation operator varies significantly across user sequences with different characteristics. To address this limitation, we propose AdaTTA, a plug-and-play, reinforcement-learning-based adaptive inference framework that learns to select an augmentation operator for each individual sequence. We formulate augmentation selection as a Markov Decision Process and introduce an Actor-Critic policy network with hybrid state representations and a joint macro-rank reward design to dynamically determine the optimal operator for each input user sequence. Extensive experiments on four real-world datasets and two recommendation backbones demonstrate that AdaTTA consistently outperforms the best fixed-strategy baselines, achieving up to 26.31% relative improvement on the Home dataset while incurring only moderate computational overhead.
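To make the idea of per-sequence operator selection concrete, the following is a minimal sketch and not AdaTTA's implementation. The operator set (`identity`, `crop_tail`, `drop_first`), the three toy state features, and the linear actor with hand-set weights are all assumptions introduced for illustration; the abstract does not specify the operators, the hybrid state representation, or the reward, and the critic and training loop are omitted entirely.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical augmentation operators (assumed for illustration only).
def identity(seq: Sequence[int]) -> List[int]:
    return list(seq)

def crop_tail(seq: Sequence[int], keep: int = 3) -> List[int]:
    # Keep only the most recent `keep` interactions.
    return list(seq[-keep:])

def drop_first(seq: Sequence[int]) -> List[int]:
    # Drop the oldest interaction, unless the sequence has a single item.
    return list(seq[1:]) if len(seq) > 1 else list(seq)

OPERATORS: List[Callable[[Sequence[int]], List[int]]] = [identity, crop_tail, drop_first]

def state_features(seq: Sequence[int]) -> List[float]:
    # Toy state: bias term, capped and normalised length, fraction of unique items.
    n = len(seq)
    return [1.0, min(n, 50) / 50.0, (len(set(seq)) / n) if n else 0.0]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_probs(weights: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    # Linear actor: one weight row per operator, softmax over the resulting logits.
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)

def select_operator(weights: Sequence[Sequence[float]], seq: Sequence[int]):
    # Greedy choice at inference time: the operator with the highest probability.
    probs = actor_probs(weights, state_features(seq))
    best = max(range(len(probs)), key=probs.__getitem__)
    return OPERATORS[best]

# Hand-set weights (not learned): favour cropping for longer sequences.
WEIGHTS = [
    [0.5, 0.0, 0.0],  # identity
    [0.0, 5.0, 0.0],  # crop_tail
    [0.0, 0.0, 0.0],  # drop_first
]

long_seq = list(range(1, 11))
short_seq = [1, 2]
augmented_long = select_operator(WEIGHTS, long_seq)(long_seq)
augmented_short = select_operator(WEIGHTS, short_seq)(short_seq)
```

With these illustrative weights, the longer sequence is routed to `crop_tail` and the shorter one to `identity`, which is the kind of sequence-dependent choice a fixed, user-agnostic strategy cannot make. In a learned setting the weights would come from policy training against a ranking-based reward rather than being set by hand.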