On the Accuracy Limits of Sequential Recommender Systems: An Entropy-Based Approach

Sequential recommender systems have achieved steady gains in offline accuracy, yet it remains unclear how close current models are to the intrinsic accuracy limit imposed by the data. A reliable, model-agnostic estimate of this ceiling would enable principled difficulty assessment and headroom estimation before costly model development. Existing predictability analyses typically combine entropy estimation with Fano's inequality inversion; however, in recommendation they are hindered by sensitivity to candidate-space specification and distortion from Fano-based scaling in low-predictability regimes. We develop an entropy-induced, training-free approach for quantifying accuracy limits in sequential recommendation, yielding a candidate-size-agnostic estimate. Experiments on controlled synthetic generators and diverse real-world benchmarks show that the estimator tracks oracle-controlled difficulty more faithfully than baselines, remains insensitive to candidate-set size, and achieves high rank consistency with best-achieved offline accuracy across state-of-the-art sequential recommenders (Spearman rho up to 0.914). It also supports user-group diagnostics by stratifying users by novelty preference, long-tail exposure, and activity, revealing systematic predictability differences. Furthermore, predictability can guide training data selection: training sets constructed from high-predictability users yield strong downstream performance under reduced data budgets. Overall, the proposed estimator provides a practical reference for assessing attainable accuracy limits, supporting user-group diagnostics, and informing data-centric decisions in sequential recommendation.
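
For readers unfamiliar with the existing analyses mentioned above, the following is a minimal sketch of the classical Fano-inversion step only; it is not the estimator proposed in this work. It assumes an entropy estimate S (in bits) is already available and solves S = H_b(p) + (1 - p) log2(N - 1) for the maximum predictability p by bisection, where N is the assumed candidate-set size. The function names `fano_rhs` and `fano_max_predictability` are hypothetical and chosen for illustration.

```python
import math


def fano_rhs(p, n_candidates):
    """Right-hand side of Fano's bound: H_b(p) + (1 - p) * log2(N - 1), in bits."""
    if p <= 0.0 or p >= 1.0:
        binary_entropy = 0.0
    else:
        binary_entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return binary_entropy + (1 - p) * math.log2(n_candidates - 1)


def fano_max_predictability(entropy_bits, n_candidates, tol=1e-10):
    """Invert Fano's bound for the maximum predictability p in [1/N, 1].

    Illustrative helper (hypothetical name). On [1/N, 1] the right-hand side
    decreases from log2(N) to 0, so bisection finds the unique solution.
    """
    if n_candidates < 2:
        raise ValueError("need at least two candidates")
    if not 0.0 <= entropy_bits <= math.log2(n_candidates):
        raise ValueError("entropy must lie in [0, log2(N)]")
    lo, hi = 1.0 / n_candidates, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n_candidates) > entropy_bits:
            lo = mid  # bound still above the entropy: solution lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Same entropy estimate, different assumed candidate-set sizes.
    for n in (100, 1000, 10000):
        print(n, round(fano_max_predictability(3.0, n), 4))
```

Running the loop at a fixed entropy shows the resulting bound changing with N alone, which illustrates the sensitivity to candidate-space specification that the abstract identifies as a limitation of Fano-based analyses.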
