On the Accuracy Limits of Sequential Recommender Systems: An Entropy-Based Approach

Sequential recommender systems have achieved steady gains in offline accuracy, yet it remains unclear how close current models are to the intrinsic accuracy limit imposed by the data. A reliable, model-agnostic estimate of this ceiling would enable principled difficulty assessment and headroom estimation before costly model development. Existing predictability analyses typically combine entropy estimation with Fano's inequality inversion; however, in recommendation they are hindered by sensitivity to candidate-space specification and distortion from Fano-based scaling in low-predictability regimes. We develop an entropy-induced, training-free approach for quantifying accuracy limits in sequential recommendation, yielding a candidate-size-agnostic estimate. Experiments on controlled synthetic generators and diverse real-world benchmarks show that the estimator tracks oracle-controlled difficulty more faithfully than baselines, remains insensitive to candidate-set size, and achieves high rank consistency with best-achieved offline accuracy across state-of-the-art sequential recommenders (Spearman rho up to 0.914). It also supports user-group diagnostics by stratifying users by novelty preference, long-tail exposure, and activity, revealing systematic predictability differences. Furthermore, predictability can guide training data selection: training sets constructed from high-predictability users yield strong downstream performance under reduced data budgets. Overall, the proposed estimator provides a practical reference for assessing attainable accuracy limits, supporting user-group diagnostics, and informing data-centric decisions in sequential recommendation.
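
For context, the classical Fano-based baseline that the abstract contrasts against can be sketched as follows. This illustrates that baseline only, not the estimator proposed here; the function names and the bisection solver are assumptions made for the example. Given an entropy estimate H (in bits) and a candidate-set size N, the predictability bound Pi is taken as the solution of H = H_b(Pi) + (1 - Pi) * log2(N - 1), so the result depends directly on how N is specified.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_residual(pi: float, entropy_bits: float, n_candidates: int) -> float:
    """Left side minus right side of the Fano equality at a given pi."""
    return (
        binary_entropy(pi)
        + (1.0 - pi) * math.log2(n_candidates - 1)
        - entropy_bits
    )


def fano_predictability(
    entropy_bits: float, n_candidates: int, tol: float = 1e-12
) -> float:
    """Invert Fano's inequality for the predictability bound pi.

    Hypothetical helper for illustration: solves
    H = H_b(pi) + (1 - pi) * log2(N - 1) on [1/N, 1] by bisection.
    """
    if n_candidates < 2:
        raise ValueError("n_candidates must be at least 2")
    if entropy_bits <= 0.0:
        return 1.0  # zero entropy: the next item is fully determined
    if entropy_bits >= math.log2(n_candidates):
        return 1.0 / n_candidates  # maximal entropy: uniform guessing

    # The residual is log2(N) - H > 0 at pi = 1/N and -H < 0 at pi = 1,
    # and it decreases monotonically in between, so bisection applies.
    lo, hi = 1.0 / n_candidates, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_residual(mid, entropy_bits, n_candidates) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Same entropy estimate, different assumed candidate-set sizes.
    for n in (100, 1_000, 10_000):
        print(n, round(fano_predictability(3.0, n), 4))
```

In this sketch, holding the entropy fixed at 3 bits while increasing N from 100 to 10,000 raises the resulting bound, which is the candidate-space sensitivity the abstract refers to.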
