Efficient Optimization of Hierarchical Identifiers for Generative Recommendation

SEATER is a generative retrieval model that improves recommendation inference efficiency and retrieval quality by using balanced tree-structured item identifiers and contrastive training objectives. We reproduce SEATER on all datasets from the original work, validating its reported gains in retrieval quality over strong baselines, and extend the evaluation to Yambda, a large-scale music recommendation dataset. Our experiments confirm SEATER's strong performance but show that its tree construction step during training becomes a major bottleneck as the number of items grows. To address this, we implement and evaluate two alternative construction algorithms: a greedy method optimized for minimal build time, and a hybrid method that combines greedy clustering at the higher levels of the tree with more precise grouping at the lower levels. The greedy method cuts tree construction time to less than 2% of the original, at the cost of only a minor drop in retrieval quality on the dataset with the largest item collection. The hybrid method matches the original's retrieval quality, and even surpasses it on the largest dataset, while cutting construction time to just 5–8% of the original. All data and code are publicly available for full reproducibility at https://github.com/joshrosie/re-seater.
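The abstract does not spell out either construction algorithm. As a rough illustration of what a greedy, balanced tree construction can look like, here is a minimal Python sketch. It assumes items are given as embedding vectors and that every internal node has at most k children. The capacity-constrained nearest-centroid assignment and all names (`greedy_balanced_split`, `build_identifiers`) are hypothetical. This is not SEATER's or the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_balanced_split(items: List[int], emb: Dict[int, Vector], k: int) -> List[List[int]]:
    """Hypothetical greedy split of items into k groups whose sizes differ by at most one.

    Centroids are seeded from evenly spaced positions in the item list. Every
    (item, centroid) pair is then visited in order of increasing distance, and
    each item goes to the nearest centroid that still has spare capacity.
    """
    n = len(items)
    k = min(k, n)
    # The first n % k groups take one extra item so the capacities sum to n.
    base, extra = divmod(n, k)
    capacity = [base + (1 if c < extra else 0) for c in range(k)]
    centroids = [emb[items[c * n // k]] for c in range(k)]
    pairs = sorted((sq_dist(emb[i], centroids[c]), i, c) for i in items for c in range(k))
    groups: List[List[int]] = [[] for _ in range(k)]
    assigned = set()
    for _, i, c in pairs:
        if i in assigned or len(groups[c]) >= capacity[c]:
            continue
        groups[c].append(i)
        assigned.add(i)
    return groups


def build_identifiers(
    items: List[int], emb: Dict[int, Vector], k: int, prefix: Tuple[int, ...] = ()
) -> Dict[int, Tuple[int, ...]]:
    """Recursively split items into a k-ary tree; map each item to its root-to-leaf path."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(items) <= k:
        # Leaf level: each item's last token is its position within the final group.
        return {i: prefix + (pos,) for pos, i in enumerate(items)}
    ids: Dict[int, Tuple[int, ...]] = {}
    for c, group in enumerate(greedy_balanced_split(items, emb, k)):
        ids.update(build_identifiers(group, emb, k, prefix + (c,)))
    return ids


emb = {0: (0.0, 0.0), 1: (0.1, 0.0), 2: (5.0, 5.0), 3: (5.1, 5.0)}
print(build_identifiers([0, 1, 2, 3], emb, k=2))
# → {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
```

The sketch makes one pass of sorting all item-centroid pairs per node and skips iterative k-means refinement. That gives up some cluster quality for build speed, which is the kind of trade-off the abstract describes, though the paper's greedy and hybrid methods may work quite differently.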