SEATER is a generative retrieval model that improves recommendation inference efficiency and retrieval quality by using balanced tree-structured item identifiers and contrastive training objectives. We reproduce and validate SEATER's reported improvements in retrieval quality over strong baselines on all datasets from the original work, and extend the evaluation to Yambda, a large-scale music recommendation dataset. Our experiments confirm SEATER's strong performance, but show that its tree construction step during training becomes a major bottleneck as the number of items grows. To address this, we implement and evaluate two alternative construction algorithms: a greedy method optimized for minimal build time, and a hybrid method that combines greedy clustering at the upper levels of the tree with more precise grouping at the lower levels. On the dataset with the largest item collection, the greedy method reduces tree construction time to less than 2% of the original with only a minor drop in quality. The hybrid method achieves retrieval quality on par with the original, and even improves on it for the largest dataset, while cutting construction time to just 5-8% of the original. All data and code are publicly available for full reproducibility at https://github.com/joshrosie/re-seater.