Cold-start recommendation remains a central challenge in dynamic, open-world platforms: models must serve newly registered users (user cold-start) and recommend newly introduced items to existing users (item cold-start) when interaction signals are sparse or missing. Recent generative recommenders built on pre-trained language models (PLMs) are often expected to mitigate cold-start by using item semantic information (e.g., titles and descriptions) and by conditioning on limited user context at test time. However, existing studies rarely treat cold-start as a primary evaluation setting, and reported gains are difficult to interpret because key design choices, such as model scale, identifier design, and training strategy, are frequently changed together. In this work, we present a systematic reproducibility study of generative recommendation under a unified suite of cold-start protocols.