Recent years have witnessed a rapid surge in research leveraging Large Language Models (LLMs) for recommendation. These methods typically employ supervised fine-tuning (SFT) to adapt LLMs to recommendation scenarios, and use beam search during inference to efficiently retrieve the $B$ top-ranked recommended items. However, we identify a critical training-inference inconsistency: while SFT optimizes the overall probability of positive items, it does not guarantee that such items will be retrieved by beam search, even when their overall probabilities are high. Due to its greedy pruning mechanism, beam search can prematurely discard a positive item once the probability of one of its prefixes is insufficient. To address this inconsistency, we propose BEAR (Beam-SEarch-Aware Regularization), a novel fine-tuning objective that explicitly accounts for beam search behavior during training. Rather than directly simulating beam search for each training instance, which is computationally prohibitive, BEAR enforces a relaxed necessary condition: each token of a positive item must rank within the top-$B$ candidate tokens at its decoding step. This objective effectively mitigates the risk of incorrect pruning while incurring negligible computational overhead compared to standard SFT. Extensive experiments on four real-world datasets demonstrate that BEAR significantly outperforms strong baselines. Code will be released upon acceptance.
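To make the relaxed necessary condition concrete, the following is a minimal NumPy sketch of one plausible form of such a per-step regularizer: a hinge penalty that is zero whenever each target token's logit is within the top-$B$ at its decoding step, and grows with the gap otherwise. The function name `bear_penalty`, the hinge formulation, and the `margin` parameter are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def bear_penalty(logits, target_ids, beam_size, margin=0.0):
    """Hinge penalty enforcing a necessary condition for beam-search survival:
    at every decoding step, the target token's logit should be at least as
    large as the beam_size-th largest logit (else the prefix may be pruned).

    logits:     (num_steps, vocab_size) array of per-step logits
    target_ids: list of target token ids, one per step
    beam_size:  the beam width B used at inference time
    """
    penalty = 0.0
    for step_logits, tok in zip(logits, target_ids):
        # Logit of the beam_size-th largest candidate at this step.
        threshold = np.sort(step_logits)[::-1][beam_size - 1]
        # Zero if the target token already ranks in the top-B (up to margin).
        penalty += max(0.0, margin + threshold - step_logits[tok])
    return penalty / len(target_ids)
```

In a full training loop this term would presumably be added to the standard SFT cross-entropy loss with a weighting coefficient, so that tokens already safely inside the top-$B$ contribute no extra gradient.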