Large language models (LLMs), owing to their extensive open-domain knowledge and semantic reasoning capabilities, have been increasingly integrated into recommender systems (RS). However, a substantial gap remains between the pre-training objectives of LLMs and the specific requirements of recommendation tasks. To bridge this gap, supervised fine-tuning (SFT) is commonly performed on specially curated recommendation datasets to further enhance predictive ability. Despite its success, SFT exhibits a critical limitation: it induces Context Bias, whereby the model over-relies on auxiliary tokens, such as task descriptions and prefix-generated tokens, while underutilizing the core user interaction tokens that encode user-specific preferences. This bias not only undermines recommendation accuracy but also raises fairness concerns. To mitigate this bias, we propose Group Distributionally Robust Optimization-based Tuning (GDRT), a novel fine-tuning paradigm that enforces consistent model performance across token groups with varying degrees of relevance to auxiliary tokens. By adaptively upweighting underperforming groups, typically those weakly correlated with auxiliary tokens, GDRT shifts the model's attention from superficial auxiliary cues to informative user interaction tokens. Extensive experiments on three public datasets demonstrate that GDRT effectively mitigates context bias, yielding substantial improvements in recommendation accuracy (an average NDCG@10 gain of 24.29%) and significantly enhancing recommendation fairness. The code is available at https://github.com/WANGBohaO-jpg/GDRT.