Hard negative sampling improves recommendation performance by accelerating convergence and sharpening the decision boundary. However, most existing methods rely on heuristic strategies that select negatives from a fixed candidate pool. Lacking semantic awareness, these methods often misclassify items that align with users' semantic interests as negatives, producing False Hard Negative Samples (FHNS). Such FHNS inject noisy supervision and prevent the model from reaching optimal performance. To address this challenge, we propose HNLMRec, a generative semantic negative sampling framework. Leveraging the semantic reasoning capabilities of Large Language Models (LLMs), HNLMRec directly generates negative samples that are behaviorally distinct from, yet semantically aligned with, user preferences. Furthermore, we integrate collaborative filtering signals into the LLM via supervised fine-tuning, guiding the model to synthesize more reliable and informative hard negatives. Extensive experiments on multiple real-world datasets demonstrate that HNLMRec significantly outperforms both traditional methods and LLM-enhanced baselines, while effectively mitigating popularity bias and data sparsity, thereby improving generalization.
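To make the FHNS problem concrete, the heuristic pool-based sampling the abstract critiques can be sketched as follows. This is a minimal illustration, not HNLMRec itself: `score`, `semantic_sim`, and the threshold are hypothetical stand-ins for a recommender's scoring function and a semantic-relevance estimate; the point is that without the semantic filter, the highest-scoring pool item may be a false hard negative.

```python
import random

def sample_hard_negative(user_pos, candidates, score, semantic_sim,
                         pool_size=10, sim_threshold=0.8):
    """Heuristic hard negative sampling with an illustrative semantic filter.

    Draws a random candidate pool, discards observed positives and items
    whose semantic similarity to the user's interests exceeds
    `sim_threshold` (likely false hard negatives, FHNS), then returns the
    surviving candidate the model currently scores highest.
    """
    pool = random.sample(candidates, min(pool_size, len(candidates)))
    filtered = [i for i in pool
                if i not in user_pos and semantic_sim(i) < sim_threshold]
    if not filtered:
        return None  # no safe hard negative in this pool
    return max(filtered, key=score)  # hardest remaining negative
```

A purely heuristic sampler would skip the `semantic_sim` check and keep only the `max(..., key=score)` step, which is exactly where semantically relevant items get mislabeled as negatives; HNLMRec instead generates such negatives with an LLM rather than filtering a fixed pool.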