Personalized recommendation requires models that capture sequential user preferences while remaining robust to sparse feedback and semantic ambiguity. Recent work has explored large language models (LLMs) as recommenders and re-rankers, but pure prompt-based ranking often suffers from poor calibration, sensitivity to candidate ordering, and popularity bias. As a result, LLMs are useful semantic reasoners but unreliable standalone ranking engines. We present \textbf{GraphRAG-IRL}, a hybrid recommendation framework that combines graph-grounded feature construction, inverse reinforcement learning (IRL), and persona-guided LLM re-ranking. Our method constructs a heterogeneous knowledge graph over items, categories, and concepts, retrieves both individual and community preference context, and uses these signals to train a Maximum Entropy IRL model for calibrated pre-ranking. An LLM is then applied only to a short candidate list, where persona-guided prompts provide complementary semantic judgments that are fused with the IRL rankings. Experiments show that GraphRAG-IRL is a strong standalone recommender: IRL-MLP with GraphRAG improves NDCG@10 by 15.7\% on MovieLens and 16.6\% on KuaiRand over supervised baselines. The results also show that IRL and GraphRAG are superadditive: their combined gain exceeds the sum of their individual improvements. Persona-guided LLM fusion further improves ranking quality, yielding up to a 16.8\% NDCG@10 improvement over the IRL-only baseline on MovieLens ml-1m, while score fusion on KuaiRand provides consistent gains of 4--6\% across LLM providers.
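The abstract describes fusing IRL pre-ranking scores with LLM judgments on a short candidate list and evaluating with NDCG@10, but it does not give the fusion rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes both stages emit a numeric score per candidate, that scores are min-max normalized, and that they are combined by a convex weight `alpha`. The function names `min_max`, `fuse_scores`, and `ndcg_at_k` and the parameter `alpha` are hypothetical, introduced here for illustration only. The NDCG computation uses the standard binary-relevance definition.

```python
import math


def min_max(scores):
    """Rescale a dict of item -> score into [0, 1]; constant inputs map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse_scores(irl_scores, llm_scores, alpha=0.5):
    """Rank candidates by a convex combination of normalized scores.

    ASSUMPTION: linear fusion with weight `alpha` on the IRL score is an
    illustrative choice; the abstract does not specify the fusion rule.
    Candidates missing an LLM score get 0 for that component.
    """
    irl_n = min_max(irl_scores)
    llm_n = min_max(llm_scores)
    fused = {
        item: alpha * irl_n[item] + (1.0 - alpha) * llm_n.get(item, 0.0)
        for item in irl_n
    }
    return sorted(fused, key=fused.get, reverse=True)


def ndcg_at_k(ranked_items, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy candidate list with made-up scores (not data from the paper).
irl = {"a": 2.0, "b": 1.0, "c": 0.0}
llm = {"a": 0.0, "b": 1.0, "c": 0.9}
irl_only = sorted(irl, key=irl.get, reverse=True)
fused_ranking = fuse_scores(irl, llm, alpha=0.5)
print(irl_only, ndcg_at_k(irl_only, {"b"}))
print(fused_ranking, ndcg_at_k(fused_ranking, {"b"}))
```

In this toy case the LLM score moves the relevant item `b` from second to first place, which raises NDCG@10. With `alpha=1.0` the sketch reduces to the IRL-only ranking, so the weight can be read as how much the pre-ranker is trusted relative to the LLM.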