Adapting large language models (LLMs) to the next point-of-interest (POI) recommendation task faces two fundamental challenges: (i) although existing methods produce semantic IDs that incorporate semantic information, their topology-blind indexing fails to preserve semantic continuity, so proximity in ID values does not mirror the coherence of the underlying semantics; and (ii) supervised fine-tuning (SFT)-based methods restrict model outputs to top-1 predictions, suffering from "answer fixation" and, owing to scarce supervision, neglecting the top-k ranked lists and reasoning that recommendation requires. We propose Refine-POI, a framework that addresses these challenges through topology-aware ID generation and reinforcement fine-tuning. First, we introduce a hierarchical self-organizing map (SOM) quantization strategy to generate semantic IDs, ensuring that coordinate proximity in the codebook reflects semantic similarity in the latent space. Second, we employ a policy-gradient framework to optimize the generation of top-k recommendation lists, liberating the model from strict label matching. Extensive experiments on three real-world datasets demonstrate that Refine-POI significantly outperforms state-of-the-art baselines, effectively combining the reasoning capabilities of LLMs with the representational fidelity required for accurate and explainable next-POI recommendation.
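The abstract does not specify the hierarchical SOM construction, so the following is only a minimal sketch of the core idea at a single level: train a self-organizing map on item embeddings and use each item's best-matching-unit grid coordinates as its semantic ID, so that nearby IDs correspond to nearby embeddings. All function names, grid sizes, and hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def train_som(X, grid=(8, 8), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM so grid coordinates preserve the neighborhood
    structure of the input embeddings X of shape (n, d). (Illustrative.)"""
    rng = np.random.default_rng(seed)
    H, W = grid
    d = X.shape[1]
    weights = rng.normal(size=(H, W, d))
    # (H, W, 2) array of each unit's grid coordinates
    coords = np.stack(np.meshgrid(np.arange(H), np.arange(W),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            # linearly decay learning rate and neighborhood radius
            frac = step / n_steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 0.5
            # best-matching unit: grid cell whose weight is closest to x
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), (H, W))
            # Gaussian neighborhood pull toward x (topology preservation)
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights

def assign_ids(X, weights):
    """Semantic ID = (row, col) of each embedding's best-matching unit."""
    H, W, _ = weights.shape
    flat = weights.reshape(-1, weights.shape[-1])
    idx = np.argmin(np.linalg.norm(X[:, None, :] - flat[None], axis=-1), axis=1)
    return np.stack(np.unravel_index(idx, (H, W)), axis=1)
```

Because the neighborhood function updates grid-adjacent units together, items with similar embeddings end up with nearby (row, col) IDs, which is the topology-aware property the abstract contrasts with topology-blind indexing.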
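Refine-POI's actual reward design and policy parameterization are not given in the abstract; as a minimal sketch of the policy-gradient idea, the toy below samples a top-k list without replacement (Plackett-Luce over logits), scores it with a simple hit-based reward, and applies a REINFORCE update — rewarding any list containing the target rather than enforcing strict top-1 label matching. The reward choice, learning rate, and helper names are assumptions for illustration only.

```python
import numpy as np

def sample_topk(logits, k, rng):
    """Sample a length-k ranked list without replacement (Plackett-Luce)."""
    logits = logits.astype(float).copy()
    chosen = []
    for _ in range(k):
        p = np.exp(logits - logits.max())
        p /= p.sum()
        i = rng.choice(len(logits), p=p)
        chosen.append(int(i))
        logits[i] = -np.inf  # remove the chosen item from the pool
    return chosen

def reinforce_step(logits, target, k, lr, rng):
    """One REINFORCE update; reward = 1 iff the target POI is in the list.

    The gradient of the log-probability of a Plackett-Luce sample is,
    at each rank, (one-hot of chosen item) - (softmax over remaining items).
    """
    lst = sample_topk(logits, k, rng)
    reward = 1.0 if target in lst else 0.0
    grad = np.zeros_like(logits)
    mask = np.ones(len(logits), dtype=bool)
    for i in lst:
        p = np.where(mask, np.exp(logits - logits.max()), 0.0)
        p /= p.sum()
        onehot = np.zeros_like(logits)
        onehot[i] = 1.0
        grad += onehot - p
        mask[i] = False
    return logits + lr * reward * grad, reward
```

Because the reward fires whenever the target appears anywhere in the k-item list, the gradient signal is denser than a top-1 cross-entropy loss on the single ground-truth label, which is the "liberation from strict label matching" the abstract describes.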