Evaluating recommender systems remains challenging due to the gap between offline metrics and real user behavior, as well as the scarcity of interaction data. Recent work explores large language model (LLM) agents as synthetic users, yet these agents typically rely on few-shot prompting, which yields only a shallow understanding of the environment and limits their ability to faithfully reproduce user actions. We introduce AlignUSER, a framework that learns world-model-driven agents from human interactions. Given rollout sequences of actions and states, we formalize world modeling as a next-state prediction task that helps the agent internalize the environment. To align actions with human personas, we generate counterfactual trajectories around demonstrations and prompt the LLM to compare its decisions with human choices, identify suboptimal actions, and extract lessons. The learned policy is then used to drive agent interactions with the recommender system. We evaluate AlignUSER across multiple datasets and show closer alignment with real users than prior work, at both the micro and macro levels.
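As a rough illustration of the two training signals described above, the following Python sketch shows how next-state prediction examples might be built from rollouts and how a counterfactual comparison prompt might be phrased. The abstract does not specify data formats or prompt templates, so every field name and template here is an assumption, not the paper's actual implementation.

```python
# Illustrative sketch only: the data schema and prompt wording below are
# assumptions; AlignUSER's actual formats are not given in the abstract.
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    state: str   # textual description of the recommender state (e.g., the current slate)
    action: str  # the action the human user took (e.g., "click item B")


def build_world_model_examples(rollout: List[Step]) -> List[dict]:
    """Turn a rollout into next-state prediction pairs: (state_t, action_t) -> state_{t+1}."""
    examples = []
    for t in range(len(rollout) - 1):
        examples.append({
            "input": f"State: {rollout[t].state}\nAction: {rollout[t].action}\nPredict the next state.",
            "target": rollout[t + 1].state,
        })
    return examples


def build_counterfactual_prompt(state: str, agent_action: str, human_action: str) -> str:
    """Ask the LLM to contrast its own (counterfactual) choice with the human demonstration
    and distill a reusable lesson about the user's preferences."""
    return (
        f"State: {state}\n"
        f"Your proposed action: {agent_action}\n"
        f"Action the human actually took: {human_action}\n"
        "Compare the two choices, explain why yours may be suboptimal for this persona, "
        "and state one concise lesson to apply in similar states."
    )


if __name__ == "__main__":
    rollout = [
        Step("slate: [A, B, C]; history: liked sci-fi", "click B"),
        Step("slate: [D, E]; history: liked sci-fi, clicked B", "exit"),
    ]
    print(build_world_model_examples(rollout)[0]["input"])
    print(build_counterfactual_prompt(rollout[0].state, "click A", rollout[0].action))
```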