Recent advances in Large Language Models (LLMs) have spurred interest in designing LLM-based agents for tasks that involve interaction with human and artificial agents. This paper addresses a key aspect of the design of such agents: predicting human decisions in off-policy evaluation (OPE). We focus on language-based persuasion games, in which the agent's goal is to influence its partner's decisions through verbal messages. Using a dedicated application, we collected a dataset of 87K decisions from humans playing a repeated decision-making game with artificial agents. Our approach trains a model on human interactions with one subset of agents to predict human decisions when interacting with a different subset. To enhance off-policy performance, we propose a simulation technique involving interactions across the entire agent space and simulated decision makers. Our learning strategy yields significant OPE gains, e.g., improving prediction accuracy on the 15% most challenging cases by 7.1%. Our code and the large dataset we collected and generated are submitted as supplementary material and are publicly available in our GitHub repository: https://github.com/eilamshapira/HumanChoicePrediction