Persuasion games have been fundamental in economics and AI research, and have significant practical applications. Recent works in this area have started to incorporate natural language, moving beyond the traditional stylized message setting. However, previous research has focused on on-policy prediction, where the train and test data have the same distribution, which is not representative of real-life scenarios. In this paper, we tackle the challenging problem of off-policy evaluation (OPE) in language-based persuasion games. To address the inherent difficulty of human data collection in this setup, we propose a novel approach which combines real and simulated human-bot interaction data. Our simulated data is created by an exogenous model assuming decision makers (DMs) start with a mixture of random and decision-theoretic based behaviors and improve over time. We present a deep learning training algorithm that effectively integrates real interaction and simulated data, substantially improving over models that train only with interaction data. Our results demonstrate the potential of real interaction and simulation mixtures as a cost-effective and scalable solution for OPE in language-based persuasion games.\footnote{Our code and the large dataset we collected and generated are submitted as supplementary material and will be made publicly available upon acceptance.
翻译:说服博弈一直是经济学和人工智能研究中的基础性课题,具有重要的实际应用价值。近年来该领域的研究开始引入自然语言,突破了传统程式化信息传递的框架。然而,先前研究聚焦于同策略预测——训练数据与测试数据具有相同分布,这与真实场景存在显著差异。本文针对基于语言的说服博弈中具有挑战性的离策略评估(OPE)问题展开研究。为克服该场景下人类数据采集的内在困难,我们提出了一种融合真实人机交互数据与模拟数据的新方法。模拟数据通过外生模型生成,该模型假设决策者(DMs)在初始阶段混合了随机行为与决策理论导向行为,并随时间推移逐步优化。我们提出了一种深度学习训练算法,能够有效整合真实交互数据与模拟数据,显著优于仅使用交互数据训练的模型。研究结果表明,真实交互与模拟数据的混合方法可作为语言说服博弈中离策略评估的经济高效且可扩展的解决方案。(注:我们的代码及收集生成的大型数据集已作为补充材料提交,将在论文录用后公开。)