Persuasion games are fundamental to economics and AI research and have significant practical applications. Recent work in this area has begun to incorporate natural language, moving beyond the traditional setting of stylized messages. However, previous research has focused on on-policy prediction, where the training and test data share the same distribution, which is not representative of real-life scenarios. In this paper, we tackle the challenging problem of off-policy evaluation (OPE) in language-based persuasion games. To address the inherent difficulty of collecting human data in this setup, we propose a novel approach that combines real and simulated human-bot interaction data. Our simulated data is generated by an exogenous model which assumes that decision makers (DMs) start with a mixture of random and decision-theoretic behaviors and improve over time. We present a deep learning training algorithm that effectively integrates real interaction and simulated data, substantially improving over models trained only on interaction data. Our results demonstrate the potential of mixing real interaction and simulated data as a cost-effective and scalable solution for OPE in language-based persuasion games.\footnote{Our code and the large dataset we collected and generated are submitted as supplementary material and will be made publicly available upon acceptance.}