Dynamic planning in multi-agent systems has been explored as a way to improve decision-making in various domains. Professional basketball is a compelling example of a dynamic spatio-temporal game, one that involves both concealed strategic policies and moment-to-moment decision-making. However, the need to process diverse on-court signals and to navigate a vast space of potential actions and outcomes makes it difficult for existing approaches to swiftly identify optimal strategies in response to evolving circumstances. In this study, we first formulate the sequential decision-making process as a conditional trajectory generation process. We then introduce PLAYBEST (PLAYer BEhavior SynThesis), a method for enhancing player decision-making. We extend a state-of-the-art generative model, the diffusion probabilistic model, to learn challenging multi-agent environmental dynamics from historical National Basketball Association (NBA) player motion tracking data. To incorporate data-driven strategies, an auxiliary value function is trained on play-by-play data, with the corresponding rewards acting as plan guidance. To accomplish reward-guided trajectory generation, we condition the diffusion model on the value function and perform classifier-guided sampling. We validate the effectiveness of PLAYBEST through comprehensive simulation studies based on real-world data, contrasting the generated trajectories and play strategies with those employed by professional basketball teams. Our results show that the model generates high-quality basketball trajectories that yield efficient plays, surpassing conventional planning techniques in adaptability, flexibility, and overall performance. Moreover, the synthesized play strategies align closely with professional tactics, highlighting the model's capacity to capture the intricate dynamics of basketball games.
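The reward-guided sampling idea described above can be illustrated with a minimal sketch. This is not the PLAYBEST implementation: every name here (`toy_eps_model`, `toy_value_grad`, `sample`, the target point, the guidance scale) is hypothetical, the noise predictor is an analytic stand-in for a trained diffusion model (it is exact only if trajectories were standard normal), and the value function is a toy that rewards ending a trajectory near a target position. The sketch only shows the general mechanism of shifting each reverse-diffusion mean along the gradient of a value function.

```python
import numpy as np

T = 50                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def toy_eps_model(x, t):
    """Stand-in for a trained noise predictor.

    Assumption: if the data were N(0, I), the optimal noise prediction
    would be sqrt(1 - alpha_bar_t) * x_t. A real model is a neural network.
    """
    return np.sqrt(1.0 - alpha_bars[t]) * x


def toy_value_grad(traj, target):
    """Gradient of the toy value V(traj) = -||traj[-1] - target||^2.

    Only the final position of the trajectory is rewarded, so only the
    last row of the gradient is non-zero.
    """
    grad = np.zeros_like(traj)
    grad[-1] = -2.0 * (traj[-1] - target)
    return grad


def sample(horizon, dim, target, scale, seed=0):
    """Reverse diffusion with value guidance; scale=0 disables guidance."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, dim))          # start from pure noise
    for t in reversed(range(T)):
        eps = toy_eps_model(x, t)
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Guidance: shift the mean along the value gradient, scaled by the
        # step variance (here sigma_t^2 = beta_t).
        mean = mean + scale * betas[t] * toy_value_grad(mean, target)
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


target = np.array([3.0, 0.0])            # hypothetical goal position
plain = sample(horizon=8, dim=2, target=target, scale=0.0)
guided = sample(horizon=8, dim=2, target=target, scale=5.0)
print("unguided end-point distance:", np.linalg.norm(plain[-1] - target))
print("guided end-point distance:  ", np.linalg.norm(guided[-1] - target))
```

With the same random seed, the guided run should end noticeably closer to the target than the unguided run, while time steps that the toy value function does not reward are left unchanged. In a realistic setting the gradient would come from a learned value network evaluated on multi-agent trajectories rather than from a closed-form expression.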