This paper presents a multi-agent reinforcement learning (MARL) scheme for proactive multi-camera collaboration in 3D human pose estimation within dynamic human crowds. Traditional fixed-viewpoint multi-camera solutions for human motion capture (MoCap) are limited in capture space and susceptible to dynamic occlusions. Active camera approaches instead control camera poses proactively to find optimal viewpoints for 3D reconstruction, but current methods still struggle with credit assignment and environment dynamics. To address these issues, we introduce a novel Collaborative Triangulation Contribution Reward (CTCR), which improves convergence and alleviates the multi-agent credit assignment problem that arises when 3D reconstruction accuracy is used as the shared reward. Additionally, we jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics and encourage anticipatory occlusion-avoidance behaviors. We evaluate the method in four photo-realistic UE4 environments to assess its validity and generalizability. Empirical results show that it outperforms fixed and active baselines across scenarios with different numbers of cameras and humans.