We introduce a novel method for human shape and pose recovery that can fully leverage multiple static views. We target fixed-multiview people monitoring, such as elderly care and safety monitoring, in which calibrated cameras can be installed at the corners of a room or an open space, but their configuration may vary from one environment to another. Our key idea is to formulate multiview shape and pose recovery as neural optimization. We achieve this with HeatFormer, a neural optimizer that iteratively refines the SMPL parameters given multiview images and is fundamentally agnostic to the view configuration. HeatFormer realizes this SMPL parameter estimation as heat map generation and alignment with a novel transformer encoder and decoder. We demonstrate the effectiveness of HeatFormer, including its accuracy, robustness to occlusion, and generalizability, through an extensive set of experiments. We believe HeatFormer can play a key role in passive human behavior modeling.
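The following toy sketch shows how an iterative refinement that pools evidence across views can accept any number or ordering of calibrated cameras. It is not HeatFormer's architecture and relies on several simplifying assumptions. The SMPL parameters are replaced by a single 3D point. Each view's heat map is reduced to its 2D peak under an orthographic calibrated camera. The learned transformer update is replaced by a hand-written step that averages back-projected residuals over views, which is gradient descent on the mean reprojection error. All function names (`project`, `update_step`, `refine`) are hypothetical.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Proj = Tuple[Vec3, Vec3]  # hypothetical: two orthonormal rows of an orthographic camera


def project(P: Proj, x: Vec3) -> Vec2:
    """Project a 3D point into one calibrated (orthographic) view."""
    return tuple(sum(p_k * x_k for p_k, x_k in zip(row, x)) for row in P)


def update_step(x: Vec3, views: Sequence[Tuple[Proj, Vec2]], step: float) -> Vec3:
    """Stand-in for a learned update (assumption, not the paper's decoder).

    Each view's 2D residual is lifted back to 3D with P^T, and the results are
    averaged. Averaging is permutation invariant and works for any view count.
    """
    total = [0.0, 0.0, 0.0]
    for P, obs in views:
        residual = [o - p for o, p in zip(obs, project(P, x))]
        for row, r in zip(P, residual):  # accumulate P^T @ residual
            for k in range(3):
                total[k] += row[k] * r
    n = len(views)
    return tuple(x_k + step * t_k / n for x_k, t_k in zip(x, total))


def refine(x0: Vec3, views: Sequence[Tuple[Proj, Vec2]],
           num_iters: int = 20, step: float = 1.0) -> Vec3:
    """Iteratively refine the estimate starting from x0."""
    x = x0
    for _ in range(num_iters):
        x = update_step(x, views, step)
    return x


# Usage: three cameras looking along z, x, and y observe the same point.
target = (0.3, -1.2, 2.0)
cams = [
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),  # looks along z
    ((0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),  # looks along x
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),  # looks along y
]
views = [(P, project(P, target)) for P in cams]
estimate = refine((0.0, 0.0, 0.0), views)
```

Because each view contributes through a mean, the same update rule applies unchanged when cameras are added, removed, or reordered. This mirrors, in a very reduced form, the configuration-agnostic property claimed above.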