Existing 3D human pose estimators struggle to adapt to new datasets because 2D-3D pose pairs are lacking in training sets. To overcome this issue, we propose the \textit{Multi-Hypothesis \textbf{P}ose \textbf{Syn}thesis \textbf{D}omain \textbf{A}daptation} (\textbf{PoSynDA}) framework, which bridges this data gap in the target domain. Specifically, PoSynDA uses a diffusion-inspired structure to simulate the 3D pose distribution in the target domain. By incorporating a multi-hypothesis network, PoSynDA generates diverse pose hypotheses and aligns them with the target domain. To do this, it first applies target-specific source augmentation, which decouples the scale and position parameters to obtain data following the target domain distribution from the source domain. The process is then refined through a teacher-student paradigm and low-rank adaptation. In extensive comparisons on benchmarks such as Human3.6M and MPI-INF-3DHP, PoSynDA demonstrates competitive performance, comparable even to the target-trained MixSTE model~\cite{zhang2022mixste}. This work paves the way for the practical application of 3D human pose estimation in unseen domains. The code is available at https://github.com/hbing-l/PoSynDA.