This paper proposes an end-to-end framework for generating 3D human pose datasets using Neural Radiance Fields (NeRF). Because collecting 3D human pose data is resource-intensive, public datasets generally offer limited diversity in human poses and camera viewpoints. As a result, pose estimators trained on public datasets significantly underperform on unseen out-of-distribution (OOD) samples. Previous works augment public datasets by generating 2D-3D pose pairs or by rendering large amounts of random data. These approaches either overlook image rendering or produce datasets that are suboptimal for pre-trained models. Here we propose PoseGen, which learns to generate a dataset (3D human poses and images) using a feedback loss from a given pre-trained pose estimator. In contrast to prior art, our generated data is optimized to improve the robustness of the pre-trained model. The objective of PoseGen is to learn a data distribution that maximizes the prediction error of the given pre-trained model. Because the learned distribution contains samples that are OOD for the pre-trained model, fine-tuning the model on data sampled from it improves the model's generalizability. This is the first work that proposes NeRFs for 3D human data generation. NeRFs are data-driven and do not require 3D scans of humans, so using NeRF opens a new direction for convenient, user-specific data generation. Our extensive experiments show that PoseGen improves two baseline models (SPIN and HybrIK) on four datasets, with an average relative improvement of 6%.
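The two-stage objective described above can be written compactly. The following is only a sketch under assumed notation, none of which appears in the abstract: $G_{\theta}$ denotes the NeRF-based generator with learnable parameters $\theta$, $(I, P)$ a generated image and its 3D pose, $f_{\phi}$ the pre-trained pose estimator with weights $\phi$, $\phi_{0}$ its pre-trained weights, and $\mathcal{L}$ a pose error. The actual feedback loss may include further terms or constraints that the abstract does not state.

```latex
% Stage 1 (assumed form): learn the data distribution on which the
% frozen pre-trained estimator f_{\phi_0} makes the largest error.
\theta^{*} = \arg\max_{\theta}\;
  \mathbb{E}_{(I, P) \sim G_{\theta}}
  \bigl[\, \mathcal{L}\bigl(f_{\phi_{0}}(I),\, P\bigr) \bigr]

% Stage 2 (assumed form): fine-tune the estimator, starting from \phi_0,
% on samples drawn from the learned distribution G_{\theta^{*}}.
\phi^{*} = \arg\min_{\phi}\;
  \mathbb{E}_{(I, P) \sim G_{\theta^{*}}}
  \bigl[\, \mathcal{L}\bigl(f_{\phi}(I),\, P\bigr) \bigr]
```

Read this way, the generator is pushed toward the estimator's failure cases, and fine-tuning then reduces the error on those cases. Whether the generated samples are mixed with the original training data during fine-tuning is not specified here.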