Animal pose estimation has become an important area of research, but the scarcity of annotated data remains a major obstacle to developing accurate models. Synthetic data has emerged as a promising alternative, but it frequently exhibits a domain gap with real data. Style transfer algorithms have been proposed to close this gap, but they preserve spatial correspondence poorly, so the label information attached to the synthetic images is lost. In this work, we present a new approach called Synthetic Pose-aware Animal ControlNet (SPAC-Net), which incorporates ControlNet into the previously proposed Prior-Aware Synthetic animal data generation (PASyn) pipeline. We use the plausible poses produced by the Variational Auto-Encoder (VAE)-based data generation pipeline as input to the ControlNet model conditioned on Holistically-nested Edge Detection (HED) boundaries. This generates pose-labeled synthetic images that are closer to real data, making it possible to train a high-precision pose estimation network without any real data. In addition, we propose a Bi-ControlNet structure that detects the HED boundaries of the animal and of the background separately, improving the precision and stability of the generated data. Using the SPAC-Net pipeline, we generate synthetic zebra and rhino images and evaluate them on the real AP-10K dataset, where they outperform training on real images alone and training on synthetic data generated by other methods. Our work demonstrates the potential of synthetic data to overcome the shortage of annotated data in animal pose estimation.
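The abstract states that Bi-ControlNet detects the animal and background HED boundaries separately, but does not say how the two boundary maps are kept apart. The sketch below is one hypothetical way to build two non-overlapping conditioning maps, assuming a boolean animal mask is available from the synthetic renderer. The function name `split_conditions`, the nested-list grid representation, and the masking rule are all illustrative assumptions, not the SPAC-Net implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]   # hypothetical: edge strengths in [0, 1], row-major
Mask = List[List[bool]]    # hypothetical: True where the rendered animal is


def split_conditions(animal_edges: Grid, background_edges: Grid,
                     animal_mask: Mask) -> Tuple[Grid, Grid]:
    """Return (animal_condition, background_condition).

    Assumption for illustration: animal edges are kept only inside the mask
    and background edges only outside it, so the two conditioning maps fed
    to the two ControlNet branches never claim the same pixel.
    """
    h = len(animal_mask)
    w = len(animal_mask[0]) if h else 0
    for grid in (animal_edges, background_edges, animal_mask):
        if len(grid) != h or any(len(row) != w for row in grid):
            raise ValueError("edge maps and mask must have the same shape")

    animal_cond = [[animal_edges[y][x] if animal_mask[y][x] else 0.0
                    for x in range(w)] for y in range(h)]
    background_cond = [[0.0 if animal_mask[y][x] else background_edges[y][x]
                        for x in range(w)] for y in range(h)]
    return animal_cond, background_cond


if __name__ == "__main__":
    mask = [[False, True], [True, False]]
    a_edges = [[0.9, 0.8], [0.7, 0.6]]
    b_edges = [[0.1, 0.2], [0.3, 0.4]]
    print(split_conditions(a_edges, b_edges, mask))
```

Keeping the two maps disjoint is a design choice made here for clarity; an actual multi-ControlNet setup might instead pass overlapping maps and weight each branch separately.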