Learning Part Segmentation from Synthetic Animals

Semantic part segmentation provides a fine-grained and interpretable understanding of an object, thereby benefiting numerous downstream tasks. However, the need for exhaustive annotations impedes its usage across diverse object types. This paper focuses on learning part segmentation from synthetic animals, leveraging the Skinned Multi-Animal Linear (SMAL) models to scale up existing synthetic data generated by computer-aided design (CAD) animal models. Compared to CAD models, SMAL models generate data with a wider range of the poses observed in real-world scenarios. Our first contribution is therefore a synthetic animal dataset of tigers and horses with greater pose diversity, termed Synthetic Animal Parts (SAP). As our second contribution, we benchmark Syn-to-Real animal part segmentation from SAP to PartImageNet, a setting we call SynRealPart, using existing domain adaptation methods for semantic segmentation, and then improve on them. Concretely, we examine three Syn-to-Real adaptation methods but observe a relative performance drop due to the innate difference between the two tasks. To address this, we propose a simple yet effective method called Class-Balanced Fourier Data Mixing (CB-FDM). Fourier Data Mixing aligns the spectral amplitudes of synthetic images with those of real images, so that the frequency content of the mixed images is closer to that of real images. We further use Class-Balanced Pseudo-Label Re-Weighting to alleviate the imbalanced class distribution. We demonstrate the efficacy of CB-FDM on SynRealPart, where it yields significant performance improvements over previous methods. Remarkably, our third contribution is to reveal that the parts learned from synthetic tigers and horses are transferable across all quadrupeds in PartImageNet, further underscoring the utility and potential applications of animal part segmentation.
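The abstract describes the two ingredients of CB-FDM only at a high level, so the following is a minimal NumPy sketch under stated assumptions rather than the authors' implementation. It assumes that amplitude alignment can be approximated by linearly blending the synthetic amplitude spectrum toward the real one while keeping the synthetic phase, and that class balancing can be approximated by normalized inverse-frequency weights computed from pseudo-labels. The function names, the `lam` mixing coefficient, and the `ignore_index` convention are all hypothetical.

```python
import numpy as np


def fourier_amplitude_mix(syn, real, lam=0.5):
    """Blend the amplitude spectrum of `syn` toward that of `real`.

    Assumption: a plain linear blend of amplitudes; the paper's exact
    mixing rule is not given in the abstract.
    syn, real: float arrays of shape (H, W, C) with identical shapes.
    lam: 0 keeps the synthetic amplitude, 1 uses the real amplitude.
    """
    if syn.shape != real.shape:
        raise ValueError("syn and real must have the same shape")
    fft_syn = np.fft.fft2(syn, axes=(0, 1))
    fft_real = np.fft.fft2(real, axes=(0, 1))
    amp_syn, phase_syn = np.abs(fft_syn), np.angle(fft_syn)
    amp_real = np.abs(fft_real)
    # Keep the synthetic phase (which carries the part layout) and
    # move only the amplitude toward the real image.
    amp_mix = (1.0 - lam) * amp_syn + lam * amp_real
    mixed = np.fft.ifft2(amp_mix * np.exp(1j * phase_syn), axes=(0, 1))
    return np.real(mixed)


def class_balanced_weights(pseudo_labels, num_classes, ignore_index=255):
    """Inverse-frequency class weights from an integer pseudo-label map.

    Assumption: weights are inverse pixel frequencies, normalized to
    mean 1 over the classes that actually occur; absent classes get 0.
    """
    valid = pseudo_labels[pseudo_labels != ignore_index]
    counts = np.bincount(valid, minlength=num_classes).astype(np.float64)
    weights = np.zeros(num_classes, dtype=np.float64)
    present = counts > 0
    if present.any():
        freq = counts[present] / counts.sum()
        inv = 1.0 / freq
        weights[present] = inv / inv.mean()
    return weights
```

In a training loop, the mixed image would stand in for the synthetic input while its part labels stay unchanged, and the per-class weights would scale the pixel-wise loss on pseudo-labeled real images so that small parts are not dominated by large ones.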
