Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation

In this work, we aim to enhance model-based face reconstruction by avoiding fitting the model to outliers, i.e. regions that the model cannot express well, such as occluders or make-up. The core challenge in localizing outliers is that they are highly variable and difficult to annotate. To overcome this problem, we introduce a joint Face-autoencoder and outlier segmentation approach (FOCUS). In particular, we exploit the fact that outliers cannot be fitted well by the face model and hence can be localized well given a high-quality model fit. The difficulty is that model fitting and outlier segmentation depend on each other and must be inferred jointly. We resolve this chicken-and-egg problem with an EM-type training strategy, in which a face autoencoder is trained jointly with an outlier segmentation network. This leads to a synergistic effect: the segmentation network prevents the face encoder from fitting to the outliers, which improves reconstruction quality, and the improved 3D face reconstruction in turn enables the segmentation network to predict the outliers more accurately. To resolve the ambiguity between outliers and regions that are merely difficult to fit, such as eyebrows, we build a statistical prior from synthetic data that measures the systematic bias in model fitting. Experiments on the NoW test set demonstrate that FOCUS achieves state-of-the-art 3D face reconstruction performance among all baselines trained without 3D annotation. Moreover, our results on CelebA-HQ and the AR database show that the segmentation network localizes occluders accurately despite being trained without any segmentation annotation.
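The chicken-and-egg dependency between model fitting and outlier localization can be illustrated, very loosely, on a toy problem. The sketch below is not the FOCUS method. It replaces the face autoencoder with a linear least-squares model and the segmentation network with a residual threshold `tau`, in the spirit of iteratively trimmed least squares. The function name `fit_with_outlier_mask`, the threshold, and the optional per-pixel `expected_error` tolerance (a crude stand-in for the synthetic-data bias prior) are all hypothetical.

```python
import numpy as np


def fit_with_outlier_mask(X, y, tau, n_iters=10, expected_error=None):
    """Toy EM-type alternation between model fitting and outlier masking.

    X: (n, d) design matrix of a linear model (hypothetical stand-in for a face model).
    y: (n,) observed values, one per "pixel".
    tau: residual threshold above which a pixel is labelled an outlier.
    expected_error: optional (n,) per-pixel tolerance, a crude hypothetical
        stand-in for a prior on systematic fitting bias (e.g. eyebrows).
    """
    n, d = X.shape
    if expected_error is None:
        expected_error = np.zeros(n)
    mask = np.ones(n, dtype=bool)  # start by treating every pixel as an inlier
    for _ in range(n_iters):
        # Fitting step: least squares on the current inliers only.
        w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        # Segmentation step: pixels the fitted model cannot explain become outliers.
        residual = np.abs(y - X @ w)
        new_mask = residual < tau + expected_error
        if new_mask.sum() < d or np.array_equal(new_mask, mask):
            break  # too few inliers left, or the fit and the mask agree
        mask = new_mask
    return w, mask


# Usage: a line corrupted by a contiguous block of large offsets ("occluder").
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
X = np.column_stack([np.ones_like(x), x])
w_true = np.array([0.5, 2.0])
y = X @ w_true + 0.01 * rng.standard_normal(200)
y[50:80] += 3.0

w, mask = fit_with_outlier_mask(X, y, tau=0.2)
print(np.round(w, 2), mask.sum())
```

The first fit uses every pixel and is pulled toward the corrupted block, so its mask is imperfect; refitting on the surviving inliers removes that bias, and the next mask then isolates the corrupted block. FOCUS learns both steps with networks rather than a fixed threshold, so this only mirrors the alternation, not the model.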
