The incorporation of 3D data in facial analysis tasks has gained popularity in recent years. Although 3D data provides a more accurate and detailed representation of the human face, acquiring it is more complex and expensive than collecting 2D face images: one has to rely either on expensive 3D scanners or on depth sensors, which are prone to noise. An alternative is to reconstruct 3D faces from uncalibrated 2D images in an unsupervised way, without any ground-truth 3D data. However, such approaches are computationally expensive, and the size of the learned models makes them unsuitable for mobile or other edge-device applications. Predicting dense 3D landmarks over the whole face can overcome this issue. As no public dataset containing dense landmarks is available, we propose a pipeline that creates a dense keypoint training dataset, with 520 keypoints across the whole face, from existing facial position map data. We train a lightweight MobileNet-based regressor model on the generated data. As we do not have access to any evaluation dataset with dense landmarks, we evaluate our model on the 68-keypoint detection task. Experimental results show that our trained model outperforms many existing methods despite its smaller model size and minimal computational cost. In addition, qualitative evaluation shows the effectiveness of our trained models at extreme head pose angles as well as under other facial variations and occlusions.
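As an illustration only, the dataset-creation step could be sketched as a lookup into a facial position map, here assumed to be an H x W x 3 array whose pixels store the (x, y, z) coordinates of face surface points in UV space. The abstract does not state how the 520 keypoints are selected, so the regular 20 x 26 grid, the 256 x 256 map size, the margin, the synthetic data, and the function names `make_uv_grid` and `sample_keypoints` are all assumptions made for this sketch, not the method used in this work.

```python
import numpy as np


def make_uv_grid(height, width, rows, cols, margin=0.1):
    """Return (rows * cols, 2) integer (row, col) indices on a regular grid.

    ASSUMPTION: a uniform grid with a border margin; the actual selection
    of the 520 keypoints is not described in the abstract.
    """
    r = np.linspace(margin * (height - 1), (1 - margin) * (height - 1), rows)
    c = np.linspace(margin * (width - 1), (1 - margin) * (width - 1), cols)
    rr, cc = np.meshgrid(r, c, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1).round().astype(np.int64)


def sample_keypoints(position_map, uv_indices):
    """Look up the 3D coordinates stored at the given UV pixels.

    position_map: (H, W, 3) array of (x, y, z) values.
    uv_indices:   (K, 2) integer array of (row, col) positions.
    Returns a (K, 3) array of dense 3D keypoints.
    """
    return position_map[uv_indices[:, 0], uv_indices[:, 1]]


# Synthetic stand-in for a real position map (hypothetical data).
rng = np.random.default_rng(0)
position_map = rng.random((256, 256, 3)).astype(np.float32)

# 20 x 26 = 520 sampling locations, reused for every face in the dataset.
uv_indices = make_uv_grid(256, 256, rows=20, cols=26)
keypoints = sample_keypoints(position_map, uv_indices)
print(keypoints.shape)  # (520, 3)
```

Because the same UV locations are sampled for every face, each keypoint index refers to the same semantic position across samples, which is what allows a fixed-size regressor output to be trained on the result.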