While weakly supervised multi-view face reconstruction (MVR) is attracting increasing attention, one critical issue remains open: how to effectively fuse information from multiple images to reconstruct high-precision 3D models. To address it, we propose a novel model called Deep Fusion MVR (DF-MVR) and design a multi-view-encoding to single-decoding framework with skip connections, which extracts, integrates, and compensates deep features from multi-view images using attention. Furthermore, we adopt the involution kernel to enrich the deep fusion features with channel features. In addition, we develop a face parsing network to learn, identify, and emphasize the critical common face area across the multi-view images. Experiments on the Pixel-Face and Bosphorus datasets indicate the superiority of our model. Without 3D annotation, DF-MVR achieves RMSE improvements of 5.2% and 3.0% over existing weakly supervised MVR methods on the Pixel-Face and Bosphorus datasets, respectively. Code will be made publicly available at https://github.com/weiguangzhao/DF_MVR.
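To make the idea of integrating deep features from several views with attention more concrete, the following is a minimal sketch, not the DF-MVR implementation. It assumes each view has already been encoded into a feature vector and fuses them with softmax attention weights; the function names `softmax` and `fuse_views`, the dot-product scoring, and the `query` vector are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_features, query):
    """Fuse per-view feature vectors into one vector.

    Assumption: each view is scored by its dot product with a query
    vector, and the scores are turned into attention weights. The
    paper's actual attention mechanism may differ.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in view_features]
    weights = softmax(scores)
    dim = len(view_features[0])
    # Weighted sum over views, computed per feature dimension.
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, view_features))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical 2-D features for three views (e.g. left, frontal, right).
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = fuse_views(views, query=[0.0, 0.0])
```

With an all-zero query every view receives the same score, so the fusion reduces to a plain average of the views; a learned query would instead let the model emphasize the more informative views.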