Face parsing is the per-pixel labeling of images containing human faces, where the labels identify key facial regions such as the eyes, lips, nose, and hair. In this work, we exploit the structural consistency of the human face to propose FP-LIIF, a lightweight face-parsing method based on a Local Implicit Function network. The architecture is simple, a convolutional encoder followed by a pixel MLP decoder, and uses 1/26th the number of parameters of state-of-the-art models, yet matches or outperforms them on multiple datasets, including CelebAMask-HQ and LaPa. We do not use any pretraining and, unlike other works, our network can generate segmentation at different resolutions without any change in the input resolution. Because of its higher FPS and smaller model size, this work enables facial segmentation on low-compute or low-bandwidth devices.
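To make the encoder and pixel MLP decoder idea concrete, the following is a minimal NumPy sketch of how a local implicit function can label an output grid of any size from a fixed feature map. It is not the authors' implementation: the encoder is replaced by a random feature map, the MLP weights are untrained, and the names `decode` and `make_mlp` as well as the sizes `NUM_CLASSES`, `FEAT_DIM`, and `HIDDEN` are assumptions chosen for illustration. It also omits any refinements a real model would likely use on top of nearest-cell lookup.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 11   # hypothetical label count; depends on the dataset
FEAT_DIM = 16      # hypothetical encoder channel width
HIDDEN = 32        # hypothetical MLP hidden width


def make_mlp(in_dim, hidden, out_dim):
    """Random weights standing in for a trained pixel MLP."""
    return {
        "w1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim),
    }


def decode(features, mlp, out_h, out_w):
    """Label an out_h x out_w grid by querying the feature map per pixel."""
    fh, fw, _ = features.shape
    # Pixel-centre coordinates of the output grid, normalised to [0, 1].
    ys = (np.arange(out_h) + 0.5) / out_h
    xs = (np.arange(out_w) + 0.5) / out_w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    # Nearest feature cell for every query coordinate.
    iy = np.clip(np.floor(yy * fh).astype(int), 0, fh - 1)
    ix = np.clip(np.floor(xx * fw).astype(int), 0, fw - 1)
    # Offset of the query from that cell's centre, in units of cells.
    dy = yy * fh - (iy + 0.5)
    dx = xx * fw - (ix + 0.5)
    # MLP input: the local feature vector plus the relative offset.
    z = np.concatenate(
        [features[iy, ix], dy[..., None], dx[..., None]], axis=-1
    )
    h = np.maximum(z @ mlp["w1"] + mlp["b1"], 0.0)  # ReLU
    logits = h @ mlp["w2"] + mlp["b2"]
    return logits.argmax(axis=-1)  # per-pixel class index


# Stand-in for the convolutional encoder output (8 x 8 cells).
features = rng.normal(size=(8, 8, FEAT_DIM))
mlp = make_mlp(FEAT_DIM + 2, HIDDEN, NUM_CLASSES)

# Same features, two output resolutions, no change to the input.
low = decode(features, mlp, 32, 32)
high = decode(features, mlp, 128, 128)
```

Because the decoder takes continuous coordinates rather than a fixed grid, the output resolution is a parameter of the query and not of the network, which is what allows segmentation at different resolutions from a single input resolution.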