Driven by COVID-19 and the growing demand for remote work, video conferencing apps have become especially widespread. Among the most valuable features of video chats are real-time background removal and face beautification. Computer vision researchers working on these tasks face a shortage of relevant training data: there is no large dataset of diverse, high-quality labeled images of people in front of a laptop or smartphone camera that would suffice to train a lightweight model without resorting to additional approaches. To accelerate progress in this area, we present EasyPortrait, a new image dataset for portrait segmentation and face parsing. It contains 20,000 primarily indoor photos of 8,377 unique users, with fine-grained segmentation masks divided into 9 classes. Images were collected and labeled via crowdsourcing platforms. Unlike most face parsing datasets, EasyPortrait does not treat the beard as part of the skin mask, and it separates the inside of the mouth from the teeth. These properties make EasyPortrait suitable for skin enhancement and teeth whitening tasks. This paper describes a pipeline for creating a large-scale, clean image segmentation dataset using crowdsourcing platforms, without additional synthetic data. We also train several models on EasyPortrait and report experimental results. The proposed dataset and trained models are publicly available.
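To illustrate why a separate teeth class matters for teeth whitening, the following is a minimal sketch, not the pipeline used in this work. It assumes a per-pixel mask of integer class indices aligned with an RGB image. The function name `whiten_teeth`, the `strength` parameter, and the class index `TEETH_CLASS = 8` are hypothetical choices made for illustration; the actual label mapping should be taken from the dataset's documentation.

```python
import numpy as np

# Hypothetical class index, used here for illustration only.
# Check the dataset's documentation for the real label mapping.
TEETH_CLASS = 8


def whiten_teeth(image: np.ndarray, mask: np.ndarray, strength: float = 0.3) -> np.ndarray:
    """Blend pixels labeled as teeth toward white.

    image:    H x W x 3 uint8 RGB array.
    mask:     H x W integer array of per-pixel class indices.
    strength: 0.0 leaves the image unchanged, 1.0 makes teeth pure white.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    out = image.astype(np.float32)  # astype returns a copy, so the input is not modified
    teeth = mask == TEETH_CLASS     # boolean H x W selector for teeth pixels
    # Linear blend toward white, applied only where the mask marks teeth.
    out[teeth] = (1.0 - strength) * out[teeth] + strength * 255.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

Because the mouth interior is labeled separately from the teeth, a blend like this one leaves the tongue and the dark area inside the mouth untouched. The same pattern would apply to skin enhancement with a skin mask that excludes the beard.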