EasyPortrait -- Face Parsing and Portrait Segmentation Dataset

Recently, due to COVID-19 and the growing demand for remote work, video conferencing apps have become especially widespread. The most valuable features of video chats are real-time background removal and face beautification. In solving these tasks, computer vision researchers face a lack of relevant training data: there is no large dataset of diverse, high-quality labeled images of people in front of a laptop or smartphone camera that would allow training a lightweight model without additional approaches. To boost progress in this area, we provide a new image dataset, EasyPortrait, for portrait segmentation and face parsing tasks. It contains 20,000 primarily indoor photos of 8,377 unique users, with fine-grained segmentation masks separated into 9 classes. Images were collected and labeled via crowdsourcing platforms. Unlike most face parsing datasets, in EasyPortrait the beard is not considered part of the skin mask, and the inside area of the mouth is separated from the teeth. These features allow EasyPortrait to be used for skin enhancement and teeth whitening tasks. This paper describes the pipeline for creating a large-scale and clean image segmentation dataset using crowdsourcing platforms without additional synthetic data. Moreover, we trained several models on EasyPortrait and report experimental results. The proposed dataset and trained models are publicly available.
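
To illustrate why separate skin and teeth classes matter for tasks such as teeth whitening, here is a minimal sketch of how a per-pixel class mask might be used. It is not the authors' code. The class indices (`SKIN = 2`, `TEETH = 8`) and the helper names `binary_mask` and `whiten` are assumptions made for illustration; the abstract states only that there are 9 classes and does not list their ids.

```python
# Hypothetical class ids: the abstract says there are 9 classes but does not
# give their indices, so these two values are assumptions for illustration.
SKIN, TEETH = 2, 8


def binary_mask(class_mask, class_id):
    """Return a 0/1 mask marking the pixels labeled with class_id."""
    return [[1 if px == class_id else 0 for px in row] for row in class_mask]


def whiten(image, mask, amount=0.5):
    """Blend masked grayscale pixels toward white (255) by `amount`."""
    return [
        [round(p + (255 - p) * amount) if m else p
         for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x3 example: a class-index mask and a grayscale image of the same shape.
class_mask = [[0, 2, 8],
              [2, 8, 0]]
image = [[10, 100, 201],
         [120, 181, 30]]

teeth = binary_mask(class_mask, TEETH)
result = whiten(image, teeth, amount=0.5)  # only teeth pixels are brightened
```

Because the teeth have their own class, the edit touches only those pixels and leaves the rest of the mouth and the skin unchanged. A skin-smoothing step could select its region the same way with `binary_mask(class_mask, SKIN)`, which by construction would exclude the beard.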

