The success of deep learning models depends on the size and quality of the dataset available for a given task. Here, we explore how far generated data can complement real data in improving the performance of neural networks. We consider facial expression recognition (FER), since it requires challenging data generation at the level of local regions such as the mouth and eyebrows, rather than simple augmentation. Generative Adversarial Networks (GANs) provide an alternative method for generating such local deformations, but they need further validation. To answer our question, we consider simple Convolutional Neural Network (CNN) based classifiers for recognizing Ekman emotions. For data generation, we produce facial expressions (FEs) using two GANs: the first generates a random identity, and the second imposes facial deformations on top of it. We train the CNN classifier using FEs from real faces, from GAN-generated faces, and finally from a combination of the two. We determine an upper bound on the quantity of generated data that, when mixed with the real data, contributes the most to enhancing FER accuracy. In our experiments, we find that adding synthetic data amounting to five times the size of the real FE dataset increases accuracy by 16%.
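The mixing of real and GAN-generated FEs at a fixed ratio can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `mix_real_and_synthetic`, the `ratio` parameter, and the placeholder samples are all hypothetical, and each sample is assumed to be an `(image, label)` pair.

```python
import random

def mix_real_and_synthetic(real, synthetic, ratio, seed=0):
    """Build a shuffled training list with `ratio` synthetic samples per real sample.

    `real` and `synthetic` are lists of (image, label) pairs (an assumption of
    this sketch). Each returned item is tagged with its origin so the two
    sources can be told apart later.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    n_synth = int(round(ratio * len(real)))
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic samples for the requested ratio")
    rng = random.Random(seed)
    # Draw the synthetic subset without replacement.
    chosen = rng.sample(synthetic, n_synth)
    mixed = [(x, y, "real") for x, y in real]
    mixed += [(x, y, "synthetic") for x, y in chosen]
    rng.shuffle(mixed)
    return mixed

# Placeholder data: strings stand in for face images, integers for emotion labels.
real = [(f"real_{i}", i % 6) for i in range(10)]
synthetic = [(f"gan_{i}", i % 6) for i in range(100)]

# Five synthetic samples per real sample, as in the 5-times setting above.
mixed = mix_real_and_synthetic(real, synthetic, ratio=5)
print(len(mixed))  # 60
```

Sweeping `ratio` over several values and retraining the classifier on each mixed set is one way to look for the point beyond which additional generated data stops helping.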