Facial expression recognition (FER) is a crucial part of human-computer interaction. Existing FER methods achieve high accuracy and good generalization using a variety of open-source deep models and training approaches. However, their performance does not always hold up in practical settings, which have seldom been explored. In this paper, we collect a new in-the-wild facial expression dataset for cross-domain validation. We implement and evaluate twenty-three commonly used network architectures under a uniform protocol. Moreover, we examine various setups, including input resolution, class balance management, and pre-training strategy, to show how much each contributes to performance. Based on extensive experiments on three large-scale FER datasets and our practical cross-validation, we rank the network architectures and summarize a set of recommendations for deploying deep FER methods in real scenarios. In addition, we discuss potential ethical rules, privacy issues, and regulations relevant to practical FER applications such as marketing, education, and entertainment.