Unsupervised learning of Data-driven Facial Expression Coding System (DFECS) using keypoint tracking

Existing facial coding systems, such as the Facial Action Coding System (FACS), were developed by manually examining facial expression videos to define Action Units (AUs). To overcome the labor-intensive nature of this process, we propose unsupervised learning of an automated facial coding system that leverages computer-vision-based facial keypoint tracking. In this novel system, called the Data-driven Facial Expression Coding System (DFECS), the AUs are estimated by applying dimensionality reduction to the movements of facial keypoints relative to a neutral frame, using a proposed Full Face Model (FFM). FFM employs a two-level decomposition based on advanced dimensionality reduction techniques such as dictionary learning (DL) and non-negative matrix factorization (NMF). These techniques enhance the interpretability of the AUs by imposing constraints such as sparsity and positivity on the encoding matrix. Results show that DFECS AUs estimated from the DISFA dataset explain, on average, up to 91.29 percent of the variance in the test datasets (CK+ and BP4D-Spontaneous), surpassing the variance explained by keypoint-based equivalents of FACS AUs in these datasets. Additionally, 87.5 percent of DFECS AUs are interpretable, i.e., they align with the direction of facial muscle movements. In summary, advancements in automated facial coding systems can accelerate facial expression analysis across diverse fields such as security, healthcare, and entertainment, with benefits that include enhanced detection of abnormal behavior, improved pain analysis in healthcare settings, and enriched emotion-driven interactions. To facilitate further research, the code repository of DFECS has been made publicly accessible.
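To make the general idea concrete, the following is a minimal, single-level sketch and not the DFECS implementation: the two-level Full Face Model and its exact DL/NMF formulation are not reproduced here. It assumes keypoints arrive as an array of shape (frames, points, 2), takes the first frame as the neutral frame, and uses a simple alternating least-squares heuristic with clipping to keep the encoding matrix non-negative. All function names, the synthetic data, and the uncentered "fraction of variance explained" measure are illustrative assumptions.

```python
import numpy as np


def displacements_from_neutral(keypoints, neutral_index=0):
    """Flatten keypoint offsets relative to a neutral frame.

    keypoints: array of shape (n_frames, n_points, 2).
    Returns an array of shape (n_frames, n_points * 2).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    offsets = keypoints - keypoints[neutral_index]
    return offsets.reshape(len(keypoints), -1)


def nonneg_code_factorization(X, n_components, n_iter=200, seed=0):
    """Approximate X ~ codes @ components with codes >= 0.

    Heuristic alternating least squares: components are signed and solved
    exactly; codes are solved by least squares and then clipped at zero.
    This is an illustrative stand-in, not the paper's DL/NMF procedure.
    """
    rng = np.random.default_rng(seed)
    codes = rng.random((X.shape[0], n_components))
    for _ in range(n_iter):
        components = np.linalg.lstsq(codes, X, rcond=None)[0]
        raw_codes = np.linalg.lstsq(components.T, X.T, rcond=None)[0].T
        codes = np.clip(raw_codes, 0.0, None)
    # Final refit so the components are optimal for the returned codes.
    components = np.linalg.lstsq(codes, X, rcond=None)[0]
    return codes, components


def variance_explained(X, codes, components):
    """Fraction of the total squared displacement captured by the model."""
    residual = X - codes @ components
    return 1.0 - np.sum(residual ** 2) / np.sum(X ** 2)


# Synthetic example (hypothetical data): 60 frames, 10 keypoints, 3 components.
rng = np.random.default_rng(0)
n_frames, n_points, k = 60, 10, 3
true_components = rng.normal(size=(k, n_points * 2))
true_codes = rng.random((n_frames, k))
true_codes[0] = 0.0  # frame 0 plays the role of the neutral face
neutral = rng.normal(size=(n_points, 2))
keypoints = neutral + (true_codes @ true_components).reshape(n_frames, n_points, 2)

X = displacements_from_neutral(keypoints)
codes, components = nonneg_code_factorization(X, n_components=k)
score = variance_explained(X, codes, components)
print(f"fraction of variance explained: {score:.3f}")
```

Each row of `components` plays the role of a candidate AU (a pattern of keypoint displacement), and each row of `codes` gives the non-negative activation of those patterns in one frame. Evaluating `variance_explained` on displacements from a different dataset, with `components` held fixed, would correspond to the cross-dataset evaluation described above.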
