Deep learning methods have led to significant improvements in performance on the facial landmark detection (FLD) task. However, detecting landmarks in challenging settings, such as head pose changes, exaggerated expressions, or uneven illumination, remains difficult because of high variability and a scarcity of such samples. This shortcoming can be attributed to the model's inability to acquire appropriate facial structure information from the input images. To address this, we propose a novel image augmentation technique specifically designed for the FLD task to enhance the model's understanding of facial structures. To make effective use of this augmentation, we employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss, which enables joint learning of high-level feature representations from two different views of the input images. Furthermore, we use a Transformer + CNN-based network with a custom hourglass module as a robust backbone for the Siamese framework. Extensive experiments show that our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
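The abstract does not specify the exact form of the DCCA-based loss. As a minimal sketch, assuming the standard DCCA objective (maximize the sum of canonical correlations between the features of the two views, computed from ridge-regularized covariance estimates), a loss on a batch of paired features could look like the following. The function names `inv_sqrt` and `dcca_loss` and the regularization constant `reg` are hypothetical illustrations, not the authors' implementation, and a real training loop would use a differentiable framework rather than NumPy.

```python
import numpy as np


def inv_sqrt(mat):
    # Inverse square root of a symmetric positive-definite matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def dcca_loss(h1, h2, reg=1e-4):
    """Negative total canonical correlation between two views (hypothetical sketch).

    h1, h2: arrays of shape (n_samples, dim), e.g. Siamese-branch features of the
            original and augmented views of the same batch of faces.
    reg:    assumed ridge term added to the auto-covariances for numerical stability.
    """
    n = h1.shape[0]
    # Center each view over the batch.
    h1c = h1 - h1.mean(axis=0)
    h2c = h2 - h2.mean(axis=0)
    # Cross-covariance and regularized auto-covariances.
    s12 = h1c.T @ h2c / (n - 1)
    s11 = h1c.T @ h1c / (n - 1) + reg * np.eye(h1.shape[1])
    s22 = h2c.T @ h2c / (n - 1) + reg * np.eye(h2.shape[1])
    # The canonical correlations are the singular values of T = S11^{-1/2} S12 S22^{-1/2}.
    t = inv_sqrt(s11) @ s12 @ inv_sqrt(s22)
    total_corr = np.linalg.svd(t, compute_uv=False).sum()
    # Minimizing the negative correlation maximizes agreement between the two views.
    return -total_corr
```

For example, two identical views of 4-dimensional features give a loss close to -4 (every canonical correlation is near 1), while two independent random views give a much higher loss. Using the sum of all canonical correlations is one common choice; keeping only the top-k correlations is another variant.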