Deep-learning-based facial landmark detection (FLD) methods have recently achieved considerable success. However, in challenging scenarios such as large pose variations, illumination changes, and varied facial expressions, they still struggle to capture the geometric structure of the face accurately, which degrades performance. Moreover, the limited size and diversity of existing FLD datasets hinder robust model training and reduce detection accuracy. To address these challenges, we propose a Frequency-Guided Task-Balancing Transformer (FGTBT), which enhances facial structure perception through frequency-domain modeling and unified multi-dataset training. Specifically, we propose a novel Fine-Grained Multi-Task Balancing loss (FMB-loss), which moves beyond coarse task-level balancing by assigning weights to individual landmarks based on their occurrence across datasets. This enables more effective unified training and mitigates inconsistent gradient magnitudes. Additionally, we design a Frequency-Guided Structure-Aware (FGSA) model that uses frequency-guided structure injection and regularization to help learn facial structure constraints. Extensive experiments on popular benchmark datasets demonstrate that integrating the proposed FMB-loss and FGSA model into our FGTBT framework achieves performance comparable to state-of-the-art methods. The code is available at https://github.com/Xi0ngxinyu/FGTBT.
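The abstract does not give the FMB-loss formula, so the following is only a minimal sketch of one possible reading of "assigning weights to individual landmarks based on their occurrence across datasets". It assumes a shared landmark set in which each dataset annotates a subset, and it assumes inverse-occurrence weighting normalized to mean 1. The function names `landmark_weights` and `weighted_l1`, the mask layout, and the L1 distance are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def landmark_weights(masks, sizes):
    """Per-landmark weights from occurrence counts (assumed scheme).

    masks: (D, K) 0/1 array; masks[d, k] == 1 if dataset d annotates landmark k.
    sizes: (D,) number of training samples in each dataset.
    Returns a (K,) array: inverse occurrence count, rescaled so that the
    weights of landmarks seen at least once average to 1.
    """
    masks = np.asarray(masks, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    counts = sizes @ masks                 # how often each landmark is annotated
    weights = np.zeros_like(counts)
    seen = counts > 0
    weights[seen] = 1.0 / counts[seen]     # rarer landmarks get larger weights
    weights[seen] /= weights[seen].mean()  # normalize to mean 1
    return weights


def weighted_l1(pred, target, mask, weights):
    """Masked, landmark-weighted L1 loss for one sample.

    pred, target: (K, 2) landmark coordinates.
    mask: (K,) 0/1 array marking landmarks annotated in this sample's dataset.
    """
    per_landmark = np.abs(np.asarray(pred) - np.asarray(target)).sum(axis=1)
    w = weights * np.asarray(mask, dtype=float)
    return float((w * per_landmark).sum() / max(w.sum(), 1e-12))


# Toy setup: dataset A (100 samples) annotates all 3 landmarks,
# dataset B (300 samples) annotates only landmark 0.
weights = landmark_weights([[1, 1, 1], [1, 0, 0]], [100, 300])
loss_b = weighted_l1(np.zeros((3, 2)), np.ones((3, 2)), [1, 0, 0], weights)
```

In this toy setup, landmark 0 appears in 400 samples and landmarks 1 and 2 in 100 each, so the frequently annotated landmark is down-weighted relative to the rare ones. Under the stated assumptions, this is one way to keep landmarks shared by many datasets from dominating the gradient during unified training.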