Although facial landmark detection (FLD) has made significant progress, existing FLD methods still suffer performance drops on partially non-visible faces, such as faces with occlusions or faces captured under extreme lighting conditions or poses. To address this issue, we introduce ORFormer, a novel transformer-based method that detects non-visible regions and recovers their missing features from the visible parts. Specifically, ORFormer associates each image patch token with an additional learnable token, called the messenger token, which aggregates features from all patches except its own. The consensus between a patch and the other patches can therefore be assessed through the similarity between the patch's regular and messenger embeddings, which makes it possible to identify non-visible regions. Our method then recovers occluded patches using the features aggregated by the messenger tokens. Leveraging the recovered features, ORFormer produces high-quality heatmaps for the downstream FLD task. Extensive experiments show that our method generates heatmaps that are resilient to partial occlusions. By integrating the resulting heatmaps into existing FLD methods, our method performs favorably against state-of-the-art methods on challenging datasets such as WFLW and COFW.
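The messenger-token mechanism can be illustrated with a minimal NumPy sketch. It rests on several assumptions that do not come from the abstract: single-head attention, shared random projection matrices `w_q`, `w_k`, `w_v`, a cosine-similarity consensus mapped to an occlusion score via `(1 - cos) / 2`, and a linear blend for feature recovery. The function names (`messenger_block`, `attention`) are hypothetical. This is not ORFormer's published implementation; it only shows the core idea that messenger token *i* is masked from seeing patch *i*, so disagreement between the two embeddings flags a patch that is inconsistent with its context.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention; mask=True blocks a query-key pair."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores, axis=-1) @ v


def messenger_block(patches, messengers, w_q, w_k, w_v):
    """Hypothetical sketch of messenger-token attention.

    patches:    (N, D) regular patch-token embeddings
    messengers: (N, D) one learnable messenger token per patch
    Returns regular outputs, messenger outputs, per-patch occlusion scores
    in [0, 1], and recovered features.
    """
    n = patches.shape[0]
    k = patches @ w_k
    v = patches @ w_v
    # Regular tokens attend to every patch, including themselves.
    regular = attention(patches @ w_q, k, v)
    # Messenger token i attends to all patches except patch i (diagonal masked).
    self_mask = np.eye(n, dtype=bool)
    messenger = attention(messengers @ w_q, k, v, mask=self_mask)
    # Consensus: cosine similarity between each regular/messenger pair.
    cos = np.sum(regular * messenger, axis=-1) / (
        np.linalg.norm(regular, axis=-1) * np.linalg.norm(messenger, axis=-1) + 1e-8
    )
    # Assumed mapping: low consensus -> high occlusion score.
    occlusion = (1.0 - cos) / 2.0
    # Assumed recovery rule: rely more on messenger features where consensus is low.
    w = occlusion[:, None]
    recovered = (1.0 - w) * regular + w * messenger
    return regular, messenger, occlusion, recovered


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D = 16, 32
    patches = rng.standard_normal((N, D))
    messengers = rng.standard_normal((N, D))
    w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    _, _, occ, rec = messenger_block(patches, messengers, w_q, w_k, w_v)
    print(occ.round(2), rec.shape)
```

Because messenger token *i* never attends to patch *i*, perturbing that patch (for example, by occluding it) leaves its messenger embedding unchanged while its regular embedding shifts. That asymmetry is what lets the similarity score act as a visibility cue in this sketch.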