PAL-Net: A Point-Wise CNN with Patch-Attention for 3D Facial Landmark Localization

From arXiv. Published in Informatics in Medicine Unlocked. Code available at: https://github.com/Ali5hadman/PAL-Net-A-Point-Wise-CNN-with-Patch-Attention

Manual annotation of anatomical landmarks on 3D facial scans is time-consuming and expertise-dependent, yet it remains critical for clinical assessment, morphometric analysis, and craniofacial research. While several deep learning methods have been proposed for facial landmark localization, most focus on pseudo-landmarks or require complex input representations, limiting their clinical applicability. This study presents a fully automated deep learning pipeline (PAL-Net) for localizing 50 anatomical landmarks on stereo-photogrammetry facial models. The method combines coarse alignment, region-of-interest filtering, and an initial approximation of landmarks with a patch-based point-wise CNN enhanced by attention mechanisms. Trained and evaluated on 214 annotated scans from healthy adults, PAL-Net achieved a mean localization error of 3.686 mm and preserved relevant anatomical distances with a 2.822 mm average error, comparable to intra-observer variability. To assess generalization, the model was further evaluated on 700 subjects from the FaceScape dataset, achieving a point-wise error of 0.41 mm and a distance-wise error of 0.38 mm. Compared to existing methods, PAL-Net offers a favorable trade-off between accuracy and computational cost. While performance degrades in regions with poor mesh quality (e.g., ears, hairline), the method demonstrates consistent accuracy across most anatomical regions. PAL-Net generalizes effectively across datasets and facial regions, outperforming existing methods in both point-wise and structural evaluations. It provides a lightweight, scalable solution for high-throughput 3D anthropometric analysis, with the potential to support clinical workflows and reduce reliance on manual annotation. Source code can be found at https://github.com/Ali5hadman/PAL-Net-A-Point-Wise-CNN-with-Patch-Attention
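
The abstract reports two kinds of error: a point-wise localization error and a distance-wise (structural) error. It does not give their exact definitions, so the following is only a minimal sketch under common assumptions: the point-wise error is taken as the mean Euclidean distance between predicted and annotated landmarks, and the distance-wise error as the mean absolute difference between inter-landmark distances measured on the prediction and on the annotation. The function names, the `pairs` argument, and the toy coordinates are hypothetical and are not taken from the PAL-Net repository.

```python
import math
from statistics import fmean


def pointwise_error(pred, gt):
    """Mean Euclidean distance between matching landmarks (same unit as input)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of landmarks")
    return fmean(math.dist(p, g) for p, g in zip(pred, gt))


def distance_error(pred, gt, pairs):
    """Mean absolute difference of inter-landmark distances.

    `pairs` lists (i, j) landmark indices whose distance is of interest;
    which pairs count as anatomically relevant is an assumption here.
    """
    return fmean(
        abs(math.dist(pred[i], pred[j]) - math.dist(gt[i], gt[j]))
        for i, j in pairs
    )


# Toy example with three landmarks, coordinates in millimetres (made up).
gt = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred = [(0.0, 0.0, 3.0), (14.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pairs = [(0, 1), (0, 2)]

pw = pointwise_error(pred, gt)        # per-landmark errors are 3, 4 and 0 mm
dw = distance_error(pred, gt, pairs)  # compares the 0-1 and 0-2 distances
print(f"point-wise error: {pw:.3f} mm, distance-wise error: {dw:.3f} mm")
```

Under these definitions the two metrics can diverge: a prediction that shifts every landmark by the same offset has a non-zero point-wise error but a zero distance-wise error, which is one reason to report both.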
