Landmark Detection for Medical Images using a General-purpose Segmentation Model

Radiographic images are a cornerstone of medical diagnostics in orthopaedics, with anatomical landmark detection serving as a crucial intermediate step for information extraction. General-purpose foundational segmentation models, such as SAM (Segment Anything Model), do not support landmark segmentation out of the box and require prompts to function. However, in medical imaging, the prompts for landmarks are highly specific. Since SAM has not been trained to recognize such landmarks, it cannot generate accurate landmark segmentations for diagnostic purposes. Even MedSAM, a medically adapted variant of SAM, has been trained to identify larger anatomical structures, such as organs and their parts, and lacks the fine-grained precision required for orthopaedic pelvic landmarks. To address this limitation, we propose leveraging another general-purpose, non-foundational model: YOLO. YOLO excels in object detection and can provide bounding boxes that serve as input prompts for SAM. While YOLO is efficient at detection, it is significantly outperformed by SAM in segmenting complex structures. In combination, these two models form a reliable pipeline capable of segmenting not only a small pilot set of eight anatomical landmarks but also an expanded set of 72 landmarks and 16 regions with complex outlines, such as the femoral cortical bone and the pelvic inlet. By using YOLO-generated bounding boxes to guide SAM, we trained the hybrid model to accurately segment orthopaedic pelvic radiographs. Our results show that the proposed combination of YOLO and SAM yields excellent performance in detecting anatomical landmarks and intricate outlines in orthopaedic pelvic radiographs.
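The detect-then-prompt flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. The callables `detect` and `segment_with_box`, the `Detection` type, the `min_score` threshold, the landmark label names, and the centroid step are all hypothetical, introduced only for illustration. The abstract does not say how a point landmark is derived from a mask, so the centroid is an assumption. The stand-in functions replace trained YOLO and SAM models so that the control flow runs without any weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels, max exclusive
Mask = List[List[bool]]            # mask[y][x] is True inside the segment
Image = Sequence[Sequence[float]]  # grayscale radiograph as rows of pixel values


@dataclass
class Detection:
    label: str    # landmark or region name (hypothetical naming)
    box: Box      # bounding box proposed by the detector
    score: float  # detector confidence in [0, 1]


def segment_landmarks(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    segment_with_box: Callable[[Image, Box], Mask],
    min_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> Dict[str, Mask]:
    """Run the detector, then prompt the segmenter once per kept box."""
    best: Dict[str, Detection] = {}
    for det in detect(image):
        if det.score < min_score:
            continue
        # Keep only the most confident box for each label.
        if det.label not in best or det.score > best[det.label].score:
            best[det.label] = det
    return {label: segment_with_box(image, det.box) for label, det in best.items()}


def mask_centroid(mask: Mask) -> Tuple[float, float]:
    """(x, y) centroid of the mask; an assumed way to reduce a mask to a point."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("empty mask")
    return sum_x / count, sum_y / count


# Stand-ins for the real models, so the sketch runs without trained weights.
def fake_detect(image: Image) -> List[Detection]:
    return [
        Detection("femoral_head_left", (1, 1, 3, 3), 0.9),
        Detection("femoral_head_left", (0, 0, 2, 2), 0.6),  # weaker duplicate
        Detection("pelvic_inlet", (0, 0, 4, 4), 0.3),       # below threshold
    ]


def fake_segment(image: Image, box: Box) -> Mask:
    # Fills the whole box; a real box-prompted segmenter would return a finer outline.
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 for x in range(len(image[0]))]
        for y in range(len(image))
    ]


image = [[0.0] * 4 for _ in range(4)]
masks = segment_landmarks(image, fake_detect, fake_segment)
centroid = mask_centroid(masks["femoral_head_left"])
```

In practice, the two callables would wrap a trained YOLO detector and a box-prompted SAM predictor. Keeping them behind plain function signatures lets either model be swapped or fine-tuned without changing the surrounding pipeline.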
