Diffusion-Based Hierarchical Multi-Label Object Detection to Analyze Panoramic Dental X-rays

The need for precise treatment planning has sharply increased the use of panoramic X-rays to identify dental diseases. Although numerous ML models have been developed to interpret panoramic X-rays, no end-to-end model has yet been developed that identifies problematic teeth together with their dental enumeration and associated diagnoses. To develop such a model, we structure three distinct types of annotated data hierarchically following the FDI system: the first labeled with quadrant only, the second with quadrant-enumeration, and the third fully labeled with quadrant-enumeration-diagnosis. To learn from all three hierarchies jointly, we introduce a novel diffusion-based hierarchical multi-label object detection framework, adapting a diffusion-based method that formulates object detection as a denoising diffusion process from noisy boxes to object boxes. Specifically, to take advantage of the hierarchically annotated data, our method uses a novel noisy box manipulation technique that adapts the denoising process in the diffusion network with the inference from the previously trained model in hierarchical order. We also use a multi-label object detection method to learn efficiently from partial annotations and to provide all the information about each abnormal tooth that treatment planning requires. Experimental results show that our method significantly outperforms state-of-the-art object detection methods, including RetinaNet, Faster R-CNN, DETR, and DiffusionDet, for the analysis of panoramic X-rays, demonstrating the great potential of our method for hierarchically and partially annotated datasets. The code and the data are available at: https://github.com/ibrahimethemhamamci/HierarchicalDet.
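To make the two central ideas concrete, the following is a minimal sketch and not the released implementation. It assumes standard two-digit FDI notation for permanent teeth (quadrant 1 to 4, tooth position 1 to 8), boxes stored as normalized (cx, cy, w, h) tuples, and the usual Gaussian forward-diffusion step. The names `ToothLabel`, `loss_mask`, `noise_box`, and `initial_boxes` are hypothetical, and seeding the denoising process with noised copies of previously inferred boxes is only one plausible reading of the noisy box manipulation described above.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (cx, cy, w, h), normalized


@dataclass
class ToothLabel:
    """One annotated box; deeper hierarchy levels may be missing (None)."""
    quadrant: int                      # 1..4, FDI quadrant (permanent teeth)
    enumeration: Optional[int] = None  # 1..8, tooth position within the quadrant
    diagnosis: Optional[str] = None    # hypothetical diagnosis name

    def fdi_code(self) -> Optional[int]:
        """Two-digit FDI code, e.g. quadrant 3 and tooth 6 give 36."""
        if self.enumeration is None:
            return None
        return self.quadrant * 10 + self.enumeration

    def loss_mask(self) -> Tuple[int, int, int]:
        """1 where a level is annotated, so unlabeled levels add no loss."""
        return (1,
                int(self.enumeration is not None),
                int(self.diagnosis is not None))


def noise_box(box: Box, alpha_bar: float, rng: random.Random) -> Box:
    """Forward diffusion: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return tuple(keep * v + spread * rng.gauss(0.0, 1.0) for v in box)


def initial_boxes(prior_boxes: List[Box], n_total: int,
                  alpha_bar: float, rng: random.Random) -> List[Box]:
    """Assumed manipulation: start from noised copies of boxes inferred by
    the previously trained level, then pad with pure Gaussian noise."""
    seeded = [noise_box(b, alpha_bar, rng) for b in prior_boxes[:n_total]]
    while len(seeded) < n_total:
        seeded.append(tuple(rng.gauss(0.0, 1.0) for _ in range(4)))
    return seeded
```

In this sketch, a quadrant-only annotation such as `ToothLabel(2)` yields the mask `(1, 0, 0)`, so only the quadrant head would contribute to the loss, while a fully labeled tooth contributes at all three levels.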
