Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using a GPT-Based VLM: A Preliminary Study on Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework


Nanaka Hosokawa, Ryou Takahashi, Tomoya Kitano, Yukihiro Iida, Chisako Muramatsu, Tatsuro Hayashi, Yuta Seino, Xiangrong Zhou, Takeshi Hara, Akitoshi Katsumata, Hiroshi Fujita

From arXiv. Revised manuscript; supplementary materials added. Submitted to Diagnostics.

Vision-language models (VLMs) such as GPT (Generative Pre-Trained Transformer) have shown potential for medical image interpretation; however, challenges remain in generating reliable radiological findings in clinical practice, as exemplified by dental pathologies. This study proposes a Self-correction Loop with Structured Output (SLSO) framework as an integrated processing methodology to enhance the accuracy and reliability of AI-generated findings for jaw cysts in dental panoramic radiographs. Dental panoramic radiographs with jaw cysts were used to implement a 10-step integrated processing framework incorporating image analysis, structured data generation, tooth number extraction, consistency checking, and iterative regeneration. The framework functioned as an external validation mechanism for GPT outputs. Performance was compared against the conventional Chain-of-Thought (CoT) method across seven evaluation items: transparency, internal structure, borders, root resorption, tooth movement, relationships with other structures, and tooth number. The SLSO framework improved output accuracy for multiple items compared to the CoT method, with the most notable improvements observed in tooth number identification, tooth movement detection, and root resorption assessment. In successful cases, consistently structured outputs were achieved after up to five regenerations. The framework enforced explicit negative finding descriptions and suppressed hallucinations, although accurate identification of extensive lesions spanning multiple teeth remained limited. This investigation established the feasibility of the proposed integrated processing methodology and provided a foundation for future validation studies with larger, more diverse datasets.
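
The consistency-checking and regeneration steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Draft` type, the `generate` callback (a stand-in for the GPT call), the field names, and the use of two-digit FDI tooth notation are all assumptions made for illustration. The sketch only shows the general idea of an external check that rejects a draft when a required item is missing or when the tooth numbers in the free text disagree with the structured data, and that requests a regeneration up to five times.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# The seven evaluation items named in the abstract (key names are hypothetical).
REQUIRED_ITEMS = (
    "transparency", "internal_structure", "borders", "root_resorption",
    "tooth_movement", "relationships", "tooth_number",
)


@dataclass
class Draft:
    structured: dict  # item name -> value; negative findings must be explicit
    text: str         # free-text finding generated alongside the structure


def extract_tooth_numbers(text: str) -> Set[int]:
    """Pull tooth numbers from free text (assumes two-digit FDI notation)."""
    return {int(m) for m in re.findall(r"\b([1-4][1-8])\b", text)}


def check_consistency(draft: Draft) -> List[str]:
    """Return a list of problems; an empty list means the draft is accepted."""
    problems = []
    for item in REQUIRED_ITEMS:
        # An empty or absent field is rejected, so "absent" must be stated.
        if not draft.structured.get(item):
            problems.append(f"missing item: {item}")
    structured_teeth = set(draft.structured.get("tooth_number") or [])
    if extract_tooth_numbers(draft.text) != structured_teeth:
        problems.append("tooth numbers differ between text and structured data")
    return problems


def run_loop(
    generate: Callable[[List[str]], Draft],
    max_regenerations: int = 5,
) -> Tuple[Optional[Draft], int]:
    """Call generate() until the draft passes or the regeneration budget is spent.

    Returns (accepted draft or None, number of regenerations used).
    """
    feedback: List[str] = []
    for attempt in range(max_regenerations + 1):  # first try + regenerations
        draft = generate(feedback)
        feedback = check_consistency(draft)
        if not feedback:
            return draft, attempt
    return None, max_regenerations


def make_stub_generator() -> Callable[[List[str]], Draft]:
    """Hypothetical stand-in for the model: wrong on the first call, then fixed."""
    calls = {"n": 0}

    def generate(feedback: List[str]) -> Draft:
        calls["n"] += 1
        structured = {item: "absent" for item in REQUIRED_ITEMS}
        structured["tooth_number"] = [36, 37]
        if calls["n"] == 1:
            text = "Radiolucent lesion near tooth 36."  # omits tooth 37
        else:
            text = "Radiolucent lesion near teeth 36 and 37."
        return Draft(structured, text)

    return generate


if __name__ == "__main__":
    draft, regenerations = run_loop(make_stub_generator())
    print(regenerations, draft.text if draft else "no consistent draft")
```

In this sketch the checker sits outside the model, so a draft is accepted only when the structured fields and the free text agree. When the budget is exhausted the loop returns `None` rather than an unverified finding, which is one possible design choice and not something the abstract specifies.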
