IOSVLM: A 3D Vision-Language Model for Unified Dental Diagnosis from Intraoral Scans

3D intraoral scans (IOS) are increasingly adopted in routine dentistry because they provide abundant geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. Recent works introduce dental vision-language models (VLMs) that enable unified diagnosis and report generation on 2D images or on multi-view images rendered from IOS, but they do not fully leverage native 3D geometry. Modeling IOS geometry directly is necessary yet challenging because of (i) heterogeneous scan forms and complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, and (iii) scarce paired 3D IOS-text data. We therefore present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question answering (VQA). We also introduce IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs covering 23 oral diseases and heterogeneous scan types. To bridge the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further improves robustness. IOSVLM consistently outperforms strong baselines, with gains of at least +9.58% in macro accuracy and +1.46% in macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.
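Both headline metrics are macro averages: a score is computed for each disease and then averaged without weighting, so each of the 23 diseases counts equally regardless of how common it is, which is relevant given the class imbalance noted in (ii). The abstract does not specify the exact evaluation protocol, so the sketch below rests on stated assumptions: diagnosis is treated as multi-label binary prediction (one 0/1 column per disease), "macro accuracy" is taken to mean per-disease binary accuracy averaged over diseases, and F1 is set to 0 for a disease that is neither present nor predicted. The helper names `per_class_counts` and `macro_metrics` are hypothetical.

```python
from typing import List, Tuple

Labels = List[List[int]]  # rows = cases, columns = diseases, values in {0, 1}


def per_class_counts(y_true: Labels, y_pred: Labels, c: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for disease column c (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[c], p_row[c]
        if t and p:
            tp += 1
        elif p:          # predicted but absent
            fp += 1
        elif t:          # present but missed
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def macro_metrics(y_true: Labels, y_pred: Labels) -> Tuple[float, float]:
    """Unweighted mean over diseases of per-disease accuracy and F1 (assumed protocol)."""
    num_classes = len(y_true[0])
    accs, f1s = [], []
    for c in range(num_classes):
        tp, fp, fn, tn = per_class_counts(y_true, y_pred, c)
        accs.append((tp + tn) / (tp + fp + fn + tn))
        denom = 2 * tp + fp + fn
        # Assumption: F1 = 0 when a disease is neither present nor predicted.
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(accs) / num_classes, sum(f1s) / num_classes


# Toy example: 4 cases, 2 diseases.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 0], [1, 1], [1, 0]]
acc, f1 = macro_metrics(y_true, y_pred)
print(round(acc, 4), round(f1, 4))  # prints 0.75 0.7333
```

Per-disease accuracy includes true negatives, so for a rare disease it can stay high even if the model almost never predicts it, whereas F1 ignores true negatives. This is one reason the two macro scores can change by different amounts for the same model.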