3D intraoral scans (IOS) are increasingly adopted in routine dentistry because they provide rich geometric evidence, and unified multi-disease diagnosis is desirable for clinical documentation and communication. Recent works introduce dental vision-language models (VLMs) for unified diagnosis and report generation on 2D images or on multi-view images rendered from IOS, but they do not fully exploit the native 3D geometry. Modeling that geometry directly is both necessary and challenging because of (i) heterogeneous scan forms and complex IOS topology, (ii) multi-disease co-occurrence with class imbalance and fine-grained morphological ambiguity, and (iii) limited paired 3D IOS-text data. We therefore present IOSVLM, an end-to-end 3D VLM that represents scans as point clouds and follows a 3D encoder-projector-LLM design for unified diagnosis and generative visual question answering (VQA), together with IOSVQA, a large-scale multi-source IOS diagnosis VQA dataset comprising 19,002 cases and 249,055 VQA pairs over 23 oral diseases and heterogeneous scan types. To address the distribution gap between color-free IOS data and color-dependent 3D pre-training, we propose a geometry-to-chromatic proxy that stabilizes fine-grained geometric perception and cross-modal alignment. A two-stage curriculum training strategy further improves robustness. IOSVLM consistently outperforms strong baselines, with gains of at least +9.58% in macro accuracy and +1.46% in macro F1, indicating the effectiveness of direct 3D geometry modeling for IOS-based diagnosis.
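To make the idea of a geometry-to-chromatic proxy more concrete, the sketch below shows one hypothetical way of filling the color channels of a color-free point cloud with values derived purely from geometry. The abstract does not specify the actual mapping, so everything here is an assumption for illustration: the function name `normals_to_pseudo_rgb`, the choice of surface normals as the geometric signal, and the linear mapping from [-1, 1] to [0, 1].

```python
import math


def normals_to_pseudo_rgb(points, normals):
    """Attach pseudo-RGB channels derived from surface normals.

    Hypothetical illustration only: the mapping used by IOSVLM is not
    described in the abstract. Each input is a sequence of (x, y, z) tuples.
    Returns a list of (x, y, z, r, g, b) tuples with r, g, b in [0, 1].
    """
    if len(points) != len(normals):
        raise ValueError("points and normals must have the same length")

    colored = []
    for (x, y, z), (nx, ny, nz) in zip(points, normals):
        length = math.sqrt(nx * nx + ny * ny + nz * nz)
        if length == 0.0:
            # Degenerate normal: fall back to neutral gray.
            r = g = b = 0.5
        else:
            # Normalize to unit length, then map each component
            # from [-1, 1] to [0, 1] so it fits an RGB slot.
            r, g, b = (0.5 * (c / length + 1.0) for c in (nx, ny, nz))
        colored.append((x, y, z, r, g, b))
    return colored
```

Under these assumptions, a color-dependent 3D encoder would receive six-channel points whose "color" encodes local surface orientation rather than being left empty or constant.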