We present Singpath-VL, a large vision-language model designed to address the lack of AI assistants for cervical cytology. Recent advances in multimodal large language models (MLLMs) have significantly advanced computational pathology. However, their application to cytopathology, and to cervical cytology in particular, remains underexplored, primarily because large-scale, high-quality annotated datasets are scarce. To bridge this gap, we first develop a novel three-stage pipeline that synthesizes a million-scale image-description dataset. The pipeline uses multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model with a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To support further research, we will open-source a portion of the synthetic dataset and the benchmark.
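The abstract names the pipeline's ingredients (weak annotation by several general-purpose MLLMs, consensus fusion, expert knowledge injection) without specifying how they are implemented. The following is only a minimal sketch of one way such a flow could look, assuming each weak annotator returns a dictionary of morphological attributes and that consensus means a simple majority vote. The function names, the attribute keys, the `min_agreement` threshold, and the review-flag rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def fuse_by_consensus(annotations, min_agreement=2):
    """Keep, per attribute, the most common value if enough annotators agree.

    Assumption: majority voting stands in for the paper's consensus fusion.
    """
    fused = {}
    attributes = {key for ann in annotations for key in ann}
    for attr in sorted(attributes):
        votes = Counter(ann[attr] for ann in annotations if attr in ann)
        value, count = votes.most_common(1)[0]
        if count >= min_agreement:
            fused[attr] = value
    return fused


def inject_expert_knowledge(fused, expert_rules):
    """Apply hypothetical expert rules; each rule returns extra fields to add."""
    enriched = dict(fused)
    for rule in expert_rules:
        enriched.update(rule(fused))
    return enriched


def describe(record):
    """Render the fused record as a plain-text description."""
    return "; ".join(f"{key}: {value}" for key, value in sorted(record.items()))


def review_rule(record):
    """Hypothetical rule: flag records where two chosen attributes co-occur."""
    if record.get("nuclear_size") == "enlarged" and record.get("chromatin") == "coarse":
        return {"review_flag": "needs_expert_check"}
    return {}


# Hypothetical outputs from three weak annotators for one cell image.
weak_outputs = [
    {"nuclear_size": "enlarged", "chromatin": "coarse"},
    {"nuclear_size": "enlarged", "chromatin": "fine"},
    {"nuclear_size": "enlarged", "chromatin": "coarse", "nucleoli": "prominent"},
]

fused = fuse_by_consensus(weak_outputs)
enriched = inject_expert_knowledge(fused, [review_rule])
print(describe(enriched))
```

In this sketch, an attribute reported by only one annotator (here `nucleoli`) is dropped because it falls below the agreement threshold. A real pipeline would presumably fuse free-text descriptions rather than fixed key-value pairs.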