Singpath-VL Technical Report

We present Singpath-VL, a large vision-language model that addresses the lack of AI assistants for cervical cytology. Recent advances in multimodal large language models (MLLMs) have significantly advanced computational pathology. However, their application to cytopathology, and to cervical cytology in particular, remains underexplored, primarily because large-scale, high-quality annotated datasets are scarce. To bridge this gap, we first develop a novel three-stage pipeline that synthesizes a million-scale image-description dataset. The pipeline uses multiple general-purpose MLLMs as weak annotators, refines their outputs through consensus fusion and expert knowledge injection, and produces high-fidelity descriptions of cell morphology. Using this dataset, we then fine-tune the Qwen3-VL-4B model with a multi-stage strategy to create a specialized cytopathology MLLM. The resulting model, Singpath-VL, demonstrates superior performance in fine-grained morphological perception and cell-level diagnostic classification. To advance the field, we will open-source a portion of the synthetic dataset and the benchmark.
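The abstract does not spell out how consensus fusion and expert knowledge injection work, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each weak annotator's output has already been parsed into attribute-value pairs, fuses them by majority vote, and then applies expert-written rules that normalize terminology. All names here (`fuse_by_consensus`, `inject_expert_knowledge`, `render_description`, the attribute keys, and the example rule) are hypothetical.

```python
from collections import Counter


def fuse_by_consensus(annotations, min_votes=2):
    """Keep, per attribute, the value that at least `min_votes` annotators agree on.

    Assumption: each annotation is a dict of attribute -> value parsed from
    one weak annotator's free-text description.
    """
    fused = {}
    attributes = {key for ann in annotations for key in ann}
    for attr in sorted(attributes):
        votes = Counter(ann[attr] for ann in annotations if attr in ann)
        value, count = votes.most_common(1)[0]
        if count >= min_votes:
            fused[attr] = value
    return fused


def inject_expert_knowledge(fused, expert_rules):
    """Apply hypothetical expert rules given as (predicate, overrides) pairs."""
    result = dict(fused)
    for predicate, overrides in expert_rules:
        if predicate(result):
            result.update(overrides)
    return result


def render_description(attrs):
    """Turn the fused attributes into a single description string."""
    return "; ".join(f"{key}: {value}" for key, value in sorted(attrs.items()))


# Illustrative outputs from three weak annotators for one cell image.
weak_annotations = [
    {"nuclear_size": "enlarged", "chromatin": "coarse", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "fine", "nc_ratio": "high"},
    {"nuclear_size": "enlarged", "chromatin": "coarse"},
]

# Hypothetical expert rule: standardize the wording used for the N/C ratio.
expert_rules = [
    (lambda a: a.get("nc_ratio") == "high", {"nc_ratio": "increased"}),
]

fused = fuse_by_consensus(weak_annotations)
refined = inject_expert_knowledge(fused, expert_rules)
description = render_description(refined)
print(description)
```

Majority voting is used here only because it is the simplest form of consensus. The actual pipeline may fuse free-text descriptions with another MLLM or weight annotators differently.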

