Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited. Text PLMs cannot process single-cell RNA sequencing data, while cell PLMs lack the ability to handle free text, restricting their use in multimodal tasks. Existing efforts to bridge these modalities often suffer from information loss or inadequate single-modal pre-training, leading to suboptimal performance. To address these challenges, we propose the Single-Cell MultiModal Generative Pre-trained Transformer (scMMGPT), a unified PLM for joint cell and text modeling. scMMGPT effectively integrates state-of-the-art cell and text PLMs, facilitating cross-modal knowledge sharing for improved performance. To bridge the text-cell modality gap, scMMGPT leverages dedicated cross-modal projectors and undergoes extensive pre-training on 27 million cells -- the largest dataset for multimodal cell-text PLMs to date. This large-scale pre-training enables scMMGPT to excel in joint cell-text tasks, achieving an 84\% relative improvement in textual discrepancy for cell description generation, 20.5\% higher accuracy for cell type annotation, and a 4\% improvement in $k$-NN accuracy for text-conditioned pseudo-cell generation, outperforming baselines.
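To make the projector idea concrete, below is a minimal sketch of how a cross-modal projector might map cell-PLM embeddings into a text PLM's embedding space so the two models can exchange representations. This is an illustrative assumption, not scMMGPT's actual implementation: the module name `CellToTextProjector`, the two-layer MLP design, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class CellToTextProjector(nn.Module):
    """Hypothetical cross-modal projector (illustrative sketch only).

    Maps pooled cell-PLM embeddings of scRNA-seq cells into the
    token-embedding space of a text PLM, so projected cell tokens
    can be prepended to a text prompt. Dimensions and the 2-layer
    MLP are assumptions, not the paper's reported architecture.
    """

    def __init__(self, cell_dim: int = 512, text_dim: int = 4096,
                 hidden_dim: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(cell_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, text_dim),
        )

    def forward(self, cell_emb: torch.Tensor) -> torch.Tensor:
        # cell_emb: (batch, cell_dim) -> (batch, text_dim)
        return self.proj(cell_emb)

# Usage: project a batch of cell embeddings into the text PLM's space.
projector = CellToTextProjector()
cell_emb = torch.randn(8, 512)        # stand-in for cell-PLM output
text_space_emb = projector(cell_emb)  # now compatible with text tokens
print(text_space_emb.shape)           # torch.Size([8, 4096])
```

A symmetric text-to-cell projector would be trained in the reverse direction for pseudo-cell generation; the same MLP pattern applies with the dimensions swapped.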