Karyotype AI for Precision Oncology

Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because the process is largely manual and requires expertise to identify and annotate aberrations. To date, efforts to automate karyotype analysis have fallen short in aberration detection. Using a training set of ~10k patient specimens and ~50k karyograms spanning more than 5 years at the Fred Hutchinson Cancer Center, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The top-accuracy models used the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking to incorporate structural inductive bias. TopViT outperformed CNN (Inception) models, reaching >99.3% accuracy for chromosome identification and >99% accuracy for detecting most aberration types. Notably, we were able to show high-quality performance even in "few shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall. Together, these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities not only for expediting patient results but also for providing a scalable technology for early screening of low-abundance chromosomal lesions.
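
The clonality step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implemented it, so everything here is an assumption: the function name `clonal_aberrations`, the per-cell input format, the label strings, and the thresholds, which follow the usual cytogenetic convention that a gain or structural aberration is clonal when seen in at least two cells and a chromosome loss when seen in at least three. The idea is that an aberration called in a single cell is more likely a classifier error than a true clone, so aggregating calls across the cells of one specimen filters out isolated false positives.

```python
from collections import Counter

# Assumed thresholds (conventional cytogenetic rule, not taken from the paper):
# gains and structural aberrations need >= 2 cells, chromosome losses >= 3.
MIN_CELLS_DEFAULT = 2
MIN_CELLS_LOSS = 3


def clonal_aberrations(per_cell_calls, is_loss=lambda label: label.startswith("-")):
    """Return the aberration labels that meet the clonality threshold.

    per_cell_calls: list of sets, one set of predicted aberration labels per
    metaphase cell (a hypothetical output format for a per-chromosome model).
    is_loss: predicate marking whole-chromosome losses; by default any label
    starting with "-" (for example "-7") is treated as a loss.
    """
    # Count in how many cells each aberration label was called.
    counts = Counter(label for calls in per_cell_calls for label in set(calls))
    clonal = set()
    for label, n_cells in counts.items():
        threshold = MIN_CELLS_LOSS if is_loss(label) else MIN_CELLS_DEFAULT
        if n_cells >= threshold:
            clonal.add(label)
    return clonal


# Illustrative, made-up calls for five cells from one specimen.
cells = [
    {"t(9;22)"},
    {"t(9;22)", "-7"},
    {"del(5q)"},  # called in one cell only, so it is filtered out
    {"-7"},       # second call of -7, still below the loss threshold of 3
    set(),
]
print(sorted(clonal_aberrations(cells)))  # prints ['t(9;22)']
```

Raising the required cell count trades recall for precision at the specimen level, which is one plausible reason a clonality rule would improve reported precision; the gain in recall described in the abstract would depend on details of the authors' pipeline that are not given here.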
