Histopathological Image Classification with Cell Morphology Aware Deep Neural Networks

Histopathological images are widely used for the analysis of diseased (tumor) tissues and for patient treatment selection. While most microscopy image processing was previously done manually by pathologists, recent advances in computer vision allow deep learning-based solutions to recognize lesion regions accurately. Such models, however, usually require large annotated training datasets, which are rarely available for this task because the number of patient data samples is very limited. To address this problem, we propose DeepCMorph, a novel model pre-trained to learn cell morphology and to identify a large number of different cancer types. The model consists of two modules. The first performs cell nuclei segmentation and classifies each cell by type; it is trained on a combination of 8 publicly available datasets to ensure high generalizability and robustness. The second module combines the resulting segmentation map with the original microscopy image and is trained for the downstream task. We pre-trained this module on the Pan-Cancer TCGA dataset, which consists of over 270K tissue patches extracted from 8736 diagnostic slides from 7175 patients. The proposed solution achieved new state-of-the-art performance on this dataset, detecting 32 cancer types with over 82% accuracy and outperforming all previously proposed solutions by more than 4%. We demonstrate that the resulting pre-trained model can be easily fine-tuned on smaller microscopy datasets, yielding better results than both the current top solutions and models initialized with ImageNet weights. The code and pre-trained models presented in this paper are available at: https://github.com/aiff22/DeepCMorph
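The abstract does not say exactly how the second module combines the segmentation output with the microscopy image. The sketch below assumes one plausible scheme, channel-wise concatenation: each pixel's RGB values are stacked with a binary nuclei mask and a one-hot cell-type vector, and the result is globally average-pooled into a feature vector that a downstream classifier could consume. The function names `combine_inputs` and `global_average_pool` and the pooling step are hypothetical illustrations, not DeepCMorph's actual implementation.

```python
from typing import List

# Hypothetical types: an image is H x W pixels, each pixel a list of floats.
Image = List[List[List[float]]]
Mask = List[List[int]]


def combine_inputs(image: Image, nuclei_mask: Mask, cell_type_map: Mask,
                   num_cell_types: int) -> Image:
    """Assumed fusion: stack RGB, nuclei mask, and one-hot cell type per pixel.

    Background pixels (mask == 0) get an all-zero cell-type vector, so each
    output pixel has 3 + 1 + num_cell_types channels.
    """
    height, width = len(image), len(image[0])
    if len(nuclei_mask) != height or len(cell_type_map) != height:
        raise ValueError("all inputs must share the same height")
    combined = []
    for y in range(height):
        if len(nuclei_mask[y]) != width or len(cell_type_map[y]) != width:
            raise ValueError("all inputs must share the same width")
        row = []
        for x in range(width):
            rgb = [float(v) for v in image[y][x]]
            is_nucleus = nuclei_mask[y][x] == 1
            one_hot = [0.0] * num_cell_types
            if is_nucleus:
                cell_type = cell_type_map[y][x]
                if not 0 <= cell_type < num_cell_types:
                    raise ValueError(f"invalid cell type {cell_type}")
                one_hot[cell_type] = 1.0
            row.append(rgb + [1.0 if is_nucleus else 0.0] + one_hot)
        combined.append(row)
    return combined


def global_average_pool(features: Image) -> List[float]:
    """Average every channel over all pixels, giving one vector per patch."""
    num_pixels = sum(len(row) for row in features)
    totals = [0.0] * len(features[0][0])
    for row in features:
        for pixel in row:
            for channel, value in enumerate(pixel):
                totals[channel] += value
    return [total / num_pixels for total in totals]


if __name__ == "__main__":
    patch = [[[0.2, 0.4, 0.6], [1.0, 1.0, 1.0]],
             [[0.0, 0.0, 0.0], [0.6, 0.4, 0.2]]]
    mask = [[1, 0], [0, 1]]
    types = [[2, 0], [0, 1]]
    fused = combine_inputs(patch, mask, types, num_cell_types=3)
    print(len(fused[0][0]))  # 7 channels: 3 RGB + 1 mask + 3 cell types
    print(global_average_pool(fused))
```

Concatenation keeps the raw image intact while exposing the morphology cues as extra channels, so a standard convolutional backbone could in principle consume the fused input by widening its first layer; the real model may use a different fusion.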
