HistoEncoder: a digital pathology foundation model for prostate cancer

Foundation models are trained on massive amounts of data to distinguish complex patterns and can be adapted to a wide range of downstream tasks with minimal computational resources. Here, we develop a foundation model for prostate cancer digital pathology called HistoEncoder by pre-training on 48 million prostate tissue tile images. We demonstrate that HistoEncoder features extracted from tile images with similar histological patterns map closely together in the feature space. HistoEncoder outperforms models pre-trained with natural images, even without fine-tuning or with 1000 times less training data. We describe two use cases that leverage the capabilities of HistoEncoder by fine-tuning the model with a limited amount of data and computational resources. First, we show how HistoEncoder can be used to automatically annotate large-scale datasets with high accuracy. Second, we combine histomics with commonly used clinical nomograms, significantly improving prostate cancer-specific death survival models. Foundation models such as HistoEncoder can allow organizations with limited resources to build effective clinical software tools without needing extensive datasets or significant amounts of computing.

翻译：基础模型通过海量数据训练以识别复杂模式，并能以最小计算资源适配广泛的下游任务。本文通过4800万张前列腺组织切片图像的预训练，开发了一种名为HistoEncoder的前列腺癌数字病理学基础模型。我们证明，从具有相似组织学模式的切片图像中提取的HistoEncoder特征在特征空间中紧密映射。即使未经微调或使用少1000倍的训练数据，HistoEncoder仍优于基于自然图像预训练的模型。我们描述了两种利用HistoEncoder能力的应用场景：通过有限数据和计算资源对模型进行微调。首先，我们展示HistoEncoder如何以高精度自动标注大规模数据集。其次，我们将组织学特征与常用临床列线图相结合，显著提升了前列腺癌特异性死亡生存模型的性能。诸如HistoEncoder的基础模型可使资源有限的机构无需大量数据集或巨额计算资源，即可构建有效的临床软件工具。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

DiffRec: 扩散推荐模型（SIGIR'23）

专知会员服务

48+阅读 · 2023年4月16日

Nat. Biotechnol. | 机器学习为生物库驱动的药物发现提供动力

专知会员服务

11+阅读 · 2022年9月12日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日