Medical imaging technologies are generating increasingly large amounts of high-quality, information-dense data. Despite this progress, the practical use of advanced imaging technologies for research and diagnosis remains limited by cost and availability, so routine practice relies on information-sparse data such as H&E stains. The study of diseased tissue therefore requires methods that can leverage information-dense data to extract more value from routine, information-sparse data. Using self-supervised deep learning, we demonstrate that knowledge can be distilled during training from information-dense data into models that require only information-sparse data at inference time. This improves downstream classification accuracy on information-sparse data, bringing it to a level comparable with the fully-supervised baseline. We find that this training process substantially changes the learned representations and identifies subtle features that otherwise go undetected. This approach enables the design of models that require only routine images yet incorporate insights from state-of-the-art data, making better use of the available resources.
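The abstract does not specify the training objective. As a generic illustration only, and not the authors' method, one common way to distil knowledge across modalities works as follows. A student encoder sees only the information-sparse image (e.g. H&E). It is trained so that its embedding of each tissue region matches a teacher's embedding of the paired information-dense image, using a contrastive InfoNCE loss. The sketch below assumes such paired, precomputed embeddings and computes only the loss with NumPy. The function names `l2_normalize` and `info_nce`, the temperature value, and the embedding shapes are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(student_emb, teacher_emb, temperature=0.1):
    """Hypothetical cross-modal InfoNCE loss (illustrative assumption).

    student_emb: (N, D) embeddings computed from information-sparse images.
    teacher_emb: (N, D) embeddings computed from the paired information-dense images.
    Row i of each array is assumed to describe the same tissue region, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    logits = s @ t.T / temperature  # (N, N) scaled cosine similarities
    # Subtract the row-wise max before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Mean negative log-probability of picking the correct (paired) teacher row.
    return -np.mean(np.diag(log_probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 16))
    aligned = teacher + 0.01 * rng.normal(size=(8, 16))  # student that matches its pair
    unrelated = rng.normal(size=(8, 16))  # student that ignores the teacher
    print("aligned loss:  ", info_nce(aligned, teacher))
    print("unrelated loss:", info_nce(unrelated, teacher))
```

Minimising this loss pulls each student embedding towards the teacher embedding of its own region and away from those of other regions. At inference time only the student, and therefore only the information-sparse image, would be needed.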