Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images

One of the main obstacles to adopting digital pathology is the efficient processing of hyperdimensional digitized biopsy samples, called whole slide images (WSIs). Exploiting deep learning and introducing compact WSI representations are urgently needed to accelerate image analysis and to facilitate the visualization and interpretability of pathology results in a post-pandemic world. In this paper, we introduce a new evolutionary approach for WSI representation based on large-scale multi-objective optimization (LSMOP) of deep embeddings. We start with patch-based sampling to feed KimiaNet, a histopathology-specialized deep network, and extract a multitude of feature vectors. In the first stage, coarse multi-objective feature selection uses a reduced search space strategy guided by two objectives: classification accuracy and the number of features. In the second stage, the frequent features histogram (FFH), a novel WSI representation, is constructed from multiple runs of coarse LSMOP. Fine evolutionary feature selection is then applied to the FFH to find a compact (short-length) feature vector, contributing to a more robust deep-learning approach to digital pathology supported by the stochastic power of evolutionary algorithms. We validate the proposed schemes on The Cancer Genome Atlas (TCGA) images in terms of WSI representation, classification accuracy, and feature quality. Furthermore, a novel decision space for multicriteria decision making in the LSMOP field is introduced. Finally, a patch-level visualization approach is proposed to increase the interpretability of deep features. The proposed evolutionary algorithm finds a very compact feature vector to represent a WSI (almost 14,000 times smaller than the original feature vectors) and achieves 8% higher accuracy than the codes provided by state-of-the-art methods.
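
The FFH stage can be illustrated with a minimal sketch. It assumes that each coarse run returns a binary mask over the deep features and that the FFH simply counts how often each feature is selected across runs; the abstract does not spell out the exact construction, so this is an interpretation. The function names and the toy masks are hypothetical, and the frequency cutoff at the end is only a stand-in for the fine evolutionary selection, not the method used in the paper.

```python
def frequent_features_histogram(masks):
    """Count, for each feature index, how many runs selected it.

    masks: list of equal-length 0/1 sequences, one per coarse run
    (assumed format, not taken from the paper).
    """
    n_features = len(masks[0])
    if any(len(m) != n_features for m in masks):
        raise ValueError("all masks must have the same length")
    return [sum(1 for m in masks if m[j]) for j in range(n_features)]


def top_k_features(ffh, k):
    """Indices of the k most frequently selected features.

    Ties are broken in favour of the lower index. This simple cutoff
    is a placeholder for the fine evolutionary selection stage.
    """
    order = sorted(range(len(ffh)), key=lambda j: (-ffh[j], j))
    return sorted(order[:k])


# Toy masks from three hypothetical coarse runs over six features.
masks = [
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
]
ffh = frequent_features_histogram(masks)
selected = top_k_features(ffh, 2)
print(ffh)       # [3, 1, 1, 0, 1, 3]
print(selected)  # [0, 5]
```

Features 0 and 5 are chosen in every toy run, so they dominate the histogram; in this reading, features that survive many independent stochastic runs are the natural candidates for the compact final vector.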
