We develop a semiparametric Gaussian mixture model (SGMM) for unsupervised learning that incorporates valuable spatial information. Specifically, we assume that each instance is associated with a random spatial location. Conditional on this location, the feature vector is assumed to follow a standard Gaussian mixture model (GMM). The proposed SGMM allows the mixing probabilities to depend nonparametrically on the spatial location. Compared with a classical GMM, SGMM is considerably more flexible and allows instances from the same class to be spatially clustered. To estimate the SGMM, we develop novel EM algorithms and establish rigorous asymptotic theory. Extensive numerical simulations demonstrate the finite-sample performance of the proposed method. As a real-data application, we apply the SGMM method to the CAMELYON16 dataset of whole-slide images (WSIs) for breast cancer detection, where it demonstrates outstanding clustering performance.
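The abstract does not spell out the estimation procedure, so the following is only a minimal sketch under our own assumptions. We read the model as: each instance has a location $S_i$, a latent class $Y_i$ with $P(Y_i = k \mid S_i = s) = \pi_k(s)$, and features $X_i \mid (Y_i = k, S_i = s) \sim N(\mu_k, \Sigma_k)$, so that $f(x \mid s) = \sum_{k=1}^{K} \pi_k(s)\,\phi(x; \mu_k, \Sigma_k)$. Here the component means and covariances are treated as global (the parametric part) and $\pi_k(\cdot)$ as the nonparametric part. The sketch fits this with an EM-type loop in which $\pi_k(s_i)$ is re-estimated by Nadaraya-Watson (Gaussian-kernel) smoothing of the responsibilities over space. The helper names `sgmm_em` and `gaussian_logpdf`, the Gaussian kernel, and the `bandwidth` parameter are hypothetical illustrative choices, not the paper's EM algorithms.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    # Solve L z = (x - mean)^T so that ||z||^2 is the squared Mahalanobis distance.
    z = np.linalg.solve(chol, (x - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def sgmm_em(x, s, n_components, bandwidth, n_iter=100, seed=0):
    """Kernel-smoothed EM sketch for a spatial GMM (hypothetical helper, not the paper's method).

    x: (n, d) feature vectors; s: (n, q) spatial locations.
    Component means/covariances are shared across space; the mixing
    probabilities pi_k(s_i) are estimated by kernel smoothing over locations.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumed Gaussian kernel weights between all pairs of locations, rows normalized.
    sq_dist = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)

    # Initialize means at random data points and covariances at the pooled covariance.
    means = x[rng.choice(n, n_components, replace=False)]
    pooled = np.atleast_2d(np.cov(x, rowvar=False)) + 1e-6 * np.eye(d)
    covs = np.array([pooled] * n_components)
    pi = np.full((n, n_components), 1.0 / n_components)  # pi_k(s_i), one row per instance

    for _ in range(n_iter):
        # E-step: tau_ik proportional to pi_k(s_i) * phi(x_i; mu_k, Sigma_k), computed in log space.
        log_r = np.log(pi) + np.column_stack(
            [gaussian_logpdf(x, means[k], covs[k]) for k in range(n_components)]
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step, parametric part: responsibility-weighted means and covariances.
        weights = resp.sum(axis=0)
        means = (resp.T @ x) / weights[:, None]
        for k in range(n_components):
            centered = x - means[k]
            covs[k] = (resp[:, k, None] * centered).T @ centered / weights[k]
            covs[k] += 1e-6 * np.eye(d)  # small ridge for numerical stability

        # M-step, nonparametric part: local (kernel-weighted) average of responsibilities.
        pi = np.clip(kernel @ resp, 1e-12, None)
        pi /= pi.sum(axis=1, keepdims=True)

    return resp.argmax(axis=1), means, covs, pi
```

For example, with locations drawn uniformly on the unit square and the class determined by which half of the square an instance lies in, `sgmm_em(x, s, n_components=2, bandwidth=0.1)` returns hard labels together with location-specific mixing probabilities `pi`. The kernel step is the only place this loop differs from classical GMM EM: replacing `kernel` with a matrix whose entries all equal `1 / n` makes every row of `pi` the same global average of the responsibilities, which recovers location-free mixing weights.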