The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis (TDA) and has been widely used in biomedical research. It outputs a combinatorial graph whose structure reflects the shape of the data. However, the standard Mapper algorithm requires manual parameter tuning and relies on fixed intervals with fixed overlap ratios, which may impede its performance. Variants of the standard Mapper algorithm have been developed to address these limitations, yet most of them still require manual parameter tuning. Additionally, many of these variants, as well as the standard version found in the literature, were built within a deterministic framework and overlook the uncertainty inherent in the data. To relax these limitations, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent. Specifically, we develop a soft Mapper framework based on a Gaussian mixture model (GMM) for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimate of the output graph. Moreover, we propose a stochastic gradient descent algorithm with a specific topological loss function for optimizing the model parameters. Both simulation and application studies demonstrate the effectiveness of the soft Mapper in capturing the underlying topological structures. In addition, its application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank (MSBB) successfully identifies a distinct subgroup of Alzheimer's disease.
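To make the idea of a GMM-based soft cover more concrete, here is a minimal sketch of one way it could be wired into the Mapper pipeline. It assumes a one-dimensional filter function, fixed (not yet optimized) GMM parameters, a hypothetical sampling rule in which point i joins cover element k independently with probability equal to its GMM responsibility, and ε-neighbourhood single-linkage clustering inside each cover element. The function names (`gmm_responsibilities`, `sample_assignment`, `single_linkage`, `mapper_graph`) and the sampling rule are illustrative assumptions, not the authors' implementation; the SGD optimization with a topological loss is not reproduced here.

```python
import numpy as np


def gmm_responsibilities(f, means, stds, weights):
    """Posterior p[i, k] that filter value f[i] came from GMM component k (1-D filter)."""
    f = np.asarray(f, dtype=float)[:, None]                      # shape (n, 1)
    log_dens = (np.log(weights) - np.log(stds) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((f - means) / stds) ** 2)               # shape (n, K)
    log_dens -= log_dens.max(axis=1, keepdims=True)              # stabilise before exp
    dens = np.exp(log_dens)
    return dens / dens.sum(axis=1, keepdims=True)                # each row sums to 1


def sample_assignment(resp, rng):
    """Hypothetical rule: point i joins cover element k with probability resp[i, k].

    Returns a boolean (n, K) hidden assignment matrix H; a point may land in
    several elements (overlap) or, with this simple rule, in none.
    """
    return rng.random(resp.shape) < resp


def single_linkage(X, idx, eps):
    """Split the points X[idx] into connected components of the eps-neighbourhood graph."""
    clusters, unvisited = [], set(idx)
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = [j for j in list(unvisited) if np.linalg.norm(X[i] - X[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                comp.add(j)
                stack.append(j)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(X, H, eps):
    """Nodes: clusters within each cover element. Edges: pairs of clusters sharing a point."""
    nodes = []
    for k in range(H.shape[1]):
        members = np.flatnonzero(H[:, k]).tolist()               # points assigned to element k
        nodes.extend(single_linkage(X, members, eps))
    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]}
    return nodes, edges


# Usage sketch: points on a circle, filter = x-coordinate, three fixed GMM components.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
resp = gmm_responsibilities(X[:, 0], means=np.array([-0.8, 0.0, 0.8]),
                            stds=np.array([0.3, 0.3, 0.3]), weights=np.ones(3) / 3)
H = sample_assignment(resp, rng)
nodes, edges = mapper_graph(X, H, eps=0.2)
```

Because H is random, repeating the last two calls yields a distribution over output graphs rather than a single graph; a point estimate such as the Mapper graph mode summarizes that distribution by one representative graph. In this sketch the GMM means, standard deviations, and weights are simply fixed by hand, whereas the abstract describes optimizing the model parameters with stochastic gradient descent.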