This paper presents a computational framework for the Wasserstein auto-encoding of merge trees (MT-WAE), a novel extension of the classical auto-encoder neural network architecture to the Wasserstein metric space of merge trees. In contrast to traditional auto-encoders, which operate on vectorized data, our formulation explicitly manipulates merge trees on their associated metric space at each layer of the network, resulting in superior accuracy and interpretability. Our neural network approach can be interpreted as a non-linear generalization of previous linear attempts [65] at merge tree encoding. It also trivially extends to persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our algorithms, with MT-WAE computations on the order of minutes on average. We show the utility of our contributions in two applications adapted from previous work on merge tree encoding [65]. First, we apply MT-WAE to data reduction and reliably compress merge trees by concisely representing them with their coordinates in the final layer of our auto-encoder. Second, we document an application to dimensionality reduction, which exploits the latent space of our auto-encoder for the visual analysis of ensemble data. We illustrate the versatility of our framework by introducing two penalty terms that help preserve, in the latent space, both the Wasserstein distances between merge trees and their clusters. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a C++ implementation that can be used for reproducibility.
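To make the idea of a distance-preserving penalty term more concrete, the following is a minimal sketch, not the paper's implementation. It assumes the penalty is a sum of squared differences between precomputed input-space distances (for example, Wasserstein distances between merge trees) and Euclidean distances between latent coordinates; the abstract does not state the exact form of the penalty. The names `latentDistance` and `metricPenalty` are hypothetical and introduced here for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper's code): Euclidean distance
// between two latent-space points of equal dimension.
double latentDistance(const std::vector<double> &a,
                      const std::vector<double> &b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) {
    const double d = a[k] - b[k];
    sum += d * d;
  }
  return std::sqrt(sum);
}

// Illustrative metric-preservation penalty (assumed form):
//   sum over pairs i < j of (inputDist[i][j] - ||z_i - z_j||)^2
// inputDist holds precomputed pairwise distances between the input objects;
// latent holds one coordinate vector per input object.
double metricPenalty(const std::vector<std::vector<double>> &inputDist,
                     const std::vector<std::vector<double>> &latent) {
  assert(inputDist.size() == latent.size());
  double penalty = 0.0;
  for (std::size_t i = 0; i < latent.size(); ++i) {
    for (std::size_t j = i + 1; j < latent.size(); ++j) {
      const double diff =
          inputDist[i][j] - latentDistance(latent[i], latent[j]);
      penalty += diff * diff;
    }
  }
  return penalty;
}
```

Under this assumed form, the penalty is zero exactly when every latent distance matches its input-space counterpart, and it grows as the latent layout distorts those distances. In a training loop, such a term would typically be added to the reconstruction loss with a weighting coefficient.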