We introduce algorithms for robustly computing intrinsic coordinates on point clouds. Our approach generates many candidate coordinates by subsampling the data and varying the hyperparameters of an embedding algorithm (e.g., a manifold learning method). We then identify a subset of representative embeddings by clustering the collection of candidate coordinates and using shape descriptors from topological data analysis. The final output is an average of the representative embeddings, computed with generalized Procrustes analysis. We validate our approach on both synthetic data and experimental measurements from genomics, demonstrating robustness to noise and outliers.
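The final averaging step can be illustrated with a minimal sketch of generalized Procrustes analysis. This is not the authors' implementation: the function names `align` and `generalized_procrustes` are hypothetical, and the sketch assumes that every representative embedding is an array of the same shape whose rows refer to the same points in the same order (embeddings of different subsamples would first need to be restricted to their common points). The subsampling, clustering, and topological selection steps are omitted.

```python
import numpy as np


def align(X, ref):
    """Rotate/reflect X onto ref (orthogonal Procrustes via SVD)."""
    # The orthogonal R minimizing ||X @ R - ref||_F is U @ Vt,
    # where X.T @ ref = U @ diag(s) @ Vt.
    U, _, Vt = np.linalg.svd(X.T @ ref)
    return X @ (U @ Vt)


def generalized_procrustes(embeddings, n_iter=50, tol=1e-10):
    """Average embeddings after removing translation, scale, and rotation.

    Assumes all embeddings have the same shape (n_points, dim) and that
    row i corresponds to the same point in every embedding.
    """
    # Center each embedding and scale it to unit Frobenius norm.
    shapes = []
    for E in embeddings:
        E = np.asarray(E, dtype=float)
        E = E - E.mean(axis=0)
        shapes.append(E / np.linalg.norm(E))

    mean = shapes[0]          # initial reference: the first embedding
    aligned = shapes
    for _ in range(n_iter):
        # Align every embedding to the current mean, then re-average.
        aligned = [align(S, mean) for S in shapes]
        new_mean = np.mean(aligned, axis=0)
        new_mean /= np.linalg.norm(new_mean)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return mean, aligned
```

Calling `generalized_procrustes(list_of_embeddings)` returns the averaged embedding together with the aligned copies used in the last averaging pass. Reflections are allowed here because embedding coordinates are typically defined only up to an orthogonal transformation; restricting to proper rotations would be a different design choice.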