We present cluster MDS (cl-MDS), a new technique for visualizing high-dimensional data that addresses a common difficulty of dimensionality reduction methods: preserving both the local and the global structure of the original sample in a single two-dimensional visualization. The algorithm combines the well-known multidimensional scaling (MDS) tool with the $k$-medoids data clustering technique, and enables hierarchical embedding, sparsification, and estimation of two-dimensional coordinates for additional points. While cl-MDS is a generally applicable tool, we also include specific recipes for atomic structure applications. We apply the method to non-linear data sets of increasing complexity in which different layers of locality are relevant, showing a clear improvement in their retrieval and visualization quality.
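To make the combination of $k$-medoids and MDS more concrete, the following is a minimal sketch of one way the two can be chained: cluster the sample, embed each cluster locally, embed the medoids globally, and place each local map at its medoid's global position. It is an illustration under stated assumptions, not the cl-MDS implementation. Classical (eigendecomposition-based) MDS, the alternating $k$-medoids update, the translation-only anchoring step, and all function names (`pairwise_distances`, `k_medoids`, `classical_mds`, `cluster_mds_sketch`) are choices made here for brevity; the abstract does not specify how local and global embeddings are merged.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix of the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def k_medoids(D, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a distance matrix D (assumed variant)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(D.shape[0], size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            # New medoid: the member with the smallest total distance to the others.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    labels[medoids] = np.arange(k)  # each medoid belongs to its own cluster
    return medoids, labels


def classical_mds(D, n_components=2):
    """Classical MDS: eigendecomposition of the double-centered squared distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    Y = eigvecs[:, order] * np.sqrt(lam)
    if Y.shape[1] < n_components:  # clusters with fewer points than dimensions
        Y = np.hstack([Y, np.zeros((n, n_components - Y.shape[1]))])
    return Y


def cluster_mds_sketch(X, k, seed=0):
    """Local MDS per cluster, global MDS on medoids, translation-only anchoring."""
    D = pairwise_distances(X)
    medoids, labels = k_medoids(D, k, seed=seed)
    global_xy = classical_mds(D[np.ix_(medoids, medoids)])
    Y = np.zeros((X.shape[0], 2))
    for c in range(k):
        members = np.flatnonzero(labels == c)
        local_xy = classical_mds(D[np.ix_(members, members)])
        m_pos = np.flatnonzero(members == medoids[c])[0]
        # Assumption: shift the local map so its medoid sits at the global position.
        Y[members] = local_xy - local_xy[m_pos] + global_xy[c]
    return Y, labels, medoids


# Usage on synthetic data: three Gaussian blobs in five dimensions.
rng = np.random.default_rng(1)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0, 10.0, 0.0, 0.0, 0.0]])
X_demo = np.vstack([c + rng.normal(size=(30, 5)) for c in centers])
Y_demo, labels_demo, medoids_demo = cluster_mds_sketch(X_demo, k=3)
```

In this sketch the local maps keep whatever orientation classical MDS assigns them, so only intra-cluster distances and medoid-to-medoid distances are controlled. A more faithful merge would also need to fix the rotation and scale of each local map relative to the global one.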