Multidimensional scaling (MDS) is an unsupervised learning technique that preserves pairwise distances between observations and is commonly used to analyze multivariate biological datasets. Recent advances in MDS have achieved successful classification results, but the resulting configurations depend heavily on the choice of hyperparameters, which limits broader application. Here, we present a self-supervised MDS approach informed by the dispersions of observations that share a common binary label, as measured by the $F$-ratio. Compared to existing dimensionality reduction tools, our visualization accurately represents the $F$-ratio while consistently preserving the global structure with low data distortion. Using an algal microbiome dataset, we show that the new method better illustrates the community's response to the host, suggesting its potential impact on microbiology and ecology data analysis.
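To make the two quantities named above concrete, the following is a minimal sketch, not the method presented here. It assumes that the $F$-ratio is the PERMANOVA-style pseudo-$F$ computed from a distance matrix, and it uses plain classical (Torgerson) MDS as a stand-in for the embedding step. The function names `pairwise_distances`, `pseudo_f_ratio`, and `classical_mds` are hypothetical, and the data are synthetic.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pseudo_f_ratio(D, labels):
    """Pseudo-F from a distance matrix (assumed PERMANOVA-style definition).

    F = (SS_among / (a - 1)) / (SS_within / (N - a)), where the sums of
    squares are built from squared pairwise distances.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    n = D.shape[0]
    groups = np.unique(labels)
    a = len(groups)

    # Total sum of squares: squared distances over all pairs, divided by N.
    iu = np.triu_indices(n, k=1)
    ss_total = (D[iu] ** 2).sum() / n

    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = D[np.ix_(idx, idx)]
        ju = np.triu_indices(len(idx), k=1)
        ss_within += (sub[ju] ** 2).sum() / len(idx)

    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS via double centering and eigendecomposition."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(lam)


# Synthetic example: two labeled groups in 5 dimensions, embedded in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(1.5, 1.0, (10, 5))])
y = np.array([0] * 10 + [1] * 10)

D_high = pairwise_distances(X)
D_low = pairwise_distances(classical_mds(D_high, n_components=2))

print("pseudo-F in the original space:", pseudo_f_ratio(D_high, y))
print("pseudo-F in the 2-D embedding:", pseudo_f_ratio(D_low, y))
```

In general the two printed values differ, because an unsupervised embedding discards variance without regard to the labels. A label-informed approach of the kind described above would aim to close that gap; how it does so is not specified in this abstract and is not modeled by the sketch.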