Multidimensional scaling (MDS) is a widely used approach to representing high-dimensional, dependent data. MDS works by assigning each observation a location on a low-dimensional geometric manifold, with distance on the manifold representing similarity. We propose a Bayesian approach to multidimensional scaling when the low-dimensional manifold is hyperbolic. Using hyperbolic space facilitates representing tree-like structures that are common in many settings (e.g., text or genetic data with hierarchical structure). A Bayesian approach provides regularization that minimizes the impact of measurement error in the observed data, and it allows uncertainty to be assessed. We also propose a case-control likelihood approximation that allows for efficient sampling from the posterior distribution for larger datasets, reducing computational complexity from approximately $O(n^2)$ to $O(n)$. We evaluate the proposed method against state-of-the-art alternatives using simulations, canonical reference datasets, Indian village network data, and human gene expression data.
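To make the two computational ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Poincaré ball model of hyperbolic space (the abstract does not say which model is used), and it replaces the likelihood with a plain squared-error stress for illustration. The abstract does not describe the case-control scheme, so the second function shows only the simpler idea of uniformly subsampling partners with inverse-probability weights, which is one way a sum over roughly $n^2$ pairs can be cut to a number of terms linear in $n$. The names `poincare_distance`, `full_stress`, `subsampled_stress`, and the parameter `m` are hypothetical.

```python
import numpy as np


def poincare_distance(u, v):
    """Hyperbolic distance between points of the open unit ball (Poincare model)."""
    sq_diff = np.sum((u - v) ** 2, axis=-1)
    denom = (1.0 - np.sum(u ** 2, axis=-1)) * (1.0 - np.sum(v ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)


def full_stress(X, D):
    """Squared error summed over all ordered pairs i != j: n * (n - 1) terms."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        others = np.delete(np.arange(n), i)
        d_hat = poincare_distance(X[i], X[others])
        total += np.sum((D[i, others] - d_hat) ** 2)
    return total


def subsampled_stress(X, D, m, rng):
    """Unbiased estimate of full_stress from m random partners per point.

    Only n * m distances are evaluated, so the cost is linear in n for fixed m.
    Illustrative assumption: partners are drawn uniformly, which is simpler
    than a case-control split.
    """
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        # Draw m distinct indices from {0, ..., n-1} excluding i.
        picked = rng.choice(n - 1, size=m, replace=False)
        picked[picked >= i] += 1
        d_hat = poincare_distance(X[i], X[picked])
        # Reweight by the inverse sampling fraction (n - 1) / m.
        total += (n - 1) / m * np.sum((D[i, picked] - d_hat) ** 2)
    return total
```

With `m = n - 1` every partner is used and the estimate coincides with the full sum; smaller `m` trades variance for speed. In a Bayesian sampler the same reweighting would be applied to log-likelihood terms rather than to a stress value.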