Quantile-quantile (Q-Q) plots are widely used to assess whether two datasets share a common distribution. They are traditionally constructed for univariate distributions, which makes them ill-suited to capturing the complex dependencies present in multivariate data. In this paper, we propose a novel approach for constructing multivariate Q-Q plots that extends the traditional Q-Q plot methodology to high-dimensional data. Our approach uses optimal transport (OT) and entropy-regularized optimal transport (EOT) to align the empirical quantiles of the two datasets. We also introduce a second technique, based on the OT and EOT potentials, for comparing two multivariate datasets. Through extensive simulations and real-data examples, we demonstrate that the proposed approach captures multivariate dependencies and identifies distributional differences such as differences in tail behaviour. Finally, we propose two test statistics, based on the Q-Q and potential plots, for rigorously comparing two distributions.
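To make the alignment idea concrete, the following is a minimal sketch of how an entropy-regularized transport plan between two samples could supply the paired points of a multivariate Q-Q plot. It is an illustration under stated assumptions, not the construction used in the paper: the function names (`sinkhorn_plan`, `transport_qq_pairs`), the squared Euclidean cost, the scale-free parameter `eps`, and the use of a barycentric projection to map each point of the first sample into the second are all assumptions introduced here. The plan is computed with plain Sinkhorn iterations in pure Python, which is adequate only for small samples.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def sinkhorn_plan(xs, ys, eps=0.1, n_iter=300):
    """Entropy-regularized OT plan between two uniform empirical measures.

    eps is a hypothetical, unitless regularization strength: it is multiplied
    by the mean pairwise cost so the result does not depend on the data scale.
    Assumes the two samples are not all identical (mean cost > 0).
    """
    n, m = len(xs), len(ys)
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    temperature = eps * sum(map(sum, cost)) / (n * m)
    kernel = [[math.exp(-c / temperature) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns towards the uniform marginals.
        u = [(1.0 / n) / sum(k * vj for k, vj in zip(row, v)) for row in kernel]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_qq_pairs(xs, ys, eps=0.1, n_iter=300):
    """Pair each x with the plan-weighted average of the y sample.

    The weighted average (barycentric projection) is an assumed stand-in for
    the "matched quantile" of x in the second dataset.
    """
    plan = sinkhorn_plan(xs, ys, eps, n_iter)
    dim = len(ys[0])
    pairs = []
    for x, row in zip(xs, plan):
        mass = sum(row)
        image = [sum(p * y[k] for p, y in zip(row, ys)) / mass
                 for k in range(dim)]
        pairs.append((list(x), image))
    return pairs
```

Plotting `x[k]` against `image[k]` for each coordinate `k` then gives one Q-Q-style panel per dimension. Points close to the diagonal would suggest similar distributions, with the caveat that larger values of `eps` smooth the plan and pull the images towards the mean of the second sample.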