Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on a Euclidean domain. Although the notion of quantiles was recently extended to multivariate distributions, QR for multivariate distributions on manifolds remains underexplored, even though many important applications inherently involve data distributed on manifolds such as spheres (climate and geological phenomena) and tori (dihedral angles in proteins). By leveraging optimal transport theory and c-concave functions, we meaningfully define conditional vector quantile functions of high-dimensional variables on manifolds (M-CVQFs). Our approach allows for quantile estimation, regression, and computation of conditional confidence sets and likelihoods. We demonstrate the approach's efficacy and provide insights regarding the meaning of non-Euclidean quantiles through synthetic and real data experiments.