Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, both because of how its objective function is formulated and because the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR to vector-valued target variables, building on a meaningful generalization of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) we extend VQR to the nonlinear case, showing substantial improvement over linear VQR; (ii) we propose vector monotone rearrangement, a method which ensures that the quantile functions estimated by VQR are monotone; (iii) we provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) we release an optimized Python package of our solvers to encourage widespread use of VQR in real-world applications.
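To make concrete why classical QR is tied to scalar targets, the following minimal sketch illustrates the standard pinball (check) loss that scalar QR minimizes. It is not taken from the package mentioned above; the function names `pinball_loss` and `best_constant_quantile` are hypothetical and chosen only for illustration. The loss depends on the sign of the scalar residual $\mathrm{Y} - q$, which has no direct analogue when $\boldsymbol{\mathrm{Y}}$ is a vector.

```python
import numpy as np


def pinball_loss(y, q_hat, tau):
    """Average pinball (check) loss of a scalar prediction q_hat at level tau.

    Hypothetical helper for illustration only. Under-predictions are weighted
    by tau and over-predictions by (1 - tau), based on the residual's sign.
    """
    residual = np.asarray(y, dtype=float) - q_hat
    return float(np.mean(np.maximum(tau * residual, (tau - 1.0) * residual)))


def best_constant_quantile(y, tau):
    """Minimize the pinball loss over constants by scanning the sample points.

    The loss is piecewise linear and convex in q_hat, so a minimizer is always
    attained at one of the observed values.
    """
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y, candidate, tau) for candidate in y]
    return float(y[int(np.argmin(losses))])


# Synthetic scalar targets (no features), assumed here purely for the demo.
rng = np.random.default_rng(0)
y = rng.normal(size=501)

median_estimate = best_constant_quantile(y, 0.5)
q90_estimate = best_constant_quantile(y, 0.9)
```

With no features, the minimizer of the pinball loss at level $\tau$ is an empirical $\tau$-quantile of the sample; scalar QR replaces the constant with a function of $\boldsymbol{\mathrm{X}}$.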