Linear regression is a foundational technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is a statistically more robust alternative to the other classical technique, Ordinary Least Squares regression (OLS). However, while efficient algorithms exist for OLS, almost all known results for QR are only weakly polynomial. To fill this gap, this paper proposes several efficient strongly polynomial algorithms for QR in various settings. For two-dimensional QR, by making a connection to the geometric concept of the $k$-set, we propose an algorithm with a deterministic worst-case time complexity of $\mathcal{O}(n^{4/3}\,\mathrm{polylog}(n))$ and a randomized version with an expected time complexity of $\mathcal{O}(n^{4/3})$. We also propose a randomized divide-and-conquer algorithm, RandomizedQR, with an expected time complexity of $\mathcal{O}(n\log^2{(n)})$ for the two-dimensional QR problem. For the general case of $d > 2$ dimensions, our RandomizedQR algorithm has an expected time complexity of $\mathcal{O}(n^{d-1}\log^2{(n)})$.
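To make the two-dimensional QR problem concrete, the following is a minimal brute-force sketch and not one of the algorithms proposed in this paper. It assumes the usual check (pinball) loss as the QR objective and relies on the standard linear-programming fact that, when the data contain at least two distinct $x$ values, some optimal line passes through two data points. The function names `pinball_loss`, `total_loss`, and `brute_force_qr_2d` are illustrative only. Enumerating all pairs costs $\mathcal{O}(n^3)$ time, which is the kind of baseline the bounds above improve on.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check loss for quantile tau: tau * r if r >= 0, else (tau - 1) * r."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def total_loss(points, slope, intercept, tau):
    """Sum of check losses of the line y = slope * x + intercept."""
    return sum(pinball_loss(y - (slope * x + intercept), tau) for x, y in points)


def brute_force_qr_2d(points, tau):
    """Evaluate every line through two data points with distinct x values.

    Returns (loss, slope, intercept) for the best candidate, or None if
    no two points have distinct x. Illustrative O(n^3) baseline only.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not of the form y = a * x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = total_loss(points, slope, intercept, tau)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    return best


# Median regression (tau = 0.5) on four collinear points plus one outlier.
data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 100)]
result = brute_force_qr_2d(data, 0.5)
```

In this small example the fitted median line is $y = x$: the outlier at $(4, 100)$ contributes to the loss but does not pull the line away from the other four points, which illustrates the robustness of QR relative to OLS mentioned above.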