Partial least squares (PLS) is a dimensionality reduction technique introduced in the field of chemometrics and successfully employed in many other areas. The PLS components are obtained by maximizing the covariance between linear combinations of the regressors and of the target variables. In this work, we focus on its application to scalar regression problems. PLS regression consists of finding the least squares predictor that is a linear combination of a subset of the PLS components. Alternatively, PLS regression can be formulated as a least squares problem restricted to a Krylov subspace. This equivalent formulation is employed to analyze, as a function of $L$, the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$, the PLS estimator of the vector of coefficients of the linear regression model based on $L$ PLS components, and $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$, the estimator obtained by ordinary least squares (OLS). Specifically, ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$ is the vector of coefficients in the aforementioned Krylov subspace that is closest to $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$ in terms of the Mahalanobis distance with respect to the covariance matrix of the OLS estimate. We provide a bound on this distance that depends only on the distribution of the eigenvalues of the regressor covariance matrix. Numerical examples on synthetic and real-world data are used to illustrate how the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}}^{\scriptscriptstyle {(L)}}$ and $\hat{\boldsymbol \beta}_{\mathrm{OLS}}$ depends on the number of clusters into which the eigenvalues of the regressor covariance matrix are grouped.
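The Krylov formulation can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper names (`krylov_basis`, `pls_krylov`, `mahalanobis_sq`) and the synthetic design are hypothetical, chosen only for illustration. The sketch assumes that the Krylov subspace is generated by $\mathbf{X}^\top\mathbf{X}$ and $\mathbf{X}^\top\mathbf{y}$ and that the noise is homoscedastic, so that the covariance of the OLS estimate is proportional to $(\mathbf{X}^\top\mathbf{X})^{-1}$ and the squared Mahalanobis distance equals $\|\mathbf{X}(\boldsymbol\beta - \hat{\boldsymbol\beta}_{\mathrm{OLS}})\|^2$ up to the noise variance. The design matrix is built so that $\mathbf{X}^\top\mathbf{X}$ has exactly three distinct eigenvalues.

```python
import numpy as np

def krylov_basis(A, b, L):
    """Orthonormal basis of span{b, A b, ..., A^(L-1) b} (hypothetical helper)."""
    cols = []
    v = b / np.linalg.norm(b)
    for _ in range(L):
        cols.append(v)
        v = A @ v
        v = v / np.linalg.norm(v)  # rescale only; the span is unchanged
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def pls_krylov(X, y, L):
    """Least squares restricted to the L-dimensional Krylov subspace."""
    K = krylov_basis(X.T @ X, X.T @ y, L)
    c, *_ = np.linalg.lstsq(X @ K, y, rcond=None)
    return K @ c

def mahalanobis_sq(X, beta, beta_ols):
    """Squared Mahalanobis distance to the OLS estimate, up to the noise variance."""
    d = beta - beta_ols
    return float(d @ (X.T @ X) @ d)

# Synthetic design (assumption): singular values in three groups, so that
# X^T X has exactly three distinct eigenvalues (9, 4 and 1).
rng = np.random.default_rng(0)
n, p = 50, 6
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
V, _ = np.linalg.qr(rng.standard_normal((p, p)))
s = np.array([3.0, 3.0, 2.0, 2.0, 1.0, 1.0])
X = (U * s) @ V.T
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
distances = [mahalanobis_sq(X, pls_krylov(X, y, L), beta_ols) for L in (1, 2, 3)]
for L, d in zip((1, 2, 3), distances):
    print(f"L = {L}: squared distance to OLS = {d:.3e}")
```

Because the Krylov subspaces are nested, the distance cannot increase with $L$. With three distinct eigenvalues, the subspace for $L = 3$ already contains $\hat{\boldsymbol\beta}_{\mathrm{OLS}}$, so the last distance vanishes up to rounding error. Only $L \le 3$ is evaluated, since for larger $L$ the Krylov vectors become linearly dependent in this example.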