The classical latent factor model for linear regression is extended by assuming that, up to an unknown orthogonal transformation, the features split into a subset that is relevant for the response and a subset that is irrelevant. Furthermore, joint low-dimensionality is imposed only on the relevant feature vector and the response variable. This framework allows for a comprehensive study of the partial-least-squares (PLS) algorithm under random design. In particular, a novel perturbation bound for PLS solutions is proven and a high-probability $L^2$-estimation rate for the PLS estimator is obtained. The framework also sheds light on the performance of other regularisation methods for ill-posed linear regression that exploit sparsity or unsupervised projection. The theoretical findings are confirmed by numerical studies on both real and simulated data.
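The setting described in the abstract can be illustrated with a small simulation. The sketch below is one possible reading of it, not the paper's exact model: the names `simulate` and `pls_krylov`, the Gaussian latent factors, the noise level, and all dimensions are assumptions chosen for illustration. The relevant features and the response share a `d`-dimensional latent factor, the irrelevant features are independent of both, and an unknown orthogonal matrix mixes the two blocks. The PLS estimator is computed through its standard Krylov-subspace characterisation (least squares restricted to the span of $\hat s, \hat\Sigma \hat s, \dots, \hat\Sigma^{k-1} \hat s$), which is assumed here to match the PLS variant studied.

```python
import numpy as np


def simulate(n, p_rel, p_irr, d, noise, rng):
    """Draw (X, y) from a hypothetical latent factor model with relevant and irrelevant features."""
    p = p_rel + p_irr
    A = rng.standard_normal((p_rel, d))      # loadings of the relevant features (assumed Gaussian)
    b = rng.standard_normal(d)               # loadings of the response
    Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal mixing matrix
    H = rng.standard_normal((n, d))          # shared low-dimensional latent factors
    X_rel = H @ A.T + noise * rng.standard_normal((n, p_rel))
    X_irr = rng.standard_normal((n, p_irr))  # independent of the latent factors and of y
    X = np.hstack([X_rel, X_irr]) @ Q.T      # observed features after orthogonal mixing
    y = H @ b + noise * rng.standard_normal(n)
    return X, y


def pls_krylov(X, y, k):
    """PLS coefficient vector with k components via least squares on a Krylov subspace.

    Assumes X and y are already centred (the simulated data have mean zero).
    """
    n = X.shape[0]
    S = X.T @ X / n                          # sample covariance of the features
    s = X.T @ y / n                          # sample cross-covariance with the response
    cols = [s / np.linalg.norm(s)]
    for _ in range(k - 1):
        v = S @ cols[-1]
        cols.append(v / np.linalg.norm(v))   # rescaling keeps the span, improves conditioning
    basis, _ = np.linalg.qr(np.column_stack(cols))  # orthonormal basis of the Krylov subspace
    coef, *_ = np.linalg.lstsq(X @ basis, y, rcond=None)
    return basis @ coef


rng = np.random.default_rng(0)
X, y = simulate(n=300, p_rel=5, p_irr=30, d=2, noise=0.1, rng=rng)
for k in (1, 2, 3):
    beta = pls_krylov(X, y, k)
    print(k, np.mean((y - X @ beta) ** 2))   # in-sample mean squared residual
```

Because the Krylov subspaces are nested, the in-sample residual cannot increase with `k`; how many components are needed in practice depends on the latent dimension and noise level, which are arbitrary here.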