Pre-smoothing is a technique aimed at increasing the signal-to-noise ratio in data to improve subsequent estimation and model selection in regression problems. To date, however, pre-smoothing has been limited to the univariate-response regression setting. Many scientific applications instead involve multi-response regression problems, particularly ones in which the number of responses is large. Motivated by this setting, this article proposes a technique for data pre-smoothing based on low-rank approximation. We establish theoretical results on the performance of the proposed methodology, which show that in this large-response setting the proposed technique outperforms ordinary least squares estimation under the mean squared error criterion, whilst being computationally more efficient than alternative approaches such as reduced rank regression. We quantify our estimator's benefit empirically in a number of simulated experiments. We also demonstrate our proposed low-rank pre-smoothing technique on real data arising from the environmental and biological sciences.
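To make the idea concrete, the following is a minimal sketch of one way low-rank pre-smoothing could be combined with ordinary least squares. The abstract does not spell out the exact procedure, so several things here are assumptions: that pre-smoothing means replacing the response matrix by its truncated-SVD approximation, that the rank is known in advance, and that the data are simulated with a low-rank coefficient matrix. The function names `low_rank_presmooth` and `ols` are hypothetical and chosen only for illustration.

```python
import numpy as np


def low_rank_presmooth(Y, rank):
    """Return the best rank-`rank` approximation of Y (truncated SVD).

    Assumption: this stands in for the pre-smoothing step; the article's
    actual construction may differ.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def ols(X, Y):
    """Ordinary least squares coefficients for the multi-response model Y = X B + E."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


# Simulated large-response setting (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, q, r = 200, 5, 100, 2  # samples, predictors, responses, true rank

X = rng.standard_normal((n, p))
B = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank-r coefficients
Y = X @ B + rng.standard_normal((n, q))  # noisy responses

B_ols = ols(X, Y)                          # plain OLS on the raw responses
Y_smooth = low_rank_presmooth(Y, r)        # pre-smooth the responses first
B_lr = ols(X, Y_smooth)                    # then fit OLS to the smoothed responses

# Mean squared error of each coefficient estimate against the true B.
mse_ols = np.mean((B_ols - B) ** 2)
mse_lr = np.mean((B_lr - B) ** 2)
print(f"MSE (OLS):               {mse_ols:.5f}")
print(f"MSE (pre-smoothed OLS):  {mse_lr:.5f}")
```

In this sketch the only extra cost over OLS is a single SVD of the response matrix, and the regression step itself is unchanged. In practice the rank would have to be chosen from the data, which the sketch does not attempt.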