We propose a method for high-dimensional multivariate regression that is robust to random error distributions that are heavy-tailed or contain outliers, while preserving estimation accuracy when the random errors are normally distributed. We extend Wilcoxon-type regression to the multivariate regression model as a tuning-free approach to robustness. Furthermore, extending the multivariate cluster elastic net, the proposed method combines an L1 penalty with an L2 penalty based on k-means clustering. Estimation of the regression coefficients and variable selection are performed simultaneously. Moreover, accounting for the correlation among the response variables through the clustering is expected to improve estimation performance. Numerical simulations demonstrate that the proposed method outperforms the multivariate cluster method and other multiple regression methods when the error distribution is heavy-tailed or contains outliers, and that it remains stable under normal error distributions. Finally, we confirm the efficacy of the proposed method using a data example involving genes associated with breast cancer.
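The abstract does not spell out the objective function. The sketch below shows one plausible form, under stated assumptions: the Wilcoxon-type loss is taken as the pairwise-difference form of Jaeckel's dispersion, summed over the response columns, and the k-means-based L2 term is taken as the squared distance of each response's coefficient vector to its cluster centroid. All function and parameter names (`wilcoxon_dispersion`, `cluster_penalty`, `update_labels`, `objective`, `lam1`, `lam2`) are hypothetical and are not the authors' implementation.

```python
import numpy as np


def wilcoxon_dispersion(E):
    """Assumed multivariate Wilcoxon-type loss.

    E: (n, q) residual matrix. Returns sum over responses k and pairs i < j
    of |E[i, k] - E[j, k]|, i.e. Jaeckel's dispersion with Wilcoxon scores
    in its pairwise-difference form (up to a constant scale).
    """
    # diffs[i, j, k] = E[i, k] - E[j, k]
    diffs = E[:, None, :] - E[None, :, :]
    # The full (i, j) grid counts each unordered pair twice.
    return np.abs(diffs).sum() / 2.0


def cluster_penalty(B, labels):
    """Assumed k-means-style L2 penalty.

    B: (p, q) coefficients; column l belongs to response l.
    labels: length-q integer cluster assignment of the responses.
    Returns the within-cluster sum of squared distances to the centroids.
    """
    total = 0.0
    for c in np.unique(labels):
        cols = B[:, labels == c]
        centroid = cols.mean(axis=1, keepdims=True)
        total += ((cols - centroid) ** 2).sum()
    return total


def update_labels(B, centroids):
    """k-means assignment step: send each response's coefficient column
    to its nearest centroid. centroids: (p, K)."""
    # (p, q, 1) - (p, 1, K) -> (p, q, K), summed over p -> (q, K)
    d2 = ((B[:, :, None] - centroids[:, None, :]) ** 2).sum(axis=0)
    return d2.argmin(axis=1)


def objective(X, Y, B, labels, lam1, lam2):
    """Robust loss + L1 (sparsity) + clustering L2 penalty (assumed form)."""
    E = Y - X @ B
    return (wilcoxon_dispersion(E)
            + lam1 * np.abs(B).sum()
            + 0.5 * lam2 * cluster_penalty(B, labels))


# Usage: one response, one predictor column of ones.
X = np.ones((3, 1))
Y = np.array([[0.0], [1.0], [3.0]])
print(objective(X, Y, np.zeros((1, 1)), np.array([0]), 0.5, 1.0))  # 6.0
print(objective(X, Y, np.ones((1, 1)), np.array([0]), 0.5, 1.0))   # 6.5
```

Because the loss grows only linearly in each residual difference, a single outlier inflates it far less than it would a squared-error loss, which is the source of the robustness. The pairwise form is also invariant to a common shift of the residuals (the two calls above differ only by the L1 term), so under this assumed form an intercept would have to be estimated separately, for example from the median of the residuals.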