This paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose generalized matrix decomposition regression (GMDR), which efficiently leverages auxiliary information on row and column structures. GMDR extends principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on the regression coefficients of individual variables, we propose generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that includes the proposed GMDR estimator. GMDI provides more flexibility for modeling relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse; it also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type-I error rate and power, and we demonstrate the effectiveness of GMDR and GMDI in simulation studies and in an application to human microbiome data.