High-dimensional compositional data arise in many fields of modern scientific research. In regression analysis with compositional covariates, measurement error poses a serious challenge for existing errors-in-variables regression methods, because error in one component of a composition affects the others. To address both the compositional nature of the covariates and the measurement errors in the high-dimensional design matrix, we propose a new method, Error-in-composition (Eric) Lasso, for regression with corrupted compositional predictors. We establish estimation error bounds for Eric Lasso and its asymptotic sign-consistent selection properties. We then illustrate its finite-sample performance in simulation studies and demonstrate its potential usefulness in a real data application.
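The coupling between components can be seen in a small numerical sketch. The example below is not taken from the paper and is not Eric Lasso itself. It assumes a hypothetical three-part composition, an illustrative multiplicative error on a single raw count, and a helper named `closure` introduced here only for illustration. It shows that after renormalizing to unit sum, every observed proportion moves even though only one raw measurement was corrupted.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]


# Hypothetical true abundances for three components (illustrative values).
true_counts = [50.0, 30.0, 20.0]
true_comp = closure(true_counts)

# Assumption: only the first raw count is mismeasured, by a factor of 1.5.
observed_counts = [true_counts[0] * 1.5, true_counts[1], true_counts[2]]
observed_comp = closure(observed_counts)

# Difference between observed and true proportions, component by component.
shifts = [obs - tru for obs, tru in zip(observed_comp, true_comp)]

for name, tru, obs, shift in zip("ABC", true_comp, observed_comp, shifts):
    print(f"{name}: true={tru:.3f} observed={obs:.3f} shift={shift:+.3f}")
```

Because the proportions are constrained to sum to one, the shifts cancel exactly, so an error in one component is necessarily redistributed across the remaining components. This is the feature that separates the compositional setting from the usual errors-in-variables setup, where errors are typically modeled as affecting each covariate independently.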