We propose a novel inference procedure for linear combinations of high-dimensional regression coefficients in generalized estimating equations (GEE), which are widely used to analyze correlated data. Our estimator for this more general inferential target, obtained by constructing projected estimating equations, is shown to be asymptotically normal under certain regularity conditions. We also introduce a data-driven cross-validation procedure for selecting the tuning parameter used to estimate the projection direction, a step that existing procedures do not address. Extensive simulations demonstrate the robust finite-sample performance of the proposed method, particularly in terms of estimation bias and confidence interval coverage. We apply the method to a longitudinal proteomic study of COVID-19 plasma samples to investigate the proteomic signatures associated with disease severity.