We study unified model averaging estimation for data with complicated structures and propose a novel model averaging method based on cross-validation (MACV). MACV unifies a large class of new and existing model averaging estimators and covers a very general class of loss functions. Furthermore, to reduce the computational burden of conventional leave-one/subject-out cross-validation, we propose a SEcond-order-Approximated Leave-one/subject-out (SEAL) cross-validation, which substantially improves computational efficiency. For non-independent and non-identically distributed random variables, we establish a unified theory for analyzing the asymptotic behavior of the proposed MACV and SEAL methods, where the number of candidate models is allowed to diverge with the sample size. To demonstrate the breadth of the proposed methodology, we present four optimal model averaging estimators for four important settings: longitudinal data with discrete responses, within-cluster correlation structure modeling, conditional prediction in spatial data, and quantile regression with a potential correlation structure. We conduct extensive simulation studies and analyze real-data examples to illustrate the advantages of the proposed methods.
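To make the idea of choosing averaging weights by cross-validation concrete, the following is a minimal sketch of the simplest special case only: squared-error loss, independent observations, and linear candidate models fitted by least squares. It is not the MACV or SEAL procedure itself. The function names `loo_predictions` and `cv_model_average_weights`, the simulated data, and the use of SciPy's SLSQP solver are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def loo_predictions(X, y):
    """Leave-one-out predictions for a least-squares fit of y on X.

    Uses the hat-matrix identity: the leave-one-out residual equals
    e_i / (1 - h_ii), so no model is refitted.
    """
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    residuals = y - H @ y
    return y - residuals / (1.0 - h)


def cv_model_average_weights(X, y, model_columns):
    """Weights on the simplex minimizing the leave-one-out squared error.

    model_columns is a list of column-index lists, one per candidate model
    (a hypothetical way of describing the candidates).
    """
    # Column m holds the leave-one-out predictions of candidate model m.
    P = np.column_stack([loo_predictions(X[:, cols], y) for cols in model_columns])
    n_models = P.shape[1]

    def cv_loss(w):
        r = y - P @ w
        return r @ r / len(y)

    result = minimize(
        cv_loss,
        np.full(n_models, 1.0 / n_models),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    # Remove tiny negative values from the solver and renormalize.
    w = np.clip(result.x, 0.0, None)
    return w / w.sum()


# Simulated example (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.5, 0.0]) + rng.normal(size=n)

models = [[0, 1], [0, 1, 2], [0, 1, 2, 3]]  # nested candidate models
weights = cv_model_average_weights(X, y, models)
print(weights)
```

For least squares the leave-one-out predictions have an exact closed form, which is why the sketch never refits a model. For general loss functions and correlated data no such closed form is available in general, and repeated refitting is the computational cost that an approximation such as SEAL is intended to reduce.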