Big data is ubiquitous in practice, but it also imposes a heavy computational burden. To reduce the computational cost while ensuring the effectiveness of the parameter estimators, an optimal subsampling method is proposed for estimating the parameters of marginal models with massive longitudinal data. The optimal subsampling probabilities are derived, and the asymptotic properties of the resulting estimator, namely consistency and asymptotic normality, are established. Extensive simulation studies are carried out to evaluate the performance of the proposed method for continuous, binary, and count data under four different working correlation matrices. A depression data set is used to illustrate the proposed method.