In one extension of scalar-on-function regression modeling, the covariate is taken to be a density that is estimated from a finite number of measurements gathered for each observational unit. When this number of measurements is relatively small, the estimated coefficient function suffers from attenuation bias. This paper studies how the bias depends on the number of measurements per unit and proposes a bias-correction method based on simulation extrapolation (SIMEX). We establish that the bias decreases monotonically as the number of measurements per unit increases. The proposed SIMEX procedure applies bootstrap resampling to simulate smaller measurement counts and then extrapolates to infinitely many measurements, thereby correcting finite-measurement bias. A comprehensive simulation study, conducted over a range of sample sizes and noise levels, shows that the mean integrated squared error of the coefficient function decreases with more measurements per unit and that the SIMEX-extrapolated estimates achieve lower bias than the naive estimates based on the full set of measurements. The practical utility of the method is further illustrated through an application to the National Health and Nutrition Examination Survey, for which we relate 24-hour physical activity profiles to all-cause mortality. This example supports the validity of the method and demonstrates its ability to detect and correct for finite-measurement bias.
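The extrapolation idea described in the abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the density estimator, the regression estimator, the resampling scheme, or the extrapolant. This sketch assumes histogram density estimates, a ridge-penalised piecewise-constant coefficient function, subsampling without replacement as a stand-in for the bootstrap step, and a quadratic extrapolant in 1/m. All function names (`histogram_density`, `fit_beta`, `extrapolate_to_infinity`, `simex_beta`) and the simulated data are hypothetical.

```python
import numpy as np


def histogram_density(samples, edges):
    """Histogram density estimate on fixed bin edges; integrates to 1."""
    counts, _ = np.histogram(samples, bins=edges)
    return counts / (counts.sum() * np.diff(edges))


def fit_beta(measurements, y, edges, ridge=1e-2):
    """Naive estimate of a piecewise-constant coefficient function.

    Assumed model: y_i = alpha + integral f_i(t) beta(t) dt + noise,
    with each density f_i replaced by its histogram estimate.
    """
    widths = np.diff(edges)
    X = np.array([histogram_density(s, edges) * widths for s in measurements])
    Xc = X - X.mean(axis=0)  # centring absorbs the intercept
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + ridge * np.eye(p), Xc.T @ yc)


def extrapolate_to_infinity(m_values, estimates, degree=2):
    """Fit a polynomial in 1/m per coordinate and evaluate it at 1/m = 0."""
    inv_m = 1.0 / np.asarray(m_values, dtype=float)
    coefs = np.polyfit(inv_m, np.asarray(estimates), deg=degree)
    return coefs[-1]  # constant term = extrapolated value at 1/m = 0


def simex_beta(measurements, y, edges, m_grid, n_rep=10, seed=0):
    """Return (extrapolated estimate, naive full-data estimate)."""
    rng = np.random.default_rng(seed)
    m_full = min(len(s) for s in measurements)
    naive = fit_beta(measurements, y, edges)
    levels, curves = [m_full], [naive]
    for m in m_grid:  # each m is assumed to be smaller than m_full
        reps = [
            fit_beta(
                [rng.choice(s, size=m, replace=False) for s in measurements],
                y,
                edges,
            )
            for _ in range(n_rep)
        ]
        levels.append(m)
        curves.append(np.mean(reps, axis=0))  # average over resamples
    return extrapolate_to_infinity(levels, curves), naive


# Toy data (hypothetical): unit i has Beta(a_i, b_i) measurements on [0, 1].
# With beta(t) = t, the integral of f_i(t) * beta(t) equals a_i / (a_i + b_i).
rng = np.random.default_rng(1)
n, m_full = 200, 50
edges = np.linspace(0.0, 1.0, 6)
shape_a = rng.uniform(1.0, 4.0, size=n)
shape_b = rng.uniform(1.0, 4.0, size=n)
measurements = [rng.beta(a, b, size=m_full) for a, b in zip(shape_a, shape_b)]
y = shape_a / (shape_a + shape_b) + rng.normal(scale=0.05, size=n)

beta_simex, beta_naive = simex_beta(measurements, y, edges, m_grid=(10, 20, 30))
print("naive:", np.round(beta_naive, 3))
print("SIMEX:", np.round(beta_simex, 3))
```

Because densities integrate to one, the coefficient function in this toy model is identified only up to an additive constant, so the two printed vectors are best compared by their shape across bins rather than their level. The polynomial degree and the grid of reduced measurement counts are tuning choices in this sketch; the abstract does not say which extrapolant the authors use.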