Identifying Sample Size and Accuracy and Precision of the Estimators in Case-Crossover Designs with Distributed Lags of Heteroskedastic Time-Varying Continuous Exposures Measured with Simple or Complex Error

June 15, 2024

Honghyok Kim

From arXiv. Submitted for peer-reviewed publication.

Understanding sample size, statistical power, and the accuracy and precision of estimators in epidemiological research can facilitate power and bias analyses. However, several factors complicate this understanding. First, exposures that vary spatiotemporally may be heteroskedastic. Second, distributed lags of exposures may be used to identify critical exposure time-windows. Third, exposure measurement error may exist, affecting the accuracy and/or precision of the estimator and, consequently, sample size and statistical power. Fourth, studies rely on different designs, so this understanding may differ by design. For example, case-crossover designs, a form of matched case-control design, are used to estimate the health effects of short-term exposures. To address these gaps, I developed approximation equations for sample size, the estimators, and their standard errors, including polynomials for non-linear effect estimation. Using air pollution exposure estimates, I evaluated these approximations in statistical simulations. Overall, sample size and the accuracy and precision of the estimators can be approximated from external information about validation, without validation data in hand. For distributed lags, the approximations may perform well if residual confounding due to covariate measurement error is not severe. This condition may be difficult to verify without validation data, so validation research is recommended when identifying critical exposure time-windows.
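To make concrete how exposure measurement error can affect the accuracy of an estimator, the following minimal sketch simulates the textbook case of classical, additive error in a simple linear model. It is not the approximation derived in the paper, which concerns case-crossover designs with distributed lags and heteroskedastic exposures. The function name `simulate_attenuation`, the variances, and the effect size are all illustrative assumptions.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=0.5, var_x=1.0, var_u=0.5, seed=0):
    """Illustrative only: classical additive error in a linear model.

    All parameter values are assumptions chosen for demonstration,
    not quantities taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(var_x), n)      # true exposure
    w = x + rng.normal(0.0, np.sqrt(var_u), n)  # error-prone measurement
    y = beta * x + rng.normal(0.0, 1.0, n)      # outcome driven by true exposure

    # Ordinary least-squares slopes of y on x and of y on w.
    slope_true = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    slope_obs = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

    # Reliability ratio: share of measured-exposure variance due to true exposure.
    reliability = var_x / (var_x + var_u)
    return slope_true, slope_obs, reliability


slope_true, slope_obs, reliability = simulate_attenuation()
print(f"slope using true exposure:     {slope_true:.3f}")
print(f"slope using measured exposure: {slope_obs:.3f}")
print(f"beta * reliability:            {0.5 * reliability:.3f}")
```

In this simplified setting the slope estimated from the error-prone exposure is pulled toward zero by roughly the reliability ratio, so a study powered for the true effect would be underpowered for the attenuated one. A common rule of thumb for this simple case is that the required sample size grows by about 1/reliability. Whether comparable behavior holds in a given case-crossover analysis depends on the error structure and the lag specification.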
