The association between multidimensional exposure patterns and outcomes is commonly investigated by first applying a cluster analysis algorithm to derive the patterns and then estimating the associations. However, measurement errors in the underlying continuous, possibly skewed, exposure variables lead to misclassified exposure patterns and therefore to biased effect estimates. This is often the case for lifestyle exposures in epidemiology, e.g. for dietary variables measured on a daily basis. We introduce three new algorithms for correcting the biased effect estimates, based on regression calibration (RC), simulation extrapolation (SIMEX) and multiple imputation (MI). In addition, the naive method, which ignores the measurement error structure, is considered for comparison. These methods are combined with the k-means clustering algorithm and the Gaussian mixture model to derive exposure patterns. The performance of the correction methods is compared in a simulation study with respect to absolute, maximum and relative bias. The simulated data mimic a typical situation in nutritional epidemiology in which diet is assessed using repeated 24-hour dietary recalls. Both continuous and binary outcomes are considered. The simulation results show that the correction methods based on RC and MI perform better than the naive and the SIMEX-based methods. Furthermore, the MI-based approach, which can use outcome information in the error model, is superior to the RC-based approach in most scenarios. Therefore, we recommend using the MI-based approach.
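To make the bias mechanism concrete, the following minimal sketch (standard-library Python only) shows the naive approach for a single exposure: patterns are derived by k-means on error-prone measurements, and the effect estimate is then computed on those patterns. It assumes a two-group latent pattern, normally distributed true intake, one recall with additive normal error, k-means with k = 2, and a continuous outcome with a true effect of 1. All names and values (`kmeans_1d`, `simulate`, `mean_difference`, the error SD of 1) are hypothetical illustrations, not the authors' simulation design, and none of the RC, SIMEX or MI corrections is implemented here.

```python
import random
from statistics import fmean


def kmeans_1d(values, iters=100):
    """Two-cluster k-means on a 1-D list; returns 0/1 labels, 1 = higher centroid."""
    lo, hi = min(values), max(values)  # deterministic initialisation at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        # Update step: recompute each centroid as its cluster mean.
        lo = fmean([v for v, g in zip(values, new) if g == 0])
        hi = fmean([v for v, g in zip(values, new) if g == 1])
        if new == labels:  # assignments stable: converged
            break
        labels = new
    return labels


def mean_difference(y, labels):
    """Slope of a linear model of y on a binary pattern indicator (= difference in means)."""
    y1 = [yi for yi, g in zip(y, labels) if g == 1]
    y0 = [yi for yi, g in zip(y, labels) if g == 0]
    return fmean(y1) - fmean(y0)


def misclassification(labels, pattern):
    """Share of individuals whose derived pattern differs from the true pattern."""
    return fmean([float(a != b) for a, b in zip(labels, pattern)])


def simulate(n=5000, seed=1):
    """Hypothetical data: latent pattern, true intake, one error-prone recall, outcome."""
    rng = random.Random(seed)
    pattern = [int(rng.random() < 0.5) for _ in range(n)]     # true latent pattern G
    x = [rng.gauss(2.0 if g else 0.0, 0.5) for g in pattern]  # true exposure
    w = [xi + rng.gauss(0.0, 1.0) for xi in x]                # recall with additive error
    y = [1.0 * g + rng.gauss(0.0, 1.0) for g in pattern]      # outcome, true effect 1
    return pattern, x, w, y


pattern, x, w, y = simulate()
labels_true = kmeans_1d(x)   # patterns from error-free exposure (unavailable in practice)
labels_naive = kmeans_1d(w)  # naive: patterns from the error-prone measurement

miss_true = misclassification(labels_true, pattern)
miss_naive = misclassification(labels_naive, pattern)
effect_true = mean_difference(y, labels_true)
effect_naive = mean_difference(y, labels_naive)

print(f"misclassification: true-exposure clusters {miss_true:.3f}, naive {miss_naive:.3f}")
print(f"effect estimate:   true-exposure clusters {effect_true:.3f}, naive {effect_naive:.3f}")
```

In this setup the naive clusters misclassify noticeably more individuals, and the naive effect estimate is pulled toward zero; the correction methods described above aim to remove this kind of bias.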