Multiple Imputation Approaches for Epoch-Level Accelerometer Data in Trials

Clinical trials that investigate interventions on physical activity often use accelerometers to measure step count at a very fine granularity, commonly in 5-second epochs. Participants typically wear the accelerometer for a week at baseline and for one or more week-long follow-up periods after the intervention. The data are usually aggregated into daily or weekly step counts for the primary analysis. Missing data are common because participants may not wear the device as specified in the protocol. Approaches to handling missing data in the literature have largely defined missingness at the day level, using a threshold on daily wear time, which discards information on the time of day at which data are missing. We propose an approach to identifying and classifying missingness at the finer epoch level, and then present two approaches to handling it. First, we present a parametric approach that takes into account the number of missing epochs per day. Second, we describe a non-parametric approach to multiple imputation (MI) in which missing periods during the day are replaced by donor data from the same person where possible, or otherwise by data from a different person who is matched on demographic and physical activity-related variables. Our simulation studies, which compare these approaches in a number of settings, show that the non-parametric approach yields the least biased estimates of the treatment effect while maintaining small standard errors. We illustrate the application of these MI strategies to the analysis of the 2017 PACE-UP Trial. The proposed framework of classifying missingness and applying MI at the epoch level is likely to be applicable to other outcomes and to data from other wearable devices.
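The non-parametric approach can be read as a hot-deck imputation applied to runs of missing epochs. The following is a minimal Python sketch of that idea, not the authors' implementation, and it rests on several assumptions not stated in the abstract: each participant-day is a list of per-epoch step counts with `None` marking a missing epoch; a donor segment must be fully observed over the same epochs (the same time of day); same-person donors come from the participant's other days; and the demographic and activity-related matching is collapsed into a single hypothetical key, `match_group`. The `rubin_pool` function applies Rubin's rules, the standard MI pooling step; the paper's own matching, donor selection and analysis model may differ.

```python
import random
from statistics import mean, variance

# ASSUMED layout (hypothetical, for illustration only):
#   data[person][day] -> list of per-epoch step counts, None = missing epoch
#   match_group[person] -> hypothetical matching key standing in for
#   demographic and physical activity-related matching variables


def missing_runs(epochs):
    """Return (start, end) pairs, end exclusive, of consecutive missing epochs."""
    runs, start = [], None
    for i, value in enumerate(epochs):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(epochs)))
    return runs


def donor_segments(data, person, day, start, end, match_group):
    """Donor segments fully observed over epochs [start, end).

    Same-person donors (other days) are preferred; only if none exist are
    segments pooled from other persons sharing the same match_group key.
    """
    def observed(p, exclude_day=None):
        segs = []
        for d, epochs in data[p].items():
            if d == exclude_day:
                continue
            seg = epochs[start:end]
            if all(v is not None for v in seg):
                segs.append(seg)
        return segs

    own = observed(person, exclude_day=day)
    if own:
        return own
    pooled = []
    for other in data:
        if other != person and match_group[other] == match_group[person]:
            pooled.extend(observed(other))
    return pooled


def impute_once(data, match_group, rng):
    """Produce one completed dataset by filling each missing run with a random donor segment."""
    completed = {}
    for person, days in data.items():
        completed[person] = {}
        for day, epochs in days.items():
            filled = list(epochs)
            for start, end in missing_runs(epochs):
                donors = donor_segments(data, person, day, start, end, match_group)
                if not donors:
                    raise ValueError(f"no donor for {person}, day {day}, epochs {start}-{end}")
                filled[start:end] = rng.choice(donors)
            completed[person][day] = filled
    return completed


def rubin_pool(estimates, variances):
    """Combine M point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

Calling `impute_once` with M different random states gives M completed datasets. Each one would then be aggregated (for example to daily step totals) and analysed with the trial's model, and the M estimates and their variances passed to `rubin_pool`. Restricting donors to segments covering the same epochs is what preserves the time-of-day information that day-level thresholds discard.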
