Research on modeling the distributional aspects of sensor-based digital health (sDHT) data has grown significantly in recent years. Most existing approaches use individual-specific density or quantile functions. However, the practical utility of alternative distributional representations in clinical contexts that collect sDHT data has received limited attention. This study is motivated by accelerometry data collected from 246 individuals with multiple sclerosis (MS) representing a wide range of disability (Expanded Disability Status Scale, EDSS: 0-7). We consider five individual-level distributional representations of minute-level activity counts: density, survival, hazard, quantile, and total time on test functions. For each representation, scalar-on-function regression is used to fit linear discriminators for binary and continuously measured MS disability, and the cross-validated discriminatory performance of these discriminators is compared across representations. The results show that individual-level hazard functions provide the highest discriminatory accuracy, more than double that of density functions. Individual-level quantile functions provide the second-highest discriminatory accuracy. These findings highlight the importance of distributional representations that capture the tail behavior of distributions when analyzing digital health data, especially in clinical contexts.
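To make the five representations concrete, the following is a minimal sketch of how they could be computed empirically from one individual's minute-level activity counts. The abstract does not state the estimators used, so everything here is an assumption: counts are treated as non-negative integers, density is the empirical probability mass on a fixed grid, hazard is the discrete hazard h(x) = P(X = x) / P(X >= x), and the total time on test function at p is the integral of the survival function from 0 to Q(p). The function name `distributional_representations` and its arguments are hypothetical.

```python
import numpy as np


def distributional_representations(counts, grid, probs):
    """Empirical density, survival, hazard, quantile and total-time-on-test
    (TTT) functions for one individual's minute-level activity counts.

    Illustrative assumptions (hypothetical, not the study's pipeline):
    - density/survival/hazard are evaluated on a fixed grid of count values
    - hazard is the discrete hazard h(g) = P(X = g) / P(X >= g)
    - TTT(p) is the integral of S(x) from 0 to Q(p), so counts must be >= 0
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    grid = np.asarray(grid, dtype=float)

    # Probability mass at each grid value.
    density = np.array([np.mean(x == g) for g in grid])
    # Survival S(g) = P(X > g).
    survival = np.array([np.mean(x > g) for g in grid])
    # Discrete hazard: mass at g over P(X >= g); 0 where nothing is at risk.
    at_risk = survival + density
    hazard = np.divide(density, at_risk,
                       out=np.zeros_like(density), where=at_risk > 0)

    # Empirical quantile Q(p) = smallest order statistic x_(i) with i >= p * n.
    probs = np.asarray(probs, dtype=float)
    idx = np.clip(np.ceil(probs * n).astype(int) - 1, 0, n - 1)
    quantile = x[idx]

    # Empirical TTT at i = ceil(p * n):
    # (1/n) * [sum_{j <= i} x_(j) + (n - i) * x_(i)], which equals TTT(1) = mean at p = 1.
    csum = np.cumsum(x)
    i = idx + 1
    ttt = (csum[idx] + (n - i) * x[idx]) / n

    return density, survival, hazard, quantile, ttt


# Toy example: five minutes of activity counts.
d, s, h, q, t = distributional_representations(
    [0, 0, 0, 5, 10], grid=[0, 5, 10], probs=[0.5, 0.9, 1.0]
)
print(h)  # discrete hazard on the grid
```

In this toy example the hazard rises to 1.0 at the largest observed count, whereas the density there is only 0.2. This illustrates how the hazard re-weights the upper tail relative to the density, the kind of tail behavior the abstract emphasizes. In a pipeline like the one described, each returned vector would serve as one functional predictor in the scalar-on-function regression.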