Research on modeling distributional aspects of sensor-based digital health technology (sDHT) data has grown significantly in recent years. Most existing approaches rely on individual-specific density or quantile functions, and little work has assessed the practical utility of alternative distributional representations in clinical settings that collect sDHT data. This study is motivated by accelerometry data collected on 246 individuals with multiple sclerosis (MS) who span a wide range of disability (Expanded Disability Status Scale, EDSS: 0-7). We consider five individual-level distributional representations of minute-level activity counts: density, survival, hazard, quantile, and total time on test functions. For each representation, scalar-on-function regression is used to fit linear discriminators for binary and continuously measured MS disability, and the cross-validated discriminatory performance of these linear discriminators is compared across the five representations. The results show that individual-level hazard functions provide the highest discriminatory accuracy, more than double that of density functions. Individual-level quantile functions provide the second-highest discriminatory accuracy. These findings highlight the importance of distributional representations that capture the tail behavior of distributions when analyzing digital health data, especially in clinical contexts.
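To make the five representations concrete, the following is a minimal sketch of how they could be computed for one individual from a vector of minute-level activity counts. It is an illustration under stated assumptions, not the estimators used in the study: the function name `distributional_representations`, the binned (histogram) estimates of density and hazard, and the choice of `grid` and `probs` are all hypothetical, and the grid is assumed to cover the full range of the nonnegative counts.

```python
import numpy as np


def distributional_representations(counts, grid, probs):
    """Crude empirical versions of five distributional representations.

    counts: nonnegative minute-level activity counts for one individual
    grid:   increasing bin edges assumed to span the range of the counts
    probs:  probability levels in [0, 1] for the quantile and TTT functions
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size

    # Density: histogram estimate on the bins defined by the grid.
    density, edges = np.histogram(x, bins=grid, density=True)

    # Survival at the left edge of each bin: fraction of minutes with X >= edge.
    at_risk = n - np.searchsorted(x, edges[:-1], side="left")
    survival = at_risk / n

    # Hazard: density divided by survival, set to 0 where nobody is "at risk".
    hazard = np.divide(
        density, survival, out=np.zeros_like(density), where=survival > 0
    )

    # Quantile function evaluated at the requested probability levels.
    quantile = np.quantile(x, probs)

    # Total time on test: T(p) = integral of S(u) from 0 to Q(p), which for
    # nonnegative data equals the mean of min(X, Q(p)).
    ttt = np.array([np.minimum(x, q).mean() for q in quantile])

    return {
        "density": density,
        "survival": survival,
        "hazard": hazard,
        "quantile": quantile,
        "ttt": ttt,
    }
```

Each returned array is a discretized function for one individual. Stacking these arrays across individuals would give the functional predictors for a scalar-on-function regression, where the hazard and quantile representations place relatively more weight on the upper tail of the activity distribution than the density does.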