Physical activity is crucial for human health. With the increasing availability of large-scale mobile health data, strong associations have been found between physical activity and various diseases. However, accurately capturing this complex relationship is challenging, possibly because it varies across subgroups of subjects, especially in large-scale datasets. To fill this gap, we propose a generalized heterogeneous functional method that simultaneously estimates functional effects and identifies subgroups within the generalized functional regression framework. The proposed method captures subgroup-specific functional relationships between physical activity and diseases, providing a more nuanced understanding of these associations. Additionally, we develop a pre-clustering method that improves computational efficiency for large-scale data by first partitioning subjects more finely than the true subgroups. We further introduce a testing procedure to assess whether different subgroups exhibit distinct functional effects. In the real data application, we examine the impact of physical activity on the risk of dementia using the UK Biobank dataset, which includes over 96,433 participants. Our proposed method outperforms existing methods in future-day prediction accuracy and identifies three distinct subgroups, for each of which we provide a detailed scientific interpretation. We also establish the theoretical consistency of our methods. Code implementing the proposed method is available at: https://github.com/xiaojing777/GHFM.