Researchers in many fields, including ecology, biology, and healthcare, often face the challenge of sparse data. Such sparsity can introduce uncertainty, make parameter estimation difficult, and bias the resulting models. Here we introduce a novel data augmentation method that combines multiple sparse time series datasets when they share similar statistical properties, thereby improving the reliability of parameter estimation and model selection. We demonstrate the effectiveness of this approach through validation studies comparing Hawkes and Poisson processes, followed by an application to subjective pain dynamics in patients with sickle cell disease (SCD), a condition affecting millions of people worldwide, particularly those of African, Mediterranean, Middle Eastern, and Indian descent.
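The abstract does not specify how similarity between datasets is assessed or how the models are fitted. As a minimal sketch of the general pooling idea only, assuming the sequences have already been judged similar enough to combine, one can fit a homogeneous Poisson process and a Hawkes process with an exponential kernel to all sequences jointly (shared parameters, log-likelihoods summed across sequences) and compare the two by AIC. The function names, the exponential kernel, the coarse grid search, and the example event times below are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers illustrating pooled likelihoods; not the paper's code.

def poisson_pooled_loglik(seqs: Sequence[Sequence[float]],
                          horizons: Sequence[float]) -> Tuple[float, float]:
    """Pooled homogeneous Poisson fit: one shared rate for all sequences.
    The MLE is total events / total observation time."""
    n = sum(len(s) for s in seqs)
    total_time = sum(horizons)
    if n == 0:
        return 0.0, 0.0  # limit of n*log(lam) - lam*T as lam -> 0
    lam = n / total_time
    return n * math.log(lam) - lam * total_time, lam

def hawkes_loglik(times: Sequence[float], T: float,
                  mu: float, alpha: float, beta: float) -> float:
    """Log-likelihood of one sorted sequence on [0, T] under
    lambda(t) = mu + sum_{t_i < t} alpha * beta * exp(-beta * (t - t_i)),
    where alpha is the branching ratio (assumed exponential kernel)."""
    ll, A, prev = 0.0, 0.0, None
    for t in times:
        if prev is not None:
            # Recursion: A_i = sum_{j<i} exp(-beta (t_i - t_j))
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        ll += math.log(mu + alpha * beta * A)
        prev = t
    # Subtract the compensator (integrated intensity) over [0, T]
    ll -= mu * T
    ll -= alpha * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return ll

def fit_hawkes_pooled(seqs, horizons, grid_mu, grid_alpha, grid_beta
                      ) -> Tuple[float, Optional[Tuple[float, float, float]]]:
    """Coarse grid search for shared (mu, alpha, beta) maximizing the
    summed log-likelihood over all pooled sequences."""
    best_ll, best_params = -math.inf, None
    for mu in grid_mu:
        for a in grid_alpha:
            for b in grid_beta:
                ll = sum(hawkes_loglik(s, T, mu, a, b)
                         for s, T in zip(seqs, horizons))
                if ll > best_ll:
                    best_ll, best_params = ll, (mu, a, b)
    return best_ll, best_params

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion; lower is preferred."""
    return 2 * k - 2 * loglik

# Made-up sparse sequences, each observed on [0, 10]
seqs: List[List[float]] = [[0.5, 0.7, 0.8, 5.0], [1.0, 1.1, 6.2],
                           [2.0, 2.05, 2.3, 9.0]]
horizons = [10.0, 10.0, 10.0]

ll_p, lam = poisson_pooled_loglik(seqs, horizons)
ll_h, params = fit_hawkes_pooled(
    seqs, horizons,
    grid_mu=[0.05 * i for i in range(1, 21)],
    grid_alpha=[0.1 * i for i in range(0, 10)],
    grid_beta=[0.5 * i for i in range(1, 21)])
print("Poisson AIC:", aic(ll_p, 1), "Hawkes AIC:", aic(ll_h, 3), params)
```

Pooling matters here because each individual sequence holds only three or four events, too few to estimate three Hawkes parameters reliably on its own; summing log-likelihoods over sequences with shared parameters uses all eleven events at once. A real analysis would replace the grid search with a proper optimizer and add an explicit test of whether the sequences are similar enough to pool.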