Predicting the human burden of vector-borne diseases from limited surveillance data remains a major challenge, particularly in the presence of nonlinear transmission dynamics and delayed effects arising from vector ecology and human behavior. We develop a data-driven framework based on an extension of Sparse Identification of Nonlinear Dynamics (SINDy) to systems with distributed memory, enabling discovery of transmission mechanisms directly from time series data. Using severe fever with thrombocytopenia syndrome (SFTS) as a case study, we show that this approach can uncover key features of tick-borne disease dynamics using only human incidence and local temperature data, without imposing predefined assumptions on human case reporting. We further demonstrate that predictive performance is substantially enhanced when the data-driven model is coupled with mechanistic representations of tick-host transmission pathways informed by empirical studies. The framework supports systematic sensitivity analysis of memory kernels and behavioral parameters, identifying those most influential for prediction accuracy. Although the approach prioritizes predictive accuracy over mechanistic transparency, it yields sparse, interpretable integral representations suitable for epidemiological forecasting. This hybrid methodology provides a scalable strategy for forecasting vector-borne disease risk and informing public health decision-making under data limitations.
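The central idea is that a SINDy library can include integral (memory) features of the form $M(t)=\int_0^t K(t-s)\,x(s)\,ds$ alongside ordinary polynomial and forcing terms, so that sparse regression can select delayed effects directly. The sketch below illustrates this on synthetic data. It is not the authors' implementation, and almost everything in it is a hypothetical choice made for illustration:

- the Erlang-2 kernel $K(s)=s\,e^{-s/\tau}/\tau^2$;
- the sinusoidal "temperature" forcing `temp`;
- the ground-truth equation $\dot x = 0.8\,xT - 0.4\,xM$;
- the candidate library and the threshold value.

The sparse solver is the standard sequentially thresholded least squares (STLSQ) from the original SINDy formulation. The paper's extension may use a different optimizer.

```python
import numpy as np

# Hypothetical synthetic setup (NOT the paper's model):
#   x(t): incidence-like signal, T(t): temperature-like forcing,
#   M(t) = ∫_0^t K(t - s) x(s) ds with an Erlang-2 kernel K(s) = s e^{-s/tau} / tau^2,
#   ground truth: dx/dt = 0.8 * x * T - 0.4 * x * M
dt, t_end, tau = 0.01, 60.0, 1.0
t = np.arange(0.0, t_end, dt)
n = t.size
temp = 1.0 + 0.5 * np.sin(2.0 * np.pi * t / 10.0)  # synthetic "temperature"
kernel_w = (t * np.exp(-t / tau) / tau**2) * dt     # quadrature weights K(s_j) * dt


def memory_term(x, w):
    """Causal discrete convolution: M_k = sum_{j<=k} w[k - j] * x[j]."""
    return np.convolve(x, w)[: x.size]


def simulate(x0=0.1):
    """Forward-Euler integration of the hypothetical integro-differential ground truth."""
    x = np.empty(n)
    x[0] = x0
    for k in range(n - 1):
        m_k = np.dot(kernel_w[: k + 1][::-1], x[: k + 1])  # same sum as memory_term
        x[k + 1] = x[k] + dt * (0.8 * x[k] * temp[k] - 0.4 * x[k] * m_k)
    return x


def stlsq(theta, y, threshold=0.05, n_iter=10):
    """Sequentially thresholded least squares (the sparse regression used in SINDy)."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0          # prune small coefficients
        keep = ~small
        if not keep.any():
            break
        # refit only the surviving library terms
        xi[keep] = np.linalg.lstsq(theta[:, keep], y, rcond=None)[0]
    return xi


x = simulate()
m = memory_term(x, kernel_w)  # memory feature computed from the data itself
dxdt = np.diff(x) / dt        # forward difference, consistent with the Euler step

# Candidate library evaluated at t_0 ... t_{n-2}; the integral term is just another column.
names = ["1", "x", "M", "T", "x*M", "x*T"]
theta = np.column_stack([np.ones(n), x, m, temp, x * m, x * temp])[:-1]
xi = stlsq(theta, dxdt)

for name, coef in zip(names, xi):
    if coef != 0.0:
        print(f"{coef:+.4f} {name}")
```

In this noise-free setting the regression should retain only the `x*M` and `x*T` terms. With real surveillance data, one would expect to need noise-robust derivative estimates, a scan over kernel shapes and delays (the sensitivity analysis of memory kernels mentioned above), and tuning of the threshold.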