This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models, including the Bass model (used in modeling consumer adoption) and the SIR model (used in modeling epidemics). We show that one cannot hope to learn such models until quite late in the diffusion. Specifically, we show that the time required to collect a number of observations that exceeds our sample complexity lower bounds is large. For Bass models with low innovation rates, our results imply that one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate has peaked. This lower bound in estimation further translates into a lower bound in regret for decision-making in epidemic interventions. Our results formalize the challenge of accurate forecasting and highlight the importance of incorporating additional data sources. To this end, we analyze the benefit of a seroprevalence study in an epidemic, where we characterize the size of the study needed to improve SIR model estimation. Extensive empirical analyses on product adoption and epidemic data support our theoretical findings.
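To give a sense of scale for the "two-thirds of the way to the peak" statement, the following minimal sketch evaluates the textbook deterministic Bass curve. It does not reproduce the paper's lower bounds, and the paper's exact model (for example, a stochastic variant) may differ. The function names and the values of the innovation rate `p` and imitation rate `q` are hypothetical, chosen only to represent a low-innovation regime.

```python
import math


def bass_cumulative_fraction(t: float, p: float, q: float) -> float:
    """Deterministic Bass curve F(t): fraction of eventual adopters who have adopted by time t."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_peak_time(p: float, q: float) -> float:
    """Time at which the adoption rate F'(t) is largest; an interior peak exists only when q > p."""
    if q <= p:
        raise ValueError("the adoption rate has an interior peak only when q > p")
    return math.log(q / p) / (p + q)


# Hypothetical parameters for illustration (low innovation rate p relative to imitation rate q).
p, q = 0.001, 0.4

t_peak = bass_peak_time(p, q)
t_two_thirds = (2.0 / 3.0) * t_peak

frac_at_peak = bass_cumulative_fraction(t_peak, p, q)
frac_at_two_thirds = bass_cumulative_fraction(t_two_thirds, p, q)

print(f"peak time: {t_peak:.2f}")
print(f"fraction adopted at peak: {frac_at_peak:.4f}")
print(f"fraction adopted at 2/3 of peak time: {frac_at_two_thirds:.4f}")
```

Under this deterministic curve, the fraction adopted at the peak equals 1/2 - p/(2q), so just under half of eventual adopters have adopted when the adoption rate peaks, and markedly fewer have done so at two-thirds of that time. That is the stage before which, according to the abstract, the eventual number of adopters cannot be predicted.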