Accurately predicting when specific activities will begin within a defined timeframe matters in several applied settings. In particular, experimenters running online experiments (A/B tests) need accurate predictions of how many users will be exposed to an intervention in the future. In this work, we propose a novel approach for predicting the number of users that will be active in a given time period, as well as the time needed to reach a desired level of user participation. We model user activity with a Bayesian nonparametric approach, which allows us to capture the underlying heterogeneity in user engagement. We derive closed-form expressions for the expected number of new users in a given period, and a simple Monte Carlo algorithm targeting the posterior distribution of the number of days needed to reach a desired number of users; the latter is important for experimental planning. We illustrate the performance of our approach through several experiments on synthetic and real-world data, in which our method outperforms existing competing methods.