In online experimentation, appropriate metrics (e.g., purchases) provide strong evidence to support hypotheses and improve decision-making. However, incomplete metrics occur frequently in online experiments (e.g., A/B tests), leaving far less usable data than planned. In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. To analyze incomplete metrics, we propose a clustering-based imputation method using $k$-nearest neighbors. The proposed method considers both experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To impute large-scale data sets efficiently, the method combines stratification with clustering. We compare the performance of the proposed method with that of several conventional methods in both simulation studies and a real online experiment at eBay.
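To make the idea of user-specific imputation concrete, the following is a minimal sketch of $k$-nearest-neighbor imputation restricted to strata. It is not the authors' implementation: the function name `knn_impute_by_stratum`, the Euclidean distance, the use of the neighbor mean as the imputed value, and the handling of strata without observed values are all assumptions made for illustration, and the clustering step mentioned above is omitted.

```python
import numpy as np


def knn_impute_by_stratum(features, metric, strata, k=3):
    """Fill NaN entries of `metric` with the mean metric of the k nearest
    observed users (Euclidean distance on `features`) in the same stratum.

    Hypothetical helper for illustration only; `features` is a 2-D array
    with one row per user.
    """
    features = np.asarray(features, dtype=float)
    metric = np.asarray(metric, dtype=float)
    strata = np.asarray(strata)
    imputed = metric.copy()

    for label in np.unique(strata):
        in_stratum = strata == label
        observed = np.flatnonzero(in_stratum & ~np.isnan(metric))
        missing = np.flatnonzero(in_stratum & np.isnan(metric))
        if observed.size == 0 or missing.size == 0:
            # Assumption: with no donors in the stratum, leave values as NaN.
            continue
        # Pairwise distances: rows are missing users, columns are observed users.
        diffs = features[missing][:, None, :] - features[observed][None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=2))
        # Use fewer neighbors when the stratum has fewer than k donors.
        k_eff = min(k, observed.size)
        nearest = np.argsort(dists, axis=1)[:, :k_eff]
        imputed[missing] = metric[observed][nearest].mean(axis=1)
    return imputed
```

Searching for neighbors only within a stratum keeps each distance matrix small, which is one plausible reason to stratify before imputing large data sets. Each user with a missing value receives a value derived from that user's own nearest neighbors, so imputed values can differ across users.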