Software companies have widely used online A/B testing to evaluate the impact of a new technology by offering it to groups of users and comparing it against the unmodified product. However, running an online A/B test requires not only effort in design, implementation, and obtaining stakeholders' approval before it can be served in production, but also several weeks of iterations to collect the data. To address these issues, a recently emerging topic called "Offline A/B Testing" has been attracting increasing attention; it aims to evaluate new technologies offline by estimating their effects from historical logged data. Although this approach is promising due to its lower implementation effort, faster turnaround time, and absence of potential user harm, several limitations must be addressed before its results can effectively inform requirements prioritization in practice, including discrepancies with online A/B test results and the lack of systematic updates as data and parameters vary. In response, in this vision paper, I introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logged data and update the offline evaluation results, which are then used to make decisions on requirements more reliably and systematically.
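To make the core mechanism concrete, the sketch below illustrates how offline A/B testing can estimate a new technology's impact purely from historical logged data, using inverse propensity scoring (IPS), a standard estimator from the off-policy evaluation literature. This is a minimal illustration under my own assumptions, not the paper's prescribed method; all function and variable names are hypothetical.

```python
import numpy as np

def ips_estimate(rewards, logging_propensities, target_propensities):
    """Inverse propensity scoring (IPS) estimate of a candidate policy's
    average reward, computed purely from historical logged data.

    rewards: observed reward for each logged interaction.
    logging_propensities: probability the deployed (logging) policy
        assigned to the action it actually took.
    target_propensities: probability the candidate (new) policy would
        assign to that same action.
    """
    # Importance weights correct for the mismatch between the policy
    # that generated the logs and the policy being evaluated offline.
    weights = target_propensities / logging_propensities
    return float(np.mean(weights * rewards))

# Hypothetical logged data for five user interactions.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
p_logged = np.array([0.50, 0.50, 0.25, 0.50, 0.25])
p_candidate = np.array([0.80, 0.20, 0.50, 0.10, 0.50])

print(ips_estimate(rewards, p_logged, p_candidate))  # estimated mean reward
```

In the spirit of AutoOffAB, an estimator like this would be re-run automatically against recent logs as they accumulate, so that the offline evaluation results used for requirements decisions stay current rather than reflecting a one-off snapshot.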