Can Offline A/B Testing Be Automated for Data-Driven Requirement Engineering?

Online A/B testing is widely used by software companies to evaluate the impact of a new technology by offering it to a group of users and comparing the results against the unmodified product. However, running an online A/B test requires not only effort in design, implementation, and stakeholder approval before it can be served in production, but also several weeks of iterative data collection. To address these issues, an emerging topic called offline A/B testing is receiving increasing attention; its goal is to evaluate a new technology offline by estimating its effect from historical logged data. Although this approach is promising because of its lower implementation effort, faster turnaround time, and absence of potential user harm, several limitations must be addressed before it can be effectively prioritized as a requirement in practice, including its discrepancy with online A/B test results and the lack of systematic updates on new data. In response, in this vision paper, we introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logs and update the offline evaluation results, which are then used to make requirement decisions more reliably and systematically.
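The abstract does not say which estimator an offline A/B test would use. As an illustration only, the sketch below assumes inverse propensity scoring (IPS), a common off-policy evaluation technique, to show what estimating a new variant's effect from historical logs can look like. The log layout, the function names `ips_estimate` and `simulate_logs`, and the reward rates are all hypothetical and are not taken from the paper.

```python
import random


def ips_estimate(logs, new_policy_prob):
    """Estimate the mean reward of a new policy from logged data via IPS.

    Each log entry is (context, action, reward, logging_prob), where
    logging_prob is the probability with which the production (logging)
    policy chose `action`. This layout is an assumption for the sketch.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def simulate_logs(n, rng):
    """Generate hypothetical logs from a uniform logging policy over A and B."""
    reward_rate = {"A": 0.1, "B": 0.2}  # made-up click rates
    logs = []
    for _ in range(n):
        action = rng.choice(["A", "B"])
        reward = 1.0 if rng.random() < reward_rate[action] else 0.0
        logs.append(("user", action, reward, 0.5))
    return logs


def always_b(context, action):
    """Candidate policy under evaluation: always serve variant B."""
    return 1.0 if action == "B" else 0.0


logs = simulate_logs(20000, random.Random(0))
estimate = ips_estimate(logs, always_b)
print(f"Estimated reward if B were always served: {estimate:.3f}")
```

In this simulation the estimate should land near the made-up rate for variant B, even though the logs were collected under a policy that served B only half the time. An automated pipeline of the kind the abstract envisions could rerun such an estimate whenever new logs arrive.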

