Experimentation Accelerator: Interpretable Insights and Creative Recommendations for A/B Testing with Content-Aware Ranking

Modern online experimentation faces two bottlenecks: scarce traffic forces tough choices about which variants to test, and post-hoc insight extraction is manual, inconsistent, and often content-agnostic. Meanwhile, organizations underuse historical A/B results and rich content embeddings that could guide prioritization and creative iteration. We present a unified framework to (i) prioritize which variants to test, (ii) explain why winners win, and (iii) surface targeted opportunities for new, higher-potential variants. Leveraging treatment embeddings and historical outcomes, we train a CTR ranking model with fixed effects for contextual shifts; the model scores candidates while balancing value and content diversity. To improve interpretability, we project treatments onto curated semantic marketing attributes and re-express the ranker in this space via a sign-consistent, sparse constrained Lasso, yielding per-attribute coefficients and signed contributions for visual explanations, top-k drivers, and natural-language insights. We then compute an opportunity index that combines attribute importance (from the ranker) with under-expression in the current experiment to flag missing, high-impact attributes. Finally, LLMs translate ranked opportunities into concrete creative suggestions and estimate both learning and conversion potential, enabling faster, more informative, and more efficient test cycles. These components have been built into an Adobe product, Experimentation Accelerator, which provides AI-based insights and opportunities to help customers scale experimentation. We evaluate the proposed framework on real-world experiments run by Adobe business customers; the results validate the high quality of the generation pipeline.
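The abstract does not give a formula for the signed contributions or the opportunity index, so the following is only a minimal sketch of one way these quantities could be computed. It assumes attribute scores lie in [0, 1], that a contribution is the product of a coefficient and an attribute score, and that under-expression is one minus the highest score any current variant reaches. The function names, attribute names, and numbers are all hypothetical and are not taken from the paper or the product.

```python
from typing import Dict, List, Tuple


def signed_contributions(coefs: Dict[str, float],
                         attrs: Dict[str, float]) -> Dict[str, float]:
    """Assumed form: contribution = coefficient * attribute score for one variant."""
    return {name: coefs[name] * attrs.get(name, 0.0) for name in coefs}


def opportunity_index(coefs: Dict[str, float],
                      variants: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank attributes by positive importance times under-expression.

    Assumption: under-expression is 1 minus the best score reached by any
    variant in the current experiment; negative coefficients yield no opportunity.
    """
    scores = {}
    for name, coef in coefs.items():
        best = max((v.get(name, 0.0) for v in variants), default=0.0)
        gap = 1.0 - best
        scores[name] = max(coef, 0.0) * gap
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-attribute coefficients (e.g. from a sparse linear surrogate).
coefs = {"urgency": 0.8, "social_proof": 0.5, "jargon": -0.4}

# Hypothetical attribute scores for the variants in the current experiment.
variants = [
    {"urgency": 0.9, "social_proof": 0.1, "jargon": 0.2},
    {"urgency": 0.7, "social_proof": 0.2, "jargon": 0.6},
]

ranked = opportunity_index(coefs, variants)
contrib = signed_contributions(coefs, variants[1])
```

In this toy setup, `social_proof` ranks first: its coefficient is positive and no variant expresses it strongly, whereas `urgency` is important but already well covered, and `jargon` has a negative coefficient and so contributes no opportunity.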

