MAC: A Conversion Rate Prediction Benchmark Featuring Labels Under Multiple Attribution Mechanisms

Multi-attribution learning (MAL), which improves model performance by learning from conversion labels produced by multiple attribution mechanisms, has emerged as a promising learning paradigm for conversion rate (CVR) prediction. However, the conversion labels in public CVR datasets are generated by a single attribution mechanism, which hinders the development of MAL approaches. To address this data gap, we establish the Multi-Attribution Benchmark (MAC), the first public CVR dataset featuring labels from multiple attribution mechanisms. In addition, to promote reproducible research on MAL, we develop PyMAL, an open-source library covering a wide array of baseline methods. We conduct comprehensive experimental analyses on MAC and report three key insights: (1) MAL brings consistent performance gains across different attribution settings, especially for users with long conversion paths. (2) The performance gain grows with objective complexity in most settings; however, when predicting first-click conversion targets, simply adding auxiliary objectives is counterproductive, which underscores the need to select auxiliary objectives carefully. (3) Two architectural design principles are paramount: first, fully learning the multi-attribution knowledge, and second, fully leveraging this knowledge to serve the main task. Motivated by these findings, we propose Mixture of Asymmetric Experts (MoAE), an effective MAL approach that combines multi-attribution knowledge learning with main-task-centric knowledge utilization. Experiments on MAC show that MoAE substantially outperforms the existing state-of-the-art MAL method. We believe that our benchmark and insights will foster future research in the MAL field. The MAC benchmark and the PyMAL algorithm library are publicly available at https://github.com/alimama-tech/PyMAL.
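
To make the notion of labels under multiple attribution mechanisms concrete, the following minimal sketch shows how a single conversion path can yield different per-click labels depending on the mechanism. It is an illustration only: the abstract names first-click targets, while the last-click and linear rules, the function `attribute`, and the click ids are assumptions chosen for the example and are not taken from MAC or PyMAL.

```python
from typing import Dict, List


def attribute(clicks: List[str], mechanism: str) -> Dict[str, float]:
    """Split the credit of one conversion across the clicks on its path.

    `clicks` is ordered by time and is assumed to contain unique ids.
    The mechanism names are illustrative, not the MAC label schema.
    """
    if not clicks:
        return {}
    credit = {click: 0.0 for click in clicks}
    if mechanism == "first_click":
        credit[clicks[0]] = 1.0          # all credit to the earliest click
    elif mechanism == "last_click":
        credit[clicks[-1]] = 1.0         # all credit to the latest click
    elif mechanism == "linear":
        share = 1.0 / len(clicks)        # equal credit to every click
        for click in clicks:
            credit[click] = share
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return credit


# Hypothetical three-click path that ended in one conversion.
path = ["ad_a", "ad_b", "ad_c"]
labels = {m: attribute(path, m) for m in ("first_click", "last_click", "linear")}

for mechanism, credit in labels.items():
    print(mechanism, credit)
```

Under these assumed rules, the same click receives a different label under each mechanism, and the labels diverge more as the path gets longer. A MAL model would train on several such label sets at once, with one serving as the main target and the others as auxiliary objectives.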
