Diverse and enriched data sources are essential for commercial ads-recommendation models to accurately assess user interest both before and after engagement with content. While extended user-engagement histories can improve the prediction of user interests, it is equally important to embed activity sequences from multiple sources to keep user and ad representations fresh, in line with scaling-law principles. In this paper, we present a novel three-dimensional framework for enhancing user-ad representations without increasing model inference or serving complexity. The first dimension examines the impact of incorporating diverse event sources, the second considers the benefits of longer user histories, and the third focuses on enriching data with additional event attributes and multi-modal embeddings. We assess the return on investment (ROI) of our source-enrichment framework by comparing organic user-engagement sources, such as content viewing, with ad-impression sources. The proposed method can boost the area under the curve (AUC) and the slope of the scaling curves for ad-impression sources by 1.56 to 2 times compared to organic usage sources, even for short online sequence lengths of 100 to 10K. Additionally, click-through rate (CTR) prediction improves by 0.56% AUC over the baseline production ad-recommendation system when using enriched ad-impression event sources, leading to improved sequence scaling resolutions for longer and offline user-ad representations.