Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting bids so that advertisers do not overspend. Mainstream two-stage methods first model the bid landscape and then optimize surplus with operations research techniques. They are constrained by unimodal assumptions that fail to adapt to non-convex surplus curves, and their sequential workflows are vulnerable to cascading errors. In addition, existing models that discretize continuous values ignore the dependence between discrete intervals, which weakens their error-correction ability, and sample selection bias in bidding scenarios makes prediction harder still. To address these issues, this paper introduces Generative Bid Shading (GBS), which comprises two primary components: 1) an end-to-end generative model that generates shading ratios autoregressively through stepwise residuals, capturing complex value dependencies without relying on predefined priors; and 2) a reward preference alignment system, which uses a channel-aware hierarchical dynamic network (CHNet) as the reward model to extract fine-grained features, together with modules for surplus optimization and exploration utility reward alignment, and which optimizes both short-term and long-term surplus using group relative policy optimization (GRPO). Extensive offline experiments and online A/B tests validate the effectiveness of GBS. GBS has also been deployed on the Meituan DSP platform, where it serves billions of bid requests daily.
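The abstract's point about unimodal assumptions and non-convex surplus curves can be made concrete with a small sketch. It is an illustration under stated assumptions, not the GBS model: it assumes a first-price setting in which expected surplus is (value - bid) × P(win | bid), and it uses a hypothetical two-cluster win-probability curve (`win_prob`) invented for this example. The last function shows the group-relative advantage normalization commonly used in GRPO, with the toy expected surplus standing in for the reward; in the paper the reward comes from CHNet and the alignment modules, which are not reproduced here.

```python
import math
from statistics import mean, pstdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def win_prob(bid):
    """Hypothetical bimodal bid landscape (an assumption for illustration):
    competing bids cluster around 2.0 and around 6.0."""
    return 0.5 * normal_cdf(bid, 2.0, 0.3) + 0.5 * normal_cdf(bid, 6.0, 0.3)


def expected_surplus(value, ratio):
    """Expected surplus when bidding value * ratio (first-price assumption)."""
    bid = value * ratio
    return (value - bid) * win_prob(bid)


def local_maxima(value, steps=200):
    """Shading ratios on a uniform grid that are strict local maxima of surplus."""
    ratios = [i / steps for i in range(steps + 1)]
    surplus = [expected_surplus(value, r) for r in ratios]
    return [
        ratios[i]
        for i in range(1, steps)
        if surplus[i] > surplus[i - 1] and surplus[i] > surplus[i + 1]
    ]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


if __name__ == "__main__":
    value = 10.0
    peaks = local_maxima(value)
    print("local maxima at shading ratios:", peaks)

    # A group of candidate shading ratios, scored by the toy surplus.
    group = [0.2, 0.265, 0.5, 0.64, 0.9]
    rewards = [expected_surplus(value, r) for r in group]
    for r, a in zip(group, group_relative_advantages(rewards)):
        print(f"ratio={r:.3f} advantage={a:+.3f}")
```

With this toy landscape the surplus curve has two separate peaks, so a method that assumes a single mode can settle on the lower one. The group-relative step only ranks candidates within a sampled group, so it needs no assumption about the shape of the curve.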