Bid shading plays a crucial role in Real-Time Bidding (RTB) by adaptively adjusting bids so that advertisers do not overspend. Existing mainstream two-stage methods first model the bid landscape and then optimize surplus using operations research techniques. They are constrained by unimodal assumptions that cannot accommodate non-convex surplus curves, and their sequential workflows are vulnerable to cascading errors. Additionally, existing models that discretize continuous values ignore the dependence between discrete intervals, which reduces the model's ability to correct errors, while sample selection bias in bidding scenarios makes prediction harder still. To address these issues, this paper introduces Generative Bid Shading (GBS), which comprises two primary components: 1) an end-to-end generative model that uses an autoregressive approach to generate shading ratios through stepwise residuals, capturing complex value dependencies without relying on predefined priors; and 2) a reward preference alignment system, which uses a channel-aware hierarchical dynamic network (CHNet) as the reward model to extract fine-grained features, together with modules for surplus optimization and exploration utility reward alignment, and optimizes both short-term and long-term surplus using group relative policy optimization (GRPO). Extensive offline experiments and online A/B tests validate the effectiveness of GBS. Moreover, GBS has been deployed on the Meituan DSP platform, where it serves billions of bid requests daily.
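To make the surplus objective and the group-relative signal concrete, the following is a minimal illustrative sketch, not the GBS implementation. It assumes a first-price setting in which expected surplus is (value - bid) × P(win | bid), a hypothetical bimodal bid landscape (`win_probability`), and a hand-picked group of candidate shading ratios in place of samples from the generative model. The reward here is plain expected surplus; the CHNet reward model and the exploration utility term are not modeled. Advantages are computed in the usual GRPO style by normalizing each reward against the mean and standard deviation of its group.

```python
import math
import statistics


def win_probability(bid: float) -> float:
    """Hypothetical bimodal bid landscape: an equal mix of two logistic CDFs."""
    def logistic_cdf(x: float, loc: float, scale: float) -> float:
        return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

    return 0.5 * logistic_cdf(bid, 2.0, 0.2) + 0.5 * logistic_cdf(bid, 6.0, 0.2)


def expected_surplus(value: float, ratio: float) -> float:
    """Expected first-price surplus when bidding value * ratio."""
    bid = value * ratio
    return (value - bid) * win_probability(bid)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


# One bid request with an assumed value of 10 and five candidate shading ratios
# (stand-ins for ratios sampled from the generative model).
value = 10.0
ratios = [0.1, 0.25, 0.4, 0.65, 0.9]
rewards = [expected_surplus(value, r) for r in ratios]
advantages = group_relative_advantages(rewards)
best_ratio = ratios[max(range(len(ratios)), key=rewards.__getitem__)]

for ratio, reward, adv in zip(ratios, rewards, advantages):
    print(f"ratio={ratio:.2f}  surplus={reward:.3f}  advantage={adv:+.3f}")
```

With this assumed landscape the surplus over the candidates rises, dips, and rises again, so it has two local peaks. That is the kind of non-unimodal curve a unimodal prior cannot represent, and the group-relative advantages push probability mass toward both high-surplus candidates rather than toward a single assumed mode.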