Auto-bidding services optimize real-time bidding strategies for advertisers under key performance indicator (KPI) constraints such as a target return on investment (ROI) and a budget. However, uncertainties such as model prediction errors and feedback latency can cause bidding strategies to deviate from ex-post optimality, leading to inefficient allocation. To address this issue, we propose JD-BP, a Joint generative Decision framework for Bidding and Pricing. Unlike prior methods, JD-BP jointly outputs a bid value and a pricing correction term that is added to the payment computed by the underlying payment rule, such as the generalized second-price (GSP) rule. To mitigate the adverse effects of historical constraint violations, we design a memory-less Return-to-Go that encourages bidding actions to maximize future value, while the accumulated bias is handled by the pricing correction. Moreover, we propose a trajectory augmentation algorithm that generates joint bidding-pricing trajectories from a (possibly arbitrary) base bidding policy, enabling efficient plug-and-play deployment on top of existing RL/generative bidding models. Finally, we employ an Energy-Based Direct Preference Optimization method together with a cross-attention module to improve the joint learning of bidding and pricing correction. Offline experiments on the AuctionNet dataset demonstrate that JD-BP achieves state-of-the-art performance. Online A/B tests at JD.com confirm its practical effectiveness, showing a 4.70% increase in ad revenue and a 6.48% improvement in target cost.
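To make the idea of an additive pricing correction concrete, the following is a minimal sketch and not the JD-BP implementation. It assumes a single-slot GSP auction (where the winner pays the second-highest bid) and adds a per-bidder correction term to that base price. The function name `gsp_with_correction`, the `corrections` argument, and the clipping of the final payment to the range between zero and the winner's bid are all illustrative assumptions that the abstract does not specify.

```python
from typing import List, Optional, Tuple


def gsp_with_correction(
    bids: List[float], corrections: List[float]
) -> Optional[Tuple[int, float]]:
    """Single-slot GSP auction with an additive pricing correction.

    Returns (winner_index, payment), or None when there are no bidders.
    Hypothetical helper for illustration only.
    """
    if len(bids) != len(corrections):
        raise ValueError("bids and corrections must have the same length")
    if not bids:
        return None

    # Rank bidders by bid, highest first; sorted() is stable, so ties go
    # to the lower index.
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    winner = order[0]

    # Base GSP price for one slot: the second-highest bid (0 if unopposed).
    base_price = bids[order[1]] if len(order) > 1 else 0.0

    # Additive correction on top of the base payment rule.
    payment = base_price + corrections[winner]

    # Assumption (not from the source): keep the payment non-negative and
    # no larger than the winner's own bid.
    payment = min(max(payment, 0.0), bids[winner])
    return winner, payment
```

In this sketch, a positive correction raises what the winner pays relative to plain GSP and a negative one lowers it, which is one way a pricing term could absorb accumulated bias without changing the bid itself.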