Auto-bidding serves as a critical tool for advertisers to improve their advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static offline dataset. To address this, we propose AIGB-Pearl (Planning with EvaluAtor via RL), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator for scoring generation quality and designing a provably sound KL-Lipschitz-constrained score maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm incorporating the synchronous coupling technique is further devised to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.