Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequence modeling capabilities, applying them to this constrained setting encounters two challenges: 1) standard Return-to-Go conditioning causes state aliasing by neglecting the cost dimension, preventing precise resource pacing; and 2) standard regression forces the policy to mimic average historical behaviors, thereby limiting the capacity to optimize performance toward the constraint boundary. To address these challenges, we propose PRO-Bid, a constraint-aware generative auto-bidding framework based on two synergistic mechanisms: 1) Constraint-Decoupled Pareto Representation (CDPR) decomposes global constraints into recursive cost and value contexts to restore resource perception, while reweighting trajectories based on the Pareto frontier to focus on high-efficiency data; and 2) Counterfactual Regret Optimization (CRO) facilitates active improvement by utilizing a global outcome predictor to identify superior counterfactual actions. By treating these high-utility outcomes as weighted regression targets, the model transcends historical averages to approach the optimal constraint boundary. Extensive experiments on two public benchmarks and online A/B tests demonstrate that PRO-Bid achieves superior constraint satisfaction and value acquisition compared to state-of-the-art baselines.
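To make the two ingredients of CDPR more concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give formulas, so everything here is an assumption: the "recursive cost and value contexts" are read as per-step suffix sums (cost-to-go and value-to-go), and the Pareto frontier is taken over each trajectory's (total cost, total value) pair. The helper names `to_go` and `pareto_mask` are hypothetical, and the actual reweighting scheme applied to frontier trajectories is left unspecified.

```python
from typing import List, Tuple


def to_go(per_step: List[float]) -> List[float]:
    """Suffix sums: entry t is the total from step t to the end of the episode.

    Assumption: applied separately to per-step cost and per-step value, this
    gives a cost context and a value context instead of one Return-to-Go.
    """
    out = [0.0] * len(per_step)
    running = 0.0
    for t in range(len(per_step) - 1, -1, -1):
        running += per_step[t]
        out[t] = running
    return out


def pareto_mask(points: List[Tuple[float, float]]) -> List[bool]:
    """Flag trajectories that are Pareto-efficient in (total_cost, total_value).

    A trajectory is dominated if another one has cost <= and value >=,
    with at least one of the two strict. O(n^2), fine for a sketch.
    """
    mask = []
    for i, (ci, vi) in enumerate(points):
        dominated = any(
            (cj <= ci and vj >= vi) and (cj < ci or vj > vi)
            for j, (cj, vj) in enumerate(points)
            if j != i
        )
        mask.append(not dominated)
    return mask


if __name__ == "__main__":
    # Hypothetical per-step costs and values for one bidding episode.
    costs = [1.0, 2.0, 3.0]
    values = [0.5, 0.0, 1.5]
    print(to_go(costs), to_go(values))

    # Hypothetical (total_cost, total_value) pairs for four logged trajectories.
    print(pareto_mask([(10.0, 5.0), (10.0, 4.0), (5.0, 3.0), (12.0, 5.0)]))
```

Under this reading, two states with the same remaining value but different remaining cost receive different conditioning, which is one way the state aliasing described above could be avoided. Trajectories flagged by `pareto_mask` would then be the ones given more weight during training.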