Auto-bidding is a core component of real-time advertising systems, where decisions must optimize long-term performance under budget and cost constraints, while online exploration is prohibitively risky. Offline reinforcement learning and, more recently, Transformer-based sequence modeling have shown promise for learning bidding policies from logged data, but their unimodal and purely parametric formulations often collapse multiple effective bidding strategies into suboptimal averaged actions and perform unreliably under sparse or long-tail traffic. To mitigate these limitations, we propose DRIVE (Distributional and Retrieval-Augmented Bidding with Value Evaluation), a unified Transformer-based framework that decouples candidate action generation from decision making for offline auto-bidding. DRIVE combines distributional action modeling, retrieval-augmented candidate generation from high-quality historical decisions, and value-based evaluation to select the most promising bid at inference time. Extensive experiments on AuctionNet and additional offline reinforcement learning benchmarks demonstrate that DRIVE consistently improves bidding performance and generalizes well across multiple Transformer-based methods.
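To make the decoupling of candidate generation from decision making concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional bid action, a mixture-of-Gaussians stand-in for the distributional action head, nearest-neighbor lookup over logged (state, bid) pairs as the retrieval step, and a hand-written toy value function. All names (`sample_model_candidates`, `retrieve_candidates`, `select_bid`, `toy_value`) are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Mode = Tuple[float, float, float]   # (weight, mean, std) of one bid mode
Record = Tuple[State, float]        # (logged state, logged bid)


def sample_model_candidates(modes: List[Mode], n: int, rng: random.Random) -> List[float]:
    """Draw n bids from a multimodal distribution instead of one averaged bid."""
    weights = [w for w, _, _ in modes]
    bids = []
    for _ in range(n):
        _, mean, std = rng.choices(modes, weights=weights, k=1)[0]
        bids.append(rng.gauss(mean, std))
    return bids


def retrieve_candidates(state: State, memory: List[Record], k: int) -> List[float]:
    """Return the bids of the k logged states closest to the current state."""
    ranked = sorted(memory, key=lambda rec: math.dist(state, rec[0]))
    return [bid for _, bid in ranked[:k]]


def select_bid(state: State, candidates: List[float],
               value_fn: Callable[[State, float], float]) -> float:
    """Decision step: keep the candidate with the highest estimated value."""
    return max(candidates, key=lambda bid: value_fn(state, bid))


# Toy setup (all numbers are made up for illustration).
rng = random.Random(0)
state = (0.5, 0.2)
modes = [(0.5, 0.8, 0.05), (0.5, 2.0, 0.05)]          # two distinct bid modes
memory = [((0.5, 0.25), 1.2), ((0.9, 0.9), 3.0), ((0.4, 0.2), 1.1)]


def toy_value(s: State, bid: float) -> float:
    # Assumed value estimate that peaks at a bid of 1.2.
    return -(bid - 1.2) ** 2


candidates = sample_model_candidates(modes, 4, rng) + retrieve_candidates(state, memory, 2)
bid = select_bid(state, candidates, toy_value)
print(bid)
```

In this toy setup the pooled candidates come from both the sampled modes and the retrieved history, and the value function alone decides which one is used. A learned critic and a Transformer-based action head would take the place of `toy_value` and `modes` in a real system.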