Autobidding systems generate substantial revenue for advertising platforms and have attracted considerable research attention. Existing studies focus on designing Autobidding Incentive Compatible (AIC) mechanisms, which are Incentive Compatible (IC) under ex ante expectations. However, after deploying AIC mechanisms on advertising platforms, we observe a notable deviation between actual auction outcomes and these expectations at runtime, particularly in sparse-click scenarios, where clicks are few. This discrepancy undermines truthful bidding in AIC mechanisms, especially among risk-averse advertisers, who dislike outcomes that deviate from expectations. To address this issue, we propose the Decoupled First-Price Auction (DFP), a mechanism that retains its IC property even at runtime. DFP dynamically adjusts payments based on real-time user conversion outcomes, so that advertisers' realized utilities closely approximate their expected utilities. To realize the payment mechanism of DFP, we propose a PPO-based reinforcement learning (RL) algorithm with a carefully designed reward function; the algorithm dynamically adjusts payments to fit the DFP mechanism. We conduct extensive experiments on real-world data to validate our findings.
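The runtime deviation described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the DFP mechanism or the authors' experimental setup: it assumes a hypothetical advertiser who pays a fixed price per click, values each conversion at `value`, and sees each click convert independently with probability `cvr`. All function names and parameter values are illustrative.

```python
import random


def expected_utility_per_click(cvr, value, price):
    """Ex ante utility per click: expected conversion value minus the click price."""
    return value * cvr - price


def realized_utility_per_click(n_clicks, cvr, value, price, rng):
    """Utility per click actually realized over n_clicks simulated clicks."""
    conversions = sum(1 for _ in range(n_clicks) if rng.random() < cvr)
    return (value * conversions - price * n_clicks) / n_clicks


def mean_abs_gap(n_clicks, cvr=0.05, value=100.0, price=4.0, trials=500, seed=0):
    """Average |realized - expected| utility per click across simulated campaigns.

    The default parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    expected = expected_utility_per_click(cvr, value, price)
    total = 0.0
    for _ in range(trials):
        realized = realized_utility_per_click(n_clicks, cvr, value, price, rng)
        total += abs(realized - expected)
    return total / trials


if __name__ == "__main__":
    # Fewer clicks should give a larger gap between realized and expected utility.
    for n in (10, 100, 1000):
        print(n, round(mean_abs_gap(n), 3))
```

Under these assumptions, the standard deviation of realized utility per click is `value * sqrt(cvr * (1 - cvr) / n_clicks)`, so the gap shrinks only as the number of clicks grows. This is consistent with the abstract's observation that the discrepancy is most pronounced in sparse-click scenarios.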