We propose a new Markov Decision Process (MDP) model for ad auctions that captures the user's response to ad quality, with the objective of maximizing long-term discounted revenue. By incorporating user response, our model accounts for all three parties involved in the auction (advertiser, auctioneer, and user). The state of the user is modeled as a user-specific click-through rate (CTR), which changes in the next round according to the set of ads shown to the user in the current round. We characterize the optimal mechanism for this MDP as a Myerson auction with a notion of modified virtual value, which depends on the value distribution of the advertiser, the current user state, and the future impact of showing the ad to the user. Moreover, we propose a simple mechanism built upon second-price auctions with personalized reserve prices and show that it achieves a constant-factor approximation to the optimal long-term discounted revenue.
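To make the simple mechanism concrete, the following is a minimal sketch of a single-round, single-slot second-price auction with personalized reserve prices. It is an illustration under stated assumptions, not the mechanism analyzed in the paper: the `Bid` type, the function name, and the "eager" rule of discarding bids below their own reserve are hypothetical choices, and the reserves are taken as given inputs rather than derived from the user state or the value distributions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bid:
    advertiser: str
    amount: float
    reserve: float  # hypothetical personalized reserve for this advertiser


def second_price_with_reserves(bids: list[Bid]) -> Optional[tuple[str, float]]:
    """Return (winner, payment), or None if no bid clears its own reserve."""
    # Eager variant (an assumption): drop bids below their own reserve first.
    eligible = [b for b in bids if b.amount >= b.reserve]
    if not eligible:
        return None
    ranked = sorted(eligible, key=lambda b: b.amount, reverse=True)
    winner = ranked[0]
    runner_up = ranked[1].amount if len(ranked) > 1 else 0.0
    # The winner pays the larger of its own reserve and the next eligible bid.
    return winner.advertiser, max(winner.reserve, runner_up)


bids = [
    Bid("A", amount=5.0, reserve=3.0),
    Bid("B", amount=4.0, reserve=4.5),  # below its own reserve, so excluded
    Bid("C", amount=2.0, reserve=1.0),
]
result = second_price_with_reserves(bids)
```

In this example advertiser A wins and pays 3.0: B is excluded by its reserve, the next eligible bid is 2.0, and A's own reserve of 3.0 is the binding price. In the model described above, the reserves would presumably vary with the current user state, which this sketch does not attempt to capture.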