We study regret minimization in repeated first-price auctions (FPAs), where a bidder observes only the realized outcome after each auction (win or loss). This setup reflects practical scenarios in online display advertising, where the actual value of an impression depends on the difference between two potential outcomes, such as clicks or conversion rates, when the auction is won versus lost. We incorporate causal inference into this framework and analyze the challenging case where only the treatment effect admits a simple dependence on observable features. Within this framework, we propose algorithms that jointly estimate private values and optimize bidding strategies under two types of feedback on the highest other bid (HOB): full-information feedback, where the HOB is always revealed, and binary feedback, where the bidder observes only the win-loss indicator. In both cases, our algorithms are shown to achieve near-optimal regret bounds. Notably, the treatments in our framework are actively chosen, which eliminates the need for the overlap condition commonly required in causal inference.
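One way to make the setting concrete is sketched below. The notation is assumed for illustration and is not taken from the abstract: $x_t$ denotes the observable features in round $t$, $Y_t(1)$ and $Y_t(0)$ the potential outcomes under a win and a loss, $b_t$ the bid, $m_t$ the HOB, and $G_t$ the conditional distribution function of $m_t$. The sketch further assumes that $m_t$ is independent of the potential outcomes given $x_t$, and that ties are won by the bidder.

```latex
% Assumed notation; a sketch, not the paper's own formulation.
\begin{align*}
  r_t(b) &= \mathbf{1}\{b \ge m_t\}\,\bigl(Y_t(1) - b\bigr)
            + \mathbf{1}\{b < m_t\}\,Y_t(0)
          && \text{(realized payoff of bid } b\text{)} \\
  v_t &= \mathbb{E}\bigl[\,Y_t(1) - Y_t(0) \mid x_t\,\bigr]
          && \text{(private value = treatment effect)} \\
  \mathbb{E}\bigl[r_t(b) \mid x_t\bigr]
      &= \mathbb{E}\bigl[\,Y_t(0) \mid x_t\,\bigr] + (v_t - b)\,G_t(b),
          \qquad G_t(b) = \Pr(m_t \le b \mid x_t) \\
  \mathrm{Reg}_T &= \sum_{t=1}^{T}
        \Bigl[\,\max_{b}\,(v_t - b)\,G_t(b) \;-\; (v_t - b_t)\,G_t(b_t)\Bigr]
\end{align*}
```

Under these assumptions the baseline term $\mathbb{E}[Y_t(0) \mid x_t]$ does not depend on the bid, so it cancels in the regret. This is consistent with the abstract's point that only the treatment effect needs a simple dependence on the features.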