This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints when demand is uncertain, and shows that it achieves an optimal regret upper bound. The approach combines dynamic pricing with demand learning to address the challenge of fairness in pricing strategies. We first study the static full-information setting, where we formulate the optimal pricing policy as a constrained optimization problem and propose an approximation algorithm that efficiently computes an approximation of the ideal policy. Using mathematical analysis and computational studies, we characterize the structure of optimal contextual pricing policies subject to fairness constraints and derive simplified policies, which lay the foundation for more in-depth research and extensions. We then extend the study to dynamic pricing problems with demand learning, establishing a non-standard regret lower bound that highlights the complexity added by fairness constraints. Our research offers a comprehensive analysis of the cost of fairness and its impact on the balance between utility and revenue maximization. This work represents a step towards integrating ethical considerations with algorithmic efficiency in data-driven dynamic pricing.