This work studies algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider dynamic price competition between two firms operating in an opaque marketplace, where each firm lacks information about its competitor. Demand follows the multinomial logit (MNL) choice model and depends on both the price consumers observe and their reference price; consecutive periods of the repeated game are linked through reference price updates. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy for the single-period game, to capture long-run market equilibrium and stability simultaneously. We propose the online projected gradient ascent (OPGA) algorithm, in which each firm adjusts its price using the first-order derivative of its log-revenue, which can be obtained from the market feedback mechanism. Although properties typically required for convergence in online games, such as strong monotonicity and variational stability, are absent, we show that under diminishing step-sizes the price and reference price paths generated by OPGA converge to the unique SNE, thereby achieving no-regret learning and a stable market. Moreover, with appropriate step-sizes, we prove that this convergence occurs at a rate of $\mathcal{O}(1/t)$.
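To make the OPGA update concrete, the following is a minimal sketch rather than the exact formulation analyzed in this work. Several ingredients are assumptions that the abstract does not specify: the utility form $u_i = a_i - b_i p_i + c_i (r - p_i)$ with an outside option of utility zero, the reference price update $r_{t+1} = \alpha r_t + (1-\alpha)\sum_i \theta_i p_{i,t}$, the step-size schedule $\eta_t = \eta_0 / t$, and a common feasible price interval. All function and parameter names are hypothetical. Under the assumed utility, the derivative of firm $i$'s log-revenue is $1/p_i - (b_i + c_i)(1 - d_i)$, which depends only on the firm's own price and realized market share.

```python
import math


def mnl_demand(prices, ref, a, b, c):
    """Market shares under an assumed MNL model with a zero-utility outside option."""
    utils = [a[i] - b[i] * prices[i] + c[i] * (ref - prices[i]) for i in range(2)]
    m = max(0.0, max(utils))  # shift exponents for numerical stability
    exps = [math.exp(u - m) for u in utils]
    denom = math.exp(-m) + sum(exps)  # exp(-m) is the shifted outside option
    return [e / denom for e in exps]


def log_revenue_grad(price, demand, b_i, c_i):
    """d/dp log(p * d(p)) for the assumed utility: 1/p - (b + c) * (1 - d)."""
    return 1.0 / price - (b_i + c_i) * (1.0 - demand)


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def opga(p0, r0, a, b, c, theta, alpha, lo, hi, steps, eta0=1.0):
    """Run projected gradient ascent on log-revenue for two firms.

    Each firm updates using only its own price and market share.
    The schedule eta0 / t is an assumed diminishing step-size.
    """
    prices, ref = list(p0), r0
    for t in range(1, steps + 1):
        eta = eta0 / t
        d = mnl_demand(prices, ref, a, b, c)
        grads = [log_revenue_grad(prices[i], d[i], b[i], c[i]) for i in range(2)]
        new_prices = [project(prices[i] + eta * grads[i], lo, hi) for i in range(2)]
        # Assumed reference update: exponential smoothing of a weighted price average.
        ref = alpha * ref + (1 - alpha) * sum(theta[i] * prices[i] for i in range(2))
        prices = new_prices
    return prices, ref


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    a, b, c = [2.0, 1.5], [1.0, 0.8], [0.5, 0.4]
    prices, ref = opga([0.5, 3.0], 1.0, a, b, c,
                       theta=[0.5, 0.5], alpha=0.7, lo=0.1, hi=5.0, steps=2000)
    print(prices, ref)
```

The projection step keeps every price inside the feasible interval, and because the assumed reference update is a convex combination, the reference price stays in the same interval whenever it starts there and the weights `theta` sum to one.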