We study sequential price competition among $N$ sellers, each influenced by the pricing decisions of its rivals. Specifically, the demand function of each seller $i$ follows the single index model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol{\theta}_{i,0}$, where $\mathbf{p}$ denotes the vector of prices offered by all sellers simultaneously at a given instant. Each seller observes only its own realized demand, which is unobservable to competitors, and the prices set by rivals. Our framework generalizes existing approaches that focus solely on linear demand models. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule. The policy removes the need for coordinated exploration phases across sellers, which are integral to previous approaches for linear models, and accommodates both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $O(N^{2}\sqrt{T}\log(T))$ regret, which essentially matches the optimal rate known in the linear setting. A significant technical contribution of our work is a variant of the elliptical potential lemma, typically applied in single-agent systems, adapted to our competitive multi-agent environment.
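As a minimal illustration of the single index demand model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$ with binary demand observations, the sketch below simulates one pricing round for $N = 3$ sellers. It is not the PML-GLUCB policy. The logistic link, the parameter values in `theta`, the prices, and all function names are assumptions chosen for illustration only; the abstract requires only that each $\mu_i$ be known and increasing.

```python
import math
import random


def logistic(z):
    """Assumed link mu_i: increasing, maps the index to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_demand(theta_i, prices, link=logistic):
    """Mean demand lambda_i(p) = mu_i(<theta_i, p>) for one seller."""
    index = sum(t * p for t, p in zip(theta_i, prices))
    return link(index)


def sample_binary_demand(theta_i, prices, rng):
    """Draw a 0/1 demand observation with mean lambda_i(p)."""
    return 1 if rng.random() < expected_demand(theta_i, prices) else 0


# Hypothetical parameters: row i is theta_{i,0}. The negative diagonal
# entry means a seller's own price lowers its own demand.
theta = [
    [-1.0, 0.3, 0.2],
    [0.4, -1.2, 0.1],
    [0.2, 0.3, -0.9],
]
prices = [1.0, 1.5, 2.0]  # price vector p posted by all sellers at once

rng = random.Random(0)
means = [expected_demand(theta[i], prices) for i in range(3)]
draws = [sample_binary_demand(theta[i], prices, rng) for i in range(3)]
```

In this setup seller $i$ would see only `draws[i]` and the full `prices` vector, which mirrors the information structure described in the abstract: own demand is private, rival prices are public.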