We study the problem of contextual online bilateral trade. At each round, the learner faces a seller-buyer pair and must propose a trade price without observing their private valuations for the item being sold. The learner's goal is to post prices that facilitate trades between the two parties. Before posting a price, the learner observes a $d$-dimensional context vector that influences the agents' valuations. Prior work in the contextual setting has focused on linear models. In this work, we tackle a general nonparametric setting in which the buyer's and seller's valuations behave according to arbitrary Lipschitz functions of the context. We design an algorithm that leverages contextual information through a hierarchical tree construction and guarantees regret $\widetilde{O}(T^{(d-1)/d})$. Remarkably, our algorithm operates under two stringent constraints of the setting: (1) one-bit feedback, where the learner only observes whether a trade occurred or not, and (2) strong budget balance, where the learner cannot subsidize or profit from the market participants. We further provide a matching lower bound in the full-feedback setting, demonstrating the tightness of our regret bound.
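To make the interaction protocol concrete, the following is a minimal simulation sketch of one-bit feedback under strong budget balance. It is not the paper's algorithm: the hierarchical tree construction is not reproduced here. The sketch assumes the standard convention that a trade occurs exactly when the posted price lies between the seller's and the buyer's valuations, that performance is measured by gain from trade, and that valuations are noiseless functions of the context. All names (`trade_feedback`, `gain_from_trade`, `simulate`, `pricing_rule`) are hypothetical.

```python
import random


def trade_feedback(price, seller_val, buyer_val):
    """One-bit feedback: 1 if the trade occurs at the posted price, else 0.

    Assumption: the seller accepts iff price >= seller_val and the buyer
    accepts iff price <= buyer_val.
    """
    return int(seller_val <= price <= buyer_val)


def gain_from_trade(price, seller_val, buyer_val):
    """Gain from trade (assumed objective): buyer_val - seller_val if traded.

    A single price is posted to both parties, so the learner neither
    subsidizes nor profits (strong budget balance).
    """
    return (buyer_val - seller_val) * trade_feedback(price, seller_val, buyer_val)


def simulate(pricing_rule, seller_fn, buyer_fn, T, d, seed=0):
    """Run T rounds with contexts drawn uniformly from [0, 1]^d.

    pricing_rule maps a context to a price. In a real learner it would be
    updated using only the observed bit; here it is fixed for illustration.
    """
    rng = random.Random(seed)
    trades, total_gain = 0, 0.0
    for _ in range(T):
        x = [rng.random() for _ in range(d)]
        s, b = seller_fn(x), buyer_fn(x)  # private, hidden from the learner
        p = pricing_rule(x)
        trades += trade_feedback(p, s, b)  # the only signal the learner sees
        total_gain += gain_from_trade(p, s, b)  # used for evaluation only
    return trades, total_gain
```

For example, with the Lipschitz valuations `seller_fn = lambda x: 0.25 * x[0]` and `buyer_fn = lambda x: 0.5 + 0.25 * x[0]`, the constant rule `lambda x: 0.4` always falls between the two valuations, so every round results in a trade.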