We study the sample complexity of learning a uniform approximation of an $n$-dimensional cumulative distribution function (CDF) to within error $ε > 0$, when observations are restricted to minimal one-bit feedback. This serves as a counterpart to the multivariate DKW inequality under ``full feedback'', extending it to the setting of ``bandit feedback''. Our main result shows near-invariance of the sample complexity in the dimension: we obtain a uniform $ε$-approximation with sample complexity $\frac{1}{ε^3}\left(\log\frac{1}{ε}\right)^{\mathcal{O}(n)}$ over an arbitrarily fine grid, where the dimension $n$ affects only the logarithmic terms. As direct corollaries, we provide tight sample complexity bounds and novel regret guarantees for learning fixed-price mechanisms in small markets, such as bilateral trade settings.
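For concreteness, the one-bit (bandit) feedback model can be sketched as follows; this is an illustrative formalization under our reading of the abstract, with the grid $G$, the query points $x_t$, and the order $\preceq$ as assumed notation rather than the paper's own. At each round $t$, the learner selects a query point $x_t$ in a grid $G$ and observes only the single bit
\[
b_t = \mathbf{1}\{X_t \preceq x_t\}, \qquad X_t \overset{\text{i.i.d.}}{\sim} F,
\]
where $\preceq$ denotes coordinate-wise comparison, so that $\mathbb{E}[b_t] = F(x_t)$ and each query yields one Bernoulli$(F(x_t))$ observation of the CDF at a single point. A uniform $ε$-approximation then means an estimate $\widehat{F}$ satisfying
\[
\sup_{x \in G} \bigl|\widehat{F}(x) - F(x)\bigr| \le ε
\]
with high probability, which the stated bound achieves with $\frac{1}{ε^3}\left(\log\frac{1}{ε}\right)^{\mathcal{O}(n)}$ samples.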