While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for them begin by assuming that the underlying problem is feasible. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. Concretely, we test if $\exists x: Ax \ge 0$ for an unknown $A \in \mathbb{R}^{m \times d}$, by playing a sequence of actions $x_t\in \mathbb{R}^d$, and observing $Ax_t + \mathrm{noise}$ in response. By identifying the hypothesis as determining the sign of the value of a minimax game, we construct a novel test based on low-regret algorithms and a nonasymptotic law of the iterated logarithm. We prove that this test is reliable, and adapts to the `signal level' $\Gamma$ of any instance, with mean sample costs scaling as $\widetilde{O}(d^2/\Gamma^2)$. We complement this with a minimax lower bound of $\Omega(d/\Gamma^2)$ on the sample costs of reliable tests, which dominates prior asymptotic lower bounds by capturing the dependence on $d$, and thus elucidates a basic insight missing from the extant literature on such problems.
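The minimax reformulation can be illustrated in a noiseless toy setting. The sketch below is not the test proposed here: it assumes $A$ is known, and it assumes that actions are restricted to a finite set $\mathcal{X}$ of points on the unit circle, which is a hypothetical choice made only for illustration (without some restriction, $x = 0$ trivially satisfies $Ax \ge 0$). Under these assumptions, some $x \in \mathcal{X}$ satisfies $Ax \ge 0$ exactly when $\max_{x \in \mathcal{X}} \min_i (Ax)_i \ge 0$, so feasibility reduces to the sign of a game value. The helper names `game_value` and `unit_circle` are likewise illustrative.

```python
import math


def game_value(A, actions):
    """Return (value, maximiser) of max_x min_i (A x)_i over a finite action list."""
    best_value, best_action = -math.inf, None
    for x in actions:
        # Worst-case (smallest) constraint value achieved by this action.
        worst = min(sum(a * xj for a, xj in zip(row, x)) for row in A)
        if worst > best_value:
            best_value, best_action = worst, x
    return best_value, best_action


def unit_circle(n):
    """n evenly spaced points on the unit circle (a hypothetical action set)."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


actions = unit_circle(8)

# Feasible instance: x = (1, 1) / sqrt(2) satisfies Ax >= 0.
feasible_A = [[1.0, 0.0], [0.0, 1.0]]

# Infeasible instance: the constraints force x = 0, which is not on the circle.
infeasible_A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

v_feas, x_feas = game_value(feasible_A, actions)
v_infeas, x_infeas = game_value(infeasible_A, actions)

print("feasible instance value:", v_feas)
print("infeasible instance value:", v_infeas)
```

In this toy setting the sign of the returned value decides feasibility, and its magnitude plays a role loosely analogous to a signal level. With unknown $A$ and noisy observations, the value has to be estimated from samples, which is where the sample costs discussed above arise.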