In this paper, we investigate the problem of actively learning a threshold in a latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and the latent value $v$, and can be achieved only if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practice, e.g., reserve price optimization in online auctions, online task assignment in crowdsourcing, and setting recruiting bars in hiring. We first characterize the query complexity of learning a threshold whose expected reward is at most $\epsilon$ smaller than the optimum, and prove that the number of queries needed can be infinitely large even when $g(\gamma, v)$ is monotone with respect to both $\gamma$ and $v$. On the positive side, we provide a tight query complexity of $\tilde{\Theta}(1/\epsilon^3)$ when $g$ is monotone and the CDF of the value distribution is Lipschitz. Moreover, we show that a tight $\tilde{\Theta}(1/\epsilon^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization of this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $\Theta(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.
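To make the query model concrete, the following Python sketch simulates it and runs a simple grid-search baseline. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: we assume thresholds, latent values, and rewards all lie in $[0, 1]$, and that each query reveals only the realized reward (zero when the threshold exceeds the latent value). The names `query`, `learn_threshold`, `sample_value`, and `reward_fn` are hypothetical. The baseline tries about $1/\epsilon$ thresholds with about $1/\epsilon^2$ queries each, which gives a query count on the order of $1/\epsilon^3$.

```python
import math
import random


def query(threshold, sample_value, reward_fn):
    """One query: draw a latent value v; the reward is realized only if threshold <= v."""
    v = sample_value()
    return reward_fn(threshold, v) if threshold <= v else 0.0


def learn_threshold(eps, sample_value, reward_fn):
    """Grid-search baseline (illustrative assumption, not the paper's algorithm).

    Tries about 1/eps thresholds in [0, 1] with about 1/eps^2 queries each,
    so roughly 1/eps^3 queries in total, and returns the empirically best
    threshold together with the number of queries used.
    """
    grid_size = math.ceil(1 / eps)
    per_point = math.ceil(1 / eps ** 2)
    best_threshold, best_mean, num_queries = 0.0, -math.inf, 0
    for i in range(grid_size + 1):
        threshold = i / grid_size
        total = sum(query(threshold, sample_value, reward_fn) for _ in range(per_point))
        num_queries += per_point
        mean = total / per_point
        if mean > best_mean:
            best_threshold, best_mean = threshold, mean
    return best_threshold, num_queries


if __name__ == "__main__":
    # Hypothetical instance: latent value uniform on [0, 1), reward equal to the threshold.
    rng = random.Random(0)
    gamma, n = learn_threshold(0.1, rng.random, lambda g, v: g)
    print(gamma, n)
```

In the hypothetical instance above, the reward $g(\gamma, v) = \gamma$ is monotone in $\gamma$ and the uniform CDF is Lipschitz, so the expected reward is $\gamma(1 - \gamma)$, maximized at $\gamma = 1/2$; the returned threshold can be checked against this closed form.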