We consider the problem of ranking $n$ experts based on their performances on $d$ tasks. We make a monotonicity assumption stating that for each pair of experts, one outperforms the other on all tasks. We consider the sequential setting where, in each round, the learner has access to a noisy evaluation of an actively chosen expert-task pair, selected using the information available up to the current round. Given a confidence parameter $\delta \in (0, 1)$, we provide strategies that recover the correct ranking of experts and develop a bound on the total number of queries made by our algorithm that holds with probability at least $1 - \delta$. We show that our strategy is adaptive to the complexity of the problem (our bounds are instance dependent), and develop matching lower bounds up to a poly-logarithmic factor. Finally, we adapt our strategy to the relaxed problem of best expert identification and provide numerical simulations consistent with our theoretical results.
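To make the query model described above concrete, the following is a minimal sketch of the sequential setting, not the strategy proposed in this work. It assumes Gaussian noise (the abstract only says "noisy"), and the names `make_monotone_means`, `NoisyOracle`, and `uniform_baseline_ranking` are hypothetical. The baseline queries every expert-task pair the same number of times, so it is non-adaptive and serves only as a point of comparison for an instance-dependent strategy.

```python
import random


def make_monotone_means(n, d, rng):
    """Hypothetical instance: expert i+1 outperforms expert i on every task."""
    means = []
    prev = [0.0] * d
    for _ in range(n):
        # Add a strictly positive increment on each task to keep monotonicity.
        row = [p + rng.uniform(0.5, 1.0) for p in prev]
        means.append(row)
        prev = row
    return means


class NoisyOracle:
    """Returns a noisy evaluation of a chosen (expert, task) pair."""

    def __init__(self, means, sigma, rng):
        self.means = means
        self.sigma = sigma  # assumed Gaussian noise level
        self.rng = rng
        self.num_queries = 0

    def query(self, expert, task):
        self.num_queries += 1
        return self.means[expert][task] + self.rng.gauss(0.0, self.sigma)


def uniform_baseline_ranking(oracle, n, d, samples_per_pair):
    """Naive non-adaptive baseline: sample every pair equally often,
    then sort experts by their total observed score (worst to best)."""
    totals = [0.0] * n
    for i in range(n):
        for k in range(d):
            for _ in range(samples_per_pair):
                totals[i] += oracle.query(i, k)
    return sorted(range(n), key=lambda i: totals[i])


rng = random.Random(0)
means = make_monotone_means(5, 3, rng)
oracle = NoisyOracle(means, sigma=0.1, rng=rng)
ranking = uniform_baseline_ranking(oracle, n=5, d=3, samples_per_pair=50)
```

Here `oracle.num_queries` plays the role of the total number of queries that the bounds control; an adaptive strategy would aim to spend fewer queries on pairs of experts that are easy to separate.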