We propose a new partial-observability model for online learning problems in which the learner, in addition to its own loss, observes noisy feedback about the other actions, where this feedback depends on the underlying structure of the problem. We represent this structure by a weighted directed graph whose edge weights are related to the quality of the feedback shared between the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of $\widetilde{O}(\sqrt{\alpha^* T})$ after $T$ rounds, where $\alpha^*$ is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and requires neither knowledge nor estimation of $\alpha^*$. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor and Shamir (2011) and Alon et al. (2013), and our algorithm recovers the near-optimal regret bounds known for those models.
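To make the binary-edge-weight special case concrete, the sketch below shows one round of exponential weights with graph-structured side observations. It uses the standard importance-weighted loss estimator from the Exp3-SET line of work (Alon et al., 2013). It is not the algorithm proposed here for weighted, noisy feedback, which the abstract does not specify. All function names are hypothetical. The sketch assumes a 0/1 adjacency matrix with self-loops, where `adj[j][i] == 1` means that playing action `j` reveals the loss of action `i`. It also assumes a fixed learning rate `eta`, whereas the algorithm described above is parameter-free.

```python
import math
import random
from typing import List, Sequence, Tuple


def observation_probs(p: Sequence[float], adj: Sequence[Sequence[int]]) -> List[float]:
    """q[i]: probability that the loss of action i is revealed this round,
    i.e. the total probability of playing some j with adj[j][i] == 1."""
    k = len(p)
    return [sum(p[j] for j in range(k) if adj[j][i]) for i in range(k)]


def loss_estimates(p: Sequence[float], adj: Sequence[Sequence[int]],
                   chosen: int, losses: Sequence[float]) -> List[float]:
    """Importance-weighted estimates: observed losses are divided by q[i],
    unobserved ones are set to 0, which makes each estimate unbiased.
    Only entries with adj[chosen][i] == 1 are actually read from `losses`."""
    q = observation_probs(p, adj)  # q[i] >= p[i] > 0 thanks to self-loops
    return [losses[i] / q[i] if adj[chosen][i] else 0.0 for i in range(len(p))]


def exp3_set_round(weights: Sequence[float], adj: Sequence[Sequence[int]],
                   losses: Sequence[float], eta: float,
                   rng: random.Random) -> Tuple[int, List[float]]:
    """One round: sample an action from the normalized weights, build the
    loss estimates from the revealed side observations, and apply a
    multiplicative-weights update with fixed learning rate eta (assumed)."""
    total = sum(weights)
    p = [w / total for w in weights]
    chosen = rng.choices(range(len(p)), weights=p)[0]
    est = loss_estimates(p, adj, chosen, losses)
    updated = [w * math.exp(-eta * e) for w, e in zip(weights, est)]
    # Renormalize to avoid underflow; this leaves the next distribution unchanged.
    norm = sum(updated)
    return chosen, [w / norm for w in updated]
```

For example, with three actions on a directed chain with self-loops (`adj = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]`), playing action 0 also reveals the loss of action 1. Averaging `loss_estimates` over the sampling distribution recovers the true loss vector exactly. This unbiasedness is what lets the observation structure of the graph, rather than the number of actions, govern the regret.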