We consider an online decision-making problem in which the reward function is defined over graph-structured data. We formulate the problem as an instance of the graph action bandit. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm that uses a GNN approximator to estimate the mean reward function and graph neural tangent features to estimate uncertainty. We prove that, under certain boundedness assumptions on the reward function, \texttt{GNN-TS} achieves a state-of-the-art regret bound that is (1) sub-linear, of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$, in the number of interaction rounds $T$ and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that the proposed \texttt{GNN-TS} exhibits competitive performance and scales well on graph action bandit problems.
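To make the decision loop described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy stand-in for the GNN (one round of normalized-adjacency aggregation, mean pooling over nodes, and a one-hidden-layer ReLU network of width `m`), uses the gradient of the network output with respect to its parameters as the tangent feature, and samples each candidate graph's reward from a Gaussian whose variance is a quadratic form in that feature. All function names, the architecture, the variance scaling, and the hyperparameters (`lam`, `nu`, `lr`) are illustrative assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency matrix with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def pooled_embedding(A, X):
    """One round of neighborhood aggregation, then mean pooling over nodes."""
    return (normalized_adjacency(A) @ X).mean(axis=0)

def forward_and_grad(params, h):
    """Return f(h) and the flattened gradient of f w.r.t. all parameters."""
    W, w = params
    m = w.shape[0]
    pre = W @ h
    act = np.maximum(pre, 0.0)
    f = w @ act / np.sqrt(m)
    grad_w = act / np.sqrt(m)
    grad_W = np.outer(w * (pre > 0), h) / np.sqrt(m)
    return f, np.concatenate([grad_W.ravel(), grad_w])

def sherman_morrison_update(U_inv, g):
    """Inverse of (U + g g^T), given the symmetric inverse U^{-1}."""
    Ug = U_inv @ g
    return U_inv - np.outer(Ug, Ug) / (1.0 + g @ Ug)

def select_action(params, U_inv, embeddings, nu, rng):
    """Thompson step: sample one reward per candidate graph, pick the argmax."""
    samples, grads = [], []
    for h in embeddings:
        f, g = forward_and_grad(params, h)
        # Posterior-style standard deviation from the tangent feature g.
        std = nu * np.sqrt(max(g @ U_inv @ g, 0.0))
        samples.append(rng.normal(f, std))
        grads.append(g)
    k = int(np.argmax(samples))
    return k, grads[k]

def run(graphs, reward_fn, T=50, m=16, lam=1.0, nu=0.1, lr=0.05, seed=0):
    """Play T rounds over a fixed candidate set of (adjacency, features) pairs."""
    rng = np.random.default_rng(seed)
    embeddings = [pooled_embedding(A, X) for A, X in graphs]
    d = embeddings[0].shape[0]
    params = [rng.normal(size=(m, d)), rng.normal(size=m)]
    p = m * d + m
    U_inv = np.eye(p) / lam  # inverse of the regularized feature covariance
    history = []
    for _ in range(T):
        k, g = select_action(params, U_inv, embeddings, nu, rng)
        r = reward_fn(k)
        history.append((k, r))
        U_inv = sherman_morrison_update(U_inv, g)
        # A few gradient steps on the mean squared error over observed rewards.
        for _ in range(5):
            total = np.zeros(p)
            for kk, rr in history:
                f, gg = forward_and_grad(params, embeddings[kk])
                total += (f - rr) * gg
            total /= len(history)
            params[0] -= lr * total[: m * d].reshape(m, d)
            params[1] -= lr * total[m * d:]
    return history
```

In this sketch the uncertainty term involves only the parameter-gradient feature of each pooled graph, so the size of the covariance matrix depends on the number of parameters rather than on the number of nodes per graph. The rank-one inverse update avoids re-inverting the covariance matrix at every round.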