We consider an online decision-making problem with a reward function defined over graph-structured data. We formalize the problem as an instance of a graph action bandit. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN)-powered Thompson Sampling (TS) algorithm that uses a GNN approximator to estimate the mean reward function and graph neural tangent features to estimate uncertainty. We prove that, under certain boundedness assumptions on the reward function, \texttt{GNN-TS} achieves a state-of-the-art regret bound that is (1) sub-linear, of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$ in the number of interaction rounds $T$ and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results show that \texttt{GNN-TS} exhibits competitive performance and scales well on graph action bandit problems.
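To make the sampling idea above concrete, the following is a minimal sketch of Thompson Sampling over a set of candidate graphs. It is not the paper's implementation: as a loudly stated simplification, the GNN is replaced by a linear model over a fixed propagate-and-pool feature, so the tangent feature (the gradient of the prediction with respect to the parameters) is simply that feature. The names `graph_feature`, `thompson_round`, and `run`, and the parameters `lam`, `nu`, and `noise`, are hypothetical and chosen for illustration only.

```python
import numpy as np


def graph_feature(adj, x):
    """Hypothetical stand-in for a graph tangent feature.

    One round of mean-neighbour propagation followed by mean pooling
    over nodes. This is an assumption, not the paper's GNN.
    """
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalise
    return (a_hat @ x).mean(axis=0)                    # pooled feature, shape (d,)


def thompson_round(feats, theta, u_inv, nu, rng):
    """Sample one reward per candidate graph and return the argmax index."""
    means = feats @ theta
    # Per-candidate variance g^T U^{-1} g, clipped to guard against round-off.
    var = np.maximum(np.einsum("id,de,ie->i", feats, u_inv, feats), 0.0)
    samples = rng.normal(means, nu * np.sqrt(var))
    return int(np.argmax(samples))


def run(feats, true_theta, rounds=200, lam=1.0, nu=0.5, noise=0.1, seed=0):
    """Play `rounds` rounds on a synthetic linear reward; return cumulative regret."""
    rng = np.random.default_rng(seed)
    d = feats.shape[1]
    u = lam * np.eye(d)        # regularised design matrix
    b = np.zeros(d)            # reward-weighted feature sum
    best = (feats @ true_theta).max()
    regret = 0.0
    for _ in range(rounds):
        u_inv = np.linalg.inv(u)
        theta = u_inv @ b                          # ridge estimate of the parameters
        k = thompson_round(feats, theta, u_inv, nu, rng)
        g = feats[k]
        reward = g @ true_theta + noise * rng.normal()
        u += np.outer(g, g)                        # update uncertainty
        b += reward * g                            # update mean estimate
        regret += best - g @ true_theta
    return regret
```

In this sketch the per-round cost depends on the feature dimension and the number of candidate graphs, not on the number of nodes once the pooled features are computed. That only loosely mirrors the node-independence of the stated bound and should not be read as a proof of it.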