We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear model (GLM) that is guaranteed to be convex and numerically tight. We show that it is on par with or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, our CS for Bernoulli is the first to have a poly(S)-free radius, where S is the norm of the unknown parameter. Our first technical novelty is its derivation, which uses a time-uniform PAC-Bayesian bound with a uniform prior/posterior, even though the latter is a rather unpopular choice for deriving CSs. As a direct application of our new CS, we propose a simple and natural optimistic algorithm called OFUGLB, applicable to any generalized linear bandit (GLB; Filippi et al., 2010). Our analysis shows that the celebrated optimistic approach simultaneously attains state-of-the-art regret for various self-concordant (not necessarily bounded) GLBs, and even poly(S)-free regret for bounded GLBs, including logistic bandits. The regret analysis, our second technical novelty, combines our new CS with a new proof technique that completely avoids the previously widely used self-concordant control lemma (Faury et al., 2020, Lemma 9). Finally, we verify numerically that OFUGLB significantly outperforms the prior state of the art (Lee et al., 2024) for logistic bandits.
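To make the shape of a likelihood ratio-based confidence set concrete, the following is a minimal sketch for the Bernoulli (logistic) case. It is not the construction derived in this work: the radius `beta_sq` is a caller-supplied placeholder rather than the radius from our analysis, the constrained maximum likelihood estimate is approximated with plain projected gradient descent as an illustrative choice, and all function names are hypothetical. The sketch only illustrates why such a set is convex: it is a sublevel set of the convex negative log-likelihood intersected with a norm ball.

```python
import numpy as np


def logistic_nll(theta, X, y):
    """Negative log-likelihood of a Bernoulli GLM with the logistic link."""
    z = X @ theta
    # log(1 + exp(z)) evaluated stably, minus the linear term y * z.
    return float(np.sum(np.logaddexp(0.0, z) - y * z))


def fit_constrained_mle(X, y, S, steps=2000, lr=0.5):
    """Approximate the MLE over the ball ||theta|| <= S.

    Assumption: projected gradient descent stands in for whatever
    solver one would actually use.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (p - y) / n
        theta = theta - lr * grad
        norm = np.linalg.norm(theta)
        if norm > S:
            theta = theta * (S / norm)  # project back onto the ball
    return theta


def in_confidence_set(theta, theta_hat, X, y, S, beta_sq):
    """Check whether theta lies in the likelihood-ratio set.

    The set is {theta : ||theta|| <= S and NLL(theta) - NLL(theta_hat) <= beta_sq}.
    beta_sq is a hypothetical placeholder radius supplied by the caller.
    """
    if np.linalg.norm(theta) > S:
        return False
    excess = logistic_nll(theta, X, y) - logistic_nll(theta_hat, X, y)
    return excess <= beta_sq


# Small synthetic example (all values are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
theta_star = np.array([1.0, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ theta_star)))).astype(float)

S = 2.0
beta_sq = 3.0  # placeholder radius, not a calibrated value
theta_hat = fit_constrained_mle(X, y, S)
print(in_confidence_set(theta_star, theta_hat, X, y, S, beta_sq))
```

Because membership is decided only by a convex function and a norm constraint, any convex combination of two members is again a member, which is what makes optimizing over such a set tractable in an optimistic bandit algorithm.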