The population $\mathrm{KL}_{\inf}$ is a fundamental quantity that appears in lower bounds for the (asymptotically) optimal regret of pure-exploration stochastic bandit algorithms and for the optimal stopping time of sequential tests. Motivated by this, an empirical $\mathrm{KL}_{\inf}$ statistic is frequently used in the design of (asymptotically) optimal bandit algorithms and sequential tests. While nonasymptotic concentration bounds for the empirical $\mathrm{KL}_{\inf}$ have been developed, their optimality in terms of constants and rates is questionable, and their generality is limited (usually to bounded observations). The fundamental limits of nonasymptotic concentration are often described by the asymptotic fluctuations of the statistic. With that motivation, this paper presents a tight law of the iterated logarithm (with matching upper and lower bounds) for the empirical $\mathrm{KL}_{\inf}$ that applies to extremely general (unbounded) data.