We study a \emph{max-risk} objective for active learning in multi-group mean estimation, cast as a $d$-armed bandit: a learner adaptively allocates a budget of $T$ samples across $d$ groups to minimize the worst-case uncertainty index $\max_{k\in[d]}\sigma_k^2/n_k$, where $\sigma_k$ is the standard deviation of the distribution of arm $k$ and $n_k$ is the number of times arm $k$ is sampled. We develop a local minimax framework and prove the first general lower bound for this objective, valid for any finite-variance hypothesis class. The bound separates difficulty into three orthogonal factors: a \emph{budget} term, a \emph{heteroscedasticity} index measuring how unevenly the uncertainty is spread across arms, and a model-dependent complexity measure, the \emph{Variance Local Curvature} ($\mathrm{VLC}$), which captures how much information a local change of variance creates inside the hypothesis class. For smooth classes, the $\mathrm{VLC}$ is a reparametrization of a variance--Fisher information, with closed-form values for common families. Benchmarking against the strongest available upper bound shows that our lower bound is tight up to logarithmic factors in broad regimes, and pinpoints a systematic gap in highly heterogeneous instances. Our proof introduces two key ingredients: a loss-induced $\ell_1$ geometry on the decision space, and a representation-based instance generator that reduces hard-instance construction to an explicit random matrix calculation.
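As a point of reference that the text above does not state, consider the idealized case in which the variances are known, all $\sigma_k>0$, and the integrality of the $n_k$ is ignored. Under these assumptions the objective is minimized by equalizing the per-arm ratios $\sigma_k^2/n_k$ subject to $\sum_{k\in[d]} n_k = T$:
\[
n_k^\star \;=\; T\,\frac{\sigma_k^2}{\sum_{j\in[d]}\sigma_j^2},
\qquad
\max_{k\in[d]}\frac{\sigma_k^2}{n_k^\star} \;=\; \frac{1}{T}\sum_{j\in[d]}\sigma_j^2 .
\]
Any other allocation with the same budget gives some arm $k$ at most $n_k^\star$ samples, so its worst-case index is at least this value. The notation $n_k^\star$ is introduced here for illustration only.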