We develop a unifying framework for information-theoretic lower bounds in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's method, Le Cam's method, and Assouad's lemma -- are central to the study of minimax risk in statistical estimation, yet they are insufficient to provide tight lower bounds for \emph{interactive decision making}, where algorithms collect data interactively (e.g., algorithms for bandits and reinforcement learning). Recent work of Foster et al. (2021, 2023) provides minimax lower bounds for interactive decision making using analysis techniques that appear quite different from the classical methods. These results -- which are proven using a complexity measure known as the \emph{Decision-Estimation Coefficient} (DEC) -- capture difficulties unique to interactive learning, yet do not recover the tightest known lower bounds for passive estimation. We propose a unified view of these distinct methodologies through a new lower bound approach called the \emph{interactive Fano method}. As an application, we introduce a novel complexity measure, the \emph{Fractional Covering Number}, which yields new lower bounds for interactive decision making that extend the DEC methodology by incorporating the complexity of estimation. Using the fractional covering number, we (i) provide a unified characterization of learnability for \emph{any} stochastic bandit problem, and (ii) close the remaining gap between the upper and lower bounds in Foster et al. (2021, 2023) (up to polynomial factors) for any interactive decision making problem in which the underlying model class is convex.