We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive $d^2$ term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
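To make the reward model concrete, the following is a minimal sketch of why a baseline shift calls for orthogonalized regression. It is not the paper's algorithm or its $XY$-design. It assumes a centered-feature least-squares estimator of the kind used in the semiparametric bandit literature, and the arm set, parameter `theta`, sampling distribution `probs`, and shift schedule are all hypothetical values chosen for illustration.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Draw arms i.i.d. from a fixed distribution and generate shifted rewards."""
    rng = np.random.default_rng(seed)
    # Hypothetical arm features and parameter (not taken from the paper).
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
    theta = np.array([0.5, -0.3])
    probs = np.full(len(arms), 1.0 / len(arms))  # uniform sampling design

    idx = rng.choice(len(arms), size=n, p=probs)
    X = arms[idx]
    # Baseline shift: varies with the round, but not with the chosen arm.
    shift = 2.0 + np.sin(np.arange(n) / 50.0)
    rewards = X @ theta + shift + 0.1 * rng.standard_normal(n)
    return arms, probs, theta, X, rewards


def orthogonalized_ls(X, rewards, arms, probs):
    """Least squares on features centered by their mean under the design."""
    mean_feature = probs @ arms          # expected feature under `probs`
    Z = X - mean_feature                 # centered regressors
    # The shift is uncorrelated with Z, so it averages out of Z.T @ rewards.
    return np.linalg.solve(Z.T @ Z, Z.T @ rewards)


def naive_ls(X, rewards):
    """Uncentered least squares with no intercept, for comparison."""
    return np.linalg.lstsq(X, rewards, rcond=None)[0]


if __name__ == "__main__":
    arms, probs, theta, X, rewards = simulate()
    print("true theta:      ", theta)
    print("orthogonalized:  ", orthogonalized_ls(X, rewards, arms, probs))
    print("naive (no shift):", naive_ls(X, rewards))
```

In this sketch the centered estimator recovers `theta` because the shift does not depend on which arm is drawn, whereas the uncentered fit absorbs the shift into its coefficients. Since best-arm identification depends only on differences between arm rewards, the shift itself never needs to be estimated.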