We develop a frequentist decision-theoretic framework for selecting the best arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach characterizes the minimax-regret (MMR) optimal decision rule for any multivariate location-family reward distribution with full support. We show that the MMR rule is deterministic, unique, and computationally tractable. We then specialize to multivariate normal (MVN) rewards with an arbitrary covariance matrix and establish the local asymptotic minimaxity of a plug-in version of the rule when only estimated means and covariances are available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix estimate directly into decision boundaries, which makes it straightforward to implement in practice. Our analysis highlights a sharp contrast between two-arm and multi-arm designs. With two arms, the "pick-the-winner" empirical success rule remains MMR-optimal regardless of the arm-specific variances. By contrast, with three or more arms and heterogeneous variances, the empirical success rule is no longer optimal: the MMR decision boundaries become nonlinear and systematically penalize high-variance arms, requiring stronger evidence to select them. Our multi-arm AMMR framework offers a rigorous foundation for practical criteria that compare multiple policies simultaneously.
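As a minimal illustration of the two-arm claim, the sketch below assumes independent normal mean estimates with known sampling variances, so that the estimated difference between the arms is normal with standard deviation sd_diff (for example sqrt(σ1²/n1 + σ2²/n2)). Under that assumption, pick-the-winner chooses the worse arm with probability Φ(−δ/sd_diff) when the true gap is δ, so arm-specific variances enter only through sd_diff. The helper names are hypothetical, this is not the paper's AMMR procedure, and the worst-case regret is found by a simple grid search rather than in closed form.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_success_rule(mean_estimates):
    """Pick-the-winner: index of the largest estimated mean.

    Python's max() returns the first maximal element, so ties are
    broken toward the lowest index.
    """
    return max(range(len(mean_estimates)), key=lambda k: mean_estimates[k])


def two_arm_regret(delta: float, sd_diff: float) -> float:
    """Expected regret of pick-the-winner with two arms (illustrative).

    Assumes the better arm leads by delta >= 0 and the estimated
    difference is N(delta, sd_diff**2). The wrong arm is chosen with
    probability Phi(-delta / sd_diff), costing delta each time.
    """
    return delta * norm_cdf(-delta / sd_diff)


def two_arm_max_regret(sd_diff: float, grid_size: int = 20001,
                       t_max: float = 5.0) -> float:
    """Worst-case regret over delta, via grid search on t = delta / sd_diff."""
    best = 0.0
    for i in range(grid_size):
        t = t_max * i / (grid_size - 1)
        best = max(best, two_arm_regret(t * sd_diff, sd_diff))
    return best


if __name__ == "__main__":
    print(empirical_success_rule([0.1, 0.5, 0.3]))  # index of the winning arm
    print(two_arm_max_regret(1.0))                  # worst case for sd_diff = 1
```

With sd_diff = 1 the grid search gives a worst-case regret of roughly 0.17, attained near δ ≈ 0.75. Because the regret equals δ·Φ(−δ/sd_diff), the worst case scales linearly in sd_diff, so unequal arm variances change only the scale of the two-arm problem, not which arm the rule picks.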