We develop and compare e-variables for testing whether $k$ samples of data are drawn from the same distribution, the alternative being that they come from different elements of an exponential family. We consider the GRO (growth-rate optimal) e-variables for (1) a 'small' null inside the same exponential family, and (2) a 'large' nonparametric null, as well as (3) an e-variable arrived at by conditioning on the sum of the sufficient statistics. E-variables (2) and (3) are efficiently computable, and extend ideas from Turner et al. [2021] and Wald [1947], respectively, from Bernoulli to general exponential families. We provide theoretical and simulation-based comparisons of these e-variables in terms of their logarithmic growth rate, and find that for small effects all four e-variables behave surprisingly similarly; for the Gaussian location and Poisson families, e-variables (1) and (3) coincide; for Bernoulli, (1) and (2) coincide; but in general, whether (2) or (3) grows faster under the alternative is family-dependent. We furthermore discuss algorithms for numerically approximating (1).
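To make the conditioning idea behind e-variable (3) concrete, here is a minimal sketch for the Poisson family. It makes several assumptions that are not stated in the abstract: each round contains exactly one count per group, the alternative is a single fixed vector of rates (no mixture or plug-in learning), and the function names `conditional_e_value`, `null_conditional_mean` and `growth_rate` are hypothetical. The sketch is not the authors' implementation. Under the null, all $k$ counts share one unknown rate, so given their total $t$ they are Multinomial$(t, (1/k, \dots, 1/k))$ whatever that rate is. The likelihood ratio of the alternative and null conditional laws therefore has expectation exactly 1 under every null distribution.

```python
import itertools
import math
from typing import Sequence


def conditional_e_value(x: Sequence[int], lambdas_alt: Sequence[float]) -> float:
    """Hypothetical e-value for one round of k Poisson counts, conditioned on their sum.

    Null: x | t ~ Multinomial(t, uniform), whatever the common rate is.
    Assumed fixed alternative: x | t ~ Multinomial(t, lambdas_alt / sum(lambdas_alt)).
    The multinomial coefficients cancel, leaving prod_j (k * p_j) ** x_j.
    Rates are assumed strictly positive.
    """
    k = len(x)
    total_rate = sum(lambdas_alt)
    log_e = 0.0
    for count, lam in zip(x, lambdas_alt):
        p = lam / total_rate
        log_e += count * math.log(k * p)
    return math.exp(log_e)


def null_conditional_mean(t: int, lambdas_alt: Sequence[float]) -> float:
    """Exact E_null[S | T = t], enumerating every count vector that sums to t."""
    k = len(lambdas_alt)
    mean = 0.0
    for x in itertools.product(range(t + 1), repeat=k):
        if sum(x) != t:
            continue
        # Multinomial coefficient t! / (x_1! ... x_k!), built with exact integer division
        coef = math.factorial(t)
        for c in x:
            coef //= math.factorial(c)
        # Uniform-multinomial probability of x times the e-value at x
        mean += coef * k ** (-t) * conditional_e_value(x, lambdas_alt)
    return mean


def growth_rate(lambdas_alt: Sequence[float]) -> float:
    """E_alt[log S] per round = (sum of rates) * KL(p || uniform), p = normalized rates."""
    k = len(lambdas_alt)
    total_rate = sum(lambdas_alt)
    kl = 0.0
    for lam in lambdas_alt:
        p = lam / total_rate
        kl += p * math.log(k * p)
    return total_rate * kl


if __name__ == "__main__":
    rates = [1.0, 2.0, 3.0]  # illustrative alternative rates, not taken from the paper
    for t in range(6):
        print(t, null_conditional_mean(t, rates))  # 1.0 up to rounding for every t
    print(growth_rate(rates))  # positive when the alternative rates differ
```

The growth-rate formula follows because $\log S = \sum_j x_j \log(k p_j)$ and $\mathbb{E}_{\text{alt}}[x_j] = \lambda_j$. It is zero exactly when the alternative rates are all equal, that is, when the alternative collapses onto the null.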