Multivariate Adjustments for Average Equivalence Testing

Multivariate (average) equivalence testing is widely used to assess whether the means of two conditions of interest are `equivalent' for several outcomes simultaneously. The multivariate Two One-Sided Tests (TOST) procedure is typically used in this context: it checks, outcome by outcome, whether the marginal $100(1-2\alpha)\%$ confidence intervals for the difference in means between the two conditions lie within pre-defined lower and upper equivalence limits. This procedure, known to be conservative in the univariate case, loses power rapidly as the number of outcomes increases, especially when one or more outcome variances are relatively large. In this work, we propose a finite-sample adjustment of this procedure, the multivariate $\alpha$-TOST, which consists of a correction of the significance level $\alpha$ that takes the (arbitrary) dependence between the outcomes of interest into account and makes the procedure uniformly more powerful than the conventional multivariate TOST. We present an iterative algorithm that efficiently determines $\alpha^{\star}$, the corrected significance level, a task that proves challenging in the multivariate setting because of the inter-relationship between $\alpha^{\star}$ and the sets of values that belong to the null hypothesis space and define the test size. We study the operating characteristics of the multivariate $\alpha$-TOST both theoretically and via an extensive simulation study considering cases relevant for real-world analyses -- i.e.,~relatively small sample sizes, unknown and heterogeneous variances, and different correlation structures -- and show the superior finite-sample properties of the multivariate $\alpha$-TOST compared to its conventional counterpart. Finally, we revisit a case study on ticlopidine hydrochloride and compare both methods when simultaneously assessing bioequivalence for multiple pharmacokinetic parameters.
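The conventional multivariate TOST decision rule described above can be illustrated with a minimal sketch. It is not the authors' implementation and it does not cover the $\alpha$-TOST correction. It assumes a paired design in which each row of `diffs` holds one subject's between-condition differences for every outcome, and it uses Student-$t$ marginal intervals; the function name `multivariate_tost` and its arguments are hypothetical.

```python
import numpy as np
from scipy import stats


def multivariate_tost(diffs, lower, upper, alpha=0.05):
    """Conventional multivariate TOST under an assumed paired design.

    diffs : (n, m) array of per-subject differences, one column per outcome.
    lower, upper : equivalence limits (scalars or length-m arrays).
    Returns (equivalent, ci_low, ci_high).
    """
    diffs = np.asarray(diffs, dtype=float)
    n, _ = diffs.shape

    # Marginal estimates: mean difference and its standard error per outcome.
    mean = diffs.mean(axis=0)
    se = diffs.std(axis=0, ddof=1) / np.sqrt(n)

    # A 100(1 - 2*alpha)% interval uses the (1 - alpha) t quantile.
    t_crit = stats.t.ppf(1.0 - alpha, df=n - 1)
    ci_low = mean - t_crit * se
    ci_high = mean + t_crit * se

    # Equivalence is declared only if every marginal interval lies
    # strictly inside its equivalence limits.
    inside = (ci_low > lower) & (ci_high < upper)
    return bool(inside.all()), ci_low, ci_high


if __name__ == "__main__":
    # Illustrative values only; +/- log(1.25) is a common bioequivalence margin.
    margin = np.log(1.25)
    example = np.array([[0.01, -0.02], [-0.01, 0.02], [0.02, 0.0], [-0.02, 0.0]])
    print(multivariate_tost(example, -margin, margin))
```

Because all marginal intervals must fall inside their limits at once, a single outcome with a wide interval is enough to prevent a declaration of equivalence, which is the source of the power loss noted above.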
