Equipoise calibration of clinical trial design

Statistical methods for clinical trial design currently lack a sufficiently precise and general definition of an adequately powered study. Operationally, such a definition is needed to align statistical significance with clinical interpretability by design. To address this gap, this paper shows how to calibrate randomised trial designs to establish strong clinical equipoise imbalance. Among several equipoise models, the least informed population distribution of the pre-trial odds of the design hypotheses is recommended here as the most practical calibrator. Under this model, primary analysis outcomes of common phase 3 superiority designs are shown to provide at least 90% evidence of equipoise imbalance. Designs carrying 95% power at a 5% false positive rate are shown to demonstrate even stronger equipoise imbalance, providing an operational definition of a robustly powered study. Equipoise calibration is then applied to the design of clinical development plans comprising randomised phase 2 and phase 3 studies. Development plans using oncology clinical endpoints are shown to provide strong equipoise imbalance when positive outcomes are observed in both phase 2 and phase 3. When a positive phase 2 is not confirmed in phase 3, establishing equipoise imbalance on a statistical basis is shown to require large sample sizes that are unlikely to be associated with clinically meaningful effect sizes. Equipoise calibration is proposed as an effective clinical trial methodology for ensuring that the statistical properties of clinical trial outcomes are clinically interpretable. Strong equipoise imbalance is achieved for designs carrying 95% power at a 5% false positive rate, regardless of whether the primary outcome is positive or negative. Sponsors should consider raising the power of their designs beyond current practice to achieve more conclusive results.
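The link between power, false positive rate and post-trial evidence can be illustrated with a minimal sketch. It is not the paper's calibration under the least informed population distribution. Instead, it assumes a single fixed value for the pre-trial odds and applies Bayes' rule with the usual likelihood ratios for a positive or negative primary outcome. The function names `posterior_odds` and `odds_to_probability` are hypothetical and introduced only for illustration.

```python
def posterior_odds(prior_odds, power, alpha, positive):
    """Post-trial odds of H1 versus H0 given the primary outcome.

    Assumption: a single fixed pre-trial odds value, not a population
    distribution of pre-trial odds as recommended in the abstract.
    """
    if positive:
        # A positive outcome occurs with probability `power` under H1
        # and with probability `alpha` under H0.
        likelihood_ratio = power / alpha
    else:
        # A negative outcome occurs with probability 1 - power under H1
        # and with probability 1 - alpha under H0.
        likelihood_ratio = (1.0 - power) / (1.0 - alpha)
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds of H1 versus H0 into the probability of H1."""
    return odds / (1.0 + odds)


# Assumed example: even pre-trial odds, 95% power, 5% false positive rate.
p_h1_given_positive = odds_to_probability(
    posterior_odds(1.0, power=0.95, alpha=0.05, positive=True)
)
p_h0_given_negative = 1.0 - odds_to_probability(
    posterior_odds(1.0, power=0.95, alpha=0.05, positive=False)
)
print(p_h1_given_positive, p_h0_given_negative)
```

Under these assumed even pre-trial odds, the evidence is symmetric: a positive outcome supports the alternative hypothesis about as strongly as a negative outcome supports the null (both probabilities are approximately 0.95). This is consistent in spirit with the abstract's statement that such designs are informative whether the primary outcome is positive or negative, although the paper's own figures come from a different equipoise model.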
