Early-phase clinical trials must select drug doses that balance safety and efficacy, a task complicated by uncertain dose-response relationships and heterogeneous participant characteristics. Traditional randomized dose allocation ignores individual covariates, so it often exposes participants to suboptimal doses, requires larger sample sizes, and prolongs drug development. This paper introduces a risk-inclusive contextual bandit algorithm that uses multi-armed bandit (MAB) strategies to optimize dosing by incorporating participant-specific data. By combining two separate Thompson samplers, one for efficacy and one for safety, the algorithm improves the balance between efficacy and safety in dose allocation. Effect sizes are estimated with a generalized version of asymptotic confidence sequences (AsympCS), which provide a time-uniform coverage guarantee for sequential causal inference. The validity of AsympCS is also established in the MAB setup with a possibly misspecified model. Empirical results demonstrate the strengths of this method in optimizing dose allocation compared with randomized allocation and with traditional contextual bandits that focus solely on efficacy. Moreover, an application to real data generated from a recent Phase IIb study yields results that align with the study's actual findings.
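To make the idea of pairing an efficacy sampler with a safety sampler concrete, the following is a minimal sketch and not the paper's algorithm. It makes several simplifying assumptions that do not come from the source: there are no covariates (so it is not contextual), outcomes are binary with Beta-Bernoulli posteriors, and the two samplers are combined by a hypothetical rule that picks the dose with the highest sampled efficacy among doses whose sampled toxicity is at or below a threshold `tox_limit`. The class name, the threshold, and the simulated response rates are all illustrative.

```python
import random


class DualThompsonSampler:
    """Toy dose allocator with two Thompson samplers (illustrative only)."""

    def __init__(self, n_doses, tox_limit=0.3, seed=None):
        self.n_doses = n_doses
        self.tox_limit = tox_limit  # assumed acceptable toxicity probability
        self.rng = random.Random(seed)
        # Beta(1, 1) priors stored as [events + 1, non-events + 1] per dose.
        self.eff = [[1, 1] for _ in range(n_doses)]
        self.tox = [[1, 1] for _ in range(n_doses)]

    def select_dose(self):
        # One posterior draw per dose from each sampler.
        eff_draw = [self.rng.betavariate(a, b) for a, b in self.eff]
        tox_draw = [self.rng.betavariate(a, b) for a, b in self.tox]
        safe = [d for d in range(self.n_doses) if tox_draw[d] <= self.tox_limit]
        if not safe:
            # No dose looks safe in this draw: fall back to the least toxic draw.
            return min(range(self.n_doses), key=lambda d: tox_draw[d])
        # Among doses that look safe, pick the highest sampled efficacy.
        return max(safe, key=lambda d: eff_draw[d])

    def update(self, dose, responded, toxic_event):
        # Index 0 counts events (response / toxicity), index 1 counts non-events.
        self.eff[dose][0 if responded else 1] += 1
        self.tox[dose][0 if toxic_event else 1] += 1


def simulate(n_participants=500, seed=0):
    """Run the toy allocator against made-up dose-response curves."""
    true_eff = [0.2, 0.4, 0.6, 0.7]      # hypothetical response rates
    true_tox = [0.05, 0.10, 0.20, 0.50]  # hypothetical toxicity rates
    sampler = DualThompsonSampler(len(true_eff), tox_limit=0.3, seed=seed)
    env = random.Random(seed + 1)
    counts = [0] * len(true_eff)
    for _ in range(n_participants):
        d = sampler.select_dose()
        sampler.update(d, env.random() < true_eff[d], env.random() < true_tox[d])
        counts[d] += 1
    return counts


if __name__ == "__main__":
    print(simulate())  # number of participants allocated to each dose
```

In this toy setup the highest dose is the most efficacious but also exceeds the toxicity threshold, so an efficacy-only sampler would tend to favor it, whereas the safety draw filters it out as toxicity evidence accumulates. A contextual version would replace the Beta posteriors with regression models on participant covariates.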