A constrained Markov decision process (CMDP) approach is developed for response-adaptive procedures in clinical trials with binary outcomes. The resulting CMDP class of Bayesian response-adaptive procedures can be used to target a chosen objective, e.g., patient benefit or power, while using constraints to keep other operating characteristics under control. In the CMDP approach, the constraints can be formulated under different priors, which can induce a certain behaviour of the policy under a given statistical hypothesis, or given that the parameters lie in a specific part of the parameter space. A solution method is developed to find the optimal policy, as well as a more efficient method, based on backward recursion, which often yields a near-optimal solution for which an optimality gap is available. Three applications are considered, involving type I error and power constraints, constraints on the mean squared error, and a constraint on prior robustness. When the focus is on type I and II error together with mean squared error, the CMDP approach only slightly outperforms the constrained randomized dynamic programming (CRDP) procedure known from the literature, which shows the general quality of CRDP; when the focus is on type I and II error only, CMDP significantly outperforms CRDP.
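To make the backward-recursion idea concrete, the following is a minimal sketch, not the CMDP method itself: it solves the unconstrained Bayesian two-armed problem with binary outcomes, maximizing the expected number of successes (one possible reading of patient benefit). The function name `expected_successes`, the Beta(1, 1) priors, and the absence of any constraints are all assumptions made for illustration; the constrained procedures described above would add constraint handling on top of such a recursion.

```python
from functools import lru_cache


def expected_successes(n_patients, prior=(1, 1, 1, 1)):
    """Maximal expected number of successes in a two-armed Bernoulli trial.

    Hypothetical illustration: `prior` holds Beta(a, b) parameters for
    arm 0 and arm 1. No constraints are imposed (unlike a CMDP).
    """
    a0, b0, a1, b1 = prior

    @lru_cache(maxsize=None)
    def value(s0, f0, s1, f1):
        # State: successes and failures observed so far on each arm.
        if s0 + f0 + s1 + f1 == n_patients:
            return 0.0  # trial finished, no further reward
        # Posterior mean success probability of each arm.
        p0 = (a0 + s0) / (a0 + b0 + s0 + f0)
        p1 = (a1 + s1) / (a1 + b1 + s1 + f1)
        # Expected reward-to-go when the next patient gets arm 0 or arm 1.
        q0 = p0 * (1 + value(s0 + 1, f0, s1, f1)) + (1 - p0) * value(s0, f0 + 1, s1, f1)
        q1 = p1 * (1 + value(s0, f0, s1 + 1, f1)) + (1 - p1) * value(s0, f0, s1, f1 + 1)
        return max(q0, q1)  # Bellman optimality step

    return value(0, 0, 0, 0)


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, expected_successes(n))
```

The memoized recursion evaluates each state once, working from the end of the trial back to the start, which is the same computation as a tabular backward recursion. The recursion depth equals the trial size, so larger trials would call for an explicit loop over states instead.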