Clinical trials are characterized by high costs, extended timelines, and substantial operational risk, yet reliable methods for prospectively predicting trial success before initiation remain limited. Existing artificial intelligence approaches often focus on isolated metrics or specific development stages and frequently rely on variables that are unavailable at the trial design phase, which limits their real-world applicability. We present a hierarchical, latent risk-aware machine learning framework for prospective prediction of clinical trial operational success, built on a curated subset of 13,700 trials from TrialsBank, a proprietary AI-ready database developed by Sorintellis. Operational success was defined as the ability to initiate, conduct, and complete a clinical trial according to planned timelines, recruitment targets, and protocol specifications through database lock. The framework decomposes operational success prediction into two modeling stages. First, intermediate latent operational risk factors are predicted from more than 180 drug- and trial-level features available before trial initiation. These predicted latent risks are then passed to a downstream model that estimates the probability of operational success. A staged data-splitting strategy was employed to prevent information leakage between the two stages, and models were benchmarked using XGBoost, CatBoost, and Explainable Boosting Machines. Across Phase I-III trials, the framework achieved strong out-of-sample performance, with F1-scores of 0.93, 0.92, and 0.91 for Phase I, II, and III, respectively. Incorporating latent risk drivers improved discrimination of operational failures, and performance remained robust under independent inference evaluation. These results demonstrate that clinical trial operational success can be forecast prospectively using a latent risk-aware AI framework, enabling early risk assessment and supporting data-driven clinical development decision-making.
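The two-stage design with staged data splitting can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the two latent risks are hypothetical, and a small NumPy logistic regression stands in for the XGBoost, CatBoost, and Explainable Boosting Machine models named above. The split proportions (40/40/20) and all names (`idx_risk`, `idx_down`, `augment`, and so on) are assumptions made for illustration. The point it shows is that the latent-risk models are fitted on one partition, while the downstream success model is fitted on a disjoint partition using only predicted risks.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent (bias stored last)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Return predicted probabilities for the positive class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))

# Synthetic stand-in for pre-initiation features (hypothetical, 8 columns).
n, d = 3000, 8
X = rng.normal(size=(n, d))

# Two hypothetical latent operational risks, observed only in historical data.
risks = np.column_stack([
    X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0,
    X[:, 2] - X[:, 3] + rng.normal(scale=0.5, size=n) > 0,
]).astype(float)

# Operational success becomes less likely as risks accumulate.
success = (risks.sum(axis=1) + rng.normal(scale=0.3, size=n) < 1).astype(float)

# Staged split: three disjoint partitions (assumed 40% / 40% / 20%).
perm = rng.permutation(n)
idx_risk, idx_down, idx_test = np.split(perm, [int(0.4 * n), int(0.8 * n)])

# Stage 1: fit one latent-risk model per risk, on the first partition only.
risk_models = [
    fit_logistic(X[idx_risk], risks[idx_risk, k]) for k in range(risks.shape[1])
]

def augment(X_part):
    """Append predicted (never observed) latent risks to the feature matrix."""
    preds = np.column_stack([predict_proba(w, X_part) for w in risk_models])
    return np.hstack([X_part, preds])

# Stage 2: fit the success model on the second partition, where the risk
# predictions are out-of-sample with respect to the stage-1 models.
X_down_aug = augment(X[idx_down])
success_model = fit_logistic(X_down_aug, success[idx_down])

# Evaluation on the untouched third partition.
p_test = predict_proba(success_model, augment(X[idx_test]))
y_hat = (p_test >= 0.5).astype(float)
y_true = success[idx_test]
tp = float(np.sum((y_hat == 1) & (y_true == 1)))
fp = float(np.sum((y_hat == 1) & (y_true == 0)))
fn = float(np.sum((y_hat == 0) & (y_true == 1)))
f1 = 2 * tp / max(2 * tp + fp + fn, 1.0)
print(f"held-out F1: {f1:.3f}")
```

Because the downstream model only ever sees risk predictions for trials the stage-1 models were not trained on, its inputs at training time resemble what it would receive for a genuinely new trial. Fitting both stages on the same rows would let the risk predictions carry label information into the second stage.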