Clinical trials are conducted to test the effectiveness and safety of potential drugs in humans for regulatory approval. Machine learning (ML) has recently emerged as a new tool to assist in clinical trials. Despite this progress, there have been few efforts to document and benchmark the ML-for-clinical-trials (ML4Trial) algorithms available to the ML research community. In addition, access to clinical trial-related datasets is limited, and there is a lack of well-defined clinical tasks to facilitate the development of new algorithms. To fill this gap, we have developed PyTrial, which provides benchmarks and open-source implementations of a series of ML algorithms for clinical trial design and operations. In this paper, we investigate 34 ML algorithms for clinical trials across 6 tasks: patient outcome prediction, trial site selection, trial outcome prediction, patient-trial matching, trial similarity search, and synthetic data generation. We have also collected and prepared 23 ML-ready datasets, together with working examples in Jupyter Notebooks, for quick implementation and testing. PyTrial defines each task through a simple four-step process (data loading, model specification, model training, and model evaluation), all achievable with just a few lines of code. Furthermore, our modular API architecture lets practitioners extend the framework with new algorithms and tasks. The code is available at https://github.com/RyanWangZf/PyTrial.
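To make the four-step process concrete, the following is a minimal, self-contained sketch of what a load/specify/train/evaluate workflow can look like for a toy patient outcome prediction task. It does not use PyTrial's actual API: the function `load_toy_patient_data`, the class `NearestCentroidModel`, the helper `accuracy`, and the data values are all hypothetical and invented purely for illustration.

```python
# Hypothetical sketch: none of these names or values come from PyTrial itself.
from dataclasses import dataclass, field


def load_toy_patient_data():
    """Step 1, data loading: return (features, labels) for a made-up cohort."""
    # Each row is [age_in_decades, biomarker_level]; label 1 means "responder".
    features = [[3.0, 0.2], [4.0, 0.3], [6.5, 0.9], [7.0, 0.8], [3.5, 0.1], [6.0, 1.0]]
    labels = [0, 0, 1, 1, 0, 1]
    return features, labels


@dataclass
class NearestCentroidModel:
    """Toy stand-in for a patient outcome prediction model."""
    centroids: dict = field(default_factory=dict)

    def fit(self, features, labels):
        # Step 3, model training: average the feature vectors of each class.
        grouped = {}
        for row, label in zip(features, labels):
            grouped.setdefault(label, []).append(row)
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, features):
        # Assign each row to the class whose centroid is closest.
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids, key=lambda label: sq_dist(row, self.centroids[label]))
            for row in features
        ]


def accuracy(predictions, labels):
    """Step 4, model evaluation: fraction of correct predictions."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


features, labels = load_toy_patient_data()          # 1. data loading
model = NearestCentroidModel()                      # 2. model specification
model.fit(features, labels)                         # 3. model training
score = accuracy(model.predict(features), labels)   # 4. model evaluation
print(f"training accuracy: {score:.2f}")
```

The point of the sketch is the shape of the workflow rather than the model: each step is a single call, so swapping in a different dataset or algorithm changes only one line. For the real class and function names, consult the repository and its Jupyter Notebook examples.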