Adverse drug events (ADEs) significantly impact clinical research, causing many clinical trial failures. ADE prediction is key for developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel predictive modeling of ADEs in monopharmacy treatments. CT-ADE integrates data from 2,497 unique drugs, encompassing 168,984 drug-ADE pairs extracted from clinical trials, annotated with patient and contextual information, and comprehensive ADE concepts standardized across multiple levels of the MedDRA ontology. Preliminary analyses with large language models (LLMs) achieved F1-scores up to 55.90%. Models using patient and contextual information showed F1-score improvements of 21%-38% over models using only chemical structure data. Our results highlight the importance of target population and treatment regimens in the predictive modeling of ADEs, offering greater performance gains than LLM domain specialization and scaling. CT-ADE provides an essential tool for researchers aiming to leverage artificial intelligence and machine learning to enhance patient safety and minimize the impact of ADEs on pharmaceutical research and development. The dataset is publicly accessible at https://github.com/ds4dh/CT-ADE.