Adverse drug events (ADEs) are a major safety issue in clinical trials. Thus, predicting ADEs is key to developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel ADE prediction in monopharmacy treatments. CT-ADE encompasses 2,497 drugs and 168,984 drug-ADE pairs from clinical trial results, annotated using the MedDRA ontology. Unlike existing resources, CT-ADE integrates treatment and target population data, enabling comparative analyses under varying conditions, such as dosage, administration route, and demographics. In addition, CT-ADE systematically collects all ADEs in the study population, including positive and negative cases. To provide a baseline for ADE prediction performance on the CT-ADE dataset, we conducted analyses using large language models (LLMs). The best LLM achieved an F1-score of 56%, and models incorporating treatment and patient information outperformed those relying solely on chemical structure by 21%-38%. These findings underscore the importance of contextual information in ADE prediction and establish CT-ADE as a robust resource for safety risk assessment in pharmaceutical research and development.