ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

Antibody-drug conjugates (ADCs) have revolutionized cancer treatment in the era of precision medicine owing to their ability to precisely target cancer cells and release highly potent drugs. Nevertheless, the rational design of ADCs remains very difficult because the relationship between their structures and activities is poorly understood. In the present study, we introduce a unified deep learning framework called ADCNet to help design potential ADCs. ADCNet integrates the protein language model ESM-2 and the small-molecule language model FG-BERT to predict activity by learning meaningful features from the antigen and antibody protein sequences of an ADC, the SMILES strings of its linker and payload, and its drug-antibody ratio (DAR) value. On a carefully designed and manually curated ADC data set, extensive evaluation shows that ADCNet outperforms baseline machine learning models on the test set across all evaluation metrics. For example, it achieves an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under the receiver operating characteristic curve of 0.9293 on the test set. In addition, cross-validation, ablation experiments, and external independent testing further demonstrate the stability, advancement, and robustness of the ADCNet architecture. For the convenience of the community, we develop the first online platform (https://ADCNet.idruglab.cn) for the prediction of ADC activity based on the optimal ADCNet model, and the source code is publicly available at https://github.com/idrugLab/ADCNet.
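For readers less familiar with the three metrics quoted above, the following is a minimal sketch of how accuracy, balanced accuracy, and the area under the ROC curve can be computed for a binary classifier. It is not taken from the ADCNet code base: the function names, the 0.5 decision threshold, the encoding of an active ADC as label 1, and the toy labels and scores are all assumptions made purely for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the paper): 1 = active ADC, 0 = inactive ADC.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
bacc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Balanced accuracy is reported alongside plain accuracy because it averages the per-class recalls, so it is less flattering when one class dominates the data set. The AUC is threshold-free, which is why it is computed from the raw scores rather than from the thresholded predictions.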
