Breast cancer, the second most prevalent cancer among women worldwide, necessitates the exploration of novel therapeutic approaches. To target the four subgroups of breast cancer (hormone receptor-positive and HER2-negative, hormone receptor-positive and HER2-positive, hormone receptor-negative and HER2-positive, and hormone receptor-negative and HER2-negative), it is crucial to inhibit specific targets such as EGFR, HER2, ER, NF-kB, and PR. In this study, we evaluated various methods for binary (active/inactive) and multiclass (target) classification. Among them, the GA-SVM-SVM:GA-SVM-SVM model, with an accuracy of 0.74, an F1-score of 0.73, and an AUC of 0.94, was selected for virtual screening of ligands from the BindingDB database. This model identified 4454, 803, 438, and 378 BindingDB ligands with over 90% precision in both active/inactive and target prediction for the EGFR+HER2, ER, NF-kB, and PR classes, respectively. Based on the selected ligands, we created a dendrogram that groups the ligands by target, intended to facilitate the exploration of chemical space for the various therapeutic targets. Ligands for which the product of the activity probability and the correct-target selection probability exceeded a 90% threshold were chosen for further investigation by molecular docking. The calculated binding energies of these ligands against their respective targets ranged from -15 to -5 kcal/mol. Finally, based on general and common rules in medicinal chemistry, we selected 2, 3, 3, and 8 new ligands as high priority for further studies in the EGFR+HER2, ER, NF-kB, and PR classes, respectively.
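The selection rule for docking candidates (the product of the activity probability and the target selection probability exceeding 90%) can be illustrated with a minimal sketch. It assumes that the binary stage yields a probability of activity per ligand and that the multiclass stage yields one probability per target class, and it takes the "correct target" probability to be that of the top-ranked class. The names `Prediction`, `combined_score`, and `select_ligands`, as well as the ligand identifiers, are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

THRESHOLD = 0.90  # cutoff on the probability product, as stated in the abstract


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for the two-stage model output of one ligand."""
    ligand_id: str                  # illustrative identifier, not a real BindingDB ID
    p_active: float                 # assumed output of the binary (active/inactive) stage
    target_probs: Dict[str, float]  # assumed output of the multiclass (target) stage


def combined_score(pred: Prediction) -> Tuple[str, float]:
    """Return the top-ranked target class and P(active) * P(target)."""
    target, p_target = max(pred.target_probs.items(), key=lambda kv: kv[1])
    return target, pred.p_active * p_target


def select_ligands(
    preds: List[Prediction], threshold: float = THRESHOLD
) -> Dict[str, List[Tuple[str, float]]]:
    """Group ligands whose combined score exceeds the threshold by predicted target."""
    selected: Dict[str, List[Tuple[str, float]]] = {}
    for pred in preds:
        target, score = combined_score(pred)
        if score > threshold:
            selected.setdefault(target, []).append((pred.ligand_id, score))
    return selected


# Made-up probabilities, for illustration only.
example = [
    Prediction("lig-1", 0.98, {"EGFR+HER2": 0.95, "ER": 0.03, "NF-kB": 0.01, "PR": 0.01}),
    Prediction("lig-2", 0.95, {"EGFR+HER2": 0.05, "ER": 0.90, "NF-kB": 0.03, "PR": 0.02}),
]
result = select_ligands(example)
```

With these made-up values, `lig-1` is kept for the EGFR+HER2 class, while `lig-2` is dropped even though both of its individual probabilities are at least 0.90, which shows that thresholding the product is stricter than thresholding each stage separately.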