Explainable artificial intelligence approaches accelerate drug discovery by improving molecular representation learning, identifying key molecular structures, and rationalizing drug property prediction. However, developing end-to-end explainable models for target-specific structure-activity relationship modeling remains challenging because compound-protein interaction data are often limited for individual targets, and small changes in chemical substituents or local structural motifs can cause large differences in molecular properties. Therefore, effectively leveraging structural and property information to identify key moieties associated with compound-protein affinity is essential. We propose a graph neural network (GNN) framework that uses property and structural information from activity-cliff molecule pairs targeting specific proteins to predict compound-protein affinity, measured by half-maximal inhibitory concentration (IC50), and explain property differences. To improve explainability, we trained GNNs with structure-aware loss functions using group lasso and sparse group lasso regularization, which prune and highlight molecular subgraphs relevant to activity differences. We applied this framework to activity-cliff data from molecules targeting six tyrosine-protein kinases across the Src, Abl, and Tec families, as well as anaplastic lymphoma kinase. Integrating common- and uncommon-node information with sparse group lasso improved target-specific molecular property prediction, producing lower root mean square errors and higher Pearson correlation coefficients. Regularization also enhanced GNN feature attribution by improving graph-level global direction scores and atom-level coloring accuracy. These results support more interpretable drug discovery pipelines, particularly for identifying critical molecular substructures during lead optimization.
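As a rough illustration of the regularization named above, the following minimal sketch computes a sparse group lasso penalty in the standard textbook form: an element-wise L1 term mixed with a group-wise L2 term. It is not the authors' implementation. The function name `sparse_group_lasso`, the parameters `lam` and `alpha`, and the idea that each group holds the weights tied to one molecular subgraph are assumptions made for this example; the abstract does not specify how groups are defined or how the penalty enters the loss.

```python
import math


def sparse_group_lasso(groups, lam=1.0, alpha=0.5):
    """Standard sparse group lasso penalty over a list of weight groups.

    groups: list of lists of floats, one inner list per group. Treating a
            group as the weights of one molecular subgraph is an assumption
            made for illustration only.
    lam:    overall regularization strength (hypothetical name).
    alpha:  mixing weight; alpha=1 gives a plain L1 (lasso) penalty and
            alpha=0 gives a plain group lasso penalty.
    """
    # Element-wise L1 term: encourages individual weights to be zero.
    l1 = sum(abs(w) for g in groups for w in g)
    # Group-wise L2 term, scaled by sqrt(group size): encourages whole
    # groups to be zeroed out together.
    group_l2 = sum(
        math.sqrt(len(g)) * math.sqrt(sum(w * w for w in g)) for g in groups
    )
    return lam * (alpha * l1 + (1.0 - alpha) * group_l2)


# Two groups: one active, one already pruned to zero.
weights = [[3.0, 4.0], [0.0, 0.0]]
penalty = sparse_group_lasso(weights, lam=1.0, alpha=0.5)
```

In a training loop, a term like this would typically be added to the prediction loss, so that groups contributing little to the fit are driven toward zero and the surviving groups indicate the substructures the model relies on.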