An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. Predicting systemic availability from the chemical structure of a candidate compound is highly desirable, as it allows drug or agrochemical development to focus on compounds with a favorable kinetic profile. However, such predictions are challenging because availability results from a complex interplay between molecular properties, biology and physiology, and training data are scarce. In this work we improve the hybrid model developed earlier [1]. We reduce the median fold change error for total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set and by improving both the neural network architecture and the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, such as sex and dosage form. In contrast to a pure machine learning model, our model can predict new endpoints on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24 h, although the model has only been trained on total exposure.
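Two pieces of arithmetic behind these claims can be sketched briefly. First, the accuracy figures are median fold change errors; the sketch assumes the common symmetric definition, max(predicted/observed, observed/predicted), taken per compound and then aggregated by the median. Second, the 24 h endpoint is reachable because a mechanistic model yields a full concentration-time curve, so any exposure window can be integrated once its parameters are known. To illustrate this, the sketch substitutes a textbook one-compartment model with first-order oral absorption; this is not the authors' mechanistic model, and the parameter names (`f_oral`, `dose`, `cl`, `v`, `ka`) are hypothetical.

```python
import math
import statistics


def fold_change_error(predicted: float, observed: float) -> float:
    """Symmetric fold change: 2.0 means off by a factor of two in either direction."""
    return max(predicted / observed, observed / predicted)


def median_fold_change_error(predicted, observed) -> float:
    """Median of per-compound fold change errors (assumed definition)."""
    errors = [fold_change_error(p, o) for p, o in zip(predicted, observed)]
    return statistics.median(errors)


# --- Illustrative one-compartment model with first-order absorption (assumption) ---
# Units (hypothetical): dose in mg, cl in L/h, v in L, ka in 1/h, AUC in mg*h/L.

def auc_total(f_oral: float, dose: float, cl: float) -> float:
    """Total exposure AUC(0-inf) = F * Dose / CL, the endpoint the model is trained on."""
    return f_oral * dose / cl


def auc_partial(f_oral: float, dose: float, cl: float, v: float,
                ka: float, t_end: float) -> float:
    """Exposure AUC(0-t_end), obtained by integrating the concentration curve
    C(t) = F*D*ka / (V*(ka - ke)) * (exp(-ke*t) - exp(-ka*t)), with ke = CL/V."""
    ke = cl / v
    if math.isclose(ka, ke):
        raise ValueError("closed form below requires ka != ke")
    scale = f_oral * dose * ka / (v * (ka - ke))
    return scale * ((1 - math.exp(-ke * t_end)) / ke
                    - (1 - math.exp(-ka * t_end)) / ka)


if __name__ == "__main__":
    print(median_fold_change_error([2.0, 1.0, 1.0], [1.0, 2.0, 1.0]))
    params = dict(f_oral=0.5, dose=100.0, cl=5.0)
    print(auc_total(**params))
    print(auc_partial(**params, v=50.0, ka=1.0, t_end=24.0))
```

The design point is that only the mechanistic parameters need to be learned; any endpoint derived from the curve, such as AUC(0-24 h), follows without additional training, and AUC(0-t_end) converges to F·Dose/CL as t_end grows.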